swectral.blend_samples

swectral.blend_samples(n_samples, is_regression, use_validation_group=True, abs_tol=None, rel_tol=None, random_state=None)

Generator for creating a sample blending process using convex combinations.

The generator returns a callable that accepts and returns a list of tuples, each with the following structure:

(
    sample_id : str,
    sample_label : str,
    validation_group : str,
    test_mask : np.int8,
    train_mask : np.int8,
    original_shape : tuple of int,
    target_value : Any,
    predictors : array-like of shape (n_features,)
)

Synthetic predictors are computed as Dirichlet-weighted averages of an anchor sample and one or more valid neighbors. For regression, targets are blended using the same weights; for classification, the anchor target is retained.
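The blending rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the library's implementation: the helper name blend_once, the symmetric Dirichlet concentration of 1, and the toy data are all assumptions made for the example.

```python
import numpy as np

def blend_once(anchor_x, neighbor_xs, anchor_y, neighbor_ys, rng, is_regression=True):
    """Blend one anchor with its neighbors (hypothetical helper, for illustration)."""
    # Stack the anchor and its neighbors: shape (k + 1, n_features).
    X = np.vstack([anchor_x, *neighbor_xs])
    # Dirichlet weights are non-negative and sum to 1, so w @ X is a
    # convex combination. Concentration 1 for every sample is an assumption.
    w = rng.dirichlet(np.ones(X.shape[0]))
    x_new = w @ X
    if is_regression:
        # Regression: blend the targets with the same weights.
        y = np.asarray([anchor_y, *neighbor_ys], dtype=float)
        y_new = float(w @ y)
    else:
        # Classification: keep the anchor's target unchanged.
        y_new = anchor_y
    return x_new, y_new, w

rng = np.random.default_rng(0)
anchor = np.array([1.0, 2.0, 3.0])
neighbors = [np.array([2.0, 2.0, 2.0]), np.array([0.0, 4.0, 6.0])]

x_reg, y_reg, w = blend_once(anchor, neighbors, 10.0, [12.0, 8.0], rng)
x_cls, y_cls, _ = blend_once(anchor, neighbors, "A", ["A", "A"], rng, is_regression=False)
```

Because the weights form a convex combination, every blended predictor lies between the minimum and maximum of the contributing samples for that feature, and a blended regression target lies within the range of the contributing targets.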

Samples are generated either per validation group (restricted to groups with at least one non-lonely sample, that is, an anchor with at least one valid neighbor) or globally across the training pool.
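In per-group mode, the n_samples total is described as being distributed approximately evenly across eligible validation groups. The sketch below shows one way such a split could work. The helper allocate_per_group and its handling of the remainder are assumptions for illustration; the documentation does not state how swectral assigns leftover samples.

```python
def allocate_per_group(n_samples, groups):
    """Split n_samples as evenly as possible across groups (hypothetical helper)."""
    if not groups:
        return {}
    base, extra = divmod(n_samples, len(groups))
    # Assumption: the first `extra` groups each receive one additional sample.
    return {g: base + (1 if i < extra else 0) for i, g in enumerate(groups)}

counts = allocate_per_group(100, ["g1", "g2", "g3"])
```

With three eligible groups and 100 requested samples, this scheme gives per-group counts that differ by at most one and always sum to the requested total.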

It can be registered using add_process or used within build_pipelines with:

- input_data_level set to either 7 ("spec1d") or 8 ("assembly")
- output_data_level set to 8 ("assembly")

Parameters:
n_samples : int

Total number of synthetic samples to generate. When use_validation_group=True, this value is distributed approximately evenly across eligible validation groups.

is_regression : bool

If True, regression mode is used and targets are blended numerically.

If False, classification mode is used and the synthetic target equals the anchor target.

use_validation_group : bool, optional

If True, synthetic samples are generated independently within each validation group, restricted to groups containing at least one anchor with valid neighbors.

If False, the full training pool is used globally.

Default is True.

abs_tol : float or int or None, optional

Absolute tolerance for regression neighbor selection.

If None, no absolute tolerance constraint is applied. Default is None.

rel_tol : float or None, optional

Relative tolerance for regression neighbor selection.

If None, no relative tolerance constraint is applied. Default is None.

random_state : int or None, optional

Seed used to initialize the NumPy random number generator for reproducibility.

If None, a random seed is used. Default is None.
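The effect of a fixed seed can be illustrated with NumPy directly. This is a sketch under assumptions: draw_weights is a hypothetical helper, and whether swectral builds its generator with np.random.default_rng or another NumPy interface is not stated here.

```python
import numpy as np

def draw_weights(random_state=None, k=3):
    """Draw one set of Dirichlet blending weights (hypothetical helper)."""
    # An integer seed gives a reproducible stream; None draws fresh
    # entropy from the operating system on each call.
    rng = np.random.default_rng(random_state)
    return rng.dirichlet(np.ones(k))

a = draw_weights(42)
b = draw_weights(42)  # same seed, same weights
c = draw_weights(7)   # different seed, different weights
```

Passing the same integer on every run therefore yields the same synthetic samples, which is useful when comparing pipelines that should differ only in their modelling steps.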

Returns:
Callable

A pipeline-compatible blending process callable.

Return type:

Callable

Examples

Adding the process to a pipeline, for a SpecPipe instance pipe:

>>> from swectral import blend_samples
>>> blend = blend_samples(n_samples=100, is_regression=False)
>>> pipe.add_process(7, 8, 0, blend)