swectral.blend_samples
- swectral.blend_samples(n_samples, is_regression, use_validation_group=True, abs_tol=None, rel_tol=None, random_state=None)
Factory function that creates a sample blending process based on convex combinations.
The returned callable accepts a list of tuples and returns a list of tuples with the same structure:
( sample_id : str, sample_label : str, validation_group : str, test_mask : np.int8, train_mask : np.int8, original_shape : tuple of int, target_value : Any, predictors : array-like of shape (n_features,) )
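For illustration, one element of that list might look like the following. All values here are hypothetical: the IDs, label, group, shape, target and predictor values are invented. Only the field order and the field types come from the structure above.

```python
import numpy as np

# Hypothetical input element; field order follows the documented tuple structure.
sample = (
    "s001",                                     # sample_id : str
    "leaf_A",                                   # sample_label : str
    "group_1",                                  # validation_group : str
    np.int8(0),                                 # test_mask : np.int8
    np.int8(1),                                 # train_mask : np.int8
    (1, 1, 5),                                  # original_shape : tuple of int (hypothetical)
    3.2,                                        # target_value : Any
    np.array([0.11, 0.15, 0.42, 0.48, 0.50]),   # predictors : shape (n_features,)
)

# Unpack by position to access individual fields.
(sample_id, sample_label, validation_group, test_mask,
 train_mask, original_shape, target_value, predictors) = sample
print(predictors.shape)  # (5,)
```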
Synthetic predictors are computed as Dirichlet-weighted averages of an anchor sample and one or more valid neighbors. For regression, targets are blended using the same weights; for classification, the anchor target is retained.
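The blending step described above can be sketched with NumPy as follows. This is a minimal illustration, not the library's implementation. The function name `blend_one` is hypothetical. The concentration of 1.0 for every component is also an assumption, because the concentration that `blend_samples` actually uses is not documented here. Because Dirichlet weights are non-negative and sum to 1, the result is always a convex combination.

```python
import numpy as np

def blend_one(anchor_x, anchor_y, neighbor_xs, neighbor_ys, is_regression, rng):
    """Hypothetical sketch of one convex-combination blend.

    anchor_x: shape (n_features,); neighbor_xs: shape (k, n_features), k >= 1.
    """
    # Stack the anchor with its k neighbors: shape (k + 1, n_features).
    xs = np.vstack([anchor_x, neighbor_xs])
    # ASSUMPTION: symmetric Dirichlet with concentration 1.0 for every component.
    w = rng.dirichlet(np.ones(xs.shape[0]))
    # Weighted average of the predictors (a convex combination).
    x_new = w @ xs
    if is_regression:
        # Regression: blend the targets with the same weights.
        ys = np.concatenate([[anchor_y], neighbor_ys]).astype(float)
        y_new = float(w @ ys)
    else:
        # Classification: the synthetic sample keeps the anchor's target.
        y_new = anchor_y
    return x_new, y_new, w

rng = np.random.default_rng(0)  # analogous to a fixed random_state
x_new, y_new, w = blend_one(
    np.array([0.2, 0.4, 0.6]), 1.0,
    np.array([[0.3, 0.5, 0.7], [0.1, 0.3, 0.5]]), np.array([2.0, 3.0]),
    True, rng,
)
print(x_new, y_new, w)
```

Reusing the same weights for predictors and regression targets keeps each synthetic pair consistent. In regression mode, a synthetic target always lies between the smallest and largest of the blended targets.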
Depending on use_validation_group, samples are generated either per validation group (restricted to groups containing at least one anchor sample with valid neighbors) or globally across the training pool.
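In per-group mode, n_samples must be split across the eligible groups. The hypothetical helper `allocate_per_group` below shows one way to do this approximately evenly: each group gets the integer quotient, and the leftover samples go to the first groups in sorted order. The exact allocation rule used by `blend_samples` is not specified here, so treat this rule as an assumption.

```python
def allocate_per_group(n_samples, eligible_groups):
    """Hypothetical helper: split n_samples as evenly as possible across groups."""
    k = len(eligible_groups)
    if k == 0:
        # No eligible group: nothing can be generated.
        return {}
    base, remainder = divmod(n_samples, k)
    # ASSUMPTION: the first `remainder` groups (in sorted order) get one extra sample.
    return {
        g: base + (1 if i < remainder else 0)
        for i, g in enumerate(sorted(eligible_groups))
    }

print(allocate_per_group(10, ["g2", "g1", "g3"]))  # {'g1': 4, 'g2': 3, 'g3': 3}
```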
It can be registered via ``add_processor`` for use within ``build_pipelines``, with:
- ``input_data_level`` set to either ``7`` (``"spec1d"``) or ``8`` (``"assembly"``)
- ``output_data_level`` set to ``8`` (``"assembly"``)
- Parameters:
  - n_samples : int
    Total number of synthetic samples to generate. When use_validation_group=True, this value is distributed approximately evenly across eligible validation groups.
  - is_regression : bool
    If True, regression mode is used and targets are blended numerically. If False, classification mode is used and the synthetic target equals the anchor target.
  - use_validation_group : bool, optional
    If True, synthetic samples are generated independently within each validation group, restricted to groups containing at least one anchor with valid neighbors. If False, the full training pool is used globally. Default is True.
  - abs_tol : float or int or None, optional
    Absolute tolerance for regression neighbor selection. If None, no absolute tolerance constraint is applied. Default is None.
  - rel_tol : float or None, optional
    Relative tolerance for regression neighbor selection. If None, no relative tolerance constraint is applied. Default is None.
  - random_state : int or None, optional
    Seed used to initialize the NumPy random number generator for reproducibility. If None, a random seed is used. Default is None.
- Returns:
  A pipeline-compatible blending process callable.
- Return type:
  Callable
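The exact neighbor criterion controlled by abs_tol and rel_tol is not spelled out above. The sketch below uses a hypothetical function, `valid_neighbors`, and assumes one plausible reading:

- A candidate is a valid neighbor of an anchor when the absolute difference between their targets is at most abs_tol.
- That difference must also be at most rel_tol times the absolute anchor target.
- A tolerance of None imposes no constraint.
- The two constraints are combined with a logical AND.

```python
import numpy as np

def valid_neighbors(anchor_idx, targets, abs_tol=None, rel_tol=None):
    """Hypothetical sketch of tolerance-based neighbor selection for regression.

    ASSUMPTION: candidate j is valid when |y_j - y_anchor| <= abs_tol and
    |y_j - y_anchor| <= rel_tol * |y_anchor|; a None tolerance is not applied.
    """
    targets = np.asarray(targets, dtype=float)
    diff = np.abs(targets - targets[anchor_idx])
    mask = np.ones(targets.shape[0], dtype=bool)
    if abs_tol is not None:
        mask &= diff <= abs_tol
    if rel_tol is not None:
        mask &= diff <= rel_tol * abs(targets[anchor_idx])
    mask[anchor_idx] = False  # an anchor is not its own neighbor
    return np.flatnonzero(mask)

print(valid_neighbors(0, [10.0, 10.5, 12.0, 9.8, 20.0], abs_tol=1.0))  # [1 3]
```

Under this reading, an anchor with an empty neighbor set cannot be blended. That is consistent with restricting per-group generation to groups that contain at least one anchor with valid neighbors.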
Examples
Incorporation into a pipeline, for a SpecPipe instance pipe:
>>> blend = blend_samples(n_samples=100, is_regression=False)
>>> pipe.add_process(7, 8, 0, blend)