swectral.SpecPipe#

class swectral.SpecPipe(spec_exp, space_wait_timeout=36000, reserve_free_pct=5.0)[source]#

Design and implement processing and modeling pipelines on spectral experiment datasets.

Attributes:
spec_exp : SpecExp

Instance of SpecExp configuring spectral experiment datasets. See SpecExp for details.

report_directory : str

Root directory where reports are stored. This value is automatically derived from the report_directory attribute of the provided spec_exp instance.

space_wait_timeout : int

Number of seconds to wait for disk space to become available before raising an error when the disk is full. Default is 36000 (10 hours).

reserve_free_pct : float

Minimum percentage of free disk space required to proceed with processing. Default is 5.0 (5% of total storage capacity).

process : list of tuple

Added process items. Each tuple represents a process definition and contains:

  • process_id : str

  • process_label : str

  • input_data_level : str

  • output_data_level : str

  • application_sequence : int

  • method : callable

  • full_application_sequence : int

  • alternative_number : int

process_steps : list of tuple of str

Processes of each pipeline step; each tuple represents a step. Processes are identified by their process IDs.

process_chains : list of tuple of str

Generated full-factorial processing chains; each tuple represents a processing chain. Processes are identified by their process IDs.

custom_chains : list of tuple of str

Customized subset of the full-factorial process_chains.

create_time : str

Creation date and time of this SpecPipe instance.

Methods

add_process(input_data_level, ...[, ...])

Add a processing method with defined input/output data levels and application sequence to the pipeline.

ls_process([process_id, process_label, ...])

List process items based on filtering conditions.

rm_process([process_id, process_label, ...])

Remove process items based on filtering conditions.

add_model(model_method[, model_label, ...])

Add a model evaluation process to the processing pipeline.

ls_model([model_id, model_label, ...])

List added model evaluation processes based on filtering conditions.

rm_model([model_id, model_label, ...])

Remove added model evaluation processes from this SpecPipe instance based on filtering conditions.

process_chains_to_df([stage, print_label, ...])

List process chains.

custom_chains_from_df(process_chain_dataframe)

Customize processing chains and update chains using a chain dataframe.

custom_chains_to_df([stage, print_label, ...])

List customized process chains.

ls_chains([stage, print_label, return_label])

List process chains for the pipeline execution.

save_pipe_config([copy, save_spec_exp_config])

Save the current pipeline configuration files to the root of the report directory.

load_pipe_config([config_file_path])

Load SpecPipe configuration from a dill file.

test_run([test_modeling, return_result, ...])

Run the pipeline of all processing chains using simplified test data.

preprocessing([n_processor, resume, ...])

Run preprocessing steps of all processing chains on the entire dataset and output modeling-ready sample_list data to files.

assembly([n_processor, resume, dump_backup, ...])

Apply assembly process to introduce cross-sample interactions prior to modeling.

model_evaluation([n_processor, resume, ...])

Evaluate added models using processed sample data generated by all preprocessing chains.

run([result_directory, n_processor, ...])

Run the entire pipeline of the specified processes of this SpecPipe instance on the provided SpecExp instance.

report_summary()

Retrieve summary of generated reports in the console.

report_chains()

Retrieve major model evaluation reports of every processing chain in the console.

See also

SpecExp

Examples

Create a SpecPipe instance using a prepared SpecExp instance exp:

>>> pipe = SpecPipe(exp)
__init__(spec_exp, space_wait_timeout=36000, reserve_free_pct=5.0)[source]#

Methods

__init__(spec_exp[, space_wait_timeout, ...])

add_model(model_method[, model_label, ...])

Add a model evaluation process to the processing pipeline.

add_process(input_data_level, ...[, ...])

Add a processing method with defined input/output data levels and application sequence to the pipeline.

assembly([n_processor, resume, dump_backup, ...])

Apply assembly process to introduce cross-sample interactions prior to modeling.

build_pipeline(step_methods)

Build pipelines by given structure and methods of each step.

custom_chains_from_df(process_chain_dataframe)

Customize processing chains and update chains using a chain dataframe.

custom_chains_to_df([stage, print_label, ...])

List customized process chains.

load_config([config_file_path])

Load SpecPipe configuration from a dill file.

load_pipe_config([config_file_path])

Load SpecPipe configuration from a dill file.

ls_chains([stage, print_label, return_label])

List process chains for the pipeline execution.

ls_custom_chains([stage, print_label, ...])

List customized process chains.

ls_model([model_id, model_label, ...])

List added model evaluation processes based on filtering conditions.

ls_process([process_id, process_label, ...])

List process items based on filtering conditions.

ls_process_chains([stage, print_label, ...])

List process chains.

model_evaluation([n_processor, resume, ...])

Evaluate added models using processed sample data generated by all preprocessing chains.

preprocessing([n_processor, resume, ...])

Run preprocessing steps of all processing chains on the entire dataset and output modeling-ready sample_list data to files.

process_chains_to_df([stage, print_label, ...])

List process chains.

report_chains()

Retrieve major model evaluation reports of every processing chain in the console.

report_summary()

Retrieve summary of generated reports in the console.

rm_model([model_id, model_label, ...])

Remove added model evaluation processes from this SpecPipe instance based on filtering conditions.

rm_process([process_id, process_label, ...])

Remove process items based on filtering conditions.

run([result_directory, n_processor, ...])

Run the entire pipeline of the specified processes of this SpecPipe instance on the provided SpecExp instance.

save_config([copy, save_spec_exp_config])

Save the current pipeline configuration files to the root of the report directory.

save_pipe_config([copy, save_spec_exp_config])

Save the current pipeline configuration files to the root of the report directory.

test_run([test_modeling, return_result, ...])

Run the pipeline of all processing chains using simplified test data.

update_spec_exp(spec_exp)

Attributes

property space_wait_timeout#
property reserve_free_pct#
property report_directory#
property spec_exp#
property process#
property process_steps#
property process_chains#
property custom_chains#
property create_time#
update_spec_exp(spec_exp)[source]#
Return type:

None

add_model(model_method, model_label='', input_data_level=None, test_error_raise=True, is_regression=None, validation_method='2-fold', unseen_threshold=0.0, x_shape=None, result_backup=False, data_split_config='default', validation_config='default', metrics_config='default', roc_plot_config='default', scatter_plot_config='default', residual_config='default', residual_plot_config='default', influence_analysis_config='default', save_application_model=True)[source]#

Add a model evaluation process to the processing pipeline.

The added model operates on 1D data (data level 7 / "spec1d") and produces model-level output (data level 9 / "model"). All models share a unified application sequence within the pipeline.

Parameters:
model_method : object

Sklearn-style model object.

Regression models must implement fit and predict. Classification models must additionally implement predict_proba.

model_label : str, optional

Custom label for the added model.

If an empty string, a label is automatically generated.

input_data_level : int or str

Input data level for the process. Choose between:

7 or "spec1d"

If the callable is applied to 1D array-like sample spectra or flattened data, such as ROI spectral statistics.

8 or "assembly"

If the method is a model instance or secondary assembly function and is applied following any custom assembly processes.

If None, the data level is determined automatically according to the availability of an "assembly" process. Default is None.

See add_process for more details.

test_error_raise : bool, optional

Whether to raise an error when the model fails validation on simplified mock data before being added to the pipeline.

If True, an exception is raised; otherwise only a warning is issued. Default is True.

is_regression : bool, optional

Whether the model is a regression model.

If None, the model type is inferred from sample target values. Default is None.

validation_method : str, optional

Validation strategy for model evaluation. Supported formats include:

  • "loo" for leave-one-out cross-validation

  • "k-fold" (e.g. "5-fold") for k-fold cross-validation

  • "m-n-split" (e.g. "70-30-split") for train-test split

Default is "2-fold".
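The three string formats above can be illustrated with a small stand-alone helper. This is purely illustrative and not part of swectral; it only mirrors the documented naming patterns:

```python
import re

def parse_validation_method(spec):
    """Classify a validation_method string into the documented formats.

    Illustrative only -- swectral performs its own parsing internally.
    """
    if spec == "loo":
        return ("loo", None)                      # leave-one-out
    m = re.fullmatch(r"(\d+)-fold", spec)
    if m:
        return ("k-fold", int(m.group(1)))        # e.g. "5-fold" -> 5 folds
    m = re.fullmatch(r"(\d+)-(\d+)-split", spec)
    if m:
        # e.g. "70-30-split" -> 70% train / 30% test
        return ("split", (int(m.group(1)), int(m.group(2))))
    raise ValueError(f"Unsupported validation_method: {spec!r}")
```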

unseen_threshold : float, optional

Classification-only parameter.

If the highest predicted class probability of a sample is below this threshold, the sample is assigned to an unknown class. Default is 0.0.

x_shape : tuple of int, optional

Expected shape of independent variables for models requiring structured input. Default is None.

Currently ignored.

result_backup : bool, optional

Whether to save timestamped backup copies of result files. Default is False.

data_split_config : str or dict, optional

Additional data splitting configuration.

If a dictionary of parameters is provided, it may include:

random_state : int

Random state for splitting and shuffling.

Default is "default", which uses the default data splitting behavior.

validation_config : str or dict, optional

Validation behavior configuration.

If a dictionary of parameters is provided, it may include:

unseen_threshold : float

If a class unseen in the training data exists, a test sample is predicted as the unseen class if the predicted probabilities of all seen classes are below this threshold. Default is 0 (only seen classes are predicted).

use_original_shape : bool

Whether the original data shape is applied for the model. Currently unused. Default is False.

save_fold_model : bool

Whether models of the validation folds are saved to files. Default is True.

save_fold_data : bool

Whether data of the validation folds are saved to files. Default is True.

Default is "default", which uses the default validation behavior.

metrics_config : str or dict or None, optional

Metrics computation configuration.

If None, metric computation is skipped. Default is "default". Currently only "default" is supported.

roc_plot_config : str or dict or None, optional

Receiver Operating Characteristic (ROC) plotting configuration for classification models.

Not used for regression models.

If None, ROC plot generation is skipped.

If a dictionary of parameters is provided, it may include:

plot_title : str

Title of the ROC plot. Default is ‘ROC Curve’.

title_size : int or float

Font size of the plot title. Default is 26.

title_pad : int or float or None

Padding between the title and the plot. Default is None.

figure_size : tuple of 2 (float or int)

Figure size as (width, height). Default is (8, 8).

plot_margin : tuple of 4 float

Plot margins as (left, right, top, bottom). Default is (0.15, 0.95, 0.9, 0.13).

plot_line_width : int or float

Line width of the ROC curve. Default is 3.

plot_line_alpha : float

Alpha value of the ROC curve line. Default is 0.8.

diagnoline_width : int or float

Line width of the diagonal reference line. Default is 3.

x_axis_limit : tuple of 2 (float or int) or None

x-axis limits as (min, max). Default is None.

x_axis_label : str

Label of the x-axis. Default is ‘False Positive Rate’.

x_axis_label_size : int or float

Font size of the x-axis label. Default is 26.

x_tick_size : int or float

Font size of x-axis tick labels. Default is 24.

x_tick_number : int

Number of x-axis ticks. Default is 6.

y_axis_limit : tuple of 2 (float or int) or None

y-axis limits as (min, max). Default is None.

y_axis_label : str

Label of the y-axis. Default is ‘True Positive Rate’.

y_axis_label_size : int or float

Font size of the y-axis label. Default is 26.

y_tick_size : int or float

Font size of y-axis tick labels. Default is 24.

y_tick_number : int

Number of y-axis ticks. Default is 6.

axis_line_size_left : int or float or None

Line width of the left axis spine. Default is 1.5.

axis_line_size_right : int or float or None

Line width of the right axis spine. Default is 1.5.

axis_line_size_top : int or float or None

Line width of the top axis spine. Default is 1.5.

axis_line_size_bottom : int or float or None

Line width of the bottom axis spine. Default is 1.5.

legend : bool

Whether to display the legend. Default is True.

legend_location : str

Legend location string accepted by matplotlib. Default is ‘lower right’.

legend_fontsize : int or float

Font size of legend entries. Default is 20.

legend_title : str

Legend title text. Default is empty.

legend_title_fontsize : int or float

Font size of the legend title. Default is 24.

background_grid : bool

Whether to show a background grid. Default is False.

show_plot : bool

Whether to display the plot interactively. Default is False.

Default is "default", which uses the default plotting behavior.
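For example, a dictionary overriding a few of the documented ROC options might look like the following sketch (the pipe and knn objects are assumed from the earlier examples, so the add_model call is shown commented out):

```python
# All keys below are documented roc_plot_config options; keys that are
# omitted keep their default values.
roc_config = {
    "plot_title": "ROC Curve (KNN, 5-fold)",
    "figure_size": (8, 8),
    "plot_line_width": 2,
    "legend_location": "lower right",  # any matplotlib legend location string
    "show_plot": False,
}
# pipe.add_model(knn, validation_method="5-fold", roc_plot_config=roc_config)
```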

scatter_plot_config : str or dict or None, optional

Scatter plot configuration for regression models.

If None, scatter plot generation is skipped.

If a dictionary of parameters is provided, it may include:

plot_title : str

Plot title text. Default is ‘’.

title_size : int or float

Font size of the plot title. Default is 26.

title_pad : int or float or None

Padding between the title and the plot. Default is None.

figure_size : tuple of 2 (float or int)

Figure size in inches as (width, height). Default is (8, 8).

plot_margin : tuple of 4 float

Plot margins as (left, right, top, bottom). Default is (0.2, 0.95, 0.95, 0.15).

plot_line_width : int or float

Line width of plotted curves. Default is 3.

point_size : int or float

Size of plotted points. Default is 120.

point_color : str

Color of plotted points. Default is ‘firebrick’.

point_alpha : float

Transparency of plotted points. Default is 0.7.

x_axis_limit : tuple of 2 (float or int) or None

Limits of the x-axis. Default is None.

x_axis_label : str

Label of the x-axis. Default is ‘Predicted target values’.

x_axis_label_size : int or float

Font size of the x-axis label. Default is 26.

x_tick_values : list of int or float or None

Explicit tick values for the x-axis. Default is None.

x_tick_size : int or float

Font size of x-axis ticks. Default is 24.

x_tick_number : int

Number of x-axis ticks. Default is 5.

y_axis_limit : tuple of 2 (float or int) or None

Limits of the y-axis. Default is None.

y_axis_label : str

Label of the y-axis. Default is ‘Residuals’.

y_axis_label_size : int or float

Font size of the y-axis label. Default is 26.

y_tick_values : list of int or float or None

Explicit tick values for the y-axis. Default is None.

y_tick_size : int or float

Font size of y-axis ticks. Default is 24.

y_tick_number : int

Number of y-axis ticks. Default is 5.

axis_line_size_left : int or float or None

Line width of the left axis spine. Default is 1.0.

axis_line_size_right : int or float or None

Line width of the right axis spine. Default is 1.5.

axis_line_size_top : int or float or None

Line width of the top axis spine. Default is 1.5.

axis_line_size_bottom : int or float or None

Line width of the bottom axis spine. Default is 1.5.

background_grid : bool

Whether to display background grid lines. Default is False.

show_plot : bool

Whether to display the plot immediately. Default is False.

Default is "default", which uses the default plotting behavior.

residual_config : str or dict or None, optional

Residual analysis configuration.

If None, residual analysis is skipped. Default is "default", which uses the default residual analysis behavior.

residual_plot_config : str or dict or None, optional

Residual plot configuration for regression models.

If None, residual plot generation is skipped.

If a dictionary of parameters is provided, the available parameters are the same as for scatter_plot_config.

Default is "default", which uses the default plotting behavior.

influence_analysis_config : str or dict or None, optional

Influence analysis configuration. When enabled, computes a Cook’s distance–like influence measure for each sample using a Leave-One-Out (LOO) approach.

If None, influence analysis is skipped.

Note: This computation can be very time-consuming for large datasets. For such cases, consider using a simple validation method or setting this option to None.

If a dictionary of parameters is provided, it may include:

validation_method : bool, optional

Whether to use independent validation for leave-one-out influence analysis.

random_state : int or None, optional

Random state for data splitting.

Default is "default", which uses the default influence analysis behavior.

save_application_model : bool, optional

Whether the application model is trained on all data and stored in the chain report. Default is True.

Return type:

None

See also

add_process

Examples

Create a SpecPipe instance from an existing SpecExp object:

>>> pipe = SpecPipe(exp)

Add a model with a specified validation method:

>>> from sklearn.neighbors import KNeighborsClassifier
>>> knn = KNeighborsClassifier(n_neighbors=3)
>>> pipe.add_model(knn, validation_method="5-fold")

Use different validation strategies:

>>> pipe.add_model(knn, validation_method="60-40-split")
>>> pipe.add_model(knn, validation_method="loo")
ls_model(model_id=None, model_label=None, model_method=None, *, exact_match=True, print_result=True, return_result=False)[source]#

List added model evaluation processes based on filtering conditions.

If a filter criterion is not provided or None, the corresponding filter is not applied.

Parameters:
model_id : str, optional

Model evaluation process ID. Default is None.

model_label : str, optional

Custom model label. Default is None.

model_method : str or object, optional

Model object or method. Default is None.

exact_match : bool, optional

If False, any process with a property value containing the specified value is included. Default is True.

print_result : bool, optional

If True, simplified results are printed. Default is True.

return_result : bool, optional

If True, a complete resulting DataFrame is returned. Default is False.

Returns:
pandas.DataFrame or None

If return_result=True, returns a pandas DataFrame of matched model evaluation processes.

If return_result=False, returns None.

Return type:

Optional[DataFrame]

Examples

For a prepared SpecExp instance exp:

>>> pipe = SpecPipe(exp)
>>> from sklearn.neighbors import KNeighborsClassifier
>>> knn = KNeighborsClassifier(n_neighbors=3)
>>> pipe.add_model(knn, validation_method='2-fold')

List all models:

>>> pipe.ls_model()

Return model items as a DataFrame:

>>> df = pipe.ls_model(return_result=True)

Filter results by model label:

>>> pipe.ls_model(model_label='KNeighbor', exact_match=False)
rm_model(model_id=None, model_label=None, model_method=None, exact_match=True)[source]#

Remove added model evaluation processes from this SpecPipe instance based on filtering conditions.

If a filter criterion is not provided or None, the corresponding filter is not applied.

Parameters:
model_id : str, optional

Model evaluation process ID. Default is None.

model_label : str, optional

Custom model label. Default is None.

model_method : str or object, optional

Method object or model. Default is None.

exact_match : bool, optional

If False, any process with a property value containing the specified value is removed. Default is True.

Return type:

None

Examples

For a prepared SpecExp instance exp:

>>> pipe = SpecPipe(exp)
>>> from sklearn.neighbors import KNeighborsClassifier
>>> knn = KNeighborsClassifier(n_neighbors=3)
>>> pipe.add_model(knn, validation_method='2-fold')

Remove all models:

>>> pipe.rm_model()

Remove a specific model:

>>> pipe.rm_model(model_label='KNeighbor')
add_process(input_data_level, output_data_level, application_sequence, method, process_label='', *, test_error_raise=True, is_regression=None, validation_method='2-fold', unseen_threshold=0.0, x_shape=None, result_backup=False, data_split_config='default', validation_config='default', metrics_config='default', roc_plot_config='default', scatter_plot_config='default', residual_config='default', residual_plot_config='default', influence_analysis_config='default', save_application_model=True)[source]#

Add a processing method with defined input/output data levels and application sequence to the pipeline. A processing method can be a preprocessing function or a model for evaluation.

Parameters:
input_data_level : int or str

Input data level for the process. Available options:

0 or "image"

If the callable is applied to raster images. The corresponding callable must accept the input raster path as the first argument and the output path as the second argument.

1 or "pixel_spec"

If the callable is applied to 1D spectra of each pixel.

2 or "pixel_specs_array"

If the callable is applied to 2D numpy.ndarray of pixel spectra. Each row is a pixel spectrum.

3 or "pixel_specs_tensor"

If the callable is applied to 3D torch.tensor (Shape: (C, H, W)), with computation performed along axis 0.

4 or "pixel_hyperspecs_tensor"

If the callable is applied to 3D hyperspectral torch.tensor (Shape: (C, H, W)), with computation performed along axis 1.

5 or "image_roi"

If the callable is applied to a region of interest (ROI) within a raster image. The callable must receive the raster path and ROI coordinates provided by the supplied SpecExp instance.

6 or "roi_specs"

If the callable is applied to 2D numpy.ndarray of ROI spectra; each row is a pixel spectrum.

7 or "spec1d"

If the callable is applied to 1D array-like sample spectra or flattened data, such as ROI spectral statistics.

For assembly methods, the callable must instead accept a list of sample records. Each element must have the structure:

(
    sample_id,
    sample_label,
    validation_group,
    test_mask,
    train_mask,
    original_shape,
    target_value,
    predictors,
)

where:

  • sample_id : str

  • sample_label : str

  • validation_group : str

  • test_mask : numpy.int8

  • train_mask : numpy.int8

  • original_shape : tuple of int

  • target_value : Any

  • predictors : array-like of shape (n_features,)

For model methods, this 1D array-like data is automatically assembled across samples, and the model must accept a 2D numpy.ndarray of shape (n_samples, n_features).

8 or "assembly"

If the method is a model instance or secondary assembly function and applied following any custom assembly processes.

Note specific to this parameter:

Input data levels 0 through 4 share a single, common application_sequence scheme. These data levels do not maintain independent application sequence series, in contrast to input data levels 5, 6, and 7.

For example, a process defined with input data level 0 ("image") and application_sequence=0 and a process defined with input data level 2 ("pixel_specs_array") and application_sequence=0 are treated as parallel operations within the same image-processing step.
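An assembly-level callable receives and returns the sample-record tuples described under "spec1d" above. The following is a minimal illustrative sketch, not a swectral function: it mean-centers predictors across samples, a simple cross-sample interaction:

```python
import numpy as np

def center_predictors(sample_records):
    """Illustrative assembly function (not part of swectral).

    Each record is the 8-tuple described above:
    (sample_id, sample_label, validation_group, test_mask, train_mask,
     original_shape, target_value, predictors)
    """
    # Stack predictors of all samples into one (n_samples, n_features) array
    X = np.asarray([rec[7] for rec in sample_records], dtype=float)
    X -= X.mean(axis=0)  # cross-sample step: subtract the global mean spectrum
    # Rebuild the records, replacing only the predictors element
    return [rec[:7] + (X[i],) for i, rec in enumerate(sample_records)]
```

A function like this could then be registered with, e.g., pipe.add_process(7, 8, 0, center_predictors), assuming the record layout shown above.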

output_data_level : int or str

Output data level. Available options:

0 or "image"

If the callable returns raster image path.

1 or "pixel_spec"

Same as input “pixel_spec”.

2 or "pixel_specs_array"

Same as input “pixel_specs_array”.

3 or "pixel_specs_tensor"

Same as input “pixel_specs_tensor”.

4 or "pixel_hyperspecs_tensor"

Same as input “pixel_hyperspecs_tensor”.

5 or "image_roi"

Currently unavailable.

6 or "roi_specs"

If the callable returns 2D numpy.ndarray of ROI spectra.

7 or "spec1d"

If the callable returns 1D array-like spectral data.

8 or "assembly"

If the callable is a sample assembly function for global sample processing with cross-sample interactions before modeling.

The callable must return a list of sample records with structure and typing identical to its input, although values of tuple elements may differ.

See "spec1d" under input_data_level for the required record schema.

9 or "model"

Used for modeling, accepts "spec1d" or "assembly" input.

application_sequence : int

Sequence number of the method within the same input data level. Lower numbers execute first.

method : callable or object or list of (callable or object)

Processing method: a processing function or sklearn-style estimator.

The callable or estimator must accept inputs and produce outputs that conform to the configured data levels.

If an estimator is provided, it must follow the sklearn estimator interface, implementing fit and predict, and predict_proba for classifiers.

A list of methods may be provided to specify multiple methods that share the same configuration.

process_label : str or list of str, optional

Custom label(s) for the process.

If provided:

  • If a single method is provided, must be a single label str.

  • If multiple methods are provided, must be a list of str with a length equal to the number of methods.

Default is an empty string, which automatically generates label(s) from the callable name(s) or the estimator class name(s).

test_error_raise : bool, optional

Whether to raise an error when the process fails validation on simplified mock data before being added to the pipeline.

If True, an exception is raised; otherwise only a warning is issued. Default is True.

is_regression : bool, optional

Whether the model is a regression model. See add_model for details.

validation_method : str, optional

Validation strategy for model evaluation. See add_model for details.

unseen_threshold : float, optional

Classification-only parameter. See add_model for details.

x_shape : tuple of int, optional

Expected shape of independent variables for models requiring structured input. Currently ignored. See add_model for details.

result_backup : bool, optional

Whether to save timestamped backup copies of result files. See add_model for details.

data_split_config : str or dict, optional

Additional data splitting configuration. See add_model for details.

validation_config : str or dict, optional

Validation behavior configuration. See add_model for details.

metrics_config : str or dict or None, optional

Metrics computation configuration. See add_model for details.

roc_plot_config : str or dict or None, optional

Receiver Operating Characteristic (ROC) plotting configuration for classification models. See add_model for details.

scatter_plot_config : str or dict or None, optional

Scatter plot configuration for regression models. See add_model for details.

residual_config : str or dict or None, optional

Residual analysis configuration. See add_model for details.

residual_plot_config : str or dict or None, optional

Residual plot configuration for regression models. See add_model for details.

influence_analysis_config : str or dict or None, optional

Influence analysis configuration. See add_model for details.

save_application_model : bool, optional

Whether the application model is trained on all data and stored in the chain report. See add_model for details.

Return type:

None

Examples

For prepared SpecExp instance exp:

>>> pipe = SpecPipe(exp)

Add an image processor accepting image path and returning processed path:

>>> pipe.add_process('image', 'image', 0, img_processor)

Or using numeric level indices:

>>> pipe.add_process(0, 0, 0, img_processor)

Customize method name:

>>> pipe.add_process(0, 0, 0, img_processor, process_label='img_proc')

Apply function to pixel spectra array:

>>> from swectral.functions import snv
>>> pipe.add_process('pixel_specs_array', 'pixel_specs_array', 0, snv)

GPU processing example:

>>> from swectral.functions import snv_hyper
>>> pipe.add_process(4, 4, 0, snv_hyper)

Denoiser on ROI spectra:

>>> from swectral.denoiser import LocalPolynomial
>>> pipe.add_process(6, 6, 0, LocalPolynomial(5, polynomial_order=2).savitzky_golay_filter)

Process 1D sample spectra:

>>> pipe.add_process(7, 7, 0, LocalPolynomial(5, polynomial_order=2).savitzky_golay_filter)
ls_process(process_id=None, process_label=None, input_data_level=None, output_data_level=None, application_sequence=None, method=None, full_application_sequence=None, *, exact_match=True, print_result=True, return_result=False)[source]#

List process items based on filtering conditions. If a filter criterion is None, the corresponding filter is not applied.

Parameters:
process_id : str, optional

Process ID. The default is None.

process_label : str, optional

Custom process label. The default is None.

input_data_level : str or int, optional

Input data level of the process.

See add_process for available options. The default is None.

output_data_level : str or int, optional

Output data level of the process.

See add_process for available options. The default is None.

application_sequence : int or tuple of int, optional

Exact sequence number or a sequence number range within a data level.

Ranges must be specified as a tuple. The default is None.

method : str or callable or object, optional

Method function, method name, or method object. The default is None.

full_application_sequence : int or tuple of int, optional

Exact sequence number or a sequence number range within the entire pipeline. Ranges must be specified as a tuple. The default is None.

exact_match : bool, optional

If False, processes whose property values partially match the specified value are included. The default is True.

print_result : bool, optional

Whether to print simplified matched process items. The default is True.

return_result : bool, optional

Whether to return a dataframe of matched process items. The default is False.

Returns:
pandas.DataFrame or None

If return_result=True, returns a pandas DataFrame of matched process items.

If return_result=False, returns None.

Return type:

Optional[DataFrame]

See also

add_process

Examples

For prepared SpecExp instance exp:

>>> pipe = SpecPipe(exp)
>>> from swectral.functions import snv
>>> pipe.add_process(2, 2, 0, snv)

List all added processes:

>>> pipe.ls_process()

List processes by input data level:

>>> pipe.ls_process(input_data_level=2)

List processes by output data level:

>>> pipe.ls_process(output_data_level=2)

List processes by method:

>>> pipe.ls_process(method='snv')

List processes by partial method name:

>>> pipe.ls_process(method='nv', exact_match=False)

Return results instead of printing:

>>> df_process = pipe.ls_process(print_result=False, return_result=True)
rm_process(process_id=None, process_label=None, input_data_level=None, output_data_level=None, application_sequence=None, method=None, exact_match=True)[source]#

Remove process items based on filtering conditions. If a filter criterion is not provided, the corresponding filter is not applied.

Parameters:
process_id : str, optional

Process ID. The default is None.

process_label : str, optional

Custom process label. The default is None.

input_data_level : str or int, optional

Input data level of the process. See add_process for available options. The default is None.

output_data_level : str or int, optional

Output data level of the process. See add_process for available options. The default is None.

application_sequence : int or tuple of int, optional

Exact sequence number or a sequence number range within a data level. Ranges must be specified as a tuple. The default is None.

method : str or callable or object, optional

Method function, method name, or method object. The default is None.

exact_match : bool, optional

If False, processes whose property values partially match the specified value are removed. The default is True.

Return type:

None

See also

add_process

Examples

For prepared SpecExp instance exp:

>>> pipe = SpecPipe(exp)
>>> from swectral.functions import snv
>>> pipe.add_process(2, 2, 0, snv)

Remove all added processes:

>>> pipe.rm_process()

Remove processes by input data level:

>>> pipe.rm_process(input_data_level=2)

Remove processes by output data level:

>>> pipe.rm_process(output_data_level=2)

Remove processes by method:

>>> pipe.rm_process(method='snv')
build_pipeline(step_methods)[source]#

Build pipelines from a given structure and the methods of each step.

This method constructs one or more processing pipelines directly from an explicit structural description. Each pipeline step is defined by its input/output data levels with one or more alternative callable(s) or objects responsible for processing at that step.

Parameters:
step_methodslist of ((str or int, str or int), callable() or object or list of (callable() or object) or dict of str to (callable() or object), None or dict of str to Any)

A list describing the pipeline structure and the processing logic for each step. Each element of the list has the form:

((input_data_level, output_data_level), methods, params)

where:

input_data_levelint or str

Input data level in number or name. See add_process for details.

output_data_levelint or str

Output data level in number or name. See add_process for details.

methodscallable or object or list or dict

A single callable or object defining one processing method; a list of callables or objects representing alternative methods for the step; or a dictionary mapping method names to callables or objects, allowing multiple named alternatives.

paramsdict, optional

Optional dictionary of additional parameters applied to the methods at the step.

Return type:

None

Examples

For an initialized SpecPipe instance pipe:

>>> from swectral import roi_mean
>>> from swectral.functions import snv, minmax, aucnorm
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.neighbors import KNeighborsRegressor

>>> pipe.build_pipeline(
...     [
...         ((2, 2), [snv, minmax, aucnorm]),
...         ((5, 7), roi_mean),
...         ((7, 8), {'RF': RandomForestRegressor(n_estimators=6), 'KNN': KNeighborsRegressor(n_neighbors=3)}, {'validation_method': '5-fold'})
...     ]
... )
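The full-factorial expansion that SpecPipe performs over the alternatives at each step can be sketched with itertools.product. This is an illustrative sketch using hypothetical process names, not SpecPipe's internal code:

```python
from itertools import product

# Hypothetical alternatives per pipeline step, mirroring the example above:
# three preprocessing options, one assembly method, two models.
steps = [
    ["snv", "minmax", "aucnorm"],
    ["roi_mean"],
    ["RF", "KNN"],
]

# One chain per combination of step alternatives: 3 * 1 * 2 = 6 chains.
chains = list(product(*steps))
print(len(chains))  # 6
```

Each resulting tuple corresponds to one processing chain, which is how three preprocessing alternatives and two models yield six chains.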
ls_process_chains(stage=None, print_label=True, return_label=False)[source]#

List process chains. Returns the default full-factorial process chains.

Returns a dataframe where each row represents a processing chain with process IDs. For custom chains, use ls_custom_chains.

Parameters:
stagestr or None, optional

Processing stage, choose between:

- ``None``: list entire processing chains.
- ``preprocessing``: list unique preprocessing stage of the processing chains.
- ``assembly``: list unique assembly stage of the processing chains.
- ``model`` or ``modeling``: list unique modeling stage of the processing chains.

Default is None.

print_labelbool, optional

If True, prints chains using chain label. Default is True.

return_labelbool, optional

If True, returns an additional dataframe of process labels. Default is False.

Returns:
pandas.DataFrame or tuple of pandas.DataFrame or None

If return_label=False, returns a pandas.DataFrame of process chains in process IDs.

If return_label=True, returns a tuple of 2 pandas.DataFrame of process chains in IDs and labels.

If no process is added to this SpecPipe instance, returns None.

Return type:

Union[DataFrame, tuple[DataFrame, DataFrame], None]

See also

ls_custom_chains

Notes

This method is also available as process_chains_to_df.

Examples

For prepared SpecPipe instance pipe:

>>> pipe.ls_process_chains()

Or equivalent:

>>> pipe.process_chains_to_df()

Return label display in addition to process ID display:

>>> pipe.ls_process_chains(return_label=True)
process_chains_to_df(stage=None, print_label=True, return_label=False)#

List process chains. Returns the default full-factorial process chains.

Returns a dataframe where each row represents a processing chain with process IDs. For custom chains, use ls_custom_chains.

Parameters:
stagestr or None, optional

Processing stage, choose between:

- ``None``: list entire processing chains.
- ``preprocessing``: list unique preprocessing stage of the processing chains.
- ``assembly``: list unique assembly stage of the processing chains.
- ``model`` or ``modeling``: list unique modeling stage of the processing chains.

Default is None.

print_labelbool, optional

If True, prints chains using chain label. Default is True.

return_labelbool, optional

If True, returns an additional dataframe of process labels. Default is False.

Returns:
pandas.DataFrame or tuple of pandas.DataFrame or None

If return_label=False, returns a pandas.DataFrame of process chains in process IDs.

If return_label=True, returns a tuple of 2 pandas.DataFrame of process chains in IDs and labels.

If no process is added to this SpecPipe instance, returns None.

Return type:

Union[DataFrame, tuple[DataFrame, DataFrame], None]

See also

ls_custom_chains

Notes

This method is an alias of ls_process_chains.

Examples

For prepared SpecPipe instance pipe:

>>> pipe.ls_process_chains()

Or equivalent:

>>> pipe.process_chains_to_df()

Return label display in addition to process ID display:

>>> pipe.ls_process_chains(return_label=True)
custom_chains_from_df(process_chain_dataframe)[source]#

Customize processing chains and update chains using a chain dataframe.

Once custom chains are created, SpecPipe will prioritize their execution, bypassing the original full-factorial chains.

Parameters:
process_chain_dataframepandas.DataFrame-like

A process chain dataframe.

Must be a subset of the original full-factorial chains, and each chain must be complete. Columns must be ['Step_1', 'Step_2', ...], the number of columns must match that of process_chains, and all values must be valid process IDs of this SpecPipe instance.

It is recommended to modify the dataframe obtained from ls_process_chains or process_chains_to_df to construct a customized process chain dataframe. Users must retrieve the complete processing chain DataFrame by leaving stage at its default value or explicitly setting it to None.

Return type:

None

Examples

For prepared SpecPipe instance pipe:

>>> df_chain = pipe.process_chains_to_df()

After modification, load the modified dataframe:

>>> pipe.custom_chains_from_df(df_chain_modified)
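A typical modification is a row subset of the full-factorial table. The sketch below uses a hypothetical chain table with made-up process IDs ('P1', 'M1', etc.) purely to illustrate the expected shape; a real table would come from process_chains_to_df:

```python
import pandas as pd

# Hypothetical full-factorial chain table in the shape custom_chains_from_df
# expects: columns 'Step_1', 'Step_2', ... holding process IDs.
df_chain = pd.DataFrame(
    {
        "Step_1": ["P1", "P1", "P2", "P2"],
        "Step_2": ["M1", "M2", "M1", "M2"],
    }
)

# Keep only complete chains that use preprocessing 'P1' -- a row subset of
# the full-factorial table, as the method requires.
df_chain_modified = df_chain[df_chain["Step_1"] == "P1"].reset_index(drop=True)
print(df_chain_modified.shape)  # (2, 2)
```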
ls_custom_chains(stage=None, print_label=True, return_label=False)[source]#

List customized process chains.

Returns a dataframe where each row represents a processing chain with process IDs.

Parameters:
stagestr or None, optional

Processing stage, choose between:

- ``None``: list entire processing chains.
- ``preprocessing``: list unique preprocessing stage of the processing chains.
- ``assembly``: list unique assembly stage of the processing chains.
- ``model`` or ``modeling``: list unique modeling stage of the processing chains.

Default is None.

print_labelbool, optional

If True, prints chains using chain label. Default is True.

return_labelbool, optional

If True, returns an additional dataframe of process labels. Default is False.

Returns:
pandas.DataFrame or tuple of pandas.DataFrame or None

If return_label=False, returns a pandas.DataFrame of process chains in process IDs.

If return_label=True, returns a tuple of 2 pandas.DataFrame of process chains in IDs and labels.

If no custom chain is specified in this SpecPipe instance, returns None.

Return type:

Union[DataFrame, tuple[DataFrame, DataFrame], None]

Notes

This method is also available as custom_chains_to_df.

Examples

For prepared SpecPipe instance pipe:

>>> df_chain = pipe.ls_custom_chains()
custom_chains_to_df(stage=None, print_label=True, return_label=False)#

List customized process chains.

Returns a dataframe where each row represents a processing chain with process IDs.

Parameters:
stagestr or None, optional

Processing stage, choose between:

- ``None``: list entire processing chains.
- ``preprocessing``: list unique preprocessing stage of the processing chains.
- ``assembly``: list unique assembly stage of the processing chains.
- ``model`` or ``modeling``: list unique modeling stage of the processing chains.

Default is None.

print_labelbool, optional

If True, prints chains using chain label. Default is True.

return_labelbool, optional

If True, returns an additional dataframe of process labels. Default is False.

Returns:
pandas.DataFrame or tuple of pandas.DataFrame or None

If return_label=False, returns a pandas.DataFrame of process chains in process IDs.

If return_label=True, returns a tuple of 2 pandas.DataFrame of process chains in IDs and labels.

If no custom chain is specified in this SpecPipe instance, returns None.

Return type:

Union[DataFrame, tuple[DataFrame, DataFrame], None]

Notes

This method is an alias of ls_custom_chains.

Examples

For prepared SpecPipe instance pipe:

>>> df_chain = pipe.ls_custom_chains()
ls_chains(stage=None, print_label=True, return_label=False)[source]#

List process chains for the pipeline execution.

Returns custom process chains if they are specified; otherwise, returns the default full-factorial process chains.

Returns a dataframe where each row represents a processing chain with process IDs.

Parameters:
stagestr or None, optional

Processing stage, choose between:

- ``None``: list entire processing chains.
- ``preprocessing``: list unique preprocessing stage of the processing chains.
- ``assembly``: list unique assembly stage of the processing chains.
- ``model`` or ``modeling``: list unique modeling stage of the processing chains.

Default is None.

print_labelbool, optional

If True, prints chains using chain label. Default is True.

return_labelbool, optional

If True, returns an additional dataframe of process labels. Default is False.

Returns:
pandas.DataFrame or tuple of pandas.DataFrame or None

If return_label=False, returns a pandas.DataFrame of process chains in process IDs.

If return_label=True, returns a tuple of 2 pandas.DataFrame of process chains in IDs and labels.

If no process chain is available in this SpecPipe instance, returns None.

Return type:

Union[DataFrame, tuple[DataFrame, DataFrame], None]

Examples

For created SpecPipe instance pipe:

>>> df_chain = pipe.ls_chains()
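The selection rule described above -- custom chains take priority over the full-factorial chains -- can be sketched as follows. The function name and the chain tuples are hypothetical; this is not SpecPipe's internal code:

```python
def chains_for_execution(custom_chains, full_factorial_chains):
    # ls_chains-style selection (illustrative sketch): custom chains take
    # priority when present; otherwise fall back to the full-factorial set.
    return custom_chains if custom_chains else full_factorial_chains

# No custom chains -> the full-factorial chains are used.
print(chains_for_execution([], [("P1", "M1"), ("P2", "M1")]))
```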
save_pipe_config(copy=False, save_spec_exp_config=True)[source]#

Save the current pipeline configuration files to the root of the report directory.

Parameters:
copybool, optional

Whether to create a backup copy of the configuration files. The default is False.

save_spec_exp_configbool, optional

Whether to save the data configuration of the associated SpecExp instance of this SpecPipe instance. The default is True.

Return type:

None

Notes

This method is also available as save_config.

Examples

For a created SpecPipe instance pipe:

>>> pipe.save_pipe_config()

Or equivalently:

>>> pipe.save_config()

Save a backup copy as well:

>>> pipe.save_pipe_config(copy=True)
save_config(copy=False, save_spec_exp_config=True)#

Save the current pipeline configuration files to the root of the report directory.

Parameters:
copybool, optional

Whether to create a backup copy of the configuration files. The default is False.

save_spec_exp_configbool, optional

Whether to save the data configuration of the associated SpecExp instance of this SpecPipe instance. The default is True.

Return type:

None

Notes

This method is an alias of save_pipe_config.

Examples

For a created SpecPipe instance pipe:

>>> pipe.save_pipe_config()

Or equivalently:

>>> pipe.save_config()

Save a backup copy as well:

>>> pipe.save_pipe_config(copy=True)
load_pipe_config(config_file_path='')[source]#

Load SpecPipe configuration from a dill file.

Parameters:
config_file_pathstr, optional

Path to the SpecPipe configuration dill file.

Can be a file path or the file name in the report directory of this SpecPipe instance.

If not provided or empty, the path will be:

(SpecPipe.spec_exp.report_directory)/SpecPipe_configuration/SpecPipe_pipeline_configuration_created_at_(SpecExp.create_time).dill.

Default is empty string.

Return type:

None

See also

save_pipe_config

Notes

This method is also available as load_config.

Examples

For a created SpecPipe instance pipe:

>>> pipe.save_pipe_config()

Load from the default configuration path:

>>> pipe.load_pipe_config()

Or equivalently:

>>> pipe.load_config()

Load from a custom configuration file path:

>>> pipe.load_pipe_config("/pipe_config.dill")
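The documented default location can be sketched with pathlib. The function and its argument values are hypothetical illustrations; the real path is derived from SpecPipe.spec_exp.report_directory and SpecExp.create_time:

```python
from pathlib import Path

def default_config_path(report_directory, create_time):
    # Illustrative sketch of the documented default configuration path,
    # not SpecPipe's internal code.
    return (
        Path(report_directory)
        / "SpecPipe_configuration"
        / f"SpecPipe_pipeline_configuration_created_at_{create_time}.dill"
    )

print(default_config_path("reports", "2024-01-01_12-00-00"))
```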
load_config(config_file_path='')#

Load SpecPipe configuration from a dill file.

Parameters:
config_file_pathstr, optional

Path to the SpecPipe configuration dill file.

Can be a file path or the file name in the report directory of this SpecPipe instance.

If not provided or empty, the path will be:

(SpecPipe.spec_exp.report_directory)/SpecPipe_configuration/SpecPipe_pipeline_configuration_created_at_(SpecExp.create_time).dill.

Default is empty string.

Return type:

None

See also

save_pipe_config

Notes

This method is an alias of load_pipe_config.

Examples

For a created SpecPipe instance pipe:

>>> pipe.save_pipe_config()

Load from the default configuration path:

>>> pipe.load_pipe_config()

Or equivalently:

>>> pipe.load_config()

Load from a custom configuration file path:

>>> pipe.load_pipe_config("/pipe_config.dill")
test_run(test_modeling=True, return_result=False, model_test_coverage=1.0, assembly_test_coverage=1.0, dump_result=True, dump_backup=False, save_preprocessed_images=False, num_type=<class 'numpy.float32'>)[source]#

Run the pipeline of all processing chains using simplified test data. This method is executed automatically prior to each formal run.

Parameters:
test_modelingbool, optional

Whether added models are tested. If False, only the first chain is tested. The default is True.

return_resultbool, optional

Whether results of the processes are returned. If True, results of all tested steps are returned in a list. The default is False.

model_test_coveragefloat, optional

Fraction of modeling pipelines to test. Set to a value < 1.0 to reduce test runtime by randomly sampling preprocessing results without replacement.

  • If 1.0, all pipelines will be tested.

  • If < 1.0, only the specified fraction of pipelines will be tested.

Default is 1.0.

assembly_test_coveragefloat, optional

Fraction of assembly pipelines to test. Ignored if no assembly process is configured. Set to a value < 1.0 to reduce test runtime by randomly sampling preprocessing results without replacement.

  • If 1.0, all pipelines will be tested.

  • If < 1.0, only the specified fraction of pipelines will be tested.

Default is 1.0.

dump_resultbool, optional

Whether test results are stored in the chains. The default is True.

dump_backupbool, optional

Whether a backup of the step results is stored. The backup file is named with the datetime of dumping. The default is False.

num_type: str or type, optional

Numeric data type for array-like data storage, supporting numeric numpy data types. Default is numpy.float32.

Returns:
dict or None

Results of all tested steps, returned only if return_result=True; otherwise None.

Dictionary keys identify the applied processes of a processing chain.

Dictionary values are lists of processing results of the applied processes for all steps of the processing chain.

Return type:

Optional[dict]

Examples

For a created SpecPipe instance pipe:

>>> pipe.test_run()
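The coverage parameters sample a fraction of pipelines without replacement, which can be sketched with the standard library. The chain names are hypothetical and this is an illustration of the documented sampling behaviour, not SpecPipe's internal code:

```python
import math
import random

# Hypothetical chain identifiers; model_test_coverage < 1.0 tests only a
# random fraction of them, sampled without replacement.
chains = [f"chain_{i}" for i in range(10)]
coverage = 0.5

# At least one chain is always tested.
k = max(1, math.ceil(coverage * len(chains)))
tested = random.sample(chains, k)
print(len(tested))  # 5
```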
preprocessing(n_processor=1, resume=False, result_directory='', num_type=<class 'numpy.float32'>, dump_backup=False, step_result=False, to_csv=True, show_progress=True, save_config=True, summary=True, geo_reference_warning=False, skip_test=False, check_space=True)[source]#

Run preprocessing steps of all processing chains on the entire dataset and output modeling-ready sample_list data to files.

Parameters:
result_directorystr, optional

Directory for storing the preprocessing results. If not provided, the report_directory attribute of this SpecPipe instance is used.

For consistency with subsequent pipeline stages, using the default location is strongly recommended.

n_processorint

Number of processors to use in preprocessing.

Default is 1 (parallel processing is not applied).

Windows note: when using n_processor > 1 on Windows, all executable code in the working script must be placed within:

if __name__ == '__main__':
num_type: str or type, optional

Numeric data type for array-like data storage, supporting numeric numpy data types. Default is numpy.float32.

dump_backupbool, optional

Create backup files of results with timestamp. Default is False.

step_resultbool, optional

Whether to retain intermediate results for each processing chain.

  • If False, intermediate results are discarded immediately after processing.

  • If True, intermediate results are preserved. This may require substantial additional storage during processing.

Default is False.

resumebool

If True, computation resumes from preprocessing progress logs. Use resume to avoid repeating preprocessing after an interruption. Default is False.

to_csvbool

If True, final preprocessing results are also saved to CSV files in addition to dill files. Default is True.

show_progressbool, optional

Show processing progress. Default is True.

save_configbool, optional

Save SpecPipe configurations. Default is True.

summarybool, optional

Whether to summarize preprocessed data and target values. Default is True.

geo_reference_warningbool, optional

Whether to suppress GeoReferenceWarning. If False, the warning is suppressed. Default is False.

skip_testbool, optional

Whether to skip test execution completely. Test execution validates every processing chain and serves as a safeguard against runtime errors in long formal execution. Default is False.

check_spacebool, optional

Whether to validate available disk space against the estimated output size. If True, an error is raised when the estimate exceeds the available space. If False, a warning is issued instead. Default is True.

Return type:

None

Examples

For created SpecPipe instance pipe:

>>> pipe.preprocessing()

Pipeline-level multiprocessing:

>>> pipe.preprocessing(n_processor=10)
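The check_space guard described above, together with the class-level reserve_free_pct attribute, can be sketched as a pure function. The function name and threshold logic are illustrative assumptions, not SpecPipe's actual implementation; on a real system the byte counts could come from shutil.disk_usage:

```python
def enough_space(total_bytes, free_bytes, estimated_bytes, reserve_free_pct=5.0):
    # Illustrative check_space-style guard (not SpecPipe's internal code):
    # the estimated output must fit in the free space while keeping at
    # least reserve_free_pct percent of total capacity free.
    reserve = total_bytes * reserve_free_pct / 100.0
    return free_bytes - estimated_bytes >= reserve

# 100 GB disk, 10 GB free, 4 GB of estimated output, 5% (5 GB) reserve:
print(enough_space(100e9, 10e9, 4e9))  # True
```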
assembly(n_processor=1, resume=False, dump_backup=False, step_result=False, show_progress=True)[source]#

Apply assembly processes to introduce cross-sample interactions prior to modeling. This stage is skipped automatically if no assembly process has been added.

These operations directly modify the processed sample data and may alter both the composition of the sample set and the internal representation of individual samples.

Parameters:
n_processorint

Number of processors to use in assembly.

Default is 1 (parallel processing is not applied).

Windows note: when using n_processor > 1 on Windows, all executable code in the working script must be placed within:

if __name__ == '__main__':
resumebool

If True, computation resumes from assembly progress logs. Use resume to avoid repeating assembly after an interruption. Default is False.

dump_backupbool, optional

Create backup files of results with timestamp. Default is False.

step_resultbool, optional

Whether to retain intermediate results for each processing chain.

  • If False, intermediate results are discarded immediately after processing.

  • If True, intermediate results are preserved. This may require substantial additional storage during processing.

Default is False.

show_progressbool, optional

Show processing progress. Default is True.

Return type:

None

Examples

For created SpecPipe instance pipe:

>>> pipe.preprocessing()
>>> pipe.assembly()
>>> pipe.model_evaluation()

Pipeline-level multiprocessing:

>>> pipe.assembly(n_processor=10)
model_evaluation(n_processor=1, resume=False, report_directory='', show_progress=True, save_config=True, summary=True, multitest_correction='fdr_bh', check_space=True)[source]#

Evaluate added models using processed sample data generated by all preprocessing chains. Modeling and evaluation behavior is configured when models are added to the pipeline.

The method automatically summarizes group-level statistics and computes marginal model performance metrics for alternative method options at each processing step.

Parameters:
n_processorint

Number of processors to use in model evaluation.

Default is 1 (parallel processing is not applied).

Windows note: when using n_processor > 1 on Windows, all executable code in the working script must be placed within:

if __name__ == '__main__':
resumebool

If True, computation resumes from model evaluation progress logs.

Use resume to avoid repeating model evaluation after an interruption. Default is False.

report_directorystr, optional

Directory for storing the model evaluation results. If not provided, the report_directory attribute of this SpecPipe instance is used.

For consistency with other pipeline stages, using the default location is strongly recommended.

show_progressbool, optional

Show processing progress. Default is True.

save_configbool, optional

Save SpecPipe configurations. Default is True.

summarybool, optional

Whether to summarize overall and marginal performance. Marginal performance metrics at each processing step are compared using the Mann–Whitney U test. Default is True.

multitest_correction: str or None

Method used for adjustment of significance test p-values. See statsmodels.stats.multitest.multipletests for available options. Default is 'fdr_bh'.

check_spacebool, optional

Whether to validate available disk space against the estimated output size. If True, an error is raised when the estimate exceeds the available space. If False, a warning is issued instead. Default is True.

Return type:

None

See also

add_process
add_model
build_pipeline
preprocessing
assembly
run
groupstats.performance_metrics_summary
groupstats.performance_marginal_stats
modelconnector.combined_model_marginal_stats

Examples

For a prepared SpecPipe instance pipe:

>>> pipe.preprocessing()
>>> pipe.model_evaluation()

Pipeline-level multiprocessing:

>>> pipe.model_evaluation(n_processor=10)
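The default 'fdr_bh' correction is the Benjamini-Hochberg step-up procedure. The sketch below is an illustrative stdlib-only implementation; SpecPipe is documented to delegate the real correction to statsmodels.stats.multitest.multipletests:

```python
def fdr_bh(pvals):
    # Benjamini-Hochberg step-up adjustment (illustrative sketch of the
    # 'fdr_bh' option, not SpecPipe's internal code).
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    prev = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        prev = min(prev, pvals[i] * m / rank)
        adjusted[i] = prev
    return adjusted

print([round(p, 4) for p in fdr_bh([0.01, 0.04, 0.03, 0.20])])  # [0.04, 0.0533, 0.0533, 0.2]
```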
run(result_directory='', n_processor=-1, num_type=<class 'numpy.float32'>, test_model=True, model_parallel=True, dump_backup=False, step_result=False, resume=False, resume_modeling=True, sample_data_to_csv=True, show_progress=True, save_config=True, summary=True, multitest_correction='fdr_bh', geo_reference_warning=False, model_test_coverage=1.0, assembly_test_coverage=1.0, skip_test=False, check_space=True)[source]#

Run the entire pipelines of specified processes of this SpecPipe instance on the provided SpecExp instance. Processes are configured using the add_process and add_model methods.

Parameters:
result_directorystr, optional

Directory to save preprocessing and model evaluation reports. Default is the report_directory of the input SpecExp instance of this SpecPipe instance.

n_processorint, optional

Number of processors to use during pipeline execution.

Default is -1, which does not apply parallel execution on Windows and applies parallel execution using (maximum available CPUs - 1) processors on other operating systems.

Set to -2 to force (maximum available CPUs - 1) processors on Windows.

Windows note: when using n_processor > 1 or n_processor = -2 on Windows, all executable code in the working script must be placed within:

if __name__ == '__main__':
num_type: str or type, optional

Numeric data type for array-like data storage, supporting numeric numpy data types. Default is numpy.float32.

test_modelbool, optional

Whether to test added models before formal execution.

If False, model testing is skipped. Tests use minimal sample sizes, which may cause errors for some models. Default is True.

model_parallelbool, optional

Whether to enable pipeline-level parallelism during modeling.

Set to False when the modeling method already uses multiprocessing or GPU acceleration to avoid nested parallel execution. Default is True.

dump_backupbool, optional

Whether to create timestamped backup files of results. Default is False.

step_resultbool, optional

Whether to retain intermediate results for each processing chain.

  • If False, intermediate results are discarded immediately after processing.

  • If True, intermediate results are preserved. This may require substantial additional storage during processing.

Default is False.

resumebool, optional

Whether to resume execution from the last saved preprocessing checkpoint.

This avoids redundant processing after interruptions. Default is False.

resume_modelingbool, optional

If True, resume modeling; otherwise, rebuild and re-evaluate all models.

Effective only if resume=True; ignored if resume=False.

Default is True.

sample_data_to_csvbool, optional

Whether to additionally save preprocessed sample data as CSV files. Default is True.

show_progressbool, optional

Whether to display execution progress. Default is True.

save_configbool, optional

Whether to save SpecPipe configuration files. Default is True.

summarybool, optional

Whether to summarize preprocessed data, performance metrics, and marginal performance metrics. Marginal performance at each step is compared using the Mann–Whitney U test. Default is True.

multitest_correction: str or None

Method used for adjustment of significance test p-values. See statsmodels.stats.multitest.multipletests for available options. Default is 'fdr_bh'.

geo_reference_warningbool, optional

Whether to suppress GeoReferenceWarning messages. If False, warnings are suppressed. Default is False.

skip_testbool, optional

Whether to skip test execution entirely. Test execution validates all processing chains and serves as a safeguard against runtime errors during long executions. Default is False.

check_spacebool, optional

Whether to validate available disk space against the estimated output size. If True, an error is raised when the estimate exceeds the available space. If False, a warning is issued instead. Default is True.

Return type:

None

Examples

For a prepared SpecPipe instance pipe:

>>> pipe.run()

Pipeline-level multiprocessing:

>>> pipe.run(n_processor=10)

Automatically determine CPU usage:

>>> pipe.run(n_processor=-1)

Windows multiprocessing:

>>> if __name__ == '__main__':
...     pipe.run(n_processor=10)
>>> if __name__ == '__main__':
...     pipe.run(n_processor=-2)
report_summary()[source]#

Retrieve a summary of generated reports and display it in the console. The summary includes the performance summary and the marginal performances among added processes at each pipeline step.

Returns:
dict

A dictionary of pandas DataFrames with the following contents:

For regression:

  • Performance summary.

  • Marginal R2 of the steps with multiple processes.

For classification:

  • Macro- and micro-average performance summary.

  • Marginal macro- and micro-average AUC of the steps with multiple processes.

Return type:

dict

Examples

For SpecPipe instance pipe after running:

>>> result_summary = pipe.report_summary()
report_chains()[source]#

Retrieve major model evaluation reports of every processing chain and display them in the console.

Returns:
list of dict

Each dictionary contains the reports of one processing chain.

For regression pipelines, the reports include:

  • Processes of the chain

  • Validation results

  • Performance metrics

  • Residual analysis

  • Influence analysis (if available)

  • Scatter plot

  • Residual plot

For classification pipelines, the reports include:

  • Processes of the chain

  • Validation results

  • Performance metrics

  • Residual analysis

  • Influence analysis (if available)

  • ROC curves

Return type:

list[dict]

Examples

For SpecPipe instance pipe after running:

>>> chain_results = pipe.report_chains()