swectral.denoiser.replace_outlier

swectral.denoiser.replace_outlier(data, test_method='iqr', to='neighbor', axis=0, *, dixon_alpha=0.05, iqr_multiplier=1.5, modified_z_threshold=3.5, numtype='float32', generate_report=False)

Replace outliers in a 2D array or DataFrame of 1D data series.

Parameters:
data : numpy.ndarray or pandas.DataFrame

2D array or DataFrame of 1D data series.

test_method : str

The outlier test method. The default is 'iqr'. See ArrayOutlier for details.

to : str

The outlier replacement strategy. The default is 'neighbor'. See ArrayOutlier for details.

axis : int

Axis along which the calculation is performed. The default is 0.

dixon_alpha : float

Two-tailed significance level for Dixon’s Q test. The default is 0.05.

iqr_multiplier : float

Multiplier applied to the interquartile range (IQR) to define the lower and upper bounds for outlier detection. The default is 1.5.

modified_z_threshold : float

Threshold value used in modified z-score-based outlier detection. Observations with an absolute modified z-score exceeding this value are classified as outliers. The default is 3.5.

numtype : str

NumPy-supported numeric data type used for test computation and output. The default is "float32".

generate_report : bool

Whether to generate outlier test reports. If True, report generation can be time-consuming for large datasets. The default is False.
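The test-specific parameters above correspond to standard outlier rules. The following NumPy sketch shows the textbook form of each rule on a single 1D series. It is an illustration under that assumption, not swectral's ArrayOutlier implementation, and the helper names (iqr_outliers, modified_z_outliers, dixon_q) are hypothetical. The modified z-score uses the 0.6745 scaling constant from Iglewicz and Hoaglin's definition. Dixon's Q statistic is shown without a critical-value table: a complete test would compare it with the tabulated critical value for the sample size and the chosen dixon_alpha.

```python
import numpy as np


def iqr_outliers(x, multiplier=1.5):
    """Hypothetical helper: flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - multiplier * iqr) | (x > q3 + multiplier * iqr)


def modified_z_outliers(x, threshold=3.5):
    """Hypothetical helper: flag |0.6745 * (x - median) / MAD| > threshold."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))  # median absolute deviation
    if mad == 0:
        # Score is undefined when MAD is zero; this sketch flags nothing.
        return np.zeros(x.shape, dtype=bool)
    z = 0.6745 * (x - med) / mad
    return np.abs(z) > threshold


def dixon_q(x):
    """Hypothetical helper: Dixon's Q (r10) for the lowest and highest values.

    Each Q is gap / range; a full test compares it with a tabulated
    critical value for len(x) and the significance level (not included here).
    """
    s = np.sort(np.asarray(x, dtype=float))
    value_range = s[-1] - s[0]
    if value_range == 0:
        return 0.0, 0.0
    q_low = (s[1] - s[0]) / value_range
    q_high = (s[-1] - s[-2]) / value_range
    return q_low, q_high
```

For the series [1, 2, 3, 99, 5, 6], both mask functions flag only the value 99, and dixon_q returns a high-end Q of 93/98 (about 0.95) against a low-end Q of 1/98.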

Returns:
numpy.ndarray or tuple of (numpy.ndarray, list)

Data with outliers replaced or, if generate_report is True, a tuple of the data with outliers replaced and the list of outlier detection reports.

Return type:

Union[ndarray, tuple[ndarray, list]]

Examples

Basic outlier replacement using the default settings:

>>> from swectral.denoiser import replace_outlier
>>> replace_outlier([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]])

Specify the outlier detection method:

>>> replace_outlier([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]], test_method='dixon')

Customize the parameters of the outlier detection method:

>>> replace_outlier([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]], test_method='dixon', dixon_alpha=0.1)

Specify the replacement strategy:

>>> replace_outlier([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]], to='median')

Retrieve the outlier test reports in addition to the replaced data:

>>> result, report = replace_outlier([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]], generate_report=True)
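To make the replacement step concrete, here is a hypothetical NumPy-only stand-in that mimics the to='median' example: it applies the IQR rule to each 1D series and substitutes the median of the remaining (non-outlier) values. Both the meaning assumed for 'median' and the function name replace_outliers_with_median are assumptions; swectral's actual strategies are defined in ArrayOutlier. The sketch casts to float32 to mirror the numtype default and uses axis=1 so that each row of the example input is treated as one series, which is this sketch's own convention rather than a statement about how replace_outlier interprets axis.

```python
import numpy as np


def replace_outliers_with_median(data, axis=1, multiplier=1.5):
    """Hypothetical stand-in: IQR test per 1D series, outliers -> median of inliers."""
    arr = np.asarray(data, dtype=np.float32)

    def _fix(series):
        # Tukey's fences on this single series.
        q1, q3 = np.percentile(series, [25, 75])
        iqr = q3 - q1
        mask = (series < q1 - multiplier * iqr) | (series > q3 + multiplier * iqr)
        out = series.copy()
        # Only replace when there are both outliers and inliers to take a median from.
        if mask.any() and (~mask).any():
            out[mask] = np.median(series[~mask])
        return out

    # Apply the per-series fix to every 1D slice along the chosen axis.
    return np.apply_along_axis(_fix, axis, arr)


result = replace_outliers_with_median([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]])
print(result)
```

With this input, the 99 in the first row becomes 3.0 (the median of 1, 2, 3, 5, 6) and the second row is unchanged.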