swectral.denoiser.replace_outlier

swectral.denoiser.replace_outlier(data, test_method='iqr', to='neighbor', axis=0, *, dixon_alpha=0.05, iqr_multiplier=1.5, modified_z_threshold=3.5, numtype='float32', generate_report=False)

Replace outliers in a 2D array or dataframe of 1D data.

Parameters:

data (numpy.ndarray or pandas.DataFrame): 2D array or dataframe of 1D data.
test_method (str): The outlier test method. The default is 'iqr'. See ArrayOutlier for details.
to (str): The outlier replacement strategy. The default is 'neighbor'. See ArrayOutlier for details.
axis (int): Axis along which the calculation is performed. The default is 0.
dixon_alpha (float): Two-tailed significance level for Dixon’s Q test. The default is 0.05.
iqr_multiplier (float): Multiplier applied to the interquartile range (IQR) to define the lower and upper bounds for outlier detection. The default is 1.5.
modified_z_threshold (float): Threshold used in modified z-score–based outlier detection. Observations whose absolute modified z-score exceeds this value are classified as outliers. The default is 3.5.
numtype (str): NumPy-supported numeric data type used for test computation and output. The default is "float32".
generate_report (bool): Whether to generate reports of the outlier tests. If True, report generation can be time-consuming for large datasets. The default is False.
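
For intuition about iqr_multiplier and modified_z_threshold, the sketch below shows the conventional textbook form of both tests on a single 1D sequence, using only the Python standard library. It is not taken from the swectral source. The helper names, the "inclusive" quartile interpolation, the 0.6745 scaling constant, and the handling of a zero median absolute deviation are all assumptions made for illustration, and the library's own results may differ in edge cases.

```python
from statistics import median, quantiles


def iqr_outlier_mask(values, iqr_multiplier=1.5):
    """Hypothetical helper: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Quartile interpolation conventions vary; "inclusive" is an assumption here.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = iqr_multiplier * (q3 - q1)
    lower, upper = q1 - spread, q3 + spread
    return [v < lower or v > upper for v in values]


def modified_z_outlier_mask(values, modified_z_threshold=3.5):
    """Hypothetical helper: flag values whose |modified z-score| exceeds the threshold."""
    center = median(values)
    # Median absolute deviation (MAD) around the median.
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # Assumption: with zero MAD the score is undefined, so nothing is flagged.
        return [False] * len(values)
    # 0.6745 is the usual constant that scales MAD to a standard deviation.
    scores = [0.6745 * (v - center) / mad for v in values]
    return [abs(s) > modified_z_threshold for s in scores]


row_with_spike = [1, 2, 3, 99, 5, 6]
clean_row = [2, 2, 4, 4, 6, 6]

spike_iqr = iqr_outlier_mask(row_with_spike)
spike_mz = modified_z_outlier_mask(row_with_spike)
clean_iqr = iqr_outlier_mask(clean_row)
clean_mz = modified_z_outlier_mask(clean_row)
```

Under these assumptions, both tests flag only the value 99 in the first row and nothing in the second. Raising iqr_multiplier or modified_z_threshold widens the accepted range, so fewer values are flagged.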

Returns:

numpy.ndarray or tuple of (numpy.ndarray, list): The data with outliers replaced or, if generate_report is True, a tuple of that data and the list of outlier detection reports.

Return type:

Union[ndarray, tuple[ndarray, list]]
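
Because the return type depends on generate_report, calling code that accepts either form may want to normalize it. The following is a minimal sketch of one way to do that. The helper unpack_result is hypothetical and is not part of swectral.

```python
from typing import Any, Optional, Tuple


def unpack_result(result: Any) -> Tuple[Any, Optional[list]]:
    """Hypothetical helper: normalize the documented return value to (data, reports)."""
    if isinstance(result, tuple):
        # generate_report=True form: (data, reports)
        data, reports = result
        return data, reports
    # generate_report=False form: data only, so there are no reports
    return result, None


# Stand-in values (plain lists) in place of real replace_outlier output.
with_report = unpack_result(([1.0, 2.0], ["report"]))
without_report = unpack_result([1.0, 2.0])
```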

See also

ArrayOutlier
ArrayOutlier.replace

Examples

Basic usage for outlier replacement using default settings:

>>> from swectral.denoiser import replace_outlier
>>> replace_outlier([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]])

Specify outlier detection method:

>>> replace_outlier([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]], test_method='dixon')

Customize a parameter of the outlier detection method:

>>> replace_outlier([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]], test_method='dixon', dixon_alpha=0.1)

Specify replacement strategy:

>>> replace_outlier([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]], to='median')

Retrieve the outlier detection reports in addition to the replacement result:

>>> result, report = replace_outlier([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]], generate_report=True)