swectral.denoiser.replace_outlier
- swectral.denoiser.replace_outlier(data, test_method='iqr', to='neighbor', axis=0, *, dixon_alpha=0.05, iqr_multiplier=1.5, modified_z_threshold=3.5, numtype='float32', generate_report=False)
Replace outliers in a 2D array or dataframe of 1D data.
- Parameters:
- data
numpy.ndarray or pandas.DataFrame 2D array or DataFrame of 1D data.
- test_method
str The outlier test method. See ArrayOutlier for details.
- to
str The outlier replacement strategy. See ArrayOutlier for details.
int Axis along which the outlier test is calculated. The default is 0.
- dixon_alpha
float Two-tailed significance level for Dixon’s Q test. The default is 0.05.
- iqr_multiplier
float Multiplier applied to the interquartile range (IQR) to define the lower and upper bounds for outlier detection. The default is 1.5.
- modified_z_threshold
float Threshold value used in modified z-score–based outlier detection. Observations with an absolute modified z-score exceeding this value are classified as outliers. The default is 3.5.
- numtype
str NumPy-supported numeric data type used for test computation and output. The default is "float32".
- generate_report
bool Whether to generate outlier test reports. If True, generation can be time-consuming for large datasets. The default is False.
- Returns:
Data with outliers replaced, or, when generate_report is True, a tuple of the data with outliers replaced and the outlier detection reports.
- Return type:
numpy.ndarray or tuple of (numpy.ndarray, list)
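The modified_z_threshold parameter refers to the modified z-score. The sketch below is a minimal NumPy illustration of that score on a single 1D series, assuming the common Iglewicz and Hoaglin form 0.6745 · (x − median) / MAD, where MAD is the median absolute deviation. modified_z_outlier_mask is a hypothetical helper, not part of swectral, and the library's handling of edge cases such as a zero MAD may differ.

```python
import numpy as np


def modified_z_outlier_mask(x, threshold=3.5):
    """Hypothetical helper: flag values whose |modified z-score| exceeds threshold."""
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))  # median absolute deviation
    if mad == 0:
        # More than half the values equal the median: the score is undefined,
        # so this sketch simply flags nothing (an assumption, not library behavior).
        return np.zeros(x.shape, dtype=bool)
    # 0.6745 is roughly the 0.75 quantile of the standard normal, which scales
    # MAD so the score is comparable to an ordinary z-score for normal data.
    modified_z = 0.6745 * (x - median) / mad
    return np.abs(modified_z) > threshold


# First row of the examples: median 4, MAD 2, so 99 scores about 32 and is flagged.
print(modified_z_outlier_mask([1, 2, 3, 99, 5, 6]))
```

Because both the center (median) and the scale (MAD) are robust statistics, a single extreme value such as 99 barely shifts them, which is why this score is preferred over the classic mean and standard deviation z-score for outlier screening.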
See also
Examples
Basic usage for outlier replacement using default settings:
>>> replace_outlier([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]])
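For intuition about the default test_method='iqr', the following sketch applies the usual Tukey fences, Q1 − k·IQR and Q3 + k·IQR with k playing the role of iqr_multiplier, to the rows of the example. It assumes each row is treated as one 1D series. iqr_outlier_mask is a hypothetical helper, and the linear quantile interpolation used by numpy.percentile here may not match what swectral uses internally.

```python
import numpy as np


def iqr_outlier_mask(x, multiplier=1.5):
    """Hypothetical helper: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])  # linear interpolation by default
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    return (x < lower) | (x > upper)


# Row 1: Q1 = 2.25, Q3 = 5.75, IQR = 3.5, fences [-3.0, 11.0], so only 99 is flagged.
print(iqr_outlier_mask([1, 2, 3, 99, 5, 6]))
# Row 2: fences [-2.0, 10.0], so nothing is flagged.
print(iqr_outlier_mask([2, 2, 4, 4, 6, 6]))
```

Raising the multiplier widens the fences and flags fewer values; lowering it makes the test more aggressive.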
Specify outlier detection method:
>>> replace_outlier([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]], test_method='dixon')
Customize the parameters of the outlier detection method:
>>> replace_outlier([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]], test_method='dixon', dixon_alpha=0.1)
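Dixon's Q test compares the gap between a suspect extreme value and its nearest neighbor with the full range of the sample, and rejects the value when that ratio exceeds a critical value for the sample size and significance level. The sketch below is a single-pass version that assumes the commonly tabulated critical values for n = 3 to 10 at significance levels 0.10 and 0.05. dixon_outlier_mask and Q_CRITICAL are hypothetical, and swectral's own critical-value table, tail convention, sample-size limits, and iteration strategy may differ.

```python
import numpy as np

# Hypothetical lookup table: commonly tabulated Dixon Q critical values
# for sample sizes n = 3..10, keyed by significance level.
Q_CRITICAL = {
    0.10: {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507, 8: 0.468, 9: 0.437, 10: 0.412},
    0.05: {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466},
}


def dixon_outlier_mask(x, alpha=0.05):
    """Hypothetical helper: single-pass Dixon's Q test on the most extreme value."""
    x = np.asarray(x, dtype=float)
    n = x.size
    crit = Q_CRITICAL[alpha][n]  # raises KeyError for unsupported n or alpha
    order = np.argsort(x)
    s = x[order]
    mask = np.zeros(n, dtype=bool)
    value_range = s[-1] - s[0]
    if value_range == 0:
        return mask  # all values equal: nothing to flag
    q_low = (s[1] - s[0]) / value_range    # gap at the low end
    q_high = (s[-1] - s[-2]) / value_range  # gap at the high end
    if max(q_low, q_high) > crit:
        # Flag whichever end has the larger gap ratio.
        mask[order[-1] if q_high >= q_low else order[0]] = True
    return mask


print(dixon_outlier_mask([1, 2, 3, 99, 5, 6], alpha=0.1))
```

For the first example row, Q = (99 − 6) / (99 − 1) ≈ 0.949, which exceeds the tabulated critical values for n = 6 at both 0.05 (0.625) and 0.10 (0.560), so 99 is rejected at either level. A larger dixon_alpha lowers the critical value and makes the test reject more readily.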
Specify replacement strategy:
>>> replace_outlier([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]], to='median')
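The to='median' strategy presumably replaces each flagged value with a median. The sketch below assumes the median is taken over the non-flagged values of the same series and that an outlier test has already produced a boolean mask. replace_with_median is hypothetical and may not match swectral's exact rule, for example on whether flagged values are included in the median.

```python
import numpy as np


def replace_with_median(x, mask):
    """Hypothetical helper: copy x, setting masked entries to the median of unmasked ones."""
    x = np.asarray(x, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    out = x.copy()
    # Skip when nothing is flagged, or when everything is (no values left to take a median of).
    if mask.any() and not mask.all():
        out[mask] = np.median(x[~mask])
    return out


row = [1, 2, 3, 99, 5, 6]
# Mask as an outlier test flagging only 99 would produce (e.g. IQR with multiplier 1.5).
mask = [False, False, False, True, False, False]
print(replace_with_median(row, mask))  # 99 becomes 3.0, the median of [1, 2, 3, 5, 6]
```

Computing the median without the flagged values keeps the outlier from influencing its own replacement, although for a single outlier the median is already robust and the difference is usually small.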
Retrieve the outlier test reports along with the data with outliers replaced:
>>> result, report = replace_outlier([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]], generate_report=True)