swectral.denoiser.ArrayOutlier#

class swectral.denoiser.ArrayOutlier(test_method='iqr', to='neighbor', axis=0, *, dixon_alpha=0.05, iqr_multiplier=1.5, modified_z_threshold=3.5, numtype='float32', generate_report=False)[source]#

Identify and replace outliers in each 1D data series of a 2D array or dataframe.

Attributes:

test_methodstr

The method of outlier test. Available options:

“dixon” - Dixon’s Q test,
“iqr” - interquartile range,
“modified_z” - Modified Z-score.

The default is “iqr”.

tostr

The outlier replacement strategy. The outlier can be replaced by:

“nan” - the outlier is replaced with NaN and excluded from calculation.
“mean” - mean value of the nonoutliers.
“median” - median of the nonoutliers.
“neighbor” - the closest nonoutlier value to the outlier. If two neighbors are available, their average is used.

The default is “neighbor”.
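
The “neighbor” strategy can be pictured with a minimal sketch in plain Python. This is not the library's implementation: `replace_with_neighbors` is a hypothetical helper, and it assumes that "closest" means the nearest nonoutlier position on each side of the outlier within the same series.

```python
def replace_with_neighbors(series, outlier_indices):
    """Hypothetical helper: replace each outlier with nearby nonoutlier values."""
    outliers = set(outlier_indices)
    result = [float(v) for v in series]
    for i in outliers:
        # Nearest nonoutlier value to the left and to the right, if any.
        left = next((series[j] for j in range(i - 1, -1, -1) if j not in outliers), None)
        right = next((series[j] for j in range(i + 1, len(series)) if j not in outliers), None)
        neighbors = [v for v in (left, right) if v is not None]
        if not neighbors:
            raise ValueError("no nonoutlier value available for replacement")
        # One neighbor: use it directly. Two neighbors: use their average.
        result[i] = sum(neighbors) / len(neighbors)
    return result


print(replace_with_neighbors([1, 2, 3, 99, 5, 6], [3]))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

An outlier at either end of the series has only one neighbor under this assumption, so that single value is used as the replacement.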

axisint

Calculate along the axis. The default is 0.

dixon_alphafloat

Two-tailed significance level for Dixon’s Q test. The default is 0.05.

iqr_multiplierfloat, optional

Multiplier applied to the interquartile range (IQR) to define the lower and upper bounds for outlier detection.

The default is 1.5.

modified_z_thresholdfloat, optional

Threshold value used in modified z-score–based outlier detection. Observations with an absolute modified z-score exceeding this value are classified as outliers.

The default is 3.5.

numtypestr

NumPy-supported numeric data type used for test computation and output. The default is “float32”.

generate_reportbool

Whether to generate reports of outlier tests.

Report generation can be time-consuming for large datasets. Repeated calls to ArrayOutlier.replace() accumulate reports in ArrayOutlier.report, which can lead to significant memory growth.

The default is False.

reportlist or list

List of reports, one per “replace” execution, if generate_report is True.

Methods

`iqr`(data_series)	Identify outliers using the Interquartile Range (IQR) criterion and return their indices.
`modified_z`(data_series)	Identify outliers using the modified z score approach and return their indices.
`replace`(data)	Replace outliers in a 2D array or dataframe of 1D data series.

dixon_q

Apply Dixon’s Q test to get outlier and nonoutlier indices of 1D data series.

Examples

Use default settings:

>>> outlier = ArrayOutlier()

Specify outlier detection method:

>>> outlier = ArrayOutlier(test_method='dixon')

Customize outlier detection method:

>>> outlier = ArrayOutlier(test_method='dixon', dixon_alpha=0.1)

Specify replacement strategy:

>>> outlier = ArrayOutlier(to='median')

Retrieve report in addition to result of replacement:

>>> outlier = ArrayOutlier(generate_report=True)
>>> report = outlier.report

__init__(test_method='iqr', to='neighbor', axis=0, *, dixon_alpha=0.05, iqr_multiplier=1.5, modified_z_threshold=3.5, numtype='float32', generate_report=False)[source]#

Methods

`__init__`([test_method, to, axis, ...])
`dixon`(data_series)	Perform Dixon's Q test to identify outliers in a dataset.
`iqr`(data_series)	Identify outliers using the Interquartile Range (IQR) criterion and return their indices.
`modified_z`(data_series)	Identify outliers using the modified z score approach and return their indices.
`replace`(data)	Replace outliers in a 2D array or dataframe of 1D data series.

dixon(data_series)[source]#

Perform Dixon’s Q test to identify outliers in a dataset.

Parameters:

data_serieslist or 1D array_like: Series of data for outlier detection. Dixon’s Q test requires a sample size between 3 and 30.

Returns:

A tuple of:

outlier_indicesnumpy.ndarray: NumPy array of outlier indices.
nonoutlier_indicesnumpy.ndarray: NumPy array of non-outlier indices.
test_reportlist or None: List of test reports, if generated.

Raises:

ValueError: If the sample size is outside the range 3 to 30.

Return type:

tuple[ndarray, ndarray, Optional[list]]
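
As a rough illustration of the statistic behind Dixon’s Q test, the sketch below computes the classic gap-to-range ratio for the smallest and largest value of a sample. It is an assumption-laden sketch, not this method's implementation: `dixon_q_statistics` is a hypothetical helper, the library may use other ratio variants for larger samples, and the critical value 0.625 is the commonly tabulated one for n = 6 at 95% confidence.

```python
def dixon_q_statistics(series):
    """Hypothetical helper: return (Q_low, Q_high) for the two extremes of a sample."""
    values = sorted(series)
    if not 3 <= len(values) <= 30:
        raise ValueError("Dixon's Q test expects 3 to 30 samples")
    data_range = values[-1] - values[0]
    if data_range == 0:
        # All values are identical, so no extreme stands out.
        return 0.0, 0.0
    # Q = gap between the suspect value and its nearest neighbor, divided by the range.
    q_low = (values[1] - values[0]) / data_range
    q_high = (values[-1] - values[-2]) / data_range
    return q_low, q_high


q_low, q_high = dixon_q_statistics([1, 2, 3, 99, 5, 6])
Q_CRIT = 0.625  # assumed tabulated critical value for n = 6, 95% confidence
print(q_high > Q_CRIT, q_low > Q_CRIT)  # True False
```

Here only the largest value (99) exceeds the critical value, so it is the one a Q test of this form would flag.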

Examples

>>> outlier = ArrayOutlier()
>>> outlier_ind, non_outlier_ind, report = outlier.dixon([1, 2, 3, 99, 5, 6])

iqr(data_series)[source]#

Identify outliers using the Interquartile Range (IQR) criterion and return their indices.

Parameters:

data_serieslist or 1D array_like: List or 1D array of a data series for outlier detection. The length must be at least 5.

Returns:

A tuple of:

outlier_indicesnumpy.ndarray: NumPy array of outlier indices.
nonoutlier_indicesnumpy.ndarray: NumPy array of non-outlier indices.
test_reportlist or None: List of test reports, if generated.

Raises:

ValueError: If sample size < 5.

Return type:

tuple[ndarray, ndarray, Optional[list]]
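
The IQR criterion itself can be sketched with the standard library. This is a minimal sketch under stated assumptions, not the method's actual code: `iqr_outlier_indices` is a hypothetical helper, and the quartiles use linear interpolation (`method="inclusive"`), which may differ from the quartile definition the library applies.

```python
from statistics import quantiles


def iqr_outlier_indices(series, multiplier=1.5):
    """Hypothetical helper: split indices into outliers and nonoutliers by IQR fences."""
    if len(series) < 5:
        raise ValueError("need at least 5 samples")
    q1, _, q3 = quantiles(series, n=4, method="inclusive")
    spread = q3 - q1
    # Values outside [lower, upper] are treated as outliers.
    lower = q1 - multiplier * spread
    upper = q3 + multiplier * spread
    outliers = [i for i, v in enumerate(series) if v < lower or v > upper]
    nonoutliers = [i for i, v in enumerate(series) if lower <= v <= upper]
    return outliers, nonoutliers


outliers, nonoutliers = iqr_outlier_indices([1, 2, 3, 99, 5, 6])
print(outliers, nonoutliers)  # [3] [0, 1, 2, 4, 5]
```

For this sample the quartiles are 2.25 and 5.75, so the fences with a multiplier of 1.5 are -3.0 and 11.0, and only the value 99 falls outside them.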

Examples

>>> outlier = ArrayOutlier()
>>> outlier_ind, non_outlier_ind, report = outlier.iqr([1, 2, 3, 99, 5, 6])

modified_z(data_series)[source]#

Identify outliers using the modified z score approach and return their indices.

Parameters:

data_series1D list or numpy array

The data series to test for outliers. The length must be at least 5, and a length of at least 12 is recommended.

Note that the function does not check the data for normality, which the approach assumes.

Returns:

A tuple of:

outlier_indicesnumpy.ndarray: NumPy array of outlier indices.
nonoutlier_indicesnumpy.ndarray: NumPy array of non-outlier indices.
test_reportlist or None: List of test reports, if generated.

Raises:

ValueError: If sample size < 5.

Return type:

tuple[ndarray, ndarray, Optional[list]]

Warning

UserWarning: If the sample size is at least 5 but less than 12. The test is still applied, but the result may be unreliable because normality is hard to establish for such small samples.
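
For orientation, the modified z-score is commonly defined as 0.6745 times the deviation from the median, divided by the median absolute deviation (MAD). The sketch below follows that common definition. It is not taken from this library: `modified_z_outlier_indices` is a hypothetical helper, and returning no outliers when the MAD is zero is an assumption made here to avoid division by zero.

```python
from statistics import median


def modified_z_outlier_indices(series, threshold=3.5):
    """Hypothetical helper: flag values whose absolute modified z-score exceeds threshold."""
    center = median(series)
    mad = median([abs(v - center) for v in series])
    if mad == 0:
        # Assumption: with zero MAD the score is undefined, so nothing is flagged.
        return [], list(range(len(series)))
    scores = [0.6745 * (v - center) / mad for v in series]
    outliers = [i for i, s in enumerate(scores) if abs(s) > threshold]
    nonoutliers = [i for i, s in enumerate(scores) if abs(s) <= threshold]
    return outliers, nonoutliers


outliers, nonoutliers = modified_z_outlier_indices([1, 2, 3, 99, 5, 6])
print(outliers, nonoutliers)  # [3] [0, 1, 2, 4, 5]
```

In this sample the median is 4 and the MAD is 2, so the value 99 scores about 32, far above the default threshold of 3.5.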

Examples

>>> outlier = ArrayOutlier()
>>> outlier_ind, non_outlier_ind, report = outlier.modified_z([1, 2, 3, 99, 5, 6])

replace(data)[source]#

Replace outliers in a 2D array or dataframe of 1D data series.

Parameters:

datanumpy.ndarray or pandas.DataFrame: 2D array or dataframe of 1D data series.

Returns:

numpy.ndarray: Data with outliers replaced.

Raises:

ValueError: If the input data is not a 2D numpy array or pandas dataframe.
ValueError: If no replacement value can be determined for an outlier.

Return type:

ndarray

Examples

>>> import numpy as np
>>> outlier = ArrayOutlier()
>>> data = np.array([[1, 2, 3, 99, 5, 6], [2, 2, 4, 4, 6, 6]])
>>> result = outlier.replace(data)