swectral.modelcombiners.BaggingEnsembler

class swectral.modelcombiners.BaggingEnsembler(base_estimator, n_estimators=50, max_samples=1.0, replace_sample=True, oversampling=False, feature_subset=None, replace_feature=False, random_state=None, regressor_aggregate='mean', limit_proba=None, is_classifier=None)

Bagging ensemble for regression and classification models, with optional feature resampling.

Unlike sklearn.ensemble.BaggingClassifier and sklearn.ensemble.BaggingRegressor, this implementation is designed for flexible custom models and for robustness on small datasets. For classifiers, stratified resampling with optional replacement is applied. Optional oversampling can boost rare classes or underrepresented target regions.

This class creates a bagging (bootstrap aggregating) ensemble from base estimators that implement fit and predict, and it supports both regression and classification. If the base estimator exposes a predict_proba method, the ensemble is treated as a classifier and predictions are made by averaging class probabilities. Classifiers must provide classes_ and predict_proba.

Attributes:
base_estimator : object

Any estimator implementing fit and predict following the scikit-learn API. If the estimator implements predict_proba, the ensemble will operate in classification mode.

n_estimators : int, optional

Number of base estimators to train in the ensemble. Default is 50.

max_samples : float, optional

Fraction of the training samples to draw for each base estimator. Must be in the interval (0, 1]. Default is 1.0.

replace_sample : bool, optional

Whether sampling is performed with replacement. If False, sampling is performed without replacement. Default is True.
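The interaction between max_samples and replace_sample can be illustrated with a small standard-library sketch. The helper draw_sample_indices is hypothetical and not part of swectral. How the library rounds the sample count, and how it combines this step with stratification, are assumptions here.

```python
import random


def draw_sample_indices(n_samples, max_samples=1.0, replace_sample=True, seed=None):
    """Return row indices for one base estimator (hypothetical helper)."""
    if not 0 < max_samples <= 1:
        raise ValueError("max_samples must be in the interval (0, 1]")
    rng = random.Random(seed)
    # Number of rows to draw; rounding and the floor of 1 are assumptions.
    n_draw = max(1, round(max_samples * n_samples))
    if replace_sample:
        # Bootstrap: the same row may be picked more than once.
        return [rng.randrange(n_samples) for _ in range(n_draw)]
    # Subsampling: every row is picked at most once.
    return rng.sample(range(n_samples), n_draw)


with_repl = draw_sample_indices(10, max_samples=0.8, replace_sample=True, seed=0)
without_repl = draw_sample_indices(10, max_samples=0.8, replace_sample=False, seed=0)
```

In this sketch, replace_sample=False combined with max_samples=1.0 yields a permutation of the full training set, so any diversity between base estimators would have to come from feature resampling or from the estimator itself.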

oversampling : bool, optional

Whether to apply oversampling for rare cases in the training data.

  • For categorical targets, rare classes are upsampled to reduce class imbalance.

  • For continuous targets, underrepresented target regions are upsampled. The target space is divided into adaptive bins, with a maximum of 10 bins.

Default is False.
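One way to picture oversampling for categorical targets is to top up every class to the size of the largest one. This is only a sketch under that assumption: oversample_rare_classes is a hypothetical helper, and the balancing target that swectral actually uses is not stated here. For continuous targets, the same idea could be applied with bin indices in place of class labels.

```python
import random
from collections import defaultdict


def oversample_rare_classes(y, seed=None):
    """Return row indices in which each class is as frequent as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    # Assumed balancing target: the size of the most frequent class.
    target = max(len(rows) for rows in by_class.values())
    indices = []
    for rows in by_class.values():
        indices.extend(rows)  # keep every original row
        # Top up smaller classes by drawing extra rows with replacement.
        indices.extend(rng.choices(rows, k=target - len(rows)))
    return indices


y = ["a"] * 6 + ["b"] * 2
idx = oversample_rare_classes(y, seed=0)
counts = {label: sum(1 for i in idx if y[i] == label) for label in set(y)}
```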

feature_subset : str, float, int, or None

Strategy for selecting a subset of features for each base estimator. Options are:

  • "sqrt" : Use the square root of the total number of features.

  • "log" : Use log2 of the total number of features.

  • float between 0 and 1 : Use this fraction of the total features.

  • int : Use this exact number of features (must be positive).

  • None : Use all features; no feature resampling is applied.

If resampled, features are selected randomly according to the specified strategy. Default is None.
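The strategies above can be read as a mapping from feature_subset to a feature count. The following sketch shows one plausible mapping. resolve_n_features is a hypothetical helper, and the truncation, the clamping to [1, n_features], and the handling of the interval endpoints are assumptions rather than documented behaviour.

```python
import math


def resolve_n_features(feature_subset, n_features):
    """Translate a feature_subset setting into a number of features."""
    if feature_subset is None:
        return n_features  # use all features
    if feature_subset == "sqrt":
        n = math.sqrt(n_features)
    elif feature_subset == "log":
        n = math.log2(n_features)
    elif isinstance(feature_subset, bool):
        # bool is a subclass of int, so reject it explicitly.
        raise ValueError("feature_subset must not be a bool")
    elif isinstance(feature_subset, int):
        if feature_subset <= 0:
            raise ValueError("an integer feature_subset must be positive")
        n = feature_subset
    elif isinstance(feature_subset, float) and 0 < feature_subset < 1:
        n = feature_subset * n_features
    else:
        raise ValueError(f"unsupported feature_subset: {feature_subset!r}")
    # Assumed: truncate to an integer and keep the result within [1, n_features].
    return max(1, min(n_features, int(n)))


sizes = {
    key: resolve_n_features(key, 100)
    for key in (None, "sqrt", "log", 0.25, 7)
}
```

With 100 features, this sketch gives 100, 10, 6, 25, and 7 features respectively.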

replace_feature : bool

Whether feature resampling is performed with replacement. If False, feature resampling is performed without replacement. Default is False.

random_state : int or None, optional

Seed used by the random number generator for reproducible bootstrap sampling. Default is None.

regressor_aggregate : str or tuple of two float, optional

Aggregate type for regressors. Choose between:

  • "mean": Use the average of base estimator predictions.

  • "median": Use the median of base estimator predictions.

  • tuple of two float: Use a trimmed mean, keeping only predictions within the given quantile range (e.g., (0.1, 0.9)).

Default is "mean".
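The three aggregation options can be compared on a single sample with one outlying prediction. This is a standard-library sketch, not the library's code: aggregate and quantile are hypothetical helpers, and the linear quantile interpolation and the median fallback for an empty trimmed set are assumptions.

```python
import statistics


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def aggregate(predictions, regressor_aggregate="mean"):
    """Combine the base estimator predictions for one sample."""
    if regressor_aggregate == "mean":
        return statistics.fmean(predictions)
    if regressor_aggregate == "median":
        return statistics.median(predictions)
    # Otherwise expect a (lower, upper) quantile pair for a trimmed mean.
    q_low, q_high = regressor_aggregate
    ordered = sorted(predictions)
    low, high = quantile(ordered, q_low), quantile(ordered, q_high)
    kept = [p for p in ordered if low <= p <= high]
    if not kept:
        # Assumed fallback when no prediction lies inside the range.
        return statistics.median(ordered)
    return statistics.fmean(kept)


preds = [1.0, 2.0, 3.0, 4.0, 100.0]
mean_value = aggregate(preds, "mean")
median_value = aggregate(preds, "median")
trimmed_value = aggregate(preds, (0.1, 0.9))
```

Here the mean is pulled to 22.0 by the outlier, while the median and the (0.1, 0.9) trimmed mean both give 3.0, which is the reason to prefer a robust aggregate when individual base estimators can fail badly.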

limit_proba : None or tuple of two float, optional

Probability limits applied in the ensemble. Every probability predicted by a base model is restricted to this range. If None, probabilities are not limited. Default is None.
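A minimal sketch of limiting probabilities before averaging them across base estimators is given below. average_probabilities is a hypothetical helper. The sketch does not renormalize the clipped rows, and whether the library does so is not stated here.

```python
def average_probabilities(probas, limit_proba=None):
    """Average class probabilities for one sample.

    probas holds one row per base estimator, each row listing the
    probability of every class.
    """
    if limit_proba is not None:
        low, high = limit_proba
        # Restrict every base model probability to [low, high].
        probas = [[min(max(p, low), high) for p in row] for row in probas]
    n_estimators = len(probas)
    n_classes = len(probas[0])
    return [sum(row[k] for row in probas) / n_estimators for k in range(n_classes)]


probas = [[1.0, 0.0], [0.6, 0.4], [0.8, 0.2]]
raw = average_probabilities(probas)
clipped = average_probabilities(probas, limit_proba=(0.05, 0.95))
```

Limiting keeps a single overconfident base model, such as the first row above, from contributing a probability of exactly 0 or 1 to the average.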

is_classifier : bool or None, optional

Whether the base estimator should be treated as a classifier.

If None, the ensemble will automatically detect the type by inspecting the base estimator for attributes _estimator_type or classes_, or method predict_proba. Default is None.
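The automatic detection described above could look roughly like the following. looks_like_classifier and the two toy estimators are hypothetical, and the order in which the attributes are checked is an assumption.

```python
def looks_like_classifier(estimator, is_classifier=None):
    """Guess whether an estimator should be treated as a classifier."""
    if is_classifier is not None:
        # An explicit setting overrides any detection.
        return bool(is_classifier)
    estimator_type = getattr(estimator, "_estimator_type", None)
    if estimator_type == "classifier":
        return True
    if estimator_type == "regressor":
        return False
    # Fall back to duck typing on classes_ and predict_proba.
    return hasattr(estimator, "classes_") or callable(
        getattr(estimator, "predict_proba", None)
    )


class ToyRegressor:
    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0.0 for _ in X]


class ToyClassifier(ToyRegressor):
    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


detected = (
    looks_like_classifier(ToyRegressor()),
    looks_like_classifier(ToyClassifier()),
    looks_like_classifier(ToyRegressor(), is_classifier=True),
)
```

Setting is_classifier explicitly is the safer choice for custom models whose attributes do not follow scikit-learn conventions.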

nfeature : int

Number of features actually used for each base estimator. Derived from feature_subset.

estimators_ : dict of numpy.integer to object

The collection of fitted base estimators.

classes_ : numpy.ndarray of shape (n_classes,), optional

Class labels known to the classifier. Only present if the base estimator supports predict_proba.

Methods

fit(X, y)

Fit the bagging ensemble on the training data.

predict(X)

Predict regression targets or class labels for X.

predict_proba(X)

Predict class probabilities for X. Only available if the base estimator supports predict_proba.

Examples

Bagged regressor:

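from sklearn.cross_decomposition import PLSRegression
from swectral.modelcombiners import BaggingEnsembler

# X_train, y_train and X_test are assumed to be existing arrays.
# Bag 100 PLS regressors, each fitted on a bootstrap sample.
model = BaggingEnsembler(
    base_estimator=PLSRegression(n_components=5),
    n_estimators=100
)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)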

Bagged classifier:

from sklearn.linear_model import LogisticRegression

model = BaggingEnsembler(
    base_estimator=LogisticRegression(max_iter=1000),
    n_estimators=50
)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)

Specify the fraction of training samples and the random state used for the base estimators:

model = BaggingEnsembler(
    base_estimator=PLSRegression(n_components=5),
    n_estimators=100,
    max_samples=0.8,
    random_state=42
)

Sample without replacement:

model = BaggingEnsembler(
    base_estimator=PLSRegression(n_components=5),
    n_estimators=100,
    replace_sample=False
)

With oversampling:

model = BaggingEnsembler(
    base_estimator=PLSRegression(n_components=5),
    n_estimators=100,
    oversampling=True
)

fit(X, y)

Fit the bagging ensemble on the training data.

Parameters:
X : array_like of shape (n_samples, n_features)

Training input samples.

y : array_like of shape (n_samples,)

Target values.

Returns:
self : BaggingEnsembler

Fitted ensemble.

Return type:

object

predict(X)

Predict regression targets or class labels.

Parameters:
X : array_like of shape (n_samples, n_features)

Input samples.

Returns:
y_pred : ndarray

Predicted values or class labels.

Return type:

ndarray

predict_proba(X)

Predict class probabilities.

Parameters:
X : array_like of shape (n_samples, n_features)

Input samples.

Returns:
proba : numpy.ndarray of shape (n_samples, n_classes)

Averaged class probabilities.

Raises:
AttributeError

If the base estimator does not support predict_proba.

Return type:

ndarray