swectral.modelcombiners.BaggingEnsembler
- class swectral.modelcombiners.BaggingEnsembler(base_estimator, n_estimators=50, max_samples=1.0, replace_sample=True, oversampling=False, feature_subset=None, replace_feature=False, random_state=None, regressor_aggregate='mean', limit_proba=None, is_classifier=None)
Bagging ensemble for regression and classification models, with optional feature resampling.
Unlike sklearn.ensemble.BaggingClassifier and sklearn.ensemble.BaggingRegressor, this implementation is designed for flexible custom models and robustness on small datasets. Stratified resampling with optional replacement is applied for classifiers, and optional oversampling can boost rare classes or underrepresented target regions.
This class creates a bagging (bootstrap aggregating) model ensemble from base estimators implementing fit and predict. It supports both regression and classification. If the base estimator exposes a predict_proba method, the ensemble is treated as a classifier and probability averaging is used for prediction. A classifier must support classes_ and predict_proba.
- Attributes:
- base_estimator: object
Any estimator implementing fit and predict following the scikit-learn API. If the estimator implements predict_proba, the ensemble will operate in classification mode.
- n_estimators: int, optional
Number of base estimators to train in the ensemble. Default is 50.
- max_samples: float, optional
Fraction of the training samples to draw for each base estimator. Must be in the interval (0, 1]. Default is 1.0.
- replace_sample: bool, optional
Whether sampling is performed with replacement. If False, sampling is performed without replacement. Default is True.
- oversampling: bool, optional
Whether to apply oversampling for rare cases in the training data.
For categorical targets, rare classes are upsampled to reduce class imbalance.
For continuous targets, underrepresented target regions are upsampled. The target space is divided into adaptive bins, with a maximum of 10 bins.
Default is False.
- feature_subset: str, float, int, or None
Strategy for selecting a subset of features for each base estimator. Options are:
"sqrt": Use the square root of the total number of features.
"log": Use log2 of the total number of features.
float between 0 and 1: Use this fraction of the total features.
int: Use this exact number of features (must be positive).
None: Use all features, no resampling is applied.
If resampled, features are selected randomly according to the specified strategy. Default is None.
- replace_feature: bool
Whether feature resampling is performed with replacement. If False, feature resampling is performed without replacement. Default is False.
- random_state: int or None, optional
Seed used by the random number generator for reproducible bootstrap sampling. Default is None.
- regressor_aggregate: str or tuple of two float, optional
Aggregation method for regressors. Choose between:
"mean": Use the average of base estimator predictions.
"median": Use the median of base estimator predictions.
tuple of two float: Use a trimmed mean, keeping only predictions within the given quantile range (e.g., (0.1, 0.9)).
Default is "mean".
- limit_proba: None or tuple of two float, optional
Limits on the probabilities used in the ensemble. Any probability from the base models is restricted to this range. If None, no limit is applied. Default is None.
- is_classifier: bool or None, optional
Whether the base estimator should be treated as a classifier. If None, the ensemble automatically detects the type by inspecting the base estimator for the attributes _estimator_type or classes_, or the method predict_proba. Default is None.
- nfeature: int
Number of features actually used for each base estimator. Derived from feature_subset.
- estimators_: dict of numpy.integer to object
The collection of fitted base estimators.
- classes_: numpy.ndarray of shape (n_classes,), optional
Class labels known to the classifier. Only present if the base estimator supports predict_proba.
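The feature_subset options listed above can be read as a rule that maps the setting to a feature count (the value reported as nfeature). The sketch below only illustrates that reading. resolve_n_features is a hypothetical helper and not part of swectral, and both the rounding (truncation toward zero) and the lower bound of one feature are assumptions rather than documented behaviour.

```python
import math


def resolve_n_features(feature_subset, n_total):
    """Hypothetical helper: map a feature_subset setting to a feature count."""
    if feature_subset is None:
        return n_total  # use every feature, no resampling
    if isinstance(feature_subset, bool):
        raise TypeError("feature_subset must not be a bool")
    if feature_subset == "sqrt":
        n = int(math.sqrt(n_total))
    elif feature_subset == "log":
        n = int(math.log2(n_total))
    elif isinstance(feature_subset, float):
        if not 0.0 < feature_subset < 1.0:
            raise ValueError("a float feature_subset must lie between 0 and 1")
        n = int(feature_subset * n_total)
    elif isinstance(feature_subset, int):
        if feature_subset <= 0:
            raise ValueError("an int feature_subset must be positive")
        n = feature_subset
    else:
        raise ValueError(f"unsupported feature_subset: {feature_subset!r}")
    # Assumption: always keep at least one feature.
    return max(1, n)


for setting in (None, "sqrt", "log", 0.5, 7):
    print(setting, "->", resolve_n_features(setting, 100))
```

For spectral data with many bands, "sqrt" and "log" shrink the per-estimator feature set sharply, while a float keeps a fixed share of the bands.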
Methods
fit(X, y)
Fit the bagging ensemble on the training data.
predict(X)
Predict regression targets or class labels for X.
predict_proba(X)
Predict class probabilities for X. Only available if the base estimator supports predict_proba.
Examples
Bagged regressor:
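    from sklearn.cross_decomposition import PLSRegression
    from swectral.modelcombiners import BaggingEnsembler

    # X_train, y_train and X_test are your own data arrays
    model = BaggingEnsembler(
        base_estimator=PLSRegression(n_components=5),
        n_estimators=100
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)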
Bagged classifier:
    from sklearn.linear_model import LogisticRegression
    from swectral.modelcombiners import BaggingEnsembler

    model = BaggingEnsembler(
        base_estimator=LogisticRegression(max_iter=1000),
        n_estimators=50
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)         # class labels
    y_proba = model.predict_proba(X_test)  # averaged class probabilities
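Whether the classifier path is taken depends on is_classifier and, when it is None, on what the base estimator exposes. A minimal sketch of such a check follows. looks_like_classifier and the two toy estimator classes are hypothetical names used for illustration, and the order in which the attributes are inspected is an assumption, not the library's documented logic.

```python
def looks_like_classifier(estimator, is_classifier=None):
    """Hypothetical helper: decide whether to run in classification mode."""
    if is_classifier is not None:
        return bool(is_classifier)  # an explicit setting takes precedence
    if getattr(estimator, "_estimator_type", None) == "classifier":
        return True
    # Fall back to duck typing on classes_ or predict_proba.
    return hasattr(estimator, "classes_") or hasattr(estimator, "predict_proba")


class PlainRegressor:
    """Toy estimator exposing only fit and predict."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0.0 for _ in X]


class ProbaClassifier(PlainRegressor):
    """Toy estimator that additionally exposes predict_proba."""

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


print(looks_like_classifier(PlainRegressor()))
print(looks_like_classifier(ProbaClassifier()))
```

Passing is_classifier explicitly is useful for custom models that implement predict_proba but should still be bagged as regressors, or the reverse.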
Specify the fraction of training samples and the random state used for the base estimators:
    model = BaggingEnsembler(
        base_estimator=PLSRegression(n_components=5),
        n_estimators=100,
        max_samples=0.8,
        random_state=42
    )
Sample without replacement:
    model = BaggingEnsembler(
        base_estimator=PLSRegression(n_components=5),
        n_estimators=100,
        replace_sample=False
    )
With oversampling:
    model = BaggingEnsembler(
        base_estimator=PLSRegression(n_components=5),
        n_estimators=100,
        oversampling=True
    )
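To make the sampling options concrete for the classification case, the following sketch draws row indices class by class, which is one way to realise stratified resampling. stratified_bootstrap is a hypothetical helper, not the swectral implementation. In particular, drawing every class up to the size of the largest class when oversampling is enabled is an assumption, and the adaptive binning used for continuous targets is not covered.

```python
import random
from collections import defaultdict


def stratified_bootstrap(y, max_samples=1.0, replace=True, oversample=False, seed=None):
    """Hypothetical helper: return resampled row indices, drawn class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    largest = max(len(members) for members in by_class.values())

    chosen = []
    for label in sorted(by_class):
        members = by_class[label]
        # Assumption: oversampling draws each class up to the largest class size.
        base = largest if oversample else len(members)
        k = max(1, round(max_samples * base))
        if replace or k > len(members):
            # With replacement (also needed when k exceeds the class size).
            chosen.extend(rng.choices(members, k=k))
        else:
            chosen.extend(rng.sample(members, k))
    return chosen


y = [0] * 8 + [1] * 2  # imbalanced labels
plain = stratified_bootstrap(y, seed=0)
boosted = stratified_bootstrap(y, oversample=True, seed=0)
print(len(plain), len(boosted))
```

Drawing per class guarantees that a rare class never disappears from a bootstrap sample, which matters on small datasets.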
- __init__(base_estimator, n_estimators=50, max_samples=1.0, replace_sample=True, oversampling=False, feature_subset=None, replace_feature=False, random_state=None, regressor_aggregate='mean', limit_proba=None, is_classifier=None)
Methods
__init__(base_estimator[, n_estimators, ...])
fit(X, y)
Fit the bagging ensemble on the training data.
predict(X)
Predict regression targets or class labels.
predict_proba(X)
Predict class probabilities.
- fit(X, y)
Fit the bagging ensemble on the training data.
- Parameters:
- X: array_like of shape (n_samples, n_features)
Training input samples.
- y: array_like of shape (n_samples,)
Target values.
- Returns:
- self: BaggingEnsembler
Fitted ensemble.
- predict(X)
Predict regression targets or class labels.
- Parameters:
- X: array_like of shape (n_samples, n_features)
Input samples.
- Returns:
- y_pred: ndarray
Predicted values or class labels.
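For regressors, the value returned by predict depends on regressor_aggregate. The sketch below shows the three documented choices applied to the predictions that the base estimators give for a single sample. aggregate and quantile are hypothetical helpers, not swectral functions. The linear-interpolation quantile and the fallback to the median when trimming leaves no values are assumptions.

```python
import math
import statistics


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def aggregate(predictions, how="mean"):
    """Hypothetical helper: combine one sample's predictions from all estimators."""
    if how == "mean":
        return statistics.fmean(predictions)
    if how == "median":
        return statistics.median(predictions)
    # Otherwise expect a (low, high) quantile pair and compute a trimmed mean.
    q_low, q_high = how
    ordered = sorted(predictions)
    low, high = quantile(ordered, q_low), quantile(ordered, q_high)
    kept = [p for p in ordered if low <= p <= high]
    # Assumption: fall back to the median if trimming removes every value.
    return statistics.fmean(kept) if kept else statistics.median(ordered)


preds = [1.0, 2.0, 3.0, 4.0, 100.0]  # one estimator produced an outlier
for how in ("mean", "median", (0.25, 0.75)):
    print(how, aggregate(preds, how))
```

The median and the trimmed mean are less sensitive than the mean to a single base estimator that fits a bootstrap sample badly.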
- predict_proba(X)
Predict class probabilities.
- Parameters:
- X: array_like of shape (n_samples, n_features)
Input samples.
- Returns:
- proba: numpy.ndarray of shape (n_samples, n_classes)
Averaged class probabilities.
- Raises:
- AttributeError
If the base estimator does not support predict_proba.
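Probability averaging with limit_proba can be sketched for a single sample as follows. average_proba is a hypothetical helper, not the swectral implementation. Clipping each base-model probability before averaging follows the parameter description, while the final renormalisation so that the result sums to 1 is an assumption.

```python
def average_proba(per_estimator_proba, limit_proba=None):
    """Hypothetical helper: average class probabilities for a single sample."""
    n_classes = len(per_estimator_proba[0])
    totals = [0.0] * n_classes
    for proba in per_estimator_proba:
        for j, p in enumerate(proba):
            if limit_proba is not None:
                low, high = limit_proba
                p = min(max(p, low), high)  # restrict to [low, high]
            totals[j] += p
    mean = [t / len(per_estimator_proba) for t in totals]
    # Assumption: renormalise so the averaged probabilities sum to 1.
    s = sum(mean)
    return [m / s for m in mean] if s > 0 else mean


classes = ["healthy", "stressed"]         # stands in for classes_
probas = [[1.0, 0.0], [0.5, 0.5]]         # two base estimators, one sample
avg = average_proba(probas, limit_proba=(0.1, 0.9))
label = classes[max(range(len(avg)), key=avg.__getitem__)]
print(avg, label)
```

Limiting probabilities keeps one overconfident base model from dominating the average, which can help when base estimators are trained on very small bootstrap samples.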