swectral.modelcombiners.create_bagging_model

swectral.modelcombiners.create_bagging_model(base_estimator, n_estimators=50, max_samples=1.0, replace_sample=True, oversampling=False, feature_subset=None, replace_feature=False, random_state=None, regressor_aggregate='mean', limit_proba=None, is_classifier=None, name=None)

Create a bagging model instance from the specified base_estimator.

Parameters:
base_estimator : object

Any estimator implementing fit and predict following the scikit-learn API. If the estimator implements predict_proba, the ensemble will operate in classification mode.

n_estimators : int, optional

Number of base estimators to train in the ensemble. Default is 50.

max_samples : float, optional

Fraction of the training samples to draw for each base estimator. Must be in the interval (0, 1]. Default is 1.0.

replace_sample : bool, optional

Whether sampling is performed with replacement. If False, sampling is performed without replacement. Default is True.
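
How max_samples and replace_sample interact can be sketched with the standard library alone. The helper name draw_sample_indices is hypothetical and not part of swectral, and the way the draw size is rounded is an assumption, not documented behavior.

```python
import random


def draw_sample_indices(n_rows, max_samples=1.0, replace_sample=True, rng=None):
    """Hypothetical helper: pick row indices for one base estimator."""
    if not 0.0 < max_samples <= 1.0:
        raise ValueError("max_samples must be in the interval (0, 1]")
    rng = rng or random.Random()
    # Assumption: draw size is the fraction of rows, rounded down, at least 1.
    n_draw = max(1, int(max_samples * n_rows))
    if replace_sample:
        # Bootstrap: the same row may be drawn more than once.
        return [rng.randrange(n_rows) for _ in range(n_draw)]
    # Without replacement: every drawn row is distinct.
    return rng.sample(range(n_rows), n_draw)


idx_boot = draw_sample_indices(10, max_samples=0.5, rng=random.Random(0))
idx_plain = draw_sample_indices(
    10, max_samples=0.5, replace_sample=False, rng=random.Random(0)
)
```

With replace_sample=True and max_samples=1.0 (the defaults), each base estimator sees a classic bootstrap sample of the training set.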

oversampling : bool, optional

Whether to apply oversampling for rare cases in the training data.

  • For categorical targets, rare classes are upsampled to reduce class imbalance.

  • For continuous targets, underrepresented target regions are upsampled. The target space is divided into adaptive bins, with a maximum of 10 bins.

Default is False.
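
As a rough illustration of the categorical case, the sketch below upsamples every class to the size of the largest one. The function oversample_rare_classes is hypothetical, and balancing up to the majority count is an assumption; the resampling target the library actually uses is not stated here. The binned variant for continuous targets is not shown.

```python
import random
from collections import Counter


def oversample_rare_classes(labels, rng=None):
    """Hypothetical helper: return row indices with rare classes upsampled."""
    rng = rng or random.Random()
    # Group row indices by class label.
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    # Assumption: every class is brought up to the majority class count.
    target = max(len(members) for members in by_class.values())
    indices = []
    for members in by_class.values():
        indices.extend(members)
        # Fill the gap by drawing existing rows of this class with replacement.
        indices.extend(rng.choice(members) for _ in range(target - len(members)))
    return indices


labels = ["a"] * 6 + ["b"] * 2
balanced = oversample_rare_classes(labels, rng=random.Random(0))
counts = Counter(labels[i] for i in balanced)
```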

feature_subset : str, float, int, or None, optional

Strategy for selecting a subset of features for each base estimator. Options are:

  • "sqrt" : Use the square root of the total number of features.

  • "log" : Use log2 of the total number of features.

  • float between 0 and 1 : Use this fraction of the total features.

  • int : Use this exact number of features (must be positive).

  • None : Use all features; no feature resampling is applied.

If resampled, features are selected randomly according to the specified strategy. Default is None.
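
The strategies above can be read as a mapping from feature_subset to a feature count. The sketch below shows one possible resolution; resolve_n_features is a hypothetical name, and the rounding (truncation) and clamping to the range 1..n_features are assumptions rather than documented behavior.

```python
import math


def resolve_n_features(feature_subset, n_features):
    """Hypothetical helper: turn a feature_subset option into a feature count."""
    if feature_subset is None:
        return n_features  # use all features, no resampling
    if isinstance(feature_subset, bool):
        raise TypeError("feature_subset must be str, float, int, or None")
    if feature_subset == "sqrt":
        k = int(math.sqrt(n_features))
    elif feature_subset == "log":
        k = int(math.log2(n_features))
    elif isinstance(feature_subset, int):
        if feature_subset <= 0:
            raise ValueError("an int feature_subset must be positive")
        k = feature_subset
    elif isinstance(feature_subset, float):
        if not 0.0 < feature_subset < 1.0:
            raise ValueError("a float feature_subset must be between 0 and 1")
        k = int(feature_subset * n_features)
    else:
        raise ValueError(f"unknown feature_subset: {feature_subset!r}")
    # Assumption: keep at least one feature and never more than are available.
    return max(1, min(k, n_features))
```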

replace_feature : bool, optional

Whether feature resampling is performed with replacement. If False, feature resampling is performed without replacement. Default is False.

random_state : int or None, optional

Seed used by the random number generator for reproducible bootstrap sampling. Default is None.

regressor_aggregate : str or tuple of two float, optional

Aggregation method for regressor predictions. One of:

  • "mean": Use the average of base estimator predictions.

  • "median": Use the median of base estimator predictions.

  • tuple of two float: Use a trimmed mean, keeping only predictions within the given quantile range (e.g., (0.1, 0.9)).

Default is "mean".
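
To make the three aggregation options concrete, the sketch below combines the predictions of several base estimators for a single sample. The aggregate and quantile helpers are hypothetical; the linear-interpolation quantile and the median fallback for an empty trimmed set are assumptions, not necessarily what the library does.

```python
import statistics


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def aggregate(predictions, regressor_aggregate="mean"):
    """Hypothetical helper: combine base estimator predictions for one sample."""
    if regressor_aggregate == "mean":
        return statistics.fmean(predictions)
    if regressor_aggregate == "median":
        return statistics.median(predictions)
    # Otherwise expect a (lower_quantile, upper_quantile) tuple: trimmed mean.
    q_low, q_high = regressor_aggregate
    ordered = sorted(predictions)
    lower, upper = quantile(ordered, q_low), quantile(ordered, q_high)
    kept = [p for p in ordered if lower <= p <= upper]
    if not kept:
        # Assumption: fall back to the median if the range excludes everything.
        return statistics.median(ordered)
    return statistics.fmean(kept)


preds = [1.0, 2.0, 3.0, 4.0, 100.0]  # one outlying base estimator
```

In this toy input, the median and the trimmed mean are far less affected by the single outlying prediction than the plain mean.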

limit_proba : None or tuple of two float, optional

Probability limits for the ensemble. Probabilities predicted by the base models are restricted to the range given by the tuple. If None, no limit is applied. Default is None.
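
A minimal sketch of such a limit, assuming it amounts to clipping each probability into the given range: clip_probabilities is a hypothetical helper, and whether the library renormalizes the clipped probabilities afterwards is not stated here.

```python
def clip_probabilities(probas, limit_proba=None):
    """Hypothetical helper: restrict probabilities to the (low, high) range."""
    if limit_proba is None:
        return list(probas)  # no limit applied
    low, high = limit_proba
    return [min(max(p, low), high) for p in probas]


clipped = clip_probabilities([0.0, 0.5, 1.0], limit_proba=(0.05, 0.95))
untouched = clip_probabilities([0.0, 0.5, 1.0])
```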

is_classifier : bool or None, optional

Whether the base estimator should be treated as a classifier.

If None, the ensemble will automatically detect the type by inspecting the base estimator for attributes _estimator_type or classes_, or method predict_proba. Default is None.
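
The automatic detection described above could look roughly like the following. The function looks_like_classifier and the toy estimator classes are hypothetical, and the order in which the attributes are checked is an assumption.

```python
def looks_like_classifier(estimator, is_classifier=None):
    """Hypothetical helper: decide whether to treat estimator as a classifier."""
    if is_classifier is not None:
        return bool(is_classifier)  # an explicit setting overrides detection
    if getattr(estimator, "_estimator_type", None) == "classifier":
        return True
    if hasattr(estimator, "classes_"):
        return True
    return callable(getattr(estimator, "predict_proba", None))


class ToyRegressor:
    """Stand-in estimator with only fit and predict."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0.0 for _ in X]


class ToyClassifier(ToyRegressor):
    """Stand-in estimator that also exposes predict_proba."""

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]
```

Passing is_classifier explicitly is the safer choice when a base estimator exposes none of these markers or exposes them misleadingly.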

name : str or None, optional

Name of the created model class. If None, the class name is 'Bagging<BaseEstimatorClassName>'. Default is None.

Returns:
object

A bagging model instance with a scikit-learn-style model interface.

Return type:

object

See also

BaggingEnsembler

Examples

Basic Usage:

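from sklearn.cross_decomposition import PLSRegression
from swectral.modelcombiners import create_bagging_model

# Bag 100 PLS regressors; with the default settings each is fitted on a bootstrap sample.
model = create_bagging_model(
    base_estimator=PLSRegression(n_components=5),
    n_estimators=100,
)

# X_train, y_train, and X_test are assumed to be existing array-like datasets.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)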