Ensemble Methods

The skfb.ensemble module implements cascade and routing ensemble methods for efficient and selective multi-model inference.

Threshold Cascade Classifiers

These ensemble estimators use confidence thresholds, either predefined or selected by cross-validation, to decide when a sample is deferred to the next estimator in the cascade.

class skfb.ensemble.ThresholdCascadeClassifier(estimators, thresholds, response_method='predict_proba', return_earray=True, prefit=False, n_jobs=None, verbose=False)[source]

Cascade of classifiers with deferrals based on predefined thresholds.

During inference, runs the first estimator and if a predicted score is lower than thresholds[0], tries the second, and so on. The last estimator always makes predictions on the samples deferred by the previous estimators. If every estimator is fitted, it is not necessary to run fit to make predictions.
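The deferral rule described above can be sketched in plain Python. This is an illustration only, not the library's implementation: `cascade_predict`, the toy `weak` and `strong` scoring functions, and the use of the largest class probability as the confidence score are all assumptions made for the example.

```python
def cascade_predict(score_fns, thresholds, samples):
    """Return (predictions, used); used[i] is the index of the scoring
    function that produced the prediction for samples[i].

    Each scoring function maps one sample to a list of class probabilities.
    Assumption: the confidence score is the largest class probability.
    """
    predictions, used = [], []
    last = len(score_fns) - 1
    for sample in samples:
        for idx, score_fn in enumerate(score_fns):
            proba = score_fn(sample)
            confidence = max(proba)
            # The last estimator never defers; earlier ones defer when
            # their confidence is below their threshold.
            if idx == last or confidence >= thresholds[idx]:
                predictions.append(proba.index(confidence))
                used.append(idx)
                break
    return predictions, used


def weak(x):
    """Hypothetical cheap model: unsure around x = 2."""
    p1 = min(max(x / 4.0, 0.0), 1.0)
    return [1.0 - p1, p1]


def strong(x):
    """Hypothetical expensive model: always fully confident."""
    return [0.0, 1.0] if x > 2.2 else [1.0, 0.0]


preds, used = cascade_predict([weak, strong], [0.8], [0.0, 4.0, 2.0, 2.5])
print(preds)  # [0, 1, 0, 1]
print(used)   # [0, 0, 1, 1]
```

The first two samples are answered by the cheap model; the two samples near the decision boundary fall below the 0.8 threshold and are deferred to the second model.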

Parameters:
  • estimators (array-like of object, length n_estimators) – Base estimators. Preferably ordered from weakest (e.g., rule-based or linear) to strongest (e.g., gradient boosting).

  • thresholds (float or array-like of float, length n_estimators - 1) – Deferral thresholds for each base estimator except the last. If only one number is specified, every estimator (except the last) will have the same threshold (i.e., the threshold will be global).

  • response_method ({"predict_proba", "decision_function"}, default="predict_proba") – Method of the base estimators whose output is compared against the deferral thresholds. For "decision_function", thresholds can be negative.

  • return_earray (bool, default=True) – Whether to return an ENDArray of predicted classes or scores, or a plain numpy ndarray.

  • prefit (bool, default=False) – Whether estimators are fitted. If True, checks their classes_ attributes for intercompatibility.

  • n_jobs (int, default=None) – Number of parallel jobs used during training.

  • verbose (int, default=False) – Verbosity level.
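The scalar-versus-sequence behaviour of thresholds can be pictured as follows. This is a minimal sketch under the assumption that a scalar is simply repeated once per deferring estimator; `expand_thresholds` is a hypothetical helper, not part of skfb.

```python
from numbers import Real


def expand_thresholds(thresholds, n_estimators):
    """Return one threshold per estimator except the last."""
    n_needed = n_estimators - 1
    if isinstance(thresholds, Real):
        # Global threshold: repeat it for every deferring estimator.
        return [float(thresholds)] * n_needed
    expanded = [float(t) for t in thresholds]
    if len(expanded) != n_needed:
        raise ValueError(
            f"Expected {n_needed} thresholds, got {len(expanded)}."
        )
    return expanded


global_thresholds = expand_thresholds(0.8, n_estimators=3)
per_estimator = expand_thresholds([0.9, 0.7], n_estimators=3)
print(global_thresholds)  # [0.8, 0.8]
print(per_estimator)      # [0.9, 0.7]
```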

Examples

>>> import numpy as np
>>> from skfb.ensemble import ThresholdCascadeClassifier
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.linear_model import LogisticRegression
>>> X = np.array([
...     [0, 0], [4, 4], [1, 1], [3, 3], [2.5, 2], [2., 2.5], [2., 2.], [2.5, 2.5]
... ])
>>> y = np.array([0, 1, 0, 1, 0, 1, 1, 0])
>>> maxent = LogisticRegression(random_state=0)
>>> rf = RandomForestClassifier(random_state=0)
>>> cascade = ThresholdCascadeClassifier([maxent, rf], [0.8]).fit(X, y)
>>> cascade.score(X, y)
1.0
>>> cascade.set_estimators(0).score(X, y)  # Use only LogisticRegression
0.75

Notes

If you want to have a fallback option (for the last estimator), consider rejectors from skfb.estimators.

Methods

decision_function(X)

Predicts decision scores using one or more base estimators.

fit(X, y[, sample_weight])

Fits base estimators and sets meta-estimator attributes.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predicts classes using one or more base estimators.

predict_log_proba(X)

Predicts log-probabilities using one or more base estimators.

predict_proba(X)

Predicts probabilities using one or more base estimators.

reset_estimators()

Reactivates all the base estimators.

score(X, y[, sample_weight])

Computes accuracy score on true labels and cascade predictions.

set_estimators(index)

Sets the estimators and thresholds to use for prediction and scoring.

set_fit_request(*[, sample_weight])

Request metadata passed to the fit method.

set_params(**params)

Sets the parameters of the cascade.

set_score_request(*[, sample_weight])

Request metadata passed to the score method.

decision_function(X)[source]

Predicts decision scores using one or more base estimators.

Tries estimators in the order specified during initialization. If the first estimator does not produce a score higher than or equal to the first threshold, the cascade switches to the second estimator, and so on. The last estimator always makes predictions if all the previous estimators deferred.

Parameters:

X (indexable, length n_samples) – Input samples to classify. Must fulfill the input assumptions of the underlying estimators.

Returns:

y_score – Decision scores predicted by the base estimators.

Return type:

ndarray of shape (n_samples,)

fit(X, y, sample_weight=None)[source]

Fits base estimators and sets meta-estimator attributes.

Parameters:
  • X ({array-like, sparse matrix}, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_outputs)) – The target values (class labels).

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted.

Returns:

self – Returns self.

Return type:

object

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X)[source]

Predicts classes using one or more base estimators.

Tries estimators in the order specified during initialization. If the first estimator does not produce a score higher than or equal to the first threshold, the cascade switches to the second estimator, and so on. The last estimator always makes predictions if all the previous estimators deferred.

Parameters:

X (indexable, length n_samples) – Input samples to classify. Must fulfill the input assumptions of the underlying estimators.

Returns:

y_pred – Classes predicted by the base estimators.

Return type:

ndarray of shape (n_samples,)

predict_log_proba(X)[source]

Predicts log-probabilities using one or more base estimators.

Tries estimators in the order specified during initialization. If the first estimator does not produce a score higher than or equal to the first threshold, the cascade switches to the second estimator, and so on. The last estimator always makes predictions if all the previous estimators deferred.

Parameters:

X (indexable, length n_samples) – Input samples to classify. Must fulfill the input assumptions of the underlying estimators.

Returns:

y_score – Log-probabilities predicted by the base estimators.

Return type:

ndarray of shape (n_samples, n_classes)

predict_proba(X)[source]

Predicts probabilities using one or more base estimators.

Tries estimators in the order specified during initialization. If the first estimator does not produce a score higher than or equal to the first threshold, the cascade switches to the second estimator, and so on. The last estimator always makes predictions if all the previous estimators deferred.

Parameters:

X (indexable, length n_samples) – Input samples to classify. Must fulfill the input assumptions of the underlying estimators.

Returns:

y_prob – Probabilities predicted by the base estimators.

Return type:

ndarray of shape (n_samples, n_classes)

reset_estimators()[source]

Reactivates all the base estimators.

Same as set_estimators("all"). Use this after set_estimators has deactivated some estimators and thresholds and you want to activate all of them again.

Returns:

self – Returns self.

Return type:

object

score(X, y, sample_weight=None)[source]

Computes accuracy score on true labels and cascade predictions.

Parameters:
  • X (indexable, length n_samples) – Input samples to evaluate. Must fulfill the input assumptions of the underlying estimators.

  • y (array-like of shape (n_samples,)) – True labels for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted.

Returns:

score – Accuracy score.

Return type:

float
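The effect of sample_weight on the accuracy score can be illustrated with a small stand-alone function. This is a sketch of weighted accuracy in general, assuming the usual definition (sum of weights of correctly classified samples divided by the total weight); `weighted_accuracy` is a hypothetical name and the labels are made up.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Accuracy where each sample counts in proportion to its weight."""
    if sample_weight is None:
        # Equal weighting reduces to the plain fraction of correct labels.
        sample_weight = [1.0] * len(y_true)
    correct = sum(
        w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p
    )
    return correct / sum(sample_weight)


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
plain = weighted_accuracy(y_true, y_pred)
weighted = weighted_accuracy(y_true, y_pred, [1.0, 1.0, 2.0, 1.0])
print(plain)     # 0.75
print(weighted)  # 0.6
```

Doubling the weight of the one misclassified sample lowers the score from 0.75 to 0.6.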

set_estimators(index)[source]

Sets the estimators and thresholds to use for prediction and scoring.

If a single index is passed, the corresponding threshold is set to 0.0 or -np.inf depending on the response_method attribute.

By default, all trained estimators (available via the estimators_ attribute) and their thresholds are used.

Parameters:

index (int, or slice, or "all", or array-like of int) –

Returns:

self – Returns self.

Return type:

object

Raises:

TypeError – If index is of unsupported type or value.
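The accepted index forms and the single-index threshold rule for set_estimators can be sketched as below. Both helpers are hypothetical and only mirror the description: the mapping of "predict_proba" to 0.0 and "decision_function" to negative infinity is an assumption (probabilities are bounded below by zero, decision scores are not), and the real method may validate its input differently.

```python
import math


def never_defer_threshold(response_method):
    """Threshold that every score passes, so the estimator never defers."""
    if response_method == "predict_proba":
        return 0.0
    if response_method == "decision_function":
        return -math.inf
    raise ValueError(f"Unknown response_method: {response_method!r}")


def select_active(n_estimators, index):
    """Resolve an index specification to a list of estimator positions."""
    positions = list(range(n_estimators))
    if isinstance(index, str):
        if index == "all":
            return positions
        raise TypeError(f"Unsupported index: {index!r}")
    if isinstance(index, bool):
        raise TypeError("bool is not a valid index")
    if isinstance(index, int):
        return [positions[index]]
    if isinstance(index, slice):
        return positions[index]
    try:
        # Array-like of int: pick the listed positions in order.
        return [positions[i] for i in index]
    except TypeError as exc:
        raise TypeError(f"Unsupported index: {index!r}") from exc


only_first = select_active(3, 0)
last_two = select_active(3, slice(1, None))
everything = select_active(3, "all")
print(only_first, last_two, everything)  # [0] [1, 2] [0, 1, 2]
```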

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> ThresholdCascadeClassifier

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_params(**params)[source]

Sets the parameters of the cascade.

If thresholds are provided, the transformations are done accordingly, so there is no need to refit the cascade.

Parameters:

**params (dict) – Cascade parameters.

Returns:

self – Returns self.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> ThresholdCascadeClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class skfb.ensemble.ThresholdCascadeClassifierCV(estimators, costs=None, cv_thresholds=None, min_score=None, max_cost=None, strategy='balanced', cv=5, scoring='accuracy', raise_error=False, response_method='predict_proba', return_earray=True, n_jobs=None, verbose=0)[source]

Cascade of classifiers with Pareto-optimized deferral thresholds.

Optimizes deferral thresholds via cross-validation grid search, identifying non-dominated (Pareto-optimal) threshold configurations that balance performance and computational cost. Users can select thresholds based on performance constraints or cost budgets (e.g., select the best threshold configuration s.t. it gives at least min_score classification score on validation).

During inference, runs the first estimator and if a predicted score is lower than thresholds[0], tries the second, and so on. The last estimator always makes predictions on deferred samples.
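The constraint-based selection described above (and detailed under min_score, max_cost, and raise_error) can be sketched as follows. This is not the library's code: `select_config` is a hypothetical helper, the front entries are made-up numbers, ValueError stands in for CascadeParetoConfigException, and the behaviour when both constraints are given at once is an assumption.

```python
import warnings


def select_config(front, min_score=None, max_cost=None, raise_error=False):
    """Pick one (thresholds, score, cost) entry from a Pareto front."""
    feasible = [
        cfg for cfg in front
        if (min_score is None or cfg[1] >= min_score)
        and (max_cost is None or cfg[2] <= max_cost)
    ]
    if not feasible:
        if raise_error:
            # Stand-in for the library's CascadeParetoConfigException.
            raise ValueError("No Pareto configuration satisfies the constraints.")
        warnings.warn("No feasible configuration; using the best-scoring one.")
        return max(front, key=lambda cfg: cfg[1])
    if min_score is not None and max_cost is None:
        # Cheapest configuration that is accurate enough.
        return min(feasible, key=lambda cfg: cfg[2])
    # Best-scoring configuration within the budget (or overall).
    return max(feasible, key=lambda cfg: cfg[1])


# Made-up (thresholds, mean CV score, mean CV cost) entries.
front = [([0.5], 0.80, 1.5), ([0.7], 0.86, 2.4), ([0.9], 0.90, 4.0)]

best_overall = select_config(front)
cheapest_ok = select_config(front, min_score=0.85)
best_in_budget = select_config(front, max_cost=2.0)

# An unreachable constraint triggers the warning-and-fallback path.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fallback = select_config(front, min_score=0.95)

print(best_overall[0], cheapest_ok[0], best_in_budget[0], fallback[0])
```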

Parameters:
  • estimators (array-like of object, length n_estimators) – Base estimators. Preferably ordered from weakest (fast, low-accuracy) to strongest (slow, high-accuracy).

  • costs (array-like of shape (n_estimators,) or float, default=None) – Computational cost per estimator. Used to identify non-dominated threshold configurations along the cost-performance tradeoff. Defaults to uniform costs summing to 1.0.

  • cv_thresholds (array-like of shape (n_thresholds,) or int, default=None) – Candidate deferral thresholds for grid search. If None, defaults to 10 thresholds linearly spaced from 1/n_classes to 0.95. If int, generates that many thresholds in the same range.

  • cv (int, cross-validation generator or iterable, default=5) –

    Cross-validation splitting strategy. Accepts:

    • int: number of folds (uses StratifiedKFold for classification)

    • CV splitter object

    • Iterable yielding (train_idx, test_idx) splits

  • scoring (callable or str, default="accuracy") – Scorer for threshold evaluation. Can be a scikit-learn scorer name (e.g., “accuracy”, “f1”) or a callable with signature scorer(y_true, y_pred) -> float (higher is better).

  • min_score (float, default=None) – Minimum acceptable cross-validation score. If specified, selects the Pareto config with lowest cost meeting this accuracy constraint. If None (default), uses the highest-accuracy Pareto config.

  • max_cost (float, default=None) – Maximum acceptable computational cost. If specified, selects the Pareto config with highest accuracy within this cost budget. If None (default), uses the highest-accuracy Pareto config.

  • raise_error (bool, default=False) – Whether to raise CascadeParetoConfigException if no Pareto configuration satisfies the specified constraints (min_score, max_cost). If False (default), issues a warning and falls back to the highest-accuracy Pareto config.

  • response_method ({"predict_proba", "decision_function"}, default="predict_proba") – Method by estimators for computing deferral scores.

  • return_earray (bool, default=True) – Whether to return ENDArray with ensemble mask or plain numpy ndarray.

  • n_jobs (int, default=None) –

    Parallel jobs:

      1. model pre-training for each CV fold;

      2. score and cost evaluation for each fold and threshold configuration;

      3. and retraining on full data.

    -1 uses all processors. Defaults to one.

  • verbose (int, default=0) – Verbosity level for each stage of training.
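The default threshold grid and cost vector described for cv_thresholds and costs can be pictured with the sketch below. All function names are hypothetical, and several details are assumptions rather than documented behaviour: that both grid endpoints are included, that candidate configurations are the full Cartesian product of the grid over the deferring estimators, and that a sample pays the cost of every estimator it passes through.

```python
from itertools import product


def default_threshold_grid(n_classes, n_thresholds=10, upper=0.95):
    """Linearly spaced thresholds from 1/n_classes to upper (inclusive)."""
    lower = 1.0 / n_classes
    if n_thresholds == 1:
        return [lower]
    step = (upper - lower) / (n_thresholds - 1)
    return [lower + i * step for i in range(n_thresholds)]


def threshold_configs(grid, n_estimators):
    """One threshold per deferring estimator, in every combination."""
    return list(product(grid, repeat=n_estimators - 1))


def uniform_costs(n_estimators):
    """Equal cost per estimator, summing to 1.0."""
    return [1.0 / n_estimators] * n_estimators


def mean_cascade_cost(used, costs):
    """Average per-sample cost; a sample answered by estimator k is
    assumed to pay for estimators 0..k."""
    cumulative, total = [], 0.0
    for cost in costs:
        total += cost
        cumulative.append(total)
    return sum(cumulative[k] for k in used) / len(used)


grid = default_threshold_grid(n_classes=2, n_thresholds=5)
configs = threshold_configs(grid, n_estimators=3)
mean_cost = mean_cascade_cost(used=[0, 0, 1, 1], costs=[1.0, 5.0])
print(len(grid), len(configs), mean_cost)  # 5 25 3.5
```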

best_thresholds_

Best selected thresholds.

Type:

list of float

all_cv_thresholds_

All generated threshold configurations.

Type:

ndarray, shape (n_configs, n_splits)

mean_cv_scores_

Average cross-validated classification scores.

Type:

ndarray, shape (n_configs,)

mean_cv_costs_

Average cross-validated computational costs.

Type:

ndarray, shape (n_configs,)

Examples

>>> from skfb.ensemble import ThresholdCascadeClassifierCV
>>> from sklearn.datasets import make_classification
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = make_classification(
...     n_samples=300, n_features=100, n_redundant=95, class_sep=0.1,
...     random_state=0)
>>> cascading = ThresholdCascadeClassifierCV(
...     [LogisticRegression(l1_ratio=1.0, solver="liblinear", random_state=0),
...      RandomForestClassifier(random_state=0)],
...     costs=[1.0, 5.0],
...     cv_thresholds=5,
...     cv=3).fit(X, y)
>>> cascading.best_thresholds_

Notes

The Pareto front contains all non-dominated configurations: those where no other configuration achieves both strictly higher score AND strictly lower cost.
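Under the definition above, the Pareto front can be computed with a direct pairwise check. The sketch below is illustrative (`pareto_front` is a hypothetical helper and the scores and costs are made up); it applies the strict dominance rule exactly as stated, so configurations that tie on score or on cost are all retained.

```python
def pareto_front(scores, costs):
    """Indices of configurations that no other configuration beats on
    both score (strictly higher) and cost (strictly lower)."""
    front = []
    for i, (s_i, c_i) in enumerate(zip(scores, costs)):
        dominated = any(
            s_j > s_i and c_j < c_i
            for j, (s_j, c_j) in enumerate(zip(scores, costs))
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


scores = [0.80, 0.86, 0.84, 0.90, 0.86]
costs = [1.5, 2.4, 3.0, 4.0, 2.4]
front = pareto_front(scores, costs)
print(front)  # [0, 1, 3, 4]
```

Configuration 2 is dropped because configuration 1 scores higher at a lower cost; configurations 1 and 4 are identical and both stay on the front.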

If you want a fallback option for the last estimator, consider rejectors from skfb.estimators.

Methods

decision_function(X)

Predicts decision scores using one or more base estimators.

fit(X, y[, sample_weight])

Fit estimators and identify Pareto-optimal threshold configurations.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predicts classes using one or more base estimators.

predict_log_proba(X)

Predicts log-probabilities using one or more base estimators.

predict_proba(X)

Predicts probabilities using one or more base estimators.

reset_estimators()

Reactivates all the base estimators.

score(X, y[, sample_weight])

Computes accuracy score on true labels and cascade predictions.

set_estimators(index)

Sets the estimators and thresholds to use for prediction and scoring.

set_fit_request(*[, sample_weight])

Request metadata passed to the fit method.

set_params(**params)

Sets the parameters of the cascade.

set_score_request(*[, sample_weight])

Request metadata passed to the score method.

decision_function(X)

Predicts decision scores using one or more base estimators.

Tries estimators in the order specified during initialization. If the first estimator does not produce a score higher than or equal to the first threshold, the cascade switches to the second estimator, and so on. The last estimator always makes predictions if all the previous estimators deferred.

Parameters:

X (indexable, length n_samples) – Input samples to classify. Must fulfill the input assumptions of the underlying estimators.

Returns:

y_score – Decision scores predicted by the base estimators.

Return type:

ndarray of shape (n_samples,)

fit(X, y, sample_weight=None)[source]

Fit estimators and identify Pareto-optimal threshold configurations.

Parameters:
  • X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_outputs)) – Target class labels.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, samples are equally weighted.

Returns:

self – Fitted estimator. Use predict() and/or set_params() methods.

Return type:

object

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X)

Predicts classes using one or more base estimators.

Tries estimators in the order specified during initialization. If the first estimator does not produce a score higher than or equal to the first threshold, the cascade switches to the second estimator, and so on. The last estimator always makes predictions if all the previous estimators deferred.

Parameters:

X (indexable, length n_samples) – Input samples to classify. Must fulfill the input assumptions of the underlying estimators.

Returns:

y_pred – Classes predicted by the base estimators.

Return type:

ndarray of shape (n_samples,)

predict_log_proba(X)

Predicts log-probabilities using one or more base estimators.

Tries estimators in the order specified during initialization. If the first estimator does not produce a score higher than or equal to the first threshold, the cascade switches to the second estimator, and so on. The last estimator always makes predictions if all the previous estimators deferred.

Parameters:

X (indexable, length n_samples) – Input samples to classify. Must fulfill the input assumptions of the underlying estimators.

Returns:

y_score – Log-probabilities predicted by the base estimators.

Return type:

ndarray of shape (n_samples, n_classes)

predict_proba(X)

Predicts probabilities using one or more base estimators.

Tries estimators in the order specified during initialization. If the first estimator does not produce a score higher than or equal to the first threshold, the cascade switches to the second estimator, and so on. The last estimator always makes predictions if all the previous estimators deferred.

Parameters:

X (indexable, length n_samples) – Input samples to classify. Must fulfill the input assumptions of the underlying estimators.

Returns:

y_prob – Probabilities predicted by the base estimators.

Return type:

ndarray of shape (n_samples, n_classes)

reset_estimators()

Reactivates all the base estimators.

Same as set_estimators("all"). Use this after set_estimators has deactivated some estimators and thresholds and you want to activate all of them again.

Returns:

self – Returns self.

Return type:

object

score(X, y, sample_weight=None)

Computes accuracy score on true labels and cascade predictions.

Parameters:
  • X (indexable, length n_samples) – Input samples to evaluate. Must fulfill the input assumptions of the underlying estimators.

  • y (array-like of shape (n_samples,)) – True labels for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted.

Returns:

score – Accuracy score.

Return type:

float

set_estimators(index)

Sets the estimators and thresholds to use for prediction and scoring.

If a single index is passed, the corresponding threshold is set to 0.0 or -np.inf depending on the response_method attribute.

By default, all trained estimators (available via the estimators_ attribute) and their thresholds are used.

Parameters:

index (int, or slice, or "all", or array-like of int) –

Returns:

self – Returns self.

Return type:

object

Raises:

TypeError – If index is of unsupported type or value.

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> ThresholdCascadeClassifierCV

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_params(**params)[source]

Sets the parameters of the cascade.

If thresholds or new constraints are provided, the transformations are done accordingly, so there is no need to refit the cascade.

Parameters:

**params (dict) – Cascade parameters.

Returns:

self – Returns self.

Return type:

object

Raises:

ValueError – If min_score, max_cost, and thresholds are all passed together.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> ThresholdCascadeClassifierCV

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

Routing Classifiers

class skfb.ensemble.RoutingClassifier(estimators, router, costs=None, cv=None, return_earray=False, n_jobs=None, verbose=0)[source]

Defers input to the most appropriate classifier chosen through semantic routing.

Trains a pool of estimators and a router that learns to select the best estimator for each input based on a costs vector. The router is trained using cross-validated predictions from the estimators to determine which estimator is most appropriate for each input.

Parameters:
  • estimators (list of objects) – List of candidate estimators to choose from.

  • router (object) – Classifier used to route inputs to estimators.

  • costs (float or list of float, default=None) – List of costs associated with each estimator (positive, higher is more costly). If scalar, costs are uniform. If None, defaults to uniform 1.0.

  • cv (int, cross-validation generator or an iterable, default=None) – Cross-validation strategy for training estimators and router.

  • return_earray (bool, default=False) – Whether to return ENDArray of predicted classes or plain numpy ndarray. ENDArray tracks which estimator made each prediction.

  • n_jobs (int, default=None) – Number of jobs to run in parallel for cross-validation. If None, use 1.

  • verbose (int or bool, default=0) – Verbosity of parallel jobs.

router_

Router trained on estimators’ signals.

Type:

object

router_class_ratios_

Maps each estimator index to the fraction of samples accepted by that estimator.

Type:

dict, int -> float
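A mapping of this shape can be produced from a list of per-sample estimator indices. The sketch below assumes the ratios are plain frequencies of the router's training targets, which the description suggests but does not state; `class_ratios` is a hypothetical helper and the targets are made up.

```python
from collections import Counter


def class_ratios(routing_targets):
    """Map each estimator index to its share of the routing targets."""
    counts = Counter(routing_targets)
    n_samples = len(routing_targets)
    return {idx: counts[idx] / n_samples for idx in sorted(counts)}


ratios = class_ratios([1, 1, 0, 1, 2, 1, 1, 1, 2, 1])
print(ratios)  # {0: 0.1, 1: 0.7, 2: 0.2}
```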

Examples

>>> from skfb.ensemble import RoutingClassifier
>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.svm import SVC
>>> X, y = make_classification(
...     n_samples=300, n_features=100, n_redundant=90, class_sep=0.3,
...     random_state=0)
>>> maxent = LogisticRegression(random_state=0)
>>> nb = GaussianNB()
>>> svm = SVC(kernel="linear", probability=True, random_state=0)
>>> router = LogisticRegression(random_state=0)
>>> routing = RoutingClassifier(
...     estimators=[maxent, nb, svm],
...     router=router,
...     cv=3,
...     return_earray=True).fit(X, y)
>>> routing.router_class_ratios_
{np.int64(0): np.float64(0.05),
 np.int64(1): np.float64(0.8566666666666667),
 np.int64(2): np.float64(0.09333333333333334)}
>>> routing.predict(X[:5])
ENDArray([1, 0, 1, 0, 1])
>>> routing.set_params(return_earray=False).predict(X[:5])
array([1, 0, 1, 0, 1])
>>> routing.set_params(return_earray=True).predict(X).acceptance_rates
array([0.        , 0.97333333, 0.02666667])

Methods

decision_function(X)

Compute decision function for X.

fit(X, y[, sample_weight])

Trains estimators and router.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predicts class labels for samples in X.

predict_log_proba(X)

Predicts log class probabilities for X.

predict_proba(X)

Predicts class probabilities for X.

score(X, y[, sample_weight])

Return the mean accuracy on the given test data and labels.

set_fit_request(*[, sample_weight])

Request metadata passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_score_request(*[, sample_weight])

Request metadata passed to the score method.

decision_function(X)[source]

Compute decision function for X.

Parameters:

X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Samples to evaluate.

Returns:

y_score – Decision function values.

Return type:

ndarray, shape (n_samples, n_classes) or (n_samples,)

fit(X, y, sample_weight=None)[source]

Trains estimators and router.

Steps:

  • Use cross-validated predictions from candidate estimators to build routing targets (best estimator index per sample).

  • Train the router on full data and store it in self.router_ to predict chosen estimator index.

  • Fit all candidate estimators on full data and store them in self.estimators_ for inference.
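The first step, building routing targets from out-of-fold predictions, might look like the sketch below. The selection rule is an assumption, since the documentation only says the best estimator is chosen based on a costs vector: here the cheapest estimator that predicted the sample correctly wins, and the cheapest estimator overall is used when none was correct. `routing_targets` and the toy data are hypothetical.

```python
def routing_targets(y_true, oof_predictions, costs):
    """Return the chosen estimator index for every sample.

    oof_predictions[k][i] is estimator k's out-of-fold prediction for
    sample i.
    """
    by_cost = sorted(range(len(oof_predictions)), key=lambda k: costs[k])
    targets = []
    for i, label in enumerate(y_true):
        correct = [k for k in by_cost if oof_predictions[k][i] == label]
        # Assumption: cheapest correct estimator wins; if none is
        # correct, fall back to the cheapest estimator overall.
        targets.append(correct[0] if correct else by_cost[0])
    return targets


y_true = [0, 1, 1, 0]
oof = [
    [0, 0, 1, 1],  # estimator 0, cost 1.0
    [0, 1, 1, 1],  # estimator 1, cost 2.0
    [1, 1, 0, 0],  # estimator 2, cost 3.0
]
targets = routing_targets(y_true, oof, costs=[1.0, 2.0, 3.0])
print(targets)  # [0, 1, 0, 2]
```

The router is then an ordinary classifier trained on (X, targets), so any estimator with fit and predict can play that role.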

Parameters:
  • X ({array-like, sparse matrix}, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,)) – The target values.

  • sample_weight (array-like, shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted.

Returns:

self – Returns self.

Return type:

object

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X)[source]

Predicts class labels for samples in X.

Parameters:

X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Samples to classify.

Returns:

y_pred – Predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_log_proba(X)[source]

Predicts log class probabilities for X.

Parameters:

X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Samples to classify.

Returns:

y_log_proba – Log class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)

predict_proba(X)[source]

Predicts class probabilities for X.

Parameters:

X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Samples to classify.

Returns:

y_proba – Class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – Mean accuracy of self.predict(X) w.r.t. y.

Return type:

float
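The subset accuracy mentioned above for the multi-label case counts a sample as correct only when its whole label set matches. A minimal sketch, with a hypothetical `subset_accuracy` helper and made-up label rows:

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label row is predicted exactly."""
    matches = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
acc = subset_accuracy(y_true, y_pred)
print(acc)  # 2 of 3 rows match exactly
```

The second row has two of its three labels right but still counts as a miss, which is why the metric is described as harsh.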

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> RoutingClassifier

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> RoutingClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

Exceptions and Warnings

class skfb.ensemble.CascadeNotFittedWarning[source]

Issued if base estimators in the cascade are not fitted or are fitted incorrectly.

Attributes:
args

Methods

add_note

Exception.add_note(note) -- add a note to the exception

with_traceback

Exception.with_traceback(tb) -- set self.__traceback__ to tb and return self.

class skfb.ensemble.CascadeParetoConfigWarning[source]

Issued if no Pareto configuration satisfies the cost-performance constraints.

Attributes:
args

Methods

add_note

Exception.add_note(note) -- add a note to the exception

with_traceback

Exception.with_traceback(tb) -- set self.__traceback__ to tb and return self.

class skfb.ensemble.CascadeParetoConfigException[source]

Raised if no Pareto configuration satisfies cost-performance constraints.

Attributes:
args

Methods

add_note

Exception.add_note(note) -- add a note to the exception

with_traceback

Exception.with_traceback(tb) -- set self.__traceback__ to tb and return self.