Welcome to scikit-fallback's documentation!
Build Adaptive Pipelines: Orchestrate Models with Selective Prediction!
scikit-fallback is a scikit-learn-compatible Python package for selective machine learning. It lets you orchestrate multiple classifiers with fallback strategies, routing uncertain or anomalous samples to specialized models, human experts, or fallback handlers. Perfect for enabling reliable and intelligent decisions in high-stakes domains.
Why Fallbacks?
To fall back (on) means to retreat from making predictions, to rely on other tools for support. scikit-fallback flips the paradigm of blind and uncontrolled predictions and offers functionality to enhance your machine learning solutions with selectiveness and a reject option:
- Reject ambiguous predictions and reduce costly misclassifications (for example, when confidence falls below a threshold, or by pairing a classifier with an outlier detector)
- Wrap your pipelines with rejectors tidily instead of handcrafting rejections outside the pipeline
- Measure combined metrics to understand how well your model performs at both acceptance and rejection
- Choose only the appropriate models from ensembles for an optimal performance-efficiency tradeoff
- Track model decisions to see which samples a model rejected or accepted
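As a rough illustration of the "confidence < threshold" idea above, the following sketch abstains whenever the top class probability is too low. It is plain Python rather than scikit-fallback's own implementation; the function name `predict_with_reject`, the threshold value, and the probabilities are all hypothetical.

```python
def predict_with_reject(probabilities, threshold=0.55, fallback_label=-1):
    """Return the most likely class per row, or `fallback_label` when unsure."""
    predictions = []
    for row in probabilities:
        confidence = max(row)
        if confidence < threshold:
            # Not confident enough: abstain instead of guessing.
            predictions.append(fallback_label)
        else:
            predictions.append(row.index(confidence))
    return predictions


# Hypothetical class probabilities, e.g. rows of a classifier's `predict_proba` output.
probabilities = [[0.95, 0.05], [0.10, 0.90], [0.52, 0.48], [0.30, 0.70]]
predictions = predict_with_reject(probabilities)
print(predictions)  # [0, 1, -1, 1]
```

The third sample is rejected because its best score (0.52) is below the assumed threshold of 0.55.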
Real-world scenarios where this matters:
- Finance: Fraud model → detect ambiguous transaction → escalate for manual review
- Dialogue: Intent classifier → prefer smaller specialist LLM → route to generate response
- Medical: Disease detector → reject uncertain prediction → defer to human doctor
Key Features
- Rejection:
  Wrap any scikit-learn classifier with a reject option:
  - Confidence threshold rejection (abstain when uncertain)
  - Per-class thresholds
  - Custom rule-based logic
  - Anomaly detection for deferral
- Ensembling:
  Combine multiple models intelligently:
  - Semantic routing (select best model for each sample)
  - Threshold cascades (model pipeline with early rejection)
  - Track which model made each prediction
- Metrics:
  Evaluate abstention and classification performance as combined metrics:
  - Acceptance/rejection confusion matrices
  - Accept/reject accuracy decompositions
  - Ranking metrics with fallback support
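To make the "per-class thresholds" item under Rejection more concrete, here is a minimal sketch in plain Python. It only illustrates the idea and is not the library's API; `predict_with_class_thresholds` and the threshold values are hypothetical.

```python
def predict_with_class_thresholds(probabilities, thresholds, fallback_label=-1):
    """Reject a prediction when its score is below the threshold of the predicted class."""
    predictions = []
    for row in probabilities:
        # Index of the most likely class for this sample.
        label = max(range(len(row)), key=row.__getitem__)
        if row[label] < thresholds[label]:
            predictions.append(fallback_label)
        else:
            predictions.append(label)
    return predictions


# Hypothetical setup: demand more confidence for class 1 than for class 0.
thresholds = [0.6, 0.9]
probabilities = [[0.70, 0.30], [0.15, 0.85], [0.05, 0.95], [0.55, 0.45]]
predictions = predict_with_class_thresholds(probabilities, thresholds)
print(predictions)  # [0, -1, 1, -1]
```

A score of 0.85 would pass the class-0 threshold, but it is rejected here because the predicted class is 1, which requires at least 0.9.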
Quick Start
Use a rejector to grant your classifier a reject option:
>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> from skfb.estimators import ThresholdFallbackClassifierCV
>>> X = np.array([[0, 0], [4, 4], [1, 1], [3, 3], [2.5, 2], [2., 2.5]])
>>> y = np.array([0, 1, 0, 1, 0, 1])
>>> # Train LogisticRegression and let it fall back based on confidence scores.
>>> rejector = ThresholdFallbackClassifierCV(
... estimator=LogisticRegression(random_state=0),
... thresholds=(0.5, 0.55, 0.6, 0.65),
... ambiguity_threshold=0.0,
... cv=2,
... fallback_label=-1,
... fallback_mode="store").fit(X, y)
>>> # If probability is lower than this, predict `fallback_label` = -1.
>>> rejector.threshold_
0.55
>>> # Make predictions and see which inputs were accepted or rejected.
>>> y_pred = rejector.predict(X)
>>> # If `fallback_mode` == `"store"`, always accept but also mask rejections.
>>> y_pred, y_pred.get_dense_fallback_mask()
(FBNDArray([0, 1, 0, 1, 1, 1]),
array([False, False, False, False, True, False]))
>>> # This allows calculation of combined metrics (e.g., predict-reject accuracy).
>>> rejector.score(X, y)
1.0
>>> # Otherwise, allow fallbacks
>>> rejector.set_params(fallback_mode="return").predict(X)
array([ 0, 1, 0, 1, -1, 1])
>>> # and calculate accuracy only on accepted samples,
>>> rejector.score(X, y)
1.0
>>> # or just switch off rejections and fall back to a plain LogisticRegression.
>>> rejector.set_params(fallback_mode="ignore").score(X, y)
0.8333333333333334
>>>
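The three scores above can be reproduced by hand from the stored predictions and the fallback mask. The sketch below assumes one plausible set of definitions: a rejection counts as "right" when the underlying prediction would have been wrong. The library's own metric definitions may differ, and `accept_reject_summary` is a hypothetical helper, not part of scikit-fallback.

```python
def accept_reject_summary(y_true, y_pred, rejected):
    """Count correct/wrong predictions split by accepted/rejected, plus derived scores."""
    counts = {
        "correct_accepted": 0,
        "wrong_accepted": 0,
        "correct_rejected": 0,
        "wrong_rejected": 0,
    }
    for truth, pred, is_rejected in zip(y_true, y_pred, rejected):
        outcome = "correct" if truth == pred else "wrong"
        decision = "rejected" if is_rejected else "accepted"
        counts[f"{outcome}_{decision}"] += 1

    n_total = len(y_true)
    n_accepted = counts["correct_accepted"] + counts["wrong_accepted"]
    n_correct = counts["correct_accepted"] + counts["correct_rejected"]
    return {
        **counts,
        # Good decisions: accepting correct predictions and rejecting wrong ones.
        "predict_reject_accuracy": (
            counts["correct_accepted"] + counts["wrong_rejected"]
        ) / n_total,
        # Accuracy restricted to the samples that were not rejected.
        "accepted_accuracy": (
            counts["correct_accepted"] / n_accepted if n_accepted else float("nan")
        ),
        # Accuracy when the reject option is ignored entirely.
        "plain_accuracy": n_correct / n_total,
    }


# Labels, stored predictions, and fallback mask copied from the session above.
y_true = [0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 1, 1, 1]
rejected = [False, False, False, False, True, False]
summary = accept_reject_summary(y_true, y_pred, rejected)
print(summary["predict_reject_accuracy"])  # 1.0
print(summary["accepted_accuracy"])        # 1.0
print(summary["plain_accuracy"])           # 0.8333333333333334
```

Under these assumed definitions, the only misclassified sample is also the only rejected one, which is why the first two scores reach 1.0 while plain accuracy stays at 5/6.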
Or use a router for multi-stage model routing:
>>> from skfb.ensemble import ThresholdCascadeClassifierCV
>>> from sklearn.datasets import make_classification
>>> from sklearn.ensemble import HistGradientBoostingClassifier
>>> X, y = make_classification(
... n_samples=1_000, n_features=100, n_redundant=97, class_sep=0.1, flip_y=0.05,
... random_state=0)
>>> weak = HistGradientBoostingClassifier(max_iter=10, max_depth=2, random_state=0)
>>> okay = HistGradientBoostingClassifier(max_iter=20, max_depth=3, random_state=0)
>>> buff = HistGradientBoostingClassifier(max_iter=99, max_depth=4, random_state=0)
>>> # Train all models and learn a threshold per model such that, if the current model's
>>> # max confidence score is lower, it defers the decision to the next model in the cascade.
>>> cascading = ThresholdCascadeClassifierCV(
... estimators=[weak, okay, buff],
... costs=[1.1, 1.2, 1.99],
... cv_thresholds=5,
... cv=3,
... scoring="accuracy",
... return_earray=True,
... response_method="predict_proba").fit(X, y)
>>> # Best thresholds for `weak` and `okay`
>>> # (`buff` will always predict if `weak` and `okay` fall back):
>>> cascading.best_thresholds_
array([0.6125, 0.8375])
>>> # If `return_earray` is True, predictions will be of type `skfb.core.FBNDArray`,
>>> # which stores `acceptance_rates`, the ratios of accepted inputs per model.
>>> cascading.predict(X).acceptance_rates
array([0.659, 0.003, 0.338])
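The deferral logic described in the comments above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the library's implementation: it ignores `costs` and cross-validation, `cascade_predict` is a hypothetical helper, and the per-model probabilities are made up. The two thresholds are the ones printed above.

```python
def cascade_predict(stage_probabilities, thresholds):
    """Walk each sample through the models until one is confident enough.

    `stage_probabilities[m][i]` is model m's class-probability row for sample i.
    `thresholds` has one entry per model except the last, which always predicts.
    """
    n_models = len(stage_probabilities)
    n_samples = len(stage_probabilities[0])
    predictions, chosen_stage = [], []
    for i in range(n_samples):
        for m in range(n_models):
            row = stage_probabilities[m][i]
            confidence = max(row)
            is_last = m == n_models - 1
            if is_last or confidence >= thresholds[m]:
                predictions.append(row.index(confidence))
                chosen_stage.append(m)
                break
    # Share of samples for which each model made the final prediction.
    acceptance_rates = [chosen_stage.count(m) / n_samples for m in range(n_models)]
    return predictions, acceptance_rates


# Hypothetical probabilities from three models on four samples.
weak_proba = [[0.90, 0.10], [0.55, 0.45], [0.60, 0.40], [0.20, 0.80]]
okay_proba = [[0.80, 0.20], [0.10, 0.90], [0.70, 0.30], [0.30, 0.70]]
buff_proba = [[0.90, 0.10], [0.20, 0.80], [0.35, 0.65], [0.10, 0.90]]

predictions, acceptance_rates = cascade_predict(
    [weak_proba, okay_proba, buff_proba], thresholds=[0.6125, 0.8375]
)
print(predictions)       # [0, 1, 1, 1]
print(acceptance_rates)  # [0.5, 0.25, 0.25]
```

In this toy run, the first model handles half of the samples, and the remaining two are settled by the second and third models respectively.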
See the API Reference for more information.
Documentation
Learn More
- Code: Follow the GitHub repository for implementations, discussions, and updates
- Full Guide: See the API Reference for estimators, metrics, and ensemble strategies
- Blog Series: Check out the Kaggle and Medium tutorials for deeper dives
- Examples: Browse Examples for rejection analysis, cascading, and other demos
Note
Status: v0.2.0 stable release with production-ready APIs. Active development underway!
Inspiration & References
scikit-fallback builds on decades of research in selective classification and rejection. Some inspirations include: