AdaBoost fit (decision stumps)
Background
AdaBoost turns a crowd of weak learners — here, one-level decision trees ("stumps") — into a strong classifier. It trains stumps in sequence; after each one it re-weights the training examples so the next stump focuses on those still misclassified, and assigns each stump a vote proportional to its accuracy. The final prediction is the sign of the weighted vote.
Problem statement
Implement adaboost_fit(X, y, n_clf) that fits n_clf decision stumps by AdaBoost and returns them. Start from uniform sample weights . Each round, pick the stump (feature, threshold, polarity) with the smallest weighted error , then update:
A stump predicts where and otherwise (flipped when polarity ). If a stump's weighted error exceeds , set polarity and use .
Input
X—np.ndarrayof shape(n_samples, n_features): numeric features.y—np.ndarrayof shape(n_samples,): labels in .n_clf—int: the number of stumps (boosting rounds).
Output
Returns a list of n_clf dicts, each a stump with keys feature_index (int), threshold (the split value), polarity (), and alpha (float, the stump's vote weight).
Examples
Example 1
Input: X = [[1, 2], [2, 3], [3, 4], [4, 5]], y = [1, 1, -1, -1], n_clf = 3
Output: clfs[0] = {'feature_index': 0, 'threshold': 3, 'polarity': -1, 'alpha': 11.5129}
Explanation: the stump "predict -1 when feature 0 >= 3" (polarity -1, threshold 3) separates the two classes with zero weighted error, so its vote is very large and all three rounds re-select the same perfect stump.
Constraints
- Initialise sample weights uniformly to .
- Candidate thresholds are the unique values of each feature; a sample predicts when .
- If the weighted error , set polarity and replace with .
- Apply the and weight-update formulas above, then renormalise the weights each round.
- Labels are (not ).
Notes
- The inside guards against division by zero when a stump is perfect (); then blows up and that stump dominates the vote.
- Re-weighting is the heart of boosting: misclassified points become heavier, forcing later stumps to attack the hard cases instead of re-solving the easy ones.
This problem ships 4 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.
- •Returns n_clf stumps, each with the expected keys and valid polarity
- •First stump separates the toy data on feature 0 at threshold 3
- •Boosted ensemble classifies separable training data perfectly
- •A perfect weak learner gets a large positive alpha