Bernoulli Naive Bayes classifier
Background
Naive Bayes is a fast probabilistic classifier built on Bayes' rule plus a "naive" conditional-independence assumption: given the class, the features are independent. The Bernoulli variant models each feature as a binary present/absent event — the classic choice for text classification with binary bag-of-words features. Training is just counting; prediction picks the class with the highest log-posterior.
Problem statement
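Implement the NaiveBayes class (Bernoulli, with Laplace smoothing) exposing forward(X, y) to fit and predict(X) to classify. Fit log-priors and per-class feature probabilities, then predict the class that maximises the log-posterior:
$$\hat{y} = \arg\max_c \Big[ \log P(c) + \sum_j \big( x_j \log \theta_{cj} + (1 - x_j) \log(1 - \theta_{cj}) \big) \Big]$$
where $\theta_{cj}$ is the smoothed estimate of $P(x_j = 1 \mid c)$.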
Input
- forward(X, y): X is np.ndarray of shape (n_samples, n_features) with binary features in $\{0, 1\}$; y is np.ndarray of integer class labels.
- predict(X): X is np.ndarray of shape (m, n_features) of binary features.
- The constructor takes smoothing (the Laplace $\alpha$, default 1.0).
Output
- forward fits the model in place (storing classes, log-priors, and log-likelihoods).
- predict returns an np.ndarray of shape (m,) with one predicted class label per sample.
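One possible shape for the class is sketched here; it is a minimal illustration, not the reference solution. The attribute names (classes_, log_prior_, log_theta_, log_one_minus_theta_) are hypothetical. The sketch also assumes that the prior is the unsmoothed empirical class frequency and that smoothing is greater than zero, so every logarithm stays finite.

```python
import numpy as np


class NaiveBayes:
    """Bernoulli Naive Bayes with Laplace smoothing (illustrative sketch)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing

    def forward(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        n_classes, n_features = len(self.classes_), X.shape[1]
        self.log_prior_ = np.zeros(n_classes)
        # log P(x_j = 1 | c) and log P(x_j = 0 | c), one row per class
        self.log_theta_ = np.zeros((n_classes, n_features))
        self.log_one_minus_theta_ = np.zeros((n_classes, n_features))
        for i, c in enumerate(self.classes_):
            Xc = X[y == c]  # training rows of class c
            # Assumption: prior is the plain class frequency (no smoothing).
            self.log_prior_[i] = np.log(len(Xc) / len(X))
            # Laplace-smoothed Bernoulli parameter for each feature.
            theta = (Xc.sum(axis=0) + self.smoothing) / (len(Xc) + 2 * self.smoothing)
            self.log_theta_[i] = np.log(theta)
            self.log_one_minus_theta_[i] = np.log(1.0 - theta)

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Present features pick up log(theta), absent ones log(1 - theta).
        scores = (
            X @ self.log_theta_.T
            + (1.0 - X) @ self.log_one_minus_theta_.T
            + self.log_prior_
        )
        return self.classes_[np.argmax(scores, axis=1)]
```

Writing the score as two matrix products keeps predict free of Python loops over samples; a per-sample loop would give the same labels.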
Examples
Example 1
Input: X = [[1,0,1],[1,1,0],[0,0,1],[0,1,0],[1,1,1]], y = [1,1,0,0,1]
model.forward(X, y); model.predict([[1, 0, 1]])
Output: [1]
Explanation: class 1's fitted feature probabilities make the pattern more likely than under class 0, so its log-posterior is higher and the model predicts class 1.
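The numbers behind Example 1 can be reproduced by hand. The sketch below assumes the default smoothing of 1.0, a denominator of class count plus twice the smoothing, and an unsmoothed class-frequency prior; the variable names are illustrative only.

```python
import numpy as np

# Training data and query from Example 1.
X = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0], [1, 1, 1]])
y = np.array([1, 1, 0, 0, 1])
x_new = np.array([1, 0, 1])
alpha = 1.0  # Laplace smoothing strength (the stated default)

scores = {}
for c in (0, 1):
    Xc = X[y == c]                  # rows belonging to class c
    prior = len(Xc) / len(X)        # assumed: empirical class frequency
    theta = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)
    # Present features use theta, absent features use 1 - theta.
    per_feature = np.where(x_new == 1, theta, 1.0 - theta)
    scores[c] = np.log(prior) + np.log(per_feature).sum()

print(scores)
```

Under these assumptions class 0 has feature probabilities [0.25, 0.5, 0.5] and a score of log(0.4 · 0.25 · 0.5 · 0.5) = log 0.025 ≈ -3.689, while class 1 has [0.8, 0.6, 0.6] and a score of log(0.6 · 0.8 · 0.4 · 0.6) = log 0.1152 ≈ -2.161, so class 1 wins.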
Constraints
- Work in log-space (log-priors and log-likelihoods) to avoid underflow from multiplying many probabilities.
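- Laplace smoothing: $\theta_{cj} = \dfrac{N_{cj} + \alpha}{N_c + 2\alpha}$, where $N_{cj}$ counts the class-$c$ samples with feature $j$ present and $N_c$ counts all class-$c$ samples; the $2\alpha$ in the denominator covers the two Bernoulli outcomes.
- A present feature ($x_j = 1$) contributes $\log \theta_{cj}$; an absent one ($x_j = 0$) contributes $\log(1 - \theta_{cj})$.
- predict returns one label per input row.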
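The log-space constraint is easy to motivate with a small experiment. This sketch uses 2000 made-up per-feature likelihoods of 0.5 (an arbitrary illustrative choice, not data from the problem) and compares the raw product with the sum of logs.

```python
import numpy as np

# 2000 hypothetical per-feature likelihoods, each equal to 0.5.
probs = np.full(2000, 0.5)

# The raw product is 2**-2000, far below the smallest float64, so it underflows.
product = np.prod(probs)

# The same quantity in log-space is an ordinary, representable number.
log_sum = np.log(probs).sum()

print(product)   # 0.0
print(log_sum)   # roughly -1386.29
```

Once the product has collapsed to zero for every class, the classes can no longer be ranked, whereas the log-sums remain comparable.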
Notes
- The independence assumption is usually false, yet Naive Bayes is remarkably effective: the ranking of class posteriors is often right even when the absolute probabilities are off.
- Smoothing is essential: without it, a feature never seen with a class gives probability $0$, and $\log 0 = -\infty$ destroys the whole posterior.
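The effect of smoothing can be seen on the class 0 rows of Example 1, where feature 0 is never present. The helper log_theta below is a hypothetical illustration, and the estimate it computes assumes a denominator of class count plus twice the smoothing.

```python
import numpy as np

# Class 0 rows from Example 1: feature 0 is absent in both of them.
X0 = np.array([[0, 0, 1], [0, 1, 0]])
count = X0[:, 0].sum()   # times feature 0 is present in class 0 -> 0
n = len(X0)              # number of class 0 samples -> 2


def log_theta(alpha):
    """Log of the (optionally smoothed) estimate of P(x_0 = 1 | class 0)."""
    with np.errstate(divide="ignore"):   # silence NumPy's log(0) warning
        return np.log((count + alpha) / (n + 2 * alpha))


unsmoothed = log_theta(0.0)   # log(0 / 2) = -inf
smoothed = log_theta(1.0)     # log(1 / 4)

print(unsmoothed, smoothed)
```

With no smoothing, any query that has feature 0 present would give class 0 a score of minus infinity regardless of its other features; with a smoothing of 1.0 the contribution is a finite penalty instead.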
This problem ships 4 hidden tests, which run in your browser via Pyodide:
- Reference example predicts class 1 for [1, 0, 1]
- predict returns one label per input row
- Classifies its own training data with high accuracy
- Smoothing keeps posteriors finite for unseen patterns (no log-zero crash)