Bernoulli Naive Bayes classifierMedium

Bernoulli Naive Bayes classifier

Background

Naive Bayes is a fast probabilistic classifier built on Bayes' rule plus a "naive" conditional-independence assumption: given the class, the features are independent. The Bernoulli variant models each feature as a binary present/absent event — the classic choice for text classification with binary bag-of-words features. Training is just counting; prediction picks the class with the highest log-posterior.

Problem statement

Implement the NaiveBayes class (Bernoulli, with Laplace smoothing α\alpha) exposing forward(X, y) to fit and predict(X) to classify. Fit log-priors and per-class feature probabilities, then predict the class that maximises the log-posterior:

logP(c)=logNcN,pc,f=(i:yi=cxi,f)+αNc+2α\log P(c) = \log\frac{N_c}{N}, \qquad p_{c,f} = \frac{\big(\sum_{i: y_i = c} x_{i,f}\big) + \alpha}{N_c + 2\alpha} y^=argmaxc    logP(c)+f[xflogpc,f+(1xf)log(1pc,f)]\hat{y} = \arg\max_{c}\;\; \log P(c) + \sum_{f}\Big[\, x_f \log p_{c,f} + (1 - x_f)\log(1 - p_{c,f}) \,\Big]

Input

  • forward(X, y): X is np.ndarray of shape (n_samples, n_features) with binary features in {0,1}\{0, 1\}; y is np.ndarray of integer class labels.
  • predict(X): X is np.ndarray of shape (m, n_features) of binary features.
  • The constructor takes smoothing (the Laplace α\alpha, default 1.0).

Output

  • forward fits the model in place (storing classes, log-priors, and log-likelihoods).
  • predict returns an np.ndarray of shape (m,) with one predicted class label per sample.

Examples

Example 1

Input:  X = [[1,0,1],[1,1,0],[0,0,1],[0,1,0],[1,1,1]], y = [1,1,0,0,1]
        model.forward(X, y); model.predict([[1, 0, 1]])
Output: [1]

Explanation: class 1's fitted feature probabilities make the pattern [1,0,1][1, 0, 1] more likely than under class 0, so its log-posterior is higher and the model predicts class 1.

Constraints

  • Work in log-space (log-priors and log-likelihoods) to avoid underflow from multiplying many probabilities.
  • Laplace smoothing: p=(count+α)/(Nc+2α)p = (\text{count} + \alpha) / (N_c + 2\alpha) — the 2α2\alpha denominator covers the two Bernoulli outcomes.
  • A present feature (xf=1x_f = 1) contributes logp\log p; an absent one contributes log(1p)\log(1 - p).
  • predict returns one label per input row.

Notes

  • The independence assumption is usually false, yet Naive Bayes is remarkably effective: the ranking of class posteriors is often right even when the absolute probabilities are off.
  • Smoothing is essential — without it, a feature never seen with a class gives probability 00, and log0=\log 0 = -\infty destroys the whole posterior.
Python
Loading...

This problem ships 4 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • Reference example predicts class 1 for [1, 0, 1]
  • predict returns one label per input row
  • Classifies its own training data with high accuracy
  • Smoothing keeps posteriors finite for unseen patterns (no log-zero crash)