Pointwise mutual information (PMI)

Difficulty: Medium

Background

Pointwise Mutual Information (PMI) measures how much more, or less, often two events co-occur than they would if they were independent. In NLP it scores word associations: a high PMI means two words appear together far more often than chance would predict, which is the basis of collocation detection and of early count-based word embeddings (PPMI matrices).

Problem statement

Implement compute_pmi(joint_counts, total_counts_x, total_counts_y, total_samples) returning the PMI in bits:

$$\text{PMI}(x; y) = \log_2 \frac{p(x, y)}{p(x)\,p(y)}$$

with $p(x, y) = \text{joint}/N$, $p(x) = \text{count}_x/N$, $p(y) = \text{count}_y/N$. Round to 3 decimals.

Input

  • joint_counts (int): number of times x and y co-occur.
  • total_counts_x (int): number of times x occurs.
  • total_counts_y (int): number of times y occurs.
  • total_samples (int): total observations $N$.

Output

Returns a float (PMI in bits, rounded to 3 decimals); $-\infty$ when the joint count is 0.

Examples

Example 1

Input:  compute_pmi(50, 200, 300, 1000)
Output: -0.263

Explanation: $p(x, y) = 0.05$ and $p(x)\,p(y) = 0.2 \times 0.3 = 0.06$; $\log_2(0.05/0.06) \approx -0.263$, so x and y co-occur slightly less often than chance.
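The arithmetic in this explanation can be checked directly. The sketch below is only an illustration: the variable names are made up for it, and the second expression relies on the observation that the factors of N collapse, so the ratio equals joint · N / (count_x · count_y).

```python
import math

# Counts taken from Example 1.
joint, count_x, count_y, n = 50, 200, 300, 1000

p_xy = joint / n      # 0.05
p_x = count_x / n     # 0.2
p_y = count_y / n     # 0.3

# PMI from the probabilities, exactly as in the definition.
pmi_from_probs = math.log2(p_xy / (p_x * p_y))

# Equivalent count form: (joint/N) / ((count_x/N) * (count_y/N)) = joint*N / (count_x*count_y).
pmi_from_counts = math.log2(joint * n / (count_x * count_y))

print(round(pmi_from_probs, 3))   # -0.263
print(round(pmi_from_counts, 3))  # -0.263
```

Both forms agree; the count form avoids three divisions, which may matter little here but is the shape usually seen when PMI is computed over a whole co-occurrence matrix.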

Constraints

  • Probabilities are counts divided by total_samples.
  • $\text{PMI} > 0$ means more-than-chance association, $< 0$ less, $0$ independent.
  • Return $-\infty$ if joint_counts is 0; otherwise round to 3 decimals.
  • Inputs are non-negative integers with joint_counts <= min(total_counts_x, total_counts_y).
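One possible way to satisfy these constraints is sketched below. It is not presented as the reference solution: it assumes total_samples is positive and relies on the stated constraint that a non-zero joint count implies non-zero marginal counts, so no extra guards are added for those cases.

```python
import math


def compute_pmi(joint_counts: int, total_counts_x: int,
                total_counts_y: int, total_samples: int) -> float:
    """Return PMI in bits, rounded to 3 decimals; -inf if x and y never co-occur."""
    # log2(0) is undefined, and the problem maps a zero joint count to -infinity.
    if joint_counts == 0:
        return float("-inf")

    # Probabilities are counts divided by the total number of observations.
    p_xy = joint_counts / total_samples
    p_x = total_counts_x / total_samples
    p_y = total_counts_y / total_samples

    return round(math.log2(p_xy / (p_x * p_y)), 3)


print(compute_pmi(50, 200, 300, 1000))  # -0.263
```

Checking the zero case before dividing keeps the function from ever calling math.log2 on 0, which would raise a ValueError in Python rather than return negative infinity.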

Notes

  • PMI is unbounded below (rare co-occurrences go very negative); NLP often uses PPMI $= \max(\text{PMI}, 0)$ for an interpretable, sparse association matrix.
  • It is the per-event term whose expectation (weighted by $p(x, y)$) is the mutual information $I(X; Y)$.
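
Both notes can be illustrated on a small table. The sketch below uses a hypothetical 2x2 co-occurrence table (the counts are invented for illustration, chosen so that the top-left cell matches Example 1) and a helper named pmi that is not part of the problem's interface. It clips PMI at zero to get PPMI and sums p(x, y) · PMI(x; y) over cells to get the mutual information.

```python
import math

# Hypothetical co-occurrence counts: rows index x in {0, 1}, columns index y in {0, 1}.
counts = [[50, 150],
          [250, 550]]

n = sum(sum(row) for row in counts)              # total observations
row_totals = [sum(row) for row in counts]        # marginal counts for each x
col_totals = [sum(col) for col in zip(*counts)]  # marginal counts for each y


def pmi(i: int, j: int) -> float:
    """Unrounded PMI in bits for cell (i, j); -inf for an empty cell."""
    if counts[i][j] == 0:
        return float("-inf")
    return math.log2(counts[i][j] * n / (row_totals[i] * col_totals[j]))


# PPMI: negative scores (including -inf) are clipped to zero.
ppmi = [[max(pmi(i, j), 0.0) for j in range(2)] for i in range(2)]

# Mutual information: expectation of PMI under p(x, y); empty cells contribute nothing.
mi = sum(
    (counts[i][j] / n) * pmi(i, j)
    for i in range(2)
    for j in range(2)
    if counts[i][j] > 0
)

print(ppmi)
print(mi)
```

Individual cells can have negative PMI, as the top-left cell does here, but their weighted sum, the mutual information, is never negative.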

This problem ships 4 hidden tests, which run in the browser via Pyodide:

  • Reference example: -0.263
  • Independent events give PMI 0
  • Zero joint count -> -inf
  • Stronger-than-chance co-occurrence is positive