Pairwise cosine-similarity matrix (Easy)

Background

Cosine similarity measures the angle between two vectors, ignoring their magnitudes: $+1$ for the same direction, $0$ for orthogonal, $-1$ for opposite. It is the standard similarity measure for embeddings: semantic search, RAG retrieval, recommendation, and clustering all rank candidates by cosine similarity to a query.

Problem statement

Implement cosine_similarity_matrix(A, B) returning the matrix S of cosine similarities between every row of A and every row of B:

$$S_{ij} = \frac{A_i \cdot B_j}{\lVert A_i\rVert\,\lVert B_j\rVert}$$

A zero vector has no direction, so its similarity to anything is defined as $0$.
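The definition, including the zero-vector convention, can be read off directly for a single pair of vectors. The sketch below assumes NumPy; the helper name `cosine_pair` is hypothetical and not part of the required API.

```python
import numpy as np

def cosine_pair(a, b):
    """Cosine similarity of two 1-D vectors; 0.0 if either has zero norm.

    Hypothetical helper for illustration only.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # A zero vector has no direction, so the similarity is defined as 0.
        return 0.0
    return float(np.dot(a, b) / denom)
```

Calling this for every pair (A[i], B[j]) would fill S one entry at a time, which is correct but slow for large n and m.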

Input

  • A: np.ndarray of shape (n, d).
  • B: np.ndarray of shape (m, d).

Output

Returns an np.ndarray of shape (n, m) where S[i, j] is the cosine similarity of A[i] and B[j].

Examples

Example 1

Input:  A = [[1, 0], [0, 1]], B = [[1, 1], [1, 0]]
Output: [[0.7071, 1.0], [0.7071, 0.0]]

Explanation: [1, 0] vs [1, 1] is $\cos 45^\circ \approx 0.7071$ and vs [1, 0] is $1$; [0, 1] vs [1, 1] is $0.7071$ and vs [1, 0] is $0$ (orthogonal).

Constraints

  • Normalise each row to unit length, then take the dot products (equivalently, divide each $A_i \cdot B_j$ by the two norms).
  • A zero-norm row contributes similarity $0$; do not divide by zero.
  • Output shape (n, m); values in $[-1, 1]$; tests compare with atol=1e-6.
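One way to satisfy these constraints is sketched below. It is a possible approach under stated assumptions, not the reference solution: zero norms are swapped for 1 before dividing (so zero rows stay all-zero), and the final clip to [-1, 1] is a defensive choice against rounding that the problem does not require.

```python
import numpy as np

def cosine_similarity_matrix(A, B):
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)

    # Row norms, kept as (n, 1) and (m, 1) so they broadcast across each row.
    a_norm = np.linalg.norm(A, axis=1, keepdims=True)
    b_norm = np.linalg.norm(B, axis=1, keepdims=True)

    # Replace zero norms with 1 before dividing: a zero row stays all-zero,
    # so every dot product involving it is 0 and no NaN is produced.
    A_hat = A / np.where(a_norm == 0.0, 1.0, a_norm)
    B_hat = B / np.where(b_norm == 0.0, 1.0, b_norm)

    # (n, d) @ (d, m) -> (n, m) matrix of pairwise cosine similarities.
    S = A_hat @ B_hat.T

    # Assumption: clip tiny floating-point overshoot outside [-1, 1].
    return np.clip(S, -1.0, 1.0)
```

Calling `cosine_similarity_matrix([[1, 0], [0, 1]], [[1, 1], [1, 0]])` should reproduce Example 1 to within the stated tolerance.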

Notes

  • Pre-normalising the rows turns the whole computation into one matrix product $\hat{A}\hat{B}^\top$, the efficient way to score a query against a large corpus of embeddings.
  • Cosine ignores magnitude, so it is robust to document length and embedding scale; Euclidean distance between normalised vectors is a monotonically decreasing function of cosine similarity, so both produce the same ranking.
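Both notes can be checked numerically. For unit vectors, $\lVert \hat{a} - \hat{b} \rVert^2 = 2 - 2\cos\theta$, so sorting by descending cosine and by ascending Euclidean distance should agree. The sketch below uses made-up random data (the names `corpus` and `query` are illustrative assumptions) and scores one query against a pre-normalised corpus with a single matrix-vector product.

```python
import numpy as np

rng = np.random.default_rng(0)       # seeded only for reproducibility
corpus = rng.normal(size=(5, 4))     # made-up "embeddings", one per row
query = rng.normal(size=4)

# Normalise once; random normal rows are nonzero, so no zero guard is needed here.
corpus_hat = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
query_hat = query / np.linalg.norm(query)

# One matrix-vector product scores the query against the whole corpus.
cos = corpus_hat @ query_hat                              # shape (5,)
sq_dist = np.sum((corpus_hat - query_hat) ** 2, axis=1)   # shape (5,)

# Squared distance between unit vectors equals 2 - 2 * cosine.
identity_holds = np.allclose(sq_dist, 2.0 - 2.0 * cos)

# Most-similar-first order matches nearest-first order.
same_ranking = np.array_equal(np.argsort(-cos), np.argsort(sq_dist))
```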

This problem ships 4 hidden tests, which run in your browser via Pyodide (no backend, no submission queue):

  • Reference example
  • Self-similarity has a unit diagonal for nonzero rows
  • Opposite directions give -1, orthogonal give 0
  • Zero vector yields zero similarity (no NaN)