Principal Component Analysis (PCA)Medium

Principal Component Analysis (PCA)

Background

Principal Component Analysis finds the orthogonal directions ("principal components") along which standardized data varies the most. Projecting onto the top kk components is the workhorse linear dimensionality-reduction technique — for compression, denoising, and visualisation. The components are the eigenvectors of the data's covariance matrix, ordered by eigenvalue (the variance each explains).

Problem statement

Implement pca(data, k) that returns the top kk principal components of data. Standardize each feature, form the covariance matrix, eigendecompose it, and return the kk eigenvectors with the largest eigenvalues:

Z=Xμσ,C=cov(Z),Cvi=λiviZ = \frac{X - \mu}{\sigma}, \qquad C = \operatorname{cov}(Z), \qquad C v_i = \lambda_i v_i

Sort by descending λi\lambda_i and return the matrix of the top-kk eigenvectors (as columns), rounded to 4 decimals.

Input

  • datanp.ndarray of shape (n_samples, n_features).
  • kint: the number of principal components to return.

Output

Returns an np.ndarray of shape (n_features, k): the top-kk unit eigenvectors as columns, rounded to 4 decimals.

Examples

Example 1

Input:  data = [[1, 2], [3, 4], [5, 6]], k = 1
Output: [[0.7071], [0.7071]]

Explanation: the two features are perfectly correlated, so after standardizing they are identical; all variance lies along the [1,1][1, 1] direction, whose unit vector is [0.7071,0.7071][0.7071, 0.7071] — the single principal component.

Constraints

  • Standardize each feature with its mean and standard deviation before computing the covariance.
  • Sort eigenvectors by descending eigenvalue and return the top kk as columns.
  • Round the components to 4 decimals.
  • Eigenvectors carry a sign ambiguity (vv and v-v are both valid); tests treat them as equivalent.

Notes

  • The eigenvalues are the variance captured by each component; their ratios give the "explained variance" used to pick kk.
  • Standardizing matters when features have different scales — otherwise the widest-range feature dominates the covariance and hijacks the first component.
Python
Loading...

This problem ships 4 hidden tests. They run in your browser via Pyodide — no backend, no submission queue. Press ▶ Run tests to execute.

  • Reference example: top PC is the [1,1]/sqrt(2) direction
  • Each principal component is a unit vector
  • Top-2 components are orthogonal
  • Returns the requested number of components, one row per feature