Principal Component Analysis (PCA)
Background
Principal Component Analysis (PCA) finds the orthogonal directions ("principal components") along which standardized data varies the most. Projecting onto the top components is the workhorse linear dimensionality-reduction technique, used for compression, denoising, and visualisation. The components are the eigenvectors of the data's covariance matrix, ordered by descending eigenvalue (the variance each explains).
Problem statement
Implement pca(data, k) that returns the top k principal components of data. Standardize each feature, form the covariance matrix of the standardized data, and eigendecompose it. Sort the eigenvectors by descending eigenvalue and return the matrix of the top-k eigenvectors (as columns), rounded to 4 decimals.
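The steps above can be sketched with NumPy as follows. This is a minimal sketch, not a reference solution: it assumes the population standard deviation (ddof=0) for standardizing, the default sample covariance from np.cov, and no constant features (a zero standard deviation would cause a division by zero). The problem statement fixes none of these choices.

```python
import numpy as np


def pca(data, k):
    """Sketch: top-k principal components of standardized data, as columns."""
    X = np.asarray(data, dtype=float)

    # Standardize each feature (column) to zero mean and unit standard deviation.
    # Assumption: population std (ddof=0) and no constant features.
    X = (X - X.mean(axis=0)) / X.std(axis=0)

    # Covariance between features; rowvar=False treats columns as variables.
    # atleast_2d covers the single-feature case, where np.cov returns a scalar.
    cov = np.atleast_2d(np.cov(X, rowvar=False))

    # eigh is meant for symmetric matrices and returns eigenvalues in
    # ascending order, with unit-length eigenvectors as columns.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Reorder to descending eigenvalue and keep the first k eigenvectors.
    order = np.argsort(eigenvalues)[::-1]
    top = eigenvectors[:, order[:k]]

    return np.round(top, 4)
```

np.linalg.eigh is used instead of np.linalg.eig because a covariance matrix is symmetric, so the eigenvalues stay real and the eigenvectors come back orthonormal. The sign of each returned column depends on the underlying linear-algebra routine.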
Input
- data: np.ndarray of shape (n_samples, n_features).
- k: int, the number of principal components to return.
Output
Returns an np.ndarray of shape (n_features, k): the top-k unit eigenvectors as columns, rounded to 4 decimals.
Examples
Example 1
Input: data = [[1, 2], [3, 4], [5, 6]], k = 1
Output: [[0.7071], [0.7071]]
Explanation: the two features are perfectly correlated, so after standardizing they are identical; all variance lies along the [1, 1] direction, whose unit vector is [1, 1]/sqrt(2) ≈ [0.7071, 0.7071], the single principal component.
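The reasoning in Example 1 can be checked numerically. The sketch below assumes the population standard deviation for standardizing (the result here does not depend on that choice). It verifies that the two standardized columns coincide and that [1, 1]/sqrt(2) is an eigenvector of the covariance matrix belonging to its largest eigenvalue.

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

# Standardize each column; both columns become the same vector.
Z = (data - data.mean(axis=0)) / data.std(axis=0)
same_columns = np.allclose(Z[:, 0], Z[:, 1])

# Covariance of the standardized features and its eigenvalues (ascending).
cov = np.cov(Z, rowvar=False)
eigenvalues = np.linalg.eigvalsh(cov)

# Candidate direction from the explanation: the unit vector along [1, 1].
direction = np.array([1.0, 1.0]) / np.sqrt(2.0)

# Eigenvector check: cov @ v should equal (largest eigenvalue) * v.
is_top_eigenvector = np.allclose(cov @ direction, eigenvalues[-1] * direction)

print(same_columns, is_top_eigenvector, np.round(direction, 4))
```

The smaller eigenvalue is zero up to floating-point error, which is why a single component captures all of the variance in this example.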
Constraints
- Standardize each feature with its mean and standard deviation before computing the covariance.
- Sort eigenvectors by descending eigenvalue and return the top k as columns.
- Round the components to 4 decimals.
- Eigenvectors carry a sign ambiguity (v and -v are both valid); tests treat them as equivalent.
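Because of the sign ambiguity, comparing two component matrices entry by entry can fail even when both are correct. One way to compare them column by column while ignoring sign is sketched below. The helper name same_up_to_sign and its tolerance are hypothetical; how the actual tests perform the comparison is not stated.

```python
import numpy as np


def same_up_to_sign(a, b, atol=1e-4):
    """Hypothetical helper: True if each column of a equals the matching
    column of b or its negation, within the given tolerance."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 2 or a.shape != b.shape:
        return False
    return all(
        np.allclose(a[:, j], b[:, j], atol=atol)
        or np.allclose(a[:, j], -b[:, j], atol=atol)
        for j in range(a.shape[1])
    )
```

For instance, [[0.7071], [0.7071]] and [[-0.7071], [-0.7071]] compare as equal under this helper, while [[0.7071], [-0.7071]] does not match either of them, since it points in a different direction and not merely the opposite one.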
Notes
- The eigenvalues are the variance captured by each component; each eigenvalue divided by the sum of all eigenvalues gives the "explained variance" ratio used to pick k.
- Standardizing matters when features have different scales — otherwise the widest-range feature dominates the covariance and hijacks the first component.
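Both notes can be illustrated with a small sketch. The function explained_variance_ratio and the four-row dataset are made up for illustration. The two features are uncorrelated, but the second has a range 1000 times wider than the first.

```python
import numpy as np


def explained_variance_ratio(data, standardize=True):
    """Hypothetical helper: share of total variance per component, descending."""
    X = np.asarray(data, dtype=float)
    X = X - X.mean(axis=0)
    if standardize:
        # Assumption: population std (ddof=0) and no constant features.
        X = X / X.std(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # eigvalsh returns ascending eigenvalues; reverse to descending.
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]
    return eigenvalues / eigenvalues.sum()


# Illustrative data: uncorrelated features with very different scales.
data = np.array([[1, 1000], [-1, 1000], [1, -1000], [-1, -1000]], dtype=float)

raw = explained_variance_ratio(data, standardize=False)
scaled = explained_variance_ratio(data, standardize=True)
print(raw)     # first component takes almost all of the variance
print(scaled)  # variance is split evenly between the two components
```

Without standardizing, the wide-range feature accounts for nearly all of the variance, so the first component simply follows that feature. After standardizing, neither feature dominates and each component explains half of the variance.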
This problem ships 4 hidden tests. They run in your browser via Pyodide, with no backend and no submission queue.
- Reference example: top PC is the [1,1]/sqrt(2) direction
- Each principal component is a unit vector
- Top-2 components are orthogonal
- Returns the requested number of components, one row per feature
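The structural properties in this list (shape, unit length, orthogonality) can be checked locally before running the tests. The function below is a hypothetical sketch of such checks, not the hidden tests themselves. The tolerance is an assumption, chosen to absorb rounding to 4 decimals.

```python
import numpy as np


def check_components(components, n_features, k, atol=1e-3):
    """Hypothetical local check: shape, unit-length columns, orthogonality."""
    W = np.asarray(components, dtype=float)

    # One row per feature, one column per requested component.
    if W.shape != (n_features, k):
        return False

    # Each column should have Euclidean norm 1.
    unit_norm = np.allclose(np.linalg.norm(W, axis=0), 1.0, atol=atol)

    # For orthonormal columns the Gram matrix W^T W is the identity;
    # its off-diagonal entries are the pairwise dot products.
    orthogonal = np.allclose(W.T @ W, np.eye(k), atol=atol)

    return bool(unit_norm and orthogonal)
```

For example, check_components([[0.7071, -0.7071], [0.7071, 0.7071]], 2, 2) passes, while a matrix with non-orthogonal columns such as [[1, 1], [0, 1]] fails.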