Gaussian process regression (RBF)

Difficulty: Hard

Background

A Gaussian process is a distribution over functions. Rather than fitting a fixed set of parameters, it places a prior over functions through a kernel (covariance function) and conditions on observed data to obtain a posterior. At any test input, GP regression predicts a Gaussian with a mean and a variance. This gives not just a point estimate but also an uncertainty estimate, which is well calibrated when the kernel and noise level suit the data. It is a go-to model for small-data regression and Bayesian optimisation.
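To make "a distribution over functions" concrete, the sketch below draws a few random functions from a zero-mean GP prior with an RBF kernel, evaluated on a grid. The helper name `rbf_kernel`, the grid, the seed and the `1e-6` jitter are illustrative assumptions, not part of the problem.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, sigma=1.0):
    # Illustrative helper: RBF covariance between every row of A and every row of B.
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return sigma**2 * np.exp(-sq_dist / (2.0 * length_scale**2))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)[:, None]          # 50 inputs, shape (50, 1)
K = rbf_kernel(x, x) + 1e-6 * np.eye(len(x))    # small jitter keeps the Cholesky factorisation stable
L = np.linalg.cholesky(K)                       # K = L L^T
samples = L @ rng.standard_normal((len(x), 3))  # each column is one function drawn from the prior
print(samples.shape)  # (50, 3)
```

Because `L @ z` with standard-normal `z` has covariance `L L^T = K`, each column is a draw from the prior. Conditioning on data (the posterior formulas in the problem statement) narrows this family of functions around the observations.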

Problem statement

Implement gaussian_process_predict(X_train, y_train, X_test, length_scale, sigma, noise) that returns the GP posterior mean and standard deviation at the test points, using the RBF (squared-exponential) kernel:

$$k(x, x') = \sigma^2 \exp\!\Big(-\frac{\lVert x - x'\rVert^2}{2\,\ell^2}\Big)$$
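The kernel above can be evaluated for whole sets of points at once. Below is a minimal sketch, assuming NumPy broadcasting is acceptable. `rbf_kernel` is a hypothetical helper name, not something the problem requires.

```python
import numpy as np

def rbf_kernel(A, B, length_scale, sigma):
    # Hypothetical helper: RBF kernel matrix between the rows of A (p, d) and B (q, d).
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Pairwise squared Euclidean distances by broadcasting (p, 1, d) - (1, q, d) -> (p, q, d).
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return sigma**2 * np.exp(-sq_dist / (2.0 * length_scale**2))

X = np.array([[0.0], [2.0], [4.0]])
K = rbf_kernel(X, X, length_scale=1.0, sigma=1.0)
# Diagonal entries equal sigma^2 = 1; K[0, 1] = exp(-4 / 2) = exp(-2), about 0.1353.
print(K)
```

The broadcasting form uses O(p·q·d) memory. The expansion ‖a‖² + ‖b‖² − 2a·b avoids that, but it can produce tiny negative distances through round-off, so it usually needs clipping.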

With $K = k(X, X) + \text{noise}\cdot I$, $K_* = k(X, X_*)$ and $K_{**} = k(X_*, X_*)$, where $X$ denotes X_train and $X_*$ denotes X_test:

$$\mu_* = K_*^\top K^{-1} y, \qquad \Sigma_* = K_{**} - K_*^\top K^{-1} K_*$$

Return $\mu_*$ and the per-point standard deviations $\sqrt{\operatorname{diag}(\Sigma_*)}$.
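One way to transcribe these formulas is sketched below. It is a minimal sketch, not a reference solution. It assumes `np.linalg.solve` is used instead of an explicit inverse, and that only the diagonal of $\Sigma_*$ is formed. `rbf_kernel` is an illustrative helper name.

```python
import numpy as np

def rbf_kernel(A, B, length_scale, sigma):
    # Pairwise squared distances between the rows of A (p, d) and B (q, d), then the RBF formula.
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return sigma**2 * np.exp(-sq_dist / (2.0 * length_scale**2))

def gaussian_process_predict(X_train, y_train, X_test, length_scale, sigma, noise):
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)

    # K = k(X, X) + noise * I: train-train covariance plus observation noise on the diagonal.
    K = rbf_kernel(X_train, X_train, length_scale, sigma) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale, sigma)  # K_* = k(X, X_*), shape (n, m)

    # Solve K a = b rather than forming K^{-1} explicitly.
    mu = K_s.T @ np.linalg.solve(K, y_train)                # K_*^T K^{-1} y
    v = np.linalg.solve(K, K_s)                             # K^{-1} K_*, shape (n, m)

    # Only diag(Sigma_*) is needed. For the RBF kernel diag(K_**) = sigma^2, and
    # diag(K_*^T K^{-1} K_*) is the column-wise sum of K_* * (K^{-1} K_*).
    var = sigma**2 - np.sum(K_s * v, axis=0)
    var = np.clip(var, 0.0, None)                           # round-off can push var slightly below 0
    return mu, np.sqrt(var)

mu, std = gaussian_process_predict([[0], [2], [4]], [0, 1, 0], [[0], [100]],
                                   length_scale=1.0, sigma=1.0, noise=1e-8)
print(mu, std)
```

Skipping the full m × m matrix $K_{**}$ keeps the cost linear in the number of test points. The shortcut $\operatorname{diag}(K_{**}) = \sigma^2$ relies on the RBF kernel being stationary, so a different kernel would need its own diagonal.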

Input

  • X_train: np.ndarray of shape (n, d), the training inputs.
  • y_train: np.ndarray of shape (n,), the training targets.
  • X_test: np.ndarray of shape (m, d), the test inputs.
  • length_scale: float, the RBF length scale $\ell$.
  • sigma: float, the kernel output scale $\sigma$.
  • noise: float, the observation-noise variance added to the diagonal of $K$.

Output

Returns a tuple (mu, std): mu is np.ndarray (m,) of posterior means and std is np.ndarray (m,) of posterior standard deviations.

Examples

Example 1

Input:  X_train=[[0],[2],[4]], y_train=[0,1,0], X_test=[[0],[100]],
        length_scale=1.0, sigma=1.0, noise=1e-8
Output: mu ≈ [0.0, 0.0], std ≈ [0.0, 1.0]

Explanation: at $x = 0$ (a training point) the mean equals the observed $y = 0$ with near-zero uncertainty. At $x = 100$, far from all data, the RBF covariance vanishes, so the prediction reverts to the prior mean $0$ with the prior std $\sigma = 1$.
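The "covariance vanishes" step can be checked numerically. In float64, the RBF values between $x = 100$ and the training inputs underflow to exactly zero, so $K_*$ is a zero vector. This is a small sketch on the Example 1 data. The 1-D helper `k` is illustrative.

```python
import numpy as np

X_train = np.array([0.0, 2.0, 4.0])
y_train = np.array([0.0, 1.0, 0.0])
length_scale, sigma, noise = 1.0, 1.0, 1e-8

def k(a, b):
    # 1-D RBF kernel, broadcast over arrays a and b.
    return sigma**2 * np.exp(-(a - b) ** 2 / (2.0 * length_scale**2))

K = k(X_train[:, None], X_train[None, :]) + noise * np.eye(len(X_train))
k_far = k(X_train, 100.0)  # exp(-5000), exp(-4802), exp(-4608): all underflow to 0.0

mu_far = k_far @ np.linalg.solve(K, y_train)                    # 0.0 because k_far is all zeros
var_far = k(100.0, 100.0) - k_far @ np.linalg.solve(K, k_far)   # sigma^2 - 0 = 1.0
print(k_far, mu_far, var_far)
```

With $K_* = 0$, both correction terms drop out. The mean is the prior mean $0$ and the variance is the prior variance $\sigma^2$.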

Constraints

  • Use the RBF kernel; add noise to the diagonal of the train–train covariance before inverting.
  • The posterior mean is $\mu_* = K_*^\top K^{-1} y$; the posterior variance is the diagonal of $K_{**} - K_*^\top K^{-1} K_*$.
  • Return per-point standard deviations: the square root of the variance, with the variance first clipped at $0$ (floating-point round-off can make it slightly negative).
  • Use pure NumPy (np.linalg.inv or np.linalg.solve); the tests compare results with atol=1e-3.
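Both linear-algebra routes allowed by the constraints give the same answer. A common, more numerically robust alternative is to factor $K = LL^\top$ once with a Cholesky decomposition and reuse $L$. The sketch below compares an explicit-inverse version with a Cholesky version on random data. The function names, data, seed and `noise=1e-2` are assumptions for illustration. Note that `np.linalg.solve` performs a general solve and does not exploit the triangular structure of $L$. `scipy.linalg.solve_triangular` or `cho_solve` would, but they fall outside pure NumPy.

```python
import numpy as np

def rbf(A, B, length_scale, sigma):
    # RBF kernel matrix between the rows of A and B.
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return sigma**2 * np.exp(-sq_dist / (2.0 * length_scale**2))

def predict_inv(X, y, Xs, length_scale, sigma, noise):
    # Direct transcription of the formulas with an explicit inverse.
    K_inv = np.linalg.inv(rbf(X, X, length_scale, sigma) + noise * np.eye(len(X)))
    K_s = rbf(X, Xs, length_scale, sigma)
    mu = K_s.T @ K_inv @ y
    var = sigma**2 - np.sum(K_s * (K_inv @ K_s), axis=0)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def predict_cholesky(X, y, Xs, length_scale, sigma, noise):
    # Factor K = L L^T once (requires K to be positive definite), then reuse L.
    L = np.linalg.cholesky(rbf(X, X, length_scale, sigma) + noise * np.eye(len(X)))
    K_s = rbf(X, Xs, length_scale, sigma)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y = L^{-T} (L^{-1} y)
    mu = K_s.T @ alpha
    w = np.linalg.solve(L, K_s)                           # L^{-1} K_*
    var = sigma**2 - np.sum(w**2, axis=0)                 # diag(K_*^T K^{-1} K_*) = column sums of w^2
    return mu, np.sqrt(np.clip(var, 0.0, None))

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(8, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
Xs = rng.uniform(-3.0, 3.0, size=(5, 2))

mu_inv, std_inv = predict_inv(X, y, Xs, 1.0, 1.0, 1e-2)
mu_chol, std_chol = predict_cholesky(X, y, Xs, 1.0, 1.0, 1e-2)
print(np.max(np.abs(mu_inv - mu_chol)), np.max(np.abs(std_inv - std_chol)))
```

On well-conditioned problems like this one, the two agree to near machine precision. The Cholesky route mainly pays off when $K$ is close to singular (very small noise, densely packed inputs), where an explicit inverse loses accuracy first.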

Notes

  • As noise → 0 the GP interpolates the training data exactly (the mean passes through every observation); larger noise smooths the fit.
  • The length scale $\ell$ controls wiggliness: small $\ell$ gives fast-varying functions, large $\ell$ gives smooth, slowly varying ones.
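Both notes can be observed on the Example 1 data with unit `sigma`. The sketch below evaluates the posterior mean back at the training inputs for several noise levels. It then prints the kernel correlation between two points one unit apart for several length scales. The helper names are illustrative.

```python
import numpy as np

def rbf_1d(a, b, length_scale=1.0, sigma=1.0):
    # 1-D RBF kernel, broadcast over arrays a and b.
    return sigma**2 * np.exp(-(a - b) ** 2 / (2.0 * length_scale**2))

X = np.array([0.0, 2.0, 4.0])
y = np.array([0.0, 1.0, 0.0])
K_xx = rbf_1d(X[:, None], X[None, :])  # length_scale = 1, sigma = 1

def mean_at_training_points(noise):
    # Posterior mean evaluated back at the training inputs: K_xx (K_xx + noise I)^{-1} y.
    return K_xx @ np.linalg.solve(K_xx + noise * np.eye(len(X)), y)

# Noise: near-zero noise reproduces y; with noise = 1 the peak at x = 2 drops well below 1.
for noise in (1e-8, 1e-2, 1.0):
    print("noise", noise, np.round(mean_at_training_points(noise), 3))

# Length scale: correlation between points one unit apart,
# exp(-2), about 0.1353, at length_scale = 0.5 versus exp(-0.125), about 0.8825, at length_scale = 2.
for ls in (0.5, 1.0, 2.0):
    print("length_scale", ls, round(float(rbf_1d(0.0, 1.0, ls)), 4))
```

A small length scale makes nearby points nearly uncorrelated, so sampled functions can change quickly. A large one keeps neighbours strongly correlated, which forces smooth curves.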

This problem ships 4 hidden tests, which run in the browser via Pyodide (no backend or submission queue):

  • Posterior mean interpolates the training targets at training inputs
  • Far from the data, mean reverts to 0 and std to the prior sigma
  • Returns mean and std, one entry per test point
  • Predictive standard deviation is non-negative