Unveiling Bayesian Linear Regression: A Step-by-Step Guide to Posterior Distribution

Introduction

In machine learning, Bayesian Linear Regression offers a probabilistic framework for understanding the relationships between variables. Unlike its classical counterpart, which yields a single point estimate, Bayesian regression provides a full distribution over the parameter values, explicitly capturing uncertainty. This tutorial delves into the heart of Bayesian Linear Regression, deriving the posterior distribution of the parameters and verifying its form using the technique of completing the square.

Setting the Stage: Bayesian Linear Regression Model

In Bayesian Linear Regression, we begin with two crucial expressions:

  1. Likelihood Function

  2. Prior Distribution

The likelihood function is given by:

$$p(t|w) = \prod_{n=1}^{N} N(t_n | w^T \phi(x_n), \beta^{-1})$$

Here, ( t ) represents the vector of observed target values, ( w ) denotes the parameter vector, ( \phi(x_n) ) is the vector of basis function values evaluated at input ( x_n ), ( \beta ) is the precision of the Gaussian noise, and ( N ) denotes a Gaussian distribution. The prior distribution encapsulates our initial beliefs about the parameters ( w ), with mean ( m_0 ) and covariance matrix ( S_0 ).
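
For completeness, the prior distribution (the second of the two expressions listed above) is a Gaussian over ( w ):

$$p(w) = N(w | m_0, S_0)$$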

Unfolding the Log Posterior Distribution

The journey to the posterior distribution starts by applying Bayes' theorem to combine our likelihood and prior, leading to the expression of the log posterior distribution as follows:

$$\log p(w|t) = -\frac{\beta}{2} \sum_{n=1}^N (t_n - w^T \phi(x_n))^2 - \frac{1}{2}(w - m_0)^T S_0^{-1} (w - m_0) + \text{const}$$

Expanding the squared terms and rearranging, we collect the result into terms quadratic in ( w ), terms linear in ( w ), and terms independent of ( w ).
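
Writing the likelihood term in matrix form with the design matrix ( \Phi ), whose ( n )-th row is ( \phi(x_n)^T ), the expansion reads:

$$\log p(w|t) = -\frac{\beta}{2} \left( t^T t - 2 w^T \Phi^T t + w^T \Phi^T \Phi w \right) - \frac{1}{2} \left( w^T S_0^{-1} w - 2 w^T S_0^{-1} m_0 + m_0^T S_0^{-1} m_0 \right) + \text{const}$$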

Completing the Square: A Gateway to Simplification

Completing the square is a pivotal step that simplifies the expression and reveals the parameters of the posterior distribution. By grouping the quadratic and linear terms in ( w ), we obtain an expression that can be brought into completed-square form:

$$\log p(w|t) = -\frac{1}{2} w^T (\beta \Phi^T \Phi + S_0^{-1}) w + w^T (\beta \Phi^T t + S_0^{-1} m_0) + \text{const}$$
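
For reference, the general completing-the-square identity for a quadratic form in ( w ), with ( A ) a symmetric positive-definite matrix, is:

$$-\frac{1}{2} w^T A w + w^T b = -\frac{1}{2} (w - A^{-1} b)^T A (w - A^{-1} b) + \frac{1}{2} b^T A^{-1} b$$

In our case ( A = \beta \Phi^T \Phi + S_0^{-1} ) and ( b = \beta \Phi^T t + S_0^{-1} m_0 ); the final term does not depend on ( w ) and is absorbed into the constant.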

The magic unfolds when we apply this identity to the log posterior, transforming the expression into:

$$\log p(w|t) = -\frac{1}{2} (w - m_N)^T S_N^{-1} (w - m_N) + \text{const}$$

Here,

$$m_N = S_N (\beta \Phi^T t + S_0^{-1} m_0)$$

and

$$S_N^{-1} = \beta \Phi^T \Phi + S_0^{-1}$$
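
Since the log posterior is quadratic in ( w ), exponentiating and normalizing shows that the posterior is itself Gaussian:

$$p(w|t) = N(w | m_N, S_N)$$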

Deriving the Posterior Distribution Parameters

The exercise of completing the square has led us to discern the expressions for the mean ( m_N ) and the inverse covariance ( S_N^{-1} ) of the posterior distribution. The expressions are as follows:

$$m_N = S_N (S_0^{-1} m_0 + \beta \Phi^T t)$$

$$S_N^{-1} = S_0^{-1} + \beta \Phi^T \Phi$$

These expressions confirm the structure of the posterior distribution of parameters in the Bayesian Linear Regression model.
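
As a quick sanity check, the posterior update can be computed numerically. Below is a minimal NumPy sketch; the synthetic data, the two-column design matrix with basis ( \phi(x) = (1, x)^T ), and the zero-mean isotropic prior ( m_0 = 0 ), ( S_0 = \alpha^{-1} I ) (with a hypothetical ( \alpha )) are assumptions chosen purely for illustration.

```python
import numpy as np

# Hypothetical synthetic data: noisy samples of a line (assumed purely for illustration)
rng = np.random.default_rng(0)
N = 20
beta = 25.0                                   # known noise precision
x = rng.uniform(-1.0, 1.0, size=N)
t = 0.5 * x - 0.3 + rng.normal(scale=1.0 / np.sqrt(beta), size=N)

# Design matrix Phi: the n-th row is phi(x_n)^T; here phi(x) = [1, x] (assumed basis)
Phi = np.column_stack([np.ones_like(x), x])   # shape (N, M)

# Assumed prior: m_0 = 0, S_0 = alpha^{-1} I
alpha = 2.0
M = Phi.shape[1]
m0 = np.zeros(M)
S0_inv = alpha * np.eye(M)

# Posterior parameters from the derivation above:
#   S_N^{-1} = S_0^{-1} + beta * Phi^T Phi
#   m_N      = S_N (S_0^{-1} m_0 + beta * Phi^T t)
SN_inv = S0_inv + beta * Phi.T @ Phi
SN = np.linalg.inv(SN_inv)
mN = SN @ (S0_inv @ m0 + beta * Phi.T @ t)

print("Posterior mean m_N:", mN)
print("Posterior covariance S_N:\n", SN)
```

The printed ( m_N ) should land close to the true coefficients (-0.3, 0.5) used to generate the data, and ( S_N ) shrinks as more observations are added.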

Conclusion

This tutorial provided a step-by-step walkthrough of deriving the posterior distribution in Bayesian Linear Regression, highlighting the pivotal role of the completing-the-square technique. Through this analytical journey, we've unveiled the probabilistic essence of Bayesian Linear Regression, providing a robust foundation for understanding and implementing this powerful approach in machine learning endeavours.

By thoroughly understanding the derivation process, practitioners can better appreciate the nuances of Bayesian inference, which is fundamental in advancing machine learning solutions with a solid statistical foundation.