It is a good exercise to analyze how to use Bayes' theorem to estimate the posterior probability of each class in a two-class scenario:

$$P(C_k \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid C_k)\, P(C_k)}{P(\mathbf{x})}$$

The decision boundary can be identified as the set of points where the two posteriors are equal:

$$P(C_1 \mid \mathbf{x}) = P(C_2 \mid \mathbf{x})$$
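For intuition, here is a tiny NumPy sketch of Bayes' rule in one dimension. The priors and the (Laplace) class-conditional densities are placeholders chosen purely for illustration; the Gaussian case is worked out below.

```python
import numpy as np

priors = np.array([0.4, 0.6])                        # P(C_1), P(C_2)

def likelihoods(x):
    # Placeholder class-conditional densities p(x | C_k): Laplace(0, 1) and Laplace(2, 1).
    return np.array([0.5 * np.exp(-abs(x)),
                     0.5 * np.exp(-abs(x - 2.0))])

def posterior(x):
    joint = priors * likelihoods(x)                   # P(x | C_k) * P(C_k)
    return joint / joint.sum()                        # divide by P(x)

# The decision boundary is the x at which the two posterior entries are equal.
print(posterior(0.5), posterior(1.5))
```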
Replacing the class-conditional probabilities with Gaussian distributions, we can derive that:

$$P(C_k \mid \mathbf{x}) = \frac{\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)\, P(C_k)}{P(\mathbf{x})}, \qquad \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_k|^{1/2}} \exp\!\left(-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k)\right)$$

Taking the logarithm of the class posteriors:

$$\log P(C_k \mid \mathbf{x}) = -\tfrac{1}{2} \log |\boldsymbol{\Sigma}_k| - \tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) + \log P(C_k) - \log P(\mathbf{x}) - \tfrac{d}{2} \log 2\pi$$

Since $P(\mathbf{x})$ (like the constant $-\tfrac{d}{2} \log 2\pi$) is independent of the class, it can be dropped, leaving us with:

$$\delta_k(\mathbf{x}) = -\tfrac{1}{2} \log |\boldsymbol{\Sigma}_k| - \tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) + \log P(C_k)$$

Therefore, the classifier becomes $\hat{y} = \arg\max_k \delta_k(\mathbf{x})$, and the decision boundary becomes $\delta_1(\mathbf{x}) = \delta_2(\mathbf{x})$.
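As a sanity check, here is a minimal NumPy sketch of this classifier; the means, covariances, and priors below are hypothetical values chosen only for illustration.

```python
import numpy as np

def log_discriminant(x, mu, cov, prior):
    """delta_k(x) = log P(C_k) - 1/2 log|Sigma_k| - 1/2 (x - mu_k)^T Sigma_k^{-1} (x - mu_k).

    The class-independent constant -d/2 * log(2*pi) is omitted.
    """
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return np.log(prior) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(cov, diff)

# Hypothetical per-class parameters (2 features, 2 classes).
mus    = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
covs   = [np.array([[1.0, 0.2], [0.2, 1.0]]), np.array([[1.5, -0.3], [-0.3, 0.8]])]
priors = [0.5, 0.5]

def classify(x):
    # Pick the class with the largest log discriminant.
    scores = [log_discriminant(x, m, c, p) for m, c, p in zip(mus, covs, priors)]
    return int(np.argmax(scores))

print(classify(np.array([1.0, 0.5])))
```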
The function $\delta_k(\mathbf{x})$ can be written in the general form:

$$\delta_k(\mathbf{x}) = \mathbf{x}^T \mathbf{W}_k \mathbf{x} + \mathbf{w}_k^T \mathbf{x} + b_k$$

where

$$\mathbf{W}_k = -\tfrac{1}{2} \boldsymbol{\Sigma}_k^{-1}, \qquad \mathbf{w}_k = \boldsymbol{\Sigma}_k^{-1} \boldsymbol{\mu}_k, \qquad b_k = -\tfrac{1}{2} \boldsymbol{\mu}_k^T \boldsymbol{\Sigma}_k^{-1} \boldsymbol{\mu}_k - \tfrac{1}{2} \log |\boldsymbol{\Sigma}_k| + \log P(C_k)$$

This leaves us with a quadratic decision boundary:

$$\delta_1(\mathbf{x}) - \delta_2(\mathbf{x}) = \mathbf{x}^T (\mathbf{W}_1 - \mathbf{W}_2)\, \mathbf{x} + (\mathbf{w}_1 - \mathbf{w}_2)^T \mathbf{x} + (b_1 - b_2) = 0$$
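To make the algebra concrete, here is a short NumPy sketch that builds $\mathbf{W}_k$, $\mathbf{w}_k$, $b_k$ from one class's (hypothetical) parameters and checks that the quadratic form agrees with the direct expression for $\delta_k(\mathbf{x})$.

```python
import numpy as np

def quadratic_form_params(mu, cov, prior):
    """Return (W_k, w_k, b_k) so that delta_k(x) = x^T W_k x + w_k^T x + b_k."""
    cov_inv = np.linalg.inv(cov)
    W = -0.5 * cov_inv                                # W_k = -1/2 Sigma_k^{-1}
    w = cov_inv @ mu                                  # w_k = Sigma_k^{-1} mu_k
    _, logdet = np.linalg.slogdet(cov)
    b = -0.5 * mu @ cov_inv @ mu - 0.5 * logdet + np.log(prior)
    return W, w, b

# Hypothetical parameters for a single class.
mu, cov, prior = np.array([2.0, 1.0]), np.array([[1.5, -0.3], [-0.3, 0.8]]), 0.5
W, w, b = quadratic_form_params(mu, cov, prior)

x = np.array([1.0, 0.5])
quad = x @ W @ x + w @ x + b                          # x^T W_k x + w_k^T x + b_k

# Direct form of delta_k(x) for comparison.
diff = x - mu
direct = (-0.5 * np.linalg.slogdet(cov)[1]
          - 0.5 * diff @ np.linalg.solve(cov, diff)
          + np.log(prior))
print(np.isclose(quad, direct))                       # True
```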
What if the covariance matrix is non-invertible?
If the variance of one of the features is 0 for some class, we cannot compute the inverse of that class's $\boldsymbol{\Sigma}_k$. So, instead we estimate a single covariance matrix for the entire dataset by averaging over the covariance matrices of all classes:

$$\boldsymbol{\Sigma} = \frac{1}{K} \sum_{k=1}^{K} \boldsymbol{\Sigma}_k$$

Since we share a single covariance matrix across all classes, the quadratic terms are identical for every class and cancel on the boundary: the matrix $\mathbf{W}_1 - \mathbf{W}_2$ becomes $\mathbf{0}$, turning $\delta_1(\mathbf{x}) - \delta_2(\mathbf{x}) = 0$ into a linear equation in $\mathbf{x}$ and making our decision boundary linear.
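Under this shared-covariance assumption, a sketch of the resulting linear classifier might look like the following (again with made-up parameter values); the $\mathbf{x}^T \boldsymbol{\Sigma}^{-1} \mathbf{x}$ and $\log|\boldsymbol{\Sigma}|$ terms are simply dropped because they are the same for every class.

```python
import numpy as np

# Per-class covariance estimates (hypothetical values).
covs = [np.array([[1.0, 0.2], [0.2, 1.0]]), np.array([[1.5, -0.3], [-0.3, 0.8]])]
shared_cov = sum(covs) / len(covs)                    # average over all classes
shared_inv = np.linalg.inv(shared_cov)

mus    = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
priors = [0.5, 0.5]

def linear_params(mu, prior):
    # With a shared Sigma, only the linear part of delta_k matters for classification.
    w = shared_inv @ mu
    b = -0.5 * mu @ shared_inv @ mu + np.log(prior)
    return w, b

params = [linear_params(m, p) for m, p in zip(mus, priors)]

def classify(x):
    return int(np.argmax([w @ x + b for w, b in params]))

print(classify(np.array([1.0, 0.5])))
```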
No covariance matrix 🥺
If you are unable to estimate even a single covariance matrix, you can simply assume that the features are independent of each other and that each feature has the same variance. This results in a covariance matrix of the form:

$$\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}$$
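Under this assumption the discriminant reduces, up to class-independent constants, to a scaled negative squared Euclidean distance to each class mean plus the log prior, i.e. essentially a nearest-mean classifier. A minimal sketch, with a hypothetical value for $\sigma^2$:

```python
import numpy as np

sigma2 = 1.0                                          # hypothetical shared variance

mus    = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
priors = [0.5, 0.5]

def discriminant(x, mu, prior):
    # With Sigma = sigma^2 * I, delta_k(x) reduces (up to class-independent
    # constants) to -||x - mu_k||^2 / (2 sigma^2) + log P(C_k).
    return -0.5 * np.sum((x - mu) ** 2) / sigma2 + np.log(prior)

x = np.array([1.0, 0.5])
print(int(np.argmax([discriminant(x, m, p) for m, p in zip(mus, priors)])))
```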