It is a good exercise to analyze how to use Bayes' theorem to estimate the posterior probability of each class in a two-class scenario:

$$P(C_k \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid C_k)\, P(C_k)}{P(\mathbf{x})}$$

The decision boundary can be identified as the set of points where the two posteriors are equal:

$$P(C_1 \mid \mathbf{x}) = P(C_2 \mid \mathbf{x})$$
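For intuition, here is a tiny NumPy sketch of Bayes' rule in one dimension. The priors and the (Laplace) class-conditional densities are placeholders chosen purely for illustration; the Gaussian case is worked out below.

```python
import numpy as np

priors = np.array([0.4, 0.6])                        # P(C_1), P(C_2)

def likelihoods(x):
    # Placeholder class-conditional densities p(x | C_k): Laplace(0, 1) and Laplace(2, 1).
    return np.array([0.5 * np.exp(-abs(x)),
                     0.5 * np.exp(-abs(x - 2.0))])

def posterior(x):
    joint = priors * likelihoods(x)                   # P(x | C_k) * P(C_k)
    return joint / joint.sum()                        # divide by P(x)

# The decision boundary is the x at which the two posterior entries are equal.
print(posterior(0.5), posterior(1.5))
```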
Replacing the class-conditional probabilities with Gaussian distributions, we can derive that:

$$P(C_k \mid \mathbf{x}) = \frac{\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)\, P(C_k)}{P(\mathbf{x})}, \qquad \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_k|^{1/2}} \exp\!\left(-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k)\right)$$

Taking the logarithm of the class posteriors:

$$\log P(C_k \mid \mathbf{x}) = -\tfrac{1}{2} \log |\boldsymbol{\Sigma}_k| - \tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) + \log P(C_k) - \log P(\mathbf{x}) - \tfrac{d}{2} \log 2\pi$$

Since $P(\mathbf{x})$ (like the constant $-\tfrac{d}{2} \log 2\pi$) is independent of the class, it can be dropped, leaving us with:

$$\delta_k(\mathbf{x}) = -\tfrac{1}{2} \log |\boldsymbol{\Sigma}_k| - \tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) + \log P(C_k)$$

Therefore, the classifier becomes $\hat{y} = \arg\max_k \delta_k(\mathbf{x})$, and the decision boundary becomes $\delta_1(\mathbf{x}) = \delta_2(\mathbf{x})$.
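As a sanity check, here is a minimal NumPy sketch of this classifier; the means, covariances, and priors below are hypothetical values chosen only for illustration.

```python
import numpy as np

def log_discriminant(x, mu, cov, prior):
    """delta_k(x) = log P(C_k) - 1/2 log|Sigma_k| - 1/2 (x - mu_k)^T Sigma_k^{-1} (x - mu_k).

    The class-independent constant -d/2 * log(2*pi) is omitted.
    """
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return np.log(prior) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(cov, diff)

# Hypothetical per-class parameters (2 features, 2 classes).
mus    = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
covs   = [np.array([[1.0, 0.2], [0.2, 1.0]]), np.array([[1.5, -0.3], [-0.3, 0.8]])]
priors = [0.5, 0.5]

def classify(x):
    # Pick the class with the largest log discriminant.
    scores = [log_discriminant(x, m, c, p) for m, c, p in zip(mus, covs, priors)]
    return int(np.argmax(scores))

print(classify(np.array([1.0, 0.5])))
```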
The function $\delta_k(\mathbf{x})$ can be written in the general form:

$$\delta_k(\mathbf{x}) = \mathbf{x}^T \mathbf{W}_k \mathbf{x} + \mathbf{w}_k^T \mathbf{x} + b_k$$

where

$$\mathbf{W}_k = -\tfrac{1}{2} \boldsymbol{\Sigma}_k^{-1}, \qquad \mathbf{w}_k = \boldsymbol{\Sigma}_k^{-1} \boldsymbol{\mu}_k, \qquad b_k = -\tfrac{1}{2} \boldsymbol{\mu}_k^T \boldsymbol{\Sigma}_k^{-1} \boldsymbol{\mu}_k - \tfrac{1}{2} \log |\boldsymbol{\Sigma}_k| + \log P(C_k)$$

This leaves us with a quadratic decision boundary:

$$\delta_1(\mathbf{x}) - \delta_2(\mathbf{x}) = \mathbf{x}^T (\mathbf{W}_1 - \mathbf{W}_2)\, \mathbf{x} + (\mathbf{w}_1 - \mathbf{w}_2)^T \mathbf{x} + (b_1 - b_2) = 0$$
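To make the algebra concrete, here is a short NumPy sketch that builds $\mathbf{W}_k$, $\mathbf{w}_k$, $b_k$ from one class's (hypothetical) parameters and checks that the quadratic form agrees with the direct expression for $\delta_k(\mathbf{x})$.

```python
import numpy as np

def quadratic_form_params(mu, cov, prior):
    """Return (W_k, w_k, b_k) so that delta_k(x) = x^T W_k x + w_k^T x + b_k."""
    cov_inv = np.linalg.inv(cov)
    W = -0.5 * cov_inv                                # W_k = -1/2 Sigma_k^{-1}
    w = cov_inv @ mu                                  # w_k = Sigma_k^{-1} mu_k
    _, logdet = np.linalg.slogdet(cov)
    b = -0.5 * mu @ cov_inv @ mu - 0.5 * logdet + np.log(prior)
    return W, w, b

# Hypothetical parameters for a single class.
mu, cov, prior = np.array([2.0, 1.0]), np.array([[1.5, -0.3], [-0.3, 0.8]]), 0.5
W, w, b = quadratic_form_params(mu, cov, prior)

x = np.array([1.0, 0.5])
quad = x @ W @ x + w @ x + b                          # x^T W_k x + w_k^T x + b_k

# Direct form of delta_k(x) for comparison.
diff = x - mu
direct = (-0.5 * np.linalg.slogdet(cov)[1]
          - 0.5 * diff @ np.linalg.solve(cov, diff)
          + np.log(prior))
print(np.isclose(quad, direct))                       # True
```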
What if the covariance matrix is non-invertible?
If the variance of one of the features is 0 for some class, we cannot compute the inverse of that class's $\boldsymbol{\Sigma}_k$. So, instead we estimate a single covariance matrix for the entire dataset by averaging over the covariance matrices of all classes:

$$\boldsymbol{\Sigma} = \frac{1}{K} \sum_{k=1}^{K} \boldsymbol{\Sigma}_k$$

Since we share a single covariance matrix across all classes, the quadratic terms are identical for every class and cancel on the boundary: the matrix $\mathbf{W}_1 - \mathbf{W}_2$ becomes $\mathbf{0}$, turning $\delta_1(\mathbf{x}) - \delta_2(\mathbf{x}) = 0$ into a linear equation in $\mathbf{x}$ and making our decision boundary linear.
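Under this shared-covariance assumption, a sketch of the resulting linear classifier might look like the following (again with made-up parameter values); the $\mathbf{x}^T \boldsymbol{\Sigma}^{-1} \mathbf{x}$ and $\log|\boldsymbol{\Sigma}|$ terms are simply dropped because they are the same for every class.

```python
import numpy as np

# Per-class covariance estimates (hypothetical values).
covs = [np.array([[1.0, 0.2], [0.2, 1.0]]), np.array([[1.5, -0.3], [-0.3, 0.8]])]
shared_cov = sum(covs) / len(covs)                    # average over all classes
shared_inv = np.linalg.inv(shared_cov)

mus    = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
priors = [0.5, 0.5]

def linear_params(mu, prior):
    # With a shared Sigma, only the linear part of delta_k matters for classification.
    w = shared_inv @ mu
    b = -0.5 * mu @ shared_inv @ mu + np.log(prior)
    return w, b

params = [linear_params(m, p) for m, p in zip(mus, priors)]

def classify(x):
    return int(np.argmax([w @ x + b for w, b in params]))

print(classify(np.array([1.0, 0.5])))
```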
No covariance matrix 🥺
If you are unable to estimate even a single covariance matrix, you can simply assume that the features are independent of each other and that each feature has the same variance. This results in a covariance matrix of the form:

$$\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}$$
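Under this assumption the discriminant reduces, up to class-independent constants, to a scaled negative squared Euclidean distance to each class mean plus the log prior, i.e. essentially a nearest-mean classifier. A minimal sketch, with a hypothetical value for $\sigma^2$:

```python
import numpy as np

sigma2 = 1.0                                          # hypothetical shared variance

mus    = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
priors = [0.5, 0.5]

def discriminant(x, mu, prior):
    # With Sigma = sigma^2 * I, delta_k(x) reduces (up to class-independent
    # constants) to -||x - mu_k||^2 / (2 sigma^2) + log P(C_k).
    return -0.5 * np.sum((x - mu) ** 2) / sigma2 + np.log(prior)

x = np.array([1.0, 0.5])
print(int(np.argmax([discriminant(x, m, p) for m, p in zip(mus, priors)])))
```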