The probability density function
p = f(x, θ) tells you a probability of occurrence
of a random value near
x. Likelihood is a reverse operation of estimating unknown paramter
θ from the same equation using
Lead by example: pick the proper Gaussian¶
Probability of observations
Observed sample is considered the most likely one
Maximisation of probability allows to compute distribution parameters
Consider an experiment where we draw two random indepenent values
(x₁, x₂) from the same distribution, e.g. the weight of two mice in grams
(30, 50). There is some prior knowledge given to you that the distribution is normal with a center around
μ (average mouse weight) and standard deviation of
x ~ N(μ, σ²).
2. Probability of observations.¶
What was the probability of encountering
(x₁, x₂) = (30, 50)? It is the product of individual event probabilities
L = f(x₁) · f(x₂). This value itself depends on unknown
σ, so can be written as
L(μ, σ) = f(x₁ | μ, σ) · f(x₂ | μ, σ), where
f is probability density function.
3. Observed sample is considered the most likely one.¶
It is prudent to assume that observed
(x₁, x₂) = (30, 50) was the realisation of the most probable possible event. This way we take good use of available scarce information and make a better guess. If we decide we just saw an extreme event, we will be systematically wrong on this decision (a tourist sees a working fountain in town provides extra intuition).
4. Maximisation of probability allows to compute distribution parameters.¶
So, upon a fact of observation of
(x₁, x₂) we tend to believe this has to be an event with
maximum probability (likelihood) of happening. From this assumption we can find
L. Sometimes it is possible to do it analytically, as with normal
distributions, sometimes we have to search for solution numerically (hoping it is unique).
Generalisation of example above¶
We usualy denote a set of parameters like
θ, a vector of parameters.
Our task is to estimate parameter
a sample of observations of а random variable
X = (x₁, x₂, ..., xₙ), and
a pre-defined probability density function
X = (x₁, x₂, ..., xₙ)
Make a decision about which probability density function
f(x, θ)is appropriate for this data
Compose joint probability of observations as a function of
L(θ) = f(x₁, θ)·f(x₂, θ)· ...·f(xₙ, θ).
Come to terms with a principle “if we really did observe this event, it was the most probable outcome of all possible events given this distribution”
θmaiximises joint probability of observations
Python code below below relies on
to find parameters of a normal distribution based on two measurements
of mice weights rom example above. It can be easily applied to more observations.
""" Maximum likelihood with two mice. """ import numpy as np from scipy.optimize import minimize def dnorm(x, mu=0, sigma=1): """Normal distribution probability density fucntion.""" const = 1 / (sigma * np.sqrt(2 * np.pi)) power = - (x - mu)**2 / (2 * sigma**2) return const * np.exp(power) def log_likelihood(observed_x): """Sum of logs of probability densities at *observed_x*. Return: function of mu and sigma """ def foo(mu, sigma): logs = [np.log(dnorm(x, mu, sigma)) for x in observed_x] return sum(logs) return foo def maximise(f, start_mu, start_sigma): """Return mu and lambda, which maximise *f*.""" f = lambda p: -1 * l_func(mu=p, sigma=p) res = minimize(f, x0=[start_mu, start_sigma]) return res.x, res.x # two mice weigths are given, similar to https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6143748/ events = [30, 50] # construct likelihood as a function of unknown mu and sigma l_func = log_likelihood(events) # run maximisation procedure # attention: need a sensible pick for start variables, eg (0, 1) will fail estimated_mu, estimated_sd = maximise(l_func, start_mu=30, start_sigma=3) # test outcomes #estimated_mu is 39.99999527669165 assert np.isclose(estimated_mu, 40) #estimated_sd is 9.999976480910071 assert np.isclose(estimated_sd, 10) # Result: observed values [30, 50] were most likely coming from # normal distribution with parameters μ=40 and σ=10.