Thursday, 19 December 2024

EM Algorithm and its usage (Part 1)

The maximum likelihood estimate (MLE) is a powerful and convenient method of estimating the parameters of a model; the details are discussed in an earlier article. This method is used in many statistical models to estimate the model parameters in a principled way. However, a key requirement of the method is complete information about the data, e.g., which distribution it follows and the other quantities on which that distribution depends. In real-life cases, however, it is not unusual to have datasets in which some of the information is hidden, and yet the parameters of a model still have to be estimated to extract the pattern. A simple scenario is described below.
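As a quick reminder of how MLE works when everything is observed: if a single coin with unknown head probability $\theta$ is tossed $n$ times and $h$ heads are observed, the likelihood is $L(\theta) = \binom{n}{h}\theta^{h}(1-\theta)^{n-h}$, and maximizing it gives the familiar closed-form estimate

$$\hat{\theta} = \frac{h}{n}.$$

Such a closed form is available only because nothing about the data-generating process is hidden.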

The two-coin problem:

Assume that there are two coins, each of which is biased either towards heads or towards tails. Let us say coin A is biased towards heads and coin B is biased towards tails. One coin is chosen at random and tossed 10 times, and this experiment is repeated 5 times. The image below shows how easy it is to estimate the bias of each coin when all the information is known, i.e., which coin was chosen in each experiment and how many heads and tails were observed.
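In formulas, the complete-data estimate for each coin is simply the fraction of heads among the tosses made with that coin:

$$\hat{\theta}_A = \frac{\text{heads observed with coin A}}{\text{tosses made with coin A}}, \qquad \hat{\theta}_B = \frac{\text{heads observed with coin B}}{\text{tosses made with coin B}}.$$

For instance, if coin A had been picked in 3 of the 5 experiments and produced, say, 24 heads in its 30 tosses, the estimate would be $\hat{\theta}_A = 24/30 = 0.8$ (these numbers are purely illustrative).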


The estimates are nothing but maximum likelihood estimates based on the available data and information. The same problem becomes considerably harder if one piece of information is hidden, namely which coin was chosen in each experiment. In such a situation, a direct MLE is no longer straightforward, because the likelihood now involves the unobserved coin identities. With the Expectation-Maximization (EM) algorithm, however, it is possible to estimate the parameters in an iterative way. As the name suggests, the EM algorithm has an expectation step and a maximization step. The primary objective of the expectation step is to calculate, for each data point, the probability that it came from each coin, so that an expression for the expected likelihood can be written down. This likelihood function is then maximized by optimizing the parameters. The updated parameters are then used to recalculate the probabilities for each data point, and the likelihood is maximized again. In this iterative manner, the parameters keep updating until a stopping criterion (or threshold limit) is reached.
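To make the two steps concrete for the two-coin problem (assuming, as the code further below also does, that either coin is equally likely to be picked): if experiment $i$ yields $h_i$ heads out of $n$ tosses, the E-step computes the responsibilities

$$P_i(A) = \frac{\theta_A^{h_i}(1-\theta_A)^{\,n-h_i}}{\theta_A^{h_i}(1-\theta_A)^{\,n-h_i} + \theta_B^{h_i}(1-\theta_B)^{\,n-h_i}}, \qquad P_i(B) = 1 - P_i(A),$$

where the binomial coefficient cancels between numerator and denominator, and the M-step updates the parameters using the expected head counts:

$$\theta_A \leftarrow \frac{\sum_i P_i(A)\, h_i}{\sum_i P_i(A)\, n}, \qquad \theta_B \leftarrow \frac{\sum_i P_i(B)\, h_i}{\sum_i P_i(B)\, n}.$$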


The two-coin problem can be solved with the EM algorithm even when the critical information is missing, i.e., which coin was chosen for tossing. To apply the algorithm, an initialization is needed, which is usually done in a random manner. Afterwards, the iterative process commences. The process is shown in the diagram below.


The diagram above does not show explicitly how the probabilities are calculated. This is illustrated below for the first data point, where 5 heads and 5 tails were observed, with $\theta_A^{(0)}=0.6$ and $\theta_B^{(0)}=0.5$.
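Working it out, the binomial coefficient cancels in the ratio, and the responsibilities for this data point are

$$P(A) = \frac{0.6^5 \times 0.4^5}{0.6^5 \times 0.4^5 + 0.5^{10}} \approx \frac{7.96\times 10^{-4}}{7.96\times 10^{-4} + 9.77\times 10^{-4}} \approx 0.45, \qquad P(B) \approx 0.55.$$

So this experiment contributes roughly $0.45 \times 5 \approx 2.2$ heads and $2.2$ tails to coin A's expected counts, and about $2.8$ of each to coin B's.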

After 10 iterations, the values converge to 0.8 for coin A and 0.52 for coin B. Since the number of experiments is small, the iterative algorithm ends up estimating both coins as biased towards heads. The same experiment can be replicated with a Python routine. The code is given below:

import numpy as np
import scipy.stats

# Generate Data
'''
    Each coin is tossed 100 times and the experiment is done 120 times, 60 times for each coin
'''
coinA = np.random.binomial(n=1, p=0.8, size=(60,100)) # probability of head = 0.8
coinB = np.random.binomial(n=1, p=0.35, size=(60,100)) # probability of head = 0.35
data = np.vstack([coinA, coinB])

# Shuffle the experiments so that the coin used in each one can no longer be read off from the ordering
index = np.array(range(data.shape[0]))
np.random.shuffle(index)
data = data[index]

# Create frequency table for heads and tails counts
n_heads = data.sum(axis=1)
n_tails = data.shape[1] - n_heads
freq_table = np.hstack((n_heads.reshape(-1,1), n_tails.reshape(-1,1)))

# EM Algorithm starts
# --------------------

# Initialization
thetaA = 0.6
thetaB = 0.3
n_iter = 20

for it in range(n_iter):

    # E-Step starts:
    # Likelihood of the observed number of heads under each coin's current parameter
    p_headsA = np.array([scipy.stats.binom.pmf(r, data.shape[1], thetaA) for r in freq_table[:,0]])
    p_headsB = np.array([scipy.stats.binom.pmf(r, data.shape[1], thetaB) for r in freq_table[:,0]])

    # Responsibilities: posterior probability that each experiment came from coin A or coin B
    # (both coins are assumed equally likely to be picked)
    P_A = p_headsA/(p_headsA + p_headsB)
    P_B = p_headsB/(p_headsA + p_headsB)

    # Expected head and tail counts attributed to each coin
    expected_count_A = P_A.reshape(-1,1) * freq_table
    expected_count_B = P_B.reshape(-1,1) * freq_table

    # E-Step ends

    # M-Step starts
    expected_total_A = expected_count_A.sum(axis=0)
    expected_total_B = expected_count_B.sum(axis=0)

    # Updated estimates: expected heads divided by expected total tosses for each coin
    thetaA = expected_total_A[0]/ expected_total_A.sum()
    thetaB = expected_total_B[0]/ expected_total_B.sum()

    # M-Step ends

print(f"Probability of head in coin A: {thetaA}")
print(f"Probability of head in coin A: {thetaB}")

It will be seen that the estimated values come close to the parameters used to generate the dataset (0.8 for coin A and 0.35 for coin B). This is the EM algorithm in Python for solving the two-coin problem.

I hope that the explanations given are understandable.

Comments are welcome.

