EM Algorithm and its usage (Part 1)
Maximum likelihood estimation (MLE) is a powerful and convenient method for estimating the parameters of a model; a detailed discussion is given in an earlier article. It is used in several statistical models to estimate parameters in an unbiased way. However, a key requirement of the method is complete information about the data, e.g., which distribution it follows and other aspects of that distribution. In real-life cases, though, it is not unusual to have datasets in which some of the information is hidden, and yet the model parameters still need to be estimated to extract the underlying pattern. A simple scenario is described below.
Two coins problem:
Assume that there are two coins, each of which is biased either towards heads or towards tails. Let us assume that coin A is biased towards heads and coin B is biased towards tails. At random, one coin is chosen and tossed 10 times, and this experiment is repeated 5 times. The image below shows how easy it is to calculate the degree of bias when all the information is known, i.e., which coin was chosen and how many heads and tails were observed.
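For instance, when the coin identities are known, the complete-data MLE is simply the observed fraction of heads for each coin. Below is a minimal sketch with hypothetical counts (the actual numbers shown in the image may differ):

# Hypothetical complete-data example: coin A chosen in 3 experiments, coin B in 2,
# with 10 tosses per experiment (these counts are assumed purely for illustration)
heads_A, tosses_A = 24, 30
heads_B, tosses_B = 7, 20

theta_A_mle = heads_A / tosses_A   # 0.80 -> biased towards heads
theta_B_mle = heads_B / tosses_B   # 0.35 -> biased towards tails
print(theta_A_mle, theta_B_mle)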
These estimates are nothing but maximum likelihood estimates based on the available data and information. The same problem becomes quite complicated if one piece of information is hidden, namely which coin was chosen. In such a situation plain MLE is not advisable, as it handles this kind of incomplete data poorly. With the Expectation Maximization (EM) algorithm, however, it is possible to estimate the parameters iteratively. As the name suggests, the EM algorithm has an expectation step and a maximization step. The primary objective of the expectation step is to calculate the probabilities associated with the data points, so that an expression for the likelihood can be formed from them. That likelihood function is then maximized by optimizing the parameters. The updated parameters are used to recalculate the probabilities for each data point, and the likelihood is maximized again. In this iterative manner the parameters keep updating until a convergence criterion (or threshold) is reached.
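For the two-coin setup, the two steps can be written out concretely. This is a sketch of the standard updates, assuming both coins are equally likely to be picked (as the code below also does); the binomial coefficients cancel in the ratio. Let $h_i$ be the number of heads in experiment $i$ out of $n$ tosses. The E-step computes the posterior probability that coin A produced experiment $i$:

\[
P(A \mid h_i) = \frac{\theta_A^{h_i}(1-\theta_A)^{\,n-h_i}}{\theta_A^{h_i}(1-\theta_A)^{\,n-h_i} + \theta_B^{h_i}(1-\theta_B)^{\,n-h_i}}, \qquad P(B \mid h_i) = 1 - P(A \mid h_i).
\]

The M-step then re-estimates the parameters from the expected head counts:

\[
\theta_A \leftarrow \frac{\sum_i P(A \mid h_i)\, h_i}{\sum_i P(A \mid h_i)\, n}, \qquad
\theta_B \leftarrow \frac{\sum_i P(B \mid h_i)\, h_i}{\sum_i P(B \mid h_i)\, n}.
\]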
The two-coin problem can be solved with the EM algorithm even when the critical piece of information is missing, i.e., which coin was chosen for tossing. To apply the algorithm, an initialization is needed, which is usually done randomly. Afterwards, the iterative process commences. The process is shown in the diagram below.
import numpy as np
import scipy.stats

# Generate data
# Each coin is tossed 100 times and the experiment is done 120 times,
# 60 times for each coin
coinA = np.random.binomial(n=1, p=0.8, size=(60, 100))   # probability of head = 0.8
coinB = np.random.binomial(n=1, p=0.35, size=(60, 100))  # probability of head = 0.35
data = np.vstack([coinA, coinB])

# Shuffle the rows so that the coin identity of each experiment is hidden
index = np.array(range(data.shape[0]))
np.random.shuffle(index)
data = data[index]

# Create frequency table of heads and tails counts per experiment
n_heads = data.sum(axis=1)
n_tails = data.shape[1] - n_heads
freq_table = np.hstack((n_heads.reshape(-1, 1), n_tails.reshape(-1, 1)))

# EM Algorithm starts
# --------------------
# Initialization: random guesses for the head probabilities
thetaA = 0.6
thetaB = 0.3
n_iter = 20

for it in range(n_iter):
    # E-Step starts: likelihood of each experiment's head count under the current parameters
    p_headsA = np.array([scipy.stats.binom.pmf(r, data.shape[1], thetaA) for r in freq_table[:, 0]])
    p_headsB = np.array([scipy.stats.binom.pmf(r, data.shape[1], thetaB) for r in freq_table[:, 0]])

    # Posterior probability that each experiment came from coin A or coin B
    P_A = p_headsA / (p_headsA + p_headsB)
    P_B = p_headsB / (p_headsA + p_headsB)

    # Expected head/tail counts attributed to each coin
    expected_count_A = P_A.reshape(-1, 1) * freq_table
    expected_count_B = P_B.reshape(-1, 1) * freq_table
    # E-Step ends

    # M-Step starts: re-estimate the head probabilities from the expected counts
    expected_total_A = expected_count_A.sum(axis=0)
    expected_total_B = expected_count_B.sum(axis=0)
    thetaA = expected_total_A[0] / expected_total_A.sum()
    thetaB = expected_total_B[0] / expected_total_B.sum()
    # M-Step ends

print(f"Probability of head in coin A: {thetaA}")
print(f"Probability of head in coin B: {thetaB}")
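On a typical run, after the 20 iterations the estimates thetaA and thetaB should settle close to the probabilities used to generate the data (0.8 and 0.35), although the exact values vary with the simulated data and the initial guesses. Note that if the initialization is swapped, the labels A and B may come out swapped as well, since the mixture model itself cannot tell which coin is "A".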