hidden markov model (decoding)

application:
語音識別
分詞
1、隐藏状态 (天气):Sunny,Cloudy,Rainy;
2、观察状态(海藻湿度):Dry,Dryish,Damp,Soggy;
3、初始状态概率: Sunny(0.63), Cloudy(0.17), Rainy(0.20);
4、状态转移矩阵:
today weather
sunny
cloudy
rainy
yesterday weather
sunny
0.5
0.375
0.125
cloudy
0.25
0.125
0.625
rainy
0.25
0.375
0.375
5、混淆矩阵:
observed states
dry
dryish
damp
soggy
hidden states
sunny
0.6
0.2
0.15
0.05
cloudy
0.25
0.25
0.25
0.25
rainy
0.05
0.1
0.35
0.5

Finding most probable sequence of hidden states
brute force
p(hidden state|(dry,damp,soggy)) = max(p(dry|sunny,damp|sunny,soggy|sunny),p(dry|sunny,damp|sunny,soggy|cloudy),p(dry|sunny,damp|sunny,soggy|rainy)…… )

viterbi (dynamic programming)
calculate the partial best path based on each phase,  and record the partial best path we walk through
phase 1(dry):
p(sunny) = initial hidden state * p(dry|sunny)=0.63 *0.6 = 0.378
p(rainy) =  initial hidden state * p(dry|rainy) = 0.17 * 0.05 = 0.0425
p(cloudy) = initial hidden state * p(dry|cloudy) = 0.2 * 0.25 =0.01
phase 2(damp):
p(sunny) = max(p(previous phase sunny)*p(sunny|sunny),p(previous phase rainy)*p(sunny|rainy),p(previous phase cloudy)*p(sunny|cloudy)) *p(damp|sunny)=max(0.378*0.5,0.0425*0.25,0.01*0.25)*0.15=0.378*0.5*0.15 = 0.02835   prev-state : sunny
phase 3(soggy):
same as phase 2

hidden markov model (evaluation)

application:
檢測已知模型是否準確
1、隐藏状态 (天气):Sunny,Cloudy,Rainy;
2、观察状态(海藻湿度):Dry,Dryish,Damp,Soggy;
3、初始状态概率: Sunny(0.63), Cloudy(0.17), Rainy(0.20);
4、状态转移矩阵:
today weather
sunny
cloudy
rainy
yesterday weather
sunny
0.5
0.375
0.125
cloudy
0.25
0.125
0.625
rainy
0.25
0.375
0.375
5、混淆矩阵:
observed states
dry
dryish
damp
soggy
hidden states
sunny
0.6
0.2
0.15
0.05
cloudy
0.25
0.25
0.25
0.25
rainy
0.05
0.1
0.35
0.5

calculate the probability of observed state(dry,damp,soggy) based on the hidden model pattern we using

sunny
sunny
sunny
cloudy
cloudy
cloudy
rainy
rainy
rainy
dry
damp
soggy
1.brute force
p((dry,damp,soggy)|hmm) = p(dry|sunny,damp|sunny,soggy|sunny)+p(dry|sunny,damp|sunny,soggy|cloudy)+p(dry|sunny,damp|sunny,soggy|rainy)+….
2.forward (dynamic programming)
calculate probabilities in  each phase based on hidden states transition, sum up as a new hidden state probability
phase 1:
p(sunny) = initial hidden state * p(dry|sunny)=0.63 *0.6 = 0.378
p(rainy) =  initial hidden state * p(dry|rainy) = 0.17 * 0.05 = 0.0425
p(cloudy) = initial hidden state * p(dry|cloudy) = 0.2 * 0.25 =0.01
phase 2:
p(sunny) = (p(previous phase sunny) * p(sunny|sunny)+p(previous  phase rainy) * p(sunny|rainy)+p(previous phase cloudy) * p(sunny|cloudy) ) * p(damp|sunny) = (0.378*0.5+0.0425*0.25+0.01*0.25)*0.15 = 0.30319
phase 3:
same calculation as phase 2

jieba

cut(cut_all=false,HMM=true)
1. sentence (split by space or punctuation)
2.load dictionary
3.generate directed acyclic graph
0: [0] 1: [1,2,4] 2: [2] 3: [3,4] 4: [4] 5: [5]

4.find the max weight path bases on word frequency(iterate DAG and find the max weight path of each node)
5.if not a single word,return the word. otherwise store into buffer and finalseg (HMM)