naive bayes

it's all about conditional probability and joint probability.

P(A|B) = P(AB) / P(B)
=> P(A|B) = P(A)P(B|A) / P(B)
=> P(A|B) = P(A)P(B|A) / (P(A)P(B|A) + P(A')P(B|A'))

joint probability
assumption: the words are independent of each other.
E1 = P(S|w1) * P(S|w2) * P(S)
E2 = (1 - P(S|w1)) * (1 - P(S|w2)) * (1 - P(S))

P(S|w1,w2) = E1 / (E1 + E2)
           = P(S)P(S|w1)P(S|w2) / ( P(S)P(S|w1)P(S|w2) + P(~S)P(~S|w1)P(~S|w2) )
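
a minimal sketch of this combining rule in Python; the per-word probabilities P(S|wi) and the prior P(S) below are made-up numbers, just for illustration:

```python
def spam_probability(prior, word_probs):
    """Combine per-word spam probabilities P(S|wi) with the prior P(S),
    following the formula above (words assumed independent)."""
    e1 = prior          # numerator term: P(S) * prod P(S|wi)
    e2 = 1.0 - prior    # competing term: P(~S) * prod (1 - P(S|wi))
    for p in word_probs:
        e1 *= p
        e2 *= 1.0 - p
    return e1 / (e1 + e2)

# made-up numbers: P(S) = 0.5, P(S|"viagra") = 0.95, P(S|"meeting") = 0.2
print(spam_probability(0.5, [0.95, 0.2]))  # ~0.83
```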

 

bayesian inference

posterior probability = prior probability x adjustment factor (the adjustment factor being P(B|A) / P(B))
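
plugging made-up numbers into the formula to see the adjustment at work:

```python
# made-up numbers: prior P(A) = 0.1, likelihood P(B|A) = 0.8, evidence P(B) = 0.4
prior = 0.1
adjustment = 0.8 / 0.4          # P(B|A) / P(B) = 2.0: B is evidence for A
posterior = prior * adjustment  # P(A|B) = 0.2, belief doubled
print(posterior)
```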

 

usage example

1. spam mail filter

 

reference:

http://www.ruanyifeng.com/blog/2011/08/bayesian_inference_part_one.html

http://www.paulgraham.com/spam.html

hidden markov model

reference: http://www.52nlp.cn/category/hidden-markov-model

Nothing lasts forever, except change. People are interested in figuring out the patterns behind changes.

what’s a pattern?

  1. deterministic patterns (like traffic lights: everything is certain, one state follows the other; easy to predict)
  2. non-deterministic patterns (not certain; conditional probability is involved)
    2.1 first order markov process
    two basic assumptions:
    1. homogeneous Markov assumption: the state of the hidden Markov chain at any time t depends only on its state at time t-1, and is independent of the states and observations at any other time, and of t itself;
    P(states[t] | states[t-1],observed[t-1],…,states[1],observed[1]) = P(states[t] | states[t-1])   t = 1,2,…,T

    2. observation independence assumption: the observation at any time depends only on the state of the Markov chain at that time, and is independent of all other observations and states;
    P(observed[t] | states[T],observed[T],…,states[1],observed[1]) = P(observed[t] | states[t])   t = 1,2,…,T

state transition matrix (how states transition)
when we have a pattern to describe changes, we still need an initial state: the pi vector [(sun,1.0),(cloud,0.0),(rain,0.0)]
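
a tiny sketch of a first-order Markov process in Python; only the pi vector comes from above, the sun/cloud/rain transition probabilities here are made up:

```python
import random

states = ["sun", "cloud", "rain"]
pi = [1.0, 0.0, 0.0]  # initial state distribution: the pi vector above
# made-up transition matrix: row = today's state, columns = tomorrow's
A = {
    "sun":   [0.6, 0.3, 0.1],
    "cloud": [0.3, 0.4, 0.3],
    "rain":  [0.2, 0.4, 0.4],
}

def sample_weather(days):
    """Sample a weather sequence: each day depends only on the previous day."""
    today = random.choices(states, weights=pi)[0]
    seq = [today]
    for _ in range(days - 1):
        today = random.choices(states, weights=A[today])[0]
        seq.append(today)
    return seq

print(sample_weather(7))  # e.g. ['sun', 'sun', 'cloud', ...]
```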

2.2 hidden markov model
sometimes the pattern we want is not described sufficiently; we want to learn more about the hidden states by observing the observable states. (observable states ===> hidden states)

so we need:
#observed states
#hidden states
#pi vector (initial hidden state probabilities)
#state transition matrix (the probability of the current hidden state given the previous one)
#confusion matrix (the probability of an observed state given the hidden state)
for example:
1. hidden states (weather): Sunny, Cloudy, Rainy;
2. observed states (seaweed humidity): Dry, Dryish, Damp, Soggy;
3. initial state probabilities: Sunny(0.63), Cloudy(0.17), Rainy(0.20);
4. state transition matrix:

                         today's weather
                      sunny   cloudy   rainy
yesterday's  sunny    0.5     0.375    0.125
weather      cloudy   0.25    0.125    0.625
             rainy    0.25    0.375    0.375
5. confusion matrix:

                         observed states
                      dry    dryish   damp   soggy
hidden       sunny    0.6    0.2      0.15   0.05
states       cloudy   0.25   0.25     0.25   0.25
             rainy    0.05   0.1      0.35   0.5
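
the same example model written out as Python data (the numbers are copied from the tables above):

```python
states = ["sunny", "cloudy", "rainy"]
observations = ["dry", "dryish", "damp", "soggy"]

pi = [0.63, 0.17, 0.20]  # initial hidden state probabilities

# A[i][j] = P(today = states[j] | yesterday = states[i])
A = [
    [0.5,  0.375, 0.125],
    [0.25, 0.125, 0.625],
    [0.25, 0.375, 0.375],
]

# B[i][k] = P(observed = observations[k] | hidden = states[i])
B = [
    [0.6,  0.2,  0.15, 0.05],
    [0.25, 0.25, 0.25, 0.25],
    [0.05, 0.1,  0.35, 0.5],
]
```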

Application

1. evaluation
use the forward algorithm to compute the probability of an observation sequence under a given hidden Markov model (HMM), and thereby pick the most suitable HMM ====> given the model λ=(A,B,π) and the observation sequence O=(o1,o2,…,oT), compute the probability P(O|λ) of observing O under the model λ
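
a minimal forward-algorithm sketch in Python, reusing pi, A, B from the block above; observations are passed as indices into the observations list:

```python
def forward(obs, pi, A, B):
    """P(O|lambda), summing over all hidden paths incrementally.
    alpha[i] = P(o1..ot, state at time t = i | lambda)."""
    n = len(pi)
    # initialisation: alpha_1(i) = pi_i * B[i][o1]
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * A[i][j]) * B[j][o_{t+1}]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # termination: P(O|lambda) = sum_i alpha_T(i)
    return sum(alpha)

# e.g. P(dry, damp, soggy | lambda), with dry=0, damp=2, soggy=3
print(forward([0, 2, 3], pi, A, B))
```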

2. decoding
given the model λ=(A,B,π) and the observation sequence O=(o1,o2,…,oT), find the state sequence I=(s1,s2,…,sT) that maximizes the conditional probability P(S|O); that is, given the observation sequence, find the most likely corresponding hidden state sequence ====> Viterbi algorithm
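
a matching Viterbi sketch (same pi, A, B as above; returns hidden state indices):

```python
def viterbi(obs, pi, A, B):
    """Most likely hidden state sequence for obs.
    delta[i] = best path probability ending in state i; psi backtracks."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    psi = []  # psi[t][j] = best predecessor of state j at step t
    for o in obs[1:]:
        step = [max((delta[i] * A[i][j], i) for i in range(n))
                for j in range(n)]
        psi.append([i for _, i in step])
        delta = [p * B[j][o] for j, (p, _) in enumerate(step)]
    # backtrack from the best final state
    best = max(range(n), key=lambda i: delta[i])
    path = [best]
    for back in reversed(psi):
        path.append(back[path[-1]])
    return list(reversed(path))

print([states[i] for i in viterbi([0, 2, 3], pi, A, B)])
```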
3. learning
given an observation sequence (drawn from a known set) and the associated set of hidden states, estimate the most suitable HMM, i.e. determine the (pi,A,B) triple that best describes the observed sequence ====> Baum-Welch (forward-backward) algorithm
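
a compact Baum-Welch sketch with numpy; random initialisation and no underflow scaling, so this is only a toy for short sequences:

```python
import numpy as np

def baum_welch(obs, n_states, n_symbols, n_iter=50, seed=0):
    """Re-estimate (pi, A, B) from an observation sequence alone."""
    rng = np.random.default_rng(seed)
    # random row-stochastic initialisation
    A = rng.random((n_states, n_states)); A /= A.sum(1, keepdims=True)
    B = rng.random((n_states, n_symbols)); B /= B.sum(1, keepdims=True)
    pi = rng.random(n_states); pi /= pi.sum()
    obs = np.asarray(obs); T = len(obs)
    for _ in range(n_iter):
        # E-step: forward (alpha) and backward (beta) passes
        alpha = np.zeros((T, n_states)); alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
        beta = np.ones((T, n_states))
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])
        gamma = alpha * beta                        # state posteriors
        gamma /= gamma.sum(1, keepdims=True)
        xi = np.zeros((T - 1, n_states, n_states))  # transition posteriors
        for t in range(T - 1):
            x = alpha[t][:, None] * A * B[:, obs[t+1]] * beta[t+1]
            xi[t] = x / x.sum()
        # M-step: new parameters are normalised expected counts
        pi = gamma[0]
        A = xi.sum(0) / gamma[:-1].sum(0)[:, None]
        for k in range(n_symbols):
            B[:, k] = gamma[obs == k].sum(0)
        B /= B.sum(1, keepdims=True)
    return pi, A, B

# e.g. fit a 3-state model to a short made-up observation sequence
pi_hat, A_hat, B_hat = baum_welch([0, 2, 3, 1, 0, 0, 2], n_states=3, n_symbols=4)
```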

mmseg

just to convey the concept:

1. find all candidate chunks of three words, and keep those with maximum total length

2. break remaining ties with the four ambiguity resolution rules (largest total chunk length, largest average word length, smallest variance of word lengths, largest sum of degree of morphemic freedom of one-character words); see the sketch below
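
a toy sketch of the tie-breaking rules in Python; the chunks below are made up, a real implementation would generate them from a lexicon, and rule 4 needs corpus frequencies so it is omitted:

```python
# a "chunk" is a tuple of up to three consecutive candidate words;
# the rules are applied in order until a single chunk survives
def chunk_length(chunk):          # rule 1: largest total length
    return sum(len(w) for w in chunk)

def avg_word_length(chunk):       # rule 2: largest average word length
    return chunk_length(chunk) / len(chunk)

def length_variance(chunk):       # rule 3: smallest variance of word lengths
    avg = avg_word_length(chunk)
    return sum((len(w) - avg) ** 2 for w in chunk) / len(chunk)

def pick(chunks):
    """Filter candidate chunks rule by rule until one remains."""
    for key, best in ((chunk_length, max), (avg_word_length, max),
                      (length_variance, min)):
        target = best(key(c) for c in chunks)
        chunks = [c for c in chunks if key(c) == target]
        if len(chunks) == 1:
            break
    return chunks[0]

# two made-up segmentations of the same span: rules 1 and 2 tie,
# rule 3 prefers the even word lengths
print(pick([("a", "bc", "def"), ("ab", "cd", "ef")]))
```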

reference: http://technology.chtsai.org/mmseg/

social media & social network & instant message

social media:
selling points:
1. traditional media (news)
2. KOLs (celebrities, influencers, "big V" accounts)
3. trending topics

weaknesses:
weak ties (individuals rarely interact with each other, and what interaction there is gets drowned in the huge information stream; some common ground is needed to link two individuals together)
low message value

current situation:
media outlets use it to publish news; sometimes a single account is the whole operation, selling ads directly in posts
merchants maintain their brands and funnel traffic to their own products
users provide oversight and feedback
analytics firms sell the data

social network
selling points:
1. UGC
2. common interests (books, movies, music)
3. make new friends

weaknesses:
still weak ties, but with a chance of converting into strong ties, e.g. by organising offline events so people get to know each other better
message value is higher than on social media, but messages are relatively fewer. As time passes the value keeps fading out and users churn. (The product has to actively help users maintain relationships, e.g. by surfacing friends' updates; the end goal is to get users to check on their friends proactively, and meanwhile to use one interest as an entry point for recommending new content and hopping to new interests, ultimately increasing user stickiness.)
messages that are too shallow and too numerous also have a negative effect (the information needs to be distilled, so users feel that browsing is worthwhile rather than a waste of time)

current situation:
funnel traffic to products (your own marketplace or other businesses')
IM spun out of it
paid self-media

instant message
selling points:
1. UGC
2. strong ties; high message value

weaknesses:
no social-discovery aspect: actively meeting new friends is close to zero, though something like the official accounts found on social media could be considered
as the number of contacts grows it becomes a burden instead, and private messages actually decrease (features like contact grouping feel redundant; if you have to think that carefully about who sees what, you might as well not post at all)

current situation:
add social media features and sell ads
pictures funnel traffic to products

Edit Distance

sighhhhh, the question seems so clear. why is it so difficult to write down the correct answer?

Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)

You have the following 3 operations permitted on a word:

a) Insert a character
b) Delete a character
c) Replace a character

it's obvious that f(n) is built on f(n-1).
also we have three options: insert, delete or replace.
so the minimum is taken over these three.
======>
f(pos1,pos2) means the distance required to convert word1[:pos1] to word2[:pos2]
f(pos1,pos2) = min( f(pos1,pos2-1)+1 (insert), f(pos1-1,pos2)+1 (delete), f(pos1-1,pos2-1)+cost (replace, where cost is 0 if the current characters are equal and 1 otherwise) )

now that we know how the function transitions, it's time to look for the base case:
f(0, k) = f(k, 0) = k

(figure: edit_distance)

like the above pic, in order to change ab (word1) into ab (word2), there are three ways to arrive:

ab(w1) -> a(w2), then insert b

a(w1) -> a(w2), then replace (only when the next characters are not equal; here they are both b, so it costs nothing)

a(w1) -> ab(w2), then delete b

reference: https://www.youtube.com/watch?v=We3YDTzNXEk

solutions:
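
a bottom-up DP sketch in Python (not necessarily the original submission; the classic "horse" -> "ros" example should give 3):

```python
def min_distance(word1, word2):
    """f[i][j] = edit distance between word1[:i] and word2[:j]."""
    m, n = len(word1), len(word2)
    f = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        f[i][0] = i          # base case: delete all i characters
    for j in range(n + 1):
        f[0][j] = j          # base case: insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if word1[i-1] == word2[j-1] else 1
            f[i][j] = min(f[i][j-1] + 1,        # insert
                          f[i-1][j] + 1,        # delete
                          f[i-1][j-1] + cost)   # replace (or match)
    return f[m][n]

print(min_distance("horse", "ros"))  # 3
```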

a word puzzle

awesome!!
(figure: target)