**definition**

A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.

**types**

1. supervised learning

1.1 classification (mapping to label, discrete)

1.2 regression (mapping to continuous number)

2. unsupervised learning (e.g. clustering data)

**supervised learning workflow**

(from Coursera)

**how to measure the accuracy of the hypothesis (linear)**

#linear regression cost function

Find the theta that minimizes the cost function. When the cost function equals 0, all the data points lie exactly on the line.
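As a minimal sketch, assuming the usual squared-error cost (half the mean of the squared residuals) and a linear hypothesis `theta0 + theta1 * x`, the cost can be computed as below. The function name `cost` and the sample data are illustrative, not taken from the course material.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost: half the mean of the squared residuals."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        # residual = prediction of the linear hypothesis minus the observed value
        residual = (theta0 + theta1 * x) - y
        total += residual ** 2
    return total / (2 * m)


# Illustrative data generated from y = 1 + 2x, so every point lies on that line.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]

print(cost(1.0, 2.0, xs, ys))  # 0.0, the line passes through every point
print(cost(0.0, 0.0, xs, ys))  # positive, the line y = 0 misses every point
```

The first call shows the zero-cost case described above; any other choice of theta on this data gives a strictly positive cost.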

**how to find the theta that minimizes the residual**

#gradient descent

why gradient descent works

repeat until convergence (update all theta simultaneously) {

}

where i ∈ {0, 1}
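Written out, the update inside the loop usually takes the following textbook form. The symbols are assumptions rather than taken from these notes: $\alpha$ is the learning rate and $J(\theta_0, \theta_1)$ is the cost function being minimized.

$$
\theta_i := \theta_i - \alpha \frac{\partial}{\partial \theta_i} J(\theta_0, \theta_1)
$$

"Simultaneous" here means that both partial derivatives are evaluated at the current $(\theta_0, \theta_1)$ before either parameter is overwritten.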

#gradient descent for linear regression

repeat until convergence {

}
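The loop above can be sketched in Python as follows, assuming the squared-error cost and a single input feature. The function name `gradient_descent`, the learning rate `alpha`, the fixed iteration count (used in place of a real convergence test), and the sample data are all illustrative choices, not the course's reference implementation.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Fit y ~ theta0 + theta1 * x by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # prediction errors computed with the current parameters
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        # partial derivatives of the squared-error cost
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # simultaneous update: both gradients were computed before either change
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Illustrative data generated from y = 1 + 2x.
theta0, theta1 = gradient_descent([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(theta0, theta1)  # close to 1 and 2
```

A learning rate that is too large can make the updates diverge instead of converge, so `alpha` generally has to be tuned to the scale of the data.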

detail:

https://math.stackexchange.com/q/1695446