what is machine learning

a program that learns from data and improves with experience, rather than being explicitly programmed for every case.

can be used in:

  1. face detection
  2. speech recognition
  3. stock prediction

how to machine learn

  1. start with training data
  2. evaluate on held-out test data (a testing error of 0.4% can be achieved)

three canonical problems

regression

finds the line of best fit. used when the target is continuous data.
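a minimal sketch of fitting a line of best fit with ordinary least squares, assuming plain python and made-up points. the name fit_line is hypothetical and only used for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # covariance of x and y, and variance of x (both unnormalised)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# made-up data lying exactly on y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

with real, noisy data the points will not sit exactly on the line; the fit minimises the sum of squared vertical distances.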

classification

we have lots of data, each dot is a new image. it’s scattered in space, doesn't matter. the goal is to assign each dot a discrete class.
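one simple way to classify scattered points is k-nearest neighbours: label a new point by majority vote among the k closest training points. this is a sketch with made-up 2D points and labels, not necessarily the method used in the lecture; knn_predict is a hypothetical name.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train is a list of ((x, y), label); returns the majority label of the k nearest points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# made-up training data: two well separated groups
train = [
    ((0.0, 0.0), "cat"), ((0.1, 0.2), "cat"), ((0.2, 0.1), "cat"),
    ((1.0, 1.0), "dog"), ((0.9, 1.1), "dog"), ((1.1, 0.9), "dog"),
]
print(knn_predict(train, (0.15, 0.1)))
print(knn_predict(train, (1.0, 0.95)))
```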

unsupervised learning

clustering - grouping similar data together

dimensionality reduction - like a line of best fit, but not a line: a curvy boi (a curve or lower-dimensional surface through the data)

we can build a vector of all features

nominal: no ordering among possible values (e.g. boolean)

ordinal: possible values of the feature are totally ordered
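a small sketch of turning one instance into a feature vector, assuming a made-up instance with one ordinal feature (size), one nominal feature (colour) and one boolean. all names and values here are hypothetical. the ordinal feature keeps its order as a rank, while the nominal one is one-hot encoded so no fake ordering sneaks in.

```python
SIZE_ORDER = ["small", "medium", "large"]  # ordinal: values are totally ordered
COLOURS = ["red", "green", "blue"]         # nominal: no ordering among values


def encode(instance):
    """Build a numeric feature vector from a dict of raw feature values."""
    vec = [float(SIZE_ORDER.index(instance["size"]))]                   # rank preserves order
    vec += [1.0 if instance["colour"] == c else 0.0 for c in COLOURS]  # one-hot
    vec.append(1.0 if instance["has_stripes"] else 0.0)                # boolean as 0/1
    return vec


vector = encode({"size": "medium", "colour": "green", "has_stripes": True})
print(vector)
```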

maths topics

training instances are independent and identically distributed (i.i.d.) - sampled independently from the same unknown distribution

there are also cases where this assumption does not hold

the primary purpose in supervised learning is to find a model that generalises
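to check whether a model generalises, one common approach is to hold out some instances and compare the error on training data with the error on unseen test data. a minimal sketch, assuming made-up data and a stand-in model that is deliberately slightly wrong. train_test_split and error_rate are hypothetical helper names.

```python
import random


def train_test_split(instances, test_fraction=0.25, seed=0):
    """Shuffle a copy of the instances and hold out a fraction for testing."""
    shuffled = instances[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def error_rate(model, instances):
    """Fraction of instances whose predicted label differs from the true label."""
    wrong = sum(1 for x, label in instances if model(x) != label)
    return wrong / len(instances)


# made-up data: the true label is "big" when x >= 5
data = [(x, "big" if x >= 5 else "small") for x in range(20)]
train, test = train_test_split(data)


def model(x):
    # stand-in for a learned model; its threshold is slightly off on purpose
    return "big" if x >= 6 else "small"


print("training error:", error_rate(model, train))
print("testing error:", error_rate(model, test))
```

the testing error is the number to care about: a model can have low training error and still fail on unseen instances.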

todo: read up on these:

  • decision trees
  • neural networks
  • support vector machines
  • Bayesian networks
  • ensembles of the above

odor = a: e (400)

400 is the number of training instances that reach this leaf, which gives a rough sense of confidence in the rule

clustering - we want groups whose members are very similar to each other, while keeping dissimilar instances out of the same group.
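a sketch of one common clustering algorithm, k-means, which tries to do this by repeatedly assigning each point to its nearest centroid and then moving each centroid to the mean of its points. k-means is an assumption here (the notes do not name an algorithm), and the points and starting centroids are made up.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Basic k-means on tuples of floats, starting from the given centroids."""
    for _ in range(iterations):
        # assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster (keep it if empty)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

the result depends on the starting centroids; in practice they are usually chosen at random and the algorithm is run several times.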

Decision trees

what is a decision tree?

should i just submit my facebook chatbot, which uses k-nearest neighbours?

finding the best split

key hypothesis: the simplest tree that classifies the training instances accurately will work well on previously unseen instances.
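a sketch of one common way to pick the best split: choose the feature with the highest information gain, i.e. the largest drop in label entropy after splitting. information gain is an assumption here (the notes do not say which criterion is used), and the mushroom-style data below is made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(instances, feature):
    """instances is a list of (feature_dict, label); gain from splitting on one feature."""
    labels = [label for _, label in instances]
    groups = {}
    for features, label in instances:
        groups.setdefault(features[feature], []).append(label)
    # weighted average entropy of the subsets produced by the split
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(instances, features):
    """Return the feature whose split gives the highest information gain."""
    return max(features, key=lambda f: information_gain(instances, f))


# made-up data: odor separates the classes perfectly, cap tells us nothing
data = [
    ({"odor": "a", "cap": "x"}, "e"),
    ({"odor": "a", "cap": "f"}, "e"),
    ({"odor": "p", "cap": "x"}, "p"),
    ({"odor": "p", "cap": "f"}, "p"),
]
print(best_split(data, ["odor", "cap"]))
```

splitting greedily on the most informative feature tends to give small trees, which fits the hypothesis above.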

occams razor

entities should not be multiplied beyond necessity.

when you have two competing theories that make exactly the same predictions, the simpler one is better.