Data Mining Notes

Steps
1. Define the problem;
2. Gather the dataset;
3. Clean the dataset:
    A. Remove the data that is not useful for the problem;
    B. Fill in the missing data, using the mean of the available values or another method;
4. Analyze the data.
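Step 3B can be sketched as follows. This is only a minimal illustration, assuming the feature is numeric and missing entries are marked as None; the function name fill_missing_with_mean and the sample column are made up for this example.

```python
from statistics import mean

def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Made-up column with two missing entries (illustrative data only).
column = [4.0, None, 6.0, 8.0, None]
print(fill_missing_with_mean(column))
```

Mean imputation keeps the feature's average unchanged but shrinks its variance, so other methods may be a better fit when many values are missing.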
Analysis Methods
1. Feature Extraction Methods
Target: rank the features in order to determine which are more important for recognizing the class.
A. T-test
Target: use the t statistic to measure the difference between the classes. The larger |t| is, the more useful the feature is for distinguishing the classes.
Steps
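a. Calculate the mean (u) of Class 1 and Class 2.
b. Use the following formula to calculate the variance s^2 of each class i, where n_i is the number of samples in the class:

$$ s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (x_{ij} - u_i)^2 $$

c. Use the following formula to calculate t:

$$ t = \frac{u_1 - u_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$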

Example:
Class 1    Class 2
25         5
16         12
21         9
18         13
32         19
u1 = 1/5*(25+16+21+18+32)=22.4
u2 = 1/5*(5+12+9+13+19)=11.6
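s1^2 = 1/4*((25-22.4)^2+(16-22.4)^2+(21-22.4)^2+(18-22.4)^2+(32-22.4)^2) = 40.3
s2^2 = 1/4*((5-11.6)^2+(12-11.6)^2+(9-11.6)^2+(13-11.6)^2+(19-11.6)^2) = 26.8
t = (22.4-11.6)/sqrt(40.3/5+26.8/5) = 2.948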
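The worked example can be checked with a few lines of Python. This is a sketch using only the standard library; the function name t_statistic is chosen for illustration, and it assumes the sample variance (n - 1 denominator) of each class, which is what reproduces the values 40.3 and 26.8.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(class1, class2):
    """Two-sample t statistic using each class's sample variance (n - 1 denominator)."""
    u1, u2 = mean(class1), mean(class2)
    s1_sq, s2_sq = variance(class1), variance(class2)
    return (u1 - u2) / sqrt(s1_sq / len(class1) + s2_sq / len(class2))

# Values from the example table.
class1 = [25, 16, 21, 18, 32]
class2 = [5, 12, 9, 13, 19]

t = t_statistic(class1, class2)
print(round(t, 3))  # prints 2.948
```

To rank several features, the same function can be applied to each feature column in turn and the features sorted by |t|.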

B. Principal Component Analysis (PCA)
Target: find the directions that represent the data most efficiently.
Example: input (x1, x2), output (u1, u2)
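For two features, PCA can be sketched without any library. The code below is a minimal illustration, assuming that (u1, u2) denotes the two principal directions: u1 is the direction of greatest variance and u2 is orthogonal to it. The function name pca_2d and the sample points are made up for this example, and the closed-form angle only works for the 2-D case.

```python
from math import atan2, cos, sin

def pca_2d(points):
    """Return the two principal directions (u1, u2) of 2-D points as unit vectors."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Angle of the eigenvector with the largest eigenvalue (closed form for 2x2).
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    u1 = (cos(theta), sin(theta))    # direction of greatest variance
    u2 = (-sin(theta), cos(theta))   # orthogonal direction
    return u1, u2

# Made-up points lying roughly along the line x2 = x1 (illustrative data only).
points = [(2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
u1, u2 = pca_2d(points)
print(u1, u2)
```

With more than two features, the same idea is usually implemented by an eigen-decomposition of the covariance matrix in a numerical library.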



C. Linear Discriminant Analysis (LDA)
Target: find a linear subspace that maximizes class separability among the feature vectors.
Example: input (x1, x2, class), output (u1, u2)
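A minimal sketch of the two-class case (Fisher's linear discriminant) for two features is given below. It is an illustration under stated assumptions, not necessarily the method behind the example: it assumes exactly two classes, for which there is a single discriminant direction w proportional to Sw^-1 (m1 - m2), where Sw is the within-class scatter matrix and m1, m2 are the class means. The function name fisher_lda_2d and the sample points are made up.

```python
def fisher_lda_2d(class1, class2):
    """Return the unit direction proportional to Sw^-1 (m1 - m2) for two classes of 2-D points."""
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        # Scatter of one class: sum of outer products of the centred points.
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        return sxx, sxy, syy

    m1, m2 = mean(class1), mean(class2)
    a1, b1, c1 = scatter(class1, m1)
    a2, b2, c2 = scatter(class2, m2)
    a, b, c = a1 + a2, b1 + b2, c1 + c2   # Sw = [[a, b], [b, c]]
    det = a * c - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # Sw^-1 = (1 / det) * [[c, -b], [-b, a]]
    wx = (c * dx - b * dy) / det
    wy = (-b * dx + a * dy) / det
    norm = (wx * wx + wy * wy) ** 0.5
    if norm == 0:
        raise ValueError("class means coincide")
    return (wx / norm, wy / norm)

# Made-up samples of (x1, x2) for two classes (illustrative data only).
class1 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (4.0, 5.0)]
class2 = [(4.0, 1.0), (5.0, 2.0), (6.0, 2.0), (7.0, 4.0)]
print(fisher_lda_2d(class1, class2))
```

Projecting each sample onto w gives a single value per sample, and a threshold on that value separates the two classes.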
2. Classification
A. Prototype based methods: K-nearest Neighbour (KNN)

B. Boundary based methods: Multilayer Perceptron (MLP)
