Feature selection methods aid you in your mission to create an accurate predictive model. They help by choosing features that give you as good or better accuracy whilst requiring less data. Feature selection methods can be used to identify and remove unneeded, irrelevant and redundant attributes that do not contribute to the accuracy of the model, or that may in fact decrease it. Fewer attributes are desirable because they reduce the complexity of the model, and a simpler model is easier to understand and explain.
Information gain looks at each feature in isolation: it computes the feature's information gain as a measure of how relevant the feature is to the class label (alert type). Computing the information gain for a feature involves computing the entropy of the class label (alert type) over the entire dataset and subtracting the conditional entropies for each possible value of that feature. The entropy calculation requires a frequency count of the class label by feature value. In more detail, all instances (alerts) with some feature value v are selected, the number of occurrences of each class within those instances is counted, and the entropy for v is computed. This step is repeated for each possible value v of the feature. The entropy of each subset can be computed more easily by constructing a count matrix that tallies the class membership of the training examples by feature value.
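As an illustration of the count-matrix approach, here is a minimal sketch in Python that computes the information gain of a single categorical feature with pandas and NumPy. The DataFrame and column names (`feature`, `alert_type`) are placeholders, not names taken from the data used here.

```python
import numpy as np
import pandas as pd

def entropy(counts):
    """Entropy (in bits) of a class distribution given raw counts."""
    counts = np.asarray(counts, dtype=float)
    probs = counts / counts.sum()
    probs = probs[probs > 0]              # treat 0 * log(0) as 0
    return -(probs * np.log2(probs)).sum()

def information_gain(df, feature, label="alert_type"):
    """Information gain of `feature` with respect to the class label."""
    # Count matrix: rows are feature values, columns are class labels.
    counts = pd.crosstab(df[feature], df[label])
    total = counts.values.sum()

    # Entropy of the class label over the entire dataset.
    h_label = entropy(counts.sum(axis=0))

    # Weighted entropy of the label within each feature-value subset.
    h_conditional = sum(
        (row.sum() / total) * entropy(row) for _, row in counts.iterrows()
    )
    return h_label - h_conditional
```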
Another approach is to reduce the dimensionality of the feature space and poke around in this reduced feature space. A basic technique well suited to this problem is Principal Component Analysis (PCA), which finds the directions of greatest variation in the data set.
- Age
- Duration
- Campaign
- consumer confidence index
- consumer price index
- employment variation rate
- employees
- euribor3m
- Lifetime Post Impressions by people who have liked your Page
- pdays
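As a sketch of how this might look in practice, the snippet below applies scikit-learn's PCA to a table of numeric features such as those listed above. The synthetic DataFrame, the subset of columns, and the choice of two components are illustrative assumptions, not details taken from the text.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative stand-in for the real data: a DataFrame with a few of the
# numeric features listed above, filled with random values.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(200, 4)),
    columns=["age", "duration", "campaign", "euribor3m"],
)

# PCA is sensitive to scale, so standardise the features first.
scaled = StandardScaler().fit_transform(X)

# Keep the two directions of greatest variation in the data set.
pca = PCA(n_components=2)
projected = pca.fit_transform(scaled)

# Fraction of the total variance captured by each principal component.
print("explained variance ratio:", pca.explained_variance_ratio_)
```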