David Andrzejewski at Bayes’ Cave wrote up a nice summary of practical machine learning advice from the KDD 2011 paper “Detecting Adversarial Advertisements in the Wild”. I’ve quoted several of the main points from David’s summary below:
- ABE: Always Be Ensemble-ing
- Throw a ton of features at the model and let L1 sparsity figure it out
- Map features with the “hashing trick”
- Handle the class imbalance problem with ranking
- Use a cascade of classifiers
- Make sure the system “still works” as its inputs evolve over time
- Make efficient use of expert effort
- Allow humans to hard-code rules
- Periodically use non-expert evaluations to make sure the system is working
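
To make the “hashing trick” item above a little more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function name `hash_features`, the default of 16 buckets, and the choice of MD5 as the hash are all assumptions made for illustration. The idea it shows is that any number of string features can be folded into a fixed-length vector without keeping a feature dictionary around.

```python
import hashlib


def hash_features(tokens, n_buckets=16):
    """Fold string features into a fixed-length vector (hypothetical helper).

    Each token is hashed to pick a bucket, and a separate byte of the hash
    picks a +1/-1 sign so that colliding features tend to cancel rather
    than pile up.
    """
    vec = [0.0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        # First 8 bytes of the digest choose the bucket.
        index = int.from_bytes(digest[:8], "big") % n_buckets
        # A different byte chooses the sign.
        sign = 1.0 if digest[8] & 1 == 0 else -1.0
        vec[index] += sign
    return vec


# Example: made-up features for one ad, mapped to a 16-dimensional vector.
ad_features = ["word=free", "word=pills", "domain=example.com"]
ad_vector = hash_features(ad_features)
```

The vector length stays the same no matter how many distinct features show up over time, which is part of why this pairs well with throwing a ton of features at an L1-regularized model. In practice the bucket count would be far larger than 16 to keep collisions rare.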