“Detecting Adversarial Advertisements in the Wild”

David Andrzejewski at Bayes’ Cave wrote up a nice summary of practical machine learning advice from the KDD 2011 paper “Detecting Adversarial Advertisements in the Wild”. I’ve quoted several of the main points from David’s summary below:

  • ABE: Always Be Ensemble-ing
  • Throw a ton of features at the model and let L1 sparsity figure it out
  • Map features with the “hashing trick”
  • Handle the class imbalance problem with ranking
  • Use a cascade of classifiers
  • Make sure the system “still works” as its inputs evolve over time
  • Make efficient use of expert effort
  • Allow humans to hard-code rules
  • Periodically use non-expert evaluations to make sure the system is working
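
To make the “hashing trick” item above a bit more concrete, here is a minimal Python sketch. It is not taken from the paper or from David’s summary: the function name `hashed_features`, the default bucket count, and the use of MD5 as the hash are all assumptions chosen for illustration. The idea is simply that each string feature is hashed to an index in a fixed-length vector, so the model never needs a growing feature dictionary.

```python
import hashlib


def hashed_features(tokens, n_buckets=16):
    """Map string features to a fixed-length count vector (hypothetical helper).

    Each token is hashed to one of `n_buckets` indices and that bucket's
    count is incremented. Distinct tokens may collide in the same bucket.
    """
    vec = [0] * n_buckets
    for tok in tokens:
        # Use a stable hash; Python's built-in hash() for strings is
        # randomized per process, so indices would not be reproducible.
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:8], "big") % n_buckets
        vec[idx] += 1
    return vec


# Example usage with made-up ad tokens (illustrative only).
example = hashed_features(["free", "pills", "free", "click-here"], n_buckets=8)
```

The vector length stays fixed no matter how many distinct features show up, at the cost of occasional collisions between unrelated features. A larger `n_buckets` makes collisions rarer.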