I have been wanting to write a post about using mutual information for feature selection, but I know that a few readers are unfamiliar with information theory. Some of my executive friends have forgotten all of their calculus, so next week I will probably first write a short executive introduction to information theory, or find an appropriate introduction on the internet. Earlier today, while looking for a very basic introduction, I ran across the 111 page “Basic Concepts in Information Theory” by Marc Uro and the 3 page blog post “A Short, Simple Introduction to Information Theory” by Ryan Moulton, both of which are appropriate for undergraduate sophomores.
Marc’s book is great. It covers all of the basic information theory ideas: entropy, cross-entropy, mutual information, linear codes, compression, and Shannon capacity. Most of his explanations use only basic probability with a little summation notation and a matrix or two. (His notation is carefully crafted, but sometimes a bit non-standard.) He avoids calculus entirely, so the book can be read by a bright freshman.
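To give a taste of the kind of computation these introductions cover, here is a minimal sketch (my own illustration, not taken from either source) of entropy and mutual information for a pair of discrete variables, using nothing beyond the basic probability and summation notation mentioned above:

```python
import math

def entropy(p):
    """Shannon entropy H(X) = -sum_x p(x) log2 p(x), in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ),
    where joint[i][j] = p(X = i, Y = j)."""
    px = [sum(row) for row in joint]          # marginal p(x)
    py = [sum(col) for col in zip(*joint)]    # marginal p(y)
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

# Two fair coins that always land the same way: X determines Y.
dependent = [[0.5, 0.0], [0.0, 0.5]]
# Two independent fair coins: knowing X tells you nothing about Y.
independent = [[0.25, 0.25], [0.25, 0.25]]

print(entropy([0.5, 0.5]))             # 1.0 bit
print(mutual_information(dependent))   # 1.0 bit
print(mutual_information(independent)) # 0.0 bits
```

In the feature-selection setting, a feature with high mutual information with the label is informative about it, which is exactly why these quantities are worth learning.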
Ryan’s short blog post uses a little probability, a few logarithms, and summation notation, but nothing else.