“An Estimate of an Upper Bound for the Entropy of English”

In the short, well written paper “An Estimate of an Upper Bound for the Entropy of English“, Brown, Stephan Della Pietra, Mercer, Vincent Della Pietra, and Lai (1992) give an estimated upper bound for English of 1.75 bits per character.  That estimate was somewhat lower than Shannon’s original upper bound of 2.3 bits per character. Along the way they give nice simple explanations of entropy and cross-entropy as applied to text.  More recently Montemurro and Zanette (2011) showed the entropy of all languages is around 3.5 bits per word. (see Wired Article and Plos One)