# November 2013

You are currently browsing the monthly archive for November 2013.

## “Basic Concepts in Information Theory”

I have been wanting to write a post about using mutual information for feature selection, but I know that a few readers do not know information theory. Some of my executive friends have forgotten all of their calculus, so next week I will probably either write a short executive introduction to information theory or find an appropriate introduction on the internet. Earlier today, while looking for a very basic introduction, I ran across the 111-page “Basic Concepts in Information Theory” by Marc Uro and the 3-page blog post “A Short, Simple Introduction to Information Theory” by Ryan Moulton, both of which are appropriate for undergraduate sophomores.

Marc’s book is great. It covers all of the basic information theory ideas like entropy, cross-entropy, mutual information, linear codes, compression, and Shannon capacity. Most of his explanations use only basic probability with a little summation notation and a matrix or two. (His notation is carefully crafted, but sometimes a bit non-standard.) He avoids using calculus, so this book can be read by a bright freshman.

Ryan’s short web post uses a little probability, logarithms, and summation notation but nothing else.

## The Big O writes on Scientific Error, Bias, and Self-(peer)-review.

I have slightly edited the post below by my friend the Big O.

After all my complaining I have reached an actual argument about scientists or, better put, the science class. They lack rigorous checks and balances.

To think that scientists can effectively check themselves for error and bias just because they’re supposed to is analogous to thinking that the House of Representatives can effectively check itself for error and bias just because it’s supposed to. But we know that the job of congressman frequently attracts a type of personality. We know there is an attractive glory to accomplishment in government, as there is an attractive glory to accomplishing something in science. We know that money plays an incentivizing role in even the noblest of endeavors.

And so in government we have branches whose explicit role is to check and balance one another. And we have a press whose job is to check and seek out error in the government itself. And we have a population who feel no embarrassment at checking everyone: we scrutinize the press, we critique the government in general, and we attack specific branches of government, all the while recognizing that government by the people (etc.) is prone to errors and biases specifically because it is a government run by people.

This is not the attitude we have with the class of people who conduct science. The scientist class has reached such a rarefied status that it lacks equivalent checks and balances. We expect scientists to nobly check themselves. But I argue they cannot because they’re people. We need more rigorous outside skepticism than we currently have.

Science has the hardest arguments on earth to develop, prove, justify, and explain because the arguments of science are targeted at revealing something close to objective truth. There are more obstacles and unseen variables between scientific theory and proof than in any other field. I think we would be better off to consider non-scientist (a “scientist” today being someone who is sanctioned by a university to be labeled as such) checking and balancing as part of – not apart from – the scientific process.

I like the idea of retaining an unembarrassed and reasoned skepticism of the “truth” offered by scientists – particularly in the weak sciences – and instead accepting effectiveness (e.g. when science becomes technology) as truth. When something – a theory or an experiment – works in our daily lives we can label that something as “true enough to be effective” and recognize that as an auspicious label. The rest should invite continued checking and balancing from both inside and outside the scientific class.

And then in another email, I added this as a response to my critics:

The one thing I do not attack is the scientific method or reason.

I say the scientific method is not currently being employed to the extent that it could be and we’re worse off for it. As evidence of this I see the very frequent conflation of science – which is an effective process – with scientists – whom I consider to be as flawed, and for the same reasons, as members of any other profession. (That NPR study conforms with my personal experience that scientists may be more justifying of their bias than others.) This conflation leads to a citizenry reluctant to be skeptical of scientists and scientific work because they fear they are questioning science. They are not. They are a part (apparently unknowingly) of science, and their reluctance to analyze, question, investigate, and criticize scientists and scientists’ work leads to a weakening of science.

The peer review you cite is precisely what I am labeling inadequate. What if a congressman suggested his law was good because it had been peer reviewed? What if he said he had special training as a lawyer or an economist or a historian? Would that be a satisfying rationale not to have separate gov’t branches, press, or citizenry actively challenge his work? Scientific peer review is equivalent to gov’t peer review – it is necessary but less adequate than broadening that peer review to others who are currently only marginally involved or allowed in the process.

Again, re: the scientific method, I don’t believe that a science that includes only an academia-certified science class does or even can adequately follow the scientific method to its fullest rigor, any more than Congress can adequately run the government to its fullest rigor minus the critical analysis of outside agencies such as other branches of gov’t, the press, and citizenry.

## Simple Fact about Maximizing a Gaussian

Over the last few weeks, I’ve been working with some tricky features. Interestingly, I needed to add noise to the features to improve my classifier performance.  I will write a post on these “confounding features” later.  For now, let me just point out the following useful fact.

If

$$f(x, \sigma) = {1\over{\sigma \sqrt{2 \pi}}} \exp{\left(-{{x^2}\over{2 \sigma^2}}\right)},$$

then

$$\max_\sigma f(x,\sigma) = f(x, |x|).$$
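
For readers who want the missing step, here is a short sketch of why this holds, assuming $x \neq 0$ and $\sigma > 0$ (for $x = 0$ the density grows without bound as $\sigma \to 0$, so no maximizer exists). Taking logarithms,

$$\log f(x, \sigma) = -\log \sigma - {1\over 2}\log(2\pi) - {{x^2}\over{2 \sigma^2}},$$

and differentiating with respect to $\sigma$ gives

$$
{\partial \over {\partial \sigma}} \log f(x, \sigma) = -{1\over\sigma} + {{x^2}\over{\sigma^3}} = {{x^2 - \sigma^2}\over{\sigma^3}}.
$$

The derivative is positive for $\sigma < |x|$ and negative for $\sigma > |x|$, so the maximum is at $\sigma = |x|$, where the density equals

$$f(x, |x|) = {1\over{|x| \sqrt{2 \pi e}}}.$$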

So, if you have a Gaussian with mean zero and you want to fatten it so that the probability density at a given point $x \neq 0$ is as large as possible, without changing the mean, then set the standard deviation to $|x|$.
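
The fact can also be checked numerically. The snippet below is only an illustrative sketch: the function names `gaussian_pdf` and `best_sigma_on_grid`, the search interval, and the grid resolution are all my own assumptions, not anything from the feature work mentioned above. It does a brute-force search over $\sigma$ and compares the winner to $|x|$.

```python
import math


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma, evaluated at x."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def best_sigma_on_grid(x, lo=0.01, hi=10.0, steps=100_000):
    """Brute-force search for the sigma in [lo, hi] that maximizes gaussian_pdf(x, sigma).

    The interval and the number of grid steps are arbitrary illustrative choices.
    """
    best_sigma, best_value = lo, gaussian_pdf(x, lo)
    for i in range(1, steps + 1):
        sigma = lo + (hi - lo) * i / steps
        value = gaussian_pdf(x, sigma)
        if value > best_value:
            best_sigma, best_value = sigma, value
    return best_sigma


if __name__ == "__main__":
    for x in (-3.0, 0.5, 2.0):
        # The grid maximizer should agree with |x| up to the grid spacing.
        print(x, best_sigma_on_grid(x), abs(x))
```

The grid search is deliberately naive; it only works when $|x|$ lies inside the searched interval, and it says nothing about $x = 0$, where the density has no maximizer.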