T Jake Luciani wrote a nice, easy-to-read blog post on recent developments in neural networks.
In “Autoencoders, MDL, and Helmholtz Free Energy“, Hinton and Zemel (1994) use Minimum Description Length as an objective function for formulating the generative and recognition weights of an autoencoding neural net. They develop a stochastic vector quantization method, very similar to a mixture of Gaussians, in which each input vector is encoded with
$$E_i = -\log \pi_i - k \log t + \frac{k}{2} \log 2 \pi \sigma^2 + \frac{d^2}{2\sigma^2}$$
nats (1 nat = $1/\ln 2$ bits ≈ 1.44 bits), where $t$ is the quantization width, $d$ is the Mahalanobis distance to the mean of the Gaussian, $k$ is the dimension of the input space, and $\pi_i$ is the mixing weight of the $i$th Gaussian. They call this the “energy” of the code. Encoding using only this scheme wastes bits because, for example, there may be vectors that are equally distant from two Gaussians. The amount wasted is
$$H = -\sum_i p_i \log p_i$$
where $p_i$ is the probability that the code will be assigned to the $i$th Gaussian. So the “true” expected description length is
$$F = \sum_i p_i E_i - H$$
which “has exactly the form of the Helmholtz free energy.” This free energy is minimized by setting
$$p_i = {{e^{-E_i}}\over{\sum_j e^{-E_j}}}.$$
In order to make the computation practical, they recommend using suboptimal distributions “as a Lyapunov function for learning” (see Neal and Hinton 1993). They apply their method to learning factorial codes.
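Here is a minimal numeric sketch of these formulas (not Hinton and Zemel’s code; the Gaussians, quantization width, and input below are made up, and I treat $d$ as Euclidean distance under isotropic unit-$\sigma$ Gaussians):

```python
import numpy as np

def code_energies(x, means, sigma, weights, t):
    """Energy E_i, in nats, of encoding input x with the i-th Gaussian.

    Assumes isotropic Gaussians with a common sigma, so the Mahalanobis
    distance reduces to Euclidean distance scaled by sigma.
    """
    k = x.shape[0]                               # dimension of the input space
    d2 = np.sum((x - means) ** 2, axis=1)        # squared distances to the means
    return (-np.log(weights) - k * np.log(t)
            + 0.5 * k * np.log(2 * np.pi * sigma ** 2)
            + d2 / (2 * sigma ** 2))

# Toy example: two 2-D Gaussians and an input equidistant from both means.
means = np.array([[0.0, 0.0], [3.0, 0.0]])
weights = np.array([0.5, 0.5])
x = np.array([1.5, 0.0])

E = code_energies(x, means, sigma=1.0, weights=weights, t=0.01)
p = np.exp(-E) / np.sum(np.exp(-E))   # Boltzmann distribution that minimizes F
H = -np.sum(p * np.log(p))            # nats saved by the stochastic assignment
F = np.sum(p * E) - H                 # Helmholtz free energy
print(p, F)                           # p = [0.5, 0.5]: the "wasted bits" case
```

With the input exactly between the two means, the soft assignment is 50/50 and the entropy term recovers a full $\ln 2$ nats, which is precisely the waste the hard-assignment scheme incurs.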
A long time ago I was a meteorologist at the Joint Typhoon Warning Center in Guam. One of my forecasters had a friend who used solar images to forecast wheat futures. I wonder what those people are predicting these days.
The Dalton minimum occurred between 1790 and 1830. If the sun does not perk up, solar cycle #24 will resemble the solar cycles of the Dalton minimum. (See also the Spörer minimum and the Maunder minimum.) For more information, check out NOAA’s Solar Cycle Progression page.
You can find anything on the web. Earlier today, I noticed that Stirling’s approximation and the Gaussian distribution both contain a $\sqrt{2 \pi}$, so I started thinking that maybe you could apply Stirling’s approximation to the binomial distribution with $p=1/2$ and large $n$, together with the central limit theorem, to derive the Gaussian distribution. After making a few calculations it seemed to be working, and I thought, “Hey, maybe someone else has done this.” Sure enough, Jake Hoffman did it and posted it here. The internet is pretty cool.
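For the record, the shape of that calculation is the classical de Moivre–Laplace argument (sketched here from memory, not from Jake Hoffman’s post): take $p = 1/2$, write $k = n/2 + x$, and apply Stirling’s formula $n! \approx \sqrt{2\pi n}\,(n/e)^n$ to the binomial coefficient, giving

$$\binom{n}{k}\, 2^{-n} \approx \sqrt{\frac{2}{\pi n}}\, e^{-2x^2/n},$$

a Gaussian with mean $n/2$ and variance $np(1-p) = n/4$. The $\sqrt{2\pi}$ in Stirling’s formula is the very one that survives into the normal density.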
In “Generalized Thompson sampling for sequential decision-making and causal inference“, Ortega and Braun (2013) give a short history of Thompson sampling (see [1], [2], [3], [4]) and report on the relationships among intelligent agents, evolutionary game theory, Bayesian inference, KL divergence, and Thompson sampling. They develop a generalization of Thompson sampling based on a Bayesian prior over a distribution of environments. Some numerical results are provided. Here’s the abstract:
Recently, it has been shown how sampling actions from the predictive distribution over the optimal action—sometimes called Thompson sampling—can be applied to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution can then be constructed by a Bayesian superposition of the optimal policies weighted by their posterior probability that is updated by Bayesian inference and causal calculus. Here we discuss three important features of this approach. First, we discuss in how far such Thompson sampling can be regarded as a natural consequence of the Bayesian modeling of policy uncertainty. Second, we show how Thompson sampling can be used to study interactions between multiple adaptive agents, thus, opening up an avenue of game-theoretic analysis. Third, we show how Thompson sampling can be applied to infer causal relationships when interacting with an environment in a sequential fashion. In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method to address problems of adaptive sequential decision-making and causal inference.
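For readers meeting the idea for the first time, here is plain vanilla Thompson sampling on a Bernoulli bandit, a minimal sketch rather than the paper’s generalized version; the three arm probabilities are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.3, 0.5, 0.7])   # unknown arm reward probabilities (made up)
alpha = np.ones(3)                   # Beta(1, 1) posterior parameters per arm
beta = np.ones(3)

for _ in range(10_000):
    theta = rng.beta(alpha, beta)          # sample one "environment" from the posterior
    arm = int(np.argmax(theta))            # act optimally for the sampled environment
    reward = rng.random() < true_p[arm]    # observe a Bernoulli reward
    alpha[arm] += reward                   # standard Bayesian update
    beta[arm] += 1 - reward

print(alpha / (alpha + beta))   # posterior means; pulls concentrate on the best arm
```

The key move, which the abstract generalizes, is that the action is drawn by sampling an environment from the posterior and then acting optimally for that sampled environment.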
I enjoyed this simple, entertaining video by Hilary Mason, which is both a history of machine learning and an overview of the field.
I was reading MarkCC’s gripe about the misuse of the word dimension, and it reminded me of the Hausdorff dimension. Generally speaking, the Hausdorff dimension matches our normal intuitive definition of dimension, i.e., the H-dimension of a smooth curve or line is 1, the H-dimension of a smooth surface is 2, and the H-dimension of the interior of a cube is 3. But for fractals, the H-dimension can be a non-integer. For the Cantor set, the H-dimension is about 0.63. For more information, check out the Wikipedia article:
http://en.wikipedia.org/wiki/Hausdorff_dimension
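For strictly self-similar sets, the Hausdorff dimension coincides with the similarity dimension: a set built from $N$ copies of itself, each scaled by a factor $s$, has dimension

$$d = \frac{\log N}{\log (1/s)}.$$

The Cantor set is $N = 2$ copies of itself scaled by $s = 1/3$, giving $d = \log 2 / \log 3 \approx 0.63$.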
I’m quite excited by the Nuit Blanche post on the papers “Structure Discovery in Nonparametric Regression through Compositional Kernel Search” (Duvenaud, Lloyd, Grosse, Tenenbaum, and Ghahramani 2013) and “Exploiting compositionality to explore a large space of model structures” (Grosse, Salakhutdinov, Freeman, and Tenenbaum 2012). For years, my old company Momentum Investment Services, Carl, and I looked for fast, systematic ways to search large hypothesis spaces. We considered context-free grammars as a means of generating hypotheses. Carl and I did not get anywhere with that research, but now it seems that others have succeeded. Be sure to look over the article, the blog posts, and the comments.
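To give a flavor of grammar-driven hypothesis search, here is a toy enumeration only loosely modeled on the kernel grammar in Duvenaud et al.; the base kernels and operators below are my own stand-ins:

```python
from itertools import product

BASE = ["SE", "LIN", "PER"]   # assumed base kernels: squared-exp, linear, periodic
OPS = ["+", "*"]              # composition operators of the toy grammar

def expand(exprs):
    """One grammar step: combine every pair of expressions with every operator."""
    return [f"({a} {op} {b})" for a, b in product(exprs, repeat=2) for op in OPS]

hypotheses = list(BASE)
frontier = list(BASE)
for _ in range(2):            # grow the hypothesis space two levels deep
    frontier = expand(frontier)
    hypotheses.extend(frontier)

print(len(hypotheses))        # 669 expressions already; the space explodes quickly
print(hypotheses[:5])
```

As I understand it, the papers do not enumerate blindly like this; they search the space greedily, scoring each candidate model by an approximate marginal likelihood.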
In “Algorithm Portfolio Design: Theory vs. Practice“, Gomes and Selman (2013) study the use of a portfolio of stochastic search algorithms to solve computationally hard search problems. Here are some interesting quotes from the paper:
“Our studies reveal that in many cases the performance of a single algorithm dominates all others, on the problem class under consideration.”
“Given the diversity in performance profiles among algorithms, various approaches have been developed to combine different algorithms to take into account the computational resource constraints and to optimize the overall performance. These considerations led to the development of anytime algorithms (Dean and Boddy 1988), decision theoretic metareasoning and related approaches (Horvitz and Zilberstein 1996; Russell and Norvig 1995), and algorithm portfolio design (Huberman et al. 1997).”
“In addition, we also show that a good strategy for designing a portfolio is to combine many short runs of the same algorithm. The effectiveness of such portfolios explains the common practice of “restarts” for stochastic procedures, where the same algorithm is run repeatedly with different initial seeds for the random number generator. (For related work on the effectiveness of restarts, see e.g., Aldous and Vazirani 1994; Ertel 1991; Selman and Kirkpatrick 1996.)”
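The last quote’s point about restarts is easy to reproduce with a toy Monte Carlo (my own illustration, not from the paper; the Pareto runtime model and the cutoff of 3 are assumptions): when run times are heavy-tailed, many short runs beat one long run.

```python
import numpy as np

rng = np.random.default_rng(1)

def solve_with_restarts(cutoff):
    """Total work to solve one instance, restarting whenever a run exceeds cutoff."""
    total = 0.0
    while True:
        runtime = rng.pareto(1.5) + 1.0   # heavy-tailed runtime of one randomized run
        if runtime <= cutoff:
            return total + runtime        # this run finished before the cutoff
        total += cutoff                   # give up on this run; restart with a new seed

single_run = np.mean([rng.pareto(1.5) + 1.0 for _ in range(100_000)])
restarts = np.mean([solve_with_restarts(cutoff=3.0) for _ in range(100_000)])
print(single_run, restarts)               # restarts finish sooner on average
```

With this runtime distribution the mean of a single uninterrupted run is 3, while restarting at a cutoff of 3 brings the expected total work down to roughly 2.3, which is why the “restarts with different seeds” practice works so well.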
Scott Aaronson gave a wonderful talk at NIPS 2012 called “Quantum Information and the Brain“. I really enjoyed his very concise, intuitive description of quantum mechanics and his skeptical yet fair attitude toward the ideas expressed by Penrose in “The Emperor’s New Mind”.