It looks like Chris Eliasmith, a Canadian professor and Director of the Centre for Theoretical Neuroscience, is having some success constructing “the world’s largest simulation of a functioning brain.” His book, “How to Build a Brain,” is expected in February.

You are currently browsing the monthly archive for **December 2012**.

In the widely cited paper “Rapid object detection using a boosted cascade of simple features”, Viola and Jones (CVPR 2001) combine “Haar-like” features and AdaBoost into a fast “cascade” of increasingly complex image classifiers (mostly for face detection). They write, “The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest.” Each Haar-like feature is mostly localized and can be computed in constant time, and AdaBoost learns quickly, so the combination is fast. They report, “In the domain of face detection it is possible to achieve fewer than 1% false negatives and 40% false positives using a classifier constructed from **two** Harr-like features.” [emphasis added]
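The constant-time claim comes from a summed-area table (the paper calls it an integral image): once the table is built, the sum over any rectangle takes four lookups. Here is a minimal sketch in plain Python. The function names and the toy 3x4 image are my own illustrations, not code from the paper, and the boosting and cascade stages are left out entirely.

```python
def integral_image(img):
    """Build a summed-area table padded with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Everything above (previous table row) plus the current row so far.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using exactly four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle feature: left half minus right half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


# Toy image (hypothetical data): bright on the left, dark on the right.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
table = integral_image(image)
feature_value = two_rect_feature(table, 0, 0, 3, 4)
print(feature_value)
```

A large feature value means a strong vertical edge inside the window. The cost of evaluating it does not depend on the size of the rectangle, which is what makes scanning many windows cheap.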

At the Top 500 website, I notice that the main CPUs are made by only four companies: IBM, Intel, AMD, and Nvidia. HP was squeezed out in 2008. It makes me wonder if the trend toward fewer manufacturers will continue. Also, both the #1 supercomputer and the #500 supercomputer did not keep up with the general trendline over the last two or three years. On the other hand, the average computational power of the top 500 has stayed very close to the trendline, which increases by a factor of 1.8 every year.
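To put the factor of 1.8 per year in more familiar terms, here is a quick back-of-the-envelope calculation. It assumes the trend is a clean exponential, which is my simplification and not something the Top 500 site states.

```python
import math

annual_factor = 1.8  # growth per year, taken from the trendline above

# Years needed to double: solve annual_factor ** t == 2 for t.
doubling_time_years = math.log(2) / math.log(annual_factor)

# Total growth if the same rate held for ten years.
growth_over_decade = annual_factor ** 10

print(f"doubling time: {doubling_time_years:.2f} years")
print(f"growth over a decade: about {growth_over_decade:.0f}x")
```

So the trendline doubles roughly every 14 months and grows by a factor of about 357 over a decade.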

Lifted Inference uses the rules of first-order predicate logic to improve the speed of the standard Markov Random Field algorithms applied to Markov Logic Networks. I wish I had been in Barcelona, Spain in July last year for IJCAI11 because they had a cool tutorial on Lifted Inference. Here’s a quote:

Much has been achieved in the field of AI, yet much remains to be done if we are to reach the goals we all imagine. One of the key challenges with moving ahead is closing the gap between logical and statistical AI. Recent years have seen an explosion of successes in combining probability and (subsets of) first-order logic respectively programming languages and databases in several subfields of AI: Reasoning, Learning, Knowledge Representation, Planning, Databases, NLP, Robotics, Vision, etc. Nowadays, we can learn probabilistic relational models automatically from millions of inter-related objects. We can generate optimal plans and learn to act optimally in uncertain environments involving millions of objects and relations among them. Exploiting shared factors can speed up message-passing algorithms for relational inference but also for classical propositional inference such as solving SAT problems. We can even perform exact lifted probabilistic inference avoiding explicit state enumeration by manipulating first-order state representations directly.

In the related paper “Lifted Inference Seen from the Other Side : The Tractable Features”, Jha, Gogate, Meliou, and Suciu (2010) reverse this notion. Here’s the abstract:

Lifted Inference algorithms for representations that combine first-order logic and graphical models have been the focus of much recent research. All lifted algorithms developed to date are based on the same underlying idea: take a standard probabilistic inference algorithm (e.g., variable elimination, belief propagation etc.) and improve its efficiency by exploiting repeated structure in the first-order model. In this paper, we propose an approach from the other side in that we use techniques from logic for probabilistic inference. In particular, we define a set of rules that look only at the logical representation to identify models for which exact efficient inference is possible. Our rules yield new tractable classes that could not be solved efficiently by any of the existing techniques.

Answer: Statistical Relational Learning. Maybe I can get the book for Christmas.

I just had to pass along this link from jwz’s blog.

Thank you to Freakonometrics for pointing me toward the book “Proofs Without Words” by Roger B. Nelsen. Might be a nice Christmas present.

NIPS was pretty fantastic this year. There were a number of breakthroughs in the areas that interest me most: Markov Decision Processes, Game Theory, Multi-Armed Bandits, and Deep Belief Networks. Here is the list of papers, workshops, and presentations I found the most interesting or potentially useful:

- Representation, Inference and Learning in Structured Statistical Models
- Stochastic Search and Optimization
- Quantum information and the Brain
- Relax and Randomize : From Value to Algorithms (Great)
- Classification with Deep Invariant Scattering Networks
- Discriminative Learning of Sum-Product Networks
- On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
- A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes
- Regularized Off-Policy TD-Learning
- Multi-Stage Multi-Task Feature Learning
- Graphical Models via Generalized Linear Models (Great)
- No voodoo here! Learning discrete graphical models via inverse covariance estimation (Great)
- Gradient Weights help Nonparametric Regressors
- Dropout: A simple and effective way to improve neural networks (Great)
- Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions
- A Better Way to Pre-Train Deep Boltzmann Machines
- Bayesian Optimization and Decision Making
- Practical Bayesian Optimization of Machine Learning Algorithms
- Modern Nonparametric Methods in Machine Learning
- Deep Learning and Unsupervised Feature Learning

The blog Computational Information Geometry Wonderland pointed me toward the article “k-MLE: A fast algorithm for learning statistical mixture models” by Frank Nielsen (2012). $k$-means can be viewed as alternating between 1) assigning points to clusters and 2) performing a maximum likelihood estimation (MLE) of the means of spherical Gaussian clusters (all of which are forced to have the same covariance matrix, equal to a scalar multiple of the identity). If we replace the spherical Gaussians with another family of distributions, we get $k$-MLE. Nielsen does a remarkably good job of introducing the reader to some complex concepts without requiring anything other than a background in probability and advanced calculus. He explores the relationships between $k$-MLE, exponential families, and information geometry. Along the way he exposes the reader to Bregman divergences, cross-entropy, Legendre duality, Itakura-Saito divergence, and Burg matrix divergence.
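The two alternating steps are easy to see in code. This is a minimal sketch of plain $k$-means, not Nielsen's $k$-MLE algorithm. The function names, the fixed iteration count, and the toy data are my own assumptions. For spherical Gaussians with a shared covariance, picking the most likely cluster is the same as picking the nearest center, and the MLE of each mean is just the cluster average.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(cluster):
    """Coordinate-wise average: the MLE of a spherical Gaussian's mean."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def assign(points, centers):
    """Step 1: give each point the index of its nearest (most likely) center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, centers, iterations=10):
    """Alternate hard assignment and per-cluster MLE a fixed number of times."""
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for j, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == j]
            # Step 2: re-estimate the mean; keep the old center if the cluster is empty.
            new_centers.append(mean(members) if members else old_center)
        centers = new_centers
    return centers, assign(points, centers)


# Toy data (hypothetical): two well-separated squares of four points each.
data = [(0, 0), (0, 2), (2, 0), (2, 2), (10, 10), (10, 12), (12, 10), (12, 12)]
final_centers, final_labels = kmeans(data, centers=[(0, 0), (12, 12)])
print(final_centers, final_labels)
```

As I read the paper's idea, moving toward $k$-MLE would mean swapping `sq_dist` for a negative log-likelihood under the chosen family and swapping `mean` for that family's maximum likelihood estimator, while the alternating loop keeps the same shape.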

Julia can be written like MATLAB, without type annotations, and it runs very fast, at nearly the speed of C, because it does *runtime* type inference and JIT compilation. Underneath, it has a sophisticated dynamic algebraic type system which can be manipulated by the programmer (much like Haskell). Carl sent me a link to this video about how the language achieves this level of type inference and type manipulation.