December 2012

You are currently browsing the monthly archive for December 2012.

It looks like Canadian Professor and Director, Centre for Theoretical Neuroscience Chris Eliasmith is having some success constructing “the world’s largest simulation of a functioning brain.”  His book titled “How to Build a Brain” expected in February.

In the widely cited paper “Rapid object detection using a boosted cascade of simple features“, Viola and Jones (CVPR 2001) apply “Harr-like” features and AdaBoost to a fast “cascade” of increasingly complex image classifiers (mostly facial recognition).   They write, “The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest.” The Harr-like decomposition quickly (constant time) creates mostly localized features and AdaBoost learns quickly so the combination is fast. They report, “In the domain of face detection it is possible to achieve fewer than 1% false negatives and 40% false positives using a classifier constructed from two Harr-like features.” [emphasis added]

At the top 500 website, I notice that the main CPUs are made only by four companies: IBM, Intel, AMD, and Nvidia.  HP was squeezed out in 2008, leaving only four players.  It makes me wonder if the trend toward fewer manufacturers will continue.  Also, the both the #1 super computer and #500 did not keep up with the general trendline over the last two or three years.  On the other hand, the average computational power of the top 500 has stayed very close to the trendline which increases by a factor of 1.8 every year.

Lifted Inference uses the rules of first order predicate logic to improve the speed of the standard Markov Random Field algorithms applied to Markov Logic Networks.  I wish I had been in Barcelona Spain in July last year for IJCAI11 because they had a cool tutorial on Lifted Inference.  Here’s a quote

Much has been achieved in the field of AI, yet much remains to be done if we are to reach the goals we all imagine. One of the key challenges with moving ahead is closing the gap between logical and statistical AI. Recent years have seen an explosion of successes in combining probability and (subsets of) first-order logic respectively programming languages and databases in several subfields of AI: Reasoning, Learning, Knowledge Representation, Planning, Databases, NLP, Robotics, Vision, etc. Nowadays, we can learn probabilistic relational models automatically from millions of inter-related objects. We can generate optimal plans and learn to act optimally in uncertain environments involving millions of objects and relations among them. Exploiting shared factors can speed up message-passing algorithms for relational inference but also for classical propositional inference such as solving SAT problems. We can even perform exact lifted probabilistic inference avoiding explicit state enumeration by manipulating first-order state representations directly.

In the related paper “Lifted Inference Seen from the Other Side : The Tractable Features“, Jha, Gogate, Meliou, Suciu (2010) reverse this notion.  Here’s the abstract:

Lifted Inference algorithms for representations that combine first-order logic and graphical models have been the focus of much recent research. All lifted algorithms developed to date are based on the same underlying idea: take a standard probabilistic inference algorithm (e.g., variable elimination, belief propagation etc.) and improve its efficiency by exploiting repeated structure in the first-order model. In this paper, we propose an approach from the other side in that we use techniques from logic for probabilistic inference. In particular, we define a set of rules that look only at the logical representation to identify models for which exact efficient inference is possible. Our rules yield new tractable classes that could not be solved efficiently by any of the existing techniques.

Answer:  Statistical Relational Learning.  Maybe I can get the book for Christmas.

I just had to pass along this link from jwz’s blog.

Thank you to Freakonometrics for pointing me toward the book “Proofs without words” by Rodger Nelson.  Might be a nice Christmas present  :)

NIPS was pretty fantastic this year.  There were a number of breakthroughs in the areas that interest me most:  Markov Decision Processes, Game Theory, Multi-Armed Bandits, and Deep Belief Networks.  Here is the list of papers, workshops, and presentations I found the most interesting or potentially useful:


  1. Representation, Inference and Learning in Structured Statistical Models
  2. Stochastic Search and Optimization
  3. Quantum information and the Brain
  4. Relax and Randomize : From Value to Algorithms  (Great)
  5. Classification with Deep Invariant Scattering Networks
  6. Discriminative Learning of Sum-Product Networks
  7. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
  8. A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes
  9. Regularized Off-Policy TD-Learning
  10. Multi-Stage Multi-Task Feature Learning
  11. Graphical Models via Generalized Linear Models (Great)
  12. No voodoo here! Learning discrete graphical models via inverse covariance estimation (Great)
  13. Gradient Weights help Nonparametric Regressors
  14. Dropout: A simple and effective way to improve neural networks (Great)
  15. Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions
  16. A Better Way to Pre-Train Deep Boltzmann Machines
  17. Bayesian Optimization and Decision Making
  18. Practical Bayesian Optimization of Machine Learning Algorithms
  19. Modern Nonparametric Methods in Machine Learning
  20. Deep Learning and Unsupervised Feature Learning
Unfortunately, when you have 30 full day workshops in a two day period, you miss most of them.  I could only attend the three listed above.  There were many other great ones.



The blog Computational Information Geometry Wonderland pointed me toward the article “k-MLE: A fast algorithm for learning statistical mixture models” by Frank Nielsen (2012).  $k$-means can be viewed as alternating between 1) assigning points to clusters and 2) performing a maximum likelihood estimation (MLE) of the mean of spherical Gaussians clusters (all of which are forced to have the same covariance matrix equal to a scalar multiple of the identity).  If we replace the spherical Gaussian with another set of distributions, we get $k$-MLE.  Nielsen does a remarkably good job of introducing the reader to some complex concepts without requiring anything other than a background in probability and advance calculus.  He explores the relationships between $k$-MLE with exponential families and information geometry.  Along the way he exposes the reader to Bregman divergences,  cross-entropy,  Legendre dualityItakura-Saito divergence, and Burg matrix divergence.

Julia can be written like Malab without typing information and it runs very fast, at nearly the speed of C, because it does runtime type inference and JIT compilation. Underneath it has sophisticated dynamic algebraic typing system which can be manipulated by the programmer (much like Haskell).  Carl sent me a link to this video about how the language achieves this level of type inference and type manipulation.

« Older entries