Neural Nets

You are currently browsing the archive for the Neural Nets category.

The linear-nonlinear-Poisson (LNP) cascade model is a standard model of neuron responses.  Louis Shao has recently shown that an artificial neural net consisting of LNP neurons can simulate any Boltzmann machine and perform “a semi-stochastic Bayesian inference algorithm lying between Gibbs sampling and variational inference.”  In his paper he notes that the “properties of visual area V2 are found to be comparable to those on the sparse autoencoder networks [3]; the sparse coding learning algorithm [4] is originated directly from neuroscience observations; also psychological phenomenon such as end-stopping is observed in sparse coding experiments [5].”



In “A Neuro-evolution Approach to General Atari Game Playing“, Hausknecht, Miikkulainen, and Stone (2013) describe and test four general game learning AIs based on evolving neural nets. They apply the AIs to sixty-one Atari 2600 games exceeding the best known human performance in three of the sixty-one games (Bowling, Kung Fu Master, and Video Pinball).  This work improves their previous Atari gaming AI described in “HyperNEAT-GGP: A HyperNEAT-based Atari General Game Player” (2012) with P. Khandelwal.

The Atari 2600 presents a uniform interface for all its games:  a 2D screen, a joystick, and one button.  The Atari 2600 games are simulated in the Arcade Learning Environment which has allowed several researchers to develop AIs for the Atari.

The four algorithms tested are:

  1. Fixed topology neural nets that adapt by changing weights between neurons
  2. Neural nets that evolve both the weights and the topology of the network (NEAT created by Stanley and Miikkulainen (2002))
  3. “Indirect encoding of the network weights”  (HyperNEAT created by Gauci and Stanley (2008)) 
  4. “A hybrid algorithm combining elements of both indirect encodings and individual weight evolution” (HybrID by Clune, Stanley, Pennock, and Ofria (2011))

All of the algorithms evolved a population of 100 individual neural nets over 150 generations mostly using the topology shown below.


For MOVIES of the AIs in action click below


T Jake Luciani wrote a nice, easy to read blog post on the recent developments in neural networks.

In “Autoencoders, MDL, and Helmholtz Free Energy“, Hinton and Zemel (2001) use Minimum Description Length as an objective function for formulating generative and recognition weights for an autoencoding neural net.  They develop a stochastic Vector Quantization method very similar to mixture of Gaussians where each input vector is encoded with

$$E_i = – \log \pi_i  – k \log t + {k\over2} \log 2 \pi \sigma^2 + {{d^2} \over{2\sigma^2}}$$

nats (1 nat = 1/log(2) bits = 1.44 bits) where $t$ is the quantization width, $d$ is the Mahalanobis distance to the mean of the Gaussian, $k$ is the dimension of the input space, $\pi_i$ is the weight of the $i$th Gaussian.  They call this the “energy” of the code.  Encoding only using this scheme wastes bits because, for example, there may be vectors that are equally distant from two Gaussian. The amount wasted is

$$H = -\sum p_i \log p_i$$

where $p_i$ the probability that the code will be assigned to the $i$th Gaussian. So the “true” expected description length is

$$F = \sum_i p_i E_i – H$$

which “has exactly the form of the Helmholtz free energy.”  This free energy is minimized by setting

$$p_i = {{e^{-E_i}}\over{\sum_j e^{-E_j}}}.$$

In order to make computation practical, they recommend using a suboptimal distributions “as a Lyapunov function for learning” (see Neal and Hinton 1993). They apply their method to learn factorial codes.


Scott Aaronson gave a wonderful talk at NIPS 2012 called “Quantum information and the Brain“.  I really enjoyed his very concise intuitive description of quantum mechanics and his sceptical yet fair attitude towards the ideas expressed by Penrose in “The Emperor’s New Mind”.

Bengio and Lecun created this wonderful video on Deep Neural Networks.  Any logical function can be represented by a neural net with 3 layers (one hidden, see e.g. CNF), however simple 4 level logical functions with a small number of nodes may require a large number of nodes in a 3 layer representation.  They point to theorems that show that the number of nodes required to represent a k level logical function can require an exponential number of nodes in a k-1 level network. They go on to explain denoising auto encoders for the training of deep neural nets.

In “Improving neural networks by preventing co-adaptation of feature detectors“, Hinton, Srivastava, Krizhevsky, Sutskever, and Salakhutdinov answer the question:  What happens if “On each presentation of each training case, each hidden unit is randomly omitted from the network with a probability of 0.5, so a hidden unit cannot rely on other hidden units being present.”  This mimics the standard technique of training several neural nets and averaging them, but it is faster.  When they applied the “dropout” technique to a deep Boltzmann neural net on the MNIST hand written digit data set and the TIMIT speech data set, they got robust learning without overfitting.  This was one of the main techniques used by the winners of the Merck Molecular Activity Challenge.

Hinton talks about the dropout technique in his video Brains, Sex, and Machine Learning.

Check out the Bookworm9765’s review of Kurzweil‘s book “How to Create a mind” on  Here is a snippet:

The core of Kurzweil’s theory is that the brain is made up of pattern processing units comprised of around 100 neurons, and he suggests that the brain can be understood and simulated primarily by looking at how these lego-like building blocks are interconnected.

It looks like Canadian Professor and Director, Centre for Theoretical Neuroscience Chris Eliasmith is having some success constructing “the world’s largest simulation of a functioning brain.”  His book titled “How to Build a Brain” expected in February.

NIPS was pretty fantastic this year.  There were a number of breakthroughs in the areas that interest me most:  Markov Decision Processes, Game Theory, Multi-Armed Bandits, and Deep Belief Networks.  Here is the list of papers, workshops, and presentations I found the most interesting or potentially useful:


  1. Representation, Inference and Learning in Structured Statistical Models
  2. Stochastic Search and Optimization
  3. Quantum information and the Brain
  4. Relax and Randomize : From Value to Algorithms  (Great)
  5. Classification with Deep Invariant Scattering Networks
  6. Discriminative Learning of Sum-Product Networks
  7. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
  8. A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes
  9. Regularized Off-Policy TD-Learning
  10. Multi-Stage Multi-Task Feature Learning
  11. Graphical Models via Generalized Linear Models (Great)
  12. No voodoo here! Learning discrete graphical models via inverse covariance estimation (Great)
  13. Gradient Weights help Nonparametric Regressors
  14. Dropout: A simple and effective way to improve neural networks (Great)
  15. Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions
  16. A Better Way to Pre-Train Deep Boltzmann Machines
  17. Bayesian Optimization and Decision Making
  18. Practical Bayesian Optimization of Machine Learning Algorithms
  19. Modern Nonparametric Methods in Machine Learning
  20. Deep Learning and Unsupervised Feature Learning
Unfortunately, when you have 30 full day workshops in a two day period, you miss most of them.  I could only attend the three listed above.  There were many other great ones.



« Older entries § Newer entries »