Neural Nets

You are currently browsing the archive for the Neural Nets category.

NIPS was pretty fantastic this year.  There were a number of breakthroughs in the areas that interest me most:  Markov Decision Processes, Game Theory, Multi-Armed Bandits, and Deep Belief Networks.  Here is the list of papers, workshops, and presentations I found the most interesting or potentially useful:

 

  1. Representation, Inference and Learning in Structured Statistical Models
  2. Stochastic Search and Optimization
  3. Quantum information and the Brain
  4. Relax and Randomize : From Value to Algorithms  (Great)
  5. Classification with Deep Invariant Scattering Networks
  6. Discriminative Learning of Sum-Product Networks
  7. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
  8. A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes
  9. Regularized Off-Policy TD-Learning
  10. Multi-Stage Multi-Task Feature Learning
  11. Graphical Models via Generalized Linear Models (Great)
  12. No voodoo here! Learning discrete graphical models via inverse covariance estimation (Great)
  13. Gradient Weights help Nonparametric Regressors
  14. Dropout: A simple and effective way to improve neural networks (Great)
  15. Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions
  16. A Better Way to Pre-Train Deep Boltzmann Machines
  17. Bayesian Optimization and Decision Making
  18. Practical Bayesian Optimization of Machine Learning Algorithms
  19. Modern Nonparametric Methods in Machine Learning
  20. Deep Learning and Unsupervised Feature Learning
Unfortunately, when you have 30 full day workshops in a two day period, you miss most of them.  I could only attend the three listed above.  There were many other great ones.

 

 

Hinton has a new Google tech talk “Brains, Sex, and Machine Learning“.  I think that if you are into neural nets, you’ve got to watch this video.  Here’s the abstract.

   Recent advances in machine learning cast new light on two puzzling biological phenomena. Neurons can use the precise time of a spike to communicate a real value very accurately, but it appears that cortical neurons do not do this. Instead they send single, randomly timed spikes. This seems like a clumsy way to perform signal processing, but a recent advance in machine learning shows that sending stochastic spikes actually works better than sending precise real numbers for the kind of signal processing that the brain needs to do. A closely related advance in machine learning provides strong support for a recently proposed theory of the function of sexual reproduction. Sexual reproduction breaks up large sets of co-adapted genes and this seems like a bad way to improve fitness. However, it is a very good way to make organisms robust to changes in their environment because it forces important functions to be achieved redundantly by multiple small sets of genes and some of these sets may still work when the environment changes. For artificial neural networks, complex co-adaptations between learned feature detectors give good performance on training data but not on new test data. Complex co-adaptations can be reduced by randomly omitting each feature detector with a probability of a half for each training case. This random “dropout” makes the network perform worse on the training data but the number of errors on the test data is typically decreased by about 10%. Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov have shown that this leads to large improvements in speech recognition and object recognition.

 

Hinton has a lot of great ideas in this video including this slide on a massively parallel approach to neural nets.

And this one

And, as mentioned in the abstract, the idea of “dropouts” is very important. (Similar to denoising.)

I wonder if the idea of dropouts can be applied to create more robust Bayesian networks / Probabilistic Graphical Models.  Maybe the same effect can be achieved by introducing a bias (regularization) against connections between edges (similar to the idea of sparsity).

 

 

In “Back Propagation in a Clifford Algebra“, Pearson and Bisset (1992) discuss the interesting problem of replacing real numbers in a neural net by elements from a Clifford Algebra.  They replace the sigmoid activation function with

$$f(x) = x / ( c + |x|/r)$$

where $c$ and $r$ are real positive constants and $|x|$ is the norm of the element in the Clifford algebra.  Deriving the back propagation algorithm is straight forward otherwise.

In a later article “Neural Networks in the Clifford Domain“, the same authors explain how complex numbers, quaternions, or Clifford algebras can convey electrical phase information between neurons which might be necessary for more accurate representation of how the brain actually works.  Also, it is possible that signal processing and image processing applications may benefit. They write,

“It is conjectured that complex valued feed-forward networks will be able to achieve better representations of problems that map into the complex domain naturally (such as phase and frequency information) than if the components of the signal were split up and presented to a real valued feed forward network.”

Andrew Ng / Stanford has a great wiki tutorial on sparse deep neural nets.

There is a great post at stack overflow comparing SVMs with Neural Nets. Check it out.

Carl and I have been discussing concepts related to sparsity in neural nets. One older idea similar to sparsity is negatively correlated or uncorrelated neural nets. To use this technique, you could train several neural nets on part or all of the data and then keep the neural networks that are less correlated. The idea used CELS algorithm is to add a correlation penalty term to the loss function of the neural net optimization process. In “Simultaneous Training of Negatively Correlated Neural Networks in an Ensemble” Liu and Yao (1999) say that

The idea behind CELS is to encourage different individual networks in an ensemble to learn different parts or aspects of a training data so that the ensemble can learn the whole training data better.

which seems similar to what happens in sparse neural networks.

Newer entries »