Deep Belief Networks


In “Improving neural networks by preventing co-adaptation of feature detectors”, Hinton, Srivastava, Krizhevsky, Sutskever, and Salakhutdinov answer the question: what happens if, “on each presentation of each training case, each hidden unit is randomly omitted from the network with a probability of 0.5, so a hidden unit cannot rely on other hidden units being present”? This mimics the standard technique of training several neural nets and averaging them, but it is faster. When they applied the “dropout” technique to deep neural nets on the MNIST handwritten digit data set and the TIMIT speech data set, they got robust learning without overfitting. Dropout was also one of the main techniques used by the winners of the Merck Molecular Activity Challenge.
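
Roughly, the mechanism looks like this in code. This is only a toy NumPy sketch of one hidden layer with dropout; the ReLU units, layer sizes, and names are my own choices, not the authors’ setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, W1, b1, W2, b2, p_drop=0.5, train=True):
    """Forward pass through one hidden layer with dropout on the hidden units."""
    h = np.maximum(0.0, x @ W1 + b1)          # hidden activations (ReLU, for simplicity)
    if train:
        keep = rng.random(h.shape) >= p_drop  # each hidden unit survives with probability 1 - p_drop
        h = h * keep                          # omitted units contribute nothing on this presentation
    else:
        h = h * (1.0 - p_drop)                # at test time, scale activations to their expected value
    return h @ W2 + b2

# toy usage: 4 inputs -> 8 hidden units -> 2 outputs
x = rng.standard_normal((1, 4))
W1, b1 = rng.standard_normal((4, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 2)), np.zeros(2)
print(dropout_forward(x, W1, b1, W2, b2, train=True))
```

Because a different random subset of hidden units is active on each presentation, no unit can count on any particular partner being there, which is exactly the co-adaptation the paper is trying to prevent.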

Hinton talks about the dropout technique in his video Brains, Sex, and Machine Learning.

NIPS was pretty fantastic this year.  There were a number of breakthroughs in the areas that interest me most:  Markov Decision Processes, Game Theory, Multi-Armed Bandits, and Deep Belief Networks.  Here is the list of papers, workshops, and presentations I found the most interesting or potentially useful:

 

  1. Representation, Inference and Learning in Structured Statistical Models
  2. Stochastic Search and Optimization
  3. Quantum information and the Brain
  4. Relax and Randomize: From Value to Algorithms (Great)
  5. Classification with Deep Invariant Scattering Networks
  6. Discriminative Learning of Sum-Product Networks
  7. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
  8. A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes
  9. Regularized Off-Policy TD-Learning
  10. Multi-Stage Multi-Task Feature Learning
  11. Graphical Models via Generalized Linear Models (Great)
  12. No voodoo here! Learning discrete graphical models via inverse covariance estimation (Great)
  13. Gradient Weights help Nonparametric Regressors
  14. Dropout: A simple and effective way to improve neural networks (Great)
  15. Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions
  16. A Better Way to Pre-Train Deep Boltzmann Machines
  17. Bayesian Optimization and Decision Making
  18. Practical Bayesian Optimization of Machine Learning Algorithms
  19. Modern Nonparametric Methods in Machine Learning
  20. Deep Learning and Unsupervised Feature Learning

Unfortunately, when you have 30 full-day workshops in a two-day period, you miss most of them. I could only attend the three listed above. There were many other great ones.

 

 

In “Semantic Hashing”, Salakhutdinov and Hinton (2007) show how to represent documents with short binary codes. They combine deep learning and graphical models to assign each document a binary vector. Similar documents can then be found by comparing the binary vectors with the Hamming (L1) distance. Here is their abstract.

We show how to learn a deep graphical model of the word-count vectors obtained from a large set of documents. The values of the latent variables in the deepest layer are easy to infer and give a much better representation of each document than Latent Semantic Analysis. When the deepest layer is forced to use a small number of binary variables (e.g. 32), the graphical model performs “semantic hashing”: Documents are mapped to memory addresses in such a way that semantically similar documents are located at nearby addresses. Documents similar to a query document can then be found by simply accessing all the addresses that differ by only a few bits from the address of the query document. This way of extending the efficiency of hash-coding to approximate matching is much faster than locality sensitive hashing, which is the fastest current method. By using semantic hashing to filter the documents given to TF-IDF, we achieve higher accuracy than applying TF-IDF to the entire document set.
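
The lookup step is easy to picture in code. Here is a brute-force sketch of finding all documents whose 32-bit codes differ from a query code in at most a couple of bits; the paper instead treats codes as memory addresses and enumerates the nearby addresses directly, and the function and variable names below are mine:

```python
import numpy as np

def hamming_neighbors(query_code, doc_codes, max_bits=2):
    """Indices of documents whose binary code differs from the query code
    in at most max_bits positions (brute force, for illustration only)."""
    distances = np.count_nonzero(doc_codes != query_code, axis=1)
    return np.flatnonzero(distances <= max_bits)

# toy example: 1000 random 32-bit codes, matching the abstract's 32-bit setting
rng = np.random.default_rng(1)
doc_codes = rng.integers(0, 2, size=(1000, 32))
query_code = doc_codes[0]
print(hamming_neighbors(query_code, doc_codes, max_bits=2))
```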

Hinton has a new Google tech talk “Brains, Sex, and Machine Learning”. I think that if you are into neural nets, you’ve got to watch this video. Here’s the abstract.

   Recent advances in machine learning cast new light on two puzzling biological phenomena. Neurons can use the precise time of a spike to communicate a real value very accurately, but it appears that cortical neurons do not do this. Instead they send single, randomly timed spikes. This seems like a clumsy way to perform signal processing, but a recent advance in machine learning shows that sending stochastic spikes actually works better than sending precise real numbers for the kind of signal processing that the brain needs to do. A closely related advance in machine learning provides strong support for a recently proposed theory of the function of sexual reproduction. Sexual reproduction breaks up large sets of co-adapted genes and this seems like a bad way to improve fitness. However, it is a very good way to make organisms robust to changes in their environment because it forces important functions to be achieved redundantly by multiple small sets of genes and some of these sets may still work when the environment changes. For artificial neural networks, complex co-adaptations between learned feature detectors give good performance on training data but not on new test data. Complex co-adaptations can be reduced by randomly omitting each feature detector with a probability of a half for each training case. This random “dropout” makes the network perform worse on the training data but the number of errors on the test data is typically decreased by about 10%. Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov have shown that this leads to large improvements in speech recognition and object recognition.

 

Hinton presents a lot of great ideas in this video, including a slide on a massively parallel approach to neural nets.

And, as mentioned in the abstract, the idea of “dropout” is very important. (It is similar to denoising.)

I wonder if the idea of dropout can be applied to create more robust Bayesian networks / probabilistic graphical models. Maybe the same effect could be achieved by introducing a bias (regularization) against edges, similar to the idea of sparsity; a rough sketch of that kind of penalty is below.
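
To make that concrete, the kind of bias I have in mind is an L1 penalty on the edge weights, as used in sparse inverse covariance estimation (e.g. the graphical lasso). This is purely a hypothetical sketch of the analogy, not anything from the dropout work, and the function names are mine:

```python
import numpy as np

def l1_edge_penalty(W, lam=0.1):
    """L1 penalty on the off-diagonal entries of a (square) weight/precision
    matrix; driving an entry to zero removes the corresponding edge."""
    off_diag = W - np.diag(np.diag(W))
    return lam * np.abs(off_diag).sum()

def penalized_objective(neg_log_likelihood, W, lam=0.1):
    """Fit objective = model-fit term + a bias against having many edges."""
    return neg_log_likelihood + l1_edge_penalty(W, lam)
```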

 

 

John Markoff at the NY Times writes about recent advances in and implications of deep neural nets in “Scientists See Promise in Deep-Learning Programs”. He reviews the recent activities of Hinton, LeCun, and Ng as well as applications of deep learning in speech recognition, computer vision, chemistry (the Kaggle Merck Prize), and near real-time natural language processing and machine translation.

In “Church: a language for generative models”, Goodman, Mansinghka, Roy, Bonawitz, and Tenenbaum introduce the probabilistic programming language Church, “a universal language for describing stochastic generative processes. Church is based on the Lisp model of lambda calculus, containing a pure Lisp as its deterministic subset.” There will be a workshop on probabilistic programming at NIPS (which I first read about at the blog Statistical Modeling, Causal Inference, and Social Science). Here is a cool tutorial.
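
To give a flavor of what “describing a stochastic generative process” means: Church itself uses a Scheme-like syntax, so the following is only a plain-Python analogue of the idea, with names of my own choosing.

```python
import random

def flips_until_heads(p=0.5):
    """A tiny stochastic generative process: flip a coin until it comes up
    heads and return the number of tails seen first. Running the program is
    sampling from the distribution it defines; a probabilistic language like
    Church additionally lets you condition on the outcome of such a program."""
    tails = 0
    while random.random() > p:
        tails += 1
    return tails

samples = [flips_until_heads(0.5) for _ in range(10000)]
print(sum(samples) / len(samples))  # should be close to (1 - p) / p = 1.0
```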

Andrew Ng / Stanford has a great wiki tutorial on sparse deep neural nets.

“How Many Computers to Identify a Cat? 16,000” is a New York Times article about Deep Belief Networks, “the hottest thing in the speech recognition field these days.”
