# Graphical Models

You are currently browsing the archive for the Graphical Models category.

## What happens when you combine Relational Databases, Logic, and Machine Learning?

Answer:  Statistical Relational Learning.  Maybe I can get the book for Christmas.

## The 20 most striking papers, workshops, and presentations from NIPS 2012

NIPS was pretty fantastic this year.  There were a number of breakthroughs in the areas that interest me most:  Markov Decision Processes, Game Theory, Multi-Armed Bandits, and Deep Belief Networks.  Here is the list of papers, workshops, and presentations I found the most interesting or potentially useful:

Unfortunately, when you have 30 full day workshops in a two day period, you miss most of them.  I could only attend the three listed above.  There were many other great ones.

## “Semantic Hashing”

In “Semantic Hashing“, Salakhutdinov and Hinton (2007) show how to classify documents with binary vectors.  They combine deep learning and graphical models to assign each document a binary vector.  Similar documents can be found by using the L1 difference between the binary vectors.  Here is their abstract.

We show how to learn a deep graphical model of the word-count vectors obtained from a large set of documents. The values of the latent variables in the deepest layer are easy to infer and give a much better representation of each document than Latent Semantic Analysis. When the deepest layer is forced to use a small number of binary variables (e.g. 32), the graphical model performs “semantic hashing”: Documents are mapped to memory addresses in such away that semantically similar documents are located at nearby addresses. Documents similar to a query document can then be found by simply accessing all the addresses that differ by only a few bits from the address of the query document. This way of extending the efficiency of hash-coding to approximate matching is much faster than locality sensitive hashing, which is the fastest current method. By using semantic hashing to ﬁlter the documents given to TF-IDF, we achieve higher accuracy than applying TF-IDF to the entire document set.

## “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data”

In the seminal paper “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data“, Lafferty, McCallum, Pereira (2001) introduce a very popular type of Markov random field for segmentation. Conditional Random Fields (CRFs) are used in many fields including machine translation, parsing, genetics, and transmission codes.  They are a non-directed version of Hidden Markov Networks.  The paper describes conditional random fields, provides an iterative method to estimate the parameters of the CRF, and reports experimental comparisons between CRFs, hidden Markov models, and maximum entropy Markov models.

## Sparse Kalman Filters

Nuit Blanche pointed me toward the interesting paper “Re-Weighted L1 Dynamic Filtering for Time-Varying Sparse Signal Estimation” by Charles and Rozell (2012). They investigate the generalized Kalman filter situation which is a hidden Markov model with an infinite number of states and real valued observations. More specifically
$$x_n = f_n (x_{n-1}) + v_n$$

$$y_n = G_n x_{n-1} + \epsilon_n$$

where $x_n$ is the state, $f_n$ is the known state transition function, $y_n$ is a real valued observation vector, $G_n$ is the known observation matrix (full rank), $v_n$ is the process vector noise (not necessarily Gaussian), and $\epsilon_n$ is the observation noise vector (once again, not necessarily Gaussian). The object of the game is to estimate $x_n$ given the transition function, the observation matrices, and the observations $y_n$.

They sparsely represent the state vector $x_n$ is by $x_n = W z_n$ where $W$ is a matrix and $z_n$ is a vector that is mostly zeros. In order to achieve a sparse representation, a L1 norm regularization term is applied. This regularization usually corresponds to a Laplacian prior as in Bias pursuit denoising or LASSO regression. As with neural nets, sparsity has had an impact in many fields. They write

In particular, outside of the dynamic estimation regime, signal priors based on the notion of sparsity have led to state-of-the-art performance in many linear inverse problems (e.g., undersampling, inpainting, denoising, compressed sensing, etc. [3]) across several different application domains (e.g. natural images [3], audio [4], hyperspectral imagery [5], etc.).

The previous work integrating sparsity models into dynamic estimation has not yet followed the spirit
of the successful Kalman ﬁlter by propagating higher-order statistics from one time step to the next in a
way that is well-matched to the assumptions of the problem.

As is pointed out in [8], propagating a covariance matrix is no longer necessarily the correct or optimal method once the
assumptions of the Kalman ﬁlter are not met. Instead, such a matrix is no more special than a simple parameter matrix, which
may or may not assist in the signal estimation.”

The paper includes several experimental results in section IV.