Igor Carron provides a nice history of Compressed Sensing on his blog.  He reviews the computational complexity of various Compressed Sensing algorithms, noting that the field started with the discovery of polynomial-time solutions to underdetermined linear problems.  While $l_0$ regularization leads to combinatorial problems that are NP-hard in general, $l_1$ regularization is a convex relaxation that can be solved in polynomial time (e.g., by linear programming).
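
As a small illustration of why the $l_1$ relaxation is tractable, here is a minimal sketch of sparse recovery from an underdetermined system, using scikit-learn's Lasso as a stand-in for basis pursuit; the problem sizes and the regularization weight are arbitrary choices for the example, not anything taken from Carron's post.

```python
# Minimal sparse-recovery sketch: y = A x with more unknowns than measurements.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
m, n, k = 60, 200, 8                            # 60 measurements, 200 unknowns, 8 nonzeros

A = rng.standard_normal((m, n)) / np.sqrt(m)    # random sensing matrix
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x_true                                  # underdetermined, noiseless measurements

# l1-regularized least squares is convex, hence solvable in polynomial time.
lasso = Lasso(alpha=1e-3, fit_intercept=False, max_iter=50_000)
lasso.fit(A, y)
x_hat = lasso.coef_

print("relative recovery error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```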

If you want hits on your blog, write about an article that is being read by thousands or millions of people.  Some of those readers will Google terms from the article.  Today I blogged very briefly about the NY Times article “Scientists See Promise in Deep-Learning Programs”, and I was surprised at the number of referrals from Google.  Hmm, maybe I should blog about current TV shows (Big Bang? Mythbusters?), movies (Cloud Atlas?), and video games (Call of Duty?).  Carl suggested that I apply deep learning, support vector machines, and logistic regression to estimate the number of hits I will get on a post.  If I used restricted Boltzmann machines, I could run them in generative mode to create articles :)  I was afraid that if I went down that route I would eventually end up with a fantastically popular blog about Britney Spears.

John Markoff at the NY Times writes about recent advances in and implications of deep neural nets in “Scientists See Promise in Deep-Learning Programs“.  He reviews the recent activities of Hinton, LeCun, and Ng, as well as applications of deep learning in speech recognition, computer vision, chemistry (the Kaggle Merck Prize), and near real-time natural language processing and machine translation.

Gauss did a lot of math, but sometimes I am still surprised when I find something done by Gauss that is new to me, such as this continued fraction for the tangent:

$\tan(x) = \cfrac{1}{1/x - \cfrac{1}{3/x - \cfrac{1}{5/x - \cdots}}}$
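
Multiplying the numerators and denominators through by $x$ gives the more familiar form $\tan(x) = \cfrac{x}{1 - \cfrac{x^2}{3 - \cfrac{x^2}{5 - \cdots}}}$.  Here is a quick numerical sanity check of the truncated fraction as written above; the truncation depth and test points are arbitrary choices.

```python
# Evaluate the truncated continued fraction for tan(x) from the innermost term outward.
import math

def tan_cf(x, depth=20):
    """Truncation of 1/(1/x - 1/(3/x - 1/(5/x - ...))) after `depth` terms."""
    value = (2 * depth - 1) / x                 # innermost partial denominator
    for k in range(depth - 1, 0, -1):
        value = (2 * k - 1) / x - 1 / value
    return 1 / value

for x in (0.5, 1.0, 1.3):
    print(x, tan_cf(x), math.tan(x))            # the two columns agree closely
```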

In “Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery“, Niekum and Barto (2011) apply Sub-goal Discovery and Skill Discovery to improve Reinforcement Learning on Markov Decision Processes.  They write,

“Skill discovery algorithms in reinforcement learning typically identify single states or regions in state space that correspond to task-specific subgoals. However, such methods do not directly address the question of how many distinct skills are appropriate for solving the tasks that the agent faces. This can be highly inefficient when many identified subgoals correspond to the same underlying skill, but are all used individually as skill goals. For example, opening a door ought to be the same skill whether an agent is one inch or two inches away from the door, or whether the door is red or blue; making each possible configuration a separate skill would be unwise. Furthermore, skills created in this manner are often only transferable to tasks that share identical state spaces, since corresponding subgoals across tasks are not merged into a single skill goal.

We show that these problems can be overcome by collecting subgoal data from a series of tasks and clustering it in an agent-space [9], a shared feature space across multiple tasks. The resulting clusters generalize subgoals within and across tasks and can be used as templates for portable skill termination conditions. Clustering also allows the creation of skill termination conditions in a data-driven way that makes minimal assumptions and can be tailored to the domain through a careful choice of clustering algorithm. Additionally, this framework extends the utility of single-state subgoal discovery algorithms to continuous domains, in which the agent may never see the same state twice. We argue that clustering based on a Dirichlet process mixture model is appropriate in the general case when little is known about the nature or number of skills needed in a domain. Experiments in a continuous domain demonstrate the utility of this approach and illustrate how it may be useful even when traditional subgoal discovery methods are infeasible.”

They apply Dirichlet process mixture models (more specifically, an infinite Gaussian mixture model with a Dirichlet process prior over the number of Gaussian components and their parameters) using expectation-maximization to cluster the states, with each cluster corresponding to a stratagem referred to as a “latent option”.  Experimental results are reported in an environment similar to the Lightworld domain (see “Building Portable Options: Skill Transfer in Reinforcement Learning” by Konidaris and Barto (2007)).
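
For readers who want to experiment with the clustering idea, here is a minimal sketch using scikit-learn's truncated Dirichlet process Gaussian mixture, which is fit by variational inference rather than the machinery used in the paper; the synthetic two-dimensional “agent-space” subgoal data and the truncation level of 10 components are made-up illustrative choices.

```python
# Cluster synthetic subgoal observations with a (truncated) Dirichlet process GMM.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# Pretend three underlying skills generate subgoal observations in agent-space.
subgoals = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(3.0, 1.0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(1.0, 4.0), scale=0.3, size=(50, 2)),
])

dpgmm = BayesianGaussianMixture(
    n_components=10,                                     # truncation level, not the true K
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    random_state=0,
).fit(subgoals)

labels = dpgmm.predict(subgoals)
print("clusters actually used:", np.unique(labels))      # typically close to 3
```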

 

Topological Sorting

Given a directed acyclic graph (DAG), find an ordering of the nodes that is consistent with the directed edges.  This is called topological sorting, and several algorithms exist.
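
Here is a minimal sketch of one such algorithm, Kahn's algorithm, which repeatedly removes nodes with no remaining incoming edges; the adjacency-list representation and the example DAG are just illustrative choices.

```python
# Topological sort of a DAG given as {node: [successor, ...]} (Kahn's algorithm).
from collections import deque

def topological_sort(graph):
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] = indegree.get(v, 0) + 1
    queue = deque(u for u, d in indegree.items() if d == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph.get(u, ()):
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle")
    return order

# Edges point from prerequisite to dependent node.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(topological_sort(dag))                     # e.g. ['a', 'b', 'c', 'd']
```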

In “Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent”, Press and Dyson (2012) show that there are strategies, named “zero determinant” (ZD) strategies, which “win” against so-called evolutionary strategies in the Prisoner’s Dilemma.  An easy-to-read, less mathematical summary of the paper is given in “Extortion and cooperation in the Prisoner’s Dilemma” by Stewart and Plotkin.  Fans of Tit-for-Tat need not fear, because Tit-for-Tat does not lose significantly against the ZD strategies.  The idea behind the ZD strategies is not very complicated: the ZD player cooperates enough that it is profitable to cooperate with the ZD player, but defects enough that 1) it is not profitable to defect against the ZD player, and 2) the ZD player wins more than its opponent when the opponent cooperates.
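
To see the extortion mechanism numerically, here is a minimal simulation sketch: the standard payoffs $(T, R, P, S) = (5, 3, 1, 0)$, an extortionate ZD strategy with extortion factor 3 obtained from the parametrization in Press and Dyson's paper, and an opponent who simply cooperates 70% of the time (the opponent and the number of rounds are arbitrary illustrative choices).  Over a long run, the ZD player's average payoff above $P$ comes out to roughly three times the opponent's.

```python
# Iterated Prisoner's Dilemma: ZD extortioner vs. a 70%-random cooperator.
import random

T, R, P, S = 5, 3, 1, 0
# ZD player's probability of cooperating given last round's (my move, their move).
zd = {("C", "C"): 11/13, ("C", "D"): 1/2, ("D", "C"): 7/26, ("D", "D"): 0.0}
payoffs = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

random.seed(0)
rounds, zd_total, opp_total = 200_000, 0, 0
last = ("C", "C")                                # arbitrary memory for the first round
for _ in range(rounds):
    zd_move = "C" if random.random() < zd[last] else "D"
    opp_move = "C" if random.random() < 0.7 else "D"
    zd_total += payoffs[(zd_move, opp_move)]
    opp_total += payoffs[(opp_move, zd_move)]
    last = (zd_move, opp_move)

s_x, s_y = zd_total / rounds, opp_total / rounds
# Extortion relation from the paper: (s_x - P) is about 3 * (s_y - P).
print(s_x - P, 3 * (s_y - P))
```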

The NLTK Python library contains a large number of packages for text manipulation and classification.  It includes routines for classification (maximum entropy, naive Bayes, support vector machines, an interface to the Weka library, expectation maximization, k-means, conditional random fields, …), text manipulation, parsing, and graphics.
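
For example, training NLTK's naive Bayes classifier takes only a few lines; the toy sentiment data and the bag-of-words feature extractor below are made up purely for illustration.

```python
# Tiny naive Bayes text classification example with NLTK.
import nltk

def features(sentence):
    return {word.lower(): True for word in sentence.split()}   # bag-of-words presence features

train = [
    (features("a great and enjoyable movie"), "pos"),
    (features("wonderful acting and a great plot"), "pos"),
    (features("a dull and boring film"), "neg"),
    (features("terrible plot and boring acting"), "neg"),
]

classifier = nltk.NaiveBayesClassifier.train(train)
print(classifier.classify(features("a boring movie")))         # likely 'neg'
classifier.show_most_informative_features(5)
```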

In “Church: a language for generative models”, Goodman, Mansinghka, Roy, Bonawitz, and Tenenbaum introduce the probabilistic programming language Church, “a universal language for describing stochastic generative processes. Church is based on the Lisp model of lambda calculus, containing a pure Lisp as its deterministic subset.”  There will be a workshop on probabilistic programming at NIPS (which I first read about at the blog Statistical Modeling, Causal Inference, and Social Science).  Here is a cool tutorial.

Jacek Gondzio has some nice slides (2009) on interior point methods for large-scale support vector machines.  He focuses on primal-dual logarithmic barrier methods (see e.g. Wright 1987) for soft-margin classification.  Great explanations, diagrams, and numerical results are provided.  Kristian Woodsend wrote his 2009 Ph.D. thesis on the same subject.  Woodsend applies interior point methods and low-rank approximations of the SVM kernel to reduce the computational cost to order $n$, where $n$ is the number of data points.  He compares this approach to active set methods, gradient projection algorithms, and cutting-plane algorithms, and concludes with numerical results.
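
This is not Woodsend's interior point solver, but a minimal sketch of the low-rank kernel idea using scikit-learn's Nyström approximation followed by a linear SVM, which keeps training roughly linear in the number of data points; the dataset, kernel width, and rank are arbitrary illustrative choices.

```python
# Low-rank (Nystroem) approximation of an RBF kernel feeding a linear SVM.
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=20_000, n_features=30, random_state=0)

model = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.1, n_components=200, random_state=0),  # rank-200 kernel approximation
    LinearSVC(C=1.0, dual=False),
)
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```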
