The NLTK Python Library contains a large number of packages for text manipulation and classification. It includes routines for classification (maximum entropy, naive Bayes, support vector machines, an interface to the Weka library, expectation maximization, k-means, conditional random fields,…), text-manipulation, parsing, and graphics.

You are currently browsing the monthly archive for **November 2012**.

In “Church: a language for generative models“, Goodman, Mansinghka, Roy, Bonawitz, and Tenenbaum introduce the probabilistic computer language “Church, a universal language for describing stochastic generative processes. Church is based on the Lisp model of lambda calculus, containing a pure Lisp as its deterministic subset.” There will be a workshop on probabilistic programming at NIPS (which I first read about at the blog Statistical Modeling, Causal Inference, and Social Science). Here is a cool tutorial.

Jacek Gondzio has some nice slides (2009) on interior point methods for large scale support vector machines. He focuses on the primal dual logarithmic barrier methods (see e.g. Wright 1987) for softer classification. Great explanations, diagrams, and numerical results are provided. Kristian Woodsend wrote his 2009 Ph.D. thesis on the same subject. Woodsend applies the interior point methods and low rank approximations of the SVM kernel to reduce the computational cost to order $n$ where $n$ is the number of data points. He compares this approach to active set methods, gradient projection algorithms, and cutting-plane algorithms and concludes with numerical results.

In “Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity“, Ailon (2011) presents a method for ordering a set based on noisy comparisons of elements. He proves that his adaptive method approximates the NP Hard optimal solution (Alon 2006) to within $(1+\epsilon)$ times the minimum error after $f(\epsilon^{-1}, n)$ comparisons where $f$ is a polynomial. Ailon builds on the work of Kenyon-Mathieu and Schudy (2007) who also developed a polynomial time algorithm for approximate ranking. The Kenyon-Mathieu and Schudy algorithm was not query efficient.

In the seminal paper “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data“, Lafferty, McCallum, Pereira (2001) introduce a very popular type of Markov random field for segmentation. Conditional Random Fields (CRFs) are used in many fields including machine translation, parsing, genetics, and transmission codes. They are a non-directed version of Hidden Markov Networks. The paper describes conditional random fields, provides an iterative method to estimate the parameters of the CRF, and reports experimental comparisons between CRFs, hidden Markov models, and maximum entropy Markov models.

In “Back Propagation in a Clifford Algebra“, Pearson and Bisset (1992) discuss the interesting problem of replacing real numbers in a neural net by elements from a Clifford Algebra. They replace the sigmoid activation function with

$$f(x) = x / ( c + |x|/r)$$

where $c$ and $r$ are real positive constants and $|x|$ is the norm of the element in the Clifford algebra. Deriving the back propagation algorithm is straight forward otherwise.

In a later article “Neural Networks in the Clifford Domain“, the same authors explain how complex numbers, quaternions, or Clifford algebras can convey electrical phase information between neurons which might be necessary for more accurate representation of how the brain actually works. Also, it is possible that signal processing and image processing applications may benefit. They write,

“It is conjectured that complex valued feed-forward networks will be able to achieve better representations of problems that map into the complex domain naturally (such as phase and frequency information) than if the components of the signal were split up and presented to a real valued feed forward network.”

I was reading “Around the Blogs in 80 Summer Hours” at Nuit Blanche and these two links caught my eye:

Implementation: BiLinear Modelling Via Augmented Lagrange Multipliers (BLAM)

In “Algorithms for Infinitely Many-Armed Bandits”, Wang, Audibert, and Munos (2008) describe some algorithms for the multi-armed bandit problem when a large number or infinitely many arms are available. Their algorithms are designed for the situation where all rewards are contained in $[0,1]$ and “the probability that a new arm is $\epsilon$-optimal is of order $\epsilon^\beta$”. More precisely, there exist real numbers $c, \mu^*,$ and $\beta$ such that the expected value of an unexplored arm $\mu$ obeys

$$P(\mu^* – \mu < \epsilon) < c \epsilon^\beta.$$

They prove that the total regret is at most of order $n^{\beta/(\beta+1)}\log^2(n)$ if $\beta > 1$ and $\log^2(n)\sqrt{n}$ otherwise. Additionally, they prove a lower bound of order $n^{\beta / (\beta + 1)}$ for any algorithm. Their algorithm applies UCB to the first $n^{\beta/(\beta+1)}$ arms. (The case where $\beta = 1$ was explored in “Bandit problems with infinitely many arms” by Berry, Chen, Zame, Heath, and Shepp (1997).)