November 2012

You are currently browsing the monthly archive for November 2012.

“Brains, Sex, and Machine Learning”

November 30, 2012 in Deep Belief Networks, Neural Nets by hundalhh | Permalink

Hinton has a new Google tech talk “Brains, Sex, and Machine Learning“. I think that if you are into neural nets, you’ve got to watch this video. Here’s the abstract.

Recent advances in machine learning cast new light on two puzzling biological phenomena. Neurons can use the precise time of a spike to communicate a real value very accurately, but it appears that cortical neurons do not do this. Instead they send single, randomly timed spikes. This seems like a clumsy way to perform signal processing, but a recent advance in machine learning shows that sending stochastic spikes actually works better than sending precise real numbers for the kind of signal processing that the brain needs to do. A closely related advance in machine learning provides strong support for a recently proposed theory of the function of sexual reproduction. Sexual reproduction breaks up large sets of co-adapted genes and this seems like a bad way to improve fitness. However, it is a very good way to make organisms robust to changes in their environment because it forces important functions to be achieved redundantly by multiple small sets of genes and some of these sets may still work when the environment changes. For artificial neural networks, complex co-adaptations between learned feature detectors give good performance on training data but not on new test data. Complex co-adaptations can be reduced by randomly omitting each feature detector with a probability of a half for each training case. This random “dropout” makes the network perform worse on the training data but the number of errors on the test data is typically decreased by about 10%. Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov have shown that this leads to large improvements in speech recognition and object recognition.

Hinton has a lot of great ideas in this video including this slide on a massively parallel approach to neural nets.

And this one

And, as mentioned in the abstract, the idea of “dropouts” is very important. (Similar to denoising.)

I wonder if the idea of dropouts can be applied to create more robust Bayesian networks / Probabilistic Graphical Models. Maybe the same effect can be achieved by introducing a bias (regularization) against connections between edges (similar to the idea of sparsity).

Matroids

November 28, 2012 in Math by hundalhh | 2 comments

Matroids are an abstraction of Vector Spaces. Vectors spaces have addition and scalar multiplication and the axioms of vectors spaces lead to the concepts of subspaces, rank, and independent sets. All Vector Spaces are Matroids, but Matroids only retain the properties of independence, rank, and subspaces. The idea of the Matroid closure of a set is the same as the span of the set in the Vector Space.

100 Most useful Theorems and Ideas in Mathematics

November 27, 2012 in Math by hundalhh | 6 comments

So I have been thinking about which ideas in mathematics I use most often and I have listed them below. Please feel free to comment because I would like to use this list as a starting point for a future list of “The most useful ideas in Mathematics for Scientists and Engineers”.

$\ $

I showed this list to Carl and he said his list would be completely different. (He choked a bit when he saw primes at #71.) Hopefully we can post his list as well as others.

$\ $

counting
zero
integer decimal positional notation 100, 1000, …
the four arithmetic operations + – * /
fractions
decimal notation 0.1, 0.01, …
basic propositional logic (Modus ponens, contrapositive, If-then, and, or, nand, …)
negative numbers
equivalence classes
equality & substitution
basic algebra – idea of variables, equations, …
the idea of probability
commutative and associative properties
distributive property
powers (squared, cubed,…), – compound interest (miracle of)
scientific notation 1.3e6 = 1,300,000
polynomials
first order predicate logic
infinity
irrational numbers
De Morgan’s laws
statistical independence
the notion of a function
square root (cube root, …)
inequalities (list of inequalities)
power laws (i.e. $a^b a^c = a^{b+c}$ )
Cartesian coordinate plane
basic set theory
random variable
probability distribution
histogram
the mean, expected value & strong law of large numbers
the graph of a function
standard deviation
Pythagorean theorem
vectors and vector spaces
limits
real numbers as limits of fractions, the least upper bound
continuity
$R^n$, Euclidean Space, and Hilbert spaces (inner or dot product)
derivative
correlation
central limit theorem, Gaussian Distribution, Properties of Guassains.
integrals
chain rule
modular arithmetic
sine cosine tangent
$\pi$, circumference, area, and volume formulas for circles, rectangles, parallelograms, triangles, spheres, cones,…
linear regression
Taylor’s theorem
the number e and the exponential function
Rolle’s theorem, Karush–Kuhn–Tucker conditions, derivative is zero at the maximum
the notion of linearity
Big O notation
injective (one-to-one) / surjective (onto) functions
imaginary numbers
symmetry
Euler’s Formula $e^{i \pi} + 1 = 0$
Fourier transform, convolution in time domain is the product in the frequency domain (& vice versa), the FFT
fundamental theorem of calculus
logarithms
matrices
conic sections
Boolean algebra
Cauchy–Schwarz inequality
binomial theorem – Pascal’s triangle
the determinant
ordinary differential equation (ODE)
mode (maximum likelihood estimator)
cosine law
prime numbers
linear independence
Jacobian
fundamental theorem of arithmetic
duality – (polyhedron faces & points, geometry lines and points, Dual Linear Program, dual space, …)
intermediate value theorem
eigenvalues
median
entropy
KL distance
binomial distribution
Bayes’ theorem
$2^{10} \approx 1000$
compactness, Heine – Borel theorem
metric space, Triangle Inequality
Projections, Best Approximation
$1/(1-X) = 1 + X + X^2 + \ldots$
partial differential equations
quadratic formula
Reisz representation theorem
Fubini’s theorem
the ideas of groups, semigroups, monoids, rings, …
Singular Value Decomposition
numeric integration – trapezoidal rule, Simpson’s rule, …
mutual information
Plancherel’s theorem
matrix condition number
integration by parts
Euler’s method for numerical integration of ODEs (and improved Euler & Runge–Kutta)
pigeon hole principle

There is a long list of mathematical ideas that I use less often. Here’s a sampling: Baire category theorem, Banach Spaces, Brouwer Fixed Point Theorem, Carathéodory’s Theorem, Category Theory, Cauchy integral formula, calculus of variations, closed graph theorem, Chinese remainder theorem, Clifford algebra (quaternions), Context Free Grammars, countable vs uncountable infinity, Cramer’s Rule, cohomology, Euclidean algorithm, fundamental group, Gauss’ Law, Grassmannian algebra , Graph Theory, Hahn-Banach Theorem, homology, Hairy Ball Theorem, Hölder’s inequality, inclusion-exclusion, Jordan Decomposition, Kalman Filters, Markov Chains (Hidden Markov Models), modules, non-associative algebras, Picard’s Great Theorem, Platonic/Euclidean solids, Principle of Induction, Probabilistic Graphical Models (Bayesian Networks, Markov Random Fields), Pontryagain duality, Quaternions, Spectral Theorem, Sylow p subgroup, repeating decimals equal a fraction, ring ideals, sine law, tensors, tessellation, transcendental numbers, Uniform Boundedness Theorem, Weierstrass approximation theorem, …

History of Compressed Sensing

November 26, 2012 in Sparsity by hundalhh | Permalink

Igor Carron provides a nice history of Compressed Sensing on his blog. He reviews the computation complexity of various Compressed Sensing algorithms, noting that the field started with the discovery of polynomial time solutions to underdetermined linear problems. While $l_0$ norm regularization results in NP-complete problems, $l_1$ regularization is easier.

TIL: A trick for hits on your blog

November 25, 2012 in Uncategorized by hundalhh | Permalink

If you want hits on your blog, write about an article that is being read by thousands or millions of people. Some of those readers will Google terms from the article. Today I blogged very briefly about the NY Times article “Scientists See Promise in Deep-Learning Programs” and I was surprised at the number of referrals from Google. Hmm, maybe I should blog about current TV shows (Big Bang? Mythbusters?), movies (Cloud Atlas?), and video games (Call of Duty?). Carl suggested that I apply deep learning, support vector machines, and logistic regression to estimate the number of hits I will get on a post. If I used restricted Boltzmann machines, I could run it in generative mode to create articles I was afraid if I went down that route I would eventually have a fantastically popular blog about Britney Spears.

Deep Learning in The NY Times

November 24, 2012 in Deep Belief Networks by hundalhh | Permalink

John Markoff at the NY times writes about recent advances in and implications of deep neural nets in “Scientists See Promise in Deep-Learning Programs“. He reviews the recent activities of Hinton, LeCun, and Ng as well as applications of deep learning in speech recognition, computer vision, chemistry (Kaggel Merck Prize), and near real-time natural language processing and machine translation.

Gauss’s continued fraction

November 24, 2012 in Math by hundalhh | Permalink

Gauss did a lot of math, but sometimes I am surprised when I find something new to me done by Gauss

$\tan(x) = { 1 \over{1/x \ – {1\over{3/x \ – {1\over{5/x\ – \ldots}}}}}}$

“Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery”

November 22, 2012 in Clustering by hundalhh | Permalink

In “Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery“, Niekum and Barto (2011) apply Sub-goal Discovery and Skill Discovery to improve Reinforcement Learning on Markov Decision Processes. They write,

“Skill discovery algorithms in reinforcement learning typically identify single states or regions in state space that correspond to task-speciﬁc subgoals. However, such methods do not directly address the question of how many distinct skills are appropriate for solving the tasks that the agent faces. This can be highly inefficient when many identified subgoals correspond to the same underlying skill, but are all used individually as skill goals. For example, opening a door ought to be the same skill whether an agent is one inch or two inches away from the door, or whether the door is red or blue; making each possible configuration a separate skill would be unwise. Furthermore, skills created in this manner are often only transferable to tasks that share identical state spaces, since corresponding subgoals across tasks are not merged into a single skill goal.

We show that these problems can be overcome by collecting subgoal data from a series of tasks and clustering it in an agent-space [9], a shared feature space across multiple tasks. The resulting clusters generalize subgoals within and across tasks and can be used as templates for portable skill termination conditions. Clustering also allows the creation of skill termination conditions in a datadriven way that makes minimal assumptions and can be tailored to the domain through a careful choice of clustering algorithm. Additionally, this framework extends the utility of single-state subgoal discovery algorithms to continuous domains, in which the agent may never see the same state twice. We argue that clustering based on a Dirichlet process mixture model is appropriate in the general case when little is known about the nature or number of skills needed in a domain. Experiments in a continuous domain demonstrate the utility of this approach and illustrate how it may be useful even when traditional subgoal discovery methods are infeasible.”

They apply Dirichlet process mixture models (more specifically an inﬁnite Gaussian mixture model with a Dirichlet prior on the number of Gaussians and the parameters of the Gaussians) using expectation-maximization to do clustering of the states—each cluster corresponding to a stratagem referred to as a “latent option”. Experimental results in an environment similar to the Lightworld domain (see “Building Portable Options: Skill Transfer in Reinforcement Learning” by Konidaris and Barto (2007)) are reported.

Topological Sorting

November 20, 2012 in Math by hundalhh | Permalink

Given a directed a-cyclic graph (DAG) find an ordering of the nodes which is consistent with the directed edges. This is called Topological Sorting and several algorithms exist.

New Developments in Iterated Prisoner’s Dilemma

November 18, 2012 in Games by hundalhh | Permalink

In “Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent” by Pressa and Dyson (2012) show that there are strategies, named “zero determinant strategies” (ZD) which “win” against so-called evolutionary strategies in Prisoner’s Dilemma. An easy to read, less mathematical summary of the paper is given in “Extortion and cooperation in the Prisoner’s Dilemma” by Stewart and Plotkin. Fans of Tit-for-Tat need not fear because Tit-for-Tat does not lose significantly against the ZD strategies. The idea of the ZD strategies is not very complicated. The ZD player cooperates enough that it is profitable to cooperate with the ZD player, but the ZD player defects enough that 1) it is not profitable to defect against the ZD player, 2) the ZD player wins more than his opponent when the opponent cooperates.

« Older entries

Artificial Intelligence Blog

November 2012

“Brains, Sex, and Machine Learning”

Matroids

100 Most useful Theorems and Ideas in Mathematics

History of Compressed Sensing

TIL: A trick for hits on your blog

Deep Learning in The NY Times

Gauss’s continued fraction

“Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery”

Topological Sorting

New Developments in Iterated Prisoner’s Dilemma

Categories

Archives

Blogroll

GameTheory

Subscribe to ArtEnt via Email