Matroids are an abstraction of vector spaces. Vector spaces have addition and scalar multiplication, and the axioms of vector spaces lead to the concepts of subspaces, rank, and independent sets. Every vector space is a matroid, but matroids only retain the properties of independence, rank, and subspaces. The matroid closure of a set plays the same role as the span of a set in a vector space.
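As a reference point (one common formulation, not taken from the original post), the independent sets $\mathcal{I}$ of a matroid are usually axiomatized as follows; in the vector-space case, $\mathcal{I}$ is the collection of linearly independent sets of vectors.

$$\emptyset \in \mathcal{I}, \qquad A \subseteq B \in \mathcal{I} \;\Rightarrow\; A \in \mathcal{I}, \qquad A, B \in \mathcal{I},\ |A| < |B| \;\Rightarrow\; \exists\, x \in B \setminus A \text{ with } A \cup \{x\} \in \mathcal{I}.$$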
- counting
- zero
- integer decimal positional notation 100, 1000, …
- the four arithmetic operations + – * /
- fractions
- decimal notation 0.1, 0.01, …
- basic propositional logic (Modus ponens, contrapositive, If-then, and, or, nand, …)
- negative numbers
- equivalence classes
- equality & substitution
- basic algebra – idea of variables, equations, …
- the idea of probability
- commutative and associative properties
- distributive property
- powers (squared, cubed, …) and compound interest (the miracle of compounding)
- scientific notation 1.3e6 = 1,300,000
- polynomials
- first order predicate logic
- infinity
- irrational numbers
- De Morgan’s laws
- statistical independence
- the notion of a function
- square root (cube root, …)
- inequalities (list of inequalities)
- laws of exponents (e.g. $a^b a^c = a^{b+c}$)
- Cartesian coordinate plane
- basic set theory
- random variable
- probability distribution
- histogram
- the mean, expected value & strong law of large numbers
- the graph of a function
- standard deviation
- Pythagorean theorem
- vectors and vector spaces
- limits
- real numbers as limits of fractions, the least upper bound
- continuity
- $R^n$, Euclidean Space, and Hilbert spaces (inner or dot product)
- derivative
- correlation
- central limit theorem, Gaussian distribution, properties of Gaussians
- integrals
- chain rule
- modular arithmetic
- sine, cosine, tangent
- $\pi$, circumference, area, and volume formulas for circles, rectangles, parallelograms, triangles, spheres, cones,…
- linear regression
- Taylor’s theorem
- the number e and the exponential function
- Rolle’s theorem, Karush–Kuhn–Tucker conditions, derivative is zero at the maximum
- the notion of linearity
- Big O notation
- injective (one-to-one) / surjective (onto) functions
- imaginary numbers
- symmetry
- Euler’s Formula $e^{i \pi} + 1 = 0$
- Fourier transform, convolution in time domain is the product in the frequency domain (& vice versa), the FFT
- fundamental theorem of calculus
- logarithms
- matrices
- conic sections
- Boolean algebra
- Cauchy–Schwarz inequality
- binomial theorem – Pascal’s triangle
- the determinant
- ordinary differential equation (ODE)
- mode (maximum likelihood estimator)
- cosine law
- prime numbers
- linear independence
- Jacobian
- fundamental theorem of arithmetic
- duality (polyhedron faces & vertices, lines and points in geometry, the dual linear program, dual space, …)
- intermediate value theorem
- eigenvalues
- median
- entropy
- Kullback–Leibler (KL) divergence
- binomial distribution
- Bayes’ theorem
- $2^{10} \approx 1000$
- compactness, Heine–Borel theorem
- metric space, Triangle Inequality
- Projections, Best Approximation
- the geometric series $1/(1-X) = 1 + X + X^2 + \ldots$ (for $|X| < 1$)
- partial differential equations
- quadratic formula
- Riesz representation theorem
- Fubini’s theorem
- the ideas of groups, semigroups, monoids, rings, …
- Singular Value Decomposition
- numeric integration – trapezoidal rule, Simpson’s rule, …
- mutual information
- Plancherel’s theorem
- matrix condition number
- integration by parts
- Euler’s method for numerical integration of ODEs (and improved Euler & Runge–Kutta)
- pigeonhole principle
Igor Carron provides a nice history of Compressed Sensing on his blog. He reviews the computational complexity of various Compressed Sensing algorithms, noting that the field started with the discovery of polynomial-time solutions to underdetermined linear problems. While $l_0$ norm regularization results in NP-complete problems, $l_1$ regularization leads to convex problems that can be solved in polynomial time (basis pursuit, for example, is a linear program).
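As a rough illustration of why the $l_1$ problem is tractable: basis pursuit (minimize $\|x\|_1$ subject to $Ax = b$) can be rewritten as a linear program by splitting $x = u - v$ with $u, v \ge 0$. The sketch below is mine, not Carron's; it assumes NumPy and SciPy are available, and the problem sizes and random measurement matrix are purely illustrative.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 50, 20, 3                      # signal length, measurements, sparsity
A = rng.standard_normal((m, n))          # random measurement matrix
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A @ x_true

# Basis pursuit as an LP: minimize sum(u) + sum(v) subject to A(u - v) = b, u, v >= 0.
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
x_hat = res.x[:n] - res.x[n:]
print("recovery error:", np.linalg.norm(x_hat - x_true))
```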
If you want hits on your blog, write about an article that is being read by thousands or millions of people. Some of those readers will Google terms from the article. Today I blogged very briefly about the NY Times article “Scientists See Promise in Deep-Learning Programs”, and I was surprised at the number of referrals from Google. Hmm, maybe I should blog about current TV shows (Big Bang? Mythbusters?), movies (Cloud Atlas?), and video games (Call of Duty?). Carl suggested that I apply deep learning, support vector machines, and logistic regression to estimate the number of hits I will get on a post. If I used restricted Boltzmann machines, I could run them in generative mode to create articles. I was afraid that if I went down that route, I would eventually have a fantastically popular blog about Britney Spears.
John Markoff at the NY Times writes about recent advances in and implications of deep neural nets in “Scientists See Promise in Deep-Learning Programs”. He reviews the recent activities of Hinton, LeCun, and Ng, as well as applications of deep learning in speech recognition, computer vision, chemistry (the Kaggle Merck Prize), and near real-time natural language processing and machine translation.
Gauss did a lot of math, but sometimes I am surprised when I find something new to me that was done by Gauss, like this continued fraction for the tangent function:
$$\tan(x) = \cfrac{1}{1/x - \cfrac{1}{3/x - \cfrac{1}{5/x - \cdots}}}$$
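A quick numerical check of the continued fraction (my own sketch; the function name and truncation depth are arbitrary):

```python
import math

def tan_cf(x, depth=10):
    """Evaluate tan(x) = 1/(1/x - 1/(3/x - 1/(5/x - ...))) truncated at `depth` levels."""
    acc = (2 * depth + 1) / x            # innermost partial denominator
    for k in range(depth - 1, -1, -1):   # work outward: acc = (2k+1)/x - 1/acc
        acc = (2 * k + 1) / x - 1 / acc
    return 1 / acc

x = 1.2
print(tan_cf(x), math.tan(x))            # the two values agree to many digits
```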
In “Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery“, Niekum and Barto (2011) apply Sub-goal Discovery and Skill Discovery to improve Reinforcement Learning on Markov Decision Processes. They write,
“Skill discovery algorithms in reinforcement learning typically identify single states or regions in state space that correspond to task-specific subgoals. However, such methods do not directly address the question of how many distinct skills are appropriate for solving the tasks that the agent faces. This can be highly inefficient when many identified subgoals correspond to the same underlying skill, but are all used individually as skill goals. For example, opening a door ought to be the same skill whether an agent is one inch or two inches away from the door, or whether the door is red or blue; making each possible configuration a separate skill would be unwise. Furthermore, skills created in this manner are often only transferable to tasks that share identical state spaces, since corresponding subgoals across tasks are not merged into a single skill goal.
We show that these problems can be overcome by collecting subgoal data from a series of tasks and clustering it in an agent-space [9], a shared feature space across multiple tasks. The resulting clusters generalize subgoals within and across tasks and can be used as templates for portable skill termination conditions. Clustering also allows the creation of skill termination conditions in a data-driven way that makes minimal assumptions and can be tailored to the domain through a careful choice of clustering algorithm. Additionally, this framework extends the utility of single-state subgoal discovery algorithms to continuous domains, in which the agent may never see the same state twice. We argue that clustering based on a Dirichlet process mixture model is appropriate in the general case when little is known about the nature or number of skills needed in a domain. Experiments in a continuous domain demonstrate the utility of this approach and illustrate how it may be useful even when traditional subgoal discovery methods are infeasible.”
They apply Dirichlet process mixture models (more specifically an infinite Gaussian mixture model with a Dirichlet prior on the number of Gaussians and the parameters of the Gaussians) using expectation-maximization to do clustering of the states—each cluster corresponding to a stratagem referred to as a “latent option”. Experimental results in an environment similar to the Lightworld domain (see “Building Portable Options: Skill Transfer in Reinforcement Learning” by Konidaris and Barto (2007)) are reported.
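As a loose illustration of that style of clustering (not the authors' code), scikit-learn's `BayesianGaussianMixture` with a Dirichlet process prior infers an effective number of clusters from the data, though it uses variational inference rather than the expectation-maximization procedure mentioned above. The two-dimensional "agent-space" data here is synthetic and only meant to suggest the setup.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Hypothetical subgoal data: 2-D agent-space feature vectors pooled across tasks.
rng = np.random.default_rng(0)
subgoals = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(40, 2)),   # e.g. states near a door
    rng.normal(loc=[3.0, 1.0], scale=0.3, size=(40, 2)),   # e.g. states near a key
])

# Truncated Dirichlet process mixture of Gaussians: components whose weights
# collapse toward zero are effectively unused, so the number of skills is
# inferred from the data rather than fixed in advance.
dpmm = BayesianGaussianMixture(
    n_components=10,                                        # truncation level
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    random_state=0,
).fit(subgoals)

labels = dpmm.predict(subgoals)
print("clusters actually used:", np.unique(labels))
```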
Given a directed acyclic graph (DAG), find an ordering of the nodes that is consistent with the directed edges. This is called Topological Sorting, and several algorithms exist.
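For example, here is a minimal sketch of Kahn's algorithm, one of the standard approaches (the function and variable names are mine):

```python
from collections import defaultdict, deque

def topological_sort(nodes, edges):
    """Kahn's algorithm: repeatedly output a node with no remaining incoming edges.
    `edges` is a list of (u, v) pairs meaning u must come before v."""
    out = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for u, v in edges:
        out[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in out[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle; no topological order exists")
    return order

print(topological_sort(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")]))
```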
In “Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent” (2012), Press and Dyson show that there are strategies, named “zero determinant” (ZD) strategies, which “win” against so-called evolutionary strategies in the Prisoner’s Dilemma. An easy-to-read, less mathematical summary of the paper is given in “Extortion and cooperation in the Prisoner’s Dilemma” by Stewart and Plotkin. Fans of Tit-for-Tat need not fear, because Tit-for-Tat does not lose significantly against the ZD strategies. The idea behind the ZD strategies is not very complicated: the ZD player cooperates enough that it is profitable to cooperate with the ZD player, but defects enough that 1) it is not profitable to defect against the ZD player, and 2) the ZD player wins more than his opponent when the opponent cooperates.
The NLTK Python library contains a large number of packages for text manipulation and classification. It includes routines for classification (maximum entropy, naive Bayes, support vector machines, an interface to the Weka library, expectation maximization, k-means, conditional random fields, …), text manipulation, parsing, and graphics.
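For instance, a tiny naive Bayes text classifier with NLTK might look like the sketch below; the feature extractor and the toy training sentences are made up for illustration.

```python
import nltk

def features(sentence):
    # Bag-of-words features: each lowercased token maps to True.
    return {word.lower(): True for word in sentence.split()}

train = [
    (features("a wonderful insightful film"), "pos"),
    (features("great acting and a clever plot"), "pos"),
    (features("a dull and tedious movie"), "neg"),
    (features("poorly written and badly acted"), "neg"),
]

classifier = nltk.NaiveBayesClassifier.train(train)
print(classifier.classify(features("a clever and wonderful plot")))  # likely "pos"
classifier.show_most_informative_features(5)
```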