Articles by carl

You are currently browsing carl’s articles.

Wired has an interesting article “Darpa Has Seen the Future of Computing … And It’s Analog”.

“One of the things that’s happened in the last 10 to 15 years is that power-scaling has stopped,” … Moore’s law — the maxim that processing power will double every 18 months or so — continues, but battery lives just haven’t kept up. “The efficiency of computation is not increasing very rapidly,” ….

I discovered automatic differentiation a few weeks ago, and I can’t believe I had never heard of it before. Although I believe everybody who has more than a passing knowledge of algorithms (especially numerical algorithms!) should know about it, apparently very few do.

I will just give a very brief introduction here before pointing out a few good sources of information.

First of all, automatic differentiation — “autodiff” — is neither numerical differentiation nor symbolic differentiation, although it does calculate exact derivatives!

In the formulation that I find most astonishing, autodiff uses object-orientied programming techniques and operator overloading to simultaneously and transparently turn all (differentiable) function calculations into evaluations of the function’s first derivative.

More concretely, instead of calculating with a floating point variable x, we instead calculate with an object x that has two data components (x.value and x.deriv, say), the value and the derivative, and has methods that overload all of the mathematical functions and operators in the language. So that, for example, when one calculates y = cos x one is automatically calculating both $y = \cos x$ and $y’ = – \sin x$ and with the results being stored in y.value and y.deriv! Operator overloading covers cases like x ^ 3, 3 ^ x, or even x ^ x using the standard rules of differentiation. And once one realizes how that works, then expressions like x + y, x * y, and x / y become easy as well. In this way, all calculations can be handled.

It should be noted that this works for all numerical computations, not just calculations involving mathematical formulas, and that it can be easily generalized to calculating arbitrary higher-order derivatives.

By the way, the technique outlined above is “forward mode” autodiff; it is less obvious that there is also “reverse mode” autodiff. Forward mode is more efficient for functions of single variables; reverse mode is more efficient for functions with a single output value (i.e., real-valued as opposed to vector-valued functions). It turns out that reverse mode autodiff is a generalization of neural net back-propagation and was actually discovered before backprop!

Justin Domke made a great blog post on automatic differentiation in 2009; and Introduction to Automatic Differentiation and MATLAB Object-Oriented Programming is a very accessible paper on actually implementing autodiff in Matlab.

Finally, seems to be the home of all things autodiff on the web.

Suppose some Intelligent Algorithm can learn to play tic-tac-toe. Tic-tac-toe is such a simple game that it can be exhaustively learned via memorization – but this is not what I mean here. I mean that the intelligent algorithm (IA) actually has the ability to generally learn, and that it has used this ability to learn tic-tac-toe. What would one like the IA to now know? Well, in addition to knowing the rules and optimal strategy of the game, the IA should also have figured out that tic-tac-toe is played on a planar grid, and that the winning positions correspond to horizontal, vertical, and diagonal lines on this grid.

Now after it has learned 3 x 3 tic-tac-toe, suppose that we want our IA to also learn to play 4 x 4 tic-tac-toe (perhaps this game should be called tic-toc-toe-tum). The IA should be able to use what it has learned in the 3 x 3 game to more easily learn the 4 x 4 game, and it should explicitly “understand” this relationship. Concretely, what does this mean?

I believe this is one of the couple fundamental open problems in machine learning.

I just discover the Face Recognition Homepage while looking for a database of images of faces. But it also is a good place to find algorithms, source code, and other good stuff!

“How Many Computers to Identify a Cat? 16,000” is a New York Times article about Deep Belief Networks, “the hottest thing in the speech recognition field these days.”