# August 2012


## “Why Software Is Eating The World”

Marc Andreessen writes

My own theory is that we are in the middle of a dramatic and broad technological and economic shift in which software companies are poised to take over large swathes of the economy….Over the next 10 years, I expect many more industries to be disrupted by software, with new world-beating Silicon Valley companies doing the disruption in more cases than not.

Six decades into the computer revolution, four decades since the invention of the microprocessor, and two decades into the rise of the modern Internet, all of the technology required to transform industries through software finally works and can be widely delivered at global scale.

in this Wall Street Journal article.

## “Machine Learning Techniques for Stock Prediction”

In “Machine Learning Techniques for Stock Prediction”, Vatsal H. Shah (2007) evaluates several machine learning techniques applied to stock market prediction. The techniques used are: support vector machines, linear regression, “prediction using decision stumps”, expert weighting, text data mining, and online learning (the code was from YALE/Weka). The main stock features used were moving averages, exponential moving average, rate of change, and relative strength index. He concludes with “Of all the Algorithms we applied, we saw that only Support Vector Machine combined with Boosting gave us satisfactory results.”
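To make the four features concrete, here is a minimal Python sketch of how they are commonly computed from a list of closing prices. This is not Shah's code: the function names, the window parameters, and the choice of the simple-average form of the relative strength index are my own assumptions for illustration.

```python
def simple_moving_average(prices, window):
    """Mean of the last `window` prices at each step (None until enough data)."""
    out = []
    for t in range(len(prices)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[t + 1 - window:t + 1]) / window)
    return out


def exponential_moving_average(prices, span):
    """EMA with smoothing factor alpha = 2 / (span + 1), seeded with the first price."""
    alpha = 2.0 / (span + 1)
    out = [float(prices[0])]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out


def rate_of_change(prices, period):
    """Percentage change relative to the price `period` steps earlier."""
    out = [None] * min(period, len(prices))
    for t in range(period, len(prices)):
        out.append(100.0 * (prices[t] - prices[t - period]) / prices[t - period])
    return out


def relative_strength_index(prices, period):
    """RSI from simple averages of gains and losses over the last `period` changes.

    Assumption: this is the simple-average variant, not Wilder's smoothed one.
    """
    out = [None] * min(period, len(prices))
    for t in range(period, len(prices)):
        changes = [prices[k] - prices[k - 1] for k in range(t - period + 1, t + 1)]
        avg_gain = sum(c for c in changes if c > 0) / period
        avg_loss = sum(-c for c in changes if c < 0) / period
        if avg_loss == 0:
            out.append(100.0)  # no losses in the window
        else:
            out.append(100.0 - 100.0 / (1.0 + avg_gain / avg_loss))
    return out
```

Each function returns one value per input price, so the four outputs can be zipped together into a feature vector per trading day and handed to whichever learner is being tested.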

## Julia!

I am quite excited about the Julia language (windows download, manual). It’s free. Its syntax is close to Matlab’s, but it runs nearly as fast as C++ and much faster than Matlab and Octave (about 160 times faster than Matlab in the example below). Here is a quick comparison.

Matlab code (primeQ.m):

```matlab
function b = primeQ( i )
    % Trial division: try every candidate divisor from 2 up to ceil(i/2).
    for j=2:ceil(i/2.0)
        if mod(i,j) == 0
            b = false;
            return
        end
    end
    b = true;
end
```

Matlab input:

```matlab
tic; primeQ(71378569); toc
```

Matlab output:

```
Elapsed time is 52.608765 seconds.
```

Julia code (primeQ.jl):

```julia
function primeQ( i )
    # Trial division: try every candidate divisor from 2 up to ceil(i/2).
    for j=2:ceil(i/2.0)
        if mod(i,j) == 0
            return false;
        end
    end
    return true
end
```

Julia input:

```julia
tic(); primeQ(71378569); toc()
```

Julia output:

```
elapsed time: 0.3280000686645508 seconds
```
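As an aside, the loop bound in both versions is deliberately naive, which is what makes it a good stress test of raw loop speed. A composite number always has a divisor no larger than its square root, so the same test can stop much earlier. The Python sketch below shows both bounds side by side; the function names are mine, and inputs are assumed to be integers of at least 2.

```python
import math


def is_prime_half(n):
    """Trial division up to ceil(n / 2), mirroring the loop bound used above."""
    for j in range(2, math.ceil(n / 2) + 1):
        if n % j == 0:
            return False
    return True


def is_prime_sqrt(n):
    """Trial division up to floor(sqrt(n)); gives the same answers with far fewer steps."""
    for j in range(2, math.isqrt(n) + 1):
        if n % j == 0:
            return False
    return True
```

For the number timed above, the square-root bound needs fewer than nine thousand loop iterations instead of tens of millions, so it would not be much of a benchmark.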

## Review of Machine Learning Techniques for Non-specialists

In “A Review of Studies on Machine Learning Techniques”, Singh, Bhatia, and Sangwan (2007) comment on neural nets, self-organizing maps, case-based reasoning, classification trees (CART), rule induction, and genetic algorithms. They include a nice chart at the end of the article that could be quite useful for managers.

## DARPA: The future of computing is analog

Wired has an interesting article “Darpa Has Seen the Future of Computing … And It’s Analog”.

“One of the things that’s happened in the last 10 to 15 years is that power-scaling has stopped,” … Moore’s law — the maxim that processing power will double every 18 months or so — continues, but battery lives just haven’t kept up. “The efficiency of computation is not increasing very rapidly,” ….

## The K-center algorithm

I have always liked the K-center algorithm. K-center tends to cover the data set uniformly, whereas K-means concentrates on the high-density areas. K-center also does well if small outlier clusters belong to different classes, whereas K-means tends to ignore small clusters. Check out K-Center and Dendrogram Clustering: Applications to Image Segmentation for some nice pictures.
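A minimal sketch of the usual greedy heuristic for K-center (farthest-first traversal) shows why it reaches out to small, distant clusters: each new center is the point farthest from all centers chosen so far. The function name, the Euclidean distance, and the choice of starting point are my assumptions; the linked paper may use a different variant.

```python
import math


def greedy_k_center(points, k, first=0):
    """Pick k center indices by farthest-first traversal.

    points: list of equal-length tuples of floats
    first:  index of the arbitrary starting center (an assumption of this sketch)
    Returns (center_indices, radius), where radius is the largest distance
    from any point to its nearest chosen center.
    """
    centers = [first]
    # dist[i] = distance from point i to its nearest center chosen so far
    dist = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        # The next center is the point that is currently covered worst.
        nxt = max(range(len(points)), key=lambda i: dist[i])
        centers.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return centers, max(dist)


# Four points packed near the origin plus one distant outlier.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (10.0, 10.0)]
centers, radius = greedy_k_center(pts, 2)
# centers == [0, 4]: the outlier gets its own center
```

With two centers, K-means would typically be pulled toward the dense group, while this heuristic spends its second center on the lone outlier. The greedy choice is known to give a covering radius at most twice the optimal one.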

## Reductions Paper

In “Machine Learning Techniques—Reductions Between Prediction Quality Metrics”, Beygelzimer, Langford, and Zadrozny (2009?) summarize a bunch of “techniques, called reductions, for converting a problem of minimizing one loss function into a problem of minimizing another, simpler loss function.” They give a simplified overview of machine learning algorithms and sampling methods, relating them to error-correcting codes and regret minimization.
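As one illustration of what a sampling-based reduction can look like, here is a minimal sketch of rejection sampling for turning an importance-weighted classification problem into an ordinary one: each example is kept with probability proportional to its weight, and any unweighted learner is then trained on what survives. The function name and interface are hypothetical, and this is my reading of the general idea rather than code from the paper.

```python
import random


def costing_resample(examples, weights, rng):
    """Keep each example with probability weight / max(weights).

    examples: list of training examples, e.g. (features, label) pairs
    weights:  non-negative importance weights, one per example
    rng:      a random.Random instance, passed in so runs are reproducible
    """
    w_max = max(weights)
    return [ex for ex, w in zip(examples, weights) if rng.random() < w / w_max]
```

Examples with the largest weight are always kept and examples with zero weight never are, so the resampled set looks like a draw from the weight-adjusted distribution. Training several classifiers on independent resamples and averaging their votes is one way to reduce the variance the sampling introduces.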

## “At what point does the chain saw replace Paul Bunyan?”

The New York Times article “Skilled Work, Without the Worker” describes how robots in places like the Netherlands and California are starting to displace Chinese factories filled with low-skilled workers. They write

The falling costs and growing sophistication of robots have touched off a renewed debate among economists and technologists over how quickly jobs will be lost. This year, Erik Brynjolfsson and Andrew McAfee, economists at the Massachusetts Institute of Technology, made the case for a rapid transformation. “The pace and scale of this encroachment into human skills is relatively recent and has profound economic implications,” they wrote in their book, “Race Against the Machine.”

In a review of “Race Against the Machine”, Bill Jarvis writes

In “Race Against the Machine”, economists Erik Brynjolfsson and Andrew McAfee ask the question: Could technology be destroying jobs? They then expand on that to explore whether advancing information technology might be an important contributor to the current unemployment disaster. The authors argue very convincingly that the answer to both questions is YES.

which reminds me of Robin Hanson’s paper “Economic Growth Given Machine Intelligence”.

## Softboost

I’ve been looking at versions of AdaBoost that are less sensitive to noise, such as Softboost. Softboost ignores a number of outliers set by the user (the parameter $v$) and looks for weak learners that are not highly correlated with the weak learners already in the boosted learner mix. It updates the distribution by KL projection onto the set of distributions that are uncorrelated with the mistakes of the latest learner and that do not place too much weight on any particular data point. Softboost avoids overfitting by stopping when no feasible point is found for the KL projection.
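The full projection step needs a constrained optimizer, but the "not too much weight on any particular data point" part can be sketched on its own. The Python below computes the KL projection of a distribution onto the set of distributions whose entries are all at most `cap`; the solution has the form $d_i = \min(\text{cap}, \alpha w_i)$ for a suitable scaling $\alpha$. The function name is mine, the correlation constraints are left out entirely, and my understanding that the cap is $1/v$ should be treated as an assumption.

```python
def cap_distribution(weights, cap):
    """KL projection of positive weights onto {d : sum(d) = 1, 0 <= d_i <= cap}.

    Only the per-point cap is handled; the constraints tying the distribution
    to the weak learners' mistakes are NOT part of this sketch.
    """
    n = len(weights)
    if any(x <= 0 for x in weights):
        raise ValueError("weights must be strictly positive")
    if cap * n < 1.0:
        raise ValueError("infeasible: cap * n must be at least 1")
    total = float(sum(weights))
    w = [x / total for x in weights]
    # Indices sorted from largest to smallest weight.
    order = sorted(range(n), key=lambda i: w[i], reverse=True)
    # suffix[k] = total mass of the weights ranked k, k+1, ..., n-1
    suffix = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        suffix[k] = suffix[k + 1] + w[order[k]]
    for k in range(n):
        # Try capping exactly the k largest weights and rescaling the rest.
        alpha = (1.0 - k * cap) / suffix[k]
        if alpha * w[order[k]] <= cap + 1e-12:
            d = [0.0] * n
            for rank, i in enumerate(order):
                d[i] = cap if rank < k else alpha * w[i]
            return d
    raise AssertionError("unreachable for feasible inputs")
```

For example, projecting `[0.7, 0.1, 0.1, 0.1]` with `cap=0.4` clips the heavy point to 0.4 and spreads the removed mass proportionally over the others, giving roughly `[0.4, 0.2, 0.2, 0.2]`. A single noisy example can therefore never dominate the distribution the way it can in plain AdaBoost.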

In “Soft Margins for AdaBoost”, Rätsch, Onoda, and Müller generalize AdaBoost by adding a softening parameter $\phi$ to the distribution update step. They relate soft boosting to simulated annealing and minimization of a generalized exponential loss function. The paper has numerous helpful graphs and experimental data.

## Even More Machine Learning Blogs

Long, Informative Articles
http://www.swkorridor.dk/en/blogs/machine_learning_applications/

Computer Vision, Image Processing Blog
http://quantombone.blogspot.com/

Causality Blog
http://www.mii.ucla.edu/causality/

Stack Exchange for Statistics
http://stats.stackexchange.com/

MetaOptimize Stack Exchange
http://metaoptimize.com/qa/

Reddit Machine Learning
http://www.reddit.com/r/machinelearning

Stack Overflow Datamining
http://stackoverflow.com/questions/tagged/data-mining

Stack Overflow Machine Learning
http://stackoverflow.com/questions/tagged/machine-learning

MLoss
http://mloss.org/community/
Software, Machine Learning, Science and Math

Julia language
http://julialang.org/blog/

Alexandre Passos’ research blog
http://atpassos.posterous.com/
Real Commentary on Real Machine Learning Techniques & Papers

Anand Sarwate
http://ergodicity.net/
Frequent, Varied articles including ML

Peekaboo Andy’s Computer Vision and Machine Learning Blog
http://peekaboo-vision.blogspot.com/

Andrew Eckford: The Blog
http://andreweckford.blogspot.com/

Andrew Rosenberg
http://spokenlanguageprocessing.blogspot.com/
Great Material on NLP and ML

Freakonometrics
http://freakonometrics.blog.free.fr/index.php/

Brian Chesney
http://bpchesney.org/
Informative, Numerical Analysis, Optimization, ML

Daniel Lemire’s blog
http://lemire.me/blog/
Interesting Thoughts on Science, Software, and Global Warming

Frank Nielsen: Computational Information Geometry Wonderland
http://blog.informationgeometry.org/index.php
Blog on Information Theory, Image Processing, Statistics, …

Igor Carron’s Nuit Blanche
http://nuit-blanche.blogspot.com/
Great blog on Statistics, Modelling Dynamic Systems, Data Mining, Compressive Sensing, Signal Processing, …

Jonathan Manton’s Blog
http://jmanton.wordpress.com/
A mathematician writes numerous in-depth posts on Numerical Analysis, Software, Probability, Teaching, …

Jürgen Schmidhuber
http://www.idsia.ch/~juergen/
Not a blog, but it is a good resource

Paul Mineiro: Machined Learnings
http://www.machinedlearnings.com/
Many posts

Theory of Statistics and Information Theory

Rob Hyndman: Research tips
http://robjhyndman.com/researchtips/
Forecasting

Rod Carvalho: Stochastix
http://stochastix.wordpress.com/

Roman Shapovalov: Computer Blindness
http://computerblindness.blogspot.com/
Graphical Models, Learning Theory, Computer Vision, ML

Shubhendu Trivedi: Onionesque Reality
http://onionesquereality.wordpress.com/
Personal Blog, Abstract Ideas, Math, ML, …

Suresh: The Geomblog
http://geomblog.blogspot.com/
Teaching, DataMining, ML, Geometry, Computational Geometry

Computer Languages, AI, NLP

Terran Lane: Ars Experientia