August 2012

Marc Andreessen writes

My own theory is that we are in the middle of a dramatic and broad technological and economic shift in which software companies are poised to take over large swathes of the economy….Over the next 10 years, I expect many more industries to be disrupted by software, with new world-beating Silicon Valley companies doing the disruption in more cases than not.

Six decades into the computer revolution, four decades since the invention of the microprocessor, and two decades into the rise of the modern Internet, all of the technology required to transform industries through software finally works and can be widely delivered at global scale.

in this Wall Street Journal article.

In “Machine Learning Techniques for Stock Prediction”, Vatsal H. Shah (2007) evaluates several machine learning techniques applied to stock market prediction: support vector machines, linear regression, “prediction using decision stumps”, expert weighting, text data mining, and online learning (the code was from YALE/Weka). The main stock features used were moving averages, exponential moving averages, rate of change, and the relative strength index. He concludes, “Of all the Algorithms we applied, we saw that only Support Vector Machine combined with Boosting gave us satisfactory results.”
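
For readers who don't know these features, here is a minimal sketch of two of them in Julia, using their standard textbook definitions (my own illustration, not code from the paper):

using Statistics  # for mean

# simple moving average of prices p over a window of n periods
sma(p, n) = [mean(p[i-n+1:i]) for i in n:length(p)]

# rate of change: percent change relative to the price n periods ago
roc(p, n) = [100 * (p[i] - p[i-n]) / p[i-n] for i in n+1:length(p)]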


I am quite excited about the Julia language (Windows download, manual). It’s free. It’s almost the same as Matlab, but it is as fast as C++ (much faster than Matlab and Octave; 160 times faster in the example below). Here is a quick comparison.

Matlab code (primeQ.m):

function b = primeQ( i )
    % primality test by trial division up to i/2
    b = true;
    for j = 2:ceil(i/2.0)
        if mod(i,j) == 0
            b = false;
            return;
        end
    end
end

Matlab input:

tic; primeQ(71378569); toc

Matlab output:

Elapsed time is 52.608765 seconds.

Julia code (primeQ.jl):

function primeQ( i )
    # primality test by trial division up to i/2
    for j = 2:ceil(i/2.0)
        if mod(i,j) == 0
            return false
        end
    end
    return true
end

Julia input:


tic(); primeQ(71378569); toc()

Julia output:

elapsed time: 0.3280000686645508 seconds

In “A Review of Studies on Machine Learning Techniques”, Singh, Bhatia, and Sangwan (2007) comment on neural nets, self-organizing maps, case-based reasoning, classification trees (CART), rule induction, and genetic algorithms. They include a nice chart at the end of the article that could be quite useful for managers.

Wired has an interesting article “Darpa Has Seen the Future of Computing … And It’s Analog”.

“One of the things that’s happened in the last 10 to 15 years is that power-scaling has stopped,” … Moore’s law — the maxim that processing power will double every 18 months or so — continues, but battery lives just haven’t kept up. “The efficiency of computation is not increasing very rapidly,” ….

I have always liked the K-center algorithm. K-center tends to cover the data set uniformly rather than concentrating on the high-density areas (as K-means does). Also, K-center does well when small outlier clusters belong to different classes, whereas K-means tends to ignore small clusters. Check out “K-Center and Dendrogram Clustering: Applications to Image Segmentation” for some nice pictures; a sketch of the greedy version follows.
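
Here is a minimal sketch in Julia of the standard greedy farthest-first heuristic for K-center (Gonzalez’s 2-approximation). This is my own illustration, assuming the data points are the rows of a matrix; it is not code from the paper:

using LinearAlgebra  # for norm

# Greedy farthest-first traversal: pick a starting point, then
# repeatedly add the point farthest from all current centers.
function kcenter(X, k)
    n = size(X, 1)
    centers = [1]                                  # start with the first row
    dist = [norm(X[i, :] - X[1, :]) for i in 1:n]  # distance to nearest center
    for _ in 2:k
        far = argmax(dist)                         # farthest remaining point
        push!(centers, far)
        for i in 1:n
            dist[i] = min(dist[i], norm(X[i, :] - X[far, :]))
        end
    end
    return centers
end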

In “Machine Learning Techniques—Reductions Between Prediction Quality Metrics”, Beygelzimer, Langford, and Zadrozny (2009?) summarize a bunch of “techniques, called reductions, for converting a problem of minimizing one loss function into a problem of minimizing another, simpler loss function.” They give a simplified overview of machine learning algorithms and sampling methods, relating them to error-correcting codes and regret minimization; a toy example of one reduction is sketched below.
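
To give the flavor of a reduction, here is the simple one-against-all reduction from K-class classification to binary classification, sketched in Julia. This is my illustration rather than the paper’s presentation, and `train_binary` and `score` are hypothetical placeholders for any binary learner and its real-valued output:

# Reduce a K-class problem to K binary problems ("is the label k?").
# train_binary and score are hypothetical stand-ins for a binary learner.
function one_against_all(train_binary, X, y, K)
    [train_binary(X, [yi == k for yi in y]) for k in 1:K]
end

# Predict the class whose binary model scores highest on x.
predict(models, score, x) = argmax([score(m, x) for m in models])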

The New York Times article “Skilled Work, Without the Worker” describes how robots in places like the Netherlands and California are starting to displace Chinese factories filled with low-skilled workers. They write

The falling costs and growing sophistication of robots have touched off a renewed debate among economists and technologists over how quickly jobs will be lost. This year, Erik Brynjolfsson and Andrew McAfee, economists at the Massachusetts Institute of Technology, made the case for a rapid transformation. “The pace and scale of this encroachment into human skills is relatively recent and has profound economic implications,” they wrote in their book, “Race Against the Machine.”

In a review of “Race Against the Machine”, Bill Jarvis writes

In “Race Against the Machine”, economists Erik Brynjolfsson and Andrew McAfee ask the question: Could technology be destroying jobs? They then expand on that to explore whether advancing information technology might be an important contributor to the current unemployment disaster. The authors argue very convincingly that the answer to both questions is YES.

which reminds me of Robin Hanson’s paper “Economic Growth Given Machine Intelligence”.

I’ve been looking at versions of AdaBoost that are less sensitive to noise, such as SoftBoost. SoftBoost works by ignoring a number of outliers set by the user (the parameter $\nu$), finding weak learners that are not highly correlated with the weak learners already in the boosted mix, and updating the distribution by KL projection onto the set of distributions that are nearly uncorrelated with the mistakes of the learners chosen so far and that do not place too much weight on any single data point. SoftBoost avoids overfitting by stopping when the KL projection has no feasible point.
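
As I read the SoftBoost paper (Warmuth, Glocer, and Rätsch), the update step is roughly the relative-entropy projection

$$d^{t+1} = \arg\min_{d} \sum_{n=1}^{N} d_n \ln\frac{d_n}{d_n^1} \quad \text{s.t.} \quad \sum_{n=1}^{N} d_n u_n^m \le \gamma_t \ \text{ for } m \le t, \qquad \sum_{n=1}^{N} d_n = 1, \qquad 0 \le d_n \le \frac{1}{\nu},$$

where $d^1$ is the initial (uniform) distribution, $u_n^m = y_n h_m(x_n)$ records whether hypothesis $h_m$ is right on example $n$, $\gamma_t$ is a slowly shrinking edge target, and the cap $d_n \le 1/\nu$ is what lets up to $\nu$ outliers be effectively ignored. This is my paraphrase of their optimization, so check the paper for the exact constraints.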

In “Soft Margins for AdaBoost”, Rätsch, Onoda, and Müller (2001) generalize AdaBoost by adding a softening parameter $\phi$ to the distribution update step. They relate soft boosting to simulated annealing and to the minimization of a generalized exponential loss function. The paper has numerous helpful graphs and experimental data.
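
For context, plain AdaBoost can be viewed as stage-wise minimization of the exponential loss (a standard fact, not specific to this paper):

$$\sum_{n=1}^{N} \exp\bigl(-y_n f(x_n)\bigr), \qquad f(x) = \sum_{t} \alpha_t h_t(x).$$

The soft-margin variants relax this loss so that a few badly misclassified points cannot dominate the sum.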

Machine Learning Links from Google and Long, Informative Articles

Computer Vision, Image Processing Blog

Causality Blog

Stack Exchange for Statistics

Machine Learning News Google Group (groups.google.com/forum/#!forum/ml-news)

MetaOptimize Stack Exchange

Reddit Machine Learning

Stack Overflow Datamining

Stack Overflow Machine Learning

Software, Machine Learning, Science and Math

Julia language


Alexandre Passos’ research blog
Real Commentary on Real Machine Learning Techniques & Papers

Anand Sarwate
Frequent, Varied articles including ML

Peekaboo Andy’s Computer Vision and Machine Learning Blog

Andrew Eckford: The Blog
Lots of notes about conferences

Andrew Rosenberg
Great Material on NLP and ML

Read for fun

Brian Chesney
Informative, Numerical Analysis, Optimization, ML

Daniel Lemire’s blog
Interesting Thoughts on Science, Software, and Global Warming

Frank Nielsen: Computational Information Geometry Wonderland
Blog on Information Theory, Image Processing, Statistics, …

Igor Carron’s Nuit Blanche
Great blog on Statistics, Modelling Dynamic Systems, Data Mining, Compressive Sensing, Signal Processing, …

Jonathan Manton’s Blog
A mathematician writes numerous in-depth posts on Numerical Analysis, Software, Probability, Teaching, …

Jürgen Schmidhuber’s Home Page
Not a blog, but it is a good resource

Paul Mineiro: Machined Learnings
Many posts

Radford Neal’s Blog
Theory of Statistics and Information Theory

Rob Hyndman: Research tips

Rod Carvalho: Stochastix
Math, Probability, Haskell, Numerical Methods

Roman Shapovalov: Computer Blindness
Graphical Models, Learning Theory, Computer Vision, ML

Sami Badawi: Hadoop comparison
Computer Languages, AI, NLP

Shubhendu Trivedi: Onionesque Reality
Personal Blog, Abstract Ideas, Math, ML, …

Suresh: The Geomblog
Teaching, DataMining, ML, Geometry, Computational Geometry

Terran Lane: Ars Experientia
ML, Teaching

Terry Tao’s Blog
Anything mathematical — deep


