Articles by hundalhh

Mathematician and Father. Into games, astronomy, psychology and philosophy.

In “Gambling in a rigged casino: The adversarial multi-armed bandit problem”, Auer, Cesa-Bianchi, Freund, and Schapire (1998) give an algorithm for adversarial multi-armed bandits, similar in spirit to the weighted majority algorithm, with regret at worst $O(\sqrt{T K \log K})$ for $K$ arms over $T$ rounds. They also establish a lower bound of $\Omega(\sqrt{T K})$ on the regret in the adversarial case. In Section 9, the authors prove that this algorithm can be applied to two-person finite games to compute the value of the game for both players.

The paper develops four algorithms in sequence, called Hedge, Exp3, Exp3.1, and Exp4, each of which uses the previous one as a subroutine. Hedge is a full-information algorithm that requires observing the rewards of the untried arms after choosing one arm. Exp3 observes only the reward of the arm it plays, but needs a bound on the maximum total reward achievable by time $T$; Exp3.1 removes this requirement by guessing that maximum. Exp4 is a mixture-of-experts algorithm.
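For concreteness, here is a minimal sketch of Exp3 in Julia. This is my own sketch, not code from the paper: the reward function pull(arm), the three Bernoulli arms, and the value of gamma are made up for illustration, and gamma would be tuned as in the paper.

# A minimal sketch of Exp3, assuming rewards in [0,1] and a user-supplied
# reward function pull(arm); gamma is the exploration parameter.
function exp3(pull, K, T, gamma)
    w = ones(K)                                      # one weight per arm
    total = 0.0
    for t = 1:T
        p = (1 - gamma) .* w ./ sum(w) .+ gamma / K  # mix weights with uniform exploration
        r = rand(); arm = 1                          # sample an arm from p
        while arm < K && r > p[arm]
            r -= p[arm]
            arm += 1
        end
        x = pull(arm)                                # only the chosen arm's reward is observed
        w[arm] *= exp(gamma * (x / p[arm]) / K)      # importance-weighted exponential update
        total += x
    end
    return total                                     # cumulative reward collected
end

# Example with three hypothetical Bernoulli arms:
pull(arm) = rand() < (0.2, 0.5, 0.8)[arm] ? 1.0 : 0.0
exp3(pull, 3, 10000, 0.1)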

In the related paper “The non-stochastic multi-armed bandit problem” by the same authors, the algorithms are extended and the regret bounds are improved. At the end of this paper, they explore applications to Game Theory.

In “Statistical Learning Algorithms Based on Bregman Distances”, Lafferty, Della Pietra, and Della Pietra (1999) take a fairly standard entropy-based tree growing algorithm and replace KL distance with Bregman distance. “In the feature selection step, each linear constraint in a pool of candidate features is evaluated by the reduction in Bregman distance that would result from adding it to the model.” This is reminiscent of the distribution update step in SoftBoost and the “information projection” view of AdaBoost (see Boosting as Entropy Projection). The paper is somewhat technical, containing interesting theorem-proving techniques.
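As a reminder (standard background, not a formula taken from the paper), the Bregman distance generated by a strictly convex, differentiable function $F$ is
$$D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle,$$
and taking $F$ to be the negative entropy $F(p) = \sum_i p_i \log p_i$ recovers the KL distance when $p$ and $q$ are probability distributions.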


Kdnuggets.com posted these poll results on data mining software.  Looks like R, Excel, Rapid-I RapidMiner, KNIME, and Weka/Pentaho were the most popular.

Rolling Dice Mod 3

It is fairly obvious if you think about it, especially if you are a mathematician, but I was still surprised by this theorem.

Theorem: Roll any number of dice, each with any (finite) number of sides, as long as at least one of them is a standard six-sided die. The probability that the sum is divisible by three is exactly 1/3. Also, the probability that the sum is odd is exactly 1/2.

Corollary: Roll any number of six-sided dice. The probability that the sum is divisible by 3 is 1/3. The probability that the sum is even is exactly 1/2.
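The reason is that the standard die is uniform both mod 2 and mod 3, so it washes out whatever the other dice contribute. As a quick sanity check (my own snippet, not part of the theorem), one can enumerate all outcomes for one six-sided, one four-sided, and one nine-sided die in Julia:

# Brute-force check with one standard six-sided die plus a 4-sided and a 9-sided die.
sides = (6, 4, 9)
sums = [a + b + c for a = 1:sides[1], b = 1:sides[2], c = 1:sides[3]]
println(count(s -> s % 3 == 0, sums) / length(sums))   # prints 0.3333...
println(count(isodd, sums) / length(sums))             # prints 0.5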

In “Exploitation of Machine Learning Techniques in Modelling Phrase Movements for Machine Translation”, Ni, Saunders, Szedmak, and Niranjan (2011) create a “phrase reordering model” for statistical machine translation. They apply their method to a Chinese-English corpus to match phrases in each language. They compare their method to well-known maximum entropy methods, support vector machines, maximum margin regression, and max-margin structure learning, giving short summaries of how each method is applied. I’m very impressed with their writing style and the content of the paper. The concept of maximum margin regression (similar to SVM) is explored in “Learning via Linear Operators: Maximum Margin Regression; Multiclass and Multiview Learning at One-class Complexity” by Szedmak, Shawe-Taylor, and Parrado-Hernandez (2006). Max-margin structure learning is described in “Max-Margin Markov Networks” by Taskar, Guestrin, and Koller (NIPS 2003).

Marc Andreessen writes

My own theory is that we are in the middle of a dramatic and broad technological and economic shift in which software companies are poised to take over large swathes of the economy….Over the next 10 years, I expect many more industries to be disrupted by software, with new world-beating Silicon Valley companies doing the disruption in more cases than not.

Six decades into the computer revolution, four decades since the invention of the microprocessor, and two decades into the rise of the modern Internet, all of the technology required to transform industries through software finally works and can be widely delivered at global scale.

in this Wall Street Journal article.

In “Machine Learning Techniques for Stock Prediction”, Vatsal H. Shah (2007) evaluates several machine learning techniques applied to stock market prediction. The techniques used are support vector machines, linear regression, “prediction using decision stumps”, expert weighting, text data mining, and online learning (the code was from YALE/Weka). The main stock features used were moving averages, exponential moving averages, rate of change, and relative strength index. He concludes with “Of all the Algorithms we applied, we saw that only Support Vector Machine combined with Boosting gave us satisfactory results.”
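For readers who want to play with similar features, here is a rough sketch in Julia of how the indicators mentioned above might be computed from a vector of closing prices p. This is my own illustration, not code from the paper, and the window length n is arbitrary.

# Hypothetical feature computations on a vector of closing prices p (not from the paper).
using Statistics                                     # for mean

sma(p, n) = [mean(p[t-n+1:t]) for t = n:length(p)]                  # simple moving average
roc(p, n) = [100 * (p[t] - p[t-n]) / p[t-n] for t = n+1:length(p)]  # rate of change

function ema(p, n)                                   # exponential moving average
    a = 2 / (n + 1)
    out = zeros(length(p))
    out[1] = p[1]
    for t = 2:length(p)
        out[t] = a * p[t] + (1 - a) * out[t-1]
    end
    return out
end

function rsi(p, n)                                   # relative strength index
    d = diff(p)
    [begin
         gains  = sum(max.(d[t-n+1:t], 0))
         losses = sum(max.(-d[t-n+1:t], 0))
         losses == 0 ? 100.0 : 100 - 100 / (1 + gains / losses)
     end for t = n:length(d)]
end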

Julia!

I am quite excited about the Julia language (windows download, manual). It’s free. It’s almost the same as Matlab, but it is as fast as C++ (much faster than Matlab and Octave, 160 times faster in the example below). Here is a quick comparison.

Matlab code (primeQ.m):

% primality test by naive trial division up to i/2
function b = primeQ( i )
   for j=2:ceil(i/2.0)
       if mod(i,j) == 0
           b = false;
           return
       end
   end
   b = true;
end

Matlab input:

tic; primeQ(71378569); toc

Matlab output:

Elapsed time is 52.608765 seconds.

Julia code (primeQ.jl):

# primality test by naive trial division up to i/2
function primeQ( i )
   for j=2:ceil(i/2.0)
       if mod(i,j) == 0
           return false;
       end
   end
   return true 
end

Julia input:

load("primeQ.jl")

tic(); primeQ(71378569); toc()

Julia output:

elapsed time: 0.3280000686645508 seconds

In “A Review of Studies on Machine Learning Techniques”, Singh, Bhatia, and Sangwan (2007) comment on neural nets, self organizing maps, case based reasoning, classification trees (CART), rule induction, and genetic algorithms. They include a nice chart at the end of the article that could be quite useful for managers.

I have always liked the K-center algorithm. K-center tends to cover the data set uniformly rather than concentrating on the high density areas (like K-means). Also, K-center does well if small outlier clusters belong to different classes, whereas K-means tends to ignore small clusters. Check out K-Center and Dendrogram Clustering: Applications to Image Segmentation for some nice pictures.
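To make the comparison concrete, here is a minimal sketch in Julia of the standard greedy farthest-first heuristic for K-center (the classic 2-approximation). This is my own sketch, not code from the linked paper; it assumes Euclidean distance with the data points stored as the columns of a matrix X.

# Greedy farthest-first K-center: repeatedly add the point farthest from the chosen centers.
using LinearAlgebra                                  # for norm

function kcenter(X, k)
    n = size(X, 2)
    centers = [1]                                    # start from an arbitrary point
    dist = [norm(X[:, j] - X[:, 1]) for j = 1:n]     # distance from each point to its nearest center
    for _ = 2:k
        far = argmax(dist)                           # the farthest point becomes the next center
        push!(centers, far)
        for j = 1:n
            dist[j] = min(dist[j], norm(X[:, j] - X[:, far]))
        end
    end
    return centers
end

# Example: 5 centers for 200 random points in the plane.
X = randn(2, 200)
kcenter(X, 5)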
