You are currently browsing the archive for the Math category.


Equation from Teacher’s Choice

So I have been wanting to understand Category Theory (see [1], [2], [3]) mainly because I thought it would help me understand advanced functional programming in Haskell and because of the book “Physics, topology, logic and computation: a Rosetta Stone” by Baez and Stay (2011).  A number of the constructs in Haskell were derived from Category Theory so that was enough motivation for me to learn it.  But also, monoidal categories are useful in several fields: physics, computer science, pure mathematics, and statistics. The biggest application for me is the idea of generalizing probabilistic graphical models through either Ologs or monoidal categories / tensor networks (see Brendan Fong’s thesis “Causal Theories: A Categorical Perspective on Bayesian Networks“).

To start my investigation of Category Theory, I began with the $20 thin book “Basic Category Theory for Computer Scientists” by Benjamin Pierce (see also the free online version “A taste of category theory for computer scientists” (1988)).

I really enjoyed the book.  Dr. Pierce’s style is a little informal compared to pure math books like Mac Lane’s “Categories for the Working Mathematician” (pdf / Amazon), but I enjoy that more relaxed style of writing when I am first learning a field.

The biggest obstacle for learning category theory is the fact that category theory generalizes a lot of areas of pure mathematics like topology, abstract algebra, and geometry.  It’s hard to generalize before you have examples to generalize, but the examples being generalized in category theory are mostly from higher level mathematics found in senior level undergraduate and graduate level courses. Pierce ameliorates this problem by introducing some of the most basic categories first: sets, ordered sets, partially ordered sets, groups, monoids, vector spaces, measure spaces, topological spaces, proofs, and a simple functional computer language.  He takes the time to explicitly define most of these ideas, so, in theory, you could read this book without a background in theoretical mathematics, but it would be hard.

After defining categories and introducing the most basic categories, Pierce describes and defines the most basic ideas in category theory: subcategories, commutative diagrams, monomorphisms, epimorphisms, isomorphisms, initial/terminal objects, products, coproducts, universal constructions, equalizers, pullbacks, pushouts, limits, cones, colimits, cocones, exponentiation, and closed Cartesian categories. These ideas are spelled out over the thirty pages of chapter one including illuminating homework exercises. The homework exercises varied significantly in difficulty.  Many of the exercises were trivial and there are two or three that I am still working on despite investing several hours of thought.  Generally, I found the exercises to be a bit harder than those in Mac Lane’s book, but Pierce’s book required less of a background in mathematics.  A couple of the exercises were incorrectly stated or impossible.

Chapter two introduced functors, natural transformations, adjoints, and F-algebras. After reading this chapter, I was finally able to understand the definition of monads which are an important part of the computer language Haskell!  Pierce provides many examples of each of these ideas and enjoyable homework exercises to increase understanding.  Pierce’s definition of adjoints is much easier to understand than the standard definitions using counit adjunction or Hom Sets.

The last major chapter concerns applications of category theory to computer science–specifically lambda-calculus and programming language design.  

The first two chapters of the book give a reasonable condensed introduction to category theory for students that have taken a course in abstract algebra.  A course in topology or linear algebra would be another useful prerequisite.  I carried around the light 100 page book for a few months so that I could learn something whenever I had some extra time.  I had hoped that when I had proven that several functors where monads, I would then really understand monads, but a full understanding still eludes me.  Similarly, I have proven that several functor pairs are adjoint, but I still don’t feel as though I understand adjoint functors.  I guess more work and learning are needed.

Have a great year.  –  Cheers, Hein


Orthogonal functions are very useful and rather easy to use.  The most famous ones are sin(n x) and cos(n x) on the interval [$-\pi$, $\pi$].  It is extremely easy to do linear regression with orthogonal functions.

Linear regression involves approximating a known function $f(x)$ as the weighted sum of other functions.  This shows up all the time in various disguises in machine learning:

  • When using AdaBoosting we perform a weighted sum of weak classifiers. 
  • Support vector machines find a linear combination of the features that separates the data into two classes.  
  • Each node in a neural net typically computes a weighted sum of the nodes in the prior layer.

For orthogonal functions, computing the linear regression is simple.  Suppose that the features are $f_1, f_2, \ldots, f_n$ and the functions $f_i$ are orthonormal, then the best approximation of the function $g(x)$ is the same as the solution to the linear regression

$$ g(x) \approx \sum_{i=1}^n \alpha_i f_i(x)$$

where $\alpha_i$ is the “inner product” of $f_i$ and $g$ denoted

$\alpha_i= \langle\;g, f_i\rangle = \int g(x) f(x) w(x) dx.$  (Two function are orthogonal if their inner product (or dot product) is zero.  The set of orthogonal functions $\{f_i\}$ is called orthonormal if $<f_i, f_i>=1$ for all $i$.)  The function $w(x)$ is a positive weight function which is typically just $w(x)=1$.

For example, if we try to approximate the function $g(x)=x^2$ on the interval [-1,1] by combining the features $f_1(x) = 1/\sqrt(2)$ and $f_2(x) = x \sqrt{3 \over 2} $, then

$$ g(x) \approx \sum_{i=1}^n \alpha_i f_i(x)$$


$$\alpha_1 = \langle g, f_1\rangle = \int_{-1}^1 g(x) f_1(x) dx =  \int_{-1}^1 x^2 1/\sqrt2 dx = \sqrt{2}/3,$$

$$\alpha_2 = \langle g, f_2\rangle = \int_{-1}^1 g(x) f_2(x)  dx=  \int_{-1}^1 x^2  \sqrt{3/2}\; x dx =0,$$

and the best approximation is then $\sum_{i=1}^n \alpha_i f_i(x) = \alpha_1 f_1(x) = 1/3$.


The other day I needed to do a linear regression and I wanted to use orthogonal functions, so I started by using the trigonometric functions to approximate my function $g(x)$.  Using trig functions to approximate another function on a closed interval like [-1,1] is the same as computing the Fourier Series.  The resulting approximations were not good enough at x=-1 and x=1, so I wanted to change the weight function $w(x)$.  Now assume that $u_1(x), u_2(x), \ldots, u_n(x)$ are orthonormal with the weight function w(x) on [-1,1].  Then

$$ \int_{-1}^1 u_i(x) u_j(x) w(x) dx = \delta_{i,j}$$

where $\delta_{i,j}$ is one if $i=j$ and zero otherwise.  If we make a change of variable $x=y(t)$, then

$$ \int_{y^{-1}(-1)}^{y^{-1}(1)} u_i(y(t)) u_j(y(t)) w(y(t)) y'(t) dx = \delta_{i,j}.$$

It would be nice if

$$(1)\quad w(y(t)) y'(t)=1,$$

because then I could just set $u_i(y(t))$ equal to a trig function and the resulting $u_i(x)$ would be orthonormal with respect to the weight function $w(x)$. Condition (1) on $y$ is really an ordinary differential equation

$$y’ = {1\over{w(y)}}.$$

In order to get a better fit at $x=\pm 1$, I tried two weight functions $w(x) = {1\over{(1-x)(1+x)}}$ and $w(x) = {1\over{\sqrt{(1-x)(1+x)}}}$.  For the former, the differential equation has solutions that look similar to the hyperbolic tangent function. The latter $w(x)$ yields the nice solution

$$y(t) = \sin( c + t).$$

So now if I choose $c=\pi/2$, then $y(t) = \cos(t)$, $y^{-1}(-1) = \pi$, $y^{-1}(1)=0$, so I need functions that are orthonormal on the interval $[-\pi, 0]$.  The functions $\cos(n x)$ are orthogonal on that interval, but I need to divide by $\sqrt{\pi/2}$ to make them orthornormal.  So, I set $u_n(y(t)) = \sqrt{2/\pi}\cos(n\;t)$ and now

$$ \int_{-1}^1 {{u_i(x) u_j(x)}\over {\sqrt{(1-x)(1+x)}}}  dx = \delta_{i,j}.$$

The $u_n(x)$ functions are orthonormal with respect to the weight $w(x)$ which is exactly what I wanted.

Solving for $u(x)$ yields

$$u_n(x) = \sqrt{2/\pi} \cos( n \;\arccos(x)),$$

$$u_0(x) = \sqrt{2/\pi}, u_1(x) = \sqrt{2/\pi} x,$$

$$u_2(x) = \sqrt{2/\pi}(2x^2-1), u_3(x) = \sqrt{2/\pi}(4x^3-3x),\ldots.$$


If you happen to be an approximation theorist, you would probably notice that these are the Chebyshev polynomials multiplied by $\sqrt{2/\pi}$.   I was rather surprised when I stumbled across them, but when I showed this to my friend Ludmil, he assured me that this accidental derivation was obvious.  (But still fun :) )

You can find anything on the web.  Earlier today, I noticed that the Sterling approximation and the Gaussian distribution both had a $\sqrt{2 \pi}$ in them, so I started thinking that maybe you could apply Sterling’s approximation to the binomial distribution with $p=1/2$ and large $n$ and the central limit theorem to derive the Gaussian distribution.  After making a few calculations it seemed to be working and I thought, “Hey, maybe someone else has done this.”  Sure enough, Jake Hoffman did it and posted it here.  The internet is pretty cool.

I was reading MarkCC’s gripe about the misuse of the word dimension and it reminded me about the Hausdorff dimension.  Generally speaking, the Hausdorff dimension matches our normal intuitive definition of dimension, i.e the H-dimension of a smooth curve or line is 1, the H-dimension of a smooth surface is 2, the H-dimension of the interior of a cube is 3. But for fractals, the H-dimension can be a non-integer. For the Cantor set, the H-dimension is 0.63. For more information, check out the Wikipedia article


Check out Terence Tao‘s wonderful post “Matrix identities as derivatives of determinant identities“.

Thank you to Freakonometrics for pointing me toward the book “Proofs without words” by Rodger Nelson.  Might be a nice Christmas present  :)


Matroids are an abstraction of Vector Spaces.  Vectors spaces have addition and scalar multiplication and the axioms of vectors spaces lead to the concepts of subspaces, rank, and independent sets.  All Vector Spaces are Matroids, but Matroids only retain the properties of independence, rank, and subspaces. The idea of the Matroid closure of a set is the same as the span of the set in the Vector Space.

So I have been thinking about which ideas in mathematics I use most often and I have listed them below.  Please feel free to comment because I would like to use this list as a starting point for a future list of “The most useful ideas in Mathematics for Scientists and Engineers”.
$\ $
I showed this list to Carl and he said his list would be completely different.  (He choked a bit when he saw primes at #71.)  Hopefully we can post his list as well as others.
$\ $
  1. counting
  2. zero
  3. integer decimal positional notation 100, 1000, …
  4. the four arithmetic operations + – * /
  5. fractions
  6. decimal notation  0.1, 0.01, …
  7. basic propositional logic (Modus ponens, contrapositive, If-then, and, or, nand, …)
  8. negative numbers
  9. equivalence classes
  10. equality & substitution
  11. basic algebra – idea of variables, equations, …
  12. the idea of probability
  13. commutative and associative properties
  14. distributive property
  15. powers (squared, cubed,…),  – compound interest (miracle of)
  16. scientific notation 1.3e6 = 1,300,000
  17. polynomials
  18. first order predicate logic
  19. infinity
  20. irrational numbers
  21. Demorgan’s laws
  22. statistical independence
  23. the notion of a function
  24. square root  (cube root, …)
  25. inequalities (list of inequalities)
  26. power laws (i.e. $a^b a^c = a^{b+c}$ )
  27. Cartesian coordinate plane
  28. basic set theory
  29. random variable 
  30. probability distribution
  31. histogram
  32. the meanexpected value & strong law of large numbers
  33. the graph of a function
  34. standard deviation
  35. Pythagorean theorem
  36. vectors and vector spaces
  37. limits
  38. real numbers as limits of fractions, the least upper bound
  39. continuity
  40. $R^n$, Euclidean Space,  and Hilbert spaces (inner or dot product)
  41. derivative
  42. correlation
  43. central limit theorem, Gaussian Distribution, Properties of Guassains.
  44. integrals
  45. chain rule
  46. modular arithmetic
  47. sine cosine tangent
  48. $\pi$circumference, area, and volume formulas for circles, rectangles, parallelograms, triangles, spheres, cones,…
  49. linear regression
  50. Taylor’s theorem
  51. the number e and the exponential function
  52. Rolle’s theoremKarush–Kuhn–Tucker conditions, derivative is zero at the maximum
  53. the notion of linearity
  54. Big O notation
  55. injective (one-to-one) / surjective (onto) functions
  56. imaginary numbers
  57. symmetry
  58. Euler’s Formula $e^{i \pi} + 1 = 0$
  59. Fourier transform, convolution in time domain is the product in the frequency domain (& vice versa), the FFT
  60. fundamental theorem of calculus
  61. logarithms
  62. matrices
  63. conic sections
  64. Boolean algebra
  65. Cauchy–Schwarz inequality
  66. binomial theorem – Pascal’s triangle
  67. the determinant
  68. ordinary differential equation (ODE)
  69. mode (maximum likelihood estimator)
  70. cosine law
  71. prime numbers
  72. linear independence
  73. Jacobian
  74. fundamental theorem of arithmetic
  75. duality – (polyhedron faces & pointsgeometry lines and pointsDual Linear Programdual space, …)
  76. intermediate value theorem
  77. eigenvalues
  78. median
  79. entropy
  80. KL distance
  81. binomial distribution
  82. Bayes’ theorem
  83. $2^{10} \approx 1000$
  84. compactnessHeine – Borel theorem
  85. metric space, Triangle Inequality
  86. ProjectionsBest Approximation
  87. $1/(1-X) = 1 + X + X^2 + \ldots$
  88. partial differential equations
  89. quadratic formula
  90. Reisz representation theorem
  91. Fubini’s theorem
  92. the ideas of groups, semigroups, monoids, rings, …
  93. Singular Value Decomposition
  94. numeric integration – trapezoidal rule, Simpson’s rule, …
  95. mutual information
  96. Plancherel’s theorem
  97. matrix condition number
  98. integration by parts
  99. Euler’s method for numerical integration of ODEs (and improved EulerRunge–Kutta)
  100. pigeon hole principle
There is a long list of mathematical ideas that I use less often.  Here’s a sampling: Baire category theorem, Banach SpacesBrouwer Fixed Point Theorem, Carathéodory’s Theorem, Category TheoryCauchy integral formula, calculus of variations, closed graph theoremChinese remainder theorem, Clifford algebra (quaternions), Context Free Grammarscountable vs uncountable infinityCramer’s RulecohomologyEuclidean algorithm, fundamental group, Gauss’ LawGrassmannian algebra , Graph TheoryHahn-Banach Theorem, homology, Hairy Ball Theorem, Hölder’s inequality, inclusion-exclusion, Jordan Decomposition, Kalman FiltersMarkov Chains (Hidden Markov Models), modules, non-associative algebras, Picard’s Great TheoremPlatonic/Euclidean solids, Principle of InductionProbabilistic Graphical Models (Bayesian Networks, Markov Random Fields), Pontryagain duality, Spectral TheoremSylow p subgroup, repeating decimals equal a fraction, ring ideals, sine law, tensorstessellation, transcendental numbers, Uniform Boundedness TheoremWeierstrass approximation theorem, …

Gauss did a lot of math, but sometimes I am surprised when I find something new to me done by Gauss

$\tan(x) = { 1 \over{1/x \ – {1\over{3/x \ – {1\over{5/x\  – \ldots}}}}}}$

« Older entries § Newer entries »