So I have been learning a little category theory in the hopes that I could use it somehow to teach computers abstraction.
What is abstraction? Carl and I have been talking about this for years. Here are some ideas on what abstraction could be:
1a) An average model of a large group of similar things. For example, the abstraction of a chair might be an idealized chair with a seat, a backrest, four legs, and possibly “strechers” between the legs to give it strength and stability.
1b) A function in a computer program that is used in several places in the code. In effect this function shortens the code. If the code has length L1 without the function, the code is length L2 with the function, the function is length LC, and the function is used N times in the code with the function, then typically
L2 $\approx$ L1 – LC*N + LC + N.
The function is an abstraction of the pieces of code that it replaced similar to the way an ideal chair is an abstraction of many chairs. The function has the common characteristics of the code it replaced. For example, the following code consists of about 160 characters
and it could be shortened slightly to about 137 characters by
Though this example is not in any way practical, it does illustrate how a function can capture commonality. All of the pieces of code replaced by the pr function had a call to printf, the substring “Hello “, and the carriage return “\n”. The pr function was, in a sense, the idealized version of the six print statements in the original code.
In category theory, we could refer to the category of strings with arrows going from superstrings to substrings. In this category, there would be an arrow from the line of code ” printf(“Hello John\n”);” to the string ” printf(“Hello \n”);”. The categorical coproduct of several strings in this category would be the longest common substring. So, category theory suggests the substring ” printf(“Hello \n”);” should somehow be included in the abstraction of those six printf statements in the original code.
2) A simplifying transformation that changes things into simpler things which are easier to work with. For example, if you are counting sheep in a pen, you might convert groups of sheep into numbers and then add the numbers to get a total count for the entire pen. We simplified the counting process by first converting groups of sheep into numbers. Another example might be solving a corn maze by first flying over it, taking a picture, and then converting the “impassable” rows of corn into lines drawn on a piece of paper. To solve the corn maze, you first convert it to a pen drawing on paper and then solve the pen drawn maze.
3) Removing irrelevant information or detail. If a piece in chess is attacking the opponent’s king, then the opponent must either a) move the king, b) put a piece between the attacker and the king, or c) capture the attacking piece. If you wanted to list the possible moves for the opponent, you can often delete from the chess board most of the pieces because most of the pieces on the board do not affect the three options above. In the diagram below, the white pawns, rook, and king and the black pawn on a7 do not affect the legal moves for black.
In a corn maze, the type of corn, the height of the corn stalks, the temperature, or the type of soil do not affect the solution of the corn maze, so that information could be ignored. Ignoring irrelevant information allows the brain to form a more abstract, simple model of the corn maze.
4) Variables in first order predicate logic. “If person A is taller than person B and person B is taller than person C, then person A is taller than person C.” In that statement, person A could any one of a billion people on earth.
5) Establishing classes of objects. “Fruit tends to spoil quickly.” That statement applies to a huge variety of organic objects. Abstraction classes can even be put in a hierarchy like Animal ==>> Mammal ==> Cat ==> Siamese Cat.
6) Statistical Patterns and Compression. If a piece of code like zip.exe compresses a corpus of English text files, we could say that the piece of code formed an abstract model of the corpus. In the case of the zip function which uses the Lempel–Ziv–Markov chain algorithm, the zip program would notice that
- the letters ‘e’ and ‘t’ appear frequently,
- the letter ‘u’ usually follows ‘q’, and
- the capital letter I frequently appears with spaces on either side.
These patterns are all mini-abstractions or aspects of English.
A very general notion of abstraction could be any function which shortens the compression of data. That function could be an abstraction of some part of the data. If your data consisted of 3 dimensional models of chairs, then a compression program could use the fact that a chair usually has four legs, a seat, and a backrest. A compression program might benefit from a function like the one shown below.
The abstract version of a chair would help compress the data describing many chairs.
A compression program for a corpus of chess games would probably include a board evaluation function. The program could use the evaluation function to suggest the top five most likely moves. This would allow the program to use less space to compress the moves by using a probability distribution (e.g. arithmetic encoding) over moves as a function of the “goodness” of each move. Better moves would be more likely than worse moves. Converting the positions to valuations helps the computer to compress by making it “understand” chess. If the computer is compressing many chess games, it may even note that some human players were more aggressive players or stronger players. By classifying the players by skill or behavior, the program achieves better compression.
Well that’s it for now.
Hopefully, sometime in the future, I will be describing how category theory can help us automate those six abstractions and then use the abstractions to generate features for learning algorithms.
(Code highlighting was done by http://hilite.me/. Thanks!)