During the holiday season I finally decided to learn something about Category theory because it seems to be the hot topic in the Haskell community. The book I used to guide me in this journey is Category Theory for Programmers by Bartosz Milewski.
I cannot say I fully understood it, I’m far away from doing so, but I want to write down what I think I learned nonetheless. I hope this will clarify my thoughts and maybe help someone else.
What is category theory
Bartosz likes to say that Category theory is the science of structure and composition. Since functional programmers believe, rightly imho, that programs are nothing but a huge composition of functions, it might be worthwhile to take a look at it.
Generally speaking though, category theory is a branch of maths that tries to define the structure of “things” by only studying the “relationships” between them. It’s closely related to Set theory, but the core difference is that in category theory it’s not allowed to define anything relying on properties of the elements themselves, only on their connectivity.
Category
A category is a bunch of objects connected by arrows. Every category has two properties:
- every object always has an identity arrow that points to itself;
- the arrows, also called morphisms, must be composable, that is, if there’s an arrow from object A to B and one from B to C then there must also be an arrow from A to C. Moreover, the composition must be associative, that is (f . g) . h = f . (g . h).
If the second property sounds like the definition of function composition then you’re absolutely right! In fact, it turns out there is a category where the objects are types and the arrows are the functions between them. In Haskell this category is called Hask, but these concepts can be translated to other type systems with almost no changes.
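To make this concrete, here is a minimal sketch of composition and identity in Hask; length' and isPositive are hypothetical functions I made up purely for illustration.
-- Objects are types, arrows are functions between them.
length' :: String -> Int -- an arrow from String to Int
length' = length
isPositive :: Int -> Bool -- an arrow from Int to Bool
isPositive = (> 0)
-- Composing the two arrows gives an arrow from String to Bool.
nonEmpty :: String -> Bool
nonEmpty = isPositive . length'
-- Every object also has its identity arrow, e.g. for String:
idString :: String -> String
idString = id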
Here’s an example of a simple category
Isomorphism
Another core concept of category theory is the concept of isomorphism. This buzzword means that there’s an invertible arrow, a morphism, between two objects in a category. In other words, the two objects are isomorphic.
Simply put, this means that composing the two arrows, in either order, gives the identity arrow. In pseudo Haskell notation this could be expressed as f . g = id and g . f = id.
As a trivial example, let’s show how Maybe () is isomorphic to Bool.
isJust :: Maybe () -> Bool
isJust (Just _) = True
isJust Nothing = False
justIf :: Bool -> Maybe ()
justIf True = Just ()
justIf False = Nothing
-- function comparison is not implemented in Haskell, but the following
-- statements shouldn't be too hard to verify
-- isJust . justIf = id
-- justIf . isJust = id
Clearly, Maybe () is not semantically the same as Bool (i.e. they’re not the same type), but they carry the same amount of information, albeit in a different shape.
A use case I find quite interesting for isomorphisms is compiler optimizations. In theory, a compiler could substitute a data type with another one that is isomorphic to the original but more machine-efficient.
Duality
Since a category is just a collection of objects and arrows, it’s always possible to create the dual of a category by inverting the direction of the arrows. The dual category is also called the opposite category, or C^op. This duality is interesting because it means that structures built on a category C also have a dual shape in C^op. Usually the dual shapes are given the same name with a co- prefix.
For example, here’s the opposite category of the first example category
Note how the identity arrows are preserved since their source and target are the same object.
Also note that the dual of the dual of a category is the category itself, that is (C^op)^op = C. This is why you can give a nut to a category theorist even though they asked you for a coco-nut.
Product
At this point we have the tools to understand one of the core shapes in category theory: the Product. A product between two objects is an object that has two morphisms to the original objects. Intuitively, a Product between two objects is another object from which it’s possible to retrieve both the original objects. In Haskell pseudo-notation these two morphisms, also called projections, can be written as
f :: p -> a
g :: p -> b
Using this definition there could be multiple valid product objects for any two objects. To solve this ambiguity there’s an additional constraint the product must satisfy: every other candidate product object must have a unique morphism to it. I know it sounds scary, but all it means is that the product object can be built from any other candidate, while it carries no extra structure of its own. In other words, the product object has less structure than the other candidate products.
Here’s the graph for the product P. Note that I’ve omitted the identity morphisms for simplicity.
In this diagram, there are two candidate product objects: P and P’. However, only P is the product between A and B because there’s a morphism h that can be used to go from P’ to P.
In Haskell, the product of two types is the two-element tuple (a, b), which can also be expressed as data Product a b = Product a b. The projections are simply the fst and snd functions.
As an example of a Haskell type that is a candidate for being the product type, let’s consider (a, b, Int). It’s easy to show that the morphism that builds (a, b) from it is
h :: (a, b, Int) -> (a, b)
h (a, b, _) = (a, b)
Therefore (a, b, Int) carries more structure than necessary (the extra Int), and we can say that it is not the product of a and b.
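As a side note, the unique morphism required by the definition can be written down explicitly in Haskell. The following sketch (the name factorizer is mine, not something from the standard library) builds it out of the two projections of any candidate product.
-- Given the two projections of any candidate c, build the morphism
-- from c to the product (a, b).
factorizer :: (c -> a) -> (c -> b) -> (c -> (a, b))
factorizer f g = \c -> (f c, g c)
-- For the (a, b, Int) candidate above, h could be written as
-- factorizer (\(a, _, _) -> a) (\(_, b, _) -> b)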
Coproduct
In category theory, as soon as a pattern is found, the next thing to do is to apply the duality concept to find its dual pattern. In our case let’s build the dual of the product: the coproduct.
This is the product diagram with the arrows inverted.
The intuition behind this is that a coproduct is an object that can be built either from an A or from a B. In fact, there are two morphisms, also known as injections, that create a coproduct. Sure enough, there could be multiple coproduct candidates for the same pair of objects but, dually to the product, the coproduct must have a unique morphism to any other candidate: every other candidate can be built from it, while the coproduct itself carries no extra structure.
This might be a bit harder to recognize, but if we try hard enough we can see the pattern of algebraic data types! In fact, in Haskell we can define the coproduct in the following way
data Coproduct a b = CoproA a | CoproB b
The two injections are the value constructors.
You might recognize this declaration as the declaration of the Either data type, and you’d be right, because they’re isomorphic to each other!
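To back up that claim, here is a minimal sketch of the isomorphism between Coproduct and Either; the function names toEither and fromEither are just illustrative.
toEither :: Coproduct a b -> Either a b
toEither (CoproA a) = Left a
toEither (CoproB b) = Right b
fromEither :: Either a b -> Coproduct a b
fromEither (Left a) = CoproA a
fromEither (Right b) = CoproB b
-- As with the Maybe ()/Bool example:
-- toEither . fromEither = id
-- fromEither . toEither = id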
Algebra
Both product and coproduct, also known as sum, have names of arithmetic operations but they don’t seem to relate to the concept of addition and multiplication. Accident? As Master Oogway says, there is no accident.
As a refresher, here are the definitions of Product and Sum
data Product a b = Product a b
data Sum a b = SumA a | SumB b
One of the questions we can ask is how many ways there are to create a value of each of these types.
Let’s start with the product. If it had a single type parameter then the number of values would simply be the number of distinct values in the wrapped type. Since in the real product there are two type parameters, the total number of ways to build a product is the product of the number of distinct values in each type parameter!
We can apply the same reasoning to sum. To build a sum we either need an a or a b but not both. Therefore the number of ways of building a sum is just the sum of the values in the type parameters.
Now that we have operations to combine (positive) numbers it might be worthwhile to find an encoding for the numbers themselves.
Let’s start with the number one. In this line of reasoning, one means that there should be only a single value for that type. It’s not too hard to come up with a valid definition. In Haskell something like the following will work
data One = One
There’s just one way of instantiating the type One, that is, by using the value constructor One. Note how One is isomorphic to ().
Now that we have One we can have two, three, etc. simply by using sums and products! Take a look at the following type definitions
type Two = Sum One One
type Four = Product Two Two
Let’s check if this actually works by enumerating all the ways we can build those types:
- we can build a Two using either SumA One or SumB One;
- a Four can be built with Product (SumA One) (SumA One), Product (SumA One) (SumB One), Product (SumB One) (SumA One) and Product (SumB One) (SumB One).
It seems to work, neat!
However we haven’t yet found an encoding for zero. Let’s fix that now, here’s the definition
data Zero
In Haskell it’s possible to define a type without any value constructors. This means that we can never create any value of this type. This is also known as the Void type (which is not the same as void in C/C++). In C++ a type isomorphic to zero can be defined in terms of a class with a private constructor.
Note how Zero is the identity for Sum just as One is for Product. Also note how any Product with Zero ends up with no way of being built, because at some point you would have to instantiate a Zero, which is impossible.
If we substitute numbers with types it becomes evident that creating new types is just a composition over Sums and Products (if we can create any number we can create any type). This allows the creation of a generic framework to manipulate data types which is how the Haskell compiler is able to implement a lot of typeclasses automatically.
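As a small sanity check of this algebraic view, here is a sketch showing that Maybe a is isomorphic to Sum One a, i.e. to 1 + a; the helper names toMaybe and fromMaybe' are hypothetical.
toMaybe :: Sum One a -> Maybe a
toMaybe (SumA One) = Nothing
toMaybe (SumB a) = Just a
fromMaybe' :: Maybe a -> Sum One a
fromMaybe' Nothing = SumA One
fromMaybe' (Just a) = SumB a
-- toMaybe . fromMaybe' = id
-- fromMaybe' . toMaybe = id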
Curry-Howard isomorphism
It’s possible to show that products and sums can be transformed into constructs in logic. In particular, a product corresponds to logical and, or &&, and a sum corresponds to logical or, or ||. Furthermore, Zero corresponds to false and One to true.
It’s quite easy to show that the familiar properties of such constructs hold (a type corresponds to a true proposition when it has at least one value). For example, a && false = false holds because, if we translate the proposition to sums and products, it eventually evaluates to Product a Zero, which we know has no values, just like Zero. The same approach can be used to show that a || true = true.
We can also show how functions can be mapped to logical implication: a -> b means that if a is true then we can conclude b. The obvious question here is what happens when a is false? Well, the function would be of type Zero -> b, but if we look closely we realize that we can never call that function because Zero doesn’t have any values! Therefore we cannot conclude b.
An interesting interpretation of this is that, since all the possible types are a combination of products and sums, and we can map sums and products to logical constructs, we can say that there’s an isomorphism between types and logic! This is known as the Curry-Howard isomorphism. Under this point of view, checking whether a program type-checks is no different from checking whether a logical proposition holds.
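To give a flavour of this, here is a tiny proofs-as-programs sketch: a total function of a given type acts as a proof of the corresponding proposition. The names are made up for illustration.
-- Modus ponens: from a proof of a and a proof of a -> b, conclude b.
modusPonens :: a -> (a -> b) -> b
modusPonens a f = f a
-- Commutativity of logical and: a && b implies b && a.
andComm :: (a, b) -> (b, a)
andComm (a, b) = (b, a)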
Functor
A functor is a mapping between categories which maps both objects and morphisms. When the target category of the functor is the same as the source category, the functor is called an endofunctor. In programming we don’t usually have multiple categories, because we live only in the category of types, so in this context a functor is actually an endofunctor.
In Haskell an (endo)functor can be defined in the following way:
class Functor f where
fmap :: (a -> b) -> (f a -> f b)
The mapping between objects is in the fact that f must be a type constructor, that is, a type that needs a single type parameter in order to be instantiated. For example, Maybe and Either String are both type constructors because they require only one more type parameter, but Either is not because it requires 2 type parameters. Note that Either String Int and Int are not type constructors either, because they’re already complete types and as such they require no type parameters.
We can see a type constructor T as taking the type parameter and mapping it into category T.
If we try hard enough we can also see how fmap is doing the mapping between morphisms, that is functions, from one category to another. In fact, it takes the morphism a -> b and maps it to the morphism f a -> f b in category F.
Here’s an example implementation of Functor for Maybe
instance Functor Maybe where
fmap f (Just x) = Just (f x)
fmap _ Nothing = Nothing
Functors are really common in programming. My intuition is that they represent some sort of container. For example, the Maybe type can be seen as a container of an optional value and a list can be seen as a plain container of values.
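For example, here is a sketch of a Functor instance for a hypothetical binary tree container; it follows the same shape as the Maybe instance above.
data Tree a = Leaf | Node (Tree a) a (Tree a)
instance Functor Tree where
  fmap _ Leaf = Leaf
  fmap f (Node l x r) = Node (fmap f l) (f x) (fmap f r)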
Monoid
Another interesting pattern is the Monoid. A Monoid is an object equipped with a unit arrow and an associative multiplication arrow.
In Haskell this could be expressed as
class Monoid m where
mempty :: m
mappend :: m -> m -> m
Note that in Haskell mempty is not actually an arrow but a value of type m, because it can be shown that m is isomorphic to () -> m (there’s only a single way of calling a function () -> m).
These morphisms must satisfy the following laws
mappend mempty m = m
mappend m mempty = m
mappend x (mappend y z) = mappend (mappend x y) z
To put it simply, mappend is an associative “multiplication-like” morphism and mempty is the identity value under mappend.
I know this definition isn’t very clear, but hopefully an example helps.
Probably the simplest example of a Monoid is the product of numbers. In fact, it’s easy to show that the mempty value is 1 and that mappend is simply the * operator.
Here’s a possible implementation in Haskell
data ProductM = ProductM Int
instance Monoid ProductM where
mempty = ProductM 1
mappend (ProductM a) (ProductM b) = ProductM (a * b)
The real definition is a bit different because ProductM must also be a Semigroup in order to be a Monoid. Let’s ignore this detail by just saying that a Semigroup is a Monoid without the identity value; it’s just the associative morphism.
Besides the usual unwrapping and rewrapping, the implementation should be quite straightforward to follow. I won’t bother demonstrating that ProductM indeed satisfies the monoid requirements because I think we can all agree that it’s just multiplication.
Note how numbers form a monoid over addition too. In this case the mempty value is 0 and mappend is simply the + operator.
Monoids are extremely useful in practice! Basically every container I can think of is a Monoid under the concatenation morphism. Here’s a hopefully easy-to-understand implementation of Monoid for lists in Haskell
instance Monoid [a] where
mempty = []
mappend = (++)
It should be quite evident how this definition can be mapped to nearly every container.
Also note how the implementation of Monoid for lists is totally generic over the contained value. That’s why lists are also known as Free monoids; they give you a Monoid for free.
To show you how much monoids are useful let’s solve a simple problem: find the sum and product of a sequence of numbers in a single pass. Here’s how it could be done in Haskell
import Data.Monoid (Product (..), Sum (..))
f :: [Int] -> (Product Int, Sum Int)
f = foldl mappend mempty . map (\n -> (Product n, Sum n))
-- or using mconcat which is equivalent to the above manual foldl
-- f = mconcat . map (\n -> (Product n, Sum n))
Pretty sick, isn’t it? The key point here is that a tuple is also a Monoid as long as its components are!
Comonoid
If there is a Monoid there must be also a Comonoid in the opposite category, right? Indeed there is. Unfortunately I don’t understand it completely.
After reversing the arrows eventually we get to a definition like the following
class Comonoid m where
destroy :: m -> ()
dup :: m -> (m, m)
where dup must be associative and must return an exact copy of the input. In Haskell the only possible definitions of these morphisms are
destroy :: a -> ()
destroy _ = ()
dup :: a -> (a, a)
dup a = (a, a)
And that’s why it’s hard for me to understand. With only this possible definition I’m not able to see a general pattern. Anyway, my intuition is that if a Monoid builds a somewhat bigger object at each mappend, then the Comonoid, its dual, must gradually destroy the already built object. I don’t think that’s 100% accurate though.
It seems that a Comonoid can be used to model linear type systems, upon which many new languages are based. Rust and Idris are examples of languages with a linear type system.
Monad
It’s time for the big one: the Monad.
A monad is just a monoid in the category of endofunctors.
If the definition doesn’t say anything to you, don’t worry, you’re not alone. I’ll try my best to explain it as simply as I can.
First of all, the definition says we’re in the category of endofunctors, which means that the objects in this category are endofunctors. Since a monad is a monoid, by definition there must be an arrow that somehow multiplies two endofunctors together. In addition, there must be a unit arrow as well. These arrows are called natural transformations. I know this is still a bit nebulous, so let’s cut it short and take a look at its definition in Haskell
class Functor m => Monad m where
return :: a -> m a
join :: m (m a) -> m a
I think we can vaguely see how return is similar to mempty (especially to () -> m): it takes a value and simply wraps it inside the monad. Like mempty, this can be seen as the entry point of a monad.
On the other hand, the join function is a bit more difficult to relate to mappend, but the intuition is that join is actually doing the product within the monad itself. This ends up in a flattening of the structure of the Monad.
Let’s see how Maybe implements Monad
instance Monad Maybe where
return = Just
join (Just a) = a
join Nothing = Nothing
However, in Haskell the monad is not defined in terms of join, but instead in terms of the bind operator, >>=. It’s possible to show that bind can be defined in terms of join. Here’s how it could be done
(>>=) :: Monad m => m a -> (a -> m b) -> m b
m >>= f = join (fmap f m)
However, I don’t think this categorical definition of a monad explains very well what a monad is in the context of programming. My intuition for the monad is that it’s a context inside which we can encapsulate values and compose functions without ever leaving the context. For example, in the case of Maybe the context is uncertainty because we don’t know for sure whether a value is present or not. Every function that works on a Maybe inherits this uncertainty because it must remain in context.
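Here is a small sketch of that intuition using the standard Maybe monad from the Prelude; safeDiv and safeDecrement are hypothetical functions that may fail.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)
safeDecrement :: Int -> Maybe Int
safeDecrement n
  | n <= 0 = Nothing
  | otherwise = Just (n - 1)
-- The uncertainty never leaves the context: if any step is Nothing,
-- the whole computation is Nothing.
compute :: Int -> Int -> Maybe Int
compute x y = safeDiv x y >>= safeDecrement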
Comonad
This is the last pattern we’ll talk about: the Comonad, the dual of the Monad.
If we try to reverse the definition of a monad we eventually get to the following Haskell definition of a comonad
class Functor m => Comonad m where
extract :: m a -> a
duplicate :: m a -> m (m a)
If you look at it long enough you might see similarities to the Comonoid. In fact
A comonad is just a comonoid in the category of endofunctors
I think I understand the comonad more than the comonoid even though the former is defined on the latter. My intuition is that, if the monad provides a way of encapsulating a value inside a context without providing a way of accessing it outside the context then the comonad must be all about extracting values from a context. The extract function should reflect this logic quite clearly. On the other hand, duplicate is the exact opposite of join because it adds another layer of nesting in the comonad.
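As a sketch of this intuition, here is a possible instance of the Comonad class above for a hypothetical Env type, which pairs a value with an environment and always lets us extract the value.
data Env e a = Env e a
instance Functor (Env e) where
  fmap f (Env e a) = Env e (f a)
instance Comonad (Env e) where
  extract (Env _ a) = a
  duplicate (Env e a) = Env e (Env e a)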
Yoneda lemma
I’ve talked enough about category theory from a mathematical point of view. Maths has been connected to philosophy since its origins, and sure enough category theory is no exception.
In particular, I’d like to mention the Yoneda lemma which (semantically) says that
Every object is completely identified by the morphisms towards the other objects.
I have not understood what it actually means in category theory, but I find it quite fascinating when applied to the category of the real world. In this category, where objects correspond to people and arrows correspond to the relationships between them, the Yoneda lemma means that each one of us is uniquely defined by our relationships. I don’t think that’s too far away from the truth.
Haskell point free notation
During my journey in category theory I also learned why the point free notation is so named in Haskell. If you don’t know what it is, it’s a style of writing functions where arguments are not explicitly mentioned; the main tool to achieve this is function composition. Here’s an example
oddSquares :: [Int] -> [Int]
oddSquares = map (^ 2) . filter odd
As you can see, this function takes a single parameter, a list of numbers, which is not mentioned explicitly in the function definition.
It’s so named because arguments can also be called points, in the sense of evaluating the function at the given points. The fact that the function composition operator looks like a point too is just an unfortunate coincidence.
Conclusions
Category theory has way more concepts than what I tried to describe here and I didn’t understand most of them. For example, I didn’t get T-Algebras, Adjunctions, Kan extensions, Lawvere theories and many more!
My main problem with category theory is that I cannot relate to most of its concepts. In other words, I’m not able to understand what I cannot use. I want to improve with regard to this, but it’s hard.
However, I think that even with this tiny subset of category theory the inner structure of a lot of problems can be revealed. Hopefully it also helps in building well-structured software.