Aggregates, Entities and Events
In an event-sourced system, objects are restored to their "current" state by replaying an ordered set of historical events. The canonical example used to describe event sourcing is the common or garden bank account. Your account has a current balance, but you don’t just store that balance - you arrive at it via a historical record of debits and credits, re-applied from an initial (empty) starting point to produce the current balance.
Events are simple objects containing no behaviour; they just describe the event in terms of a set of property values. They are ordered using a version number. When you want to restore an entity to its current state, you fetch the events relating to it, order them by version number, then apply them in order. Application of events involves nothing more than changing the state of the entity - there is no logic when an event is applied. This is because historical events which have already been applied, and saved, must always be re-playable and never subject to failure, so that the entity can be restored.
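To make the replay idea concrete, here is a minimal sketch in Python using the bank account example. The names (Credited, Debited, Account, restore) are my own illustrative assumptions rather than part of any particular framework; the point is only that events are plain data, they are sorted by version, and applying one is nothing more than a state change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credited:
    """Plain data, no behaviour (hypothetical event type)."""
    version: int
    amount: int


@dataclass(frozen=True)
class Debited:
    """Plain data, no behaviour (hypothetical event type)."""
    version: int
    amount: int


class Account:
    def __init__(self):
        # The initial (empty) starting point.
        self.balance = 0
        self.version = 0

    def apply(self, event):
        # Pure state change: no validation here, so replay cannot fail.
        if isinstance(event, Credited):
            self.balance += event.amount
        elif isinstance(event, Debited):
            self.balance -= event.amount
        self.version = event.version


def restore(events):
    """Rebuild an account by applying its events in version order."""
    account = Account()
    for event in sorted(events, key=lambda e: e.version):
        account.apply(event)
    return account


# Events may come back from storage in any order; version fixes that.
history = [Debited(2, 30), Credited(1, 100), Credited(3, 5)]
account = restore(history)
print(account.balance, account.version)  # prints: 75 3
```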
What would happen if business logic were to be included in event application? Business logic is something that potentially changes throughout the lifetime of an application. Initially, for example, you might have a text field on an entity, defined to have a size limit of 255 characters. When applying your event, you would validate that the field contained no more than 255 characters, and everything being OK, you'd proceed with the event application. Later on down the road, a decision is made to change the size limit of this field to 50 characters. So an event in the past that applied 150 characters to that text field is now invalid, in the context of this new rule, and an exception will be thrown - meaning you can’t restore your entity.
Business logic always occurs before you apply an event. And, an event is only applied to the entity if the requested operation is valid by all business rules.
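The text field example can be sketched as follows. This is an assumption-laden illustration, not a definitive implementation: Note, TextChanged, change_text and MAX_TEXT_LENGTH are hypothetical names. The rule lives in the method that requests the change, while apply stays free of logic, so an old 150-character event still replays after the limit drops to 50.

```python
from dataclasses import dataclass

# Hypothetical current rule; in the story above it used to be 255.
MAX_TEXT_LENGTH = 50


@dataclass(frozen=True)
class TextChanged:
    version: int
    text: str


class Note:
    def __init__(self):
        self.text = ""
        self.version = 0

    def change_text(self, text):
        # Business logic runs BEFORE the event exists.
        if len(text) > MAX_TEXT_LENGTH:
            raise ValueError("text exceeds the current size limit")
        self.apply(TextChanged(self.version + 1, text))

    def apply(self, event):
        # No rules here: historical events must always be re-playable.
        self.text = event.text
        self.version = event.version


note = Note()

# Replaying an event saved under the old 255-character rule still works.
note.apply(TextChanged(1, "x" * 150))

# A new request that breaks today's rule is rejected, and no event is applied.
try:
    note.change_text("y" * 150)
    rejected = False
except ValueError:
    rejected = True
```

Had the length check been placed inside apply instead, the first call would have raised and the entity could no longer be restored.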
When events are applied to effect changes on entities, obviously they do change the state of the entity, but what we’re presently interested in is the collection of newly applied events, rather than the state change. In event sourcing, when you persist an entity, it’s not like persisting objects with a traditional database style approach. There, you change the state of the entity, then save its new current state in some table(s). When you save an entity in an event sourced system, you save just the newly applied events. In this context, the state change resulting from a new event application is a mere side effect! One that of course does no harm.
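As a rough sketch of what "save just the newly applied events" might look like, the entity below keeps a list of events applied since it was last saved. The new_events list, the save function and the dictionary standing in for an event store are all assumptions made for illustration.

```python
class Account:
    def __init__(self):
        self.balance = 0
        self.version = 0
        self.new_events = []  # applied since the last save, not yet persisted

    def credit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        event = {"type": "credited", "version": self.version + 1, "amount": amount}
        self.apply(event)
        self.new_events.append(event)

    def apply(self, event):
        # The state change is only a side effect as far as saving is concerned.
        if event["type"] == "credited":
            self.balance += event["amount"]
        self.version = event["version"]


def save(store, aggregate_id, account):
    """Persist only the new events; the balance itself is never stored."""
    store.setdefault(aggregate_id, []).extend(account.new_events)
    account.new_events.clear()


store = {}  # in-memory stand-in for an event store (assumption)
account = Account()
account.credit(100)
account.credit(20)
save(store, "acc-1", account)

account.credit(5)
save(store, "acc-1", account)  # appends one event, not the whole history
```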
The storage of events is very straightforward, and many options are available. One is to use a very simple database schema consisting of just two tables, Aggregates and Events.
The Aggregates table contains a list of all the aggregates in the system, with the ID of the aggregate, its Type (for debugging purposes only), and the current version. This last is a little optimisation.
Each row in the Events table describes one event, using:
- an AggregateID identifying the aggregate to which the event relates;
- the version of the event; and
- a serialized representation of the event, containing all property values that need to be applied.
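One possible rendering of the two tables, sketched with Python's built-in sqlite3 module. The column names Data and Version, the column types, and the use of JSON for serialization are my assumptions; the description above only fixes what each table holds.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Aggregates (
    AggregateID TEXT PRIMARY KEY,
    Type        TEXT NOT NULL,      -- for debugging purposes only
    Version     INTEGER NOT NULL    -- current version (the optimisation)
);
CREATE TABLE Events (
    AggregateID TEXT NOT NULL REFERENCES Aggregates(AggregateID),
    Version     INTEGER NOT NULL,
    Data        TEXT NOT NULL,      -- serialized event (JSON assumed here)
    PRIMARY KEY (AggregateID, Version)
);
""")


def append_events(conn, aggregate_id, aggregate_type, events):
    """Store (version, payload) pairs and bump the aggregate's version."""
    if not events:
        return
    with conn:  # one transaction: all events are saved, or none
        conn.execute(
            "INSERT OR IGNORE INTO Aggregates VALUES (?, ?, 0)",
            (aggregate_id, aggregate_type),
        )
        for version, payload in events:
            conn.execute(
                "INSERT INTO Events VALUES (?, ?, ?)",
                (aggregate_id, version, json.dumps(payload)),
            )
        conn.execute(
            "UPDATE Aggregates SET Version = ? WHERE AggregateID = ?",
            (events[-1][0], aggregate_id),
        )


def load_events(conn, aggregate_id):
    """Fetch an aggregate's events, ordered by version, ready for replay."""
    rows = conn.execute(
        "SELECT Version, Data FROM Events WHERE AggregateID = ? ORDER BY Version",
        (aggregate_id,),
    )
    return [(version, json.loads(data)) for version, data in rows]


append_events(conn, "wf-1", "Workflow",
              [(1, {"title": "Release"}), (2, {"title": "Release v2"})])
loaded = load_events(conn, "wf-1")
current_version = conn.execute(
    "SELECT Version FROM Aggregates WHERE AggregateID = ?", ("wf-1",)
).fetchone()[0]
```

The composite primary key on Events means two events can never claim the same version for one aggregate, which is one simple way of keeping the ordering trustworthy.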
A crucial point here is that events are always saved and applied at the aggregate root level. It might take a bit of time to illustrate this point, but I’ll explain this in the context of a simple model. In our example, a workflow has a title and a collection of stages. Each stage has a title and a collection of task lists. Each task list has a title. We might take it a bit further and say that each task list has a list of tasks, but we’ll stop at the task list. Now let’s describe the model using domain driven design terminology.
First I'll introduce some important terms for the benefit of people who perhaps haven’t read Eric Evans's book Domain-Driven Design: Tackling Complexity in the Heart of Software. When I was first reading about this stuff via blog posts and articles found on the net, the terms aggregate and aggregate root were used in such a way that I wondered whether they were synonymous. Perhaps I had also been a bit confused by looking at code from DDD example projects, where the terms used in the book don’t quite map directly to the code (I’ll come to that shortly). Since I won’t explain it better than Eric, I’m going to quote his definition:
"An Aggregate is a cluster of associated objects that we treat as a unit for the purpose of data changes. Each aggregate has a root and a boundary. The boundary defines what is inside the Aggregate. The root is a single, specific Entity contained in the Aggregate... Entities other than the root have local identity, but that identity needs to be distinguishable only within the Aggregate, because no outside object can ever see it out of the context of the root Entity."
We can now attempt to describe our model using these terms. The entire diagram represents the aggregate and its boundary, and we see from it that our workflow is an entity, our aggregate root. A stage is an entity that's a child of a workflow. It only has local identity, as we'll only ever be interested in a stage in the context of a workflow instance. Similarly, a task list is an entity that's a child of a stage. It has identity local to the stage instance (although still local to the entire aggregate too), as we'll only be interested in a task list in that context. However, as I mentioned a bit earlier, it’s important to note that the task list, despite being a child of a stage, still refers to the workflow as its aggregate root, and not the stage.
It’s particularly important in the context of the implementation, where a reference to the aggregate root is maintained all the way down the object graph. This is necessary so we can save and apply events at the aggregate root level. Even when we want to save new events that have taken place on children, such as adding a task list to a stage, we still need to save the entire aggregate root. After all, events always relate to an aggregate root - remember, we query by AggregateID to retrieve them.
So, how exactly do the proper terms fail to map on to the code?
You can see that our Workflow class inherits from an AggregateRoot class, which happens to be an abstract base class. Since we mentioned that a workflow is an entity, perhaps we might expect to see it inheriting from Entity? Then we would have a reference to it inside some aggregate root class. Well, as just mentioned, the AggregateRoot class is abstract. It doesn’t really make sense to create an instance of that; you’d rather create instances of actual concrete aggregate roots. So, even though an aggregate root is an entity, in the code we’ll define any root entities as inheriting from AggregateRoot.
There is also an Entity abstract base class. Stage and TaskList inherit from this. In the code, entities that are not aggregate roots must have a reference to an aggregate root. The entity has a protected AggregateRoot field, a reference to the ‘parent’ aggregate root, although as hopefully my previous explanation will have made clear, this is not really a parent. The AggregateRoot must be passed into the entity's constructor. The only reason that it’s protected as opposed to private is so it can be passed, when a concrete entity creates a child, into the new child.
Both of these base classes provide all the behaviour for managing event application and replaying.
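To show the shape of this arrangement, here is a rough Python sketch of the class structure just described. It is not the actual implementation: the record method, the new_events list, the local_id values and the event dictionaries are placeholders I've assumed, and the leading underscore on _aggregate_root is Python's convention standing in for a protected field.

```python
class AggregateRoot:
    """Base class for root entities; events are collected here."""

    def __init__(self, aggregate_id):
        self.id = aggregate_id
        self.version = 0
        self.new_events = []

    def record(self, event):
        # Placeholder for event application: every event, whichever
        # entity raised it, is versioned and kept at the root.
        self.version += 1
        self.new_events.append((self.version, event))


class Entity:
    """Base class for non-root entities."""

    def __init__(self, aggregate_root, local_id):
        self._aggregate_root = aggregate_root  # "protected" by convention
        self.local_id = local_id               # identity local to the aggregate


class TaskList(Entity):
    def __init__(self, aggregate_root, local_id, title):
        super().__init__(aggregate_root, local_id)
        self.title = title


class Stage(Entity):
    def __init__(self, aggregate_root, local_id, title):
        super().__init__(aggregate_root, local_id)
        self.title = title
        self.task_lists = []

    def add_task_list(self, title):
        # The child receives the ROOT, not this stage, as its aggregate root.
        task_list = TaskList(self._aggregate_root, len(self.task_lists) + 1, title)
        self.task_lists.append(task_list)
        self._aggregate_root.record(
            {"type": "task_list_added", "stage": self.local_id, "title": title}
        )
        return task_list


class Workflow(AggregateRoot):
    def __init__(self, aggregate_id, title):
        super().__init__(aggregate_id)
        self.title = title
        self.stages = []

    def add_stage(self, title):
        stage = Stage(self, len(self.stages) + 1, title)
        self.stages.append(stage)
        self.record({"type": "stage_added", "stage": stage.local_id, "title": title})
        return stage


workflow = Workflow("wf-1", "Release")
stage = workflow.add_stage("QA")
task_list = stage.add_task_list("Smoke tests")
```

Note how adding a task list to a stage still produces an event recorded against the workflow, which is what lets us save and query everything by a single AggregateID.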
Having established all that, I can finally get to the interesting implementation: how events are applied and replayed.
Next time: Applying and Replaying Events.