Why I love the repository pattern

Published 2020-06-18

Persisting application state is not an easy job, to say the very least. We have an entire cottage industry of vendors who promise to simplify these issues for us. But there is a long-established design pattern that provides a logical separation in your application to help tackle this problem, without needing to commit to a specific database technology or library.

The repository pattern aims to isolate the complexities of persisting data (complexities which are often vendor-specific) from the business logic of your application.

In your app, you will usually:

🔍 Find some specific entity in question
🛠️ Perform some operation on that entity, obeying a set of rules to ensure that it doesn’t enter an invalid state
💾 Persist that entity

Steps (1) and (3) both concern communication with the storage layer (e.g. the database) that you are using. Both involve mapping data between its stored form and the objects/data structures that your app understands.

Step (2) is where the business-logic lies. A rule like “a customer’s current account balance cannot exceed its overdraft” should sit here, not in the database. These rules could be quite complex – especially if there are many rules that interact with one another – and we don’t want them to bleed into the already complicated logic surrounding (1) and (3).
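
As a rough illustration of a rule living in step (2), here is a minimal TypeScript sketch of an entity that enforces the overdraft rule itself. The `Account` shape, the `overdraftLimit` field and the use of whole minor units (e.g. pence) are assumptions made for this example, not something the pattern prescribes.

```typescript
// Hypothetical Account entity; field names and units are assumptions for illustration.
class Account {
    readonly id: string;
    private balance: number;                  // current balance in minor units (e.g. pence)
    private readonly overdraftLimit: number;  // how far below zero the balance may go

    constructor(id: string, balance: number, overdraftLimit: number) {
        this.id = id;
        this.balance = balance;
        this.overdraftLimit = overdraftLimit;
    }

    // Step (2): the rule is enforced by the entity, not by the database.
    withdraw(amount: number): void {
        if (!Number.isInteger(amount) || amount <= 0) {
            throw new Error("amount must be a positive whole number");
        }
        if (this.balance - amount < -this.overdraftLimit) {
            throw new Error("withdrawal would exceed the overdraft limit");
        }
        this.balance -= amount;
    }

    currentBalance(): number {
        return this.balance;
    }
}

const account = new Account("my-account", 50, 100);
account.withdraw(120); // allowed: the balance goes to -70, still within the overdraft of 100
```

Because the check sits on the entity, it holds no matter which storage technology ends up behind steps (1) and (3).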

Imagine the above in terms of a physical system, rather than an application. Accessing your entities is like pulling documents out of a filing cabinet 🗄️: you (1) find a document by a simple criterion (usually a name/ID), (2) change whatever is needed on that document and (3) put it back where it was.

Repositories hence provide a simple, collection-like interface for accessing and persisting entities. E.g., if I have Accounts in my domain model, then I can create an AccountRepository:

namespace Domain;

interface AccountRepository
{
    public function findById(string $id): Account;
    public function save(Account $account): void;
}

Then in my application/command model, I don’t need to care about the implementation details of the database that is being used to persist my accounts. I can just use the repository:

namespace Command;

use Domain\AccountRepository;
use Domain\Money; // assumed to live in the Domain namespace alongside Account

class WithdrawFunds
{
    private AccountRepository $accounts;

    public function __construct(AccountRepository $accounts)
    {
        $this->accounts = $accounts;
    }

    public function __invoke(string $accountId, int $amount): void
    {
        $account = $this->accounts->findById($accountId); // (1)
        $account->withdraw(new Money($amount));           // (2)
        $this->accounts->save($account);                  // (3)
    }
}

and the implementation details are hidden behind the concrete class that I choose to use. I could have an ArrayAccountRepository that just stores them in an array - this would make it very easy to write unit tests for WithdrawFunds that don’t depend on database access, for instance. In a production environment, I could have PostgresAccountRepository, which communicates with an instance of a Postgres database. This implementation would contain all of the SQL queries needed to find the account data and map it into an Account object. It does so without changing anything about how WithdrawFunds is implemented. I would just pass it a PostgresAccountRepository instance instead:

$withdraw = new Command\WithdrawFunds(new PostgresAccountRepository(
    // config parameters for Postgres communication...
));
$withdraw("my-account", 100);
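
To make the testing benefit concrete, here is a minimal sketch of the same idea in TypeScript: an in-memory repository standing in for the database while WithdrawFunds is exercised. Everything here mirrors the PHP examples rather than any real library. The simplified `Account`, the `execute` method name and the synchronous interface are assumptions chosen for brevity.

```typescript
// Illustrative names only; a simplified mirror of the PHP examples.
class Account {
    readonly id: string;
    private balance: number;

    constructor(id: string, balance: number) {
        this.id = id;
        this.balance = balance;
    }

    withdraw(amount: number): void {
        if (amount > this.balance) {
            throw new Error("insufficient funds");
        }
        this.balance -= amount;
    }

    currentBalance(): number {
        return this.balance;
    }
}

interface AccountRepository {
    findById(id: string): Account;
    save(account: Account): void;
}

// Test double: keeps accounts in a Map instead of talking to a database.
class InMemoryAccountRepository implements AccountRepository {
    private accounts = new Map<string, Account>();

    findById(id: string): Account {
        const account = this.accounts.get(id);
        if (account === undefined) {
            throw new Error(`account ${id} not found`);
        }
        return account;
    }

    save(account: Account): void {
        this.accounts.set(account.id, account);
    }
}

class WithdrawFunds {
    private readonly accounts: AccountRepository;

    constructor(accounts: AccountRepository) {
        this.accounts = accounts;
    }

    execute(accountId: string, amount: number): void {
        const account = this.accounts.findById(accountId); // (1)
        account.withdraw(amount);                          // (2)
        this.accounts.save(account);                       // (3)
    }
}

// Exercising the command needs no database, only the in-memory repository.
const accounts = new InMemoryAccountRepository();
accounts.save(new Account("my-account", 250));
new WithdrawFunds(accounts).execute("my-account", 100);
```

The command only ever sees the `AccountRepository` interface, so swapping in a database-backed implementation would not touch `WithdrawFunds` at all.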

So why do I personally use it?

I’ve always loved databases growing up. Part of that is due to my parents’ tech-related careers. My mum’s role as a data analyst involves building complex SQL reports for clients, and my dad’s role as an IT systems engineer resulted in me having access to lots of “enterprise” software in the house as a kid. Between them, I was able to work with Adobe GoLive and Microsoft Access way back in the early 2000s to build my own (locally hosted, never deployed!) websites and store data about them. I went on to take a databases course in my second year at university, and enjoyed it thoroughly.

I say all of this to convey to you that I’m not some luddite who hates databases on principle or is afraid of integrating with them or wants to avoid writing SQL. But when building a proof-of-concept, I often get frustrated when I’m trying to quickly hash out the working logic; I get dragged down thinking about how the database schema is supposed to look before the app is even in a suitable shape to start integrating one. Sometimes the thing that I’m building may not even need a database at all when I’m done (at least, not a relational one).

What the repository pattern allows me to do is focus on the business logic of the application first, and defer the decisions on persistence until much later in the project. This is exactly the benefit that Uncle Bob has proselytized in his Clean Architecture series for years now, but it specifically comes out of an intentional decision to separate persistence logic from the other logic in your system.

It also makes my domain and application model much easier to test. Because I can create test implementations of the repositories very quickly, I’m encouraged to actually write those unit tests for my code, since they’re quick wins instead of burdens. I can even use the test implementations of the repositories for local persistence to start out with, and gradually introduce the type of storage that is most relevant to the stage of the project that I’m in.

Finally, by deferring the decision on the database until much later in the project’s lifecycle, I can make a much more informed decision about the storage choice than I could make up-front. All too often, we reach immediately for a relational database management system (RDBMS) simply because these are what we tend to find ORM vendors design their solutions around. We couple other aspects of the system to these technologies on the basis that it makes it “faster” to get to a complete system, but don’t give ourselves enough time to think about whether said solution is appropriate to our use cases, or about the consequences of early coupling.

Sometimes key-value or document storage is more suitable (e.g. Redis, MongoDB); other times we need something that is better tuned for full-text indexing (e.g. Elasticsearch), or graph-specific queries (e.g. Neo4j), or keeping an immutable audit trail of events (e.g. Kafka, Event Store). When you’re building CRUD apps, then maybe relational databases are fine for your use cases, but I like to have the confidence of making that decision after I’ve spent more time fleshing out the use cases and technical requirements of the system.

What quirks have I found in using it?

Firstly, you may have noticed that I spoke of the domain and command model specifically when talking about the pattern.

Repositories are not intended for adding more query or batch methods. If your repository starts to look like this:

// Avoid this!
interface AccountRepository {
    public function getById(string $id): Account;
    public function getByRegistrationDate(DateTimeInterface $date): array;
    public function save(Account... $accounts): void;
    public function deleteRegisteredBeforeDate(DateTimeInterface $date): void;
}

then you’re heading for a bad place. You need to separate your command model from your query model (read: CQRS), and the repositories are designed to be used by the domain and command models. You’ll find that the repository interface remains pretty stable and doesn’t need much adjustment as time goes on, but the query model will change much more frequently because there are far more ways that clients/customers will want to query their data than save their data.

You don’t need to go so far as having separate databases for reads and writes, however. You just need to ensure that:

Your query (and batch) methods sit on a different interface (or interfaces) from the repository
Your query-specific methods do not use the domain entities, but have their own view models (which do not modify application state and are purely data structures without service methods)

So an example would be:

namespace Query;

// This Account class lives in the Query namespace and is just for a presentational view in the search results
// e.g. it has no withdraw() method to change its state
class Account
{
    // ...
}

interface SearchAccounts
{
    /**
     * @return Account[]
     */
    public function findByRegistrationDate(DateTimeInterface $date): array;
}

You can even let your concrete implementations implement the query interfaces if you want:

class PostgresAccountRepository implements AccountRepository, SearchAccounts
{
    // ...
}
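
For comparison, here is a minimal TypeScript sketch of the same split: one concrete class implements both the repository and the query interface, but the query side only ever hands out read-only view models. The names `AccountSummary` and `InMemoryAccounts`, and the use of ISO date strings, are assumptions for this example.

```typescript
// Hypothetical names throughout; this mirrors the PHP interfaces above.

// Domain entity used by the command side: it has behaviour.
class Account {
    readonly id: string;
    readonly registeredOn: string; // ISO date, e.g. "2020-06-18"
    private balance: number;

    constructor(id: string, registeredOn: string, balance: number) {
        this.id = id;
        this.registeredOn = registeredOn;
        this.balance = balance;
    }

    withdraw(amount: number): void {
        this.balance -= amount;
    }
}

// View model used by the query side: plain read-only data, no behaviour.
interface AccountSummary {
    readonly id: string;
    readonly registeredOn: string;
}

interface AccountRepository {
    findById(id: string): Account;
    save(account: Account): void;
}

interface SearchAccounts {
    findByRegistrationDate(date: string): AccountSummary[];
}

// One class may implement both interfaces; callers depend on only one of them.
class InMemoryAccounts implements AccountRepository, SearchAccounts {
    private accounts = new Map<string, Account>();

    findById(id: string): Account {
        const account = this.accounts.get(id);
        if (account === undefined) {
            throw new Error(`account ${id} not found`);
        }
        return account;
    }

    save(account: Account): void {
        this.accounts.set(account.id, account);
    }

    findByRegistrationDate(date: string): AccountSummary[] {
        const matches: AccountSummary[] = [];
        this.accounts.forEach((account) => {
            if (account.registeredOn === date) {
                // Copy into a view model rather than exposing the entity itself.
                matches.push({ id: account.id, registeredOn: account.registeredOn });
            }
        });
        return matches;
    }
}

const store = new InMemoryAccounts();
store.save(new Account("a-1", "2020-06-18", 100));
store.save(new Account("a-2", "2020-06-19", 0));

// Query code is typed against SearchAccounts only, so it cannot call save().
const search: SearchAccounts = store;
const results = search.findByRegistrationDate("2020-06-18");
```

Typing the query code against `SearchAccounts` alone keeps the two models apart even though a single object serves both.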

Secondly, languages that use promises for modelling asynchronous behaviour tend to force the interface to expose that asynchrony in the method signature. So if the language uses promises to facilitate asynchronous communication with the database, you can’t really create an interface that hides this implementation detail. For example, with TypeScript we would have:

interface AccountRepository {
    getById(accountId: string): Promise<Account>;
    save(account: Account): Promise<void>;
}

which isn’t all that bad, but is still a quirk to be aware of. The “test” implementations can use promises that resolve immediately after finding the account within a Map:

class InMemoryAccountRepository implements AccountRepository {
    private accounts: Map<string, Account> = new Map<string, Account>();

    public getById(accountId: string): Promise<Account> {
        const account = this.accounts.get(accountId);

        if (account !== undefined) {
            return Promise.resolve(account);
        } else {
            return Promise.reject(new Error("account not found"));
        }
    }

    public save(account: Account): Promise<void> {
        this.accounts.set(account.id, account);
        return Promise.resolve();
    }
}

Note: with async notation it is almost identical:

class InMemoryAccountRepository implements AccountRepository {
    private accounts: Map<string, Account> = new Map<string, Account>();

    public async getById(accountId: string): Promise<Account> {
        const account = this.accounts.get(accountId);

        if (account !== undefined) {
            return account;
        } else {
            throw new Error("account not found");
        }
    }

    public async save(account: Account): Promise<void> {
        this.accounts.set(account.id, account);
    }
}
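
The asynchrony also surfaces in the code that uses the repository. As a minimal sketch, a promise-aware WithdrawFunds might look like the following. The simplified `Account` class and the `execute` method name are assumptions for illustration; the repository is the in-memory one from above, repeated so the example stands alone.

```typescript
// Simplified, hypothetical Account; only what the example needs.
class Account {
    readonly id: string;
    private balance: number;

    constructor(id: string, balance: number) {
        this.id = id;
        this.balance = balance;
    }

    withdraw(amount: number): void {
        if (amount > this.balance) {
            throw new Error("insufficient funds");
        }
        this.balance -= amount;
    }

    currentBalance(): number {
        return this.balance;
    }
}

interface AccountRepository {
    getById(accountId: string): Promise<Account>;
    save(account: Account): Promise<void>;
}

class InMemoryAccountRepository implements AccountRepository {
    private accounts: Map<string, Account> = new Map<string, Account>();

    public async getById(accountId: string): Promise<Account> {
        const account = this.accounts.get(accountId);
        if (account === undefined) {
            throw new Error("account not found");
        }
        return account;
    }

    public async save(account: Account): Promise<void> {
        this.accounts.set(account.id, account);
    }
}

class WithdrawFunds {
    private readonly accounts: AccountRepository;

    constructor(accounts: AccountRepository) {
        this.accounts = accounts;
    }

    // The command has to be async too, because steps (1) and (3) return promises.
    public async execute(accountId: string, amount: number): Promise<void> {
        const account = await this.accounts.getById(accountId); // (1)
        account.withdraw(amount);                               // (2)
        await this.accounts.save(account);                      // (3)
    }
}

const accounts = new InMemoryAccountRepository();
const finalAccount: Promise<Account> = accounts
    .save(new Account("my-account", 250))
    .then(() => new WithdrawFunds(accounts).execute("my-account", 100))
    .then(() => accounts.getById("my-account"));
```

Step (2) stays synchronous; only the calls that cross the persistence boundary need to be awaited.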

Lastly, we have naming conventions. This is entirely personal, and comes down mainly to disagreements, when working on shared codebases, over how an instance variable holding a repository should be named.

You’ve probably noticed above that in WithdrawFunds I named the instance variable as though it were just a collection. So I don’t have $accountsRepository, but just $accounts. I do this because it’s in the definition of the repository pattern that it gives the illusion of a collection, and suffixing it -Repo or -Repository kinda ruins that illusion. I like it when names of classes, interfaces, methods and variables communicate their original intent where possible.

Understandably, you may be thinking: but what if you’re dealing with an explicitly in-memory collection? If I have a class like AccountCollection, which is meant to deal with operations on in-memory accounts, won’t that cause confusion with AccountRepository? Especially when I have a method that uses both the collection and the repository!

I would say in response:

Most of the time, the need for a -Collection class is born from the language itself not supporting generics. If you have a generic Collection class, you can get the functionality by instantiating a Collection<Account> when the language supports it. It is unfortunate for some languages that don’t yet support it (e.g. PHP), but it is what it is.
Assuming that you have modelled your entities using well-designed aggregates, you are very unlikely to use both a -Repository and a -Collection for the same entity type in the same method. You should not need to search for (and persist) multiple of the same kind of entity within the same action (with the obvious exception being batch update operations, and that is a meatier topic for another day!). The -Collection will most likely be used in the query model instead (since it will likely involve searching/filtering for multiple entities of the same kind).
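
To illustrate the first point, here is a minimal sketch of a generic collection in TypeScript, where one `Collection<T>` class covers what a dedicated AccountCollection would otherwise do. The `Collection` class and its methods are hypothetical, written for this example rather than taken from a library.

```typescript
// Hypothetical generic collection; not part of any particular library.
class Collection<T> {
    private readonly items: readonly T[];

    constructor(items: readonly T[]) {
        this.items = items;
    }

    filter(predicate: (item: T) => boolean): Collection<T> {
        return new Collection(this.items.filter(predicate));
    }

    map<U>(transform: (item: T) => U): Collection<U> {
        return new Collection(this.items.map(transform));
    }

    count(): number {
        return this.items.length;
    }

    toArray(): T[] {
        return this.items.slice(); // copy, so callers cannot mutate the collection
    }
}

// A read-only view model, as the query side would use.
interface Account {
    readonly id: string;
    readonly balance: number;
}

// No AccountCollection class needed: the type parameter does the job.
const accounts = new Collection<Account>([
    { id: "a-1", balance: 120 },
    { id: "a-2", balance: -30 },
    { id: "a-3", balance: 0 },
]);

const overdrawnIds = accounts
    .filter((account) => account.balance < 0)
    .map((account) => account.id)
    .toArray();
```

With generics available, the name `accounts` can be kept for the repository in command code, while `Collection<Account>` lives in the query code where filtering over many entities actually happens.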

Hopefully you’ve enjoyed this look into the repository pattern, and I hope it helps you with your own projects!