Coding Reflections: LINQ to SQL and the Repository Pattern

So it has been a while since I’ve blogged about anything, and for that I do apologize! Things have been pretty busy around here, which I suppose should be considered a good thing!

Recently I’ve been working with domain-driven design and trying to get a feel for how LINQ to SQL can be used as the O/R mapper behind repositories. In DDD, you usually have a repository for each domain entity to handle CRUD operations. With LINQ, your repository can expose instances of IQueryable<T> so that client code can efficiently execute complex queries on the domain. LINQ to SQL does a great job of translating these queries to SQL and materializing the results. Wonderful! And if you’re really picky about DDD, you can even define separate interfaces for your entities within your domain model and avoid exposing designer-generated classes outside of your repositories.

One problem with the repository approach is managing the life cycle of the DataContext object. If you have ever used LINQ to SQL, you’ve run into one problem or another with the DataContext class. Typically users run into one of the following problems:

You cannot use more than one instance of a DataContext in a single LINQ query, and
You cannot use objects materialized by one DataContext in a new DataContext instance.

So both problems can only happen if you have more than one instance of a DataContext floating around. Why might that be? Well, unless you constantly pass a single DataContext instance around through method parameters, it’s going to be really hard to encapsulate any common functionality anywhere! Here’s an example:

public IQueryable<Account> ListAccountsByLastName(string name)
{
    //we would normally do: using(DataContext dc...) but we can't, because then the DataContext would be disposed before returning our query.
    DataContext dc = new DataContext(Properties.Settings.Default.ConnectionString);
    return (from a in dc.GetTable<Account>()
            where a.LastName.Contains(name)
            select a);
}

public IQueryable<Account> ListAccountsByLastName(DataContext dc, string name)
{
    return (from a in dc.GetTable<Account>()
            where a.LastName.Contains(name)
            select a);
}

The first method has to come up with its own DataContext. The second method accepts a DataContext as its first parameter. Neither method is good: you either introduce a new DataContext, which limits what can be done with the returned IQueryable<T> and any materialized results, or you always have to worry about passing a DataContext around.
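To make the comment in the first method concrete, here is a minimal sketch of why wrapping the query in a using block fails. It does not touch LINQ to SQL at all: FakeContext, LastNames, and DeferredDemo are hypothetical stand-ins I made up so the example runs without a database. The only behavior it assumes is that a query is not executed until it is enumerated, and that a disposed context refuses to serve it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a DataContext: rows are produced lazily,
// and the context refuses to produce them once it has been disposed.
public sealed class FakeContext : IDisposable
{
    private bool disposed;

    public IEnumerable<string> LastNames()
    {
        // An iterator body only runs when someone enumerates the result,
        // which mimics the deferred execution of a LINQ query.
        if (disposed) throw new ObjectDisposedException("FakeContext");
        yield return "Smith";
        yield return "Smythe";
        yield return "Jones";
    }

    public void Dispose() { disposed = true; }
}

public static class DeferredDemo
{
    public static IEnumerable<string> BrokenList(string prefix)
    {
        using (FakeContext context = new FakeContext())
        {
            // Nothing has been read yet when the using block exits,
            // so the caller receives a query over a disposed context.
            return context.LastNames().Where(n => n.StartsWith(prefix));
        }
    }
}
```

Calling DeferredDemo.BrokenList("Sm") succeeds, but enumerating the result (for example with ToList()) throws ObjectDisposedException, because the filter only runs after the context is gone.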

I’ve already discussed the second problem in the list above, along with a possible workaround.

I don’t know what the LINQ to SQL developers were thinking. They obviously didn’t take a lesson from the ADO.NET and System.Transactions developers. In my mind, the DataContext class should be available as part of an ambient scope, just like System.Transactions.TransactionScope. You should always be able to pick up the current DataContext. We can easily shield client code from any details required when handling the DataContext class by wrapping it up in a UnitOfWork class. The result: a clean separation of client code from the O/R mapper used by the repositories.

Here’s some sample code:

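public static Account CreateUser(string email, string firstName, string lastName)
{
    // Resolve the repository through the IoC container rather than constructing it directly.
    IUserRepository Users = IoC.ResolveService<IUserRepository>();

    // The repository queues the insert on the ambient DataContext.
    Account NewUser = Users.CreateUser(email);

    NewUser.FirstName = firstName;
    NewUser.LastName = lastName;

    return NewUser;
}

public static void Main()
{
    // Everything inside the using block shares one unit of work.
    using (IUnitOfWork Work = UnitOfWork.Start())
    {
        CreateUser("someone@somewhere.com", "Michael", "P.");

        // Commit completes the pending work, such as the insert queued above.
        Work.Commit();
    }
}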

Here we see both the Unit of Work and Inversion of Control patterns at work. A Unit of Work is simply an ambient context within which code executes. My implementation simply provides a dictionary for storing objects within that thread-specific context. It also supports a Commit operation that completes any transacted operations performed in the context. The Inversion of Control pattern allows me to define the repository that implements the IUserRepository interface somewhere other than in the client code. My implementation reads that mapping from a section in the app.config file.
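For readers who want something concrete, a unit of work along these lines could be sketched as follows. This is not my actual class, just a minimal illustration of the idea: only Start() and Commit() appear in the sample above, so the Current, Items, and IsCommitted members are assumptions, as is the choice to reject nested scopes and to use [ThreadStatic] for the thread-specific storage.

```csharp
using System;
using System.Collections.Generic;

public interface IUnitOfWork : IDisposable
{
    // Assumed member: the per-scope dictionary described in the post.
    IDictionary<string, object> Items { get; }
    void Commit();
}

public sealed class UnitOfWork : IUnitOfWork
{
    // [ThreadStatic] gives every thread its own "current scope" slot.
    [ThreadStatic]
    private static UnitOfWork current;

    private readonly Dictionary<string, object> items = new Dictionary<string, object>();
    private bool committed;

    private UnitOfWork() { }

    public static IUnitOfWork Start()
    {
        // Assumption: nested scopes are not supported in this sketch.
        if (current != null)
            throw new InvalidOperationException("A unit of work is already active on this thread.");
        current = new UnitOfWork();
        return current;
    }

    // Assumed accessor that repositories would use to find the ambient scope.
    public static IUnitOfWork Current
    {
        get
        {
            if (current == null)
                throw new InvalidOperationException("No unit of work has been started.");
            return current;
        }
    }

    public IDictionary<string, object> Items { get { return items; } }

    public bool IsCommitted { get { return committed; } }

    public void Commit()
    {
        // A real implementation would complete the transacted work here,
        // for example by submitting changes on a stored DataContext.
        committed = true;
    }

    public void Dispose()
    {
        // Dispose anything disposable that was stored in the scope, then end it.
        foreach (object item in items.Values)
        {
            IDisposable disposable = item as IDisposable;
            if (disposable != null) disposable.Dispose();
        }
        items.Clear();
        if (ReferenceEquals(current, this)) current = null;
    }
}
```

Disposing the scope clears the thread's slot, so code that asks for UnitOfWork.Current outside a using block fails fast instead of silently creating a new context.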

So within the CreateUser method above, we get an instance of the user repository and invoke its CreateUser method. Here’s that method:

public Account CreateUser(string email)
{
    Account TargetAccount = new Account()
    {
        Type = AccountType.Standard,
        Email = email,
    };

    Context.GetTable<Account>().InsertOnSubmit(TargetAccount);

    return TargetAccount;
}

Notice how there is no explicit creation of a DataContext even within the repository. The Context property is exposed by a repository base class:

protected DataContext Context { get { return DataContextProxy.Singleton; } }

The DataContextProxy class and Singleton property just help us manage a single instance of a DataContext stored in the current unit of work scope (an exception is raised if a unit of work hasn’t been started by the client code). The inversion of control pattern is actually used again by the DataContextProxy class to delegate creation of the DataContext instance. This allows the connection string (and optionally a mapping source) to be provided by client code.
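The lazy, one-per-scope behavior described above could look roughly like the sketch below. To keep it runnable without a database it uses hypothetical stand-ins: FakeDataContext replaces System.Data.Linq.DataContext, Scope replaces the unit of work, and a plain Factory delegate replaces the inversion of control lookup. The real DataContextProxy differs in those details.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for System.Data.Linq.DataContext.
public class FakeDataContext : IDisposable
{
    public readonly string ConnectionString;
    public FakeDataContext(string connectionString) { ConnectionString = connectionString; }
    public void Dispose() { }
}

// Hypothetical, minimal stand-in for the unit of work: a per-thread dictionary.
public static class Scope
{
    [ThreadStatic]
    private static Dictionary<string, object> items;

    public static void Begin() { items = new Dictionary<string, object>(); }
    public static void End() { items = null; }

    public static Dictionary<string, object> Items
    {
        get
        {
            if (items == null)
                throw new InvalidOperationException("No unit of work has been started.");
            return items;
        }
    }
}

public static class DataContextProxy
{
    private const string Key = "DataContextProxy.Singleton";

    // Assumption: client code supplies the factory directly. The post
    // delegates this step to the IoC container instead.
    public static Func<FakeDataContext> Factory;

    public static FakeDataContext Singleton
    {
        get
        {
            // Scope.Items throws if no unit of work is active.
            object existing;
            if (Scope.Items.TryGetValue(Key, out existing))
                return (FakeDataContext)existing;

            if (Factory == null)
                throw new InvalidOperationException("No DataContext factory has been configured.");

            // First access in this scope: create the context and remember it.
            FakeDataContext created = Factory();
            Scope.Items[Key] = created;
            return created;
        }
    }
}
```

Every repository that reads DataContextProxy.Singleton inside the same scope gets the same instance, and a new scope gets a fresh one.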

By limiting the instances of DataContext to just one, within a unit of work scope, you never need to worry about which DataContext materialized a particular object or produced an instance of IQueryable<T>. Your repositories always have access to the current DataContext and your client code never has to worry about it or any complex attaching / detaching behavior.

Excellent! But you might be wondering: what do I do if I use stored procedures and table-valued functions that the LINQ to SQL designer generated for my particular data context? My repository only has access to a plain old DataContext and its GetTable<T> method. Well, you could execute your database functions using the DataContext method ExecuteQuery. This will work, but for table-valued functions you’ll get back an IEnumerable<T> instance, not an IQueryable<T> instance. This means any further LINQ operators will run in memory against the returned rows rather than being translated into SQL.
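The difference comes down to which Where (or Select, and so on) the compiler binds to. The sketch below shows this with plain LINQ to Objects, so no database is involved; QueryBindingDemo and its method names are made up for illustration. I am assuming the same binding rule is what affects the IEnumerable<T> returned by ExecuteQuery.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class QueryBindingDemo
{
    // True when Where() was recorded as a call to Queryable.Where in an
    // expression tree, which is what a provider needs in order to translate it.
    public static bool WhereIsComposedAsExpression(IQueryable<int> source)
    {
        IQueryable<int> filtered = source.Where(n => n > 1); // binds to Queryable.Where
        MethodCallExpression call = filtered.Expression as MethodCallExpression;
        return call != null && call.Method.DeclaringType == typeof(Queryable);
    }

    // True when Where() produced a plain in-memory sequence with no
    // expression tree behind it.
    public static bool WhereRunsInMemory(IEnumerable<int> source)
    {
        IEnumerable<int> filtered = source.Where(n => n > 1); // binds to Enumerable.Where
        return !(filtered is IQueryable<int>);
    }
}
```

Both methods contain the same lambda, but only the IQueryable<int> version keeps it as an expression tree. Once a result is typed as IEnumerable<T>, later operators are compiled to delegates and evaluated row by row in memory.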

You could also use your own derived DataContext generated by the designer to call your database functions, but then you’d be subject to the limitations above when working with multiple DataContext instances. At least, you should be. In reality there is a loophole in the LINQ to SQL query converter that does allow you to successfully combine queries involving mapped database functions from different contexts. You don’t get this functionality automatically, but I’ll show you the rest of the DataContextProxy class and how it bypasses the DataContext method CreateMethodCallQuery next time!

Wednesday, March 18, 2009
