Sharding into the cloud

NHibernate.Shards is an extension to the well-known ORM which allows a logical database to be partitioned across multiple physical databases and servers.  Like lots of things in NHibernate, it's a port from the Java world, in this case of the Hibernate Shards project.  I thought it would be interesting to see how well it worked against SQL Azure.  It turned out to be not interesting at all ... just plain easy!

Step 1: Register with SQL Azure
Turnaround on token requests is pretty quick right now (<24 hours in my case).

Step 2: Set up some SQL Azure databases

Step 3: Set up appropriate logins and users
The SQL Azure team have done a great job to allow SQL Server Management Studio to connect a query window to an Azure database, but I'm a bit SQL-phobic at the best of times. This was the most challenging bit for me!

Step 4: Download and compile NHibernate.Shards from NHContrib Subversion

Step 5: Set your connection strings in the config file

Step 6: Press play.  Really, that's all there is to it!

Now you may notice that I neglected to create any schema in the Azure databases - that's because NHibernate can do that for me.  Did I mention that I'm a bit SQL-phobic?  ;)

The code I was using was the standard example that comes with NHibernate.Shards, which records WeatherReport objects; I've attached it.  It's the same example that Ayende dissected, so you can also pick up his discussion of the workings of NHibernate.Shards (http://ayende.com/Blog/archive/2009/10/18/nhibernate-shards-progress-report.aspx).  The code looks like this:

[Image: sql-azure-shard-code]

And the results are as follows:

[Image: sql-azure-shards]

Some of the features of NHibernate.Shards that really stood out for me:

  • It can query all shards in parallel or sequentially.  For SQL Azure, that's quite useful!  A sequential query across my single-record shards took 601ms, whereas a parallelized query took 411ms (about 32% less).
  • New records can be allocated to shards based on either the data (e.g. surname starts with A-M or N-Z) or some other scheme (e.g. round-robin).
  • If the correct shard can be identified based on an object's identity, then only that single shard is queried to retrieve the entity (this is based on your own IShardResolutionStrategy implementation).
  • If you sort by a property, then this sort will be applied even when data is merged from multiple shards.
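
To make the selection, resolution and sorted-merge ideas in the list above a bit more concrete, here is a minimal, language-neutral sketch in Python.  It is not the NHibernate.Shards API: the in-memory `shards` dictionary and the names `select_by_surname`, `select_round_robin`, `save`, `get` and `query_sorted` are all made up for illustration, and the idea of embedding the shard id in the record's identity is just one assumed way of making resolution possible.

```python
import heapq
from itertools import cycle

# Hypothetical in-memory "shards": shard id -> list of records.
shards = {0: [], 1: []}

def select_by_surname(record):
    """Data-based selection: surnames A-M go to shard 0, N-Z to shard 1."""
    return 0 if record["surname"][0].upper() <= "M" else 1

_next_shard = cycle(sorted(shards))

def select_round_robin(record):
    """Scheme-based selection: ignore the data and rotate through the shards."""
    return next(_next_shard)

def save(record, select):
    """Store the record on the shard chosen by the given selection function."""
    shard_id = select(record)
    # Assumption: the shard id is embedded in the identity for later resolution.
    record["id"] = (shard_id, len(shards[shard_id]))
    shards[shard_id].append(record)
    return record["id"]

def get(record_id):
    """Resolution: the id names its shard, so only that one shard is read."""
    shard_id, index = record_id
    return shards[shard_id][index]

def query_sorted(key):
    """Sort each shard's rows, then merge them so the global order still holds."""
    per_shard = [sorted(rows, key=key) for rows in shards.values()]
    return list(heapq.merge(*per_shard, key=key))

# Usage: four records spread over two shards by surname.
ids = [save({"surname": s}, select_by_surname)
       for s in ["Zhou", "Adams", "Nunez", "Miller"]]
names = [r["surname"] for r in query_sorted(lambda r: r["surname"])]
```

Each shard only ever sorts its own rows; the merge step is what keeps the combined result in order, which is the behaviour described in the last bullet.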

Overall though, it all just works tremendously well.  Congratulations really must go to:

  • The Microsoft SQL Azure team
  • Dario Quintana, for his work on NHibernate.Shards
  • Fabio, Ayende and the rest of the NHibernate committers

EDIT: Querying data from the shards is done using code like the following.  You should notice that this code makes no references to the shards, and in fact is "normal" NHibernate code.  The sharding is all handled transparently in the ORM.

[Image: sql-azure-shard-code2]
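
The shape of that transparency can be sketched outside of NHibernate too.  The following is a rough Python illustration, not the NHibernate.Shards implementation: `ShardedSession`, `hot_reports` and the `temperature` field are hypothetical names, and plain lists stand in for the Azure databases.  The point is only that the calling function never mentions a shard, while the facade decides whether to fan out in parallel or sequentially.

```python
from concurrent.futures import ThreadPoolExecutor

class ShardedSession:
    """Hypothetical facade: looks like one data source, fans out to many."""

    def __init__(self, shards, parallel=True):
        self._shards = shards        # list of lists standing in for databases
        self._parallel = parallel

    def _query_one(self, shard, predicate):
        """Run the filter against a single shard."""
        return [row for row in shard if predicate(row)]

    def query(self, predicate):
        """Run the filter against every shard and concatenate the results."""
        if self._parallel:
            # One worker per shard; map() returns results in shard order.
            workers = max(1, len(self._shards))
            with ThreadPoolExecutor(max_workers=workers) as pool:
                parts = list(pool.map(
                    lambda shard: self._query_one(shard, predicate),
                    self._shards))
        else:
            parts = [self._query_one(shard, predicate)
                     for shard in self._shards]
        return [row for part in parts for row in part]

def hot_reports(session):
    """Calling code: an ordinary query with no reference to shards."""
    return session.query(lambda report: report["temperature"] > 30)
```

Switching between `ShardedSession(shards)` and `ShardedSession(shards, parallel=False)` changes how the shards are visited but not what `hot_reports` returns, which mirrors the parallel-versus-sequential choice mentioned earlier.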

November 10 2009