By Jim White (Director of Training and Instructor)
Last week, my fellow Intertech colleague and Microsoft MVP, Tim Star, and I presented the Windows Azure Bootcamp for the Twin Cities. According to Microsoft reps, we broke the record for the most attended bootcamp in the US with nearly a hundred people at the event.
A common question that I received during and after the bootcamp was "Why would I want to use Windows Azure Table Storage versus Windows SQL Azure to store my application data?" A good question with the answer dependent on your application and data needs.
Table Storage and SQL Azure Defined
First, allow me to backup and set the stage a little for this discussion. SQL Azure is essentially SQL Server in the Microsoft cloud computing environment known as Azure. That is not quite true in that SQL Azure currently has a number of limitations and unsupported features that SQL Server 2008 has (here is a starter list of limitations: http://msdn.microsoft.com/en-us/library/ee336245.aspx). I like to say that SQL Azure is either SQL Server 2008 minus or SQL Express plus depending on how you want to view it. Table Storage is one of three alternate storage mechanisms built into Azure that is collectively called Windows Azure Storage Services. The other two being queues and blobs. Table Storage allows you to store serialized entities in a table, but the term table here is not a relational database table. To provide people with some analogy they can use to get their arms around Table Storage, I like to tell people to think of Table Storage as a fancy spreadsheet. You can store the state of your entities in the columns of the spreadsheet. However, there is no linkage or relationship (therefore joins) between entities – at least none that is automatically managed and maintained by Azure. There are no custom indexes – at least not today.
Interestingly, when Azure was first introduced in 2008, SQL Server was not part of the original picture. Because of developer negative reactions, SQL Azure was added to the next preliminary release in 2009. There is a growing faction that is trying to get the software community to look at SQL alternatives. The "No-SQL" community (see here and here), to some extent, has influenced part of the Azure cloud computing platform through the Table Storage option, but not enough to eliminate it from the Microsoft cloud.
While both SQL Azure and Azure Table Storage provide data persistence via table structure, there are a number of differences between them. The sections below outline some of the key differences and factors you want to weigh before building an application for Azure that requires some form of table persistence.
Scale and Performance
When looking at sheer volume, Table Storage is today far more scalable than SQL Azure. Given a storage account (storage accounts hold blobs, queues and tables) is allowed to be 100TB in size, in theory your table could consume all 100TB. At first glance, a 100TB chunk of data may seem overwhelming. However, Table Storage can be partitioned. Each partition of Table Storage can be moved to a separate server by the Azure controller thereby reducing the load on any single server. As demand lessens, the partitions can be reconsolidated. Reads of Azure Table Storage are load balanced across three replicas to help performance.
Entities in Table Storage are limited to 1MB each with no more than 255 properties (3 of which are required partition key, row key, and timestamp). That seems like an absurd number, and it is, but remember that there are no relationships and joins in Table Storage. Therefore, you might need some wide tables to handle associated data.
Today, SQL Azure databases are limited to 1GB or 10GB. However, sometime this month (June 2010), a 50GB limit is supposed to be available. What happens if your database is larger than 10GB today (or 50GB tomorrow)? Options include repartitioning your database into multiple smaller databases or sharding (Microsoft’s generally recommended approach). Without getting into the database details of both of these database design patterns, both of these approaches are not without issue and complexity, some of which must be resolved at the application level.
Data in the cloud, be it in SQL Azure or Azure Table Storage, can be accessed from in or out of the cloud. To access data in SQL Azure, all the standard tools and APIs apply that work with SQL Server. Meaning, your existing .NET/SQL Server knowledge and experience can be heavily leveraged. ADO.NET and ODBC APIs can be used by application code to access the SQL Azure database. Tools like SQL Server Management Studio and Visual Studio can be pointed to the SQL Azure instance and manipulate the schema and data just as you do today with SQL Server.
Access to Azure Table Storage is accomplished either via REST API or Storage Client Library provided with the Windows Azure SDK. Using the REST API allows client applications to communicate and use data from Table Storage without having detailed and specific knowledge of an Azure API, but it is more complex and difficult to work with. The Storage Client Library (which leverages LINQ to Objects) provides a layer of convenience but requires the application reference the Storage Client Library APIs. REST and the Storage Client Library incur a learning curve that is typically not there when using SQL Azure.
As mentioned, data in SQL Azure and Table Storage can be accessed from applications in and out of the cloud. That means applications can be moved in or out of the cloud and still deal with the data in the cloud in the same way. However, one question that may need to be considered is whether the data must always live in the cloud? Applications generally view data in SQL Azure similar enough to data in a normal SQL Server database as to allow the data to migrate back and forth between the cloud and on-premise databases. In fact, there are even migration tools to help move data between instances of SQL Server and SQL Azure.
However, given the unique nature and access APIs of Table Storage, portability of the data is not as straight forward. Table Storage tightly couples your data to the cloud. Moving the data out of the cloud would require an on-premise data storage alternative, a data migration strategy, and likely require application code changes.
Transactions and Concurrency
SQL Azure supports typical ACID transactions for work within the same database. Transactions across databases are not supported. SQL Azure allows for typical optimistic and pessimistic concurrency strategies.
Table Storage supports transactions for entities in the same table and table partition, but not across tables or partitions. Additionally, only 100 operations or less (what is called a batch in Azure Table Storage) can be part of the transaction. Only one operation can be performed on each entity in the batch, and the batch must be limited to 4MB. Table Storage abides strictly by an optimistic concurrent strategy. If, on commit of the transaction, data has been changed by another process the whole transaction must be rolled back and retried. Due to this single concurrency strategy, a built-in retry option is provided with the Storage Client Library.
Using Table Storage, queries are limited to 1000 entities by default. If more than 1000 entities are found, a continuation token is returned and must be used by the application to retrieve the next set of entities. Queries that take longer than 5 seconds also return a continuation token. Queries that take longer than 30 seconds are cancelled. Data in Table Storage is organized by partition key and indexed by row key. Because there are no custom indexes in tables, queries by partition key and row key are fast, but queries that do not use partition key and row key are slow.
Generally speaking, SQL Azure has no limitations, issues or special programming requirements to work with large queries. Good database and index design can help improve performance of queries; especially large ones.
Columns in Table Storage are limited to the types in the table below.
|Int32 or int|
|Int64 or long|
Perhaps the most unique aspect to designing and architecting applications for the cloud is that it requires developers to think like businessmen. Each technical choice often has direct costs associated with it when developing for the cloud. The choice in data storage can have a huge impact on the cost of running an application.
Azure Table Storage costs 15? per GB of storage per month. Additionally, you pay 1 cent per 10,000 transactions with Table Storage. SQL Azure costs are $9.99 for 1 GB of storage per month ($99.99 for 10GB).
See Microsoft’s sight for more details and specifics on costs here.
Chris Hay and Brian Prince in their forth coming book Azure in Action (published by Manning – see here) provide a synopsis of the SQL Azure vs. Table Storage in a few paragraphs. "If size is the issue, that would be the first flag that you might want to consider Azure Tables. As long as the support Tables has for transactions and queries meets your needs. The size limit surely will, at 100TB." Further they suggest sophisticated transactions, or a complex authorization model might require the services of SQL Azure. And as shown by the cost table above, "The final consideration is cost. I can store a lot of data in Azure Tables for a lot less money than I can in SQL Azure. SQL Azure is giving me a lot more features to use (joins, relationships, etc.), but it does cost more."
We are given every indication by Microsoft that SQL Azure will have far more capability in the future – akin to the SQL Server you might find in your data centers today. So some of the comparison above may be moot or less important over time. Additional functionality is also being proposed to Table Storage as well. For example, support of secondary (non-key) indexes is already been suggested for a future release (see here). However, key architectural differences between SQL Azure and Table Storage will remain and leave application designers having to pick the best option for their systems. Welcome to cloud computing. There is a lot of ROI to be had by running in the cloud, but only with proper application design and architecture.
If Intertech can help you negotiate the issues of cloud computing, please contact Ryan McCabe at firstname.lastname@example.org.