All applications have inherent limits in what they can support. As organizations grow, so do their expectations of the apps they use. Development and operations teams need to plan for large scale from day one, not react to it when it finally arrives.
When you hit the limitations of your system, you’ll need to determine how best to scale up. This can be done by changing the cluster hardware configuration, patching or replacing the database software, re-designing schemas, migrating data to other services, or a host of other tasks that often exacerbate issues in the short term or prolong downtime. The process can be costly and error-prone as organizations learn what pitfalls their database management system (DBMS) vendor hasn’t told them about or hasn’t yet discovered.
The limitations of the traditional, relational approach to the DBMS are becoming more exposed as enterprise cloud services become more ingrained in daily business. NoSQL databases offer a solution to large scale but can be complex to implement. So what do you need to know?
Traditional relational databases are designed for consistency. A raft of NoSQL databases target different use cases, and they often emphasize availability of the data instead. The practical difference lies in how each approach scales: either by using expensive hardware that is powerful enough to provide database consistency at large scale (‘vertical scaling’), or by using newer database software that supports availability across groups of commodity servers (‘horizontal scaling’).
Popular relational databases like MySQL scale well vertically but are complex to scale horizontally. As demand grows for enterprise cloud applications, there are good reasons for using NoSQL databases that are designed to scale horizontally:
- Size of data — if your data set doesn't fit on one machine, you need two
- Concurrency — if a server can handle 10,000 req/sec and you get 20,000, you need two machines
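The two reasons above reduce to simple arithmetic. The following is a minimal sketch, assuming a hypothetical helper `machines_needed` and made-up capacity figures; it ignores replication, headroom, and uneven load, all of which a real capacity plan would add.

```python
import math


def machines_needed(data_gb, disk_gb_per_machine, peak_rps, rps_per_machine):
    """Return the smallest machine count that satisfies both limits.

    All parameters are illustrative assumptions, not measurements.
    """
    for_storage = math.ceil(data_gb / disk_gb_per_machine)  # size of data
    for_traffic = math.ceil(peak_rps / rps_per_machine)     # concurrency
    # Whichever limit is tighter decides the count; never fewer than one.
    return max(for_storage, for_traffic, 1)


# The data fits on one machine, but 20,000 req/sec against a server that
# handles 10,000 req/sec still forces a second machine.
print(machines_needed(data_gb=500, disk_gb_per_machine=1000,
                      peak_rps=20_000, rps_per_machine=10_000))  # prints 2
```

Either limit alone is enough to push a database past a single machine.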
When a database needs more than one machine to handle demand, you run into the CAP theorem. Posed by Eric Brewer, the CAP theorem states that a distributed database system can guarantee at most two of three properties at once: consistency, high availability, and partition tolerance (more on Brewer’s CAP theorem can be found at http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed).
While horizontally scaled databases can scale effectively with cloud applications, a multi-machine system introduces the problem of partition events: nodes will fail and networks will get cut off. Because partitions cannot be ruled out, the practical choice is between a system that stays available during a partition (AP) and one that stays consistent during a partition (CP). With CP, all nodes must agree before data is written. This means that availability is hampered during a network partition, because writes are refused until the nodes can reach each other again. With AP, consistency is sacrificed to allow the storage layer to remain available and usable to the application. The drawback is that different clients may see different data during periods when nodes haven’t all received the latest data.
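The trade-off can be illustrated with a toy model. This is a sketch only: `Replica`, `write_cp`, and `write_ap` are hypothetical names, and the CP rule follows the simplified "all nodes must agree" description used here rather than the quorum protocols that real databases typically implement.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.reachable = True  # False simulates being cut off by a partition


class WriteRejected(Exception):
    """Raised when the CP store refuses a write it cannot replicate everywhere."""


def write_cp(replicas, value):
    # CP: refuse the write unless every replica can acknowledge it.
    if not all(r.reachable for r in replicas):
        raise WriteRejected("partition detected; write refused")
    for r in replicas:
        r.value = value


def write_ap(replicas, value):
    # AP: accept the write on whichever replicas are reachable right now.
    accepted = [r for r in replicas if r.reachable]
    for r in accepted:
        r.value = value
    return len(accepted)


def demo():
    a, b = Replica("a"), Replica("b")
    write_cp([a, b], "v1")   # healthy network: both replicas store v1
    b.reachable = False      # a network partition cuts off replica b
    try:
        write_cp([a, b], "v2")
        cp_available = True
    except WriteRejected:
        cp_available = False  # CP stays consistent but stops taking writes
    write_ap([a, b], "v2")    # AP keeps taking writes; replicas now disagree
    return cp_available, a.value, b.value


print(demo())  # prints (False, 'v2', 'v1')
```

After the partition, the CP write is rejected, while the AP write succeeds and leaves replica `b` holding stale data until it can catch up.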
Plan for success
Worrying about scaling issues when they hit is no longer appropriate, especially in a world where demand can no longer be controlled (e.g. the ‘app store effect’). Users hit servers immediately, with no limits on when and how people are using apps. In fact, the only way to assert control in the app store era is to lock down an application. You either make users wait until servers can handle more load, or pull the application entirely. Such draconian measures hardly qualify as planning for success.
Large, heavily used systems bring with them a high probability that a portion of the system will fail. A database engineered around this assumption, one that prioritizes availability and eventual consistency, is better suited to keeping your application online.
ATMs are a great example. Inconsistent banking data is why it’s still possible to overdraw an account without realizing it. It is unrealistic to present a consistent view of your account balance throughout the entire banking system if every node in the network needs to halt and record this figure before continuing operations. It’s better to make the system highly available.
Enterprise collaboration software poses a new problem in the age of mobile data. When mobile devices lack network access and go offline, there are essentially two disconnected systems on which users are updating data. Allowing them to have mutable data on their phones or tablets while the network is offline would be an important feature. Syncing these updates for thousands of users when these devices come back online is a huge problem.
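One common way to approach the sync problem is optimistic revision checking: each device remembers which revision it last saw, and the server flags an edit as a conflict if the document changed in the meantime. The sketch below assumes hypothetical names (`Doc`, `sync`) and an in-memory dictionary standing in for the server; it is not the API of Cloudant, CouchDB, or any other product.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    rev: int   # revision counter, bumped on every accepted write
    body: str


def sync(server, doc_id, base_rev, new_body):
    """Apply an offline edit made against base_rev.

    Returns ("ok", doc) if the edit was applied, or ("conflict", doc)
    with the server's current copy if someone else wrote first.
    """
    current = server.get(doc_id)
    if current is not None and current.rev != base_rev:
        return ("conflict", current)
    server[doc_id] = Doc(rev=base_rev + 1, body=new_body)
    return ("ok", server[doc_id])


# Two devices both edited revision 1 while offline.
server = {"note": Doc(rev=1, body="draft")}
print(sync(server, "note", 1, "phone edit")[0])   # prints ok
print(sync(server, "note", 1, "tablet edit")[0])  # prints conflict
```

Detecting the conflict is the easy part; deciding how to merge or choose between the two edits is where the application has to make its own decisions.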
Deal with NoSQL complexity now
The view that we take with Cloudant’s database-as-a-service, along with CouchDB and other NoSQL databases, is that it’s better to expose developers to the complexities of large scale early in the design process. Address scaling issues head-on, so that you don’t have to solve them at 3 a.m. or right before that important demo.
Building these systems is challenging, which is why there are plenty of vendors who offer consulting and support for managing distributed NoSQL databases, and others who will host and manage these systems for you. Whether you decide to have help or build it yourself, when you’re launching a new cloud application for your business, keep application availability at large scale in mind. The enterprise version of the app store effect is coming.