Spending on cloud continues to climb. IDC predicted that spending on public cloud services and infrastructure would reach $160 billion worldwide in 2018, an increase of more than 23 percent over the previous year, and there is no indication that this growth will slow in 2019. Public cloud growth is driven by two things: the building of new applications in the cloud, and the migration of existing applications.

But how will these two trends affect data center strategies? Will this lead to big changes in how we run our operations, or will this be more of the same? And most importantly, can you do all this without massive application redesigns and rewrites?

The move to cloud

The headline number predicted by IDC points towards a big shift in infrastructure spending, and with it a big shift in how applications are deployed. New applications built in software containers or running as serverless functions can take advantage of the cloud's on-demand scalability. Rather than needing a complete operating system under each application component, containers can be packaged with only the elements they need, consuming far fewer resources to do the same amount of work. If more capacity is needed, additional container instances can simply be started from the same image.
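
To make that concrete, the sketch below uses the Kubernetes Python client to add container replicas to an existing deployment. The deployment name "web", the "default" namespace and the use of a local kubeconfig are illustrative assumptions, not a prescription.

# A minimal sketch, assuming the `kubernetes` Python client and an existing
# Deployment named "web": scale out by starting more container instances
# rather than provisioning whole new operating systems.
from kubernetes import client, config

config.load_kube_config()  # assumes local kubeconfig credentials
apps = client.AppsV1Api()

# Read the current replica count, then ask for two more identical containers.
scale = apps.read_namespaced_deployment_scale(name="web", namespace="default")
scale.spec.replicas += 2
apps.replace_namespaced_deployment_scale(name="web", namespace="default", body=scale)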

Similarly, serverless functions are an interesting new approach for meeting specific application needs only when they arise. If and when a function is triggered, it consumes resources, does its work and hands the result back to the rest of the application. Both approaches focus on the results that are needed and aim to reduce the maintenance overhead for developers and operations teams.
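
As a rough illustration, a serverless function can be as small as the Python handler sketched below; the AWS Lambda-style handler signature and the event shape are assumptions, and other providers follow a similar pattern.

# A minimal sketch of a serverless function: it runs only when an event
# triggers it, consumes resources for that moment, and returns its result
# to the rest of the application. The event fields are illustrative.
import json

def handler(event, context):
    # The one piece of work this function exists for: total up an order.
    total = sum(item["price"] * item["qty"] for item in event.get("items", []))
    return {"statusCode": 200, "body": json.dumps({"total": total})}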

However, the foundation of this work will be data. As these new applications create more and more data, that information will have to go somewhere. While compute can be distributed and moved into hybrid or multi-cloud environments more easily, data storage and management has not been such an easy problem to solve.

Container management systems like Kubernetes are growing in popularity as a way to make hybrid and multi-cloud management easier. After all, if you can run containers and manage them with Kubernetes across multiple places, then you have achieved a measure of independence from any particular cloud provider, right? However, while this is true on the application side, it does not provide the same level of support for data.

Organizations want to run hybrid and multi-cloud environments. The challenge of running across multiple sites - whether a combination of on-premises infrastructure and public cloud, or several public cloud providers together - is maintaining a consistent approach to handling, managing and storing data over time. This means looking at your databases and how they function when they are in the cloud. A database that merely runs in the cloud is not the same as a cloud database.

Distributed data management and data center design

For companies already running in the cloud - or looking to move an application to a multi-cloud approach - implementing a hybrid cloud database involves some distributed computing theory, but not a lot; luckily, most of it is a solved problem. When you implement a distributed computing environment, you either run with one location or “node” in charge, or in a fully distributed, “masterless” setup.

For applications with a lead node, all operations are directed and managed from one location and every other node follows it. In environments like traditional data centers, having one node that is “in charge” is fine, as everything is local and on the same network. However, when locations are distributed across multiple clouds, this approach does not work as well - for example, it is hard to scale beyond a certain level of requests when a single node has to coordinate the whole application. When applications are geographically distributed, coordination latency makes performance even harder: no matter where your customers are, their requests still have to travel to the same server, and that can make the application feel almost unusable.
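
Some back-of-the-envelope figures show why; the latencies and round-trip counts below are assumptions chosen purely for illustration, not measurements.

# Illustrative figures only: the cost of funnelling every request through
# one distant primary node versus answering from a node in the user's region.
rtt_remote_primary_ms = 150   # assumed round trip to a primary on another continent
rtt_local_node_ms = 5         # assumed round trip within the user's own region
round_trips_per_action = 4    # assumed chatty interaction: read, validate, write, confirm

print(round_trips_per_action * rtt_remote_primary_ms)  # 600 ms through the single primary
print(round_trips_per_action * rtt_local_node_ms)      # 20 ms against a nearby node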

The alternative approach - running masterless and fully distributed - can solve those issues, but it is not as simple as it sounds either. Distributed computing involves managing operations across multiple locations; if you are running in a hybrid cloud, you need the same data layer or database platform implemented in every location. For a truly cloud-native database service, you have to be able to run across all those options without any variation in service level and without changes to the applications themselves.
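
As one hedged example of what that can look like in practice, the sketch below uses the Python driver for Apache Cassandra, a masterless database, to write through whichever data center is local to the application. The contact points, the data center name "on_prem" and the keyspace "orders" are illustrative assumptions.

# A minimal sketch, assuming the `cassandra-driver` package and a masterless
# cluster with nodes both on-premises and in a public cloud. Any node in any
# location can accept the write; replication carries it to the other sites.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy

profile = ExecutionProfile(
    load_balancing_policy=DCAwareRoundRobinPolicy(local_dc="on_prem"),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,  # acknowledge from the nearest data center
)
cluster = Cluster(
    ["10.0.0.10", "203.0.113.10"],  # one on-prem node, one cloud node (illustrative addresses)
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
)
session = cluster.connect("orders")

session.execute(
    "INSERT INTO orders_by_id (id, status) VALUES (%s, %s)",
    ("a1b2", "shipped"),
)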

Equally, this approach has to support full portability of data. You may be happy with your approach now, but you should not lock yourself into a specific public cloud provider’s data management platform forever. If the situation around an application or batch of services changes, you should be free to move your data from one cloud or data center location to another without penalty. More importantly, you should not have to redevelop that application simply to keep services running.

From a database perspective, this involves supporting a distributed computing environment without getting locked into specific cloud providers’ offerings. By keeping the cloud database independent of the cloud provider and the infrastructure, you should be able to avoid some of these potential issues over time. Rather than being tied to a single cloud strategy, you can keep the benefits of your existing internal data center infrastructure while using cloud where it makes sense. The benefits to architecture and application development will pay dividends in the long term.

Maintaining control over data strategy

For data center professionals, dealing with these issues and supporting application development teams means thinking through the tangled web of storage, database and application infrastructure components that is normally already in place. This mix of old and new technologies is difficult to unpick and replace from scratch, even with a full-scale migration to cloud. Instead, companies are already looking at how APIs and applications can be integrated successfully to deliver the business results they need.

Internal data centers are not going away any time soon. While companies are making the most of public cloud, the number that can shift wholly to it is limited. For large enterprises, the digital spaghetti of traditional applications like ERP, connected together through integrations and APIs, makes that impossible. Instead, running across multiple locations in hybrid and multi-cloud modes will be the most common way to deliver applications for the foreseeable future. That means more thought will need to go into distributed computing and database design in order to keep up.