Author: Cameron Harding, Storage Lead Architect | Title: The Rise of Distributed Storage | Published on 20th May 2019
The Rise of Distributed Storage
It’s not a secret that the past few years have witnessed a fundamental shift in the way storage is offered. Previously, the choice between mid-tier storage systems was driven by the protocols and topologies required by a range of data centre workloads. These days, with the widespread adoption of hybrid cloud, virtualisation has become prevalent, making the underlying storage less relevant and placing a higher priority on workload mobility.
This is a welcome change for most storage professionals as it removes the architectural focus from topologies that deliver an identical result, moving the choice to solutions with unique benefits. Currently, the popular alternative to traditional shared storage is hyper-converged infrastructure based on distributed storage.
For the first blog in this Storage series, I’d like to explore the enabling technologies that have made distributed storage suitable for a wide range of applications. We will set aside the pros and cons of traditional storage versus distributed storage for the next issue.
What is distributed storage?
Let’s start by taking a look at what distributed storage is. In the simplest terms, it is a method of storing data across multiple hosts and managing the presentation of that data as a single system. For this to be useful for data centre applications, we expect it to provide similar advantages to a shared storage system including availability, reliability, scalability, efficiency and performance.
Storage distribution is typically done by breaking data into chunks and then writing multiple copies of the chunks to different computers on a network. The location of those chunks is then stored in a global index so that the system can recall data in its correct order. Some systems use parity as a fault tolerance method rather than multiple copies, but we’ll leave a detailed explanation and comparison of data distribution methods for another time.
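To make the chunk-and-replicate idea concrete, here is a minimal in-memory sketch in Python. The node dictionaries, chunk size and round-robin placement are all illustrative assumptions for this blog, not how any particular product works:

```python
import hashlib

CHUNK_SIZE = 4  # tiny for illustration; real systems use anywhere from 64 KB to 64 MB


def distribute(data: bytes, nodes: dict, replicas: int = 2):
    """Split data into chunks, write each chunk to `replicas` distinct nodes,
    and record the chunk locations in a global index."""
    node_ids = list(nodes)
    index = []  # ordered list of (chunk_id, [nodes holding a copy])
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        chunk_id = hashlib.sha256(chunk).hexdigest()[:8]
        # Place copies on distinct nodes, chosen round-robin here.
        targets = [node_ids[(offset // CHUNK_SIZE + r) % len(node_ids)]
                   for r in range(replicas)]
        for node in targets:
            nodes[node][chunk_id] = chunk
        index.append((chunk_id, targets))
    return index


def recall(index, nodes):
    """Reassemble the data in order, reading each chunk from any live replica."""
    out = b""
    for chunk_id, targets in index:
        for node in targets:
            if chunk_id in nodes[node]:
                out += nodes[node][chunk_id]
                break
    return out


cluster = {"node-a": {}, "node-b": {}, "node-c": {}}
index = distribute(b"hello distributed storage", cluster)
```

Because every chunk has two copies on different nodes, `recall` still reassembles the data after one node is emptied out, which is exactly the availability property we expect from shared storage.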
The concept of distributed storage sounds quite simple so why haven’t we always used it?
To answer that, we need to look back at how it has developed over time. Distributed storage is nothing new. Carnegie Mellon University (CMU) developed distributed file systems such as AFS and Coda in the 1980s, around the same time that IBM and Microsoft developed SMB and Sun introduced NFS. It didn’t see widespread use, however, until High Performance Computing (HPC) moved away from traditional supercomputers around the turn of the century in favour of clusters and grids containing massive quantities of interconnected servers. Lustre (developed at CMU, building on the Coda work) has seen common use as a highly available file system in HPC solutions requiring large-scale, high-throughput storage. Probably the best-known massive-scale distributed storage system is the Google File System (GFS), which uses commodity hardware to provide high-throughput primary storage for the Google search engine. Commercial scale-out appliances have also been available for some time from Panasas, Isilon and a number of other vendors.
How did distributed storage go from having a bad reputation with virtual infrastructure to being a praised storage solution?
Originally, distributed storage was less than appealing for use with virtual infrastructure: it cost more than traditional shared storage and sacrificed latency in favour of high throughput. So what has changed? There are a few requirements that the majority of distributed storage products have in common. Let’s look at each of them and at what has changed to make systems suited to virtual infrastructure affordable.
A client (typically a hypervisor) will only get an acknowledgement of a successful write once data has been written to persistent storage on at least two nodes. This requires fast communication between clustered servers. In the majority of HPC clusters, low-latency high-throughput connectivity was provided using dedicated and expensive InfiniBand switches. With the proliferation of low cost 10Gb/25Gb top-of-rack Ethernet switches, providing this connectivity is no longer cost prohibitive. Cisco, Broadcom, Mellanox, Dell EMC and HPE all produce ultra-low latency switches capable of supporting converged storage and network traffic.
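As a toy illustration of that acknowledgement rule, the sketch below only reports a write as successful once at least two simulated nodes have stored it durably. The `Node` and `ReplicatedWriter` classes are hypothetical stand-ins for this post, not any vendor’s API:

```python
class Node:
    """Stand-in for a storage server; `up=False` simulates a failed node."""
    def __init__(self, up=True):
        self.up = up
        self.store = {}

    def persist(self, key, value):
        if not self.up:
            raise IOError("node unreachable")
        self.store[key] = value  # pretend this hits persistent media


class ReplicatedWriter:
    """Acknowledge a write to the client only after `min_acks` nodes stored it."""
    def __init__(self, nodes, min_acks=2):
        self.nodes = nodes
        self.min_acks = min_acks

    def write(self, key, value):
        acks = sum(1 for node in self.nodes if self._try(node, key, value))
        if acks < self.min_acks:
            raise IOError(f"only {acks} ack(s); write is not durable")
        return acks  # client sees success only once min_acks nodes confirmed

    def _try(self, node, key, value):
        try:
            node.persist(key, value)
            return True
        except IOError:
            return False  # node down; carry on with the remaining replicas
```

The round trips hidden inside `persist` are why the cluster interconnect matters: every client write waits on at least two network hops completing before it can be acknowledged.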
In an environment where data is written to individual disks or small groups of disks, as opposed to parallel writes to a large array of disks, the storage system needs to be able to stage writes and optimise them before they are flushed out to long-term storage. The staging area also needs to be persistent and highly available to ensure that power outages or network failures won’t result in substantial data loss or corruption. This requires NVRAM of some kind across multiple nodes, or all nodes, in the cluster. In most appliance-based platforms, NVRAM was provided using high-cost battery-backed DRAM devices. With the introduction of SATA, SAS and NVMe connected flash storage, the cost of providing this high-performance persistent layer has dropped significantly, making it possible to deliver a solution using commodity hardware rather than relying on proprietary devices.
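The staging idea can be sketched as a small write-back journal: writes land in a fast persistent area (NVRAM or flash in a real system), get coalesced, and only then are flushed to the long-term tier. Everything here, including the flush threshold, is an illustrative assumption:

```python
class StagedStore:
    """Toy write-staging layer: journal first, coalesce, then flush."""

    FLUSH_THRESHOLD = 4  # arbitrary for illustration

    def __init__(self):
        self.journal = []    # persistent staging area (NVRAM/flash in practice)
        self.long_term = {}  # slower bulk storage (the HDD tier in practice)

    def write(self, key, value):
        # The write is considered durable as soon as it is journaled.
        self.journal.append((key, value))
        if len(self.journal) >= self.FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        # Coalesce: only the latest value per key reaches long-term storage,
        # turning many small random writes into one optimised batch.
        self.long_term.update(dict(self.journal))
        self.journal.clear()

    def read(self, key):
        # Recent writes may still live only in the staging area.
        for k, v in reversed(self.journal):
            if k == key:
                return v
        return self.long_term.get(key)
```

Coalescing is where the optimisation pays off: overwrites of the same key never touch the slow tier, which is the behaviour that makes a persistent staging layer worth its cost.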
Distributed storage that requires dedicated hardware can be seen as just as complex as, if not more complex than, traditional shared storage. Dedicated devices were required for storage systems in the past because of their high memory and compute requirements. With today’s high core-count CPUs and growing RAM sizes, storage services now need only a fraction of the resources a server can provide. Hyper-convergence takes advantage of this abundance of compute resources by running virtual machines and storage services on the same nodes, delivering a full virtual infrastructure stack on a cluster of commodity servers. This not only reduces the cost of the solution but also simplifies management and maintenance by presenting the environment as a single clustered system.
With these three key advances in Ethernet switching, flash storage and hyper-convergence, we are now seeing widespread adoption of server-based storage products, which consume close to one third of the enterprise storage market.
In the next Outcomex Tech Blog on storage, to provide an insight into which storage type fits which requirement, we’ll have a look at the pros and cons of distributed storage, traditional shared storage, and cloud storage.
Cameron Harding is Outcomex’s Storage Lead Architect. He has worked in the Australian IT industry for 25 years. With in-depth knowledge and technical skills, Cameron holds certifications from the industry’s storage leaders: Cisco, NetApp, Pure Storage, VMware, and Dell EMC.