Tuesday, May 05, 2015

The rise of micro storage services

Current and emerging storage solutions are composed of sophisticated building blocks: dedicated fabric, RAID controllers, layered caches, object storage, etc. There is a feeling that storage is going against the current evolution of the overall industry, where complex services are composed of small, independent processes and services, each organized around individual capabilities.

What surprises me is that most storage innovation focuses on very high-level solutions that try to pack as many features as possible into a single package. Too little effort is being made to build storage systems from small, independent, low-maintenance and, indeed, low-cost components. In short, what is lacking are nimble, independent modules that can be rearranged to deliver an optimal solution based on the customer's needs, without requiring a new storage architecture to be rolled out every time - a "jack of all trades" without (or with limited) "master of none" drawbacks; or, put another way, modules that extend or mimic what is happening in the container and microservices space.

Ethernet Connected Drives

Despite this, all could change rapidly thanks to an enabler (or precursor, depending on how you look at it) of this alternative approach that is currently emerging and, surprisingly, coming from the hard drive vendors: Ethernet Connected Drives [slides][Q&A]. This type of storage technology will enable the next generation of hyperscale cloud storage solutions: massive scale-out potential with greater simplicity and maintainability, not to mention lower TCO.

Ethernet Connected Drives are a step in the right direction, as they allow a reduction in capital and operating costs by reducing:
  • the software stack (file system, volume manager, RAID system);
  • the corresponding server infrastructure;
  • connectivity costs and complexity;
  • granularity, which enables costs that vary more directly with the application (e.g. cold storage, archiving, etc.).
Currently, two vendors offer this type of solution: Seagate with Kinetic and HGST with the Open Ethernet Drive. In fact, we are already seeing some rather interesting applications of the technology. Seagate released a port of the Sheepdog project onto its Kinetic product [Kinetic-sheepdog], thereby enabling a distributed object storage system for volume and container services that doesn't require dedicated servers. There is also a proof of concept, presented at HEPiX, of the HGST drive running Ceph or dCache. While these solutions don't fit every scenario, both demonstrate the versatility of the technology and its scalability potential (not to mention the cost savings).

What these technologies enable is basically the transformation of the appliances that house masses of HDDs into switches, eliminating the need for a block or file head: there is now straight IP connectivity to the drive, making these drives ideal for object-based backends.
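To make the direct-to-drive model concrete, here is a toy sketch in Python of the key-value interaction style these drives expose over plain TCP. This is purely illustrative: real Kinetic drives speak a protobuf-based wire protocol, whereas the newline-delimited commands, the handler, and the helper names below are all invented for this example.

```python
import socket
import socketserver
import threading

# Toy key-value "drive" reachable over TCP. Illustrative only: NOT the
# real Kinetic protocol. Commands are newline-delimited text:
#   "PUT <key> <value>"  -> "OK"
#   "GET <key>"          -> "<value>" or "NOT_FOUND"

class DriveHandler(socketserver.StreamRequestHandler):
    def handle(self):
        for line in self.rfile:  # one command per line until client disconnects
            parts = line.decode().strip().split(" ", 2)
            if parts[0] == "PUT" and len(parts) == 3:
                self.server.store[parts[1]] = parts[2]
                self.wfile.write(b"OK\n")
            elif parts[0] == "GET" and len(parts) == 2:
                value = self.server.store.get(parts[1])
                self.wfile.write((value or "NOT_FOUND").encode() + b"\n")
            else:
                self.wfile.write(b"ERR\n")

def start_drive(port=0):
    """Start one in-process 'drive' on an ephemeral port."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", port), DriveHandler)
    server.store = {}  # the drive's flat key-value namespace
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def drive_request(addr, command):
    """Send one command straight to the drive's address; no server in between."""
    with socket.create_connection(addr) as sock:
        sock.sendall(command.encode() + b"\n")
        return sock.makefile().readline().strip()
```

The point of the sketch is the topology, not the protocol: the client addresses the drive directly by IP, with no file system, volume manager or intermediate storage server in the path.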

Emergence of fabric-connected hardware storage

What we should see over the next couple of years is the emergence of a new form of storage appliance acting as a fabric facilitator for a large number of compute- and network-enabled storage devices. To a certain extent, it would be similar to HP's Moonshot, except with a far greater density.

Rather than just focusing on Ethernet, it is easy to imagine PCIe, Intel photonics, InfiniBand or more exotic fabrics being used. Obviously, Ethernet remains the preferred solution due to its ubiquity in the datacenter. However, we should not underestimate the value of a rack-scale approach, which would deliver greater benefit if designed correctly.
While HGST's Open Ethernet solution is a good step towards the nimble storage device, the drive enclosure form factor is still quite big, and I wouldn't be surprised to see a couple of start-ups come out of stealth mode in the next few months with fabric-connected (PCIe, most likely) flash. This would be the equivalent of the Ethernet connected drive, interconnected using a switch + backplane fabric as shown in the crudely designed diagram below.

Is it all about hardware?

No, indeed quite the opposite. That said, new hardware has a greater chance of penetrating the storage ecosystem than the server market. This is probably where ARM has a better chance of establishing a beachhead within the hyperscale datacenter, as the microserver path seems to have failed.
What this implies is that it is often easier to deliver and sell a new hardware or appliance solution in the storage ecosystem than a pure software one. Software solutions tend to take a lot longer to get accepted, but once they break through, they quickly take over and replace the hardware solutions. Look at object storage solutions such as Ceph, or other hyper-converged solutions: they are a major threat to the likes of NetApp and EMC.
To get back to software as a solution, I would predict that history repeats itself, with varying degrees of success or failure. Indeed, as with the microserver story, while hardware micro storage solutions are rising, we are also seeing the emergence of software solutions that will deliver more nimble storage features than before.

In conclusion, I feel we are going to see the emergence of many options for massive scale-out, all variants of the same concept: take the complex storage system and break it down to its bare essential components; expose each single element as its own storage service; and then build the overall offer dynamically from the ground up. Rather than leveraging complex pooled storage services, we would have dynamically deployed storage applications for specific demands, composed of a suite of small services, each running in its own process and communicating via lightweight mechanisms. These services are built around business capabilities and are independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management, and the services may be written in different programming languages and use different data storage technologies. This is just the opposite of current offers, where monolithic storage applications (or appliances) are scaled by replicating across servers.
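As a rough sketch of what building such an offer from the ground up could look like, here is a minimal, hypothetical Python illustration: each "drive" is an independent key-value service (stubbed in-process here, but it could just as well sit behind its own IP address), and an object service is composed on the client side by hashing keys across the drives, so capacity grows by simply adding more of them. All class names and interfaces are invented for the example.

```python
import hashlib
from typing import Dict, List

class Drive:
    """Stand-in for one small, independent, network-addressable
    key-value storage service (e.g. an Ethernet connected drive)."""
    def __init__(self, name: str):
        self.name = name
        self.store: Dict[str, bytes] = {}

class ObjectService:
    """Client-side composition: no central controller, just a
    deterministic hash that maps each key to one of the drives."""
    def __init__(self, drives: List[Drive]):
        self.drives = drives

    def _pick(self, key: str) -> Drive:
        # Hash the key and use the first 4 bytes to select a drive.
        digest = hashlib.sha256(key.encode()).digest()
        return self.drives[int.from_bytes(digest[:4], "big") % len(self.drives)]

    def put(self, key: str, value: bytes) -> None:
        self._pick(key).store[key] = value

    def get(self, key: str) -> bytes:
        return self._pick(key).store[key]
```

Scaling the service then means adding drives rather than replicating a monolithic appliance; a production design would of course add replication and consistent hashing so that adding a drive does not remap every key.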

This type of architecture would enable a true, on-demand dynamic tiered storage solution. To reuse a current buzzword, this would be a “lambda storage architecture”.
But that is better left for another day's post, which would look into such an architecture and the lifecycle management entities associated with it.