Wednesday, May 04, 2011

Architecture Overview of an Open Source Low TCO cloud storage system

Here I present a possible design for a low-TCO, open source cloud storage system, for those of you building your own cloud (or hosting service).

I am not claiming that it will suit everyone's needs, but I hope it will at least give you some valuable pointers and alternatives.
You may also want to adapt it to your specific needs, since you might not require every single feature of the system.


Summary:

This setup lets you build your own redundant storage network with common PC hardware. An easier but far more expensive way to achieve this would be to get a SAN and some Fibre Channel attached hosts. The setup provides features similar to those of Amazon EBS, as well as a highly reliable (HR) cluster file system for your cloud storage.

Features: 
  • High availability (DRBD, cluster file system)
  • High reliability (DRBD)
  • Flexible, dynamic storage resource management
    • File system export or block device, "Amazon EBS style"
  • Dynamic failover configuration (Pacemaker, Corosync)
    • Active/Passive; N+1; N to N; split site
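
To make the difference between these failover layouts concrete, here is a toy Python model. It is only a sketch of the idea, not how Pacemaker actually decides placement; the function name `fail_over`, the node names and the "least loaded survivor" rule are all my own assumptions for illustration.

```python
from typing import Dict, List


def fail_over(placement: Dict[str, List[str]], failed: str,
              standby: List[str]) -> Dict[str, List[str]]:
    """Return a new placement after node `failed` goes down.

    placement maps node name -> resources it currently runs.
    standby lists idle spare nodes (leave it empty for N to N).
    Toy model only: real cluster managers use scores and constraints.
    """
    survivors = {node: list(res) for node, res in placement.items()
                 if node != failed}
    orphans = list(placement.get(failed, []))
    if orphans and not survivors and not standby:
        raise RuntimeError("no node left to take over the resources")
    if standby:
        # Active/Passive and N+1: the spare node takes over everything.
        spare = standby[0]
        survivors[spare] = survivors.get(spare, []) + orphans
        return survivors
    # N to N: every node is active, so spread the orphaned resources
    # over the survivors, always picking the least loaded one.
    for res in orphans:
        target = min(sorted(survivors), key=lambda n: len(survivors[n]))
        survivors[target].append(res)
    return survivors
```

With a spare node (N+1), a call such as `fail_over({"a": ["r1"], "b": ["r2"], "spare": []}, "a", ["spare"])` moves `r1` to `spare`. Without one (N to N), the resources of the failed node are shared out among the remaining active nodes, which is why every node then needs some headroom.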

Overview:
  • A set of paired storage back ends: hosts that use DRBD to keep the data redundant between the two hosts of each pair.
  • On top of DRBD we have LVM (or CLVM). With LVM we can resize logical volumes on the fly and take snapshots (including hosting snapshots plus diffs); a logical volume can even be resized across multiple underlying DRBD devices.
    • Note: LVM can be used both on top of DRBD and underneath it (as DRBD's backing storage).
  • The LVM block devices are exported to the cluster nodes using GNBD. Another node does a GNBD import and the block device appears there as a local block device, ready to mount into the file hierarchy.
  • OCFS2, as the cluster file system, allows all cluster nodes to access this file system concurrently.
    • Another possibility is to export one GNBD device per virtual machine (but you still need a distributed/network file system for configuration, etc.).
  • Pacemaker and Corosync manage the resources for HA/HR.
  • For the management, control and monitoring part, a custom-made solution might be needed.
    • GRAM could be used to expose the resource management
    • Any monitoring framework should be able to do the trick
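
As an illustration of the bottom layer, here is a small Python sketch that renders a DRBD resource definition for one pair of storage hosts. It assumes DRBD 8.3-style configuration syntax; the helper names (`DrbdHost`, `render_drbd_resource`) and every host name, disk and address in the usage note are made up for the example, so adapt them to your own setup.

```python
from dataclasses import dataclass


@dataclass
class DrbdHost:
    name: str     # must match the output of `uname -n` on that host
    disk: str     # backing block device (a partition or an LVM volume)
    address: str  # replication endpoint, as "ip:port"


def render_drbd_resource(resource: str, device: str,
                         first: DrbdHost, second: DrbdHost) -> str:
    """Render one DRBD resource stanza for a pair of hosts."""
    lines = [
        f"resource {resource} {{",
        "  protocol C;  # synchronous replication",
    ]
    for host in (first, second):
        lines += [
            f"  on {host.name} {{",
            f"    device    {device};",
            f"    disk      {host.disk};",
            f"    address   {host.address};",
            "    meta-disk internal;",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)
```

For example, `render_drbd_resource("r0", "/dev/drbd0", DrbdHost("store1a", "/dev/sdb1", "10.0.0.1:7789"), DrbdHost("store1b", "/dev/sdb1", "10.0.0.2:7789"))` produces a stanza you could review and drop into the DRBD configuration on both hosts of the pair. The resulting /dev/drbd0 is then what LVM, GNBD and OCFS2 are layered on.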

"Simple" Schema: 








Pro:
  • Most of the individual components are proven solutions used in large-scale production environments
  • Open source, readily available tools
  • OCFS2 provides back-end storage for images, while GNBD can provide on-demand block storage for the cloud instances
  • Easy accounting: the same as for any other file system (though custom-built tools might be needed, depending on the requirements)
  • COTS components


Con:
  • DRBD provides HA/HR through replication (think RAID 1), which means you get HA/HR and speed at the expense of half of your storage (more if you also use RAID for the actual disk storage)
  • Complexity, and a risk of cascading failure due to a domino effect (similar to what happened recently in the Amazon cloud with EBS)
  • Performance will depend heavily on the physical resources available as well as on the topology and usage:
    • It will require a lot of tweaking/customization to extract the best performance (e.g. dual-primary "dual head" DRBD, load balancing, etc.), and every setup will be different
    • Dedicated monitoring tools will have to be created in order to manage and automate the performance tuning
  • Custom tools will have to be created for management, scheduling/job management, etc. (the hard bit)
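
The storage cost of the replication mentioned in the first point is easy to put into numbers. The helper below is a rough back-of-the-envelope sketch; the function name and the `raid_efficiency` parameter (the fraction of raw disk space left after local RAID) are my own naming, and the figures in the usage note are invented.

```python
def usable_capacity_tb(raw_tb_per_host: float, hosts: int,
                       raid_efficiency: float = 1.0) -> float:
    """Estimate usable capacity of a set of DRBD-paired storage hosts.

    raid_efficiency is the fraction of raw space left after local RAID,
    e.g. 0.75 for RAID 5 over four disks, 1.0 for no RAID.
    """
    if hosts <= 0 or hosts % 2:
        raise ValueError("DRBD pairs need a positive, even number of hosts")
    if not 0 < raid_efficiency <= 1:
        raise ValueError("raid_efficiency must be in (0, 1]")
    per_host = raw_tb_per_host * raid_efficiency
    # Each pair stores one copy of the data on each host, so only half
    # of the post-RAID space across all hosts is usable.
    return per_host * hosts / 2
```

Four hosts with 8 TB of raw disk each give 32 TB raw but `usable_capacity_tb(8, 4)` returns 16.0; with RAID 5 over four disks per host, `usable_capacity_tb(8, 4, 0.75)` drops to 12.0.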



Tools, Links / Pointers:

Voila!