I am not claiming that it will suit everyone's needs, but I hope it will at least give you some valuable pointers and alternatives.
You may also want to adapt it, since you might not require every single feature of the system.
Summary:
This setup lets you build your own redundant storage network from common PC hardware. An easier but far more expensive way to achieve this would be to get a SAN and some Fibre Channel attached hosts. The setup provides features similar to those of Amazon EBS, as well as a highly reliable (HR) cluster file system for your cloud storage.
Features:
- High availability (DRBD, cluster file system)
- High reliability (DRBD)
- Flexible, dynamic storage resource management
- File system export or block device ("Amazon EBS style")
- Dynamic failover configuration (Pacemaker, Corosync)
- Active/Passive; N+1; N to N; split site
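To make the failover modes listed above a little more concrete, here is a toy Python sketch. It is not how Pacemaker places resources (Pacemaker uses its own scoring and constraints); the function name `fail_over` and the node and volume names are made up for illustration. It only shows the idea: in N+1 an idle spare picks up the failed node's resources, while in N to N they are spread over the surviving active nodes.

```python
def fail_over(assignments, failed_node):
    """Return a new node -> resources mapping after failed_node dies.

    assignments maps a node name to the list of resources it runs.
    An idle spare is modelled as a node with an empty list (N+1).
    Each orphaned resource goes to the least-loaded survivor;
    ties are broken by node name so the result is deterministic.
    """
    survivors = {n: list(r) for n, r in assignments.items() if n != failed_node}
    if not survivors:
        raise RuntimeError("no surviving node to fail over to")
    for resource in assignments.get(failed_node, []):
        target = min(sorted(survivors), key=lambda n: len(survivors[n]))
        survivors[target].append(resource)
    return survivors


# N+1: two active storage nodes and one idle spare (hypothetical names).
n_plus_1 = {"store1": ["vol-a"], "store2": ["vol-b"], "spare": []}
after_n_plus_1 = fail_over(n_plus_1, "store1")

# N to N: every node is active; survivors share the orphaned resources.
n_to_n = {"a": ["r1", "r2"], "b": ["r3"], "c": ["r4"]}
after_n_to_n = fail_over(n_to_n, "a")
```

In a real cluster this decision, plus fencing of the failed node, is exactly the hassle that Pacemaker and Corosync take off your hands.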
Overview:
- A set of paired storage back ends: hosts that use DRBD to keep the data redundant within each pair.
- On top of DRBD we have LVM (or CLVM). With LVM we can resize logical volumes on the fly, take snapshots (including hosting snapshots plus diffs), and even extend a logical volume across multiple underlying DRBD devices.
- Note: LVM can be used both on top of DRBD (front end) and underneath it (back end).
- The LVM block devices are exported to the cluster nodes using GNBD. Another node does a GNBD import, and the block device appears there as a local block device, ready to mount into the file hierarchy.
- OCFS2, as a cluster file system, allows all cluster nodes to access this file system concurrently.
- Another possibility is to export one GNBD device per virtual machine (but you still need a distributed/network file system for configuration etc.).
- Pacemaker and Corosync manage the resources for HA/HR.
- For the management, control, and monitoring part, a custom-made solution might be needed.
- GRAM could be used to expose the resource management
- Any monitoring framework should be able to do the trick
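As a starting point for the monitoring part, here is a small Python sketch that parses DRBD status text and flags devices that are not fully in sync. It assumes the DRBD 8.x `/proc/drbd` layout (`cs:` connection state, `ro:` roles, `ds:` disk states); the sample text is hand-written for illustration, not captured from a real host, and the function names are my own. A real check would read `/proc/drbd` and feed the result to whatever monitoring framework you use.

```python
import re

# Hand-written sample in the style of /proc/drbd on DRBD 8.x (assumed format).
SAMPLE = """\
version: 8.3.11 (api:88/proto:86-96)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
"""

# One device line: minor number, connection state, roles, disk states.
DEVICE_LINE = re.compile(
    r"^\s*(\d+):\s+cs:(\S+)\s+ro:([^/\s]+)/([^/\s]+)\s+ds:([^/\s]+)/([^/\s]+)"
)


def parse_drbd_status(text):
    """Return one dict per DRBD device found in the status text."""
    devices = []
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            minor, cs, ro_local, ro_peer, ds_local, ds_peer = match.groups()
            devices.append({
                "minor": int(minor),
                "cs": cs,
                "ro_local": ro_local,
                "ro_peer": ro_peer,
                "ds_local": ds_local,
                "ds_peer": ds_peer,
            })
    return devices


def unhealthy(devices):
    """Minor numbers of devices that are disconnected or not up to date."""
    return [
        d["minor"]
        for d in devices
        if d["cs"] != "Connected"
        or d["ds_local"] != "UpToDate"
        or d["ds_peer"] != "UpToDate"
    ]


devices = parse_drbd_status(SAMPLE)
alerts = unhealthy(devices)
```

Here `alerts` holds the minor numbers that need attention, which is the kind of signal you would forward to your alerting.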
"Simple" Schema:
Pro:
- Most of the independent parts are proven solutions used in large-scale production environments
- Open source / readily available tools
- OCFS2 provides back-end storage for images, while GNBD can provide on-demand block storage for the cloud instances
- Easy accounting: same as for any other file system (might need custom-built tools though, depending on the requirements)
- COTS components
Con:
- DRBD provides HA/HR through replication (think RAID 1), which means you get HA/HR and speed at the expense of half of your storage (slightly more if you are using RAID for the actual disk storage)
- Complex, with a risk of cascading failure due to a domino effect (similar to what happened recently in the Amazon cloud with EBS)
- Performance will depend heavily on the physical resources available, as well as on the topology and usage:
- It will require a lot of tweaking/customization to extract the best performance (e.g. dual-primary DRBD, load balancing, etc.), and every setup will be different
- Dedicated monitoring tools will have to be created in order to manage and automate the performance tweaking
- Custom tools will have to be created for management, scheduling / job management, etc. (the hard bit)
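To put a number on the replication cost from the first point above, here is a back-of-the-envelope Python sketch. The function and parameter names are my own, and the model is deliberately simple: every host has the same raw capacity, hosts are paired one-to-one by DRBD, and any local RAID is reduced to a single efficiency factor (1.0 for no RAID, 0.75 for RAID 5 over four disks, for example).

```python
def usable_capacity(raw_per_host_tb, hosts, raid_efficiency=1.0):
    """Usable storage in TB for DRBD-paired hosts.

    raw_per_host_tb: raw disk capacity of each host, in TB.
    hosts: total number of storage hosts; must be even, since they are paired.
    raid_efficiency: fraction of raw space left after local RAID (assumption).
    """
    if hosts <= 0 or hosts % 2:
        raise ValueError("hosts must be a positive even number (DRBD pairs)")
    if not 0 < raid_efficiency <= 1:
        raise ValueError("raid_efficiency must be in (0, 1]")
    per_host = raw_per_host_tb * raid_efficiency
    # Each pair stores one copy of the data on each host, so half is usable.
    return per_host * hosts / 2


# Four hosts with 8 TB each, no local RAID: 32 TB raw.
plain = usable_capacity(8, 4)

# Same hosts, RAID 5 over four disks on each host (3/4 of raw space left).
with_raid5 = usable_capacity(8, 4, raid_efficiency=0.75)
```

With these inputs `plain` comes out at 16 TB and `with_raid5` at 12 TB out of 32 TB raw, which is the "slightly more than half" mentioned above.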
Tools, Links / Pointers:
- DRBD: http://www.drbd.org/
- LVM or CLVM: http://goo.gl/WQHO / http://sourceware.org/cluster/clvm/
- GNBD: http://sourceware.org/cluster/gnbd/
- OCFS2 (you can use GFS if you prefer): http://www.oracle.com/us/technologies/linux/025995.htm
- For clustering, automatic resource management and failover (takes the hassle out of the HA/HR management):
- Pacemaker: http://www.clusterlabs.org/
- Corosync : http://corosync.org
- GRAM: http://www.globus.org/toolkit/docs/2.4/gram/
- Some links that are a step in the right direction. None of them provides the full range of features I presented in the overview/schema, but it shouldn't be too hard for you to figure out how to get there (if I have time I might post the actual how-to).
- Ubuntu corosync/pacemaker/drbd8/ocfs2 : http://goo.gl/WWhTB
- Xen with DRBD, GNBD and OCFS2 : http://xenamo.sourceforge.net/
- Using LVM + DRBD + NFS + Heartbeat + VTun To Gain Data Persistence, Redundancy, Automatic Fail-Over, and Read/Write Disk Access Across Multiple EC2 Nodes http://goo.gl/wVgQu
- DRBD for Xen (but lacks flexibility): http://goo.gl/ap4d8
- KVM Cluster with DRBD/GFS http://goo.gl/MFDZ3
Voila!