Monday, January 09, 2012

KVM Post Copy Live Migration with Kernel RDMA transport


Live migration moves running virtual machines from one physical server to another with no impact on end users. It keeps your IT environment up and running, giving you unprecedented flexibility and availability to meet the increasing demands of your business and end users.
  • Reduce IT costs and improve flexibility with server consolidation
  • Decrease downtime and improve reliability with business continuity and disaster recovery
  • Increase energy efficiency by running fewer servers and dynamically powering down unused servers with our green IT solutions
However, the limitations of current migration technology start to appear when it is applied to larger application systems such as SAP ERP or SAP ByDesign. Such systems consume a large amount of memory and cannot be transferred as seamlessly as smaller ones, creating service interruptions. Limiting the impact and optimising migration becomes even more important with the generalisation of Service Level Agreements (SLAs). This strand of research within the Hecatonchire project aims at improving the live migration of VMs running large enterprise applications without severely disrupting their live services, even across the Internet.

How it works:

Full post-copy live migration
  1. Stop the VM at the beginning
  2. Send all the CPU and device states to the destination, deferring the memory contents
  3. Send the RAM information and unmap the whole RAM memory region on Host B for the RDMA connection
  4. Immediately start KVM on Host B
  5. Host B will start page faulting and pull pages from Host A on demand (plus background prefetching)
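The steps above can be sketched as a toy simulation. All names here (`Host`, `post_copy_migrate`, `touch_page`) are illustrative assumptions, not the real interfaces; the actual mechanism lives in KVM/QEMU and the in-kernel RDMA engine:

```python
# Toy model of full post-copy live migration. Python dicts stand in for
# guest RAM; the real implementation is KVM plus an in-kernel RDMA engine.

class Host:
    def __init__(self, name, memory=None):
        self.name = name
        self.memory = memory if memory is not None else {}  # page number -> contents
        self.cpu_state = None
        self.running = False
        self.unmapped = set()

def post_copy_migrate(src, dst):
    # 1. Stop the VM on the source host.
    src.running = False
    # 2. Send CPU and device state; memory contents are deferred.
    dst.cpu_state = src.cpu_state
    # 3. Record which pages exist, but leave them all unmapped on Host B.
    dst.unmapped = set(src.memory)
    # 4. Immediately restart the VM on the destination.
    dst.running = True
    return dst

def touch_page(dst, src, page):
    # 5. A fault on an unmapped page pulls it from the source on demand.
    if page in dst.unmapped:
        dst.memory[page] = src.memory[page]  # stands in for the RDMA read
        dst.unmapped.discard(page)
    return dst.memory[page]
```

Background prefetching would simply call the same fault path from a separate thread for the pages the guest has not touched yet.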
Pre-copy vs. post-copy live migration

Hybrid post-copy live migration

Hybrid post-copy live migration provides a middle ground between the full post-copy and the pre-copy approach. It limits the impact of page faulting by enabling a pre-copy phase first, while still providing a deterministic switch-over with reduced performance impact during the overall execution of the live migration.
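A minimal sketch of that middle ground, assuming a fixed redirtying rate and round-based timeout (both are modelling assumptions, not measured parameters):

```python
# Toy sketch of hybrid migration: run pre-copy rounds until either the
# dirty set is small enough or a timeout expires, then switch to
# post-copy for whatever remains. All thresholds are illustrative.

def hybrid_migrate(dirty_pages, pages_per_round, timeout_rounds, stop_threshold):
    """Return (pages sent in pre-copy, pages left for post-copy, rounds used)."""
    transferred = 0
    rounds = 0
    remaining = dirty_pages
    while remaining > stop_threshold and rounds < timeout_rounds:
        sent = min(pages_per_round, remaining)
        transferred += sent
        remaining -= sent
        # The guest keeps dirtying pages while pre-copy runs; model that
        # as one page redirtied for every four sent.
        remaining += sent // 4
        rounds += 1
    # Whatever is left is fetched on demand after the switch-over.
    return transferred, remaining, rounds
```

The timeout bounds the number of pre-copy rounds, which is what makes the total migration time deterministic even for workloads that never converge under pure pre-copy.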

Hybrid Live migration


Post-copy live migration
If the VM touches a not-yet-transferred memory page, the VM page faults and initiates a memory request over RDMA using an in-kernel RDMA engine. This engine copies the content of the memory page from the source and resolves the page fault.
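The demand path and the background prefetcher can be modelled in a few lines. Here Python threads and a dict stand in for the in-kernel RDMA engine and one-sided RDMA reads; `RemoteMemory`, `fault`, and `prefetch_all` are hypothetical names for illustration:

```python
# Sketch of demand paging plus a background prefetcher, standing in for
# the in-kernel RDMA engine. A dict replaces the source host's RAM.

import threading

class RemoteMemory:
    def __init__(self, source_pages):
        self.source = source_pages   # pages still resident on the source
        self.local = {}              # pages already resolved on the destination
        self.lock = threading.Lock()

    def _pull(self, page):
        # Stands in for a one-sided RDMA read of one page from the source.
        return self.source[page]

    def fault(self, page):
        # Demand path: resolve the faulting page immediately.
        with self.lock:
            if page not in self.local:
                self.local[page] = self._pull(page)
            return self.local[page]

    def prefetch_all(self):
        # Background path: walk the remaining pages so later accesses
        # no longer fault at all.
        for page in list(self.source):
            self.fault(page)

mem = RemoteMemory({n: f"page-{n}" for n in range(8)})
assert mem.fault(3) == "page-3"          # a demand fault resolves one page
t = threading.Thread(target=mem.prefetch_all)
t.start(); t.join()
assert len(mem.local) == 8               # the prefetcher resolved the rest
```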

Prototype / Demo:

We present the design, implementation, and evaluation of post-copy based live migration for virtual machines (VMs) across a Gigabit LAN. Post-copy migration defers the transfer of a VM's memory contents until after its processor state has been sent to the target host. This deferral is in contrast to the traditional pre-copy approach, which first copies the memory state over multiple iterations followed by a final transfer of the processor state. The post-copy strategy can provide a "win-win" by reducing total migration time while maintaining the liveness of the VM during migration.
The following video demonstrates three different post-copy live migration scenarios:
  1. Full post-copy
  2. Hybrid: 10-second timeout before switching from standard live migration to full post-copy
  3. Hybrid, 60-second timeout: standard live migration finishes within the allocated time (however, we do not follow the standard process, as there is no stop-and-copy phase; we just stop, send over the CPU state, and restart, and missing pages are fetched on demand or by the background thread)
In comparison with the traditional approach, we demonstrated that post-copy improves several metrics, including pages transferred, total migration time, and network overhead. It also provides deterministic live migration, a feature missing from the traditional approach, in which the system administrator has no control over workload placement and transfer.

Comparison between Yabusame and the RDMA kernel approach:

Yabusame relies on a special character device driver that allows transparent memory page retrievals from the source host for the running VM at the destination. However, as shown in the diagram above, this requires a lot of communication between the different parts, as well as context switching, which tends to be less than optimal. With the approach we are proposing, we are able to eliminate most of the overhead associated with memory transfer while improving overall performance.
Also, when the VM touches a not-yet-transferred memory page, Yabusame pauses the VM temporarily, while our approach makes full use of the asynchronous page fault mechanism, allowing us to avoid pausing the system as much as possible.

Future Work: Flash Cloning:

Virtual Machine (VM) fork is a new cloud computing abstraction that instantaneously clones a VM into multiple replicas running on different hosts. All replicas share the same initial state, matching the intuitive semantics of stateful worker creation. VM fork thus enables the straightforward creation and efficient deployment of many tasks demanding swift instantiation of stateful workers in a cloud environment, e.g. excess load handling, opportunistic job placement, or parallel computing.
The lack of instantaneous stateful cloning forces users of cloud computing into ad hoc practices to manage application state and cycle provisioning. As a result, we aim to provide sub-second VM cloning that scales to hundreds of workers, consumes few cloud I/O resources, and imposes negligible runtime overhead.
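The fork semantics above can be sketched with a copy-on-write overlay: replicas share the parent's memory at fork time and only copy a page when they write to it. The names (`VMImage`, `Replica`, `vm_fork`) are hypothetical, chosen only to illustrate the idea:

```python
# Toy copy-on-write model of VM fork ("flash cloning"): no memory is
# copied at fork time, so instantiation cost is independent of VM size.

class VMImage:
    def __init__(self, pages):
        self.pages = pages          # the parent's memory image

class Replica:
    def __init__(self, parent):
        self.parent = parent        # shared, read-only initial state
        self.private = {}           # per-replica copy-on-write overlay

    def read(self, page):
        # Reads fall through to the parent until the page is written.
        return self.private.get(page, self.parent.pages[page])

    def write(self, page, value):
        # Copy-on-write: the parent and sibling replicas are untouched.
        self.private[page] = value

def vm_fork(parent, n):
    # "Instantaneous": creating a replica copies no pages at all.
    return [Replica(parent) for _ in range(n)]
```

Because replicas diverge only through their private overlays, hundreds of workers can share one image while consuming I/O only for the pages they actually modify.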

Tuesday, January 03, 2012

Live Migration Optimization for VM Running Large Enterprise Applications

The problem with enterprise-class applications:

  • Bigger-than-average resource requirements
  • An average SAP ERP VM uses 16 GB+ of RAM, and 32 GB of swap is more than common
  • OLTP systems such as ERP are very sensitive to timing variations
  • They rely heavily on precise scheduling capabilities, triggers, timers, and on the ACID compliance of the underlying database

As a result, there are many challenges when migrating VMs running such applications:

  • Disconnection of services:
    • Gigabit Ethernet timeout ≈ 5 seconds (>500 MB of memory left in the stop-and-copy phase)
    • Downtime is workload dependent
  • Disruption of services:
    • Migration progressively increases the amount of resources dedicated to itself, gradually degrading the performance of coexisting systems/VMs
    • Difficulty in maintaining consistency and transparency
  • Unpredictability and rigidity
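The Gigabit Ethernet figure above is easy to check with back-of-the-envelope arithmetic (using decimal units, 1 MB = 10^6 bytes, and ignoring protocol overhead):

```python
# Time to push the residual memory of the stop-and-copy phase over the
# wire: residual_mb megabytes at link_gbps gigabits per second.

def stop_and_copy_seconds(residual_mb, link_gbps=1.0):
    residual_bits = residual_mb * 8 * 1e6    # MB -> bits, decimal units
    return residual_bits / (link_gbps * 1e9)

# 500 MB left at 1 Gbit/s is already ~4 s of hard downtime before any
# protocol overhead, which is why >500 MB pushes past the ~5 s timeout.
print(round(stop_and_copy_seconds(500), 1))   # 4.0
```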

Together with some of my colleagues, we developed optimisations that enable a smoother migration of such systems while reducing the overall downtime and virtually eliminating any disruption of services.

You can watch the video of the presentation at the KVM Forum, explaining the changes we are proposing, here:

And download the slide deck here: