Overview:
Live migration moves running virtual machines from one physical server to another with no 
impact on end users. It keeps your IT environment up and running, giving 
you unprecedented flexibility and availability to meet the increasing demands of 
your business and end users.
- Reduce IT costs and improve flexibility with server consolidation
- Decrease downtime and improve reliability with business continuity and disaster recovery
- Increase energy efficiency by running fewer servers and dynamically powering down unused servers with our green IT solutions
However, limitations of current migration technology start to appear when 
it is applied to larger application systems such as SAP ERP or SAP ByDesign. 
Such systems consume a large amount of memory and cannot be transferred as 
seamlessly as smaller ones, creating service interruptions. Limiting the impact 
and optimising migration becomes even more important with the generalisation of 
Service Level Agreements (SLAs). This strand of research within the Hecatonchire 
project aims at improving the live migration of VMs running large enterprise 
applications without severely disrupting their live services, even across the 
Internet.
How it works:
Full post copy live migration 
- Stop the VM at the beginning
- Send all the CPU and device states to the destination, but not the memory contents
- Send the RAM layout information and unmap the whole RAM memory region on Host B for the RDMA connection
- Immediately start KVM on Host B
- Host B starts page faulting and pulls pages from Host A on demand (plus background prefetching)
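The steps above can be sketched as a toy simulation (plain Python with illustrative class names; the real implementation operates at the KVM and RDMA level, not in user space):

```python
# Toy simulation of full post-copy live migration (illustrative only).

class SourceHost:
    def __init__(self, memory):
        self.memory = memory          # page frame number -> page contents

    def fetch_page(self, pfn):
        return self.memory[pfn]       # served over RDMA in the real system


class DestinationHost:
    def __init__(self, source, cpu_state):
        self.source = source
        self.cpu_state = cpu_state    # sent up front, unlike the memory
        self.memory = {}              # starts empty: every page is remote
        self.faults = 0

    def read_page(self, pfn):
        """VM touches a page; pull it on demand if not yet transferred."""
        if pfn not in self.memory:
            self.faults += 1
            self.memory[pfn] = self.source.fetch_page(pfn)
        return self.memory[pfn]

    def prefetch(self, n):
        """Background thread: pull up to n not-yet-present pages."""
        missing = [p for p in self.source.memory if p not in self.memory]
        for pfn in missing[:n]:
            self.memory[pfn] = self.source.fetch_page(pfn)


src = SourceHost({0: "kernel", 1: "heap", 2: "stack", 3: "cache"})
dst = DestinationHost(src, cpu_state={"rip": 0xFFFF})  # VM restarts at once
page = dst.read_page(1)   # demand fault pulls the page from Host A
dst.prefetch(2)           # background prefetching fills in the rest
```

Note how the VM at the destination is runnable immediately: only faults and the background prefetcher move memory.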
[Figure: Pre-copy vs post-copy live migration]
Hybrid post copy live migration
Hybrid post-copy live migration provides a middle ground between the full post-copy and the pre-copy approaches. It limits the impact of page faulting by enabling a pre-copy phase first, while still providing a deterministic switch-over with reduced performance impact during the overall execution of the live migration.
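A minimal sketch of the hybrid switch-over policy, assuming a simulated per-round dirty-page trace and an injectable clock (the function and parameter names are hypothetical, not the actual QEMU/KVM interface):

```python
import time

def hybrid_migrate(dirty_rounds, timeout_s, now=time.monotonic):
    """Run pre-copy rounds until the dirty set converges or the timeout
    expires; on timeout, stop the VM, send the CPU state, and fall back
    to post-copy (remaining pages are faulted in on demand).

    dirty_rounds: iterable of per-round dirty page counts (simulated).
    Returns the phase in which the migration completed.
    """
    deadline = now() + timeout_s
    for dirty in dirty_rounds:
        if dirty == 0:                 # pre-copy converged in time
            return "pre-copy"
        if now() >= deadline:          # deterministic switch-over point
            return "post-copy"
    return "post-copy"                 # never converged: finish post-copy
```

The timeout is what makes the migration deterministic: however badly the workload dirties memory, the switch to post-copy bounds the total migration time.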
[Figure: Hybrid live migration]
Architecture:
[Figure: Post-copy live migration]
If the VM touches a not-yet-transferred memory page, the VM page faults and initiates a memory request over RDMA using an in-kernel RDMA engine. This engine copies the content of the memory page from the 
source and resolves the page fault.
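The fault-resolution path can be sketched as a user-space analogy (hypothetical names; the actual engine runs in the kernel and issues one-sided RDMA reads against the source host's exported memory region):

```python
class RDMAEngine:
    """Stand-in for the in-kernel RDMA engine: one-sided reads from the
    source host's registered memory region."""
    def __init__(self, remote_region):
        self.remote_region = remote_region   # pfn -> page contents

    def rdma_read(self, pfn):
        # The real engine posts an RDMA READ work request via IB verbs;
        # here we just index the remote region directly.
        return self.remote_region[pfn]


def handle_page_fault(page_table, pfn, engine):
    """Resolve a fault on a not-yet-transferred page: copy the page from
    the source over RDMA, map it, and let the faulting vCPU resume."""
    if pfn in page_table:
        return page_table[pfn]               # already resident
    page = engine.rdma_read(pfn)             # pull the page from Host A
    page_table[pfn] = page                   # map it for the guest
    return page
```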
Prototype / Demo:
We present the design, implementation, and evaluation of post-copy based live 
migration for virtual machines (VMs) across a Gigabit LAN. Post-copy migration 
defers the transfer of a VM's memory contents until after its processor state 
has been sent to the target host. This deferral is in contrast to the 
traditional pre-copy approach, which first copies the memory state over multiple 
iterations followed by a final transfer of the processor state. The post-copy 
strategy can provide a "win-win" by reducing total migration time while 
maintaining the liveness of the VM during migration.
The following video demonstrates three different post-copy live migration 
scenarios:
- Full post copy
- Hybrid, 10-second timeout: switches from standard live migration to full post copy when the timeout expires
- Hybrid, 60-second timeout: standard live migration finishes within the allocated time (however, we don't follow the standard process, as there is no stop-and-copy phase: just stop, send over the CPU state, and restart; missing pages are fetched on demand or by the background thread)
In comparison with the traditional approach, we demonstrated that post-copy 
improves several metrics, including pages transferred, total migration time, and 
network overhead. It also provides a deterministic live migration feature that 
is missing from the traditional approach, where the system administrator has no 
control over workload placement and transfer.
Comparison between Yabusame and RDMA kernel approach:
Yabusame relies on a special character device driver that allows transparent memory page retrieval
from the source host for the VM running at the destination. However, as shown in the diagram above, this requires a lot of communication between different components as well as context switching, which tends to be less than optimal. With the approach we are proposing, we are able to eliminate most of the overhead associated with memory transfer while improving overall performance.
Also, with Yabusame, when the VM touches a not-yet-transferred memory page, it pauses the VM temporarily, while with our approach we make full use of the asynchronous page fault system, allowing us to avoid pausing the system as much as possible.
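The contrast can be illustrated with a small asyncio sketch, in which remote page fetches overlap with other guest work instead of blocking the vCPU (names and timings are illustrative only, not the KVM async page fault interface):

```python
import asyncio

async def fetch_page(pfn, delay=0.001):
    """Simulated remote page fetch with network latency."""
    await asyncio.sleep(delay)
    return f"page-{pfn}"

async def vcpu_with_async_faults(pfns):
    """With asynchronous page faults, the guest is told the page is not
    present and can schedule another task; here all fetches overlap with
    a unit of unrelated guest work."""
    done = []

    async def other_work():
        done.append("other-task")        # guest runs something else

    pages, _ = await asyncio.gather(
        asyncio.gather(*(fetch_page(p) for p in pfns)),
        other_work(),
    )
    return list(pages), done

pages, done = asyncio.run(vcpu_with_async_faults([1, 2, 3]))
```

A synchronous handler would instead block the vCPU for every fetch, which is the pause behaviour described for Yabusame above.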
Future Work: Flash Cloning:
Virtual Machine (VM) fork is a new cloud computing abstraction that instantaneously clones a VM into multiple replicas running on different hosts. All replicas share the same initial state, matching the intuitive semantics of stateful worker creation. VM fork thus enables the straightforward creation and efficient deployment of many tasks demanding swift instantiation of stateful workers in a cloud environment, e.g. excess load handling, opportunistic job placement, or parallel computing.
Lack of instantaneous stateful cloning forces users of cloud computing into ad hoc practices to manage application state and cycle provisioning. As a result, we aim to provide sub-second VM cloning that scales to hundreds of workers, consumes few cloud I/O resources, and incurs negligible runtime overhead.
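A hedged sketch of the fork semantics, assuming copy-on-write page sharing (`ForkedVM` and `vm_fork` are hypothetical names for illustration, not an existing API):

```python
# Sketch of VM fork semantics: replicas share the parent's pages
# copy-on-write, so cloning is near-instant and memory is only
# duplicated when a replica writes.

class ForkedVM:
    def __init__(self, parent_pages):
        self.parent_pages = parent_pages   # shared with all replicas
        self.private = {}                  # pages this replica has written

    def read_page(self, pfn):
        # Prefer this replica's private copy, else the shared parent page.
        return self.private.get(pfn, self.parent_pages[pfn])

    def write_page(self, pfn, data):
        self.private[pfn] = data           # copy-on-write: parent untouched


def vm_fork(parent_pages, n):
    """Instantaneously clone into n replicas sharing the initial state."""
    return [ForkedVM(parent_pages) for _ in range(n)]
```

Because a fork only creates bookkeeping structures, not copies, this is what makes sub-second cloning to hundreds of workers plausible with little cloud I/O.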

 
