Live migration is the process of moving a running virtual machine from one physical host to another. There are two main approaches: precopy and postcopy. Postcopy migration is used for virtual machines that have a high memory dirtying rate (higher than the bandwidth of the network used for the migration), where precopy might never finish copying memory. A postcopy migration starts the VM on the destination before all of its memory has been transferred. Its disadvantage is therefore that a network failure leaves an inconsistent VM on both the source and the destination side, since neither side holds the complete state.
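The following rough C model shows why a dirtying rate above the network bandwidth rules out precopy. It is a minimal sketch under simplifying assumptions: a constant dirtying rate and bandwidth, no compression or vCPU throttling, and a hypothetical `downtime_limit_mb`, meaning the amount of memory small enough to copy while the VM is briefly stopped. Each round resends whatever was dirtied while the previous round was in flight, so the remaining amount shrinks only when the dirtying rate is below the bandwidth. The function name and parameters are illustrative and are not QEMU's.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified precopy model (hypothetical, not QEMU code): the first round
 * copies all of RAM, and every later round copies the pages that were
 * dirtied while the previous round was being sent. Returns true if the
 * remaining dirty memory ever fits within the downtime limit. */
static bool precopy_converges(double ram_mb, double dirty_mb_per_s,
                              double bandwidth_mb_per_s,
                              double downtime_limit_mb, int max_rounds)
{
    assert(bandwidth_mb_per_s > 0);

    double to_send = ram_mb; /* round 0: the whole guest memory */
    for (int round = 0; round < max_rounds; round++) {
        if (to_send <= downtime_limit_mb) {
            return true; /* small enough to stop the VM and copy the rest */
        }
        double seconds = to_send / bandwidth_mb_per_s;
        to_send = seconds * dirty_mb_per_s; /* dirtied during this round */
    }
    return false; /* each round stayed too large: no convergence */
}
```

With a 4096 MB guest, a 110 MB/s link and a 30 MB downtime limit, `precopy_converges(4096, 20, 110, 30, 30)` returns true, while `precopy_converges(4096, 200, 110, 30, 30)` returns false because every round is larger than the one before it. Postcopy avoids this loop by switching the VM over first and fetching the remaining pages afterwards.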
The aim of this project is to recover from such a network failure and complete the migration process. This includes sending the memory pages that were still left on the source when the connection dropped.
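Sending only the leftover pages after a failure requires the destination to know which pages it already has. One way to sketch this is a bitmap with one bit per guest page, set as each page arrives; after reconnecting, the pages whose bits are still clear are the ones the source must resend. The `page_bitmap` type and its helper functions are hypothetical, written for illustration, and are not QEMU's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical received-page bitmap: bit i is set once guest page i
 * has arrived on the destination. Not QEMU's actual structure. */
typedef struct {
    size_t npages;
    uint64_t *bits;
} page_bitmap;

static page_bitmap bitmap_new(size_t npages)
{
    size_t words = (npages + 63) / 64;
    page_bitmap bm;
    bm.npages = npages;
    /* Allocate at least one word so calloc never gets a zero size. */
    bm.bits = calloc(words ? words : 1, sizeof(uint64_t));
    assert(bm.bits != NULL);
    return bm;
}

/* Mark a page as received on the destination. */
static void bitmap_set(page_bitmap *bm, size_t page)
{
    assert(page < bm->npages);
    bm->bits[page / 64] |= UINT64_C(1) << (page % 64);
}

static bool bitmap_test(const page_bitmap *bm, size_t page)
{
    assert(page < bm->npages);
    return (bm->bits[page / 64] >> (page % 64)) & 1;
}

/* After reconnecting, list the pages the destination still lacks so the
 * source resends only those. Returns the number written to out. */
static size_t missing_pages(const page_bitmap *bm, size_t *out,
                            size_t max_out)
{
    size_t n = 0;
    for (size_t p = 0; p < bm->npages && n < max_out; p++) {
        if (!bitmap_test(bm, p)) {
            out[n++] = p;
        }
    }
    return n;
}

static void bitmap_free(page_bitmap *bm)
{
    free(bm->bits);
    bm->bits = NULL;
}
```

For a 10-page guest where pages 0, 3 and 5 have arrived, `missing_pages` reports pages 1, 2, 4, 6, 7, 8 and 9. A real implementation would also have to transfer this information back to the source over the new connection, which the sketch leaves out.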
Short-term Tasks
- Set up the environment and perform a migration. (completed)
- Try breaking the migration process. (completed)
- Understand the code path that makes the destination quit when the network fails during a postcopy migration.
- Understand why an "info cpus" query on the destination side makes the monitor hang.
- Find a way to stop the destination from quitting when the network fails during a postcopy migration.
- Understand the code base and the flow of execution. (ongoing)
Long-term Tasks
- Keep the migration state intact on both the source and the destination side when the network fails.
- Re-establish the connection between the source and the destination once the network issue is resolved.
- Test connection recovery for all possible failure scenarios.
- Resume the migration: restart the vCPUs, send the remaining memory pages, and complete the migration.
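Taken together, the long-term tasks amount to adding a paused state in which a failed postcopy migration can wait, plus a recovery path back to the active state. A minimal sketch of such a state machine in C is shown below. The state and event names are hypothetical and only illustrate the intended transitions, including a second failure that happens during recovery.

```c
#include <assert.h>

/* Hypothetical states for a recoverable postcopy migration. */
typedef enum {
    MIG_POSTCOPY_ACTIVE,  /* destination runs the VM, source serves pages */
    MIG_POSTCOPY_PAUSED,  /* link lost; both sides keep their state */
    MIG_POSTCOPY_RECOVER, /* new connection up, resyncing page state */
    MIG_COMPLETED,        /* every page has reached the destination */
} mig_state;

typedef enum {
    EV_NETWORK_FAILURE,
    EV_RECONNECTED,
    EV_RESYNC_DONE,
    EV_ALL_PAGES_SENT,
} mig_event;

/* Returns the next state, or the current one if the event does not apply
 * in that state (for example, a reconnect while still active). */
static mig_state mig_next(mig_state s, mig_event ev)
{
    switch (s) {
    case MIG_POSTCOPY_ACTIVE:
        if (ev == EV_NETWORK_FAILURE) return MIG_POSTCOPY_PAUSED;
        if (ev == EV_ALL_PAGES_SENT) return MIG_COMPLETED;
        break;
    case MIG_POSTCOPY_PAUSED:
        /* Wait here, with state intact, until the network comes back. */
        if (ev == EV_RECONNECTED) return MIG_POSTCOPY_RECOVER;
        break;
    case MIG_POSTCOPY_RECOVER:
        if (ev == EV_RESYNC_DONE) return MIG_POSTCOPY_ACTIVE;
        /* The new connection can fail too; go back to waiting. */
        if (ev == EV_NETWORK_FAILURE) return MIG_POSTCOPY_PAUSED;
        break;
    case MIG_COMPLETED:
        break;
    default:
        assert(0 && "unknown migration state");
        break;
    }
    return s;
}
```

Writing the transitions as a single function means testing failure scenarios can start by feeding every event in every state and checking that a network failure only ever pauses the migration and never ends it.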