Features/QED/OnlineDefrag

From QEMU
Revision as of 14:45, 11 October 2016 by Paolo Bonzini (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

QED's performance characteristics are very similar to raw except that it causes additional fragmentation. The following discusses how to implement online defragmentation in QED to attempt to address this over the life time of the VM

Identify Defragmentation Candidates

Search the L2 cache to find two clusters that when swapped will lead to less fragmentation. The following algorithm could be used for this:

def is_fragmented(l2e):
    if (l2e.file_offset - header_size) != l2e.virtual_offset:
        return True
    return False

def schedule_defrag(l2e, target_l2e):
    new_cluster = allocate_new_cluster()

    freeze_future_writes(target_l2e)
    copy_cluster(target_l2e, new_cluster)
    update_l2e_entry(target_l2e, new_cluster)

    freeze_future_writes(l2e)
    copy_cluster(l2e, target_l2e)
    old_cluster = l2e.file_offset
    update_l2e_entry(l2e, target_l2e.file_offset)

    copy_cluster(target_l2e, l2e)
    update_l2e_entry(target_l2e, old_cluster)

    unfreeze_future_writes(target_l2e)
    unfreeze_future_writes(l2e)
    free_cluster(new_cluster)

def try_to_defrag(l2e):
    target_l2e = qed_find_l2_cache_entry(l2e.file_offset - header_size)
    if target_l2e:
        schedule_defrag(l2e, target_l2e)
        return True
    return False

def defrag():
    for l2_table in l2_cache:
        for l2e in l2_table:
            if is_fragmented(l2e) and try_to_defrag(l2e):
                return True
    return False

Considerations

If power is lost during the schedule_defrag() routine, a block will be leaked. If we have support in the header for detecting an disk image that is not closed properly, we can identify the leaked block during fsck and attempt to allocate to it

L1 and L2 entries will never be defragmented with this algorithm. Meta data represents a small portion of the disk though so it's not clear that it's worth it initially.

Blocks are only defragmented if their metadata is cached. The current L2 cache will completely cache the meta data for a 100GB image.

This algorithm only works when it's possible to put a block at an exact location. A more sophisticated algorithm could attempt to not achieve best placement but merely achieve better placement in order to optimize contiguous regions.