LVM mirroring: the right way (amended)

LVM now supports mirroring inside of LVM, rather than requiring that you put mirrors underneath LVM physical volumes. This provides much more flexibility: some volumes can be mirrored and some not (such as swap partitions), and different RAID algorithms can be used. LVM uses the same underlying mechanisms as the Linux RAID system (mdadm) to do the RAID operations, so there is no change in overall performance.

Lucas and I learnt on the Hydra project that creating a mirror as follows:

lvconvert -m 1 --corelog /dev/nv0/time1root

or at lvcreate time:

lvcreate -L 4G --name time1root -m 1 --corelog --nosync /dev/nv0

while it works, produces a mirror that keeps certain meta-info in memory only. Should the machine reboot in an uncontrolled way, the mirror will be marked as bad and rebuilt in order to validate the meta-data.

On a machine with VMs running (nvxen-0, crtlXX), it can take hours after a reboot for the mirror to rebuild. The correct answer, it turns out, is to use --mirrorlog mirrored, plus an option that allows the mirror logs to be put anywhere.

lvconvert -m 1 --mirrorlog mirrored --alloc anywhere /dev/nv0/time1root

The allocation policy of “anywhere” permits the two 4M mirror logs (4M is the minimum allocation that LVM can do) to be kept on the same disks as the data they are mirroring. Otherwise, if you have only two physical volumes, you cannot place the log at all, since the default policy (which I think is wrong) is to insist that the mirror logs go on different volumes than the data. (I don’t know why this is necessary.)

[updated 2014-09-08] NOTE that --alloc anywhere on lvconvert from a corelog to a mirrorlog will likely do it right, and put your meta-data on two volumes. If you do this with lvcreate, you might wind up with two copies of your data on a single physical volume, and the meta-data might wind up the same way.
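One way to check for the bad case above (both copies of the data on one physical volume) is to look at where the mirror images landed. What follows is only a sketch: the function name find_shared_pv_mirrors is made up for illustration, and it assumes old-style mirror sub-volumes named <lv>_mimage_N and input lines of the form "lv_name|devices", as I would expect from lvs -a --noheadings --separator '|' -o lv_name,devices.

```shell
#!/bin/sh
# Sketch: report mirrors that have two images on the same physical volume.
# Assumed input, one segment per line:  lv_name|devices
# e.g.  lvs -a --noheadings --separator '|' -o lv_name,devices | find_shared_pv_mirrors
# (find_shared_pv_mirrors is a hypothetical helper name, not an LVM tool.)
find_shared_pv_mirrors() {
    awk -F'|' '
    {
        name = $1; devs = $2
        gsub(/\[|\]/, "", name)            # hidden LVs are shown in [brackets]
        gsub(/[ \t]/, "", name)
        gsub(/[ \t]/, "", devs)
        if (name !~ /_mimage_[0-9]+$/) next    # only look at mirror images
        base = name
        sub(/_mimage_[0-9]+$/, "", base)       # name of the parent mirror LV
        n = split(devs, parts, ",")
        for (i = 1; i <= n; i++) {
            pv = parts[i]
            sub(/\([0-9]+\)$/, "", pv)         # drop the "(start extent)" suffix
            if (!((base, pv) in owner)) {
                owner[base, pv] = name         # first image seen on this PV
            } else if (owner[base, pv] != name && !((base, pv) in reported)) {
                reported[base, pv] = 1
                print base ": two mirror images on " pv
            }
        }
    }'
}
```

No output means every image of every mirror sits on its own physical volume. The same idea should work for the _mlog sub-volumes, but I have not sketched that here.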

With many releases of LVM, but not apparently ones starting with SLES 11 SP3 (LVM version XXX), you can now do this without the --alloc anywhere, and it does what you would expect, putting a copy of the data and a copy of the meta-data on each of two volumes without complaints.

Converting between the two log types is a pain: the only way I found to do this is to remove the mirroring and then re-create it.

ionice -c3 lvconvert  -m 0 /dev/nv0/time1root
ionice -c3 lvconvert  -m 1 --mirrorlog mirrored --alloc anywhere /dev/nv0/time1root

I wrote a script to process the output of lvs and do this. The ionice -c3 puts the process in the idle I/O class, so it stays in the background and does not chew up I/O.
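The original script is not reproduced here, but a minimal sketch of the idea could look like the following. It is not the script I actually used. It assumes input lines of the form "vg|lv|segtype|mirror_log" (as I would expect from lvs --noheadings --separator '|' -o vg_name,lv_name,segtype,mirror_log, run without -a), and it assumes that a "mirror" segment with an empty mirror_log field is a corelog mirror. The function name gen_relog_cmds is invented for illustration. It only prints the commands, so you can read them before running anything.

```shell
#!/bin/sh
# Sketch: print (do not run) the lvconvert pairs that move corelog mirrors
# to a mirrored on-disk log.
# Assumed input, one LV per line:  vg|lv|segtype|mirror_log
# e.g.  lvs --noheadings --separator '|' -o vg_name,lv_name,segtype,mirror_log | gen_relog_cmds
# (gen_relog_cmds is a hypothetical helper name.)
gen_relog_cmds() {
    awk -F'|' '
    {
        # lvs pads its output, so trim whitespace from every field
        for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
        # Skip hidden sub-volumes (shown in brackets when -a is used)
        if ($2 ~ /^\[/) next
        # Assumption: segtype "mirror" with no log LV means an in-memory log
        if ($3 == "mirror" && $4 == "") {
            dev = "/dev/" $1 "/" $2
            print "ionice -c3 lvconvert -m 0 " dev
            print "ionice -c3 lvconvert -m 1 --mirrorlog mirrored --alloc anywhere " dev
        }
    }'
}
```

Once the printed list looks right, it can be piped to sh as root. Printing first is a deliberate choice: dropping a mirror leg is not something I would want a script to do unreviewed.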

On the fresh boot after the crash however, you may find your system is almost completely unresponsive as it tries to resync dozens of mirrors. On that, /dev/md0-style raid devices get it right. How to fix: find the kcopyd kernel processes and run ionice on them:

ps ax | grep '[k]copyd' | awk '{print $1}' | while read pid; do sudo ionice -c3 -p "$pid"; done

Once you have done this, you can get in long enough to run the lvconvert. I suggest you remove all the mirrors first (-m 0), as that stops the current resync from getting in the way of the resync you will have to do anyway.
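To see which mirrors are still resyncing, the copy_percent column of lvs should help. The sketch below assumes input lines of the form "vg|lv|copy_percent" (from lvs --noheadings --separator '|' -o vg_name,lv_name,copy_percent) and a "." decimal separator. The function name list_unsynced is made up for this example.

```shell
#!/bin/sh
# Sketch: list LVs whose mirror copy has not reached 100%.
# Assumed input, one LV per line:  vg|lv|copy_percent
# e.g.  lvs --noheadings --separator '|' -o vg_name,lv_name,copy_percent | list_unsynced
# (list_unsynced is a hypothetical helper name.)
list_unsynced() {
    awk -F'|' '
    {
        for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
        # Unmirrored LVs have an empty copy_percent field; skip them
        if ($3 != "" && $3 + 0 < 100) print "/dev/" $1 "/" $2 " " $3 "%"
    }'
}
```

Anything this prints is a candidate for the -m 0 treatment described above.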