Hard disk partition pregap and grub2

A week or so back, I was at a customer to upgrade his server to wheezy. This server had originally been installed, not as squeeze but as something older, and was installed using an LVM-on-mdraid setup.

Booting off such a setup (without a /boot partition outside of the RAID/LVM) is certainly possible with several boot loaders, including grub2; however, it was having issues in this particular instance, producing an error message along the lines of:

/sbin/grub2-setup: warn: Your embedding area is unusually small. core.img won't fit in it..

It took me a while to figure out what that meant, but the Internet to the rescue:

Disk /dev/sda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders, total 312581808 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000b3c3b

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048   312580095   156289024   83  Linux

That partition starts at sector 2048, resulting in a 1 MiB gap before the first partition. The older default was to start at sector 63, for a pregap of just short of 32 KiB (63 sectors of 512 bytes is 31.5 KiB).
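To make the arithmetic concrete, here is a small sketch in shell. The function names are made up for illustration, and the 512-byte default assumes the logical sector size shown in the listing above. The second function subtracts sector 0, which holds the MBR itself and so is not available for embedding.

```shell
#!/bin/sh
# pregap_bytes START_SECTOR [SECTOR_SIZE]
# Bytes between the start of the disk and the first partition.
# (Hypothetical helper; sector size defaults to 512 bytes.)
pregap_bytes() {
    start=$1
    size=${2:-512}
    echo $((start * size))
}

# embed_bytes START_SECTOR [SECTOR_SIZE]
# Same, minus sector 0, which is occupied by the MBR.
embed_bytes() {
    start=$1
    size=${2:-512}
    echo $(((start - 1) * size))
}

for s in 63 2048; do
    echo "start=$s pregap=$(pregap_bytes "$s") bytes, after MBR=$(embed_bytes "$s") bytes"
done
```

For a start sector of 63 this gives 32256 bytes (31744 after the MBR); for 2048 it gives 1048576 bytes (1048064 after the MBR).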

Unfortunately, the only way to fix this issue without ugly hacks (such as the chainloading suggested in this Fedora forum thread) is to repartition. Luckily, thanks to the magic of LVM, this isn't difficult (if a little involved). Steps:

pvmove -v /dev/md0
vgreduce -a vg
mdadm -S /dev/md0
(...repartition the hard disks that make up /dev/md0...)
mdadm -C /dev/md0 (... other options required to recreate /dev/md0 on the desired RAID level...)
pvcreate /dev/md0
vgextend /dev/vg /dev/md0

... and then rinse, repeat for md1. At that point, installing grub will work as usual.
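Before rerunning the grub installer, it may be worth confirming that the new layout really leaves enough room. The following is only a sketch: it assumes the classic `fdisk -l` table format shown earlier (device, optional `*` boot flag, start sector), the function names are hypothetical, and the 2048-sector threshold is simply the modern default mentioned above, not a documented grub2 requirement.

```shell
#!/bin/sh
# first_start_sector: read `fdisk -l` style output on stdin and print
# the lowest partition start sector found (nothing if no partitions).
first_start_sector() {
    awk '$1 ~ /^\/dev\// {
        # The Boot column is either "*" or absent, which shifts the fields.
        start = ($2 == "*") ? $3 : $2
        if (min == "" || start + 0 < min + 0) min = start
    }
    END { if (min != "") print min }'
}

# check_pregap: report whether the first partition starts at or beyond
# sector 2048 (an assumed threshold, see the text above).
check_pregap() {
    start=$(first_start_sector)
    if [ -z "$start" ]; then
        echo "no partitions found"
        return 2
    fi
    if [ "$start" -lt 2048 ]; then
        echo "start sector $start: small pregap"
        return 1
    fi
    echo "start sector $start: at least 2048 sectors before the first partition"
}

# Example input taken from the listing earlier in this post.
check_pregap <<'EOF'
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048   312580095   156289024   83  Linux
EOF
```

On a live system the input would come from something like `fdisk -l /dev/sda | check_pregap`, run once for each disk in the array.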

An alternative (since we were using RAID1) could have been to remove one disk from the RAID array, repartition it, create a new RAID1 array in degraded mode, pvmove over to that, destroy the original array, and add the second disk to the new array. This would probably be a better idea when you only have one RAID array, for instance, or when you have more than one array and have more data than fits on a single array. However, that wasn't the case here (at least not after a resize2fs -M call followed by an appropriate lvresize one), and so I thought that not reducing redundancy, even if only temporarily, was a better move.
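For reference, the degraded-array alternative could look roughly like the sketch below. This is not what was run on the customer's server. The function only prints a plan and executes nothing, and every device name in the example call (/dev/md2 for the new array, /dev/sda1 and /dev/sdb1 for the two RAID members) is an assumption for illustration. The repartitioning steps are left as comments, as in the list above.

```shell
#!/bin/sh
# plan_degraded_migration VG OLD_MD NEW_MD KEEP_PART FREE_PART
# Prints (does NOT run) a possible command sequence for moving a VG
# from OLD_MD to a freshly created NEW_MD, one disk at a time.
# All names are supplied by the caller; nothing here is auto-detected.
plan_degraded_migration() {
    vg=$1; old_md=$2; new_md=$3; keep_part=$4; free_part=$5
    cat <<EOF
mdadm $old_md --fail $free_part --remove $free_part
# (repartition the disk that holds $free_part here)
mdadm --create $new_md --level=1 --raid-devices=2 missing $free_part
pvcreate $new_md
vgextend $vg $new_md
pvmove -v $old_md $new_md
vgreduce $vg $old_md
pvremove $old_md
mdadm --stop $old_md
# (repartition the disk that holds $keep_part here)
mdadm $new_md --add $keep_part
EOF
}

# Hypothetical example: array md0 built from sda1 and sdb1, VG named "vg".
plan_degraded_migration vg /dev/md0 /dev/md2 /dev/sda1 /dev/sdb1
```

The `missing` keyword is what tells mdadm to create the new RAID1 with only one member present; the array stays degraded until the final `--add`, which is exactly the temporary loss of redundancy described above.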

Recommendation

I was hoping to see a recommended size or requirement for using grub2.

Comment by Paul Tötterman Mon Aug 5 13:06:26 2013

Why so big?

Why wasn't 32 KiB enough anyway? Probably the core.img includes more modules than strictly necessary; I'd have looked to reduce its size. /boot being within LVM+RAID is probably a factor, as is perhaps the filesystem being used (e.g. the btrfs module is more than 2x bigger than ext2).

There may be a way to convert to a GPT disklabel in-situ, within the 63 sectors at the beginning of the drive (and needing a similar space at the end of the disk). If there is also space to create a small GPT BIOS Boot Partition anywhere, you could write the core.img there instead.

And finally, there's always the grub-legacy package...

Comment by Steven C. (steven@pyro.eu.org) Mon Aug 5 15:50:34 2013
Your alternative suggestion is how I have fixed this problem myself before, when migrating from a RAID1 of 2x 1T disks to a RAID1 of 2x 2T disks with AF (Advanced Format) partitions and larger sectors. Usually I would degrade the RAID, add the new disk, then sync the RAID. I couldn't do that here because of the need to change to AF 4k sectors. Because I had LVM available, I simply added the 2T disks, partitioned them for AF, created the RAID1, then used pvmove as you described to migrate the data, and then ran grub-install onto the new disks. The machine was up and online throughout the migration. The only downtime was the reboots: one to add the new 2x 2T disks initially, and another later to boot from the new disks in the new configuration. Having LVM turned something difficult into something mostly pretty easy.

Comment by Bob Proulx (bob@proulx.com) Mon Aug 5 21:44:27 2013

Alternative

I thought the default was to have the first partition start at the beginning of the first cylinder, thus leaving most of the zeroth cylinder as a pre-gap usable to store GRUB's core image.

Now, I prefer partitioning as GPT and defining a small BIOS Boot Partition to store GRUB's core image. Or an EFI System Partition when I boot in UEFI, of course.

Comment by Elessar (tanguy+grep.be@ortolo.eu) Wed Aug 7 10:53:44 2013