LVM (Logical Volume Manager) Tutorial
LVM is the Logical Volume Manager for Linux. Its main purpose is to allow storage devices to be aggregated and subdivided. This is done by:
- formatting each storage device as an LVM ‘physical volume’,
- aggregating the physical volumes to form one or more storage pools called ‘volume groups’, then
- creating virtual block devices called ‘logical volumes’ within those volume groups.
Other capabilities include striping (equivalent to RAID-0), mirroring (equivalent to RAID-1) and snapshotting.
This tutorial refers to version 2 of the Logical Volume Manager (commonly referred to as LVM2). The original LVM had a broadly similar architecture, but lacked some of the features described here.
A physical volume is a block device that has been formatted for use by the logical volume manager. This is done using the pvcreate command, for example:
pvcreate /dev/sda
Physical volumes are usually detected automatically (see below). A list of known physical volumes can be obtained using either the pvs command (for a summary) or the pvdisplay command (for a detailed description of each).
A physical volume need not be a physical device per se. For example, it is not uncommon for software RAID devices to be made into physical volumes so that they can be divided into a number of smaller block devices.
Devices used as physical volumes do not generally need a partition table, but you may have one if you wish. For example, given a device named /dev/sda, the physical volume could be created on either /dev/sda or /dev/sda1. If there is a partition table, take care that the offset to the start of the LVM physical volume does not cause a misalignment that affects performance. The same applies to other types of intervening logical device, such as RAID volumes.
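One way to check alignment is to convert the partition's starting offset to bytes. Then test whether it is a whole multiple of the boundary you care about, such as the 4096-byte physical sector of many modern discs or a 1MiB boundary. The shell function below is a minimal sketch of that arithmetic, and its name is hypothetical. It assumes the start is given in 512-byte sectors. On Linux, for instance, /sys/block/sda/sda1/start typically reports it in that unit. Pass a third argument if your tool reports larger logical sectors.

```shell
# Hypothetical helper: succeed (exit status 0) if a partition starting at
# START_SECTOR is aligned to a multiple of ALIGN_BYTES.
# SECTOR_BYTES defaults to 512 (an assumption; pass 4096 for 4K logical sectors).
pv_offset_aligned() (
    start_sector=$1
    align_bytes=$2
    sector_bytes=${3:-512}
    offset_bytes=$(( start_sector * sector_bytes ))
    [ $(( offset_bytes % align_bytes )) -eq 0 ]
)

# The traditional DOS start at sector 63 is not 4KiB-aligned;
# the modern default of sector 2048 (1MiB) is.
for start in 63 2048; do
    if pv_offset_aligned "$start" 4096; then
        echo "sector $start: aligned"
    else
        echo "sector $start: misaligned"
    fi
done
```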
Physical volumes can be expanded while in use with the pvresize command. This can be useful if the underlying device is resizable (as is often the case when, for example, LVM is being run on a virtual machine). At the time of writing (as of version 2.02.98), reduction in size is supported only where this can be achieved without moving already allocated data. Data can be moved from one physical volume to another (within the same volume group) using the pvmove command.
See: Replace one of the physical volumes in an LVM volume group
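The sketch below illustrates growing a physical volume in place. It prints the commands you might run after enlarging a virtual machine's disk. It assumes the whole disk /dev/sda is the physical volume and that /dev/foo/bar holds a filesystem which lvextend's --resizefs option (via fsadm) knows how to grow. The run wrapper, the DRY_RUN variable and the function name are hypothetical conveniences. Nothing is executed unless DRY_RUN=0.

```shell
# Hypothetical wrapper: print commands by default; execute them only if DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Hypothetical helper: grow PV to fill its (already enlarged) device, then give
# all of the volume group's free space to LV and grow its filesystem to match.
grow_after_disk_resize() (
    pv=$1
    lv=$2
    run pvresize "$pv"
    run lvextend --resizefs --extents +100%FREE "$lv"
)

grow_after_disk_resize /dev/sda /dev/foo/bar
```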
A volume group is a pool of storage that is provided by one or more physical volumes. Its purpose is to act as a source of storage capacity for use by logical volumes. Each volume group has a name, which must be unique within the context of the machine to which it is attached.
Volume groups are created using the vgcreate command, for example:
vgcreate foo /dev/sda /dev/sdb /dev/sdc
Further physical volumes can be added later if required using the vgextend command. The reverse is also possible using the vgreduce command, provided that any data located on the physical volumes in question has been either deleted or moved elsewhere beforehand.
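Combining vgextend and vgreduce with pvmove (the command that moves allocated data between physical volumes) gives the usual procedure for replacing one physical volume with another while the logical volumes stay online. The dry-run sketch below prints that sequence. The device names are placeholders and the function name is hypothetical. The final pvremove, which wipes the LVM label from the retired device, is an optional extra step.

```shell
# Hypothetical wrapper: print commands by default; execute them only if DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Hypothetical helper: replace physical volume OLD with NEW inside volume group VG.
replace_pv() (
    vg=$1
    old=$2
    new=$3
    run pvcreate "$new"          # label the new device as a physical volume
    run vgextend "$vg" "$new"    # add it to the pool
    run pvmove "$old" "$new"     # migrate all allocated extents off OLD
    run vgreduce "$vg" "$old"    # OLD is now empty, so it can leave the group
    run pvremove "$old"          # optional: wipe the LVM label from OLD
)

replace_pv foo /dev/sdb /dev/sdd
```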
Volume groups must be activated before they can be used, but this usually happens automatically (see below). A list of available volume groups can be obtained using either the vgs command (for a summary) or the vgdisplay command (for a detailed description of each).
See: Increase the capacity of an LVM volume group
A logical volume is a virtual storage device composed from storage capacity provided by a volume group. It is presented as a block device, and can be used for purposes such as holding a filesystem or swap area. Each logical volume has a name, which must be unique within the volume group of which it is a member.
Logical volumes are created using the lvcreate command, for example:
lvcreate --name bar --size 4G /dev/foo
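The sketch below puts the three layers together. It prints the commands that would turn three discs into the volume group foo, create the 4GiB logical volume bar, and then format and mount it. The run wrapper, the DRY_RUN switch and the function name are hypothetical conveniences, and nothing is executed unless DRY_RUN=0. The choice of ext4 and the mount point /mnt/bar are assumptions for illustration.

```shell
# Hypothetical wrapper: print commands by default; execute them only if DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Hypothetical helper: turn the listed devices into PVs, pool them into VG,
# create LV of SIZE, then format it (ext4 assumed) and mount it at MNT.
provision_lv() (
    vg=$1
    lv=$2
    size=$3
    mnt=$4
    shift 4
    run pvcreate "$@"
    run vgcreate "$vg" "$@"
    run lvcreate --name "$lv" --size "$size" "$vg"
    run mkfs.ext4 "/dev/$vg/$lv"
    run mkdir -p "$mnt"
    run mount "/dev/$vg/$lv" "$mnt"
)

provision_lv foo bar 4G /mnt/bar /dev/sda /dev/sdb /dev/sdc
```

Running the same function as root with DRY_RUN=0 would execute the commands for real and destroy any existing data on the listed devices.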
One of the main advantages of logical volumes over disc partitions is that they can be resized very easily. This is done using the lvresize command, or alternatively using the lvextend and lvreduce commands. Note that if the volume contains existing data such as a filesystem, then this must be resized separately:
- When reducing, the filesystem must be reduced first; otherwise you are likely to suffer partial or complete data loss.
- When expanding, the filesystem must be expanded afterwards in order to make the extra capacity usable.
The same applies to any other type of data on the volume that you wish to preserve. Note that some filesystems can be resized while mounted, some only when unmounted, and some not at all.
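The ordering rule can be captured in a small helper. This sketch assumes an ext2/3/4 filesystem, so that resize2fs is the resizing tool. It also assumes the filesystem is unmounted when shrinking, since ext4 cannot be shrunk while mounted. The function name and the run/DRY_RUN wrapper are hypothetical. lvresize also has a --resizefs option that performs both steps in the right order via fsadm, which may be preferable where it is available.

```shell
# Hypothetical wrapper: print commands by default; execute them only if DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Hypothetical helper: resize logical volume LV to NEW_SIZE so that the
# (ext2/3/4) filesystem never extends past the end of the volume.
#   shrink: check fs, shrink fs, then shrink the volume.
#   grow:   grow the volume, then grow fs to fill it.
resize_ext_lv() (
    lv=$1
    new_size=$2
    direction=$3
    case $direction in
        shrink)
            run e2fsck -f "$lv"
            run resize2fs "$lv" "$new_size"
            run lvreduce --size "$new_size" "$lv"
            ;;
        grow)
            run lvextend --size "$new_size" "$lv"
            run resize2fs "$lv"
            ;;
        *)
            echo "direction must be 'shrink' or 'grow'" >&2
            exit 2
            ;;
    esac
)

resize_ext_lv /dev/foo/bar 3G shrink
```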
A list of available logical volumes can be obtained using either the lvs command (for a summary) or the lvdisplay command (for a detailed description of each).
Each logical volume is presented as a block device with a pathname of the form /dev/foo/bar, where foo is the volume group and bar is the logical volume. This can then be formatted and mounted in the same way as any other block device. (It will also be accessible through a second pathname, of the form /dev/mapper/foo-bar, but this is merely a side-effect of how LVM uses the Device Mapper to provide some of its functionality.)
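The relationship between the two pathnames is mechanical. The Device Mapper name joins the volume group and logical volume names with a single hyphen, and any hyphen inside either name is doubled so that the boundary stays unambiguous. The helper below, whose name is hypothetical, sketches that mapping.

```shell
# Hypothetical helper: print both pathnames under which logical volume LV
# in volume group VG is accessible.
lv_paths() (
    vg=$1
    lv=$2
    # Double every hyphen in each component, as Device Mapper naming does.
    dm_vg=$(printf '%s' "$vg" | sed 's/-/--/g')
    dm_lv=$(printf '%s' "$lv" | sed 's/-/--/g')
    printf '/dev/%s/%s\n' "$vg" "$lv"
    printf '/dev/mapper/%s-%s\n' "$dm_vg" "$dm_lv"
)

lv_paths foo bar          # /dev/foo/bar and /dev/mapper/foo-bar
lv_paths ubuntu-vg root   # /dev/ubuntu-vg/root and /dev/mapper/ubuntu--vg-root
```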
See:
- Create a logical volume using LVM
- Increase the size of an LVM logical volume
- Reduce the size of an LVM logical volume
A snapshot is a copy of a logical volume with the following characteristics:
- It is writable (unless you explicitly configure it to be read-only).
- Changes to the original do not affect the snapshot, nor vice versa.
- Physical copying occurs only when and where the snapshot and the original diverge; otherwise, their content is shared.
- The snapshot derives from the original as it was at a single moment in time.
Snapshots are created using the -s option of the lvcreate command, for example:
lvcreate -s --name qux --size 4G /dev/foo/bar
(This would create a snapshot named qux of the logical volume bar in the volume group foo. Snapshots must be located in the same volume group as the original to which they refer.)
Common uses of snapshots include:
- creating a frozen copy of a filesystem so that a consistent backup can be made;
- making a disposable copy of a virtual machine image.
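For the backup case, a typical sequence is to create the snapshot, mount it read-only, and copy from it. The snapshot is then unmounted and removed so that it stops consuming space. The sketch below prints those steps. The mount point /mnt/snap, the archive path under /backup, the use of tar, and the function name are all assumptions for illustration. Some filesystems need extra mount options for a snapshot; XFS, for example, typically needs nouuid while the original is mounted.

```shell
# Hypothetical wrapper: print commands by default; execute them only if DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Hypothetical helper: back up VG/LV by way of a temporary snapshot SNAP.
backup_via_snapshot() (
    vg=$1
    lv=$2
    snap=$3
    snap_size=$4
    mnt=/mnt/snap               # hypothetical mount point
    archive=/backup/$lv.tar.gz  # hypothetical destination
    run lvcreate -s --name "$snap" --size "$snap_size" "/dev/$vg/$lv"
    run mkdir -p "$mnt"
    run mount -o ro "/dev/$vg/$snap" "$mnt"
    run tar -C "$mnt" -czf "$archive" .
    run umount "$mnt"
    run lvremove "/dev/$vg/$snap"
)

backup_via_snapshot foo bar qux 4G
```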
The technique of sharing unmodified content is known as ‘copy-on-write’. This avoids the need for any immediate bulk copying when creating a snapshot. Some copying of metadata is necessary, but this is normally several orders of magnitude smaller than the data itself. This often makes snapshotting practicable in circumstances where a full copy would take too long.
Perhaps more important than the elapsed time is the fact that snapshot creation is logically instantaneous. This removes the risk of the original volume changing part way through the process, which could otherwise result in an inconsistent copy being made (part relating to one moment in time and part to another).
Whether the content is in a consistent state to begin with depends on what it is being used for. Most filesystems try to maintain the disc image in a state that is at least recoverable if it is not cleanly unmounted. LVM will also attempt to suspend locally-mounted filesystems before snapshotting them in order to obtain a clean copy, but it does not have the ability to do this in all circumstances where it would be desirable.
The storage capacity allocated to a snapshot can be smaller than the original volume from which it is derived. This does not cause the snapshot to be truncated; rather, it limits the extent to which the snapshot and the original can diverge. If this limit is exceeded then the snapshot will become permanently inoperable. For snapshots which need to exist for a short time only (for example, those used for making backups) an allocation of 1 to 5% of the full volume size is often sufficient provided that you are willing to accept a finite risk of failure. Otherwise, setting the capacity equal to the size of the original volume ensures that the snapshot cannot run out of space.
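As a worked example of the sizing rule, the helper below turns a percentage of the origin volume into a size in MiB. It rounds up so that a small percentage never yields a zero-sized snapshot. The function name is hypothetical. The 5% figure is simply the upper end of the range suggested above.

```shell
# Hypothetical helper: print the snapshot size in MiB for an origin of
# ORIGIN_MIB at PERCENT, rounded up to a whole MiB.
snapshot_size_mib() (
    origin_mib=$1
    percent=$2
    echo $(( (origin_mib * percent + 99) / 100 ))
)

# A 100GiB (102400MiB) origin at 5% needs a 5120MiB snapshot.
size=$(snapshot_size_mib 102400 5)
echo "lvcreate -s --name qux --size ${size}M /dev/foo/bar"
```

lvcreate can also express a snapshot's size directly as a proportion of its origin (for example --extents 5%ORIGIN). Check lvcreate(8) for your version. If you choose a small allocation, it is worth watching the usage column for the snapshot in the output of lvs (Data% in current releases, Snap% in older ones).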
LVM manages storage in units called ‘extents’. The default extent size is 4MiB, but a different value can be chosen when a volume group is created (and it is often useful to do so).
The extent size must be a power of two. Once chosen, it is difficult to change without recreating the relevant volume group from scratch. For this reason it is worth giving some thought to the choice of extent size when creating a new volume group.
The author’s recommendation would be to aim for an extent size that divides the volume group into a few thousand extents. Much larger than this and you risk losing significant amounts of storage capacity to rounding; much smaller and the extra bookkeeping costs are likely to outweigh any benefit from increased granularity.
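The recommendation can be turned into a rule of thumb. Start from the 4MiB default and keep doubling until the volume group divides into no more than some upper limit of extents. The sketch below uses 4096 as that limit, which is one interpretation of "a few thousand" and not a figure from LVM itself. The function name is hypothetical. The chosen value would then be passed to the -s (--physicalextentsize) option of vgcreate.

```shell
# Hypothetical helper: suggest a power-of-two extent size (in MiB) for a
# volume group of VG_MIB, never below LVM's 4MiB default, such that the
# group holds at most MAX_EXTENTS extents (assumed 4096 unless given).
choose_extent_mib() (
    vg_mib=$1
    max_extents=${2:-4096}
    ext=4
    # Round the extent count up, conservatively treating a partial extent as whole.
    while [ $(( (vg_mib + ext - 1) / ext )) -gt "$max_extents" ]; do
        ext=$(( ext * 2 ))
    done
    echo "$ext"
)

# A 100GiB (102400MiB) group gets 32MiB extents, i.e. 3200 of them.
ext=$(choose_extent_mib 102400)
echo "vgcreate -s ${ext}M foo /dev/sda /dev/sdb /dev/sdc"
```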
Striping refers to the practice of distributing the content of a logical volume over two or more physical volumes so that sufficiently large read and write operations are evenly spread over the corresponding physical devices.
This has the advantage that, with the right pattern of usage, throughput to and from the logical volume can be higher than any individual device would be able to provide. The main drawback is that the content of the logical volume is much less likely to be recoverable if one of the physical devices were to fail.
The amount of contiguous data that is written before switching to a different device is called the ‘stride length’. The optimum stride length is a balance between two competing considerations:
- If strides are too long then medium-size read and write operations are less likely to span multiple devices, and thereby benefit from increased bandwidth.
- If strides are too short, then small read and write operations (which are more likely to be limited by latency than bandwidth) may be delayed by having to wait for a greater number of devices to become ready.
It is a particularly bad idea to use a stride length that is smaller than the block size of the underlying physical volumes, as this can result in read-modify-write operations that would not otherwise be necessary. If you decide to use striping, the author’s recommendation would be to use a stride length of between 1 and 4 times the underlying physical block length. Be aware that the block length a device presents to the operating system is not always equal to the length that it uses internally.
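One way to apply that recommendation is to check a candidate stride against the device's reported physical block size. On Linux this can be read from /sys/block/DEVICE/queue/physical_block_size, subject to the caveat above. The hypothetical helper below tests that the stride is a power of two and lies between one and four times the block size. In LVM terms the stride corresponds to what lvcreate calls the stripe size. That is set with -I (in KiB by default), alongside -i for the number of stripes.

```shell
# Hypothetical helper: succeed if STRIDE_BYTES is a power of two
# between 1x and 4x BLOCK_BYTES.
stride_ok() (
    stride=$1
    block=$2
    [ "$stride" -gt 0 ] || exit 1
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    [ $(( stride & (stride - 1) )) -eq 0 ] || exit 1
    [ "$stride" -ge "$block" ] && [ "$stride" -le $(( 4 * block )) ]
)

block=4096   # e.g. the value read from /sys/block/sda/queue/physical_block_size
for stride in 2048 8192 12288 32768; do
    if stride_ok "$stride" "$block"; then
        echo "$stride: ok, e.g. lvcreate -i 2 -I $(( stride / 1024 )) --name bar --size 4G foo"
    else
        echo "$stride: not recommended"
    fi
done
```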
LVM detects existing physical volumes by scanning for block devices that contain a volume label. On modern systems this usually happens automatically at boot time and when new devices become available. You can request a rescan using the pvscan command, but this should rarely be necessary.
When a new physical volume is detected, the list of available volume groups is updated automatically. You can request a rescan using the vgscan command, but as with pvscan you should rarely need to do this.
Volume groups must be activated for the logical volumes within them to become accessible. This process is analogous to mounting a filesystem, except that the default behaviour is to activate all volume groups automatically. If required, you can manually activate or deactivate a volume group using the vgchange command:
vgchange -a y /dev/foo
vgchange -a n /dev/foo
(There is also an lvscan command, but it is really the discovery and activation of volume groups which governs the availability of logical volumes.)
Some devices should not be scanned for physical volumes. For example, if a physical volume were created on a software RAID device, then the volume label would be visible both on that device and on one or more of the underlying devices that make up the array. This can be prevented by blacklisting any devices that should not be scanned using the filter option in the devices section of lvm.conf. Alternatively, in the particular case of the MD subsystem, the md_component_detection option (in the same section) will cause any component devices to be skipped automatically.
A standard installation of LVM is not cluster-aware, but there is an extension called CLVM (the Clustered Logical Volume Manager) which provides this capability. With CLVM it is possible to safely use a volume group located on a shared storage device from several machines at the same time. However, since there is significant administrative complexity in running any form of cluster, normal practice is to leave CLVM disabled unless you have a specific need for it.
Note that CLVM does not by itself make it safe to use individual logical volumes from multiple machines. For that you also need a cluster-aware filesystem such as GFS or OCFS2.
It is safe to activate a volume group on multiple machines without using CLVM provided that you refrain from making any changes to the metadata while it is multiply-activated. Successful use of this technique requires some care because you effectively need to do the work of CLVM manually (and with nothing to protect you from the consequences of any mistakes).