Enterprise Volume Management System
EVMS 2.0 Architecture Overview
The Enterprise Volume Management System (EVMS) provides a framework for unifying all aspects of volume management. The architecture uses a plug-in model that allows for easy expansion and customization of various levels of volume management.
Visit the EVMS project hosted by SourceForge™ at http://evms.sourceforge.net/.
EVMS runs in user space. The EVMS Engine is a shared object that provides APIs for configuring the system. EVMS has several user interface programs that invoke Engine APIs.
The Engine provides a pluggable framework. The plug-ins do the actual work of discovering and configuring the particular volume management schemes that they handle. The Engine coordinates the commands received from the external APIs with commands to the plug-ins.
2.1.1 Why New Terms?
Different volume management implementations use different terms for their components. Sometimes a term used in one volume management scheme can mean something different in another volume management scheme. For example, consider the Multi-Disk (MD) driver, which implements RAID devices, and the Logical Volume Manager (LVM).
MD takes in disks or devices and exports volumes. A disk can be any block device, such as a physical disk, a partition, or a volume exported by MD.
LVM takes in physical volumes (PVs) and exports groups, and from the groups it exports logical volumes (LVs). A PV can be any block device, such as a physical disk, a partition, a volume exported by MD, or an LV.
As you can see, the terms are inconsistent even between two volume management schemes. Both MD and LVM take in block devices; MD calls them disks, and LVM calls them physical volumes. In either case the input doesn't have to be a disk, nor does it have to come straight from a physical device. It could be any block device, even a device exported by MD or by LVM. Both MD and LVM are said to export volumes. In reality, they export block devices, which may or may not become Linux volumes: an exported block device could be used as input to another volume manager, in which case it would not be a Linux volume.
Because of the different terms used to describe the components in different volume management schemes, we developed a set of terms specific to EVMS. The terms are intended to describe the various components of EVMS and not conflict with terms used by other volume management schemes.
2.1.2 The Terms
The following list defines general terms used with EVMS.
2.2 Storage Objects
The descriptions of storage objects above hinted at a hierarchy between the different types. The different types of storage objects (disks, segments, regions, and EVMS objects) constitute different layers in the EVMS architecture. The objects in each layer can be composed of objects in their own layer or any layer beneath them.
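The layering rule can be sketched in a few lines of Python. This is a toy model, not Engine code: the Layer enum and the can_build_on() helper are hypothetical names, and the bottom-to-top ordering is assumed to follow the order in which the layers are listed above.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Storage object layers, ordered from bottom (0) to top (assumed order)."""
    DISK = 0
    SEGMENT = 1
    REGION = 2
    EVMS_OBJECT = 3


def can_build_on(parent: Layer, child: Layer) -> bool:
    """True if an object in the parent's layer may be made from an object in
    the child's layer, that is, the same layer or any layer beneath it."""
    return child <= parent


print(can_build_on(Layer.REGION, Layer.SEGMENT))  # True: region on a segment
print(can_build_on(Layer.REGION, Layer.REGION))   # True: region on a region
print(can_build_on(Layer.SEGMENT, Layer.REGION))  # False: segment on a region
```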
Containers are used to group together a set of storage objects. New storage objects can then be created from the group. Containers "consume" storage objects (the objects which the container comprises) and "produce" storage objects, as illustrated in the following diagram.
In this example, the container
Note that containers themselves are not mountable, that is, there is no block device for the container. They are merely an abstraction for a group of storage objects.
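A minimal Python sketch of the consume/produce relationship follows. The class and field names are illustrative assumptions rather than the Engine's actual data structures. Note that the Container class carries no volume or device node of its own.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class StorageObject:
    name: str
    consuming_container: Optional["Container"] = None  # container that uses this object
    producing_container: Optional["Container"] = None  # container this object came from


@dataclass(eq=False)
class Container:
    """Groups storage objects; it is only bookkeeping, not a block device."""
    name: str
    objects_consumed: List[StorageObject] = field(default_factory=list)
    objects_produced: List[StorageObject] = field(default_factory=list)

    def consume(self, obj: StorageObject) -> None:
        obj.consuming_container = self
        self.objects_consumed.append(obj)

    def produce(self, name: str) -> StorageObject:
        obj = StorageObject(name, producing_container=self)
        self.objects_produced.append(obj)
        return obj


group = Container("lvm/group1")
group.consume(StorageObject("md/md1"))
reg1 = group.produce("lvm/group1/reg1")

print([o.name for o in group.objects_consumed])  # ['md/md1']
print(reg1.producing_container.name)             # lvm/group1
```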
Volumes can be made from any storage object — a disk, a segment, a region, or an EVMS feature — as shown in the following diagram.
EVMS maintains a distinction between storage objects and volumes. In other volume management schemes the creation of the block device also creates the volume (device node) so that the block device can be mounted. In EVMS when the user creates a storage object EVMS does not automatically create a volume for the storage object. The user must explicitly tell EVMS to make a storage object into a volume.
The reason for keeping volumes separate from storage objects is to prevent device nodes from being made for intermediate nodes in a volume stack. For example, the user may RAID together a set of disks, make an LVM group from the RAID, and then carve out volumes from the LVM group. If EVMS made a volume for every storage object, there would be a volume for the RAID. The user should not be able to mount the RAID since it is being used by LVM. Dangerous things can happen if the user is allowed to do I/O to the RAID without going through LVM.
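The separation can be illustrated with a small Python model. Everything here is hypothetical: make_volume() is not an Engine API, and the rule that an object with parents is refused is an assumption made to mirror the RAID example above. The container between the RAID and the region is left out for brevity.

```python
class StorageObject:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)  # objects this one is built from
        self.parents = []               # objects built on top of this one
        self.volume = None              # no device node until explicitly requested
        for child in self.children:
            child.parents.append(self)


def make_volume(obj, volume_name):
    """Explicitly turn a storage object into a volume (assumed rule:
    only an object that nothing else is built on may become a volume)."""
    if obj.parents:
        users = ", ".join(p.name for p in obj.parents)
        raise ValueError(f"{obj.name} is in use by {users}")
    obj.volume = volume_name
    return volume_name


disk_a = StorageObject("sda")
disk_b = StorageObject("sdb")
raid = StorageObject("md/md0", [disk_a, disk_b])
region = StorageObject("lvm/group1/reg1", [raid])

print(region.volume)  # None: creating the object did not create a volume
make_volume(region, "/dev/evms/lvm/group1/reg1")

try:
    make_volume(raid, "/dev/evms/md/md0")
except ValueError as err:
    print(err)        # md/md0 is in use by lvm/group1/reg1
```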
2.5 Data example
The illustration below shows an example of the data structures built to represent a system configuration.
The system has four disks — hda, hdb, sda, and sdb.
Disk hda is partitioned into three segments: hda1, hda5, and hda6. Disk hdb is partitioned into two segments: hdb5 and hdb6. Disk sda is partitioned into two segments: sda5 and sda6. In each case, the segment manager for the disk produces the segments from the disk. Each of the segments has its corresponding disk listed as its child. Each of the disks has the segments produced from the disk listed as its parents. For example, disk sda has segments sda5 and sda6 in its parent list. Disk sdb is not partitioned.
Segments hda5 and hdb5 are combined to form the region md/md0 (perhaps a RAID1 mirror). md/md0 has segments hda5 and hdb5 in its child list. Segments hda5 and hdb5 each have md/md0 in their parent list. Similarly, segments hda6, hdb6, and sda5 are combined to form region md/md1 (perhaps a RAID5 array).
Region md/md1 is placed in container lvm/group1. Region md/md1 has container lvm/group1 listed as its consuming container. Container lvm/group1 has region md/md1 in its list of objects consumed.
Regions lvm/group1/reg1 and lvm/group1/reg2 are produced from container lvm/group1. Regions lvm/group1/reg1 and lvm/group1/reg2 each have container lvm/group1 listed as their producing container. Container lvm/group1 has regions lvm/group1/reg1 and lvm/group1/reg2 in its list of objects produced.
The LVM plug-in, which manages container lvm/group1 and regions lvm/group1/reg1 and lvm/group1/reg2, will set up the parent and child objects lists for regions lvm/group1/reg1, lvm/group1/reg2, and md/md1. Since both of the produced regions reside on the consumed region, the LVM plug-in will put region md/md1 into the child objects lists of regions lvm/group1/reg1 and lvm/group1/reg2. Similarly, the LVM plug-in puts regions lvm/group1/reg1 and lvm/group1/reg2 into the parent objects list of region md/md1.
Segment sda6 has the Bad Block Relocation feature (BBR) applied on it. Feature object BBR_sda6 has segment sda6 in its child object list. Segment sda6 has BBR_sda6 in its parent objects list. Similarly, disk sdb has the BBR feature applied. Feature object BBR_sdb has disk sdb in its child object list. Disk sdb has BBR_sdb in its parent objects list.
Feature objects BBR_sda6 and BBR_sdb are combined together by the Drive Linking feature to produce feature object drive_link_object. Feature object drive_link_object has feature objects BBR_sda6 and BBR_sdb in its child objects list. Feature objects BBR_sda6 and BBR_sdb each have the feature object drive_link_object in their parent objects list.
Segment hda1 is made into compatibility volume /dev/evms/hda1. Region md/md0 is made into compatibility volume /dev/evms/md/md0. Regions lvm/group1/reg1 and lvm/group1/reg2 are made into compatibility volumes /dev/evms/lvm/group1/reg1 and /dev/evms/lvm/group1/reg2 respectively. Feature object drive_link_object is made into EVMS volume /dev/evms/Data.
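The configuration described in this example can be rebuilt as a toy object graph in Python to check the parent and child lists. The classes below are illustrative assumptions, not the Engine's structures, and the volume path for the second LVM region is assumed to end in reg2.

```python
class StorageObject:
    """A disk, segment, region, or feature object with parent/child lists."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)  # objects this one is built from
        self.parents = []               # objects built from this one
        self.consuming_container = None
        self.producing_container = None
        for child in self.children:
            child.parents.append(self)


class Container:
    def __init__(self, name, consumed):
        self.name = name
        self.objects_consumed = list(consumed)
        self.objects_produced = []
        for obj in self.objects_consumed:
            obj.consuming_container = self

    def produce(self, name):
        # As in the text, a produced region lists the consumed object(s)
        # it resides on as its children.
        region = StorageObject(name, self.objects_consumed)
        region.producing_container = self
        self.objects_produced.append(region)
        return region


def names(objs):
    return [o.name for o in objs]


disks = {n: StorageObject(n) for n in ("hda", "hdb", "sda", "sdb")}
layout = {"hda": ("hda1", "hda5", "hda6"),
          "hdb": ("hdb5", "hdb6"),
          "sda": ("sda5", "sda6")}
seg = {s: StorageObject(s, [disks[d]]) for d, parts in layout.items() for s in parts}

md0 = StorageObject("md/md0", [seg["hda5"], seg["hdb5"]])
md1 = StorageObject("md/md1", [seg["hda6"], seg["hdb6"], seg["sda5"]])
group1 = Container("lvm/group1", [md1])
reg1 = group1.produce("lvm/group1/reg1")
reg2 = group1.produce("lvm/group1/reg2")
bbr_sda6 = StorageObject("BBR_sda6", [seg["sda6"]])
bbr_sdb = StorageObject("BBR_sdb", [disks["sdb"]])
drive_link = StorageObject("drive_link_object", [bbr_sda6, bbr_sdb])

volumes = {
    "/dev/evms/hda1": seg["hda1"],
    "/dev/evms/md/md0": md0,
    "/dev/evms/lvm/group1/reg1": reg1,
    "/dev/evms/lvm/group1/reg2": reg2,
    "/dev/evms/Data": drive_link,
}

print(names(disks["sda"].parents))  # ['sda5', 'sda6']
print(names(md1.parents))           # ['lvm/group1/reg1', 'lvm/group1/reg2']
print(names(disks["sdb"].parents))  # ['BBR_sdb']
```

In this model every volume sits on an object with an empty parent list, which matches the earlier point that intermediate objects in a stack do not get device nodes.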
The architecture for the code falls along the same lines as the data architecture. Plug-ins in each layer create objects from their own layer or layers below.
The Engine is a shared object with external APIs that user interfaces call. The Engine also has interfaces for communicating with the plug-ins. The Engine converts the initiating call of the external APIs into the appropriate calls to the plug-ins.
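The following Python sketch shows the general shape of that conversion: one external entry point that looks up the responsible plug-in and forwards the request. All names here (Engine.create_object, Plugin.create, the registry dictionary) are hypothetical and are not the Engine's real API.

```python
class Plugin:
    """Stand-in for a plug-in that does the actual work."""
    def __init__(self, name):
        self.name = name
        self.calls = []  # record of requests received from the Engine

    def create(self, inputs):
        self.calls.append(("create", tuple(inputs)))
        return f"{self.name}/object{len(self.calls)}"


class Engine:
    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        self._plugins[plugin.name] = plugin

    def create_object(self, plugin_name, inputs):
        """External API (hypothetical): translate the request into a plug-in call."""
        plugin = self._plugins.get(plugin_name)
        if plugin is None:
            raise KeyError(f"no plug-in named {plugin_name!r}")
        return plugin.create(inputs)


engine = Engine()
md = Plugin("md")
engine.register(md)

print(engine.create_object("md", ["hda5", "hdb5"]))  # md/object1
print(md.calls)  # [('create', ('hda5', 'hdb5'))]
```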
3.2.1 External Interface
EVMS provides several user interface programs that communicate with the Engine through its external API.
The external APIs are defined in
3.2.2 Internal Interfaces
The Engine communicates with the plug-ins through several interfaces.
All of the internal interfaces are defined in
3.2.2.1 Engine Services
The Engine provides a variety of services that the plug-ins use to do their work: allocating and freeing storage objects, name registry services, device-mapper services, clustering services, logging and messaging services, and services for doing I/O to storage objects and volumes, to name a few.
3.2.2.2 Storage Object Interface
The Engine communicates with device manager, segment manager, region manager, and EVMS feature plug-ins through the storage object interface.
The storage object interface contains a variety of commands for querying and manipulating storage objects.
For example, it includes such functions as
3.2.2.3 Container Interface
A plug-in that manages storage objects may also implement containers, but it is not required.
If a plug-in supports the container interface the Engine will call the plug-in's container functions in response to the user invoking container type external APIs, such as
3.2.2.4 FSIM Interface
A File System Interface Module (FSIM, pronounced "eff-sim") plug-in provides an interface to a file system's utilities, such as mkfs and fsck, and to operations such as expanding and shrinking the file system.
FSIMs only operate on volumes.
The Engine communicates with the FSIMs in response to the user invoking volume type external APIs, such as
3.2.2.5 Cluster Manager Interface
Cluster manager plug-ins provide services for the Engine to work in a clustered environment, such as
3.2.3 The Plug-ins
The following diagram shows the EVMS plug-ins and how they fit into the architecture.
The device manager, segment manager, region manager, and EVMS feature plug-ins appear in their respective layers of the architecture.
The FSIMs are placed on top, although they really don't work on storage objects. FSIMs work on volumes which are conceptually above storage objects.
The cluster manager plug-ins are listed on the side. They provide cluster support services; they don't work with storage objects or volumes. The Engine works with the cluster manager plug-ins to implement the support for a clustered environment.
The Replace plug-in is in a class by itself. When the Engine replaces an object, it copies the contents of the source object to the target object and then puts the target object in the place of the source object in the data structures. The Replace plug-in helps the Engine implement the replace function.
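The two steps of a replace, copying the contents and then swapping the target into the source's position, can be sketched in Python as follows. This is a simplified model under stated assumptions: replace_object() is a hypothetical helper, object contents are represented by a bytes value, and the sketch says nothing about how the Replace plug-in itself is implemented.

```python
class StorageObject:
    def __init__(self, name, data=b"", children=()):
        self.name = name
        self.data = data                # stand-in for the object's contents
        self.children = list(children)
        self.parents = []
        for child in self.children:
            child.parents.append(self)


def replace_object(source, target):
    # Step 1: copy the contents of the source object to the target object.
    target.data = source.data
    # Step 2: put the target where the source was in the data structures.
    for parent in source.parents:
        parent.children = [target if c is source else c for c in parent.children]
        target.parents.append(parent)
    source.parents = []


hda5 = StorageObject("hda5", b"mirror-half-A")
hdb5 = StorageObject("hdb5", b"mirror-half-B")
sda5 = StorageObject("sda5")
md0 = StorageObject("md/md0", children=[hda5, hdb5])

replace_object(hdb5, sda5)
print([c.name for c in md0.children])  # ['hda5', 'sda5']
print(sda5.data)                       # b'mirror-half-B'
```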
3.3 Code Flow
At a high level, a typical session with the Engine has four stages, and several of these stages involve interaction with plug-ins. The Engine flow includes the following stages:
3.3.1 Open the Engine
The Engine is opened by a call to its evms_open_engine() function. When the Engine is opened it does some basic initialization such as opening the log file, allocating data structures, and ensuring there is a device node for communicating with device-mapper in the kernel. Once that setup is complete, the Engine performs the following functions to get up and running:
Once the Engine is opened, its APIs can be called to manipulate the system configuration. The Engine handles some of the APIs itself, but most of the APIs are translated into calls to the plug-ins that accomplish the work. Some of the functions that the Engine API provides are:
All of the changes are performed on the in-memory copy of the objects. None of the changes are written to disk until the evms_commit_changes() API is called. Keeping the changes in memory allows the user to experiment with various configurations without affecting the system.
3.3.3 Commit changes
When the Engine's evms_commit_changes() API is called, the Engine coordinates with the plug-ins to get the necessary data written to disk for the new system configuration.
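The split between in-memory changes and the commit step can be modeled in Python as shown below. Only the idea is taken from the text. The class names, the dictionary-based object records, and the per-plug-in commit() method are assumptions, and on_disk is just a dictionary standing in for metadata written to disk.

```python
import copy


class Plugin:
    def __init__(self, name):
        self.name = name
        self.on_disk = {}  # stand-in for metadata this plug-in has written

    def commit(self, objects):
        # Write only the objects that this plug-in manages.
        mine = {n: o for n, o in objects.items() if o["plugin"] == self.name}
        self.on_disk = copy.deepcopy(mine)


class Engine:
    def __init__(self, plugins):
        self.plugins = list(plugins)
        self.objects = {}  # in-memory copy of the configuration

    def create_object(self, name, plugin_name, size_mb):
        self.objects[name] = {"plugin": plugin_name, "size_mb": size_mb}

    def delete_object(self, name):
        del self.objects[name]

    def commit_changes(self):
        # Coordinate with every plug-in to get the new configuration written.
        for plugin in self.plugins:
            plugin.commit(self.objects)


md = Plugin("md")
engine = Engine([md])
engine.create_object("md/md0", "md", 1024)
engine.create_object("md/md1", "md", 2048)
engine.delete_object("md/md1")  # experimenting costs nothing before commit

print(md.on_disk)               # {}
engine.commit_changes()
print(md.on_disk)               # {'md/md0': {'plugin': 'md', 'size_mb': 1024}}
```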
3.3.4 Close the Engine
The Engine is closed by a call to its evms_close_engine() function. In the process of closing, the Engine calls each of the plug-ins to do its cleanup for exiting. The Engine then frees up its own data structures and returns from the call to evms_close_engine().
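The ordering on close (plug-in cleanup first, then the Engine's own teardown) can be sketched as follows. This is a toy Python model with hypothetical class and method names, and a shared list records the sequence of events for inspection.

```python
class Plugin:
    def __init__(self, name, log):
        self.name = name
        self.log = log

    def cleanup(self):
        self.log.append(f"{self.name}: cleanup")


class Engine:
    def __init__(self, plugins, log):
        self.plugins = list(plugins)
        self.log = log
        self.objects = None  # allocated on open, released on close

    def open(self):
        self.objects = {}
        self.log.append("engine: open")

    def close(self):
        # Each plug-in cleans up before the Engine frees its own structures.
        for plugin in self.plugins:
            plugin.cleanup()
        self.objects = None
        self.log.append("engine: closed")


log = []
engine = Engine([Plugin("md", log), Plugin("lvm", log)], log)
engine.open()
engine.close()
print(log)  # ['engine: open', 'md: cleanup', 'lvm: cleanup', 'engine: closed']
```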