Enterprise Volume Management System

EVMS 2.0 Architecture Overview

Version 2.0.1
2003/05/01

Steve Dobbelstein
steved@us.ibm.com

1. Introduction

The Enterprise Volume Management System (EVMS) provides a framework for unifying all aspects of volume management. The architecture uses a plug-in model that allows for easy expansion and customization of various levels of volume management.

Visit the EVMS project hosted by SourceForge™ at http://evms.sourceforge.net/.

EVMS runs in user space. The EVMS Engine is a shared object that provides APIs for configuring the system. EVMS has several user interface programs that invoke Engine APIs.

The Engine provides a pluggable framework. The plug-ins do the actual work of discovering and configuring the particular volume management schemes that they handle. The Engine coordinates the commands received from the external APIs with commands to the plug-ins.

2. Data

2.1 Terminology

2.1.1 Why New Terms?

Different volume management implementations use different terms for their components. Sometimes a term used in one volume management scheme can mean something different in another volume management scheme. For example, consider the Multi-Disk (MD) driver, which implements RAID devices, and the Logical Volume Manager (LVM).

MD takes in disks or devices and exports volumes. A disk can be any block device, such as a physical disk, a partition, or a volume exported by MD.

LVM takes in physical volumes (PVs) and exports groups, and from those groups it exports logical volumes (LVs). A PV can be any block device, such as a physical disk, a partition, a volume exported by MD, or an LV.

As you can see, the terms are inconsistent even between these two volume management schemes. Both MD and LVM take in block devices. MD calls them disks; LVM calls them physical volumes. In either case the input doesn't have to be a disk, nor does it have to come straight from a physical device. It could be any block device, even a device exported by MD or by LVM. Both MD and LVM export volumes. In reality, they export block devices, which could result in Linux volumes. An exported block device could instead be used as input to another volume manager, in which case it would not be a Linux volume.

Because of the different terms used to describe the components in different volume management schemes, we developed a set of terms specific to EVMS. The terms are intended to describe the various components of EVMS and not conflict with terms used by other volume management schemes.

2.1.2 The Terms

The following list defines general terms used with EVMS.

Sector: The lowest level of addressability on a block device. This definition is in keeping with the standard meaning found in other management systems.
Logical Disk (or Disk): An ordered set of physically contiguous sectors that represents a physical device.
Disk Segment (or Segment): An ordered set of physically contiguous sectors residing on a logical disk or other disk segment. For example, DOS partitions are considered segments in the EVMS architecture.
Storage Region (or Region): An ordered set of logically contiguous sectors that may or may not be physically contiguous. The underlying mapping can be to logical disks, disk segments, or other storage regions. For example, the "volumes" exported by MD and LVM are considered regions in the EVMS architecture.
Feature Object (or feature, or EVMS feature, or EVMS object): A logically contiguous address space created from one or more disks, segments, regions, or other feature objects through the use of an EVMS feature.
Storage Object: Any memory structure in EVMS that is capable of being a block device. Disks, segments, regions, and feature objects are all storage objects.
Storage Container (or Container): A collection of storage objects. Storage containers provide a re-mapping of the collection into a new set of storage objects that the storage container exports. For example, volume groups, such as those in AIX® and Linux LVM, are considered containers in the EVMS architecture.
Logical Volume (or Volume): A mountable storage object.
EVMS Logical Volume (or EVMS Volume): A mountable storage object that has EVMS metadata on it which gives the volume a user specified name.
Compatibility Logical Volume (or Compatibility Volume): A mountable storage object that does not contain any EVMS metadata. Many plug-ins in EVMS provide support for the capabilities of other volume management schemes. Volumes that are designated as compatibility volumes are ensured to be backward compatible with that particular scheme because they do not contain any EVMS metadata.

2.2 Storage Objects

The descriptions of storage objects above hinted at a hierarchy between the different types. The different types of storage objects — disks, segments, regions, and EVMS objects — constitute different layers in the EVMS architecture. The objects in each layer can be comprised of objects in their own layer or any layer beneath them.

	Disks are at the first layer of the architecture.
	The next layer is for segments. Segments can be made from disks or from other segments.
	The third layer comprises regions. Regions can be made from disks, segments, or other regions.
	The fourth layer is for EVMS features. Features can be made from disks, segments, regions, or other features.
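The layering rule above can be expressed as a small predicate. The following is only an illustrative sketch: the enum values and the function name can_be_made_from() are hypothetical and are not taken from the EVMS headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layer numbering for this sketch; a lower value is a lower layer. */
typedef enum {
    LAYER_DISK = 1,
    LAYER_SEGMENT = 2,
    LAYER_REGION = 3,
    LAYER_FEATURE = 4
} object_layer;

/* Returns true if an object in layer `parent` may be built from an
 * object in layer `child`, following the four rules listed above. */
bool can_be_made_from(object_layer parent, object_layer child)
{
    assert(parent >= LAYER_DISK && parent <= LAYER_FEATURE);
    assert(child >= LAYER_DISK && child <= LAYER_FEATURE);

    /* Disks represent physical devices, so they are not built from other objects. */
    if (parent == LAYER_DISK)
        return false;

    /* Every other layer accepts objects from its own layer or any layer below it. */
    return child <= parent;
}
```

With this sketch, can_be_made_from(LAYER_REGION, LAYER_SEGMENT) is true, while can_be_made_from(LAYER_SEGMENT, LAYER_REGION) is false.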

2.3 Containers

Containers are used to group together a set of storage objects. New storage objects can then be created from the group. Containers "consume" storage objects (the objects which the container comprises) and "produce" storage objects, as illustrated in the following diagram.

In this example, the container group1 consumes segments hda5, hdb5 and disk hdc and produces regions users, data, temp and backup.

Note that containers themselves are not mountable, that is, there is no block device for the container. They are merely an abstraction for a group of storage objects.
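The consume and produce relationships can be sketched with two lists on the container and two back-pointers on each storage object. The structure layouts and function names below are hypothetical and greatly simplified; the real definitions are in include/enginestructs.h. The check that an object is consumed by at most one container is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJECTS 8

typedef struct container container;

typedef struct {
    const char *name;
    container *consuming_container;  /* set when a container consumes this object */
    container *producing_container;  /* set when a container produces this object */
} storage_object;

struct container {
    const char *name;
    storage_object *consumed[MAX_OBJECTS];
    size_t consumed_count;
    storage_object *produced[MAX_OBJECTS];
    size_t produced_count;
};

/* Add `obj` to the set of objects the container is built from. */
void container_consume(container *c, storage_object *obj)
{
    assert(c->consumed_count < MAX_OBJECTS);
    /* Assumption for this sketch: an object is consumed by at most one container. */
    assert(obj->consuming_container == NULL);
    c->consumed[c->consumed_count++] = obj;
    obj->consuming_container = c;
}

/* Record `obj` as a new storage object exported by the container. */
void container_produce(container *c, storage_object *obj)
{
    assert(c->produced_count < MAX_OBJECTS);
    c->produced[c->produced_count++] = obj;
    obj->producing_container = c;
}
```

Note that the container structure carries no device information of its own, which matches the point that a container is not a block device.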

2.4 Volumes

Volumes can be made from any storage object — a disk, a segment, a region, or an EVMS feature — as shown in the following diagram.

EVMS maintains a distinction between storage objects and volumes. In other volume management schemes the creation of the block device also creates the volume (device node) so that the block device can be mounted. In EVMS, when the user creates a storage object, EVMS does not automatically create a volume for it. The user must explicitly tell EVMS to make a storage object into a volume.

The reason for keeping volumes separate from storage objects is to prevent device nodes being made for intermediate nodes in a volume stack. For example, the user may RAID together a set of disks, make an LVM group from the RAID and then carve out volumes from the LVM group. If EVMS made a volume for every storage object, there would be a volume for the RAID. The user should not be able to mount the RAID since it is being used by LVM. Dangerous things can happen if the user is allowed to do I/O to the RAID without going through LVM.

2.5 Data example

The illustration below shows an example of the data structures built to represent a system configuration.

The system has four disks — hda, hdb, sda, and sdb.

Disk hda is partitioned into three segments: hda1, hda5, and hda6. Disk hdb is partitioned into two segments: hdb5 and hdb6. Disk sda is partitioned into two segments: sda5 and sda6. In each case, the segment manager for the disk produces the segments from the disk. Each of the segments has its corresponding disk listed as its child. Each of the disks has the segments produced from it listed as its parents. For example, disk sda has segments sda5 and sda6 in its parent list. Disk sdb is not partitioned.

Segments hda5 and hdb5 are combined to form the region md/md0 (perhaps a RAID1 mirror). md/md0 has segments hda5 and hdb5 in its child list. Segments hda5 and hdb5 each have md/md0 in their parent list. Similarly, segments hda6, hdb6, and sda5 are combined to form region md/md1 (perhaps a RAID5 array).

Region md/md1 is placed in container lvm/group1. Region md/md1 has container lvm/group1 listed as its consuming container. Container lvm/group1 has region md/md1 in its list of objects consumed.

Regions lvm/group1/reg1 and lvm/group1/reg2 are produced from container lvm/group1. Regions lvm/group1/reg1 and lvm/group1/reg2 each have container lvm/group1 listed as their producing container. Container lvm/group1 has regions lvm/group1/reg1 and lvm/group1/reg2 in its list of objects produced.

The LVM plug-in, which manages container lvm/group1 and regions lvm/group1/reg1 and lvm/group1/reg2, will set up the parent and child objects lists for regions lvm/group1/reg1, lvm/group1/reg2, and md/md1. Since both of the produced regions reside on the consumed region, the LVM plug-in will put region md/md1 into the child objects lists of regions lvm/group1/reg1 and lvm/group1/reg2. Similarly, the LVM plug-in puts regions lvm/group1/reg1 and lvm/group1/reg2 into the parent objects list of region md/md1.

Segment sda6 has the Bad Block Relocation feature (BBR) applied on it. Feature object BBR_sda6 has segment sda6 in its child object list. Segment sda6 has BBR_sda6 in its parent objects list. Similarly, disk sdb has the BBR feature applied. Feature object BBR_sdb has disk sdb in its child object list. Disk sdb has BBR_sdb in its parent objects list.

Feature objects BBR_sda6 and BBR_sdb are combined together by the Drive Linking feature to produce feature object drive_link_object. Feature object drive_link_object has feature objects BBR_sda6 and BBR_sdb in its child objects list. Feature objects BBR_sda6 and BBR_sdb each have the feature object drive_link_object in their parent objects list.

Segment hda1 is made into compatibility volume /dev/evms/hda1. Region md/md0 is made into compatibility volume /dev/evms/md/md0. Regions lvm/group1/reg1 and lvm/group1/reg2 are made into compatibility volumes /dev/evms/lvm/group1/reg1 and /dev/evms/lvm/group1/reg2, respectively. Feature object drive_link_object is made into EVMS volume /dev/evms/Data.
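The parent and child lists described in this example can be pictured with a small sketch. The structure and the helpers link_objects() and list_contains() are hypothetical simplifications written for illustration; they are not the Engine's actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINKS 4

typedef struct storage_object {
    const char *name;
    struct storage_object *parents[MAX_LINKS];   /* objects built on this one */
    size_t parent_count;
    struct storage_object *children[MAX_LINKS];  /* objects this one is built on */
    size_t child_count;
} storage_object;

/* Record that `parent` is built on `child`, updating both lists. */
void link_objects(storage_object *parent, storage_object *child)
{
    assert(parent->child_count < MAX_LINKS);
    assert(child->parent_count < MAX_LINKS);
    parent->children[parent->child_count++] = child;
    child->parents[child->parent_count++] = parent;
}

/* Returns 1 if `list` holds an object called `name`, otherwise 0. */
int list_contains(storage_object **list, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(list[i]->name, name) == 0)
            return 1;
    }
    return 0;
}
```

Linking md/md0 to hda5 and hdb5 with link_objects() leaves md/md0 with two children and puts md/md0 into the parent list of each segment, as in the description above.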

3. Code

3.1 Layers

The architecture for the code falls along the same lines as the data architecture. Plug-ins in each layer create objects from their own layer or layers below.

Logical Device Managers

The first layer is the logical device managers. Device managers do not consume objects from lower layers since there is no layer below them. Device managers examine the system and create disks, the first layer of storage objects. Currently, all local devices (most IDE and SCSI disks) are handled by a single plug-in, the Local Disk Manager.

Segment Managers

The second layer is the segment managers. In general, these plug-ins handle the segmenting, or partitioning, of disk drives. These Engine components can replace programs, such as fdisk and diskdruid. Segment managers can also be "stacked," meaning that one segment manager can take input from another segment manager. Segment managers consume disks or segments and produce segments.

Region Managers

The third layer is the region managers. This layer is intended to provide a place for plug-ins that ensure compatibility with existing volume management schemes in Linux or other operating systems. Region managers are intended to model systems that provide a logical abstraction above disks or segments.

As with the segment managers, region managers can also be stacked. Therefore, region managers consume disks, segments, or other regions and produce regions.

EVMS Features

The fourth layer is the EVMS features. EVMS features add functionality to objects in the lower layers. For example, the Drive Linking feature can take in objects from the lower layers, concatenate them, and export them as a single, large object. Like segment managers and region managers, EVMS features can also be stacked. Therefore, features can consume disks, segments, regions, or other feature objects and produce feature objects.

3.2 Interfaces

The Engine is a shared object with external APIs that user interfaces call. The Engine also has interfaces for communicating with the plug-ins. The Engine converts the initiating call of the external APIs into the appropriate calls to the plug-ins.

3.2.1 External Interface

EVMS provides several user interface programs which communicate to the Engine through its external API. The external APIs are defined in include/appAPI.h. The data structures used with the external APIs are defined in include/appstructs.h.

3.2.2 Internal Interfaces

The Engine communicates with the plug-ins through several interfaces. All of the internal interfaces are defined in include/plugfuncs.h. Plug-ins manipulate internal data structures in the Engine. The data structures are defined in include/enginestructs.h.

3.2.2.1 Engine Services

The Engine provides a variety of services for the plug-ins to do their work: allocating and freeing storage objects, name registry services, device-mapper services, clustering services, logging and messaging services, and services to do I/O to storage objects and volumes, to name a few.

3.2.2.2 Storage Object Interface

The Engine communicates with device manager, segment manager, region manager, and EVMS feature plug-ins through the storage object interface. The storage object interface contains a variety of commands for querying and manipulating storage objects. For example, it includes such functions as discover(), create(), delete(), expand(), shrink(), and commit_changes().
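One common way to structure such an interface in C is a table of function pointers that each plug-in fills in. The sketch below shows that idea for two of the functions named above. The signatures, the structure names, and the engine_expand() and engine_shrink() wrappers are assumptions made for illustration; the authoritative definitions are in include/plugfuncs.h.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    unsigned long long size_in_sectors;
} storage_object;

/* Hypothetical subset of a plug-in function table. */
typedef struct {
    int (*expand)(storage_object *obj, unsigned long long sectors);
    int (*shrink)(storage_object *obj, unsigned long long sectors);
} plugin_functions;

/* Demo plug-in: grows the object by the requested number of sectors. */
int demo_expand(storage_object *obj, unsigned long long sectors)
{
    obj->size_in_sectors += sectors;
    return 0;
}

/* Demo plug-in: refuses to shrink an object to zero sectors or below. */
int demo_shrink(storage_object *obj, unsigned long long sectors)
{
    if (sectors >= obj->size_in_sectors)
        return -1;
    obj->size_in_sectors -= sectors;
    return 0;
}

const plugin_functions demo_plugin = { demo_expand, demo_shrink };

/* Engine side: forward the request to the plug-in that owns the object. */
int engine_expand(const plugin_functions *plugin, storage_object *obj,
                  unsigned long long sectors)
{
    assert(plugin != NULL && obj != NULL);
    if (plugin->expand == NULL)
        return -1;  /* the plug-in does not implement expand */
    return plugin->expand(obj, sectors);
}

int engine_shrink(const plugin_functions *plugin, storage_object *obj,
                  unsigned long long sectors)
{
    assert(plugin != NULL && obj != NULL);
    if (plugin->shrink == NULL)
        return -1;  /* the plug-in does not implement shrink */
    return plugin->shrink(obj, sectors);
}
```

The design keeps the Engine free of any knowledge about a particular volume management scheme: it only needs the table of entry points.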

3.2.2.3 Container Interface

A plug-in that manages storage objects may also implement containers, but it is not required to. If a plug-in supports the container interface, the Engine calls the plug-in's container functions in response to the user invoking container-type external APIs, such as evms_create_container().
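An optional interface of this kind can be modeled as a pointer that is left NULL when the plug-in has no container support. The following sketch assumes that approach; the structure names, the engine_create_container() function, and the choice of ENOSYS as the error code are all hypothetical and may differ from what the Engine actually does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical container function table with a single entry point. */
typedef struct {
    int (*create_container)(const char *name);
} container_functions;

typedef struct {
    const char *name;
    const container_functions *container_ops;  /* NULL if containers are unsupported */
} plugin_record;

int containers_created = 0;  /* counter used only to observe the demo plug-in */

int demo_create_container(const char *name)
{
    assert(name != NULL);
    containers_created++;
    return 0;
}

const container_functions demo_container_ops = { demo_create_container };

/* Engine side: call the plug-in only if it offers the container interface. */
int engine_create_container(const plugin_record *plugin, const char *name)
{
    assert(plugin != NULL);
    if (plugin->container_ops == NULL ||
        plugin->container_ops->create_container == NULL)
        return ENOSYS;  /* assumed error code for "not supported" */
    return plugin->container_ops->create_container(name);
}
```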

3.2.2.4 FSIM Interface

A File System Interface Module (FSIM, pronounced "eff-sim") plug-in provides an interface to the file system's utilities, such as mkfs and fsck, and to operations such as expanding and shrinking the file system. FSIMs only operate on volumes. The Engine communicates with the FSIMs in response to the user invoking volume-type external APIs, such as evms_mkfs().

3.2.2.5 Cluster Manager Interface

Cluster manager plug-ins provide services for the Engine to work in a clustered environment, such as get_my_nodeid(), get_membership(), and send_msg().

3.2.3 The Plug-ins

The following diagram shows the EVMS plug-ins and how they fit into the architecture.

The device manager, segment manager, region manager, and EVMS feature plug-ins appear in their respective layers of the architecture.

The FSIMs are placed on top, although they really don't work on storage objects. FSIMs work on volumes which are conceptually above storage objects.

The cluster manager plug-ins are listed on the side. They provide cluster support services; they don't work with storage objects or volumes. The Engine works with the cluster manager plug-ins to implement the support for a clustered environment.

The Replace plug-in is in a class by itself. When the Engine replaces an object, it copies the contents of the source object to the target object and then puts the target object in the place of the source object in the data structures. The Replace plug-in helps the Engine implement the replace function.

3.3 Code Flow

At a high level, a typical session with the Engine has four stages, and several of these stages involve interaction with plug-ins. The Engine flow includes the following stages:

Open the Engine
Manipulate objects
Commit the changes
Close the Engine

The second and third stages (manipulating objects and committing the changes) can be repeated several times within an Engine session.

3.3.1 Open the Engine

The Engine is opened by a call to its evms_open_engine() function. When the Engine is opened it does some basic initialization such as opening the log file, allocating data structures, and ensuring there is a device node for communicating with device-mapper in the kernel. Once that setup is complete, the Engine performs the following functions to get up and running:

Load plug-ins: The Engine tries to load all the files in $(prefix)/lib/evms. If a file loads successfully, the Engine verifies that the file is an EVMS user space plug-in. If the file is a user space plug-in, the Engine establishes communication with the plug-in.
Discovery: Once the plug-ins are loaded, the Engine begins the discovery process. Starting from the Device Manager layer and working up, the Engine calls the plug-ins at each layer asking them to discover any of their objects from the list of objects produced by the previous layer or by previous passes through the current layer. As the plug-ins discover their objects, the view of the system objects is built in memory. When discovery is finished, the Engine is ready to handle configuration changes and returns from the call to evms_open_engine().
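The bottom-up discovery order can be sketched as a loop over the layers. In this illustration every name is hypothetical, the object list is reduced to an array of names, and the choice to repeat passes through a layer until no plug-in reports anything new is an assumption about how "previous passes through the current layer" might work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OBJECTS 16

typedef enum { LAYER_DEVICE = 1, LAYER_SEGMENT, LAYER_REGION, LAYER_FEATURE } plugin_layer;

typedef struct {
    const char *names[MAX_OBJECTS];
    size_t count;
} object_list;

typedef struct {
    const char *name;
    plugin_layer layer;
    /* Appends newly found objects to `objects`; returns how many were added. */
    size_t (*discover)(object_list *objects);
} plugin;

int has_object(const object_list *objects, const char *name)
{
    for (size_t i = 0; i < objects->count; i++)
        if (strcmp(objects->names[i], name) == 0)
            return 1;
    return 0;
}

/* Adds `name` unless it is already known; returns the number of objects added. */
size_t add_object(object_list *objects, const char *name)
{
    if (has_object(objects, name))
        return 0;
    assert(objects->count < MAX_OBJECTS);
    objects->names[objects->count++] = name;
    return 1;
}

/* Demo device manager: always finds one disk. */
size_t demo_disk_discover(object_list *objects)
{
    return add_object(objects, "sda");
}

/* Demo segment manager: finds a segment only once its disk is known. */
size_t demo_segment_discover(object_list *objects)
{
    return has_object(objects, "sda") ? add_object(objects, "sda5") : 0;
}

/* Walk the layers from the bottom up, repeating each layer until it is quiet. */
size_t run_discovery(const plugin *plugins, size_t plugin_count, object_list *objects)
{
    size_t total = 0;
    for (int layer = LAYER_DEVICE; layer <= LAYER_FEATURE; layer++) {
        size_t found;
        do {
            found = 0;
            for (size_t i = 0; i < plugin_count; i++)
                if ((int)plugins[i].layer == layer)
                    found += plugins[i].discover(objects);
            total += found;
        } while (found > 0);
    }
    return total;
}
```

Because the loop is driven by the layer number, the order in which plug-ins were loaded does not matter: the demo segment manager is always asked after the demo device manager.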

3.3.2 Manipulation

Once the Engine is opened, it can be called on a variety of APIs to manipulate the system configuration. The Engine handles some of the APIs itself, but most of the APIs are translated into calls to the plug-ins that accomplish the work. Some of the functions that the Engine API provides are:

Build a new storage object from existing storage objects. For example, build a RAID0 MD region from four segments.
Create a volume from a storage object.
Put a file system on a volume.
Remove a storage object.
Build a container for a set of storage objects.
Expand a volume.
Shrink a volume.

All of the changes are performed on the in-memory copy of the objects. None of the changes are written to disk until the evms_commit_changes() API is called. Keeping the changes in memory allows the user to experiment with various configurations without affecting the system.
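The deferred-write behavior can be pictured as a "dirty" flag on each in-memory object that is cleared only at commit time. The sketch below is a simplified model with hypothetical names; it counts writes instead of touching a disk and does not reflect how the Engine and plug-ins actually track pending changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJECTS 8

typedef struct {
    const char *name;
    bool dirty;  /* changed in memory but not yet written to disk */
} storage_object;

typedef struct {
    storage_object objects[MAX_OBJECTS];
    size_t count;
    size_t writes;  /* stand-in for actual disk writes */
} engine_state;

/* Create an object in memory only; nothing is written yet. */
storage_object *engine_create_object(engine_state *e, const char *name)
{
    assert(e->count < MAX_OBJECTS);
    storage_object *obj = &e->objects[e->count++];
    obj->name = name;
    obj->dirty = true;
    return obj;
}

/* Returns true if any object has uncommitted changes. */
bool engine_changes_pending(const engine_state *e)
{
    for (size_t i = 0; i < e->count; i++)
        if (e->objects[i].dirty)
            return true;
    return false;
}

/* Write out every dirty object; returns how many objects were written. */
size_t engine_commit(engine_state *e)
{
    size_t written = 0;
    for (size_t i = 0; i < e->count; i++) {
        if (e->objects[i].dirty) {
            e->writes++;  /* a real commit would write metadata here */
            e->objects[i].dirty = false;
            written++;
        }
    }
    return written;
}
```

In this model a session that creates objects and then exits without calling engine_commit() leaves the write counter at zero, which is the property that lets a user experiment safely.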

3.3.3 Commit changes

When the Engine's evms_commit_changes() API is called, the Engine coordinates with the plug-ins to get the necessary data written to disk for the new system configuration.

3.3.4 Close the Engine

The Engine is closed by a call to its evms_close_engine() function. In the process of closing, the Engine calls each of the plug-ins to do its cleanup for exiting. The Engine then frees up its own data structures and returns from the call to evms_close_engine().