Enterprise Volume Management System

EVMS Cluster Design Document

version 2.0
Ram Pai (linuxram@us.ibm.com)

Introduction:

EVMS provides a flexible and extensible framework for volume management. Its pluggable architecture makes it easy to support virtually any volume manager configuration. With the right set of plug-in modules, EVMS can discover and create volumes that were originally created using other volume management software, on Linux or on other operating systems. This flexibility and extensibility have been augmented in yet another dimension with support for shared storage management using cluster manager plug-ins. These plug-ins extend the EVMS Engine's capabilities by enabling it to interact with any cluster manager residing on the machine.

Cluster Feature Set:

A typical cluster environment contains a set of machines sharing several disks on a SAN. EVMS allows the administrator to configure a subset of the shared disks as one manageable unit called a cluster container. Storage objects and volumes can be carved out of the space available in a cluster container, and ownership policies can be applied to the container. These ownership policies are inherited by all the storage objects and volumes residing in that container.
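As a rough illustration of what a cluster container groups together, the sketch below models a container as a C structure holding its name, type (private or shared), owning node, and member disks. The structure and field names are assumptions for illustration only; they are not the Cluster Segment Manager's actual metadata layout (see the metadata format section below).

    /* Illustrative only: assumed names, not the real CSM metadata. */
    #include <stdint.h>

    #define MAX_NAME_LEN            128
    #define MAX_DISKS_PER_CONTAINER  64

    typedef enum {
        CONTAINER_PRIVATE,   /* owned and activated by exactly one node   */
        CONTAINER_SHARED     /* importable and activated by all the nodes */
    } container_type_t;

    typedef struct {
        char             name[MAX_NAME_LEN];   /* container name          */
        container_type_t type;                 /* private or shared       */
        char             owner[MAX_NAME_LEN];  /* owning node (private)   */
        uint32_t         disk_count;           /* number of member disks  */
        char             disks[MAX_DISKS_PER_CONTAINER][MAX_NAME_LEN];
    } cluster_container_t;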

A cluster container can be owned exclusively by one particular node of the cluster. Such a container is called a private container. Volumes created from a private container can be imported and activated only by the owning cluster node. The administrator can change the ownership of a private container to another node in the cluster. This feature is very handy in high-availability configurations: when the owning node dies, one of the surviving nodes can force ownership of the container and make its volumes available from that node.

Alternatively, a cluster container can be owned by all the nodes in the cluster. Such containers are called shared cluster containers. The volumes created from shared cluster containers can be imported and accessed simultaneously by all the nodes of the cluster. Distributed databases, cluster file systems, and any other applications that can coordinate safe access to shared volumes find this feature very useful.

EVMS allows the administrator to configure the local, private, and shared storage of all cluster nodes from any node of the cluster. This gives the cluster administrator a single point of administration, instead of having to move from node to node to configure storage.

Overall Architecture of EVMS with Cluster support:

Please refer to the EVMS architecture document for a description of the basic EVMS architecture without cluster support.

The Cluster Segment Manager and the EVMS Cluster Engine (ECE) provide the additional capabilities EVMS needs to support the cluster feature set.

The following figure details the component interaction as viewed on a single node of a cluster. The Engine is a shared library that is linked with the EVMS user interfaces. To create or modify the local configuration, the Engine does not interact with the daemon; it accomplishes its job just as it would on a non-clustered machine.

However, to create or modify cluster storage objects, the Engine coordinates the task among the EVMS daemons running on all nodes of the cluster, as sketched below.
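The following is a minimal sketch, under assumed names, of what such coordination could look like from the Engine's side: obtain the current membership from the ECE, send the command to the daemon on each node, and report success only if every node acknowledges. The ece_* calls and engine_commit_cluster_change are hypothetical stand-ins, not the real interfaces.

    /* Hypothetical sketch of cluster-wide coordination by the Engine. */
    #include <stdbool.h>
    #include <stddef.h>

    typedef unsigned int nodeid_t;

    /* Assumed ECE wrappers; see the ECE API Guide for the real interfaces. */
    size_t ece_get_membership(nodeid_t *nodes, size_t max_nodes);
    bool   ece_send(nodeid_t node, const void *msg, size_t len);
    bool   ece_wait_for_ack(nodeid_t node);

    /* Push one configuration command to every daemon; succeed only if
     * every node acknowledges, otherwise the caller rolls back. */
    bool engine_commit_cluster_change(const void *cmd, size_t len)
    {
        nodeid_t nodes[32];
        size_t   count = ece_get_membership(nodes, 32);

        for (size_t i = 0; i < count; i++)
            if (!ece_send(nodes[i], cmd, len))
                return false;

        for (size_t i = 0; i < count; i++)
            if (!ece_wait_for_ack(nodes[i]))
                return false;

        return true;
    }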

The figure below provides a detailed view of the EVMS architecture as running on multiple cluster nodes.

Cluster Segment Manager plug-in:

The Cluster Segment Manager has the responsibility of managing cluster containers. It provides the capability to group a set of shared disks into a cluster container, assign ownership, and enforce the ownership policies on the cluster container. It imports and activates volumes residing on private cluster containers as well as those residing on shared containers. In addition, it is responsible for enforcing I/O fencing on the cluster containers when a node loses quorum and for resuming access when that node regains quorum. In other words, the Cluster Segment Manager acts as a gatekeeper to the shared disks, allowing I/O only when it is safe; a minimal sketch of that gatekeeping idea follows.
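The sketch below uses assumed names rather than the actual CSM code: an I/O request is passed down to the disk only while the container is allowed to do I/O, i.e. while the node holds quorum.

    /* Illustrative gatekeeper logic only; names are assumptions. */
    #include <stdbool.h>
    #include <errno.h>

    struct csm_container {
        bool io_allowed;   /* cleared on quorum loss, set again on regain */
    };

    struct io_request;                            /* opaque for this sketch */
    int submit_to_disk(struct io_request *req);   /* lower layer (assumed)  */

    /* Pass I/O down only while the node holds quorum; otherwise fail it. */
    int csm_submit_io(struct csm_container *c, struct io_request *req)
    {
        if (!c->io_allowed)
            return -EIO;           /* fenced: node has lost quorum */
        return submit_to_disk(req);
    }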

Cluster Segment Manager metadata format:

TO BE ADDED

Cluster Manager Plug-in (ECE):

The Cluster Manager Plug-in, also called the EVMS Cluster Engine (ECE), acts as a conduit between the EVMS Engine and the Cluster Manager. It transparently provides membership and messaging services to the Engine through a standard set of published ECE APIs. In other words, it acts as a gatekeeper for communication with other nodes of the cluster. These interfaces are discussed in the ECE API Guide.
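The sketch below only illustrates the general shape of those services: a membership/quorum event callback, an incoming-message callback, and a reliable point-to-point send. The identifiers are illustrative assumptions; the authoritative definitions are in the ECE API Guide.

    /* Assumed shape of the ECE services exposed to the Engine. */
    #include <stddef.h>

    typedef unsigned int ece_nodeid_t;

    typedef struct {
        size_t        num_members;
        ece_nodeid_t *members;       /* consensus membership               */
        int           have_quorum;   /* non-zero while in primary partition */
    } ece_event_t;

    typedef void (*ece_event_cb_t)(const ece_event_t *event, void *user_data);
    typedef void (*ece_msg_cb_t)(ece_nodeid_t from, const void *msg,
                                 size_t len, void *user_data);

    /* Register for membership/quorum events and incoming messages. */
    int ece_register(ece_event_cb_t ev_cb, ece_msg_cb_t msg_cb, void *user_data);

    /* Reliable point-to-point message to another cluster node. */
    int ece_send_message(ece_nodeid_t to, const void *msg, size_t len);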

EVMS Engine:

The EVMS Engine is the brain of the EVMS volume manager. It is responsible for determining which plug-in to interact with in order to accomplish a task. In addition to its many other responsibilities, the Engine also coordinates with the plug-ins on the other nodes to ensure correct creation of cluster containers and their corresponding objects.

EVMS Daemon:

The EVMS Daemon can be viewed as the EVMS engine without the brain. It loads the relevant plug-ins and waits for instructions from the Engine. In other words, it is a remote extension of the Engine.
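The sketch below illustrates that idea with assumed names: the daemon simply receives a request from the Engine, dispatches it to the appropriate plug-in, and returns the result. It is a sketch of the role, not the daemon's actual code.

    /* Illustrative daemon main loop; all names are assumptions. */
    #include <stdbool.h>

    struct request  { int plugin_id; /* ... command payload ... */ };
    struct response { int status;    /* ... result payload  ... */ };

    bool receive_request(struct request *req);              /* from the Engine */
    int  dispatch_to_plugin(const struct request *req,
                            struct response *rsp);          /* loaded plug-in  */
    void send_response(const struct response *rsp);         /* back to Engine  */

    void daemon_main_loop(void)
    {
        struct request  req;
        struct response rsp;

        /* The daemon makes no decisions of its own: it executes whatever
         * the Engine on the administering node tells it to do. */
        while (receive_request(&req)) {
            rsp.status = dispatch_to_plugin(&req, &rsp);
            send_response(&rsp);
        }
    }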

Cluster Manager:

Currently, EVMS supports the Linux-HA cluster manager. Support for IBM's RSCT cluster manager will be available soon. EVMS depends on the following services from the cluster manager:

  • Consensus membership and quorum:

    Membership is the set of nodes that are actively participating in the cluster from a given node's point of view. This per-node view can lead to confusion, because different nodes in the cluster may have different views of the membership. Consensus membership is the membership agreed upon by all the members. Network failures can lead to cluster partitioning. In a partitioned cluster, there can be multiple partitions, each having its own consensus membership view. The cluster manager ensures that only one partition has the quorum to access shared resources and fences off all the other nodes from accessing them. The cluster partition that has quorum is also called the primary partition.

    EVMS depends on the cluster manager for consensus membership and quorum service. On receipt of a quorum-loss event, EVMS fences off I/O to all volumes residing in shared containers, and resumes it when quorum is regained (see the sketch after this list). The cluster manager may STONITH (power off and reboot) the nodes that have lost quorum. EVMS provides an additional layer of safety by blocking I/O in case the cluster manager does not support fencing.

  • Reliable messaging service:

    EVMS assumes a point-to-point messaging service with the following guarantees:

    • messages are not dropped
    • messages are not duplicated
    • messages are not corrupted
  • Resource Management service:

    EVMS provides the mechanism for failing over private containers. However, it depends on the resource management service to determine when to fail over the containers.
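The sketch below illustrates the quorum handling described in the first item above, using assumed names: on a quorum-loss event, I/O to every shared container is fenced; on regain, it is resumed.

    /* Illustrative quorum event handler; all names are assumptions. */
    #include <stdbool.h>
    #include <stddef.h>

    struct shared_container { bool io_allowed; };

    /* Assumed helper: returns the node's shared containers and their count. */
    size_t get_shared_containers(struct shared_container **list);

    void on_quorum_event(bool have_quorum)
    {
        struct shared_container *containers;
        size_t n = get_shared_containers(&containers);

        for (size_t i = 0; i < n; i++)
            containers[i].io_allowed = have_quorum;   /* fence or resume I/O */
    }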

Linux-HA Cluster manager:

The Linux-HA cluster manager provides all of the above services. However, its resource management service is limited to two-node clusters. It also supports fencing through STONITH on a two-node cluster.