EVMS Cluster Design Document
version 2.0
Ram Pai (linuxram@us.ibm.com)
Introduction:
EVMS provides a flexible and extensible framework for volume management.
With its pluggable architecture, it is easy to support any volume
manager configuration imaginable. With the right set of plug-in modules
EVMS can discover and create volumes that were originally created using
other volume management software on Linux or on other operating systems.
This flexibility and extensibility have been augmented in yet
another dimension with support for shared storage management
using cluster manager plug-ins. These plug-ins extend the EVMS
Engine's capabilities by enabling it to interact with any cluster manager
residing on the machine.
Cluster Feature Set:
A typical cluster environment contains a set of machines sharing
several disks on a SAN. EVMS allows the administrator to configure
a subset of the shared disks in one manageable unit called a cluster
container. Storage objects and volumes can be carved
out from the space available in a cluster container, and ownership policies
can be applied to the container. These ownership policies
are inherited by all the storage objects and volumes residing
in that container.
A cluster container can be owned exclusively
by one particular node of the cluster. Such a container is called a
private container. This means that volumes created from that container
can only be imported and activated by the owning cluster node. The
ownership of a private container can be changed to some other node in the
cluster by the administrator. This feature is useful
in high-availability configurations: when the owning node dies, one of the
surviving nodes can force its ownership on that container and make the volumes
available from its node.
Alternatively, a cluster container can be owned by all the nodes in
the cluster. Such containers are called shared cluster containers.
The volumes created from shared cluster containers can be imported
and simultaneously accessed by all the nodes of the cluster.
Distributed databases, clustered file systems, and any other applications that
can coordinate safe access to shared volumes find this feature very useful.
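The difference between private and shared cluster containers can be illustrated with a small model. This is only a sketch: `ClusterContainer`, `can_import`, and `force_ownership` are hypothetical names chosen for illustration and are not EVMS data structures or APIs. It assumes that a container with no owner recorded is a shared one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterContainer:
    """Hypothetical model of a cluster container (not an EVMS structure)."""
    name: str
    owner: Optional[str] = None  # node name if private; None means shared (assumption)

    @property
    def is_shared(self) -> bool:
        return self.owner is None

    def can_import(self, node: str) -> bool:
        # Shared containers are open to every node; private ones only to the owner.
        return self.is_shared or self.owner == node

    def force_ownership(self, new_owner: str) -> None:
        # Failover case: a surviving node takes over a private container.
        if self.is_shared:
            raise ValueError("a shared container has no single owner to replace")
        self.owner = new_owner


private = ClusterContainer("payroll", owner="node1")
shared = ClusterContainer("scratch")
```

In this model, `private.can_import("node2")` is false until `private.force_ownership("node2")` is called, while `shared.can_import(...)` is true for every node.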
EVMS allows the administrator to configure the local, private and shared
storage of all cluster nodes from any node of the cluster. This
gives the cluster administrator a single
point of access, instead of having to visit each cluster node to configure
its storage.
Overall Architecture of EVMS with Cluster support:
Please refer to the EVMS architecture
document to understand the architecture of the basic EVMS without
cluster support.
The Cluster Segment Manager and the EVMS Cluster Engine (ECE)
provide the additional capabilities that EVMS needs to support the
cluster feature set.
The following figure details the component interaction as viewed on
a single node of a cluster. The Engine is a shared library that links
with the EVMS user interfaces. To create or modify the local configuration,
the Engine does not interact with the daemon; it does its job just as it would
on a non-clustered machine.
However, to create or modify cluster storage objects, it coordinates
the task among the EVMS daemons running on all nodes of the cluster.
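The split between local and cluster operations can be sketched as follows. The class and method names (`Daemon`, `Engine`, `execute`, `apply`) and the `clustered` flag are assumptions made for this illustration; they do not correspond to the actual Engine or daemon interfaces, and the sketch ignores messaging, ordering, and failure handling.

```python
class Daemon:
    """Hypothetical stand-in for the EVMS daemon running on one node."""

    def __init__(self, node):
        self.node = node
        self.applied = []  # tasks this daemon was asked to carry out

    def execute(self, task):
        self.applied.append(task)
        return True


class Engine:
    """Hypothetical stand-in for the Engine on the administrator's node."""

    def __init__(self, daemons):
        self.daemons = daemons
        self.local_log = []  # tasks handled without involving any daemon

    def apply(self, task, clustered):
        if not clustered:
            # Local configuration: handled as on a non-clustered machine.
            self.local_log.append(task)
            return True
        # Cluster storage objects: every daemon is asked to take part.
        # A list (not a generator) is used so that all daemons are contacted.
        return all([d.execute(task) for d in self.daemons])


engine = Engine([Daemon("node1"), Daemon("node2")])
engine.apply("create local volume", clustered=False)
engine.apply("create cluster container", clustered=True)
```

After these two calls, only the second task appears in each daemon's `applied` list; the first stays in the Engine's `local_log`.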
The figure below provides a detailed view of the EVMS architecture as it runs
on multiple cluster nodes.
The Cluster Segment Manager has the responsibility of managing cluster
containers. It provides the capability to group together a set of shared
disks into a cluster container, assign ownership and enforce
the ownership policies on the cluster container. It imports and
activates volumes residing on private cluster containers as well as
those residing on shared containers. In addition, it
is responsible for enforcing I/O fencing on the cluster containers
when a node loses quorum and for resuming access when that node regains
quorum. In other words, the Cluster Segment Manager acts as a
gatekeeper to the shared disks, allowing I/O only when it is safe.
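The gatekeeper behaviour described above can be sketched as a simple gate that is closed on quorum loss and reopened on quorum regain. `IOGate` and its methods are hypothetical names used only for illustration; how the Cluster Segment Manager actually blocks and resumes I/O is not specified in this document.

```python
class IOGate:
    """Hypothetical gate: I/O passes only while the node has quorum."""

    def __init__(self, has_quorum=False):
        # Assumption: start closed unless the caller already knows quorum is held.
        self.has_quorum = has_quorum

    def on_quorum_lost(self):
        self.has_quorum = False

    def on_quorum_regained(self):
        self.has_quorum = True

    def submit(self, request):
        if not self.has_quorum:
            raise PermissionError("I/O fenced: node does not have quorum")
        return f"submitted {request}"


gate = IOGate(has_quorum=True)
first = gate.submit("write block 7")   # allowed: node has quorum
gate.on_quorum_lost()                  # from here on, submit() raises
```

A real implementation would more likely queue or fail requests inside the I/O path rather than raise an exception; the exception here just makes the fenced state visible.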
Cluster Segment Manager metadata format:
TO BE ADDED
The Cluster Manager Plug-in, also called the EVMS Cluster Engine (ECE),
acts as a conduit between the EVMS Engine and the Cluster Manager.
It transparently provides membership and messaging services to the
Engine through a standard set of published ECE APIs. In other
words, it acts as a gatekeeper for communication with other nodes of the
cluster. These interfaces are discussed in the ECE API Guide.
EVMS Engine:
The EVMS Engine is the brain of the EVMS
volume manager. It is responsible for determining which plug-in
to interact with in order to accomplish a task. Among its many other
responsibilities, the EVMS Engine also coordinates
with plug-ins on other nodes to ensure that
cluster containers and their corresponding objects are created correctly.
EVMS Daemon:
The EVMS Daemon can be viewed as the EVMS Engine
without the brain. It loads the relevant plug-ins and waits
for instructions from the Engine. In other words, it is a remote extension
of the Engine.
Cluster Manager:
Currently, EVMS supports the Linux-HA cluster manager. Support for IBM's RSCT
cluster manager will be available soon. EVMS depends on the following services
from the cluster manager:
Consensus membership and quorum:
Membership is the set of nodes that are actively
participating in the cluster from a given node's point of view. This
definition can lead to confusion, because different nodes in the cluster may
have different views of the membership. Consensus membership is
the membership agreed upon by all the members. Network failures
can lead to cluster partitioning. In a partitioned cluster, there can be
multiple partitions, each having its own consensus membership view.
The cluster manager ensures that only one partition has the quorum to
access shared resources and fences off all the other nodes from accessing
the shared resources. The cluster partition having quorum is also
called the primary partition.
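As an illustration of how at most one partition can hold quorum, the sketch below assumes a strict-majority rule: a partition has quorum only if it contains more than half of the configured nodes. This rule is an assumption made for the example; the document does not say how a given cluster manager decides quorum, and `primary_partition` is a hypothetical helper, not a cluster manager API.

```python
def primary_partition(partitions, cluster_size):
    """Return the partition holding a strict majority of nodes, or None.

    Assumption: quorum means more than half of the configured nodes.
    Two disjoint partitions cannot both satisfy this, so at most one
    partition is ever returned.
    """
    for members in partitions:
        if 2 * len(members) > cluster_size:
            return members
    return None


# A five-node cluster split 3/2: the three-node side is the primary partition.
split = [{"n1", "n2", "n3"}, {"n4", "n5"}]
winner = primary_partition(split, cluster_size=5)

# A four-node cluster split 2/2: under this rule neither side has quorum.
tie = primary_partition([{"n1", "n2"}, {"n3", "n4"}], cluster_size=4)
```

The even split shows why cluster managers often need an extra tie-breaking mechanism for clusters with an even number of nodes.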
EVMS depends on the cluster manager for the consensus membership and
quorum services. On receipt of a quorum-loss event, EVMS fences
off I/O on all volumes residing in shared containers, and resumes I/O when quorum
is regained. The cluster manager may STONITH (power off and reboot)
the nodes that have lost quorum. EVMS provides an additional
layer of safety by blocking I/O in case the cluster manager does not support
fencing.
Reliable messaging service:
EVMS assumes a point-to-point messaging service with the following
guarantees:
- message is not dropped
- message is not duplicated
- message is not corrupted
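The three guarantees above can be made concrete by showing how a receiver could detect each kind of violation. This is a sketch under stated assumptions, not a description of how EVMS or any cluster manager implements messaging: it assumes each message carries a per-sender sequence number and a CRC-32 checksum, and `make_message` and `Receiver` are hypothetical names.

```python
import zlib


def make_message(seq, payload: bytes):
    # Assumed message layout: sequence number, payload, CRC-32 of the payload.
    return {"seq": seq, "payload": payload, "crc": zlib.crc32(payload)}


class Receiver:
    """Hypothetical receiver that checks messages from a single sender."""

    def __init__(self):
        self.next_seq = 0  # sequence number expected next

    def check(self, msg):
        if zlib.crc32(msg["payload"]) != msg["crc"]:
            return "corrupted"   # payload does not match its checksum
        if msg["seq"] < self.next_seq:
            return "duplicate"   # already delivered once
        if msg["seq"] > self.next_seq:
            return "gap"         # an earlier message was dropped or delayed
        self.next_seq += 1
        return "ok"


rx = Receiver()
m0 = make_message(0, b"discover")
results = [rx.check(m0), rx.check(m0), rx.check(make_message(2, b"commit"))]
```

Because EVMS assumes these guarantees from the messaging service, a check of this kind would belong in the messaging layer rather than in the Engine itself.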
Resource Management service:
EVMS provides the mechanism for failing over private containers.
However, it depends on the resource management service to determine when to
fail over the containers.
Linux-HA Cluster manager:
The Linux-HA cluster manager provides
all of the above-mentioned services. However, its resource management
service is limited to two-node clusters. It also supports fencing
through STONITH on a two-node cluster.