A VMScluster, originally known as a VAXcluster, is a computer cluster involving a group of computers running the OpenVMS operating system. Whereas tightly coupled multiprocessor systems run a single copy of the operating system, a VMScluster is loosely coupled: each machine runs its own copy of OpenVMS, but the disk storage, lock manager, and security domain are all cluster-wide, providing a single system image abstraction. Machines can join or leave a VMScluster without affecting the rest of the cluster. For enhanced availability, VMSclusters support the use of dual-ported disks connected to two machines or storage controllers simultaneously.
Initial release
Digital Equipment Corporation (DEC) first announced VAXclusters in May 1983. At that stage, clustering required specialised communications hardware, as well as some major changes to low-level subsystems in the VMS operating system. The software and hardware were designed jointly. VAXcluster support was first added in VAX/VMS V4.0, which was released in 1984. This version only supported clustering over DEC's proprietary ''Computer Interconnect'' (CI).
At the center of each cluster was a star coupler, to which every ''node'' (computer) and data storage device in the cluster was connected by one or two pairs of ''CI cables''. Each pair of cables had a transmission rate of 70 megabits per second, a high speed for that era. Using two pairs gave an aggregate transmission rate of 140 megabits per second, with redundancy in case one cable failed; the star couplers also had redundant wiring for better availability.
Each CI cable connected to its computer via a ''CI Port'', which could send and receive packets without any CPU involvement. To send a packet, a CPU had only to create a small data structure in memory and append it to a "send" queue; similarly, the CI Port would append each incoming message to a "receive" queue. Tests showed that a VAX-11/780 could send and receive 3000 messages per second, even though it was nominally a 1-MIPS machine. The closely related Mass Storage Control Protocol (MSCP) allowed similarly high performance from the mass storage subsystem. In addition, MSCP packets were easily transported over the CI, allowing remote access to storage devices.
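The division of labour between CPU and port can be illustrated with a toy model. The CPU only builds a descriptor and queues it; the port does the delivery. This is a Python sketch for illustration only. The class names (`PacketDescriptor`, `CIPortModel`) and their fields are hypothetical and do not reproduce DEC's actual CI port data structures. Delivery between ports is also simulated in memory rather than over cables.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PacketDescriptor:
    """Small in-memory structure a CPU builds to request a send (hypothetical fields)."""
    dest_node: int
    payload: bytes


class CIPortModel:
    """Toy model of a port that drains a send queue without CPU involvement.

    Names and behaviour are illustrative assumptions, not DEC's CI interface.
    """

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.send_queue: deque = deque()
        self.receive_queue: deque = deque()

    def cpu_send(self, dest_node: int, payload: bytes) -> None:
        # The CPU's only work: build a descriptor and append it to the send queue.
        self.send_queue.append(PacketDescriptor(dest_node, payload))

    def port_transmit(self, ports: dict) -> int:
        # The port, not the CPU, removes each descriptor and appends it to the
        # destination port's receive queue. Returns the number of packets moved.
        sent = 0
        while self.send_queue:
            packet = self.send_queue.popleft()
            ports[packet.dest_node].receive_queue.append(packet)
            sent += 1
        return sent


ports = {1: CIPortModel(1), 2: CIPortModel(2)}
ports[1].cpu_send(2, b"hello")
ports[1].port_transmit(ports)
print(ports[2].receive_queue[0].payload)  # b'hello'
```

In this model the sending CPU never waits on the transfer itself. That is the property that let a modest CPU sustain a high message rate.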
VAXclustering was an early clustering system to achieve commercial success (along with AT&T, Tandem Computers, and Stratus Computers), and was a major selling point for VAX systems.
Later developments
DEC's MicroVAX minicomputer was initially incapable of VAXclustering. In 1986, DEC added VAXclustering support to it, running over Ethernet instead of special-purpose hardware. Although they lacked the high-availability advantages of the CI hardware, these ''Local Area VAXclusters'' (LAVc) provided an attractive expansion path for buyers of low-end minicomputers. LAVc also allowed diskless ''satellite nodes'' to bootstrap over the network using the system disk of a ''bootnode''.
Later versions of VMS (V5.0 and later) supported "mixed interconnect" VAXclusters (using both CI and Ethernet), as well as VAXclustering over DSSI (Digital Storage Systems Interconnect), SCSI, and FDDI, among other transports. Eventually, as high-bandwidth wide area networking became available, clustering was extended to run over satellite data links and long-distance terrestrial links. This allowed the creation of ''disaster-tolerant clusters'': by spreading a single VAXcluster across geographically separate sites, the cluster could survive infrastructure failures and natural disasters.
VAXclustering was greatly aided by the introduction of terminal servers using the LAT protocol. Because these servers let ordinary serial terminals reach the host nodes over Ethernet, any terminal could quickly and easily connect to any host node. This made it much simpler to fail user terminals over from one node of the cluster to another.
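Terminal failover can be reduced to a node-selection rule. The Python sketch below uses a hypothetical policy: keep the session on its current node while that node is up, otherwise move it to the first reachable node in a fixed list. This is an assumption for illustration only. Real LAT terminal servers had their own selection rules, which are not modelled here, and the node names are made up.

```python
from typing import Callable, Iterable, Optional


def choose_host(nodes: Iterable[str], is_up: Callable[[str], bool],
                current: Optional[str] = None) -> Optional[str]:
    """Pick the host node a terminal session should be connected to.

    Hypothetical policy: stay on the current node while it is up; otherwise
    fail over to the first reachable node in `nodes`. Returns None if no
    node is reachable.
    """
    if current is not None and is_up(current):
        return current
    for node in nodes:
        if node != current and is_up(node):
            return node
    return None


# Example with hypothetical node names: ALPHA fails, so the terminal moves to BRAVO.
status = {"ALPHA": False, "BRAVO": True}
print(choose_host(["ALPHA", "BRAVO"], status.get, current="ALPHA"))  # BRAVO
```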
Support for clustering over TCP/IP was added in OpenVMS version 8.4, which was released in 2010. With Gigabit Ethernet common and 10 Gigabit Ethernet becoming available, standard networking cables and cards are sufficient to support VMSclustering.
Features
OpenVMS supports up to 96 nodes in a single cluster and allows mixed-architecture clusters, in which VAX and Alpha systems, or Alpha and Itanium systems, can coexist. Various organizations have demonstrated triple-architecture clusters and configurations with up to 150 nodes, but these configurations are not officially supported.
Unlike many other clustering solutions, VMScluster offers transparent, fully distributed read-write access with record-level locking. The same disk, and even the same file, can be accessed by several cluster nodes at once; locking occurs only at the level of a single record of a file, which would usually be one line of text or a single record in a database. This allows the construction of highly available, multiply redundant database servers.
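The practical effect of record-level rather than file-level locking can be shown with a toy lock table. The lock is keyed by (file, record number), so two writers touching different records of the same file do not block each other. This Python sketch is only an analogy. Threads stand in for cluster nodes, and `RecordLockTable`, `update_record`, and the file name `ORDERS.DAT` are hypothetical, not the VMS record-locking interface.

```python
import threading
from collections import defaultdict


class RecordLockTable:
    """Toy lock table keyed by (file, record number) rather than by whole file.

    Hypothetical illustration: threads stand in for cluster nodes, and the
    key scheme is an assumption, not the VMS record-locking interface.
    """

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the table itself
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, path: str, record: int):
        # One lock per record: writers of different records never block each other.
        with self._guard:
            return self._locks[(path, record)]


def update_record(table: RecordLockTable, records: dict, path: str,
                  record: int, value: str) -> None:
    # Only the single record being changed is locked, not the whole file.
    with table.lock_for(path, record):
        records[record] = value


table = RecordLockTable()
orders: dict = {}
workers = [
    threading.Thread(target=update_record,
                     args=(table, orders, "ORDERS.DAT", n, f"node-{n}"))
    for n in (1, 2)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(orders.items()))  # [(1, 'node-1'), (2, 'node-2')]
```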
Cluster connections can span upwards of , allowing member nodes to be located in different buildings on an office campus, or in different cities.
Host-based volume shadowing allows volumes (of the same or of different sizes) to be shadowed (mirrored) across multiple controllers and multiple hosts, allowing the construction of disaster-tolerant environments.
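The basic contract of a shadow set is that every write goes to all members and a read can be served by any surviving member. A minimal Python sketch of that contract follows. It is hypothetical and not the OpenVMS shadowing driver: members are plain in-memory block lists, controllers and hosts are not modelled, and the rule that a set of differently sized members exposes only its smallest member's capacity is an assumption made for the illustration.

```python
class ShadowSet:
    """Toy host-based mirror: writes go to every member, reads to any survivor.

    Hypothetical sketch, not the OpenVMS shadowing driver. Members are plain
    block lists; assuming members of different sizes, the set only exposes
    the capacity of its smallest member.
    """

    def __init__(self, *member_sizes: int) -> None:
        if not member_sizes:
            raise ValueError("a shadow set needs at least one member")
        self.members = [[b""] * size for size in member_sizes]
        self.capacity = min(member_sizes)

    def _check(self, block: int) -> None:
        if not 0 <= block < self.capacity:
            raise IndexError("block outside shadow set capacity")

    def write(self, block: int, data: bytes) -> None:
        self._check(block)
        for member in self.members:  # mirror the write to every member
            member[block] = data

    def read(self, block: int, failed: frozenset = frozenset()) -> bytes:
        self._check(block)
        # Every surviving member holds an identical copy, so skip failed ones.
        for index, member in enumerate(self.members):
            if index not in failed:
                return member[block]
        raise OSError("all shadow set members have failed")


shadow = ShadowSet(4, 6)  # hypothetical 4-block and 6-block members
shadow.write(2, b"payroll")
print(shadow.read(2, failed=frozenset({0})))  # b'payroll'
```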
Application programmers have full access to the distributed lock manager (DLM), which allows applications to coordinate arbitrary resources and activities across all cluster nodes. This includes file-level coordination, but the resources, activities, and operations that can be coordinated with the DLM are entirely up to the application.
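The VMS lock manager defines six lock modes: null (NL), concurrent read (CR), concurrent write (CW), protected read (PR), protected write (PW), and exclusive (EX). Applications request them through the `$ENQ` system service and release them with `$DEQ`. The Python sketch below only models the compatibility rules between those modes on arbitrarily named resources. It is not the `$ENQ` interface, and unlike a real lock manager it refuses an incompatible request instead of queuing it. The resource name in the example is hypothetical.

```python
# The six VMS lock modes, from least to most restrictive.
MODES = ("NL", "CR", "CW", "PR", "PW", "EX")

# COMPATIBLE[held] = request modes that can be granted while `held` is granted.
COMPATIBLE = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}


def can_grant(requested: str, granted: list) -> bool:
    """A request is grantable only if compatible with every lock already granted."""
    return all(requested in COMPATIBLE[held] for held in granted)


def request(table: dict, resource: str, mode: str) -> bool:
    """Grant `mode` on an arbitrary named resource if compatible; otherwise refuse.

    A real DLM would queue an incompatible request instead of refusing it.
    """
    granted = table.setdefault(resource, [])
    if can_grant(mode, granted):
        granted.append(mode)
        return True
    return False


locks: dict = {}
print(request(locks, "PAYROLL_BATCH", "PR"))  # True
print(request(locks, "PAYROLL_BATCH", "PR"))  # True
print(request(locks, "PAYROLL_BATCH", "EX"))  # False
```

Because the resource name is just a string chosen by the application, the same rules can guard a file, a print queue, or any other activity the cluster nodes agree to name.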
Because rolling upgrades and multiple system disks are supported, cluster configurations can be maintained online and upgraded incrementally. Applications and data remain accessible while a subset of the member nodes is upgraded to newer software versions.
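One way to see why a rolling upgrade keeps the cluster running is to check each node in turn. While that node reboots, the remaining members must still hold enough votes to keep quorum. The Python sketch below assumes the OpenVMS convention that QUORUM is (EXPECTED_VOTES + 2) // 2. The node names and vote counts are hypothetical, and quorum disks and multiple system disks are ignored.

```python
def quorum(expected_votes: int) -> int:
    # Assumed OpenVMS rule: QUORUM = (EXPECTED_VOTES + 2) // 2.
    return (expected_votes + 2) // 2


def rolling_upgrade_plan(votes: dict, expected_votes: int) -> list:
    """Order nodes for one-at-a-time upgrade, refusing any step that loses quorum.

    Each node is checked on its own: while it reboots, only the other members'
    votes count. Node names and vote counts are hypothetical inputs.
    """
    total = sum(votes.values())
    needed = quorum(expected_votes)
    plan = []
    for node, node_votes in sorted(votes.items()):
        if total - node_votes < needed:
            raise RuntimeError(f"upgrading {node} alone would drop below quorum")
        plan.append(node)
    return plan


print(rolling_upgrade_plan({"NODEC": 1, "NODEA": 1, "NODEB": 1}, expected_votes=3))
# ['NODEA', 'NODEB', 'NODEC']
```

Under these assumptions, a two-node cluster with one vote per node cannot be upgraded this way: removing either node leaves 1 vote against a quorum of 2.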
Cluster uptimes are frequently measured in years, with the longest reported uptime being at least sixteen years.
Uptimes Project breakdown for VMSclusters
References
Further reading
*Nancy P. Kronenberg, Henry M. Levy, William D. Strecker, "VAXcluster: a closely-coupled distributed system", ''ACM Transactions on Computer Systems'' 4 (2), 1986
This issue was devoted to VAXclusters and FDDI networking. (Archived as PDF files.)