Tandem Computers Inc.

Tandem Computers, Inc. was the dominant manufacturer of fault-tolerant computer systems for ATM networks, banks, stock exchanges, telephone switching centers, 911 systems, and other similar commercial transaction processing applications requiring maximum uptime and no data loss. The company was founded by Jimmy Treybig in 1974 in Cupertino, California. It remained independent until 1997, when it became a server division within Compaq. It is now a server division within Hewlett Packard Enterprise, following Hewlett-Packard's acquisition of Compaq and the split of Hewlett-Packard into HP Inc. and Hewlett Packard Enterprise.

Tandem's NonStop systems use a number of independent identical processors, redundant storage devices, and redundant controllers to provide automatic high-speed "failover" in the case of a hardware or software failure. To contain the scope of failures and of corrupted data, these multi-computer systems have no shared central components, not even main memory. Conventional multi-computer systems all use shared memories and work directly on shared data objects. Instead, NonStop processors cooperate by exchanging messages across a reliable fabric, and software takes periodic snapshots for possible rollback of program memory state.

Besides masking failures, this "shared-nothing" messaging system design also scales to the largest commercial workloads. Each doubling of the total number of processors doubles system throughput, up to the maximum configuration of 4000 processors. In contrast, the performance of conventional multiprocessor systems is limited by the speed of some shared memory, bus, or switch. Adding more than 4–8 processors in that manner gives no further system speedup. NonStop systems have more often been bought to meet scaling requirements than for extreme fault tolerance. They compete against IBM's largest mainframes, despite being built from simpler minicomputer technology.


Founding

Tandem Computers was founded in 1974 by James Treybig. Treybig first saw the market need for fault tolerance in OLTP (online transaction processing) systems while running a marketing team for Hewlett-Packard's HP 3000 computer division, but HP was not interested in developing for this niche. He then joined the venture capital firm Kleiner Perkins and developed the Tandem business plan there. Treybig pulled together a core engineering team hired away from the HP 3000 division: Mike Green, Jim Katzman, Dave Mackie and Jack Loustaunou.

Their business plan called for ultra-reliable systems that never had outages and never lost or corrupted data. These were modular in a new way that was safe from all "single-point failures" yet would be only marginally more expensive than conventional non-fault-tolerant systems. They would be less expensive and support more throughput than some existing ad-hoc toughened systems that relied on redundant, normally idle "hot spares". Each engineer was confident they could quickly pull off their own part of this complex new design but doubted that others' areas could be worked out. The parts of the hardware and software design that did not have to be different were largely based on incremental improvements to the familiar hardware and software designs of the HP 3000.

Many subsequent engineers and programmers also came from HP. Tandem headquarters in Cupertino, California, were a quarter mile away from the HP offices. Initial venture capital investment in Tandem Computers came from Tom Perkins, who was formerly a general manager of the HP 3000 division. The business plan included detailed ideas for building a unique corporate culture reflecting Treybig's values.

The design of the initial Tandem/16 hardware was completed in 1975, and the first system shipped to Citibank in May 1976. The company enjoyed uninterrupted exponential growth through 1983. Inc. magazine ranked Tandem as the fastest-growing public company in America. By 1996, Tandem was a $2.3 billion company employing approximately 8,000 people worldwide.


Tandem NonStop (TNS) stack machines

Over 40 years, Tandem's main NonStop product line grew and evolved in an upward-compatible way from the initial T/16 fault-tolerant system, with three major changes to its top-level modular architecture or its programming-level instruction set architecture. Within each series, there have been several major re-implementations as chip technology progressed.

While conventional systems of the era, including large mainframes, had a mean time between failures (MTBF) on the order of a few days, the NonStop system was designed for failure intervals 100 times longer, with uptimes measured in years. Nevertheless, the NonStop was designed to be price-competitive with conventional systems: a simple 2-CPU system was priced at just over twice the cost of a competing single-processor mainframe, as opposed to four or more times for other fault-tolerant solutions.


NonStop I

The first system was the Tandem/16 or T/16, later re-branded NonStop I. The machine consisted of between two and 16 CPUs, organized as a fault-tolerant computer cluster packaged in a single rack. Each CPU had its own private, unshared memory, its own I/O processor, its own private I/O bus to connect to I/O controllers, and dual connections to all the other CPUs over a custom inter-CPU backplane bus called Dynabus.

Each disk controller or network controller was duplicated and had dual connections to both CPUs and devices. Each disk was mirrored, with separate connections to two independent disk controllers. If a disk failed, its data was still available from its mirrored copy. If a CPU, controller or bus failed, the disk was still reachable through an alternative CPU, controller, or bus. Each disk or network controller was connected to two independent CPUs. Power supplies were each wired to only one side of a pair of CPUs, controllers, or buses, so that the system would keep running without loss of connections if one power supply failed. The careful, complex arrangement of parts and connections in customers' larger configurations was documented in a Mackie diagram, named after lead salesman David Mackie, who invented the notation. None of these duplicated parts were wasted "hot spares"; everything added to system throughput during normal operations.

Besides recovering well from failed parts, the T/16 was also designed to detect as many kinds of intermittent failures as possible, as soon as possible. This prompt detection is called "fail fast". The point was to find and isolate corrupted data before it was permanently written into databases and other disk files. In the T/16, error detection was done by added custom circuits that added little cost to the total design; no major parts were duplicated to get error detection.

The T/16 CPU was a proprietary design. It was greatly influenced by the HP 3000 minicomputer. They were both microprogrammed, 16-bit, stack-based machines with segmented, 16-bit virtual addressing. Both were intended to be programmed exclusively in high-level languages, with no use of assembler. Both were initially implemented via standard low-density TTL chips, each holding a 4-bit slice of the 16-bit ALU. Both had a small number of top-of-stack, 16-bit data registers plus some extra address registers for accessing the memory stack. Both used Huffman encoding of operand address offsets, to fit a large variety of address modes and offset sizes into the 16-bit instruction format with good code density. Both relied heavily on pools of indirect addresses to overcome the short instruction format. Both supported larger 32- and 64-bit operands via multiple ALU cycles, and memory-to-memory string operations. Both used "big-endian" addressing of long versus short memory operands. These features had all been inspired by Burroughs B5500–B6800 mainframe stack machines.

The T/16 instruction set changed several features from the HP 3000 design. The T/16 supported paged virtual memory from the beginning. The HP 3000 series did not add paging until the PA-RISC generation, 10 years later (although via MPE V it had a form of paging using the APL firmware, in 1978). Tandem added support for 32-bit addressing in its second machine; the HP 3000 lacked this until its PA-RISC generation. Paging and long addresses were critical for supporting complex system software and large applications. The T/16 treated its top-of-stack registers in a novel way; the compiler, not the microcode, was responsible for deciding when full registers were spilled to the memory stack and when empty registers were re-filled from the memory stack. On the HP 3000, this decision took extra microcode cycles in every instruction. The HP 3000 supported COBOL with several instructions for calculating directly on arbitrary-length BCD (binary-coded decimal) strings of digits. The T/16 simplified this to single instructions for converting between BCD strings and 64-bit binary integers.

In the T/16, each CPU consisted of two boards of TTL logic and SRAMs, and ran at about 0.7 MIPS. At any instant, it could access only four virtual memory segments (System Data, System Code, User Data, User Code), each limited to 128 KB in size. The 16-bit address spaces were already small for major applications when it shipped.

The first release of T/16 had only a single programming language, Transaction Application Language (TAL). This was an efficient machine-dependent systems programming language (for operating systems, compilers, etc.) but could also be used for non-portable applications. It was derived from HP 3000's System Programming Language (SPL). Both had semantics similar to C but a syntax based on Burroughs' ALGOL. Subsequent releases added support for Cobol74, Basic, Fortran, Java, C, C++, and MUMPS.

The Tandem NonStop series ran a custom operating system which was significantly different from Unix or HP 3000's MPE. It was initially called T/TOS (Tandem Transactional Operating System) but soon named Guardian for its ability to protect all data from machine faults and software faults. In contrast to all other commercial operating systems, Guardian was based on message passing as the basic way for all processes to interact, without shared memory, regardless of where the processes were running. This approach easily scaled to multiple-computer clusters and helped isolate corrupted data before it propagated.

All file system processes and all transactional application processes were structured as master/slave pairs of processes running in separate CPUs. The slave process periodically took snapshots of the master's memory state and took over the workload if and when the master process ran into trouble. This allowed the application to survive failures in any CPU or its associated devices, without data loss. It further allowed recovery from some intermittent-style software failures. Between failures, the monitoring by the slave process added some performance overhead, but this was far less than the 100% duplication in other system designs. Some major early applications were directly coded in this checkpoint style, but most instead used various Tandem software layers which hid the details of this in a semi-portable way.
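
The process-pair checkpointing described above can be illustrated with a minimal Python sketch. It is not Guardian code: the class names, the account-balance state, and the choice to checkpoint after every request (rather than periodically) are assumptions made for illustration. The backup receives a copy of the primary's state, as if by message, and never a shared reference; it takes over when the primary's CPU fails.

```python
import copy


class Process:
    """One member of a process pair, pinned to a (hypothetical) CPU number."""

    def __init__(self, cpu):
        self.cpu = cpu
        self.alive = True
        self.state = {"balance": 0, "last_txn": 0}

    def apply(self, txn_id, amount):
        # A dead CPU cannot do work; the caller sees this as a failure.
        if not self.alive:
            raise RuntimeError(f"CPU {self.cpu} is down")
        self.state["balance"] += amount
        self.state["last_txn"] = txn_id


class ProcessPair:
    """Illustrative primary/backup pair running in two separate CPUs."""

    def __init__(self, primary_cpu, backup_cpu):
        self.primary = Process(primary_cpu)
        self.backup = Process(backup_cpu)

    def checkpoint(self):
        # Shared-nothing: the backup gets a deep copy, not a reference.
        self.backup.state = copy.deepcopy(self.primary.state)

    def handle(self, txn_id, amount):
        try:
            self.primary.apply(txn_id, amount)
        except RuntimeError:
            # Takeover: the backup resumes from the last checkpoint
            # and the request is retried on the surviving CPU.
            self.primary, self.backup = self.backup, self.primary
            self.primary.apply(txn_id, amount)
        self.checkpoint()
        return self.primary.state["balance"]
```

For example, after `pair = ProcessPair(0, 1)` and `pair.handle(1, 100)`, marking `pair.primary.alive = False` and calling `pair.handle(2, 50)` returns 150: the process on CPU 1 continues from the checkpointed balance of 100, so no committed work is lost.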


NonStop II

In 1981, all T/16 CPUs were replaced by the NonStop II. Its main difference from the T/16 was support for occasional 32-bit addressing via a user-switchable "extended data segment". This supported the next ten years of growth in software and was an advantage over the T/16 or HP 3000. Visible registers remained 16-bit, and this unplanned addition to the instruction set required executing many more instructions per memory reference than most 32-bit minicomputers. All subsequent TNS computers were hampered by this instruction set inefficiency. As the NonStop II lacked wider internal data paths, it had to use additional microcode steps for 32-bit addresses. A NonStop II CPU had three boards, using chips and design similar to the T/16. The NonStop II also replaced core memory with battery-backed DRAM.


NonStop TXP

In 1983, the NonStop TXP CPU was the first entirely new implementation of the TNS instruction set architecture. It was built from standard TTL chips and Programmable Array Logic (PAL) chips, with four boards per CPU module. It was Tandem's first design to use cache memory. It had a more direct implementation of 32-bit addressing, but still passed addresses through 16-bit adders. A wider microcode store allowed a major reduction in the cycles executed per instruction; speed increased to 2.0 MIPS. It used the same rack packaging, controllers, backplane, and buses as before. The Dynabus and I/O buses had been overdesigned in the T/16 so they would work for several generations of upgrades.


FOX

Up to 14 TXP and NonStop II systems could now be combined via FOX, a long-distance fault-tolerant fibre optic bus for connecting TNS clusters across a business campus; the result was a cluster of clusters with a total of 224 CPUs. This allowed further scale-up for taking on the largest mainframe applications. Just as it did between the CPU modules within one computer, the Guardian operating system could fail over entire task sets to other machines in the network. Worldwide clusters of 4000 CPUs could also be built via conventional long-haul network links.


NonStop VLX

In 1986, Tandem introduced a third-generation CPU, the NonStop VLX. It had 32-bit data paths, wider microcode, a 12 MHz clock, and a peak rate of one instruction per cycle. It was built from three boards of ECL gate array chips (with TTL pinout). It had a revised Dynabus with speed raised to 20 MB/s per link, 40 MB/s total. Later, FOX II increased the physical diameter of TNS clusters to 4 kilometers.

Tandem's initial database support was only for hierarchical, non-relational databases via the ENSCRIBE file system. This was extended into a relational database called ENCOMPASS. In 1986 Tandem introduced the first fault-tolerant SQL database, NonStop SQL. Developed totally in-house, NonStop SQL includes a number of features based on Guardian to ensure data validity across nodes. NonStop SQL is known for scaling linearly in performance with the number of nodes added to the system, whereas most databases had performance that plateaued quite quickly, often after just two CPUs. A later version released in 1989 added transactions that could be spread over nodes, a feature that remained unique for some time. NonStop SQL continued to evolve, first as NonStop SQL/MP and then NonStop SQL/MX, which transitioned from Tandem to Compaq to HP. The code remains in use in HP's NonStop SQL/MP and NonStop SQL/MX and in the Apache Trafodion project.


NonStop CLX

In 1987, Tandem introduced the NonStop CLX, a low-cost, less-expandable minicomputer system. Its role was to grow the low end of the fault-tolerant market and to be deployed on the remote edges of large Tandem networks. Its initial performance was roughly similar to the TXP; later versions improved to where they were about 20% slower than a VLX. Its small cabinet could be installed into any "copier room" office environment. A CLX CPU was one board, containing six "compiled silicon" ASIC CMOS chips. The CPU core chip was duplicated and lock-stepped for maximal error detection. This added no additional fault tolerance but assured data integrity, as each CPU included checking logic that made certain that the results of both CPU chips were identical. Fault tolerance was provided by the other processors in the system. Pinout was a main limitation of this chip technology. Microcode, cache, and TLB were all external to the CPU core and shared a single bus and single bank of SRAM. As a result, CLX required at least two machine cycles per instruction.
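
The self-checking arrangement of two lock-stepped core chips can be sketched in a few lines of Python. This is only an illustration of the comparison idea, not a model of the CLX hardware: the function names, the exception type, and the simulated bit-flip fault are all hypothetical. Note that the checker detects a disagreement but cannot tell which core is wrong, which is why it provides data integrity rather than fault tolerance.

```python
class LockstepMismatch(Exception):
    """Raised when the two duplicated cores disagree (fail fast)."""


def checked_execute(core_a, core_b, operands):
    # Both cores run the same operation on the same operands.
    result_a = core_a(*operands)
    result_b = core_b(*operands)
    # The checking logic only compares; it cannot say which core is wrong,
    # so the whole CPU stops before a bad result can be written anywhere.
    if result_a != result_b:
        raise LockstepMismatch(f"{result_a!r} != {result_b!r}")
    return result_a


def good_add(x, y):
    return x + y


def faulty_add(x, y):
    # Hypothetical fault: one bit of the result is flipped.
    return (x + y) ^ 0x4
```

Calling `checked_execute(good_add, good_add, (2, 3))` returns 5, while pairing `good_add` with `faulty_add` raises `LockstepMismatch`. Recovery would then be left to another CPU in the system, for instance through a process-pair takeover.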


NonStop Cyclone

In 1989, Tandem introduced the NonStop Cyclone, a fast but expensive system for the mainframe end of the market. Each self-checking CPU took three boards full of hot-running ECL gate array chips, plus memory boards. Despite being microprogrammed, the CPU was superscalar, often completing two instructions per cache cycle. This was accomplished by having a separate microcode routine for every common pair of instructions. That fused pair of stack instructions generally accomplished the same work as a single instruction of normal 32-bit minicomputers. Cyclone processors were packaged as sections of four CPUs each, with the sections joined by a fiber optic version of Dynabus.

Like Tandem's prior high-end machines, Cyclone cabinets were styled with much angular black to suggest strength and power. Advertising videos directly compared Cyclone to the Lockheed SR-71 Blackbird Mach 3 spy plane. Cyclone's name was supposed to represent its "unstoppable speed in roaring through OLTP workloads". Announcement day was October 17, 1989. That afternoon, the region was struck by the magnitude 6.9 Loma Prieta earthquake, causing freeway collapses in Oakland and major fires in San Francisco. Tandem offices were shaken, but no one was badly hurt on site.


Other product lines


Rainbow

In 1980–1983, Tandem attempted to re-design its entire hardware and software stack to put its NonStop methods on a stronger foundation than its inherited HP 3000 traits. Rainbow's hardware was a 32-bit register-file machine that aimed to be better than a Digital Equipment Corporation VAX. For reliable programming, the main programming language was "TPL", a subset of Ada. At that time, programmers barely understood how to compile Ada to unoptimized code. There was no migration path for existing NonStop system software coded in TAL. The OS, database and Cobol compilers were entirely redesigned. Customers would see it as a totally disjoint product line requiring all-new software from them. The software side of this project took much longer than planned. The hardware was already obsolete and outperformed by TXP before its software was ready, resulting in the Rainbow project being abandoned. All subsequent efforts emphasized upward compatibility and easy migration paths. Development of Rainbow's advanced client/server application development framework called "Crystal" continued a while longer and was spun off as the "Ellipse" product of Cooperative Systems Incorporated.


Dynamite PC

In 1985, Tandem attempted to grab a piece of the rapidly growing personal computer market with its introduction of the MS-DOS-based Dynamite PC/workstation. Numerous design compromises (including a unique 8086-based hardware platform incompatible with expansion cards of the day and extremely limited compatibility with IBM-based PCs) relegated the Dynamite to serving primarily as a smart terminal. It was quietly and quickly withdrawn from the market. In 1986 the company introduced the 6AT, an IBM PC AT-compatible computer. Tandem only sold the 6AT to existing customers; "we are not going to go out and innovate", it said.


Integrity

Tandem's message-based NonStop operating system had advantages for scaling, extreme reliability, and efficiently using expensive "spare" resources. But many potential customers wanted just good-enough reliability in a small system, using a familiar Unix operating system and industry-standard programs. Tandem's various fault-tolerant competitors all adopted a simpler hardware-only memory-centric design where all recovery was done by switching between hot spares. The most successful competitor was Stratus Technologies, whose machines were re-marketed by IBM as "IBM System/88".

In such systems, the spare processors do not contribute to system throughput between failures, but merely redundantly execute exactly the same data thread as the active processor at the same instant, in "lock step". Faults are detected by seeing when the cloned processors' outputs diverge. To detect failures, the system must have two physical processors for each logical, active processor. To also implement automatic failover recovery, the system must have three or four physical processors for each logical processor. The triple or quadruple cost of this sparing is practical when the duplicated parts are commodity single-chip microprocessors.

Tandem's products for this market began with the Integrity line in 1989, using MIPS processors and a "NonStop UX" variant of Unix. It was developed in Austin, Texas. In 1991, the Integrity S2 used TMR (Triple Modular Redundancy), where each logical CPU used three MIPS R2000 microprocessors to execute the same data thread, with voting to find and lock out a failed part. Their fast clocks could not be synchronized as in strict lock stepping, so voting instead happened at each interrupt. Some other versions of Integrity used 4x "pair and spares" redundancy. Pairs of processors ran in lock-step to check each other. When they disagreed, both processors were marked untrusted, and their workload was taken over by a hot-spare pair of processors whose state was already current. In 1995, the Integrity S4000 was the first to use ServerNet (a networked "bus" structure) and moved toward sharing peripherals with the NonStop line.
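
The voting step in triple modular redundancy can be shown with a short Python sketch. It illustrates majority voting in general and makes no claim about how the Integrity S2 implemented it: the `vote` function, its return shape, and the sample values are hypothetical. Unlike a simple two-way comparison, a three-way vote identifies which replica disagreed, so the other two can keep running.

```python
from collections import Counter


def vote(outputs):
    """Majority-vote over replica outputs.

    Returns (agreed_value, indexes_of_dissenting_replicas).
    Raises RuntimeError when no value has a strict majority.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among replicas")
    # Replicas that disagree with the majority are candidates for lock-out.
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters
```

With three replicas, `vote([7, 9, 7])` returns `(7, [1])`: the result 7 is accepted and replica 1 is flagged as the failed part. If all three outputs differ, no majority exists and the function raises an error.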


Wolfpack

In 1995–1997, Tandem partnered with Microsoft to implement high-availability features and advanced SQL configurations in clusters of commodity Microsoft Windows NT machines. This project was codenamed "Wolfpack" and first shipped as Microsoft Cluster Server in 1997. Microsoft benefited greatly from this partnership; Tandem did not.


TNS/R NonStop migration to MIPS

When Tandem was formed in 1974, every computer company designed and built its CPUs from basic circuits, using its own proprietary instruction set, compilers, etc. With each year of semiconductor progress under Moore's Law, more of a CPU's core circuits could fit into single chips and run faster and cheaper as a result. However, it became increasingly expensive for a computer company to design those advanced custom chips or build the plants to fabricate them. Facing this changing marketplace and manufacturing landscape, Tandem partnered with MIPS and adopted its R3000 and successor chipsets and their advanced optimizing compiler. Subsequent NonStop Guardian machines using the MIPS architecture were known to programmers as TNS/R machines and had a variety of marketing names.


Cyclone/R

In 1991, Tandem released the Cyclone/R, also known as CLX/R. This was a low-cost mid-range system based on CLX components but using R3000 microprocessors instead of the much slower CLX stack machine board. To minimize time to market, this machine was initially shipped without any MIPS native-mode software. Everything, including its NonStop Kernel (NSK) operating system (a follow-on to Guardian) and NonStop SQL database, was compiled to TNS stack machine code. That object code was then translated to equivalent, partially optimized MIPS instruction sequences at kernel install time by a tool called the Accelerator. Less-important programs could also be executed directly without pre-translation, via a TNS code interpreter. These migration techniques were successful and remain in use today. End-user software was brought over without extra work, the performance was good enough for mid-range machines, and programmers could ignore the instruction differences, even when debugging at machine code level. These Cyclone/R machines were updated with a faster native-mode NSK operating system in a follow-up release.

The R3000 and later microprocessors had only a typical amount of internal error checking, insufficient for Tandem's needs. So, the Cyclone/R ran pairs of R3000 processors in lock step, running the same data thread. This was for purposes of data integrity, not fault tolerance; fault tolerance was handled by the other mechanisms still in place. It used a variation of lock stepping in which the checker processor ran one cycle behind the primary processor. This allowed them to share a single copy of external code and data caches without putting excessive pinout load on the system bus and lowering the system clock rate.

To successfully run microprocessors in lock step, the chips must be designed to be fully deterministic. Any hidden internal state must be cleared by the chip's reset mechanism. Otherwise, the matched chips can go out of sync for no visible reason and without any faults, long after the chips are restarted. Chip designers agree that these are good principles because they help them test chips at manufacturing time. But all new microprocessor chips seemed to have bugs in this area and required months of shared work between MIPS (the third-party manufacturer used by Tandem) and Tandem to eliminate or work around the final subtle bugs.
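The two execution paths for stack machine code described above (ahead-of-time translation and direct interpretation) can be sketched on a toy instruction set. This is only an illustration of the general technique: the three-opcode stack language, the register-style target, and the names `interpret`, `translate`, and `run_translated` are all invented here and have no relation to the real TNS instruction set or the Accelerator's output.

```python
def interpret(program):
    """Execute toy stack code directly, one instruction at a time."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def translate(program):
    """Translate stack code ahead of time into register-style code.

    The stack depth is tracked statically, so stack slot i becomes
    register i and no run-time stack is needed.
    """
    out = []
    depth = 0
    for op, *args in program:
        if op == "PUSH":
            out.append(("LI", depth, args[0]))  # reg[depth] = constant
            depth += 1
        elif op in ("ADD", "MUL"):
            depth -= 1
            # reg[depth-1] = reg[depth-1] <op> reg[depth]
            out.append((op, depth - 1, depth - 1, depth))
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return out


def run_translated(code, num_regs=8):
    """Execute the translated register-style code."""
    regs = [0] * num_regs
    for ins in code:
        if ins[0] == "LI":
            _, dst, imm = ins
            regs[dst] = imm
        elif ins[0] == "ADD":
            _, dst, s1, s2 = ins
            regs[dst] = regs[s1] + regs[s2]
        elif ins[0] == "MUL":
            _, dst, s1, s2 = ins
            regs[dst] = regs[s1] * regs[s2]
    return regs[0]


# (2 + 3) * 4 expressed as stack code
PROGRAM = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]

if __name__ == "__main__":
    print(interpret(PROGRAM), run_translated(translate(PROGRAM)))
```

Both paths must produce identical results for the same object code, which is the property that lets end-user software move across without source changes.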


NonStop Himalaya K-series

In 1993, Tandem released the NonStop Himalaya K-series with the faster MIPS R4400, a native-mode NSK operating system, and fully expandable Cyclone system components. These were connected by Dynabus, Dynabus+, and the original I/O bus, which by now were all running out of performance headroom.


Open System Services

In 1995, the NonStop Kernel was extended with a Unix-like POSIX environment called Open System Services. The original Guardian shell and ABI remained available.


NonStop Himalaya S-Series

In 1997, Tandem introduced the NonStop Himalaya S-Series with a new top-level system architecture based on ServerNet connections. ServerNet replaced the Dynabus, FOX, and I/O buses. It was much faster, more general, and could be extended to more than just two-way redundancy via an arbitrary fabric of point-to-point connections. Tandem designed ServerNet for its own needs but then promoted its use by others; it evolved into the InfiniBand industry standard.

All S-Series machines used MIPS processors, including the R4400, R10000, R12000, and R14000. The design of the later, faster MIPS cores was primarily funded by Silicon Graphics Inc. But Intel's sixth-generation Pentium Pro overtook the performance of RISC designs, and SGI's graphics business shrank. After the R10000, there was no investment in significant new MIPS core designs for high-end servers. So Tandem needed to move its NonStop product line to another microprocessor architecture with competitive fast chips.


Acquisition by Compaq, attempted migration to Alpha

Jimmy Treybig remained CEO of the company he founded until a downturn in 1996. He was succeeded by Roel Pieper, who joined the company in 1996 as president and CEO. Re-branding to promote itself as a true Wintel (Windows/Intel) platform was conducted by the company's in-house brand and creative team led by Ronald May, who later went on to co-found the Silicon Valley Brand Forum in 1999. The concept worked, and shortly thereafter the company was acquired by Compaq.

Compaq's x86-based server division was an early outside adopter of Tandem's ServerNet/InfiniBand interconnect technology. In 1997, Compaq acquired Tandem Computers and its NonStop customer base to balance Compaq's heavy focus on personal computers (PCs). In 1998, Compaq also acquired the much larger Digital Equipment Corporation and inherited its DEC Alpha RISC servers with OpenVMS and Tru64 Unix customer bases. Tandem was then midway through porting its NonStop product line from MIPS R12000 microprocessors to Intel's new Itanium Merced microprocessors. This project was restarted with Alpha as the new target to align NonStop with Compaq's other large server lines. But in 2001, Compaq terminated all Alpha engineering investments in favor of the Itanium microprocessors, before any new NonStop products were released on Alpha.


Acquisition by Hewlett-Packard, TNS/E migration to Itanium

In 2001, Hewlett-Packard similarly chose to abandon its successful PA-RISC product lines in favor of Intel's Itanium microprocessors that HP helped to design. Shortly thereafter, Compaq and HP announced their plan to merge and consolidate their similar product lines. This contentious merger became official in May 2002. The consolidations were painful and destroyed the DEC and "HP Way" engineer-oriented cultures, but the combined company did know how to sell complex systems to enterprises at a profit, so it was an improvement for the surviving NonStop division and its customers. In some ways, Tandem's journey from HP-inspired start-up to an HP-inspired competitor, then to an HP division was "bringing Tandem back to its original roots", but this was not the same HP.

The porting of the NSK-based NonStop product line from MIPS processors to Itanium-based processors was completed and was branded as "HP Integrity NonStop Servers". (This NSK Integrity NonStop was unrelated to Tandem's original "Integrity" series for Unix.) Because it was not possible to run Itanium McKinley chips with clock-level lock stepping, the Integrity NonStop machines instead lock stepped using comparisons between chip states at longer time scales, at interrupt points and at various software synchronization points in between interrupts. The intermediate synchronization points were automatically triggered at every n-th taken branch instruction and were also explicitly inserted into long loop bodies by all NonStop compilers. The machine design supported both dual and triple redundancy, with either two or three physical microprocessors per logical Itanium processor. The triple version was sold to customers needing the utmost reliability. This new checking approach was called NSAA, NonStop Advanced Architecture.

As in the earlier migration from stack machines to MIPS microprocessors, all customer software was carried forward without source changes. "Native mode" source code compiled directly to MIPS machine code was simply recompiled for Itanium. Some older "non-native" software was still in TNS stack machine form. It was automatically ported onto Itanium via object code translation techniques.
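The idea of comparing replica state only at synchronization points, rather than on every clock cycle, can be modeled with a small Python sketch. It is a simplified illustration and not the NSAA implementation: the workload (summing a list), the injected single-bit fault, and the names `replica` and `compare_at_sync_points` are assumptions made for this example.

```python
def replica(values, sync_every, flip_at=None):
    """One replica summing `values`, yielding its state at sync points.

    A sync point is reached at every `sync_every`-th taken branch; the
    loop back-edge is counted as the taken branch. `flip_at` optionally
    injects a single-bit fault at that iteration (hypothetical fault model).
    """
    total = 0
    taken_branches = 0
    for i, v in enumerate(values):
        total += v
        if flip_at is not None and i == flip_at:
            total ^= 1  # injected fault: flip the lowest bit
        taken_branches += 1
        if taken_branches % sync_every == 0:
            yield total
    yield total  # final sync point when the work completes


def compare_at_sync_points(replicas):
    """Return the index of the first sync point where states differ.

    Returns None if every replica agrees at every sync point.
    """
    for point, states in enumerate(zip(*replicas)):
        if len(set(states)) != 1:
            return point
    return None


if __name__ == "__main__":
    data = list(range(1, 11))
    healthy = replica(data, sync_every=4)
    faulty = replica(data, sync_every=4, flip_at=5)
    print(compare_at_sync_points([healthy, faulty]))
```

In this model the fault occurs at iteration 5 but is only noticed at the next sync point, which reflects the trade-off in the text: comparison at longer time scales tolerates processors that are not cycle-synchronized, at the cost of detecting divergence slightly later.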


Itanium migration to Intel X86

The next endeavor was to move from Itanium to the Intel x86 architecture. It was completed in 2014, when the first systems became commercially available. The inclusion of fault-tolerant 4X FDR (Fourteen Data Rate) InfiniBand double-wide switches provided a more than 25-fold increase in system interconnect capacity.


Outlook, other

NSK Guardian also became the base for the HP Neoview OS, the operating system used in the HP Neoview systems that were tailored for Business Intelligence and Enterprise Data Warehouse use. NonStop SQL/MX was also the starting point for Neoview SQL, which was tailored to Business Intelligence use. The code was also ported to Linux and served as the basis for the Apache Trafodion project.


Corporate culture

Treybig's business plan included detailed ideas for building a corporate culture reflecting his values, such as paid six-week sabbaticals every four years for all employees, an annual gift of 100 shares of Tandem stock to all employees, a weekly all-employee party known as Beer Bust Fridays, and a worldwide closed-circuit monthly telecast ("First Friday") to keep employees informed.


User groups

* ITUG (International Tandem User Group), now part of Connect (users' group)
* OzTUG, the Australia and New Zealand Tandem Users Group
* OzTUG at LinkedIn
* BITUG (British Isles NonStop (Tandem) User Group)



See also

* Jim Gray (computer scientist)
* Thomas Perkins, longtime chairman of the board
* List of compilers, for a partial list of compilers, including Tandem compilers
* NonStop
* TACL (Tandem Advanced Command Language)
* Stratus Technologies


External links


* NonStop Computing Home – main NonStop Computing page at Hewlett Packard Enterprise
* NonStop for Dummies – short booklet introducing the NonStop computing platform, 2014
* Webpage at Hewlett Packard with a number of Tandem white papers
* A magazine of transaction processing, PDFs 1983–1994
* Tandem Computers Unplugged – book focusing on the company history, 2014