
Quasi-opportunistic supercomputing is a computational paradigm for supercomputing on a large number of geographically dispersed computers. Quasi-opportunistic supercomputing aims to provide a higher quality of service than opportunistic resource sharing. The quasi-opportunistic approach coordinates computers which are often under different ownerships to achieve reliable and fault-tolerant high performance with more control than opportunistic computer grids, in which computational resources are used whenever they may become available.[''Quasi-opportunistic supercomputing in grids'' by Valentin Kravtsov, David Carmeli, Werner Dubitzky, Ariel Orda, Assaf Schuster, Benny Yoshpa, in IEEE International Symposium on High Performance Distributed Computing, 2007, pages 233-24]
While the "opportunistic match-making" approach to task scheduling on computer grids is simpler in that it merely matches tasks to whatever resources may be available at a given time, demanding supercomputer applications such as weather simulations or computational fluid dynamics have remained out of reach, partly due to the barriers in reliable sub-assignment of a large number of tasks as well as the reliable availability of resources at a given time.[''Computational Science - ICCS 2009: 9th International Conference'' edited by Gabrielle Allen, Jarek Nabrzyski 2009 pages 387-38]
The quasi-opportunistic approach enables the execution of demanding applications within computer grids by establishing grid-wise resource allocation agreements and fault-tolerant message passing to abstractly shield against the failures of the underlying resources, thus maintaining some opportunism while allowing a higher level of control.
Opportunistic supercomputing on grids
The general principle of grid computing is to use distributed computing resources from diverse administrative domains to solve a single task, by using resources as they become available. Traditionally, most grid systems have approached the task scheduling challenge by using an "opportunistic match-making" approach in which tasks are matched to whatever resources may be available at a given time.[''Grid computing: experiment management, tool integration, and scientific workflows'' by Radu Prodan, Thomas Fahringer 2007 pages 1-4]
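Opportunistic match-making can be sketched in a few lines; the `Task` and `Resource` classes and the single-requirement matching criterion below are hypothetical simplifications for illustration, not the interface of any real grid middleware:

```python
class Resource:
    """A hypothetical grid node that announces itself when idle."""
    def __init__(self, name, cpus):
        self.name = name
        self.cpus = cpus

class Task:
    """A unit of work with a minimum resource requirement."""
    def __init__(self, name, cpus_needed):
        self.name = name
        self.cpus_needed = cpus_needed

def opportunistic_match(tasks, available_resources):
    """Greedily pair each pending task with the first currently
    available resource that satisfies its requirement.  No reservation
    or availability guarantee is made: a resource that never shows up
    simply never matches, and an unmatched task waits."""
    assignments = []
    pool = list(available_resources)
    for task in tasks:
        for res in pool:
            if res.cpus >= task.cpus_needed:
                assignments.append((task.name, res.name))
                pool.remove(res)
                break
    return assignments

tasks = [Task("t1", 4), Task("t2", 2)]
resources = [Resource("volunteer-a", 2), Resource("volunteer-b", 8)]
print(opportunistic_match(tasks, resources))
# → [('t1', 'volunteer-b'), ('t2', 'volunteer-a')]
```

The simplicity is the point: the matcher needs no knowledge of future availability, which is exactly why it cannot guarantee that a large, tightly coupled computation will find all its resources at once.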
BOINC, developed at the University of California, Berkeley, is an example of a volunteer-based, opportunistic grid computing system.[''Parallel and Distributed Computational Intelligence'' by Francisco Fernández de Vega 2010 pages 65-68] The applications based on the BOINC grid have reached multi-petaflop levels by using close to half a million computers connected on the internet, whenever volunteer resources become available. Another system, Folding@home, which is not based on BOINC, computes protein folding, and has reached 8.8 petaflops by using clients that include GPU and PlayStation 3 systems. However, these results are not applicable to the TOP500 ratings because they do not run the general-purpose Linpack benchmark.
A key strategy for grid computing is the use of middleware that partitions pieces of a program among the different computers on the network.[''Languages and Compilers for Parallel Computing'' by Guang R. Gao 2010 pages 10-11] Although general grid computing has had success in parallel task execution, demanding supercomputer applications such as weather simulations or computational fluid dynamics have remained out of reach, partly due to the barriers in reliable sub-assignment of a large number of tasks as well as the reliable availability of resources at a given time.
The opportunistic Internet PrimeNet Server supports GIMPS, one of the earliest grid computing projects, which has researched Mersenne prime numbers since 1997. GIMPS's distributed research currently achieves about 60 teraflops as a volunteer-based computing project. The use of computing resources on "volunteer grids" such as GIMPS is usually purely opportunistic: geographically dispersed, distributively owned computers contribute whenever they become available, with no preset commitments that any resources will be available at any given time. Hence, hypothetically, if many of the volunteers unwittingly decide to switch their computers off on a certain day, grid resources will become significantly reduced.[''Euro-Par 2010, Parallel Processing Workshops'' edited by Mario R. Guarracino 2011 pages 274-277] Furthermore, users will find it exceedingly costly to organize a very large number of opportunistic computing resources in a manner that can achieve reasonable high performance computing.[''Grid Computing: Towards a Global Interconnected Infrastructure'' edited by Nikolaos P. Preve 2011 page 71]
Quasi-control of computational resources
An example of a more structured grid for high performance computing is DEISA, a supercomputer project organized by the European Community which uses computers in seven European countries. Although different parts of a program executing within DEISA may be running on computers located in different countries under different ownerships and administrations, there is more control and coordination than with a purely opportunistic approach. DEISA has a two-level integration scheme: the "inner level" consists of a number of strongly connected high performance computer clusters that share similar operating systems and scheduling mechanisms and provide a ''homogeneous computing'' environment, while the "outer level" consists of ''heterogeneous systems'' that have supercomputing capabilities.[''Euro-Par 2006 Workshops: Parallel Processing: CoreGRID 2006'' edited by Wolfgang Lehner 2007 pages] Thus DEISA can provide somewhat controlled, yet dispersed, high performance computing services to users.[''Grid computing: International Symposium on Grid Computing'' (ISGC 2007) edited by Stella Shen 2008 page 170]
The quasi-opportunistic paradigm aims to overcome this by achieving more control over the assignment of tasks to distributed resources and the use of pre-negotiated scenarios for the availability of systems within the network. Quasi-opportunistic distributed execution of demanding parallel computing software in grids focuses on the implementation of grid-wise allocation agreements, co-allocation subsystems, communication topology-aware allocation mechanisms, fault-tolerant message passing libraries and data pre-conditioning. In this approach, fault-tolerant message passing is essential to abstractly shield against the failures of the underlying resources.
The quasi-opportunistic approach goes beyond volunteer computing on highly distributed systems such as BOINC, or general grid computing on a system such as Globus, by allowing the middleware to provide almost seamless access to many computing clusters so that existing programs in languages such as Fortran or C can be distributed among multiple computing resources.
A key component of the quasi-opportunistic approach, as in the QosCosGrid, is an economic-based resource allocation model in which resources are provided based on agreements among specific supercomputer administration sites. Unlike volunteer systems that rely on altruism, specific contractual terms are stipulated for the performance of specific types of tasks. However, "tit-for-tat" paradigms, in which computations are paid back via future computations, are not suitable for supercomputing applications and are avoided.[''Algorithms and architectures for parallel processing'' by Anu G. Bourgeois 2008 pages 234-242]
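The contrast with opportunistic matching can be illustrated with a minimal sketch; the `Agreement` class, its fields, and the quota-drawdown rule are hypothetical assumptions, not the actual QosCosGrid allocation interface:

```python
class Agreement:
    """A hypothetical pre-negotiated contract with a supercomputing
    site, committing a number of CPU-hours for named task types."""
    def __init__(self, site, task_types, cpu_hours):
        self.site = site
        self.task_types = set(task_types)
        self.cpu_hours = cpu_hours

def allocate(task_type, cpu_hours, agreements):
    """Grant the request only from a site whose standing agreement
    covers this task type and still has enough committed capacity.
    Unlike opportunistic matching, no un-contracted resource is ever
    used, so availability can be planned in advance."""
    for a in agreements:
        if task_type in a.task_types and a.cpu_hours >= cpu_hours:
            a.cpu_hours -= cpu_hours   # draw down the committed quota
            return a.site
    return None  # no agreement covers the request

agreements = [Agreement("site-A", ["cfd"], 100),
              Agreement("site-B", ["weather", "cfd"], 500)]
print(allocate("weather", 200, agreements))  # → site-B
```

The request is either satisfied from contracted capacity or refused outright, which is what lets a demanding application rely on the resources it is granted.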
The other key component of the quasi-opportunistic approach is a reliable message passing system to provide distributed checkpoint restart mechanisms when computer hardware or networks inevitably experience failures. In this way, if some part of a large computation fails, the entire run need not be abandoned, but can restart from the last saved checkpoint.
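A minimal sketch of such a checkpoint-restart mechanism, assuming a single process and a JSON file as the checkpoint store (real systems, such as fault-tolerant MPI libraries, must coordinate checkpoints across many nodes):

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Persist the computation state atomically (write to a temp file,
    then rename), so a crash mid-write never corrupts the last good
    checkpoint."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    """Return the last saved state, or a fresh one if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}

def run(path, steps, fail_at=None):
    """Resume from the last checkpoint instead of restarting the whole
    run after a (simulated) failure."""
    state = load_checkpoint(path)
    for step in range(state["step"], steps):
        if step == fail_at:
            raise RuntimeError("simulated node failure")
        state["total"] += step          # the 'work' of this step
        state["step"] = step + 1
        save_checkpoint(path, state)
    return state["total"]

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
try:
    run(path, 10, fail_at=6)            # fails partway through
except RuntimeError:
    pass
print(run(path, 10))                    # resumes at step 6, prints 45
```

The second call re-executes only steps 6 through 9; the work of steps 0 through 5 survives the failure in the checkpoint file.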
See also
* Grid computing
* History of supercomputing
* QosCosGrid
* Supercomputer architecture
* Supercomputer operating systems
References
{{Reflist|2}}
Supercomputing
Grid computing