HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, high-throughput computing (HTC) is the use of many computing resources over long periods of time to accomplish a computational task.


Challenges

The HTC community is also concerned with robustness and reliability of jobs over a long-time scale. That is, being able to create a reliable system from unreliable components. This research is similar to
transaction processing Transaction processing is information processing in computer science that is divided into individual, indivisible operations called ''transactions''. Each transaction must succeed or fail as a complete unit; it can never be only partially compl ...
, but at a much larger and distributed scale. Some HTC systems, such as
HTCondor HTCondor is an open-source high-throughput computing software framework for coarse-grained distributed parallelization of computationally intensive tasks. It can be used to manage workload on a dedicated cluster of computers, or to farm out wor ...
and PBS, can run tasks on opportunistic resources. It is a difficult problem, however, to operate in this environment. On one hand the system needs to provide a reliable operating environment for the user's jobs, but at the same time the system must not compromise the integrity of the execute node and allow the owner to always have full control of their resources.


High-throughput vs. high-performance vs. many-task

There are many differences between high-throughput computing,
high-performance computing High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems. Overview HPC integrates systems administration (including network and security knowledge) and parallel programming into a mult ...
(HPC), and
many-task computing Many-task computing (MTC)I. Raicu, I. Foster, Y. Zhao. "Many-Task Computing for Grids and Supercomputers", IEEE Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS08), 2008 in computational science is an approach to parallel computin ...
(MTC). HPC tasks are characterized as needing large amounts of computing power for short periods of time, whereas HTC tasks also require large amounts of computing, but for much longer times (months and years, rather than hours and days). HPC environments are often measured in terms of
FLOPS In computing, floating point operations per second (FLOPS, flops or flop/s) is a measure of computer performance, useful in fields of scientific computations that require floating-point calculations. For such cases, it is a more accurate me ...
. The HTC community, however, is not concerned about operations per second, but rather operations per month or per year. Therefore, the HTC field is more interested in how many jobs can be completed over a long period of time instead of how fast. As an alternative definition, the
European Grid Infrastructure European Grid Infrastructure (EGI) is a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The EGI links centres in different European countries to support international res ...
defines HTC as ''"a computing paradigm that focuses on the efficient execution of a large number of loosely-coupled tasks"'', while HPC systems tend to focus on tightly coupled
parallel Parallel is a geometric term of location which may refer to: Computing * Parallel algorithm * Parallel computing * Parallel metaheuristic * Parallel (software), a UNIX utility for running programs in parallel * Parallel Sysplex, a cluster o ...
jobs, and as such they must execute within a particular site with low-latency interconnects. Conversely, HTC systems are independent, sequential jobs that can be individually scheduled on many different computing resources across multiple administrative boundaries. HTC systems achieve this using various
grid computing Grid computing is the use of widely distributed computer resources to reach a common goal. A computing grid can be thought of as a distributed system with non-interactive workloads that involve many files. Grid computing is distinguished from ...
technologies and techniques. MTC aims to bridge the gap between HTC and HPC. MTC is reminiscent of HTC, but it differs in the emphasis of using many computing resources over short periods of time to accomplish many computational tasks (i.e. including both dependent and independent tasks), where the primary metrics are measured in seconds (e.g. FLOPS, tasks/s, MB/s I/O rates), as opposed to operations (e.g. jobs) per month. MTC denotes high-performance computations comprising multiple distinct activities, coupled via file system operations.


See also

*
Batch processing Computerized batch processing is a method of running software programs called jobs in batches automatically. While users are required to submit the jobs, no other interaction by the user is required to process the batch. Batches may automatically ...
*
eScience E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable d ...
*
Open Science Grid Consortium The Open Science Grid Consortium is an organization that administers a worldwide grid of technological resources called the Open Science Grid, which facilitates distributed computing for scientific research. Founded in 2004, the consortium is co ...
*
Grid computing Grid computing is the use of widely distributed computer resources to reach a common goal. A computing grid can be thought of as a distributed system with non-interactive workloads that involve many files. Grid computing is distinguished from ...


References

{{Reflist Parallel computing