Batch Job
Computerized batch processing is a method of running software programs, called jobs, automatically in batches. Users must submit the jobs, but no further user interaction is required to process the batch. Batches may run automatically at scheduled times or whenever the necessary computer resources become available.


History

The term "batch processing" originates in the traditional classification of methods of production as job production (one-off production), batch production (production of a "batch" of multiple items at once, one stage at a time), and flow production (mass production, all stages in process at once).


Early history

Early computers were capable of running only one program at a time. Each user had sole control of the machine for a scheduled period of time. They would arrive at the computer with program and data, often on punched paper cards and magnetic or paper tape, and would load their program, run and debug it, and carry off their output when done. As computers became faster, the setup and takedown time became a larger percentage of available computer time. Programs called ''monitors'', the forerunners of operating systems, were developed that could process a series, or "batch", of programs, often from magnetic tape prepared offline. The monitor would be loaded into the computer and run the first job of the batch. At the end of the job it would regain control, then load and run the next job, until the batch was complete. Often the output of the batch would be written to magnetic tape and printed or punched offline. Examples of monitors were IBM's ''Fortran Monitor System'', SOS (Share Operating System), and finally IBSYS for IBM's 709x systems in 1960.


Third-generation systems

Computers capable of multiprogramming began to appear in the 1960s. Instead of running one batch job at a time, these systems can run multiple batch programs at the same time in order to keep the system as busy as possible. One or more programs might be awaiting input, one actively running on the CPU, and others generating output. Instead of relying on offline input and output, programs called spoolers read jobs from cards, disk, or remote terminals and place them in a job queue to be run. To prevent deadlocks, the job scheduler needs to know each job's resource requirements (memory, magnetic tapes, mountable disks, and so on), so various scripting languages were developed to supply this information in a structured way. Probably the best-known is IBM's ''Job Control Language'' (JCL). Job schedulers select jobs to run according to a variety of criteria, including priority and memory size. Remote batch is a procedure for submitting batch jobs from remote terminals, often equipped with a punched card reader and a line printer. Sometimes asymmetric multiprocessing is used to spool batch input and output for one or more large computers using an attached, smaller, and less expensive system, as in the IBM System/360 Attached Support Processor.
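To make the idea of declaring resource requirements concrete, here is a minimal Python sketch of a scheduler that picks queued jobs by priority but starts a job only when every resource it declared up front is free, and then grants all of them at once. Modelling resources as just memory and tape drives is a simplifying assumption, and the names `Job`, `BatchScheduler`, `memory_kb` and `tape_drives` are hypothetical; they do not correspond to JCL or any real scheduler's interface.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int      # lower number = more urgent (assumed convention)
    memory_kb: int     # declared before the job runs
    tape_drives: int   # declared before the job runs


class BatchScheduler:
    def __init__(self, memory_kb: int, tape_drives: int) -> None:
        self.free_memory = memory_kb
        self.free_tapes = tape_drives
        self._queue: list = []          # heap of (priority, seq, job)
        self._seq = itertools.count()   # FIFO tie-breaker within a priority
        self.running: list = []

    def submit(self, job: Job) -> None:
        heapq.heappush(self._queue, (job.priority, next(self._seq), job))

    def _fits(self, job: Job) -> bool:
        return job.memory_kb <= self.free_memory and job.tape_drives <= self.free_tapes

    def dispatch(self) -> list:
        """Start, in priority order, every queued job whose full requirements fit now."""
        started, deferred = [], []
        while self._queue:
            entry = heapq.heappop(self._queue)
            job = entry[2]
            if self._fits(job):
                # All-or-nothing grant: a job never holds some resources
                # while waiting for others, avoiding hold-and-wait deadlock.
                self.free_memory -= job.memory_kb
                self.free_tapes -= job.tape_drives
                self.running.append(job)
                started.append(job)
            else:
                deferred.append(entry)
        for entry in deferred:          # keep original order for the next pass
            heapq.heappush(self._queue, entry)
        return started

    def finish(self, job: Job) -> None:
        """Release everything the finished job was granted."""
        self.running.remove(job)
        self.free_memory += job.memory_kb
        self.free_tapes += job.tape_drives
```

One trade-off of this design is that a large job can be passed over repeatedly while smaller jobs keep fitting into the free resources; a production scheduler would typically need some remedy, such as gradually raising the priority of waiting jobs.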


Later history

The first general-purpose time-sharing system, the Compatible Time-Sharing System (CTSS), was compatible with batch processing, which eased the transition from batch processing to interactive computing. From the late 1960s onwards, interactive computing became common, first through text-based computer terminal interfaces (as in Unix shells or read-eval-print loops) and later through graphical user interfaces. Non-interactive computation, both one-off jobs such as compilation and the processing of multiple items in batches, came to be referred to retrospectively as ''batch processing'', and the term ''batch job'' (in early use often "batch ''of'' jobs") became common. Early use is found particularly at the University of Michigan, around the Michigan Terminal System (MTS). Although time-sharing did exist, it was not yet robust enough for corporate data processing. None of this was related to the earlier unit record equipment, which was human-operated.


Ongoing

Non-interactive computation remains pervasive in computing, both for general data processing and for system "housekeeping" tasks performed by system software. A high-level program that executes multiple programs, with some additional "glue" logic, is today most often called a ''script'' and is written in a scripting language, particularly as a shell script for system tasks; in IBM PC DOS and MS-DOS it is instead known as a batch file. Such scripts run on UNIX-based computers, Microsoft Windows, macOS (whose Darwin foundation includes BSD Unix components), and even smartphones. A running script, particularly one executed from an interactive login session, is often known as a job, but that term is used very ambiguously. "There is no direct counterpart to z/OS batch processing in PC or UNIX systems. Batch jobs are typically executed at a scheduled time or on an as-needed basis. Perhaps the closest comparison is with processes run by an AT or CRON command in UNIX, although the differences are significant."


Modern systems

Batch applications are still critical in most organizations, in large part because many common business processes are amenable to batch processing. While online systems can also function when manual intervention is not desired, they are not typically optimized to perform high-volume, repetitive tasks. Therefore, even new systems usually contain one or more batch applications for updating information at the end of the day, generating reports, printing documents, and other non-interactive tasks that must complete reliably within certain business deadlines.

Some applications are amenable to flow processing, namely those that need data from only a single input at a time (not totals, for instance): each input starts its next step as soon as it has completed the previous one. In this case flow processing lowers latency for individual inputs, allowing them to be completed without waiting for the entire batch to finish. However, many applications require data from all records, notably computations such as totals. In this case the entire batch must be completed before any usable result exists.

Modern batch applications make use of batch frameworks such as Jem The Bee, Spring Batch, or implementations of JSR 352 written for Java, and other frameworks for other programming languages, to provide the fault tolerance and scalability required for high-volume processing. To ensure high-speed processing, batch applications are often integrated with grid computing solutions that partition a batch job over a large number of processors, although doing so poses significant programming challenges. High-volume batch processing also places particularly heavy demands on system and application architectures. Architectures that feature strong input/output performance and vertical scalability, including modern mainframe computers, tend to provide better batch performance than alternatives.

Scripting languages became popular as they evolved alongside batch processing.
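The contrast between flow processing and batch processing described under Modern systems can be sketched in a few lines of Python. The per-record step `with_fee` (adding a hypothetical 25-cent fee to an amount in cents) needs only its own input, so a generator can hand each result onward as soon as it is ready; a total needs every record, so nothing usable exists until the whole batch has been read. All names here are illustrative.

```python
from typing import Iterable, Iterator


def with_fee(amount_cents: int) -> int:
    """Per-record step: needs only the current input (hypothetical 25-cent fee)."""
    return amount_cents + 25


def flow_process(records: Iterable[int]) -> Iterator[int]:
    # Flow processing: each result is handed on as soon as its own step
    # completes, so early records never wait for later ones.
    for rec in records:
        yield with_fee(rec)


def batch_total(records: Iterable[int]) -> int:
    # Batch processing: a total depends on every record, so no usable
    # result exists until the whole batch has been read.
    return sum(with_fee(rec) for rec in records)
```

For example, `next(flow_process([1250, 399, 10000]))` returns `1275` after touching only the first record, whereas `batch_total` must consume all three records before returning `11724`.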


Batch window

A ''batch window'' is "a period of less-intensive online activity", when the computer system is able to run batch jobs without interference from, or with, interactive online systems. A bank's ''end-of-day (EOD)'' jobs require the concept of ''cutover'', where transactions and data are cut off for a particular day's batch activity ("deposits after 3 PM will be processed the next day"). As requirements for online system uptime expanded to support globalization, the Internet, and other business needs, the batch window shrank, and increasing emphasis was placed on techniques that keep online data available for as much of the time as possible.
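The cutover rule in the bank example can be sketched as a function that assigns each transaction to the business day whose end-of-day batch will process it. This assumes the 3 PM cutoff from the example and treats a transaction at exactly 15:00 as falling after the cutoff; it deliberately ignores weekends, holidays and time zones, which a real end-of-day system would have to handle. The name `batch_date` is hypothetical.

```python
from datetime import date, datetime, time, timedelta

CUTOFF = time(15, 0)  # assumed 3 PM cutover, following the example above


def batch_date(ts: datetime, cutoff: time = CUTOFF) -> date:
    """Return the business day whose end-of-day batch processes a transaction at ts."""
    if ts.time() < cutoff:
        return ts.date()                    # before cutover: today's batch
    return ts.date() + timedelta(days=1)    # at or after cutover: next day's batch
```

For instance, a deposit at 14:59 on 5 March 2024 is assigned to that day's batch, while one at 15:00 the same day is assigned to 6 March.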


Batch size

The ''batch size'' refers to the number of work units to be processed within one batch operation. Some examples are:

* The number of lines from a file to load into a database before committing the transaction.
* The number of messages to dequeue from a queue.
* The number of requests to send within one payload.
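As an illustration of the first example, the following sketch loads lines into a SQLite table and commits once per `batch_size` rows, using Python's standard `sqlite3` module. The table name `lines`, the function name `load_lines` and the default of 1000 are arbitrary assumptions. A larger batch size means fewer commits, while a smaller one limits how much uncommitted work is lost if the job fails partway through.

```python
import sqlite3
from typing import Iterable


def load_lines(conn: sqlite3.Connection, lines: Iterable[str], batch_size: int = 1000) -> int:
    """Insert lines into the (hypothetical) table 'lines', committing once
    per batch_size rows. Returns the number of commits performed."""
    commits = 0
    batch: list = []
    for line in lines:
        batch.append((line.rstrip("\n"),))
        if len(batch) == batch_size:
            conn.executemany("INSERT INTO lines(text) VALUES (?)", batch)
            conn.commit()          # one transaction per full batch
            commits += 1
            batch.clear()
    if batch:                      # final, partially filled batch
        conn.executemany("INSERT INTO lines(text) VALUES (?)", batch)
        conn.commit()
        commits += 1
    return commits
```

Loading 2,500 lines with `batch_size=1000` therefore results in three commits: two full batches and one batch of 500.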


Common batch processing usage

* Efficient bulk database updates and automated transaction processing, as contrasted with interactive online transaction processing (OLTP) applications. The extract, transform, load (ETL) step in populating data warehouses is inherently a batch process in most implementations.
* Performing bulk operations on digital images, such as resizing, conversion, watermarking, or otherwise editing a group of image files.
* Converting computer files from one format to another. For example, a batch job may convert proprietary and legacy files to common standard formats for end-user queries and display.
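The file-conversion use case can be sketched as a small batch job that, once started, converts every CSV file in a directory to JSON without further user interaction. The function name `convert_csv_dir` and the choice of CSV-to-JSON as the formats are assumptions made for illustration; the sketch uses only Python's standard `csv`, `json` and `pathlib` modules.

```python
import csv
import json
from pathlib import Path


def convert_csv_dir(src: Path, dest: Path) -> list:
    """Convert every *.csv file in src into a JSON file (a list of row
    objects keyed by the CSV header) in dest. Returns the files written."""
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for csv_path in sorted(src.glob("*.csv")):   # deterministic processing order
        with csv_path.open(newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
        out_path = dest / (csv_path.stem + ".json")
        out_path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
        written.append(out_path)
    return written
```

Every value in the output is a string, because `csv.DictReader` does not infer types; a real conversion job would usually apply a schema at this point.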


Notable batch scheduling and execution environments

The IBM mainframe z/OS operating system, or platform, has arguably the most highly refined and evolved set of batch processing facilities, owing to its origins, long history, and continuing evolution. Today such systems commonly support hundreds or even thousands of concurrent online and batch tasks within a single operating system image. Technologies that aid concurrent batch and online processing include Job Control Language (JCL), scripting languages such as REXX, the Job Entry Subsystem (JES2 and JES3), Workload Manager (WLM), Automatic Restart Manager (ARM), Resource Recovery Services (RRS), IBM Db2 data sharing, Parallel Sysplex, unique performance optimizations such as HiperDispatch, the I/O channel architecture, and several others.

The Unix programs cron, at, and batch (today batch is a variant of at) allow for complex scheduling of jobs. Windows has a job scheduler, Task Scheduler. Most high-performance computing clusters use batch processing to maximize cluster usage.
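To give a feel for how cron decides when a job is due, here is a deliberately simplified Python matcher for the classic five-field cron expression (minute, hour, day of month, month, day of week, with 0 meaning Sunday). It supports only `*`, single numbers and comma-separated lists, so ranges, steps and names are left out; it is a hypothetical sketch, not how any cron implementation is actually written. It does follow the crontab(5) rule that when both day fields are restricted, a match on either one is enough.

```python
from datetime import datetime


def _field_matches(spec: str, value: int) -> bool:
    """Match one field: '*', a number, or a comma-separated list of numbers."""
    if spec == "*":
        return True
    return value in {int(part) for part in spec.split(",")}


def cron_matches(expr: str, when: datetime) -> bool:
    """Simplified check of 'minute hour day-of-month month day-of-week'.
    Ranges ('1-5'), steps ('*/15') and names ('MON') are not supported."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7   # Python: Monday=0; cron: Sunday=0
    if not (_field_matches(minute, when.minute)
            and _field_matches(hour, when.hour)
            and _field_matches(month, when.month)):
        return False
    dom_ok = _field_matches(dom, when.day)
    dow_ok = _field_matches(dow, cron_dow)
    if dom != "*" and dow != "*":
        return dom_ok or dow_ok           # both restricted: either may match
    return dom_ok and dow_ok
```

A scheduler built on such a matcher would evaluate each entry once per minute and launch the jobs whose expressions match; `"30 2 * * *"`, for example, fires at 02:30 every day.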


See also

* Background process
* Batch file
* Batch renaming - to rename many files automatically without human intervention, in order to save time and effort
* BatchPipes - a utility that increases batch performance
* Processing modes
* Production support - for batch job/schedule/stream support
* High-throughput computing

