HOME

TheInfoList



OR:

A watchdog timer (sometimes called a ''computer operating properly'' or ''COP'' timer, or simply a ''watchdog'') is an electronic or software
timer A timer is a specialized type of clock used for measuring specific time intervals. Timers can be categorized into two main types. The word "timer" is usually reserved for devices that counts down from a specified time interval, while devices th ...
that is used to detect and recover from
computer A computer is a machine that can be programmed to carry out sequences of arithmetic or logical operations ( computation) automatically. Modern digital electronic computers can perform generic sets of operations known as programs. These prog ...
malfunctions. Watchdog timers are widely used in computers to facilitate automatic correction of temporary hardware faults, and to prevent errant or malevolent software from disrupting system operation. During normal operation, the computer regularly restarts the watchdog timer to prevent it from elapsing, or "timing out". If, due to a hardware fault or program error, the computer fails to restart the watchdog, the timer will elapse and generate a timeout signal. The timeout signal is used to initiate corrective actions. The corrective actions typically include placing the computer and associated hardware in a safe state and invoking a computer reboot.
Microcontroller A microcontroller (MCU for ''microcontroller unit'', often also MC, UC, or μC) is a small computer on a single VLSI integrated circuit (IC) chip. A microcontroller contains one or more CPUs ( processor cores) along with memory and programmabl ...
s often include an integrated, on-chip watchdog. In other computers the watchdog may reside in a nearby chip that connects directly to the
CPU A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, a ...
, or it may be located on an external expansion card in the computer's chassis.


Applications

Watchdog timers are commonly found in
embedded system An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded ...
s and other computer-controlled equipment where humans cannot easily access the equipment or would be unable to react to faults in a timely manner. In such systems, the computer cannot depend on a human to invoke a reboot if it hangs; it must be self-reliant. For example, remote embedded systems such as
space probe A space probe is an artificial satellite that travels through space to collect scientific data. A space probe may orbit Earth; approach the Moon; travel through interplanetary space; flyby, orbit, or land or fly on other planetary bodies; o ...
s are not physically accessible to human operators; these could become permanently disabled if they were unable to autonomously recover from faults. In
robot A robot is a machine—especially one programmable by a computer—capable of carrying out a complex series of actions automatically. A robot can be guided by an external control device, or the control may be embedded within. Robots may be ...
s and other automated machines, a fault in the control computer could cause equipment damage or injuries before a human could react, even if the computer is easily accessed. A watchdog timer is usually employed in cases like these. Watchdog timers are also used to monitor and limit software execution time on a normally functioning computer. For example, a watchdog timer may be used when running untrusted code in a
sandbox A sandbox is a sandpit, a wide, shallow playground construction to hold sand, often made of wood or plastic. Sandbox or Sand box may also refer to: Arts, entertainment, and media * Sandbox (band), a Canadian rock music group * ''Sand ...
, to limit the CPU time available to the code and thus prevent some types of
denial-of-service attack In computing, a denial-of-service attack (DoS attack) is a cyber-attack in which the perpetrator seeks to make a machine or network resource unavailable to its intended users by temporarily or indefinitely disrupting services of a host conne ...
s. In real-time operating systems, a watchdog timer may be used to monitor a time-critical task to ensure it completes within its maximum allotted time and, if it fails to do so, to terminate the task and report the failure.


Architecture and operation


Restarting

The act of restarting a watchdog timer is commonly referred to as ''kicking'' the watchdog. Kicking is typically done by writing to a watchdog control
port A port is a maritime facility comprising one or more wharves or loading areas, where ships load and discharge cargo and passengers. Although usually situated on a sea coast or estuary, ports can also be found far inland, such as H ...
or by setting a particular bit in a
register Register or registration may refer to: Arts entertainment, and media Music * Register (music), the relative "height" or range of a note, melody, part, instrument, etc. * ''Register'', a 2017 album by Travis Miller * Registration (organ), th ...
. Alternatively, some tightly coupled watchdog timers are kicked by executing a special
machine language In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ver ...
instruction. An example of this is the CLRWDT (clear watchdog timer) instruction found in the instruction set of some PIC microcontrollers. In computers that are running
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ef ...
s, watchdog restarts are usually invoked through a
device driver In computing, a device driver is a computer program that operates or controls a particular type of device that is attached to a computer or automaton. A driver provides a software interface to hardware devices, enabling operating systems and o ...
. For example, in the
Linux operating system Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, whic ...
, a user space program will kick the watchdog by interacting with the watchdog device driver, typically by writing a zero character to or by calling a KEEPALIVE
ioctl In computing, ioctl (an abbreviation of input/output control) is a system call for device-specific input/output operations and other operations which cannot be expressed by regular system calls. It takes a parameter specifying a request code; ...
. The device driver, which serves to abstract the watchdog hardware from user space programs, may also be used to configure the time-out period and start and stop the timer. Some watchdog timers will only allow kicks during a specific time window. The window timing is usually relative to the previous kick or, if the watchdog has not yet been kicked, to the moment the watchdog was enabled. The window begins after a delay following the previous kick, and ends after a further delay. If the computer attempts to kick the watchdog before or after the window, the watchdog will not be restarted, and in some implementations this will be treated as a fault and trigger corrective action.


Enabling

A watchdog timer is said to be ''enabled'' when operating and ''disabled'' when idle. Upon power-up, a watchdog may be unconditionally enabled or it may be initially disabled and require an external signal to enable it. In the latter case, the enabling signal may be automatically generated by hardware or it may be generated under software control. When automatically generated, the enabling signal is typically derived from the computer reset signal. In some systems the reset signal is directly used to enable the watchdog. In others, the reset signal is delayed so that the watchdog will become enabled at some later time following the reset. This delay allows time for the computer to boot before the watchdog is enabled. Without this delay, the watchdog would timeout and invoke a subsequent reset before the computer can run its application software — the software which kicks the watchdog — and the system would become stuck in an endless cycle of incomplete reboots.


Single-stage watchdog

Watchdog timers come in many configurations, and many allow their configurations to be altered. For example, the watchdog and CPU may share a common
clock signal In electronics and especially synchronous digital circuits, a clock signal (historically also known as ''logic beat'') oscillates between a high and a low state and is used like a metronome to coordinate actions of digital circuits. A clock si ...
as shown in the block diagram below, or they may have independent clock signals. A basic watchdog timer has a single timer stage which, upon timeout, typically will reset the CPU:


Multistage watchdog

Two or more timers are sometimes cascaded to form a ''multistage watchdog timer'', where each timer is referred to as a ''timer stage'', or simply a ''stage''. For example, the block diagram below shows a three-stage watchdog. In a multistage watchdog, only the first stage is kicked by the processor. Upon first stage timeout, a corrective action is initiated and the next stage in the cascade is started. As each subsequent stage times out, it triggers a corrective action and starts the next stage. Upon final stage timeout, a corrective action is initiated, but no other stage is started because the end of the cascade has been reached. Typically, single-stage watchdog timers are used to simply restart the computer, whereas multistage watchdog timers will sequentially trigger a series of corrective actions, with the final stage triggering a computer restart.


Time intervals

Watchdog timers may have either fixed or programmable time intervals. Some watchdog timers allow the time interval to be programmed by selecting from among a few selectable, discrete values. In others, the interval can be programmed to arbitrary values. Typically, watchdog time intervals range from ten milliseconds to a minute or more. In a multistage watchdog, each timer may have its own, unique time interval.


Corrective actions

A watchdog timer may initiate any of several types of corrective action, including
maskable interrupt In digital computers, an interrupt (sometimes referred to as a trap) is a request for the processor to ''interrupt'' currently executing code (when permitted), so that the event can be processed in a timely manner. If the request is accepted, ...
, non-maskable interrupt,
hardware reset A hardware reset or hard reset of a computer system is a hardware operation that re-initializes the core hardware components of the system, thus ending all current software operations in the system. This is typically, but not always, followed by b ...
, fail-safe state activation,
power cycling Power cycling is the act of turning a piece of equipment, usually a computer, off and then on again. Reasons for power cycling include having an electronic device reinitialize its set of configuration parameters or recover from an unresponsive sta ...
, or combinations of these. Depending on its architecture, the type of corrective action or actions that a watchdog can trigger may be fixed or programmable. Some computers (e.g., PC compatibles) require a pulsed signal to invoke a hardware reset. In such cases, the watchdog typically triggers a hardware reset by activating an internal or external pulse generator, which in turn creates the required reset pulses. In embedded systems and control systems, watchdog timers are often used to activate fail-safe circuitry. When activated, the fail-safe circuitry forces all control outputs to safe states (e.g., turns off motors, heaters, and high-
voltage Voltage, also known as electric pressure, electric tension, or (electric) potential difference, is the difference in electric potential between two points. In a static electric field, it corresponds to the work needed per unit of charge to ...
s) to prevent injuries and equipment damage while the fault persists. In a two-stage watchdog, the first timer is often used to activate fail-safe outputs and start the second timer stage; the second stage will reset the computer if the fault cannot be corrected before the timer elapses. Watchdog timers are sometimes used to trigger the recording of system state information—which may be useful during fault recovery—or
debug In computer programming and software development, debugging is the process of finding and resolving '' bugs'' (defects or problems that prevent correct operation) within computer programs, software, or systems. Debugging tactics can involve i ...
information (which may be useful for determining the cause of the fault) onto a persistent medium. In such cases, a second timer—which is started when the first timer elapses—is typically used to reset the computer later, after allowing sufficient time for data recording to complete. This allows time for the information to be saved, but ensures that the computer will be reset even if the recording process fails. For example, the above diagram shows a likely configuration for a two-stage watchdog timer. During normal operation the computer regularly kicks Stage1 to prevent a timeout. If the computer fails to kick Stage1 (e.g., due to a hardware fault or programming error), Stage1 will eventually timeout. This event will start the Stage2 timer and, simultaneously, notify the computer (by means of a non-maskable interrupt) that a reset is imminent. Until Stage2 times out, the computer may attempt to record state information, debug information, or both. The computer will be reset upon Stage2 timeout.


Fault detection

A watchdog timer provides automatic detection of catastrophic malfunctions that prevent the computer from kicking it. However, computers often have other, less-severe types of faults which do not interfere with kicking, but which still require watchdog oversight. To support these, a computer system is typically designed so that its watchdog timer will be kicked only if the computer deems the system functional. The computer determines whether the system is functional by conducting one or more fault detection tests and will kick the watchdog only if all tests have passed. In computers that are running an operating system and multiple processes, a single, simple test may be insufficient to guarantee normal operation, as it could fail to detect a subtle fault condition and therefore allow the watchdog to be kicked even though a fault condition exists. For example, in the case of the Linux operating system, a user-space watchdog daemon may simply kick the watchdog periodically without performing any tests. As long as the daemon runs normally, the system will be protected against serious system crashes such as a
kernel panic A kernel panic (sometimes abbreviated as KP) is a safety measure taken by an operating system's kernel upon detecting an internal fatal error in which either it is unable to safely recover or continuing to run the system would have a higher ...
. To detect less severe faults, the daemon can be configured to perform tests that cover resource availability (e.g., sufficient
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remember ...
and file handles, reasonable CPU time), evidence of expected process activity (e.g., system daemons running, specific files being present or updated), overheating, and network activity, and system-specific test scripts or programs may also be run. Upon discovery of a failed test, the computer may attempt to perform a sequence of corrective actions under software control, culminating with a software-initiated reboot. If the software fails to invoke a reboot, the watchdog timer will timeout and invoke a hardware reset. In effect, this is a multistage watchdog timer in which the software comprises the first and intermediate timer stages and the hardware reset the final stage. In a Linux system, for example, the watchdog daemon may attempt to perform a software-initiated restart, which can be preferable to a hardware reset as the file systems will be safely unmounted and fault information will be logged. However it is essential to have the insurance of the hardware timer as a software restart can fail under a number of fault conditions.


See also

*
Command Loss Timer Reset Command Loss Timer Resets are part of the CCSDS communications system to spacecraft either in Earth orbit or beyond Earth orbit. The Command Loss Timer Reset, if it is not received in a timely manner by the spacecraft, generally forces the spacecra ...
a related method to keep a spacecraft commandable * Safe mode (spacecraft) *
Immunity Aware Programming Immunity may refer to: Medicine * Immunity (medical), resistance of an organism to infection or disease * ''Immunity'' (journal), a scientific journal published by Cell Press Biology * Immune system Engineering * Radiofrequence immunity desc ...
* Dead man's switch * Heartbeat (computing) *
Keepalive A keepalive (KA) is a message sent by one device to another to check that the link between the two is operating, or to prevent the link from being broken. Description Once a TCP connection has been established, that connection is defined to be v ...


Notes


References


External links

{{Wikibooks , Embedded Systems , Watchdog Timer
Building a great watchdog
– Article by Jack Ganssle Embedded systems