In systems engineering, dependability is a measure of a system's availability, reliability, maintainability, and in some cases other characteristics such as durability, safety and security. In real-time computing, dependability is the ability to provide services that can be trusted within a time period (A. Avizienis, J.-C. Laprie, Brian Randell, and C. Landwehr, "Basic Concepts and Taxonomy of Dependable and Secure Computing," IEEE Transactions on Dependable and Secure Computing, vol. 1, pp. 11-33, 2004). The service guarantees must hold even when the system is subject to attacks or natural failures.

The International Electrotechnical Commission (IEC), via its Technical Committee TC 56, develops and maintains international standards that provide systematic methods and tools for dependability assessment and management of equipment, services, and systems throughout their life cycles. The IFIP Working Group 10.4 on "Dependable Computing and Fault Tolerance" plays a role in synthesizing the technical community's progress in the field and organizes two workshops each year to disseminate the results.

Dependability can be broken down into three elements:
* Attributes - a way to assess the dependability of a system
* Threats - an understanding of the things that can affect the dependability of a system
* Means - ways to increase a system's dependability

History

Some sources hold that the word was coined in the 1910s in Dodge Brothers automobile print advertising. But the word predates that period, with the Oxford English Dictionary finding its first use in 1901. As interest in fault tolerance and system reliability increased in the 1960s and 1970s, dependability came to be used as a broader measure, as measures of reliability came to encompass additional measures like safety and integrity. In the early 1980s, Jean-Claude Laprie thus chose ''dependability'' as the term to encompass studies of fault tolerance and system reliability without the extension of meaning inherent in ''reliability'' (J.C. Laprie, "Dependable Computing and Fault Tolerance: Concepts and Terminology," in Proc. 15th IEEE Int. Symp. on Fault-Tolerant Computing, 1985). The field of dependability has since evolved into an internationally active field of research fostered by a number of prominent international conferences, notably the International Conference on Dependable Systems and Networks, the International Symposium on Reliable Distributed Systems, and the International Symposium on Software Reliability Engineering.

Traditionally, dependability for a system incorporates availability, reliability, and maintainability, but since the 1980s, safety and security have been added to measures of dependability.

Elements of dependability

Attributes

Attributes are qualities of a system. These can be assessed to determine its overall dependability using qualitative or quantitative measures. Avizienis et al. define the following dependability attributes:
* Availability - readiness for correct service
* Reliability - continuity of correct service
* Safety - absence of catastrophic consequences on the user(s) and the environment
* Integrity - absence of improper system alteration
* Maintainability - ability to undergo easy maintenance (repair)

As these definitions suggest, only availability and reliability are quantifiable by direct measurement, whilst the others are more subjective. For instance, safety cannot be measured directly via metrics; it is a subjective assessment that requires judgmental information to be applied to give a level of confidence. Reliability, by contrast, can be measured as failures over time.
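Because reliability and availability can be measured directly, they reduce to simple arithmetic over an operating record. The sketch below assumes a hypothetical `OperatingLog` summary of one observation window and uses the common mean time between failures (MTBF) and mean time to repair (MTTR) figures, which this article does not define; the steady-state formula MTBF / (MTBF + MTTR) is one conventional way to estimate availability, not the only one.

```python
from dataclasses import dataclass


@dataclass
class OperatingLog:
    """Hypothetical summary of one observation window (times in hours)."""
    uptime_hours: float    # time spent delivering correct service
    downtime_hours: float  # time spent failed or under repair
    failures: int          # failures observed at the system boundary


def failure_rate(log: OperatingLog) -> float:
    """Reliability expressed as failures per hour of operation."""
    return log.failures / log.uptime_hours


def mtbf(log: OperatingLog) -> float:
    """Mean time between failures: average operating time per failure."""
    if log.failures == 0:
        raise ValueError("MTBF is undefined when no failures were observed")
    return log.uptime_hours / log.failures


def mttr(log: OperatingLog) -> float:
    """Mean time to repair: average downtime per failure."""
    if log.failures == 0:
        raise ValueError("MTTR is undefined when no failures were observed")
    return log.downtime_hours / log.failures


def availability(log: OperatingLog) -> float:
    """Steady-state availability estimate, MTBF / (MTBF + MTTR)."""
    up, down = mtbf(log), mttr(log)
    return up / (up + down)


log = OperatingLog(uptime_hours=990.0, downtime_hours=10.0, failures=5)
print(f"failure rate: {failure_rate(log):.4f} per hour")
print(f"MTBF: {mtbf(log):.1f} h, MTTR: {mttr(log):.1f} h")
print(f"availability: {availability(log):.3f}")  # 0.990
```

Availability depends on repair time as well as on failure frequency, so a system that fails often but recovers quickly can still score well on availability while scoring poorly on reliability; this is one reason the two attributes are assessed separately.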

Confidentiality, i.e. ''the absence of unauthorized disclosure of information'', is also used when addressing security. Security is a composite of Confidentiality, Integrity, and Availability. Security is sometimes classed as an attribute, but the current view is to aggregate it together with dependability and treat dependability as a composite term called Dependability and Security. In practice, applying security measures to a system's components generally improves dependability by limiting the number of externally originated errors.

Threats

Threats are things that can affect a system and cause a drop in dependability. There are three main terms that must be clearly understood:
* Fault: A fault (usually referred to as a bug, for historic reasons) is a defect in a system. The presence of a fault in a system may or may not lead to a failure. For instance, although a system may contain a fault, its input and state conditions may never cause this fault to be executed so that an error occurs; thus that particular fault never manifests as a failure.
* Error: An error is a discrepancy between the intended behavior of a system and its actual behavior inside the system boundary. Errors occur at runtime when some part of the system enters an unexpected state due to the activation of a fault. Since errors are invalid internal states, they are hard to observe without special mechanisms, such as debuggers or debug output to logs.
* Failure: A failure is an instance in time when a system displays behavior that is contrary to its specification. An error does not necessarily cause a failure; for instance, an exception may be thrown by a system but caught and handled using fault tolerance techniques, so that the overall operation of the system still conforms to the specification.

It is important to note that failures are recorded at the system boundary. They are basically errors that have propagated to the system boundary and become observable. Faults, errors and failures operate according to a mechanism sometimes known as the Fault-Error-Failure chain. As a general rule, a fault, when activated, can lead to an error (which is an invalid state), and the invalid state generated by an error may lead to another error or to a failure (which is an observable deviation from the specified behavior at the system boundary). Once a fault is activated, an error is created.

An error may act in the same way as a fault in that it can create further error conditions; therefore an error may propagate multiple times within a system boundary without causing an observable failure. If an error propagates outside the system boundary, a failure is said to occur. A failure is basically the point at which it can be said that a service is failing to meet its specification. Since the output data from one service may be fed into another, a failure in one service may propagate into another service as a fault, so a chain can be formed of the form: Fault leading to Error leading to Failure leading to Fault, etc.
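The chain can be illustrated with a small, hypothetical Python service. In this sketch a Python exception stands in for an error (a simplification: an error is any invalid internal state, not only one that raises), the hypothetical `average` function carries a dormant fault, and the hypothetical `report` function marks the system boundary. Whether the error becomes a failure depends on whether it is handled inside that boundary.

```python
from typing import List


class ServiceFailure(Exception):
    """An error that reached the system boundary: an observable failure."""


def average(values: List[float]) -> float:
    # FAULT (deliberate defect): no check for an empty list. The fault stays
    # dormant for every non-empty input and is activated only by [].
    return sum(values) / len(values)


def report(values: List[float], tolerate: bool) -> str:
    """System boundary: returns the specified output or raises ServiceFailure."""
    try:
        return f"mean={average(values):.2f}"
    except ZeroDivisionError as err:
        # The activated fault has produced an ERROR: an invalid internal state.
        if tolerate:
            # Error handled inside the boundary; "mean=n/a" is assumed to be an
            # output the specification allows, so no failure is observed.
            return "mean=n/a"
        # Error propagates past the boundary and becomes a FAILURE.
        raise ServiceFailure("report deviated from its specification") from err


print(report([1.0, 2.0, 3.0], tolerate=False))  # mean=2.00 (fault dormant)
print(report([], tolerate=True))                # mean=n/a (error tolerated)
try:
    report([], tolerate=False)
except ServiceFailure as failure:
    print("FAILURE:", failure)                    # failure observed
```

If another service consumed the output of `report`, a `ServiceFailure` here would enter that service as a fault, continuing the chain.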

Means

Since the mechanism of the Fault-Error-Failure chain is understood, it is possible to construct means to break these chains and thereby increase the dependability of a system. Four means have been identified so far:
# Prevention
# Removal
# Forecasting
# Tolerance

Fault Prevention deals with preventing faults from being introduced into a system. This can be accomplished by use of development methodologies and good implementation techniques.

Fault Removal can be subdivided into two subcategories: Removal During Development and Removal During Use. Removal during development requires verification so that faults can be detected and removed before a system is put into production. Once systems have been put into production, a system is needed to record failures and remove the underlying faults via a maintenance cycle.

Fault Forecasting predicts likely faults so that they can be removed or their effects can be circumvented.

Fault Tolerance deals with putting mechanisms in place that will allow a system to still deliver the required service in the presence of faults, although that service may be at a degraded level.

Dependability means are intended to reduce the number of failures made visible to the end users of a system.
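As one illustration of fault tolerance, the sketch below uses triple modular redundancy (TMR), a well-known technique that this article does not specifically describe: three replicas compute the same result and a majority voter masks a single faulty replica. The replica functions are hypothetical, and the sketch assumes replicas return a value rather than crash.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Run all replicas on the same input and return the majority output.

    With three replicas, a single faulty replica is masked as long as the
    other two agree; if no strict majority exists, the error cannot be masked.
    """
    outputs = [replica(x) for replica in replicas]
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError(f"no majority among replica outputs {outputs}")
    return value


def square(x: int) -> int:
    return x * x


def faulty_square(x: int) -> int:
    # Hypothetical faulty replica: its fault is activated only when x == 3.
    return x * x + 1 if x == 3 else x * x


replicas = [square, faulty_square, square]
print(majority_vote(replicas, 3))  # 9: the faulty replica's 10 is outvoted
```

In this simple form the voter is itself a single point of failure, so designs that rely on voting typically give the voter particular attention.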

Persistence

Based on how faults appear or persist, they are classified as:
* Transient: they appear without apparent cause and disappear again without apparent cause
* Intermittent: they appear multiple times, possibly without a discernible pattern, and disappear on their own
* Permanent: once they appear, they do not get resolved on their own
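Persistence matters when choosing a tolerance mechanism. The sketch below, using hypothetical operations (`make_transient`, `permanent`) and assuming that an `OSError` signals the fault, shows that retrying with exponential backoff can mask a transient fault but only delays the failure caused by a permanent one.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 3, delay_s: float = 0.01) -> T:
    """Re-run an operation that may hit a transient fault.

    Assumption: the operation signals its fault by raising OSError.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise  # still failing after the last attempt: give up
            time.sleep(delay_s * 2 ** attempt)  # exponential backoff
    raise AssertionError("unreachable")


def make_transient(failures_before_success: int) -> Callable[[], str]:
    """Hypothetical operation that fails a fixed number of times, then recovers."""
    remaining = failures_before_success

    def operation() -> str:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise OSError("transient fault")
        return "ok"

    return operation


def permanent() -> str:
    """Hypothetical operation with a permanent fault: it never recovers."""
    raise OSError("permanent fault")


print(retry(make_transient(2)))  # ok: succeeds on the third attempt
try:
    retry(permanent)
except OSError as err:
    print("gave up:", err)       # gave up: permanent fault
```

An intermittent fault may or may not clear within the retry budget, so the number of attempts is a trade-off between masking such faults and reporting a failure promptly.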

Dependability of information systems and survivability

Some works on dependability use structured information systems, e.g. with service-oriented architecture (SOA), to introduce the attribute survivability, thus taking into account the degraded services that an information system sustains or resumes after a non-maskable failure. The flexibility of current frameworks encourages system architects to enable reconfiguration mechanisms that refocus the available, safe resources to support the most critical services, rather than over-provisioning to build a failure-proof system. With the generalisation of networked information systems, accessibility was introduced to give greater importance to users' experience. To take into account the level of performance, the measurement of performability is defined as "quantifying how well the object system performs in the presence of faults over a specified period of time".
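Performability can be made concrete with a simple reward-based measure, assumed here for illustration rather than taken from a specific standard: each period of operation is weighted by the fraction of full performance the system delivered. The timeline values below are hypothetical. Comparing the result with a binary availability figure shows why degraded service matters.

```python
from typing import Sequence, Tuple

# (hours spent, performance level): 1.0 is full service, fractions are
# degraded service, and 0.0 is an outage. Values are hypothetical.
Interval = Tuple[float, float]


def performability(timeline: Sequence[Interval]) -> float:
    """Time-weighted average performance level over the observed period."""
    total_hours = sum(hours for hours, _ in timeline)
    if total_hours <= 0:
        raise ValueError("timeline must cover a positive amount of time")
    delivered = sum(hours * level for hours, level in timeline)
    return delivered / total_hours


day = [
    (20.0, 1.0),  # fault-free operation
    (3.0, 0.5),   # degraded: reconfigured to serve only critical services
    (1.0, 0.0),   # non-maskable failure: no service until recovery
]
print(f"performability: {performability(day):.3f}")  # 0.896
print(f"availability (any service): {sum(h for h, lvl in day if lvl > 0) / 24:.3f}")  # 0.958
```

Counting the degraded hours as fully available overstates the service delivered; the performability figure instead credits them at the reduced level.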
