Semipredicate problem

In computer programming, a semipredicate problem occurs when a subroutine intended to return a useful value can fail, but the signalling of failure uses an otherwise valid return value. The problem is that the caller of the subroutine cannot tell what the result means in this case.


Example

The division operation yields a real number, but fails when the divisor is zero. If we were to write a function that performs division, we might choose to return 0 on this invalid input. However, if the dividend is 0, the result is 0 too. This means that there is no number we can return to uniquely signal attempted division by zero, since all real numbers are in the range of division.
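
The ambiguity can be illustrated with a minimal Python sketch. The function name naive_divide is hypothetical and chosen only for this illustration; it follows the flawed convention described above of returning 0 for a zero divisor.

```python
def naive_divide(dividend: float, divisor: float) -> float:
    """Return the quotient, or 0 when the divisor is zero (the flawed convention)."""
    if divisor == 0:
        return 0.0  # failure is signalled with an otherwise valid result
    return dividend / divisor


failed = naive_divide(5, 0)      # invalid input
legitimate = naive_divide(0, 5)  # valid input whose true quotient is 0

# Both calls produce the same value, so the caller cannot tell them apart.
print(failed == legitimate)  # True
```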


Practical implications

Early programmers handled potentially exceptional cases such as division using a convention requiring the calling routine to verify the inputs before calling the division function. This had two problems: first, it greatly encumbered all code that performed division (a very common operation); second, it violated the don't repeat yourself (DRY) and encapsulation principles. The former calls for eliminating duplicated code, and the latter for keeping data-associated code in one place (in this division example, the verification of input was done separately). For a computation more complicated than division, it could be difficult for the caller to recognize invalid input; in some cases, determining input validity may be as costly as performing the entire computation. The target function could also be modified so that it expects different preconditions than the caller checks; such a modification would require changes in every place where the function was called.


Solutions

The semipredicate problem is not universal among functions that can fail.


Using a custom convention to interpret return values

If the range of a function does not cover the entire space corresponding to the data type of the function's return value, a value known to be impossible under normal computation can be used. For example, consider the function index, which takes a string and a substring, and returns the integer index of the substring in the main string. If the search fails, the function may be programmed to return −1 (or any other negative value), since this can never signify a successful result. This solution has its problems, though, as it overloads the natural meaning of a function with an arbitrary convention:

* The programmer must remember specific failure values for many functions, which of course cannot be identical if the functions have different ranges.
* A different implementation of the same function may choose to use a different failure value, resulting in possible bugs when programmers move from environment to environment.
* If the failing function wishes to communicate useful information about why it failed, one failure value is insufficient.
* A signed integer halves the possible index range to be able to store the sign bit.
* While the chosen value is an invalid result for this operation, it might be a valid input to follow-up operations. For example, in Python str.find returns −1 if the substring is not found, but −1 is a valid index (negative indices generally start from the end).
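
The last point can be sketched in Python. Only str.find is a real library call here; the helper names prefix_before and prefix_before_checked are hypothetical and introduced purely for illustration.

```python
from typing import Optional


def prefix_before(text: str, marker: str) -> str:
    """Return the part of text before marker, without checking for failure."""
    pos = text.find(marker)  # -1 when marker is absent
    return text[:pos]        # with pos == -1 this silently drops the last character


def prefix_before_checked(text: str, marker: str) -> Optional[str]:
    """Same operation, but the sentinel is tested before it is used as an index."""
    pos = text.find(marker)
    if pos == -1:
        return None
    return text[:pos]


print(prefix_before("key=value", "="))         # key
print(prefix_before("keyvalue", "="))          # keyvalu
print(prefix_before_checked("keyvalue", "="))  # None
```

In the unchecked version the failure value flows into the slice as if it were a legitimate index, so no error is raised and the wrong result is returned.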


Multivalued return

Many languages allow, through one mechanism or another, a function to return multiple values. If this is available, the function can be redesigned to return a boolean value signalling success or failure, along with its primary return value. If multiple error modes are possible, the function may instead return an enumerated return code (error code) along with its primary return value. Various techniques for returning multiple values include:

* Returning a tuple of values. This is conventional in languages (such as Python) that have a built-in tuple data type and special syntax for handling these: in Python, x, y = f() calls the function f returning a pair of values and assigns the elements of the pair to two variables.
* Secondary return values as in Common Lisp. All expressions have a primary value, but secondary values might be returned to interested callers. For example, the GETHASH function returns the value of the given key in an associative map, or a default value otherwise. However, it also returns a secondary boolean indicating whether the value was found, making it possible to distinguish between the "no value was found" and "the value found was equal to the default value" cases. This is different from returning a tuple, in that secondary return values are optional: if a caller does not care about them, it may ignore them completely, whereas tuple-valued returns are merely syntactic sugar for returning and unpacking a list, and every caller must always know about and consume all items returned.
* Languages with call by reference or equivalents, such as call by address using pointers, can allow for multivalued return by designating some parameters as output parameters. In this case, the function could just return the error value, with a variable intended to store the actual result being passed to the function. This is analogous to the use of an exit status to store an error code and of streams to return content.
* A variant of output parameters is used in object-oriented languages that use call by sharing, where a mutable object is passed to a function, and the object is mutated to return values.
* Logic programming languages such as Prolog have no return values. Instead, unbound logical variables are used as output parameters, to be unified with values constructed in a predicate call.
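
As a minimal sketch of the tuple technique, the division example can be rewritten in Python to return a success flag alongside the quotient. The name checked_divide and the choice of 0.0 as the placeholder quotient on failure are assumptions made for this illustration.

```python
from typing import Tuple


def checked_divide(dividend: float, divisor: float) -> Tuple[bool, float]:
    """Return (ok, quotient); quotient is only meaningful when ok is True."""
    if divisor == 0:
        return False, 0.0  # placeholder value, flagged as invalid
    return True, dividend / divisor


ok, quotient = checked_divide(0, 5)
print(ok, quotient)  # True 0.0

ok, quotient = checked_divide(5, 0)
print(ok, quotient)  # False 0.0
```

The two cases that were indistinguishable with a single return value now differ in the flag, at the cost that every caller has to unpack and inspect it.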


Global variable for return status

Similar to an "out" argument, a global variable can store what error occurred (or simply whether an error occurred). For instance, if an error occurs and is signalled (generally as above, by an illegal value like −1), the Unix errno variable is set to indicate which error occurred. Using a global has its usual drawbacks: thread safety becomes a concern (modern operating systems use a thread-safe version of errno), and if only one error global is used, its type must be wide enough to contain all interesting information about all possible errors in the system.


Exceptions

Exceptions are one widely used scheme for solving this problem. An error condition is not considered a return value of the function at all; normal control flow is disrupted, and explicit handling of the error takes place automatically. They are an example of out-of-band signalling.
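
Python's own division and string search show the approach: the / operator raises ZeroDivisionError, and str.index raises ValueError where str.find would return −1. The wrapper name describe_quotient below is hypothetical.

```python
def describe_quotient(dividend: float, divisor: float) -> str:
    """Format the quotient, handling the out-of-band failure separately."""
    try:
        return str(dividend / divisor)
    except ZeroDivisionError:
        return "undefined"


print(describe_quotient(0, 5))  # 0.0
print(describe_quotient(5, 0))  # undefined

try:
    "semipredicate".index("xyz")  # unlike str.find, this raises on failure
except ValueError:
    print("substring not found")
```

A quotient of zero and a failed division now take different paths through the caller, so no return value has to be reserved for the error.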


Expanding the return value type


Manually created hybrid types

In C, a common approach, when possible, is to use a data type deliberately wider than strictly needed by the function. For example, the standard function getchar() is defined with return type int and returns a value in the range [0, 255] (the range of unsigned char) on success, or the value EOF (implementation-defined, but outside the range of unsigned char) on the end of the input or a read error.
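
The same idea can be imitated in Python, here as a sketch rather than a model of the C library: get_byte and the value −1 chosen for EOF are assumptions of this illustration. Every byte value from 0 to 255 remains a legitimate result because the failure value lies outside that range.

```python
import io

EOF = -1  # assumed sentinel for this sketch, outside the byte range 0..255


def get_byte(stream: io.BufferedIOBase) -> int:
    """Return the next byte as an int in 0..255, or EOF at end of input."""
    data = stream.read(1)
    if not data:
        return EOF
    return data[0]


stream = io.BytesIO(b"\xffA")
print(get_byte(stream))  # 255, a valid byte that is still distinct from EOF
print(get_byte(stream))  # 65
print(get_byte(stream))  # -1
```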


Nullable reference types

In languages with pointers or references, one solution is to return a pointer to a value, rather than the value itself. This return pointer can then be set to null to indicate an error. It is typically suited to functions that return a pointer anyway. This has a performance advantage over the OOP style of exception handling (see "Why Exceptions should be Exceptional", an example of a performance comparison), with the drawback that negligent programmers may not check the return value, resulting in a crash when the invalid pointer is used. Whether a pointer is null or not is another example of the semipredicate problem; null may be a flag indicating failure or the value of a pointer returned successfully. A common pattern in the UNIX environment is setting a separate variable to indicate the cause of an error. An example of this is the C standard library fopen() function.
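
Python's None plays a similar role to a null pointer, and dict.get shows how the ambiguity can reappear. The lookup helper and the nicknames table below are hypothetical and serve only as an illustration.

```python
from typing import Dict, Optional


def lookup(table: Dict[str, Optional[str]], key: str) -> Optional[str]:
    """Return the stored value, or None when the key is missing."""
    return table.get(key)


# "alan" is present, but his stored value is legitimately None.
nicknames: Dict[str, Optional[str]] = {"ada": "countess", "alan": None}

print(lookup(nicknames, "alan"))   # None (found, value is None)
print(lookup(nicknames, "grace"))  # None (not found)

# A separate membership test is needed to tell the two cases apart.
print("alan" in nicknames, "grace" in nicknames)  # True False
```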


Implicitly hybrid types

In dynamically typed languages, such as PHP and Lisp, the usual approach is to return false, none, or null when the function call fails. This works by returning a type different from the normal return type (thus expanding the type). It is a dynamically typed equivalent to returning a null pointer. For example, a numeric function normally returns a number (int or float), and while zero might be a valid response, false is not. Similarly, a function that normally returns a string might sometimes return the empty string as a valid response, but return false on failure. This process of type-juggling necessitates care in testing the return value: e.g., in PHP, use === (i.e., equal and of same type) rather than just == (i.e., equal, after automatic type conversion). It works only when the original function is not meant to return a boolean value, and still requires that information about the error be conveyed via other means.
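
Python has a comparable pitfall, which makes for a short sketch: because bool is a subclass of int, 0 == False evaluates to True, so the identity test `is False` is the analogue of the strict comparison. The function find_position is hypothetical and returns False on failure only to mirror the convention described above.

```python
from typing import Union


def find_position(text: str, sub: str) -> Union[int, bool]:
    """Return the index of sub in text, or False when it is absent."""
    pos = text.find(sub)
    return False if pos == -1 else pos


result = find_position("semipredicate", "semi")  # found at index 0

print(result == False)  # True: loose equality mistakes index 0 for failure
print(result is False)  # False: the identity test separates 0 from False

print(find_position("semipredicate", "xyz") is False)  # True
```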


Explicitly hybrid types

In Haskell and other functional programming languages, it is common to use a data type that is just as big as it needs to be to express any possible result. For example, one can write a division function that returns the type Maybe Real, and a getchar function returning Either String Char. The first is an option type, which has only one failure value, Nothing. The second case is a tagged union: a result is either some string with a descriptive error message or a successfully read character. Haskell's type inference system helps ensure that callers deal with possible errors. Since the error conditions become explicit in the function type, looking at its signature immediately tells the programmer how to treat errors. Further, tagged unions and option types form monads when endowed with appropriate functions: this may be used to keep the code tidy by automatically propagating unhandled error conditions.
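
The shape of such a tagged union, and the propagation of errors through chained calls, can be approximated in Python. This is a sketch under stated assumptions, not Haskell's Either: the classes Ok and Err and the helper and_then are hypothetical names, and Python checks the types only through optional static analysis.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Ok:
    value: float


@dataclass(frozen=True)
class Err:
    message: str


Result = Union[Ok, Err]


def divide(dividend: float, divisor: float) -> Result:
    """Return Ok(quotient), or Err with a descriptive message."""
    if divisor == 0:
        return Err("division by zero")
    return Ok(dividend / divisor)


def and_then(result: Result, step: Callable[[float], Result]) -> Result:
    """Apply step to a successful value; pass an error through unchanged."""
    if isinstance(result, Err):
        return result
    return step(result.value)


print(divide(0, 5))                                       # Ok(value=0.0)
print(and_then(divide(8, 2), lambda q: divide(q, 2)))     # Ok(value=2.0)
print(and_then(divide(1, 0), lambda q: divide(q, 2)))     # Err(message='division by zero')
```

A zero quotient and a failed division are now different kinds of value, and and_then plays the role of the monadic bind mentioned above by skipping later steps once an error has occurred.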


Example

Rust has algebraic data types and comes with the built-in Result and Option types.

    fn find(key: String) -> Option

The C++ programming language introduced std::optional in the C++17 update,

    std::optional find_int_in_str(std::string_view str)

and std::expected in the C++23 update.

    enum class parse_error ;
    std::expected parse_number(std::string_view str)


See also

* Null-terminated string
* Nullable type
* Option type
* Sentinel value
* Tagged union

