HOME

TheInfoList



OR:

A scanf format string (''scan f''ormatted) is a control parameter used in various functions to specify the layout of an input
string String or strings may refer to: * String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian ani ...
. The functions can then divide the string and translate into values of appropriate
data type In computer science and computer programming, a data type (or simply type) is a set of possible values and a set of allowed operations on it. A data type tells the compiler or interpreter how the programmer intends to use the data. Most prog ...
s. String scanning functions are often supplied in standard
libraries A library is a collection of materials, books or media that are accessible for use and not just for display purposes. A library provides physical (hard copies) or digital access (soft copies) materials, and may be a physical location or a vi ...
.Scanf is a function that reads formatted data from the standard input string, which is usually the keyboard and writes the results whenever called in the specified arguments. The term "scanf" comes from the C library, which popularized this type of function, but such functions predate C, and other names are used, such as readf in
ALGOL 68 ALGOL 68 (short for ''Algorithmic Language 1968'') is an imperative programming language that was conceived as a successor to the ALGOL 60 programming language, designed with the goal of a much wider scope of application and more rigorously de ...
. scanf format strings, which provide formatted input (
parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
), are complementary to
printf format string The printf format string is a control parameter used by a class of functions in the input/output libraries of C and many other programming languages. The string is written in a simple template language: characters are usually copied litera ...
s, which provide formatted output ( templating). These provide simple functionality and fixed format compared to more sophisticated and flexible parsers or template engines, but are sufficient for many purposes.


History

Mike Lesk Michael E. Lesk (born 1945) is an American computer scientist. Biography In the 1960s, Michael Lesk worked for the SMART Information Retrieval System project, wrote much of its retrieval code and did many of the retrieval experiments, as well a ...
's portable input/output library, including scanf, officially became part of Unix in Version 7.


Usage

The scanf function, which is found in C, reads input for numbers and other
datatype In computer science and computer programming, a data type (or simply type) is a set of possible values and a set of allowed operations on it. A data type tells the compiler or interpreter how the programmer intends to use the data. Most progra ...
s from standard input (often a
command line interface A command-line interpreter or command-line processor uses a command-line interface (CLI) to receive command (computing), commands from a user in the form of lines of text. This provides a means of setting parameters for the environment, invokin ...
or similar kind of a
text user interface In computing, text-based user interfaces (TUI) (alternately terminal user interfaces, to reflect a dependence upon the properties of computer terminals and not just text), is a retronym describing a type of user interface (UI) common as an ea ...
). The following C code reads a variable number of unformatted decimal
integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign ( −1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the languag ...
s from the standard input stream and prints each of them out on separate lines: #include int main(void) After being processed by the program above, an irregularly spaced list of integers such as 456 123 789 456 12 456 1 2378 will appear consistently spaced as: 456 123 789 456 12 456 1 2378 To print out a word: #include int main(void) No matter what the data type the programmer wants the program to read, the arguments (such as &n above) must be pointers pointing to memory. Otherwise, the function will not perform correctly because it will be attempting to overwrite the wrong sections of memory, rather than pointing to the memory location of the variable you are attempting to get input for. In the last example an address-of operator (&) is ''not'' used for the argument: as word is the name of an
array An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
of char, as such it is (in all contexts in which it evaluates to an address) equivalent to a pointer to the first element of the array. While the expression &word would numerically evaluate to the same value, semantically, it has an entirely different meaning in that it stands for the address of the whole array rather than an element of it. This fact needs to be kept in mind when assigning scanf output to strings. As scanf is designated to read only from standard input, many programming languages with interfaces, such as
PHP PHP is a general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by The PHP Group. ...
, have derivatives such as sscanf and fscanf but not scanf itself.


Format string specifications

The formatting placeholders in scanf are more or less the same as that in printf, its reverse function. As in printf, the POSIX extension is defined. There are rarely constants (i.e., characters that are not formatting placeholders) in a format string, mainly because a program is usually not designed to read known data, although scanf does accept these if explicitly specified. The exception is one or more
whitespace White space or whitespace may refer to: Technology * Whitespace characters, characters in computing that represent horizontal or vertical space * White spaces (radio), allocated but locally unused radio frequencies * TV White Space Database, a mec ...
characters, which discards all whitespace characters in the input. Some of the most commonly used placeholders follow: * %a : Scan a floating-point number in its hexadecimal notation. * %d : Scan an integer as a signed
decimal The decimal numeral system (also called the base-ten positional numeral system and denary or decanary) is the standard system for denoting integer and non-integer numbers. It is the extension to non-integer numbers of the Hindu–Arabic numer ...
number. * %i : Scan an integer as a signed number. Similar to %d, but interprets the number as
hexadecimal In mathematics and computing, the hexadecimal (also base-16 or simply hex) numeral system is a positional numeral system that represents numbers using a radix (base) of 16. Unlike the decimal system representing numbers using 10 symbols, hex ...
when preceded by 0x and
octal The octal numeral system, or oct for short, is the base-8 number system, and uses the digits 0 to 7. This is to say that 10octal represents eight and 100octal represents sixty-four. However, English, like most languages, uses a base-10 number ...
when preceded by 0. For example, the string 031 would be read as 31 using %d, and 25 using %i. The flag h in %hi indicates conversion to a short and hh conversion to a char. * %u : Scan for decimal unsigned int (Note that in the C99 standard the input value minus sign is optional, so if a minus sign is read, no errors will arise and the result will be the
two's complement Two's complement is a mathematical operation to reversibly convert a positive binary number into a negative binary number with equivalent (but negative) value, using the binary digit with the greatest place value (the leftmost bit in big- endian ...
of a negative number, likely a very large value. See strtoul().) Correspondingly, %hu scans for an unsigned short and %hhu for an unsigned char. * %f : Scan a
floating-point In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can be r ...
number in normal ( fixed-point) notation. * %g, %G : Scan a floating-point number in either normal or exponential notation. %g uses lower-case letters and %G uses upper-case. * %x, %X : Scan an integer as an unsigned
hexadecimal In mathematics and computing, the hexadecimal (also base-16 or simply hex) numeral system is a positional numeral system that represents numbers using a radix (base) of 16. Unlike the decimal system representing numbers using 10 symbols, hex ...
number. * %o : Scan an integer as an
octal The octal numeral system, or oct for short, is the base-8 number system, and uses the digits 0 to 7. This is to say that 10octal represents eight and 100octal represents sixty-four. However, English, like most languages, uses a base-10 number ...
number. * %s : Scan a
character string In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). ...
. The scan terminates at
whitespace White space or whitespace may refer to: Technology * Whitespace characters, characters in computing that represent horizontal or vertical space * White spaces (radio), allocated but locally unused radio frequencies * TV White Space Database, a mec ...
. A
null character The null character (also null terminator) is a control character with the value zero. It is present in many character sets, including those defined by the Baudot and ITA2 codes, ISO/IEC 646 (or ASCII), the C0 control code, the Universal Coded Ch ...
is stored at the end of the string, which means that the buffer supplied must be at least one character longer than the specified input length. * %c : Scan a character (char). No
null character The null character (also null terminator) is a control character with the value zero. It is present in many character sets, including those defined by the Baudot and ITA2 codes, ISO/IEC 646 (or ASCII), the C0 control code, the Universal Coded Ch ...
is added. *
whitespace White space or whitespace may refer to: Technology * Whitespace characters, characters in computing that represent horizontal or vertical space * White spaces (radio), allocated but locally unused radio frequencies * TV White Space Database, a mec ...
: Any whitespace characters trigger a scan for zero or more
whitespace White space or whitespace may refer to: Technology * Whitespace characters, characters in computing that represent horizontal or vertical space * White spaces (radio), allocated but locally unused radio frequencies * TV White Space Database, a mec ...
characters. The number and type of whitespace characters do not need to match in either direction. * %lf : Scan as a double floating-point number. "Float" format with the "long" specifier. * %Lf : Scan as a long double floating-point number. "Float" format the "long long" specifier. * %n : Nothing is expected. The number of characters consumed thus far from the input is stored through the next pointer, which must be a pointer to int. This is not a conversion and does not increase the count returned by the function. The above can be used in compound with numeric modifiers and the l, L modifiers which stand for "long" and "long long" in between the percent symbol and the letter. There can also be numeric values between the percent symbol and the letters, preceding the long modifiers if any, that specifies the number of characters to be scanned. An optional
asterisk The asterisk ( ), from Late Latin , from Ancient Greek , ''asteriskos'', "little star", is a typographical symbol. It is so called because it resembles a conventional image of a heraldic star. Computer scientists and mathematicians often vo ...
(*) right after the percent symbol denotes that the datum read by this format specifier is not to be stored in a variable. No argument behind the format string should be included for this dropped variable. The ff modifier in printf is not present in scanf, causing differences between modes of input and output. The ll and hh modifiers are not present in the C90 standard, but are present in the C99 standard. An example of a format string is :"%7d%s %c%lf" The above format string scans the first seven characters as a decimal integer, then reads the remaining as a string until a space, newline, or tab is found, then consumes whitespace until the first non-whitespace character is found, then consumes that character, and finally scans the remaining characters as a double. Therefore, a robust program must check whether the scanf call succeeded and take appropriate action. If the input was not in the correct format, the erroneous data will still be on the input stream and must discarded before new input can be read. An alternative method, which avoids this, is to use fgets and then examine the string read in. The last step can be done by sscanf, for example. In the case of the many float type characters , many implementations choose to collapse most into the same parser. Microsoft MSVCRT does it with , while
glibc The GNU C Library, commonly known as glibc, is the GNU Project's implementation of the C standard library. Despite its name, it now also directly supports C++ (and, indirectly, other programming languages). It was started in the 1980s by ...
does so with all four.


Vulnerabilities

scanf is vulnerable to
format string attack Uncontrolled format string is a type of software vulnerability discovered around 1989 that can be used in security exploits. Originally thought harmless, format string exploits can be used to crash a program or to execute harmful code. The proble ...
s. Great care should be taken to ensure that the formatting string includes limitations for string and array sizes. In most cases the input string size from a user is arbitrary and cannot be determined before the scanf function is executed. This means that %s placeholders without length specifiers are inherently insecure and exploitable for
buffer overflows In information security and programming, a buffer overflow, or buffer overrun, is an anomaly whereby a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory locations. Buffers are areas of mem ...
. Another potential problem is to allow dynamic formatting strings, for example formatting strings stored in configuration files or other user-controlled files. In this case the allowed input length of string sizes cannot be specified unless the formatting string is checked beforehand and limitations are enforced. Related to this are additional or mismatched formatting placeholders which do not match the actual vararg list. These placeholders might be partially extracted from the stack or contain undesirable or even insecure pointers, depending on the particular implementation of varargs.


See also

*
C programming language ''The C Programming Language'' (sometimes termed ''K&R'', after its authors' initials) is a computer programming book written by Brian Kernighan and Dennis Ritchie, the latter of whom originally designed and implemented the language, as well a ...
*
Format string attack Uncontrolled format string is a type of software vulnerability discovered around 1989 that can be used in security exploits. Originally thought harmless, format string exploits can be used to crash a program or to execute harmful code. The proble ...
*
Printf format string The printf format string is a control parameter used by a class of functions in the input/output libraries of C and many other programming languages. The string is written in a simple template language: characters are usually copied litera ...
*
String interpolation In computer programming, string interpolation (or variable interpolation, variable substitution, or variable expansion) is the process of evaluating a string literal containing one or more placeholders, yielding a result in which the placeholders ...


References


External links

*
C++ reference for std::scanf
{{CProLang Articles with example C code C standard library