C character classification is an operation provided by a group of functions in the
ANSI C Standard Library for the
C programming language
''The C Programming Language'' (sometimes termed ''K&R'', after its authors' initials) is a computer programming book written by Brian Kernighan and Dennis Ritchie, the latter of whom originally designed and implemented the language, as well as ...
. These functions are used to test characters for membership in a particular class of characters, such as alphabetic characters, control characters, etc. Both single-byte, and wide characters are supported.
History
Early C-language programmers working on the
Unix
Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
operating system developed
programming idioms for classifying characters into different types. For example, for the
ASCII
ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
character set, the following expression identifies a letter, when its value is ''true'':
('A' <= c && c <= 'Z') , , ('a' <= c && c <= 'z')
As this may be expressed in multiple formulations, it became desirable to introduce short, standardized forms of such tests that were placed in the system-wide header file ''ctype.h''.
Implementation
Unlike the above example, the character classification routines are not written as comparison tests. In most C libraries, they are written as static table lookups instead of macros or functions.
For example, an array of 256 eight-bit integers, arranged as bitfields, is created, where each bit corresponds to a particular property of the character, e.g., isdigit, isalpha. If the lowest-order bit of the integers corresponds to the isdigit property, the code could be written as
#define isdigit(x) (TABLE & 1)
Early versions of
Linux
Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which i ...
used a potentially faulty method similar to the first code sample:
#define isdigit(x) ((x) >= '0' && (x) <= '9')
This can cause problems if when the macro expands, the expression substituted for ''x'' has a
side effect
In medicine, a side effect is an effect, whether therapeutic or adverse, that is secondary to the one intended; although the term is predominantly employed to describe adverse effects, it can also apply to beneficial, but unintended, consequence ...
. For example, if one calls ''isdigit(x++)'' or ''isdigit(run_some_program())''. It is not immediately evident that the argument to ''isdigit'' is evaluated twice. For this reason, the table-based approach is generally used.
Overview of functions
The functions that operate on single-byte characters are defined in ''ctype.h'' header file (''cctype'' in C++).
The functions that operate on wide characters are defined in ''wctype.h'' header file (''cwctype'' in C++).
The classification is evaluated according to the effective locale.
References
External links
C standard library
{{Use dmy dates, date=October 2017