
In computer data, a substitute character (␚) is a control character that is used to pad transmitted data in order to send it in blocks of fixed size, or to stand in place of a character that is recognized to be invalid, erroneous or unrepresentable on a given device. It is also used as an escape sequence in some programming languages. In the ASCII character set, this character is encoded by the number 26 (1A hex). Standard keyboards transmit this code when the Ctrl and Z keys are pressed simultaneously (Ctrl+Z, often documented by convention as ^Z).

Unicode inherits this character from ASCII, but recommends that the replacement character (�, U+FFFD) be used instead to represent undecodable inputs, when the output encoding is compatible with it.
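The two substitution strategies described above can be sketched in Python. This is a minimal illustration, not a standard API: the handler name "sub" and the helper function names are made up for the example, and it assumes the target encoding is plain ASCII, which can always represent SUB itself.

```python
import codecs

SUB = "\x1a"  # U+001A SUBSTITUTE


def _substitute_on_error(exc):
    """Error handler: emit one SUB per character the target encoding lacks."""
    if isinstance(exc, UnicodeEncodeError):
        # exc.start..exc.end covers the run of unencodable characters.
        return (SUB * (exc.end - exc.start), exc.end)
    raise exc


# "sub" is a hypothetical handler name chosen for this sketch, not a built-in.
codecs.register_error("sub", _substitute_on_error)


def to_ascii_with_sub(text: str) -> bytes:
    """Encode to ASCII, standing SUB in for unrepresentable characters."""
    return text.encode("ascii", errors="sub")


def decode_with_replacement(data: bytes) -> str:
    """Decode UTF-8, using U+FFFD for undecodable input (built-in handler)."""
    return data.decode("utf-8", errors="replace")
```

For instance, `to_ascii_with_sub("na\u00efve")` yields the bytes `na`, one SUB byte, then `ve`, while `decode_with_replacement(b"caf\xff")` ends in U+FFFD rather than SUB, in line with the Unicode recommendation.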


Uses


End of file

Historically, under the PDP-6 monitor, RT-11, VMS, and TOPS-10, and in the early PC operating systems CP/M 1 and 2 (and derivatives like MP/M), it was necessary to explicitly mark the end of a file (EOF) because the native filesystem could not record the exact file size by itself; files were allocated in extents (records) of a fixed size, typically leaving some allocated but unused space at the end of each file. Under CP/M, this extra space was filled with 1A (hex) characters. The extended CP/M filesystems used by CP/M 3 and higher (and derivatives like Concurrent CP/M, Concurrent DOS, and DOS Plus) did support byte-granular files, so this was no longer a requirement, but it remained as a convention (especially for text files) in order to ensure backward compatibility.

In CP/M, 86-DOS, MS-DOS, PC DOS, DR-DOS, and their various derivatives, the SUB character was also used to indicate the end of a character stream, and thereby used to terminate user input in an interactive command line window (and, as such, was often used to finish console input redirection).

While no longer technically required to indicate the end of a file, as of 2017, many text editors and programming languages still support this convention, can be configured to insert this character at the end of a file when editing, or at least cope properly with it in text files. In such cases, it is often termed a "soft" EOF, as it does not necessarily represent the physical end of the file, but is more a marker indicating that "there is no useful data beyond this point". In reality, more data may exist beyond this character, up to the actual end of the data in the file system; it can therefore be used to hide file content when the file is displayed at the console or opened in editors. Many file format standards (e.g. PNG or GIF) include the SUB character in their headers to perform precisely this function. Some modern text file formats (e.g. CSV-1203) still recommend that a trailing EOF character be appended as the last character in the file. However, typing Ctrl+Z does not embed an EOF character into a file in either DOS or Windows, nor do the APIs of those systems use the character to denote the actual end of a file.

Some programming languages (e.g. Visual Basic) will not read past a "soft" EOF when using the built-in text file reading primitives (INPUT, LINE INPUT etc.), and alternate methods must be adopted, e.g. opening the file in binary mode or using the File System Object to progress beyond it.

Character 26 was used to mark "End of file" even though ASCII calls this character Substitute and has other characters to indicate "End of file". Number 28, which is called "File Separator", has also been used for similar purposes.
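The record padding and the "soft" EOF convention described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the 128-byte record size is an assumption modelled on CP/M 2 rather than something every system listed above shares.

```python
SUB = b"\x1a"      # the SUB / Ctrl+Z byte
RECORD_SIZE = 128  # assumption: CP/M 2 style 128-byte records


def pad_to_records(data: bytes, record_size: int = RECORD_SIZE) -> bytes:
    """Fill the unused tail of the last record with SUB bytes."""
    remainder = len(data) % record_size
    if remainder == 0:
        # The data already ends on a record boundary: nothing to fill.
        return data
    return data + SUB * (record_size - remainder)


def strip_soft_eof(data: bytes) -> bytes:
    """Return only the bytes before the first SUB, as a text-mode reader would."""
    head, _sep, _tail = data.partition(SUB)
    return head


# First six bytes of the PNG signature, followed by its SUB byte.
PNG_SIGNATURE_PREFIX = b"\x89PNG\r\n" + SUB
```

A reader that honours the soft EOF stops at the first SUB, so `strip_soft_eof(b"visible\x1ahidden")` returns only `b"visible"`, and a PNG file treated as text ends right after the first six signature bytes. This only makes sense for text: in binary data the byte 1A is as likely as any other, so the real file size has to be used instead.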


Other uses

In Unix-like operating systems, this character is typically used in shells as a way for the user to suspend the currently executing interactive process. The suspended process can then be resumed in ''foreground'' (interactive) mode, be made to resume execution in ''background'' mode, or be terminated. When the character is entered by a user at their computer terminal, the currently running foreground process is sent a "terminal stop" (SIGTSTP) signal, which generally causes the process to suspend its execution. The user can later continue the process execution by using the "foreground" command (fg) or the "background" command (bg).

The Unicode Security Considerations report recommends this character as a safe replacement for unmappable characters during character set conversion.

In many GUIs and applications, Ctrl+Z (Cmd+Z on macOS) can be used to undo the last action. In many applications, actions earlier than the last one can also be undone by pressing Ctrl+Z multiple times. Ctrl+Z was one of a handful of keyboard sequences chosen by the program designers at Xerox PARC to control text editing.
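On Unix-like systems, the key that triggers the "terminal stop" signal is a terminal setting rather than a fixed property of the SUB character. The following sketch, which assumes a POSIX system with Python's `termios` module, looks up the suspend key of a terminal and renders it in caret notation; the helper names `caret` and `suspend_key` are made up for the example.

```python
import sys
import termios


def caret(byte: bytes) -> str:
    """Render a single control byte in caret notation, e.g. b'\\x1a' -> '^Z'."""
    # Flipping bit 0x40 maps 0x01..0x1A onto the letters 'A'..'Z'.
    return "^" + chr(byte[0] ^ 0x40)


def suspend_key(fd: int) -> bytes:
    """Return the byte that the terminal on `fd` treats as the suspend key."""
    control_chars = termios.tcgetattr(fd)[6]  # index 6: control-character list
    return control_chars[termios.VSUSP]


if __name__ == "__main__" and sys.stdin.isatty():
    # Only meaningful when standard input is a real terminal.
    print("suspend key:", caret(suspend_key(sys.stdin.fileno())))
```

On a terminal with default settings this would normally report ^Z, but the user can rebind or disable the key, in which case a different byte is returned.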


Representation

ASCII and Unicode representation of "substitute":
* Octal code: 32
* Decimal code: 26
* Hexadecimal code: 1A, U+001A
* Mnemonic symbol: SUB
* Binary value: 11010
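The values listed above can be checked with a short Python sketch. The function names are hypothetical, and `ctrl_code` assumes the usual terminal convention that holding Ctrl keeps only the low five bits of the letter's code.

```python
import unicodedata

SUB = "\x1a"  # U+001A


def representations(ch: str) -> dict:
    """Return the common numeric spellings of a single character."""
    cp = ord(ch)
    return {
        "octal": format(cp, "o"),
        "decimal": str(cp),
        "hex": format(cp, "X"),
        "binary": format(cp, "b"),
        "unicode": f"U+{cp:04X}",
    }


def ctrl_code(letter: str) -> int:
    """Code sent by Ctrl plus a letter: keep only the low five bits."""
    return ord(letter.upper()) & 0x1F


def control_picture(ch: str) -> str:
    """Visible stand-in for a C0 control from the Control Pictures block."""
    return chr(0x2400 + ord(ch))
```

Here `representations(SUB)` reproduces the list above, `ctrl_code("Z")` gives 26, and `unicodedata.name(control_picture(SUB))` is `SYMBOL FOR SUBSTITUTE`, the ␚ glyph used at the top of this article.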


See also

* C0 and C1 control codes (ISO 646)
* U+FFFD (Unicode replacement character �)
* Access key
* Control-C
* Control-G
* Control-V
* Control-X
* Control-\
* Keyboard shortcut
* List of file signatures
* A symbol (sometimes called by the slang term ''tofu'') used to represent a missing character
** Noto fonts, a Google project to eliminate missing characters




Further reading

* Federal Standard 1037C