In computer language design, stropping is a method of explicitly marking letter sequences as having a special property, such as being a keyword or a certain type of variable or storage location, and thus inhabiting a different namespace from ordinary names ("identifiers"), in order to avoid clashes. Stropping is not used in most modern languages; instead, keywords are reserved words and cannot be used as identifiers. Stropping allows the same letter sequence to be used both as a keyword and as an identifier, and simplifies parsing in that case: for example, it allows a variable named if without clashing with the keyword if.
Stropping is primarily associated with ALGOL and related languages in the 1960s. Though it finds some modern use, it is easily confused with other techniques that are superficially similar.
History
The method of stropping and the term "stropping" arose in the development of ALGOL in the 1960s, where it was used to represent typographical distinctions (boldface and underline) found in the publication language which could not directly be represented in the hardware language: a typewriter could have bold characters, but punch-card encodings had no bold characters. The term "stropping" arose in ALGOL 60, from "apostrophe", as some implementations of ALGOL 60 used apostrophes around text to indicate boldface, such as 'if' to represent the keyword if. Stropping is also important in ALGOL 68, where multiple methods of stropping, known as "stropping regimes", are used; the original matched apostrophes from ALGOL 60 were not widely used, with a leading period or uppercase being more common, as in .IF or IF, and the term "stropping" was applied to all of these.
Syntaxes
A range of different syntaxes for stropping has been used:
* Algol 60 implementations commonly used only the convention of single quotes around the word, generally as apostrophes, whence the name "stropping" (e.g. 'BEGIN').
* Some implementations of Algol 68 treat letter sequences prefixed by a single quote, ', as keywords (e.g., 'BEGIN).
In fact, several stropping conventions were often in use within one language. For example, in ALGOL 68 the choice of stropping convention can be specified by a compiler directive (in ALGOL terminology, a "pragmat"), namely POINT, UPPER, QUOTE, or RES:
* POINT for 6-bit character sets (not enough characters for lowercase), as in .FOR; a similar convention is used in FORTRAN 77, where logical keywords are stropped as .EQ. etc. (see below)
* UPPER for 7-bit character sets, as in FOR, with lowercase used for ordinary identifiers
* QUOTE as in ALGOL 60, as in 'for'
* RES for reserved words, as used in modern languages: for is reserved and not available to ordinary identifiers
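As a rough sketch of how the four regimes differ, the hypothetical Python function classify below decides whether a single letter sequence is a keyword or an identifier under each regime. The RESERVED set is an assumed, abbreviated keyword list, and the code illustrates the idea only; it is not taken from any ALGOL 68 compiler.

```python
# Hypothetical, abbreviated keyword list; a real ALGOL 68 compiler knows many more.
RESERVED = {"begin", "end", "for", "from", "to", "do", "od", "if", "then", "else", "fi"}


def classify(word, regime):
    """Return (category, value) for one letter sequence under a stropping regime.

    Keyword values are normalized to lowercase so the same keyword compares
    equal whichever regime it was written in.
    """
    if regime == "POINT":          # leading period marks a keyword: .FOR
        if len(word) > 1 and word[0] == "." and word[1:].isalpha():
            return ("Keyword", word[1:].lower())
    elif regime == "UPPER":        # an all-uppercase word is a keyword: FOR
        if word.isalpha() and word.isupper():
            return ("Keyword", word.lower())
    elif regime == "QUOTE":        # matched apostrophes, as in ALGOL 60: 'for'
        if len(word) > 2 and word[0] == "'" and word[-1] == "'":
            return ("Keyword", word[1:-1].lower())
    elif regime == "RES":          # no marking at all: the word itself is reserved
        if word in RESERVED:
            return ("Keyword", word)
    else:
        raise ValueError(f"unknown regime: {regime!r}")
    return ("Identifier", word)


for regime, words in [("POINT", [".FOR", "I"]),
                      ("UPPER", ["FOR", "for"]),
                      ("QUOTE", ["'for'", "for"]),
                      ("RES", ["for", "i"])]:
    print(regime, [classify(w, regime) for w in words])
```

Under UPPER and QUOTE, the letter sequence for yields both a keyword and an identifier, which is the point of stropping. Under RES it can only ever be the keyword.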
The various regimes are a lexical specification for stropped characters, though in some cases these have simple interpretations: in the single apostrophe and dot regimes, the first character functions as an escape character, while in the matched apostrophes regime the apostrophes function as delimiters, as in string literals.
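The difference between an escape character and a pair of delimiters shows up directly in a scanner. In the minimal Python sketch below (an illustration, not code from any historical compiler; the pattern names are hypothetical), the escape-style pattern lets one leading apostrophe claim the run of letters after it, while the delimiter-style pattern takes everything up to the closing apostrophe, as a string-literal rule would.

```python
import re

# Escape style: one leading apostrophe applies to the letters that follow it.
ESCAPE_STYLE = re.compile(r"'([A-Za-z]+)|([A-Za-z][A-Za-z0-9]*)")
# Delimiter style: the keyword is whatever lies between a pair of apostrophes,
# just as a string literal is whatever lies between a pair of quotes.
DELIMITER_STYLE = re.compile(r"'([^']*)'|([A-Za-z][A-Za-z0-9]*)")


def scan(source, pattern):
    """Split source into (category, value) tokens; unmatched characters are skipped."""
    tokens = []
    for match in pattern.finditer(source):
        stropped, plain = match.groups()
        if stropped is not None:
            tokens.append(("Keyword", stropped))
        else:
            tokens.append(("Identifier", plain))
    return tokens


print(scan("'if x 'then y", ESCAPE_STYLE))
print(scan("'if' x 'then' y", DELIMITER_STYLE))
```

Both calls yield the same token stream; only the lexical rule that finds the end of a stropped word differs.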
Other examples:
* Atlas Autocode offered a choice of three conventions: keywords could be underlined using backspace and overstrike on a Flexowriter keyboard, they could be introduced by a %percent %symbol, or they could be typed in UPPER CASE with no delimiting character ("uppercasedelimiters" mode, in which case all variables had to be in lower case).
* Algol 60 on the Elliott 803 and Elliott 503 computers used underlining. The Flexowriters (producing punched paper tape) had a non-movement key (underline _), so that typing _b_e_g_i_n produced an underlined begin, which was very readable. The vertical bar | was also a non-movement key, so that typing |= produced a good approximation to ≠.
* The Kidsgrove compiler for Algol 60 on the English Electric KDF9 appears to have used at least two other stropping conventions in addition to quotation marks: exclamation marks and percent characters.
* ALGOL 68RS programs are allowed to use several stropping variants, even within the one language processor.
* Edinburgh IMP inherited the Atlas Autocode %percent %symbol prefix convention but not its other stropping options.
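The underline convention on the Elliott machines relied on overstriking, so the tape presumably held the key sequence _b_e_g_i_n rather than any "underlined" character. The hedged Python sketch below shows how such text could be regrouped into tokens. It assumes that each _ applies to exactly the next character and that the overstruck pair |= always means ≠. The function names are hypothetical, and for brevity only letters form words.

```python
def columns(tape):
    """Merge each non-movement key with the character typed after it.

    Returns a list of (character, underlined) pairs, one per printed column.
    """
    cols = []
    i = 0
    while i < len(tape):
        if tape[i] == "_" and i + 1 < len(tape):
            cols.append((tape[i + 1], True))   # underscore overstruck on a letter
            i += 2
        elif tape.startswith("|=", i):
            cols.append(("≠", False))          # bar overstruck on an equals sign
            i += 2
        else:
            cols.append((tape[i], False))
            i += 1
    return cols


def tokenize(tape):
    """Group runs of underlined letters into keywords, plain runs into identifiers."""
    tokens = []
    run, run_underlined = "", False
    # A trailing space acts as a sentinel that flushes the final run.
    for ch, underlined in columns(tape) + [(" ", False)]:
        if ch.isalpha() and (not run or underlined == run_underlined):
            run += ch
            run_underlined = underlined
            continue
        if run:
            tokens.append(("Keyword" if run_underlined else "Identifier", run))
            run = ""
        if ch.isalpha():
            run, run_underlined = ch, underlined
        elif not ch.isspace():
            tokens.append(("Symbol", ch))
    return tokens


print(tokenize("_i_f a|=b _t_h_e_n"))
```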
Examples of different ALGOL 68 styles
Note the leading pr (abbreviation of pragmat) directive, which is itself stropped in POINT or quote style, and the for comment (from "") – see ALGOL 68: pr & co: Pragmats and Comments for details.
Other languages
For various reasons, Fortran 77 has these "logical" values and operators: .TRUE., .FALSE., .EQ., .NE., .LT., .LE., .GT., .GE., .EQV., .NEQV., .OR., .AND., .NOT.
.AND., .OR. and .XOR. are also used in combined tests in IF and IFF statements in batch files run under JP Software's command-line processors such as 4DOS, 4OS2, and 4NT / Take Command.
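Fortran's dotted names close with a second period, so they behave more like matched delimiters than like the single leading period of ALGOL 68's POINT regime. The Python sketch below is a deliberately simplified scanner. It recognizes only the Fortran 77 dotted names listed above, identifiers, and unsigned integers. It ignores real-number literals, whose periods a real Fortran scanner must tell apart from dotted operators, and it ignores Fortran's insignificant blanks.

```python
import re

# The Fortran 77 dotted names listed above.
DOTTED = {"TRUE", "FALSE", "EQ", "NE", "LT", "LE", "GT", "GE",
          "EQV", "NEQV", "OR", "AND", "NOT"}

# Alternatives, tried left to right: .NAME., identifier, integer, other symbol.
TOKEN = re.compile(r"\.([A-Z]+)\.|([A-Z][A-Z0-9]*)|(\d+)|(\S)")


def tokenize(expr):
    """Tokenize a Fortran 77-style expression (simplified: no real literals)."""
    tokens = []
    for match in TOKEN.finditer(expr.upper()):
        dotted, name, number, other = match.groups()
        if dotted is not None:
            if dotted not in DOTTED:
                raise ValueError(f"unknown dotted name .{dotted}.")
            kind = "Constant" if dotted in ("TRUE", "FALSE") else "Operator"
            tokens.append((kind, dotted))
        elif name is not None:
            tokens.append(("Identifier", name))
        elif number is not None:
            tokens.append(("Integer", int(number)))
        else:
            tokens.append(("Symbol", other))
    return tokens


print(tokenize("X.EQ.1.AND.Y.GT.2"))
```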
Modern use
Most modern computer languages do not use stropping, with two notable exceptions:
The use of many languages in Microsoft's .NET Common Language Infrastructure (CLI) requires a way to use variables in a different language that may be keywords in a calling language. This is sometimes done by prefixes, such as @ in C#, or by enclosing the identifier in brackets, as in Visual Basic.NET.
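A tool that emits C# or Visual Basic .NET source for names defined in another CLI language has to apply the target language's stropping. The Python sketch below shows the idea. The keyword sets are small hand-picked samples, not the full lists, and the helper names csharp_name and vb_name are hypothetical.

```python
# Small samples of each language's keywords (deliberately not complete lists).
CSHARP_KEYWORDS = {"class", "event", "lock", "new", "object", "string", "this"}
VB_KEYWORDS = {"class", "dim", "end", "event", "new", "next", "step"}


def csharp_name(name):
    """C# keywords are case-sensitive; a leading @ turns one into an identifier."""
    return "@" + name if name in CSHARP_KEYWORDS else name


def vb_name(name):
    """VB.NET keywords are case-insensitive; square brackets strop an identifier."""
    return f"[{name}]" if name.lower() in VB_KEYWORDS else name


# "lock" is a C# keyword but not a VB one; "Next" is the reverse.
for member in ("lock", "Next", "event", "total"):
    print(member, csharp_name(member), vb_name(member))
```

The same member name may need stropping in one consuming language and not in another, which is why each language supplies its own mechanism.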
A second major example is in many implementations of Structured Query Language. In those languages reserved words can be used as column, table, or variable names by lexically delimiting them. The standard specifies enclosing reserved words in double quotes, but in practice the exact mechanism varies by implementation; MySQL, for example, allows reserved words to be used in other contexts by enclosing them in backticks, and Microsoft SQL Server uses square brackets.
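SQLite, which ships with Python's standard library, accepts all three delimiters mentioned here (double quotes, backticks, and square brackets) for compatibility, so it gives a compact runnable demonstration. The table and column names are made up for the example; what matters is that ORDER and SELECT are reserved words.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Standard SQL stropping: double quotes turn reserved words into identifiers.
conn.execute('CREATE TABLE "order" ("select" INTEGER)')
conn.execute('INSERT INTO "order" ("select") VALUES (42)')

# SQLite also accepts MySQL-style backticks and SQL Server-style brackets.
print(conn.execute("SELECT `select` FROM `order`").fetchone())   # (42,)
print(conn.execute("SELECT [select] FROM [order]").fetchone())   # (42,)

# Without stropping, the same words are parsed as keywords and the query fails.
try:
    conn.execute("SELECT select FROM order")
except sqlite3.OperationalError as exc:
    print("error:", exc)
```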
Stropping can also be used in the Nim programming language. In Nim, a reserved word can be used as an identifier by enclosing it in backticks.
There are other, more minor examples. For example, Web IDL uses a leading underscore _ to strop identifiers that otherwise collide with reserved words: the value of the identifier strips this leading underscore, making this stropping rather than a naming convention.
Unstropping by the compiler
In a compiler frontend, unstropping (recognizing stropped sequences and removing the stropping marks) originally occurred during an initial line reconstruction phase, which also eliminated whitespace. This was then followed by scannerless parsing (no tokenization); this was standard in the 1960s, notably for ALGOL. In modern use, unstropping is generally done as part of lexical analysis. This is clear if one divides the lexer into two phases, scanner and evaluator: the scanner categorizes the stropped sequence into the correct category, and the evaluator then unstrops it when calculating the value. For example, in a language where an initial underscore is used to strop identifiers to avoid collisions with reserved words, the sequence _if would be categorized by the scanner as an identifier (not as the reserved word if), and the evaluator would then give it the value if, yielding (Identifier, if) as the token type and value.
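The scanner/evaluator split described above can be sketched in a few lines of Python. The toy language, its reserved words, and the function names scan, evaluate, and tokenize are assumptions made for the example; only the leading-underscore rule comes from the text.

```python
import re

RESERVED = {"if", "then", "else"}                 # reserved words of a toy language
WORD = re.compile(r"_?[A-Za-z][A-Za-z0-9_]*")     # optional stropping underscore


def scan(lexeme):
    """Scanner: pick the token category from the raw lexeme."""
    if lexeme.startswith("_"):
        return "Identifier"                       # stropped, so never a reserved word
    return "Keyword" if lexeme in RESERVED else "Identifier"


def evaluate(category, lexeme):
    """Evaluator: compute the token value, removing the stropping underscore."""
    if category == "Identifier" and lexeme.startswith("_"):
        return lexeme[1:]
    return lexeme


def tokenize(source):
    tokens = []
    for match in WORD.finditer(source):
        lexeme = match.group()
        category = scan(lexeme)
        tokens.append((category, evaluate(category, lexeme)))
    return tokens


print(tokenize("if _if then x"))
# [('Keyword', 'if'), ('Identifier', 'if'), ('Keyword', 'then'), ('Identifier', 'x')]
```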
Similar techniques
A number of similar techniques exist, generally prefixing or suffixing an identifier to indicate different treatment, but the semantics are varied. Strictly speaking, stropping consists of different representations of the same name (value) in different namespaces, and occurs at the tokenization stage. For example, in ALGOL 60 with matched apostrophe stropping, 'if' is tokenized as (Keyword, if), while if is tokenized as (Identifier, if): the same value in different token classes.
Using uppercase for keywords remains in use as a convention for writing grammars for lexing and parsing: the reserved word if is tokenized as the token class IF, and an if-then-else clause is then represented by the phrase IF Expression THEN Statement ELSE Statement, where uppercase terms are keywords and capitalized terms are nonterminal symbols in a production rule (terminal symbols are denoted by lowercase terms, such as identifier, or integer for an integer literal).
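To make the grammar notation concrete, here is a small recursive-descent parser in Python for the rule IF Expression THEN Statement ELSE Statement. The token classes IF, THEN, and ELSE follow the uppercase convention, and identifier and integer are the lowercase terminal classes. The extra alternative that lets a Statement be a bare Expression, the one-word Expression rule, and the tuple-shaped syntax tree are assumptions added so the example is complete.

```python
KEYWORDS = {"if": "IF", "then": "THEN", "else": "ELSE"}   # reserved word -> token class


def lex(text):
    """Return (token_class, value) pairs; input words are whitespace-separated."""
    tokens = []
    for word in text.split():
        if word in KEYWORDS:
            tokens.append((KEYWORDS[word], word))
        elif word.isdigit():
            tokens.append(("integer", int(word)))
        else:
            tokens.append(("identifier", word))
    return tokens


def expect(tokens, i, token_class):
    """Consume one token of the given class or raise SyntaxError."""
    if i >= len(tokens) or tokens[i][0] != token_class:
        raise SyntaxError(f"expected {token_class} at token {i}")
    return i + 1


def parse_expression(tokens, i):
    # Expression -> identifier | integer
    if i >= len(tokens) or tokens[i][0] not in ("identifier", "integer"):
        raise SyntaxError(f"expected identifier or integer at token {i}")
    return tokens[i][1], i + 1


def parse_statement(tokens, i=0):
    # Statement -> IF Expression THEN Statement ELSE Statement | Expression
    if i < len(tokens) and tokens[i][0] == "IF":
        condition, i = parse_expression(tokens, i + 1)
        i = expect(tokens, i, "THEN")
        then_branch, i = parse_statement(tokens, i)
        i = expect(tokens, i, "ELSE")
        else_branch, i = parse_statement(tokens, i)
        return ("if", condition, then_branch, else_branch), i
    return parse_expression(tokens, i)


tree, _ = parse_statement(lex("if x then 1 else if y then 2 else 3"))
print(tree)  # ('if', 'x', 1, ('if', 'y', 2, 3))
```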
Naming conventions
Most loosely, one may use naming conventions to avoid clashes, commonly prefixing or suffixing with an underscore, as in if_ or _then. A leading underscore is often used to indicate private members in object-oriented programming.
These names may be interpreted by the compiler and have some effect, though this is generally done at the semantic analysis phase, not the tokenization phase. For example, in Python, a single leading underscore is a weak private indicator and affects which identifiers are imported by from module import *, while a double leading underscore (and no more than one trailing underscore) on a class attribute invokes name mangling.
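The three Python behaviours just mentioned can be checked directly. In the sketch below, demo_mod is a throwaway module built on the fly for the example, and the module is assumed to define no __all__ (which would override the underscore rule for star imports).

```python
import keyword
import sys
import types

# 1. Naming convention: a trailing underscore sidesteps the keyword "class".
class_ = "Widget"
print(keyword.iskeyword("class"), keyword.iskeyword("class_"))  # True False

# 2. A single leading underscore hides a name from "from module import *".
demo = types.ModuleType("demo_mod")
demo.public = 1
demo._private = 2
sys.modules["demo_mod"] = demo
namespace = {}
exec("from demo_mod import *", namespace)
print("public" in namespace, "_private" in namespace)  # True False

# 3. A double leading underscore inside a class body triggers name mangling.
class Account:
    def __init__(self):
        self.__balance = 0          # stored under the mangled name

print(vars(Account()))  # {'_Account__balance': 0}
```

None of this changes tokenization: class_, _private, and __balance are all ordinary identifier tokens, and the special treatment happens later.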
Reserved words
While modern languages generally use reserved words rather than stropping to distinguish keywords from identifiers (e.g., making if reserved), they also frequently reserve a syntactic class of identifiers as keywords, yielding representations which can be interpreted as a stropping regime, but instead have the semantics of reserved words.
This is most notable in C, where identifiers that begin with an underscore are reserved, though the precise rules for which identifiers are reserved in which scope are intricate: identifiers beginning with an underscore followed by an uppercase letter or a second underscore are reserved for any use, while all identifiers beginning with an underscore are reserved at file scope. Similarly, in C++ any identifier that contains a double underscore, or begins with an underscore followed by an uppercase letter, is reserved for any use, while an identifier that begins with an underscore is reserved in the global namespace.
Thus one can add a new keyword foo using the reserved word __foo; C99, for example, added its Boolean type as the keyword _Bool. While this is superficially similar to stropping, the semantics are different. As a reserved word, the string __foo represents the identifier __foo in the common identifier namespace. In stropping (by prefixing keywords with __), the string __foo represents the keyword foo in a separate keyword namespace. Thus, using reserved words, the tokens for __foo and foo are (identifier, __foo) and (identifier, foo): different values in the same category. In stropping, the tokens for __foo and foo are (keyword, foo) and (identifier, foo): the same value in different categories. Both approaches solve the problem of namespace clashes in a way that looks the same to a programmer, but they differ in terms of formal grammar and implementation.
Name mangling
Name mangling also addresses name clashes by renaming identifiers, but does this much later in compilation, during semantic analysis, not during tokenization. This consists of creating names that include scope and type information, primarily for use by linkers, both to avoid clashes and to include necessary semantic information in the name itself. In these cases the original identifiers may be identical, but the context is different, as in the functions foo(int x) versus foo(char x), which have the same identifier foo but different signatures. These names might be mangled to foo_i and foo_c, for instance, to include the type information.
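The toy scheme in the text (foo_i, foo_c) can be written as a tiny Python function. For comparison, the sketch also produces names in the style of the Itanium C++ ABI used by GCC and Clang, where foo(int) becomes _Z3fooi and foo(char) becomes _Z3fooc. The second function covers only free functions in the global namespace with a few built-in parameter types, and both function names are hypothetical.

```python
# One-letter codes for a few built-in types (these match the Itanium C++ ABI).
TYPE_CODES = {"int": "i", "char": "c", "double": "d"}


def toy_mangle(name, param_types):
    """The scheme from the text: append one letter per parameter type."""
    return name + "_" + "".join(TYPE_CODES[t] for t in param_types)


def itanium_like_mangle(name, param_types):
    """_Z, the name's length, the name, then the parameter codes (v if none)."""
    codes = "".join(TYPE_CODES[t] for t in param_types) or "v"
    return f"_Z{len(name)}{name}{codes}"


# Both overloads of foo coexist in one symbol table once mangled.
symbols = {}
for params in (["int"], ["char"]):
    symbols[toy_mangle("foo", params)] = itanium_like_mangle("foo", params)
print(symbols)  # {'foo_i': '_Z3fooi', 'foo_c': '_Z3fooc'}
```

Unlike stropping, nothing here happens in the lexer: both source-level names are the single identifier foo, and the distinction comes from the signature.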
Sigils
A syntactically similar but semantically different phenomenon is sigils, which instead indicate properties of variables. These are common in Perl, Ruby, and various other languages to identify characteristics of variables/constants: Perl uses them to designate the type of variable, Ruby both to distinguish variables from constants and to indicate scope. Note that this affects the ''semantics'' of the variable, not the ''syntax'' of whether it is an identifier or keyword.
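As a rough illustration that sigils describe variables rather than separating keywords from identifiers, the hypothetical Python functions below read off what a name's leading characters say in Ruby and in Perl. They cover only the common cases: $, @@, @, and an uppercase initial in Ruby; $, @, and % in Perl.

```python
def ruby_kind(name):
    """Scope or kind of a Ruby name, judged from its first characters."""
    if name.startswith("$"):
        return "global variable"
    if name.startswith("@@"):        # must be tested before the single @
        return "class variable"
    if name.startswith("@"):
        return "instance variable"
    if name[:1].isupper():
        return "constant"
    return "local variable"


PERL_SIGILS = {"$": "scalar", "@": "array", "%": "hash"}


def perl_kind(name):
    """Type of a Perl variable, judged from its sigil."""
    return PERL_SIGILS.get(name[:1], "unknown")


for n in ("$stdout", "@@count", "@name", "Pi", "total"):
    print(n, "->", ruby_kind(n))
for n in ("$x", "@xs", "%ages"):
    print(n, "->", perl_kind(n))
```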
Parallels in human language
Stropping is used in computer programming languages to make the compiler's (or, more strictly, the parser's) job easier, i.e. within the capability of the relatively small and slow computers available in the early days of computing in the 20th century. However, similar techniques have commonly been used to aid reading comprehension for people too. Some examples are:
* Placing important words in bold, such as the very first mention of stropping at the head of this page, because defining stropping is the very purpose of the page.
* Formatting new words in ''italic type'' when they are first introduced in text. This is commonly used in science fiction and fantasy when introducing invented plants, foods, and creatures; in travelogues and historical writing when describing unfamiliar foreign words; and so on. A special font, possibly associated with the language in question, may also be used, for example a Gothic font for German words.
* Using a different language, typically Latin or Greek, to signify technical terms. This is similar to using reserved words, but it is usually combined with italic text to aid readability. For example:
** the typical binomial nomenclature or "Latin names" of plants and animals helps the reader to see that "''Erithacus rubecula''" is the special technical name of the European robin, in a way that "Red-breasted European thrush" does not.
** many legal terms where a short Latin phrase refers to a large body of law and precedent, such as ''habeas corpus'', ''sub judice'', ''in loco parentis''.
** logic and mathematical terms such as ''QED'', ''a priori'', ''vice versa''…
* In written Japanese, in addition to Kanji characters, the two distinct alphabets (more strictly, syllabaries) Hiragana and Katakana, both representing the same set of sounds, are used to distinguish phonetically spelled-out Japanese words from imported foreign words, respectively; Katakana is also used for emphasis, much like ''italics'' in English.
See also
* Escape character
Further reading
* Lindsey, Charles Hodgson (March 1970). "An ISO-Code Representation for ALGOL 68". ALGOL Bulletin (31): 37–60. ACM. AB31.3.6. http://dl.acm.org/citation.cfm?id=1061509