Left-to-right mark
   HOME

TheInfoList



OR:

The left-to-right mark (LRM) is a
control character In computing and telecommunication, a control character or non-printing character (NPC) is a code point (a number) in a character set, that does not represent a written symbol. They are used as in-band signaling to cause effects other than the ...
(an invisible formatting character) used in computerized
typesetting Typesetting is the composition of text by means of arranging physical ''type'' (or ''sort'') in mechanical systems or '' glyphs'' in digital systems representing '' characters'' (letters and other symbols).Dictionary.com Unabridged. Random ...
(including
word processing A word is a basic element of language that carries an objective or practical meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of what a word is, there is no conse ...
in a program like
Microsoft Word Microsoft Word is a word processor, word processing software developed by Microsoft. It was first released on October 25, 1983, under the name ''Multi-Tool Word'' for Xenix systems. Subsequent versions were later written for several other pla ...
) of text containing a mix of left-to-right scripts (such as
Latin Latin (, or , ) is a classical language belonging to the Italic branch of the Indo-European languages. Latin was originally a dialect spoken in the lower Tiber area (then known as Latium) around present-day Rome, but through the power of the ...
and Cyrillic) and right-to-left scripts (such as
Arabic Arabic (, ' ; , ' or ) is a Semitic language spoken primarily across the Arab world.Semitic languages: an international handbook / edited by Stefan Weninger; in collaboration with Geoffrey Khan, Michael P. Streck, Janet C. E.Watson; Walter ...
,
Syriac Syriac may refer to: *Syriac language, an ancient dialect of Middle Aramaic *Sureth, one of the modern dialects of Syriac spoken in the Nineveh Plains region * Syriac alphabet ** Syriac (Unicode block) ** Syriac Supplement * Neo-Aramaic languages a ...
, and
Hebrew Hebrew (; ; ) is a Northwest Semitic language of the Afroasiatic language family. Historically, it is one of the spoken languages of the Israelites and their longest-surviving descendants, the Jews and Samaritans. It was largely preserved ...
). It is used to set the way adjacent characters are grouped with respect to text direction.


Unicode

In
Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, wh ...
, the LRM character is encoded at . In
UTF-8 UTF-8 is a variable-length character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode'' (or ''Universal Coded Character Set'') ''Transformation Format 8-bit''. UTF-8 is capable of ...
it is E2 80 8E. Usage is prescribed in the Unicode Bidi (bidirectional) algorithm.Unicode 12.0 standard, http://www.unicode.org/versions/Unicode12.0.0/UnicodeStandard-12.0.pdf, p. 880


Example of use in HTML

Suppose the writer wishes to use some English text (a left-to-right script) into a paragraph written in Arabic or Hebrew (a right-to-left script) with non-alphabetic characters to the right of the English text. For example, the writer wants to translate, "The language C++ is a programming language used..." into Arabic. Without an LRM control character, the result looks like this: لغة C++ هي لغة برمجة تستخدم... With an LRM entered in the HTML after the ++, it looks like this, as the writer intends: لغة C++‎ هي لغة برمجة تستخدم... In the first example, without an LRM control character, a
web browser A web browser is application software for accessing websites. When a user requests a web page from a particular website, the browser retrieves its files from a web server and then displays the page on the user's screen. Browsers are used o ...
will render the ++ on the left of the "C" because the browser recognizes that the paragraph is in a right-to-left text (
Arabic Arabic (, ' ; , ' or ) is a Semitic language spoken primarily across the Arab world.Semitic languages: an international handbook / edited by Stefan Weninger; in collaboration with Geoffrey Khan, Michael P. Streck, Janet C. E.Watson; Walter ...
) and applies punctuation, which is neutral as to its direction, according to the direction of the adjacent text. The LRM control character causes the punctuation to be adjacent to only left-to-right text – the "C" and the LRM – and position as if it were in left-to-right text, i.e., to the right of the preceding text. Some software requires using the
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaSc ...
code ‎ or ‎ instead of the invisible Unicode control character itself . Using the invisible control character directly could also make copy editing difficult.


See also

*
Right-to-left mark ‏The right-to-left mark (RLM) is a non-printing character used in the computerized typesetting of bi-directional text containing a mix of left-to-right scripts (such as Latin and Cyrillic) and right-to-left scripts (such as Arabic, Syriac, an ...
*
Bidirectional text A bidirectional text contains two text directionalities, right-to-left (RTL) and left-to-right (LTR). It generally involves text containing different types of alphabets, but may also refer to boustrophedon, which is changing text direction in ea ...


References


External links


Unicode standard annex #9: The bidirectional algorithm


{{Unicode navigation Control characters Digital typography Unicode formatting code points