In computer science, threaded code is a programming technique where the code has a form that essentially consists entirely of calls to subroutines. It is often used in compilers, which may generate code in that form or be implemented in that form themselves. The code may be processed by an interpreter or it may simply be a sequence of machine code call instructions.
Threaded code has better density than code generated by alternative generation techniques and by alternative calling conventions. In cached architectures, it may execute slightly slower. However, a program that is small enough to fit in a computer processor's cache may run faster than a larger program that suffers many cache misses. Small programs may also be faster at thread switching, when other programs have filled the cache.
Threaded code is best known for its use in many compilers of programming languages, such as Forth, many implementations of BASIC, some implementations of COBOL, early versions of B, and other languages for small minicomputers and for amateur radio satellites.
History
The common way to make computer programs is to use a compiler to translate source code (written in some symbolic language) to machine code. The resulting executable is typically fast but, because it is specific to a hardware platform, it isn't portable. A different approach is to generate instructions for a virtual machine and to use an interpreter on each hardware platform. The interpreter instantiates the virtual machine environment and executes the instructions. Thus, only the interpreter must be compiled.
Early computers had relatively little memory. For example, most Data General Nova and IBM 1130 systems, and many of the first microcomputers, had only 4 kB of RAM installed. Consequently, a lot of time was spent trying to find ways to reduce a program's size, so it would fit in the available memory.
One solution is to use an interpreter which reads the symbolic language a bit at a time, and calls functions to perform the actions. As the source code is typically much denser than the resulting machine code, this can reduce overall memory use. This was the reason Microsoft BASIC is an interpreter: its own code had to share the 4 kB memory of machines like the Altair 8800 with the user's source code. A compiler translates from a source language to machine code, so the compiler, source, and output must all be in memory at the same time. In an interpreter, there is no machine-code output, so no memory is spent storing it.
Threaded code is a formatting style for compiled code that minimizes memory use. Instead of writing out every step of an operation at its every occurrence in the program, as was common in macro assemblers for instance, the compiler writes each common bit of code into a subroutine. Thus, each bit exists in only one place in memory (see "Don't repeat yourself"). The top-level application in these programs may consist of nothing but subroutine calls. Many of these subroutines, in turn, also consist of nothing but lower-level subroutine calls.
Mainframes and some early microprocessors such as the RCA 1802 required several instructions to call a subroutine. In the top-level application and in many subroutines, that sequence is constantly repeated, with only the subroutine address changing from one call to the next. This means that a program consisting of many function calls may have considerable amounts of repeated code as well.
To address this, threaded code systems used pseudo-code to represent function calls in a single operator. At run time, a tiny "interpreter" would scan over the top-level code, extract the subroutine's address in memory, and call it. In other systems, this same basic concept is implemented as a branch table, dispatch table, or virtual method table, all of which consist of a table of subroutine addresses.
During the 1970s, hardware designers spent considerable effort to make subroutine calls faster and simpler. On the improved designs, only a single instruction is expended to call a subroutine, so the use of a pseudo-instruction saves no room. Additionally, the performance of these calls is almost free of additional overhead. Today, though almost all programming languages focus on isolating code into subroutines, they do so for code clarity and maintainability, not to save space.
Threaded code systems save room by replacing that list of function calls, where only the subroutine address changes from one call to the next, with a list of execution tokens, which are essentially function calls with the call opcode(s) stripped off, leaving behind only a list of addresses (Steve Heller, ''Efficient C/C++ Programming: Smaller, Faster, Better'', 2014, chapter 5, "Do you need an interpreter?", p. 195).
Over the years, programmers have created many variations on that "interpreter" or "small selector". The particular address in the list of addresses may be extracted using an index, general-purpose register, or pointer. The addresses may be direct or indirect, contiguous or non-contiguous (linked by pointers), relative or absolute, resolved at compile time or dynamically built. No single variation is "best" for all situations.
Development
To save space, programmers squeezed the lists of subroutine calls into simple lists of subroutine addresses, and used a small loop to call each subroutine in turn. For example, the following pseudocode uses this technique to add two numbers A and B. In the example, the list is labeled thread and a variable ip (Instruction Pointer) tracks our place within the list. Another variable sp (Stack Pointer) contains an address elsewhere in memory that is available to hold a value temporarily.
start:
ip = &thread // points to the address '&pushA', not the textual label 'thread'
top:
jump *ip++ // follow ip to address in thread, follow that address to subroutine, advance ip
thread:
&pushA
&pushB
&add
...
pushA:
*sp++ = A // follow sp to available memory, store A there, advance sp to next
jump top
pushB:
*sp++ = B
jump top
add:
addend1 = *--sp // Pop the top value off the stack
addend2 = *--sp // Pop second value off the stack
*sp++ = addend1 + addend2 // Add the two values together and store the result on the top of the stack
jump top
The calling loop at top is so simple that it can be repeated inline at the end of each subroutine. Control now jumps once, from the end of a subroutine to the start of another, instead of jumping twice via top. For example:
start:
ip = &thread // ip points to &pushA (which points to the first instruction of pushA)
jump *ip++ // send control to first instruction of pushA and advance ip to &pushB
thread:
&pushA
&pushB
&add
...
pushA:
*sp++ = A // follow sp to available memory, store A there, advance sp to next
jump *ip++ // send control where ip says to (i.e. to pushB) and advance ip
pushB:
*sp++ = B
jump *ip++
add:
addend1 = *--sp // Pop the top value off the stack
addend2 = *--sp // Pop second value off the stack
*sp++ = addend1 + addend2 // Add the two values together and store the result on top of the stack
jump *ip++
This is called direct threaded code (DTC). Although the technique is older, the first widely circulated use of the term "threaded code" is probably James R. Bell's 1973 article "Threaded Code".
In 1970, Charles H. Moore invented a more compact arrangement, indirect threaded code (ITC), for his Forth virtual machine. Moore arrived at this arrangement because Nova minicomputers had an indirection bit in every address, which made ITC easy and fast. Later, he said that he found it so convenient that he propagated it into all later Forth designs.
Today, some Forth compilers generate direct-threaded code while others generate indirect-threaded code. The executables act the same either way.
Threading models
Practically all executable threaded code uses one or another of these methods for invoking subroutines (each method is called a "threading model").
Direct threading
Addresses in the thread are the addresses of machine language. This form is simple, but may have overheads because the thread consists only of machine addresses, so all further parameters must be loaded indirectly from memory. Some Forth systems produce direct-threaded code. On many machines direct-threading is faster than subroutine threading (see reference below).
For example, a stack machine might execute the sequence "push A, push B, add". That might be translated to the following thread and routines, where ip is initialized to the address labeled thread (i.e., the address where &pushA is stored).
#define PUSH(x) (*sp++ = (x))
#define POP() (*--sp)
start:
ip = &thread // ip points to &pushA (which points to the first instruction of pushA)
jump *ip++ // send control to first instruction of pushA and advance ip to &pushB
thread:
&pushA
&pushB
&add
...
pushA:
PUSH(A)
jump *ip++ // send control where ip says to (i.e. to pushB) and advance ip
pushB:
PUSH(B)
jump *ip++
add:
result = POP() + POP()
PUSH(result)
jump *ip++
Alternatively, operands may be included in the thread. This can remove some indirection needed above, but makes the thread larger:
#define PUSH(x) (*sp++ = (x))
#define POP() (*--sp)
start:
ip = &thread
jump *ip++
thread:
&push
&A // address where A is stored, not literal A
&push
&B
&add
...
push:
variable_address = *ip++ // must move ip past operand address, since it is not a subroutine address
PUSH(*variable_address) // Read value from variable and push on stack
jump *ip++
add:
result = POP() + POP()
PUSH(result)
jump *ip++
Indirect threading
Indirect threading uses pointers to locations that in turn point to machine code. The indirect pointer may be followed by operands which are stored in the indirect "block" rather than storing them repeatedly in the thread. Thus, indirect code is often more compact than direct-threaded code. The indirection typically makes it slower, though usually still faster than bytecode interpreters. Where the handler operands include both values and types, the space savings over direct-threaded code may be significant. Older FORTH systems typically produce indirect-threaded code.
For example, if the goal is to execute "push A, push B, add", the following might be used. Here, ip is initialized to the address &thread, each code fragment (push, add) is found by double-indirecting through ip and an indirect block, and any operands to the fragment are found in the indirect block following the fragment's address. This requires keeping the ''current'' subroutine in ip, unlike all previous examples, where it contained the ''next'' subroutine to be called.
start:
ip = &thread // points to '&i_pushA'
jump *(*ip) // follow pointers to 1st instruction of 'push', DO NOT advance ip yet
thread:
&i_pushA
&i_pushB
&i_add
...
i_pushA:
&push
&A
i_pushB:
&push
&B
i_add:
&add
push:
*sp++ = *(*ip + 1) // look 1 past start of indirect block for operand address
jump *(*++ip) // advance ip in thread, jump through next indirect block to next subroutine
add:
addend1 = *--sp
addend2 = *--sp
*sp++ = addend1 + addend2
jump *(*++ip)
Subroutine threading
So-called "subroutine-threaded code" (also "call-threaded code") consists of a series of machine-language "call" instructions (or addresses of functions to "call", as opposed to direct threading's use of "jump"). Early compilers for
ALGOL
ALGOL (; short for "Algorithmic Language") is a family of imperative computer programming languages originally developed in 1958. ALGOL heavily influenced many other languages and was the standard method for algorithm description used by th ...
, Fortran, Cobol and some Forth systems often produced subroutine-threaded code. The code in many of these systems operated on a last-in-first-out (LIFO) stack of operands, for which compiler theory was well-developed. Most modern processors have special hardware support for subroutine "call" and "return" instructions, so the overhead of one extra machine instruction per dispatch is somewhat diminished.
Anton Ertl, the Gforth compiler's co-creator, stated that "in contrast to popular myths, subroutine threading is usually slower than direct threading". However, Ertl's most recent tests show that subroutine threading is faster than direct threading in 15 out of 25 test cases. More specifically, he found that direct threading is the fastest threading model on Xeon, Opteron, and Athlon processors, indirect threading is fastest on Pentium M processors, and subroutine threading is fastest on Pentium 4, Pentium III, and PPC processors.
As an example of call threading for "push A, push B, add":
thread:
call pushA
call pushB
call add
ret
pushA:
*sp++ = A
ret
pushB:
*sp++ = B
ret
add:
addend1 = *--sp
addend2 = *--sp
*sp++ = addend1 + addend2
ret
Token threading
Token-threaded code implements the thread as a list of indices into a table of operations; the index width is naturally chosen to be as small as possible for density and efficiency. 1 byte / 8-bits is the natural choice for ease of programming, but smaller sizes like 4-bits, or larger like 12 or 16 bits, can be used depending on the number of operations supported. As long as the index width is chosen to be narrower than a machine pointer, it will naturally be more compact than the other threading types without much special effort by the programmer. It is usually half to three-fourths the size of other threadings, which are themselves a quarter to an eighth the size of non-threaded code. The table's pointers can either be indirect or direct. Some Forth compilers produce token-threaded code. Some programmers consider the "
p-code
Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normal ...
" generated by some
Pascal
Pascal, Pascal's or PASCAL may refer to:
People and fictional characters
* Pascal (given name), including a list of people with the name
* Pascal (surname), including a list of people and fictional characters with the name
** Blaise Pascal, Frenc ...
compilers, as well as the
bytecode
Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (norma ...
s used by
.NET,
Java
Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
, BASIC and some
C compilers, to be token-threading.
A common approach, historically, is bytecode, which typically uses 8-bit opcodes with a stack-based virtual machine. The archetypal bytecode interpreter is known as a "decode and dispatch interpreter" and follows the form:
start:
vpc = &thread
dispatch:
addr = decode(&vpc) // Convert the next bytecode operation to a pointer to machine code that implements it
// Any inter-instruction operations are performed here (e.g. updating global state, event processing, etc)
jump addr
CODE_PTR decode(BYTE_CODE **p) {
    /* In the simplest case, decoding is just a table lookup: use the
       bytecode as an index into the table of addresses and advance p */
    return table[*(*p)++];
}
thread: /* Contains bytecode, not machine addresses. Hence it is more compact. */
1 /*pushA*/
2 /*pushB*/
0 /*add*/
table:
&add /* table = address of machine code that implements bytecode 0 */
&pushA /* table ... */
&pushB /* table ... */
pushA:
*sp++ = A
jump dispatch
pushB:
*sp++ = B
jump dispatch
add:
addend1 = *--sp
addend2 = *--sp
*sp++ = addend1 + addend2
jump dispatch
If the virtual machine uses only byte-size instructions,
decode()
is simply a fetch from
thread
, but often there are commonly used 1-byte instructions plus some less-common multibyte instructions (see
complex instruction set computer
A complex instruction set computer (CISC ) is a computer architecture in which single instructions can execute several low-level operations (such as a load from memory, an arithmetic operation, and a memory store) or are capable of multi-step o ...
), in which case
decode()
is more complex. The decoding of single byte opcodes can be very simply and efficiently handled by a branch table using the opcode directly as an index.
For instructions where the individual operations are simple, such as "push" and "add", the
overhead involved in deciding what to execute is larger than the cost of actually executing it, so such interpreters are often much slower than machine code. However, for more complex ("compound") instructions, the overhead percentage is proportionally less significant.
Token-threaded code can run faster than the equivalent machine code when that machine code is too large to fit in the physical CPU's L1 instruction cache. The higher code density of threaded code, especially token-threaded code, can allow it to fit entirely in the L1 cache when it otherwise would not, thereby avoiding cache thrashing. However, threaded code consumes both instruction cache (for the implementation of each operation) and data cache (for the bytecode and tables), unlike machine code, which consumes only instruction cache; this means threaded code eats into the budget for the amount of data that the CPU can hold for processing at any given time. In any case, if the problem being computed involves applying a large number of operations to a small amount of data, then using threaded code may be an ideal optimization.
Huffman threading
Huffman threaded code consists of lists of tokens stored as Huffman codes. A Huffman code is a variable-length string of bits that identifies a unique token. A Huffman-threaded interpreter locates subroutines using an index table or a tree of pointers that can be navigated by the Huffman code. Huffman-threaded code is one of the most compact representations known for a computer program. The index and codes are chosen by measuring the frequency of calls to each subroutine in the code. Frequent calls are given the shortest codes. Operations with approximately equal frequencies are given codes with nearly equal bit-lengths. Most Huffman-threaded systems have been implemented as direct-threaded Forth systems, and used to pack large amounts of slow-running code into small, cheap microcontrollers. Most published uses have been in smart cards, toys, calculators, and watches. The bit-oriented tokenized code used in PBASIC can be seen as a kind of Huffman-threaded code.
Lesser-used threading
An example is string threading, in which operations are identified by strings, usually looked up by a hash table. This was used in Charles H. Moore's earliest Forth implementations and in the University of Illinois's experimental hardware-interpreted computer language. It is also used in Bashforth.
RPL
HP's RPL, first introduced in the HP-18C calculator in 1986, is a type of proprietary hybrid direct-threaded and indirect-threaded threaded-interpreted language (TIL) that, unlike other TILs, allows embedding of RPL "objects" into the "runstream", i.e., the stream of addresses through which the interpreter pointer advances. An RPL "object" can be thought of as a special data type whose in-memory structure contains an address to an "object prolog" at the start of the object, followed by data or executable code. The object prolog determines how the object's body should be executed or processed. Using the "RPL inner loop" (Busby, Jonathan, "The RPL inner loop explained", The Museum of HP Calculators, 7 September 2018, retrieved 27 December 2019), which was invented and published (and patented) by William C. Wickes in 1986 and published in "Programming Environments", Institute for Applied Forth Research, Inc., 1988, execution proceeds as follows:
# Dereference the IP (instruction pointer) and store it into O (current object pointer)
# Increment the IP by the length of one address pointer
# Dereference O and store its address in O_1 (this is the second level of indirection)
# Transfer control to next pointer or embedded object by setting the PC (program counter) to O_1 plus one address pointer
# Go back to step 1
This can be represented more precisely by:
 O = [I]
 I = I + Δ
 PC = [O] + Δ
where O is the current object pointer, I is the interpreter pointer, Δ is the length of one address word, and the "[]" operator stands for "dereference".
When control is transferred to an object pointer or an embedded object, execution continues as follows:
 PROLOG -> PROLOG     ( The prolog address at the start of the prolog code points to itself )
     IF O + Δ =/= PC  ( Test for direct execution )
         THEN GOTO INDIRECT
     O = I - Δ        ( Correct O to point to start of embedded object )
     I = I + α        ( Correct I to point after embedded object, where α is the length of the object )
 INDIRECT             ( Rest of prolog )
On HP's
Saturn
microprocessors that use RPL, there is a third level of indirection made possible by an architectural / programming trick which allows faster execution.
Branches
In all interpreters, a branch simply changes the thread pointer (ip) to a different address in the thread. A conditional jump-if-zero branch that jumps only if the top-of-stack value is zero could be implemented as shown below. This example uses the embedded parameter version of direct threading, so the &thread[23] line is the destination to jump to if the condition is true; it must be skipped over (ip++) if the branch is not taken.
 thread:
   ...
   &brz
   &thread[23]
   ...
 brz:
   when_true_ip = *ip++  // Get destination address for branch
   if (*--sp == 0)       // Pop/consume top of stack and check if it's zero
     ip = when_true_ip   // Take the branch
   jump *ip++
Common amenities
Separating the data and return stacks in a machine eliminates a great deal of stack management code, substantially reducing the size of the threaded code. The dual-stack principle originated three times independently: for Burroughs large systems, Forth, and PostScript. It is used in some Java virtual machines.
Three registers are often present in a threaded virtual machine, and another one exists for passing data between subroutines ('words'). These are:
* ip or i (instruction pointer of the virtual machine, not to be confused with the program counter of the underlying hardware implementing the VM)
* w (work pointer)
* rp or r (return stack pointer)
* sp or s (parameter stack pointer for passing parameters between words)
Often, threaded virtual machines, such as implementations of Forth, have a simple virtual machine at heart, consisting of three ''primitives''. Those are:
# ''nest'', also called ''docol''
# ''unnest'', or ''semi_s'' (;s)
# ''next''
In an indirect-threaded virtual machine, such as the one given here, the operations are:
 next:
   *ip++ -> w   ; fetch the next word's address and advance ip
   jump **w++   ; jump to the code the word's code field points to
 nest:
   ip -> *rp++  ; save ip on the return stack
   w -> ip      ; continue in the word's body
   next
 unnest:
   *--rp -> ip  ; restore ip from the return stack
   next
This is perhaps the simplest and fastest interpreter or virtual machine.
See also
* Continuation-passing style, which replaces the global variable ip with a function parameter
* Just-in-time compilation
* Return-oriented programming: the rediscovery of threaded code in order to exploit remote vulnerable systems
* Tail recursion
Notes
References
External links
*Anton Ertl's explanatory page
What is Threaded Code?
describes different threading techniques and provides further references.
by Dennis M. Ritchie
describes B (a precursor of C) as implemented using "threaded code".
Thinking Forth Project
includes the seminal (but out of print) book Thinking Forth by
Leo Brodie
published in 1984.
Starting FORTH
online version of the book Starting FORTH by
Leo Brodie
published in 1981.
*Brad Rodriguez'
covers threading techniques in depth.
* History of general purpose CPUs
* GCC extensions: Labels as Values
* {{cite web , title=What is RPL? , author-first=Joseph K. , author-last=Horn , url=http://www.hpcalc.org/hp48/docs/programming/rpl3.txt , access-date=2017-09-17 , url-status=live , archive-url=https://web.archive.org/web/20170917221524/http://www.hpcalc.org/hp48/docs/programming/rpl3.txt , archive-date=2017-09-17 (NB. Brief overview on the threaded languages, System and User RPL, used on the HP calculators like the HP 48.)
Compilers
Programming language implementation