In computer programming, a p-code machine (portable code machine) is a virtual machine designed to execute ''p-code'' (the assembly language or machine code of a hypothetical central processing unit (CPU)). The term is applied both generically to all such machines (such as the Java virtual machine (JVM) and MATLAB precompiled code) and to specific implementations, the most famous being the p-Machine of the Pascal-P system, particularly the UCSD Pascal implementation. Among the developers of UCSD Pascal, the ''p'' in ''p-code'' was construed to mean ''pseudo'' more often than ''portable'', so that ''pseudo-code'' meant instructions for a pseudo-machine.
Although the concept was first implemented circa 1966, as O-code for the Basic Combined Programming Language (BCPL) and P code for the language Euler, the ''term'' p-code first appeared in the early 1970s. Two early compilers generating p-code were the Pascal-P compiler in 1973, by Kesav V. Nori, Urs Ammann, Kathleen Jensen, Hans-Heinrich Nägeli, and Christian Jacobi, and the Pascal-S compiler in 1975, by Niklaus Wirth.
Programs that have been translated to p-code can either be interpreted by a software program that emulates the behavior of the hypothetical CPU, or translated into the machine code of the CPU on which the program is to run and then executed. If there is sufficient commercial interest, a hardware implementation of the CPU specification may be built (e.g., the Pascal MicroEngine or a version of a Java processor).
Benefits and weaknesses of implementing p-code
Compared to direct translation into native machine code, a two-stage approach involving translation into p-code and execution by interpretation or just-in-time compilation (JIT) offers several advantages.
* It is much easier to write a small p-code interpreter for a new machine than it is to modify a compiler to generate native code for the same machine.
* Generating machine code is one of the more complicated parts of writing a compiler. By comparison, generating p-code is much easier because no machine-dependent behavior must be considered in generating the bytecode. This makes it useful for getting a compiler up and running quickly.
* Since p-code is based on an ideal virtual machine, a p-code program is often much smaller than the same program translated to machine code.
* When the p-code is interpreted, the interpreter can apply additional run-time checks that are difficult to implement with native code.
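The last point can be illustrated with a minimal sketch in Python. The instruction names (lit, sto, div), the tuple encoding, and the PCodeError class are hypothetical and chosen only for this example; they do not correspond to any particular p-code dialect. Because every instruction passes through the interpreter loop, checks for stack underflow, out-of-range addresses, and division by zero can all be added in one place.

```python
class PCodeError(Exception):
    """Raised when a run-time check fails (hypothetical error type)."""


def run_checked(code, memory_size=8):
    """Interpret (opcode, argument) pairs, validating every access."""
    memory = [0] * memory_size
    stack = []

    def pop(pc):
        # Check for underflow before touching the operand stack.
        if not stack:
            raise PCodeError(f"pc={pc}: stack underflow")
        return stack.pop()

    for pc, (op, arg) in enumerate(code):
        if op == "lit":            # push a constant
            stack.append(arg)
        elif op == "sto":          # store top of stack at address arg
            if not 0 <= arg < memory_size:
                raise PCodeError(f"pc={pc}: address {arg} out of range")
            memory[arg] = pop(pc)
        elif op == "div":          # floor-divide the two topmost values
            divisor = pop(pc)
            dividend = pop(pc)
            if divisor == 0:
                raise PCodeError(f"pc={pc}: division by zero")
            stack.append(dividend // divisor)
        else:
            raise PCodeError(f"pc={pc}: unknown opcode {op!r}")
    return memory
```

A native-code compiler would have to emit each of these checks inline at every use site; here they cost a few lines in the interpreter.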
One of the significant disadvantages of p-code is execution speed, which can sometimes be remedied via JIT compiling. P-code is often also easier to reverse-engineer than native code.
In the early 1980s, at least two operating systems achieved machine independence through extensive use of p-code. The Business Operating System (BOS) was a cross-platform operating system designed to run p-code programs exclusively. The UCSD p-System, developed at the University of California, San Diego, was a self-compiling and self-hosting operating system based on p-code optimized for generation by the Pascal language.
In the 1990s, translation into p-code became a popular strategy for implementations of languages such as Python, Visual Basic (Microsoft P-Code), and Java (Java bytecode).
The language Go uses a generic, portable assembly as a form of p-code, implemented by Ken Thompson as an extension of the work on Plan 9 from Bell Labs. Unlike Common Language Runtime (CLR) bytecode or JVM bytecode, there is no stable specification, and the Go build tools do not emit a bytecode format to be used at a later time. The Go assembler uses the generic assembly language as an intermediate representation, and Go executables are machine-specific statically linked binaries.
UCSD p-Machine
Architecture
Like many other p-code machines, the UCSD p-Machine is a stack machine, which means that most instructions take their operands from a stack and place results back on the stack. Thus, the add instruction replaces the two topmost elements of the stack with their sum. A few instructions take an immediate argument. Like Pascal, the p-code is strongly typed, supporting boolean (b), character (c), integer (i), real (r), set (s), and pointer (a) data types natively.
Some simple instructions:
Insn.   Stack before   Stack after   Description
adi     i1 i2          i1+i2         add two integers
adr     r1 r2          r1+r2         add two reals
inn     i1 s1          b1            set membership; b1 = whether i1 is a member of s1
ldi     i1 i1          i1            load integer constant
mov     a1 a2          a2            move
not     b1 b1          -b1           boolean negation
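The following Python sketch shows how a stack machine might execute a few of the instructions listed above (adi, adr, inn, not). The push pseudo-instruction, the tuple encoding, and the use of Python sets for set values are assumptions made for illustration; the actual UCSD encoding differs. The type checks mimic the typed opcodes: adi accepts only integers and adr only reals.

```python
# Operand type required by each typed add instruction.
OPERAND_TYPE = {"adi": int, "adr": float}


def execute(program):
    """Run (mnemonic, operand) pairs on a fresh operand stack."""
    stack = []
    for op, arg in program:
        if op == "push":                 # hypothetical helper, not a UCSD opcode
            stack.append(arg)
        elif op in OPERAND_TYPE:
            right, left = stack.pop(), stack.pop()
            expected = OPERAND_TYPE[op]
            if type(left) is not expected or type(right) is not expected:
                raise TypeError(f"{op} expects two {expected.__name__} operands")
            stack.append(left + right)   # two operands replaced by their sum
        elif op == "inn":
            members, value = stack.pop(), stack.pop()   # s1 on top, i1 below
            stack.append(value in members)
        elif op == "not":
            operand = stack.pop()
            if type(operand) is not bool:
                raise TypeError("not expects a boolean operand")
            stack.append(not operand)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack
```

For example, execute([("push", 2), ("push", 3), ("adi", None)]) leaves [5] on the stack, while the same program with adr raises a TypeError because the operands are integers.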
Environment
Unlike other stack-based environments (such as Forth and the Java virtual machine) but very similar to a real target CPU, the p-System has only one stack shared by procedure stack frames (providing return address, etc.) and the arguments to local instructions. Three of the machine's registers point into the stack (which grows upwards):
* SP points to the top of the stack (the stack pointer).
* MP marks the beginning of the active stack frame (the mark pointer).
* EP points to the highest stack location used in the current procedure (the extreme pointer).
Also present is a constant area, and, below that, the heap growing down towards the stack. The NP (the new pointer) register points to the top (lowest used address) of the heap. When EP gets greater than NP, the machine's memory is exhausted.
The fifth register, PC, points at the current instruction in the code area.
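The interplay of EP and NP can be sketched in Python as follows. This is a simplified model under stated assumptions: the constant area is ignored, the heap is assumed to start empty at the top of the store, and the class and method names are hypothetical. Only the exhaustion rule (EP greater than NP) is taken from the description above.

```python
class Memory:
    """Toy model of a store shared by an upward stack and a downward heap."""

    def __init__(self, size):
        self.ep = 0       # highest stack location used by the current procedure
        self.np = size    # lowest used heap address; heap assumed empty at start

    @staticmethod
    def _check(ep, np):
        # The machine's memory is exhausted when EP gets greater than NP.
        if ep > np:
            raise MemoryError("stack and heap collided")

    def reserve_stack(self, new_ep):
        """Raise EP, as happens on procedure entry."""
        self._check(new_ep, self.np)
        self.ep = new_ep

    def allocate_heap(self, cells):
        """Move NP down by the requested number of cells and return it."""
        new_np = self.np - cells
        self._check(self.ep, new_np)
        self.np = new_np
        return new_np
```

Because EP already covers the worst-case stack use of the current procedure, the check does not need to be repeated for every individual push.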
Calling conventions
Stack frames look like this:
EP ->
      local stack
SP -> ...
      locals
      ...
      parameters
      ...
      return address (previous PC)
      previous EP
      dynamic link (previous MP)
      static link (MP of surrounding procedure)
MP -> function return value
The procedure calling sequence works as follows: The call is introduced with mst ''n'', where ''n'' specifies the difference in nesting levels (remember that Pascal supports nested procedures). This instruction will ''mark'' the stack, i.e. reserve the first five cells of the above stack frame, and initialise previous EP, dynamic, and static link. The caller then computes and pushes any parameters for the procedure, and then issues cup ''n'', ''p'' to call a user procedure (''n'' being the number of parameters, ''p'' the procedure's address). This will save the PC in the return address cell, and set the procedure's address as the new PC.
User procedures begin with the two instructions ent 1, ''i'' and ent 2, ''j''. The first sets SP to MP + ''i'', the second sets EP to SP + ''j''. So ''i'' essentially specifies the space reserved for locals (plus the number of parameters plus 5), and ''j'' gives the number of entries needed locally for the stack. Memory exhaustion is checked at this point.
Returning to the caller is accomplished via ret''C'', with ''C'' giving the return type (i, r, c, b, a as above, and p for no return value). The return value has to be stored in the appropriate cell previously. On all types except p, returning will leave this value on the stack.
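The calling sequence described above (mst, cup, ent, ret) can be sketched in Python. This is an illustration under assumptions rather than the UCSD implementation: SP is taken to index the topmost used cell, the frame cells are laid out at MP+0 to MP+4 in the order of the frame diagram, and the Machine class, its push helper, and the store size are hypothetical.

```python
class Machine:
    def __init__(self, size=64):
        self.store = [0] * size
        self.pc = 0
        self.sp = -1       # index of the topmost used cell (assumed convention)
        self.mp = 0
        self.ep = 0
        self.np = size     # heap assumed empty

    def base(self, levels):
        """Follow the static link (cell MP+1) the given number of times."""
        mp = self.mp
        for _ in range(levels):
            mp = self.store[mp + 1]
        return mp

    def push(self, value):
        self.sp += 1
        self.store[self.sp] = value

    def mst(self, n):
        """Mark the stack: reserve five cells and set the three links."""
        s = self.store
        s[self.sp + 2] = self.base(n)   # static link
        s[self.sp + 3] = self.mp        # dynamic link (previous MP)
        s[self.sp + 4] = self.ep        # previous EP
        self.sp += 5                    # also covers return value and return address

    def cup(self, n, p):
        """Call a user procedure with n parameter cells at address p."""
        self.mp = self.sp - (n + 4)     # start of the new frame
        self.store[self.mp + 4] = self.pc
        self.pc = p

    def ent(self, which, q):
        """ent 1 sets SP to MP + q; ent 2 sets EP to SP + q."""
        if which == 1:
            self.sp = limit = self.mp + q
        else:
            self.ep = limit = self.sp + q
        if limit > self.np:             # memory exhaustion check
            raise MemoryError("store overflow")

    def ret(self, c):
        """Return; every type except 'p' leaves the result on the stack."""
        s = self.store
        self.sp = self.mp - 1 if c == "p" else self.mp
        self.pc = s[self.mp + 4]
        self.ep = s[self.mp + 3]
        self.mp = s[self.mp + 2]
```

In a call with two parameters and one local, the caller would issue mst 0, push both arguments, and issue cup 2, p; the callee would start with ent 1, 8 (five mark cells, two parameters, one local), store its result in the cell at MP, and finish with reti, after which the caller finds the result on top of its stack.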
Instead of calling a user procedure (cup), standard procedure ''q'' can be called with csp ''q''. These standard procedures are Pascal procedures like readln() (csp rln), sin() (csp sin), etc. Peculiarly, eof() is a p-Code instruction instead.
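One plausible way for an interpreter to implement csp is a dispatch table keyed by the procedure name, sketched below in Python. Only the name sin comes from the text above; the cos entry, the table layout, and the csp function signature are assumptions for illustration.

```python
import math

# Maps a standard-procedure name to (number of arguments, host function).
STANDARD_PROCEDURES = {
    "sin": (1, math.sin),
    "cos": (1, math.cos),   # hypothetical entry, added for illustration
}


def csp(stack, name):
    """Call a standard procedure on the operand stack and push its result."""
    arity, function = STANDARD_PROCEDURES[name]
    if len(stack) < arity:
        raise IndexError(f"csp {name}: not enough operands")
    args = [stack.pop() for _ in range(arity)]
    args.reverse()                      # restore push order
    stack.append(function(*args))
```

Unlike cup, no stack frame is built: the standard procedure runs in the host environment and only its arguments and result pass through the p-machine stack.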
Example machine
Niklaus Wirth specified a simple p-code machine in the 1976 book ''Algorithms + Data Structures = Programs''. The machine had 3 registers: a program counter ''p'', a base register ''b'', and a top-of-stack register ''t''. There were 8 instructions:
# lit 0, ''a'' : load constant ''a''
# opr 0, ''a'' : execute operation ''a'' (13 operations: RETURN, 5 math functions, and 7 comparison functions)
# lod ''l'', ''a'' : load variable ''l,a''
# sto ''l'', ''a'' : store variable ''l,a''
# cal ''l'', ''a'' : call procedure ''a'' at level ''l''
# int 0, ''a'' : increment t-register by ''a''
# jmp 0, ''a'' : jump to ''a''
# jpc 0, ''a'' : jump conditional to ''a''
This is the code for the machine, written in Pascal:
const
  amax=2047;      {maximum address}
  levmax=3;       {maximum depth of block nesting}
  cxmax=200;      {size of code array}

type
  fct=(lit,opr,lod,sto,cal,int,jmp,jpc);
  instruction=packed record
    f:fct;
    l:0..levmax;
    a:0..amax;
  end;

var
  code: array [0..cxmax] of instruction;

procedure interpret;

  const stacksize = 500;

  var
    p, b, t: integer; {program-, base-, topstack-registers}
    i: instruction; {instruction register}
    s: array [1..stacksize] of integer; {datastore}

  function base(l: integer): integer;
    var b1: integer;
  begin
    b1 := b; {find base l levels down}
    while l > 0 do begin
      b1 := s[b1];
      l := l - 1
    end;
    base := b1
  end {base};

begin
  writeln(' start pl/0');
  t := 0; b := 1; p := 0;
  s[1] := 0; s[2] := 0; s[3] := 0;
  repeat
    i := code[p]; p := p + 1;
    with i do
      case f of
        lit: begin t := t + 1; s[t] := a end;
        opr:
          case a of {operator}
            0:
              begin {return}
                t := b - 1; p := s[t + 3]; b := s[t + 2]
              end;
            1: s[t] := -s[t];
            2: begin t := t - 1; s[t] := s[t] + s[t + 1] end;
            3: begin t := t - 1; s[t] := s[t] - s[t + 1] end;
            4: begin t := t - 1; s[t] := s[t] * s[t + 1] end;
            5: begin t := t - 1; s[t] := s[t] div s[t + 1] end;
            6: s[t] := ord(odd(s[t]));
            8: begin t := t - 1; s[t] := ord(s[t] = s[t + 1]) end;
            9: begin t := t - 1; s[t] := ord(s[t] <> s[t + 1]) end;
            10: begin t := t - 1; s[t] := ord(s[t] < s[t + 1]) end;
            11: begin t := t - 1; s[t] := ord(s[t] >= s[t + 1]) end;
            12: begin t := t - 1; s[t] := ord(s[t] > s[t + 1]) end;
            13: begin t := t - 1; s[t] := ord(s[t] <= s[t + 1]) end;
          end;
        lod: begin t := t + 1; s[t] := s[base(l) + a] end;
        sto: begin s[base(l) + a] := s[t]; writeln(s[t]); t := t - 1 end;
        cal:
          begin {generate new block mark}
            s[t + 1] := base(l); s[t + 2] := b; s[t + 3] := p;
            b := t + 1; p := a
          end;
        int: t := t + a;
        jmp: p := a;
        jpc: begin if s[t] = 0 then p := a; t := t - 1 end
      end {with, case}
  until p = 0;
  writeln(' end pl/0');
end {interpret};
This machine was used to run Wirth's PL/0, a Pascal subset compiler used to teach compiler development.
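To experiment with the machine without a Pascal compiler, the interpreter above can be ported to Python as sketched here. This is an illustrative port, not Wirth's code: instructions are plain (f, l, a) tuples, values written by sto are collected in a list instead of being printed, and the trunc_div helper is an assumption introduced to mimic Pascal's div, which truncates toward zero.

```python
def trunc_div(x, y):
    """Integer division truncating toward zero, like Pascal's div."""
    q = abs(x) // abs(y)
    return q if (x < 0) == (y < 0) else -q


# Two-operand operations of opr, keyed by operation number.
BINARY_OPS = {
    2: lambda x, y: x + y,
    3: lambda x, y: x - y,
    4: lambda x, y: x * y,
    5: trunc_div,
    8: lambda x, y: int(x == y),
    9: lambda x, y: int(x != y),
    10: lambda x, y: int(x < y),
    11: lambda x, y: int(x >= y),
    12: lambda x, y: int(x > y),
    13: lambda x, y: int(x <= y),
}


def interpret(code, stacksize=500):
    """Run a list of (f, l, a) instructions; return the values stored by sto."""
    s = [0] * (stacksize + 1)   # index 0 unused, matching the 1-based Pascal array
    t, b, p = 0, 1, 0           # top-of-stack, base, and program registers
    stored = []

    def base(level):
        # Find the base of the block `level` static levels down.
        b1 = b
        while level > 0:
            b1 = s[b1]
            level -= 1
        return b1

    while True:
        f, l, a = code[p]
        p += 1
        if f == "lit":
            t += 1
            s[t] = a
        elif f == "opr":
            if a == 0:          # return from procedure
                t = b - 1
                p = s[t + 3]
                b = s[t + 2]
            elif a == 1:        # negate
                s[t] = -s[t]
            elif a == 6:        # odd
                s[t] = s[t] % 2
            else:
                t -= 1
                s[t] = BINARY_OPS[a](s[t], s[t + 1])
        elif f == "lod":
            t += 1
            s[t] = s[base(l) + a]
        elif f == "sto":
            s[base(l) + a] = s[t]
            stored.append(s[t])
            t -= 1
        elif f == "cal":        # generate new block mark
            s[t + 1] = base(l)
            s[t + 2] = b
            s[t + 3] = p
            b = t + 1
            p = a
        elif f == "int":
            t += a
        elif f == "jmp":
            p = a
        elif f == "jpc":
            if s[t] == 0:
                p = a
            t -= 1
        else:
            raise ValueError(f"unknown instruction: {f}")
        if p == 0:              # the outermost return jumps to address 0
            return stored
```

A hand-assembled program that adds 2 and 3 and stores the sum in the first variable of the main block is [("int", 0, 4), ("lit", 0, 2), ("lit", 0, 3), ("opr", 0, 2), ("sto", 0, 3), ("opr", 0, 0)]; interpret returns [5] for it. The first instruction reserves the three block-mark cells plus one variable, and the final opr 0, 0 loads the return address 0 from the initial block mark, which halts the machine.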
Microsoft P-Code
P-Code is a name for several of Microsoft's proprietary intermediate languages. They provided an alternative binary format to machine code. At various times, Microsoft has said p-code is an abbreviation for either ''packed code'' or ''pseudo code''.
Microsoft p-code was used in Visual C++ and Visual Basic. Like other p-code implementations, Microsoft p-code enabled a more compact executable at the expense of slower execution.
See also
* Bytecode
* Intermediate representation
* Joel McCormack, designer of the NCR Corporation version of the p-code machine
* Runtime system
* Token threading