HOME

TheInfoList



OR:

Fetching the instruction
opcode In computing, an opcode (abbreviated from operation code, also known as instruction machine code, instruction code, instruction syllable, instruction parcel or opstring) is the portion of a machine language instruction that specifies the operat ...
s from program
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remember ...
well in advance is known as
prefetching Prefetching in computer science is a technique for speeding up fetch operations by beginning a fetch operation whose result is expected to be needed soon. Usually this is before it is ''known'' to be needed, so there is a risk of wasting time by p ...
and it is served by using prefetch input queue (PIQ).The pre-fetched instructions are stored in data structure - namely a queue. The fetching of opcodes well in advance, prior to their need for execution increases the overall efficiency of the
processor Processor may refer to: Computing Hardware * Processor (computing) **Central processing unit (CPU), the hardware within a computer that executes a program *** Microprocessor, a central processing unit contained on a single integrated circuit (I ...
boosting its speed. The processor no longer has to wait for the memory access operations for the subsequent instruction opcode to complete. This architecture was prominently used in the Intel 8086 microprocessor.


Introduction

Pipelining was brought to the forefront of computing architecture design during the 1960s due to the need for faster and more efficient computing. Pipelining is the broader concept and most modern processors load their instructions some
clock cycle In electronics and especially synchronous digital circuits, a clock signal (historically also known as ''logic beat'') oscillates between a high and a low state and is used like a metronome to coordinate actions of digital circuits. A clock sig ...
s before they execute them. This is achieved by pre-loading
machine code In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ve ...
from memory into a ''prefetch input queue''. This behavior only applies to von Neumann computers (that is, not
Harvard architecture The Harvard architecture is a computer architecture with separate storage and signal pathways for instructions and data. It contrasts with the von Neumann architecture, where program instructions and data share the same memory and pathway ...
computers) that can run self-modifying code and have some sort of instruction pipelining. Nearly all modern high-performance computers fulfill these three requirements. Usually, the prefetching behavior of the PIQ is invisible to the
programming model A programming model is an execution model coupled to an API or a particular pattern of code. In this style, there are actually two execution models in play: the execution model of the base programming language and the execution model of the prog ...
of the CPU. However, there are some circumstances where the behavior of PIQ is visible, and needs to be taken into account by the programmer. When the x86-processor changes mode from realmode to
protected mode In computing, protected mode, also called protected virtual address mode, is an operational mode of x86-compatible central processing units (CPUs). It allows system software to use features such as virtual memory, paging and safe multi-taskin ...
and vice versa, the PIQ has to be flushed, or else the CPU will continue to translate the
machine code In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ve ...
as if it were written in its last mode. If the PIQ is not flushed, the processor might translate its codes wrong and generate an invalid instruction exception. When executing self-modifying code, a change in the processor code immediately in front of the current location of execution might not change how the processor interprets the code, as it is already loaded into its PIQ. It simply executes its old copy already loaded in the PIQ instead of the new and altered version of the code in its
RAM Ram, ram, or RAM may refer to: Animals * A male sheep * Ram cichlid, a freshwater tropical fish People * Ram (given name) * Ram (surname) * Ram (director) (Ramsubramaniam), an Indian Tamil film director * RAM (musician) (born 1974), Dutch * ...
and/or cache. This behavior of the PIQ can be used to determine if code is being executed inside an
emulator In computing, an emulator is hardware or software that enables one computer system (called the ''host'') to behave like another computer system (called the ''guest''). An emulator typically enables the host system to run software or use pe ...
or directly on the hardware of a real CPU. Most emulators will ''probably'' never simulate this behavior. If the PIQ-size is zero (changes in the code ''always'' affect the state of the processor immediately), it can be deduced that either the code is being executed in an emulator or the processor invalidates the PIQ upon writes to addresses loaded in the PIQ.


Performance evaluation based on queuing theory

It was A.K Erlang (1878-1929) who first conceived of a queue as a solution to congestion in telephone traffic. Different
queueing model Queueing theory is the mathematical study of waiting lines, or queues. A queueing model is constructed so that queue lengths and waiting time can be predicted. Queueing theory is generally considered a branch of operations research because the ...
s are proposed in order to approximately simulate the real time queuing systems so that those can be analysed mathematically for different performance specifications. Queuing models can be represented using Kendall's notation: :A1/A2/A3/A4 where: * A1 is the distribution of time between two arrivals * A2 is the service time distribution * A3 is the total number of servers * A4 is the capacity of system # M/M/1 Model (Single Queue Single Server/ Markovian): In this model, elements of queue are served on a first-come, first-served basis. Given the mean arrival and service rates, then actual rates vary around these average values randomly and hence have to be determined using a cumulative probability distribution function. # M/M/r Model: This model is a generalization of the basic M/M/1 model where multiple servers operate in parallel. This kind of model can also model scenarios with impatient users who leave the queue immediately if they are not receiving service. This can also be modeled using a
Bernoulli process In probability and statistics, a Bernoulli process (named after Jacob Bernoulli) is a finite or infinite sequence of binary random variables, so it is a discrete-time stochastic process that takes only two values, canonically 0 and 1. T ...
having only two states, success and failure. The best example of this model is our regular land-line telephone systems. # M/G/1 Model (Takacs' finite input Model) : This model is used to analyze advanced cases. Here the service time distribution is no longer a
Markov process A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happen ...
. This model considers the case of more than one failed machine being repaired by single repairman. Service time for any user is going to increase in this case. Generally in applications like prefetch input queue, M/M/1 Model is popularly used because of limited use of queue features. In this model in accordance with microprocessors, the user takes the role of the execution unit and server is the bus interface unit.


Instruction queue

The processor executes a program by fetching the instructions from memory and executing them. Usually the processor execution speed is much faster than the memory access speed. Instruction queue is used to prefetch the next instructions in a separate buffer while the processor is executing the current instruction. With a four stage pipeline, the rate at which instructions are executed can be up to four times that of sequential execution. The processor usually has two separate units for fetching the instructions and for executing the instructions. The implementation of a pipeline architecture is possible only if the bus interface unit and the execution unit are independent. While the execution unit is decoding or executing an instruction which does not require the use of the
data In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpret ...
and
address bus In computer architecture, a bus (shortened form of the Latin '' omnibus'', and historically also called data highway or databus) is a communication system that transfers data between components inside a computer, or between computers. This e ...
es, the bus interface unit fetches instruction opcodes from the memory. This process is much faster than sending out an address, reading the opcode and then decoding and executing it. Fetching the next instruction while the current instruction is being decoded or executed is called pipelining. The
8086 The 8086 (also called iAPX 86) is a 16-bit microprocessor chip designed by Intel between early 1976 and June 8, 1978, when it was released. The Intel 8088, released July 1, 1979, is a slightly modified chip with an external 8-bit data bus (allo ...
architecture has a six-byte prefetch instruction pipeline, while the
8088 The Intel 8088 ("''eighty-eighty-eight''", also called iAPX 88) microprocessor is a variant of the Intel 8086. Introduced on June 1, 1979, the 8088 has an eight-bit external data bus instead of the 16-bit bus of the 8086. The 16-bit registers a ...
has a four-byte prefetch. As the Execution Unit is executing the current instruction, the bus interface unit reads up to six (or four) bytes of opcodes in advance from the memory. The queue lengths were chosen based on simulation studies. An exception is encountered when the execution unit encounters a
branch A branch, sometimes called a ramus in botany, is a woody structural member connected to the central trunk of a tree (or sometimes a shrub). Large branches are known as boughs and small branches are known as twigs. The term '' twig'' usuall ...
instruction i.e. either a jump or a call instruction. In this case, the entire queue must be dumped and the contents pointed to by the instruction pointer must be fetched from memory.


Drawbacks

Processors implementing the instruction queue prefetch algorithm are rather technically advanced. The
CPU design Processor design is a subfield of computer engineering and electronics engineering (fabrication) that deals with creating a processor, a key component of computer hardware. The design process involves choosing an instruction set and a certain exe ...
level complexity of the such processors is much higher than for regular processors. This is primarily because of the need to implement two separate units, the BIU and EU, operating separately. As the complexity of these chips increases, the cost also increases. These processors are relatively costlier than their counterparts without the prefetch input queue. However, these disadvantages are greatly offset by the improvement in processor execution time. After the introduction of prefetch instruction queue in the 8086 processor, all successive processors have incorporated this feature.


x86 example code

code_starts_here: mov bx, ahead mov word ptr cs: x 9090h ahead: jmp near to_the_end ; Some other code to_the_end: This self-modifying program will overwrite the ''jmp to_the_end'' with two NOPs (which is encoded as ''0x9090''). The jump ''jmp near to_the_end'' is assembled into two bytes of machine code, so the two NOPs will just overwrite this jump and nothing else. (That is, the jump is replaced with a do-nothing-code.) Because the machine code of the jump is already read into the PIQ, and probably also already executed by the processor (
superscalar A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a sup ...
processors execute several instructions at once, but they "pretend" that they don't because of the need for
backward compatibility Backward compatibility (sometimes known as backwards compatibility) is a property of an operating system, product, or technology that allows for interoperability with an older legacy system, or with input designed for such a system, especiall ...
), the change of the code will not have any change of the execution flow.


Example program to detect size

This is an example NASM-
syntax In linguistics, syntax () is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure ( constituenc ...
self-modifying x86-
assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence b ...
algorithm that determines the size of the PIQ: code_starts_here: xor bx, bx ; zero register bx xor ax, ax ; zero register ax mov dx, cs mov ode_segment dx ; "calculate" codeseg in the far jump below (edx here too) around: cmp ax, 1 ; check if ax has been altered je found_size ; 0x90 = opcode "nop" (NO oPeration) mov byte op_field+bx 0x90 inc bx db 0xEA ; 0xEA = opcode "far jump" dw flush_queue ; should be followed by offset (rm = "dw", pm = "dd") code_segment: dw 0 ; and then the code segment (calculated above) flush_queue: ; 0x40 = opcode "inc ax" (INCrease ax) mov byte op_field+bx 0x40 nop_field: times 256 nop jmp around found_size: ; ; register bx now contains the size of the PIQ ; this code is for
real mode Real mode, also called real address mode, is an operating mode of all x86-compatible CPUs. The mode gets its name from the fact that addresses in real mode always correspond to real locations in memory. Real mode is characterized by a 20- bit ...
and 16-bit protected mode, but it could easily be changed into ; running for 32-bit protected mode as well. just change the "dw" for ; the offset to "dd". you need also change dx to edx at the top as ; well. (dw and dx = 16 bit addressing, dd and edx = 32 bit addressing) ;
What this code does is basically that it changes the execution flow, and determines by
brute force Brute Force or brute force may refer to: Techniques * Brute force method or proof by exhaustion, a method of mathematical proof * Brute-force attack, a cryptanalytic attack * Brute-force search, a computer problem-solving technique People * Brut ...
how large the PIQ is. "How far away do I have to change the code in front of me for it to affect me?" If it is too near (it is already in the PIQ) the update will not have any effect. If it is far enough, the change of the code will affect the program and the program has then found the size of the processor's PIQ. If this code is being executed under multitasking OS, the
context switch In computing, a context switch is the process of storing the state of a process or thread, so that it can be restored and resume execution at a later point, and then restoring a different, previously saved, state. This allows multiple processe ...
may lead to the wrong value.


References


External links


{{DEFAULTSORT:Prefetch Input Queue Instruction processing