History
In 1992, Urban Müller, a Swiss physics student, took over a small online archive for […]
P′′: Brainfuck's formal "parent language"
Except for its two I/O commands, Brainfuck is a minor variation of the formal programming language P′′, described by Corrado Böhm. Using six symbols equivalent to the respective Brainfuck commands +, -, <, >, [, and ], Böhm provided an explicit program for each of the basic functions that together serve to compute any computable function. So the first "Brainfuck" programs appear in Böhm's 1964 paper, and they were sufficient to prove Turing completeness.
The Infinite Abacus: Brainfuck's "grand-parent" language
A version with explicit memory addressing (rather than relative moves on a stack) and a conditional jump (instead of loops) was introduced by Joachim Lambek in 1961 under the name of the Infinite Abacus, consisting of an infinite number of cells and two instructions:
* X+ (increment cell X)
* X- else jump T (decrement X if it is positive, else jump to T)
He proves the Infinite Abacus can compute any computable recursive function by programming Kleene's set of basic μ-recursive functions.
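The two-instruction machine described above can be sketched in a few lines of Python. This is only an illustration: the tuple encoding of instructions, the cell names, and the trick of using an always-empty cell "z" as an unconditional jump are assumptions made for this sketch, not Lambek's notation.

```python
def run_abacus(program, cells):
    """Run a program for a Lambek-style abacus machine.

    program is a list of instructions (hypothetical encoding):
        ("inc", x)     : X+              increment cell x
        ("dec", x, t)  : X- else jump T  decrement x if positive, else jump to index t
    The machine halts when the instruction index leaves the program.
    """
    cells = dict(cells)  # cell name -> non-negative integer
    pc = 0
    while 0 <= pc < len(program):
        ins = program[pc]
        if ins[0] == "inc":
            cells[ins[1]] = cells.get(ins[1], 0) + 1
            pc += 1
        else:
            _, x, target = ins
            if cells.get(x, 0) > 0:
                cells[x] -= 1
                pc += 1
            else:
                pc = target
    return cells


# Add the contents of cell "a" to cell "b", emptying "a".
ADD = [
    ("dec", "a", 3),  # 0: take one pebble from a, or jump to 3 (halt) if a is empty
    ("inc", "b"),     # 1: put one pebble into b
    ("dec", "z", 0),  # 2: z is always empty, so this always jumps back to 0
]
result = run_abacus(ADD, {"a": 2, "b": 5, "z": 0})
```

The same loop shape (decrement a counter, do some work, jump back) is what Brainfuck expresses with its bracket commands.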
His machine was simulated by Melzak's machine, which models computation via arithmetic (rather than binary logic), mimicking a human operator moving pebbles on an abacus; hence the requirement that all numbers must be positive. Melzak, whose one-instruction set computer is equivalent to an Infinite Abacus, gives programs for multiplication, GCD, the ''n''th prime number, representation in base ''b'', and sorting by magnitude, and shows how to simulate an arbitrary Turing machine.
Language design
The language consists of eight commands, listed below. A brainfuck program is a sequence of these commands, possibly interspersed with other characters (which are ignored). The commands are executed sequentially, with some exceptions: an instruction pointer begins at the first command, and each command it points to is executed, after which it normally moves forward to the next command. The program terminates when the instruction pointer moves past the last command.
The brainfuck language uses a simple machine model consisting of the program and instruction pointer, as well as a one-dimensional array of at least 30,000 byte cells initialized to zero; a movable data pointer (initialized to point to the leftmost byte of the array); and two streams of bytes for input and output (most often connected to a keyboard and a monitor respectively, and using the ASCII character encoding).
Commands
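The eight language commands each consist of a single character:
* > : increment the data pointer by one (to point to the next cell to the right).
* < : decrement the data pointer by one (to point to the next cell to the left).
* + : increment the byte at the data pointer by one.
* - : decrement the byte at the data pointer by one.
* . : output the byte at the data pointer.
* , : accept one byte of input, storing its value in the byte at the data pointer.
* [ : if the byte at the data pointer is zero, then instead of moving the instruction pointer forward to the next command, jump it forward to the command after the matching ] command.
* ] : if the byte at the data pointer is nonzero, then instead of moving the instruction pointer forward to the next command, jump it back to the command after the matching [ command.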
(Alternatively, the ] command may instead be translated as an unconditional jump to the corresponding [ command, or vice versa; programs will behave the same but will run more slowly, due to unnecessary double searching.)
[ and ] match as parentheses usually do: each [ matches exactly one ] and vice versa, the [ comes first, and there can be no unmatched [ or ] between the two.
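The machine model and the bracket-matching rule can be made concrete with a small interpreter. The following is a minimal sketch in Python, not a reference implementation: the function names, the 8-bit wrapping cells, the fixed 30,000-cell tape without bounds checks, and the choice to leave a cell unchanged at end of input are all assumptions of this sketch.

```python
def match_brackets(code):
    """Map the index of every [ to its matching ] and vice versa."""
    stack, jumps = [], {}
    for i, ch in enumerate(code):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            if not stack:
                raise SyntaxError(f"unmatched ] at index {i}")
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    if stack:
        raise SyntaxError(f"unmatched [ at index {stack[-1]}")
    return jumps


def run(code, data=b"", tape_len=30000):
    """Execute a brainfuck program and return its output as bytes."""
    jumps = match_brackets(code)
    tape = [0] * tape_len          # no bounds checking in this sketch
    ptr = pc = pos = 0
    out = bytearray()
    while pc < len(code):
        ch = code[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(tape[ptr])
        elif ch == ",":
            if pos < len(data):    # at end of input the cell is left unchanged
                tape[ptr] = data[pos]
                pos += 1
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]         # land on the matching ], then step past it
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]         # land on the matching [, then step past it
        pc += 1                    # any other character is ignored
    return bytes(out)
```

Precomputing the jump table once avoids rescanning the program text for the matching bracket each time a loop boundary is reached.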
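Brainfuck programs can be translated into C using the following substitutions, assuming ptr is of type char* and has been initialized to point to an array of zeroed bytes:
* > becomes ++ptr;
* < becomes --ptr;
* + becomes ++*ptr;
* - becomes --*ptr;
* . becomes putchar(*ptr);
* , becomes *ptr = getchar();
* [ becomes while (*ptr) {
* ] becomes }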
As the name suggests, Brainfuck programs tend to be difficult to comprehend. This is partly because any mildly complex task requires a long sequence of commands and partly because the program's text gives no direct indications of the program's state. These, as well as Brainfuck's inefficiency and its limited input/output capabilities, are some of the reasons it is not used for serious programming. Nonetheless, like any Turing complete language, Brainfuck is theoretically capable of computing any computable function or simulating any other computational model, if given access to an unlimited amount of memory. A variety of Brainfuck programs have been written. Although Brainfuck programs, especially complicated ones, are difficult to write, it is trivial to write an interpreter for Brainfuck in a more typical language such as C, because the language is so simple. There even exist Brainfuck interpreters written in the Brainfuck language itself.
Brainfuck is an example of a so-called Turing tarpit: it can be used to write ''any'' program, but it is not practical to do so, because Brainfuck provides so little abstraction that the programs get very long or complicated.
Examples
Adding two values
As a first, simple example, the following code snippet will add the current cell's value to the next cell: Each time the loop is executed, the current cell is decremented, the data pointer moves to the right, that next cell is incremented, and the data pointer moves left again. This sequence is repeated until the starting cell is 0.
[->+<]
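For readers more used to conventional languages, the loop described above can be restated step by step in Python. This is only a rendering of the description, with a Python list standing in for the cell array; the function name is made up for the sketch.

```python
def add_to_next(cells, ptr):
    """Drain cells[ptr] into cells[ptr + 1], one unit per pass."""
    while cells[ptr] != 0:  # loop until the starting cell is 0
        cells[ptr] -= 1     # decrement the current cell
        ptr += 1            # move the data pointer right
        cells[ptr] += 1     # increment that next cell
        ptr -= 1            # move the data pointer left again
    return cells


cells = add_to_next([5, 2], 0)
```

After the call the first cell is empty and the second holds the sum, which is why the technique destroys one of its operands.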
This can be incorporated into a simple addition program as follows:
++ Cell c0 = 2
> +++++ Cell c1 = 5
[ Start your loops with your cell pointer on the loop counter (c1 in our case)
< + Add 1 to c0
> - Subtract 1 from c1
] End your loops with the cell pointer on the loop counter
At this point our program has added 5 to 2 leaving 7 in c0 and 0 in c1
but we cannot output this value to the terminal since it is not ASCII encoded
To display the ASCII character "7" we must add 48 to the value 7
We use a loop to compute 48 = 6 * 8
++++ ++++ c1 = 8 and this will be our loop counter again
[
< +++ +++ Add 6 to c0
> - Subtract 1 from c1
]
< . Print out c0 which has the value 55 which translates to "7"!
Hello World!
The following program prints "Hello World!" and a newline to the screen:
[ This program prints "Hello World!" and a newline to the screen, its
length is 106 active command characters. [It is not the shortest.]
This loop is an "initial comment loop", a simple way of adding a comment
to a BF program such that you don't have to worry about any command
characters. Any ".", ",", "+", "-", "<" and ">" characters are simply
ignored, the "[" and "]" characters just have to be balanced. This
loop and the commands it contains are ignored because the current cell
defaults to a value of 0; the 0 value causes this loop to be skipped.
]
++++++++ Set Cell #0 to 8
[
>++++ Add 4 to Cell #1; this will always set Cell #1 to 4
[ as the cell will be cleared by the loop
>++ Add 2 to Cell #2
>+++ Add 3 to Cell #3
>+++ Add 3 to Cell #4
>+ Add 1 to Cell #5
<<<<- Decrement the loop counter in Cell #1
] Loop until Cell #1 is zero; number of iterations is 4
>+ Add 1 to Cell #2
>+ Add 1 to Cell #3
>- Subtract 1 from Cell #4
>>+ Add 1 to Cell #6
[<] Move back to the first zero cell you find; this will
be Cell #1 which was cleared by the previous loop
<- Decrement the loop Counter in Cell #0
] Loop until Cell #0 is zero; number of iterations is 8
The result of this is:
Cell no : 0 1 2 3 4 5 6
Contents: 0 0 72 104 88 32 8
Pointer : ^
>>. Cell #2 has value 72 which is 'H'
>---. Subtract 3 from Cell #3 to get 101 which is 'e'
+++++++..+++. Likewise for 'llo' from Cell #3
>>. Cell #5 is 32 for the space
<-. Subtract 1 from Cell #4 for 87 to give a 'W'
<. Cell #3 was set to 'o' from the end of 'Hello'
+++.------.--------. Cell #3 for 'rl' and 'd'
>>+. Add 1 to Cell #5 gives us an exclamation point
>++. And finally a newline from Cell #6
For "readability", this code has been spread across many lines, and blanks and comments have been added. Brainfuck ignores all characters except the eight commands +-<>[],. so no special syntax for comments is needed (as long as the comments do not contain the command characters). The code could just as well have been written as:
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
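The cell contents quoted in the commented listing (0 0 72 104 88 32 8) follow from simple arithmetic on the two nested loops. The Python below is a worked check of that arithmetic, not an interpreter; the variable names are chosen for this sketch.

```python
inner = [2, 3, 3, 1]             # amounts added to Cells #2..#5 per inner pass
cells = [0] * 7
for _ in range(8):               # outer loop: Cell #0 counts down from 8
    for k, amount in enumerate(inner, start=2):
        cells[k] += 4 * amount   # inner loop runs 4 times per outer pass
    cells[2] += 1
    cells[3] += 1
    cells[4] -= 1
    cells[6] += 1

# Replay the output section: each entry is the value printed by one "." command.
c = cells
printed = [
    c[2],                            # H
    c[3] - 3,                        # e
    c[3] + 4, c[3] + 4, c[3] + 7,    # l l o
    c[5],                            # space
    c[4] - 1,                        # W
    c[3] + 7,                        # o
    c[3] + 10, c[3] + 4, c[3] - 4,   # r l d
    c[5] + 1,                        # !
    c[6] + 2,                        # newline
]
message = bytes(printed).decode("ascii")
```

Each outer pass contributes 9, 13, 11 and 4 to Cells #2 to #5, so eight passes give 72, 104, 88 and 32.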
ROT13
This program enciphers its input with the ROT13 cipher. To do this, it must map characters A-M (ASCII 65-77) to N-Z (78-90), and vice versa. Also it must map a-m (97-109) to n-z (110-122) and vice versa. It must map all other characters to themselves; it reads characters one at a time and outputs their enciphered equivalents until it reads an EOF (here assumed to be represented as either -1 or "no change"), at which point the program terminates.
The basic approach used is as follows. Calling the input character ''x'', divide ''x''-1 by 32, keeping quotient and remainder. Unless the quotient is 2 or 3, just output ''x'', having kept a copy of it during the division. If the quotient is 2 or 3, divide the remainder ((''x''-1) modulo 32) by 13; if the quotient here is 0, output ''x''+13; if 1, output ''x''-13; if 2, output ''x''.
Regarding the division algorithm, when dividing ''y'' by ''z'' to get a quotient ''q'' and remainder ''r'', there is an outer loop which sets ''q'' and ''r'' first to the quotient and remainder of 1/''z'', then to those of 2/''z'', and so on; after it has executed ''y'' times, this outer loop terminates, leaving ''q'' and ''r'' set to the quotient and remainder of ''y''/''z''. (The dividend ''y'' is used as a diminishing counter that controls how many times this loop is executed.) Within the loop, there is code to increment ''r'' and decrement ''y'', which is usually sufficient; however, every ''z''th time through the outer loop, it is necessary to zero ''r'' and increment ''q''. This is done with a diminishing counter set to the divisor ''z''; each time through the outer loop, this counter is decremented, and when it reaches zero, it is refilled by moving the value from ''r'' back into it.
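The division scheme just described can be modelled directly. The sketch below is a plain Python restatement of that paragraph, assuming a positive divisor; the function and variable names are invented for illustration and do not correspond to particular cells of the brainfuck program.

```python
def divide_by_counting(y, z):
    """Return (quotient, remainder) of y / z using only increments,
    decrements and zero tests, as the brainfuck program must."""
    q = r = 0
    counter = z                 # diminishing counter, starts at the divisor
    while y != 0:               # the outer loop runs y times
        y -= 1                  # the dividend doubles as the loop counter
        r += 1
        counter -= 1
        if counter == 0:        # every z-th pass
            counter, r = r, 0   # r equals z here, so moving it back refills the counter
            q += 1
    return q, r
```

For example, `divide_by_counting(77, 13)` steps through 77 passes and resets the remainder five times.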
-,+[ Read first character and start outer character reading loop
-[ Skip forward if character is 0
>>++++[>++++++++<-] Set up divisor (32) for division loop
(MEMORY LAYOUT: dividend copy remainder divisor quotient zero zero)
<+<-[ Set up dividend (x minus 1) and enter division loop
>+>+>-[>>>] Increase copy and remainder / reduce divisor / Normal case: skip forward
<[[>+<-]>>+>] Special case: move remainder back to divisor and increase quotient
<<<<<- Decrement dividend
] End division loop
]>>>[-]+ End skip loop; zero former divisor and reuse space for a flag
>--[-[<->+++[-]]]<[ Zero that flag unless quotient was 2 or 3; zero quotient; check flag
++++++++++++<[ If flag then set up divisor (13) for second division loop
(MEMORY LAYOUT: zero copy dividend divisor remainder quotient zero zero)
>-[>+>>] Reduce divisor; Normal case: increase remainder
>[+[<+>-]>+>>] Special case: increase remainder / move it back to divisor / increase quotient
<<<<<- Decrease dividend
] End division loop
>>[<+>-] Add remainder back to divisor to get a useful 13
>[ Skip forward if quotient was 0
-[ Decrement quotient and skip forward if quotient was 1
-<<[-]>> Zero quotient and divisor if quotient was 2
]<<[<<->>-]>> Zero divisor and subtract 13 from copy if quotient was 1
]<<[<<+>>-] Zero divisor and add 13 to copy if quotient was 0
] End outer skip loop (jump to here if ((character minus 1)/32) was not 2 or 3)
<[-] Clear remainder from first division if second division was skipped
<.[-] Output ROT13ed character from copy and clear it
<-,+ Read next character
] End character reading loop
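The decision logic of the program (two divisions, then add or subtract 13) is easier to check in a conventional language. The following Python sketch follows the "basic approach" paragraph above; it uses Python's built-in `divmod` in place of the counting loops, and the function names are assumptions of this sketch.

```python
def rot13_byte(x):
    """Apply the two-division rule to one byte value x."""
    q, r = divmod(x - 1, 32)
    if q not in (2, 3):      # not in 65..128: neither letter range
        return x
    q2 = r // 13             # which third of the 32-value block x falls in
    if q2 == 0:
        return x + 13        # A-M or a-m
    if q2 == 1:
        return x - 13        # N-Z or n-z
    return x                 # the six non-letters after Z or z


def rot13(data):
    """Encipher a bytes object with ROT13."""
    return bytes(rot13_byte(b) for b in data)
```

Subtracting 1 before dividing is what aligns the block boundaries so that A (65) and a (97) give a remainder of 0.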
Portability issues
Partly because Urban Müller did not write a thorough language specification, the many subsequent brainfuck interpreters and compilers have implemented slightly different dialects of brainfuck.
Cell size
In the classic distribution, the cells are of 8-bit size (cells are bytes), and this is still the most common size. However, to read non-textual data, a brainfuck program may need to distinguish an end-of-file condition from any possible byte value; thus 16-bit cells have also been used. Some implementations have used 32-bit cells, 64-bit cells, or bignum cells with theoretically unlimited range, but programs that use this extra range are likely to be slow, since storing the value ''n'' into a cell requires Ω(''n'') time as a cell's value may only be changed by incrementing and decrementing.
In all these variants, the , and . commands still read and write data in bytes. In most of them, the cells wrap around, i.e. incrementing a cell which holds its maximal value (with the + command) will bring it to its minimal value and vice versa. The exceptions are implementations which are distant from the underlying hardware, implementations that use bignums, and implementations that try to enforce portability.
It is usually easy to write brainfuck programs that do not ever cause integer wraparound or overflow, and therefore don't depend on cell size. Generally this means avoiding incrementing a cell past +255 (unsigned 8-bit wraparound), or avoiding overstepping the boundaries of [-128, +127] (signed 8-bit wraparound). Since there are no comparison operators, a program cannot distinguish between a signed and an unsigned two's complement fixed-bit-size cell, and the negativeness of numbers is a matter of interpretation. For more details on integer wraparound, see the Integer overflow article.
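Wraparound, and the point that signedness is only an interpretation, can be illustrated with modular arithmetic. The helpers below are a sketch with invented names; the `bits` parameter stands for whatever cell width an implementation happens to use.

```python
def increment(value, bits=8):
    """The + command on a wrapping cell of the given width."""
    return (value + 1) % (1 << bits)


def decrement(value, bits=8):
    """The - command on a wrapping cell of the given width."""
    return (value - 1) % (1 << bits)


def as_signed(value, bits=8):
    """Read the same stored bit pattern as a two's complement number."""
    return value - (1 << bits) if value >= 1 << (bits - 1) else value
```

Decrementing an 8-bit zero cell stores the pattern 255, which a signed reading reports as -1; the program itself cannot tell the two readings apart.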
Array size
In the classic distribution, the array has 30,000 cells, and the pointer begins at the leftmost cell. Even more cells are needed to store things like the millionth Fibonacci number, and the easiest way to make the language Turing complete is to make the array unlimited on the right.
A few implementations extend the array to the left as well; this is an uncommon feature, and therefore portable brainfuck programs do not depend on it.
When the pointer moves outside the bounds of the array, some implementations will give an error message, some will try to extend the array dynamically, some will not notice and will produce undefined behavior, and a few will move the pointer to the opposite end of the array. Some tradeoffs are involved: expanding the array dynamically to the right is the most user-friendly approach and is good for memory-hungry programs, but it carries a speed penalty. If a fixed-size array is used it is helpful to make it very large, or better yet let the user set the size. Giving an error message for bounds violations is very useful for debugging but even that carries a speed penalty unless it can be handled by the operating system's memory protections.
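Three of the out-of-bounds policies mentioned above (report an error, grow to the right, wrap to the opposite end) can be compared side by side. The class below is a hypothetical sketch; the class name and the policy strings are made up for illustration and do not come from any particular implementation.

```python
class Tape:
    """Cell array with a selectable policy for out-of-bounds pointer moves."""

    def __init__(self, size=30000, policy="error"):
        self.cells = [0] * size
        self.ptr = 0
        self.policy = policy  # "error", "grow" or "wrap"

    def move(self, delta):
        new = self.ptr + delta
        if 0 <= new < len(self.cells):
            self.ptr = new
        elif self.policy == "grow" and new >= len(self.cells):
            # extend on the right only, filling with zeroed cells
            self.cells.extend([0] * (new - len(self.cells) + 1))
            self.ptr = new
        elif self.policy == "wrap":
            self.ptr = new % len(self.cells)
        else:
            raise IndexError(f"data pointer moved out of bounds to {new}")
```

The bounds test on every move is the speed penalty the paragraph refers to; an implementation that skips it gets the "will not notice" behavior instead.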
End-of-line code
Different operating systems (and sometimes different programming environments) use subtly different versions of ASCII. The most important difference is in the code used for the end of a line of text. MS-DOS and Microsoft Windows use a CRLF, i.e. a 13 followed by a 10, in most contexts. UNIX and its descendants (including Linux and macOS) and Amigas use just 10, and older Macs use just 13. It would be difficult if brainfuck programs had to be rewritten for different operating systems. However, a unified standard was easy to create. Urban Müller's compiler and his example programs use 10, on both input and output; so do a large majority of existing brainfuck programs; and 10 is also more convenient to use than CRLF. Thus, brainfuck implementations should make sure that brainfuck programs that assume newline = 10 will run properly; many do so, but some do not.
This assumption is also consistent with most of the world's sample code for C and other languages, in that they use "\n", or 10, for their newlines. On systems that use CRLF line endings, the C standard library transparently remaps "\n" to "\r\n" on output and "\r\n" to "\n" on input for streams not opened in binary mode.
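An implementation that reads raw bytes can apply the same convention itself. The one-function sketch below shows one possible input filter, under the assumption that the whole input is available as a bytes object; the function name is invented.

```python
def normalize_newlines(data):
    """Present CRLF (13 10) and lone CR (13) line endings as a single 10."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

The CRLF replacement must run first, otherwise each CRLF pair would turn into two newlines.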
End-of-file behavior
The behavior of the , command when an end-of-file condition has been encountered varies. Some implementations set the cell at the pointer to 0, some set it to the C constant EOF (in practice this is usually -1), some leave the cell's value unchanged. There is no real consensus; arguments for the three behaviors are as follows.
Setting the cell to 0 avoids the use of negative numbers, and makes it marginally more concise to write a loop that reads characters until EOF occurs. This is a language extension devised by Panu Kalliokoski.
Setting the cell to -1 allows EOF to be distinguished from any byte value (if the cells are larger than bytes), which is necessary for reading non-textual data; also, it is the behavior of the C translation of , given in Müller's readme file. However, it is not obvious that those C translations are to be taken as normative.
Leaving the cell's value unchanged is the behavior of Urban Müller's brainfuck compiler. This behavior can easily coexist with either of the others; for instance, a program that assumes EOF = 0 can set the cell to 0 before each , command, and will then work correctly on implementations that do either EOF = 0 or EOF = "no change". It is so easy to accommodate the "no change" behavior that any brainfuck programmer interested in portability should do so.
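The three conventions, and the reason zeroing the cell first makes a program portable across two of them, can be shown with a small model of the , command. This is a sketch; the function name and the policy strings are assumptions made for illustration.

```python
import io


def read_cell(cell, stream, eof="unchanged"):
    """Return the new cell value after a , command under a given EOF policy."""
    byte = stream.read(1)
    if byte:
        return byte[0]
    if eof == "zero":
        return 0
    if eof == "minus_one":
        return -1
    return cell  # "unchanged": the cell keeps whatever it held


exhausted = io.BytesIO(b"")
# A program that clears the cell before reading sees 0 under both policies:
same = read_cell(0, exhausted, "zero") == read_cell(0, exhausted, "unchanged")
```

In brainfuck itself, clearing the cell before each read is usually written as `[-],`.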
Implementations
Although it is trivial to make a naive brainfuck interpreter, writing an optimizing compiler or interpreter is more of a challenge and an amusement, much like writing programs in brainfuck itself: to produce reasonably fast results, the compiler needs to piece together the fragmentary instructions forced by the language. Possible optimizations range from simple run-length peephole optimizations on repeated commands and common loop patterns like [-], to more complicated ones like dead code elimination and constant folding.
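The two simplest optimizations named above, run-length folding of repeated commands and recognition of the clear loop [-], fit in a short pass over the program text. The following Python is a minimal sketch; the `(op, count)` tuple format and the "clear" pseudo-operation are inventions of this example rather than a standard intermediate form.

```python
def optimize(code):
    """Fold runs of + - < > into (op, count) pairs and turn [-] into a clear op."""
    code = "".join(ch for ch in code if ch in "+-<>[],.")  # drop comments
    ops = []
    i = 0
    while i < len(code):
        if code.startswith("[-]", i):
            ops.append(("clear", 0))   # set the current cell to zero in one step
            i += 3
            continue
        ch = code[i]
        if ch in "+-<>":
            j = i
            while j < len(code) and code[j] == ch:
                j += 1
            ops.append((ch, j - i))    # one operation for the whole run
            i = j
        else:
            ops.append((ch, 1))        # I/O and brackets are kept one by one
            i += 1
    return ops
```

An interpreter running the folded form does one addition for a run of ten + commands instead of ten separate increments.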
In addition to optimization, other types of unusual brainfuck interpreters have also been written. Several brainfuck compilers have been made smaller than 200 bytes – one is only 100 bytes in x86 machine code.
Derivatives
Many people have created brainfuck equivalents (languages with commands that directly map to brainfuck) or brainfuck derivatives (languages that extend its behavior or alter its semantics).
Some examples:
* Pi, which maps brainfuck into errors in individual digits of Pi.
* VerboseFuck, which looks like a traditional programming language, only what appears as parameters or expressions are actually parts of longer commands that cannot be altered.
* DerpPlusPlus, in which the commands are replaced with words such as 'HERP', 'DERP', 'GIGITY', etc.
* Ook!, which maps brainfuck's eight commands to two-word combinations of "Ook.", "Ook?", and "Ook!", jokingly designed to be "writable and readable by orang-utans" according to its creator, a reference to the orang-utan Librarian in the novels of Terry Pratchett.
* Ternary, similar in concept to Ook! but instead consisting of permutations of the ASCII characters 0, 1, and 2.
* BodyFuck, a brainfuck implementation based on a gesture-controlled system in which the programmer's movements are captured by a video camera and converted into the eight possible characters.
* OooWee, in which commands are variations of OooWee (e.g. ooo, ooe, wee etc.). Inspired by the Rick and Morty character Mr. PoopyButthole.
* I Use Arch btw, which maps brainfuck into the words found in the phrase "I Use Arch btw". Inspired by a phrase coined by the Arch Linux community.
* Brainfunk, maps brainfuck to voice samples, which are used in a dance track, whose words encode a brainfuck program.
* DNA#, a superset based on DNA molecules, with commands replaced by nucleobases. One form uses the helix representation of the DNA molecule.
* Brainfuck2, "a derivative of brainfuck except this time actually funny". Every term in the language is a name of another brainfuck derivative.
* Fuckscript, a brainfuck derivative with "more intuitive keywords", each prefixed with "fuck". Developed by a teenager, Josh Schiavone.
See also
* JSFuck – an esoteric JavaScript programming language with a very limited set of characters
External links
* Brainfuck interpreter on-line in JavaScript with collection of programs
* Brainfuck IDE – A brainfuck development environment with interactive debugger
* Brainfuck collection of interpreters and scripts
* Brainfuck to COBOL, C, ASM, PL/1, ... compiler
* Brainfuck Assembler translating x86 assembly code (reduced set) into brainfuck code
* Brainfuck playground at tio.run
* BrainSTARK – Brainfuck implementation with a STARK prover and verifier, allowing for verification in sublinear time