In
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...
, string generation is the process of creating a set of
strings from a collection of rules. This is an opposite process to that of
parsing
Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
, which recognises a string based on some collection of rules.
Applications of string generation include
test data
Test data is data which has been specifically identified for use in tests, typically of a computer program.
Background
Some data may be used in a confirmatory way, typically to verify that a given set of input to a given function produces some e ...
generation,
Captchas
A CAPTCHA ( , a contrived acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart") is a type of challenge–response authentication, challenge–response test used in computing to determine whether the user is h ...
and random essay generation.
Generation methods
Methods for generating strings include:
* While a
deterministic finite automaton
In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state automa ...
is often used to recognize strings it can easily be changed to generate strings.
Unsolved problems
Unsolved problems in string generation include:
*
It is an
undecidable problem
In computability theory and computational complexity theory, an undecidable problem is a decision problem for which it is proved to be impossible to construct an algorithm that always leads to a correct yes-or-no answer. The halting problem is ...
to decide whether a given string can be generated by a given
W-grammar
In computer science, a Van Wijngaarden grammar (also vW-grammar or W-grammar) is a two-level grammar which provides a technique to define potentially infinite context-free grammars in a finite number of rules. The formalism was invented by Adria ...
.
See also
*
Pretty printing
Pretty-printing (or prettyprinting) is the application of any of various stylistic formatting conventions to text files, such as source code, markup, and similar kinds of content. These formatting conventions may entail adhering to an indentation ...
– another process often considered the dual of parsing.
External links
DGL – Data Generation Languagean apparently general facility for addressing this problem
Eli Benderski blogwith a demo in Python
Bruce McKenzie paperon a general algorithm
Generatestrings matching a
regular expression
A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
Generatestrings from a
yacc
Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a comp ...
grammar
comp.compilersdiscussion
Generate random C programsGenerate random string using pythonuser generates strings by applying replacement rules
Algorithms on strings
Parsing
{{Comp-sci-stub