The DMS Software Reengineering Toolkit is a proprietary set of
program transformation
A program transformation is any operation that takes a computer program and generates another program. In many cases the transformed program is required to be semantically equivalent to the original, relative to a particular formal semantics and ...
tools available for automating custom source program analysis, modification, translation or generation of software systems for arbitrary mixtures of source languages for large scale software systems. DMS was originally motivated by a theory for maintaining designs of software called ''Design Maintenance Systems.'' DMS and "Design Maintenance System" are registered trademarks of Semantic Designs.
Usage
DMS has been used to implement
domain-specific languages (such as code generation for factory control), test coverage and profiling tools,
clone detection, language migration tools, C++ component reengineering., and for research into difficult topics such as refactoring C++ reliably.
Features
The toolkit provides means for defining language grammars and will produce
parser
Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
s which automatically construct
abstract syntax trees
In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurr ...
(ASTs), and
prettyprinter
Pretty-printing (or prettyprinting) is the application of any of various stylistic formatting conventions to text files, such as source code, markup, and similar kinds of content. These formatting conventions may entail adhering to an indentatio ...
s to convert original or modified ASTs back into compilable source text. The parse trees capture, and the prettyprinters regenerate, complete detail about the original source program, including source position, comments, radix and format of numbers, etc., to ensure that regenerated source text is as recognizable to a programmer as the original text modulo any applied transformations.
DMS uses
GLR parsing technology with semantic predicates. This enables it to handle all context-free grammars as well as most non-context-free language syntaxes, such as
Fortran, which requires matching of multiple DO loops with shared CONTINUE statements by label to produce ASTs for correctly nested loops as it parses. DMS has a variety of predefined language front ends, covering most real dialects of
C and
C++ including
C++0x,
C#,
Java
Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
,
Python,
PHP
PHP is a General-purpose programming language, general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementati ...
,
EGL,
Fortran,
COBOL
COBOL (; an acronym for "common business-oriented language") is a compiled English-like computer programming language designed for business use. It is an imperative, procedural and, since 2002, object-oriented language. COBOL is primarily ...
,
Visual Basic Visual Basic is a name for a family of programming languages from Microsoft. It may refer to:
* Visual Basic .NET (now simply referred to as "Visual Basic"), the current version of Visual Basic launched in 2002 which runs on .NET
* Visual Basic (c ...
,
Verilog
Verilog, standardized as IEEE 1364, is a hardware description language (HDL) used to model electronic systems. It is most commonly used in the design and verification of digital circuits at the register-transfer level of abstraction. It is a ...
,
VHDL
The VHSIC Hardware Description Language (VHDL) is a hardware description language (HDL) that can model the behavior and structure of digital systems at multiple levels of abstraction, ranging from the system level down to that of logic gat ...
and some 20 or more other languages. DMS can handle
ASCII
ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
,
ISO-8859
ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings. The series of standards consists of numbered parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc. There are 15 parts, excluding the abandoned ISO/IEC 8859-12 ...
,
UTF-8
UTF-8 is a variable-length character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode'' (or ''Universal Coded Character Set'') ''Transformation Format 8-bit''.
UTF-8 is capable of ...
,
UTF-16
UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode (in fact this number of code points is dictated by the design of UTF-16). The encoding is variable-length, as cod ...
,
EBCDIC
Extended Binary Coded Decimal Interchange Code (EBCDIC; ) is an eight-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. It descended from the code used with punched cards and the corresponding s ...
,
Shift-JIS
Shift JIS (Shift Japanese Industrial Standards, also SJIS, MIME name Shift_JIS, known as PCK in Solaris contexts) is a character encoding for the Japanese language, originally developed by a Japanese company called ASCII Corporation in conju ...
and a variety of Microsoft character encodings.
DMS provides
attribute grammar An attribute grammar is a formal way to supplement a formal grammar with semantic information processing. Semantic information is stored in attributes associated with terminal and nonterminal symbols of the grammar. The values of attributes are re ...
evaluators for computing custom analyses over ASTs, such as metrics, and includes support for
symbol table
In computer science, a symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier (or symbols), constants, procedures and functions in a program's source code is associated with inf ...
construction. Other program facts can be extracted by built-in control- and data-
flow analysis
Flow may refer to:
Science and technology
* Fluid flow, the motion of a gas or liquid
* Flow (geomorphology), a type of mass wasting or slope movement in geomorphology
* Flow (mathematics), a group action of the real numbers on a set
* Flow (psyc ...
engines, local and global
pointer analysis In computer science, pointer analysis, or points-to analysis, is a static code analysis technique that establishes which pointers, or heap references, can point to which variables, or storage locations. It is often a component of more complex ana ...
, whole-program
call graph
A call graph (also known as a call multigraph) is a control-flow graph, which represents calling relationships between subroutines in a computer program. Each node represents a procedure and each edge ''(f, g)'' indicates that procedure ''f'' call ...
extraction, and symbolic range analysis by
abstract interpretation.
DMS is implemented in a
parallel programming
Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different f ...
language, PARLANSE, which allows using
symmetric multiprocessing
Symmetric multiprocessing or shared-memory multiprocessing (SMP) involves a multiprocessor computer hardware and software architecture where two or more identical processors are connected to a single, shared main memory, have full access to all ...
to speed up large analyses and conversions.
Rewriting
Changes to ASTs can be accomplished by both procedural methods coded in PARLANSE and source-to-source tree transformations coded as
rewrite rule
In mathematics, computer science, and logic, rewriting covers a wide range of methods of replacing subterms of a formula with other terms. Such methods may be achieved by rewriting systems (also known as rewrite systems, rewrite engines, or red ...
s using surface-syntax conditioned by any extracted program facts, using DMS's Rule Specification Language (RSL). The rewrite rule engine supporting RSL handles associative and commutative rules. A rewrite rule for C to replace a complex condition by the
?:
operator be written as:
rule simplify_conditional_assignment(v:left_hand_side,e1:expression,e2:expression,e3:expression)
:statement->statement
= " if (\e1) \v=\e2; else \v=e3; "
-> " \v=\e1?\e2:\e3; "
if no_side_effects(v);
Rewrite rules have names, e.g. simplify_conditional_assignment. Each rule has a ''"match this"'' and ''"replace by that"'' pattern pair separated by ->, in our example, on separate lines for readability. The patterns must correspond to language syntax categories; in this case, both patterns must be of syntax category statement also separated in sympathy with the patterns by ->. Target language (e.g., C) surface syntax is coded inside meta-quotes ", to separate rewrite-rule syntax from that of the target language. Backslashes inside meta-quotes represent domain escapes, to indicate pattern meta variables (e.g., \v, \e1, \e2) that match any language construct corresponding to the metavariable declaration in the signature line, e.g., e1 must be of syntactic category: ''(any) expression''. If a metavariable is mentioned multiple times in the ''match'' pattern, it must match to identical subtrees; the same identically shaped v must occur in both assignments in the match pattern in this example. Metavariables in the ''replace'' pattern are replaced by the corresponding matches from the left side. A conditional clause if provides an additional condition that must be met for the rule to apply, e.g., that the matched metavariable v, being an arbitrary left-hand side, must not have a side effect (e.g., cannot be of the form of a
++''; the no_side_effects predicate is defined by an analyzer built with other DMS mechanisms).
Achieving a complex transformation on code is accomplished by providing a number of rules that cooperate to achieve the desired effect. The ruleset is focused on portions of the program by metaprograms coded in PARLANSE.
A complete example of a language definition and source-to-source transformation rules defined and applied is shown using high school
algebra
Algebra () is one of the areas of mathematics, broad areas of mathematics. Roughly speaking, algebra is the study of mathematical symbols and the rules for manipulating these symbols in formulas; it is a unifying thread of almost all of mathem ...
and a bit of
calculus
Calculus, originally called infinitesimal calculus or "the calculus of infinitesimals", is the mathematics, mathematical study of continuous change, in the same way that geometry is the study of shape, and algebra is the study of generalizati ...
as a domain-specific language.
References
External links
DMS Software Reengineering Toolkit main web page
{{DEFAULTSORT:Dms Software Reengineering Toolkit
Static program analysis tools
Program transformation tools
Computer-aided software engineering tools
Language workbench