Software archaeology or source code archeology is the study of poorly documented or undocumented legacy software implementations, as part of software maintenance. Software archaeology, named by analogy with archaeology, includes the reverse engineering of software modules, and the application of a variety of tools and processes for extracting and understanding program structure and recovering design information.
Software archaeology may reveal dysfunctional team processes that have produced poorly designed or even unused software modules, and in some cases it uncovers deliberately obfuscated code. The term has been in use for decades.
Software archaeology has continued to be a topic of discussion at more recent software engineering conferences.
Techniques
A workshop on Software Archaeology at the 2001 OOPSLA (Object-Oriented Programming, Systems, Languages & Applications) conference identified the following software archaeology techniques, some of which are specific to object-oriented programming:
*Scripting languages to build static reports and for filtering diagnostic output
*Ongoing documentation in HTML pages or Wikis
*Synoptic signature analysis, statistical analysis, and software visualization tools
*Reverse-engineering tools
*Operating-system-level tracing via truss or strace
*Search engines and tools to search for keywords in source files
*IDE file browsing
*Unit testing frameworks such as JUnit and CppUnit
*API documentation generation using tools such as Javadoc and doxygen
*Debuggers
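Two of the listed techniques, scripting to build static reports and searching for keywords in source files, can be combined in a few lines. The following Python sketch is illustrative only and is not taken from the workshop: the function names, the default file extensions, and the report format are all assumptions.

```python
import re
from pathlib import Path


def search_keywords(root, keywords, extensions=(".c", ".h", ".java", ".py")):
    """Yield (path, line_number, line) for lines mentioning any keyword.

    The extension list is a hypothetical default; adjust it to the code base.
    """
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        # Legacy files often have unknown encodings, so undecodable bytes are replaced.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                yield str(path), number, line.strip()


def format_report(hits):
    """Render hits as a plain-text report, one 'path:line: text' entry per line."""
    return "\n".join(f"{path}:{number}: {line}" for path, number, line in hits)
```

Calling format_report(search_keywords("src", ["TODO", "HACK"])) on a hypothetical src directory would produce a static list of every line that mentions either marker, which can serve as a first map of known trouble spots.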
More generally, Andy Hunt and Dave Thomas note the importance of version control, dependency management, text indexing tools such as GLIMPSE and SWISH-E, and "drawing a map as you begin exploring."
Like true archaeology, software archaeology involves investigative work to understand the thought processes of one's predecessors.
At the OOPSLA workshop, Ward Cunningham suggested a synoptic signature analysis technique which gave an overall "feel" for a program by showing only punctuation, such as semicolons and curly braces. In the same vein, Cunningham has suggested viewing programs in 2 point font in order to understand the overall structure. Another technique identified at the workshop was the use of aspect-oriented programming tools such as AspectJ to systematically introduce tracing code without directly editing the legacy program.
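The idea behind signature analysis can be sketched in Python as follows. This is a minimal illustration of the approach described above, not Cunningham's own tool: the function names and the default set of retained characters are assumptions, and the sketch does not distinguish punctuation inside strings or comments from punctuation in code.

```python
def signature(source, keep="{};"):
    """Reduce each line of source to the punctuation characters listed in keep."""
    return ["".join(ch for ch in line if ch in keep) for line in source.splitlines()]


def summary_signature(source, keep="{};"):
    """Collapse the whole file to a single punctuation string."""
    return "".join(signature(source, keep))


sample = """int main(void) {
    int x = 1;
    if (x) { x++; }
    return x;
}
"""
print(summary_signature(sample))  # prints {;{;};}
```

Comparing such signatures across files gives a rough impression of relative size and nesting without reading any identifiers.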
Network and temporal analysis techniques can reveal the patterns of collaborative activity by the developers of legacy software, which in turn may shed light on the strengths and weaknesses of the software artifacts produced.
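As a rough illustration of the network side of such an analysis, the Python sketch below counts how many files each pair of developers has both modified. The input format, a sequence of (author, path) pairs such as might be extracted from a version control log, is a hypothetical simplification, and the temporal dimension is not modelled.

```python
from collections import defaultdict
from itertools import combinations


def collaboration_edges(commits):
    """Count, for each pair of authors, how many files both have modified.

    commits is an iterable of (author, path) pairs, a hypothetical record format.
    """
    authors_by_file = defaultdict(set)
    for author, path in commits:
        authors_by_file[path].add(author)

    edges = defaultdict(int)
    for authors in authors_by_file.values():
        # Sorting gives each pair a canonical order, so (a, b) and (b, a) coincide.
        for pair in combinations(sorted(authors), 2):
            edges[pair] += 1
    return dict(edges)
```

Pairs with high counts indicate developers whose work overlapped heavily, while files touched by only one author contribute no edges at all.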
Michael Rozlog of Embarcadero Technologies has described software archaeology as a six-step process which enables programmers to answer questions such as "What have I just inherited?" and "Where are the scary sections of the code?" These steps, similar to those identified by the OOPSLA workshop, include using visualization to obtain a visual representation of the program's design, using software metrics to look for design and style violations, using unit testing and profiling to look for bugs and performance bottlenecks, and assembling design information recovered by the process.
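The metrics step can be illustrated with a very simple measure, function length. The sketch below is an assumption-laden example rather than part of Rozlog's process: it handles only Python source (using the standard ast module, Python 3.8 or later), the threshold is an arbitrary cutoff, and functions sharing a name overwrite one another in the result.

```python
import ast


def function_lengths(source):
    """Return {function_name: line_count} for every function defined in source."""
    tree = ast.parse(source)
    lengths = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is populated by ast.parse on Python 3.8 and later.
            lengths[node.name] = node.end_lineno - node.lineno + 1
    return lengths


def long_functions(source, threshold=50):
    """List names of functions longer than threshold lines, a hypothetical cutoff."""
    return sorted(name for name, n in function_lengths(source).items() if n > threshold)
```

Unusually long functions are often a reasonable first guess at the "scary sections" of an inherited code base, although length alone says nothing about correctness.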
Software archaeology can also be a service provided to programmers by external consultants.
In popular culture
The profession of "programmer–archaeologist" features prominently in Vernor Vinge's 1999 sci-fi novel ''A Deepness in the Sky''.