
Source lines of code (SLOC), also known as lines of code (LOC), is a software metric used to measure the size of a computer program by counting the number of lines in the text of the program's source code. SLOC is typically used to predict the amount of effort that will be required to develop a program, as well as to estimate programming productivity or maintainability once the software is produced.


Measurement methods

Many useful comparisons involve only the order of magnitude of lines of code in a project. Using lines of code to compare a 10,000-line project to a 100,000-line project is far more useful than comparing a 20,000-line project with a 21,000-line project. While it is debatable exactly how to measure lines of code, discrepancies of an order of magnitude can be clear indicators of software complexity or man-hours.

There are two major types of SLOC measures: physical SLOC (LOC) and logical SLOC (LLOC). Specific definitions of these two measures vary, but the most common definition of physical SLOC is a count of lines in the text of the program's source code excluding comment lines. Logical SLOC attempts to measure the number of executable "statements", but its specific definitions are tied to specific computer languages (one simple logical SLOC measure for C-like programming languages is the number of statement-terminating semicolons). It is much easier to create tools that measure physical SLOC, and physical SLOC definitions are easier to explain. However, physical SLOC measures are more sensitive to logically irrelevant formatting and style conventions than logical SLOC. In addition, SLOC measures are often stated without giving their definition, and logical SLOC can often be significantly different from physical SLOC.

Consider this snippet of C code as an example of the ambiguity encountered when determining SLOC:

    for (i = 0; i < 100; i++) printf("hello"); /* How many lines of code is this? */

In this example we have:

* 1 physical line of code (LOC),
* 2 logical lines of code (LLOC) (the for statement and the printf statement),
* 1 comment line.

Depending on the programmer and coding standards, the above "line" of code could be written on many separate lines:

    /* Now how many lines of code is this? */
    for (i = 0; i < 100; i++)
    {
        printf("hello");
    }

In this example we have:

* 4 physical lines of code (LOC): is placing braces work to be estimated?
* 2 logical lines of code (LLOC): what about all the work writing non-statement lines?
* 1 comment line: tools must account for all code and comments regardless of comment placement.

Even the "logical" and "physical" SLOC values can have a large number of varying definitions. Robert E. Park (while at the Software Engineering Institute) and others developed a framework for defining SLOC values, to enable people to carefully explain and define the SLOC measure used in a project. For example, most software systems reuse code, and determining which (if any) reused code to include is important when reporting a measure.
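The counts quoted for the two C snippets can be reproduced with a small script. The following is only a sketch under stated assumptions, not a standard SLOC definition: a physical line is any line that still contains code once comments are removed, a comment line is any line touched by a comment, and logical lines are approximated as semicolons outside parentheses plus the control keywords for, while, if and switch. The function name `count_sloc`, the constants, and the brace layout of the second snippet are illustrative choices; preprocessor lines, do/while loops and many other C constructs are not handled.

```python
import re

# Matches, in order: string literals, character literals, block comments,
# line comments. Literals are matched so that comment markers or semicolons
# inside them are not miscounted.
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\\n])*"'
    r"|'(?:\\.|[^'\\\n])*'"
    r"|/\*.*?\*/"
    r"|//[^\n]*",
    re.DOTALL,
)
# Assumed heuristic: each of these keywords starts one logical statement.
CONTROL_RE = re.compile(r"\b(?:for|while|if|switch)\b")


def count_sloc(source):
    """Return physical, logical and comment line counts for C-like source."""
    comment_lines = set()

    def replace(match):
        text = match.group()
        if text.startswith("/"):
            # A comment: record every line it touches, keep its newlines.
            first = source.count("\n", 0, match.start())
            comment_lines.update(range(first, first + text.count("\n") + 1))
            return "\n" * text.count("\n")
        # A literal: keep a placeholder so the line still counts as code.
        return '""'

    code = TOKEN_RE.sub(replace, source)
    physical = sum(1 for line in code.splitlines() if line.strip())

    # Count semicolons that terminate statements, i.e. outside parentheses
    # (the two semicolons in a for header are separators, not terminators).
    depth = 0
    terminators = 0
    for ch in code:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == ";" and depth == 0:
            terminators += 1

    logical = terminators + len(CONTROL_RE.findall(code))
    return {"physical": physical, "logical": logical, "comment": len(comment_lines)}


ONE_LINE = 'for (i = 0; i < 100; i++) printf("hello"); /* How many lines of code is this? */'

MULTI_LINE = """/* Now how many lines of code is this? */
for (i = 0; i < 100; i++)
{
    printf("hello");
}
"""

print(count_sloc(ONE_LINE))
print(count_sloc(MULTI_LINE))
```

Under these assumptions both layouts give 2 logical lines and 1 comment line, while the physical count moves from 1 to 4, which is the sensitivity to formatting described above.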


Origins

At the time that people began using SLOC as a metric, the most commonly used languages, such as FORTRAN and assembly language, were line-oriented languages. These languages were developed at the time when punched cards were the main form of data entry for programming. One punched card usually represented one line of code. It was one discrete object that was easily counted. It was the visible output of the programmer, so it made sense to managers to count lines of code as a measurement of a programmer's productivity, even referring to them as "card images". Today, the most commonly used computer languages allow a lot more leeway for formatting. Text lines are no longer limited to 80 or 96 columns, and one line of text no longer necessarily corresponds to one line of code.


Usage of SLOC measures

SLOC measures are somewhat controversial, particularly in the way that they are sometimes misused. Experiments have repeatedly confirmed that effort is highly correlated with SLOC, that is, programs with larger SLOC values take more time to develop. Thus, SLOC can be effective in estimating effort. However, functionality is less well correlated with SLOC: skilled developers may be able to develop the same functionality with far less code, so one program with fewer SLOC may exhibit more functionality than another similar program. Counting SLOC as a productivity measure has its caveats, since a developer can develop only a few lines and yet be far more productive in terms of functionality than a developer who ends up creating more lines (and generally spending more effort). Good developers may merge multiple code modules into a single module, improving the system yet appearing to have negative productivity because they remove code. Furthermore, inexperienced developers often resort to code duplication, which is highly discouraged as it is more bug-prone and costly to maintain, but it results in higher SLOC.

SLOC counting exhibits further accuracy issues when comparing programs written in different languages, unless adjustment factors are applied to normalize languages. Various computer languages balance brevity and clarity in different ways; as an extreme example, most assembly languages would require hundreds of lines of code to perform the same task as a few characters in APL. A "hello world" program written in BASIC, C, and COBOL (a language known for being particularly verbose) makes the same point on a small scale: the line count depends heavily on the language.

Another increasingly common problem in comparing SLOC metrics is the difference between auto-generated and hand-written code. Modern software tools often have the capability to auto-generate enormous amounts of code with a few clicks of a mouse. For instance, graphical user interface builders automatically generate all the source code for graphical control elements simply by dragging an icon onto a workspace. The work involved in creating this code cannot reasonably be compared to the work necessary to write a device driver, for instance. By the same token, a hand-coded custom GUI class could easily be more demanding than a simple device driver; hence the shortcoming of this metric.

There are several cost, schedule, and effort estimation models which use SLOC as an input parameter, including the widely used Constructive Cost Model (COCOMO) series of models by Barry Boehm et al., PRICE Systems True S and Galorath's SEER-SEM. While these models have shown good predictive power, they are only as good as the estimates (particularly the SLOC estimates) fed to them. Many have advocated the use of function points instead of SLOC as a measure of functionality, but since function points are highly correlated to SLOC (and cannot be automatically measured) this is not a universally held view.
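As a rough illustration of how a SLOC estimate feeds such a model, the sketch below uses the commonly cited coefficients of Boehm's 1981 basic COCOMO, where effort in person-months is a × KLOC^b and schedule in months is c × effort^d. The function name `basic_cocomo` and the table layout are illustrative assumptions; later models in the COCOMO series and the commercial tools named above take many more inputs and are not reproduced here.

```python
# Basic COCOMO coefficients (a, b, c, d) per project class, as commonly
# cited for the 1981 basic model. Illustrative only, not calibrated.
BASIC_COCOMO = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(sloc, mode="organic"):
    """Return (effort in person-months, schedule in months) for a SLOC count."""
    a, b, c, d = BASIC_COCOMO[mode]
    kloc = sloc / 1000.0          # the model works in thousands of lines
    effort = a * kloc ** b        # person-months
    schedule = c * effort ** d    # calendar months
    return effort, schedule


small_effort, _ = basic_cocomo(10_000)
large_effort, _ = basic_cocomo(100_000)
ratio = large_effort / small_effort  # effort growth for a tenfold size increase
print(f"{small_effort:.1f} vs {large_effort:.1f} person-months (x{ratio:.2f})")
```

Because the exponent is greater than 1, a tenfold increase in SLOC yields somewhat more than a tenfold increase in estimated effort, and any error in the SLOC input carries straight through to the result.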


Example

Vincent Maraia has reported SLOC values for the various operating systems in Microsoft's Windows NT product line.

David A. Wheeler studied the Red Hat distribution of the Linux operating system, and reported that Red Hat Linux version 7.1 (released April 2001) contained over 30 million physical SLOC. He also extrapolated that, had it been developed by conventional proprietary means, it would have required about 8,000 person-years of development effort and would have cost over $1 billion (in year 2000 U.S. dollars).

A similar study was later made of Debian GNU/Linux version 2.2 (also known as "Potato"); this operating system was originally released in August 2000. This study found that Debian GNU/Linux 2.2 included over 55 million SLOC, and if developed in a conventional proprietary way would have required 14,005 person-years and cost US$1.9 billion to develop. Later runs of the tools used report that the following release of Debian had 104 million SLOC, and that the newest release at the time was going to include over 213 million SLOC.


Utility


Advantages

# Scope for automation of counting: since a line of code is a physical entity, manual counting effort can be easily eliminated by automating the counting process. Small utilities may be developed for counting the LOC in a program. However, a logical code counting utility developed for a specific language cannot be used for other languages due to the syntactical and structural differences among languages. Physical LOC counters, however, have been produced which count dozens of languages.
# An intuitive metric: a line of code serves as an intuitive metric for measuring the size of software because it can be seen, and its effect can be visualized. Function points are said to be more of an objective metric which cannot be imagined as a physical entity; they exist only in the logical space. This way, LOC comes in handy to express the size of software among programmers with low levels of experience.
# Ubiquitous measure: LOC measures have been around since the earliest days of software. As such, it is arguable that more LOC data is available than any other size measure.


Disadvantages

# Lack of accountability: the lines-of-code measure suffers from some fundamental problems. Some think that it is not useful to measure the productivity of a project using only results from the coding phase, which usually accounts for only 30% to 35% of the overall effort.
# Lack of cohesion with functionality: although experiments have repeatedly confirmed that effort is highly correlated with LOC, functionality is less well correlated with LOC. That is, skilled developers may be able to develop the same functionality with far less code, so one program with fewer LOC may exhibit more functionality than another similar program. In particular, LOC is a poor productivity measure of individuals, because a developer who develops only a few lines may still be more productive than a developer creating more lines of code. Moreover, good refactoring such as "extract method", which removes redundant code and keeps it clean, will mostly reduce the lines of code.
# Adverse impact on estimation: because of the fact presented under point #1, estimates based on lines of code can go badly wrong.
# Developer's experience: the implementation of a specific piece of logic differs based on the level of experience of the developer. Hence, the number of lines of code differs from person to person. An experienced developer may implement certain functionality in fewer lines of code than a less experienced developer does, even though they use the same language.
# Difference in languages: consider two applications that provide the same functionality (screens, reports, databases). One of the applications is written in C++ and the other in a language like COBOL. The number of function points would be exactly the same, but aspects of the application would be different. The lines of code needed to develop the application would certainly not be the same. As a consequence, the amount of effort required to develop the application would be different (hours per function point). Unlike lines of code, the number of function points will remain constant.
# Advent of GUI tools: with the advent of GUI-based programming languages and tools such as Visual Basic, programmers can write relatively little code and achieve high levels of functionality. For example, instead of writing a program to create a window and draw a button, a user with a GUI tool can use drag-and-drop and other mouse operations to place components on a workspace. Code that is automatically generated by a GUI tool is not usually taken into consideration when using LOC methods of measurement. This results in variation between languages; the same task that can be done in a single line of code (or no code at all) in one language may require several lines of code in another.
# Problems with multiple languages: software is now often developed in more than one language. Very often, a number of languages are employed depending on the complexity and requirements. Tracking and reporting of productivity and defect rates poses a serious problem in this case, since defects cannot be attributed to a particular language after the system has been integrated. Function points stand out as the best measure of size in this case.
# Lack of counting standards: there is no standard definition of what a line of code is. Do comments count? Are data declarations included? What happens if a statement extends over several lines? These are the questions that often arise. Though organizations like SEI and IEEE have published some guidelines in an attempt to standardize counting, it is difficult to put these into practice, especially as new languages are introduced every year.
# Psychology: a programmer whose productivity is being measured in lines of code will have an incentive to write unnecessarily verbose code. The more management focuses on lines of code, the more incentive the programmer has to expand their code with unneeded complexity. This is undesirable, since increased complexity can lead to increased cost of maintenance and increased effort required for bug fixing.

In the PBS documentary Triumph of the Nerds, Microsoft executive Steve Ballmer criticized the use of counting lines of code:

In IBM there's a religion in software that says you have to count K-LOCs, and a K-LOC is a thousand lines of code. How big a project is it? Oh, it's sort of a 10K-LOC project. This is a 20K-LOCer. And this is 50K-LOCs. And IBM wanted to sort of make it the religion about how we got paid. How much money we made off OS/2, how much they did. How many K-LOCs did you do? And we kept trying to convince them – hey, if we have – a developer's got a good idea and he can get something done in 4K-LOCs instead of 20K-LOCs, should we make less money? Because he's made something smaller and faster, less K-LOC. K-LOCs, K-LOCs, that's the methodology. Ugh! Anyway, that always makes my back just crinkle up at the thought of the whole thing.

According to the Computer History Museum, Apple developer Bill Atkinson in 1982 found problems with this practice:

When the Lisa team was pushing to finalize their software in 1982, project managers started requiring programmers to submit weekly forms reporting on the number of lines of code they had written. Bill Atkinson thought that was silly. For the week in which he had rewritten QuickDraw’s region calculation routines to be six times faster and 2000 lines shorter, he put “-2000” on the form. After a few more weeks the managers stopped asking him to fill out the form, and he gladly complied.
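
The refactoring point in the list above, and Atkinson's negative line count, can be illustrated with a small hypothetical sketch. The two snippets below are invented for this purpose and are not taken from any real project: both provide the same two functions with the same results, but the second removes the duplicated loop and therefore scores lower on a physical line count.

```python
# Hypothetical "before" version: the summing loop is duplicated.
DUPLICATED = '''
def total_price(items):
    total = 0
    for item in items:
        total = total + item["price"]
    return total

def total_weight(items):
    total = 0
    for item in items:
        total = total + item["weight"]
    return total
'''

# Hypothetical "after" version: the shared logic is extracted into a helper.
REFACTORED = '''
def total_of(items, key):
    return sum(item[key] for item in items)

def total_price(items):
    return total_of(items, "price")

def total_weight(items):
    return total_of(items, "weight")
'''


def physical_loc(source):
    """Count non-blank lines (the snippets contain no comments)."""
    return sum(1 for line in source.splitlines() if line.strip())


def load(source):
    """Execute a snippet and return the namespace holding its functions."""
    namespace = {}
    exec(source, namespace)
    return namespace


items = [{"price": 3, "weight": 10}, {"price": 4, "weight": 20}]
before, after = load(DUPLICATED), load(REFACTORED)

# Check that both versions behave identically on the sample data.
same_behaviour = all(
    before[name](items) == after[name](items)
    for name in ("total_price", "total_weight")
)
# A negative delta means the refactoring "produced" fewer lines.
loc_delta = physical_loc(REFACTORED) - physical_loc(DUPLICATED)
print(same_behaviour, loc_delta)
```

Measured by lines added, the developer who made this change would appear to have negative output, even though the behaviour is unchanged and the duplication is gone.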


Related terms

* KLOC: 1,000 lines of code
** KDLOC: 1,000 delivered lines of code
** KSLOC: 1,000 source lines of code
* MLOC: 1,000,000 lines of code
* GLOC: 1,000,000,000 lines of code


See also

* Software development effort estimation
* Estimation (project management)
* Cost estimation in software engineering


External links


* Definitions of Practical Source Lines of Code: Resource Standard Metrics (RSM) defines "effective lines of code" as a realistic code metric independent of programming style.
* Linux Kernel 2.6.17, Firefox, Apache HTTPD, MySQL, PHP using RSM.
* Tanenbaum, Andrew S. Modern Operating Systems (2nd ed.). Prentice Hall.
* Folklore.org: Macintosh Stories: -2000 Lines Of Code