Common Gateway Interface
   HOME

TheInfoList



OR:

In
computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, ...
, Common Gateway Interface (CGI) is an interface specification that enables web servers to execute an external program, typically to process user requests. Such programs are often written in a
scripting language A scripting language or script language is a programming language that is used to manipulate, customize, and automate the facilities of an existing system. Scripting languages are usually interpreted at runtime rather than compiled. A scripting ...
and are commonly referred to as ''CGI scripts'', but they may include compiled programs. A typical use case occurs when a web user submits a web form on a web page that uses CGI. The form's data is sent to the web server within an
HTTP request The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, w ...
with a URL denoting a CGI script. The web server then launches the CGI script in a new
computer process In computing, a process is the instance of a computer program that is being executed by one or many threads. There are many different process models, some of which are light weight, but almost all processes (even entire virtual machines) are ro ...
, passing the form data to it. The output of the CGI script, usually in the form of
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaSc ...
, is returned by the script to the Web server, and the server relays it back to the browser as its response to the browser's request. Developed in the early 1990s, CGI was the earliest common method available that allowed a web page to be interactive.


History

In 1993, the
National Center for Supercomputing Applications The National Center for Supercomputing Applications (NCSA) is a state-federal partnership to develop and deploy national-scale computer infrastructure that advances research, science and engineering based in the United States. NCSA operates as a ...
(NCSA) team wrote the specification for calling command line executables on the www-talk mailing list. The other Web server developers adopted it, and it has been a standard for Web servers ever since. A work group chaired by Ken Coar started in November 1997 to get the NCSA definition of CGI more formally defined. This work resulted in RFC 3875, which specified CGI Version 1.1. Specifically mentioned in the RFC are the following contributors: * Rob McCool (author of the
NCSA HTTPd NCSA HTTPd is an early, now discontinued, web server originally developed at the NCSA at the University of Illinois at Urbana–Champaign by Robert McCool and others. First released in 1993, it was among the earliest web servers developed, follo ...
Web server) * John Franks (author of the GN Web server) * Ari Luotonen (the developer of the CERN httpd Web server) * Tony Sanders (author of the Plexus Web server) * George Phillips (Web server maintainer at the University of British Columbia) Historically CGI programs were often written using the C programming language . RFC 3875 "The Common Gateway Interface (CGI)" partially defines CGI using C, in saying that environment variables "are accessed by the
C library The C standard library or libc is the standard library for the C programming language, as specified in the ISO C standard. ISO/IEC (2018). '' ISO/IEC 9899:2018(E): Programming Languages - C §7'' Starting from the original ANSI C standard, it was ...
routine getenv() or variable environ". The name CGI comes from the early days of the Web, where ''
webmaster A webmaster is a person responsible for maintaining one or more websites. The title may refer to web architects, web developers, site authors, website administrators, website owners, website coordinators, or website publishers. The duties of ...
s'' wanted to connect legacy information systems such as databases to their Web servers. The CGI program was executed by the server that provided a common "gateway" between the Web server and the legacy information system.


Purpose of the CGI specification

Each Web server runs
HTTP The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide We ...
server software, which responds to requests from
web browser A web browser is application software for accessing websites. When a user requests a web page from a particular website, the browser retrieves its files from a web server and then displays the page on the user's screen. Browsers are used o ...
s. Generally, the HTTP server has a directory (folder), which is designated as a document collection – files that can be sent to Web browsers connected to this server. For example, if the Web server has the domain name example.com, and its document collection is stored at /usr/local/apache/htdocs/ in the local file system, then the Web server will respond to a request for http://example.com/index.html by sending to the browser the (pre-written) file /usr/local/apache/htdocs/index.html. For pages constructed on the fly, the server software may defer requests to separate programs and relay the results to the requesting client (usually, a Web browser that displays the page to the end user). In the early days of the Web, such programs were usually small and written in a scripting language; hence, they were known as ''scripts''. Such programs usually require some additional information to be specified with the request. For instance, if Wikipedia were implemented as a script, one thing the script would need to know is whether the user is logged in and, if logged in, under which name. The content at the top of a Wikipedia page depends on this information. HTTP provides ways for browsers to pass such information to the Web server, e.g. as part of the URL. The server software must then pass this information through to the script somehow. Conversely, upon returning, the script must provide all the information required by HTTP for a response to the request: the HTTP status of the request, the document content (if available), the document type (e.g. HTML, PDF, or plain text), et cetera. Initially, different server software would use different ways to exchange this information with scripts. As a result, it wasn't possible to write scripts that would work unmodified for different server software, even though the information being exchanged was the same. Therefore, it was decided to specify a way for exchanging this information: CGI (the ''Common Gateway Interface'', as it defines a common way for server software to interface with scripts). Webpage generating programs invoked by server software that operate according to the CGI specification are known as ''CGI scripts''. This specification was quickly adopted and is still supported by all well-known server software, such as Apache, IIS, and (with an extension) node.js-based servers. An early use of CGI scripts was to process forms. In the beginning of HTML, HTML forms typically had an "action" attribute and a button designated as the "submit" button. When the submit button is pushed the URI specified in the "action" attribute would be sent to the server with the data from the form sent as a
query string A query string is a part of a uniform resource locator (URL) that assigns values to specified parameters. A query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML, cho ...
. If the "action" specifies a CGI script then the CGI script would be executed and it then produces an HTML page.


Using CGI scripts

A Web server allows its owner to configure which URLs shall be handled by which CGI scripts. This is usually done by marking a new directory within the document collection as containing CGI scripts – its name is often cgi-bin. For example, /usr/local/apache/htdocs/cgi-bin could be designated as a CGI directory on the Web server. When a Web browser requests a URL that points to a file within the CGI directory (e.g., http://example.com/cgi-bin/printenv.pl/with/additional/path?and=a&query=string), then, instead of simply sending that file (/usr/local/apache/htdocs/cgi-bin/printenv.pl) to the Web browser, the HTTP server runs the specified script and passes the output of the script to the Web browser. That is, anything that the script sends to
standard output In computer programming, standard streams are interconnected input and output communication channels between a computer program and its environment when it begins execution. The three input/output (I/O) connections are called standard input (stdin ...
is passed to the Web client instead of being shown on-screen in a terminal window. As remarked above, the CGI specification defines how additional information passed with the request is passed to the script. For instance, if a slash and additional directory name(s) are appended to the URL immediately after the name of the script (in this example, /with/additional/path), then that path is stored in the PATH_INFO
environment variable An environment variable is a dynamic-named value that can affect the way running processes will behave on a computer. They are part of the environment in which a process runs. For example, a running process can query the value of the TEMP envi ...
before the script is called. If parameters are sent to the script via an
HTTP GET The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, w ...
request (a question mark appended to the URL, followed by param=value pairs; in the example, ?and=a&query=string), then those parameters are stored in the QUERY_STRING environment variable before the script is called. If parameters are sent to the script via an
HTTP POST In computing, POST is a request method supported by HTTP used by the World Wide Web. By design, the POST request method requests that a web server accept the data enclosed in the body of the request message, most likely for storing it. It is oft ...
request, they are passed to the script's
standard input In computer programming, standard streams are interconnected input and output communication channels between a computer program and its environment when it begins execution. The three input/output (I/O) connections are called standard input (stdin ...
. The script can then read these environment variables or data from standard input and adapt to the Web browser's request.


Example

The following
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offic ...
program shows all the environment variables passed by the Web server: #!/usr/bin/env perl =head1 DESCRIPTION printenv — a CGI program that just prints its environment =cut print "Content-Type: text/plain\n\n"; foreach ( sort keys %ENV ) If a Web browser issues a request for the environment variables at http://example.com/cgi-bin/printenv.pl/foo/bar?var1=value1&var2=with%20percent%20encoding, a 64-bit
Windows 7 Windows 7 is a major release of the Windows NT operating system developed by Microsoft. It was released to manufacturing on July 22, 2009, and became generally available on October 22, 2009. It is the successor to Windows Vista, released nearly ...
Web server running cygwin returns the following information: Some, but not all, of these variables are defined by the CGI standard. Some, such as PATH_INFO, QUERY_STRING, and the ones starting with HTTP_, pass information along from the HTTP request. From the environment, it can be seen that the Web browser is
Firefox Mozilla Firefox, or simply Firefox, is a free and open-source web browser developed by the Mozilla Foundation and its subsidiary, the Mozilla Corporation. It uses the Gecko rendering engine to display web pages, which implements current ...
running on a
Windows 7 Windows 7 is a major release of the Windows NT operating system developed by Microsoft. It was released to manufacturing on July 22, 2009, and became generally available on October 22, 2009. It is the successor to Windows Vista, released nearly ...
PC, the Web server is Apache running on a system that emulates
Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, an ...
, and the CGI script is named cgi-bin/printenv.pl. The program could then generate any content, write that to
standard output In computer programming, standard streams are interconnected input and output communication channels between a computer program and its environment when it begins execution. The three input/output (I/O) connections are called standard input (stdin ...
, and the Web server will transmit it to the browser. The following are
environment variable An environment variable is a dynamic-named value that can affect the way running processes will behave on a computer. They are part of the environment in which a process runs. For example, a running process can query the value of the TEMP envi ...
s passed to CGI programs: * Server specific variables: ** SERVER_SOFTWARE: name/version of HTTP server. ** SERVER_NAME: host name of the server, may be dot-decimal IP address. ** GATEWAY_INTERFACE: CGI/version. * Request specific variables: ** SERVER_PROTOCOL: HTTP/version. ** SERVER_PORT:
TCP port In computer networking, a port is a number assigned to uniquely identify a connection endpoint and to direct data to a specific service. At the software level, within an operating system, a port is a logical construct that identifies a specific ...
(decimal). ** REQUEST_METHOD: name of HTTP method (see above). ** PATH_INFO: path suffix, if appended to URL after program name and a slash. ** PATH_TRANSLATED: corresponding full path as supposed by server, if PATH_INFO is present. ** SCRIPT_NAME: relative path to the program, like /cgi-bin/script.cgi. ** QUERY_STRING: the part of URL after ? character. The
query string A query string is a part of a uniform resource locator (URL) that assigns values to specified parameters. A query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML, cho ...
may be composed of *name=value pairs separated with ampersands (such as var1=val1&var2=val2...) when used to submit form data transferred via GET method as defined by HTML application/x-www-form-urlencoded. ** REMOTE_HOST: host name of the client, unset if server did not perform such lookup. ** REMOTE_ADDR: IP address of the client (dot-decimal). ** AUTH_TYPE: identification type, if applicable. ** REMOTE_USER used for certain AUTH_TYPEs. ** REMOTE_IDENT: see ident, only if server performed such lookup. ** CONTENT_TYPE: Internet media type of input data if PUT or POST method are used, as provided via HTTP header. ** CONTENT_LENGTH: similarly, size of input data (decimal, in octets) if provided via HTTP header. ** Variables passed by user agent (HTTP_ACCEPT, HTTP_ACCEPT_LANGUAGE, HTTP_USER_AGENT, HTTP_COOKIE and possibly others) contain values of corresponding
HTTP headers The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, w ...
and therefore have the same sense. The program returns the result to the Web server in the form of standard output, beginning with a header and a blank line. The header is encoded in the same way as an
HTTP header The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, w ...
and must include the MIME type of the document returned. The headers, supplemented by the Web server, are generally forwarded with the response back to the user. Here is a simple CGI program written in Python 3 along with the HTML that handles a simple addition problem. add.html:
Enter two numbers to add

add.cgi: #!/usr/bin/env python3 import cgi, cgitb cgitb.enable() input_data = cgi.FieldStorage() print('Content-Type: text/html') # HTML is following print('') # Leave a blank line print('

Addition Results

') try: num1 = int(input_data num1"value) num2 = int(input_data num2"value) except: print('Sorry, the script cannot turn your inputs into numbers (integers).') raise SystemExit(1) print(' + = '.format(num1, num2, num1 + num2))
This Python 3 CGI program gets the inputs from the HTML and adds the two numbers together.


Deployment

A Web server that supports CGI can be configured to interpret a URL that it serves as a reference to a CGI script. A common convention is to have a cgi-bin/ directory at the base of the directory tree and treat all executable files within this directory (and no other, for security) as CGI scripts. Another popular convention is to use filename extensions; for instance, if CGI scripts are consistently given the extension .cgi, the Web server can be configured to interpret all such files as CGI scripts. While convenient, and required by many prepackaged scripts, it opens the server to attack if a remote user can upload executable code with the proper extension. In the case of HTTP PUT or POSTs, the user-submitted data are provided to the program via the
standard input In computer programming, standard streams are interconnected input and output communication channels between a computer program and its environment when it begins execution. The three input/output (I/O) connections are called standard input (stdin ...
. The Web server creates a subset of the
environment variable An environment variable is a dynamic-named value that can affect the way running processes will behave on a computer. They are part of the environment in which a process runs. For example, a running process can query the value of the TEMP envi ...
s passed to it and adds details pertinent to the HTTP environment.


Uses

CGI is often used to process input information from the user and produce the appropriate output. An example of a CGI program is one implementing a wiki. If the user agent requests the name of an entry, the Web server executes the CGI program. The CGI program retrieves the source of that entry's page (if one exists), transforms it into
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaSc ...
, and prints the result. The Web server receives the output from the CGI program and transmits it to the user agent. Then if the user agent clicks the "Edit page" button, the CGI program populates an HTML textarea or other editing control with the page's contents. Finally if the user agent clicks the "Publish page" button, the CGI program transforms the updated HTML into the source of that entry's page and saves it.


Security

CGI programs run, by default, in the security context of the Web server. When first introduced a number of example scripts were provided with the reference distributions of the NCSA, Apache and CERN Web servers to show how shell scripts or C programs could be coded to make use of the new CGI. One such example script was a CGI program called PHF that implemented a simple phone book. In common with a number of other scripts at the time, this script made use of a function: escape_shell_cmd(). The function was supposed to sanitize its argument, which came from user input and then pass the input to the Unix shell, to be run in the security context of the Web server. The script did not correctly sanitize all input and allowed new lines to be passed to the shell, which effectively allowed multiple commands to be run. The results of these commands were then displayed on the Web server. If the security context of the Web server allowed it, malicious commands could be executed by attackers. This was the first widespread example of a new type of Web based attack, where unsanitized data from Web users could lead to execution of code on a Web server. Because the example code was installed by default, attacks were widespread and led to a number of security advisories in early 1996.


Alternatives

For each incoming HTTP request, a Web server creates a new CGI process for handling it and destroys the CGI process after the HTTP request has been handled. Creating and destroying a process can consume much more CPU and memory than the actual work of generating the output of the process, especially when the CGI program still needs to be interpreted by a virtual machine. For a high number of HTTP requests, the resulting workload can quickly overwhelm the Web server. The overhead involved in CGI process creation and destruction can be reduced by the following techniques: * CGI programs precompiled to
machine code In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ve ...
, e.g. precompiled from C or
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
programs, rather than CGI programs interpreted by a virtual machine, e.g.
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offic ...
, PHP or
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
programs. * Web server extensions such as Apache modules (e.g.
mod_perl Mod, MOD or mods may refer to: Places * Modesto City–County Airport, Stanislaus County, California, US Arts, entertainment, and media Music * Mods (band), a Norwegian rock band * M.O.D. (Method of Destruction), a band from New York City, US ...
,
mod_php Mod, MOD or mods may refer to: Places * Modesto City–County Airport, Stanislaus County, California, US Arts, entertainment, and media Music * Mods (band), a Norwegian rock band * M.O.D. (Method of Destruction), a band from New York City, US ...
,
mod_python mod_python is an Apache HTTP Server module that integrates the Python programming language with the server. It is intended to provide a Python language binding for the Apache HTTP Server. When mod_python released it was one of the more efficient ...
), NSAPI plugins, and ISAPI plugins which allow long-running application processes handling more than one request and hosted within the Web server.
Web 2.0 Web 2.0 (also known as participative (or participatory) web and social web) refers to websites that emphasize user-generated content, ease of use, participatory culture and interoperability (i.e., compatibility with other products, systems, and ...
allows to transfer data from the client to the server without using HTML forms and without the user noticing. *
FastCGI FastCGI is a binary protocol for interfacing interactive programs with a web server. It is a variation on the earlier Common Gateway Interface (CGI). FastCGI's main aim is to reduce the overhead related to interfacing between web server and CGI ...
, SCGI, and AJP which allow long-running application processes handling more than one request to be hosted externally; i.e., separately from the Web server. Each application process listens on a socket; the Web server handles an HTTP request and sends it via another protocol (FastCGI, SCGI or AJP) to the socket only for dynamic content, while static content is usually handled directly by the Web server. This approach needs fewer application processes so consumes less memory than the Web server extension approach. And unlike converting an application program to a Web server extension, FastCGI, SCGI, and AJP application programs remain independent of the Web server. * Jakarta EE runs
Jakarta Servlet A Jakarta Servlet (formerly Java Servlet) is a Java software component that extends the capabilities of a server. Although servlets can respond to many types of requests, they most commonly implement web containers for hosting web application ...
applications in a
Web container A web container (also known as a servlet container; and compare "webcontainer" ) is the component of a web server that interacts with Jakarta Servlets. A web container is responsible for managing the lifecycle of servlets, mapping a URL to a pa ...
to serve dynamic content and optionally static content which replaces the overhead of creating and destroying processes with the much lower overhead of creating and destroying threads. It also exposes the programmer to the library that comes with
Java SE Java Platform, Standard Edition (Java SE) is a computing platform for development and deployment of portable code for desktop and server environments. Java SE was formerly known as Java 2 Platform, Standard Edition (J2SE). The platform uses Ja ...
on which the version of Jakarta EE in use is based. The optimal configuration for any Web application depends on application-specific details, amount of traffic, and complexity of the transaction; these trade-offs need to be analyzed to determine the best implementation for a given task and time budget. Web frameworks offer an alternative to using CGI scripts to interact with user agents.


See also

* CGI.pm *
DOS Gateway Interface Arachne is a stable Internet suite containing a graphical web browser, email client, and dialer. Originally, Arachne was developed by Michal Polák under his xChaos label, a name he later changed into Arachne Labs. It was written in C and compi ...
(DGI) *
FastCGI FastCGI is a binary protocol for interfacing interactive programs with a web server. It is a variation on the earlier Common Gateway Interface (CGI). FastCGI's main aim is to reduce the overhead related to interfacing between web server and CGI ...
* Perl Web Server Gateway Interface * Rack (web server interface) * Server Side Includes * Web Server Gateway Interface


References


External links


GNU cgicc
a C++ class library for writing CGI applications
CGI
a standard Perl module for CGI request parsing and HTML response generation
CGI Programming 101: Learn CGI Today!
a CGI tutorial
The Invention of CGI
{{Authority control Servers (computing) Web 1.0 Web technology Network protocols Articles with example Python (programming language) code