GitHub Copilot is a cloud-based artificial intelligence tool developed by GitHub and OpenAI to assist users of Visual Studio Code, Visual Studio, Neovim, and JetBrains integrated development environments (IDEs) by autocompleting code.
The tool was first announced by GitHub on June 29, 2021, and has been available by subscription to individual developers since June 2022. It works best for users coding in Python, JavaScript, TypeScript, Ruby, and Go.
History
On June 29, 2021, GitHub announced GitHub Copilot for technical preview in the Visual Studio Code development environment.
On October 27, 2021, GitHub released the GitHub Copilot Neovim plugin as a public repository, and on October 29, 2021, Copilot was released as a plugin on the JetBrains marketplace. GitHub announced Copilot's availability for the Visual Studio 2022 IDE on March 29, 2022. On June 21, 2022, GitHub announced that Copilot was out of "technical preview" and available as a subscription-based service for individual developers.
Features
When provided with a programming problem in natural language, Codex is capable of generating solution code. It is also able to describe input code in English and translate code between programming languages.
According to its website, GitHub Copilot includes assistive features for programmers, such as the conversion of code comments to runnable code and autocomplete for chunks of code, repetitive sections of code, and entire methods or functions.
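As a hand-written illustration of the comment-to-code workflow described above (not actual Copilot output), a programmer might type only the comment and the function signature, and a completion tool would be expected to propose a body along the following lines. The function name `days_between` and its behavior are hypothetical and chosen only for this sketch.

```python
from datetime import date


# Prompt a programmer might write: a descriptive comment plus a signature.
# Return the number of whole days between two ISO-format dates (YYYY-MM-DD).
def days_between(start: str, end: str) -> int:
    # Body of the kind a completion tool might suggest (written by hand here).
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    return abs((end_date - start_date).days)
```

Any suggestion of this kind would still need to be reviewed and tested by the programmer before use.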
GitHub reports that Copilot's autocomplete feature is accurate roughly half of the time; given a set of Python function headers, for example, Copilot correctly completed the function body 43% of the time on the first try and 57% of the time after ten attempts.
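How GitHub computed these figures is not described here. As a minimal sketch of how a "correct within k attempts" rate can be tallied in general, the following assumes each problem has a recorded list of pass/fail outcomes in attempt order. The function name `success_rate_within` and the sample data are hypothetical.

```python
def success_rate_within(results: list[list[bool]], k: int) -> float:
    """Fraction of problems with at least one correct attempt among the first k."""
    if not results:
        raise ValueError("results must contain at least one problem")
    solved = sum(1 for attempts in results if any(attempts[:k]))
    return solved / len(results)


# Hypothetical outcomes for four prompts, three attempts each (not real data).
outcomes = [
    [True, False, False],
    [False, False, True],
    [False, False, False],
    [False, True, True],
]
first_try = success_rate_within(outcomes, 1)     # 0.25
within_three = success_rate_within(outcomes, 3)  # 0.75
```

The rate can only stay the same or rise as more attempts are allowed, which is consistent with the higher figure reported after ten attempts.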
GitHub states that Copilot's features allow programmers to navigate unfamiliar coding frameworks and languages by reducing the amount of time users spend reading documentation.
Implementation
GitHub Copilot is powered by OpenAI Codex, an artificial intelligence model created by OpenAI, an artificial intelligence research laboratory. OpenAI Codex is a modified, production version of the Generative Pre-trained Transformer 3 (GPT-3), a language model that uses deep learning to produce human-like text. The Codex model is additionally trained on gigabytes of source code in a dozen programming languages.
Copilot's OpenAI Codex is trained on a selection of English-language text, public GitHub repositories, and other publicly available source code.
This includes a filtered dataset of 159 gigabytes of Python code sourced from 54 million public GitHub repositories.
OpenAI's GPT-3 is licensed exclusively to Microsoft, GitHub's parent company.
Reception
Since Copilot's release, there have been concerns with its security and educational impact, as well as licensing controversy surrounding the code it produces.
Licensing controversy
Although most code output by Copilot can be classified as a transformative work, GitHub admits that a small proportion is copied verbatim, which has led to fears that the output code is insufficiently transformative to be classified as fair use and may infringe on the copyright of the original owner.
This leaves Copilot on untested legal ground, although GitHub states that "training machine learning models on publicly available data is considered fair use across the machine learning community".
The company has also stated that, as of June 2022, only a small share of source code is reproduced completely or partially unchanged, and this figure is expected to drop as the software continues to learn. Also in June 2022, the Software Freedom Conservancy announced it would end all uses of GitHub in its own projects, accusing Copilot of ignoring code licenses used in training data. In November 2022, a class-action lawsuit was filed, challenging the legality of Copilot.
FSF white papers
On July 28, 2021, the Free Software Foundation (FSF) published a funded call for white papers on philosophical and legal questions around Copilot.
Donald Robertson, the Licensing and Compliance Manager of the FSF, stated that "Copilot raises many [...] questions which require deeper examination."
On February 24, 2022, the FSF announced that it had received 22 papers on the subject and, using an anonymous review process, had chosen 5 papers to highlight.
Privacy concerns
As the service is cloud-based and requires continuous communication with the GitHub Copilot servers, it marks a fundamental shift in bringing the process of writing software online and thus into the hands of third parties, where every keystroke can be monitored.
Security concerns
A paper accepted for publication in the IEEE Symposium on Security and Privacy in 2022 assessed the security of code generated by Copilot for MITRE's top 25 code weakness enumerations (e.g., cross-site scripting, path traversal) across 89 different scenarios and 1,689 programs.
This was done along the axes of diversity of weaknesses (its ability to respond to scenarios that may lead to various code weaknesses), diversity of prompts (its ability to respond to the same code weakness with subtle variation), and diversity of domains (its ability to generate register transfer level hardware specifications in Verilog).
The study found that across these axes in multiple languages, 39.33% of top suggestions and 40.73% of total suggestions led to code vulnerabilities. Additionally, the authors found that small, non-semantic changes made to code (i.e., changes to comments) could affect code safety.
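To make one of the weakness classes named above concrete, the following hand-written sketch contrasts a path traversal weakness with a guarded version. It is not taken from the study or from Copilot output; the directory `/srv/app/uploads` and the function names are hypothetical, and `Path.is_relative_to` assumes Python 3.9 or later.

```python
from pathlib import Path

BASE_DIR = Path("/srv/app/uploads")  # hypothetical upload directory


def unsafe_path(filename: str) -> Path:
    # Weak: joins user input directly, so "../" segments can escape BASE_DIR.
    return BASE_DIR / filename


def safe_path(filename: str) -> Path:
    # Resolve the joined path, then reject anything outside BASE_DIR.
    base = BASE_DIR.resolve()
    candidate = (base / filename).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes upload directory: {filename!r}")
    return candidate
```

With an input such as `../../etc/passwd`, the first function returns a path that still contains the `..` segments and points outside the upload directory once resolved, while the second raises `ValueError`.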
Education concerns
A February 2022 paper released by the Association for Computing Machinery evaluates the impact Codex, the technology used by GitHub Copilot, may have on the education of novice programmers.
The study utilizes assessment questions from an introductory programming class at The University of Auckland and compares Codex's responses with student performance.
Researchers found that Codex, on average, performed better than most students; however, its performance decreased on questions that limited what features could be used in the solution (e.g., conditionals, collections, and loops).
Given this type of problem, "only two of [C]odex's [10] solutions produced the correct output, but both [...] violated [t]he constraint." The paper concludes that Codex may be useful in providing a variety of solutions to learners, but may also lead to over-reliance and plagiarism.