
R is a programming language for statistical computing and data visualization. It has been widely adopted in the fields of data mining, bioinformatics, data analysis, and data science. The core R language is extended by a large number of software packages, which contain reusable code, documentation, and sample data. Some of the most popular R packages are in the tidyverse collection, which, according to its authors and users, enhances functionality for visualizing, transforming, and modelling data and makes programming easier. R is free and open-source software distributed under the GNU General Public License. The language is implemented primarily in C, Fortran, and R itself. Precompiled executables are available for the major operating systems (including Linux, macOS, and Microsoft Windows). Its core is an interpreted language with a native command line interface. In addition, multiple third-party applications are available as graphical user interfaces; such applications include RStudio (an integrated development environment) and Jupyter (a notebook interface).


History

R was started by professors Ross Ihaka and Robert Gentleman as a programming language to teach introductory statistics at the University of Auckland. The language was inspired by the S programming language, with most S programs able to run unaltered in R. The language was also inspired by Scheme's lexical scoping, which allows for local variables. The name of the language, R, comes from being both an S language successor and the shared first letter of the authors, Ross and Robert. In August 1993, Ihaka and Gentleman posted a binary file of R on StatLib, a data archive website. At the same time, they announced the posting on the s-news mailing list. On 5 December 1997, R became a GNU project when version 0.60 was released. On 29 February 2000, version 1.0 was released.


Packages

R packages are collections of functions, documentation, and data that expand R. For example, packages can add reporting features (using packages such as RMarkdown, Quarto, knitr, and Sweave) and support for various statistical techniques (such as linear, generalized linear and nonlinear modeling, classical statistical tests, spatial analysis, time-series analysis, and clustering). Ease of package installation and use has contributed to the language's adoption in data science.

Immediately available when starting R after installation, base packages provide the fundamental and necessary syntax and commands for programming, computing, graphics production, basic arithmetic, and statistical functionality. Many other packages must be installed before they can be used. An example is the tidyverse collection of R packages, which bundles several subsidiary packages to provide a common API. The collection specializes in tasks related to accessing and processing "tidy data", which are data contained in a two-dimensional table with a single row for each observation and a single column for each variable.

A package needs to be installed only once. For example, to install the tidyverse collection:

    > install.packages("tidyverse")

To load the functions, data, and documentation of a package, one calls the library() function. To load the tidyverse collection, one can execute the following code:

    > # The package name can be enclosed in quotes
    > library("tidyverse")
    > # But the package name can also be used without quotes
    > library(tidyverse)

The Comprehensive R Archive Network (CRAN) was founded in 1997 by Kurt Hornik and Friedrich Leisch to host R's source code, executable files, documentation, and user-created packages. CRAN's name and scope mimic the Comprehensive TeX Archive Network (CTAN) and the Comprehensive Perl Archive Network (CPAN). CRAN originally had only three mirror sites and twelve contributed packages; it has since grown to 99 mirrors and 21,513 contributed packages. Packages are also available in repositories such as R-Forge, Omegahat, and GitHub.

To provide guidance, the Task Views area of the CRAN website lists packages that are relevant for specific topics; sample topics include causal inference, finance, genetics, high-performance computing, machine learning, medical imaging, meta-analysis, social sciences, and spatial statistics. The Bioconductor project provides packages for genomic data analysis, complementary DNA, microarray, and high-throughput sequencing methods.
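The "tidy data" layout described above (one row per observation, one column per variable) can be illustrated with base R alone, without installing the tidyverse. The following is a minimal sketch; the data frame scores_wide, its column names, and its values are hypothetical. It uses stack() from the utils package (attached by default) to turn a table with one column per year into one row per student-year observation.

```r
# Hypothetical "wide" table: one row per student, one column per year
scores_wide <- data.frame(
  student = c("Ann", "Bob"),
  y2021 = c(90, 85),
  y2022 = c(95, 80)
)

# stack() collapses the year columns into two columns:
# `values` (the score) and `ind` (the name of the column it came from)
stacked <- stack(scores_wide[c("y2021", "y2022")])

# stack() lists all of y2021 first, then all of y2022, so the student
# names are repeated in the same order to label each observation
scores_tidy <- data.frame(
  student = rep(scores_wide$student, times = 2),
  year = as.character(stacked$ind),
  score = stacked$values
)

print(scores_tidy)
```

Each row of scores_tidy now describes exactly one observation (a student's score in one year), and each column holds one variable. The tidyverse provides dedicated reshaping functions for this task; stack() is used here only to keep the sketch dependency-free.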


Community

There are three main groups that help support R software development:
* The R Core Team was founded in 1997 to maintain the R source code.
* The R Foundation for Statistical Computing was founded in April 2003 to provide financial support.
* The R Consortium is a Linux Foundation project to develop R infrastructure.

The R Journal is an open-access academic journal that features short to medium-length articles on the use and development of R. The journal includes articles on packages, programming tips, CRAN news, and foundation news.

The R community hosts many conferences and in-person meetups. These events and groups include:
* useR!: an annual international R user conference
* Directions in Statistical Computing (DSC)
* R-Ladies: an organization to promote gender diversity in the R community
* SatRdays: R-focused conferences held on Saturdays
* R Conference
* posit::conf (formerly known as rstudio::conf)

On social media sites such as Twitter, the hashtag #rstats can be used to follow new developments in the R community.


Examples


Hello, World!

The following is a "Hello, World!" program:

    > print("Hello, World!")
    [1] "Hello, World!"

Here is an alternative version, which uses the cat() function:

    > cat("Hello, World!")
    Hello, World!
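The two versions differ in what they write to the console. print() displays the value the way R represents it, with the [1] index prefix and quotes around the string, while cat() writes only the characters and does not end the line by itself. The short sketch below compares them with capture.output() from the utils package, which collects console output as a character vector; the variable names are illustrative.

```r
# capture.output() collects what each call writes to the console
printed <- capture.output(print("Hello, World!"))
catted <- capture.output(cat("Hello, World!\n"))

# print() adds the [1] index prefix and keeps the quotes around the string
printed
# cat() writes only the characters; the "\n" ends the line explicitly
catted

# cat() also accepts several arguments and separates them with a space
cat("Hello,", "World!\n")
```

For this reason, print() is typically used to inspect objects, and cat() to produce formatted text such as messages or file output.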


Basic syntax

The following examples illustrate the basic syntax of the language and use of the command-line interface. In R, the generally preferred assignment operator is an arrow made from two characters, <-, although = can be used in some cases.

    > x <- 1:6 # Create a numeric vector in the current environment
    > y <- x^2 # Similarly, create a vector based on the values in x
    > print(y) # Print the vector's contents
    [1]  1  4  9 16 25 36
    > z <- x + y # Create a new vector that is the sum of x and y
    > z # Return the contents of z to the current environment
    [1]  2  6 12 20 30 42
    > z_matrix <- matrix(z, nrow = 3) # Create a new matrix that transforms the vector z into a 3x2 matrix object
    > z_matrix
         [,1] [,2]
    [1,]    2   20
    [2,]    6   30
    [3,]   12   42
    > 2 * t(z_matrix) - 2 # Transpose the matrix; multiply every element by 2; subtract 2 from each element; and then return the results to the terminal
         [,1] [,2] [,3]
    [1,]    2   10   22
    [2,]   38   58   82
    > new_df <- data.frame(t(z_matrix), row.names = c("A", "B")) # Create a new data frame from the transposed z_matrix, with row names 'A' and 'B'
    > names(new_df) <- c("X", "Y", "Z") # Set the column names of new_df to X, Y, and Z
    > print(new_df) # Print the current results
       X  Y  Z
    A  2  6 12
    B 20 30 42
    > new_df$Z # Output the Z column
    [1] 12 42
    > # The column Z can be accessed as $Z, [["Z"]], or [[3]], and the values are the same
    > identical(new_df$Z, new_df[["Z"]]) && identical(new_df[[3]], new_df$Z)
    [1] TRUE
    > attributes(new_df) # Print information about the attributes of the new_df data frame
    $names
    [1] "X" "Y" "Z"

    $row.names
    [1] "A" "B"

    $class
    [1] "data.frame"

    > attributes(new_df)$row.names <- c("one", "two") # Access and then change the row.names attribute; this can also be done using the rownames() function
    > new_df
         X  Y  Z
    one  2  6 12
    two 20 30 42
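The remark that = "can be used in some cases" can be made concrete. At the top level, <- and = both assign; inside a function call, = only matches an argument by name, whereas <- is evaluated as an assignment and creates a variable as a side effect. A minimal sketch, assuming a fresh session in which the hypothetical names exam_score and exam_total do not already exist:

```r
# At the top level, both operators create a variable
a <- 10
b = 10
identical(a, b)        # TRUE

# Inside a call, `=` only names an argument (here, a data frame column);
# no variable called exam_score is created
scores <- data.frame(exam_score = c(70, 80, 90))
exists("exam_score")   # FALSE

# `<-` inside a call is an assignment: mean() evaluates its argument,
# so exam_total is created in the calling environment
mean(exam_total <- c(70, 80, 90))
exists("exam_total")   # TRUE
```

Assigning inside a call, as in the last example, is legal but usually discouraged, because the side effect is easy to overlook when reading the code.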


Structure of a function

R is able to create functions that add new functionality for code reuse. Objects created within the body of the function (which is enclosed by curly brackets) remain accessible only from within the function, and any data type may be returned. In R, almost all functions and all user-defined functions are closures. The following is an example of creating a function to perform an arithmetic calculation:

    # The function's input parameters are x and y.
    # The function, named f, returns a linear combination of x and y.
    f <- function(x, y) {
      z <- 3 * x + 4 * y
      # An explicit return() is optional; `z` alone would also be returned
      return(z)
    }

    # As an alternative, the last statement executed in a function is returned implicitly.
    f <- function(x, y) 3 * x + 4 * y

The following is some output from using the function defined above:

    > f(1, 2) # 3 * 1 + 4 * 2 = 3 + 8
    [1] 11
    > f(c(1, 2, 3), c(5, 3, 4)) # Element-wise calculation
    [1] 23 18 25
    > f(1:3, 4) # Equivalent to f(c(1, 2, 3), c(4, 4, 4))
    [1] 19 22 25

It is possible to define functions to be used as infix operators by using the special syntax `%name%`, where "name" is the function variable name:

    > `%sumx2y2%` <- function(e1, e2) {e1 ^ 2 + e2 ^ 2}
    > 1:3 %sumx2y2% -(1:3)
    [1]  2  8 18

Since R version 4.1.0, functions can be written in a short notation, which is useful for passing anonymous functions to higher-order functions:

    > sapply(1:5, \(i) i^2) # here \(i) is the same as function(i)
    [1]  1  4  9 16 25
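Because user-defined functions are closures with lexical scoping, a function created inside another function keeps access to the variables of the environment in which it was defined. The sketch below uses a hypothetical function factory, make_counter(), to show this; the <<- operator assigns to the variable found in an enclosing environment instead of creating a new local one.

```r
# make_counter() returns a new function; `count` lives in the environment
# created by each call to make_counter(), not in the global environment
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # update `count` in the enclosing environment
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()

counter_a()  # 1
counter_a()  # 2
counter_b()  # 1, because counter_b has its own, independent `count`

# The enclosing environment of a closure can be inspected directly
get("count", envir = environment(counter_a))  # 2
```

Each call to make_counter() creates a fresh environment, which is why the two counters do not share state.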


Native pipe operator

In R version 4.1.0, a native pipe operator, |>, was introduced. This operator allows users to chain functions together, rather than using nested function calls.

    > nrow(subset(mtcars, cyl == 4)) # Nested without the pipe character
    [1] 11
    > mtcars |> subset(cyl == 4) |> nrow() # Using the pipe character
    [1] 11
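The native pipe is a syntax transformation: when the code is parsed, lhs |> f(args) is rewritten as f(lhs, args), so the piped and nested forms of the mtcars example above are the same call. A minimal sketch that checks this with quote(), which returns an expression without evaluating it (assumes R 4.1.0 or later; the variable names are illustrative):

```r
# Parse both forms without evaluating them
piped <- quote(mtcars |> subset(cyl == 4) |> nrow())
nested <- quote(nrow(subset(mtcars, cyl == 4)))

# The parser has already removed the pipe, so the two calls are identical
identical(piped, nested)
deparse(piped)

# Evaluating the expression counts the 4-cylinder cars in mtcars
eval(piped)
```

Because the rewriting happens at parse time, the native pipe adds no function-call overhead of its own.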
Another alternative to nested functions is the use of intermediate objects, rather than the pipe operator:

    > mtcars_subset_rows <- subset(mtcars, cyl == 4)
    > num_mtcars_subset <- nrow(mtcars_subset_rows)
    > print(num_mtcars_subset)
    [1] 11
While the pipe operator can produce code that is easier to read, it is advisable to chain together at most 10 to 15 lines of code using this operator, and to chunk code into sub-tasks that are saved into objects having meaningful names. The following is an example having fewer than 10 lines, which some readers may find difficult to grasp in the absence of intermediate named steps:

    (\(x, n = 42, key = c(letters, LETTERS, " ", ":", ")"))
      strsplit(x, "")[[1]] |>
      (Vectorize(\(chr) which(chr == key) - 1))() |>
      (`+`)(n) |>
      (`%%`)(length(key)) |>
      (\(i) key[i + 1])() |>
      paste(collapse = "")
    )("duvFkvFksnvEyLkHAErnqnoyr")
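The following is a version of the preceding code that is easier to read:

    default_key <- c(letters, LETTERS, " ", ":", ")")

    f <- function(x, n = 42, key = default_key) {
      split_str <- strsplit(x, "")[[1]]                         # split into single characters
      index_chars <- match(split_str, key) - 1                  # zero-based position of each character in key
      shifted_index_chars <- (index_chars + n) %% length(key)   # shift by n, wrapping around the key
      paste(key[shifted_index_chars + 1], collapse = "")        # map back to characters and rejoin
    }

    f("duvFkvFksnvEyLkHAErnqnoyr")  # returns "This is fairly unreadable"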


Object-oriented programming

The R language has native support for object-oriented programming. There are two native frameworks, the so-called S3 and S4 systems. The former, being more informal, supports single dispatch on the first argument, and objects are assigned to a class simply by setting a "class" attribute in each object. The latter is a system like the Common Lisp Object System (CLOS), with formal classes (also derived from S) and generic methods, which supports multiple dispatch and multiple inheritance.

In the example below, summary() is a generic function that dispatches to different methods depending on whether its argument is a character vector or a factor:

    > data <- c("a", "b", "c", "a", NA)
    > summary(data)
       Length     Class      Mode
            5 character character
    > summary(as.factor(data))
       a    b    c NA's
       2    1    1    1
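To show the S3 mechanism in action, the following minimal sketch defines a hypothetical class, "temperature", simply by setting the class attribute with structure(), and then adds methods for the existing print() and summary() generics. Methods are ordinary functions named generic.class, which are found when the generic dispatches on the object's class; the class name and methods are illustrative only.

```r
# Constructor for a hypothetical S3 class: a list tagged with a "class" attribute
new_temperature <- function(celsius) {
  structure(list(celsius = celsius), class = "temperature")
}

# Method for the print() generic, selected because the class is "temperature"
print.temperature <- function(x, ...) {
  cat("Temperature:", x$celsius, "degrees C\n")
  invisible(x)
}

# Method for the summary() generic, returning the range as a named vector
summary.temperature <- function(object, ...) {
  c(min = min(object$celsius), max = max(object$celsius))
}

readings <- new_temperature(c(12.5, 18, 21.3))
print(readings)    # dispatches to print.temperature()
summary(readings)  # dispatches to summary.temperature()
class(readings)    # "temperature"
```

No class definition is registered anywhere: the class attribute alone determines which method runs, which is what makes S3 informal compared with S4.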


Modeling and plotting

The R language has built-in support for data modeling and graphics. The following example shows how R can generate and plot a linear model with residuals.

    # Create x and y values
    x <- 1:6
    y <- x^2

    # Linear regression model: y = A + B * x
    model <- lm(y ~ x)

    # Display an in-depth summary of the model
    summary(model)

    # Create a 2-by-2 layout for figures
    par(mfrow = c(2, 2))

    # Output diagnostic plots of the model
    plot(model)

The output from the summary() function in the preceding code block is as follows:

    Residuals:
          1       2       3       4       5       6
     3.3333 -0.6667 -2.6667 -2.6667 -0.6667  3.3333

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  -9.3333     2.8441  -3.282 0.030453 *
    x             7.0000     0.7303   9.585 0.000662 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.055 on 4 degrees of freedom
    Multiple R-squared:  0.9583,  Adjusted R-squared:  0.9478
    F-statistic: 91.88 on 1 and 4 DF,  p-value: 0.000662
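Beyond the printed summary, the fitted model object can be queried programmatically. The sketch below refits the same model and uses the standard accessor functions coef(), predict(), residuals(), and fitted() from the stats package; the new x values 7 and 8 are arbitrary illustrative inputs.

```r
# Same data and model as above
x <- 1:6
y <- x^2
model <- lm(y ~ x)

# Estimated intercept and slope as a named numeric vector
coef(model)

# Predictions for new x values; newdata needs a column named like the predictor
new_points <- data.frame(x = c(7, 8))
predict(model, newdata = new_points)

# Residuals are the observed values minus the fitted values
all.equal(unname(residuals(model)), y - unname(fitted(model)))
```

Since y = x^2 is not linear in x, the straight-line predictions outside the observed range (39.67 at x = 7 versus the true value 49) illustrate why the residual plots above show a systematic pattern.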


Mandelbrot set

This example of a Mandelbrot set highlights the use of complex numbers. It models the first 20 iterations of the equation $z_{n+1} = z_n^2 + c$, starting from $z_0 = 0$, where c represents different complex constants. To run this sample code, it is necessary to first install the package that provides the write.gif() function:

    install.packages("caTools")

The sample code is as follows:

    library(caTools)  # external package providing the write.gif() function
    jet.colors <- colorRampPalette(
      c("green", "pink", "#007FFF", "cyan", "#7FFF7F", "white", "#FF7F00", "red", "#7F0000"))
    dx <- 1500  # define width
    dy <- 1400  # define height
    C <- complex(
      real = rep(seq(-2.2, 1.0, length.out = dx), each = dy),
      imaginary = rep(seq(-1.2, 1.2, length.out = dy), times = dx)
    )
    # reshape as matrix of complex numbers
    C <- matrix(C, dy, dx)
    # initialize output 3D array
    X <- array(0, c(dy, dx, 20))
    Z <- 0
    # loop with 20 iterations
    for (k in 1:20) {
      # apply z = z^2 + c to every grid point at once
      Z <- Z^2 + C
      # store this iteration as one frame of the animation
      X[, , k] <- exp(-abs(Z))
    }
    write.gif(X, "Mandelbrot.gif", col = jet.colors, delay = 100)
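The sample above iterates every grid point for all 20 steps. A common alternative, swapped in here and not taken from the sample, is the escape-time method: a point stops being iterated once its modulus exceeds 2 (after which it is known to diverge), and the number of iterations it survived is recorded. The following minimal sketch uses base R only, so caTools is not needed; the helper name mandelbrot_counts() and its small default grid are illustrative assumptions.

```r
# Escape-time sketch of the iteration z <- z^2 + c on a small grid.
# mandelbrot_counts() is a hypothetical helper, not part of any package.
mandelbrot_counts <- function(nx = 60, ny = 40, max_iter = 20) {
  re <- seq(-2.2, 1.0, length.out = nx)
  im <- seq(-1.2, 1.2, length.out = ny)
  # Matrix of constants c: rows follow the real axis, columns the imaginary axis
  C <- outer(re, im, function(a, b) complex(real = a, imaginary = b))
  Z <- matrix(0 + 0i, nx, ny)
  counts <- matrix(0L, nx, ny)
  for (k in seq_len(max_iter)) {
    # Only points that have not escaped (|z| <= 2) are iterated further
    active <- Mod(Z) <= 2
    Z[active] <- Z[active]^2 + C[active]
    counts[active] <- counts[active] + 1L
  }
  list(re = re, im = im, counts = counts)
}

res <- mandelbrot_counts()
# Number of grid points that never escaped within max_iter iterations
sum(res$counts == 20L)
```

Freezing escaped points keeps every value finite, unlike iterating all points unconditionally. The result can be drawn with base graphics, for example with image(res$re, res$im, res$counts).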


Version names

All R version releases from 2.14.0 onward have codenames that make reference to Peanuts comics and films. In 2018, core R developer Peter Dalgaard presented a history of R releases since 1997. Some notable early releases before the named releases include the following:
* Version 1.0.0, released on 29 February 2000, a leap day
* Version 2.0.0, released on 4 October 2004, "which at least had a nice ring to it"

The idea of naming R version releases was inspired by the naming system for Debian and Ubuntu versions. Dalgaard noted an additional reason for the use of Peanuts references in R codenames: the humorous observation that "everyone in statistics is a P-nut."


Interfaces

R is installed with a command line console by default, but there are multiple ways to interface with the language:
* Integrated development environments (IDEs):
** R.app (OS X/macOS only)
** Rattle GUI
** R Commander
** RKWard
** RStudio
** Tinn-R
* General-purpose IDEs:
** Eclipse via the StatET plugin
** Visual Studio via R Tools for Visual Studio
* Source-code editors:
** Emacs
** Vim via the Nvim-R plugin
** Kate
** LyX via Sweave
** WinEdt
** Jupyter
* Other scripting languages:
** Python
** Perl
** Ruby
** F#
** Julia
* General-purpose programming languages:
** Java via the Rserve socket server
** .NET C#

Statistical frameworks that use R in the background include Jamovi and JASP.


Implementations

The main R implementation is written primarily in C, Fortran, and R itself. Other implementations include the following:
* pretty quick R (pqR), by Radford M. Neal, which attempts to improve memory management.
* Renjin for the Java Virtual Machine.
* CXXR and Riposte, written in C++.
* Oracle's FastR, built on GraalVM.
* TIBCO Enterprise Runtime for R (TERR), which integrates with Spotfire. (The company also sells S-PLUS, a commercial implementation of the S language.)

Microsoft R Open (MRO) was an R implementation. As of 30 June 2021, Microsoft began to phase out MRO in favor of the CRAN distribution.


Commercial support

Although R is an open-source project, some companies provide commercial support:
* Oracle provides commercial support for its Big Data Appliance, which integrates R into its other products.
* IBM provides commercial support for execution of R within Hadoop.


See also

* Comparison of numerical-analysis software
* Comparison of statistical packages
* List of numerical-analysis software
* List of statistical software
* Rmetrics




External links


* R Technical Papers
* Big Book of R: a curated list of R-related programming books
* A partially annotated curated list of books relating to R or S