The tidyverse is a collection of open source packages for the R programming language, introduced by Hadley Wickham and his team, that "share an underlying design philosophy, grammar, and data structures" of tidy data. Characteristic features of tidyverse packages include extensive use of non-standard evaluation and the encouragement of piping (passing the output of one function directly into the next). As of November 2018, the tidyverse package and some of its individual packages made up 5 of the 10 most downloaded R packages.

The tidyverse is the subject of multiple books and papers; in 2019, the ecosystem was described in the Journal of Open Source Software. Its syntax has been called "supremely readable", and some have argued that the tidyverse is an effective way to introduce complete beginners to programming, because it lets students start on data processing tasks quickly. Some practitioners have also found that data processing steps are more intuitive to chain together with the tidyverse than with pandas, Python's equivalent data processing package.

An active R community has formed around the tidyverse. One example is TidyTuesday, a weekly social data project organised by the Data Science Learning Community (DSLC), which releases a new real-world dataset each week so that participants can practise, share their work, and learn to work with data more easily.

Critics of the tidyverse have argued that it promotes tools that are harder to teach and learn than their built-in base R equivalents, and that are too dissimilar from some other programming languages. More generally, the tidyverse principles encourage a universe of streamlined packages that, in principle, alleviates dependency issues and eases compatibility with current and future features. One example of this approach is the pharmaverse, a collection of R packages for clinical reporting in the pharmaceutical industry.
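As a rough illustration of the piping and non-standard evaluation described above, the sketch below chains several dplyr verbs with the `%>%` pipe on R's built-in `mtcars` dataset. It assumes dplyr is installed; the choice of dataset, filter, and summary is ours, purely for illustration. Note that column names such as `cyl` and `mpg` are written bare rather than as `mtcars$cyl`, which is the non-standard evaluation the tidyverse relies on.

```r
library(dplyr)  # provides filter(), group_by(), summarise(), n() and re-exports %>%

# Each step receives the previous result as its first argument,
# so the pipeline reads top to bottom as a sequence of steps.
mpg_by_cyl <- mtcars %>%
  filter(cyl %in% c(4, 6)) %>%        # keep 4- and 6-cylinder cars only
  group_by(cyl) %>%                   # one group per cylinder count
  summarise(mean_mpg = mean(mpg),     # average fuel economy per group
            n_cars = n())             # number of cars in each group

print(mpg_by_cyl)
```

Since R 4.1, base R also offers a native pipe, `|>`, which could replace `%>%` in this particular pipeline without other changes.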


Packages

The core tidyverse packages, which provide functionality to model, transform, and visualize data, include:
* ggplot2 – for data visualization
* dplyr – for wrangling and transforming data
* tidyr – for reshaping data into tidy form, where each variable is a column, each observation is a row, and each value is a cell
* readr – for reading data from common delimited text files
* purrr – a functional programming toolkit
* tibble – a modern reimplementation of the built-in data frame data structure
* stringr – for manipulating strings
* forcats – for working with categorical data (factors)

Additional packages complement the core collection. Other packages based on tidy data principles are developed regularly, such as tidytext for text analysis, tidymodels for machine learning, and tidyquant for financial analysis.
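To make the tidy-data definition given for tidyr more concrete, here is a small sketch using tidyr's `pivot_longer()`. The `scores` table and its values are hypothetical, invented only for illustration, and the example assumes tidyr and tibble are installed. In the input, the year variable is stored in the column headers; after reshaping, each row is one student-year observation and each score sits in its own cell.

```r
library(tibble)  # tibble(): the tidyverse's modern data frame
library(tidyr)   # pivot_longer(): reshape wide data into long (tidy) form

# Hypothetical, untidy table: the "year" variable is spread across
# column headers instead of being stored in its own column.
scores <- tibble(
  student = c("A", "B"),
  `2022`  = c(81, 75),
  `2023`  = c(88, 79)
)

# Gather the year columns: the header names go into a new "year"
# column and the cell values go into a new "score" column.
tidy_scores <- pivot_longer(
  scores,
  cols      = c(`2022`, `2023`),
  names_to  = "year",
  values_to = "score"
)

print(tidy_scores)
```

The result has four rows (two students times two years) and three columns, `student`, `year`, and `score`, matching the one-variable-per-column layout.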

