The following is a list of statistical software.
Open-source
* ADaMSoft – a generalized statistical software with data mining algorithms and methods for data management
* ADMB – a software suite for non-linear statistical modeling based on C++ which uses automatic differentiation
* Chronux – for neurobiological time series data
* DAP – free replacement for SAS
* Environment for DeveLoping KDD-Applications Supported by Index-Structures (ELKI) – a software framework for developing data mining algorithms in Java
* Epi Info – statistical software for epidemiology developed by the Centers for Disease Control and Prevention (CDC). Apache 2 licensed
* Fityk – nonlinear regression software (GUI and command line)
* GNU Octave – programming language very similar to MATLAB with statistical features
* gretl – GNU Regression, Econometrics and Time-series Library
* intrinsic Noise Analyzer (iNA) – For analyzing intrinsic fluctuations in biochemical systems
* jamovi – A free software alternative to IBM SPSS Statistics
* JASP – A free software alternative to IBM SPSS Statistics with additional option for Bayesian methods
* JMulTi – For econometric analysis, specialised in univariate and multivariate time series analysis
* Just another Gibbs sampler (JAGS) – a program for analyzing Bayesian hierarchical models using Markov chain Monte Carlo, developed by Martyn Plummer. It is similar to WinBUGS
* KNIME – An open source analytics platform built with Java and Eclipse using modular data pipeline workflows
* LabPlot – A free and open-source, cross-platform computer program for interactive scientific plotting, curve fitting, nonlinear regression, data processing and data analysis
* LIBSVM – C++ support vector machine libraries
* mlpack – open-source library for machine learning that exploits C++ language features to provide maximum performance and flexibility while providing a simple and consistent application programming interface (API)
* Mondrian – data analysis tool using interactive statistical graphics with a link to R
* Neurophysiological Biomarker Toolbox – MATLAB toolbox for data-mining of neurophysiological biomarkers
* OpenBUGS
* OpenEpi – A web-based, open-source, operating-system-independent series of programs for use in epidemiology and statistics based on JavaScript and HTML
* OpenMx – A package for structural equation modeling running in R
* OpenNN – A software library written in the programming language C++ which implements neural networks, a main area of deep learning research
* Orange – a data mining, machine learning, and bioinformatics software
* Pandas – High-performance computing (HPC) data structures and data analysis tools for Python, written in Python and Cython (statsmodels, scikit-learn)
* Perl Data Language – Scientific computing with Perl
* Ploticus – software for generating a variety of graphs from raw data
* PSPP – A free software alternative to IBM SPSS Statistics
* R – free implementation of the S programming language
** Programming with Big Data in R (pbdR) – a series of R packages enhanced by SPMD parallelism for big data analysis
** R Commander – GUI interface for R
** Rattle GUI – GUI interface for R
** Revolution Analytics – production-grade software for enterprise big data analytics
** RStudio – GUI interface and development environment for R
* ROOT – an open-source C++ system for data storage, processing and analysis, developed by CERN and used to find the Higgs boson
* Salstat – menu-driven statistics software
* Scilab – uses GPL-compatible CeCILL license
* SciPy – Python library for scientific computing that contains the ''stats'' sub-package which is partly based on the venerable ''|STAT'' (a.k.a. ''PipeStat'', formerly ''UNIX|STAT'') software
** scikit-learn – extends SciPy with a host of machine learning models (classification, clustering, regression, etc.)
* Shogun (toolbox) – open-source, large-scale machine learning toolbox that provides several SVM (Support Vector Machine) implementations (like libSVM, SVMlight) under a common framework and interfaces to Octave, MATLAB, Python, R
* Simfit – simulation, curve fitting, statistics, and plotting
* SOCR
* SOFA Statistics – desktop GUI program focused on ease of use, learn as you go, and beautiful output
* Stan (software) – open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. It is somewhat like BUGS, but with a different language for expressing models and a different sampler for sampling from their posteriors
* Statistical Lab – R-based and focusing on educational purposes
* TOPCAT (software) – interactive graphical analysis and manipulation package for astronomers that understands FITS, VOTable and CDF formats.
* Torch (machine learning) – a deep learning software library written in Lua
* Weka (machine learning) – a suite of machine learning software written at the University of Waikato
Public domain
* CSPro (core is public domain but without publicly available source code; the web UI has been open sourced under Apache version 2 and the help system under GPL version 3)
* Dataplot (NIST)
* X-13ARIMA-SEATS (public domain in the United States only; outside of the United States is under US government copyright)
Freeware
* BV4.1
* GeoDA
* MINUIT
* WinBUGS – Bayesian analysis using Markov chain Monte Carlo methods
* Winpepi – package of statistical programs for epidemiologists
Proprietary
* Alteryx – analytics platform with drag and drop statistical models; R and Python integration
* Analytica – visual analytics and statistics package
* Angoss – products KnowledgeSEEKER and KnowledgeSTUDIO incorporate several data mining algorithms
* ASReml – for restricted maximum likelihood analyses
* BMDP – general statistics package
* DataGraph – online statistical software
* DB Lytix – 800+ in-database models
* EViews – for econometric analysis
* FAME (database) – a system for managing time-series databases
* GAUSS – programming language for statistics
* Genedata – software for integration and interpretation of experimental data in life science R&D
* GenStat – general statistics package
* GLIM – early package for fitting generalized linear models
* GraphPad InStat – very simple with much guidance and explanations
* GraphPad Prism – biostatistics and nonlinear regression with clear explanations
* Igor Pro – programming language with statistical features and numerical analysis
* IMSL Numerical Libraries – software library with statistical algorithms
* JMP – visual analysis and statistics package
* LIMDEP – comprehensive statistics and econometrics package
* LISREL – statistics package used in structural equation modeling
* Maple – programming language with statistical features
* Mathematica – a software package with statistical features
* MATLAB – programming language with statistical features
* MedCalc – for biomedical sciences
* Microfit – econometrics package, time series
* Minitab – general statistics package
* MLwiN – multilevel models (free to UK academics)
* Nacsport Video Analysis Software – software for analysing sports and obtaining statistical intelligence
* NAG Numerical Library – comprehensive math and statistics library
* NCSS – general statistics package
* Neural Designer – commercial deep learning package
* NLOGIT – comprehensive statistics and econometrics package
* nQuery Sample Size Software – Sample Size and Power Analysis Software
* O-Matrix – programming language
* OriginPro – statistics and graphing, programming access to NAG library
* PASS Sample Size Software (PASS) – power and sample size software from NCSS
* Plotly – plotting library and styling interface for analyzing data and creating browser-based graphs. Available for R, Python, MATLAB, Julia, and Perl
* Primer-E Primer – environmental and ecological specific
* PV-WAVE – programming language for comprehensive data analysis and visualization, with IMSL statistical package
* Qlucore Omics Explorer – interactive and visual data analysis software
* RapidMiner – machine learning toolbox
* Regression Analysis of Time Series (RATS) – comprehensive econometric analysis package
* S-PLUS – general statistics package
* SAS (software) – comprehensive statistical package
* SHAZAM (Econometrics and Statistics Software) – comprehensive econometrics and statistics package
* SigmaStat – package for group analysis
* SIMUL – econometric tool for multidimensional (multi-sectoral, multi-regional) modeling
* SmartPLS – statistics package used in partial least squares path modeling (PLS) and PLS-based structural equation modeling
* SOCR – online tools for teaching statistics and probability theory
* Speakeasy (computational environment) – numerical computational environment and programming language with many statistical and econometric analysis features
* SPSS Modeler – comprehensive data mining and text analytics workbench
* SPSS Statistics – comprehensive statistics package
* Stata – comprehensive statistics package
* StatCrunch – comprehensive statistics package, originally designed for college statistics courses
* Statgraphics – general statistics package
* Statistica – comprehensive statistics package
* StatsDirect – statistics package designed for biomedical, public health and general health science uses
* StatXact – package for exact nonparametric and parametric statistics
* SuperCROSS – comprehensive statistics package with ad-hoc, cross tabulation analysis
* Systat – general statistics package
* The Unscrambler – free-to-try commercial multivariate analysis software for Windows
* WarpPLS – statistics package used in structural equation modeling
* Wolfram Language – the computer language that evolved from the program Mathematica. It has statistical capabilities similar to those of Mathematica.
* World Programming System (WPS) – statistical package that supports the use of Python, R and SAS languages within a single user program.
* XploRe
Add-ons
* Analyse-it – add-on to Microsoft Excel for statistical analysis
* Statgraphics Sigma Express – add-on to Microsoft Excel for Six Sigma statistical analysis
* SUDAAN – add-on to SAS and SPSS for statistical surveys
* XLfit – add-on to Microsoft Excel for curve fitting and statistical analysis
See also
* Comparison of statistical packages
* Free statistical software
* List of computer algebra systems
* List of information graphics software
* List of numerical libraries
* List of numerical-analysis software
* Mathematical software
* Psychometric software