Brief history of free statistical software
SAS (software) was among the first commercial statistical packages, released for mainframes in 1968. SAS has since released versions that are free to use, the most recent of which is SAS Studio. Epi Info, a free-to-use program from the Centers for Disease Control and Prevention (CDC), was developed in the 1980s. One of the first statistical packages that was both completely free to use and open source was R, first released in 2000. Muenchen looks at trends in scholarly articles citing statistical software packages and shows that, while SPSS is clearly the leader, R has grown steadily and was, in 2016, the second most cited. He shows in a later review that, among the point-and-click packages for R, R Commander and Rattle have been the most frequently downloaded over the past several years, with jamovi gaining popularity in the most recent periods. Another article similarly found R moving up, in this case to third place, and also reported Epi Info consistently in the top 10 at three different time periods. Some of the free software packages come from governments, for example Epi Info from the CDC. Others come from smaller or independent organizations or from universities.
Reviews of free statistical software
There are a few reviews of free statistical software. Two appeared in journals (though not peer reviewed): one by Zhu and Kuljaca, and another by Grant that mainly gave a brief review of R. Zhu and Kuljaca outlined some useful characteristics of software, such as ease of use, the number of statistical procedures available, and the ability to develop new procedures. They reviewed several programs and identified which ones, at that time, had the most functionality; several of the programs may then not have had all of the capabilities desired for advanced statistics. Grant reviewed some of the programming features of R and briefly mentioned the availability of other programs. One other paper reviewed statistical packages, mainly commercial ones, but included R. One article reviewed EasyReg and included a discussion of its accuracy. Only two reviews have compared the output of various packages (Shackman, Gene. 2006. "Comparing free statistical software for data sets with no missing values" and "Comparing free statistical software, Handling missing data". Both available at "Free Software", http://gsociology.icaap.org/methods/soft.html). In the 2006 review, all of the packages read either CSV files or
Using free statistical software
Before using any statistical package, it is generally a good idea to have a solid background in statistics. The packages can then be used to best advantage, for example to choose the most appropriate test and to make sure all the necessary assumptions are met, so that appropriate conclusions can be drawn. Once the statistical issues are understood, the next step is to decide which package to use. Most of these packages are menu driven and can be learned in a couple of hours at most. The exceptions are R, which is generally code driven and takes much longer to learn, and to some extent CDC's Epi Info, which also takes some time to learn. Several of the packages have tutorials that give a basic introduction to the programs. For example, CDC has tutorials about Epi Info; the CDC page also lists a video slide show tutorial from the University of Nebraska, and another site has online training classes. R has a large number of tutorials and manuals, in English and other languages, and a FAQ site. PSPP has a particularly easy-to-follow tutorial and a rich set of statistical analyses, including t-tests, one-way and factorial ANOVA, linear and logistic regression, and principal components analysis. It can also import data easily from many other file formats. A few of the packages have email discussion lists, including R and PSPP. Most of the packages have online manuals, guides or help pages, which are useful when there are questions about specific procedures or statistical tests. Some manuals or guides are for R (R Development Core Team. An Introduction to R. Version 2.8.1 (2008-12-22). https://cran.r-project.org/doc/manuals/R-intro.html).
Menu driven packages
Many of the packages have an opening menu that is used to get or enter the data, manipulate the data, and select the statistical analysis. After the program starts, data can generally be loaded through this menu, either from previously saved data sets or by importing them from some other format. For example, if the data are in CSV form (text with commas between values), the program recognizes the format and creates a data set from the CSV file. Finally, the program can be used to do some analysis: in the analysis menu, the variables of interest are selected along with other options, and the analysis is then run to obtain the results.
Command driven packages
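In a command-driven package, the import, select and analyze steps that a menu-driven package offers are written as commands instead. As a minimal sketch only, in Python with the standard `csv` and `statistics` modules rather than any of the packages discussed here, and with a small hypothetical data set embedded as text in place of a real CSV file, the three steps might look like this:

```python
import csv
import io
import statistics

# Hypothetical data standing in for a CSV file; with a real file one would
# pass open("data.csv", newline="") (a hypothetical file name) instead.
CSV_TEXT = """name,age,height_cm
Ann,34,165
Bob,41,180
Cara,29,172
"""


def load_csv(text):
    """Import step: each CSV row becomes a dict keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def select_variable(rows, column):
    """Selection step: pull out one variable and convert it to numbers."""
    return [float(row[column]) for row in rows]


def describe(values):
    """Analysis step: count, mean and sample standard deviation."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


if __name__ == "__main__":
    rows = load_csv(CSV_TEXT)
    ages = select_variable(rows, "age")
    print(describe(ages))
```

Passing `"height_cm"` instead of `"age"` to `select_variable` corresponds to choosing a different variable from an analysis menu; the trade-off is that each step must be typed rather than picked from a list.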
R can be used in a menu-driven way (for example through R Commander), as a programming language, and interactively as an interpreter.
Getting data
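Packages differ in how they recognize missing values when data are imported. The sketch below is an illustration only, not how PSPP, MicrOsiris or any other package is implemented. It uses Python's standard `csv` module to read a small hypothetical data set modeled on the Sam and Sally example in this section (the third row and all values are invented), treats blank cells as missing by default, and treats a placeholder such as -9 as missing only when told to:

```python
import csv
import io
import statistics

# Hypothetical data modeled on the example in the text: Sam's age and
# Sally's born_in_usa value are blank. Raj and all values are invented.
BLANK_CSV = """name,age,born_in_usa
Sam,,yes
Sally,27,
Raj,45,no
"""

# The same hypothetical data after editing in -9 as a missing-value placeholder.
PLACEHOLDER_CSV = """name,age,born_in_usa
Sam,-9,yes
Sally,27,-9
Raj,45,no
"""


def read_with_missing(text, missing_codes=("",)):
    """Read CSV text, replacing any value found in missing_codes with None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {key: (None if value.strip() in missing_codes else value)
             for key, value in row.items()}
        )
    return rows


def mean_ignoring_missing(rows, column):
    """Average a numeric column, leaving out missing (None) values."""
    values = [float(row[column]) for row in rows if row[column] is not None]
    return statistics.mean(values)


if __name__ == "__main__":
    # Blanks are recognized as missing without any extra instruction.
    print(mean_ignoring_missing(read_with_missing(BLANK_CSV), "age"))
    # The program must be told that -9 means missing...
    coded = read_with_missing(PLACEHOLDER_CSV, missing_codes=("", "-9"))
    print(mean_ignoring_missing(coded, "age"))
    # ...otherwise -9 is treated as a real age and distorts the mean.
    naive = read_with_missing(PLACEHOLDER_CSV)
    print(mean_ignoring_missing(naive, "age"))
```

Run as a script, it prints 36.0, 36.0 and 21.0: the last figure shows what happens when a placeholder is read in without being declared as missing.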
Most packages are able to import data from Excel or CSV (text with commas separating values). One consideration is whether there are missing data. Some packages, like PSPP and MicrOsiris, can deal with missing data automatically. For example, suppose a data set records each person's name, age, and whether they were born in the USA, but Sam's age is blank, and so is Sally's entry for whether she was born in the USA. When packages like PSPP or MicrOsiris read in or import such a data set, they recognize that those values are missing and do their calculations accordingly. MicrOsiris automatically assigns 1.5 or 1.6 billion to blanks as missing values, and these values are excluded from analysis (Van Eck, Richard. MicrOsiris, Statistical and Data Management Software System. Version 9.1, 2006. Van Eck Computer Consulting. http://www.microsiris.com/MicrOsiris.htm). Other packages need a placeholder, such as -9, where there are missing data. Before such a package is used to read the data, the data set has to be edited to put in a placeholder wherever a value is missing; in the example above, Sam's blank age and Sally's blank entry would each be replaced with -9. If the data set includes -9, then when the data are read in, the program has to be told that -9 means missing data.
Limitations of packages
Most of the packages have limitations of some sort. Several of the programs, including EasyReg, EpiData and Instat, do not appear to handle missing data, or do not handle it well. While Epi Info has many statistical procedures, correlation is not one of them; instead, correlation is found through regression. This means that Epi Info will not produce a single table showing the correlations among multiple variables. According to the Zelig installation manual, using Zelig requires that R and several of its libraries already be installed, and the installation also requires some background in R. One limitation of MicrOsiris is in handling the output: when calculations are complete, the output pages through the results, but various menu boxes also appear over the results, so the results cannot be read on screen. The output can, however, be saved as a text file and then used. Another limitation is specific to programs developed by individuals: support for these programs is limited to the time the author has available. While authors may, and often do, respond fairly quickly when few people are asking questions, support will be slower if many people ask questions or the author is otherwise busy. R, by contrast, is written and used by a large number of people all over the world, and many forums and other internet resources can be used to get support from other users. While R is powerful, the learning curve can be rather steep for those not already familiar with other kinds of scientific programming (Gillian Raab, Susan Purdon, Kathy Buckner and Iona Waterston. The R Package. Napier University (Edinburgh) and the National Centre for Social Research (London). http://www2.napier.ac.uk/depts/fhls/peas/rpackage.asp).
See also
* List of statistical software
* ''Journal of Statistical Software''
References