Free/Open-source Statistical Software

If you are looking for a computer program that can help you get the results of standard statistical procedures and statistical significance tests without the need for low-level numerical programming, then a statistical package is what you need.

A statistical package is a suite of computer programs specialized for statistical analysis. Most statistical packages also provide facilities for data management. Statistical software is available both commercially and for free. However, I recommend only the free and open-source packages, not only because they come without a price tag, but also because they match the quality of most proprietary ones.

Here is a list of some of the best-known free and open-source statistical packages, as a guide for those who are looking for one:

HippoDraw is a powerful object-oriented statistical data analysis package written in C++, with user interaction via a Qt-based GUI and a Python scriptable interface. It is being developed by Paul Kunz at SLAC, primarily for the analysis and presentation of particle physics and astrophysics data, but it can be used equally well in other fields where data handling is important.

HippoDraw uses ROOT and implements many of the features also found in Java Analysis Studio. HippoDraw by default reads and writes files in an XML-based format, and can also read astrophysics FITS files and read data objects produced by ROOT.

HippoDraw can be used as a Python extension module, allowing users to use HippoDraw data objects with the full power of the Python language. This includes other scientific Python extension modules such as Numeric and numarray, whose use with HippoDraw can lead to a large increase in processing speed, even for ROOT objects.

gretl is an open-source software application for compiling and interpreting data, mainly for econometrics. Its name is an acronym for GNU Regression, Econometrics and Time-series Library. It has a graphical user interface and can be used together with X-12-ARIMA, TRAMO/SEATS, and R. It is written in C, uses GTK as the widget toolkit for its GUI, and uses gnuplot for generating graphs. As a complement to the GUI, it also has a command line interface.

gretl includes the possibility to output models as LaTeX files. Its own data format is XML, and it can also import Excel, Gnumeric, OpenDocument Spreadsheet, Stata, EViews, RATS 4, GNU Octave, Comma Separated Values, PcGive, JMulTi, and ASCII files. It can export to GNU Octave, GNU R, Comma Separated Values, JMulTi, and PcGive file formats.

Besides English, gretl is also available in Basque, German, French, Italian, Polish, Portuguese and Spanish.

OpenEpi is a free, web-based, open source, operating system-independent series of programs for use in epidemiology, biostatistics, public health, and medicine, providing a number of epidemiologic and statistical tools for summary data. OpenEpi was developed in JavaScript and hypertext markup language (HTML) and can be run in browsers supporting these languages, such as Microsoft Internet Explorer, Mozilla Firefox, Safari, and Opera, on a number of operating systems, such as Microsoft Windows, Macintosh, and Linux. The program can be run from the OpenEpi website or downloaded and run without a web connection. The source code and documentation are downloadable and freely available for use by other investigators.
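To give a feel for what a "tool for summary data" computes, here is a minimal Python sketch of an odds ratio with a Wald-type 95% confidence interval from the four cell counts of a 2x2 table. This is not OpenEpi's code (OpenEpi itself is written in JavaScript); the function name `odds_ratio_with_ci` and the counts are made up for illustration.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed cases,   b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    z: normal quantile (1.96 gives an approximate 95% interval).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR), computed from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lo, hi = odds_ratio_with_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Only the aggregated counts are needed, not the individual records, which is what makes this kind of calculation practical in a browser-based tool.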

Ploticus is a free, GPL-licensed, non-interactive software package for producing plots, charts, and graphics from data. It was developed in a Unix/C environment and runs on various Unix, Linux, and win32 systems. Ploticus is good for automated or just-in-time graph generation, handles date and time data nicely, and has basic statistical capabilities. It allows significant user control over colors, styles, options and details.

The R programming language, sometimes described as GNU S, is a programming language and software environment for statistical computing and graphics. It was originally created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is now developed by the R Development Core Team. R is considered by its developers to be an implementation of the S programming language, with semantics derived from Scheme. The name R comes partly from the first names of the two original authors, and partly as a word play on the name 'S'. The S language has become a de facto standard among statisticians for the development of statistical software.

R is widely used for statistical software development and data analysis. R's source code is freely available under the GNU General Public License, and pre-compiled binary versions are provided for Microsoft Windows, Mac OS X, and several Linux and other Unix-like operating systems. R uses a command line interface, though several graphical user interfaces are available.

Shogun is an open-source machine learning toolbox written in C++. It offers numerous algorithms and data structures for machine learning problems.

The focus of Shogun is on kernel machines such as support vector machines for regression and classification problems. Shogun also offers a full implementation of hidden Markov models. The core software offers interfaces for MATLAB, Octave, Python and R. Shogun has been under active development since 1999. Today there is a vibrant user community all over the world using Shogun as a base for research and education, and contributing to the core package.

As Shogun was developed with bioinformatics applications in mind, it is capable of processing huge datasets consisting of up to 10 million samples. Shogun supports the use of pre-calculated kernels. It is also possible to use a combined kernel, i.e., a kernel consisting of a linear combination of arbitrary kernels over different domains. The coefficients or weights of the linear combination can be learned as well. For this purpose, Shogun offers multiple kernel learning functionality.
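The idea of a combined kernel can be sketched in a few lines of plain Python. This is not Shogun's API: the function names, the `width` parameter, the sample points, and the weights below are all made up for illustration. The weights are fixed by hand here, whereas multiple kernel learning would estimate them from data. For simplicity, both sub-kernels operate on the same vectors rather than on different domains.

```python
import math


def linear_kernel(x, y):
    """Plain dot product of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def gaussian_kernel(x, y, width=1.0):
    """Gaussian kernel exp(-||x - y||^2 / width); width is an assumed parameter."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / width)


def combined_kernel(kernels, weights):
    """Return k(x, y) = sum_i weights[i] * kernels[i](x, y)."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")

    def k(x, y):
        return sum(w * kern(x, y) for w, kern in zip(weights, kernels))

    return k


def kernel_matrix(kernel, samples):
    """Pre-calculate the full matrix of kernel values for a list of samples."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Hypothetical data and hand-picked weights, for illustration only.
samples = [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]
k = combined_kernel([linear_kernel, gaussian_kernel], [0.3, 0.7])
K = kernel_matrix(k, samples)

for row in K:
    print(["%.4f" % value for value in row])
```

The matrix `K` is the kind of object a "pre-calculated kernel" refers to: the learning algorithm only needs these pairwise values, not the original samples.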

ROOT is an object-oriented software package developed by CERN. It was originally designed for particle physics data analysis and contains several features specific to this field, but it is also commonly used in other applications such as astronomy and data mining.

Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato. Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes.

JMulTi is open-source interactive software for econometric analysis, specialised in univariate and multivariate time series analysis. It has a Java graphical user interface.


  1. Thanks for posting about interesting and productive free software! I would also recommend Rapid Miner. Although it's mainly data mining software, you can use it for many statistical issues.

  2. Have you considered SOFA Statistics (released May 2009)? SOFA Statistics is released under the AGPL. The emphasis is on ease of use, learn as you go, and beautiful output. Packages are available for Windows or Ubuntu Linux but will work anywhere Python will. SOFA Statistics lets the user connect directly to MySQL, SQLite, MS Access, and MS SQL Server databases (no importing required) or import data from CSV or MS Excel. Disclosure - I am the lead developer.

  3. Hi, I am looking for an open source stats program for educational statistics. Do you have any recommendations?
    Thanks, Ann