Comparative Evaluation Of Some Statistical Software
S.l. dr. Adrian Voicu
Master M.T.S.S.C.C.
First year
Choosing a statistical package means weighing several factors against your own purposes. The most important ones to consider are the following:
Support
Generally, online support is provided by all commercial statistical packages.
For non-commercial packages, the user has access only to software manuals, tutorials and forums.
Discipline or Type of Analysis
The subject you are pursuing can affect the type of analysis you do. Disciplines often have their own specialized analyses, e.g. data mining, image analysis, geospatial systems, medical trials, agricultural design of experiments, split plot designs, survey design and analysis, secondary analysis of large sociological data sets, psychometric scales.
Platform
Most commercial packages have a version running under the Windows operating system. Most free statistical packages were created to run under Linux, but also have Windows versions.
Ease of Use
The trouble with statistical packages and quantitative analysis is that there can be a lot to learn. Fortunately, many statistical packages are much easier to learn and use than they used to be, and documentation is usually available online from the producer's website or as part of the help system. Most analyses can be done using the menu system, although more complex or rarer calculations may still have to be programmed.
Cost
If you have a larger budget then none of the other considerations are likely to be as important.
For instance, a perpetual single-user license for Stata/SE 14 (the edition for large datasets) costs $895 at educational pricing.
Criteria of evaluation
Because of cost considerations, I tested some free statistical packages from several points of view:
Size on disk
Portability
Graphic interface
Capability to import and export files shared with other software (mainly .xls and .csv files). For that, I used the same Excel file (Students) in every package.
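The import/export criterion above can be pictured with a minimal Python sketch: read a CSV table, compute a few descriptive statistics, and write the table back out. The contents of the "Students" file are not given in this paper, so the columns and values below are hypothetical stand-ins, and the helper names (read_rows, describe, export_rows) are illustrative only.

```python
import csv
import io
import statistics

# Hypothetical stand-in for the "Students" file; the real columns are not
# described in the text, so these names and values are assumptions.
SAMPLE_CSV = """name,age,grade
Ana,21,8.5
Bogdan,23,7.0
Carmen,22,9.5
Dan,24,6.0
"""

def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))

def describe(rows, column):
    """Return basic descriptive statistics for one numeric column."""
    values = [float(row[column]) for row in rows]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def export_rows(rows):
    """Write the rows back to CSV text (the export half of the round trip)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = read_rows(SAMPLE_CSV)
summary = describe(rows, "grade")
print(summary)
```

A package passes this kind of test when the re-imported export matches the original table, which is what read_rows(export_rows(rows)) == rows checks in the sketch.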
PAST
Past is free software for scientific data analysis, with functions for data manipulation, plotting, univariate and multivariate statistics, ecological analysis, time series and spatial analysis, morphometrics and stratigraphy.
Past works under most versions of Windows, and a beta version for Mac is also available.
Current version (October 2015): 3.09
Despite its small size (only 4.4 MB on disk), it has some obvious advantages:
It does not require installation;
The executable can be run from any location (USB memory, desktop, etc.);
It can easily import Excel or csv files;
It can save the results in *.dat, *.nex and *.xls formats;
It can plot the data in 2D and 3D graphics (figure nr. 1).
Fig. nr. 1 PAST graphic interface and plots
Excel files are very easy to import, and statistical functions are very easy to run (figure nr. 2).
Fig. nr. 2 Excel file imported in PAST
GNU PSPP
GNU PSPP is a program for statistical analysis of sampled data. It is a free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions.
The most important of these exceptions is that there are no “time bombs”: your copy of PSPP will not “expire” or deliberately stop working in the future. Nor are there any artificial limits on the number of cases or variables that you can use.
PSPP is a stable and reliable application. It can perform descriptive statistics, T-tests, ANOVA, linear and logistic regression, measures of association, cluster analysis, reliability and factor analysis, non-parametric tests and more. Its backend is designed to perform its analyses as fast as possible, regardless of the size of the input data. It can be used with its graphical interface or the more traditional syntax commands.
PSPP is particularly aimed at statisticians, social scientists and students requiring fast convenient analysis of sampled data.
A brief list of some of PSPP's features follows. PSPP has:
Support for over 1 billion cases.
Support for over 1 billion variables.
Syntax and data files which are compatible with those of SPSS.
A choice of terminal or graphical user interface.
A choice of text, postscript, pdf, OpenDocument or html output formats.
Inter-operability with Gnumeric, LibreOffice, OpenOffice.Org and other free software.
Easy data import from spreadsheets, text files and database sources.
The capability to open, analyze and edit two or more datasets concurrently. They can also be merged, joined or concatenated.
A user interface supporting all common character sets and which has been translated to multiple languages.
Fast statistical procedures, even on very large data sets.
No license fees.
No expiration period.
No unethical “end user license agreements”.
A fully indexed user manual.
Portability: it runs on many different computers and many different operating systems.
PSPP syntax is similar to that of SPSS (figure nr. 3).
Fig. nr.3 PSPP syntax
Excel files are very easy to import (figure nr. 4).
Fig. nr.4 Excel file imported in PSPP
Epi Info
Physicians, nurses, epidemiologists, and other public health workers lacking a background in information technology often need simple tools that allow the rapid creation of data collection instruments, as well as data analysis, visualization, and reporting using epidemiologic methods. Epi Info is widely used by these groups because it delivers epidemiologic functionality without the complexity or expense of large enterprise applications.
Epi Info is easily used in places with limited network connectivity or limited resources for commercial software and professional IT support. Epi Info™ is flexible, scalable, and free while enabling data collection, advanced statistical analyses, and geographic information system (GIS) mapping capability.
The Analysis module is used to read and analyze data entered with the Enter module or data imported from 24 different data formats. Epidemiologic statistics, tables, graphs, and maps are produced with simple commands such as READ, FREQ, LIST, TABLES, GRAPH, and MAP. As each command is run, it is saved to the program editor where it can be customized and saved, shared, and used in the future as data are revised.
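To make concrete what frequency and cross-tabulation commands of this kind compute, here is a minimal Python sketch. It is not Epi Info syntax and does not reproduce its output; the records, field names and helper functions (frequencies, cross_table) are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical records; the field names are illustrative and do not come
# from the file tested in this paper.
records = [
    {"sex": "F", "ill": "yes"},
    {"sex": "M", "ill": "no"},
    {"sex": "F", "ill": "no"},
    {"sex": "F", "ill": "yes"},
    {"sex": "M", "ill": "yes"},
]

def frequencies(rows, field):
    """Count each value of one field and attach its percentage of the total."""
    counts = Counter(row[field] for row in rows)
    total = sum(counts.values())
    return {value: (n, 100.0 * n / total) for value, n in counts.items()}

def cross_table(rows, row_field, col_field):
    """Count records for every (row value, column value) pair."""
    return Counter((row[row_field], row[col_field]) for row in rows)

freq = frequencies(records, "sex")
table = cross_table(records, "sex", "ill")

# Print a one-way frequency listing, sorted by value.
for value, (n, pct) in sorted(freq.items()):
    print(f"{value}: {n} ({pct:.1f}%)")
```

The one-way count corresponds roughly to a frequency listing, and the pair count to a two-way table; a real epidemiologic package adds confidence limits and tests on top of these counts.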
The Excel file used for the test was easy to import and manipulate, as can be seen in figures nr. 5, 6 and 7:
Fig. nr. 5 Excel file imported in Epi Info
Fig. nr. 6 Descriptive statistics in Epi Info
Fig. nr. 7 Chart in Epi Info
MATLAB “clone” - SCILAB
Scilab is free and open source software for numerical computation providing a powerful computing environment for engineering and scientific applications.
Scilab is released as open source under the CeCILL license (GPL compatible), and is available for download free of charge. Scilab is available under GNU/Linux, Mac OS X and Windows XP/Vista/7/8.
Scilab includes hundreds of mathematical functions. It has a high level programming language allowing access to advanced data structures, 2-D and 3-D graphical functions.
Statistics
Scilab provides tools to perform data analysis and modeling:
Descriptive statistics
Probability distributions
Linear and nonlinear modeling
The Excel file “Students” was somewhat difficult to import into Scilab, but not impossible (figure nr. 8).
Fig. nr. 8 Excel file imported in Scilab
Statistics are very easy to perform in Scilab. The language is similar to those used by other statistical software (figure nr. 9).
Fig. nr. 9 Statistics in Scilab
Fig. nr. 10 Chart in Scilab
Workflows – RapidMiner, Orange, Knime
Workflow-based software (figure nr. 11) is usually not outstanding in statistics, but it has other obvious advantages: once created, a workflow can be reused with other data sets, and imported data can be used for several processing phases.
Fig. nr. 11 Workflow for reading an Excel file in RapidMiner
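The reuse advantage of workflows can be sketched in a few lines of Python: a chain of processing steps is built once and then applied to different data sets. This is only an analogy for the graphical node-and-connection model, not the API of RapidMiner, Orange or Knime; make_workflow and the two step functions are hypothetical names.

```python
import statistics

def make_workflow(*steps):
    """Chain processing steps into one reusable callable."""
    def run(data):
        # Feed the output of each step into the next one, in order.
        for step in steps:
            data = step(data)
        return data
    return run

# Hypothetical steps standing in for workflow nodes.
def drop_missing(values):
    """Remove missing entries (represented here as None)."""
    return [v for v in values if v is not None]

def summarize(values):
    """Reduce a list of numbers to a small summary."""
    return {"n": len(values), "mean": statistics.mean(values)}

workflow = make_workflow(drop_missing, summarize)

# The same workflow object is applied to two different data sets.
first = workflow([4, 8, None, 6])
second = workflow([10, None, 20])
print(first, second)
```

Because the workflow is an ordinary value, swapping the input data or inserting another step does not require rebuilding the rest of the chain, which is the property the graphical tools offer.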
Excel files are easy to read, and basic statistics are very easy to perform (figure nr. 12).
Fig. nr. 12 Excel file imported in RapidMiner
Orange is very similar to RapidMiner. Excel files can be imported, and statistics and charts obtained, very easily.
Fig. nr. 13 Excel file imported in Orange
Fig. nr. 14 Chart in Orange
Knime is, by far, the best workflow-based software. It has I/O nodes for the most important file types, a large number of statistics tools, and nodes for data mining, predictors, classification algorithms, etc. (it includes R and Weka).
Fig. nr. 15 Descriptive statistics in Knime
R
R is the best one to choose. It is an open source package, available for Windows, Unix and Macintosh platforms, and widely used throughout the academic community. Some developers have even produced menu systems to make it easier to use.
Two useful graphical interfaces are worth underlining: Rcmdr and JGR with Deducer.
