Read Stata 13 file in R - r

Is there a way to read a Stata version 13 dataset file in R?
I have tried to do the following:
> library(foreign)
> data = read.dta("TEAdataSTATA.dta")
However, I got an error:
Error in read.dta("TEAdataSTATA.dta") :
not a Stata version 5-12 .dta file
Could someone point out if there is a way to fix this?

There is a new package to import Stata 13 files into a data.frame in R.
Install the package and read a Stata 13 dataset with read.dta13():
install.packages("readstata13")
library(readstata13)
dat <- read.dta13("TEAdataSTATA.dta")
Update: as of version 0.8, readstata13 also imports files from Stata 6 to 14.
More about the package: https://github.com/sjewo/readstata13

There's a new package called haven, by Hadley Wickham, which can load Stata 13 dta files (as well as SAS and SPSS files).
library(haven) # haven package now available on cran
df <- read_dta('c:/somefile.dta')
See: https://github.com/hadley/haven

If you have Stata 13, you can load the file there and save it in Stata 12 format using the command saveold (see help saveold). Afterwards, take it to R.
If you have Stata 10 to 12, you can use the user-written command use13 (by Sergiy Radyakin) to load the file and save it there, then take it to R. You can install use13 by running ssc install use13.
Details can be found at http://radyakin.org/transfer/use13/use13.htm
Other alternatives, still with Stata, involve exporting the Stata format to something else that R will read, e.g. text-based files. See help export within Stata.
Update
Starting with Stata 14, saveold has a version() option, allowing one to save in Stata .dta formats as old as Stata 11.

In the meantime, the savespss command has been added to the SSC archive and can be installed in Stata with: findit savespss
The homepage http://www.radyakin.org/transfer/savespss/savespss.htm continues to work, but the program should be installed from the SSC now, not from the beta location.

I am not familiar with the current state of R programs regarding their ability to read other file formats, but if someone doesn't have Stata installed on their computer and R cannot read a specific version of Stata's dta files, pandas in Python can now do the vast majority of such conversions.
Basically, the data from the dta file are first loaded using the pandas.read_stata function. As of version 0.23.0, the supported encoding and formats can be found in a related answer of mine.
Then one can either save the data as a csv file and import it using standard R functions, or instead use the pandas.DataFrame.to_feather function, which exports the data using a serialization format built on Apache Arrow. The latter has extensive support in R, as it was conceived to promote interoperability with pandas.
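The pandas route described above can be sketched roughly as follows. This is only a minimal sketch, assuming pandas is installed; the helper name dta_to_csv and the file names are made up for illustration and are not part of pandas.

```python
import pandas as pd


def dta_to_csv(dta_path, csv_path):
    """Read a Stata .dta file with pandas and write it out as a CSV file.

    dta_to_csv is a hypothetical helper name used for this sketch.
    """
    # read_stata works out the .dta format version from the file itself.
    df = pd.read_stata(dta_path)
    # index=False keeps the pandas row index out of the CSV,
    # so R does not see an extra unnamed column.
    df.to_csv(csv_path, index=False)
    return df
```

One would then call something like dta_to_csv("TEAdataSTATA.dta", "TEAdataSTATA.csv") in Python and load the result in R with read.csv("TEAdataSTATA.csv"). Replacing the to_csv call with df.to_feather(...) should give the Feather variant, though that additionally assumes the pyarrow package is available.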

I had the same problem. Tried read.dta13, read.dta but nothing worked. Then tried the easiest and least expected: MS Excel! It opened marvelously. I saved it as a .csv and used in R!!! Hope this helps!!!!

Related

How to write an effective loop to access datasets inside 1000's of h5 files in R [duplicate]

I have a file in hdf5 format. I know that it is supposed to be a matrix, but I want to read that matrix in R so that I can study it. I see that there is an h5r package that is supposed to help with this, but I do not see any simple tutorial that is easy to read and understand. Is such a tutorial available online? Specifically, how do you read an hdf5 object with this package, and how do you actually extract the matrix?
UPDATE
I found the rhdf5 package, which is not part of CRAN but is part of Bioconductor. The interface is relatively easy to understand, and the documentation and example code are quite clear. I could use it without problems. My problem, it seems, was the input file. The matrix that I wanted to read was actually stored in the hdf5 file as a Python pickle, so every time I tried to open it and access it through R I got a segmentation fault. I did figure out how to save the matrix from within Python as a tsv file, and now that problem is solved.
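For anyone in the same situation, saving a matrix from Python as a tsv file might look like the sketch below. It uses only the standard library; the function name save_matrix_as_tsv is hypothetical, and it assumes the matrix is already available in Python as a sequence of rows (how it was unpickled is not shown in the post).

```python
import csv


def save_matrix_as_tsv(matrix, path):
    """Write a 2-D sequence of values to a tab-separated file, one row per line.

    save_matrix_as_tsv is a hypothetical helper name used for this sketch.
    """
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Each inner sequence becomes one tab-separated line.
        writer.writerows(matrix)
```

On the R side, a file written this way (for example "matrix.tsv", a made-up name) can be read with as.matrix(read.delim("matrix.tsv", header = FALSE)).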
The rhdf5 package works really well, although it is not on CRAN. Install it from Bioconductor:
# as of 2020-09-08, these are the updated instructions per
# https://bioconductor.org/install/
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "3.11")
# install the rhdf5 package itself
BiocManager::install("rhdf5")
And to use it:
library(rhdf5)
List the objects within the file to find the data group you want to read:
h5ls("path/to/file.h5")
Read the HDF5 data:
mydata <- h5read("path/to/file.h5", "/mygroup/mydata")
And inspect the structure:
str(mydata)
(Note that multidimensional arrays may appear transposed.) You can also read groups, which will be named lists in R.
You could also use h5, a package which I recently published on CRAN.
Compared to rhdf5 it has the following features:
S4 object model to directly interact with HDF5 objects like files, groups, datasets and attributes.
Simpler syntax, implemented R-like subsetting operators for datasets supporting commands like
readdata <- dataset[1:3, 1:3]
dataset[1:3, 1:3] <- matrix(1:9, nrow = 3)
Supported NA values for all data types
200+ Test cases with a code coverage of 80%+.
To save a matrix you could use:
library(h5)
testmat <- matrix(rnorm(120), ncol = 3)
# Create HDF5 File
file <- h5file("test.h5")
# Save matrix to file in group 'testgroup' and datasetname 'testmat'
file["testgroup", "testmat"] <- testmat
# Close file
h5close(file)
... and read the entire matrix back into R:
file <- h5file("test.h5")
testmat_in <- file["testgroup", "testmat"][]
h5close(file)
See also h5 on
CRAN: http://cran.r-project.org/web/packages/h5/index.html
Github: https://github.com/mannau/h5
I used the rgdal package to read HDF5 files. Be aware that the binary version of rgdal probably does not support HDF5. In that case, you need to build gdal from source with HDF5 support before building rgdal from source.
Alternatively, try and convert the files from hdf5 to netcdf. Once they are in netcdf, you can use the excellent ncdf package to access the data. The conversion I think could be done with the cdo tool.
The ncdf4 package, an interface to netCDF-4, can also be used to read hdf5 files (netCDF-4 is compatible with netCDF-3, but it uses hdf5 as the storage layer).
In the developer's words:
the HDF group says:
NetCDF-4 combines the netCDF-3 and HDF5 data models, taking the desirable characteristics of each, while taking advantage of their separate strengths
Unidata says:
The netCDF-4 format implements and expands the netCDF-3 data model by using an enhanced version of HDF5 as the storage layer.
In practice, ncdf4 provides a simple interface, and migrating code from using older hdf5 and ncdf packages to a single ncdf4 package has made our code less buggy and easier to write (some of my trials and workarounds are documented in my previous answer).

Downloading CSV_GDX_tools.exe package

I have to work with GAMS and R to extract data. However, I am a new R user and have never used GAMS before. I need to download a package called CSV_GDX_tools.exe and I have no idea what that is...
When I try to install it in R, I get this error message:
Warning in install.packages :
package ‘CSV_GDX_tools.exe’ is not available (for R version 3.3.2)
Can anyone please tell me how and where I can download the package?
First, that does not sound like an R-package, but rather like an outdated utility program for GAMS.
I say outdated because GAMS now has built-in functionality to convert GDX (GAMS data files) to CSV files, which can be read by any statistics program, including R.
GAMS also gives you the option of exporting your data to an SQLite database file (.db), which can be read by R.
Have a look here:
https://www.gams.com/help/index.jsp?topic=%2Fgams.doc%2Fuserguides%2Fmccarl%2Fgdx_utilities.htm

Using the "foreign" package in R

I need to import a STATA data set into R and I have downloaded the "foreign" package. Could you please tell me the steps to "load" the package into R and the steps to import the STATA dataset?
R helplist style answer: RTFM!
Statalist style answer: save your Stata file as usual. In R, type
help(package="foreign")
to find out what the commands are. The ones pertaining to Stata have .dta in them, as .dta is the Stata data file extension. read.dta(file="path/name.dta") should work on most occasions. If it does not, try saving your file from Stata as an old version (saveold filename.dta, replace).
BTW, it is Stata, not STATA. It's not an acronym, unlike SAS or SPSS... so you don't have to YELL.
P.S. As DWin correctly pointed out, you need to load the package:
library(foreign)
I assumed that since you seem to know R, remembering that won't be an issue.
It rather depends what you mean by "downloaded". You should not need to download anything, since 'foreign' is included in the standard R installation along with 'base', 'stats', 'utils', 'Matrix', and a few others like 'grDevices'. Whether or not you have already installed the 'foreign' package (unnecessarily) using one of the GUI commands, all you should need to do is:
library(foreign)
?read.dta # and run the example
I just had to deal with the same issue, so here is the code:
library(foreign)
setwd("path/to/your/working/directory")
Please note that you have to set the working directory so that R knows where to look for your Stata dta dataset.
And finally, read the file:
mydata <- read.dta("name_of_the_dataset.dta")
A video for that topic:
https://www.youtube.com/watch?v=tCkCz4cu918

communicating with SAS datasets from R

I have a bunch of datasets that are in SAS format. I would like to avoid using SAS since I think R provides more than enough functionality for me. Therefore, is there a package that would allow me to interact with the SAS datasets from R? I have the SAS software installed but I would like to avoid coding things in multiple languages.
Since you have SAS, you can use Frank Harrell's 'Hmisc' package, which has the sas.get and sasxport.get functions. It also has a bunch of utility functions: label, sas.get, contents, describe. For those without a SAS license, package 'foreign' has read.ssd, lookup.xport, and read.xport.
EDIT1: I will also mention that Anthony Joseph Damico recently announced a package to parse SAS INPUT code into read.fwf code. From its description file: "Using importation code designed for SAS users to read ASCII files into sas7bdat files, the SAScii package parses through the INPUT block of a (.sas) syntax file to design the parameters needed for a read.fwf() function call."
EDIT2: There is also a package by Matt Shotwell called 'sas7bdat' with read.sas7bdat(file) that describes its function as "Read SAS files in the sas7bdat data format."
More recently, the haven package can read and write sas7bdat and SAS xpt files. This package is consistent with other import/export packages in the tidyverse.
There is also a package called libr that simulates a SAS libname() function almost exactly. This package is part of a system of packages called sassy that recreates many basic SAS concepts in R.
