communicating with SAS datasets from R - r

I have a bunch of datasets that are in SAS format. I would like to avoid using SAS since I think R provides more than enough functionality for me. Therefore, is there a package that would allow me to interact with the SAS datasets from R? I have the SAS software installed but I would like to avoid coding things in multiple languages.

Since you have SAS, you can use Frank Harrell's 'Hmisc' package which has sas.get and sasxport.get functions. It also has a bunch of utility functions: label,sas.get, contents,describe. For those without a SAS license, package 'foreign' has read.ssd, lookup.xport, and read.xport.
EDIT1: I will also mention that Anthony Joseph Damico recently announced a package to parse SAS INPUT code into read.fwf code. From its description file: " Using importation code designed for SAS users to read ASCII files into sas7bdat files, the SAScii package parses through the INPUT block of a (.sas) syntax file to design the parameters needed for a read.fwf() function call."
EDIT2: There is also a package by Matt Shotwell called 'sas7bdat' with read.sas7bdat(file) that describes its function as " Read SAS files in the sas7bdat data format."

More recently, the haven package can read and write sas7bdat and SAS xpt files. This package is consistent with other import/export packages in the tidyverse.
There is also a package called libr that simulates a SAS libname() function almost exactly. This package is part of a system of packages called sassy that recreates many basic SAS concepts in R.

Related

Convert Stata 13 .dta file to CSV without using stata [duplicate]

Is there a way to read a Stata version 13 dataset file in R?
I have tried to do the following:
> library(foreign)
> data = read.dta("TEAdataSTATA.dta")
However, I got an error:
Error in read.dta("TEAdataSTATA.dta") :
not a Stata version 5-12 .dta file
Could someone point out if there is a way to fix this?
There is a new package to import Stata 13 files into a data.frame in R.
Install the package and read a Stata 13 dataset with read.dta13():
install.packages("readstata13")
library(readstata13)
dat <- read.dta13("TEAdataSTATA.dta")
Update: readstata13 imports in version 0.8 also files from Stata 6 to 14
More about the package: https://github.com/sjewo/readstata13
There's a new package called Haven, by Hadley Wickham, which can load Stata 13 dta files (as well as SAS and SPSS files)
library(haven) # haven package now available on cran
df <- read_dta('c:/somefile.dta')
See: https://github.com/hadley/haven
If you have Stata 13, then you can load it there and save it as a Stata 12 format using the command saveold (see help saveold). Afterwards, take it to R.
If you have, Stata 10 - 12, you can use the user-written command use13, (by Sergiy Radyakin) to load it and save it there; then to R. You can install use13 running ssc install use13.
Details can be found at http://radyakin.org/transfer/use13/use13.htm
Other alternatives, still with Stata, involve exporting the Stata format to something else that R will read, e.g. text-based files. See help export within Stata.
Update
Starting Stata 14, saveold has a version() option, allowing one to save in Stata .dta formats as old as Stata 11.
In the meanwhile savespss command became a member of the SSC archive and can be installed to Stata with: findit savespss
The homepage http://www.radyakin.org/transfer/savespss/savespss.htm continues to work, but the program should be installed from the SSC now, not from the beta location.
I am not familiar with the current state of R programs regarding their ability
to read other file formats, but if someone doesn't have Stata installed on their computer and R cannot read a specific version of Stata's dta files, Pandas in Python can now do the vast majority of such conversions.
Basically, the data from the dta file are first loaded using the pandas.read_stata function. As of version 0.23.0, the supported encoding and formats can be found in a related answer of mine.
Then one can either save the data as a csv file and import them
using standard R functions, or instead use the pandas.DataFrame.to_feather function, which exports the data using a serialization format built on Apache Arrow. The latter has extensive support in R as it was conceived to promote interoperability with Pandas.
I had the same problem. Tried read.dta13, read.dta but nothing worked. Then tried the easiest and least expected: MS Excel! It opened marvelously. I saved it as a .csv and used in R!!! Hope this helps!!!!

Read Stata 13 file in R

Is there a way to read a Stata version 13 dataset file in R?
I have tried to do the following:
> library(foreign)
> data = read.dta("TEAdataSTATA.dta")
However, I got an error:
Error in read.dta("TEAdataSTATA.dta") :
not a Stata version 5-12 .dta file
Could someone point out if there is a way to fix this?
There is a new package to import Stata 13 files into a data.frame in R.
Install the package and read a Stata 13 dataset with read.dta13():
install.packages("readstata13")
library(readstata13)
dat <- read.dta13("TEAdataSTATA.dta")
Update: readstata13 imports in version 0.8 also files from Stata 6 to 14
More about the package: https://github.com/sjewo/readstata13
There's a new package called Haven, by Hadley Wickham, which can load Stata 13 dta files (as well as SAS and SPSS files)
library(haven) # haven package now available on cran
df <- read_dta('c:/somefile.dta')
See: https://github.com/hadley/haven
If you have Stata 13, then you can load it there and save it as a Stata 12 format using the command saveold (see help saveold). Afterwards, take it to R.
If you have, Stata 10 - 12, you can use the user-written command use13, (by Sergiy Radyakin) to load it and save it there; then to R. You can install use13 running ssc install use13.
Details can be found at http://radyakin.org/transfer/use13/use13.htm
Other alternatives, still with Stata, involve exporting the Stata format to something else that R will read, e.g. text-based files. See help export within Stata.
Update
Starting Stata 14, saveold has a version() option, allowing one to save in Stata .dta formats as old as Stata 11.
In the meanwhile savespss command became a member of the SSC archive and can be installed to Stata with: findit savespss
The homepage http://www.radyakin.org/transfer/savespss/savespss.htm continues to work, but the program should be installed from the SSC now, not from the beta location.
I am not familiar with the current state of R programs regarding their ability
to read other file formats, but if someone doesn't have Stata installed on their computer and R cannot read a specific version of Stata's dta files, Pandas in Python can now do the vast majority of such conversions.
Basically, the data from the dta file are first loaded using the pandas.read_stata function. As of version 0.23.0, the supported encoding and formats can be found in a related answer of mine.
Then one can either save the data as a csv file and import them
using standard R functions, or instead use the pandas.DataFrame.to_feather function, which exports the data using a serialization format built on Apache Arrow. The latter has extensive support in R as it was conceived to promote interoperability with Pandas.
I had the same problem. Tried read.dta13, read.dta but nothing worked. Then tried the easiest and least expected: MS Excel! It opened marvelously. I saved it as a .csv and used in R!!! Hope this helps!!!!

How can i specify R code in the data files in R package

In my R package, I want R to load some data from an external server to main memory. Looking around i found the option of plain R files in the data file section of a package - http://cran.r-project.org/doc/manuals/R-exts.html#Data-in-packages.
But then i couldnt find literature on how the R code will work in the data file section of a package nor how to invoke it.
It would be useful if someone can point me to a sample package too.

What methods exist for distributing a semi-live dataset with an R package?

I am building a package for internal use using devtools. I would like to have the package load in data from a file/connection (that differs depending on the date package is built). The data is large-ish so having a onetime cost of parsing and loading the data during package building is preferable.
Currently, I have a data.R file under R/ that assigns the data to package-level variables, the values are assigned during package installation (or at least that's what appears to be happening). This less than ideal setup mostly works. In order to get all instances of the package to have the same data I have to distribute the data file with the package (currently it's being copied to inst/ by a helper script before building the package) instead of just having it all be packaged together. There must be a better way.
Such as:
Generate .rda files during package building (but this requires not running the same code during package install)
I can do this with a Makefile but that seems like overkill
Can I have R code that is only run during package building and not during install?
Run R code in data/
But the data is munged using code in the package in question. I can fix that with Collate (I think) but then I have to maintain the order of all of the .R files (but with that added complexity I might as well use a Makefile?)
Build two packages, one with all of the code I want, one with the data.
Obvious, clever things I've not thought of.
tl;dr: What are some methods for adding a snapshot of dynamically changing data to an R package frozen for deployment?
As #BenBolker points out in the comments above, splitting the dataset out into a different package has precedent in the community (most notably the core package datasets) and has additional benefits.
The separation of functions from data also makes working on historic versions of the data easier to do with the up to date functions.
I currently have an tools-to-munge package and a things-to-munge package. Using a helper script I can build the tools-to-munge and setup a Suggests (or Depends) in the DESCRIPTION of both packages to point to the appropriate incrementing version of the packages. After the new tools-to-munge package has been built I can build the things-to-munge package as necessary using the functions in the tools-to-munge package.

Using the "foreign" package in R

I need to import a STATA data set into R and I have downloaded the "foreign" package. Could you please tell me the steps to "load" the package into R and the steps to import the STATA dataset?
R helplist style answer: RTFM!
Statalist style answer: save your Stata file as usual. In R, type
help(package="foreign")
to find out what the commands are. The ones pertaining to Stata would have .dta in them, as .dta is Stata data file extension. read.dta(file="path/name.dta") should work on most occasions. If it does not, try saving your file from Stata as an old version (saveold filename.dta, replace).
BTW, it is Stata, not STATA. It's not an acronym, unlike SAS or SPSS... so you don't have to YELL.
P.S. As DWin correctly pointed out, you need to load the package:
library(foreign)
I assumed that since you seem to know R, remembering that won't be an issue.
It rather depends what you mean by "downloaded". You should not need to download anything, since 'foreign' is included in the standard R installation along with 'base', 'stats', 'utils', 'Matrix', and a few others like 'grDevices'. Whether or not you have already installed the 'foreign' package (unnecessarily) using one of the GUI commands, all you should need to do is:
library(foreign)
?read.dta # and run the example
I just had to deal with the same issue therefore the code:
library(foreign)
setwd(your working directory)
Please note that you have to set the working directory so that R knows where to look for your Stata dta dataset
And last the code:
read.dta("name of the dataset .dta")
A video for that topic:
https://www.youtube.com/watch?v=tCkCz4cu918

Resources