Using read.ssd to convert SAS data set into an R data.frame - r

I wish to read data into R from SAS data sets in Windows. The read.ssd function allows me to do so, however, it seems to have an issue when I try to import a SAS data set that has any non-alphabetic symbols in its name. For example, I can import table.sas7bdat using the following:
directory <- "C:/sas data sets"
sashome <- "/Program Files/SAS/SAS 9.1"
table.df <- read.ssd(directory, "table", sascmd = file.path(sashome, "sas.exe"))
but I can't do the same for a table SAS data set named table1.sas7bdat. It returns an error:
Error in file.symlink(oldPath, linkPath) :
symbolic links are not supported on this version of Windows
Given that I do not have the option to rename these data sets, is there a way to read a SAS data set that has non-alphabetic symbols in its name in to R?

Looking about, it looks like others have your problem as well. Perhaps it's just a bug.
Anyway, try the suggestion from this (old) R help post, posted by the venerable Dan Nordlund who's pretty good at this stuff - and also active on SASL (sasl#listserv.uga.edu) if you want to try cross-posting your question there.
https://stat.ethz.ch/pipermail/r-help/2008-December/181616.html
Also, you might consider the transport method if you don't mind 8 character long variable names.

Use:
directory <- "C:/sas data sets"
sashome <- "/Program Files/SAS/SAS 9.1"
table.df <- read.ssd(library=directory, mem="table1", formats=F,
sasprog=file.path(sashome, "sas.exe"))

Related

Data Imported from Excel to R not Assigning Headers as Column Names

I'm brand new to R and am having difficulty with something very basic. I'm importing data from an excel file like this:
data1 <- read.csv(file.choose(), header=TRUE)
When I try to look at the data in the table by column, R doesn't recognize the column headers as objects. This is what it looks like
summary(Square.Feet)
Error in summary(Square.Feet) : object 'Square.Feet' not found
I need to run a regression and I'm having the same problem. Any help would be much appreciated.
Yes it recognizes, you have to tell R to select the dataframe so:
summary(data1$Square.Feet)
Where "data" is the name of your dataframe, and after the dollar goes the name of the variable
Hope it helps
UPDATE
As suggested below, you can use the following:
data1 <- read.csv(file.choose(), header=TRUE)
attach(data1)
This way, by doing "attach", you avoid to write everytime the name of the dataset, so we would go from
summary(data1$Square.Feet)
To this point after attaching the data:
summary(Square.Feet)
However I DO NOT recommend to do it, because if you load other datasets you may mess everything as it's quite common that variables have the same names, among other major problems, see here (Thanks Ben Bolker for your contribution): here , here, here and
here
if you want a summary of all data fields, then
summary(data1)
or you can use the 'with' helper function
with(data1, summary(Square.Feet))

How can I import data into R that is meant for use in SAS, SPSS, or STATA?

I am attempting to read data from the National Health Interview Survey in R: http://www.cdc.gov/nchs/nhis/nhis_2011_data_release.htm . The data is Sample Adult. The SAScii library actually has a function read.SAScii whose documentation has an example for the same data set I would like to use. The issue is it "doesn't work":
NHIS.11.samadult.SAS.read.in.instructions <-
"ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Program_Code/NHIS/2011/SAMADULT.sas"
NHIS.11.samadult.file.location <-
"ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHIS/2011/samadult.zip"
#store the NHIS file as an R data frame!
NHIS.11.samadult.df <-
read.SAScii (
NHIS.11.samadult.file.location ,
NHIS.11.samadult.SAS.read.in.instructions ,
zipped = T, )
#or store the NHIS SAS import instructions for use in a
#read.fwf function call outside of the read.SAScii function
NHIS.11.samadult.sas <- parse.SAScii( NHIS.11.samadult.SAS.read.in.instructions )
#save the data frame now for instantaneous loading later
save( NHIS.11.samadult.df , file = "NHIS.11.samadult.data.rda" )
However, when running it I get the error Error in toupper(SASinput) : invalid multibyte string 533.
Others on Stack Overflow with a similar error, but for functions such as read.delim and read.csv, have recommended to try changing the argument to fileEncoding="latin1" for example. The problem with read.SAScii is it has no such parameter fileEncoding.
See:
R: invalid multibyte string and Invalid multibyte string in read.csv
Just in case anyone has a similar problem, the issue and solution for me was to run options( encoding = "windows-1252" ) right before running the above code for read.SAScii since the ASCII file is meant for use in SAS and therefore on Windows. And I am using Linux.
The author of the SAScii library actually has another Github repository asdfree where he has working code for downloading CDC-NHIS datasets for all available years as well as as many other datasets from various surveys such as the American Housing Survey, FDA Drug Surveys, and many more.
The following links to the author's solution to the issue in this question. From there, you can easily find a link to the asdfree repository: https://github.com/ajdamico/SAScii/issues/3 .
As far as this dataset goes, the code in https://github.com/ajdamico/asdfree/blob/master/National%20Health%20Interview%20Survey/download%20all%20microdata.R#L8-L13 does the trick, however it doesn't encode the columns as factors or numeric properly. The good thing is that for any given dataset in an NHIS year, there are only about less than ten to twenty numeric columns where encoding these as numeric one by one is not so painful, and encoding the rest of the columns as numeric requires only a loop through the non-numeric columns.
The easiest solution for me, since I only require the Sample Adult dataset for 2011, and I was able to get my hands on a machine with SAS installed, was to run the SAS program included at http://www.cdc.gov/nchs/nhis/nhis_2011_data_release.htm to encode the columns as necessary. Finally, I used proc export to export the sas dataset onto a CSV file which I then opened in R easily with no necessary edits to the data except in dealing with missing values.
In case you want to work with NHIS datasets besides Sample Adult, it is worth noting that when I ran the available SAS program for 2010 "Sample Adult Cancer" (http://www.cdc.gov/nchs/nhis/nhis_2010_data_release.htm) and exported the data to a CSV, there was an issue with having less column names than actual columns when I attempted to read in the CSV file in R. Skipping the first line resolves this issue but you lose the descriptive column names. You can however import this same data easily without encoding with the R code in the asdfree repository. Please read the documentation there for more info.

Why am I getting different output from the Alteryx R tool

I using the Alteryx R Tool to sign an amazon http request. To do so, I need the hmac function that is included in the digest package.
I'm using a text input tool that includes the key and a datestamp.
Key= "foo"
datastamp= "20120215"
Here's the issue. When I run the following script:
the.data <- read.Alteryx("1", mode="data.frame")
write.Alteryx(base64encode(hmac(the.data$key,the.data$datestamp,algo="sha256",raw = TRUE)),1)
I get an incorrect result when compared to when I run the following:
write.Alteryx(base64encode(hmac("foo","20120215",algo="sha256",raw = TRUE)),1)
The difference being when I hardcode the values for the key and object I get the correct result. But if use the variables from the R data frame I get incorrect output.
Does the data frame alter the data in someway. Has anyone come across this when working with the R Tool in Alteryx.
Thanks for your input.
The issue appears to be that when creating the data frame, your character variables are converted to factors. The way to fix this with the data.frame constructor function is
the.data <- data.frame(Key="foo", datestamp="20120215", stringsAsFactors=FALSE)
I haven't used read.Alteryx but I assume it has a similar way of achieving this.
Alternatively, if your data frame has already been created, you can convert the factors back into character:
write.Alteryx(base64encode(hmac(
as.character(the.data$Key),
as.character(the.data$datestamp),
algo="sha256",raw = TRUE)),1)

R rvest connect with local host

I am creating a way to read in SPSS labels into R. Using library(sjPlot), view_spss(df, useViewer = FALSE) I can create a local html page such as http://localhost:11773/session/file1e0c67270a5.html that shows a nice table with columns for the variable names and the labels I am looking for.
Now I want to use rvest to scrape it but when I start with a command such as page <- rvest::html("http://localhost:11773/session/file1e0c67270a5.html") R just seems to get stuck.
I've tried searching for "connect with local host" but I can't seem to find any questions or answers related to the R package.
This doesn't really answer your specific question as I think the reason is that R spins up a non-persistent process to serve that HTML view of your data. But your approach seems quite round-a-bout to just get to variable labels. This is a general way that works quite well:
library(foreign)
d <- read.spss("your_data.sav", use.value.labels=TRUE, to.data.frame=FALSE)
var_labels <- attr(d, "variable.labels")
## To access the label of a variable named 'var_name':
var_labels[["var_name"]]
Where d results in a list of data, and var_labels is a named list of labels keyed by variable/column.
If you want to get variable and/or value label of SPSS-imported data, you can use get_val_labels and get_var_labels of the sjmisc-package.
See examples here. Both functions accept either a single variable (vector) or a data frame and return the associated variable and value labels. See also this blog post.
The sjmisc-Package supports data frames imported both with the haven- or foreign-package.

Read SPSS file into R

I am trying to learn R and want to bring in an SPSS file, which I can open in SPSS.
I have tried using read.spss from foreign and spss.get from Hmisc. Both error messages are the same.
Here is my code:
## install.packages("Hmisc")
library(foreign)
## change the working directory
getwd()
setwd('C:/Documents and Settings/BTIBERT/Desktop/')
## load in the file
## ?read.spss
asq <- read.spss('ASQ2010.sav', to.data.frame=T)
And the resulting error:
Error in read.spss("ASQ2010.sav", to.data.frame = T) : error
reading system-file header In addition: Warning message: In
read.spss("ASQ2010.sav", to.data.frame = T) : ASQ2010.sav: position
0: character `\000' (
Also, I tried saving out the SPSS file as a SPSS 7 .sav file (was previously using SPSS 18).
Warning messages: 1: In read.spss("ASQ2010_test.sav", to.data.frame =
T) : ASQ2010_test.sav: Unrecognized record type 7, subtype 14
encountered in system file 2: In read.spss("ASQ2010_test.sav",
to.data.frame = T) : ASQ2010_test.sav: Unrecognized record type 7,
subtype 18 encountered in system file
I had a similar issue and solved it following a hint in read.spss help.
Using package memisc instead, you can import a portable SPSS file like this:
data <- as.data.set(spss.portable.file("filename.por"))
Similarly, for .sav files:
data <- as.data.set(spss.system.file('filename.sav'))
although in this case I seem to miss some string values, while the portable import works seamlessly. The help page for spss.portable.file claims:
The importer mechanism is more flexible and extensible than read.spss and read.dta of package "foreign", as most of the parsing of the file headers is done in R. They are also adapted to load efficiently large data sets. Most importantly, importer objects support the labels, missing.values, and descriptions, provided by this package.
The read.spss seems to be outdated a little bit, so I used package called memisc.
To get this to work do this:
install.packages("memisc")
data <- as.data.set(spss.system.file('yourfile.sav'))
You may also try this:
setwd("C:/Users/rest of your path")
library(haven)
data <- read_sav("data.sav")
and if you want to read all files from one folder:
temp <- list.files(pattern = "*.sav")
read.all <- sapply(temp, read_sav)
I know this post is old, but I also had problems loading a Qualtrics SPSS file into R. R's read.spss code came from PSPP a long time ago, and hasn't been updated in a while. (And Hmisc's code uses read.spss(), too, so no luck there.)
The good news is that PSPP 0.6.1 should read the files fine, as long as you specify a "String Width" of "Short - 255 (SPSS 12.0 and earlier)" on the "Download Data" page in Qualtrics. Read it into PSPP, save a new copy, and you should be in business. Awkward, but free.
,
You can read SPSS file from R using above solutions or the one you are currently using. Just make sure that the command is fed with the file, that it can read properly. I had same error and the problem was, SPSS could not access that file. You should make sure the file path is correct, file is accessible and it is in correct format.
library(foreign)
asq <- read.spss('ASQ2010.sav', to.data.frame=TRUE)
As far as warning message is concerned, It does not affect the data. The record type 7 is used to store features in newer SPSS software to make older SPSS software able to read new data. But does not affect data. I have used this numerous times and data is not lost.
You can also read about this at http://r.789695.n4.nabble.com/read-spss-warning-message-Unrecognized-record-type-7-subtype-18-encountered-in-system-file-td3000775.html#a3007945
It looks like the R read.spss implementation is incomplete or broken. R2.10.1 does better than R2.8.1, however. It appears that R gets upset about custom attributes in a sav file even with 2.10.1 (The latest I have). R also may not understand the character encoding field in the file, and in particular it probably does not work with SPSS Unicode files.
You might try opening the file in SPSS, deleting any custom attributes, and resaving the file.
You can see whether there are custom attributes with the SPSS command
display attributes.
If so, delete them (see VARIABLE ATTRIBUTE and DATAFILE ATTRIBUTE commands), and try again.
HTH,
Jon Peck
If you have access to SPSS, save file as .csv, hence import it with read.csv or read.table. I can't recall any problem with .sav file importing. So far it was working like a charm both with read.spss and spss.get. I reckon that spss.get will not give different results, since it depends on foreign::read.spss
Can you provide some info on SPSS/R/Hmisc/foreign version?
Another solution not mentioned here is to read SPSS data in R via ODBC. You need:
IBM SPSS Statistics Data File Driver. Standalone driver is enough.
Import SPSS data using RODBC package in R.
See the example here. However I have to admit that, there could be problems with very big data files.
For me it works well using memisc!
install.packages("memisc")
load('memisc')
Daten.Februar <-as.data.set(spss.system.file("NPS_Februar_15_Daten.sav"))
names(Daten.Februar)
I agree with #SDahm that the haven package would be the way to go. I myself have struggled a bit with string values when starting to use it, so I thought I'd share my approach on that here, too.
The "semantics" vignette has some useful information on this topic.
library(tidyverse)
library(haven)
# Some interesting information in here
vignette('semantics')
# Get data from spss file
df <- read_sav(path_to_file)
# get value labels
df <- map_df(.x = df, .f = function(x) {
if (class(x) == 'labelled') as_factor(x)
else x})
# get column names
colnames(df) <- map(.x = spss_file, .f = function(x) {attr(x, 'label')})
There is no such problem with packages you are using. The only requirement for read a spss file is to put the file into a PORTABLE format file. I mean, spss file have *.sav extension. You need to transform your spss file in a portable document that uses *.por extension.
There is more info in http://www.statmethods.net/input/importingdata.html
In my case this warning was combined with a appearance of a new variable before first column of my data with values -100, 2, 2, 2, ..., a shift in the correspondence between labels and values and the deletion of the last variable. A solution that worked was (using SPSS) to create a new dump variable in the last column of the file, fill it with random values and execute the following code:
(filename is the path to the sav file and in my case the original SPSS file had 62 columns, thus 63 with the additional dumb variable)
library(memisc)
data <- as.data.set(spss.system.file(filename))
copyofdata = data
for(i in 2:63){
names(data)[i] <- names(copyofdata)[i-1]
}
data[[1]] <- NULL
newcopyofdata = data
for(i in 2:62){
labels(data[[i]]) <- labels(newcopyofdata[[i-1]])
}
labels(data[[1]]) <- NULL
Hope the above code will help someone else.
Turn your UNICODE in SPSS off
Open SPSS without any data open and run the code below in your syntax editor
SET UNICODE OFF.
Open the data set and resave it to remove the Unicode
read.spss('yourdata.sav', to.data.frame=T) works correctly then
I just came came across an SPSS file that I couldn't get open using haven, foreign, or memisc, but readspss::read.por did the trick for me:
download.file("http://www.tcd.ie/Political_Science/elections/IMSgeneral92.zip",
"IMSgeneral92.zip")
unzip("IMSgeneral92.zip", exdir = "IMSgeneral92")
# rio, haven, foreign, memisc pkgs don't work on this file! But readspss does:
if(!require(readspss)) remotes::install_git("https://github.com/JanMarvin/readspss.git")
ims92 <- readspss::read.por("IMSgeneral92/IMS_Nov7 92.por", convert.factors = FALSE)
Nice! Thanks, #JanMarvin!
1)
I've found the program, stat-transfer, useful for importing spss and stata files into R.
It resolves the issue you mention by converting spss to R dataset. Also very useful for subsetting super large datasets into smaller portions consumable by R. Not free, but a very useful tool for working with datasets from different programs -- especially if you don't have access to them.
2)
Memisc package also has an spss function worth trying.

Resources