I am analysing student level data from PISA 2015. The data is available in SPSS format here
I can load the data into R using the read_sav function in the haven package. I need to be able to edit the data in R and then save/export the data in SPSS format with the original value labels that are included in the SPSS download intact. The code I have used is:
library(haven)
student <- read_sav("CY6_MS_CMB_STU_QQQ.sav", user_na = TRUE)
student2<-data.frame(student)
#some edits to data
write_sav(student2,"testdata1.sav")
When my colleague (who works in SPSS) tries to open the "testdata1.sav" the value labels are missing. I've read through the haven documentation and can't seem to find a solution for this. I have also tried read/write.spss in the foreign package but have issues loading in the dataset.
I am using R version 3.4.0 and the latest build of haven.
Does anyone know if there is a solution for this? I'd be very grateful for your help. Please let me know if you need any additional information to answer this.
library(foreign)
df <- read.spss("spss_file.sav", to.data.frame = TRUE)
This may not be exactly what you are looking for, because it uses the labels as the data. So if you have an SPSS file with 0 for "Male" and 1 for "Female," you will have a df with values that are all Males and Females. It gets you one step further, but perhaps isn't the whole solution. I'm working on the same problem and will let you know what else I find.
library ("sjlabelled")
student <- sjlabelled::read_spss("CY6_MS_CMB_STU_QQQ.sav")
student2 <-student
write_spss(student2,"testdata1.sav")
I have not tried this, but I hope it works. The sjlabelled package copes well with non-ASCII characters such as German umlauts.
But keep in mind that R stores the labels as attributes. These attributes can be lost during some data transformations (subsetting the data, for example). Once lost in R, they will not show up in SPSS either. The sjlabelled::copy_labels function is helpful in those cases:
student2 <- copy_labels(student2, student) #after data transformations and before export to spss
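The point about attributes can be seen with a plain vector. This is a base R sketch rather than sjlabelled itself: the `labels` attribute and the names `sex` and `subset_sex` are made up for illustration, and classed labelled vectors from haven or sjlabelled may behave differently when subset.

```r
# Toy variable with value labels stored as an attribute (hypothetical data)
sex <- c(0, 1, 1, 0)
attr(sex, "labels") <- c(Male = 0, Female = 1)

# Subsetting a plain vector with `[` drops attributes other than names
subset_sex <- sex[1:2]
labels_after_subset <- attr(subset_sex, "labels")  # NULL: the labels are gone

# Copy the labels back from the original before exporting
attr(subset_sex, "labels") <- attr(sex, "labels")
labels_restored <- attr(subset_sex, "labels")
```

Conceptually, copy_labels does this kind of restore for every column of a data frame at once.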
I think you need to recover the value labels in the data frame after importing the dataset into R, and then write that data frame to a .sav file.
# load libraries (purrr provides map(); dplyr provides %>% and mutate_at())
library(haven)
library(dplyr)
library(purrr)
# load dataset
student <- read_sav("CY6_MS_CMB_STU_QQQ.sav", user_na = TRUE)
#map to find class of each columns
map_dataset<-map(student, function(x)attr(x, "class"))
#Run for loop to identify all Factors with haven-labelled
factor_variable<-c()
for(i in seq_along(map_dataset)){
if(!is.null(map_dataset[[i]])){
name<-names(map_dataset[i])
factor_variable<-c(factor_variable,name)
}
}
#convert all haven labelled variables into factor
student2<-student %>%
mutate_at(vars(factor_variable), as_factor)
#write dataset
write_sav(student2, "testdata1.sav")
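Before sending a file to an SPSS user, a small round trip can show whether value labels survive the export. The sketch below assumes haven is installed; the data frame `toy` and its labels are hypothetical and stand in for the PISA data.

```r
library(haven)

# Toy data with a labelled column (hypothetical values and labels)
toy <- data.frame(id = 1:3)
toy$sex <- labelled(c(0, 1, 1), labels = c(Male = 0, Female = 1))

# Write to a temporary .sav file and read it back
tmp <- tempfile(fileext = ".sav")
write_sav(toy, tmp)
back <- read_sav(tmp)

# Value labels found on the re-imported column
labels_back <- attr(back$sex, "labels")
print(labels_back)
```

If `labels_back` is empty after your own edits, the labels were probably lost in R before the export rather than by write_sav.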
I'm trying to use DESeq2's plotPCA function in a meta-analysis of data.
Most of the files I have received are raw counts, before normalization. I run DESeq2 to normalize them and then run plotPCA.
One of the files I received does not have raw counts or even the FASTQ files, just the data that has already been normalized by DESeq2.
How could I go about importing this data (non-integers) as a DESeqDataSet object after it has already been normalized?
Consensus in vignettes and other comments seems to be that objects can only be constructed from matrices of integers.
I was mostly concerned with getting the format the same between plots. Ultimately, I just used a workaround to get the plots looking the same via ggfortify.
If anyone is curious, this is what I ended up doing. Note that the "names" file is organized like the metadata file used as colData when building a DESeq object with DESeqDataSetFromMatrix, but I changed the name of the design column from "conditions" to "group" so it would match the output of plotPCA. The result should look identical.
library(ggfortify)
data<-read.csv('COUNTS.csv',sep = ",", header = TRUE, row.names = 1)
names<-read.csv("NAMES.csv")
PCA<-prcomp(t(data))
autoplot(PCA, data = names, colour = "group", size=3)
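For anyone who wants the plot to be closer to what plotPCA produces, here is a base R sketch. It assumes, as far as I understand plotPCA, that PCA is run on log-scale data restricted to the most variable genes; the matrix `norm_counts`, the table `sample_info`, and the value of `ntop` are made up for illustration.

```r
set.seed(1)

# Hypothetical normalized expression matrix: 200 genes (rows) x 6 samples (columns)
norm_counts <- matrix(rexp(200 * 6, rate = 0.01), nrow = 200,
                      dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
sample_info <- data.frame(group = rep(c("control", "treated"), each = 3),
                          row.names = colnames(norm_counts))

# Log-transform so a few highly expressed genes do not dominate the variance
log_counts <- log2(norm_counts + 1)

# Keep the most variable genes, then run PCA with samples as rows
ntop <- 100
row_var <- apply(log_counts, 1, var)
top_genes <- order(row_var, decreasing = TRUE)[seq_len(min(ntop, length(row_var)))]
pca <- prcomp(t(log_counts[top_genes, ]))

# Share of variance explained by each component
percent_var <- pca$sdev^2 / sum(pca$sdev^2)

plot(pca$x[, 1], pca$x[, 2], col = factor(sample_info$group), pch = 19,
     xlab = sprintf("PC1: %.0f%% variance", 100 * percent_var[1]),
     ylab = sprintf("PC2: %.0f%% variance", 100 * percent_var[2]))
```

Running prcomp on untransformed values, as in the workaround above, also works, but the first component may then be driven by a handful of very highly expressed genes.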
The Tucson format is a standard file format for tree-ring datasets (see http://www.cybis.se/wiki/index.php?title=Tucson_format for a precise description).
The aim is to convert Excel files, with YEARS in the first column and MEASUREMENTS in the other columns, into that RWL format so they can be used with the dplR package in R.
Some clues are already on (creating a .rwl object), but the chron() and detrend() functions don't handle column files, as they introduce NAs by coercion.
I've tried many ways to build a "brute-force" loop without success, and I'm wondering whether a smarter way is possible in R.
Anyway, if somebody here is able to help on a loop I'll take it :)
Thanks a lot !
Alex,
OK, the dplR package has a write.tucson() function (o_O)
library("dplR")
dat <- read.table ("column.txt", header = T, row.names = 1)
write.tucson (dat, "tucson.txt", prec = 0.01, long.names = TRUE)
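The step from an Excel-style sheet to the object that write.tucson expects can be sketched as below. The data frame `measurements` and its column names are hypothetical stand-ins for a sheet already imported into R, and the dplR call runs only if the package is installed.

```r
# Hypothetical stand-in for a sheet read from Excel: first column is the year
measurements <- data.frame(
  YEAR   = 1990:1999,
  TREE01 = c(1.20, 1.05, 0.98, 1.10, 1.32, 0.87, 0.91, 1.00, 1.15, 1.08),
  TREE02 = c(NA, NA, 0.75, 0.80, 0.92, 0.66, 0.70, 0.81, 0.88, 0.79)
)

# Move the years into the row names and keep only the measurement columns
rwl <- measurements[, -1, drop = FALSE]
rownames(rwl) <- measurements$YEAR

# Force numeric columns; values that cannot be parsed become NA
rwl[] <- lapply(rwl, function(x) as.numeric(as.character(x)))

# Write the Tucson file only if dplR is available
if (requireNamespace("dplR", quietly = TRUE)) {
  out_file <- tempfile(fileext = ".rwl")
  dplR::write.tucson(rwl, out_file, prec = 0.01, long.names = TRUE)
}
```

Checking that every column is numeric before writing should also avoid the "NAs introduced by coercion" problem mentioned in the question.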
I have an SPSS file that contains variables and value labels. I found the read.spss function in the foreign package:
data <- read.spss("2017.sav", to.data.frame = TRUE, use.value.labels = TRUE)
If I use use.value.labels = TRUE, all strings are converted to factor variables, and I don't want that because not all of them are factors.
I found one solution, but I don't know if it is the best way to do it:
1. First, read the SPSS file with the call above.
2. Select the variables that are not factors and convert them to strings with:
cols <- c("x", "ab")
data[cols] <- lapply(data[cols], as.character)
If I don't use use.value.labels = TRUE, I will not have value labels and I cannot export the file correctly.
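The two steps can be tried on a toy data frame first. Everything below is hypothetical: the data frame stands in for the result of read.spss with use.value.labels = TRUE, and the column names `x`, `ab`, and `sex` are only examples.

```r
# Toy data standing in for the result of read.spss(..., use.value.labels = TRUE)
data <- data.frame(
  x   = factor(c("free text a", "free text b", "free text c")),
  ab  = factor(c("note 1", "note 2", "note 3")),
  sex = factor(c("Male", "Female", "Male"))
)

# Columns that are really free text, not categories (chosen by hand)
cols <- c("x", "ab")
data[cols] <- lapply(data[cols], as.character)

# Check the resulting column types
col_classes <- sapply(data, class)
print(col_classes)
```

Only the listed columns change type; genuine categorical variables such as `sex` stay factors with their labels as levels.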
You can also use the memisc package:
library(memisc)
sav <- spss.system.file("file.sav")
df <- as.data.set(sav)
My company regularly deals with SAV files and we extract out the metadata separately. With the foreign package, you can get the metadata out in a few different ways (after you have loaded the file in):
data.label.table <- attr(sav, "label.table")
missings <- attr(sav, "missings")
The other bits require various lapply and sapply functions to get them out. The script I have is quite long, so I will not share it here. If you read the data in with read.spss(sav, to.data.frame = TRUE) you can get:
VariableLabels <- unname(attr(sav, "variable.labels"))
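A self-contained way to try this is the example file that, to my knowledge, ships with the foreign package (`electric.sav` in its `files` directory). The sketch assumes that file is present in your installation; the object names are mine.

```r
library(foreign)

# Example file bundled with the foreign package (assumed to be present)
sav_path <- system.file("files", "electric.sav", package = "foreign")

# Read as a data frame; metadata is attached to the result as attributes
dat <- read.spss(sav_path, to.data.frame = TRUE)

# Variable labels, if the file defines any
variable_labels <- attr(dat, "variable.labels")
print(head(variable_labels))
```

The same `attr()` call on your own file should give the variable labels that the SPSS user sees in Variable View.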
I don't know why, but I can't install the "foreign" package.
Here is what I did instead to import a dataset from SPSS to R (through Excel):
Open your data in SPSS.
Export the dataset from SPSS to Excel, but make sure to choose the "Save value labels where defined instead of data values" option at the very bottom.
Open R.
Import dataset from Excel.
Now, you have a dataset in R with value labels.
Use the haven package:
library(haven)
data <- read_sav("2017.sav")
The labels are shown in the RStudio viewer.
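Outside the viewer, the labels can be inspected and used from code. This sketch builds a hypothetical labelled variable by hand instead of reading "2017.sav", so the values and labels are assumptions; the haven functions used are `labelled`, `as_factor`, and `zap_labels`.

```r
library(haven)

# Hypothetical labelled variable, similar to what read_sav() returns
sex <- labelled(c(0, 1, 1, 0),
                labels = c(Male = 0, Female = 1),
                label = "Respondent sex")

value_labels <- attr(sex, "labels")    # named vector: codes with their labels
variable_label <- attr(sex, "label")   # variable-level description

# Convert to a factor when the label text is needed,
# or strip the labels to keep only the numeric codes
sex_factor <- as_factor(sex)
sex_codes <- zap_labels(sex)
```

This keeps string variables as character columns, because only variables that carry value labels are labelled in the first place.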
We are working in Stata with data created in R, that have been exported using haven package. We stumbled upon an issue with variables that have a dot in the name. To replicate the problem, some minimal R code:
library("haven")
var.1 <- c(1,2,3)
var_2 <- c(1,2,3)
test_df <- data.frame(var.1, var_2)
str(test_df)
write_dta(test_df, "D:/test_df.dta")
Now, in Stata, when I do:
use "D:\test_df.dta"
d
First problem - I get an empty dataset. Second problem - we get variable name with a dot - which in Stata should be illegal. Therefore any command using directly the variable name like
drop var.1
returns an error:
factor variables and time-series operators not allowed
r(101);
What is causing such behaviour? Any solutions to this problem?
This will drop var.1 in Stata:
drop var?1
Here, as in Excel, ? is a wildcard that matches exactly one character (the equivalent of . in a regular expression).
Unfortunately, this will also drop var_1, if it exists.
I am not sure about the missing values when writing a .dta file with haven. I am able to replicate this result in Stata 14.1 and haven 0.2.0.
However, using the read_dta function from haven,
temp2 <- read_dta("test_df.dta")
returns the data.frame. As an alternative to haven, I have used the readstata13 package in the past without issues.
library(readstata13)
save.dta13(test_df, "testdf.dta")
This code has the same variable-name issue, but it produced a .dta file that contained the correct values when read into Stata 14.1. save.dta13 has a convert.underscore argument that is intended to remove characters that are not valid in Stata variable names. I verified that it works properly for this example in readstata13 version 0.8.5, but it had a bug in some earlier versions, including 0.8.2.
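Another option that does not depend on any export package is to clean the names in R before writing the file. This is a minimal sketch, assuming Stata names may contain only letters, digits, and underscores; the name `clean_names` is mine.

```r
# Same toy data as in the question
test_df <- data.frame(var.1 = c(1, 2, 3), var_2 = c(1, 2, 3))

# Replace every character outside letters, digits, and underscore
clean_names <- gsub("[^A-Za-z0-9_]", "_", names(test_df))

# Guard against collisions created by the replacement (e.g. a.b and a_b)
clean_names <- make.unique(clean_names, sep = "_")
names(test_df) <- clean_names
```

After this step the data frame can be passed to write_dta or save.dta13, and `drop var_1` works in Stata without wildcards.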
I am new to R and am trying to use R to produce a report that I currently build in Excel. Most of the topics here have been very helpful in translating Excel formulas into R code; however, I am struggling to write code for the Excel IF statement below:
=IF(AND(G2="SEA",OR(F2="FCL",F2="BCN")),W2*40,IF(G2="AIR",X2/1000*66,""))
G Column corresponds to Container/Product
F Column corresponds to Transport Mode
AI and AJ correspond to the volumes associated to each Transport mode
Appreciate all the help. Thanks
Here's the link to data exported to R
We can do a nested ifelse after reading the dataset
df1 <- read.csv("yourfile.csv", stringsAsFactors=FALSE)
ifelse(df1[,7]=="SEA" & df1[,6] %in% c("FCL", "BCN"),
df1[,35]*40, ifelse(df1[,7]=="AIR", df1[,36]/1000*66, NA))
NOTE: Here we refer to the columns by numeric index because a reproducible example was not shown.
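With named columns the same logic is easier to check. The data frame and the column names `mode`, `container`, `units`, and `weight` below are hypothetical, since the real file was not shown; the arithmetic follows the Excel formula, including the division by 1000.

```r
# Hypothetical data standing in for the exported spreadsheet
df1 <- data.frame(
  mode      = c("SEA", "SEA", "AIR", "ROAD"),
  container = c("FCL", "LCL", "X", "FCL"),
  units     = c(2, 3, 0, 1),
  weight    = c(0, 0, 5000, 0),
  stringsAsFactors = FALSE
)

# Nested ifelse: SEA with FCL/BCN -> units * 40; AIR -> weight / 1000 * 66; else NA
df1$result <- ifelse(df1$mode == "SEA" & df1$container %in% c("FCL", "BCN"),
                     df1$units * 40,
                     ifelse(df1$mode == "AIR", df1$weight / 1000 * 66, NA))
print(df1)
```

NA plays the role of the empty string in the Excel formula, which keeps the result column numeric.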