How to keep variable labels after raking in RStudio?

I think my problem is simple to solve, but I'm totally new to R and don't know how to manage it.
I had to weight data from an SPSS .sav dataset. First, I imported it into RStudio using foreign. Then I created another data frame with iterake. However, after raking, all the variable labels are gone. I tried to copy the labels from the source data frame to the weighted one but, surprisingly, R does not seem to recognize the source variable labels, even though I can see them in the RStudio viewer (below the variable names). After loading the labelled package and running var_label(SourceDF), I get only NULL values.
My goal is either to copy the new weighting variable into the source data frame (and export it back to SPSS format) or to copy the variable labels from the source data frame to the raked one.
So:
How to create a weighted dataframe with variable labels (through iterake)?
OR
How to copy source variable labels to the weighted one?
This is the simplified code I created:
library(foreign)
library(iterake)
library(expss)
library(haven)
# source dataframe
df <- read.spss("sourcedataset.sav", use.value.labels = TRUE, to.data.frame = TRUE)
# raking universe
uni <- universe(data = df,
                category(name = "q1",
                         buckets = c("a", "b", "c", "d"),
                         targets = c(0.2, 0.5, 0.2, 0.1),
                         sum.1 = TRUE),
                category(name = "q2",
                         buckets = c("e", "f"),
                         targets = c(0.8, 0.2),
                         sum.1 = TRUE),
                N = 1000)
# creation of the raked dataframe
df.wgt <- iterake(universe = uni)

If you read through the expss documentation on variable labels, you will find the necessary insight.
In short, do the following:
Save the imported value and variable labels right after the import
Reapply the saved labels after your operations
The necessary commands are:
var_lab() #reads and sets variable labels
val_lab() #reads and sets value labels
variable_label <- var_lab(some_variable) #saving after import
#do stuff
var_lab(some_variable) <- variable_label #reapply labels
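One possible reason var_label() returned NULL: foreign::read.spss() with to.data.frame = TRUE stores the variable labels in a single "variable.labels" attribute on the data frame, whereas labelled and expss look for a "label" attribute on each column. A minimal sketch of copying them across, using small made-up data frames (src, wgt, and the label texts are hypothetical stand-ins for the imported and raked data):

```r
# Hypothetical stand-in for the data frame returned by read.spss():
# the labels sit in one attribute on the data frame itself.
src <- data.frame(q1 = c("a", "b"), q2 = c("e", "f"))
attr(src, "variable.labels") <- c(q1 = "Question 1", q2 = "Question 2")

# Hypothetical stand-in for the raked data frame: same columns plus a
# weight column, with no labels left.
wgt <- data.frame(q1 = src$q1, q2 = src$q2, weight = c(0.8, 1.2))

# Copy each label onto the matching column as a "label" attribute.
labs <- attr(src, "variable.labels")
for (nm in intersect(names(labs), names(wgt))) {
  attr(wgt[[nm]], "label") <- labs[[nm]]
}

attr(wgt$q1, "label")
```

After this, var_lab() or var_label() on the weighted data frame should see the labels, since both read the per-column "label" attribute. Columns without a counterpart in the source (here weight) stay unlabelled.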

Related

Labels applied in R do not save when writing as a Stata file

I added variable (and, for some, value) labels in R using the apply_labels function from 'expss'. When I save the data using 'write.dta' and open it in Stata (or reopen the newly saved data in R), the labels do not appear.
I suspect that it has something to do with this passage in the write.dta documentation:
If the "var.labels" attribute contains a character vector with a
string label for each variable then this is written as the variable
labels. Otherwise the variable names are repeated as variable labels.
That is exactly what is happening: the variable names are repeated as variable labels. When I check with attr(df$variable, "label") before writing the data with write.dta, the labels are there.
I get the warning message:
"In write.dta [...] abbreviating variable names".
Not sure if this has to do with the problem.
A reproducible example of the code used to add the variables and labels and to write the data:
library(expss)
library(dplyr)
library(foreign)
df <- data.frame(country = rep(c("NL", "DE", "FR", "AT"), 2),
                 year = rep(c(2012, 2014), 4),
                 LS_medianpovgap60_disp_wa = c(0.448257605781815, 0.468249874784546, 0.473270740126805, 0.483814288478694, 0.486781335455043, 0.49246341926957, 0.51121872756711, 0.556027028656306))
df <- apply_labels(df,
                   country = "Country",
                   year = "Year",
                   LS_medianpovgap60_disp_wa = "Median shortfall from the poverty thresholds using 60% of the median income, disposable income only households with working age (LIS and SILC average)")
write.dta(df, "df_labelled.dta")
For Stata versions > 7, write.dta attempts to abbreviate a variable label if the label attribute is longer than 31 characters.
You may get a better result by using the haven package for the writing and reading steps of your code.
haven::write_dta(df, "df_labelled.dta")
temp <- haven::read_dta("df_labelled.dta")
temp
Edit
The comments below point out that Stata imposes a limit on a variable label's length (80 characters). So R-based work-arounds will all be subject to this constraint.
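The documentation passage quoted in the question also suggests a workaround within foreign: apply_labels() puts a "label" attribute on each column, while write.dta looks for a "var.labels" attribute on the data frame itself. A minimal sketch that collects the per-column labels into that attribute, assuming short labels and a small made-up data frame (the score column and the helper collect_label are hypothetical):

```r
# Hypothetical data frame with per-column "label" attributes, which is
# where apply_labels() is assumed to store them.
df <- data.frame(country = c("NL", "DE"), year = c(2012, 2014), score = c(0.45, 0.47))
attr(df$country, "label") <- "Country"
attr(df$year, "label") <- "Year"

# Return a column's label, falling back to its name when it has none.
collect_label <- function(nm) {
  lab <- attr(df[[nm]], "label", exact = TRUE)
  if (is.null(lab)) nm else as.character(lab)
}

# write.dta expects one string per variable, in column order.
attr(df, "var.labels") <- unname(vapply(names(df), collect_label, character(1)))

attr(df, "var.labels")
# foreign::write.dta(df, "df_labelled.dta")  # then write as in the question
```

The fallback to the variable name mirrors what write.dta is documented to do when no labels are supplied. Stata's 80-character limit on label length still applies.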

R - [DESeq2] - Making DESeq Dataset object from csv of already normalized counts

I'm trying to use DESeq2's plotPCA function in a meta-analysis of data.
Most of the files I have received are raw counts, pre-normalization. I'm running DESeq2 to normalize them, then running plotPCA.
For one of the files I received, I have neither the raw counts nor the FASTQ files, just data that has already been normalized by DESeq2.
How could I go about importing this data (non-integers) as a DESeqDataSet object after it has already been normalized?
The consensus in the vignettes and other comments seems to be that these objects can only be constructed from matrices of integers.
I was mostly concerned with getting the format to match between plots. Ultimately, I used a workaround via ggfortify to get the plots looking the same.
If anyone is curious, this is what I ended up doing. Note that the "names" file is organized like the metadata file used as colData when building a DESeq object with DESeqDataSetFromMatrix, but I changed the name of the design column from "conditions" to "group" so that it would match the output of plotPCA. The result should look identical.
library(ggfortify)
data<-read.csv('COUNTS.csv',sep = ",", header = TRUE, row.names = 1)
names<-read.csv("NAMES.csv")
PCA<-prcomp(t(data))
autoplot(PCA, data = names, colour = "group", size=3)
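One caveat about matching the plots: by default, plotPCA runs the PCA only on the 500 genes with the highest variance (its ntop argument), and it is usually given transformed data rather than plain normalized counts, whereas prcomp(t(data)) above uses every gene. A minimal base R sketch of the same selection step, on simulated values (norm_counts, the log2(x + 1) transform, and the sample names are assumptions standing in for the real file and transform):

```r
set.seed(1)

# Simulated stand-in for a genes-by-samples matrix of normalized counts.
norm_counts <- matrix(rexp(2000 * 6, rate = 0.01), nrow = 2000,
                      dimnames = list(paste0("gene", 1:2000), paste0("s", 1:6)))

# Assumed stand-in for a variance-stabilizing transform.
log_counts <- log2(norm_counts + 1)

# Keep the ntop genes with the highest variance across samples.
ntop <- 500
rv <- apply(log_counts, 1, var)
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]

# PCA with samples as rows, as in the prcomp(t(data)) call above.
pca <- prcomp(t(log_counts[select, ]))

# Share of variance per component, as shown on the plot axes.
percent_var <- pca$sdev^2 / sum(pca$sdev^2)
```

The resulting pca object can be passed to autoplot() in place of PCA in the code above.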

What is the best way to import spss file in R with value labels?

I have an SPSS file which contains variables and value labels. I found the foreign package and its read.spss function:
data <- read.spss("2017.sav", to.data.frame = TRUE, use.value.labels = TRUE)
If I use use.value.labels = TRUE, all string variables are converted to factors, which I don't want because not all of them are factors.
I found one solution, but I don't know if it is the best way to do it:
1. First, read the SPSS file with the call above.
2. Select the variables that are not factors and convert them to strings with:
cols <- c("x", "ab")
data[cols] <- lapply(data[cols], as.character)
If I don't use use.value.labels = TRUE, I will not have value labels and I cannot export the file correctly.
You can also use the memisc package:
sav <- spss.system.file("file.sav")
df <- as.data.set(sav)
My company regularly deals with SAV files and we extract out the metadata separately. With the foreign package, you can get the metadata out in a few different ways (after you have loaded the file in):
data.label.table <- attr(sav, "label.table")
missings <- attr(sav, "missings")
The other bits require various lapply and sapply functions to get them out. The script I have is quite long, so I will not share it here. If you read the data in with read.spss(sav, to.data.frame = TRUE) you can get:
VariableLabels <- unname(attr(sav, "variable.labels"))
I don't know why, but I can't install the foreign package.
Here is what I did instead to import a dataset from SPSS to R (through Excel):
1. Open your data in SPSS.
2. Export the dataset from SPSS to Excel, making sure to choose the "Save value labels where defined instead of data values" option at the very bottom.
3. Open R.
4. Import the dataset from Excel.
Now you have a dataset in R with value labels.
Use the haven package:
library(haven)
data <- read_sav("2017.sav")
The labels are shown in the RStudio viewer.

R ncdf package - put.var.ncdf requiring incorrect number of dimensions

I am organizing weather data into netCDF files in R. Everything goes fine until I try to populate the netcdf variables with data, because it is asking me to specify only one dimension for two-dimensional variables.
library(ncdf)
These are the dimension tags for the variables. Each variable uses the Threshold dimension and one of the other two dimensions.
th <- dim.def.ncdf("Threshold", "level", c(5,6,7,8,9,10,50,75,100))
rt <- dim.def.ncdf("RainMinimum", "cm", c(5, 10, 25))
wt <- dim.def.ncdf("WindMinimum", "m/s", c(18, 30, 50))
The variables are created in a loop, and there are a lot of them, so for the sake of easy understanding, in my example I'll only populate the list of variables with one variable.
vars <- list()
v1 <- var.def.ncdf("ARMM_rain", "percent", list(th, rt), -1, prec="double")
vars[[length(vars)+1]] <- v1
ncdata <- create.ncdf("composite.nc", vars)
I use another loop to extract data from different data files into a 9x3 data frame named subframe while iterating through the variables of the netcdf file with varindex. For the sake of reproducing, I'll give a quick initialization for these values.
varindex <- 1
subframe <- data.frame(matrix(nrow=9, ncol=3, rep(.01, 27)))
The desired outcome from there is to populate each ncdf variable with the contents of subframe. The code to do so is:
for(x in 1:9) {
for(y in 1:3) {
value <- ifelse(is.na(subframe[x,y]), -1, subframe[x,y])
put.var.ncdf(ncdata, varindex, value, start=c(x,y), count=1)
}
}
The error message is:
Error in put.var.ncdf(ncdata, varindex, value, start = c(x, y), count = 1) :
'start' should specify 1 dims but actually specifies 2
tl;dr: I have defined two-dimensional variables using ncdf in R, I am trying to write data to them, but I am getting an error message because R believes they are single-dimensional variables instead.
Anyone know how to fix this error?

Data specific print as PSPP variable view in R

I have a .sav file. I want to print out its variables in R the way the PSPP variable view shows them.
I succeeded in printing the type of each variable, but not the other properties, e.g. width, label, value labels, ...
I am using the following commands to read the data:
library(foreign)
library(memisc)
data <- read.spss("Database.sav", use.value.labels = FALSE,
max.value.labels = 100)
x = do.call(rbind,data)
Try the following commands to read the variable and value labels from the SPSS file in R. Hope this works:
library(foreign)
## Read SPSS data
data<-read.spss("Database.sav",use.value.labels=FALSE,to.data.frame=FALSE)
data_frame<-as.data.frame(data)
dim(data_frame)
# Variable Labels...
variable_labels <- attr(data, "variable.labels")
variable_labels
# Value Labels...
value_labels<-attr(data,"label.table")
value_labels
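The two attributes above can be combined into a table that resembles a variable view. A minimal sketch on a hand-built list that mimics what read.spss() returns with to.data.frame = FALSE (the variables, labels, and codes are made up, and properties such as column width are not attempted here):

```r
# Hypothetical stand-in for the list returned by read.spss().
data <- list(sex = c(1, 2, 1), age = c(34, 51, 28))
attr(data, "variable.labels") <- c(sex = "Respondent sex", age = "Age in years")
attr(data, "label.table") <- list(sex = c(Female = "2", Male = "1"), age = NULL)

var_labels <- attr(data, "variable.labels")
val_labels <- attr(data, "label.table")

# Collapse one variable's value labels into "code = label; ..." text.
format_values <- function(nm) {
  tab <- val_labels[[nm]]
  if (is.null(tab)) "" else paste0(tab, " = ", names(tab), collapse = "; ")
}

# One row per variable: name, storage type, variable label, value labels.
variable_view <- data.frame(
  name = names(data),
  type = vapply(data, function(x) class(x)[1], character(1)),
  label = unname(var_labels[names(data)]),
  values = vapply(names(data), format_values, character(1)),
  row.names = NULL
)
variable_view
```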
