I am trying to use imputed data created with MICE in Stata.
My understanding of the steps is:
1) converting the mids object to mi in R
m=20
completed=lapply(1:20,function(i)complete(imp,i))
completed.mi=do.call(Zelig::mi,completed)
2) preparing mice object for exporting in R
(a) mi2stata
STATA=mi::mi2stata(completed.mi, m=20, file="C:\\Users\\STATA.csv",
missing.ind = FALSE)
Note: after loading the data into Stata, version 11 or later, type 'mi
import ice' to register the data as being multiply imputed.
For Stata 10 and earlier, install MIM by typing 'findit mim' and include
'mim:' as a prefix for any command using the MI data.
Error in lapply(X = X, FUN = FUN, ...) :
trying to get slot "data" from an object (class "mi") that is not an S4
object
(b) Following the suggestion from below to write a csv without mi2stata:
data_out <- data.table::rbindlist(completed, idcol="m")
write.csv(data_out, "C:\\deleted\\STATA2.csv", row.names=FALSE)
3) importing the CSV file of the original, nonimputed data into Stata
This appears to have worked fine: all variables from the CSV file appear on the right-hand side.
4) use mi import ice command in Stata
(a) error re: mi2stata (I had actually imported the non-imputed file)
. mi import ice STATA
varlist not allowed
r(101);
(b) error in reading CSV version of imputed data
mi import ice[stata2]
weights not allowed
r(101);
I have encountered errors with steps 2 and 4, and possibly step 1 (the error for step 2 refers back to the conversion of the mice object to mi class data). I would really appreciate user-friendly, step-by-step guidance. Although mi2stata might not work directly for mice objects, I am still interested in learning a solution for this.
Collecting the comments above: you can't use mi::mi2stata with either the data that results from Zelig::mi or from mice::complete. But if you look at the code for mi::mi2stata, it just seems to stack the raw data, and each imputed dataset. It then adds indices to mark each dataset, and each observation.
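library(mice)
# data.table is not strictly needed, but it makes adding the indices easier
library(data.table)
# Function to export mice imputed datasets in the stacked layout
mice2stata <- function(imp, path = "stata", type = "dta") {
  completed <- lapply(seq_len(imp$m), function(i) complete(imp, i))
  # _mj = imputation number (1..m)
  data_out <- rbindlist(completed, idcol = "_mj")
  # Put the original data on top; rbindlist() handles fill=TRUE even
  # though imp$data is a plain data.frame rather than a data.table
  data_out <- rbindlist(list(imp$data, data_out), fill = TRUE)
  # The original data get _mj = 0
  data_out[, `_mj` := replace(`_mj`, is.na(`_mj`), 0L)]
  # _mi = observation number within each dataset
  data_out[, `_mi` := rowid(`_mj`)]
  if (type == "dta") {
    foreign::write.dta(data_out, file = paste(path, type, sep = "."))
  } else {
    write.csv(data_out, file = paste(path, type, sep = "."), na = "", row.names = FALSE)
  }
}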
An example
imp <- mice(nhanes, m=2, print=FALSE)
mice2stata(imp, type="dta")
Then in Stata use
use path\to\stata.dta
mi import ice
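To make the target layout concrete, here is a minimal base-R sketch of the stacked format described above. It is not taken from mi::mi2stata itself, and the toy data frames raw, imp1 and imp2 are hypothetical stand-ins for the original data and two completed datasets.

```r
# Hypothetical toy data: one raw dataset with missing values and two
# completed (imputed) versions of it.
raw  <- data.frame(age = c(21, NA, 35), bmi = c(22.5, 27.1, NA))
imp1 <- data.frame(age = c(21, 30, 35), bmi = c(22.5, 27.1, 25.0))
imp2 <- data.frame(age = c(21, 28, 35), bmi = c(22.5, 27.1, 26.3))

# Stack the raw data (imputation 0) on top of the completed datasets.
datasets <- list(raw, imp1, imp2)
n_rows   <- sapply(datasets, nrow)
stacked  <- do.call(rbind, datasets)

# _mj: imputation number (0 = original data); _mi: observation number.
stacked[["_mj"]] <- rep(seq_along(datasets) - 1L, times = n_rows)
stacked[["_mi"]] <- sequence(n_rows)
rownames(stacked) <- NULL

stacked
```

The rows with _mj equal to 0 still contain the missing values, which is how the original data can be told apart from the imputations.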
Q4 looks straightforward. The syntax for that command (not function) is documented as
mi import ice [, options]
and so STATA looks like an attempt to specify a variable list. Where does that come from?
If Q2 failed, what was the point of Q3 and Q4?
I hope that some R user can add some comments on Q2. On the face of it, you got an explicit error message, so do you think it's wrong?
Related
I am getting an error while converting an R data frame into Stata format. I am able to write numeric columns to the
Stata file, but when I include strings I get the following error:
library(foreign)
write.dta(newdata, "X.dta")
Error in write.dta(newdata, "X.dta") :
empty string is not valid in Stata's documented format
I have a few string variables (location, name, etc.) with missing values, which is probably what causes this problem. Is there a way to handle this?
I've had this error many times before, and it's easy to reproduce:
library(foreign)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write.dta(test, 'example.dta')
One solution is to use factor variables instead of character variables, e.g.,
for (colname in names(test)) {
if (is.character(test[[colname]])) {
test[[colname]] <- as.factor(test[[colname]])
}
}
Another is to change the empty strings to something else and change them back in Stata.
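A minimal sketch of that second workaround, assuming a placeholder such as "__EMPTY__" that does not occur in the real data (the placeholder is an arbitrary choice for illustration, not something foreign requires):

```r
test <- data.frame(a = c("", "x"), b = 1:2, stringsAsFactors = FALSE)

# Hypothetical marker; pick one that is absent from your data.
placeholder <- "__EMPTY__"

# Replace empty strings in character columns only.
is_chr <- vapply(test, is.character, logical(1))
test[is_chr] <- lapply(test[is_chr], function(x) {
  x[!is.na(x) & x == ""] <- placeholder
  x
})

test$a
```

After writing the file and loading it in Stata, a command along the lines of replace a = "" if a == "__EMPTY__" would restore the empty strings.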
This is purely a problem with write.dta, because Stata is perfectly fine with empty strings. But since foreign is frozen, there's not much you can do about that.
Update: (2015-12-04) A better solution is to use write_dta in the haven package:
library(haven)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write_dta(test, 'example.dta')
This way, Stata reads string variables properly as strings.
You could use the great readstata13 package (which kindly imports only the Rcpp package).
readstata13::save.dta13(mtcars, 'mtcars.dta')
The function can also save in the Stata 15/16 MP file format (experimental), which is a later format than that of Stata 13.
readstata13::save.dta13(mtcars, 'mtcars15.dta', version="15mp")
Note: Of course, this also works with OP's data:
readstata13::save.dta13(data.frame(a="", b=1), 'my_data.dta')
I am analyzing FCS files from a CyTOF experiment using the flowCore package. When I import and export my FCS files using read.FCS and write.FCS, the exported files come out corrupted: all channels are affected, and the data looks like the tSNE in the picture below (not what is expected or meaningful).
I'm using R (ver. 3.6), RStudio (1.2.1335), and flowCore ver. 3.9.
Here is the code I have used:
library(flowCore)
#Import FCS file
myfilename<-"export_MIX_NT_Ungated_viSNE.fcs"
myfile_fcs<-read.FCS(myfilename,
transformation="linearize", which.lines=NULL,
alter.names=FALSE, column.pattern=NULL)
#I plan to do some data analysis here in the final version before exporting below
#export the fcs file and rename it to T_+filename
write.FCS(myfile_fcs,paste("T_",keyword(myfile_fcs)$"$FIL",sep=""), what="numeric")
and this is what the original file looks like before import into R
and this is what the exported result looks like after export
Here is the file that we have used for this code: dropbox link for the example file
I've looked into your problem, and at first I was skeptical about the transformation in read.FCS. Looking at your example file, I also see that it already contains columns for your original (full-plot) tSNE, so I'm assuming FlowJo rewrites the tSNE values after the file has been read and written in R. Since flowCore is generally targeted more at flow data than at CyTOF, I took a few pieces of this Bioc2017 walkthrough and recreated the transformations. This seems to work better, although I'm not sure how FlowJo will handle the data now. If you are going to do more work on the data, though, it is now accessible at a low level, so you can do basically whatever you want. Here's my code.
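library(flowCore)
library(matrixStats)
# Read without transformation and without truncating at the declared range
fcs_raw <- read.flowSet("~/Downloads/export_MIX_NT_Ungated_viSNE.fcs", transformation = FALSE,
                        truncate_max_range = FALSE)
# Arcsinh-transform every channel with a cofactor of 5
fcs <- fsApply(fcs_raw, function(x, cofactor = 5){
  expr <- exprs(x)
  expr <- asinh(expr[,] / cofactor)
  exprs(x) <- expr
  x
})
# Extract the expression matrix
expr <- fsApply(fcs, exprs)
# Rescale each channel to [0, 1] using its 1st and 99th percentiles
rng <- colQuantiles(expr, probs = c(0.01, 0.99))
expr01 <- t((t(expr) - rng[, 1]) / (rng[, 2] - rng[, 1]))
expr01[expr01 < 0] <- 0
expr01[expr01 > 1] <- 1
expr01
summary(expr01)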
Be aware that this does mess up your original tSNE column numbers, so if these were important to you, I would read the flowset, make a copy of those columns, and move on with the data analysis in the code. If you have future questions or analysis with flow data feel free to contact me directly.
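In plain terms, the code above applies an arcsinh transform and then rescales each channel to [0, 1] using its 1st and 99th percentiles, clipping anything outside that range. Here is a minimal base-R sketch of the same two steps on simulated data. The matrix is hypothetical, and base quantile stands in for matrixStats::colQuantiles.

```r
set.seed(1)
# Hypothetical raw intensities: 100 events x 2 channels.
expr <- matrix(rexp(200, rate = 0.01), ncol = 2,
               dimnames = list(NULL, c("ch1", "ch2")))

# Arcsinh transform with a cofactor of 5.
expr_t <- asinh(expr / 5)

# Per-channel 1st and 99th percentiles (one row per channel).
rng <- t(apply(expr_t, 2, quantile, probs = c(0.01, 0.99)))

# Rescale each channel so that the percentiles map to 0 and 1, then clip.
expr01 <- t((t(expr_t) - rng[, 1]) / (rng[, 2] - rng[, 1]))
expr01[expr01 < 0] <- 0
expr01[expr01 > 1] <- 1

range(expr01)
```

The clipping step is why the most extreme events in each channel end up at exactly 0 or 1.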
@csugai, thanks for your answer. The truncate_max_range = FALSE argument in the read.flowSet call caught my eye, so I added it to my read.FCS call, and that fixed the problem! I didn't really understand the other parts of your code, though, which resulted in binned data.
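A sketch of the adjusted round trip described in this comment, assuming flowCore is installed. The wrapper name roundtrip_fcs and its two arguments are hypothetical; only the truncate_max_range = FALSE argument is new relative to the code in the question.

```r
# Hypothetical helper: read an FCS file without truncating values at each
# channel's declared maximum range, then write it back out.
roundtrip_fcs <- function(infile, outfile) {
  ff <- flowCore::read.FCS(infile,
                           transformation = "linearize",
                           truncate_max_range = FALSE)
  # ... any analysis steps would go here ...
  flowCore::write.FCS(ff, outfile, what = "numeric")
  invisible(ff)
}
```

It could be called as roundtrip_fcs("export_MIX_NT_Ungated_viSNE.fcs", "T_export_MIX_NT_Ungated_viSNE.fcs").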
I am analysing student level data from PISA 2015. The data is available in SPSS format here
I can load the data into R using the read_sav function in the haven package. I need to be able to edit the data in R and then save/export the data in SPSS format with the original value labels that are included in the SPSS download intact. The code I have used is:
library(haven)
student<-read_sav("CY6_MS_CMB_STU_QQQ.sav",user_na = T)
student2<-data.frame(student)
#some edits to data
write_sav(student2,"testdata1.sav")
When my colleague (who works in SPSS) tries to open the "testdata1.sav" the value labels are missing. I've read through the haven documentation and can't seem to find a solution for this. I have also tried read/write.spss in the foreign package but have issues loading in the dataset.
I am using R version 3.4.0 and the latest build of haven.
Does anyone know if there is a solution for this? I'd be very grateful for your help. Please let me know if you require any additional information to answer this.
library(foreign)
df <- read.spss("spss_file.sav", to.data.frame = TRUE)
This may not be exactly what you are looking for, because it uses the labels as the data. So if you have an SPSS file with 0 for "Male" and 1 for "Female," you will have a df with values that are all Males and Females. It gets you one step further, but perhaps isn't the whole solution. I'm working on the same problem and will let you know what else I find.
library ("sjlabelled")
student <- sjlabelled::read_spss("CY6_MS_CMB_STU_QQQ.sav")
student2 <-student
write_spss(student2,"testdata1.sav")
I have not tested this, but I hope it works. The sjlabelled package handles non-ASCII characters, such as German umlauts, well.
But keep in mind that R stores the labels as attributes. These attributes are lost during some data transformations (subsetting the data, for example). Once lost in R, they won't show up in SPSS, of course. The sjlabelled::copy_labels function is helpful in those cases:
student2 <- copy_labels(student2, student) #after data transformations and before export to spss
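The attribute loss can be seen in base R alone. The sketch below uses a plain numeric vector with a hand-made "labels" attribute, which is only similar in spirit to how labelled data are stored. Whether a particular package's labelled class survives a given operation depends on that package.

```r
# A plain vector carrying value labels as an attribute (toy example).
sex <- c(1, 2, 2, 1)
attr(sex, "labels") <- c(Male = 1, Female = 2)

# Subsetting with [ drops the attribute.
sex_sub <- sex[1:2]
is.null(attr(sex_sub, "labels"))

# Copy the attribute back by hand before exporting.
attr(sex_sub, "labels") <- attr(sex, "labels")
attr(sex_sub, "labels")
```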
I think you need to recover the value labels in the data frame after importing the dataset into R, and then write that data frame to a .sav file.
# load libraries
library(haven)
library(dplyr)  # provides %>% and mutate_at()
# load dataset
student <- read_sav("CY6_MS_CMB_STU_QQQ.sav", user_na = TRUE)
# identify all haven-labelled variables
factor_variable <- names(student)[sapply(student, is.labelled)]
# convert all haven-labelled variables into factors
student2 <- student %>%
  mutate_at(factor_variable, as_factor)
# write dataset
write_sav(student2, "testdata1.sav")
We are working in Stata with data created in R, that have been exported using haven package. We stumbled upon an issue with variables that have a dot in the name. To replicate the problem, some minimal R code:
library("haven")
var.1 <- c(1,2,3)
var_2 <- c(1,2,3)
test_df <- employ.data <- data.frame(var.1, var_2)
str(test_df)
write_dta(test_df, "D:/test_df.dta")
Now, in Stata, when I do:
use "D:\test_df.dta"
d
First problem: I get an empty dataset. Second problem: we get a variable name with a dot, which should be illegal in Stata. Therefore, any command that uses the variable name directly, like
drop var.1
returns an error:
factor variables and time-series operators not allowed
r(101);
What is causing such behaviour? Any solutions to this problem?
This will drop var.1 in Stata:
drop var?1
Here (as in Excel), ? is used as a wildcard for a single character. (It is the equivalent of . in a regular expression.)
Unfortunately, this will also drop var_1, if it exists.
I am not sure about the missing values when writing a .dta file with haven. I am able to replicate this result in Stata 14.1 and haven 0.2.0.
However, using the read_dta function from haven,
temp2 <- read_dta("test_df.dta")
returns the data.frame. As an alternative to haven, I have used the readstata13 package in the past without issues.
library(readstata13)
save.dta13(test_df, "testdf.dta")
While this code has the same variable-name issue, it produced a .dta file that contained the correct values when read into Stata 14.1. save.dta13 has a convert.underscore argument that is intended to remove characters that are not valid in Stata variable names. I verified that it works properly in this example with readstata13 version 0.8.5, but it had a bug in some earlier versions, including version 0.8.2.
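Independently of the export package, the names can be cleaned in R before writing the file. This is a minimal base-R sketch that only replaces characters other than letters, digits and underscores. It does not check Stata's other naming rules, such as the length limit or names that start with a digit.

```r
var.1 <- c(1, 2, 3)
var_2 <- c(1, 2, 3)
test_df <- data.frame(var.1, var_2)

# Replace every character that is not a letter, digit, or underscore.
clean_names <- gsub("[^A-Za-z0-9_]", "_", names(test_df))

# Guard against collisions, e.g. var.1 and var_1 both becoming var_1.
stopifnot(!anyDuplicated(clean_names))
names(test_df) <- clean_names

names(test_df)
```

The data frame can then be passed to write_dta or save.dta13 as before.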
I imported a dataset in the .sav SPSS format, and I'm getting an error that I haven't seen before.
1: In read.spss("C:\\Users\\acer\\Desktop\\X\\X\\PIREDEU\\ees2009_v0.9_20110622.sav", ... :
C:\Users\acer\Desktop\X\X\PIREDEU\ees2009_v0.9_20110622.sav: File contains duplicate label for value 1.1 for variable V200
Error in cat(list(...), file, sep, fill, labels, append) :
argument 2 (type 'list') cannot be handled by 'cat'
This came up after I typed warnings(PIREDEU). I imported the data using the foreign library:
library(foreign)
PIREDEU<-read.spss("C:\\Users\\acer\\Desktop\\X\\X\\PIREDEU\\ees2009_v0.9_20110622.sav", use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
I've fiddled with various combinations for the latter three arguments of the read.spss function, and I've gotten nowhere.
Anyone have any suggestions?
I used the call below and it worked perfectly. Just ignore the warning message and check the data by typing its name:
mydata4<-read.spss("C:\\Work\\data.sav",use.value.labels=F,to.data.frame=T)
mydata4 # check data
Do you have long strings in the file - longer than 8 bytes? Statistics uses some special arrangements to handle those. It looks like the problem is with the value labels. If you can delete those (using SPSS) you might be able to get the rest of the data.
Try to read data without labels.
library(foreign)
PIREDEU <- read.spss("C:\\Users\\acer\\Desktop\\X\\X\\PIREDEU\\ees2009_v0.9_20110622.sav",
use.value.labels = F,
to.data.frame = T)
Does it work?
Convert the SPSS data file to .por (portable file) format. In R, install the packages Hmisc, memisc and foreign, and load them using library(foreign), library(Hmisc) and library(memisc).
Then type the following:
mydata <- spss.get("c:/mydata.por", use.value.labels=TRUE)
# last option converts value labels to R factors