Converting R file to Stata with missing string values - r

I am getting an error while converting R file into Stata format. I am able to convert the numbers into
Stata file but when I include strings I get the following error:
library(foreign)
write.dta(newdata, "X.dta")
Error in write.dta(newdata, "X.dta") :
empty string is not valid in Stata's documented format
I have few strings like location, name etc. which have missing values which is probably causing this problem. Is there a way to handle this? .

I've had this error many times before, and it's easy to reproduce:
library(foreign)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write.dta(test, 'example.dta')
One solution is to use factor variables instead of character variables, e.g.,
for (colname in names(test)) {
if (is.character(test[[colname]])) {
test[[colname]] <- as.factor(test[[colname]])
}
}
Another is to change the empty strings to something else and change them back in Stata.
This is purely a problem with write.dta, because Stata is perfectly fine with empty strings. But since foreign is frozen, there's not much you can do about that.
Update: (2015-12-04) A better solution is to use write_dta in the haven package:
library(haven)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write_dta(test, 'example.dta')
This way, Stata reads string variables properly as strings.

You could use the great readstata13 package (which kindly imports only the Rcpp package).
readstata13::save.dta13(mtcars, 'mtcars.dta')
The function allows to save already in Stata 15/16 MP file format (experimental), which is the next update after Stata 13 format.
readstata13::save.dta13(mtcars, 'mtcars15.dta', version="15mp")
Note: Of course, this also works with OP's data:
readstata13::save.dta13(data.frame(a="", b=1), 'my_data.dta')

Related

Error in x[is.na(x)] <- na.string : replacement has length zero when exporting data frame to openxlsx in R

I have an issue when I try to export a data frame with the library openxlsx to an Excel. When I tried, this error happen:
openxlsx::write.xlsx(usertl_lp, file = "Mi_Exportación.xlsx")
Error in x[is.na(x)] <- na.string : replacement has length zero
usertl_lp_clean <- usertl_lp %>% mutate(across(where(is.list), as.character))
openxlsx::write.xlsx(usertl_lp_clean, file = "Mi_Exportación.xlsx")
This error may be caused by cells containing vectors. So, using across to modify the vector to character.
I posted this here for others in need.
I think you are looking for the writeData function from the same package.
Check out writeFormula from the same package as well or even write_xlsx from the writexl package.
I was having a similar problem in a data frame, but, in my case, I was using the related openxlsx::writeData.
The data frame was generated using sapply, with functions which could deliver errors because of the data. So, I coded to fill with NA when an error were generated. I ended up with NaN and NAs in the same column.
What worked for me is conducting the following treatment before writeData:
df[is.na(df)]<-''
so, for your problem, the following may work:
df[is.na(df)]<-''
openxlsx::write.xlsx(as.data.frame(df), file = "df.xlsx", colNames = TRUE, rowNames = FALSE, append = FALSE)

Write stata dataframe in R [duplicate]

I am getting an error while converting R file into Stata format. I am able to convert the numbers into
Stata file but when I include strings I get the following error:
library(foreign)
write.dta(newdata, "X.dta")
Error in write.dta(newdata, "X.dta") :
empty string is not valid in Stata's documented format
I have few strings like location, name etc. which have missing values which is probably causing this problem. Is there a way to handle this? .
I've had this error many times before, and it's easy to reproduce:
library(foreign)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write.dta(test, 'example.dta')
One solution is to use factor variables instead of character variables, e.g.,
for (colname in names(test)) {
if (is.character(test[[colname]])) {
test[[colname]] <- as.factor(test[[colname]])
}
}
Another is to change the empty strings to something else and change them back in Stata.
This is purely a problem with write.dta, because Stata is perfectly fine with empty strings. But since foreign is frozen, there's not much you can do about that.
Update: (2015-12-04) A better solution is to use write_dta in the haven package:
library(haven)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write_dta(test, 'example.dta')
This way, Stata reads string variables properly as strings.
You could use the great readstata13 package (which kindly imports only the Rcpp package).
readstata13::save.dta13(mtcars, 'mtcars.dta')
The function allows to save already in Stata 15/16 MP file format (experimental), which is the next update after Stata 13 format.
readstata13::save.dta13(mtcars, 'mtcars15.dta', version="15mp")
Note: Of course, this also works with OP's data:
readstata13::save.dta13(data.frame(a="", b=1), 'my_data.dta')

R function doesn't recognize variable

I am not very familiar with loops in R, and am having a hard time stating a variable such that it is recognized by a function, DESeqDataSetFromMatrix.
pls is a table of integers. metaData is a data frame containing sample IDs and conditions corresponding to pls. I verified that the below steps run error-free with the individual elements of cond run successfully .
I reviewed relevant posts on referencing variables in R:
How to reference variable names in a for loop in R?
How to reference a variable in a for loop?
Based on these posts, I modified i in line 3 with single brackets, double brackets and "as.name". No luck. DESeqDataSetFromMatrix is reading the literal text after ~ and spits out an error.
cond=c("wt","dhx","mpp","taz")
for(i in cond){
dds <- DESeqDataSetFromMatrix(countData=pls,colData=metaData,design=~i, tidy = TRUE)
"sizeFactors"(dds) <- 1
paste0("PLS",i)<-DESeq(dds)
pdf <- paste(i,"-PLS_MA.pdf",sep="")
tsv <- paste(i,"-PLS.tsv",sep="")
pdf(file=pdf,paper = "a4r", width = 0, height = 0)
plotMA(paste0("PLS",i),ylim=c(-10,10))
dev.off()
write.table(results(paste0("PLS",i)),file = tsv,quote=FALSE, sep='\t', col.names = NA)
}
With brackets, an unexpected symbol error populates.
With i alone, DESEqDataSetFromMatrix tries to read "i" from my metaData column.
Is R just not capable of reading variables in some situations? Generally speaking, is it better to write loops outside of R in a more straightforward language, then push as standalone commands? Thanks for the help—I hope there is an easy fix.
For anyone else who may be having trouble looping with DESeq2 functions, comments above addressed my issue.
Correct input:
dds <- DESeqDataSetFromMatrix(countData=pls,colData=metaData,design=as.formula(paste0("~", i)), tidy = TRUE)
as.formula worked well with all DESeq functions that I tested.
reformulate(i) also worked well in most situations.
Thanks, everyone for the help!

Error importing SPSS data into R

I imported a dataset in the .sav SPSS format, and I'm getting an error that I haven't seen before.
1: In read.spss("C:\\Users\\acer\\Desktop\\X\\X\\PIREDEU\\ees2009_v0.9_20110622.sav", ... :
C:\Users\acer\Desktop\X\X\PIREDEU\ees2009_v0.9_20110622.sav: File contains duplicate label for value 1.1 for variable V200
Error in cat(list(...), file, sep, fill, labels, append) :
argument 2 (type 'list') cannot be handled by 'cat'
This came up after I typed warnings(PIREDEU). I imported the data using the foreign library:
library(foreign)
PIREDEU<-read.spss("C:\\Users\\acer\\Desktop\\X\\X\\PIREDEU\\ees2009_v0.9_20110622.sav", use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
I've fiddled with various combinations for the latter three arguments of the read.spss function, and I've gotten nowhere.
Anyone have any suggestions?
I used the below one and it worked perfectly, just ignore the warning message and check data by typing its name:
mydata4<-read.spss("C:\\Work\\data.sav",use.value.labels=F,to.data.frame=T)
mydata4 # check data
Do you have long strings in the file - longer than 8 bytes? Statistics uses some special arrangements to handle those. It looks like the problem is with the value labels. If you can delete those (using SPSS) you might be able to get the rest of the data.
Try to read data without labels.
library(foreign)
PIREDEU <- read.spss("C:\\Users\\acer\\Desktop\\X\\X\\PIREDEU\\ees2009_v0.9_20110622.sav",
use.value.labels = F,
to.data.frame = T)
Does it work?
Convert the spss datafile into .por (portable file) and in R, install the packages hMisc, memisc and foreign and load the package using library(foreign), library(hMisc) and library(memisc).
Then type the following:
mydata <- spss.get("c:/mydata.por", use.value.labels=TRUE)
# last option converts value labels to R factors

Imported a csv-dataset to R but the values becomes factors

I am very new to R and I am having trouble accessing a dataset I've imported. I'm using RStudio and used the Import Dataset function when importing my csv-file and pasted the line from the console-window to the source-window. The code looks as follows:
setwd("c:/kalle/R")
stuckey <- read.csv("C:/kalle/R/stuckey.csv")
point <- stuckey$PTS
time <- stuckey$MP
However, the data isn't integer or numeric as I am used to but factors so when I try to plot the variables I only get histograms, not the usual plot. When checking the data it seems to be in order, just that I'm unable to use it since it's in factor form.
Both the data import function (here: read.csv()) as well as a global option offer you to say stringsAsFactors=FALSE which should fix this.
By default, read.csv checks the first few rows of your data to see whether to treat each variable as numeric. If it finds non-numeric values, it assumes the variable is character data, and character variables are converted to factors.
It looks like the PTS and MP variables in your dataset contain non-numerics, which is why you're getting unexpected results. You can force these variables to numeric with
point <- as.numeric(as.character(point))
time <- as.numeric(as.character(time))
But any values that can't be converted will become missing. (The R FAQ gives a slightly different method for factor -> numeric conversion but I can never remember what it is.)
You can set this globally for all read.csv/read.* commands with
options(stringsAsFactors=F)
Then read the file as follows:
my.tab <- read.table( "filename.csv", as.is=T )
When importing csv data files the import command should reflect both the data seperation between each column (;) and the float-number seperator for your numeric values (for numerical variable = 2,5 this would be ",").
The command for importing a csv, therefore, has to be a bit more comprehensive with more commands:
stuckey <- read.csv2("C:/kalle/R/stuckey.csv", header=TRUE, sep=";", dec=",")
This should import all variables as either integers or numeric.
None of these answers mention the colClasses argument which is another way to specify the variable classes in read.csv.
stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = "numeric") # all variables to numeric
or you can specify which columns to convert:
stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = c("PTS" = "numeric", "MP" = "numeric") # specific columns to numeric
Note that if a variable can't be converted to numeric then it will be converted to factor as default which makes it more difficult to convert to number. Therefore, it can be advisable just to read all variables in as 'character' colClasses = "character" and then convert the specific columns to numeric once the csv is read in:
stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = "character")
point <- as.numeric(stuckey$PTS)
time <- as.numeric(stuckey$MP)
I'm new to R as well and faced the exact same problem. But then I looked at my data and noticed that it is being caused due to the fact that my csv file was using a comma separator (,) in all numeric columns (Ex: 1,233,444.56 instead of 1233444.56).
I removed the comma separator in my csv file and then reloaded into R. My data frame now recognises all columns as numbers.
I'm sure there's a way to handle this within the read.csv function itself.
This only worked right for me when including strip.white = TRUE in the read.csv command.
(I found the solution here.)
for me the solution was to include skip = 0
(number of rows to skip at the top of the file. Can be set >0)
mydata <- read.csv(file = "file.csv", header = TRUE, sep = ",", skip = 22)

Resources