R data file not converting to Stata file

I am getting the error below and cannot figure out why. Any advice?
library(foreign)
x <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write.dta(x, 'x.dta')
Error in write.dta(x, "x.dta") :
4 arguments passed to .Internal(nchar) which requires 3

The haven package handles this case better than foreign, as it treats strings (including empty strings) as string values.
library( haven )
x <- data.frame( a = "", b = 1, stringsAsFactors = FALSE )
write_dta( x, 'x.dta' )
Alternatively, if you give column a a non-empty value when creating the data frame, instead of an empty string, foreign will be fine.
x <- data.frame( a = "a", b = 1, stringsAsFactors = FALSE )
write.dta( x,"y.dta" )
As you're using an older version of Stata, haven is the way to go, as you can specify the version of Stata you wish the dta file to be compatible with.
write_dta( x, 'x.dta', version = 13 )
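If you want to stay with foreign, it can help to find the offending columns before exporting. The following is only a sketch in base R: the helper name has_empty_string is made up for illustration, and it assumes the empty string is what trips write.dta, as described above.

```r
# Sample data from the question: column `a` holds an empty string.
x <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)

# Hypothetical helper: TRUE for character columns containing at least one "".
has_empty_string <- function(col) {
  is.character(col) && any(col == "", na.rm = TRUE)
}

# Names of the columns that would need attention before calling write.dta().
flagged <- names(x)[vapply(x, has_empty_string, logical(1))]
print(flagged)
```

Any column listed in flagged could then be recoded, or the whole frame handed to haven instead.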

Related

How to stop characters in column names from changing when using write.xlsx() to write data into an Excel document in R?

I write a data.frame into an Excel document with the write.xlsx function. The header of the data.frame contains characters like "95%CI", "Pr(>|W|)", etc. The data.frame prints in the R console without any problem, but when I write it to an Excel file through write.xlsx(), 95%CI becomes X95.CI, and Pr(>|W|) becomes Pr...W..
How to solve this problem?
The test code is as follows:
library("openxlsx")
mydata <- data.frame("95%CI" = 1,
"Pr(>|W|)" =2)
write.xlsx(mydata,
"test.xlsx",
sheetName = "test",
overwrite = TRUE,
borders = "all", colWidths="auto")
I don't think this code works correctly in the R console either.
mydata <- data.frame("95%CI" = 1,"Pr(>|W|)" =2)
mydata
# X95.CI Pr...W..
#1 1 2
You have some non-standard characters in the column names (like %, (, > etc.). If you want to keep them, use check.names = FALSE in the data.frame function.
mydata <- data.frame("95%CI" = 1,"Pr(>|W|)" =2, check.names = FALSE)
mydata
# 95%CI Pr(>|W|)
#1 1 2
Now when you write it to Excel, the names are kept as they are:
openxlsx::write.xlsx(mydata,
"test.xlsx",
sheetName = "test",
overwrite = TRUE,
borders = "all", colWidths="auto")
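The renaming is done by make.names(), which data.frame() applies to column names when check.names = TRUE (the default). A small base R sketch with the names from the question shows both behaviours; the variable raw_names is only introduced here for illustration.

```r
raw_names <- c("95%CI", "Pr(>|W|)")

# What data.frame() does to the names by default:
# invalid characters become "." and a leading digit gets an "X" prefix.
make.names(raw_names)
# "X95.CI"   "Pr...W.."

# With check.names = FALSE the names are kept exactly as typed.
mydata <- data.frame("95%CI" = 1, "Pr(>|W|)" = 2, check.names = FALSE)
names(mydata)
```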

R: Read csv numeric with comma in decimal, package sparklyr

I need to read a file of type ".csv" using the library "sparklyr", in which the numeric values appear with commas. The idea is to be able to read using "spark_read_csv()" directly.
I am using:
library(sparklyr)
library(dplyr)
f<-data.frame(DNI=c("22-e","EE-4","55-W"),
DD=c("33,2","33.2","14,55"),CC=c("2","44,4","44,9"))
write.csv(f,"aff.csv")
sc <- spark_connect(master = "local", spark_home = "/home/tomas/spark-2.1.0-bin-hadoop2.7/", version = "2.1.0")
df <- spark_read_csv(sc, name = "data", path = "/home/tomas/Documentos/Clusterapp/aff.csv", header = TRUE, delimiter = ",")
tbl <- sdf_copy_to(sc = sc, x =df , overwrite = T)
The problem is that the numbers are read as factor.
To manipulate strings inside a Spark data frame you can use the regexp_replace function, as mentioned here:
https://spark.rstudio.com/guides/textmining/
For your problem it would work out like this:
tbl <- sdf_copy_to(sc = sc, x =df, overwrite = T)
tbl0<-tbl%>%
mutate(DD=regexp_replace(DD,",","."),CC=regexp_replace(CC,",","."))%>%
mutate_at(vars(c("DD","CC")),as.numeric)
To check your result:
> glimpse(tbl0)
Observations: ??
Variables: 3
$ DNI <chr> "22-e", "EE-4", "55-W"
$ DD <dbl> 33.20, 33.20, 14.55
$ CC <dbl> 2.0, 44.4, 44.9
If you don't want to replace it with '.', maybe you can try this.
spark_read_csv
Check the documentation. Use escape parameter to specify which character you are trying to ignore.
In this case try the following (inside an R string literal the backslash has to be doubled):
df <- spark_read_csv(sc, name = "data", path = "/home/tomas/Documentos/Clusterapp/aff.csv", header = TRUE, delimiter = ",", escape = "\\,")
You could replace the "," in the numbers with "." and convert them to numeric. For instance
df$DD<-as.numeric(gsub(pattern = ",",replacement = ".",x = df$DD))
Does that help?
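The replace-then-convert idea can be checked locally on the sample values before moving to Spark. This is a base R sketch on an ordinary data frame, not on a Spark table; the helper comma_to_numeric and the vector num_cols are names made up for illustration.

```r
# Sample data from the question, kept as plain character columns.
f <- data.frame(DNI = c("22-e", "EE-4", "55-W"),
                DD  = c("33,2", "33.2", "14,55"),
                CC  = c("2", "44,4", "44,9"),
                stringsAsFactors = FALSE)

# Hypothetical helper: swap the decimal comma for a point, then convert.
comma_to_numeric <- function(x) {
  as.numeric(gsub(",", ".", x, fixed = TRUE))
}

# Apply it to every column that should be numeric.
num_cols <- c("DD", "CC")
f[num_cols] <- lapply(f[num_cols], comma_to_numeric)
str(f)
```

Values that already use a point, such as "33.2", pass through unchanged.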

How do I fix the “No encoding Supplied” error?

I am having difficulties after running the code and trying to export the dataset to a spreadsheet or .txt file.
I am a newbie to R, so maybe this question is trivial.
After running the following code:
eia_series <- function(api_key, series_id, start = NULL, end = NULL, num = NULL, tidy_data = "no", only_data = FALSE){
# max 100 series
# test if num is not null and either start or end is not null. Not allowed
# api_key test for character.
# series_id test for character.
# if start/end not null, then check if format matches series id date format
# parse date and numerical data
# parse url
series_url <- httr::parse_url("http://api.eia.gov/series/")
series_url$query$series_id <- paste(series_id, collapse = ";")
series_url$query$api_key <- api_key
series_url$query$start <- start
series_url$query$end <- end
series_url$query$num <- num
# get data
series_data <- httr::GET(url = series_url)
series_data <- httr::content(series_data, as = "text")
series_data <- jsonlite::fromJSON(series_data)
# Move data from data.frame with nested list and NULL existing
series_data$data <- series_data$series$data
series_data$series$data <- NULL
# parse data
series_data$data <- lapply(X = series_data$data,
FUN = function(x) data.frame(date = x[, 1],
value = as.numeric(x[, 2]),
stringsAsFactors = FALSE))
# add names to the list with data
names(series_data$data) <- series_data$series$series_id
# parse dates
series_data$data <- eia_date_parse(series_list = series_data$data, format_character = series_data$series$f)
# tidy up data
if(tidy_data == "tidy_long"){
series_data$data <- lapply(seq_along(series_data$data),
function(x) {cbind(series_data$data[[x]],
series_time_frame = series_data$series$f[x],
series_name = series_data$series$series_id[x],
stringsAsFactors = FALSE)})
series_data$data <- do.call(rbind, series_data$data)
}
# only data
if(only_data){
series_data <- series_data$data
}
return(series_data)
}
After running the function
eia_series(api_key = "XXX",series_id = c("PET.MCRFPOK1.M", "PET.MCRFPOK2.M"))
I tried to "transfer" the data in order to export it but got the following error:
No encoding supplied: defaulting to UTF-8.
I don't understand why. Could you help me out?
That doesn't look like an error but rather an informational message, probably coming from httr::content(series_data, as = "text"). See The body section of https://cran.r-project.org/web/packages/httr/vignettes/quickstart.html. It shouldn't be a problem as long as the call returns the data you expect. Otherwise you can try a different encoding, or there is a bug elsewhere.
Try:
series_data <- httr::content(series_data, as = "text", encoding = "UTF-8")
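The difference between a message and an error can be seen without calling the API. In this sketch fake_content is a hypothetical stand-in, not part of httr; it assumes the text is emitted with message(), which is consistent with the code carrying on afterwards.

```r
# Hypothetical stand-in: prints the same text as a message, then returns a value.
fake_content <- function() {
  message("No encoding supplied: defaulting to UTF-8.")
  "payload"
}

# The message is shown, but execution continues and the value is returned.
res <- fake_content()

# suppressMessages() silences the text without changing the result.
quiet <- suppressMessages(fake_content())
identical(res, quiet)
```

An error, by contrast, would stop the function before anything is returned.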

reading row names in read_csv2 (readr package)

I am trying to load an example dataset from here: http://www.agrocampus-ouest.fr/math/RforStat/decathlon.csv to run an example PCA.
The correctly loaded data frame can be replicated with this line of code:
decathlon = read.csv('http://www.agrocampus-ouest.fr/math/RforStat/decathlon.csv',
header = TRUE, row.names = 1, check.names = FALSE,
dec = '.', sep = ';')
However, I was wondering if this can be replicated with function(s) from the readr package. The suitable function for this seems to be read_csv2; however, the row.names argument is not available:
dplyrtlon = read_csv2('http://www.agrocampus-ouest.fr/math/RforStat/decathlon.csv',
col_names = TRUE, col_types = NULL, skip = 0)
Any suggestion on how to do this within readr?
readr returns tibbles instead of data frames. Tibbles are much faster and more memory efficient than data frames but do not support row names.
Depending on what you want to do with your data after reading it in, you could either add a column name to the first column (it looks like last names):
dplyrtlon <- read_csv2('http://www.agrocampus-ouest.fr/math/RforStat/decathlon.csv',
col_types = NULL, skip = 0)
names(dplyrtlon)[1] <- "last_name"
or you could convert the variable to a data frame, and use the content of the first column to set up row names:
r <- as.data.frame(dplyrtlon)
rownames(r) <- r[, 1]
r <- r[, -1]

R: Accessing and using names of data.frames within a list during an `apply` function

I'm trying to write a function that takes a list of data.frames (somelist) and writes them out to .csv files, each named the same as the data.frame being written (i.e., a1.csv, a2.csv, a3.csv).
somelist <- list(a1 = data.frame(a = 1, b = 2, c = 3),
a2 = data.frame(d = 4, e = 5, f = 6),
a3 = data.frame(g = 7, h = 8, i = 9))
csvout <- function (y) {
z <- deparse(substitute(y))
write.table(y, file = paste0("~/somefolder/",
z,
".csv"),
sep = ",",
row.names = FALSE)
print(z)
}
sapply(somelist, csvout)
That is the closest I have got; instead of the data.frame names, the files are named after what is printed for z:
[1] "X[[1L]]"
[1] "X[[2L]]"
[1] "X[[3L]]"
a1 a2 a3
"X[[1L]]" "X[[2L]]" "X[[3L]]"
In searching the documentation for an answer, I think I'm on the right track here in ?sapply, but I've been unable to connect the dots:
For historical reasons, the calls created by lapply are unevaluated,
and code has been written (e.g., bquote) that relies on this. This
means that the recorded call is always of the form FUN(X[[i]], ...),
with i replaced by the current (integer or double) index
Update
I can use a for loop using syntax similar to C programming...
for (i in 1:length(somelist)) {
write.table(somelist[[i]], file = paste0("~/somefolder/",
names(somelist)[i],
".csv"),
sep = ",",
row.names = FALSE)
}
But I'm trying to code more natively in R and I'm aware that the apply family, or at least using a vectorized for loop (i.e., for (all in somelist) { instead of for (i in 1:length(somelist)) {) is preferable for efficiency and proper coding etiquette.
This is really, really, really dirty, but I think it works as you described. Of course, for more than nine data.frames the substr(deparse(substitute(y)), 4, 4) part needs adjusting, since it only picks up a single digit.
csvout <- function (y, csvnames) {
write.table(y, file = paste0("test",
csvnames[as.numeric(substr(deparse(substitute(y)),4,4))],
".csv"),
sep = ",",
row.names = FALSE)
}
sapply(somelist, FUN=csvout, names(somelist))
I suppose you know that, but if you implemented a FOR-loop instead of sapply this would be much easier because you could directly reference the data.frame names with the names function.
Edit:
This is the FOR-loop solution which works no matter how many data.frames you've got:
csvout <- function (y) {
for (i in 1:length(y)){
write.table(y[[i]], file = paste0("test",
names(y)[i],
".csv"),
sep = ",",
row.names = FALSE)
}
}
csvout(somelist)
You can use lapply in a similar fashion by iterating over the indices, like lapply(seq_along(somelist), function(i){})
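Another way to stay within the apply family is to pass the names alongside the data frames, so nothing has to be recovered with deparse(substitute()). This is a sketch using base R's Map(); out_dir and written are names chosen for illustration, and tempdir() is only a placeholder for your own folder.

```r
somelist <- list(a1 = data.frame(a = 1, b = 2, c = 3),
                 a2 = data.frame(d = 4, e = 5, f = 6),
                 a3 = data.frame(g = 7, h = 8, i = 9))

# Assumption: write into a temporary folder; replace with your own path.
out_dir <- tempdir()

# Map() walks over the data frames and their names in parallel.
written <- Map(function(df, nm) {
  path <- file.path(out_dir, paste0(nm, ".csv"))
  write.table(df, file = path, sep = ",", row.names = FALSE)
  path  # return the path so the result records what was written
}, somelist, names(somelist))

print(basename(unlist(written)))
```

Because the name is an explicit argument, this works for any number of data frames.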
