I have a list containing 2 or more dataframes:
d <- data.frame(x=1:3, y=letters[1:3])
f <- data.frame(x=11:13, y=letters[11:13])
df <- list(d = d, f = f)
To save them as .csv, I use the following syntax:
filenames = paste0('C:/Output_', names(df), '.csv')
Map(write.csv, df, filenames)
But I would like to pass some additional arguments to obtain a specific format, like:
quote = FALSE, row.names = FALSE, sep = "\t", na = "", col.names = FALSE
The thing is that I am not sure where to add those arguments. Wherever I try, I get warnings saying that they have been ignored:
> Warning messages:
1: In (function (...) : attempt to set 'col.names' ignored
2: In (function (...) : attempt to set 'sep' ignored
3: In (function (...) : attempt to set 'col.names' ignored
4: In (function (...) : attempt to set 'sep' ignored
Any suggestions? In base R, preferably!
Why you're still getting col.names warnings: farther down in the documentation (?write.csv) you'll see
These wrappers [write.csv and write.csv2] are deliberately inflexible: they are designed to
ensure that the correct conventions are used to write a valid
file. Attempts to change ‘append’, ‘col.names’, ‘sep’, ‘dec’ or
‘qmethod’ are ignored, with a warning.
The warnings should go away if you use write.table() instead.
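You need to use an anonymous function in order to pass further arguments, and write.table() rather than write.csv() so that sep and col.names are not ignored, i.e.
Map(function(x, file) write.table(x, file, quote = FALSE, row.names = FALSE, sep = "\t", na = "", col.names = FALSE), df, filenames)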
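An alternative to the anonymous function is to hand the fixed arguments to mapply() through MoreArgs. The following is only a sketch under some assumptions: the list is named so that names() yields one file name per data frame, the files go to tempdir() instead of C:/, and the .tsv extension is chosen purely for illustration.

```r
d <- data.frame(x = 1:3, y = letters[1:3])
f <- data.frame(x = 11:13, y = letters[11:13])
dfs <- list(d = d, f = f)  # named, so names(dfs) is c("d", "f")

# Assumption: write to a temporary directory rather than C:/
filenames <- file.path(tempdir(), paste0("Output_", names(dfs), ".tsv"))

# write.table() honours sep and col.names, unlike write.csv()
invisible(mapply(
  write.table, dfs, filenames,
  MoreArgs = list(quote = FALSE, row.names = FALSE, sep = "\t",
                  na = "", col.names = FALSE),
  SIMPLIFY = FALSE
))

# Read the first file back to check the result
back <- read.table(filenames[1], sep = "\t", header = FALSE)
```

Each data frame is paired with its own file name, and every call receives the same formatting arguments.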
I have a wrapper function that contains many sub-functions. Rather than writing out all the potential arguments of each sub-function in the wrapper, I want to use ... (dots) so that any number of arguments can be passed through to change the behaviour of the sub-functions if necessary.
The problem is that several functions may wish to make use of different arguments from ..., and I keep getting unused argument errors.
So I've tried to use do.call and to update the formals() output with matches from ...; see the function code below:
elipRead <- function(type, path, ...) {
  if (type == "csv") {
    ar <- list(...)
    args <- formals(readr::read_csv)
    args$file <- path
    args[which(names(args) %in% names(ar))] <- ar[na.omit(match(names(args), names(ar)))]
    out <- do.call(readr::read_csv, args = args)
  } else {
    ar <- list(...)
    args <- formals(readxl::read_xlsx)
    args$path <- path
    args[which(names(args) %in% names(ar))] <- ar[na.omit(match(names(args), names(ar)))]
    out <- do.call(readxl::read_xlsx, args)
  }
  return(out)
}
However, despite checking that the args list is updating correctly, I still get errors:
csv <-"csv_Filename.csv"
test1 <- elipRead("csv", paste0(getwd(),csv), sheet = "Sheet1" , col_names = FALSE)
# Error in default_locale() : could not find function "default_locale"
xlsx <-"xlsx_Filename.xlsx"
test2 <- elipRead("xlsx", paste0(getwd(),xlsx), sheet = "Sheet1", col_names = TRUE)
# Error: `guess_max` must be a positive integer
For the xlsx attempt, the error is in the guess_max default, where it cannot find the n_max object. I assume this is to do with the do.call environment and n_max not being in the parent environment. For the csv attempt, the issue is again that it cannot find a function, here default_locale().
Error in check_non_negative_integer(guess_max, "guess_max") :
  object 'n_max' not found
6. check_non_negative_integer(guess_max, "guess_max")
5. check_guess_max(guess_max)
4. read_excel_(path = path, sheet = sheet, range = range, col_names = col_names,
       col_types = col_types, na = na, trim_ws = trim_ws, skip = skip,
       n_max = n_max, guess_max = guess_max, progress = progress,
       .name_repair = .name_repair, format = "xlsx")
3. (function (path, sheet = NULL, range = NULL, col_names = TRUE,
       col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf,
       guess_max = min(1000, n_max), progress = readxl_progress(),
       .name_repair = "unique") ...
2. do.call(readxl::read_xlsx, args)
1. elipRead("xlsx", paste0(add, xlsx), sheet = "Sheet1", col_names = TRUE)
In the end, there are three potential answers I'm hoping for:
1. Recommendations of changes to my current code to ensure the do.call function works.
2. An alternative method for using ... to pass only the relevant arguments from the dots list to a function.
3. A completely different approach for passing arguments from a wrapper to internal functions.
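One possible direction for the second option is sketched below. It keeps only the arguments that the target function declares and leaves every other default untouched, so defaults such as guess_max = min(1000, n_max) are evaluated inside the function where n_max exists. Everything here is an assumption made for illustration: call_with_known_args, read_a, read_b and elip_read are hypothetical stand-ins, not readr or readxl functions.

```r
# Hypothetical helper: keep only the arguments that `fn` declares,
# then call `fn` with them. Defaults are NOT copied from formals().
call_with_known_args <- function(fn, args) {
  known <- names(args) %in% names(formals(fn))
  do.call(fn, args[known])
}

# Stand-in readers with different argument sets (assumed for illustration)
read_a <- function(path, skip = 0, n_max = Inf, guess_max = min(1000, n_max)) {
  list(path = path, skip = skip, guess_max = guess_max)
}
read_b <- function(path, sheet = NULL) {
  list(path = path, sheet = sheet)
}

elip_read <- function(type, path, ...) {
  args <- c(list(path = path), list(...))
  reader <- if (type == "a") read_a else read_b
  call_with_known_args(reader, args)
}

# `sheet` is silently dropped for read_a, `skip` for read_b
res_a <- elip_read("a", "file.csv", sheet = "Sheet1", skip = 2)
res_b <- elip_read("b", "file.xlsx", sheet = "Sheet1", skip = 2)
```

Note that this name-based filter would also drop arguments meant for a target function's own ..., so it only suits functions whose arguments are all declared explicitly.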
I am using lapply to read a list of files. The files have multiple rows and columns, and I am interested in the first row of the first column. The code I am using is:
lapply(file_list, read.csv,sep=',', header = F, col.names=F, nrow=1, colClasses = c('character', 'NULL', 'NULL'))
The first row has three columns, but I am only reading the first one. From other posts on Stack Overflow I found that the way to do this is to use colClasses = c('character', 'NULL', 'NULL'). While this approach works, I would like to understand the underlying issue that produces the following warning message, and hopefully prevent it from popping up:
"In read.table(file = file, header = header, sep = sep, quote = quote, :
cols = 1 != length(data) = 3"
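It's to let you know that you're keeping just one column of the data out of three. The message reports a mismatch: read.table expects one column (cols = 1, apparently because col.names=F supplies a single name), while colClasses describes three (length(data) = 3). Note that your NULL is in quotation marks; "NULL" as a string is the documented colClasses value for skipping a column, and it behaves differently from the bare NULL used further below.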
An example:
write.csv(data.frame(fi=letters[1:3],
fy=rnorm(3,500,1),
fo=rnorm(3,50,2))
,file="a.csv",row.names = F)
write.csv(data.frame(fib=letters[2:4],
fyb=rnorm(3,5,1),
fob=rnorm(3,50,2))
,file="b.csv",row.names = F)
file_list=list("a.csv","b.csv")
lapply(file_list, read.csv,sep=',', header = F, col.names=F, nrow=1, colClasses = c('character', 'NULL', 'NULL'))
Which results in:
[[1]]
FALSE.
1 fi
[[2]]
FALSE.
1 fib
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
cols = 1 != length(data) = 3
Which is the same as if you used:
lapply(file_list, read.csv,sep=',', header = F, col.names=F,
nrow=1, colClasses = c('character', 'asdasd', 'asdasd'))
But the warning goes away (and you get the rest of the row as a result) if you do:
lapply(file_list, read.csv,sep=',', header = F, col.names=F,
nrow=1, colClasses = c( 'character',NULL, NULL))
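The difference between the quoted and the unquoted version comes down to how c() treats NULL, which a short check can illustrate (the variable names quoted and bare are only for this sketch):

```r
quoted <- c('character', 'NULL', 'NULL')  # three strings
bare   <- c('character', NULL, NULL)      # c() drops NULL entries

length(quoted)
length(bare)
identical(bare, 'character')
```

So in the last call colClasses is effectively just 'character', a vector of length one, which no longer disagrees with the single column name.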
You can see where errors and warnings come from in source code for a function by entering, for example, read.table directly without anything following it, then searching for your particular warning within it.
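Instead of scrolling through the printed source, the search can also be done programmatically. This is a rough sketch that assumes the warning text is built from a literal containing "length(data)", as the message above suggests; src and hits are names chosen for illustration.

```r
# Turn the function body into a character vector, one element per line
src <- deparse(read.table)

# Look for the fragment that appears in the warning message
hits <- grep("length(data)", src, fixed = TRUE, value = TRUE)
print(hits)
```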
I am trying to read a rather big file with the read.table.ffdf method from the library ff. Unfortunately, the column-names of this table contain whitespaces, tabs and other special characters. It looks roughly like this (but with ~400 columns):
attribute_1;next attribute;who creates, these horrible) column&nämes
198705;RXBR ;2017-07-05 00:00:00
This isn't pretty, I know, but I am forced to work with this, so I have to set check.names to FALSE.
Furthermore, I am generating a list with the column-class-types which I do like this:
path <- 'path_to_csv-file'
headset <- read.csv(path, sep= ';', dec= '.', header = TRUE, nrows = 2, check.names = FALSE)
#print(headset)
headclasses <- vector(mode = 'character', length = 0)
#heavily simplified version - switch_statement is in an extra function
for(i in colnames(headset)){
headclasses[[i]] <- switch (i,
'attribute_1' = 'numeric',
'next attribute' = 'factor',
'who creates, these horrible) column&nämes' = 'POSIXct'
)
}
#print(colnames(headset))
#print(headclasses)
Now, if I call:
df <- read.table.ffdf(file=path, levels = NULL, appendLevels = TRUE, FUN = 'read.table', na.strings = c('\\N',''), sep= ';', dec= '.', colClasses = headclasses, check.names = FALSE , header = TRUE, nrows = 1e4, VERBOSE = TRUE)
I get the following error:
Error in repnam(colClasses, colnames(x), default = NA) :
the following argument names do not match'next attribute','(who creates, these horrible column&nämes)'
Why do I get this error? And how can I fix it so that I have the uglier strings as column names?
Note, that in the previous call, check.names is set to FALSE.
My work so far:
1. Trying with proper names but wrong check.names option when calling read.table.ffdf
If I let R choose proper column-names (i.e. check.names = TRUE in the first call to a read-method) and adjust the switch-statement accordingly, I get no error at all (yet a warning) even if I set check.names = FALSE in the read.table.ffdf-method:
headset <- read.csv(path, sep= ';', dec= '.', header = TRUE, nrows = 2)
print(headset)
headclasses <- vector(mode = 'character', length = 0)
#heavily simplified version - switch_statement is in an extra function
for(i in colnames(headset)){
headclasses[[i]] <- switch (i,
'attribute_1' = 'numeric',
'next.attribute' = 'factor',
'who.creates..these.horrible..column.nämes' = 'POSIXct'
)
}
print(colnames(headset))
print(headclasses)
my_df <- read.table.ffdf(file=path, levels = NULL, appendLevels = TRUE, FUN = 'read.table', na.strings = c('\\N',''), sep= ';', dec= '.', colClasses = headclasses, check.names = FALSE , header = TRUE, nrows = 2, VERBOSE = TRUE)
print(my_df)
print(colnames(my_df))
"attribute_1" "next.attribute" "who.creates..these.horrible..column.nämes"
Warning message:
In read.table(na.strings = c("\N", ""), sep = ";", dec = ".", colClasses = list( :
not all columns named in 'colClasses' exist
So this works, when it shouldn't?
Of course, leaving out check.names when calling read.table.ffdf works in the same way, so somewhere something goes missing.
2. Checking source Code of read.table.ffdf
I went to the rdrr.io site (read.table.ffdf-source-code) to check the source code and tried to understand, what I am doing wrong. To cut it short, this is what happens to my file:
rt.args <- list(na.strings = c('\\N',''), sep= ';', dec= '.', colClasses = headclasses, check.names = FALSE , header = TRUE, nrows = 2)
rt.args$file <- path
asffdf_args <- list()
FUN <- 'read.table'
dat <- do.call(FUN, rt.args)
x <- do.call("as.ffdf", c(list(dat), asffdf_args))
#print(colnames(dat))
#print(colnames(x))
and this yields
"attribute_1" "next attribute" "who creates, these horrible) column&nämes"
"attribute_1" "next.attribute" "who.creates..these.horrible..column.nämes"
Ok, so this is where it goes wrong.
I don't know which asffdf_args to pass and since I am kind of new to R, I am not sure what to look for exactly other than some kind of check.names equivalent. I already had a look at the as.ffdf.data.frame method via
getAnywhere(as.ffdf.data.frame)
but that didn't help me understand what I should put in.
So, how can I make read.table.ffdf-work with the uglier column-names? Which 'asffdf_args' do I have to pass to make check.names = FALSE work in said method?
I could adapt my switch-statement (for roughly 400 columns), read the file with check.names = TRUE and after read.table.ffdf is done, I could set the column names to the desired ones (since I have to work with the nastier names later on). But this classifies as a workaround for me and does not satisfy me at all.
This is my first question here, so be gentle with me, if I am overlooking something major and feel free to push me in the right direction.
Thanks in advance for the help.
As is, you probably cannot pass arguments the way you would like to.
as.ffdf.data.frame() calls ffdf() on its last line.
ffdf in turn calls make.names a few times, without checking any arguments.
If you edit ffdf(), and comment out the line vnam <- make.names(vnam, unique = TRUE) towards the very end of the function, then as.ffdf.data.frame() will be able to retain your funky column names.
I am not providing the modified version of ffdf as the function is more than 300 lines long.
I have tested with a new function ffdf_new, injecting it as follows:
# save original version
orig <- ff::ffdf
# devtools::install_github("miraisolutions/godmode")
godmode:::assignAnywhere("ffdf", ffdf_new)
# simple test below
DF <- data.frame(
'attribute_1' = 1:10,
'next attribute' = 3:12,
'who creates, these horrible) column&nämes' = 11:20,
check.names = FALSE
)
as.ffdf.data.frame(DF)[["who creates, these horrible) column&nämes"]]
## ff (open) integer length=10 (10)
## [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
## 11 12 13 14 15 16 17 18 19 20
# switch back
godmode:::assignAnywhere("ffdf", orig)
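If patching ffdf() is not an option, the workaround mentioned in the question (read with syntactic names, then restore the original ones) does not require hand-editing the switch statement. The sketch below uses base R only; the two column names and the small data frame are stand-ins for the real file, and wanted, safe and lookup are hypothetical names. It does not call ff at all.

```r
# Desired classes, keyed by the original (non-syntactic) column names
wanted <- c("attribute_1" = "numeric", "next attribute" = "factor")

# The names as R rewrites them when check.names = TRUE
safe <- make.names(names(wanted), unique = TRUE)

# colClasses keyed by the rewritten names, for the reading step
headclasses <- setNames(unname(wanted), safe)

# Lookup table from rewritten names back to the original ones
lookup <- setNames(names(wanted), safe)

# Stand-in for the data as read with check.names = TRUE
df <- data.frame(attribute_1 = 1:2, next.attribute = c("a", "b"))

# Restore the original names afterwards
names(df) <- unname(lookup[names(df)])
```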
Consider the following two commands:
> write.csv(irfilt,'foo.bar',row.names=FALSE)
#works fine but:
> write.csv(irfilt,'foo.bar',row.n=FALSE)
Error in write.table(irfilt, "foo.bar", row.n = FALSE, col.names = NA, :
'col.names = NA' makes no sense when 'row.names = FALSE'
I would have expected row.n to auto-expand to row.names but apparently that's not happening. There isn't any other allowed argument to write.table which could be confused with row.names. Does anyone know what is causing this misinterpretation? I thought it might be related to the fact that write.csv has no named arguments, but it seems odd that I wouldn't just get an error message about an unknown argument, rather than a misinterpreted arg.
You don't get any partial argument matching inside of write.csv because write.csv's only argument is .... So write.csv's attempt to manipulate your call fails here:
rn <- eval.parent(Call$row.names)
Call$col.names <- if (is.logical(rn) && !rn) TRUE else NA
And row.n is matched to row.names in the call to write.table, but the write.table call generated by write.csv is:
write.table(irfilt, "foo.bar", row.n = FALSE, col.names = NA,
sep = ",", dec = ".", qmethod = "double")
Which is why you're getting the error about col.names = NA while row.names = FALSE.
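The two matching behaviours can be reproduced without write.csv. In this sketch, f and g are hypothetical functions: f mimics a function whose only argument is ... and looks up row.names in its own call, while g declares row.names as a formal argument.

```r
# Only `...`: the lookup in the call needs the full name to be present
f <- function(...) {
  Call <- match.call(expand.dots = TRUE)
  Call$row.names
}

# A declared formal: the abbreviated name is partially matched
g <- function(x, row.names = TRUE) row.names

full  <- f(row.names = FALSE)  # the element is found
short <- f(row.n = FALSE)      # nothing named row.names in the call
named <- g(1, row.n = FALSE)   # row.n is matched to row.names
```

Here short is NULL, which corresponds to the rn value that leads write.csv to set col.names = NA.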
I am using the TraMineR package. I am printing output to a CSV file, like this:
write.csv(seqient(sequences.seq), file = "diversity_measures.csv", quote = FALSE, na = "", row.names = TRUE)
write.csv(seqici(sequences.seq), file = "diversity_measures.csv", quote = FALSE, na = "", row.names = TRUE, append= TRUE)
write.csv(seqST(sequences.seq), file = "diversity_measures.csv", quote = FALSE, na = "", row.names = TRUE, append= TRUE)
The dput(sequences.seq) object can be found here.
However, this does not append the output properly, but produces this warning message:
In write.csv(seqST(sequences.seq), file = "diversity_measures.csv", : attempt to set 'append' ignored
Additionally, it only gives me the output of the last command, so it seems to overwrite the file each time.
Is it possible to get all the columns in a single CSV file, with a column name for each (i.e. entropy, complexity, turbulence)?
You can use append=TRUE in write.table calls and use the same file name, but you'll need to specify all the other arguments as needed. append=TRUE is not available for the wrapper function write.csv, as noted in the documentation:
These wrappers are deliberately inflexible: they are designed to
ensure that the correct conventions are used to write a valid file.
Attempts to change append, col.names, sep, dec or qmethod are ignored,
with a warning.
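A minimal sketch of the write.table route, with made-up data frames first and second and a temporary file standing in for the TraMineR output: the header is written once, and later calls append rows only.

```r
out <- tempfile(fileext = ".csv")

first  <- data.frame(id = 1:2, value = c(0.5, 0.7))
second <- data.frame(id = 3:4, value = c(0.1, 0.9))

# First call creates the file and writes the header
write.table(first, out, sep = ",", row.names = FALSE,
            col.names = TRUE, qmethod = "double")

# Later calls append and must suppress the header
write.table(second, out, sep = ",", row.names = FALSE,
            col.names = FALSE, append = TRUE, qmethod = "double")

combined <- read.csv(out)
```

Appending adds rows below the existing ones; it does not add new columns next to them.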
Or, to get one column per measure, you could combine them into a single data frame and write it out once:
write.csv(data.frame(entropy=seqient(sequences.seq),
complexity=seqici(sequences.seq),
turbulence=seqST(sequences.seq)),
'output.csv')