This question is almost the same as a previous question, but differs enough that the answers for that question don't work here. Like #chase in the last question, I want to write out a separate file for each split of a data frame, in the following format (custom FASTA).
#same df as last question
df <- data.frame(
var1 = sample(1:10, 6, replace = TRUE)
, var2 = sample(LETTERS[1:2], 6, replace = TRUE)
, theday = c(1,1,2,2,3,3)
)
#how I want the data to look
write(paste(">", df$var1,"_", df$var2, "\n", df$theday, sep=""), file="test.txt")
#whole df output looks like this:
#test.txt
>1_A
1
>8_A
1
>4_A
2
>9_A
2
>2_A
3
>1_A
3
However, instead of getting the output from the entire dataframe I want to generate individual files for each subset of data. Using d_ply as follows:
d_ply(df, .(theday), function(x) write(paste(">", df$var1,"_", df$var2, "\n", df$theday, sep=""), file=paste(x$theday,".fasta",sep="")))
I get the following output error:
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument
In addition: Warning messages:
1: In if (file == "") file <- stdout() else if (substring(file, 1L, :
the condition has length > 1 and only the first element will be used
2: In if (substring(file, 1L, 1L) == "|") { :
the condition has length > 1 and only the first element will be used
Any suggestions on how to get around this?
Thanks,
zachcp
There were two problems with your code.
First, in constructing the file name, you passed the vector x$theday to paste(). Since x$theday is a column of the data.frame subset, it has one element per row and so often has more than one element. The error you saw was write() complaining when you passed several file names to its file= argument. Using unique(x$theday) instead ensures that you only ever paste together a single file name.
Second, you didn't get far enough to see it, but you probably want to write the contents of x (the current subset of the data.frame), rather than the entire contents of df to each file.
Here is the corrected code, which appears to work just fine.
d_ply(df, .(theday),
function(x) {write(paste(">", x$var1,"_", x$var2, "\n", x$theday, sep=""),
file=paste(unique(x$theday),".fasta",sep=""))
})
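For comparison, here is a minimal base R sketch of the same idea that does not rely on plyr. It is only an illustration: the fixed data values, the out_dir location (a temporary directory) and the use of writeLines() are assumptions made here so the result is reproducible, not part of the original code.

```r
# Deterministic stand-in for the question's data frame (values are made up).
df <- data.frame(
  var1   = c(1, 8, 4, 9, 2, 1),
  var2   = c("A", "A", "B", "A", "B", "A"),
  theday = c(1, 1, 2, 2, 3, 3),
  stringsAsFactors = FALSE
)

out_dir <- tempdir()  # assumption: write somewhere disposable

# split() returns a named list; the names are the distinct values of theday.
pieces <- split(df, df$theday)

for (day in names(pieces)) {
  x <- pieces[[day]]  # the current subset, not the whole df
  records <- paste0(">", x$var1, "_", x$var2, "\n", x$theday)
  # 'day' is a single string, so exactly one file name is built per subset.
  writeLines(records, file.path(out_dir, paste0(day, ".fasta")))
}

fasta_files <- file.path(out_dir, paste0(names(pieces), ".fasta"))
first_file <- readLines(fasta_files[1])
```

Because the file name comes from the list name rather than from a column of the subset, it cannot accidentally have more than one element.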
library(tidyverse)
y <- read_tsv("assignment_data.tsv")
x <- 1
When I check R console I get the following:
> y <- read_tsv("assignment_data.tsv", header=TRUE)
Error in read_tsv("assignment_data.tsv", header = TRUE) :
unused argument (header = TRUE)
>
> x <- 1
>
However, I can only access x in the global environment and I can't visualize the data in the file I tried to import.
Regarding your error:
Error in read_tsv("assignment_data.tsv", header = TRUE) :
unused argument (header = TRUE)
If you use
?read_tsv
you will find that header is not one of the arguments. Instead, you are looking for col_names.
Edit:
We found out that the problem lay in the tsv itself. The number of column names did not match the number of columns implied by the data.
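A mismatch like this can be checked before importing. The sketch below uses base R's count.fields() on a small hypothetical file; the file contents and variable names are invented for illustration and are not the asker's data.

```r
# Hypothetical file: three header names, but four fields in each data row.
path <- tempfile(fileext = ".tsv")
writeLines(c("id\tname\tscore",
             "1\ta\t0.5\tx",
             "2\tb\t0.7\ty"), path)

# count.fields() reports the number of tab-separated fields on each line.
n_fields <- count.fields(path, sep = "\t")

header_fields <- n_fields[1]           # fields in the header line
data_fields   <- unique(n_fields[-1])  # distinct field counts in the data rows

# TRUE when at least one data row disagrees with the header.
mismatch <- any(data_fields != header_fields)
```

If mismatch is TRUE, the header or the data rows need fixing (or col_names needs to be supplied explicitly) before the import can line up.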
I have a group of .xls files containing data for different periods of the year. I would like to merge them so that I have all the data in one file. I tried the following code:
#create files list
setwd("~/2010")
file.list <- list.files( pattern = ".*\\.xls$", full.names = TRUE )
When I continue, I get some warnings but I don't think they are relevant. See below:
#read files
> l <- lapply( file.list, readxl::read_excel )
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, ... :
Expecting numeric in F1944 / R1944C6: got '-'
2: In read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, ... :
Expecting numeric in H1944 / R1944C8: got '-'
Then, I run the following line and the problems with the attributes pop up:
> dt <- data.table::rbindlist( l, use.names = TRUE, fill = TRUE )
Error in data.table::rbindlist(l, use.names = TRUE, fill = TRUE) :
Class attribute on column 15 of item 4 does not match with column 15 of item 1.
Can someone help me to fix this? Many thanks in advance
If you are going to bind together two datasets, the classes of the columns must match. Yours apparently do not. So you somehow need to address these mismatches.
Because you did not supply a col_types argument to readxl::read_excel, it is guessing column types. I assume you expect the columns to be the same class in all of the data frames (otherwise, why bind them?), in which case you could pass a col_types argument so that readxl::read_excel doesn't have to guess.
The warnings here are useful: I think they are saying that a column was guessed to be numeric but then the parser encountered a "-". Maybe this led to the column being assigned class "character" in some files. Perhaps "-" appears in the raw data to indicate a missing value. Then passing na = c("", "-") to readxl::read_excel could resolve the issue.
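If re-reading the files is not convenient, the mismatch can also be located and repaired after import. The following base R sketch uses two small made-up data frames in place of the imported sheets. The column names, the to_numeric() helper and the assumption that "-" means "missing" are all illustrative.

```r
# Two hypothetical sheets after import: 'rain' came out numeric in one
# and character in the other because of a "-" placeholder.
sheets <- list(
  jan = data.frame(station = c("a", "b"), rain = c(1.2, 3.4),
                   stringsAsFactors = FALSE),
  feb = data.frame(station = c("a", "b"), rain = c("-", "5.6"),
                   stringsAsFactors = FALSE)
)

# Matrix of column classes: one row per variable, one column per sheet.
col_classes <- sapply(sheets, function(d) sapply(d, function(col) class(col)[1]))

# Hypothetical helper: treat "" and "-" as missing, then convert to numeric.
to_numeric <- function(col) {
  col <- as.character(col)
  col[col %in% c("", "-")] <- NA  # assumption: "-" marks a missing value
  as.numeric(col)
}

# Harmonise the offending column in every sheet, then bind.
sheets <- lapply(sheets, function(d) {
  d$rain <- to_numeric(d$rain)
  d
})
combined <- do.call(rbind, sheets)
```

Rows of col_classes that contain more than one distinct class are the columns that would stop the sheets from being bound.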
I have a very stupid question. It has been already asked, but none of the solutions provided seem to work with me.
I am looping over a list containing different data frames, to perform an analysis and save an output file named differently for each input data frame. The name would be something like originalname_output.txt.
I wrote this piece of code which seems to work fine (does all the analysis in the correct ways), but gives an error when coming to the write.table part.
library(qqman)
library(QuASAR)
list_QuASAR <- list (Fw, Rv, tot) #all of them are dfs
for (i in list_QuASAR){
output <- fitQuasarMpra(i[,2], i[,3], i[,4])
print(sum(output$padj_quasar<0.1))
qq(output$pval3, col = "black", cex = 1)
write.table(output, paste0("quasar_output/", i, "_output.txt"), col.names = T, sep = "\t")
}
fitQuasarMpra is a function of a package called QuASAR. Of course the subdirectory called quasar_output already exists.
The error I am getting is:
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument
In addition: Warning message:
In if (file == "") file <- stdout() else if (is.character(file)) { :
the condition has length > 1 and only the first element will be used
I know it's a trivial problem but I am currently stuck. I may switch to lapply, but then I may encounter the same problem, so I wanted to solve this first.
Many thanks for your help.
You're trying to use a data frame object (i) as part of a file name; i.e. the data frame itself, not its name. You could try iterating over a named list instead:
list_QuASAR <- list (Fw = Fw,Rv = Rv,tot = tot)
for (i in names(list_QuASAR)){
output <- fitQuasarMpra(list_QuASAR[[i]][,2], list_QuASAR[[i]][,3], list_QuASAR[[i]][,4])
print(sum(output$padj_quasar<0.1))
qq(output$pval3, col = "black", cex = 1)
write.table(output, paste0("quasar_output/", i, "_output.txt"), col.names = T, sep = "\t")
}
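To see why the original call produced more than one file name, it may help to look at what paste0() does with a data frame. This is a small sketch with a made-up stand-in for Fw; no QuASAR functions are involved.

```r
# Stand-in data frame; the real Fw comes from the asker's analysis.
Fw <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Pasting a data frame yields one string per column, not a single name.
bad <- paste0("quasar_output/", Fw, "_output.txt")
n_bad <- length(bad)

# Pasting the names of a named list yields one file name per data frame.
dfs <- list(Fw = Fw, Rv = Fw)
good <- paste0("quasar_output/", names(dfs), "_output.txt")
```

Here n_bad equals the number of columns in Fw, which is why write.table received several file names at once. The same names()-based approach carries over unchanged to lapply(names(dfs), ...).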
I created a function clean_to_CSV(df) that takes in a data frame, cleans it, spits it back out, and also writes it to a CSV, with the CSV's filename using the inputted name of the dataset:
clean_to_CSV <- function(df) {
# df <- # code that cleans the df (runs with no errors)
write.csv(df, file = paste0(deparse(substitute(df)), "_clean.csv"), row.names = FALSE)
df
}
However, this returns:
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument
In addition: Warning messages:
3: In if (file == "") file <- stdout() else if (is.character(file)) { :
the condition has length > 1 and only the first element will be used
This is very puzzling because taking the exact same write.csv... line and running it outside the function works perfectly. Also, I know that writing to CSV and returning the df don't interfere with each other. Finally, I did look at related SO posts, but they were either more complex questions or didn't have a solid answer. None was such a simple case, where one line of code works outside a function but not inside it.
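Once the cleaning code (the part you commented out) reassigns df, df is an ordinary local variable rather than the unevaluated argument, so substitute(df) returns the data frame's value instead of the name it was passed under. deparse() then produces a multi-element character vector, which is why write.csv receives more than one file name.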
To fix it, just capture the name of the object as soon as you enter the function, and then reference that value later. Here's an example.
clean_to_CSV <- function(df) {
obj_name = deparse(substitute(df))
# df <- # code that cleans the df (runs with no errors)
write.csv(df, file = paste0(obj_name, "_clean.csv"), row.names = FALSE)
df
}
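The difference can be observed directly with a small experiment. The function and data below are hypothetical, and complete.cases() merely stands in for whatever the real cleaning step does.

```r
# Hypothetical function that records what deparse(substitute(df)) gives
# before and after df is reassigned inside the function.
name_before_and_after <- function(df) {
  before <- deparse(substitute(df))  # df is still the unevaluated argument
  df <- df[complete.cases(df), , drop = FALSE]  # stand-in for the cleaning code
  after <- deparse(substitute(df))   # df is now an ordinary local variable
  list(before = before, after = after)
}

my_data <- data.frame(x = c(1, NA, 3), y = c("a", "b", "c"))
res <- name_before_and_after(my_data)

res$before  # the name the caller used
res$after   # a deparsed copy of the data frame, not a name
```

Capturing the name on the first line of the function, as in the fix above, avoids depending on when df gets reassigned.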
The question says it all - I want to take a list object full of data.frames and write each data.frame to a separate .csv file where the name of the .csv file corresponds to the name of the list object.
Here's a reproducible example and the code I've written thus far.
df <- data.frame(
var1 = sample(1:10, 6, replace = TRUE)
, var2 = sample(LETTERS[1:2], 6, replace = TRUE)
, theday = c(1,1,2,2,3,3)
)
df.daily <- split(df, df$theday) #Split into separate days
lapply(df.daily, function(x){write.table(x, file = paste(names(x), ".csv", sep = ""), row.names = FALSE, sep = ",")})
And here is the top of the error message that R spits out
Error: Results must have one or more dimensions.
In addition: Warning messages:
1: In if (file == "") file <- stdout() else if (is.character(file)) { :
the condition has length > 1 and only the first element will be used
What am I missing here?
Try this:
sapply(names(df.daily),
function (x) write.table(df.daily[[x]], file=paste(x, "txt", sep=".") ) )
You should see the names ("1", "2", "3") spit out one by one, but the NULLs are the evidence that the side-effect of writing to disk files was done. (Edit: changed [] to [[]].)
You could use mapply:
mapply(
write.table,
x=df.daily, file=paste(names(df.daily), "txt", sep="."),
MoreArgs=list(row.names=FALSE, sep=",")
)
There is a thread about a similar problem on the plyr mailing list.
A couple of things:
laply performs operations on a list. What you're looking for is d_ply. And you don't have to break it up by day, you can let plyr do that for you. Also, I would not use names(x) as that returns all of the column names of a data.frame.
d_ply(df, .(theday), function(x) write.csv(x, file=paste(unique(x$theday),".csv",sep=""),row.names=FALSE))
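The point about names(x) can be checked in base R. The sketch below rebuilds the split list with fixed, made-up values and compares the names of the list with the names of one of its elements.

```r
# Fixed stand-in for the question's data frame (values are illustrative).
df <- data.frame(
  var1   = c(3, 7, 2, 9, 5, 1),
  var2   = c("A", "B", "A", "B", "A", "B"),
  theday = c(1, 1, 2, 2, 3, 3)
)
df.daily <- split(df, df$theday)

# Names of the list: one per day, suitable for file names.
list_names <- names(df.daily)

# Names of a single element: the column names of that data frame.
element_names <- names(df.daily[[1]])
```

Inside lapply(df.daily, function(x) ...), x is a single element, so names(x) gives the three column names and paste() builds three file names instead of one.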