Use of lapply with custom functions of several arguments - r

I am having trouble using lapply with a custom function I created, but it is a fairly specific function and I haven't been able to find an answer that suits me.
The function is quite long and does a lot of things, but I think I've managed to trim it down to a reproducible example that gives the error I am getting.
The thing is, I have a folder with two different types of files, that I want to load into R as elements of a list. They have more or less the same information, but they have different layout, different file extension, different almost everything, so the process to import them is different for each type.
The function goes like this:
f <- function(a_file, b_file, type){
if (type == "a"){act <- read.table(a_file, skip = 19, header = TRUE, sep = "", dec = ".")}
if (type == "b"){act <- read.table(b_file, header=FALSE, sep="\t", dec=".")}
return(act)}
Then I create vectors with the names of the two types of files I want to call, like this:
a_files <- dir(pattern=".deg")
b_files <- dir(pattern=".act")
And finally try to apply the function like this:
act_list <- lapply(a_files, f, b_files, type = "b")
which works if type = "a", but fails for type = "b", giving the error:
Error in file(file, "rt") : invalid 'description' argument
which I am pretty sure has to do with the fact that I am applying the function only to the "a_files" vector and not to "b_files", but as much as I try I can't figure out a way to fix it...
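The failure can be reproduced in miniature. The sketch below uses a hypothetical stand-in function g (not the original f) and made-up file names, and only reports how many file names each argument receives. It suggests that lapply iterates over a_files alone and hands the whole b_files vector to every call, so with type = "b" read.table would receive several file names at once.

```r
# Hypothetical stand-in for f(): counts the file names each argument receives.
g <- function(a_file, b_file, type) {
  c(a = length(a_file), b = length(b_file))
}

# Made-up file names, for illustration only.
a_files <- c("x1.deg", "x2.deg")
b_files <- c("y1.act", "y2.act", "y3.act")

# lapply() walks over a_files one element at a time;
# b_files is passed unchanged (all three names) on every call.
res <- lapply(a_files, g, b_files, type = "b")
res[[1]]
```

A connection can only be opened on a single file name, which is consistent with the "invalid 'description' argument" error.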

There is a much simpler way to solve your problem.
First, define the function to have only one argument for the filename. Then decide inside it what type of file it is and do the right type of read.
readFiles = function(file){
  # Choose the reader from the file extension; note that both branches read `file`
  if(grepl(file, pattern = "\\.deg")){
    act <- read.table(file, skip = 19, header = TRUE, sep = "", dec = ".")
  }
  if(grepl(file, pattern = "\\.act")){
    act <- read.table(file, header=FALSE, sep="\t", dec=".")
  }
  return(act)
}
Finally, you only have to call lapply on your files vector:
filesVector = dir(pattern = "(\\.act|\\.deg)")
result = lapply(filesVector, readFiles)
filesVector will contain the names of all files that contain ".act" or ".deg".
NOTE: Your pattern is not correct. An unescaped "." matches any character, so ".deg" matches every file name that contains any character followed by "deg" (and likewise for ".act").
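The difference between the two patterns can be checked with grepl. This is a small sketch on made-up file names; anchoring the pattern with "$" is an extra assumption on my part (that the extension sits at the end of the name) and is not part of the answer above.

```r
# Made-up file names, for illustration only.
files <- c("run1.deg", "run1.act", "my_degrees.txt", "notes_act.csv")

# Unescaped dot: any character followed by "deg" matches.
loose <- grepl(".deg", files)

# Escaped dot, anchored at the end of the name (assumption: extension is last).
strict <- grepl("\\.deg$", files)

data.frame(files, loose, strict)
```

Here "my_degrees.txt" is picked up by the loose pattern because "_deg" satisfies "any character followed by deg", while the escaped pattern accepts only "run1.deg".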

Related

Dynamically dropping files while reading using lapply

I've got the following code to read several files and append them separately to a single list along with the file name
foo <- function(fname){
fread(fname, skip = 5, header = TRUE, sep = " ") %>%
mutate(fn = fname)
}
all <- lapply(files, FUN = foo)
After the file is read, I would like to add a condition inside the function that checks some properties of the file and, if the check fails, drops both the file and its name from the list.
This is not strictly about reading tables; it applies to other kinds of files as well.
Edit:
I also use the following efficient method to name the list elements after the files:
all <- setNames(lapply(files, foo), files)
I tried the following fairly simple option using Filter, with the condition placed inside the Filter call:
all <- setNames(lapply(myFiles, function(x) {readLAS(x)}), myFiles)
all <- Filter(function(x) {area(x) >= 640}, all)
This does not drop the files while lapply is running, but it needs only one extra line of code.
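If the check should happen inside the reading function itself, one possible approach is to return NULL for files that fail the check and then remove the NULL entries. The sketch below is only an illustration under assumptions: it uses base read.csv on two temporary files, and the function read_if_ok and the row-count condition are hypothetical stand-ins for the real reader and the real property check.

```r
# Two small temporary CSV files stand in for the real inputs (made-up data).
f_small <- tempfile(fileext = ".csv")
f_large <- tempfile(fileext = ".csv")
write.csv(data.frame(v = 1:2), f_small, row.names = FALSE)
write.csv(data.frame(v = 1:5), f_large, row.names = FALSE)
files <- c(f_small, f_large)

# Hypothetical reader: returns NULL when the file fails the check.
read_if_ok <- function(fname, min_rows = 3) {
  dat <- read.csv(fname)
  if (nrow(dat) < min_rows) return(NULL)  # the condition is a placeholder
  dat$fn <- fname
  dat
}

all_data <- setNames(lapply(files, read_if_ok), files)
# Drop the entries (and their names) that were rejected.
all_data <- Filter(Negate(is.null), all_data)
names(all_data)
```

Because the names are attached before filtering, the names of rejected files disappear together with their entries.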

Write.table using looped variable in a for loop

I have a very stupid question. It has been already asked, but none of the solutions provided seem to work with me.
I am looping over a list containing different data frames, to perform an analysis and save an output file named differently for each input data frame. The name would be something like originalname_output.txt.
I wrote this piece of code which seems to work fine (does all the analysis in the correct ways), but gives an error when coming to the write.table part.
library(qqman)
library(QuASAR)
list_QuASAR <- list (Fw, Rv, tot) # all of them are data frames
for (i in list_QuASAR){
output <- fitQuasarMpra(i[,2], i[,3], i[,4])
print(sum(output$padj_quasar<0.1))
qq(output$pval3, col = "black", cex = 1)
write.table(output, paste0("quasar_output/", i, "_output.txt"), col.names = T, sep = "\t")
}
fitQuasarMpra is a function of a package called QuASAR. Of course the subdirectory called quasar_output already exists.
The error I am getting is:
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument
In addition: Warning message:
In if (file == "") file <- stdout() else if (is.character(file)) { :
the condition has length > 1 and only the first element will be used
I know it's a trivial problem but I am currently stuck. I may consider to switch and use lapply, but then I may encounter the same problem and I wanted to solve this first.
Many thanks for your help.
You're trying to use a data frame object (i) as part of a file name; i.e. the data frame itself, not its name. You could try iterating over a named list instead:
list_QuASAR <- list (Fw = Fw,Rv = Rv,tot = tot)
for (i in names(list_QuASAR)){
output <- fitQuasarMpra(list_QuASAR[[i]][,2], list_QuASAR[[i]][,3], list_QuASAR[[i]][,4])
print(sum(output$padj_quasar<0.1))
qq(output$pval3, col = "black", cex = 1)
write.table(output, paste0("quasar_output/", i, "_output.txt"), col.names = T, sep = "\t")
}
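The effect can be seen without QuASAR. The following sketch uses two made-up data frames and writes to a temporary directory (both are assumptions for illustration); it contrasts pasting a data frame into a file name with pasting its name.

```r
# Made-up data frames standing in for Fw and Rv.
Fw <- data.frame(x = 1:2, y = 3:4)
Rv <- data.frame(x = 5:6, y = 7:8)
dfs <- list(Fw = Fw, Rv = Rv)

# Pasting the data frame itself yields one string per column,
# so the "file name" is a vector of length > 1.
bad <- paste0("quasar_output/", Fw, "_output.txt")
length(bad)

# Iterating over the names yields exactly one file name per data frame.
out_dir <- file.path(tempdir(), "quasar_output")
dir.create(out_dir, showWarnings = FALSE)
for (nm in names(dfs)) {
  write.table(dfs[[nm]], file.path(out_dir, paste0(nm, "_output.txt")),
              col.names = TRUE, sep = "\t")
}
written <- list.files(out_dir)
```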

R unable to detect that I have more than one column in loaded files

What I want to do is take every file in the subdirectory that I am in and essentially just shift the column header names over one left.
I try to accomplish this by using fread in a for loop:
library(data.table)
## I need to write this script to reorder the column headers, which are now apparently out of whack
## I just need to shift them over one
filelist <- list.files(pattern = ".*.txt")
for(i in 1:length(filelist)){
assign(filelist[[i]], fread(filelist[[i]], fill = TRUE))
names(filelist[[i]]) <- c("RowID", "rsID", "PosID", "Link", "Link.1","Direction", "Spearman_rho", "-log10(p)")
}
However, I keep getting the following or a variant of the following error message:
Error in names(filelist[[i]]) <- c("RowID", "rsID", "PosID", "Link", "Link.1", :
'names' attribute [8] must be the same length as the vector [1]
Which is confusing to me because, as you can clearly see above, R Studio is able to load the files as having the correct number of columns. However, the error message seems to imply that there is only one column. I have tried different functions, such as colnames, and I have even tried to define the separator as being quotation marks (as my files were previously generated by another R script that quotation-separated the entries), to no luck. In fact, if I try to define the separator as such:
for(i in 1:length(filelist)){
assign(filelist[[i]], fread(filelist[[i]], sep = "\"", fill = TRUE))
names(filelist[[i]]) <- c("RowID", "rsID", "PosID", "Link", "Link.1","Direction", "Spearman_rho", "-log10(p)")
}
I get the following error:
Error in fread(filelist[[i]], sep = "\"", fill = TRUE) :
sep == quote ('"') is not allowed
Any help would be appreciated.
I think the problem is that, despite the name, list.files returns a character vector, not a list, so using [[ isn't right. Then, with assign, you create objects that have the same names as the files (not good practice; it would be better to use a list). Then you try to modify the names of the object you created, but you pass only the character string holding the object's name. To use an object whose name is in a character string, you need to use get (which is part of why using a list is better than creating a bunch of objects).
To be more explicit, let's say that filelist = c("data1.txt", "data2.txt"). Then, when i = 1, this code: assign(filelist[[i]], fread(filelist[[i]], fill = TRUE)) creates a data table called data1.txt. But your next line, names(filelist[[i]]) <- ... doesn't modify your data table, it modifies the first element of filelist, which is the string "data1.txt", and that string indeed has length 1.
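A small sketch of that distinction, using a made-up data frame in place of the result of fread and hypothetical file names:

```r
filelist <- c("data1.txt", "data2.txt")   # hypothetical file names

# assign() creates an object literally named "data1.txt".
assign(filelist[1], data.frame(a = 1, b = 2))

# filelist[1] is still only a length-1 character string,
# which is why giving it 8 names fails.
target_length <- length(filelist[1])

# get() fetches the object by name so it can be modified and stored again.
obj <- get(filelist[1])
names(obj) <- c("x", "y")
assign(filelist[1], obj)
new_names <- names(get(filelist[1]))
```

The get/modify/assign round trip works, but it is clumsy, which is one reason a list of data tables tends to be easier to handle.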
I recommend reading your files into a list instead of using assign to create objects.
filelist <- list.files(pattern = ".*.txt")
datalist <- lapply(filelist, fread, fill = TRUE)
names(datalist) <- filelist
For changing the names, you can use data.table::setnames instead:
for(dt in datalist) setnames(dt, c("RowID", "rsID", "PosID", "Link", "Link.1","Direction", "Spearman_rho", "-log10(p)"))
However, fread has a col.names argument, so you can just do it in the read step directly:
my_names <- c("RowID", "rsID", "PosID", "Link", "Link.1","Direction", "Spearman_rho", "-log10(p)")
datalist <- lapply(filelist, fread, fill = TRUE, col.names = my_names)
I would also suggest not using "-log10(p)" as a column name - nonstandard column names (with parens and -) are usually more trouble than they are worth.
Could you run the following code to have a closer look at what you are putting into filelist?
i <- 1
assign(filelist[[i]], fread(filelist[[i]], fill = TRUE))
print(filelist[[i]])
I suspect you may need to use the code below instead of the assign statement
filelist[[i]] <- fread(filelist[[i]], fill = TRUE)

How to automate read.csv command in R?

I'm doing something stupid and I cannot get write.csv to write a lot of files.
If I write:
write.csv(X1, file = "X1.csv")
Then it writes a ~2mb csv file which is ok. I have around 2000 variables in memory and I've tried
for (i in seq_along(fotos)) {
write.csv(paste("X", i, sep = ""), file = paste(paste("X", i, sep = ""),"csv", sep="."))}
I obtain the desired files, but they are ~2kb each and X1.csv contains only one cell saying "X1.csv". All the files are similar (X1000.csv contains "X1000.csv"), unlike the command write.csv(X1, file = "X1.csv"), which creates a file X1.csv containing a 96x96 matrix.
Any idea of what I'm doing wrong?
Many thanks in advance.
You can get the object by name with the function get. However, it is much better to read the data frames into a list than into objects related by having common names.
So you can create a list of the data frames:
X <- lapply(seq_along(fotos), function(i) get(paste0("X", i)))
names(X) <- fotos
And then write them (and this is what you'd use if you had a list to start with):
lapply(names(X), function(name) write.csv(X[[name]], paste(name, 'csv', sep='.')))
You could try using the get() function
for (i in seq_along(fotos)) {
write.csv(get(paste("X", i, sep = "")), file = paste(paste("X", i, sep = ""),"csv", sep="."))}
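As a rough illustration of both answers, the sketch below collects existing objects into a named list with mget and writes each one to a file. The two small matrices and the temporary output directory are made up for the example; the real objects would be X1 to X2000.

```r
# Two made-up matrices standing in for the objects already in memory.
X1 <- matrix(1:4, nrow = 2)
X2 <- matrix(5:8, nrow = 2)

# paste0("X", 1) is only the string "X1"; mget() looks the objects up by name
# and returns them as a named list.
obj_names <- paste0("X", 1:2)
X <- mget(obj_names)

# Write each element to "<name>.csv" in a temporary directory.
out_dir <- tempdir()
for (nm in names(X)) {
  write.csv(X[[nm]], file.path(out_dir, paste0(nm, ".csv")))
}

# Read one file back to confirm it holds the matrix, not its name.
check <- read.csv(file.path(out_dir, "X1.csv"), row.names = 1)
dim(check)
```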

Executing function on objects of name 'i' within for-loop in R

I am still pretty new to R and very new to for-loops and functions, but I searched quite a bit on stackoverflow and couldn't find an answer to this question. So here we go.
I'm trying to create a script that will (1) read in multiple .csv files and (2) apply a function that strips Twitter handles from URLs and does some other things to these files. I have developed scripts for these two tasks separately, so I know that most of my code works, but something goes wrong when I try to combine them. I prepare for doing so using the following code:
# specify directory for your files and replace 'file' with the first, unique part of the
# files you would like to import
mypath <- "~/Users/you/data/"
mypattern <- "file+.*csv"
# Get a list of the files
file_list <- list.files(path = mypath,
pattern = mypattern)
# List of names to be given to data frames
data_names <- str_match(file_list, "(.*?)\\.")[,2]
# Define function for preparing datasets
handlestripper <- function(data){
data$handle <- str_match(data$URL, "com/(.*?)/status")[,2]
data$rank <- c(1:500)
names(data) <- c("dateGMT", "url", "tweet", "twitterid", "rank")
data <- data[,c(4, 1:3, 5)]
}
That all works fine. The problem comes when I try to execute the function handlestripper() within the for-loop.
# Read in data
for(i in data_names){
filepath <- file.path(mypath, paste(i, ".csv", sep = ""))
assign(i, read.delim(filepath, colClasses = "character", sep = ","))
i <- handlestripper(i)
}
When I execute this code, I get the following error: Error in data$URL : $ operator is invalid for atomic vectors. I know that this means that my function is being applied to the string I called from within the vector data_names, but I don't know how to tell R that, in this last line of my for-loop, I want the function applied to the objects of name i that I just created using the assign command, rather than to i itself.
Inside your loop, you can change this:
assign(i, read.delim(filepath, colClasses = "character", sep = ","))
i <- handlestripper(i)
to
tmp <- read.delim(filepath, colClasses = "character", sep = ",")
assign(i, handlestripper(tmp))
I think you should make as few get and assign calls as you can, but there's nothing wrong with indexing your loop with names as you are doing. I do it all the time, anyway.
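For comparison, here is a sketch that avoids get and assign altogether by keeping the data frames in a named list. Everything in it is an assumption for illustration: the temporary CSV files, the simplified function add_handle (a stand-in for handlestripper that only extracts the handle), and the base-R sub call used in place of str_match.

```r
# Temporary folder with two made-up CSV files.
in_dir <- file.path(tempdir(), "handles_demo")
dir.create(in_dir, showWarnings = FALSE)
write.csv(data.frame(URL = "https://twitter.com/alice/status/1"),
          file.path(in_dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(URL = "https://twitter.com/bob/status/2"),
          file.path(in_dir, "file2.csv"), row.names = FALSE)

# Simplified, hypothetical stand-in for handlestripper().
add_handle <- function(data) {
  data$handle <- sub(".*com/([^/]+)/status.*", "\\1", data$URL)
  data  # return the modified data frame explicitly
}

files <- list.files(in_dir, pattern = "^file.*\\.csv$", full.names = TRUE)
data_names <- tools::file_path_sans_ext(basename(files))

# Read and process each file, then name the list elements after the files.
datasets <- lapply(files, function(p) add_handle(read.csv(p, colClasses = "character")))
names(datasets) <- data_names
datasets$file1$handle
```

Each processed data frame is then available as datasets[["file1"]], datasets[["file2"]], and so on, without creating separate objects in the workspace.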
