Is it possible to run lapply such that the X list argument is used as the second argument to FUN and the first argument to FUN is skipped?
One example is rjson::fromJSON(json_str, file, [other arguments]). I have a list containing several file paths of json files and would like to read each of them, collapsing the results into a list.
Normally, lapply would be ideal for this. However, in order to read from a file, the json_str argument must not be supplied at all, not even as NULL. This is because fromJSON uses missing to check whether arguments are given. If both file and json_str are given, an error is thrown.
That means that lapply(files, fromJSON, json_str = NULL) will not work. I'm aware that I could work around this by manually making my own function as follows.
result <- lapply(files, function(file) {
fromJSON(file = file)
})
However, this seems cumbersome and unnecessary. Is there some cleaner way of doing this?
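One possible way to avoid the wrapper, sketched under assumptions: Map (a thin wrapper around mapply) forwards the names of its arguments to FUN, so the list can be bound to the second formal by name while the first stays missing. The function read_either below is a hypothetical stand-in that mimics the missing() check described above; it is not rjson's code, and the file names are illustrative only.

```r
# Hypothetical stand-in for a function that, like the fromJSON described
# above, uses missing() and rejects calls where both arguments are given.
read_either <- function(json_str, file) {
  if (!missing(json_str) && !missing(file)) {
    stop("supply json_str or file, not both")
  }
  if (missing(file)) paste("string:", json_str) else paste("file:", file)
}

files <- c("a.json", "b.json")  # illustrative names; nothing is read from disk

# Map() passes each element as read_either(file = files[[i]]),
# so json_str is never supplied and missing(json_str) stays TRUE.
result <- Map(read_either, file = files)
```

With the real function the equivalent call would presumably be Map(fromJSON, file = files); note that Map names the result after the input strings, which lapply does not do.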
Related
I have merged a bunch of csv files but can't get them to export to one file correctly. What am I doing wrong? The data shows up in my console, but I get an error that says:
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class "function" to a data.frame
setwd("c:/users/adam/documents/r data/NBA/DK/TEMP")
filenames <- list.files("c:/users/adam/documents/r data/NBA/DK/TEMP")
do.call("rbind",lapply(filenames, read.csv, header = TRUE))
write.csv(read.csv, file ='Lineups.csv')
You did not assign the results of do.call function to anything. Fairly common R noob error. Failure to understand the functional programming paradigm. Results need to be assigned to R names or they just get garbage-collected.
The error is actually from the code that you didn't put in a code block:
write.csv(read.csv, file ='Lineups.csv')
The 'read.csv' was presumably your intended name for the result of the do.call-operation, except it is by default a function name rather than your expectation. You could assign the do.call-results to the name 'read.csv' but doing so is very poor practice. Choose a more descriptive name like 'TEMP_files_appended'.
TEMP_files_appended <- do.call("rbind",lapply(filenames, read.csv, header = TRUE))
write.csv(TEMP_files_appended, file ='Lineups.csv')
(I will observe that using header=TRUE for read.csv is not needed since that is the default for that function.)
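For a version that can be run anywhere, here is a minimal sketch of the same pattern. The directory, file names and columns are made up for illustration; full.names = TRUE is used so that the code does not depend on setwd.

```r
# Build two small CSV files in a temporary directory (illustrative data)
tmp <- file.path(tempdir(), "lineups_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(player = c("A", "B"), pts = c(10, 12)),
          file.path(tmp, "day1.csv"), row.names = FALSE)
write.csv(data.frame(player = c("C", "D"), pts = c(8, 15)),
          file.path(tmp, "day2.csv"), row.names = FALSE)

# Full paths, so read.csv finds the files whatever the working directory is
filenames <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Assign the combined result to a name, then write that object out
combined <- do.call(rbind, lapply(filenames, read.csv))
out <- file.path(tempdir(), "Lineups.csv")
write.csv(combined, file = out, row.names = FALSE)
```

Writing the output outside the input directory also keeps a re-run from picking up Lineups.csv as one of the inputs.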
I want to read multiple files. To do this I use a generic function read_list
read_list(file_list, read_fun)
Assigning different read function to the argument read_fun I can read different kinds of files, i.e. read.csv for reading csv files, read_dta for STATA files, etc.
Now, I need to read some csv files where the first four lines need to be skipped. Thus, instead of passing read.csv as an argument to read_list, I would like to pass read.csv with the skip argument set to 4. Is it possible to do this in R? I tried
my_read_csv <- function(...){
read.csv(skip = 4, ...)
}
It seems to work, but I would like to confirm that this is the right way to do it. I think that functions being objects in R is a fantastic and very powerful feature of the language, but I'm not very familiar with R closures and scoping rules, thus I don't want to inadvertently make some big mistake.
You can simply rewrite your read_list to add the unnamed argument qualifier ... at the end and then replace the call to
read_fun(file) with read_fun(file, ...).
This will allow you to write the following syntax:
read_list(files, read.csv, skip = 4)
which will be equivalent to using your current read_list with a custom read function:
read_list(files, function(file)read.csv(file, skip = 4))
Also, be aware that read_list sounds an awful lot like a "reinvent the wheel" function. If you describe the behaviour of read_list a little more, I can expand.
Possible alternatives may be
read_list <- function(files, read_fun, ...)lapply(files, read_fun, ...)
# in this case read_list is identical to lapply
read_list <- function(files, read_fun, ...)do.call(rbind, lapply(files, read_fun, ...))
# This will rbind() all the files to one data.frame
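To check that the two approaches agree, here is a small sketch. It assumes read_list is the plain lapply variant shown above, and it builds a throwaway file with four junk lines before the header.

```r
# read_list as the simple lapply wrapper (an assumption about its behaviour)
read_list <- function(files, read_fun, ...) lapply(files, read_fun, ...)

# A temporary CSV whose first four lines must be skipped
path <- tempfile(fileext = ".csv")
writeLines(c("junk 1", "junk 2", "junk 3", "junk 4",
             "x,y", "1,2", "3,4"), path)

# Option 1: forward skip = 4 through ...
dfs <- read_list(path, read.csv, skip = 4)

# Option 2: the asker's wrapper, which fixes skip and forwards the rest
my_read_csv <- function(...) read.csv(..., skip = 4)
dfs_wrapped <- read_list(path, my_read_csv)

same <- identical(dfs, dfs_wrapped)
```

The wrapper is a perfectly valid approach; its only limitation is that a caller who also passes skip gets a "matched by multiple actual arguments" error.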
I'm not sure if read_list is specialized to your specific task in some way but you can use lapply along with read.csv to read a list of files:
# generate fake file names
files <- paste0('file_', 1:10, '.csv')
# Read files using lapply
dfs <- lapply(files, read.csv, skip = 4)
The third argument of lapply is ..., which allows you to pass additional arguments to the function you're applying. In this case, we can use ... to pass the skip = 4 argument to read.csv.
I am trying to write a program to open a large number of files and run them through a function I made called "sort". Every one of my file names starts with "sa1", however after that the characters vary based on the file. I was hoping to do something along the lines of this:
for(x in c("Put","Characters","which","Vary","by","File","here")){
sa1+x <- read.csv("filepath/sa1+x",header= FALSE)
sa1+x=sort(sa1+x)
return(sa1+x)
}
In this case, say that x was 88. It would open the file sa188, name that dataframe sa188, and then run it through the function sort. I don't think that writing sa1+x is the correct way to bind together two values, but I don't know of a way to do it.
You need to use a list to contain the data in each csv file, and loop over the filenames using paste0.
file_suffixes <- c("put","characters","which","vary","by","file","here")
numfiles <- length(file_suffixes)
list_data <- list()
sorted_data <- list()
filename <- "filepath/sa1"
for (x in 1:numfiles) {
list_data[[x]] <- read.csv(paste0(filename, file_suffixes[x]), header=FALSE)
sorted_data[[x]] <- sort(list_data[[x]])
}
I am not sure why you use return in that loop. If you're writing a function, you should be returning the sorted_data list which contains all your post-sorting data.
Note: you shouldn't call your function sort because there is already a base R function called sort.
Additional note: you can use dir() and regex parsing to find all the files which start with "sa1" and loop over all of them, thus freeing you from having to specify the file_suffixes.
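The dir() idea from the last note could look roughly like this. The demo directory and file contents are invented, and sort_rows is a hypothetical placeholder for the asker's own function (renamed so that it does not mask base sort).

```r
# Create two example files whose names start with "sa1" (illustrative data)
demo_dir <- file.path(tempdir(), "sa1_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(c(3, 1, 2)), file.path(demo_dir, "sa188"),
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(data.frame(c(9, 7, 8)), file.path(demo_dir, "sa1xy"),
            sep = ",", row.names = FALSE, col.names = FALSE)

# Hypothetical placeholder for the asker's function: order rows by column 1
sort_rows <- function(df) df[order(df[[1]]), , drop = FALSE]

# "^sa1" matches every file name that begins with sa1
paths <- list.files(demo_dir, pattern = "^sa1", full.names = TRUE)

# Read and process each file, keeping the results in a list named by file
sorted_data <- lapply(paths, function(p) sort_rows(read.csv(p, header = FALSE)))
names(sorted_data) <- basename(paths)
```

Individual results are then available as sorted_data[["sa188"]] and so on, which replaces the need for separately named data frames.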
I have created a list of objects in my work environment
data <- c("variable1", "variable2", "variable3")
I would like to save the files to different directories with the variable name as the directory, so I did this to give me a list of file names to pass to the save function via lapply:
paste0(data,"/",data,".rda")
lapply(data,FUN=save,file = paste0(data,"/",data,".rda"))
I get the error
Error in FUN(X[[i]], ...) : object ‘X[[i]]’ not found
I'm not sure what I'm doing wrong here.
Do you have a list of objects, or a list of names of objects? You say you have the former, but the code you give is for the latter.
Also, if you only have one object per file, then it's better to use the saveRDS function (and readRDS to load it).
lapply(data, function(x) saveRDS(get(x), paste0(x, "/", x, ".rds")))
If you have to use save:
lapply(data, function(x) save(list=x, file=paste0(x, "/", x, ".rda")))
Several things going on here.
First, you need not use lapply when you don't care about the return value of the function called at each iteration. It offers nothing in this case.
Second, and more importantly, what you are doing is writing objects to files with names derived from their variable names in R. That's an anti-pattern.
Instead, create a list of the objects, and use a for loop for the work. We need to use saveRDS for this (thanks Hong Ooi) as l[[n]] is also not the name of an object in the environment.
l <- list(variable1 = variable1, variable2 = variable2, variable3=variable3)
for (n in names(l)) {
fname <- paste0(n, '/', n, '.rds')  # saveRDS output conventionally uses .rds
saveRDS(l[[n]], file = fname)
}
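A runnable round trip of the same idea, as a sketch: the objects are toy values, and base_dir is a hypothetical location under tempdir(). Note that the per-object directories have to exist before saving, which the loop above takes for granted.

```r
# Toy objects standing in for the real ones
variable1 <- 1:3
variable2 <- letters[1:2]
l <- list(variable1 = variable1, variable2 = variable2)

base_dir <- file.path(tempdir(), "rds_demo")  # hypothetical output location
for (n in names(l)) {
  # saveRDS does not create missing directories, so make them first
  dir.create(file.path(base_dir, n), recursive = TRUE, showWarnings = FALSE)
  saveRDS(l[[n]], file = file.path(base_dir, n, paste0(n, ".rds")))
}

# readRDS returns the object itself, so it can be bound to any name
restored <- readRDS(file.path(base_dir, "variable1", "variable1.rds"))
```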
It would be better to just save the entire list, but then all the data would be in one file in one directory.
As for what's actually wrong with your code:
You pass the same value for file to all invocations of save, and you don't intend to do so. This value is a vector, but what you want is that each iteration gets one element from this vector.
The way lapply passes each element to the function confuses save, which recovers object names from the unevaluated argument expressions. In particular, save does this:
names <- as.character(substitute(list(...)))[-1L]
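That captures the argument as an unevaluated expression rather than as a value. Under lapply the expression is X[[i]] (hence the error message), which stands for something like the following and is not the name of an object in the environment.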
c("variable1", "variable2", "variable3")[[1]]
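The name-capturing behaviour can be reproduced in isolation. In this sketch, show_names is a hypothetical helper that contains only the quoted line from save; everything else is toy data.

```r
# Hypothetical helper: the same expression-capturing idiom save() uses
show_names <- function(...) as.character(substitute(list(...)))[-1L]

variable1 <- 1

# Called directly, the argument expression is the symbol variable1
direct <- show_names(variable1)

# Called through lapply, the argument expression is lapply's own X[[i]]
captured <- lapply("variable1", show_names)
```

Here direct is "variable1", while captured[[1]] is the text of the expression lapply used, which matches the object name reported in the error.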
I am trying to source multiple functions, that differ by a number in the name.
For example: func1, func2.
I tried using "func_1", and "func_2", as well as putting the number first, "1func" and "2func". No matter how I index the function names, the source function just reads in one function that it calls "func" - which is not what I want.
I have tried using for-loops and sapply:
for-loop:
func.list <- list.files(path="/some_path",pattern="some pattern",full.names=TRUE)
for(i in 1:length(func.list)){
source(func.list[i])
}
sapply:
sapply(func.list,FUN=source)
I am going to be writing multiple versions of a data correction function, and would really like to be able to index them - because giving a concise, but specific, name would be difficult, and not allow me to selectively source just the function files from their directory.
In my code, func.list gives the output (I have replaced the actual directory because of privacy/contractual issues):
[1] "mypath/1resp.correction.R"
[2] "mypath/2resp.correction.R"
Then when I source func.list with either the for-loop or sapply code (listed above), R only loads one function named resp.correction, with the code body from "2resp.correction.R".
The argument to source is a file name, not a function name. So you cannot be fancy here: you need to provide the exact filenames.
It sounds like your two files contain the definitions of a function with the same name (resp.correction) in both files, so yes, as you source one file after the other, the function is overwritten in your global environment.
You could, inside your loop, reassign the function to a different name:
func.list <- list.files(path="/some_path",pattern="some pattern",full.names=TRUE)
for(i in seq_along(func.list)) {
source(func.list[i], local = TRUE)
assign(paste0("resp.correction", i), resp.correction, envir = .GlobalEnv)
}
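An alternative sketch that avoids assign and keeps the global environment clean: source each file into its own environment and collect the functions in a named list. The two demo files and their function bodies are invented here; only the file naming follows the question.

```r
# Two demo files that both define resp.correction (illustrative bodies)
fn_dir <- file.path(tempdir(), "resp_demo")
dir.create(fn_dir, showWarnings = FALSE)
writeLines("resp.correction <- function(x) x + 1",
           file.path(fn_dir, "1resp.correction.R"))
writeLines("resp.correction <- function(x) x * 2",
           file.path(fn_dir, "2resp.correction.R"))

func.list <- list.files(fn_dir, pattern = "resp\\.correction\\.R$",
                        full.names = TRUE)

# Source each file into a fresh environment, then pull the function out
corrections <- lapply(func.list, function(path) {
  env <- new.env()
  source(path, local = env)
  get("resp.correction", envir = env)
})
names(corrections) <- basename(func.list)

v1 <- corrections[["1resp.correction.R"]](10)
v2 <- corrections[["2resp.correction.R"]](10)
```

Each version is then selected by file name (or position) from the list, so neither definition overwrites the other.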