How to loop through input variables for a function in R?

I have a function, biovars(prec, tmin, tmax), with three variables as input. prec, tmin and tmax are multi-band GeoTIFFs. However, there are several TIFFs stored in different directories, and they need to be stacked before they can be passed to the function:
stack("/path1/to/file/prec1.tif")
stack("/path2/to/file/tmin1.tif")
stack("/path3/to/file/tmax1.tif")
How can I loop through each directory, stack the first file and use the result of the loop as input for the function:
biovars(prec1,tmin1,tmax1)
biovars(prec2,tmin2,tmax2)
...
biovars(precn,tminn,tmaxn)
Is there an easy way for doing this? I don't think it is possible with three for loops...

From your description, I think you can do something like this:
library(dismo)
# setwd("root of data files")
fprec <- list.files(".", pattern = "prec", recursive = TRUE, full.names = TRUE)
# derive the matching tmin/tmax input paths and the bio output paths
ftmin <- gsub("prec", "tmin", fprec)
ftmax <- gsub("prec", "tmax", fprec)
fbio <- gsub("prec", "bio", fprec)
for (i in seq_along(fprec)) {
  prec <- brick(fprec[i])
  tmin <- brick(ftmin[i])
  tmax <- brick(ftmax[i])
  b <- biovars(prec, tmin, tmax, filename = fbio[i])
}
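The loop above assumes that every prec file has tmin and tmax partners whose paths differ only in the variable name. A minimal sketch for checking that before any raster work, assuming a layout of one subdirectory per variable (the `climate_demo` tree and its files are throwaway examples created in a temp directory):

```r
# Build a throwaway directory tree that mimics the assumed layout
root <- file.path(tempdir(), "climate_demo")
for (v in c("prec", "tmin", "tmax")) {
  dir.create(file.path(root, v), recursive = TRUE, showWarnings = FALSE)
}
invisible(file.create(file.path(root, "prec", c("prec1.tif", "prec2.tif"))))
invisible(file.create(file.path(root, "tmin", c("tmin1.tif", "tmin2.tif"))))
invisible(file.create(file.path(root, "tmax", "tmax1.tif")))  # tmax2.tif deliberately missing

# Same derivation as in the answer: swap the variable name in the path
fprec <- list.files(root, pattern = "prec", recursive = TRUE, full.names = TRUE)
ftmin <- gsub("prec", "tmin", fprec)
ftmax <- gsub("prec", "tmax", fprec)

# Keep only the triplets where all three files exist
complete <- file.exists(ftmin) & file.exists(ftmax)
if (any(!complete)) {
  message("Missing tmin/tmax partner for: ",
          paste(basename(fprec[!complete]), collapse = ", "))
}
fprec <- fprec[complete]
ftmin <- ftmin[complete]
ftmax <- ftmax[complete]
```

Note that gsub() rewrites every occurrence of "prec" in the full path, which is what you want when the directories are also named after the variable.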

I always prefer to use the *apply functions and to avoid get(). They also lead to easily readable code (IMO).
prec_files = c("/path/prec1.tif", "/path/prec2.tif", ..., "/path/precn.tif")
tmin_files = c("/path/tmin1.tif", "/path/tmin2.tif", ..., "/path/tminn.tif")
tmax_files = c("/path/tmax1.tif", "/path/tmax2.tif", ..., "/path/tmaxn.tif")
prec_stacks <- lapply(prec_files, stack)
tmin_stacks <- lapply(tmin_files, stack)
tmax_stacks <- lapply(tmax_files, stack)
results <- mapply(FUN = biovars, prec_stacks, tmin_stacks, tmax_stacks, SIMPLIFY = FALSE)
If biovars returns a useful object, you can store the results of the mapply call. If you hit memory limits because (I assume) all the rasters are loaded at once, you can wrap the loading and the biovars call into one function:
stack_and_biovar <- function(prec_path, tmin_path, tmax_path) {
  prec <- stack(prec_path)
  tmin <- stack(tmin_path)
  tmax <- stack(tmax_path)
  biovars(prec, tmin, tmax)
}
results <- mapply(FUN = stack_and_biovar, prec_files, tmin_files, tmax_files, SIMPLIFY = FALSE)
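Because mapply's default SIMPLIFY = TRUE tries to collapse the results, Map (which always returns a list) is often the more natural fit for raster objects. A sketch of the same parallel iteration, using a hypothetical stand-in `fake_biovars` (not dismo's function) that only reports which files it received, so it runs without any rasters; the two-file vectors are made-up examples:

```r
# Hypothetical stand-in for biovars(): it only reports which files it received
fake_biovars <- function(prec_path, tmin_path, tmax_path) {
  paste(basename(c(prec_path, tmin_path, tmax_path)), collapse = " + ")
}

prec_files <- c("/path/prec1.tif", "/path/prec2.tif")
tmin_files <- c("/path/tmin1.tif", "/path/tmin2.tif")
tmax_files <- c("/path/tmax1.tif", "/path/tmax2.tif")

# Map() walks the three vectors in parallel (element i of each together)
# and always returns a list
results <- Map(fake_biovars, prec_files, tmin_files, tmax_files)
names(results) <- basename(prec_files)
results[["prec2.tif"]]
```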

If I understand you correctly, there is a simple way to do it using either pmap or pwalk from the purrr package.
Basically, you list all the files matching the given patterns, then use pmap/pwalk to loop through that list and pass the corresponding file names to raster::stack and biovars.
library(tidyverse)
library(dismo)  # provides biovars()
pathGIS <- "/path1/to/file/"
filePattern <- list("^prec", "^tmax", "^tmin")
# List files and sort them numerically using mixedsort from gtools
fileList <- map(filePattern, ~ gtools::mixedsort(list.files(path = pathGIS, recursive = TRUE,
                                                            pattern = .x, full.names = TRUE)))
head(fileList)
# ..1, ..2, ..3 correspond to the sublist order within fileList: prec, tmax, tmin.
# biovars() expects (prec, tmin, tmax), so ..3 (tmin) goes before ..2 (tmax).
# pwalk() runs biovars only for its side effects; pmap() also returns the results as a list.
pwalk(fileList, ~ biovars(raster::stack(..1), raster::stack(..3), raster::stack(..2)))
results <- pmap(fileList, ~ biovars(raster::stack(..1), raster::stack(..3), raster::stack(..2)))
Simple example
fileList <- list(c("./GIS/prec1.tif", "./GIS/prec2.tif"),
c("./GIS/tmax1.tif", "./GIS/tmax2.tif"),
c("./GIS/tmin1.tif", "./GIS/tmin2.tif"))
dummy_function <- function(precFile, tmaxFile, tminFile) {
  cat("\n-----\n")
  print(precFile)
  print(tmaxFile)
  print(tminFile)
  cat("-----\n")
}
pwalk(fileList, ~ dummy_function(..1, ..2, ..3))
-----
[1] "./GIS/prec1.tif"
[1] "./GIS/tmax1.tif"
[1] "./GIS/tmin1.tif"
-----
-----
[1] "./GIS/prec2.tif"
[1] "./GIS/tmax2.tif"
[1] "./GIS/tmin2.tif"
-----
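The mixedsort call is there because plain character sorting does not follow the numbers in the file names, so "prec10.tif" can land before "prec2.tif". If you would rather not depend on gtools, a base-R sketch that orders files by the digits in their base names, assuming each file name contains exactly one number (`natural_sort` is a hypothetical helper):

```r
files <- c("./GIS/prec10.tif", "./GIS/prec2.tif", "./GIS/prec1.tif")

sort(files)  # character sort: not numeric order (exact result depends on locale)

# Order by the number embedded in the base name (assumes one number per name)
natural_sort <- function(x) {
  num <- as.integer(gsub("\\D", "", basename(x)))
  x[order(num)]
}
natural_sort(files)
```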

Related

Name Seurat function output in R with the name of each experiment/variable

I am using Seurat to analyze some scRNA-seq data. I have managed to put all the one-line SCT integration commands from satijalab into a function, basically:
SCT_normalization <- function(f1, f2) {
  f_merge <- merge(f1, y = f2)
  f.list <- SplitObject(f_merge, split.by = "stim")
  f.list <- lapply(X = f.list, FUN = SCTransform)
  features <- SelectIntegrationFeatures(object.list = f.list, nfeatures = 3000)
  f.list <<- PrepSCTIntegration(object.list = f.list, anchor.features = features)
  return(f.list)
}
so that I will have f.list in the global environment for downstream analysis and making plots. The problem I am running into is that, every time I run the function, the output would be f.list, I want it to be specific to the input value name (i.e., f1 and/or f2). Basically something that I can set so that I would know which input value was used to generate the final output. I saw something using the assign function but someone wrote a warning about "the evil and wrong..." so I am not sure as to how to approach this.
From what it sounds like, you don't need the superassignment operator <<- at all. I don't think <<- should be used here, as it can cause unexpected changes to objects outside the function; I assume this is what the other person was warning about. For example, take the following function:
AverageVector <- function(v) x <<- mean(v, na.rm = TRUE)
Now suppose you're finding the average of a vector, along with some other analysis:
library(tidyverse)
x <- unique(iris$Species)
avg_sl <- AverageVector(iris$Sepal.Length)
Now x, which used to be a factor of species names, has silently become a numeric vector of length 1.
So I would remove the <<- and call your function like this:
object_list_1_2 <- SCT_normalization(object1, object2)
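To see both behaviours side by side, here is a runnable sketch of the AverageVector example: the <<- version overwrites the global x, while a version that simply returns its value (`AverageVector2`, a name introduced here for illustration) leaves it alone:

```r
# A function that writes its result into the calling environment's x with <<-
AverageVector <- function(v) x <<- mean(v, na.rm = TRUE)

x <- unique(iris$Species)
class(x)  # "factor"

avg_sl <- AverageVector(iris$Sepal.Length)
class(x)  # "numeric": the existing x was silently overwritten

# Returning the value instead leaves x untouched
AverageVector2 <- function(v) mean(v, na.rm = TRUE)
x <- unique(iris$Species)
avg_sl <- AverageVector2(iris$Sepal.Length)
class(x)  # still "factor"
```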
If you want a slightly more programmatic way to keep track of which objects were used, you could do something like this:
SCT_normalization <- function(f1, f2) {
  f_merge <- merge(f1, y = f2)
  f.list <- SplitObject(f_merge, split.by = "stim")
  f.list <- lapply(X = f.list, FUN = SCTransform)
  features <- SelectIntegrationFeatures(object.list = f.list, nfeatures = 3000)
  f.list <- PrepSCTIntegration(object.list = f.list, anchor.features = features)
  # return the inputs alongside the result so you know what produced it
  to_return <- list(inputs = list(f1, f2), normalized = f.list)
  return(to_return)
}
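If what you want is the input *names* (as the question asks) rather than the input objects, one option is to record them with deparse(substitute()) and keep every result in a single named list instead of creating globals with assign(). A sketch using a hypothetical stand-in `normalize_pair` in place of the Seurat pipeline, with made-up inputs `ctrl`, `stim` and `treated`:

```r
# Hypothetical stand-in for SCT_normalization(): it just combines its two inputs.
# The real function would run the Seurat steps instead.
normalize_pair <- function(f1, f2) {
  list(inputs = c(deparse(substitute(f1)), deparse(substitute(f2))),
       normalized = c(f1, f2))
}

ctrl <- 1:3
stim <- 4:6
treated <- 7:9

# One named list holds every result; no global side effects needed
results <- list()
results[["ctrl_vs_stim"]] <- normalize_pair(ctrl, stim)
results[["ctrl_vs_treated"]] <- normalize_pair(ctrl, treated)

names(results)
results$ctrl_vs_treated$inputs  # "ctrl" "treated"
```

Note that deparse(substitute()) only sees the expression passed in, so calling the function from inside a loop records the loop variable's name, not the original object's.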

string as function argument in R inside map

I have the following R code (below).
It doesn't work with x = "ARIMA" and "ETS" taken from my.list.
That's the problem:
fabletools::model(arima_auto = fable::ARIMA(Trips)) works,
but fabletools::model(arima_auto = fable::x(Trips)) doesn't.
Does anyone know a solution to my problem? Is it even possible in R?
library(tidyverse)
library(fable)
library(fabletools)
library(tsibble)
tourism <- tsibble::tourism
my.list <- list("ARIMA","ETS")
my.list[[1]] ## "ARIMA"
my.list[[2]] ## "ETS"
f_test <- function(.df1,.n){
x <- .df1[[.n]][[1]] ### 1) "ARIMA", "ETS"
print(x)
fit <- tourism %>%
dplyr::filter(Region == "Adelaide") %>%
#fabletools::model(arima_auto = fable::ARIMA(Trips)) ### it works
fabletools::model(arima_auto = fable::x(Trips)) ### didn't work
assign("fit", fit, envir= globalenv())
}
purrr::map(.x = seq(my.list), .f = ~(f_test(my.list, Counter <- .x)))
When you call fable::x(Trips), R looks for a function that is literally named x (here, in the fable namespace); it does not substitute the value of your variable x, which is the character string "ARIMA". A character string is not a function, so R has to be told explicitly to look up the function with that name.
What you want is a way for R to swap the string "ARIMA" for its corresponding function. This is what match.fun is for. Try this instead:
working<-match.fun(x)
fabletools::model(arima_auto = working(Trips))
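Note that this relies on fable being attached (via library(fable)): match.fun() searches the attached packages, so no namespace prefix was needed, unlike in your original approach. A more robust practice is to look the function up in the package namespace explicitly, e.g. working <- getExportedValue("fable", x) or working <- get(x, envir = asNamespace("fable")), but it isn't needed here.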
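The string-to-function lookup can be tried without fable. A minimal sketch using functions from base R's stats package instead of ARIMA/ETS (the numbers are arbitrary example data):

```r
x <- "median"   # the function name arrives as a string

# match.fun() looks the name up and returns the function itself
f <- match.fun(x)
f(c(1, 5, 100))  # 5

# Namespace-qualified lookups that do not depend on what is attached
g <- getExportedValue("stats", x)
h <- get(x, envir = asNamespace("stats"))
g(c(1, 5, 100))
h(c(1, 5, 100))

# Looping over several function names, in the spirit of my.list
fun_names <- list("mean", "median")
vals <- sapply(fun_names, function(nm) match.fun(nm)(c(1, 5, 100)))
vals
```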

Aggregating all SpatialPolygonsDataFrame objects from list into one SpatialPolygonsDataFrame

I'm not looking to edit topology; I merely want to aggregate all polygons into one sp object of type SpatialPolygonsDataFrame (spdf). There is only one polygon per spdf.
Data (Dropbox link to data, file size 1.1 KB; dput() is not appropriate in this instance):
list_of_spdf <- unlist(readRDS("data.Rds"))
I get the desired result with:
one_spdf <- rbind(list_of_spdf[1][[1]], list_of_spdf[2][[1]], list_of_spdf[3][[1]], makeUniqueIDs = TRUE)
# when plotting can see two polygons (third object is a repeat for sake of testing)
plot(one_spdf)
With hundreds of objects (though only one polygon per spdf), I need to do the rbind programmatically. So I tried lapply:
list_of_spdf <- lapply(list_of_spdf, rbind, makeUniqueIDs = TRUE)
Obviously, this returns a list, which is not what I'm looking for.
So I wrote a function:
rbindSPDF <- function(lst) {
# Create empty spdf objects
pol <-
SpatialPolygonsDataFrame(SpatialPolygons(list()), data = data.frame())
pols <-
SpatialPolygonsDataFrame(SpatialPolygons(list()), data = data.frame())
# loop for rbind
for (i in 1:length(lst)) {
pol[i] <- lst[i][[1]]
if (length(pols) == 0) {
pols <- pol[i]
} else {
pols <- rbind(pols, pol[i], makeUniqueIDs = TRUE)
}
}
return(pols)
}
However, when using rbindSPDF:
single_spdf <- rbindSPDF(list_of_spdf)
I get:
Error in as.vector(data) :
no method for coercing this S4 class to a vector
Not sure what I'm doing wrong here.
Plus, I'm guessing I probably don't even need to use my own function.
Note: on top of many other packages, I'm using sp and rgdal for spatial data, and would rather avoid yet another package because of attaching/detaching time and masking.
To get a programmatic version of
one_spdf <- rbind(list_of_spdf[1][[1]],
list_of_spdf[2][[1]],
list_of_spdf[3][[1]],
...
makeUniqueIDs = TRUE)
for a very long list in list_of_spdf, would something like the following work?
# generate list containing list_of_spdf[i][[1]]
list.df <- lapply(seq_along(list_of_spdf),
function(i){list_of_spdf[i][[1]]})
# apply rbind to the list
one_spdf2 <- do.call("rbind",
c(args = list.df, makeUniqueIDs = TRUE))
> all.equal(one_spdf, one_spdf2)
[1] TRUE
The results seem equivalent on my machine.
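The key idea is that do.call() turns a list into the argument list of a single call, so do.call(rbind, lst) is rbind(lst[[1]], lst[[2]], ...), and extra named arguments such as makeUniqueIDs can be appended to the list with c(). A minimal sketch of the same pattern with plain data frames standing in for the spdf objects, so it runs without sp (the data are made up):

```r
# Three small data frames standing in for the single-polygon spdf objects
pieces <- list(
  data.frame(id = 1, area = 2.5),
  data.frame(id = 2, area = 4.0),
  data.frame(id = 3, area = 1.2)
)

# Equivalent to rbind(pieces[[1]], pieces[[2]], pieces[[3]])
combined <- do.call(rbind, pieces)
combined
```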

How do I convert this for loop into something cooler like by in R

uniq <- unique(file[,12])
pdf("SKAT.pdf")
for(i in 1:length(uniq)) {
dat <- subset(file, file[,12] == uniq[i])
names <- paste("Sample_filtered_on_", uniq[i], sep="")
qq.chisq(-2*log(as.numeric(dat[,10])), df = 2, main = names, pvals = T,
sub=subtitle)
}
dev.off()
file[,12] is an integer so I convert it to a factor when I'm trying to run it with by instead of a for loop as follows:
pdf("SKAT.pdf")
by(file, as.factor(file[,12]), function(x) { qq.chisq(-2*log(as.numeric(x[,10])), df = 2, main = paste("Sample_filtered_on_", file[1,12], sep=""), pvals = T, sub=subtitle) } )
dev.off()
It works fine to sort the data frame by this (now a factor) column. My problem is that for the plot title, I want to label it with the correct index from that column. This is easy to do in the for loop by uniq[i]. How do I do this in a by function?
Hope this makes sense.
A more vectorized (== cooler?) version would pull the common operations out of the loop and let R do the book-keeping about unique factor levels.
dat <- split(-2 * log(as.numeric(file[,10])), file[,12])
names(dat) <- paste0("Sample_filtered_on_", names(dat))
(paste0 is a convenience function for the common case where one would otherwise use paste with sep = "".) A for loop is entirely appropriate when you run it for its side effects (plotting pretty pictures) rather than to capture values for further computation. It's definitely un-cool to use T instead of TRUE (T is an ordinary variable that can be reassigned), and seq_along(dat) means your code won't produce unexpected results when length(dat) == 0.
pdf("SKAT.pdf")
for(i in seq_along(dat)) {
  vals <- dat[[i]]
  nm <- names(dat)[[i]]
  qq.chisq(vals, main = nm, df = 2, pvals = TRUE, sub = subtitle)
}
dev.off()
If you did want to capture values, the basic observation is that your function takes two arguments that vary (the values and the title). So by, tapply, sapply, etc. are not appropriate, since each of them assumes that only a single argument varies. Instead, use mapply or the comparable Map:
Map(qq.chisq, dat, main=names(dat),
MoreArgs=list(df=2, pvals=TRUE, sub=subtitle))
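The split-then-Map pattern can be tried without snpStats. A sketch on a small synthetic `file` (column 10 holds p-values, column 12 the group, mirroring the question's layout) with a hypothetical stand-in `summarise_group` that returns a one-line summary instead of drawing a QQ plot:

```r
set.seed(1)
# Synthetic stand-in for `file`: column 10 holds p-values, column 12 the group
file <- data.frame(matrix(NA, nrow = 6, ncol = 12))
file[, 10] <- runif(6)
file[, 12] <- c(1L, 2L, 1L, 3L, 2L, 1L)

# Transform once, then split by group; the group values become the list names
dat <- split(-2 * log(as.numeric(file[, 10])), file[, 12])
names(dat) <- paste0("Sample_filtered_on_", names(dat))

# Hypothetical stand-in for qq.chisq(): summarises instead of plotting
summarise_group <- function(vals, main, df) {
  sprintf("%s: n = %d, df = %d", main, length(vals), df)
}

# vals and main vary together; df is held fixed through MoreArgs
out <- Map(summarise_group, dat, main = names(dat), MoreArgs = list(df = 2L))
unlist(out, use.names = FALSE)
```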

combination of expand.grid and mapply?

I am trying to come up with a variant of mapply (call it xapply for now) that combines the functionality (sort of) of expand.grid and mapply. That is, for a function FUN and a list of arguments L1, L2, L3, ... of unknown length, it should produce a list of length n1*n2*n3 (where ni is the length of list i) which is the result of applying FUN to all combinations of the elements of the list.
If expand.grid worked to generate lists of lists rather than data frames, one might be able to use it, but I have in mind that the lists may be lists of things that won't necessarily fit into a data frame nicely.
This function works OK if there are exactly three lists to expand, but I am curious about a more generic solution. (FLATTEN is unused, but I can imagine that FLATTEN=FALSE would generate nested lists rather than a single list ...)
xapply3 <- function(FUN,L1,L2,L3,FLATTEN=TRUE,MoreArgs=NULL) {
  retlist <- list()
  count <- 1
  for (i in seq_along(L1)) {
    for (j in seq_along(L2)) {
      for (k in seq_along(L3)) {
        retlist[[count]] <- do.call(FUN,c(list(L1[[i]],L2[[j]],L3[[k]]),MoreArgs))
        count <- count+1
      }
    }
  }
  retlist
}
edit: forgot to return the result. One might be able to solve this by making a list of the indices with combn and going from there ...
I think I have a solution to my own question, but perhaps someone can do better (and I haven't implemented FLATTEN=FALSE ...)
xapply <- function(FUN,...,FLATTEN=TRUE,MoreArgs=NULL) {
  L <- list(...)
  inds <- do.call(expand.grid,lapply(L,seq_along)) ## Marek's suggestion
  retlist <- list()
  for (i in 1:nrow(inds)) {
    arglist <- mapply(function(x,j) x[[j]],L,as.list(inds[i,]),SIMPLIFY=FALSE)
    if (FLATTEN) {
      retlist[[i]] <- do.call(FUN,c(arglist,MoreArgs))
    }
  }
  retlist
}
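The same index-grid idea can be written without the explicit loop. A sketch (the name `xapply_idx` is hypothetical): expand.grid() is applied to the *indices* of each list, so the list elements themselves (data frames, formulas, ...) never have to fit into a data frame, and lapply()/Map() pick out and apply each combination:

```r
# Loop-free variant: expand.grid() over the indices of each list
xapply_idx <- function(FUN, ..., MoreArgs = NULL) {
  FUN <- match.fun(FUN)
  L <- list(...)
  inds <- expand.grid(lapply(L, seq_along), KEEP.OUT.ATTRS = FALSE)
  lapply(seq_len(nrow(inds)), function(i) {
    # pick element inds[i, k] from the k-th list
    args <- Map(function(x, j) x[[j]], L, unlist(inds[i, ], use.names = FALSE))
    do.call(FUN, c(args, MoreArgs))
  })
}

res <- xapply_idx(function(a, b) a + b, 1:2, c(10, 20, 30))
unlist(res)  # 11 12 21 22 31 32
```

One design difference from xapply3: expand.grid varies the *first* list fastest, whereas the nested loops vary the last list fastest, so the results come back in a different order. Named arguments (e.g. xapply_idx(lm, formula = L2, data = L1)) are passed to FUN by name.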
edit: I tried @baptiste's suggestion, but it's not easy (or wasn't for me). The closest I got was
library(plyr)  # for mlply()
xapply2 <- function(FUN,...,FLATTEN=TRUE,MoreArgs=NULL) {
  L <- list(...)
  xx <- do.call(expand.grid,L)
  f <- function(...) {
    do.call(FUN,lapply(list(...),"[[",1))
  }
  mlply(xx,f)
}
which still doesn't work. expand.grid is indeed more flexible than I thought (although it creates a weird data frame that can't be printed), but enough magic is happening inside mlply that I can't quite make it work.
Here is a test case:
L1 <- list(data.frame(x=1:10,y=1:10),
data.frame(x=runif(10),y=runif(10)),
data.frame(x=rnorm(10),y=rnorm(10)))
L2 <- list(y~1,y~x,y~poly(x,2))
z <- xapply(lm,L2,L1)
xapply(lm,L2,L1)
@ben-bolker, I had a similar need and think I have a preliminary solution worked out, which I've also tested in parallel. The function, which I somewhat confusingly called gmcmapply (g for grid), takes an arbitrarily large named list mvars (which gets expand.grid-ed within the function) and a FUN that uses the list names as if they were its own arguments. gmcmapply updates the formals of FUN so that, by the time FUN is passed to mcmapply, its arguments are the variables the user wants to iterate over (which would be the layers of a nested for loop). mcmapply then supplies the values of these formals as it cycles over the expanded set of variables in mvars.
I've posted the preliminary code as a gist (reprinted with an example below) and would be curious to get your feedback on it. I'm a grad student and a self-described intermediate R enthusiast, so this is definitely pushing my R skills. You or others in the community may have suggestions that improve on what I have. Even as it stands, I expect to come back to this function quite a bit.
gmcmapply <- function(mvars, FUN, SIMPLIFY = FALSE, mc.cores = 1, ...){
  require(parallel)
  FUN <- match.fun(FUN)
  funArgs <- formals(FUN)[which(names(formals(FUN)) != "...")] # allow default args to carry over from FUN.
  expand.dots <- list(...) # allows expanded dot args to be passed as formal args to the user-specified function
  # Implement non-default arg substitutions passed through dots.
  if(any(names(funArgs) %in% names(expand.dots))){
    dot_overwrite <- names(funArgs[which(names(funArgs) %in% names(expand.dots))])
    funArgs[dot_overwrite] <- expand.dots[dot_overwrite]
    # for arg naming and matching below.
    expand.dots[dot_overwrite] <- NULL
  }
  ## build grid of mvars to loop over; this ensures that each combination of inputs is evaluated (equivalent to a structure of nested for loops)
  grid <- expand.grid(mvars, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
  # specify formals of the function to be evaluated by merging the grid to mapply over with expanded dot args
  argdefs <- rep(list(bquote()), ncol(grid) + length(expand.dots) + length(funArgs) + 1)
  names(argdefs) <- c(colnames(grid), names(funArgs), names(expand.dots), "...")
  argdefs[which(names(argdefs) %in% names(funArgs))] <- funArgs # replace with FUN's (possibly overwritten) default args.
  argdefs[which(names(argdefs) %in% names(expand.dots))] <- expand.dots # replace with proper dot arg inputs.
  formals(FUN) <- argdefs
  # note: mc.cores > 1 relies on forking, which mcmapply does not support on Windows
  if(SIMPLIFY) {
    # standard mapply
    do.call(mcmapply, c(FUN, c(unname(grid), mc.cores = mc.cores))) # mc.cores = 1 == mapply
  } else{
    # standard Map
    do.call(mcmapply, c(FUN, c(unname(grid), SIMPLIFY = FALSE, mc.cores = mc.cores)))
  }
}
Example code:
# Example 1:
# just make sure the variables used in your function appear as the names of mvars
myfunc <- function(...){
  return_me <- paste(l3, l1^2 + l2, sep = "_")
  return(return_me)
}
mvars <- list(l1 = 1:10,
              l2 = 1:5,
              l3 = letters[1:3])
### list output (like Map)
lreturns <- gmcmapply(mvars, myfunc)
### concatenated output (like mapply)
lreturns <- gmcmapply(mvars, myfunc, SIMPLIFY = TRUE)
## N.B. This gives the same values as running the nested loops below, although in a
## different order: expand.grid varies l1 fastest, while here l3 (the innermost loop) does.
lreturns <- c()
for(l1 in 1:10){
  for(l2 in 1:5){
    for(l3 in letters[1:3]){
      lreturns <- c(lreturns, myfunc(l1, l2, l3))
    }
  }
}
### concatenated output run on 2 cores.
lreturns <- gmcmapply(mvars, myfunc, SIMPLIFY = TRUE, mc.cores = 2)
# Example 2: pass non-default args to FUN.
## Since the apply functions don't accept full calls as inputs (calls are internal), the user can pass arguments to FUN through dots, which can overwrite a default option for FUN.
# e.g. apply(x, 1, FUN) works and apply(x, 1, FUN(arg_to_change = not_default)) does not; the correct way to specify non-default/additional args to FUN is:
# gmcmapply(mvars, FUN, arg_to_change = not_default)
## update myfunc to have a default argument
myfunc <- function(rep_letters = 3, ...){
  return_me <- paste(rep(l3, rep_letters), l1^2 + l2, sep = "_")
  return(return_me)
}
lreturns <- gmcmapply(mvars, myfunc, rep_letters = 1)
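Example 1 compares the grid-based call with a set of nested loops. A quick base-R check of that comparison, using Map over the expand.grid columns instead of gmcmapply (so no parallel package is involved) and a hypothetical explicit-argument version `f` of myfunc, suggests the two give the same values but not in the same order:

```r
mvars <- list(l1 = 1:10, l2 = 1:5, l3 = letters[1:3])
f <- function(l1, l2, l3) paste(l3, l1^2 + l2, sep = "_")

# Grid-based: expand.grid varies l1 fastest
grid <- expand.grid(mvars, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
from_grid <- unlist(Map(f, grid$l1, grid$l2, grid$l3), use.names = FALSE)

# Nested loops: the innermost variable (l3) varies fastest
from_loops <- c()
for (l1 in mvars$l1) for (l2 in mvars$l2) for (l3 in mvars$l3) {
  from_loops <- c(from_loops, f(l1, l2, l3))
}

length(from_grid) == length(from_loops)      # TRUE: 150 results each
identical(sort(from_grid), sort(from_loops))  # TRUE: same values
identical(from_grid, from_loops)              # FALSE: different order
```

If the order matters downstream, reversing the order of mvars (so the innermost loop variable comes first) makes expand.grid match the nested-loop order.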
A bit of additional functionality I would like to add, but am still trying to work out:
First, cleaning up the output into a pretty nested list with the names of mvars (normally, I'd create multiple lists within a nested for loop and tag lower-level lists onto higher-level lists all the way up until all layers of the gigantic nested loop were done). I think some abstracted variant of the solution provided here will work, but I haven't figured out how to make it flexible to the number of columns in the expand.grid-ed data.frame.
Second, an option to log the outputs of the child processes that get called in mcmapply to a user-specified directory, so you could look at .txt outputs from every combination of variables generated by expand.grid (e.g. if the user prints model summaries or status messages as part of FUN, as I often do). I think a feasible solution is to use the substitute() and body() functions, described here, to edit FUN so that it opens a sink() at the beginning and closes it at the end if the user specifies a directory to write to. Right now I just program that into FUN itself, but later it would be nice to pass gmcmapply an argument like log_children = "path_to_log_dir" and then edit the body of the function to do (pseudocode) sink(file = file.path(log_children, paste0(paste(names(mvars), collapse = "_"), ".txt"))).
Let me know what you think!
-Nate
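For the logging idea described above, a lighter alternative to editing body() is to wrap FUN in a closure that opens a sink() named after the current argument values and closes it with on.exit(). A sketch, where `with_child_log` and `noisy` are hypothetical helpers; it is shown with Map rather than gmcmapply, because gmcmapply rewrites the formals of FUN and this wrapper relies on receiving everything through `...`:

```r
# Hypothetical helper: wrap FUN so each call writes its printed output to its own file
with_child_log <- function(FUN, log_dir) {
  force(FUN); force(log_dir)
  function(...) {
    args <- list(...)
    # File name built from the argument values, e.g. "1_2_a.txt"
    log_file <- file.path(log_dir, paste0(paste(unlist(args), collapse = "_"), ".txt"))
    sink(log_file)
    on.exit(sink(), add = TRUE)  # always close the sink, even if FUN errors
    FUN(...)
  }
}

log_dir <- file.path(tempdir(), "child_logs")
dir.create(log_dir, showWarnings = FALSE)

noisy <- function(l1, l2, l3) {
  cat("running", l1, l2, l3, "\n")
  paste(l3, l1^2 + l2, sep = "_")
}

logged <- with_child_log(noisy, log_dir)
res <- Map(logged, 1:2, 2, c("a", "b"))
list.files(log_dir)
```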
