How to convert a for loop to apply in R [closed]

I have the following loop. I am trying to convert it into using the apply function instead of a loop, but I don't know how to rewrite the code.
for (i in 1:dim(Y)[2]) {
K = K_origin
print(i)
e = lmmlite::eigen_rotation(K, t(Y)[i,], use_cpp = T)
VC = lmmlite::fitLMM(
e$Kva,
e$y,
e$X,
reml = T,
use_cpp = T,
tol = 1e-6,
check_boundary = T
)
write.table(
VC$sigmasq_g,
"Vg_temp.txt",
row.names = F,
col.names = F,
append = T,
quote = F,
sep = "\n"
)
write.table(
VC$sigmasq_e,
"Ve_temp.txt",
row.names = F,
col.names = F,
append = T,
quote = F,
sep = "\n"
)
}
I want results like these
Vg Ve
1.15521325512487 0.755118863386436
0.579039221720728 1.21733212837417
0.372439354137817 0.296327744338075
0.0668396114713355 0.300417453013007
0.00771158861391208 0.100176380868691
0.210174870097273 0.141907482831872

R's apply functions have to be formulated as 1) something to iterate over, and 2) a function to apply to each element of (1).
But! Whether you will gain anything from converting your particular loop into an apply is doubtful. If your loop is slow, I am guessing it is due to the operations performed, not because "R is slow on loops". If you only have 6 rows in Y, you will gain nothing from reformulating the loop as an apply!
For your loop, each i is independent (as opposed to a loop over i where a result depends on the calculation for i-1). That makes it very easy to reformulate. Generally,
for (i in some_sequence) {
do something with i
}
can be reformulated to
my_do_something <- function(i) {
do something
}
for (i in some_sequence) {
my_do_something(i)
}
which can again be directly reformulated to
sapply(some_sequence, my_do_something)
In your case, this would be
my_rotate <- function(i) {
e = lmmlite::eigen_rotation(K, t(Y)[i,], use_cpp = T)
VC = lmmlite::fitLMM( ... )
write.table(...)
write.table(...)
NULL
}
sapply(seq_len(dim(Y)[2]), my_rotate)
Notice how I added a NULL at the bottom of the function? That is because sapply gathers the values returned by the iterated function, and here the function is called only for its side effect of writing the files, so it returns an explicit NULL. Try the function without the last NULL and see what sapply returns.
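To see how sapply gathers return values, here is a minimal toy sketch. The two helper functions are hypothetical stand-ins and have nothing to do with lmmlite; they only contrast a function called for its side effect with one that returns a value.

```r
# A function called only for its side effect; it returns NULL explicitly.
side_effect_only <- function(i) {
  message("processing ", i)  # stand-in for writing to a file
  NULL
}

# A function that returns something worth collecting.
square <- function(i) i^2

nothing <- sapply(1:3, side_effect_only)  # a list holding three NULLs
squares <- sapply(1:3, square)            # simplified to a numeric vector
```

When every call returns NULL there is nothing to simplify, so sapply hands back a list; when every call returns a single number, the results are simplified to a vector.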
But wait, there's more!
Since you are iterating over particular rows (and asking about apply in particular), let's just drop the i stuff and feed the function the row:
my_rotate_row <- function(x) {
# you might or might not need to either use x as is, transpose it as t(x) or double transpose it, t(t(x)), to get the correct orientation.
# x is most likely an atomic vector, whereas `eigen_rotation` might be requiring either a row-vector or a column vector.
e = lmmlite::eigen_rotation(K, x, use_cpp = T)
VC = lmmlite::fitLMM( ... )
# let's return the values so they can be collected into a data.frame or matrix, instead of using files:
c(VC$sigmasq_g, VC$sigmasq_e)
}
Now you can use apply:
apply(Y, 2, my_rotate_row)
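As a rough sketch of what comes back: apply over columns returns a matrix with one column per column of Y, so a transpose gives one row per column, like the table in the question. The matrix and the summary function below are made-up stand-ins (they compute a variance and a mean, not variance components), used only so the example runs without lmmlite.

```r
set.seed(1)
# Toy data: 4 observations (rows) for 3 traits (columns). Assumed shape only.
Y_toy <- matrix(rnorm(12), nrow = 4, ncol = 3)

# Hypothetical stand-in for my_rotate_row: returns two named numbers per column.
# These are placeholder statistics, NOT the sigmasq_g / sigmasq_e estimates.
two_numbers <- function(x) c(Vg = var(x), Ve = mean(x))

res <- apply(Y_toy, 2, two_numbers)  # 2 x 3 matrix: one column per trait
res_df <- as.data.frame(t(res))      # 3 x 2 data.frame: one row per trait
res_df
```

If a file is still wanted, a single write.table(res_df, ...) call after the apply would replace the repeated appends inside the loop.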

Related

Paste multiple elements in R

I make this code using a for-statement. (The main purpose of this code is to list different webpages, which are obtained via httr and rvest)
r = "asdgkjkhdf"
t = "osrt"
all = c()
for(i in 1:400)
{
y = paste(r, i, sep = '')
d = paste(y, t, sep = '')
all = c(all, d)
}
all
I got results like these (the pasted numbers accumulate in each result):
[1]asdgkjkhdf1osrt
[2]asdgkjkhdf12osrt
[3]asdgkjkhdf123osrt
[4]asdgkjkhdf1234osrt
...
But I want results like these, regardless of how many numbers I put in the for() statement:
[1]asdgkjkhdf1osrt
[2]asdgkjkhdf2osrt
...
[400]asdgkjkhdf400osrt
What should I change to get the result I want?
Should I use paste(substr(), substr(), sep='')?
If you really want to use a for-statement you can use the following
r = "asdgkjkhdf"
t = "osrt"
all = c()
for (idx in 1:400)
all = c(all, paste0(r, idx, t))
However, in R you should generally prefer code without for-statements here: growing a vector with c() inside a loop is less readable and hurts performance. The solution without the for-statement (given by Roland in the comments) is
all <- paste0(r, 1:400, t)
Note that paste0("string") is just shorthand for paste("string", sep = '').
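A short sketch of why no loop is needed: paste0 is vectorised, so the single-element prefix and suffix are recycled against the whole number sequence. The variable names prefix, suffix and pages are illustrative choices (the original uses r, t and all, which shadow base functions).

```r
prefix <- "asdgkjkhdf"
suffix <- "osrt"

# One call builds all 400 strings; prefix and suffix are recycled.
pages <- paste0(prefix, 1:400, suffix)

head(pages, 2)
length(pages)

# sprintf is an alternative when the number needs formatting.
pages_fmt <- sprintf("%s%d%s", prefix, 1:400, suffix)
```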

Using for loop and rbind to iterate over multiple files

I have a small R script of 14 functions and want to run it for 81 files. While I have read several posts on Stack Overflow that address similar issues, I am still having trouble getting this working. I am using a for loop and rbind.
All functions within the { } of the loop work. I have tested them without the for loop and I get the vector of data that I need. But when I run the for loop I only get an output for the last file in the folder. I am not sure what is going on.
Is the for loop working right (is it iterating through the files) and simply overwriting the previous runs? If the for loop is working then I assume I have a problem with my rbind. Or, is the for loop only running the last file in list.files()?
In the end, I want a matrix (or table) with the results of the 14 functions for each of the 81 files.
Here is the code:
res=(1:14)
for(i in list.files())
{
nd = read.csv(i, header= TRUE, row.names =1, check.names = FALSE)
mx = as.matrix(nd)
res[1]=basename(i)
res[2]=-99 #this is just a place holder
res[3]=gden(mx)
res[4]=centralization(mx,degree)
deg = degree(mx, gmode="graph", diag=FALSE, rescale=FALSE)
res[5]=mean(deg)
res[6]=sd(deg)
res[7]=max(deg)
res[8]=min(deg)
Ndeg = degree(mx, gmode="graph", diag=FALSE, rescale=TRUE)*1000
res[9]=mean(Ndeg)
res[10]=sd(Ndeg)
res[11]=max(Ndeg)
res[12]=min(Ndeg)
iso = isolates(mx, diag=FALSE)
res[13]=length(iso)
res[14]=nrow(mx)
}
results=rbind(res)
results
Wrap your set of function calls in a new function and sapply it over every element of list.files():
out <- sapply(list.files(), function(i){
nd = read.csv(i, header= TRUE, row.names =1, check.names = FALSE)
mx = as.matrix(nd)
res = numeric(14)
res[1]=basename(i)
res[2]=-99 #this is just a place holder
res[3]=gden(mx)
res[4]=centralization(mx,degree)
deg = degree(mx, gmode="graph", diag=FALSE, rescale=FALSE)
res[5]=mean(deg)
res[6]=sd(deg)
res[7]=max(deg)
res[8]=min(deg)
Ndeg = degree(mx, gmode="graph", diag=FALSE, rescale=TRUE)*1000
res[9]=mean(Ndeg)
res[10]=sd(Ndeg)
res[11]=max(Ndeg)
res[12]=min(Ndeg)
iso = isolates(mx, diag=FALSE)
res[13]=length(iso)
res[14]=nrow(mx)
return(res)
})
out
You have to call rbind inside the loop. results = rbind(res) on its own is not enough; you need something like results = rbind(results, res), with results initialised before the loop.
How you store the rows (a matrix, a list of vectors, etc.) is up to you.
You would also do better to use lapply, sapply, etc. instead of a loop.
I also posted this question on my university listserv and a fellow student provided the following fixes. And now it works :)
res=(1:14)
summary=(1:14)
for(i in list.files())
{
....code as above.....
summary=rbind(summary, res)
}
summary
# then to put into a .csv
write.csv(summary, "nameoffile.csv")
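One way to avoid both the overwriting and the dummy first row is to build one row per input and bind them all at the end. This is a minimal sketch under stated assumptions: the in-memory list mats stands in for the 81 CSV files, and rowSums stands in for the network functions (gden, degree, isolates), which come from a package not loaded here.

```r
# Stand-ins for the adjacency matrices read from each file (hypothetical data).
mats <- list(
  "a.csv" = matrix(c(0, 1, 1, 0), nrow = 2),
  "b.csv" = matrix(c(0, 1, 1, 1, 0, 0, 1, 0, 0), nrow = 3)
)

# One row of results per "file"; with real files this would start with read.csv().
summarise_one <- function(name) {
  mx  <- mats[[name]]
  deg <- rowSums(mx)  # placeholder for degree(mx, ...)
  data.frame(file = name, mean_deg = mean(deg), max_deg = max(deg), n = nrow(mx))
}

# lapply returns a list of one-row data frames; rbind stacks them in one step.
results <- do.call(rbind, lapply(names(mats), summarise_one))
results
```

Using a data.frame per row also keeps the file name as text and the statistics as numbers, whereas storing everything in one vector coerces all values to character.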

Substituting variables in a loop?

I am trying to write a loop in R but I think the nomenclature is not correct as it does not create the new objects, here is a simplified example of what I am trying to do:
for i in (1:8) {
List_i <-List
colsToGrab_i <-grep(predefinedRegex_i, colnames(List_i$table))
List_i$table <- List_i$table[,predefinedRegex_i]
}
I have created 'predefinedRegex'es 1:8 which the grep should use to search
The loop creates an object called "List_i" and then fails to find "predefinedRegex_i".
I have tried putting quotes around the "i" and $ in front of the i , also [i] but these do not work.
Any help much appreciated. Thank you.
Using @RyanGrammel's answer below:
#CREATING regular expressions for grabbing sets groups 1 -7 ::::
g_1 <- "DC*"
g_2 <- "BN_._X.*"
g_3 <- "BN_a*"
g_4 <- "BN_b*"
g_5 <- "BN_a_X.*"
g_6 <- "BN_b_X.*"
g_7 <- "BN_._Y.*"
for (i in 1:8)
{
assign(x = paste("tableA_", i, sep=""), value = BigList$tableA)
assign(x = paste("Forgrep_", i, sep=""), value = colnames(get(x = paste("tableA_", i, sep=""))))
assign(x = paste("grab_", i, sep=""), value = grep((get(x = paste("g_",i, sep=""))), (get(x = paste("Forgrep_",i, sep="")))))
assign(x = paste("tableA_", i, sep=""), value = BigList$tableA[,get(x = paste("grab_",i, sep=""))])
}
This loop is repeated for each table inside "BigList".
I found I could not extract column names from
get(x = paste("BigList_", i, "$tableA", sep=""))
or from
get(x = paste("BigList_", i, "[[2]]", sep=""))
so it was easier to extract the tables first. I will now write a loop to repack the lists up.
Problem
Your syntax is off: you don't seem to understand how exactly R deals with variable names.
for(i in 1:10) name_i <- 1
The above code doesn't assign name_1, name_2, ..., name_10. It assigns "name_i" over and over again.
To create a list, you call 'list()', not List.
Creating a variable List_i in a loop doesn't assign List_1, List_2, ..., List_8.
It repeatedly assigns the same value to the name 'List_i'. Think about it: if R named variables the way you tried to, it would be equally likely to name your variables L1st_1, L2st_2, ... See 'Solution' for some valid R code that does something similar.
'predefinedRegex_i' isn't interpreted as an attempt to get the variables 'predefinedRegex_1', 'predefinedRegex_2', and so on.
However, get(paste0("predefinedRegex_", i)) is interpreted in this way. Just make sure i actually has a value when using this. See below.
Solution:
In general, use this to dynamically assign variables (List_1, List_2,..)
assign(x = paste0("prefix_", i), value = i)
If i is equal to 199, then this code assigns the variable prefix_199 the value 199.
In general, use this to dynamically get the variables you assigned using the above snippet of code.
get(x = paste0("prefix_", i))
If i is equal to 199, then this code gets the variable prefix_199.
That should solve the crux of your problem; if you need any further help feel free to ask for clarification here, or contact me via my Twitter Feed.
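The assign/get pattern above can be sketched as follows, together with a list-based alternative that avoids building variable names altogether. The patterns and column names are invented for illustration and are not the asker's real data.

```r
# Dynamic names: creates prefix_1, prefix_2, prefix_3 in the current environment.
for (i in 1:3) {
  assign(x = paste0("prefix_", i), value = i * 10)
}
second <- get(x = paste0("prefix_", 2))

# Alternative: keep related things in one named vector or list and iterate over it.
patterns <- c(g_1 = "^DC", g_2 = "^BN_a")         # hypothetical regular expressions
cols     <- c("DC1", "DC2", "BN_a1", "BN_b1")     # hypothetical column names

# One element per pattern, holding the matching column names.
grabbed <- lapply(patterns, function(p) grep(p, cols, value = TRUE))
grabbed$g_1
```

With the list version, the results stay together under their names (grabbed$g_1, grabbed$g_2), so no get() or paste() is needed to reach them later.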

a reliable way to tell if = is for assignment in R code?

I'm a stubborn useR who uses = instead of <- all the time, and apparently many R programmers will frown on this. I wrote the formatR package which can replace = with <- based on the parser package. As some of you might know, parser was orphaned on CRAN a few days ago. Although it is back now, this made me hesitant to depend on it. I'm wondering if there is another way to safely replace = with <-, because not all ='s mean assignment, e.g. fun(a = 1). Regular expressions are unlikely to be reliable (see line 18 of the mask.inline() function in formatR), but I will certainly appreciate it if you can improve mine. Perhaps the codetools package can help?
A few test cases:
# should replace
a = matrix(1, 1)
a = matrix(
1, 1)
(a = 1)
a =
1
function() {
a = 1
}
# should not replace
c(
a = 1
)
c(
a = c(
1, 2))
This answer uses regular expressions. There are a few edge cases where it will fail but it should be okay for most code. If you need perfect matching then you'll need to use a parser, but the regexes can always be tweaked if you run into problems.
Watch out for
#quoted function names
`my cr*azily*named^function!`(x = 1:10)
#Nested brackets inside functions
mean(x = (3 + 1:10))
#assignments inside if or for blocks
if((x = 10) > 3) cat("foo")
#functions running over multiple lines will currently fail
#maybe fixable with paste(original_code, collapse = "\n")
mean(
x = 1:10
)
The code is based upon an example on the ?regmatches page. The basic idea is: swap function contents for a placeholder, do the replacement, then put your function contents back.
#Sample code. For real case, use
#readLines("source_file.R")
original_code <- c("a = 1", "b = mean(x = 1)")
#Function contents are considered to be a function name,
#an open bracket, some stuff, then a close bracket.
#Here function names are considered to be a letter or
#dot or underscore followed by optional letters, numbers, dots or
#underscores. This matches a few non-valid names (see ?make.names
#and warning above).
function_content <- gregexpr(
"[[:alpha:]._][[:alnum:]._]*\\([^)]*\\)",
original_code
)
#Take a copy of the code to modify
copy <- original_code
#Replace all instances of function contents with the word PLACEHOLDER.
#If you have that word inside your code already, things will break.
copy <- mapply(
function(pattern, replacement, x)
{
if(length(pattern) > 0)
{
gsub(pattern, replacement, x, fixed = TRUE)
} else x
},
pattern = regmatches(copy, function_content),
replacement = "PLACEHOLDER",
x = copy,
USE.NAMES = FALSE
)
#Replace = with <-
copy <- gsub("=", "<-", copy)
#Now substitute back your function contents
(fixed_code <- mapply(
function(pattern, replacement, x)
{
if(length(replacement) > 0)
{
gsub(pattern, replacement, x, fixed = TRUE)
} else x
},
pattern = "PLACEHOLDER",
replacement = regmatches(original_code, function_content),
x = copy,
USE.NAMES = FALSE
))
#Write back to your source file
#writeLines(fixed_code, "source_file_fixed.R")
Kohske sent a pull request to the formatR package which solved the problem using the codetools package. The basic idea is to set a code walker to walk through the code; when it detects = as the function symbol of a call, it replaces it with <-. This works because of the "Lisp nature" of R: x = 1 is actually `=`(x, 1) (we replace it by `<-`(x, 1)); of course, = is treated differently in the parse tree of fun(x = 1).
The formatR package (>= 0.5.2) has since dropped its dependency on the parser package, and replace.assign should be robust now.
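For illustration only, here is a different parser-based sketch that uses base R's parse data rather than codetools; it is not the formatR implementation. The parser labels each = token as EQ_ASSIGN (assignment), EQ_SUB (named argument in a call) or EQ_FORMALS (default in a function definition), so only the first kind needs replacing. The sketch assumes the code contains no tabs or multibyte characters, since column positions are used to edit the text.

```r
# Sample input: top-level assignments mixed with a named argument and a default.
code <- c("a = 1", "b = mean(x = 1)", "f = function(n = 2) n")

# keep.source = TRUE is required so that parse data is recorded.
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))

# Keep only the "=" tokens that the parser classified as assignment.
eq <- pd[pd$token == "EQ_ASSIGN", ]
# Rightmost token first within each line, so earlier column offsets stay valid.
eq <- eq[order(eq$line1, -eq$col1), ]

fixed <- code
for (k in seq_len(nrow(eq))) {
  ln  <- eq$line1[k]
  col <- eq$col1[k]
  fixed[ln] <- paste0(substr(fixed[ln], 1, col - 1), "<-",
                      substr(fixed[ln], col + 1, nchar(fixed[ln])))
}
fixed
```

Because the decision is made on parser tokens, = inside strings, comments, calls and formals is left alone without any regular expression.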
The safest (and probably fastest) way to replace = by <- is directly typing <- instead of trying to replace it.

combination of expand.grid and mapply?

I am trying to come up with a variant of mapply (call it xapply for now) that combines the functionality (sort of) of expand.grid and mapply. That is, for a function FUN and a list of arguments L1, L2, L3, ... of unknown length, it should produce a list of length n1*n2*n3 (where ni is the length of list i) which is the result of applying FUN to all combinations of the elements of the list.
If expand.grid worked to generate lists of lists rather than data frames, one might be able to use it, but I have in mind that the lists may be lists of things that won't necessarily fit into a data frame nicely.
This function works OK if there are exactly three lists to expand, but I am curious about a more generic solution. (FLATTEN is unused, but I can imagine that FLATTEN=FALSE would generate nested lists rather than a single list ...)
xapply3 <- function(FUN,L1,L2,L3,FLATTEN=TRUE,MoreArgs=NULL) {
retlist <- list()
count <- 1
for (i in seq_along(L1)) {
for (j in seq_along(L2)) {
for (k in seq_along(L3)) {
retlist[[count]] <- do.call(FUN,c(list(L1[[i]],L2[[j]],L3[[k]]),MoreArgs))
count <- count+1
}
}
}
retlist
}
edit: forgot to return the result. One might be able to solve this by making a list of the indices with combn and going from there ...
I think I have a solution to my own question, but perhaps someone can do better (and I haven't implemented FLATTEN=FALSE ...)
xapply <- function(FUN,...,FLATTEN=TRUE,MoreArgs=NULL) {
L <- list(...)
inds <- do.call(expand.grid,lapply(L,seq_along)) ## Marek's suggestion
retlist <- list()
for (i in 1:nrow(inds)) {
arglist <- mapply(function(x,j) x[[j]],L,as.list(inds[i,]),SIMPLIFY=FALSE)
if (FLATTEN) {
retlist[[i]] <- do.call(FUN,c(arglist,MoreArgs))
}
}
retlist
}
edit: I tried @baptiste's suggestion, but it's not easy (or wasn't for me). The closest I got was
xapply2 <- function(FUN,...,FLATTEN=TRUE,MoreArgs=NULL) {
L <- list(...)
xx <- do.call(expand.grid,L)
f <- function(...) {
do.call(FUN,lapply(list(...),"[[",1))
}
mlply(xx,f)
}
which still doesn't work. expand.grid is indeed more flexible than I thought (although it creates a weird data frame that can't be printed), but enough magic is happening inside mlply that I can't quite make it work.
Here is a test case:
L1 <- list(data.frame(x=1:10,y=1:10),
data.frame(x=runif(10),y=runif(10)),
data.frame(x=rnorm(10),y=rnorm(10)))
L2 <- list(y~1,y~x,y~poly(x,2))
z <- xapply(lm,L2,L1)
xapply(lm,L2,L1)
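The index-grid idea in xapply can also be written without the explicit counter loop. The following is a minimal sketch, not a drop-in replacement: xapply_sketch is a hypothetical name, FLATTEN is omitted, and the test uses simple numbers instead of the lm example.

```r
xapply_sketch <- function(FUN, ..., MoreArgs = NULL) {
  L <- list(...)
  # One row per combination of positions; the first list varies fastest.
  inds <- expand.grid(lapply(L, seq_along))
  lapply(seq_len(nrow(inds)), function(r) {
    # Pick the r-th combination: element j of each input list.
    args <- Map(function(x, j) x[[j]], L, unlist(inds[r, ]))
    do.call(FUN, c(args, MoreArgs))
  })
}

# 3 x 2 = 6 combinations of the two input lists.
res <- xapply_sketch(function(a, b) a * b, list(1, 2, 3), list(10, 100))
unlist(res)
```

Working on indices rather than on the objects themselves means the input lists can hold data frames, formulas or anything else that expand.grid could not store directly.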
@ben-bolker, I had a similar desire and think I have a preliminary solution worked out, which I have also tested in parallel. The function, which I somewhat confusingly called gmcmapply (g for grid), takes an arbitrarily large named list mvars (which gets expand.grid-ed within the function) and a FUN that uses the list names as if they were arguments to the function itself. gmcmapply updates the formals of FUN so that, by the time FUN is passed to mcmapply, its arguments reflect the variables the user would like to iterate over (these would be the layers in a nested for loop). mcmapply then dynamically updates the values of these formals as it cycles over the expanded set of variables in mvars.
I've posted the preliminary code as a gist (reprinted with an example below) and would be curious to get your feedback on it. I'm a grad student and a self-described intermediately skilled R enthusiast, so this is pushing my R skills for sure. You or other folks in the community may have suggestions that would improve on what I have. Even as it stands, I think I'll be coming back to this function quite a bit in the future.
gmcmapply <- function(mvars, FUN, SIMPLIFY = FALSE, mc.cores = 1, ...){
require(parallel)
FUN <- match.fun(FUN)
funArgs <- formals(FUN)[which(names(formals(FUN)) != "...")] # allow for default args to carry over from FUN.
expand.dots <- list(...) # allows for expanded dot args to be passed as formal args to the user specified function
# Implement non-default arg substitutions passed through dots.
if(any(names(funArgs) %in% names(expand.dots))){
dot_overwrite <- names(funArgs[which(names(funArgs) %in% names(expand.dots))])
funArgs[dot_overwrite] <- expand.dots[dot_overwrite]
#for arg naming and matching below.
expand.dots[dot_overwrite] <- NULL
}
## build grid of mvars to loop over, this ensures that each combination of various inputs is evaluated (equivalent to creating a structure of nested for loops)
grid <- expand.grid(mvars,KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
# specify formals of the function to be evaluated by merging the grid to mapply over with expanded dot args
argdefs <- rep(list(bquote()), ncol(grid) + length(expand.dots) + length(funArgs) + 1)
names(argdefs) <- c(colnames(grid), names(funArgs), names(expand.dots), "...")
argdefs[which(names(argdefs) %in% names(funArgs))] <- funArgs # replace with proper dot arg inputs.
argdefs[which(names(argdefs) %in% names(expand.dots))] <- expand.dots # replace with proper dot arg inputs.
formals(FUN) <- argdefs
if(SIMPLIFY) {
#standard mapply
do.call(mcmapply, c(FUN, c(unname(grid), mc.cores = mc.cores))) # mc.cores = 1 == mapply
} else{
#standard Map
do.call(mcmapply, c(FUN, c(unname(grid), SIMPLIFY = FALSE, mc.cores = mc.cores)))
}
}
example code below:
# Example 1:
# just make sure variables used in your function appear as the names of mvars
myfunc <- function(...){
return_me <- paste(l3, l1^2 + l2, sep = "_")
return(return_me)
}
mvars <- list(l1 = 1:10,
l2 = 1:5,
l3 = letters[1:3])
### list output (like Map; SIMPLIFY = FALSE is the default here)
lreturns <- gmcmapply(mvars, myfunc)
### concatenated output (like mapply)
lreturns <- gmcmapply(mvars, myfunc, SIMPLIFY = TRUE)
## N.B. This is equivalent to running:
lreturns <- c()
for(l1 in 1:10){
for(l2 in 1:5){
for(l3 in letters[1:3]){
lreturns <- c(lreturns,myfunc(l1,l2,l3))
}
}
}
### concatenated output run on 2 cores.
lreturns <- gmcmapply(mvars, myfunc, SIMPLIFY = TRUE, mc.cores = 2)
# Example 2: pass non-default args to FUN.
## Since the apply functions don't accept full calls as inputs (calls are internal), the user can pass arguments to FUN through dots, which can overwrite a default option for FUN.
# e.g. apply(x,1,FUN) works and apply(x,1,FUN(arg_to_change= not_default)) does not; the correct way to specify non-default/additional args to FUN is:
# gmcmapply(mvars, FUN, arg_to_change = not_default)
## update myfunc to have a default argument
myfunc <- function(rep_letters = 3, ...){
return_me <- paste(rep(l3, rep_letters), l1^2 + l2, sep = "_")
return(return_me)
}
lreturns <- gmcmapply(mvars, myfunc, rep_letters = 1)
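The core of the approach above (expand the named list into a grid, then mapply over its columns) can be sketched without the formals rewriting and without parallelism. This is a simplified illustration, not gmcmapply itself: grid_apply is a hypothetical name, and FUN must declare its arguments explicitly with the same names as the list.

```r
grid_apply <- function(FUN, vars) {
  # One row per combination; the first variable varies fastest.
  grid <- expand.grid(vars, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
  # Each grid column is passed to mapply as a named argument of FUN.
  do.call(mapply, c(list(FUN = FUN), as.list(grid),
                    list(SIMPLIFY = FALSE, USE.NAMES = FALSE)))
}

out <- grid_apply(
  function(l1, l2, l3) paste(l3, l1^2 + l2, sep = "_"),
  list(l1 = 1:2, l2 = 1:2, l3 = c("a", "b"))
)
unlist(out)
```

Matching grid columns to arguments by name is what lets the nested for loops collapse into a single call; swapping mapply for parallel::mcmapply would be the step toward the parallel version.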
A bit of additional functionality I would like to add, but am still trying to work out:
Cleaning up the output to be a pretty nested list with the names of mvars (normally, I'd create multiple lists within a nested for loop and tag lower-level lists onto higher-level lists all the way up until all layers of the gigantic nested loop were done). I think some abstracted variant of the solution provided here will work, but I haven't figured out how to make the solution flexible to the number of columns in the expand.grid-ed data.frame.
An option to log the outputs of the child processes that get called in mcmapply to a user-specified directory, so you could look at .txt outputs from every combination of variables generated by expand.grid (e.g. if the user prints model summaries or status messages as part of FUN, as I often do). I think a feasible solution is to use the substitute() and body() functions, described here, to edit FUN to open a sink() at the beginning of FUN and close it at the end if the user specifies a directory to write to. Right now I just program it right into FUN itself, but later it would be nice to pass gmcmapply an argument called something like log_children = "path_to_log_dir" and have it edit the body of the function to call (pseudocode) sink(file = file.path(log_children, paste0(paste(names(mvars), sep = "_"), ".txt")))
Let me know what you think!
-Nate
