Performance suggestions in R

I have this piece of code:
library("GO.db")
# lookParents() relies on the globals yy, m, index, go_freq, go_freq_all and go1_freq2
lookParents <- function(x) {
  parents <- subset(yy[x][[1]], labels(yy[x][[1]])=="is_a")
  for (parent in parents) {
    m[index,1] <<- Term(x)
    m[index,2] <<- Term(parent)
    m[index,3] <<- -log2(go_freq[x,1]/go_freq_all)
    m[index,4] <<- log2(go1_freq2[x])
    m[index,5] <<- x
    m[index,6] <<- parent
    index <<- index + 1
  }
  if (is.null(parents)) {
    return(c())
  } else {
    return(parents)
  }
}
getTreeMap <- function(GOlist, xx, m) {
  # the xx and m arguments are not used: lookParents() reads yy and writes the global m via <<-
  print(paste("Input list has",length(GOlist), "terms", sep=" "))
  count <- 1
  for (go in GOlist) {
    parents <- lookParents(go)
    if (count %% 100 == 0) {
      print(count)
    }
    while (length(parents) != 0) {
      x <- parents[1]
      parents <- parents[-1]
      parents <- c(lookParents(x), parents)
    }
    count <- count + 1
  }
}
xx <- c(as.list(GOBPANCESTOR), as.list(GOCCANCESTOR), as.list(GOMFANCESTOR))
go1_freq2 <- table(as.character(unlist(xx[go1])))
xx <- c(as.list(GOBPPARENTS), as.list(GOCCPARENTS), as.list(GOMFPARENTS))
m <- as.data.frame(matrix(nrow=1,ncol=6))
m[1,] <- c("all", "null", 0, 0, "null","null")
##biological processes
index <- 2
getTreeMap(BP, xx, m)
But it is really slow. BP is simply a vector. Do you have any performance suggestions I could apply? I would like to make it run faster, but that's all I can do at the moment.

I suggest the following improvements:
- add your functions to Rprofile.site and byte-compile them with cmpfun() from the compiler package
- use foreach with %dopar% instead of a plain for loop
- always delete the variables you no longer need with rm() and then call the garbage collector with gc()
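A rough sketch of two ideas, under stated assumptions. The first is the byte compilation suggested above, via compiler::cmpfun(). The second is a different technique that is not in the list: the (child, parent) edges are collected in a list and bound into one data frame at the end. The original writes into a global data frame one cell at a time with <<- while m grows row by row, which is likely a large part of its cost. The object parents_map is a hypothetical toy stand-in for the GO.db parents list, and collect_edges() is a hypothetical helper name. Since R 3.4.0 closures are JIT-compiled by default, so cmpfun() may make little difference on recent R versions.

```r
library(compiler)  # ships with R; provides cmpfun()

# Hypothetical toy parent map standing in for the GO.db parents list:
# each element is a character vector of parents, named by relationship
parents_map <- list(
  A = c(is_a = "B", is_a = "C"),
  B = c(is_a = "D", part_of = "E"),
  C = c(is_a = "D"),
  D = character(0),
  E = character(0)
)

# Collect every (child, "is_a" parent) edge reachable from the given terms.
# Edges go into a list and are bound into one data frame at the end, instead
# of assigning into a global data frame one cell at a time with <<-.
collect_edges <- function(terms, pmap) {
  edges <- list()
  for (term in terms) {
    queue <- term
    while (length(queue) > 0) {
      x <- queue[1]
      queue <- queue[-1]
      ps <- pmap[[x]]
      ps <- unname(ps[names(ps) == "is_a"])   # keep only "is_a" parents
      if (length(ps) > 0) {
        edges[[length(edges) + 1]] <- data.frame(child = x, parent = ps,
                                                 stringsAsFactors = FALSE)
        queue <- c(ps, queue)                 # depth-first, as in the question
      }
    }
  }
  do.call(rbind, edges)                       # a single bind at the end
}

collect_edges_c <- cmpfun(collect_edges)      # byte-compiled copy
res <- collect_edges_c("A", parents_map)
print(res)
```

With the real data, the extra columns from the question (term names, frequencies) would then be added to res in one pass afterwards, rather than inside the traversal.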

Related

capturing R matrix list

The piece of code below generates what I need, but I am not able to store it so that I can use it further.
In the case below, I want to store each player's hand in a list of matrices p such that p[[i]] <- deck2[smpl,].
The second thing I want is to save and use the final matrix deck2 (e.g. with 10 players, it will be a 29-row matrix). I can see NROW(deck2) as 29, but the assignment d <- deck2 is not happening. What am I missing here?
deck2=matrix(c(rep( c(2:10,"J","Q","K","A"),4),rep(c("C","D","H","S"),rep(13,4))), ncol=2,dimnames=list(NULL,c("rank","suit")))
player_hands=function(players)
{
  if(players >= 2 && players <= 10) {
    for(i in 1:players)
    {
      smpl <- sample(1:NROW(deck2),2,replace=F)
      r <- deck2[smpl,]
      p <- deck2[smpl,]
      deck2 <- deck2[-smpl,]
      print(r)
      if(i==players)
      {
        smpl <- sample(1:NROW(deck2),3,replace=F)
        r <- deck2[smpl,]
        p <- deck2[smpl,]
        deck2 <- deck2[-smpl,]
        print("Dealer Hand")
        print(r)
      }
      else i=i+1
    }
  }
  else print("Invalid No. of Players")
}
I believe this should do what you want. It will return a list containing two items.
The first of these two items is the list of hands p, of which the last one will be the dealer's hand.
The second of the two items it returns will be the new deck2.
player_hands=function(players)
{
  if(players >= 2 && players <= 10) {
    p = list()
    for(i in 1:players)
    {
      smpl <- sample(1:NROW(deck2),2,replace=F)
      r <- deck2[smpl,]
      p[[i]] = r
      # deck2 here is a local copy; it is returned below so the caller can keep it
      deck2 <- deck2[-smpl,]
      if(i==players)
      {
        smpl <- sample(1:NROW(deck2),3,replace=F)
        r <- deck2[smpl,]
        p[[players+1]] <- r
        deck2 <- deck2[-smpl,]
      }
    }
    return(list(p, deck2))
  }
  else print("Invalid No. of Players")
}
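A small usage sketch of capturing both return values. It is written as a self-contained variant, under the hypothetical name deal(), that takes the deck as an argument and returns a named list (hands, deck) so the pieces can be pulled out by name; otherwise the dealing logic follows the answer above.

```r
# Build the same 52-card deck as in the question
deck2 <- matrix(c(rep(c(2:10, "J", "Q", "K", "A"), 4),
                  rep(c("C", "D", "H", "S"), rep(13, 4))),
                ncol = 2, dimnames = list(NULL, c("rank", "suit")))

# Hypothetical variant of player_hands(): the deck is passed in and the
# result is a named list, so nothing depends on modifying a global
deal <- function(deck, players) {
  stopifnot(players >= 2, players <= 10)
  hands <- vector("list", players + 1)
  for (i in seq_len(players)) {
    smpl <- sample(nrow(deck), 2)
    hands[[i]] <- deck[smpl, ]
    deck <- deck[-smpl, ]
  }
  smpl <- sample(nrow(deck), 3)
  hands[[players + 1]] <- deck[smpl, ]   # dealer's hand
  deck <- deck[-smpl, ]
  names(hands) <- c(paste0("player", seq_len(players)), "dealer")
  list(hands = hands, deck = deck)
}

set.seed(1)
res <- deal(deck2, 10)
p <- res$hands      # list of 11 matrices (10 players + dealer)
deck2 <- res$deck   # remaining deck: 52 - 2*10 - 3 = 29 rows
```

The key point is the same as in the answer: the function's changes to the deck only survive if the caller assigns the returned value.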

looping through a matrix with a function

I'd like to perform this function on a matrix 100 times. How can I do this?
v = 1
m <- matrix(0,10,10)
rad <- function(x) {
  idx <- sample(length(x), size=1)
  flip = sample(0:1,1,rep=T)
  if(flip == 1) {
    x[idx] <- x[idx] + v
  } else if(flip == 0) {
    x[idx] <- x[idx] - v
    return(x)
  }
}
This is what I have so far, but it doesn't work.
for (i in 1:100) {
  rad(m)
}
I also tried this, which seemed to work, but for some reason it gave me an output of about 5226 rows. The output should just be a 10x10 matrix whose values have changed according to the conditions in the function.
reps <- unlist(lapply(seq_len(100), function(x) rad(m)))
Ok I think I got it.
The return statement in your function is only inside one branch of the if statement, so the function returns the matrix only about 50% of the time; in the other cases it returns the value of the last assignment (a single number) instead. You should change the function to this:
rad <- function(x) {
  idx <- sample(length(x), size=1)
  flip = sample(0:1,1,rep=T)
  if(flip == 1) {
    x[idx] <- x[idx] + v
  } else if(flip == 0) {
    x[idx] <- x[idx] - v
  }
  return(x)
}
Then you can do:
n <- 100  # number of iterations
for (i in 1:n) {
  m <- rad(m)
}
Note that this is semantically equal to:
for (i in 1:n) {
  tmp <- rad(m)  # return a modified version of m (m is not changed yet)
                 # and put it into tmp
  m <- tmp       # set m equal to tmp, then in the next iteration we will
                 # start from a modified m
}
Running rad(m) does not change m.
Why?
The function works on a local copy of the matrix m, and that copy disappears when the function returns.
So you need to store what the function returns.
As @digEmAll wrote, the right code is:
for (i in 1:100) {
  m <- rad(m)
}
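A minimal illustration of the copy-on-modify behaviour described above, using a hypothetical helper set_first() that changes one element of its argument:

```r
set_first <- function(x) {
  x[1] <- 99   # modifies the function's local copy only
  x
}

m <- matrix(0, 2, 2)
set_first(m)          # the modified copy is printed, but not stored
print(m[1])           # [1] 0  (m itself is unchanged)
m <- set_first(m)     # store the returned copy
print(m[1])           # [1] 99
```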
You don't need a loop here. The whole operation can be vectorized.
v <- 1
m <- matrix(0,10,10)
n <- 100 # number of random replacements
idx <- sample(length(m), n, replace = TRUE) # indices
flip <- sample(c(-1, 1), n, replace = TRUE) # subtract or add
newVal <- aggregate(v * flip ~ idx, FUN = sum) # calculate new values for indices
m[newVal[[1]]] <- m[newVal[[1]]] + newVal[[2]] # add new values
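The aggregation step matters because idx usually contains repeated cells. A sketch of the same vectorized idea using base tabulate() instead of aggregate() (a swapped-in function, not the answer's), alongside the naive indexed update that loses repeated hits:

```r
v <- 1
m <- matrix(0, 10, 10)
n <- 100
set.seed(42)
idx  <- sample(length(m), n, replace = TRUE)
flip <- sample(c(-1, 1), n, replace = TRUE)

# Naive indexed update: when an index appears more than once in idx,
# the last assignment wins, so earlier hits on that cell are lost
m_naive <- m
m_naive[idx] <- m_naive[idx] + v * flip

# tabulate() counts hits per cell (1..length(m)); subtracting the "-1" counts
# from the "+1" counts gives the net change for every cell, duplicates included
delta <- tabulate(idx[flip == 1], nbins = length(m)) -
         tabulate(idx[flip == -1], nbins = length(m))
m_vec <- m + v * matrix(delta, nrow(m), ncol(m))
```

m_vec matches what the corrected loop would produce for the same idx and flip.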

Calculate recursively `log(log(log(134)))`

I want to recursively count the log cycles in my function:
logCounter <- function(number) {
  k <- 0
  if(k>=0){
    k = k+1
  }
  result <- log(number)
  if (result > 1) {
    logCounter(result)
  } else {
    return(k)
  }
}
logCounter(123) # returns 3 because log(log(log(123))) < 1
However, my counter k does not work as I expected. I would really appreciate your help!
You don't need to use Recall. Try this:
logCounter <- function(number) {
  if (number < 1) return(0)  # numbers below 1 need no log steps
  result <- log(number)
  if (result > 1) return(logCounter(result) + 1)
  return(1)
}
The key is to compose your function so that it doesn't need to store intermediate results: each call simply returns 1 plus whatever the recursive call returns.
You could do this much more easily with a while loop instead of calling the function recursively:
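If you want to keep a counter like the k in the question, a sketch of the usual fix is to pass it down as an argument (accumulator-style recursion), so it is not reset to 0 on every call:

```r
logCounter <- function(number, k = 0) {
  result <- log(number)
  if (result > 1) {
    logCounter(result, k + 1)   # pass the incremented count down
  } else {
    k + 1                       # this log() call was the last one
  }
}

logCounter(123)  # 3
```

Note that, like the question's version, this returns 1 for inputs below e, whereas the version above returns 0 for inputs below 1.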
logCounter <- function(number) {
  k <- 0
  result <- number
  while(result>1){
    k <- k + 1
    result <- log(result)
  }
  return(k)
}
> logCounter(123)
[1] 3
EDIT: If you need to use recursion, consider the Recall function:
logCounter <- function(number, iter=1) {
  if(log(number)>1)
    out <- Recall(log(number), iter+1)
  else
    out <- list(log(number),iter)
  return(out)
}
> logCounter(123)
[[1]]
[1] 0.4518085
[[2]]
[1] 3

specify dynamic array indexes programmatically

I'd like to generalize this code to handle an array with any number of dimensions, but I'm not sure how to specify array indexes programmatically. I think it's possible with some combination of paste, eval, substitute, and quote, but I can't figure it out.
x <- array(runif(1000),dim=c(10,10,10))
w <- vector("list")
for (i in seq(dim(x)[1]))
{
  w[i] <- list(which(x[i,,] == max(x[i,,]),arr.ind=TRUE))
}
for (i in seq(dim(x)[1]))
{
  # looking for something like:
  # s <- paste(i,",",paste(w[[i]],collapse=","),sep="")
  # v <- x[s]
  v <- x[i,w[[i]][[1]],w[[i]][[2]]]
  print(paste("the max at index",i,"is",round(v,4)),quote=FALSE)
}
Never mind, I figured out how to do it:
x <- array(runif(1000),dim=c(10,10,10))
w <- vector("list")
for (i in seq(dim(x)[1]))
{
  w[i] <- list(which(x[i,,] == max(x[i,,]),arr.ind=TRUE))
}
for (i in seq(dim(x)[1]))
{
  # do.call builds the call x[i, row, col] from a list of index arguments
  v <- do.call(`[`,c(list(x,i),w[[i]]))
  print(paste("the max at index",i,"is",round(v,4)),quote=FALSE)
}
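A sketch of a version that works for any number of dimensions without do.call, using base R matrix indexing (a technique swapped in here, not the poster's): when an array is indexed by a matrix with one column per dimension, each row selects one element. The function name max_per_slice() is hypothetical.

```r
max_per_slice <- function(x) {
  d <- dim(x)
  # which.max on each slice x[i, , ...] gives a linear index within that slice
  lin <- apply(x, 1, which.max)
  # convert the linear indices to subscripts over the remaining dimensions
  sub <- arrayInd(lin, d[-1])
  # prepend the slice number: one row per slice, one column per dimension
  idx <- cbind(seq_len(d[1]), sub)
  list(index = idx, value = x[idx])   # matrix indexing picks one element per row
}

set.seed(1)
x <- array(runif(1000), dim = c(10, 10, 10))
res <- max_per_slice(x)
for (i in seq_len(nrow(res$index))) {
  print(paste("the max at index", i, "is", round(res$value[i], 4)), quote = FALSE)
}
```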

breaking out of for loop when running a function inside a for loop in R

Suppose you have the following function foo. When I'm running a for loop, I'd like it to skip the remainder of foo when y is 0. However, break doesn't work inside a function.
As currently written, I get the error message "no loop to break from, jumping to top level".
Any suggestions?
foo <- function(x) {
  y <- x-2
  if (y==0) {break} # how do I tell the for loop to skip this
  z <- y + 100
  z
}
for (i in 1:3) {
  print(foo(i))
}
Admittedly my R knowledge is sparse and this is drycoded, but something like the following should work:
foo <- function(x) {
  y <- x-2
  if (y==0) {return(NULL)} # return NULL then check for it
  z <- y + 100
  z
}
for (i in 1:3) {
  j <- foo(i)
  if(is.null(j)) {break}
  print(j)
}
Edit: updated null check for posterity
As a matter of coding practice, don't do this. Having a function that can only be used inside a particular loop is not a great idea. As a matter of educational interest, you can evaluate the 'break' in the parent environment.
foo <- function(x) {
  y <- x-2
  if (y==0) {eval.parent(parse(text="break"),1)}
  z <- y + 100
  z
}
for (i in 0:3) {
  print(foo(i))
}
Are we allowed to be a little more creative? Could you recast your problem to take advantage of the following approach, where the operation is based on vectors?
x <- 1:3
y <- x[x-2 < 0] - 2 + 100 # I'm leaving the "- 2" separate to highlight the parallel to your code
y
If, however, a deeper form underlies the question and we need to follow this pattern for now, perhaps tweak it just a bit...
foo <- function(x) {
  y <- x - 2
  if (y != 0) {
    z <- y + 100
    z
  } # else implicitly return value is NULL
}
for (i in 1:3) {
  if (is.numeric(result <- foo(i))) {
    print(result)
  } else {
    break
  }
}
An alternative way is to throw an error and catch it with try, like so:
foo <- function(x) {
  y <- x-2
  if (y==0) {stop("y==0")}
  z <- y + 100
  z
}
try(for (i in 0:5) {
  print(foo(i))
}, silent=TRUE)
## or use tryCatch:
for (i in 0:5) {
  bar <- tryCatch(foo(i),error=function(e) NA)
  if(is.na(bar)){ break } else { print(bar) }
}
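One caveat with catching every error is that genuine bugs inside foo() are silently treated as "stop the loop" too. A sketch of a narrower variant signals a condition of its own class and catches only that class; the class name foo_stop is a hypothetical choice.

```r
foo <- function(x) {
  y <- x - 2
  if (y == 0) {
    # signal a custom condition class instead of a plain error
    cond <- structure(class = c("foo_stop", "error", "condition"),
                      list(message = "y == 0", call = NULL))
    stop(cond)
  }
  y + 100
}

out <- c()
for (i in 0:5) {
  res <- tryCatch(foo(i), foo_stop = function(e) NULL)  # only foo_stop is caught
  if (is.null(res)) break
  out <- c(out, res)
}
print(out)  # [1] 98 99
```

Any other error raised inside foo() (for example a non-numeric argument) still propagates normally.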
I have no clue how R works, but I found the question interesting because it let me look up a new language's syntax, so excuse my answer if it is totally wrong :)
foo <- function(x) {
  y <- x-2
  if (y == 0) z <- NULL else z <- y + 100
  z
}
for (i in 1:3)
{
  a <- foo(i)
  if (is.null(a)) {next}  # a == NULL would give logical(0) and make if() fail
  print(a)
}
