I can't seem to make the apply function access or modify a variable that is declared outside it... what gives?
x = data.frame(age=c(11,12,13), weight=c(100,105,110))
x
testme <- function(df) {
i <- 0
apply(df, 1, function(x) {
age <- x[1]
weight <- x[2]
cat(sprintf("age=%d, weight=%d\n", age, weight))
i <- i+1 #this could not access the i variable in outer scope
z <- z+1 #this could not access the global variable
})
cat(sprintf("i=%d\n", i))
i
}
z <- 0
y <- testme(x)
cat(sprintf("y=%d, z=%d\n", y, z))
Results:
age=11, weight=100
age=12, weight=105
age=13, weight=110
i=0
y=0, z=0
Using the <<- operator you can write to variables in outer scopes:
x = data.frame(age=c(11,12,13), weight=c(100,105,110))
x
testme <- function(df) {
i <- 0
apply(df, 1, function(x) {
age <- x[1]
weight <- x[2]
cat(sprintf("age=%d, weight=%d\n", age, weight))
i <<- i+1 #updates i in the enclosing function testme
z <<- z+1 #updates the global variable z
})
cat(sprintf("i=%d\n", i))
i
}
z <- 0
y <- testme(x)
cat(sprintf("y=%d, z=%d\n", y, z))
The result here:
age=11, weight=100
age=12, weight=105
age=13, weight=110
i=3
y=3, z=3
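Note that using <<- is risky because it bypasses normal scoping: it assigns to the nearest enclosing environment where the variable already exists (and creates a global if none is found), so a function can silently change state outside itself. Do this only if really necessary, and if you do, document that behavior clearly (at least in bigger scripts).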
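If only the final count is needed, one way to avoid <<- altogether is to let the inner function return a value and combine the results afterwards. A minimal sketch, assuming the same data frame as above (count_rows is a made-up name, not from the original post):

```r
x <- data.frame(age = c(11, 12, 13), weight = c(100, 105, 110))

count_rows <- function(df) {
  # Each call returns 1L; the caller sums the results instead of mutating a counter
  hits <- apply(df, 1, function(row) {
    cat(sprintf("age=%d, weight=%d\n", row[1], row[2]))
    1L
  })
  sum(hits)
}

n <- count_rows(x)
n
```

Nothing outside count_rows is touched here, so the function can be called repeatedly without resetting any counter.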
Try the following inside your apply. Experiment with the value of n; I believe that for i it should be one less than for z.
assign("i", i+1, envir=parent.frame(n=2))
assign("z", z+1, envir=parent.frame(n=3))
testme <- function(df) {
i <- 0
apply(df, 1, function(x) {
age <- x[1]
weight <- x[2]
cat(sprintf("age=%d, weight=%d\n", age, weight))
## ADDED THESE LINES
assign("i", i+1, envir=parent.frame(2))
assign("z", z+1, envir=parent.frame(3))
})
cat(sprintf("i=%d\n", i))
i
}
OUTPUT
> z <- 0
> y <- testme(x)
age=11, weight=100
age=12, weight=105
age=13, weight=110
i=3
> cat(sprintf("y=%d, z=%d\n", y, z))
y=3, z=3
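Hard-coded depths such as parent.frame(2) depend on how the function happens to be called, so they can shift when the call chain changes. A sketch of one alternative that captures the target environment explicitly instead (testme_env, env and wrapper are illustrative names, not part of the answer above):

```r
x <- data.frame(age = c(11, 12, 13), weight = c(100, 105, 110))

testme_env <- function(df) {
  i <- 0
  env <- environment()  # the frame of this call to testme_env
  apply(df, 1, function(row) {
    # Read and write i in the captured environment, whatever the call depth
    assign("i", get("i", envir = env) + 1, envir = env)
  })
  i
}

wrapper <- function(df) testme_env(df)  # an extra call level changes nothing
y <- wrapper(x)
y
```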
Related
I have a function f(x, y) that returns a list of 8 logical vectors, where x and y are integers. I want to populate a three-dimensional array M so that M[x, y, z] is the number of TRUEs in the zth element of f(x, y). I can do this with nested for loops, but I know those are frowned upon in R. I think there's a more elegant way, using either outer or rbind and sapply but I can't figure it out. Here's my code with the nested for loops:
M <- array(dim=c(150, 200, 8))
for(j in 1:150) {
for(k in 1:200) {
rsu <- f(j, k)
for(z in 1:8) {
M[j, k, z] <- sum(rsu[[z]])
}
}
}
What is a more efficient/elegant way of populating this array that gives the same result?
Edited to add: For purposes of this question, treat f as a black box. In reality it involves various calculations and lookups about eight different satellites, but here's a dummy function that will generate some data for this example:
is.prime <- function(n) n == 2L || all(n %% 2L:ceiling(sqrt(n)) != 0)
#source for is.prime function:
# https://stackoverflow.com/questions/19767408/prime-number-function-in-r
f <- function(x,y) {
retlist <- list()
retlist[[1]] <- c(FALSE, FALSE, rep(TRUE, x))
retlist[[2]] <- c(TRUE, TRUE, rep(FALSE, y), rep(TRUE, y))
retlist[[3]] <- c(is.prime(x), is.prime(y), is.prime(x+y), is.prime(x+y+3), sapply(x:(2*(x+y)), is.prime))
retlist[[4]] <- c(x+y %% 5 == 0, x*y %% 6 ==0)
retlist[[5]] <- retlist[[(x+y) %% 4 + 1]]
retlist[[6]] <- retlist[[y %% 4 + 1]]
retlist[[7]] <- retlist[[x %% 6 + 1]]
retlist[[8]] <- sapply(abs(x-y):(7L*x+y+1), is.prime)
return(retlist)
}
Here's how to populate the array, giving the same results, using nested functions and sapply instead of for:
f2 <- function(x,y) {
rsu <- f(x,y)
values <- vapply(1:8, FUN=function(z) sum(rsu[[z]]), FUN.VALUE=1L)
}
f3 <- function(x) array(data=t(sapply(1:200, FUN=function(w) f2(x,w))), dim=c(1,200,8))
M2 <- array(data=t(sapply(1:150, FUN=f3)), dim=c(150,200,8))
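Another way to express the same idea is to compute one column of counts per (x, y) pair with mapply and then reorder the dimensions with aperm. This is only a sketch: toy_f is a hypothetical stand-in for f with 3 list elements, and the small sizes nx, ny, nz are assumptions chosen so the result can be checked against the nested loops quickly.

```r
# toy_f is a hypothetical stand-in for f: it returns a list of 3 logical vectors
toy_f <- function(x, y) {
  list(rep(TRUE, x), rep(c(TRUE, FALSE), y), (x + y) %% 2 == 0)
}
nx <- 4; ny <- 5; nz <- 3

# Reference result from the nested loops
M_loop <- array(NA_integer_, dim = c(nx, ny, nz))
for (j in seq_len(nx)) {
  for (k in seq_len(ny)) {
    M_loop[j, k, ] <- vapply(toy_f(j, k), sum, integer(1))
  }
}

# One column of counts per (x, y) pair; expand.grid varies x fastest
grid <- expand.grid(x = seq_len(nx), y = seq_len(ny))
counts <- mapply(function(x, y) vapply(toy_f(x, y), sum, integer(1)),
                 grid$x, grid$y)  # nz x (nx * ny) matrix

# Reshape to [z, x, y], then reorder the dimensions to [x, y, z]
M_map <- aperm(array(counts, dim = c(nz, nx, ny)), c(2, 3, 1))

all(M_loop == M_map)
```

Whether this is any faster than the loops depends almost entirely on the cost of f itself; the main gain is that the array is built without index bookkeeping.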
Here's how to do it with outer. But it's unintuitive; the matrix data are assigned within the function. I don't understand why I need to invoke Vectorize(f2) here instead of simply f2.
M2 <- array(dim=c(150, 200, 8))
f2 <- function(x, y) {
rsu <- f(x, y)
M2[x, y, ] <<- vapply(1:8, FUN=function(z) sum(rsu[[z]]), FUN.VALUE=1L)
return(0L)
}
ABC <- outer(1:150, 1:200, Vectorize(f2))
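On the Vectorize question: outer does not call its function once per cell. It calls it a single time with two long vectors holding every (x, y) combination and expects a result of the same length, so a function written for scalar x and y has to be wrapped. A small sketch that makes this visible (spy and scalar_only are made-up helper names):

```r
seen_length <- NA_integer_
spy <- function(x, y) {
  seen_length <<- length(x)  # record how many values arrive in one call
  x + y
}
m <- outer(1:2, 1:3, spy)
seen_length  # all 2 * 3 combinations arrive in a single call

# A function that collapses its inputs to one number breaks outer's reshaping...
scalar_only <- function(x, y) sum(c(x, y))
failed <- inherits(try(outer(1:2, 1:3, scalar_only), silent = TRUE), "try-error")

# ...while Vectorize() loops over the pairs (via mapply) and returns one value each
ok <- outer(1:2, 1:3, Vectorize(scalar_only))
```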
I have a large loop that will take too long (~100 days). I'm hoping to speed it up with the snow library, but I'm not great with apply statements. This is only part of the loop, but if I can figure this part out, the rest should be straightforward. I'm ok with a bunch of apply statements or loops, but one apply statement using a function to get object 'p' would be ideal.
Original data
dim(m1) == x x # x >>> 0
dim(m2) == y x # y >>> 0, y > x, y > x-10
dim(mout) == x x
thresh == x-10 #specific to my data, actual number probably unimportant
len(v1) == y #each element is a random integer, min==1, max==thresh
len(v2) == y #each element is a random integer, min==1, max==thresh
Original loop
p <- rep(NA,y)
for (k in 1:y){
mout <- m1 * matrix(m2[k,],x,x)
mout <- mout/sum(mout)
if (v1[k] < thresh + 1){
if(v2[k] < thresh + 1){
p[k] <- mout[v1[k],v2[k]]
}
if(v2[k] > thresh){
p[k] <- sum(mout[v1[k],(thresh+1):x])
}
}
#do stuff with object 'p'
}
library(snow)
dostuff <- function(k){
#contents of for-loop
mout <- m1 * matrix(m2[k,],x,x)
mout <- mout/sum(mout)
if (v1[k] < thresh + 1){
if(v2[k] < thresh + 1){
p <- mout[v1[k],v2[k]]
}
if(v2[k] > thresh){
p <- sum(mout[v1[k],(thresh+1):x])
}
}
#etc etc
return(list(p,
other_vars))
}
exports = c('m1',
'm2',
'thresh',
'v1',
'x' ,
'v2')
cl = makeSOCKcluster(4)
clusterExport(cl,exports)
loop <- as.array(1:y)
out <- parApply(cl,loop,1,dostuff)
p <- rep(NA,y)
for(k in 1:y){
p[k] <- out[[k]][[1]]
other_vars[k] <- out[[k]][[2]]
}
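Before moving to a cluster it can help to check the per-iteration function sequentially on small data. The sketch below uses made-up sizes and random inputs (assumptions, not the real data), lets v2 exceed thresh so that both branches are exercised, and returns NA when v1[k] is above thresh, a case the loop as posted leaves unset. compute_p is an illustrative name.

```r
set.seed(1)
x <- 12; y <- 20; thresh <- x - 10          # toy sizes, assumed for illustration
m1 <- matrix(runif(x * x), x, x)
m2 <- matrix(runif(y * x), y, x)
v1 <- sample(thresh, y, replace = TRUE)     # values in 1..thresh
v2 <- sample(x, y, replace = TRUE)          # may exceed thresh in this sketch

compute_p <- function(k) {
  mout <- m1 * matrix(m2[k, ], x, x)
  mout <- mout / sum(mout)
  if (v1[k] > thresh) return(NA_real_)      # no branch applies
  if (v2[k] <= thresh) {
    mout[v1[k], v2[k]]
  } else {
    sum(mout[v1[k], (thresh + 1):x])
  }
}

# Sequential run; vapply returns a plain numeric vector, so no unpacking loop is needed
p <- vapply(seq_len(y), compute_p, numeric(1))
```

Once this gives the expected values, the same function could be passed to parSapply(cl, seq_len(y), compute_p) after exporting the data to the workers; that step is not run here.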
I am experimenting with R and I can't find a way to do the following:
1- If x == 3, multiply x by the "y" value of the same row
2- Add up all the products computed in step 1.
x <- 3426278722533992028364647392927338
y <- 7479550949037487987438746984798374
x <- as.numeric(strsplit(as.character(x), "")[[1]])
y <- as.numeric(strsplit(as.character(y), "")[[1]])
Table <- table(x,y)
Table <- data.frame(Table)
Table$Freq <- NULL
So I tried creating a function:
Calculation <- function (x,y) {
z <- if(x == 3){ x * y }
w <- sum(z)
}
x and y are the columns of the data.frame
This prints an error which I struggle to solve...
Thanks for your time,
Kylian Pattje
2 things here:
1. Use ifelse in your function,
Calculation <- function (x,y) {
z <- ifelse(x == 3, x * y, NA)
w <- sum(z, na.rm = TRUE)
return(w)
}
2. Make sure your variables are NOT factors,
Table[] <- lapply(Table, function(i) as.numeric(as.character(i)))
Calculation(Table$x, Table$y)
#[1] 84
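The same total can also be obtained without ifelse by selecting the matching rows with a logical index. A minimal sketch on made-up vectors (x_demo and y_demo are not the data from the question):

```r
x_demo <- c(3, 2, 3, 5)
y_demo <- c(4, 7, 0, 9)

keep <- x_demo == 3                      # TRUE where the condition holds
total <- sum(x_demo[keep] * y_demo[keep])
total                                    # 3*4 + 3*0 = 12

# ifelse-based version from the answer, for comparison
total_ifelse <- sum(ifelse(x_demo == 3, x_demo * y_demo, NA), na.rm = TRUE)
```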
This calculates the log of (x-1)! to return the lgamma(x) value of an integer, but my function log_gamma only works up to x = 171; for x > 171 it returns Inf. How can I solve this problem?
log_gamma <- function(x){
y <- 1
if (x < 1)(
return("Infinity")
)
if (x == 1)(
return(0)
)
x <- x-1
for (i in 1:x){
y <- y*i
}
return(log(y))
}
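Your current solution first computes (x-1)!, and 171! is already larger than the biggest double R can represent (about 1.8e308), so y overflows to Inf before the log is taken. Instead, use the fact that log(a*b) = log(a) + log(b) to compute this as a sum.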
log_gamma <- function(x){
y <- 0 #start the sum at 0, since log(1) = 0
if (x < 1){
return("Infinity")
}
if (x == 1){
return(0)
}
x <- x-1
for (i in 1:x){
y <- y + log(i) #accumulate log(i) instead of multiplying
}
return(y)
}
log_gamma(171)
[1] 706.5731
log_gamma(172)
[1] 711.7147
log_gamma(1000)
[1] 5905.22
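A quick way to check a hand-written version is to compare it with R's built-in lgamma. The sketch below is a vectorized variant of the same sum-of-logs idea; log_gamma_sum is a made-up name, and it assumes x is a positive whole number.

```r
log_gamma_sum <- function(x) {
  stopifnot(x >= 1, x == round(x))   # positive integers only in this sketch
  sum(log(seq_len(x - 1)))           # log((x-1)!) = log(1) + ... + log(x-1)
}

# Compare against the built-in for a few values, including ones past the overflow point
vals <- c(1, 2, 171, 172, 1000)
mine <- vapply(vals, log_gamma_sum, numeric(1))
agree <- isTRUE(all.equal(mine, lgamma(vals)))
agree
```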
I want to create a function that takes 3 arguments: a list of values and two cutoff values (a high and a low). Then I want it to count how many of the values in the list are within the range of the two cutoff values.
So far I have tried:
count <- function(y, x1, x2){
tmp1 <- length(y)
tmp2 <- length(y>x1)
tmp3 <- length(tmp2<=x2)
return(tmp3)
}
and
count <- function(y, x1, x2){
results <- list()
for (i in y) {
if(y > x1 & y <= x2) {
results <- results+1
}
}
return(results)
}
Neither of them works. Can someone help me correct my code?
Simplify it down. Take the sum of a vectorized logical operation:
f <- function(x, y, z) sum(x > y & x < z)
f(1:10, 3, 7)
# [1] 3
But the data.table authors are one step ahead of you. They've written a function between(). I believe there is also one in the dplyr package.
library(data.table)
between
# function (x, lower, upper, incbounds = TRUE)
# {
# if (incbounds)
# x >= lower & x <= upper
# else x > lower & x < upper
# }
# <bytecode: 0x44fc790>
# <environment: namespace:data.table>
So for the same result as above you can simply do
sum(between(1:10, 3, 7, FALSE))
# [1] 3
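For completeness, here is a base R sketch that follows the bounds used in the question's second attempt (strictly above the low cutoff, up to and including the high one). count_in_range is a made-up name, and unlist is only there on the assumption that the values may arrive as a list. The last line shows why the first attempt failed: length() of a comparison counts all elements, not just the TRUE ones.

```r
count_in_range <- function(values, low, high) {
  v <- unlist(values)            # accept a list or a plain vector
  sum(v > low & v <= high)       # TRUE counts as 1, FALSE as 0
}

n_vec  <- count_in_range(1:10, 3, 7)            # 4, 5, 6, 7 are in range
n_list <- count_in_range(as.list(1:10), 3, 7)

n_all <- length(1:10 > 3)        # 10: every element is counted, TRUE or FALSE
```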