I've got the following code in R:
func.time <- function(n){
times <- c()
for(i in 1:n){
r <- 1 # r is the room the mouse is in
X <- 0 #time, starting at 0
while(r != 5){
if(r == 1){
r <- sample(c(2,3),1) }
else if(r == 2){
r <- sample(c(1,3), 1) }
else if(r == 3){
r <- sample(c(1,2,4,5), 1) }
else if (r == 4){
r <- sample(c(3,5), 1) }
X <- X + 1
}
times <- c(X, times)
}
mean(times)
}
func.time(10000)
It works fine, but I've been told that using switch() can speed it up, since I've got so many if/else statements. I can't seem to get it to work, though. Any help is appreciated.
Edit
I've tried this:
func.time <- function(n) {
times <- c()
for(i in 1:n) {
r <- 1 # r is the room the mouse is in
X <- 0 #time, starting at 0
while(r != 5) {
switch(r, "1" = sample(c(2,3), 1),
"2" = sample(c(1,3), 1),
"3" = sample(c(1,2,4,5), 1),
"4" = sample(c(3,5)))
X <- X + 1
}
times <- c(X, times)
}
mean(times)
}
func.time(10000)
But it was a basic attempt, I'm not sure I've understood the switch() method properly.
I thought Dominic's assessment was very useful, but when I went to examine the edit it was being held up on what I thought was an incorrect basis. So I decided to just fix the code. When using a numeric argument to the EXPR parameter, you do not use the item=value formalism but rather just put in the expressions:
func.time <- function(n) {
  times <- c()
  for (i in 1:n) {
    r <- 1
    X <- 0
    while (r != 5) {
      r <- switch(r,
                  sample(c(2,3), 1),      # r=1
                  sample(c(1,3), 1),      # r=2
                  sample(c(1,2,4,5), 1),  # r=3
                  sample(c(3,5), 1))      # r=4
      X <- X + 1
    }
    times <- c(X, times)
  }
  mean(times)
}
func.time(1000)
#[1] 7.999
For another example of how to use switch with a numeric argument to EXPR, consider my answer to this question: R switch statement with varying outputs throwing error
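To make the positional behaviour concrete, here is a minimal sketch. The helper names next_room and room_label are made up for illustration and are not part of the question or the answer. It shows that with a numeric EXPR the alternative is picked by position and the result has to be assigned back, while with a character EXPR the alternatives are matched by name.

```r
# next_room() is a hypothetical helper wrapping the switch() call from the answer
next_room <- function(r) {
  switch(r,
         sample(c(2, 3), 1),        # chosen when r == 1
         sample(c(1, 3), 1),        # chosen when r == 2
         sample(c(1, 2, 4, 5), 1),  # chosen when r == 3
         sample(c(3, 5), 1))        # chosen when r == 4
}

r <- 1
r <- next_room(r)   # switch() only returns a value; it must be assigned back to r

# With a numeric EXPR that has no matching position, switch() returns NULL
no_match <- switch(4, "a", "b", "c")
is.null(no_match)

# With a character EXPR, alternatives are matched by name;
# an unnamed last alternative acts as the default
room_label <- function(r) {
  switch(as.character(r), "1" = "start", "5" = "exit", "other")
}
room_label(5)
room_label(3)
```

In the attempt from the question, the value of switch() was never assigned to r, so r stayed at 1 and the while loop could not end.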
Related
I am an R beginner and I tried to write a median function from scratch.
Here is my code:
mymedian <- function(x) {
len <- length(x)
sorted <- sort(x)
if (len %% 2 == 0) {
med1 <- sorted[len / 2]
med2 <- sorted[(len + 1) %/% 2]
result <- sorted[med1 + med2 / 2]
return(result)
} else {
result <- sorted[(len + 1)/2]
return(result)
}
}
I'm getting "NA" as output. I couldn't find where the problem is.
The main issue is that you're trying to index your sorted vector with a non-integer (e.g., 168.5) that is computed from the data values rather than from positions. Compare your function to this:
mymedian <- function(x){
len <- length(x)
sorted <-sort(x)
if(len%%2==0){
i <- len/2
med1<-sorted[i]
med2 <- sorted[i+1]
result <- sum(med1,med2)/2
return(result)
}else{
result<-sorted[(len+1)/2]
return(result)
}
}
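One way to check the corrected function is to compare it with base R's median() on an even-length and an odd-length input. The block below repeats the function so that it runs on its own; the test vectors are arbitrary choices made for this sketch.

```r
mymedian <- function(x) {
  len <- length(x)
  sorted <- sort(x)
  if (len %% 2 == 0) {
    i <- len / 2                          # position of the lower middle element
    (sorted[i] + sorted[i + 1]) / 2       # average of the two middle values
  } else {
    sorted[(len + 1) / 2]                 # single middle value
  }
}

even_x <- c(3, 1, 2, 4)       # arbitrary even-length input
odd_x  <- c(5, 1, 3)          # arbitrary odd-length input

even_ok <- isTRUE(all.equal(mymedian(even_x), median(even_x)))
odd_ok  <- isTRUE(all.equal(mymedian(odd_x), median(odd_x)))
c(even_ok, odd_ok)
```

Both comparisons should come out TRUE. If one does not, the indexing in the corresponding branch is the first place to look.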
I am working with a time-series raster brick. The brick has 365 layers representing a value for each day of the year.
I want to create a new layer in which each cell holds the number of day of year in which a certain condition is met.
My current approach is the following (APHRO being the raster brick), but returns the error message below:
r <- raster(ncol=40, nrow=20)
r[] <- rnorm(n=ncell(r))
APHRO <- brick(x=c(r, r*2, r))
NewLayer <- calc(APHRO, fun=FindOnsetDate(APHRO))
Returning this error:
Error in .local(x, ...) : not a valid subset
And the function being passed:
FindOnsetDate <- function (s) {
x=0
repeat {
x+1
if(s[[x]] >= 20 | s[[x]] + s[[x+1]] >= 20 & ChkFalseOnset() == FALSE)
{break}
}
return(x);
}
With the function for the 3rd condition being:
ChkFalseOnset <- function (x) {
for (i in 0:13){
if (sum(APHRO[[x+i:x+i+7]]) >= 5)
{return(FALSE); break}
return(TRUE)
}
}
Thank you in advance!!!!
And please let me know if I should provide more information - tried to keep it parsimonious.
The problem is that your function is no good: x+1 is never assigned back to x, so x stays at 0 and s[[0]] fails:
FindOnsetDate <- function (s) {
x=0
repeat {
x+1
if(s[[x]] >= 20 | s[[x]] + s[[x+1]] >= 20)
{break}
}
return(x);
}
FindOnsetDate(1:100)
#Error in s[[x]] :
# attempt to select less than one element in get1index <real>
Perhaps something like this:
FindOnsetDate <- function (s) {
j <- s + c(s[-1], 0)
sum(j > 20 | s > 20)
# if all values are positive, just do sum(j > 20)
}
FindOnsetDate(1:20)
#10
This works now:
r <- calc(APHRO, FindOnsetDate)
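The function above counts the days on which the condition holds. If what is wanted is instead the first day of the year on which it holds, a sketch along the following lines might serve. It assumes a plain numeric vector of daily values and the >= 20 thresholds from the question. FirstOnsetDay is a made-up name, and the false-onset check is left out.

```r
# Hypothetical helper: index of the first day meeting the condition, or NA
FirstOnsetDay <- function(s) {
  nxt <- c(s[-1], 0)                      # value of the following day (0 after the last day)
  hit <- which(s >= 20 | s + nxt >= 20)   # all days on which the condition holds
  if (length(hit) == 0) NA_integer_ else hit[1]
}

first_day <- FirstOnsetDay(c(1, 5, 12, 9, 25))  # day 3: 12 + 9 >= 20
no_day    <- FirstOnsetDay(c(1, 2, 3))          # condition never met
```

Returning NA when no day qualifies keeps the result at length one for every input, which a per-cell function generally needs.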
I would suggest a basic two-step process. With a 365-days example:
set.seed(123)
r <- raster(ncol=40, nrow=20)
r_list <- list()
for(i in 1:365){
r_list[[i]] <- setValues(r,rnorm(n=ncell(r),mean = 10,sd = 5))
}
APHRO <- brick(r_list)
Use a basic logic test for each iteration:
r_list2 <- list()
for(i in 1:365){
if(i != 365){
r_list2[[i]] <- APHRO[[i]] >= 20 | APHRO[[i]] + APHRO[[i+1]] >= 20
}else{
r_list2[[i]] <- APHRO[[i]] >= 20
}
}
Compute sum by year:
NewLayer <- calc(brick(r_list2), fun=sum)
plot(NewLayer)
I have a problem with the performance of my R code.
My code is very slow. I have to loop over a vector of 3000 elements, and on every iteration I call many functions.
I first tried parallelization, but it doesn't work, because in every step I need the results of the previous steps.
Now I have an idea: I would divide the vector into 3 pieces of 1000 elements and run the calculation for each piece by itself. On the first element of piece 1 and 2 I will have a problem, but I can handle it.
I would like each of the 3 pieces to be calculated by a separate CPU core.
I could make 3 .R files, start 3 R sessions (= 3 cores), and calculate it that way.
But I would like to do it in one file. I would like to specify that my first loop is calculated by core 1, and the other ones by the other cores.
Is it possible?
Thank you.
This is a simple example that describes my problem.
#Situation now
vec3000 <- rnorm(3000)
result3000 <- rep(NA, length(vec3000))
for (i in 1 : 3000){
if (i == 1){
result3000[i] <- vec3000[i]
}else{
result3000[i] <- result3000[i - 1] + vec3000[i]
}
}
#New Situation
vec1000_1 <- vec3000[1:1000]
vec1000_2 <- vec3000[1001:2000]
vec1000_3 <- vec3000[2001:3000]
result1000_1 <- rep(NA, 1000)
result1000_2 <- rep(NA, 1000)
result1000_3 <- rep(NA, 1000)
#Calculated by Core 1
for (i in 1 : 1000){
if (i == 1){
result1000_1[i] <- vec1000_1[i]
}else{
result1000_1[i] <- result1000_1[i - 1] + vec1000_1[i]
}
}
#Calculated by Core 2
for (i in 1 : 1000){
if (i == 1){
result1000_2[i] <- vec1000_2[i]
}else{
result1000_2[i] <- result1000_2[i - 1] + vec1000_2[i]
}
}
#Calculated by Core 3
for (i in 1 : 1000){
if (i == 1){
result1000_3[i] <- vec1000_3[i]
}else{
result1000_3[i] <- result1000_3[i - 1] + vec1000_3[i]
}
}
Here's an example using the foreach package that operates on vector chunks in parallel:
library(doParallel)
library(itertools)
nworkers <- 3
cl <- makePSOCKcluster(nworkers)
registerDoParallel(cl)
vec3000 <- rnorm(3000) # dummy input
# This computes "resvecs" which is a list of "nworkers" vectors
resvecs <- foreach(vec=isplitVector(vec3000, chunks=nworkers)) %dopar% {
result <- double(length=length(vec))
for (i in seq_along(result)) {
if (i == 1) {
result[i] <- vec[i]
} else {
result[i] <- result[i - 1] + vec[i]
}
}
result
}
This uses the "isplitVector" function from the itertools package to split "vec3000" into three chunks to make use of three cores. You can change the value of "nworkers" to control the number of cores that are used.
Note that I used the doParallel backend so the example would work on Windows, Mac OS X and Linux.
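Each chunk is computed independently, so every result vector starts again from zero. For the running-sum example, the chunk results can be stitched into the full result by adding the total of all earlier chunks to each later chunk. The sketch below illustrates only that combining step. As an assumption made so that it runs without a cluster, it uses split() and lapply() in place of isplitVector and foreach.

```r
set.seed(1)
vec3000 <- rnorm(3000)                       # dummy input
nworkers <- 3

# Stand-in for the parallel step: one running sum per chunk of 1000 elements
chunks  <- split(vec3000, rep(seq_len(nworkers), each = 1000))
resvecs <- lapply(chunks, cumsum)

# Offset for each chunk = sum of all elements in the chunks before it
chunk_totals <- sapply(resvecs, function(r) r[length(r)])
offsets <- c(0, cumsum(chunk_totals)[-nworkers])

# Shift every chunk by its offset and join the pieces
result3000 <- unname(unlist(Map(`+`, resvecs, offsets)))

matches_serial <- isTRUE(all.equal(result3000, cumsum(vec3000)))
```

When the workers from the foreach example are no longer needed, stopCluster(cl) shuts them down.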
I'd like to perform this function on a matrix 100 times. How can I do this?
v = 1
m <- matrix(0,10,10)
rad <- function(x) {
idx <- sample(length(x), size=1)
flip = sample(0:1,1,rep=T)
if(flip == 1) {
x[idx] <- x[idx] + v
} else if(flip == 0) {
x[idx] <- x[idx] - v
return(x)
}
}
This is what I have so far, but it doesn't work.
for (i in 1:100) {
rad(m)
}
I also tried this, which seemed to work, but gave me an output of like 5226 rows for some reason. The output should just be a 10X10 matrix with changed values depending on the conditions of the function.
reps <- unlist(lapply(seq_len(100), function(x) rad(m)))
Ok I think I got it.
The return statement in your function sits inside only one branch of the if statement, so the function returns the matrix with a probability of ~50%; in the other cases it returns the value of the last assignment (a single number) instead. You should change the function to this:
rad <- function(x) {
idx <- sample(length(x), size=1)
flip = sample(0:1,1,rep=T)
if(flip == 1) {
x[idx] <- x[idx] + v
} else if(flip == 0) {
x[idx] <- x[idx] - v
}
return(x)
}
Then you can do:
for (i in 1:n) {
m <- rad(m)
}
Note that this is semantically equal to:
for (i in 1:n) {
tmp <- rad(m) # return a modified version of m (m is not changed yet)
# and put it into tmp
m <- tmp # set m equal to tmp, then in the next iteration we will
# start from a modified m
}
Running rad(m) does not change m.
Why?
The function makes a local copy of the matrix m and works on that copy. When the function ends, the copy disappears.
So you need to save what the function returns.
As digEmAll wrote, the right code is:
for (i in 1:100) {
m <- rad(m)
}
You don't need a loop here. The whole operation can be vectorized.
v <- 1
m <- matrix(0,10,10)
n <- 100 # number of random replacements
idx <- sample(length(m), n, replace = TRUE) # indices
flip <- sample(c(-1, 1), n, replace = TRUE) # subtract or add
newVal <- aggregate(v * flip ~ idx, FUN = sum) # calculate new values for indices
m[newVal[[1]]] <- m[newVal[[1]]] + newVal[[2]] # add new values
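To see that the vectorized version does the same work as the loop, both can be applied to the same pre-drawn indices and signs and then compared. The sketch below does that. It uses tapply() instead of aggregate() for the per-index sums, which is a substitution made here for brevity, and the seed is arbitrary.

```r
set.seed(1)
v <- 1
m <- matrix(0, 10, 10)
n <- 100
idx  <- sample(length(m), n, replace = TRUE)   # cells to modify
flip <- sample(c(-1, 1), n, replace = TRUE)    # subtract or add

# Loop version: apply the n changes one after another
m_loop <- m
for (k in seq_len(n)) {
  m_loop[idx[k]] <- m_loop[idx[k]] + v * flip[k]
}

# Vectorized version: sum the changes per cell, then apply them in one step
delta <- tapply(v * flip, idx, sum)
pos   <- as.integer(names(delta))              # cell indices that were hit
m_vec <- m
m_vec[pos] <- m_vec[pos] + as.vector(delta)

same_result <- isTRUE(all.equal(m_loop, m_vec))
```

Summing per index first matters because an assignment such as m[idx] <- m[idx] + v * flip would keep only one change for a cell that was drawn more than once.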
I am trying to write a function that generates simulated data. If the simulated data does not meet the condition, I need to skip it; if it does meet the condition, I will apply the function summary.
I would like to loop it until I find 10 valid datasets and then stop. (I actually have to do this until it reaches 10000). Here is the code. The code sort of works except it does not stop. I think I probably placed the next and break function in the wrong place. I hope someone could help me write this together.
Another way I could approach this is to generate all the valid data first and then apply the function find_MLE (summary) to the final list.
Edit: I put break inside repeat and edited the code to make it reproducible. The code still keeps generating data and does not break.
Here is a reproducible version:
validData <- function(GM, GSD,sampleSize, p) {
count=0
for (i in 1:n) {
repeat {
lod <- quantile(rlnorm(1000000, log(GM), log(GSD)), p = p)
X_before <- rlnorm(sampleSize, log(GM), log(GSD))
Xs <- ifelse(X_before <= lod, lod, X_before)
delta <- ifelse(X_before <= lod, 1, 0)
pct_cens <- sum(delta)/length(delta)
print(pct_cens)
if (pct_cens == 0 & pct_cens ==1) next
else {
sumStats <- summary(Xs)
Med <- sumStats[3]
Ave <- sumStats[4]
}
count<- count+1
if (count == 10) break
}}
return(c(pct_cens, Med, Ave))
}
validData(GM=1,GSD=2,sampleSize=10,p=0.1)
Thanks for your help. I was able to write a function without using break function! I posted it here in case other people might find it helpful.
dset <- function (GM, GSD, n, p) {
Mean <- array()
Median <- array()
count = 0
while(count < 10) {
lod <- quantile(rlnorm(1000000, log(GM), log(GSD)), p = p)
X_before <- rlnorm(n, log(GM), log(GSD))
Xs <- ifelse(X_before <= lod, lod, X_before)
delta <- ifelse(X_before <= lod, 1, 0)
pct_cens <- sum(delta)/length(delta)
print(pct_cens)
if (pct_cens == 0 | pct_cens == 1 ) next
else {count <- count +1
if (pct_cens > 0 & pct_cens < 1) {
sumStats <- summary(Xs)
Median[count] <- sumStats[3]
Mean [count]<- sumStats[4]
print(list(pct_cens=pct_cens,Xs=Xs, delta=delta, Median=Median,Mean=Mean))
}
}
}
return(data.frame( Mean=Mean, Median=Median)) }
Since your code isn't replicable, I cannot fully test and debug your code, but here is what I think it would look like without being able to replicate with an MLE function. This is roughly how I would set it up. But check out the documentation/Google on break, next, for/while loops related to R when testing your code.
validData <- function(GM, GSD, Size, p) {
  for (i in 1:20) {
    count <- 0
    repeat {
      lod <- quantile(rlnorm(1000000, log(GM), log(GSD)), p = p)
      X_before <- rlnorm(Size, log(GM), log(GSD))
      Xs <- ifelse(X_before <= lod, lod, X_before)
      delta <- ifelse(X_before <= lod, 1, 0)
      pct_cens <- sum(delta)/length(delta)
      # | rather than &: pct_cens can never be 0 and 1 at the same time
      if (pct_cens == 0 | pct_cens == 1) next  # invalid dataset, draw again
      mles <- find_MLE(c(0,0), Xs, delta)  # find_MLE is your own function (not shown)
      GM_est <- mles[1]
      GSD_est <- mles[2]
      AM_est <- exp(log(GM_est) + 1 )
      SD_est <- sqrt((AM_est)^2*exp(log(GSD_est)^2))
      D95th_est <- GM_est*(GSD_est^1.645)
      count <- count + 1      # count only the valid datasets
      if (count == 10) break  # leave repeat after 10 valid datasets
    }
  }
  # return after the loops; a return() inside them would exit on the first pass
  return(c(GM_est,GSD_est,AM_est,SD_est,D95th_est))
}
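Because find_MLE is not shown, here is a self-contained sketch of the skip-and-count pattern, with simple summary statistics standing in for the MLE step. collect_valid and n_valid are made-up names. The detection limit is taken from qlnorm() instead of a simulated quantile, which is an assumption made to keep the example fast.

```r
# Hypothetical helper: keep drawing datasets until n_valid of them are usable
collect_valid <- function(GM, GSD, sampleSize, p, n_valid = 10) {
  lod <- qlnorm(p, log(GM), log(GSD))    # detection limit (theoretical quantile)
  out <- vector("list", n_valid)
  count <- 0
  while (count < n_valid) {
    x <- rlnorm(sampleSize, log(GM), log(GSD))
    cens <- x <= lod                     # TRUE where a value is censored
    pct_cens <- mean(cens)
    if (pct_cens == 0 || pct_cens == 1) next   # invalid dataset: skip it
    count <- count + 1                   # only valid datasets are counted
    xs <- ifelse(cens, lod, x)
    out[[count]] <- c(pct_cens = pct_cens, Med = median(xs), Ave = mean(xs))
  }
  do.call(rbind, out)                    # one row per valid dataset
}

set.seed(42)
res <- collect_valid(GM = 1, GSD = 2, sampleSize = 10, p = 0.1)
dim(res)
```

The loop condition does the stopping, so no break is needed, and next simply jumps back to the test at the top of the while loop without increasing count.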
To skip to the outer loop based on a condition, simply use break()
Here's a simple example where the inner loop will try to run 10 times, but a condition will usually be met which prevents it
# OUTER LOOP
for(i in 1:2) {
print(paste("Outer loop iteration", i))
# INNER LOOP (will run max 10 times)
for(j in 1:10) {
print(paste("Inner loop iteration", j))
if (runif(1) > 0.4) { # Randomly break the inner loop
print(paste("Breaking inner loop", j))
break()
}
}
}
If you want to skip to the outer loop when there's an error (rather than based on a condition), see here