I'm currently working on a two-host epidemiological SI model. That is, a compartmental model with no recovery compartment.
I'm still relatively new to R, but am developing a decent understanding after mostly using MATLAB. However, I am having trouble finding helpful resources on how to vary two input parameters at once, so that I can examine their effect and maybe even 3-D plot or phase plot the results to see whether the population dies off.
More specifically, I want to produce results while varying mu between 0 and 1 and alpha between 0 and 1. I could just "plug and play", but I want to show a more dynamic result, and I think it would be a handy tool to have in my wheelhouse.
Anyway, here is the code I have so far:
# Here we will load the required packages for the assignment
library(deSolve)
library(ggplot2)
# Here we define the two-host (male & female) SI model
KModel <- function(time, state, params){
with(as.list(c(state, params)),{
N <- SF+IF+SM+IM
dSF <- r*(SF+alpha*IF)-r*N*SF-BFM*(SF*IM)/N
dIF <- (BFM*(SF*IM)/N)-r*N*IF-mu*IF
dSM <- r*(SF+alpha*IF)-r*N*SM-BMF*(SM*IF)/N
dIM <- (BMF*(SM*IF)/N)-r*N*IM-mu*IM
return(list(c(dSF, dIF, dSM, dIM)))
})
}
# here are the initial parameters
r = 0.2
BFM = 1.2
BMF = 1
mu = 0
alpha = 0
# name the parameters so that with(as.list(...)) inside KModel can find them
params<-c(r=r,BFM=BFM,BMF=BMF,mu=mu,alpha=alpha)
initial_state<-c(SF=0.49 ,IF=0.01, SM=0.49,IM=0.01)
times<-0:60
# Here we use ode() to numerically solve the system
out1<-ode(y=initial_state, times=times, func=KModel, parms=params, method="ode23")
out<-as.data.frame(out1)
plot(out1)
So, I think I have a pretty good "skeleton" for computing a single solution of a compartmental model. However, as I mentioned, I'd like to be able to vary two of the parameters to examine specific scenarios.
Thanks!
After reading the question a first time, it was not completely clear to me if the parameters should vary over time or if scenarios for different parameter combinations are intended. For the first case, several posts exist already on StackOverflow, e.g. https://stackoverflow.com/a/69846444/3677576 or Modifying SIR model to include stochasticity.
If the influence of parameters should be evaluated in form of scenarios, one may consider nested loops. As a more compact alternative, one can create a matrix with all desired parameter combinations using expand.grid. Then one can use an apply function, e.g. lapply. The (temporary) output tmp is then a list of matrices, that can be converted into a big data frame with the common do.call() approach (see relevant SO posts about this).
This is then joined together with the parameter matrix and forms a suitable data structure for ggplot.
Note also that I used the default solver lsoda instead of ode23, because it is generally more precise and efficient.
library(deSolve)
library(ggplot2)
library(dplyr)
KModel <- function(time, state, params){
with(as.list(c(state, params)),{
N <- SF+IF+SM+IM
dSF <- r*(SF+alpha*IF)-r*N*SF-BFM*(SF*IM)/N
dIF <- (BFM*(SF*IM)/N)-r*N*IF-mu*IF
dSM <- r*(SF+alpha*IF)-r*N*SM-BMF*(SM*IF)/N
dIM <- (BMF*(SM*IF)/N)-r*N*IM-mu*IM
return(list(c(dSF, dIF, dSM, dIM)))
})
}
times <- 0:60
parms <- expand.grid(mu = c(0, 0.1, 0.2, 0.3, 1),
alpha = seq(0, 1, 0.1),
r = 0.2,
BFM = 1.2,
BMF = 1)
initial_state <- c(SF = 0.49, IF = 0.01, SM = 0.49, IM = 0.01)
## run all simulations and store it as list of matrices
tmp <- lapply(1:nrow(parms), function(i)
cbind(run = i,
ode(y = initial_state, times = times,
func = KModel, parms = parms[i,])
)
)
## convert list of matrices to single data frame
out <- as.data.frame(do.call("rbind", tmp))
## add run number to parameter table
parms <- as.data.frame(cbind(parms, run = 1:nrow(parms)))
## join the two tables together and create plots
out %>%
left_join(parms, by = "run") %>%
ggplot(aes(time, IF)) + geom_line() + facet_grid(mu ~ alpha)
## or with colors
out %>%
left_join(parms, by = "run") %>%
mutate(alpha = factor(alpha)) %>%
ggplot(aes(time, IF, color = alpha)) + geom_line() + facet_grid( ~ mu)
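To get at the "does the population die off" part, it may help to reduce each run to a single number (for example the value at the final time step) and look at that over the mu/alpha grid. Below is a minimal sketch of the idea in base R. Note that toy_model() is a made-up closed-form stand-in, not the SI model above, so that the snippet runs without deSolve; with the real model the same steps would be applied to the out and parms tables.

```r
# Hypothetical stand-in for the ODE model (an assumption for illustration):
# simple exponential growth or decay with net rate alpha - mu.
toy_model <- function(times, p) {
  data.frame(time = times, I = 0.01 * exp((p$alpha - p$mu) * times))
}

# all parameter combinations, one row per scenario
grid <- expand.grid(mu = c(0, 0.5, 1), alpha = c(0, 0.5, 1))
grid$run <- seq_len(nrow(grid))

# run every scenario and stack the results into one long data frame
runs <- lapply(grid$run, function(i) cbind(run = i, toy_model(0:60, grid[i, ])))
long <- do.call(rbind, runs)

# keep only the final time step and attach the parameters of each run
final <- merge(long[long$time == max(long$time), ], grid, by = "run")

# mu x alpha table of final values, convenient for image() or persp()
final_mat <- xtabs(I ~ mu + alpha, data = final)
print(final_mat)
```

With ggplot2 loaded, something like ggplot(final, aes(mu, alpha, fill = I)) + geom_tile() should turn the same table into a heat map.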
I am trying to write a for loop that will generate a correlation for a fixed column (LPS0) vs. all other columns in the data set. I don't want to use a correlation matrix because I only care about the correlation of LPS0 vs all other columns, not the correlations of the other columns with themselves. I then want to include an if statement to print only the significant correlations (p.value <= 0.05). I ran into some issues where some of the p.values are returned as NA, so I switched to an if_else loop. However, I am now getting an error. My code is as follows:
for(i in 3:ncol(microbiota_lps_0_morm)) {
morm_0 <- cor.test(microbiota_lps_0_morm$LPS0, microbiota_lps_0_morm[[colnames(microbiota_lps_0_morm)[i]]], method = "spearman")
if_else(morm_0$p.value <= 0.05, print(morm_0), print("Not Sig"), print("NA"))
}
The first value is returned, and then the loop stops with the following error:
Error in if_else():
! true must be length 1 (length of condition), not 8.
Backtrace: 1. dplyr::if_else(morm_0$p.value <= 0.05, print(morm_0), print("Not Sig"), print("NA"))
How can I make the loop print morm_0 only when p.value <= 0.05?
Here's a long piece of code which automates the whole thing. It might be overkill, but you can just take the matrix and use whatever you need. It makes use of the tidyverse.
library(tidyverse)
df <- select_if(mtcars,is.numeric)
glimpse(df)
# keeping real names
dict <- cbind(original=names(df),new=paste0("v",1:ncol(df)))
# but changing names for better data viz
colnames(df) <- paste0("v",1:ncol(df))
# correlating between variables + p values
pvals <- list()
corss <- list()
for (coln in colnames(df)) {
pvals[[coln]] <- map(df, ~ cor.test(df[,coln], .)$p.value)
corss[[coln]] <- map(df, ~ cor(df[,coln], .))
}
# Keeping both matrices in a list
matrices <- list(
pvalues = matrix(data=unlist(pvals),
ncol=length(names(pvals)),
nrow=length(names(pvals))),
correlations = matrix(data=unlist(corss),
ncol=length(names(corss)),
nrow=length(names(corss)))
)
rownames(matrices[[1]]) <- colnames(df)
rownames(matrices[[2]]) <- colnames(df)
# Creating a combined data frame
long_cors <- expand.grid(Var1=names(df),Var2=names(df)) %>%
mutate(cor=unlist(matrices["correlations"]),
pval=unlist(matrices["pvalues"]),
same=Var1==Var2,
significant=pval<0.05,
dpcate=duplicated(cor)) %>%
# Leaving no duplicates, non-significant or self-correlation results
filter(same ==F,significant==T,dpcate==F) %>%
select(-c(same,dpcate,significant))
# Plotting correlations
long_cors %>%mutate(negative=cor<0) %>%
ggplot(aes(x=Var1,y=Var2,
color=negative,size=abs(cor),fill=Var2,
label=round(cor,2)))+
geom_label(show.legend = F,alpha=0.2)+
scale_color_manual(values = c("black","darkred"))+
# Sizing each correlation by its magnitude
scale_size_area(seq(1,100,length=length(unique(long_cors$Var1))))+ theme_light()+
theme(axis.text = element_text(face = "bold",size=12))+
labs(title="Correlation between variables",
caption = "p < 0.05")+xlab("")+ylab("")
If you want to correlate a column of a matrix with the remaining columns, you can do so with one function call:
mtx <- matrix(rnorm(800), ncol=8)
cor(mtx[,1], mtx[,-1])
However, you will not get p-values. For getting p-values, I would recommend this approach:
library(tidyverse)
significant <- map_dbl(2:ncol(mtx),
~ cor.test(mtx[,1], mtx[,.], use="p", method="s")$p.value)
Whenever you feel like you need a for loop in R, chances are, you should be using another approach. for is a very un-R construct, and R gives many better ways of handling the same issues. map_* family of functions from tidyverse is but one of them. Another approach, in base R, would be to use apply:
significant <- apply(mtx[,-1], 2,
\(x) cor.test(x, mtx[,1], method="s", use="p")$p.value)
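As for the error in the question's loop: if_else() is vectorised and expects its true and false arguments to match the length of the condition, so it is not the right tool for deciding whether to print something. A plain if with an NA check does the job. Here is a minimal sketch on made-up data (the taxon_* column names are hypothetical; only LPS0 comes from the question), where the constant column stands in for the cases that give an NA p-value:

```r
set.seed(1)
# made-up example data; column names other than LPS0 are hypothetical
dat <- data.frame(LPS0 = rnorm(30))
dat$taxon_a <- dat$LPS0 + rnorm(30, sd = 0.3)  # strongly related to LPS0
dat$taxon_b <- rnorm(30)                       # unrelated noise
dat$taxon_c <- 1                               # constant column, no valid p-value

others <- setdiff(names(dat), "LPS0")

# one Spearman p-value per column; anything that fails is recorded as NA
pvals <- sapply(others, function(nm) {
  tryCatch(
    suppressWarnings(cor.test(dat$LPS0, dat[[nm]], method = "spearman")$p.value),
    error = function(e) NA_real_
  )
})

# names of the columns with a usable, significant p-value
significant <- names(pvals)[!is.na(pvals) & pvals <= 0.05]

# loop version: report only the significant, non-NA results
for (nm in others) {
  p <- pvals[[nm]]
  if (!is.na(p) && p <= 0.05) {
    cat(nm, "is significant, p =", signif(p, 3), "\n")
  }
}
```

The !is.na(p) check comes first so that && never has to evaluate NA <= 0.05.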
I have a simulated data created like this:
library(MASS)  # provides mvrnorm()
average_vector = c(0,0,25)
sigma_matrix = matrix(c(4,1,0,1,8,0,0,0,9),nrow=3,ncol=3)
set.seed(12345)
data0 = as.data.frame(mvrnorm(n =20000, mu = average_vector, Sigma=sigma_matrix))
names(data0)=c("hard","smartness","age")
set.seed(13579)
data0$final=0.5*data0$hard+0.2*data0$smartness+(-0.1)*data0$age+rnorm(n=dim(data0)[1],mean=90,sd=6)
Now, I want to randomly sample 50 students 1,000 times (1,000 sets of 50 people). I used this code:
datsub<-(replicate(1000, sample(1:nrow(data0),50)))
After that step, I ran into an issue: I want to fit a regression model to each set of 50 selected people (1,000 times) and store the point estimate of "hard" from model 4, which is specified like this:
model4 = lm(formula = final ~ hard + smartness + age, data = data0)
I then want to plot the variation of these estimates around the line at 0.5 (the true value). Is there any way I can achieve that? Thanks a lot!
I would highly suggest looking into either caret or the newer (and still maintained) tidymodels if you're just getting into R modelling. Either of these will make your life easier, once you get used to the dplyr-like syntax.
What you're trying to do is a form of resampling, closely related to bootstrapping (note that sample() draws without replacement by default). Here is the manual approach using only base functions.
n <- nrow(data0)
k <- 1000
ns <- 50
samples <- replicate(k, sample(seq_len(n), ns))
params <- vector('list', k)
# loop over the k samples (not over the n rows of data0)
for(i in seq_len(k)){
params[[i]] <- coef( lm(formula = final ~ hard + smartness + age, data = data0[samples[, i],]) )
}
# merge params into a matrix with one row per sample
params <- do.call(rbind, params)
# Create plot from here.
plot(x = seq_len(k), y = params[, "hard"])
abline(h = 0.5)
Note the above may have a few typos as your example is not reproducible.
I am trying to understand neural networks better, so I am trying to implement a simple perceptron from scratch in R. I know that this is very inefficient, as there are many libraries that do this extremely well optimized, but my goal is to understand the basics of neural networks better and work my way forward to more complex models.
I have created some artificial test data with a very simple linear decision boundary and split this into a training set and a test set. I then ran a logistic regression on the training data and checked the predictions on the test set and got 99+% accuracy, which was to be expected given the simple nature of the data. I then tried implementing a perceptron with 2 inputs, 1 neuron, 1000 iterations, a learning rate of 0.1 and a sigmoid activation function.
I would expect to get very similar accuracy to the logistic regression model, but my results are a lot worse (around 70% correct classifications in the training set), so I definitely did something wrong. The predictions only seem to get better during the first couple of iterations and then just go back and forth around a specific value (I tried many different learning rates, with no success). I'm attaching my script and I'm thankful for any advice! I think the problem lies in the calculation of the error or the weight adjustment, but I can't put my finger on it...
### Reproducible Example for StackOverflow
#### Setup
# loading libraries
library(data.table)
#remove scientific notation
options(scipen = 999)
# setting seed for random number generation
seed <- 123
#### Selfmade Test Data
# input points
x1 <- runif(10000,-100,100)
x2 <- runif(10000,-100,100)
# setting decision boundary to create output
output <- vector()
output[0.5*x1 + -1.2*x2 >= 50] <- 0
output[0.5*x1 + -1.2*x2 < 50] <- 1
# combining to dataframe
points <- cbind.data.frame(x1,x2,output)
# plotting all data points
plot(points$x1,points$x2, col = as.factor(points$output), main = "Self-created data", xlab = "x1",ylab = "x2")
# split into test and training sets
trainsize = 0.2
set.seed(seed)
train_rows <- sample(1:dim(points)[1], size = trainsize * dim(points)[1])
train <- points[train_rows,]
test <- points[-c(train_rows),]
# plotting training set only
plot(train$x1,train$x2, col = as.factor(train$output), main = "Self-created data (training set)", xlab = "x1",ylab = "x2")
#### Approaching the problem with logistic regression
# building model
train_logit <- glm(output ~ x1 + x2, data = train, family = "binomial", maxit = 10000)
summary(train_logit)
# testing performance in training set
table(round(train_logit$fitted.values) == train$output)
# testing performance of train_logit model in test set
table(test$output == round(predict(train_logit,test[,c(1,2)], type = "response")))
# We get 100% accuracy in the training set and near 100% accuracy in the test set
#### Approaching Problem with a Perceptron from scratch
# setting inputs, outputs and weights
inputs <- as.matrix(train[,c(1,2)])
output <- as.matrix(train[,3])
set.seed(123456)
weights <- as.matrix(runif(dim(inputs)[2],-1,1))
## Defining activation function + derivative
# defining sigmoid and its derivative
sigmoid <- function(x) {1 / (1 + exp(-x))}
sig_dir <- function(x){sigmoid(x)*(1 - sigmoid(x))}
## Perceptron Initial Settings
bias <- 1
# number of iterations
iterations <- 1000
# setting learning rate
alpha <- 0.1
## Perceptron
# creating vectors for saving results per iteration
weights_list <- list()
weights_list[[1]] <- weights
errors_vec <- vector()
outputs_vec <- vector()
# saving results across iterations
weights_list_all <- list()
outputs_list <- list()
errors_list <- list()
# looping through the backpropagation algorithm "iteration" # times
for (j in 1:iterations) {
# Loop for backpropagation with updating weights after every datapoint
for (i in 1:dim(train)[1]) {
# taking the weights from the last iteration of the outer loop as a starting point
if (j > 1) {
weights_list[[1]] <- weights
}
# Feed Forward (Should we really round this?!)
output_pred <- round(sigmoid(sum(inputs[i,] * as.numeric(weights)) + bias))
error <- output_pred - output[i]
# Backpropagation (Do I need the sigmoid derivative AND a learning rate? Or should I only take one of them?)
weight_adjustments <- inputs[i,] * (error * sig_dir(output_pred)) * alpha
weights <- weights - weight_adjustments
# saving progress for later plots
weights_list[[i + 1]] <- weights
errors_vec[i] <- error
outputs_vec[[i]] <- output_pred
}
# saving results for each iteration
weights_list_all[[j]] <- weights_list
outputs_list[[j]] <- outputs_vec
errors_list[[j]] <- errors_vec
}
#### Formatting Diagnostics for easier plotting
# implementing empty list to transform weightslist
WeightList <- list()
# collapsing individual weightslist into dataframes
for (i in 1:iterations) {
WeightList[[i]] <- t(data.table::rbindlist(weights_list_all[i]))
}
# pasting dataframes together
WeightFrame <- do.call(rbind.data.frame, WeightList)
colnames(WeightFrame) <- paste("w",1:dim(WeightFrame)[2], sep = "")
# pasting dataframes together
ErrorFrame <- do.call(rbind.data.frame, errors_list)
OutputFrame <- do.call(rbind.data.frame, outputs_list)
##### Plotting Results
# Development of Mean Error per iteration
plot(rowMeans(abs(ErrorFrame)),
type = "l",
xlab = "Sum of absolute Error terms")
# Development of Weights over time
plot(WeightFrame$w1, type = "l",xlim = c(1,dim(train)[1]), ylim = c(min(WeightFrame),max(WeightFrame)), ylab = "Weights", xlab = "Iterations")
lines(WeightFrame$w2, col = "green")
# lines(WeightFrame$w3, col = "blue")
# lines(WeightFrame$w4, col = "red")
# lines(WeightFrame$w5, col = "orange")
# lines(WeightFrame$w6, col = "cyan")
# lines(WeightFrame$w7, col = "magenta")
# Empty vector for number of correct categorizations per iteration
NoCorr <- vector()
# Computing percentage of correct predictions per iteration
colnames(OutputFrame) <- paste("V",1:dim(OutputFrame)[2], sep = "")
Output_mat <- as.matrix(OutputFrame)
for (i in 1:iterations) {
NoCorr[i] <- sum(output == Output_mat[i,]) / nrow(train)
}
# plotting number of correct predictions per iteration
plot(NoCorr, type = "l")
# Performance in training set after last iteration
table(output,round(OutputFrame[iterations,]))
First of all, welcome to the world of Neural Networks :).
Secondly, I want to recommend a great article to you, which I personally used to get a better understanding of backpropagation and the whole NN learning stuff: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/. It might be a bit rough to get through at times, and for the general implementation I think it is much easier to follow pseudocode from a NN book. However, for understanding what is going on, this article is very nice!
Thirdly, I will hopefully solve your problem :)
You already ask in your comment whether you should really round output_pred. Yes, you should, if you want to use output_pred to make a prediction! However, if you want to use it for learning, rounding is generally not a good idea. The reason is that if you round it for learning, then an output which was rounded up from 0.51 to 1 with target output 1 will not learn anything, because the output equals the target and thus looks perfect. However, 0.99 would have been a much better prediction than 0.51, so there is definitely something to learn!
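To make the point about rounding concrete, here is a minimal sketch of a single sigmoid neuron whose update uses the unrounded output. It is not a drop-in fix for the script in the question: the data are a small made-up set scaled to [-1, 1], the bias is trained along with the weights, and the update is the one that follows from a cross-entropy loss (error times input). All of these are my assumptions for illustration.

```r
sigmoid <- function(x) 1 / (1 + exp(-x))

set.seed(42)
n <- 500
# made-up, linearly separable data with inputs scaled to [-1, 1] (assumption)
X <- cbind(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
y <- as.numeric(0.5 * X[, 1] - 1.2 * X[, 2] < 0.1)

w  <- c(0, 0)   # weights
b  <- 0         # bias, trained like a weight (assumption)
lr <- 0.5       # learning rate

for (epoch in 1:50) {
  for (i in seq_len(n)) {
    # forward pass: keep the continuous output, do NOT round here
    p <- sigmoid(sum(X[i, ] * w) + b)
    # with a cross-entropy loss the gradient w.r.t. the pre-activation is p - y
    err <- p - y[i]
    w <- w - lr * err * X[i, ]
    b <- b - lr * err
  }
}

# rounding is only applied when turning outputs into class predictions
pred <- round(sigmoid(X %*% w + b))
accuracy <- mean(pred == y)
print(accuracy)
```

Because the error is continuous, a point predicted at 0.51 still pushes the weights towards a more confident answer, which is exactly what the rounded version cannot do.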
I am not 100% sure if this solves all your problems (I'm not an R programmer) and gets your accuracy up to 99%, but it should solve some of them, and hopefully the intuition is also clear :)
This is my reproducible example :
#http://gekkoquant.com/2012/05/26/neural-networks-with-r-simple-example/
library("neuralnet")
require(ggplot2)
traininginput <- as.data.frame(runif(50, min=0, max=100))
trainingoutput <- sqrt(traininginput)
trainingdata <- cbind(traininginput,trainingoutput)
colnames(trainingdata) <- c("Input","Output")
Hidden_Layer_1 <- 1 # value is randomly assigned
Hidden_Layer_2 <- 1 # value is randomly assigned
Threshold_Level <- 0.1 # value is randomly assigned
net.sqrt <- neuralnet(Output~Input,trainingdata, hidden=c(Hidden_Layer_1, Hidden_Layer_2), threshold = Threshold_Level)
#Test the neural network on some test data
testdata <- as.data.frame((1:13)^2) #Generate some squared numbers
net.results <- predict(net.sqrt, testdata) #Run them through the neural network
cleanoutput <- cbind(testdata,sqrt(testdata),
as.data.frame(net.results))
colnames(cleanoutput) <- c("Input","ExpectedOutput","NeuralNetOutput")
ggplot(data = cleanoutput, aes(x= ExpectedOutput, y= NeuralNetOutput)) + geom_point() +
geom_abline(intercept = 0, slope = 1
, color="brown", size=0.5)
rmse <- sqrt(sum((sqrt(testdata)- net.results)^2)/length(net.results))
print(rmse)
Here, when Hidden_Layer_1 is 1, Hidden_Layer_2 is 2, and Threshold_Level is 0.1, the rmse generated is 0.6717354.
As another example, when Hidden_Layer_1 is 2, Hidden_Layer_2 is 3, and Threshold_Level is 0.2, the rmse generated is 0.8355925.
How can I create a table that automatically calculates the rmse when the user assigns values to Hidden_Layer_1, Hidden_Layer_2, and Threshold_Level? (I know how to do it in Excel but not in R, haha.)
The desired table should have Trial(s), Hidden_Layer_1, Hidden_Layer_2, Threshold_Level, and rmse as columns, and it should be possible to keep adding rows by pressing something like an actionButton (if possible), so that users can keep trying until they get the rmse they want.
How can I do that? Can anyone help me? I am quite new to R, so I will definitely learn from this.
Thank you very much to anyone who is willing to lend a helping hand.
Here is a way to create the table of values that can be displayed with the data frame viewer.
# initialize an object where we can store the parameters as a data frame
data <- NULL
# function to receive a row of parameters and add them to the
# df argument
addModelElements <- function(df,trial,layer1,layer2,threshold,rmse){
newRow <- data.frame(trial = trial,
Hidden_Layer_1 = layer1,
Hidden_Layer_2 = layer2,
Threshold = threshold,
RMSE = rmse)
rbind(df,newRow)
}
# once a model has been run, call addModelElements() with the
# model parameters
data <- addModelElements(data,1,1,2,0.1,0.671735)
data <- addModelElements(data,2,2,3,0.2,0.835593)
...and the output:
View(data)
Note that if you're going to create scores or hundreds of rows of parameters & RMSE results before displaying any of them to the end user, the code should be altered to improve the efficiency of rbind(). In this scenario, we build a list of sets of parameters, convert them into data frames, and use do.call() to execute rbind() only once.
# version that improves efficiency of rbind()
addModelElements <- function(trial,layer1,layer2,threshold,rmse){
# return row as data frame
data.frame(trial = trial,
Hidden_Layer_1 = layer1,
Hidden_Layer_2 = layer2,
Threshold = threshold,
RMSE = rmse)
}
# generate list of data frames and rbind() once
inputParms <- list(c(1,1,2,0.1,0.671735),
c(1,1,2,0.3,0.681935),
c(2,2,3,0.2,0.835593))
parmList <- lapply(inputParms,function(x){
addModelElements(x[1],x[2],x[3],x[4],x[5])
})
# bind to single data frame
data <- do.call(rbind,parmList)
View(data)
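If the RMSE should be computed automatically rather than typed in by hand, one option is to wrap "fit, predict, score, append a row" in a single function. The sketch below only shows the pattern: to keep it runnable without the neuralnet package, a polynomial regression stands in for the network, and run_trial(), rmse() and the degree setting are hypothetical names of mine. In the real case the lm() call would be replaced by the neuralnet() call from the question, with the hidden layers and threshold as arguments.

```r
# root mean squared error between two numeric vectors
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Fit a model for one setting, score it, and append a row to `results`.
# NOTE: lm() with poly() is only a stand-in for the neural network.
run_trial <- function(results, degree) {
  train <- data.frame(Input = seq(0, 100, length.out = 50))
  train$Output <- sqrt(train$Input)
  test <- data.frame(Input = (1:10)^2)

  fit <- lm(Output ~ poly(Input, degree), data = train)

  new_row <- data.frame(
    trial  = NROW(results) + 1,   # NROW(NULL) is 0, so the first trial is 1
    degree = degree,
    RMSE   = rmse(sqrt(test$Input), predict(fit, test))
  )
  rbind(results, new_row)
}

results <- NULL
results <- run_trial(results, degree = 1)
results <- run_trial(results, degree = 3)
print(results)
```

Each call adds one row, so the same function could be called from a button handler to grow the table one trial at a time.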
I'd like to evaluate the time to extract data from a raster time series using different file types (geotiff, binary) or objects (RasterBrick, RasterStack). I created a function that will extract the time series from a random point of the raster object and I then use microbenchmark to test it.
Ex.:
# read a random point from a raster stack
sample_raster <- function(stack) {
poi <- sample(ncell(stack), 1)
raster::extract(stack, poi)
}
# opening the data using different methods
data_stack <- stack(list.files(pattern = '3B.*tif'))
data_brick <- brick('gpm_multiband.tif')
bench <- microbenchmark(
sample_stack = sample_raster(data_stack),
sample_brick = sample_raster(data_brick),
times = 10
)
boxplot(bench)
# this fails because sampled point is different
bench <- microbenchmark(
sample_stack = sample_raster(data_stack),
sample_brick = sample_raster(data_brick),
times = 10,
check = 'equal'
)
I included a sample of my dataset here
With this I can see that sampling on RasterBrick is faster than stacks (R Raster manual also says so -- good). The problem is that I'm sampling at different points at each evaluated expression. So I can't check if the results are the same. What I'd like to do is sample at the same location (poi) on both objects. But have the location be different for each iteration. I tried to use the setup option in microbenchmark but from what I figured out, the setup is evaluated before each function is timed, not once per iteration. So generating a random poi using the setup will not work.
Is it possible to pass the same argument to the functions being evaluated in microbenchmark?
Result
Solution using microbenchmark
As suggested (and explained below), I tried the bench package with the press call. But for some reason it was slower than setting the same seed at each microbenchmark iteration, as suggested by mnist. So I ended up going back to microbenchmark. This is the code I'm using:
library(microbenchmark)
library(raster)
annual_brick <- raster::brick('data/gpm_tif_annual/gpm_2016.tif')
annual_stack <- raster::stack('data/gpm_tif_annual/gpm_2016.tif')
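# raster_size was not defined in the original snippet; assumed to be the number of cells
raster_size <- ncell(annual_brick)
x <- 0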
y <- 0
bm <- microbenchmark(
ext = {
x <- x + 1
set.seed(x)
poi = sample(raster_size, 1)
raster::extract(annual_brick, poi)
},
slc = {
y <- y + 1
set.seed(y)
poi = sample(raster_size, 1)
raster::extract(annual_stack, poi)
},
check = 'equal'
)
Solution using bench::press
For completeness' sake, this is how I did it using bench::press. In the process, I also separated the code for selecting the random cell from the point sampling function, so that I can time only the point sampling part of the code. Here is how I'm doing it:
library(bench)
library(raster)
annual_brick <- raster::brick('data/gpm_tif_annual/gpm_2016.tif')
annual_stack <- raster::stack('data/gpm_tif_annual/gpm_2016.tif')
bm <- bench::press(
pois = sample(ncell(annual_brick), 10),
mark(
iterations = 1,
sample_brick = raster::extract(annual_brick, pois),
sample_stack = raster::extract(annual_stack, pois)
)
)
My approach would be to set the same seeds for each option in microbenchmark but change them prior to each function call. See the output and how the same seeds are eventually used for both calls:
x <- 0
y <- 0
microbenchmark::microbenchmark(
"checasdk" = {
# increase seed value by 1
x <- x + 1
print(paste("1", x))
set.seed(x)},
"check2" = {
y <- y + 1
print(paste("2", y))
set.seed(y)
}
)
If I understand correctly, the OP has two requirements:
1. The same data points should be sampled when timing the two expressions in order to check the results are identical.
2. In addition, timing of the two expressions is to be repeated for different data points sampled.
Using the same random numbers
As suggested by Roman, set.seed() can be used to set the seed values for R's random number generator. If the same parameter is used, the sequence of generated random numbers will be the same.
sample_raster() can be modified to ensure that the random number generator will be initialised for each call.
sample_raster <- function(stack) {
set.seed(1L)
poi <- sample(ncell(stack), 1)
raster::extract(stack, poi)
}
This will meet requirement 1 but not requirement 2, as the same data samples will be used for all repetitions.
Different random numbers in repetitions
The OP has asked:
Is it possible to pass the same argument to the functions being evaluated in microbenchmark?
One possibility is to use for or lapply() to loop over a sequence of seed values as suggested in answers to a similar question.
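A minimal sketch of that loop idea, in base R only: the seed is turned into a cell index once per repetition, and both expressions are then timed on that same cell. The extract_a() and extract_b() functions and the values vector are made-up stand-ins for raster::extract() on the stack and the brick, and system.time() is used in place of a benchmarking package just to keep the example self-contained.

```r
n_cells <- 1000L
values  <- seq_len(n_cells) * 2          # made-up "raster" values

# hypothetical stand-ins for the two extraction methods being compared
extract_a <- function(cell) values[cell]
extract_b <- function(cell) values[[cell]]

# turn a seed into a reproducible random cell index
pick_cell <- function(seed) {
  set.seed(seed)
  sample(n_cells, 1L)
}

seeds <- 1:5
timings <- lapply(seeds, function(seed) {
  poi <- pick_cell(seed)                 # same point for both methods
  t_a <- system.time(res_a <- extract_a(poi))[["elapsed"]]
  t_b <- system.time(res_b <- extract_b(poi))[["elapsed"]]
  stopifnot(isTRUE(all.equal(res_a, res_b)))  # results must agree
  data.frame(seed = seed, poi = poi, elapsed_a = t_a, elapsed_b = t_b)
})
timings <- do.call(rbind, timings)
print(timings)
```

Each row is one repetition with its own point, which covers both requirements: identical input within a repetition, different input across repetitions.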
In this case, I suggest using the bench package for benchmarking. It has a press() function which runs bench::mark() across a grid of parameters.
For this, sample_raster() gets a second parameter:
sample_raster <- function(stack, seed) {
set.seed(seed)
poi <- sample(ncell(stack), 1L)
# cat(seed, poi, "\n") # just to check, NOT to use for timings
raster::extract(stack, poi)
}
The timings are executed for different seeds as given in vector seed_vec.
library(bench)
bm <- press(
seed_vec = 1:10,
mark(
iterations = 1L,
sample_stack = sample_raster(data_stack, seed_vec),
sample_brick = sample_raster(data_brick, seed_vec)
)
)
Note that the length of seed_vec determines the number of repetitions with different poi, now. The iterations parameter to mark() specifies how often the timings are to be repeated for the same seed / poi.
The results can be plotted using
library(ggplot2)
autoplot(bm)
or summarized using
library(dplyr)
bm %>%
group_by(expression = expression %>% as.character()) %>%
summarise(median = median(median), n_itr = n())