How can I keep a conditionally created object available within a function in R?

I am trying to make a simulation script to generate data from a Stan model for different numbers of subjects. I first use the model to simulate responses, then I use those simulated responses to re-fit the model.
My current approach is to generate the simulated responses on the first run of the function (when nSim == 1, which is also used as the seed) and assign the result to the global environment.
Is there a way to avoid assigning the output of sampling() to the global environment while still making that result accessible on the second and later runs?
library(rstan)  # sampling()
library(dplyr)  # %>%, select(), rename(), as_tibble()

# Sim_Mod is a compiled Stan model object defined elsewhere
sim_data_fit <- function(nSim, nSubj) {
  set.seed(nSim)
  X <- as.matrix(data.frame(
    Var1 = sample(c(0,1), nSubj, replace = TRUE),
    Var2 = sample(c(0,1), nSubj, replace = TRUE),
    Var3 = scale(rnorm(nSubj, mean = 50, sd = 2), center = TRUE, scale = TRUE)))

  # First run only: simulate responses (run_estimation = 0)
  if (nSim == 1) {
    sim_out <<- sampling(Sim_Mod,
                         data = list(n = nSubj,
                                     k = ncol(X),
                                     X = X,
                                     Y = rnorm(nSubj, mean = 65, sd = 10),
                                     run_estimation = 0))
  }

  # Pick nSubj of the y_sim columns and take one random draw from each
  sim_data <- sim_out %>%
    as.data.frame %>%
    select(contains("y_sim")) %>%
    .[, sample(ncol(.), nSubj)] %>%
    apply(., 2, sample, size = 1) %>%
    as_tibble() %>%
    rename("y_sim" = value)

  # Refit the model to the simulated responses (run_estimation = 1)
  sampling(Sim_Mod,
           data = list(n = nSubj,
                       k = ncol(X),
                       X = X,
                       Y = sim_data$y_sim,
                       run_estimation = 1))
}
sim_out is what I'm assigning only on the first run (I think). If I don't assign it globally, later runs fail with an error saying that sim_out doesn't exist.
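One way to avoid the global assignment is a function factory (a closure): the cached object lives in the environment of the factory, so `<<-` inside the returned function updates that environment instead of the global one. This is a minimal sketch, not a drop-in replacement: make_sim_fitter() and fake_simulate() are hypothetical names, fake_simulate() stands in for the first sampling() call so that Stan is not needed to show the mechanism, and a simple resampling step stands in for the refit.

```r
# Function factory: `sim_out` lives in the factory's environment, so it
# persists between calls of the returned function without touching globalenv.
make_sim_fitter <- function(simulate_fn) {
  sim_out <- NULL
  function(nSim, nSubj) {
    set.seed(nSim)
    if (is.null(sim_out)) {
      # `<<-` finds `sim_out` in the enclosing (factory) environment first
      sim_out <<- simulate_fn(nSubj)
    }
    # Placeholder for the refit step: draw nSubj values from the cached output
    sample(sim_out, nSubj, replace = TRUE)
  }
}

# Hypothetical stand-in for the first sampling() call; counts how often it runs
n_calls <- 0
fake_simulate <- function(nSubj) {
  n_calls <<- n_calls + 1
  rnorm(1000, mean = 65, sd = 10)
}

sim_data_fit <- make_sim_fitter(fake_simulate)
res <- lapply(1:3, sim_data_fit, nSubj = 20)
n_calls  # 1: the expensive step ran only on the first call
```

Unlike the original, this caches on whichever call happens first rather than testing nSim == 1. A simpler alternative is to run the first sampling() call once, outside the function, and pass its result in as an extra argument.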

Related

How to provide group-wise boundaries for parameters in modelling using R nls_multstart?

I am new to the purrr package in R and I am struggling to pass an additional argument to nls_multstart() inside map().
I have a nested data frame that contains data for different combinations of grouping variables.
I want to fit the same model to the data of each combination of groups in the nested data frame.
So far, I have been able to fit the model to each data set.
library(dplyr)
library(tidyr)
library(purrr)
library(nls.multstart)

# model
my_model <- function(ymax, k, t) {
  ymax * (1 - exp(-k*t))
}

# data
t = seq(from = 1, to = 100, by = 1)
y1 = unlist(lapply(t, my_model, ymax = 500, k = 0.04))
y2 = unlist(lapply(t, my_model, ymax = 800, k = 0.06))
y = c(y1, y2)
a <- rep(x = "a", times = 100)
b <- rep(x = "b", times = 100)
groups <- c(a, b)
df <- data.frame(groups, t, y)
nested <- df %>%
  group_by(groups) %>%
  nest() %>%
  rowwise() %>%
  ungroup() %>%
  mutate(maximum = map_dbl(map(data, "y"), max))

# set starting values
l <- c(ymax = 100 , k = 0.02)
u <- c(ymax = 300, k = 0.03)

# works, but without group-specific lower and upper boundaries
# fit the model
fit <- nested %>%
  mutate(res = map(.x = data,
                   ~ nls_multstart(y ~ my_model(ymax, k, t = t),
                                   data = .x,
                                   iter = 20,
                                   start_lower = l,
                                   start_upper = u,
                                   supp_errors = 'N',
                                   na.action = na.omit)))
However, when trying to use the value in column maximum as a group-specific boundary, R throws the following error:
# using group-specific boundary does not work
# fit the model
fit2 <- nested %>%
  mutate(res = map(.x = data,
                   ~ nls_multstart(y ~ my_model(ymax, k, t = t),
                                   data = .x,
                                   iter = 20,
                                   start_lower = l,
                                   start_upper = u,
                                   supp_errors = 'N',
                                   na.action = na.omit,
                                   lower = c(maximum, 0),
                                   upper = c(maximum*1.2, 1))))
Error in nls.lm(par = start, fn = FCT, jac = jac, control = control, lower = lower, :
length(lower) must be equal to length(par)
Can anybody give a hint on how to fix this?
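The error comes from how maximum is looked up: the lambda given to map() only receives .x (one nested data frame), so inside mutate() the name maximum resolves to the whole column, which has one value per group (length 2 here). c(maximum, 0) therefore has length 3 while there are only two parameters. A sketch of one fix, assuming purrr::map2() is used to walk data and maximum in parallel so that .y is a single group's maximum. The data set-up is condensed from the question; moving start_lower/start_upper inside the new bounds is an extra assumption of this sketch, since the question's starting range for ymax (100 to 300) lies below the lower bound.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(nls.multstart)

my_model <- function(ymax, k, t) ymax * (1 - exp(-k * t))

# Same simulated data as in the question, built more compactly
t <- 1:100
df <- data.frame(groups = rep(c("a", "b"), each = 100),
                 t = c(t, t),
                 y = c(my_model(500, 0.04, t), my_model(800, 0.06, t)))

nested <- df %>%
  group_by(groups) %>%
  nest() %>%
  ungroup() %>%
  mutate(maximum = map_dbl(data, ~ max(.x$y)))

# map2() iterates over `data` and `maximum` together, so inside the lambda
# .x is one group's data and .y is that group's scalar maximum
fit2 <- nested %>%
  mutate(res = map2(data, maximum,
                    ~ nls_multstart(y ~ my_model(ymax, k, t = t),
                                    data = .x,
                                    iter = 20,
                                    # assumption: start ranges placed inside the bounds
                                    start_lower = c(ymax = .y, k = 0.02),
                                    start_upper = c(ymax = .y * 1.2, k = 0.1),
                                    supp_errors = 'N',
                                    na.action = na.omit,
                                    lower = c(ymax = .y, k = 0),
                                    upper = c(ymax = .y * 1.2, k = 1))))
```

Each fitted object in fit2$res can then be inspected with coef(), or tidied with broom as usual.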

Using dplyr inside map function on nested dataframe

Map a custom function to a nested data frame
I am trying to map a custom function over a nested data frame. The function uses dplyr and the pipe operator %>%.
I have tried to enquo() the variables used inside the function, but it still doesn't work, and I'm not sure how to do this.
library(dplyr)
library(tidyr)
library(purrr)
library(forecast)  # BoxCox.lambda(), auto.arima(), fourier(), forecast()

table <- expand.grid(x = c("bird", "dogs"), year = c(2018,2019), week= c(1:52)) %>%
  mutate(christmas = case_when(week == 52 ~1, TRUE ~ 0),
         ev = case_when(week == 30 ~ 1, TRUE ~ 0),
         alpha = rnorm(n = 208, mean = 10))

myfun_model_alpha <- function(time_s, param, yr, wk){
  event <- time_s %>% select(christmas, ev)
  time_s <- ts(time_s$alpha,
               start = c(min(time_s$year),
                         min(time_s$week[time_s$year == min(time_s$year)])),
               end = c(max(time_s$year),
                       max(time_s$week[time_s$year == max(time_s$year)])),
               frequency = 52)
  #time_s <- ts(time_s$alpha, start = c(2017,01),end = c(2019,20), frequency = 52)
  ts_vec <- window(time_s, end = c(yr, wk))
  leng <- length(ts_vec)
  lambda <- BoxCox.lambda(ts_vec)
  model <- auto.arima(ts_vec, lambda = lambda,
                      biasadj = TRUE,
                      xreg = matrix(c(fourier(ts_vec, K=param),
                                      event$christmas[1:leng],
                                      event$ev[1:leng]),
                                    nrow = leng,
                                    ncol = param*2+2),
                      seasonal = FALSE)
  forecast <- forecast(model,
                       16,
                       xreg=matrix(c(fourier(ts_vec, K=param, h=16),
                                     event$christmas[(leng+1):(leng+16)],
                                     event$ev[(leng+1):(leng+16)]),
                                   nrow = 16,
                                   ncol = param*2+2))$mean
  pred = tibble(forecast = forecast,
                time_index = 1:16,
                actual = window(time_s, start= c(yr,wk))[2:17])
  return(pred)
}

# Applying the function to the nested df
table %>%
  group_by(x) %>%
  nest() %>%
  mutate(data = map(data, ~arrange(.x, year, week)),
         model = map(data, ~myfun_model_alpha(.x, 2, 2019, 12))) %>%
  unnest(model)
It returns errors like: "Error in (function (x) : object 'christmas' not found"
I am trying to get a 16-week-ahead prediction alongside the actual values (where available).
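To narrow the error down, it can help to confirm that the nest()/map()/unnest() pattern itself works with a function that only uses dplyr verbs on the data frame it receives. A minimal sketch, where summarise_events() is a hypothetical stand-in for myfun_model_alpha() with all forecasting steps removed; if this runs in your session, the mapping over nested data is probably not the culprit and the problem is more likely inside the function body (or in a package that masks a dplyr verb).

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)
table <- expand.grid(x = c("bird", "dogs"), year = c(2018, 2019), week = 1:52) %>%
  mutate(christmas = case_when(week == 52 ~ 1, TRUE ~ 0),
         ev = case_when(week == 30 ~ 1, TRUE ~ 0),
         alpha = rnorm(n(), mean = 10))

# Hypothetical stand-in: uses dplyr verbs on one group's data frame and
# returns a tibble, which unnest() can then expand
summarise_events <- function(time_s) {
  time_s %>%
    select(christmas, ev, alpha) %>%
    summarise(n_christmas = sum(christmas),
              n_ev = sum(ev),
              mean_alpha = mean(alpha))
}

out <- table %>%
  group_by(x) %>%
  nest() %>%
  ungroup() %>%
  mutate(data = map(data, ~ arrange(.x, year, week)),
         model = map(data, summarise_events)) %>%
  unnest(model)
```

Each group covers two years, so n_christmas and n_ev should both be 2 for "bird" and "dogs".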

In R & dabestr, how do I get grouped differences correctly?

Using the dabestr package, I'm trying to get the differences between two sets of control and test data. Slightly modifying the example from the help file, I tried:
library(dabestr)
library(dplyr)  # for %>%

N <- 70
c1 <- rnorm(N, mean = 50, sd = 20)
t1 <- rnorm(N, mean = 200, sd = 20)
ID <- seq_len(N)
long.data <- tibble::tibble(ID = ID, Control1 = c1, Test1 = t1)
meandiff1 <- long.data %>%
  tidyr::gather(key = Group, value = Measurement, Control1:Test1)

ID <- seq_len(N) + N
c2 <- rnorm(N, mean = 100, sd = 70)
t2 <- rnorm(N, mean = 100, sd = 70)
long.data <- tibble::tibble(ID = ID, Control2 = c2, Test2 = t2)
meandiff2 <- long.data %>%
  tidyr::gather(key = Group, value = Measurement, Control2:Test2)

meandiff <- dplyr::bind_rows(meandiff1, meandiff2)

paired_mean_diff <-
  dabest(meandiff, x = Group, y = Measurement,
         idx = c("Control1", "Test1", "Control2", "Test2"),
         paired = TRUE,
         id.col = ID)
plot(paired_mean_diff)
I get the following results (plot not included here): not only is everything compared to Control1, but the paired = TRUE option also seems to have no effect. I was hoping to get something similar to the paired examples on the package page.
Any pointers on how to achieve that?
For a paired plot with two separate control/test comparisons, nest the idx argument as a list of vectors, one per comparison:
paired_mean_diff <-
dabest(meandiff, x = Group, y = Measurement,
idx = list(c("Control1", "Test1"),
c("Control2", "Test2")),
paired = TRUE,
id.col = ID)

How to map the mean over bootstrapped samples in a tidy data frame

I'm trying to map a function that calculates p_hat from the bootstrap samples I mapped previously in my data frame. I am having trouble with either my function or my use of map(); both are a work in progress.
library(tidyverse)
library(rsample)

ttSample <- data.frame(grad = c(0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))
bootstrapper <- function(n) {bootstraps(data = ttSample, times = n)}

meanTaker <- function(columnVal) {
  for(i in 1:length(columnVal)) {
    meanHolder <- vector("double", nrow(columnVal[[i]]))
    for(j in 1:nrow(columnVal[[i]])) {
      meanHolder[[j]] <- mean(as.data.frame(columnVal$splits[[j]])$grad)
    }
    mean(meanHolder)
  }
}

bootFrame <- data.frame(n = rep(c(250, 1000, 5000, 10000), 3),
                        confLev = rep(c(0.9, 0.95, 0.99))) %>%
  arrange(n, confLev) %>%
  mutate(alpha = 1 - confLev,
         upperCI = confLev + (alpha / 2),
         lowerCI = confLev - (alpha / 2),
         samples = map(ttSample, list),
         boots = map(.x = .$n, .f = bootstrapper)) %>%
  mutate(p_hat = map(.x = .$boots, .f = meanTaker))
Remember that map() applies the function to each element of its input, not to the entire input. With that in mind, meanTaker can be rewritten to take one rset (one row's boots) at a time:
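meanTaker <- function(boot) {
  # analysis() returns the resampled rows; split$data would be the full original data
  grads <- sapply(boot$splits, function(split) mean(rsample::analysis(split)$grad))
  mean(grads)
}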
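Putting a per-element meanTaker into the pipeline could look like the sketch below. It assumes map_dbl() (so p_hat becomes a numeric column rather than a list column) and uses a reduced grid of n values to keep the run short; the confidence-interval columns from the question are omitted.

```r
library(dplyr)
library(purrr)
library(rsample)

set.seed(123)
ttSample <- data.frame(grad = c(0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))

bootstrapper <- function(n) bootstraps(data = ttSample, times = n)

# Mean of the per-resample proportions; analysis() gives each resample's rows
meanTaker <- function(boot) {
  p_hats <- map_dbl(boot$splits, ~ mean(analysis(.x)$grad))
  mean(p_hats)
}

bootFrame <- expand.grid(n = c(250L, 1000L), confLev = c(0.9, 0.95, 0.99)) %>%
  arrange(n, confLev) %>%
  mutate(boots = map(n, bootstrapper),        # one rset per row
         p_hat = map_dbl(boots, meanTaker))   # one number per row
```

Each p_hat should land close to the sample proportion mean(ttSample$grad), since the average of the bootstrap means estimates the sample mean.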

randomize observations by groups (blocks) without replacement

This is a follow-up question. The answers to the previous question do the random sampling with replacement. How can I change the code so that each observation is assigned to one of J "urns" without putting the observation back into the 'lottery'?
This is the code I have right now:
set.seed(9782)
I <- 500
g <- 10
library(dplyr)

anon_id <- function(n = 1, lenght = 12) {
  randomString <- c(1:n)
  for (i in 1:n) {
    randomString[i] <- paste(sample(c(0:9, letters, LETTERS),
                                    lenght, replace = TRUE),
                             collapse = "")
  }
  return(randomString)
}

df <- data.frame(id = anon_id(n = I, lenght = 16),
                 group = sample(1:g, I, T))
J <- 3
p <- c(0.25, 0.5, 0.25)

randomize <- function(data, urns=2, block_id = NULL, p=NULL, seed=9782) {
  if(is.null(p)) p <- rep(1/urns, urns)
  if(is.null(block_id)){
    df1 <- data %>%
      mutate(Treatment = sample(x = c(1:urns),
                                size = n(),
                                replace = T,
                                prob = p))
    return(df1)
  }else{
    df1 <- data %>% group_by_(block_id) %>%
      mutate(Treatment = sample(x = c(1:urns),
                                size = n(),
                                replace = T,
                                prob = p))
  }
}

df1 <- randomize(data = df, urns = J, block_id = "group", p = p, seed = 9782)
If I change replace = T to replace = F I get the following error:
Error: cannot take a sample larger than the population when 'replace = FALSE'
Clarification of my objective:
Suppose that I have 10 classrooms (or villages, or something similar). To keep it simple, suppose each classroom has 20 students (in reality each will have N_j). Within each classroom, I want to assign each student to one of J groups, for example J = 3. p gives the fraction assigned to each group, for example 25% to group 1, 40% to group 2 and 35% to group 3.
This solution is based on @Frank's comment. I created one function that does the randomization for block j and another that calls that function for every block.
randomize_block <- function(data, block=NULL, block_name=NULL, urns, p, seed=9782) {
  # note: re-seeding inside every block gives equal-sized blocks the same shuffle
  set.seed(seed)
  if(!is.null(block)) {
    condition <- paste0(block_name,"==",block)
    df <- data %>% filter_(condition)
  } else df <- data
  if(is.null(p)) p <- rep(1/urns, urns)
  N <- nrow(df)
  Np <- round(N*p,0)                                  # target count per urn
  if(sum(Np)!=N) Np[1] <- N - sum(Np[2:length(Np)])   # absorb rounding drift
  Urns = rep(seq_along(p), Np)                        # exactly Np[j] copies of label j
  Urns = sample(Urns)                                 # shuffle: no replacement
  df$urn <- Urns
  return(df)
}

randomize <- function(data, block_name=NULL, urns, p, seed=9782) {
  if(is.null(p)) p <- rep(1/urns, urns)
  if(!is.null(block_name)){
    blocks <- unique(data[,block_name])
    df <- lapply(blocks, randomize_block,
                 data = data,
                 block_name=block_name,
                 urns = urns,
                 p = p,
                 seed=seed)
    return(data.table::rbindlist(df))
  }else {
    df <- randomize_block(data = data,
                          urns = urns, p = p,
                          seed=seed)
  }
}

test <- randomize(data = df, block_name = "group",
                  urns = 3, p = c(0.25, 0.5, 0.25),
                  seed=4222016)
I'm trying to figure out whether it is possible to do this with dplyr; alternative solutions are also very welcome!
My answer to your other question is without replacement, as can be seen below:
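# randomizr::block_ra() does complete random assignment within each block
block_rand <- as_tibble(randomizr::block_ra(blocks = df$group, conditions = c("urn_1","urn_2","urn_3")))
df2 <- as_tibble(bind_cols(df, block_rand))
# cross-tabulate group by assigned urn
df2 %>% janitor::tabyl(group, value)
# check that no id appears more than once
df2 %>%
  group_by(id) %>%
  filter(n()>1) %>%
  str()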
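For the dplyr-only route asked about above: the replace = FALSE error appears because sample(x = 1:urns, size = n(), ...) tries to draw n() values from only `urns` labels. Sampling without replacement works if each block first gets a vector holding the required number of each label, which is then shuffled (the same idea as randomize_block() above). A minimal sketch, where urn_labels() is a hypothetical helper and the id column is simplified; it assumes rounding never makes the first urn's count negative, which holds for these p and block sizes.

```r
library(dplyr)

set.seed(9782)
df <- data.frame(id = sprintf("id%03d", 1:500),          # simplified ids
                 group = sample(1:10, 500, replace = TRUE))
p <- c(0.25, 0.5, 0.25)

# Hypothetical helper: labels 1..length(p) in proportions p for a block of
# size n (rounding remainder goes to urn 1), shuffled without replacement
urn_labels <- function(n, p) {
  counts <- round(n * p)
  counts[1] <- n - sum(counts[-1])
  labels <- rep(seq_along(p), counts)
  labels[sample.int(length(labels))]  # avoids sample()'s length-1 gotcha
}

df1 <- df %>%
  group_by(group) %>%
  mutate(Treatment = urn_labels(n(), p)) %>%
  ungroup()
```

Because every block is shuffled from a fixed set of labels, each block ends up with exactly the rounded counts, for example 1, 2 and 1 students in urns 1 to 3 for a block of 4.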
