For loops for nested variables within function in R

I would like to iterate through vectors of values and calculate something for every value from within a function in R. For example:
# I have costs for 3 companies
c <- c(10, 20, 30)
# I have the same revenue across all 3
r <- 100
# I want to obtain the profits for all 3 within one variable
result <- list()
# I could do this in a for loop
for(i in 1:3){
result[i] <- r - c[i]
}
Now let's assume I have a very long model, which I define as a function that is to be solved with various random draws for the costs.
# Random draws
n <- 1000
r <- rnorm(n, mean = 100, sd = 10)
c1 <- rnorm(n, mean = 10, sd = 1)
c2 <- rnorm(n, mean = 20, sd = 2)
c3 <- rnorm(n, mean = 30, sd = 3)
X <- data.frame(r, c1, c2, c3)
fun <- function(x){
r <- x[1]
c <- c(x[2], x[3], x[4])
for(i in 1:3){
result[i] <- r - c[i]
}
return(result)
}
I could then evaluate the result for all draws by iterating through the rows of randomly sampled input data.
for(j in 1:n){
x <- X[j,]
y <- fun(x)
}
In this example, the output variable y would contain the nested result variable, which holds the results for all 3 companies. However, my approach results in an error, and I think it has to do with the fact that I try to return a nested variable. How would you approach something like this?

I would suggest rethinking your coding approach. This is a very un-R-like way of doing things.
For example, the first for loop can be written much more succinctly as
x <- c(10, 20, 30)
r <- 100
result <- lapply(-x, `+`, r)
Then fun becomes something like
fun <- function(x) lapply(-x[-1], `+`, x[1])
To then operate over the rows of a data.frame (which is what you seem to do in the last step), you can use something like
apply(X, 1, fun)
where the MARGIN = 1 argument in apply ensures that you are applying a function per row (as opposed to per column).
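Since arithmetic in R is vectorized, a sketch that skips both the loop and apply() could subtract the whole cost matrix from the revenue vector in one step. This assumes the same column names r, c1, c2, c3 as in the question; the result is a 1000 x 3 matrix with one row per draw and one column per company, rather than a list.

```r
set.seed(1)  # for reproducibility
n <- 1000
X <- data.frame(r  = rnorm(n, mean = 100, sd = 10),
                c1 = rnorm(n, mean = 10, sd = 1),
                c2 = rnorm(n, mean = 20, sd = 2),
                c3 = rnorm(n, mean = 30, sd = 3))
# X$r has length n and the cost matrix has n rows, so r is recycled
# down each column: column k holds r - c_k for every draw
profits <- X$r - as.matrix(X[, c("c1", "c2", "c3")])
head(profits)
```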

Here's an approach using your function and a for loop:
# Random draws
n <- 1000
r <- rnorm(n, mean = 100, sd = 10)
c1 <- rnorm(n, mean = 10, sd = 1)
c2 <- rnorm(n, mean = 20, sd = 2)
c3 <- rnorm(n, mean = 30, sd = 3)
X <- data.frame(r, c1, c2, c3)
result <- list()
fun <- function(x){
r <- x[[1]]
c <- c(x[[2]], x[[3]], x[[4]])
for(i in 1:3){
result[i] <- r - c[i]
}
return(result)
}
# Create a list to store results: one element per row of X
profits <- rep(list(1:3), nrow(X))
# Loop through each row of the data frame and store in profits
for(i in 1:nrow(X)){
profits_temp <-
fun(list(X[i,"r"],X[i,"c1"],X[i,"c2"],X[i,"c3"]))
for(j in 1:3)
profits[[i]][[j]] <- profits_temp[[j]]
}
# Eye results
profits[[1]]
#> [1] 93.23594 81.25731 70.27699
profits[[2]]
#> [1] 80.50516 69.27517 63.36439
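If a loop is preferred, a sketch that avoids the nested list entirely could pre-allocate a numeric matrix and fill one row per draw. The helper `profit_fun` and the column names `company1` to `company3` are illustrative, not from the question; `[[ ]]` is used so that each element is a plain number rather than a one-column data frame.

```r
set.seed(1)
n <- 1000
X <- data.frame(r  = rnorm(n, mean = 100, sd = 10),
                c1 = rnorm(n, mean = 10, sd = 1),
                c2 = rnorm(n, mean = 20, sd = 2),
                c3 = rnorm(n, mean = 30, sd = 3))
# Hypothetical helper: profits of the 3 companies for one row of X
profit_fun <- function(x) {
  x[["r"]] - c(x[["c1"]], x[["c2"]], x[["c3"]])
}
# Pre-allocate: one row per draw, one column per company
profits <- matrix(NA_real_, nrow = n, ncol = 3,
                  dimnames = list(NULL, c("company1", "company2", "company3")))
for (i in seq_len(n)) {
  profits[i, ] <- profit_fun(X[i, ])
}
profits[1:2, ]
```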

Related

How to write a for loop to compute max of each column for a dataset in R?

This is similar to the question "How to write a double for loop in r with choosing maximal element in one loop?".
The same setup:
I want to first sample theta[j] for j = 1, 2, ..., 71, then draw replicated (say 1000 times) yrep[k] from Bin(n[j], theta[j]), where n[j] is known.
For theta[1], we have yrep[1,1], yrep[1,2], ..., yrep[1,1000]. Doing this for all theta[j] gives a matrix yrep[i,j], i = 1, ..., 71, j = 1, ..., 1000. I then want to compute the mean, max, or min of each column (yrep[1,j], yrep[2,j], ..., yrep[71,j]), which gives 1000 means, maxima, or minima.
How to write this for loop?
I first tried to write a loop to sample theta[j] and yrep, but I do not know how to add code to compute the maximum, mean, and minimum in this loop, and I am not sure if this code is right:
theta<-NULL
yrep<-NULL
test<-NULL
k=1
for(i in 1:1000){
for(j in 1:71){
theta[j] <- rbeta(1,samp_A+y[j], samp_B+n[j]-y[j])
yrep[k]<-rbinom(1, n[j], theta[j])
k=k+1
}
t<-c(test, max(yrep))
}
The data are given in "How to write a double for loop in r with choosing maximal element in one loop?":
library(tidyr) # for gather(); tidyr also re-exports %>%
#Data
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,
2,1,5,2,5,3,2,7,7,3,3,2,9,10,4,4,4,4,4,4,4,10,4,4,4,5,11,12,
5,5,6,5,6,6,6,6,16,15,15,9,4)
n <-
c(20,20,20,20,20,20,20,19,19,19,19,18,18,17,20,20,20,20,19,19,18,18,25,24,
23,20,20,20,20,20,20,10,49,19,46,27,17,49,47,20,20,13,48,50,20,20,20,20,
20,20,20,48,19,19,19,22,46,49,20,20,23,19,22,20,20,20,52,46,47,24,14)
#Evaluate densities in grid
x <- seq(0.0001, 0.9999, length.out = 1000)
#Compute the marginal posterior of alpha and beta in hierarchical model Use grid
A <- seq(0.5, 15, length.out = 100)
B <- seq(0.3, 45, length.out = 100)
#Make vectors that contain all pairwise combinations of A and B
cA <- rep(A, each = length(B))
cB <- rep(B, length(A))
#Use logarithms for numerical accuracy!
lpfun <- function(a, b, y, n) log(a+b)*(-5/2) +
sum(lgamma(a+b)-lgamma(a)-lgamma(b)+lgamma(a+y)+lgamma(b+n-y)-
lgamma(a+b+n))
lp <- mapply(lpfun, cA, cB, MoreArgs = list(y, n))
#Subtract maximum value to avoid over/underflow in exponentiation
df_marg <- data.frame(x = cA, y = cB, p = exp(lp - max(lp)))
#Sample from the grid (with replacement)
nsamp <- 100
samp_indices <- sample(length(df_marg$p), size = nsamp,
replace = TRUE, prob = df_marg$p/sum(df_marg$p))
samp_A <- cA[samp_indices[1:nsamp]]
samp_B <- cB[samp_indices[1:nsamp]]
df_psamp <- mapply(function(a, b, x) dbeta(x, a, b),
samp_A, samp_B, MoreArgs = list(x = x)) %>%
as.data.frame() %>% cbind(x) %>% gather(ind, p, -x)
This is not very well tested.
There is no need for loops to sample from the distributions included in base R; those functions are vectorized over their arguments. Code along the lines below should do what the question asks for.
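Ni <- 1000       # number of replications
Nj <- length(y)  # number of groups (71)
a <- samp_A[1]; b <- samp_B[1]  # one (alpha, beta) draw, which is what the question's loop effectively uses
# Group-major order: the first Ni draws belong to group 1, the next Ni to group 2, ...
theta <- rbeta(Ni*Nj, rep(a + y, each = Ni), rep(b + n - y, each = Ni))
yrep <- rbinom(Ni*Nj, rep(n, each = Ni), theta)
test1 <- matrix(yrep, nrow = Ni)  # row i = replication i, column j = group j
mins1 <- matrixStats::rowMins(test1)  # 1000 minima; rowMaxs() and rowMeans() work the same way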
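To check that `rep(..., each = Ni)` lines the draws up with the groups, here is a small self-contained sketch with three hypothetical groups and fixed, made-up hyperparameters a and b (not values from the question). Because matrix() fills column by column, column j holds the Ni replicates for group j, so each row is one replicated data set and row-wise summaries give one min, max, and mean per replication.

```r
set.seed(1)
y_grp <- c(0, 2, 5)    # hypothetical successes per group
n_grp <- c(20, 19, 49) # hypothetical trials per group
a <- 2; b <- 10        # hypothetical Beta hyperparameters
Ni <- 1000             # replications
Nj <- length(y_grp)    # groups
# Group-major order: draws 1..Ni belong to group 1, Ni+1..2*Ni to group 2, ...
theta <- rbeta(Ni * Nj, rep(a + y_grp, each = Ni), rep(b + n_grp - y_grp, each = Ni))
yrep  <- rbinom(Ni * Nj, rep(n_grp, each = Ni), theta)
# matrix() fills column by column, so column j is group j
m <- matrix(yrep, nrow = Ni)
# One summary per replication (row)
stats <- cbind(min = apply(m, 1, min), max = apply(m, 1, max), mean = rowMeans(m))
head(stats)
```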

Loop returning more values than expected and NA's

I'm trying to simulate some data for sample size estimation and my loop is returning unexpected results.
I'm trying to sample from a vector of generated values with varying sample sizes and then concatenate the means and standard deviations over a number of simulations.
library(MCMCglmm)
library(tidyverse)
Est <- function(n, mean, sd, lower, upper, samp_min, samp_max, samp_int, nsim){
Data <- round(rtnorm(n, mean, sd, lower, upper), digits = 0) # Create a vector to sample from
Samp_size <- seq(samp_min, samp_max, samp_int) # Create vector of sample sizes
# Set up empty results data frames
Results_samp <- data.frame()
Results <- data.frame()
for(i in 1:nsim){ ## Loop through number of simulations
for (j in seq_along(Samp_size)) { # Loop through sample sizes
Score <- sample(Data, j, replace = TRUE)
Nsubj <- Samp_size[j]
Mean <- mean(Score, na.rm = TRUE)
SD <- sd(Score, na.rm = TRUE)
Results_samp <- rbind(Results_samp,
data.frame(
Nsubj,
Mean,
SD))
}
Results <- rbind(Results, Results_samp)
}
Results
}
Test <- Est(n = 1000, mean = 55, sd = 37, lower = 0, upper = 100,
samp_min = 5, samp_max = 20, samp_int = 5, nsim = 5)
This creates a data frame with 60 rows, where I'm expecting 20 (5 simulations of 4 sample sizes) and I always get NA returned for the sample size of 5.
Can anyone see where I'm going wrong?
Thanks!
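Generally, dynamically growing a data.frame with rbind is a very inefficient way of doing things in R, because every rbind() call copies all the rows accumulated so far. Collecting the pieces in a list and combining them once at the end (e.g. with do.call(rbind, ...)) is usually much faster.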
That aside, to answer your question, let's look at a simplified version of your nested for loop:
x1 <- data.frame()
x2 <- data.frame()
for (i in 1:5) {
for (j in 1:4) x1 <- rbind(x1, data.frame(x1 = i, x2 = i^2))
x2 <- rbind(x2, x1)
}
See how x2 has 60 rows?
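The reason is that x1 is never reset, so each outer iteration appends everything x1 has accumulated so far (4, 8, 12, 16 and then 20 rows, 60 in total). If we reset x1 at the end of each outer iteration: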
x1 <- data.frame()
x2 <- data.frame()
for (i in 1:5) {
for (j in 1:4) x1 <- rbind(x1, data.frame(x1 = i, x2 = i^2))
x2 <- rbind(x2, x1)
x1 <- data.frame()
}
we have nrow(x2) = 20, as expected.
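The answer above explains the 60 rows but not the NA. A likely cause is that sample(Data, j, replace = TRUE) uses the loop index j (1 to 4) as the sample size instead of Samp_size[j]; with j = 1 only a single value is drawn, and sd() of one value is NA. A minimal sketch of a corrected loop, which also collects rows in a list and binds them once, follows. `est_sketch` is a hypothetical name, and a uniform vector stands in for the rtnorm() draws so the example runs without MCMCglmm.

```r
# Hypothetical rewrite of Est(): Data is passed in directly
est_sketch <- function(Data, samp_sizes, nsim) {
  out <- vector("list", nsim * length(samp_sizes))  # one slot per result row
  k <- 1
  for (i in seq_len(nsim)) {
    for (s in samp_sizes) {
      Score <- sample(Data, s, replace = TRUE)  # sample size s, not the loop index
      out[[k]] <- data.frame(Sim = i, Nsubj = s,
                             Mean = mean(Score), SD = sd(Score))
      k <- k + 1
    }
  }
  do.call(rbind, out)  # bind all rows once at the end
}

set.seed(1)
Data <- round(runif(1000, min = 0, max = 100))  # stand-in for the rtnorm() draws
Test <- est_sketch(Data, samp_sizes = seq(5, 20, 5), nsim = 5)
nrow(Test)
```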

How to create a loop to generate increasing sample sizes in a simulation

I'm trying to create a simulation to calculate the confidence interval for a binomial proportion. So far I have functions that calculate the lower and upper bounds, and I have generated and stored the type of data I want (in a matrix, though I'm not sure that's the best choice).
How can I create a loop that generates samples of different sizes? I'd like to test how the formula performs when calculating the intervals with sample sizes n = 10, 11, 12, ... up to 100.
My code so far:
## functions that calculate lower and upper bounds
ll <- function(x, cl=0.95) {
n <- length(x)
p.est <- mean(x)
z = abs(qnorm((1-cl)/2))
return((p.est) - z*sqrt(p.est*(1-p.est)/n))
}
ul <- function(x, cl=0.95) {
n <- length(x)
p.est <- mean(x)
z = abs(qnorm((1-cl)/2))
return((p.est) + z*sqrt(p.est*(1-p.est)/n))
}
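For reference, ll() and ul() compute the two ends of the Wald (normal-approximation) interval. With \hat{p} the sample mean of the 0/1 vector x, n its length, and cl = 1 - \alpha, the code's z = abs(qnorm((1 - cl)/2)) is the usual critical value (about 1.96 for cl = 0.95):

```latex
\hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}},
\qquad
z_{1-\alpha/2} = \left|\Phi^{-1}\!\left(\tfrac{\alpha}{2}\right)\right| = \Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)
```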
## my simulation for n=10 and 200 repetitions.
p <- 0.4
n <- 10
rep <- 200
dat <- rbinom(rep*n,1,p)
x <- matrix(dat, ncol=rep)
ll.res <- apply(x, 2, ll)
ul.res <- apply(x, 2, ul)
hits <- ll.res <= p & p <= ul.res
sum(hits==1)/rep
I'm not sure which values you want to compare between different sample sizes, but wrapping your simulation in a for loop and using lists to store the results should work:
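nrep <- 200                   # repetitions per sample size
p <- 0.4                      # true proportion
ns <- 10:100                  # sample sizes to test
hits <- list()
ll.res <- list()
ul.res <- list()
value <- numeric(length(ns))  # coverage for each sample size
for(i in seq_along(ns)){
n <- ns[i]
dat <- rbinom(nrep*n, 1, p)
x <- matrix(dat, ncol=nrep)   # each column is one sample of size n
ll.res[[i]] <- apply(x, 2, ll)
ul.res[[i]] <- apply(x, 2, ul, cl=0.95)
hits[[i]] <- ll.res[[i]] <= p & p <= ul.res[[i]]
value[i] <- mean(hits[[i]])   # proportion of intervals that contain p
}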
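Because the sum of n Bernoulli(p) draws is Binomial(n, p), a shorter sketch can draw each sample proportion directly with rbinom() and compute all intervals at once, without building matrices. `wald_coverage` is a hypothetical helper that reimplements the same Wald bounds as ll() and ul(); it returns the proportion of the nrep intervals that contain p.

```r
# Hypothetical helper: empirical coverage of the Wald interval for one n
wald_coverage <- function(n, p = 0.4, nrep = 200, cl = 0.95) {
  z <- qnorm(1 - (1 - cl) / 2)     # same value as abs(qnorm((1 - cl) / 2))
  p_hat <- rbinom(nrep, n, p) / n  # nrep sample proportions
  half <- z * sqrt(p_hat * (1 - p_hat) / n)
  mean(p_hat - half <= p & p <= p_hat + half)
}

set.seed(1)
ns <- 10:100
coverage <- sapply(ns, wald_coverage)
```

Plotting with plot(ns, coverage, type = "l") followed by abline(h = 0.95, lty = 2) then compares the simulated coverage with the nominal 95% level.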

Plotting a histogram in R

I have to solve the following exercise.
(1) Create 100 Poisson distributed r.v.'s with lambda = 4.
(2) Calculate the mean of the sample generated in (1).
(3) Repeat (1) and (2) 10,000 times.
(4) Create a vector containing the 10,000 means.
(5) Plot the vector in a histogram.
Is the following solution right?
as.numeric(x)
for(i in 1:10000){
p <- rpois(100, lambda = 4)
m <- mean(p)
append(x, m)
}
hist(x, breaks = 20)
Your approach is a little unusual; the same thing can be done more quickly and legibly. For example, with a pre-allocated vector:
L <- 10000
emptyvector <- rep(NA, L)
for(i in 1:L){
emptyvector[i] <- mean(rpois(100, lambda = 4))
}
hist(emptyvector)
I would take advantage of the replicate() function, which creates a matrix of results (one column per run), and then use colMeans() to get the vector of means quickly:
meanvector <- colMeans(replicate(10000, rpois(100, lambda = 4)))
hist(meanvector, main = "Mean values from 10,000 runs of \nPoisson n = 100")
Or, computing each mean inside replicate() directly:
hist(replicate(10000, mean(rpois(100, lambda = 4))))
You need to assign the result back to x: append() returns a new vector and does not modify x in place.
x1 <- x <- NULL
for(i in 1:10000){
p <- rpois(100, lambda = 4)
m <- mean(p)
x[length(x) + 1] <- m
x1 <- append(x1, m)
## either the x or the x1 vector works for the histogram
}
hist(x1, breaks = 20)
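A small check of why the reassignment matters: append() returns a new vector and leaves its input unchanged, so its result has to be stored.

```r
x <- c(1, 2)
append(x, 3)       # returns c(1, 2, 3), but x itself is not modified
length(x)          # still 2
x <- append(x, 3)  # store the result to keep it
x
```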

Obtain 1000 confidence intervals from t.test

x <- c(1:100)
y <- c(89:300)
s1 <- sample(x, 30)
s2 <- sample(y, 30)
mytest <- t.test(s1, s2)
mytest$conf.int
I would like to run this 1000 times and create a matrix of the 1000 intervals obtained. I have tried some loops, but every time I get the same interval 1000 times, even though each run should give a different interval, since I sample before each t.test.
You can do this with replicate:
x <- c(1:100)
y <- c(89:300)
myCI = function(x,y) {
s1 <- sample(x, 30)
s2 <- sample(y, 30)
mytest <- t.test(s1, s2)
mytest$conf.int
}
CIs = t(replicate(1000, myCI(x,y)))
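A quick sketch to confirm the shape of the result and that the intervals really differ. One common reason a loop returns the same interval every time is drawing s1 and s2 once, outside the loop; that is only a guess, since the original loop isn't shown. Keeping the sampling inside the replicated function avoids it. The column names lower and upper are added for illustration.

```r
set.seed(1)
x <- 1:100
y <- 89:300
myCI <- function(x, y) {
  s1 <- sample(x, 30)  # new samples on every call
  s2 <- sample(y, 30)
  t.test(s1, s2)$conf.int
}
# replicate() gives a 2 x 1000 matrix; t() makes it 1000 x 2
CIs <- t(replicate(1000, myCI(x, y)))
colnames(CIs) <- c("lower", "upper")
head(CIs)
```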
