x <- c(1:100)
y <- c(89:300)
s1 <- sample(x, 30)
s2 <- sample(y, 30)
mytest <- t.test(s1, s2)
mytest$conf.int
I would like to run this 1000 times and create a matrix with the 1000 intervals obtained. I have tried some loops but every time I am getting the same 1000 intervals. However, every time it should give me a different interval since I am sampling each time before performing the t.test.
You can do this with replicate:
x <- c(1:100)
y <- c(89:300)
myCI = function(x,y) {
s1 <- sample(x, 30)
s2 <- sample(y, 30)
mytest <- t.test(s1, s2)
mytest$conf.int
}
CIs = t(replicate(1000, myCI(x,y)))
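As a quick check on the result, the sketch below reruns the same replicate() call and inspects the matrix. The seed and the column names `lower` and `upper` are assumptions added for illustration; they are not part of the original answer.

```r
set.seed(1)  # assumed seed, only so the run is reproducible

x <- 1:100
y <- 89:300

# One simulation: draw both samples, then return the t-test interval
myCI <- function(x, y) {
  s1 <- sample(x, 30)
  s2 <- sample(y, 30)
  t.test(s1, s2)$conf.int
}

# replicate() re-evaluates the call each time, so every interval
# comes from a fresh pair of samples
CIs <- t(replicate(1000, myCI(x, y)))
colnames(CIs) <- c("lower", "upper")  # hypothetical labels

dim(CIs)               # 1000 rows, 2 columns
head(CIs, 3)
nrow(unique(CIs)) > 1  # TRUE when the intervals differ between runs
```

When a hand-written loop returns the same interval every time, a common cause is that sample() was called once outside the loop. Keeping the sampling inside the function, as here, avoids that.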
I have run a short simulation and want to plot the outcomes of each simulation in terms of the "running sum" over parameter k. For reference, I want to end up with a plot that looks similar to the ones in this article:
https://www.pinnacle.com/en/betting-articles/Betting-Strategy/betting-bankroll-management/VDM2GY6UX3B552BG
The following is the code for the simulation:
## Simulating returns over k bets.
odds <- 1.5
k <- 100
return <- odds - 1
edge <- 0.04
pw <- 1/(odds/(1-edge))
pl <- 1-pw
nsims <- 10000
set.seed(42)
sims <- replicate(nsims, {
x <- sample(c(-1,return), k, TRUE, prob=c(pl, pw))
})
rownames(sims) <- c(1:k)
colnames(sims) <- c(1:nsims)
If I wasn't being clear in the description let me know.
Okay so here is how you can achieve the plot of the cumulative value over bets (I set nsims <- 10 so that the plot is readable).
First I generate the data:
## Simulating returns over k bets.
odds <- 1.5
k <- 100
return <- odds - 1
edge <- 0.04
pw <- 1/(odds/(1-edge))
pl <- 1-pw
nsims <- 10
set.seed(42)
sims <- replicate(nsims, {
x <- sample(c(-1,return), k, TRUE, prob=c(pl, pw))
})
rownames(sims) <- c(1:k)
colnames(sims) <- c(1:nsims)
Then I create a data frame containing the results of the n simulations (10 here):
df <- as.data.frame(sims)
What we want to plot is the cumulative sum, not the result of each individual bet, so we iterate through the columns (i.e. the simulations) and replace each one with its running total:
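# Packages used below
library(dplyr)     # mutate()
library(reshape2)  # melt()
library(ggplot2)   # ggplot()
for (i in colnames(df)){
df[[i]] <- cumsum(df[[i]])
}
# Store the bet number as an integer so the x-axis is ordered numerically
df <- mutate(df, bets = as.integer(rownames(df)))
output <- melt(df, id.vars = "bets", variable.name = 'simulation')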
Now we can plot our data:
ggplot(output, aes(bets,value,group=simulation)) + geom_line(aes(colour = simulation))
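For comparison, here is a minimal base R sketch of the same plot that needs no extra packages. It assumes the same simulation settings as above; the variable names `ret` and `cum` are introduced here for illustration, with `ret` replacing `return` only to avoid confusion with the return() function.

```r
set.seed(42)
odds <- 1.5
k <- 100
ret <- odds - 1                  # payoff of a winning bet
edge <- 0.04
pw <- 1 / (odds / (1 - edge))    # probability of winning
pl <- 1 - pw
nsims <- 10

sims <- replicate(nsims, sample(c(-1, ret), k, TRUE, prob = c(pl, pw)))

# Running sum down each column (one column per simulation)
cum <- apply(sims, 2, cumsum)

# One line per simulation, bet number on the x-axis
matplot(seq_len(k), cum, type = "l", lty = 1,
        xlab = "bets", ylab = "cumulative return")
```

With many simulations (for example the original 10000), matplot() tends to be quicker to draw than a long-format ggplot, though the lines will overlap heavily either way.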
I would like to iterate through vectors of values and calculate something for every value while being within a function environment in R. For example:
# I have costs for 3 companies
c <- c(10, 20, 30)
# I have the same revenue across all 3
r <- 100
# I want to obtain the profits for all 3 within one variable
result <- list()
# I could do this in a for loop
for(i in 1:3){
result[i] <- r - c[i]
}
Now let's assume I have a very long model, and I define everything as a function that is to be solved with various random draws for the costs.
# Random draws
n <- 1000
r <- rnorm(n, mean = 100, sd = 10)
c1 <- rnorm(n, mean = 10, sd = 1)
c2 <- rnorm(n, mean = 20, sd = 2)
c3 <- rnorm(n, mean = 30, sd = 3)
X <- data.frame(r, c1, c2, c3)
fun <- function(x){
r <- x[1]
c <- c(x[2], x[3], x[4])
for(i in 1:3){
result[i] <- r - c[i]
}
return(result)
}
I could then evaluate the result for all draws by iterating through the rows of randomly sampled input data.
for(j in 1:n){
x <- X[j,]
y <- fun(x)
}
In this example, the output variable y should contain the nested result variable, which comprises the results for all 3 companies. However, my approach results in an error, and I think it has to do with the fact that I try to return a nested variable. How would you approach something like this?
I would suggest rethinking your coding approach. This is a very un-R-like way of doing things.
For example, the first for loop can be written much more succinctly as
x <- c(10, 20, 30)
r <- 100
result <- lapply(-x, `+`, r)
Then fun becomes something like
fun <- function(x) lapply(-x[-1], `+`, x[1])
To then operate over the rows of a data.frame (which is what you seem to do in the last step), you can use something like
apply(X, 1, fun)
where the MARGIN = 1 argument in apply ensures that you are applying a function per row (as opposed to per column).
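If the per-row calculation really is just revenue minus each cost, it can also be done without any loop or apply call. The sketch below is one way to do that under that assumption; the seed and the column labels `p1`, `p2`, `p3` are made up for illustration.

```r
set.seed(1)  # assumed seed for reproducibility
n <- 1000
X <- data.frame(
  r  = rnorm(n, mean = 100, sd = 10),
  c1 = rnorm(n, mean = 10, sd = 1),
  c2 = rnorm(n, mean = 20, sd = 2),
  c3 = rnorm(n, mean = 30, sd = 3)
)

# X$r is recycled down each cost column, giving an n-by-3 matrix of profits
profits <- X$r - as.matrix(X[, c("c1", "c2", "c3")])
colnames(profits) <- c("p1", "p2", "p3")  # hypothetical labels

head(profits, 3)
```

Each row of `profits` holds the three company profits for one draw, so `profits[j, ]` plays the role of `y` in the original loop.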
Here's an approach using your function and a for loop:
# Random draws
n <- 1000
r <- rnorm(n, mean = 100, sd = 10)
c1 <- rnorm(n, mean = 10, sd = 1)
c2 <- rnorm(n, mean = 20, sd = 2)
c3 <- rnorm(n, mean = 30, sd = 3)
X <- data.frame(r, c1, c2, c3)
result <- list()
fun <- function(x){
r <- x[[1]]
c <- c(x[[2]], x[[3]], x[[4]])
for(i in 1:3){
result[i] <- r - c[i]
}
return(result)
}
# Create a list to store results
profits <- rep(rep(list(1:3)),nrow(X))
# Loop through each row of the data frame and store the result in profits.
for(i in 1:nrow(X)){
profits_temp <-
fun(list(X[i,"r"],X[i,"c1"],X[i,"c2"],X[i,"c3"]))
for(j in 1:3)
profits[[i]][[j]] <- profits_temp[[j]]
}
# Eye results
profits[[1]]
#> [1] 93.23594 81.25731 70.27699
profits[[2]]
#> [1] 80.50516 69.27517 63.36439
I'm trying to create a simulation to calculate the confidence interval for a binomial proportion. So far I have a function that calculates the lower and upper bounds, and I have generated and stored the type of data I want (in a matrix, though I'm not sure that is the right structure).
How can I create a loop that generates samples of different sizes? I'd like to test how the formula performs when calculating the intervals with sample sizes n=10, 11, 12,... up to 100.
My code so far:
## functions that calculate lower and upper bounds
ll <- function(x, cl=0.95) {
n <- length(x)
p.est <- mean(x)
z = abs(qnorm((1-cl)/2))
return((p.est) - z*sqrt(p.est*(1-p.est)/n))
}
ul <- function(x, cl=0.95) {
n <- length(x)
p.est <- mean(x)
z = abs(qnorm((1-cl)/2))
return((p.est) + z*sqrt(p.est*(1-p.est)/n))
}
## my simulation for n=10 and 200 repetitions.
p <- 0.4
n <- 10
rep <- 200
dat <- rbinom(rep*n,1,p)
x <- matrix(dat, ncol=rep)
ll.res <- apply(x, 2, ll)
ul.res <- apply(x, 2, ul)
hits <- ll.res <= p & p <= ul.res
sum(hits==1)/rep
I'm not sure which values you want to compare between the different sample sizes, but wrapping your simulation in a for loop and using lists to store the results should work:
nrep <- 200   # repetitions per sample size
p <- 0.4      # true proportion
ns <- 10:100  # sample sizes to test
hits <- list()
ll.res <- list()
ul.res <- list()
value <- numeric(length(ns))  # coverage rate for each sample size
for(i in seq_along(ns)){
n <- ns[i]
dat <- rbinom(nrep*n,1,p)
x <- matrix(dat, ncol=nrep)
ll.res[[i]] <- apply(x, 2, ll)
ul.res[[i]] <- apply(x, 2, ul,cl=0.95)
hits[[i]] <- ll.res[[i]] <= p & p <= ul.res[[i]]
value[i] <- mean(hits[[i]])  # share of intervals that contain p
}
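If only the coverage rate per sample size is needed, the loop can be condensed with sapply(). This is a sketch under that assumption: `wald_ci` and `coverage` are hypothetical helpers that merge the ll and ul functions from the question (the formula they use is the Wald interval), and the seed is arbitrary.

```r
set.seed(1)  # assumed seed

# Wald interval bounds for a 0/1 vector; returns c(lower, upper)
wald_ci <- function(x, cl = 0.95) {
  n <- length(x)
  p.est <- mean(x)
  z <- abs(qnorm((1 - cl) / 2))
  half <- z * sqrt(p.est * (1 - p.est) / n)
  c(p.est - half, p.est + half)
}

# Coverage rate for one sample size n
coverage <- function(n, p = 0.4, nrep = 200) {
  x <- matrix(rbinom(nrep * n, 1, p), ncol = nrep)
  ci <- apply(x, 2, wald_ci)  # 2 x nrep: row 1 lower, row 2 upper
  mean(ci[1, ] <= p & p <= ci[2, ])
}

ns <- 10:100
cover <- sapply(ns, coverage)

plot(ns, cover, type = "l", xlab = "n", ylab = "coverage")
abline(h = 0.95, lty = 2)  # nominal level
```

Plotting coverage against n makes it easy to see where the interval falls short of the nominal 95% level.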
I have to solve the following exercise.
(1) Create 100 Poisson distributed r.v.'s with lambda = 4
(2) Calculate the mean of the sample, generated in (1).
(3) Repeat (1) and (2) 10.000 times.
(4) create a vector, containing the 10.000 means.
(5) plot the vector in a histogram.
Is the following solution(?) right?
> as.numeric(x)
> for(i in 1:10000){
> p <- rpois(100, lambda = 4)
> m <- mean(p)
> append(x, m)
>}
> hist(x, breaks = 20)
It's a little funny. You can quickly do what you ask in more legible ways. For example:
L <- 10000
emptyvector <- rep(NA, L)
for(i in 1:L){
emptyvector[i] <- mean(rpois(100, lambda = 4))
}
hist(emptyvector)
I would have taken advantage of the replicate() function which would create a matrix of results and then run colMeans to quickly get my vector.
meanvector <- colMeans(replicate(10000, rpois(100, lambda = 4)))
hist(meanvector, main = "Mean values from 10,000 runs of \nPoisson n = 100")
hist(replicate(10000, mean(rpois(100, lambda = 4))))
You need to assign the result of append() back to x.
x1 <- x <- NULL
for(i in 1:10000){
p <- rpois(100, lambda = 4)
m <- mean(p)
x[length(x) + 1] <- m
x1 <- append(x1, m)
## either x or x1 will suffice for the histogram
}
hist(x1, breaks = 20)
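The reason the original loop leaves x empty is that append() returns a new vector and does not modify its argument in place. The short sketch below demonstrates this, and then shows a vapply() alternative that avoids growing a vector inside a loop; the seed and the name `means` are assumptions for illustration.

```r
x <- numeric(0)
append(x, 3.9)       # returns a new vector; the result is discarded here
length(x)            # still 0

x <- append(x, 3.9)  # assigning the result back keeps the value
length(x)            # now 1

# Alternative without growing a vector: vapply() returns a numeric vector
set.seed(1)  # assumed seed
means <- vapply(seq_len(10000),
                function(i) mean(rpois(100, lambda = 4)),
                numeric(1))
length(means)        # 10000
hist(means, breaks = 20)
```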
How can I avoid the for loop when calculating a weighted rolling sum over an xts object, as I am trying to do below?
library(xts)
thetaSum <- function(theta, w=c(1, 1, 1)) {
sum(coredata(theta)*rev(w))
}
n <- 10
tmpVec <- rep(1, n)
tmpDates <- seq(as.Date("2000-01-01"), length = n, by = "day")
theta <- xts(tmpVec, order.by=tmpDates)
N <- 3
thetaSummed <- xts(rep(NA, n), order.by=tmpDates)
for (i in N:n) {
thetaTemp <- theta[(i-N+1):i, ]
thetaSummed[i] <- thetaSum(thetaTemp, w=rep(1, N))
}
thetaSummed
N is a look back period smaller than n.
What are some fast alternatives to the for loop?
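You can use rollapplyr() from zoo (attached when you load xts), which applies a function over a right-aligned rolling window: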
rollapplyr(theta, width=3, FUN=thetaSum, fill=NA)
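If the rolling function is only a weighted sum, a base R convolution filter may be faster than rollapply, since it avoids calling an R function once per window. The sketch below works on a plain numeric vector standing in for coredata(theta); the weights are made up for illustration, and wrapping the result back into an xts object is left out.

```r
n <- 10
N <- 3
x <- as.numeric(1:n)  # stand-in for coredata(theta)
w <- c(3, 2, 1)       # hypothetical weights; w[1] applies to the newest value

# One-sided convolution: y[i] = w[1]*x[i] + w[2]*x[i-1] + w[3]*x[i-2]
fast <- as.numeric(stats::filter(x, filter = w, sides = 1))

# Reference loop mirroring thetaSum(): sum(window * rev(w))
slow <- rep(NA_real_, n)
for (i in N:n) {
  slow[i] <- sum(x[(i - N + 1):i] * rev(w))
}

all.equal(fast, slow)
```

The first N - 1 entries are NA in both versions, which matches the fill=NA behaviour of the rollapplyr() call. stats::filter is written with its namespace because dplyr, if loaded, masks filter().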