I have run a short simulation and want to plot the outcome of each simulation as a "running sum" over the k bets. For reference, I want to end up with a plot similar to the ones in this article:
https://www.pinnacle.com/en/betting-articles/Betting-Strategy/betting-bankroll-management/VDM2GY6UX3B552BG
The following is the code for the simulation:
## Simulating returns over k bets.
odds <- 1.5
k <- 100
return <- odds - 1
edge <- 0.04
pw <- 1/(odds/(1-edge))
pl <- 1-pw
nsims <- 10000
set.seed(42)
sims <- replicate(nsims, {
  sample(c(-1, return), k, replace = TRUE, prob = c(pl, pw))
})
rownames(sims) <- 1:k
colnames(sims) <- 1:nsims
If I wasn't being clear in the description, let me know.
Okay, so here is how you can get the plot of the cumulative value over the bets (I set nsims <- 10 so that the plot stays readable).
First I generate the data:
## Simulating returns over k bets.
odds <- 1.5
k <- 100
return <- odds - 1
edge <- 0.04
pw <- 1/(odds/(1-edge))
pl <- 1-pw
nsims <- 10
set.seed(42)
sims <- replicate(nsims, {
  sample(c(-1, return), k, replace = TRUE, prob = c(pl, pw))
})
rownames(sims) <- 1:k
colnames(sims) <- 1:nsims
Then I create a data frame containing the results of the n simulations (10 here):
df <- as.data.frame(sims)
What we want to plot is the cumulative sum, not the result of a specific bet, so we iterate over the columns (i.e. the simulations) to compute it:
library(dplyr)    # for mutate()
library(reshape2) # for melt()
library(ggplot2)

for (i in colnames(df)) {
  df[[i]] <- cumsum(df[[i]])
}
df <- mutate(df, bets = as.numeric(rownames(df))) # as.numeric keeps the x-axis in bet order
output <- melt(df, id.vars = "bets", variable.name = "simulation")
Now we can plot our data:
ggplot(output, aes(bets, value, group = simulation)) + geom_line(aes(colour = simulation))
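As an aside, the per-column loop can be replaced by one vectorized call, and base R can draw essentially the same picture without any reshaping. A minimal sketch using the sims matrix from above:
## cumulative sum of every simulation in one call
cum <- apply(sims, 2, cumsum)
## one line per simulation; the x-axis is the bet number
matplot(1:k, cum, type = "l", lty = 1, xlab = "bet", ylab = "cumulative return")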
I'd like to test the coverage probabilities for trimmed means. I am using the formula from Wilcox's book for confidence intervals:
x̄_t ± t(1 − α/2, df) · s_w / ((1 − 2γ)√n),  with df = n − 2⌊γn⌋ − 1
Here s_w is the Winsorised variance and γ is the trimming proportion, denoted alpha in my code. The problem is that the code I have written always outputs confidence intervals with 0 in them, so the coverage probability is 1. So I think there is some error in the construction.
Code:
sample_var <- function(data, alpha){
  n <- length(data)
  data <- sort(data)
  data_t <- data[(floor(n*alpha)+1):(n-floor(alpha*n))]
  m <- length(data_t)
  t_mean <- mean(data_t)
  sigma <- (1/(1-2*alpha)^2) * ((1/n)*sum((data_t-t_mean)^2) + alpha*(data_t[1]-t_mean)^2 +
                                  alpha*(data_t[m]-t_mean)^2)
  sigma
}
sample_var <- Vectorize(sample_var, vectorize.args = "alpha")
conf_int <- function(data, alpha){
  a <- floor(alpha * n)
  n <- length(data)
  df <- n - 2*a - 1
  data_t <- data[a:(n-a)]
  t_mean <- mean(data_t)
  t_quantile <- qt(p = alpha, df = df)
  sw <- sample_var(data = data, alpha = alpha)
  ul <- t_mean + t_quantile * sw / ((1-2*alpha)*sqrt(n))
  ll <- t_mean - t_quantile * sw / ((1-2*alpha)*sqrt(n))
  c(ll, ul)
}
Maybe someone sees the error?
EDIT:
Here I tried to construct the intervals using the wilcox.test function, but I don't know whether it accurately constructs the interval for the trimmed mean. Furthermore, no matter which alpha I use, I get the same interval for the given data set. So I suppose the subset argument is wrong.
set_seed(1)
data <- rnorm(100)
wilcox_test <- function(data, alpha){
  n <- length(alpha)
  a <- floor(alpha*n)+1
  b <- n-floor(alpha)
  wilcox.test(data, subset = data[a:b], conf.int = TRUE)
}
OK... with rnorm(100) and set.seed(1), this gets close-ish:
set.seed(1) # note set.seed() is what you want here, I think.
data <- rnorm(100)
# conf.int is logical; the confidence level goes in conf.level. Also note that the
# default method of wilcox.test() has no 'subset' argument (it is silently ignored),
# which would explain getting the same interval for every alpha; passing the trimmed
# vector directly, e.g. wilcox.test(sort(data)[a:b], ...), avoids that.
wilcox_test_out <- wilcox.test(data, subset = data[a:b], conf.int = TRUE, conf.level = 0.95)
wilcox_test_out
# Note the CIs are in wilcox_test_out$conf.int for further use should you need them
wilcox_test_out$conf.int
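As for the original conf_int(), here is a minimal corrected sketch based on my reading of the Tukey-McLaughlin formula (my assumptions about the bugs: n is used before it is defined, the data are trimmed without sorting, qt(alpha) gives a lower-tail quantile, and the Winsorised variance is used where its square root belongs). It reuses sample_var() from above:
conf_int2 <- function(data, gamma, conf = 0.95){
  n <- length(data)
  g <- floor(gamma * n)
  data_t <- sort(data)[(g + 1):(n - g)]  # trim the *sorted* sample
  df <- n - 2*g - 1
  t_q <- qt(1 - (1 - conf)/2, df = df)   # upper-tail t quantile
  sw <- sqrt(sample_var(data, gamma))    # Winsorised *standard deviation*
  half <- t_q * sw / ((1 - 2*gamma) * sqrt(n))
  c(mean(data_t) - half, mean(data_t) + half)
}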
I want to create a custom bootstrap function because I want to better understand what the bootstrap is doing, and it seems like the other bootstrap libraries out there do not solve my issue.
The Problem: I would like to create my own Wald confidence interval function that takes in the bootstrap data, outputs the confidence interval, tests whether the true p falls inside the interval, and computes the coverage.
Right now, I am getting this type of error:
Error in bootresults[i, ] <- waldCI(y = bootdata[i], n = numTrials) :
  number of items to replace is not a multiple of replacement length
The goal: I want bootresults to come back with 4 columns (the p value, the lower bound, the upper bound, and whether or not p falls in the interval) and to get a graph similar to this one:
Wald interval chart
Code:
set.seed(42)
samples10 <- list()
i <- 1
while(i < 100) {
  samples10[[i]] <- rbinom(1500, size=10, prob=i*.01) ## 1500 draws per probability
  i <- i + 1
}
samples10 <- data.frame(samples10)
colnames(samples10) <- c(seq(.01, .99, .01)) ## p-values; rows=1500, columns=99
waldconfidenceinterval <- function(y, n, alpha=0.05) {
  p <- colSums(y)/(n*200)
  sd <- sqrt(p*((1 - p)/(n*200)))
  z <- qnorm(c(alpha/2, 1 - alpha/2))
  ci <- p + z*sd
  return(ci)
}
B <- 200
numTrials <- 10
bootresults <- matrix(ncol=length(samples10), nrow=B) ## rows=200, cols=99
## empty matrix in the beginning
set.seed(42)
for(i in seq_len(B)) {
  bootdata <- samples10[sample(B, replace=T), ]
  bootresults[i, ] <- waldCI(y=bootdata[i], n=numTrials)
  ## Pseudocode:
  # boot_test_data$in_interval <-
  #   ifelse(boot_test_data$lower1 < i/100 & i/100 < boot_test_data$upper1, 1, 0)
  # coverage[i] <- sum(boot_test_data$in_interval) / length(boot_test_data$in_interval)
}
Any help is greatly appreciated since I am fairly new to R.
It looks like you want to initialize a three-dimensional array bootresults rather than a two-dimensional matrix. In your waldCI() you may use colMeans.
waldCI <- function(y, alpha=0.05) {
  p <- colMeans(y)                             # estimated proportion per column
  se <- sqrt(p*(1 - p)/nrow(y))                # Wald standard error
  z <- qnorm(1 - alpha/2)
  ci <- p + z*se %*% cbind(lower=-1, upper=1)  # two columns: lower and upper bound
  return(ci)
}
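For instance, applied to ten sampled rows (purely illustrative, using the Bernoulli samples10 matrix from the Data section below):
set.seed(1)
ci <- waldCI(samples10[sample(1500, 10), ])  # CIs from 10 Bernoulli draws per p
head(ci, 3)                                  # lower/upper bounds for p = .01, .02, .03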
B <- 200
numTrials <- 10
p.values <- seq(.01, .99, .01)  # true probabilities, one per column
## initialize array
bootresults1 <- array(dim=c(ncol(samples10), 4, B),
                      dimnames=list(c(), c("p.values", "lower", "upper", "in.int"), c()))
set.seed(42)
for(i in seq_len(B)) {
  samp <- samples10[sample(nrow(samples10), numTrials, replace=F), ]
  ci <- waldCI(samp)
  bootresults1[,,i] <- cbind(p.values, ci, in.int=ci[, 1] < p.values & p.values < ci[, 2])
}
coverage <- rowMeans(bootresults1[,4,])
plot(p.values, coverage, type="l", main="My Plot")
Similar approach, more R-ish, though:
p.values <- seq(.01, .99, .01)
set.seed(42)
samples10 <- `colnames<-`(sapply(p.values, function(pr) rbinom(1.5e3, 1, pr)), p.values)
BOOT <- function(numTrials, ...) {
  samp <- samples10[sample(nrow(samples10), numTrials, replace=F), ]
  ci <- waldCI(samp, ...)
  cbind(p.values, ci, in.int=ci[, 1] < p.values & p.values < ci[, 2])
}
B <- 200
numTrials <- 10
set.seed(42)
bootresults2 <- replicate(B, BOOT(numTrials=10))
stopifnot(all.equal(bootresults1, bootresults2))
Data:
Note that I used rbinom(..., size=1, ...) to create your sample data. The use of "p" as an object name suggested that the data should be Bernoulli, i.e. binomial with size 1.
set.seed(42)
samples10 <- matrix(nrow=1500, ncol=99, dimnames=list(c(), c(seq(.01, .99, .01))))
i <- 1
while (i < 100) {
  samples10[, i] <- rbinom(1500, size=1, prob=i*.01) ## rows=1500; columns=99
  i <- i + 1
}
Without a while loop, you could proceed vectorized:
p.values <- seq(.01, .99, .01)
set.seed(42)
samples10 <- `colnames<-`(sapply(p.values, function(pr) rbinom(1.5e3, 1, pr)), p.values)
I'm trying to create a simulation to calculate the confidence interval for a binomial proportion. So far I have a function that calculates the lower and upper bounds, and I have generated and stored the kind of data I want (in a matrix; I'm not sure that's the right structure).
How can I create a loop that generates samples of different sizes? I'd like to test how the formula performs when calculating the intervals with sample sizes n = 10, 11, 12, ... up to 100.
My code so far:
## functions that calculate lower and upper bounds
ll <- function(x, cl=0.95) {
  n <- length(x)
  p.est <- mean(x)
  z <- abs(qnorm((1-cl)/2))
  return(p.est - z*sqrt(p.est*(1-p.est)/n))
}
ul <- function(x, cl=0.95) {
  n <- length(x)
  p.est <- mean(x)
  z <- abs(qnorm((1-cl)/2))
  return(p.est + z*sqrt(p.est*(1-p.est)/n))
}
## my simulation for n=10 and 200 repetitions.
p <- 0.4
n <- 10
rep <- 200
dat <- rbinom(rep*n,1,p)
x <- matrix(dat, ncol=rep)
ll.res <- apply(x, 2, ll)
ul.res <- apply(x, 2, ul)
hits <- ll.res <= p & p <= ul.res
sum(hits==1)/rep
I'm not sure which values you want to compare between different sample sizes, but wrapping your simulation in a for loop and using lists to store the results should work:
p <- 0.4
nrep <- 200
ns <- 10:100
hits <- list()
ll.res <- list()
ul.res <- list()
value <- numeric(length(ns))
for(i in seq_along(ns)){
  n <- ns[i]
  dat <- rbinom(nrep*n, 1, p)
  x <- matrix(dat, ncol=nrep)
  ll.res[[i]] <- apply(x, 2, ll)
  ul.res[[i]] <- apply(x, 2, ul, cl=0.95)
  hits[[i]] <- ll.res[[i]] <= p & p <= ul.res[[i]]
  value[i] <- sum(hits[[i]]==1)/nrep
}
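To see how the coverage behaves as the sample size grows, you could then plot value against ns (a quick sketch; the dashed line just marks the nominal 0.95 level):
plot(ns, value, type="l", xlab="sample size n", ylab="empirical coverage")
abline(h=0.95, lty=2)  # nominal level for reference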
I have written a custom likelihood function that fits a multi-data model integrating mark-recapture and telemetry data (sensu Royle et al. 2013, Methods in Ecology and Evolution). The likelihood function is designed to be flexible in terms of whether and how many covariates are specified for the different linear models in the different likelihood components, which is determined by values supplied as function arguments (i.e., the data matrices "detcovs" and "dencovs" in my code). The likelihood function works when I supply it directly to optimization functions (e.g., optim or nlm), but it does not play nice with the mle2 function in the bbmle package. My problem is that I continually run into the following error: "some named arguments in 'start' are not arguments to the specified log-likelihood function". This is my first attempt at writing custom likelihood functions, so I'm sure there are general coding conventions I'm unaware of that would make such tasks much more efficient and amenable to the mle2 function. Below are my likelihood function, the code creating the starting-value objects, and the code calling the mle2 function. Any advice on how to solve the error, and general comments on writing cleaner functions, are welcome. Many thanks in advance.
Edit: As requested, I have simplified the likelihood function and provided code to simulate reproducible data to which the model can be fit. Included in the simulation code are 2 custom functions and use of the raster function from the raster package. Hopefully, I have sufficiently simplified everything to enable others to troubleshoot. Again, many thanks for your help!
Jared
Likelihood function:
CSCR.RSF.intlik2.EXAMPLE <- function(alpha0, sigma, alphas=NULL, betas=NULL, n0,
                                     yscr=NULL, K=NULL, X=X, trapcovs=NULL,
                                     Gden=NULL, Gdet=NULL, ytel=NULL, stel=NULL,
                                     dencovs=NULL, detcovs=NULL){
  #
  # this version of the code handles a covariate on log(Density). This is starting value 5
  #
  # start = vector of starting values
  # yscr = nind x ntraps encounter matrix
  # K = number of occasions
  # X = trap locations
  # Gden = matrix with grid cell coordinates for density raster
  # Gdet = matrix with grid cell coordinates for RSF raster
  # dencovs = all covariate values for all nGden pixels in density raster
  # trapcovs = covariate value at trap locations
  # detcovs = all covariate values for all nGrsf pixels in RSF raster
  # ytel = nguys x nGdet matrix of telemetry fixes in each of the nGdet pixels
  # stel = home range center of telemetered individuals, IF you wish to estimate it. Not necessary
  # alphas = starting values for RSF/detfn coefficients excluding sigma and intercept
  # alpha0 = starting value for RSF/detfn intercept
  # sigma = starting value for RSF/detfn sigma
  # betas = starting values for density function coefficients
  # n0 = starting value for number of undetected individuals on log scale
  #
  n0 = exp(n0)
  nGden = nrow(Gden)
  D = e2dist(X, Gden)
  nGdet <- nrow(Gdet)
  alphas = alphas
  loglam = alpha0 - (1/(2*sigma*sigma))*D*D + as.vector(trapcovs%*%alphas) # ztrap recycled over nG
  psi = exp(as.vector(dencovs%*%betas))
  psi = psi/sum(psi)
  probcap = 1-exp(-exp(loglam))
  #probcap = (exp(theta0)/(1+exp(theta0)))*exp(-theta1*D*D)
  Pm = matrix(NA, nrow=nrow(probcap), ncol=ncol(probcap))
  ymat = yscr
  ymat = rbind(yscr, rep(0, ncol(yscr)))
  lik.marg = rep(NA, nrow(ymat))
  for(i in 1:nrow(ymat)){
    Pm[1:length(Pm)] = (dbinom(rep(ymat[i,], nGden), rep(K, nGden), probcap[1:length(Pm)], log=TRUE))
    lik.cond = exp(colSums(Pm))
    lik.marg[i] = sum(lik.cond*psi)
  }
  nv = c(rep(1, length(lik.marg)-1), n0)
  part1 = lgamma(nrow(yscr)+n0+1) - lgamma(n0+1)
  part2 = sum(nv*log(lik.marg))
  out = -1*(part1 + part2)
  # NB: a0 and D2 below are not defined in this simplified example
  lam = t(exp(a0 - (1/(2*sigma*sigma))*t(D2) + as.vector(detcovs%*%alphas))) # recycle zall over all ytel guys
  # lam is now nGdet x nG!
  denom = rowSums(lam)
  probs = lam/denom # each column is the probs for a guy at column [j]
  tel.loglik = -1*sum(ytel*log(probs))
  out = out + tel.loglik
  out
}
Data simulation code:
library(raster)
library(bbmle)
e2dist <- function (x, y){
  i <- sort(rep(1:nrow(y), nrow(x)))
  dvec <- sqrt((x[, 1] - y[i, 1])^2 + (x[, 2] - y[i, 2])^2)
  matrix(dvec, nrow = nrow(x), ncol = nrow(y), byrow = F)
}
spcov <- function(R) {
  v <- sqrt(nrow(R))
  D <- as.matrix(dist(R))
  V <- exp(-D/2)
  cov1 <- t(chol(V)) %*% rnorm(nrow(R))
  Rd <- as.data.frame(R)
  colnames(Rd) <- c("x", "y")
  Rd$C <- as.numeric((cov1 - mean(cov1)) / sd(cov1))
  return(Rd)
}
set.seed(1234)
co <- seq(0.3, 0.7, length=5)
X <- cbind(rep(co, each=5),
           rep(co, times=5))
B <- 10
co <- seq(0, 1, length=B)
Z <- cbind(rep(co, each=B), rep(co, times=B))
dencovs <- cbind(spcov(Z),spcov(Z)[,3]) # ordered as reading raster image from left to right, bottom to top
dimnames(dencovs)[[2]][3:4] <- c("dencov1","dencov2")
denr.list <- vector("list", 2)
for(i in 1:2){
  denr.list[[i]] <- raster(
    list(x=seq(0,1,length=10),
         y=seq(0,1,length=10),
         z=t(matrix(dencovs[,i+2],10,10,byrow=TRUE)))
  )
}
B <- 20
co <- seq(0, 1, length=B)
Z <- cbind(rep(co, each=B), rep(co, times=B))
detcovs <- cbind(spcov(Z),spcov(Z)[,3]) # ordered as reading raster image from left to right, bottom to top
dimnames(detcovs)[[2]][3:4] <- c("detcov1","detcov2")
detr.list <- vector("list", 2)
trapcovs <- matrix(0, nrow(X), 2)
for(i in 1:2){
  detr.list[[i]] <- raster(
    list(x=seq(0,1,length=20),
         y=seq(0,1,length=20),
         z=t(matrix(detcovs[,i+2],20,20,byrow=TRUE)))
  )
  trapcovs[,i] <- extract(detr.list[[i]], X)
}
alpha0 <- -3
sigma <- 0.15
alphas <- c(1,-1)
beta0 <- 3
betas <- c(-1,1)
pixelArea <- (dencovs$y[2] - dencovs$y[1])^2
mu <- exp(beta0 + as.matrix(dencovs[,3:4])%*%betas)*pixelArea
EN <- sum(mu)
N <- rpois(1, EN)
pi <- mu/sum(mu)
s <- dencovs[sample(1:nrow(dencovs), size=N, replace=TRUE, prob=pi),1:2]
J <- nrow(X)
K <- 10
yc <- d <- p <- matrix(NA, N, J)
D <- e2dist(s,X)
loglam <- t(alpha0 - t((1/(2*sigma*sigma))*D*D) + as.vector(trapcovs%*%alphas))
p <- 1-exp(-exp(loglam))
for(i in 1:N) {
  for(j in 1:J) {
    yc[i,j] <- rbinom(1, K, p[i,j])
  }
}
detected <- apply(yc>0, 1, any)
yscr <- yc[detected,]
ntel <- 5
nfixes <- 100
poss.tel <- which(s[,1]>0.2 & s[,1]<0.8 & s[,2]>0.2 & s[,2]<0.8)
stel.id <- sample(poss.tel,ntel)
stel <- s[stel.id,]
ytel <- matrix(NA,ntel,nrow(detcovs))
d <- e2dist(stel,detcovs[,1:2])
lam <- t(exp(1 - t((1/(2*sigma*sigma))*d*d) + as.vector(as.matrix(detcovs[,3:4])%*%alphas)))
for(i in 1:ntel){
  ytel[i,] <- rmultinom(1, nfixes, lam[i,]/sum(lam[i,]))
}
Specify starting values and call mle2 function:
start1 <- list(alpha0=alpha0, sigma=sigma, alphas=alphas, betas=betas, n0=log(N-nrow(yscr)))
parnames(CSCR.RSF.intlik2.EXAMPLE) <- names(start1)
out1 <- mle2(CSCR.RSF.intlik2.EXAMPLE, start=start1, method="SANN", optimizer="optim",
             data=list(yscr=yscr, K=K, X=X, trapcovs=trapcovs, Gden=dencovs[,1:2], Gdet=detcovs[,1:2],
                       ytel=ytel, stel=stel, dencovs=as.matrix(dencovs[,3:4]), detcovs=as.matrix(detcovs[,3:4]))
)
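One thing worth checking, as a hedged suggestion rather than a tested fix: start1 contains the vector-valued parameters alphas and betas, and mle2 expands such elements into numbered names (alphas1, alphas2, ...) that no longer match the function's formal arguments, which is consistent with the 'start' error. bbmle's documented route for vector parameters is a likelihood taking a single parameter vector, plus parnames() and vecpar=TRUE. A sketch (the wrapper nll_vec and flattened start2 are mine; the data objects are picked up from the workspace):
nll_vec <- function(pars) {
  CSCR.RSF.intlik2.EXAMPLE(alpha0 = pars["alpha0"], sigma = pars["sigma"],
                           alphas = pars[c("alphas1", "alphas2")],
                           betas  = pars[c("betas1", "betas2")],
                           n0     = pars["n0"],
                           yscr = yscr, K = K, X = X, trapcovs = trapcovs,
                           Gden = dencovs[, 1:2], Gdet = detcovs[, 1:2],
                           ytel = ytel, stel = stel,
                           dencovs = as.matrix(dencovs[, 3:4]),
                           detcovs = as.matrix(detcovs[, 3:4]))
}
start2 <- c(alpha0 = alpha0, sigma = sigma,
            alphas1 = alphas[1], alphas2 = alphas[2],
            betas1 = betas[1], betas2 = betas[2],
            n0 = log(N - nrow(yscr)))
parnames(nll_vec) <- names(start2)
out2 <- mle2(nll_vec, start = start2, vecpar = TRUE, method = "SANN", optimizer = "optim")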
x <- c(1:100)
y <- c(89:300)
s1 <- sample(x, 30)
s2 <- sample(y, 30)
mytest <- t.test(s1, s2)
mytest$conf.int
I would like to run this 1000 times and create a matrix with the 1000 intervals obtained. I have tried some loops, but every time I get the same 1000 intervals. However, each run should give a different interval, since I sample anew before performing the t.test.
You can do this with replicate:
x <- c(1:100)
y <- c(89:300)
myCI = function(x,y) {
  s1 <- sample(x, 30)
  s2 <- sample(y, 30)
  mytest <- t.test(s1, s2)
  mytest$conf.int
}
CIs = t(replicate(1000, myCI(x,y)))
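CIs is then a 1000 x 2 matrix with one interval per row. For example (illustrative only):
colnames(CIs) <- c("lower", "upper")
head(CIs, 3)   # first three intervals
colMeans(CIs)  # average bounds over the 1000 replicates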