I need to simulate the AR(2) process Y[t] = 1/20 + (sqrt(3)/2) Y[t-1] - (1/4) Y[t-2] + e[t],
where e[t] ~ (0, 0.02^2).
The simulation has to cover 30 years, with the model measured in quarters (120 observations).
I've tried with x <- arima.sim(model = list(order = c(2, 0, 0), ar = c(a1, a2)), n = 120, n.start = 100, sd = 0.02)
Here a1 and a2 are phi 1 and phi 2 from the model. With this call, R says the model isn't stationary.
I also can't figure out how to add phi 0, or how to set the required initial values y0 = 0.1 and y-1 = 0.12.
I've also tried the following:
set.seed(9029) # set a seed to fix the simulated numbers
nsim = 1 # no. of simulations
burn = 100 # burn-in periods
n = 220 # sample length + burn-in periods --> sample length = 4quarters*30yrs
tp=(burn+1):n # time points to be sampled
sigerr = 0.02 # error s.d.
a1 = (sqrt(3)/2) # AR(2) coefficient
a2 = 0.25 # AR(2) coefficient = 1/4
a0 = 1/20 # Phi 0
# create data series and error series
y = array(0,c(n,nsim)) # data series
err = array(rnorm(n*nsim,0,sigerr),c(n,nsim)) # iid errors
# simulate y from an AR(2) process
for (k in 1:nsim) {
for (i in 2:n) {
y[i,k] = a0 + a1*y[i-1,k] + a2*y[i-2,k] + err[i,k]
}
}
But I keep getting the error "replacement has length zero", and I still can't find out how to set y0 and y-1 to 0.1 and 0.12 respectively. Please help, I can't seem to find a fix. Thanks.
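One way to handle both the intercept and the fixed starting values is to write the recursion by hand rather than use arima.sim. The sketch below makes a few assumptions: the errors are normal (the question only gives mean 0 and s.d. 0.02), phi2 is -1/4 as in the model equation (the loop above uses +0.25, which is what makes the process non-stationary), and no burn-in is used because y0 and y-1 are given. Starting the loop at the first index for which both lags exist also avoids the y[0, k] access that produces "replacement has length zero". The names phi0, phi1, phi2 and y_sim are illustrative.

```r
set.seed(9029)

n     <- 120          # 30 years of quarterly observations
phi0  <- 1 / 20       # intercept
phi1  <- sqrt(3) / 2  # coefficient on Y[t-1]
phi2  <- -1 / 4       # coefficient on Y[t-2], negative as in the model
sigma <- 0.02         # error s.d.

# Stationarity check: every root of 1 - phi1*z - phi2*z^2 must lie
# outside the unit circle.
roots <- polyroot(c(1, -phi1, -phi2))
stopifnot(all(Mod(roots) > 1))

# R vectors are 1-indexed, so store y[-1], y[0], y[1], ..., y[n]
# with an offset of 2: y[1] holds y[-1] and y[2] holds y[0].
y <- numeric(n + 2)
y[1] <- 0.12  # y[-1]
y[2] <- 0.10  # y[0]

e <- rnorm(n, mean = 0, sd = sigma)  # assumed normal errors
for (i in seq_len(n)) {
  y[i + 2] <- phi0 + phi1 * y[i + 1] + phi2 * y[i] + e[i]
}

# Drop the two initial values and label the result as quarterly data.
y_sim <- ts(y[-(1:2)], start = c(1, 1), frequency = 4)
```

To run several simulations, the same loop could be wrapped in a function and called with replicate().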
I need to run a linear mixed model simulation to estimate power for varying sample sizes.
My model is:
Ratings = y
Fixed effect, x = Ring
Random effect = participants
The code I tried is below. It only returns 'Based on 100 simulations, (0 warnings, 100 errors)
alpha = 0.05, nrow = 2000' ....
Thank you!!
#create a dataframe
library(lmerTest)
library(simr)
library(tidyverse)
Ring = c('Ring', 'NoRing')
Ring = rep(Ring, times = 1000)
attractiveness = floor(runif(10, min=1, max=11)) #this creates random numbers
#from 1 to 10 (11 is not included).
participants<-rep(factor(1:100),each=20)
targetID = rep(c(1,2,3,4,5,6,7,8,9,10), each= 2)
targetImage= rep(targetID, times= 100)
Ratings = rep(attractiveness, times = 200)
data<-data.frame(participants, Ring, targetImage, Ratings)
#parameters for the model:
## Intercept and slopes for ring
fixed <- c(3, 0.5)
## Random intercepts for participants
rand <- 0.5
## residual variance
res <- 2
model <- makeLmer(Ratings ~ Ring + (1|participants), fixef=fixed,
VarCorr=rand, sigma=res, data=data)
sim_treat <- powerSim(model, nsim=100, test = fcompare(Ratings~Ring))
sim_treat
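As a rough cross-check that does not depend on simr, power for this design can also be estimated by hand. The sketch below is a swapped-in technique, not the simr workflow: because every participant rates the same number of Ring and NoRing images, it simulates each participant's mean rating per condition and applies a paired t-test to those means. It assumes that 0.5 is the random-intercept variance and 2 is the residual standard deviation, which is only one reading of the parameters above; power_paired and its arguments are hypothetical names.

```r
# Hypothetical helper: simulated power for one number of participants.
power_paired <- function(n_participants, n_per_cond = 10, beta = 0.5,
                         sd_participant = sqrt(0.5), sd_resid = 2,
                         nsim = 500, alpha = 0.05) {
  rejections <- replicate(nsim, {
    # Random intercept for each participant.
    u <- rnorm(n_participants, 0, sd_participant)
    # Mean of n_per_cond ratings per condition: the residual s.d. of a mean
    # is sd_resid / sqrt(n_per_cond). Intercept 3, Ring effect beta.
    m_noring <- 3 + u + rnorm(n_participants, 0, sd_resid / sqrt(n_per_cond))
    m_ring   <- 3 + beta + u + rnorm(n_participants, 0, sd_resid / sqrt(n_per_cond))
    t.test(m_ring, m_noring, paired = TRUE)$p.value < alpha
  })
  mean(rejections)  # proportion of simulations that reject H0
}

set.seed(42)
sizes <- c(20, 50, 100)
setNames(sapply(sizes, power_paired), sizes)
```

If the simr results differ a lot from numbers like these, that would point to a problem in how the model or the test is specified rather than in the design itself.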
I'm trying to run GMM estimation in R with the gmm package.
I find that the estimates are very sensitive to the option t0, which is the starting value.
However, as far as I know, Stata and SAS do not ask for a starting value when estimating GMM. If the estimates are sensitive to the starting value, how should I set t0 to get the same results as the commercial programs?
Code and data included
To be clear, I include my code and data. I'm trying to estimate lambda. My problem is that the estimate of lambda varies too much across starting values (for example, set the starting value of lambda to 0 and compare the results). Hence, I'm not sure which estimate I should rely on.
instrument.csv
test_assets.csv
library(tidyverse)
library(gmm)
instrument <- read_csv("instrument.csv")
test_assets <- read_csv("test_assets.csv")
# Define Moment Conditions
g <- function(params, y){
# Number of instrument variables: 6
l <- dim(instrument)[2]
# Moment condition 1: 6 x 1
u_jt <- y - instrument %*% params[1:l]
mom_cond1 <- c(u_jt) * instrument
# Moment condition 2: 6 x 1
u_mt <- test_assets$vwretd - instrument %*% params[(l+1):(l+l)]
mom_cond2 <- c(u_mt) * instrument
# Moment condition 3: 6 x 1
e_jt <- y - params[l+l+1] * u_jt * u_mt
mom_cond3 <- c(e_jt) * instrument
# Moment condition: 18 x 1
mom_cond <- list(mom_cond1, mom_cond2, mom_cond3) %>%
reduce(cbind)
return(mom_cond)
}
# Define starting values.
t0 <- c(gamma1 = -0.01, gamma2 = 0.1, gamma3 = 0.05,
gamma4 = 5, gamma5 = 20, gamma6 = 5,
gamma7 = -0.01, gamma8 = 0.01, gamma9 = 0.01,
gamma10 = 8, gamma11 = 13, gamma12 = 3,
lambda = 10
)
y <- test_assets$decret1
gmm_results <- gmm(g, y, t0)
summary(gmm_results)
My task is to simulate a compound Poisson process defined as:
$X(t) = \sum_{i=1}^{N(t)} Y_i$
where $N(t)$ is a Poisson process and the Y_i are Gamma(shape, scale) distributed. This is my R code:
# parameter for Poisson distribution.
lambda = 1
# parameters for Gamma distribution.
shape = 7.5
scale = 1
comp.pois = function(t.max, lambda) {
stopifnot(t.max >= 0 && t.max %% 1 == 0)
# offset ns by 1 because first y is 0.
# generate N(t), that is number of arrivals until time t.
ns = cumsum(rpois(n = t.max, lambda = lambda)) + 1
# generate gamma distributed random variables Y_i.
ys = c(0, rgamma(n = max(ns), shape = shape, scale = scale))
# generate all X(t) for t <= t.max.
return(c(0, cumsum(x = ys[ns])))
}
Compute a random sample of X(10) and compare means and variances.
# sample size.
size = 1000
t = 10
# ts is a vector of sample values for X(10).
ts = sapply(1:size, function(i) comp.pois(t, lambda)[t])
# sample mean and variance:
(mean.s = mean(ts))
(var.s = var(ts))
# theoretical mean and variance:
(mean.t = lambda * t * shape * scale)
(var.t = (shape + 1) * shape * scale^2)
output:
> # sample:
> (mean.s = mean(ts))
[1] 63.38403
> (var.s = var(ts))
[1] 184.3264
> # theoretical:
> (mean.t = lambda * t * shape * scale)
[1] 75
> (var.t = (shape + 1) * shape * scale^2)
[1] 63.75
This variance is gigantic, but I cannot spot my mistake. Please help. Thank you.
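One thing worth checking in comp.pois is the indexing: cumsum(ys[ns]) adds up the single jumps found at positions N(1), N(2), ..., whereas X(t) should be the running total of all jumps up to the N(t)-th one. In addition, because the returned vector starts with X(0), X(t) sits at position t + 1 rather than t. The sketch below is one possible rewrite under those two observations; comp_pois_path is a hypothetical name and shape and scale are passed as arguments instead of being read from the global environment.

```r
comp_pois_path <- function(t_max, lambda, shape, scale) {
  stopifnot(t_max >= 0, t_max %% 1 == 0)
  # N(1), ..., N(t_max): cumulative arrival counts at integer times.
  ns <- cumsum(rpois(t_max, lambda))
  # One gamma jump per arrival.
  jumps <- rgamma(max(ns, 0), shape = shape, scale = scale)
  # partial[k + 1] = Y_1 + ... + Y_k, with partial[1] = 0 for no arrivals.
  partial <- c(0, cumsum(jumps))
  # X(0) = 0, followed by X(1), ..., X(t_max).
  c(0, partial[ns + 1])
}

set.seed(1)
# X(10) is element 11 because element 1 is X(0).
x10 <- replicate(1000, comp_pois_path(10, lambda = 1, shape = 7.5, scale = 1)[11])
mean(x10)  # compare with lambda * t * shape * scale = 75
```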
EDIT:
I used the following algorithm to generate N(t). I don't know why it is supposed to be better; I took it from Rizzo, Maria L., Statistical Computing with R, CRC Press, 2007. The mean is good, but the variance is even worse. I tried sampling from the Gamma distribution only once for the entire simulation (although I'm pretty sure this does not reflect the problem very well), and the mean was off by around 10-40 for t = 10. When resampling for every X(t), which is what the following code does, the mean is very accurate. As pointed out, the variance is horrifying. This is probably not a good solution, but I suppose it is as good as it gets.
lambda = 3
shape = 6
scale = 2
size = 10000
eps = 1e-8
t = 10
# with probability 1-eps, n or less gamma distributed random variables are needed.
n = qpois(1-eps, lambda = lambda * t)
# sample from the gamma distribution. Not sure if it's ok to use the same sample every time.
# with this, the mean is off by about 10%.
# ys = c(rgamma(n = n, shape = shape, scale = scale))
# the interarrival times are exponentially distributed with rate lambda.
pp.exp = function (t0) {
# not sure how many Tn are needed :/
Tn = rexp(1000, lambda)
Sn = cumsum(Tn)
return(min(which(Sn > t0)) - 1)
}
# generate N(t) which follow the poisson process.
ns = sapply(1:size, function (i) pp.exp(t))
# generate X(t) as in the problem description.
xs = sapply(ns, function (n) {
ys = c(rgamma(n = n, shape = shape, scale = scale))
sum(ys[1:n])
})
output (t=10) in this case:
> # compare mean and variance of 'size' samples of X(t) for verification.
> # sample:
> (mean.s = mean(xs))
[1] 359.864
> (var.s = var(xs))
[1] 4933.277
> # theoretical:
> (mean.t = lambda * t * shape * scale)
[1] 360
> (var.t = (shape + 1) * shape * scale^2)
[1] 168
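The sample variance may be less alarming than it looks. For a compound Poisson process, Var X(t) = lambda * t * E[Y^2], and for a Gamma jump E[Y^2] = shape * (shape + 1) * scale^2. The theoretical value computed above leaves out the factor lambda * t; with it, the variance for lambda = 3, t = 10, shape = 6, scale = 2 is 30 * 168 = 5040, which is close to the sample value 4933.277. The following sketch checks this by drawing N(t) directly from a Poisson(lambda * t) distribution instead of summing interarrival times; the variable names are illustrative.

```r
set.seed(1)
lambda <- 3
shape  <- 6
scale  <- 2
t_end  <- 10
size   <- 10000

# For a homogeneous Poisson process, N(t) ~ Poisson(lambda * t).
ns <- rpois(size, lambda * t_end)

# X(t) is the sum of N(t) independent gamma jumps (0 when N(t) = 0).
xs <- vapply(ns, function(n) sum(rgamma(n, shape = shape, scale = scale)),
             numeric(1))

# Theoretical moments of the compound Poisson process.
mean_t <- lambda * t_end * shape * scale                    # lambda * t * E[Y]
var_t  <- lambda * t_end * shape * (shape + 1) * scale^2    # lambda * t * E[Y^2]

c(sample_mean = mean(xs), theory_mean = mean_t)
c(sample_var = var(xs), theory_var = var_t)
```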
Here is the problem: Five observations on Y are to be taken when X = 4, 8, 12, 16, 20, respectively. The true regression function is E(Y) = 20 + 4X, and the e_i are independent N(0, 25).
Generate five normal random numbers, with mean 0 and variance 25. Consider these random numbers as the error terms for the five Y observations at X = 4, 8, 12, 16, 20 and calculate Y1, Y2, Y3, Y4, and Y5. Obtain the least squares estimates b0 and b1 when fitting a straight line to the five cases. Also calculate Yh when Xh = 10 and obtain a 95 percent confidence interval for E(Yh) when Xh = 10. I did part 1, but I need help repeating it 200 times.
Repeat part (1) 200 times, generating new random numbers each time.
Make a frequency distribution of the 200 estimates b1. Calculate the mean and standard deviation of the 200 estimates b1. Are the results consistent with theoretical expectations?
What proportion of the 200 confidence intervals for E(Yh) when Xh = 10 include E(Yh)? Is this result consistent with theoretical expectations?
Here's my code so far. I am stumped on how to repeat part 1 200 times:
X <- matrix(c(4, 8, 12, 16, 20), nrow = 5, ncol = 1)
e <- matrix(c(rnorm(5,0,sqrt(5))), nrow = 5, ncol = 1)
Y <- 20 + 4 * X + e
mydata <- data.frame(cbind(Y=Y, X=X, e=e))
names(mydata) <- c("Y","X","e")
reg<-lm(Y ~ X, data = mydata)
predict(reg, newdata = data.frame(X=10), interval="confidence")
There is a mistake in your code. You want independent N(0, 25) errors, but you passed sqrt(5) as the standard deviation to rnorm(). It should be 5.
We first wrap your code in a function. This function takes no input, runs the experiment once, and returns the regression coefficients b0 and b1 together with the prediction fit, lwr, upr in a named vector.
sim <- function () {
x <- c(4, 8, 12, 16, 20)
y <- 20 + 4 * x + rnorm(5,0,5)
fit <- lm(y ~ x)
pred <- predict(fit, data.frame(x = 10), interval = "confidence")
pred <- setNames(c(pred), dimnames(pred)[[2]])
## return simulation result
c(coef(fit), pred)
}
For example, let's try
set.seed(2016)
sim()
#(Intercept) x fit lwr upr
# 24.222348 3.442742 58.649773 47.522309 69.777236
Now we use replicate to repeat this experiment 200 times.
set.seed(0)
z <- t(replicate(200, sim()))
head(z)
# (Intercept) x fit lwr upr
#[1,] 24.100535 3.987755 63.97808 57.61262 70.34354
#[2,] 6.417639 5.101501 57.43265 52.44263 62.42267
#[3,] 20.652355 3.797991 58.63227 52.74861 64.51593
#[4,] 20.349829 3.816426 58.51409 52.59115 64.43702
#[5,] 19.891873 4.095140 60.84327 57.49911 64.18742
#[6,] 24.586749 3.589483 60.48158 53.64574 67.31743
There will be 200 rows, one for each of the 200 simulations.
The second column contains the estimates of b1 from the 200 simulations; we compute their mean and standard deviation:
mean(z[,2])
# [1] 3.976249
sd(z[,2])
# [1] 0.4263377
We know that the true value is 4, and the mean of our estimates, 3.976, is close to it, so the results are consistent with the true value.
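The standard deviation can be compared with theory as well. For simple linear regression the sampling standard deviation of b1 is sigma / sqrt(sum((x - mean(x))^2)), which here is 5 / sqrt(160), roughly 0.395. The sketch below computes that value and builds a simple frequency table of simulated slopes; it re-simulates the slopes so that it runs on its own, and the names se_b1 and b1 are illustrative.

```r
x     <- c(4, 8, 12, 16, 20)
sigma <- 5                      # error s.d., since the error variance is 25

# Theoretical standard deviation of the slope estimate.
sxx   <- sum((x - mean(x))^2)   # 160 for these x values
se_b1 <- sigma / sqrt(sxx)
se_b1

# Re-simulate 200 slope estimates and tabulate them in 10 bins.
set.seed(0)
b1 <- replicate(200, {
  y <- 20 + 4 * x + rnorm(5, 0, sigma)
  coef(lm(y ~ x))[["x"]]
})
table(cut(b1, breaks = 10))     # frequency distribution of b1
c(mean = mean(b1), sd = sd(b1), theory_sd = se_b1)
```

A histogram via hist(b1) would show the same distribution graphically.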
Finally, let's check the 95% confidence interval for the mean response at X = 10. The true value is 20 + 4 * 10 = 60, so the proportion of confidence intervals that cover this true value is
mean(z[, "lwr"] < 60 & z[, "upr"] > 60)
## 0.95
which is exactly 0.95.
I am trying to estimate the sample size needed for A/B testing of a website conversion rate. pwr.chisq.test always gives me an error message when the conversion rates are small:
# conversion rate for two groups
p1 = 0.001
p2 = 0.0011
# degree of freedom
df = 1
# effect size
w = ES.w1(p1,p2)
pwr.chisq.test(w,
df = 1,
power=0.8,
sig.level=0.05)
Error in uniroot(function(N) eval(p.body) - power, c(1 + 1e-10, 1e+05)) :
f() values at end points not of opposite sign
However, with larger values for p1 and p2, this code works fine.
# conversion rate for two groups
p1 = 0.01
p2 = 0.011
# degree of freedom
df = 1
# effect size
w = ES.w1(p1,p2)
pwr.chisq.test(w,
df = 1,
power=0.8,
sig.level=0.05)
Chi squared power calculation
w = 0.01
N = 78488.61
df = 1
sig.level = 0.05
power = 0.8
NOTE: N is the number of observations
I think there is a "numerical" explanation for that. If you take a look at the function's code, you can see that the number of observations is computed by uniroot and is assumed to lie in an interval whose boundaries are set to 1 + 1e-10 and 1e5. The error message says that this interval does not contain the root: in your case, the upper limit is too small.
Knowing that, we can simply take a wider interval:
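w <- 0.00316227766016838  # effect size for p1 = 0.001, p2 = 0.0011
k <- qchisq(0.05, df = 1, lower.tail = FALSE)  # critical value at sig.level = 0.05
p.body <- quote(pchisq(k, df = 1, ncp = N * w^2, lower.tail = FALSE))  # power as a function of N
N <- uniroot(function(N) eval(p.body) - 0.8, c(1 + 1e-10, 1e+7))$root  # wider search interval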
The "solution" is N = 784886.1, which is a huge number of observations.
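The same idea can be wrapped in a small function so the upper search limit becomes an argument. This is a minimal sketch in base R, not part of the pwr package: chisq_sample_size and its upper argument are hypothetical names, and the effect size is computed by hand as sqrt((p2 - p1)^2 / p1) on the assumption that this matches what ES.w1 returned above.

```r
# Hypothetical helper: solve for N such that the chi-squared test
# with effect size w reaches the requested power.
chisq_sample_size <- function(w, df = 1, power = 0.8, sig.level = 0.05,
                              upper = 1e7) {
  # Critical value of the central chi-squared distribution.
  k <- qchisq(sig.level, df = df, lower.tail = FALSE)
  # Power at sample size N minus the target power; the root is the answer.
  power_gap <- function(N) {
    pchisq(k, df = df, ncp = N * w^2, lower.tail = FALSE) - power
  }
  uniroot(power_gap, interval = c(1 + 1e-10, upper), tol = 1e-8)$root
}

# Effect size for p1 = 0.001, p2 = 0.0011 (assumed to match ES.w1).
w_small <- sqrt((0.0011 - 0.001)^2 / 0.001)
chisq_sample_size(w_small)

# The larger-rate case from the question, for comparison with pwr's output.
chisq_sample_size(0.01)
```

If uniroot still reports end points of the same sign, the required N lies above upper and the limit can be raised further.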