I need help: I am doing a t-test using the Monte Carlo method, but R throws
Error in t.test(x, alternative = "greater", mu = 500) : unused
arguments (alternative = "greater", mu = 500)
Below is my code:
# Compute an empirical probability of Type I error using the Monte Carlo method
# m is the number of replicates
n <- 20
alpha <- .05
mu0 <- 500
sigma <- 100
m <- 10000
p <- numeric(m)
for (j in 1:m) {
  x <- rnorm(n, mu0, sigma)
  ttest <- t.test(x, alternative = "greater", mu = 500)
  p[j] <- ttest$p.value
}
p.hat <- mean(p < alpha)
se.hat <- sqrt(p.hat * (1 - p.hat) / m)
print(c(p.hat, se.hat))
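For what it's worth, this "unused arguments" error usually means the call is not reaching stats::t.test at all, e.g. because another object named t.test was created earlier in the session or is exported by a loaded package. A quick check (my suggestion, not from the original post):
# For the base version this prints <environment: namespace:stats>;
# anything else means stats::t.test is being masked.
environment(t.test)
# Calling the stats version explicitly sidesteps any masking:
ttest <- stats::t.test(x, alternative = "greater", mu = mu0)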
Related
I need tips on how to calculate the power function of the Chi-Square goodness-of-fit test using Monte Carlo simulations in R. I am familiar with the pwr.chisq.test function from the pwr package, but I need a way to write the Monte Carlo simulation in R myself.
I can do it for the t-test as follows:
n <- 100
mean_true <- 17
sd_true <- 2
## Null hypothesis (H0: mean_true = mean_0):
mean_0 <- seq(16, 18, len = 15)
alpha <- 0.05
B <- 1000
Empirical_Power <- rep(NA, length(mean_0))
for (j in 1:length(Empirical_Power)) {
  Test_Decisions <- rep(NA, B)
  for (i in 1:B) {
    dat_X <- rnorm(n = n, mean = mean_true, sd = sd_true)
    t.Test_result <- t.test(x = dat_X, alternative = "two.sided", mu = mean_0[j])
    Test_Decisions[i] <- t.Test_result$p.value < alpha
  }
  Number_of_Rejections <- sum(Test_Decisions)
  Empirical_Power[j] <- Number_of_Rejections / B
}
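As a quick check (my addition, not in the original post), the estimated power curve can be plotted; it should dip to roughly alpha where mean_0 equals mean_true = 17:
# Power is lowest (about alpha) where H0 is actually true.
plot(mean_0, Empirical_Power, type = "b",
     xlab = "Hypothesized mean mean_0", ylab = "Estimated power")
abline(h = alpha, lty = 2)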
I need a similar approach for the chi-square test, and it doesn't seem to work. This is how far I got, but it is clearly wrong because the results make no sense:
n <- 100
Frequency_true <- c(50, 60, 40, 47, 53)
sd_true <- 2
Frequency_0 <- c(0.2, 0.2, 0.2, 0.2, 0.2)
alpha <- 0.05
B <- 1000
Empirical_Power <- rep(NA, length(Frequency_0))
for (j in 1:length(Empirical_Power)) {
  Test_Decisions <- rep(NA, B)
  for (i in 1:B) {
    dat_X <- rchisq(100000, df = 99)
    Chisq_result <- chisq.test(x = Frequency_true, p = Frequency_0)
    Test_Decisions[i] <- Chisq_result$p.value < alpha
  }
  Number_of_Rejections <- length(Test_Decisions[Test_Decisions == TRUE])
  Empirical_Power[j] <- Number_of_Rejections / B
}
The experiment is to draw 250 balls from an urn with 5 different types of balls, where every type is equally likely to be drawn. The counts of the ball types drawn are given in the vector Frequency_true, which follows a multinomial distribution.
So, in order to obtain the simulated power of the test, simulate B draws with a fixed total count of 250 using rmultinom, run chi-squared goodness-of-fit tests, and compute the proportion of p-values below the significance level alpha.
sim_p_value <- function(B, freq, prob) {
  Sum <- sum(freq)
  # Simulate B draws under the alternative given by the observed
  # proportions freq/Sum, then test each draw against the null prob.
  x <- rmultinom(B, size = Sum, prob = freq / Sum)
  apply(x, 2, \(y) chisq.test(y, p = prob)$p.value)
}
Frequency_true <- c(50,60,40,47,53)
Frequency_0 <- c(0.2,0.2,0.2,0.2,0.2)
alpha <- 0.05
B <- 1000
set.seed(2022)
pval <- sim_p_value(B, Frequency_true, Frequency_0)
Empirical_Power <- mean(pval < alpha)
Empirical_Power
#> [1] 0.16
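To trace out a power function rather than a single number, the same helper can be evaluated over a family of alternatives. A sketch under my own assumptions (the mixing weight w, p_obs, and freq_alt are names I introduce): interpolate between the null (w = 0) and the observed proportions (w = 1) and estimate the power at each step.
# Estimated power along a path of alternatives from H0 to the data.
w <- seq(0, 1, length.out = 11)
p_obs <- Frequency_true / sum(Frequency_true)
power_curve <- sapply(w, function(wi) {
  freq_alt <- round(250 * ((1 - wi) * Frequency_0 + wi * p_obs))
  mean(sim_p_value(B, freq_alt, Frequency_0) < alpha)
})
plot(w, power_curve, type = "b",
     xlab = "Mixing weight w", ylab = "Estimated power")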
Write a simulation experiment to estimate the bias of the estimator λ̂ = 1/x̄ by sampling using x = rexp(n, rate = 5) and recording the values of 1/mean(x). You should find that the bias is λ/n−1. Here we've used λ = 5, but the result will hold for any λ.
Here is my solution (I don't get λ/n−1). Am I doing something wrong here?
set.seed(1)
lambda <- 5
x <- rexp(n = 1e5, rate = lambda)
samp.mean <- mean(x)
lam.est <- 1 / samp.mean
lam.est  ## 4.986549
bias <- abs(lambda - lam.est)
bias  ## 0.01345146
To start with, there is a mistake in your formula: the bias of the lambda estimator is not lambda/n-1 but lambda/(n-1)!
Then note that in order to carry out this experiment correctly, it is not enough to compute the estimator once.
Repeat the experiment n times, each time on a fresh sample of size nx.
lambda <- 3
nx <- 150
n <- 1e5
set.seed(1)
out <- vector("numeric", n)
for (i in 1:n) {
  out[i] <- 1 / mean(rexp(n = nx, rate = lambda))
}
lambda / (nx - 1)
mean(out)
bias <- abs(mean(out) - lambda)
As you can see, for lambda = 3 and nx = 150 the expression lambda/(nx-1) is 0.02013423, and the mean of the estimates is 3.019485.
lambda <- 5
nx <- 200
n <- 1e5
set.seed(1)
out <- vector("numeric", n)
for (i in 1:n) {
  out[i] <- 1 / mean(rexp(n = nx, rate = lambda))
}
lambda / (nx - 1)
mean(out)
bias <- abs(mean(out) - lambda)
Likewise, for lambda = 5 and nx = 200, the expression lambda/(nx-1) is 0.02512563, and the mean of the estimates is 5.024315.
Perform this experiment for other values of lambda and nx and you will find that the bias of this estimator is lambda/(nx-1).
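For reference, the exact value can be derived (my addition): the sum S = X_1 + ... + X_nx of nx independent Exp(lambda) variables is Gamma(nx, lambda), for which E[1/S] = lambda/(nx - 1). Hence E[1/mean(X)] = nx * lambda/(nx - 1), and the bias is nx * lambda/(nx - 1) - lambda = lambda/(nx - 1).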
I am trying to implement linear regression in R from scratch, without using any packages or libraries, on the following data:
UCI Machine Learning Repository, Bike-Sharing-Dataset
The linear regression itself was easy enough; here is the code:
data <- read.csv("Bike-Sharing-Dataset/hour.csv")
# Select the usable features
data1 <- data[, c("season", "mnth", "hr", "holiday", "weekday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed", "cnt")]
# Split the data
trainingObs <- sample(nrow(data1), 0.70 * nrow(data1), replace = FALSE)
# Create the training dataset
trainingDS <- data1[trainingObs, ]
# Create the test dataset
testDS <- data1[-trainingObs, ]
x0 <- rep(1, nrow(trainingDS))  # column of 1's for the intercept
# Create the x-matrix of explanatory variables
x1 <- trainingDS[, c("season", "mnth", "hr", "holiday", "weekday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed")]
x <- as.matrix(cbind(x0, x1))
# Create the y-vector of the dependent variable
y <- as.matrix(trainingDS$cnt)
m <- nrow(y)
# OLS coefficients via the normal equations
solve(t(x) %*% x) %*% t(x) %*% y
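As a sanity check (my addition, not in the original post), the normal-equations solution can be compared with R's built-in lm() on the same training data:
# The lm() coefficients should match the normal-equations solution above.
coef(lm(cnt ~ ., data = trainingDS))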
The next step is to implement batch gradient descent, and this is where I run into problems. The code runs without errors, but the values it produces are radically different from the results of the regression, and I am unsure why.
The two versions of batch gradient descent that I have implemented are as follows (their results differ from one another and from the results of the regression):
# Gradient descent 1
gradientDesc <- function(x, y, learn_rate, conv_threshold, n, max_iter) {
  plot(x, y, col = "blue", pch = 20)
  m <- runif(1, 0, 1)
  c <- runif(1, 0, 1)
  yhat <- m * x + c
  MSE <- sum((y - yhat) ^ 2) / n
  converged = F
  iterations = 0
  while (converged == F) {
    ## Implement the gradient descent algorithm
    m_new <- m - learn_rate * ((1 / n) * (sum((yhat - y) * x)))
    c_new <- c - learn_rate * ((1 / n) * (sum(yhat - y)))
    m <- m_new
    c <- c_new
    yhat <- m * x + c
    MSE_new <- sum((y - yhat) ^ 2) / n
    if (MSE - MSE_new <= conv_threshold) {
      abline(c, m)
      converged = T
      return(paste("Optimal intercept:", c, "Optimal slope:", m))
    }
    iterations = iterations + 1
    if (iterations > max_iter) {
      abline(c, m)
      converged = T
      return(paste("Optimal intercept:", c, "Optimal slope:", m))
    }
  }
  return(paste("MSE=", MSE))
}
AND:
grad <- function(x, y, theta) {
  # Note that for readability, theta is defined as a column vector;
  # m is not an argument here, so it resolves to the global m <- nrow(y).
  gradient <- 1 / m * t(x) %*% (x %*% theta - y)
  return(gradient)
}
grad.descent <- function(x, maxit, alpha) {
  theta <- matrix(rep(0, length = ncol(x)), ncol = 1)
  for (i in 1:maxit) {
    theta <- theta - alpha * grad(x, y, theta)
  }
  return(theta)
}
If someone could explain why these two functions produce different results, I would greatly appreciate it. I also want to make sure that I am in fact implementing gradient descent correctly.
Lastly, how can I plot the results of the descent with varying learning rates and superimpose them over the results of the regression itself?
EDIT
Here are the results of running the two algorithms with alpha = .005 and 10,000 iterations:
1)
> gradientDesc(trainingDS, y, 0.005, 0.001, 32, 10000)
[1] "Optimal intercept: 2183458.95872599 Optimal slope: 62417773.0184353"
2)
> print(grad.descent(x, 10000, .005))
[,1]
x0 8.3681113
season 19.8399837
mnth -0.3515479
hr 8.0269388
holiday -16.2429750
weekday 1.9615369
workingday 7.6063719
weathersit -12.0611266
temp 157.5315413
atemp 138.8019732
hum -162.7948299
windspeed 31.5442471
To give you an example of how to write functions like this in a slightly better way, consider the following:
gradientDesc <- function(x, y, learn_rate, conv_threshold, max_iter) {
  n <- nrow(x)
  m <- runif(ncol(x), 0, 1)  # m is a vector of dimension ncol(x) x 1
  yhat <- x %*% m  # since x already contains a constant, no need to add another one
  MSE <- sum((y - yhat) ^ 2) / n
  converged = F
  iterations = 0
  while (converged == F) {
    m <- m - learn_rate * (1 / n * t(x) %*% (yhat - y))
    yhat <- x %*% m
    MSE_new <- sum((y - yhat) ^ 2) / n
    if (abs(MSE - MSE_new) <= conv_threshold) {
      converged = T
    }
    iterations = iterations + 1
    MSE <- MSE_new
    if (iterations >= max_iter) break
  }
  return(list(converged = converged,
              num_iterations = iterations,
              MSE = MSE_new,
              coefs = m))
}
For comparison:
ols <- solve(t(x)%*%x)%*%t(x)%*%y
Now,
out <- gradientDesc(x,y, 0.005, 1e-7, 200000)
data.frame(ols, out$coefs)
ols out.coefs
x0 33.0663095 35.2995589
season 18.5603565 18.5779534
mnth -0.1441603 -0.1458521
hr 7.4374031 7.4420685
holiday -21.0608520 -21.3284449
weekday 1.5115838 1.4813259
workingday 5.9953383 5.9643950
weathersit -0.2990723 -0.4073493
temp 100.0719903 147.1157262
atemp 226.9828394 174.0260534
hum -225.7411524 -225.2686640
windspeed 12.3671942 9.5792498
Here, x refers to your x as defined in your first code chunk. Note the similarity between the coefficients. However, also note that
out$converged
[1] FALSE
so that you could increase the accuracy by increasing the number of iterations or by playing around with the step size. It might also help to scale your variables first.
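For instance, a minimal sketch of the scaling idea (my addition; it reuses x, y, and gradientDesc from above, and the fitted coefficients are then on the standardized scale):
# Standardize all predictors except the intercept column, then rerun;
# gradient descent usually converges much faster on scaled inputs.
x_scaled <- x
x_scaled[, -1] <- scale(x[, -1])
out_scaled <- gradientDesc(x_scaled, y, 0.005, 1e-7, 200000)
out_scaled$converged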
I'm trying to estimate a GARCH(1,1) model by maximum likelihood on simulated data. This is what I have:
library(fGarch)
set.seed(1)
garch11 <- garchSpec(model = list())
x <- garchSim(garch11, n = 1000)
y <- t(x)
r <- y[1, ]
### Calculate residuals and conditional variances
CalcResiduals <- function(theta, r) {
  n <- length(r)
  omega <- theta[1]
  alpha11 <- theta[2]
  beta11 <- theta[3]
  sigma.sqs <- vector(length = n)
  sigma.sqs[1] <- 0.02
  for (i in 1:(n - 1)) {
    sigma.sqs[i + 1] <- omega + alpha11 * (r[i]^2) + beta11 * sigma.sqs[i]
  }
  return(list(et = r, ht = sigma.sqs))
}
### Calculate the negative log-likelihood
GarchLogl <- function(theta, r) {
  res <- CalcResiduals(theta, r)
  sigma.sqs <- res$ht
  r <- res$et
  return(-sum(dnorm(r[-1], mean = 0, sd = sqrt(sigma.sqs[-1]), log = TRUE)))
}
fit2 <- nlm(GarchLogl,        # function to minimize
            p = rep(1, 3),    # initial values = 1 for all parameters
            hessian = FALSE,  # do not return the Hessian matrix
            r = r,            # data to be used
            iterlim = 500)    # maximum number of iterations
Unfortunately, I get the following warnings and no results:
There were 50 or more warnings (use warnings() to see the first 50)
1: In sqrt(sigma.sqs[-1]) : NaNs produced
2: In nlm(GarchLogl, p = rep(1, 3), hessian = FALSE, data <- r, ... :
NA/Inf replaced by maximum positive value
Do you have any idea what's wrong with my code? Thanks a lot!
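A likely culprit (my guess, not from the original post): nlm optimizes without constraints, so during the search it tries negative parameter values, which make some sigma.sqs negative and sqrt() return NaN; the start p = rep(1, 3) is also far outside the stationary region (omega > 0, alpha11, beta11 >= 0, alpha11 + beta11 < 1). One hedged workaround is box-constrained optimization via stats::nlminb, reusing GarchLogl and r from above; the starting values here are assumptions:
# Keep the parameters in a plausible region so the variance
# recursion stays positive throughout the optimization.
fit3 <- nlminb(start = c(0.01, 0.05, 0.90),  # omega, alpha11, beta11
               objective = GarchLogl,
               r = r,
               lower = c(1e-8, 0, 0),
               upper = c(Inf, 1, 1))
fit3$par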
I've written a function to calculate the MLE estimates of a generalised Pareto distribution (GPD). When I use it with any data, though, I get warnings like these:
1: In log(beta * ksi) : NaNs produced
2: In nlm(loglik, theta, stepmax = 5000, iterlim = 1000) :
NA/Inf replaced by maximum positive value
I was wondering if anyone could spot any mistakes in my code?
MLGPD <- function(data) {
  xi0 <- 1
  beta0 <- 360
  theta <- c(xi0, beta0)
  excess <- data
  assign("tmp", excess)
  loglik <- function(theta) {
    ksi <- theta[1]
    beta <- theta[2]
    y <- ((tmp - 0.1) / beta)
    f <- ((1 / ksi) + 1) * sum(log(1 + y)) + length(tmp) * log(beta * ksi)
    f
  }
  fit <- nlm(loglik, theta, stepmax = 5000, iterlim = 1000)
  return(fit)
  par.ests <- fit$x
  return(par.ests)
}
# Checking that our MLE algorithm works:
rgpd <- function(n, ksi, beta) {
  10000 + beta * (((1 - runif(n, min = 0, max = 1))^-ksi) - 1)
}
rgpd1 <- rgpd(100, 1, 2.5)
MLGPD(rgpd1)
Thanks!
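For comparison (my sketch, not part of the original question): the standard GPD density with threshold mu is (1/beta) * (1 + ksi * (x - mu)/beta)^(-1/ksi - 1), so a negative log-likelihood would use log(beta) rather than log(beta * ksi) and keep ksi inside the log term. Note also that nlm stores the optimum in fit$estimate, not fit$x, and that the two statements after the first return(fit) are never reached. A minimal sketch for ksi != 0, assuming a known threshold mu:
# Negative log-likelihood of the GPD; returns Inf outside the
# parameter space so the optimizer steers away from it.
gpd_nll <- function(theta, x, mu) {
  ksi <- theta[1]
  beta <- theta[2]
  y <- (x - mu) / beta
  if (beta <= 0 || any(1 + ksi * y <= 0)) return(Inf)
  length(x) * log(beta) + (1 / ksi + 1) * sum(log(1 + ksi * y))
}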