I'm asked to simulate 50 survival times from an exponential distribution with rate 1.
n <- 50
Tstar <- rexp(n, rate = 1)
Then I have the following quantities:
Y(t), which counts the individuals at risk at time t, i.e.
Y <- function(t){sum(Tstar > t)}
and S(t) is the Kaplan-Meier estimator
S <- function(t) (1 - 1/n * sum(Tstar < t))
But how do I define the following function?
Here Tstar[i] indicates T_i.
If I understand your mathematical notation correctly, you can use Reduce + sapply to iterate and sum over the specific values:
set.seed(4621)
n <- 50
Tstar <- rexp(n, rate = 1)
Y <- function(t) sum(Tstar > t)
S <- function(t) (1 - 1/n * sum(Tstar < t))
sigma_sq <- function(t) {
  Tstar <- Tstar[Tstar < t]  # keep only the event times before t
  S(t)^2 * Reduce(`+`, sapply(Tstar, function(T_i) 1/(Y(t)*(T_i)^2)))
}
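Since sapply() already returns a numeric vector here, Reduce(`+`, ...) is equivalent to a plain sum(), so either spelling works. An illustrative call (the value depends on the simulated sample):
sigma_sq(1)                      # variance estimate at t = 1
sapply(c(0.5, 1, 2), sigma_sq)   # evaluated over a grid of time points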
Following this question: how do I get the value of `t` such that my function `h(t) = epsilon` for a fixed `epsilon`?
I first sample the 500 eigenvectors v of a random matrix G and then generate 1000 random initial vectors of dimension 500, which I normalize and store in xmats.
#make this example reproducible
set.seed(100001)
n <- 500
#Sample GOE random matrix
A <- matrix(rnorm(n*n, mean=0, sd=1), n, n)
G <- (A + t(A))/sqrt(2*n)
ev <- eigen(G)
l <- ev$values
v <- ev$vectors
#parameters of the multivariate normal distribution
mean <- rep(0, n)
var <- diag(n)
#simulate the multivariate normal distribution
initial <- MASS::mvrnorm(n=1000, mu=mean, Sigma=var) #1000 random vectors
#normalize each initial vector so the initial data are uniformly distributed on the sphere
xmats <- lapply(1:1000, function(i) initial[i, ]/norm(initial[i, ], type="2"))
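(A quick sanity check, not in the original post: every normalized vector should have unit Euclidean norm.)
stopifnot(all(abs(sapply(xmats, function(x) sqrt(sum(x^2))) - 1) < 1e-8))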
Then I define the functions used to compute res:
h1t <- function(t, x_0) {
  h10 <- c(x_0 %*% v[, n])
  denom <- vapply(t, function(.t) {
    sum((x_0 %*% v)^2 * exp(-4*(l - l[n]) * .t))
  }, numeric(1L))
  abs(h10) / sqrt(denom)
}
find_t <- function(x, epsilon = 0.01, range = c(-50, 50)) {
  uniroot(function(t) h1t(t, x) - epsilon, range,
          tol = .Machine$double.eps)$root
}
I want to get res:
res <- lapply(xmats, find_t)
However, it throws an error: Error in uniroot(function(t) h1t(t, x) - epsilon, range, tol = .Machine$double.eps) : f() values at end points not of opposite sign
When the root-finding does succeed, res is a list, and hist(unlist(res)) works well.
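The error means that h1t(t, x) - epsilon has the same sign at both ends of range, so uniroot() has no bracketed root to find. A minimal guard (a sketch, assuming the goal is to skip such inputs rather than widen the interval) returns NA for the offending vectors:
find_t_safe <- function(x, epsilon = 0.01, range = c(-50, 50)) {
  f <- function(t) h1t(t, x) - epsilon
  # only call uniroot() when a sign change brackets a root
  if (!isTRUE(f(range[1]) * f(range[2]) < 0)) return(NA_real_)
  uniroot(f, range, tol = .Machine$double.eps)$root
}
res <- lapply(xmats, find_t_safe)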
I am attempting to convolve() 36 beta distributions. The result distribution is one of the two input distributions to the next successive call to convolve(). After every convolve(), the result has row count = (nrow(vector1)+nrow(vector2)-1). In effect, the row count of the result distribution almost doubles with every call to convolve(). This is very inefficient - it makes runtime impossibly long and consumes large amounts of memory. Is there any way to keep the row count constant?
Code example below ...
# Function from https://stat.ethz.ch/pipermail/r-help/2008-July/168762.html
weighted.var <- function(x, w, na.rm = FALSE) {
  if (na.rm) {
    w <- w[i <- !is.na(x)]
    x <- x[i]
  }
  sum.w <- sum(w)
  sum.w2 <- sum(w^2)
  mean.w <- sum(x * w) / sum(w)
  (sum.w / (sum.w^2 - sum.w2)) * sum(w * (x - mean.w)^2, na.rm = na.rm)
}
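(A quick check of weighted.var, not in the original post: with equal weights it reduces to the ordinary unbiased sample variance.)
z <- rnorm(10)
all.equal(weighted.var(z, rep(1, 10)), var(z))  # TRUE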
# Define beta distribution shape parameters.
s1a <- 3.52;  s1b <- 65.35
s2a <- 1.684; s2b <- 189.12
s3a <- 5.696; s3b <- 32.34
s4a <- 1.81;  s4b <- 185.5
# Define initial set of quantiles.
mQ1 <- matrix(data = seq(0, 1, 1/1000), ncol = 1)
for (i in 1:3) {
  mPDF <- matrix(data = convolve(dbeta(mQ1, s1a, s1b), rev(dbeta(mQ1, s2a, s2b)),
                                 type = "open"), ncol = 1L)
  print(paste(nrow(mPDF), ' rows', sep = ''))
  if (i < 3) {
    # Calculate the merged shape parameters directly from mPDF.
    mQ2 <- matrix(data = seq(0, 1L, (1L/(nrow(mPDF) - 1L))), ncol = 1L)
    wtMean <- weighted.mean(mQ2, mPDF)
    wtStd <- sqrt(weighted.var(mQ2, mPDF))
    s1a <- -1L * ((wtMean * (wtStd^2 + wtMean^2 - wtMean)) / wtStd^2)
    s1b <- ((wtStd^2 + wtMean^2 - wtMean) * (wtMean - 1)) / wtStd^2
    s2a <- s3a; s2b <- s3b
    mQ1 <- mQ2
  }
} #i
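One possible workaround (not from the original post): after each convolve(), interpolate the result back onto the original 1001-point grid with approx() before the next pass. This keeps the row count constant at the cost of some resolution. A sketch, using the same grid convention as mQ2 above:
xFine <- seq(0, 1, length.out = nrow(mPDF))  # grid the convolved result lives on
mPDF  <- matrix(approx(xFine, mPDF[, 1], xout = seq(0, 1, 1/1000))$y, ncol = 1)
nrow(mPDF)                                   # back to 1001 rows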
I am trying to calculate the standardized Pearson residuals by hand in R. However, I am struggling when it comes to calculating the hat matrix.
I have built my own logistic regression and I am trying to calculate the standardized Pearson residuals in the logReg function.
logRegEst <- function(x, y, threshold = 1e-10, maxIter = 100)
{
  calcPi <- function(x, beta)
  {
    beta <- as.vector(beta)
    return(exp(x %*% beta) / (1 + exp(x %*% beta)))
  }
  beta <- rep(0, ncol(x)) # initial guess for beta
  diff <- 1000
  # initial value bigger than threshold so that we can enter our while loop
  iterCount <- 0
  # counter for the iterations to ensure we're not stuck in an infinite loop
  while (diff > threshold) # tests for convergence
  {
    pi <- as.vector(calcPi(x, beta))
    # calculate pi using the current estimate of beta
    W <- diag(pi * (1 - pi))
    # calculate the matrix of weights W as defined in the Fisher scoring algorithm
    beta_change <- solve(t(x) %*% W %*% x) %*% t(x) %*% (y - pi)
    # calculate the change in beta
    beta <- beta + beta_change # new beta
    diff <- sum(beta_change^2)
    # calculate how much we changed beta by in this iteration;
    # if this is less than threshold, we'll exit the while loop
    iterCount <- iterCount + 1
    # stop if we have hit the maximum number of iterations
    if (iterCount > maxIter) {
      stop("This isn't converging.")
    }
  }
  n <- length(y)
  df <- length(y) - ncol(x)
  # degrees of freedom: the length of y minus the number of x columns
  vcov <- solve(t(x) %*% W %*% x)
  logLik <- sum(y * log(pi / (1 - pi)) + log(1 - pi))
  deviance <- -2 * logLik
  AIC <- -2 * logLik + 2 * ncol(x)
  rank <- ncol(x)
  list(coefficients = beta, vcov = vcov, df = df, deviance = deviance,
       AIC = AIC, iter = iterCount - 1, x = x, y = y, n = n, rank = rank)
  # returning the results
}
logReg <- function(formula, data)
{
  if (sum(is.na(data)) > 0) {
    print("missing values in data")
  } else {
    mf <- model.frame(formula = formula, data = data)
    # model.frame() returns a data.frame with the variables needed to use the formula.
    x <- model.matrix(attr(mf, "terms"), data = mf)
    # model.matrix() creates a design matrix; for example, the "Sex" variable
    # is given as a dummy variable with ones and zeros.
    y <- as.numeric(model.response(mf)) - 1
    # model.response() gives us the response variable.
    est <- logRegEst(x, y)
    # Now we have the starting position to apply our function from above.
    est$formula <- formula
    est$call <- match.call()
    # We add the formula and the call to the list.
    nullModel <- logRegEst(x = as.matrix(rep(1, length(y))), y)
    est$nullDeviance <- nullModel$deviance
    est$nullDf <- nullModel$df
    mu <- exp(as.vector(est$x %*% est$coefficients)) /
      (1 + exp(as.vector(est$x %*% est$coefficients)))
    # computing the fitted values
    est$residuals <- (est$y - mu) / sqrt(mu * (1 - mu))
    est$mu <- mu
    est$x <- x
    est$y <- y
    est$data <- data
    hat <- (t(mu))^(1/2) %*% x %*% (t(x) %*% mu %*% x)^(-1) %*% t(x) %*% mu^(1/2)
    # this is the line that triggers the error described below
    est$stdresiduals <- est$residuals / (sqrt(1 - hat))
    class(est) <- "logReg"
    # defining the class
    est
  }
}
I am struggling when it comes to calculating H = V̂^(1/2) X (X^T V̂ X)^(-1) X^T V̂^(1/2). This is called hat in my code.
If I try to calculate the hat matrix (hat), I get the error that I cannot multiply the vector mu and the matrix x in the expression t(x) %*% mu %*% x.
I can see that the dimensions of the objects are not conformable, and therefore I can't multiply them.
Can anyone see where my mistake is? Help is very much appreciated. Thanks!
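For what it's worth, a minimal sketch of how that formula could be written in R, assuming V̂ is the diagonal matrix diag(mu * (1 - mu)): mu must be expanded into a matrix with diag() rather than used as a vector, and solve() used for the matrix inverse (for a non-negative diagonal matrix, elementwise sqrt() equals the matrix square root):
V <- diag(mu * (1 - mu))  # Vhat as an n x n diagonal matrix
H <- sqrt(V) %*% x %*% solve(t(x) %*% V %*% x) %*% t(x) %*% sqrt(V)
est$stdresiduals <- est$residuals / sqrt(1 - diag(H))  # use the diagonal (the leverages)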
I have a joint p.d.f. f(x, y) = (1/4)*xy for 0 < x < 2, 0 < y < 2, and I am comparing the theoretical value of the conditional probability with the empirical value obtained via a Monte Carlo approach.
I have to do this with 10,000, 100,000, and 1,000,000 draws. May I ask how to put these replications into the R code?
Also, for the last step, the conditional probability, I was using the Monte Carlo approach. Is there any R code I can use for a multivariate uniform distribution to calculate the conditional probability?
Any suggestions would be highly appreciated! Thanks!!
My code was as below:
# f(x,y) = (1/4)*xy, 0<x<2, 0<y<2
# Find P(A) = P(X>1)
f <- function(x){(1/2)*x} # Marginal P(X)
probE <-integrate(f, lower = 1, upper = 2)
cat('\n Pr[ 1 < X ] is \n')
print(probE)
n <- 10000
x<-runif(n, 1,2)
probE.MC <- ((2-1)/n)*sum((1/2)*x)
cat('\n Monte Carlo Pr[1< X ] =',probE.MC,'\n')
# Find P(B) = P(Y<1)
f <- function(y){(1/2)*y} # Marginal P(Y)
probB <-integrate(f, lower = 0, upper = 1)
cat('\n Pr[ Y < 1 ] is \n')
probB
typeof(probB)
n <- 10000
y<-runif(n, 0,1)
probB.MC <- ((1-0)/n)*sum((1/2)*y)
cat('\n Monte Carlo Pr[Y < 1] =',probB.MC,'\n')
# Pr[A intersect B]
# P[X>1 and Y <1]
f <- function(x,y){return((1/4)*x*y)}
n <- 100000
a11<-1; a12 <-2; a21 <- 0; a22 <-1
x <-runif(n, a11, a12)
y <- runif(n,a21, a22)
probMC <- ((a12-a11)*(a22-a21)/n)*sum(f(x,y))
probMC
typeof(probMC)
# P[A|B] = p[A intersect B]/ P(B)
probAB <- probMC/probB
First, I reformatted the functions and gave them separate names.
fX <- function(x) {
  0.5 * x # marginal density of X
}
# Find P(B) = P(Y<1)
fY <- function(y) {
  0.5 * y # marginal density of Y
}
fXY <- function(x, y) {
  1 / length(x) * sum(0.25 * x * y) # Monte Carlo average of the joint density
}
The simplest way to do multiple runs is to wrap the MC code in a for loop and save each calculation in an array. Then, at the end, take the mean of the stored values.
So for P[A] you have:
n <- 10000
probA.MC <- numeric(n) # create the array
for (i in 1:10000) {
  x <- runif(n, 1, 2)
  probA.MC[i] <- ((2-1) / n) * sum(0.5 * x)
}
cat('\n Monte Carlo Pr[1 < X] =',mean(probA.MC),'\n')
(I assume probE.MC should have been probA.MC.) The result was Monte Carlo Pr[1 < X] = 0.7500088. The code is analogous for P[B] (a sketch follows below), and that result was Monte Carlo Pr[Y < 1] = 0.2499819.
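For completeness, the analogous loop for P[B] (not spelled out in the original answer) might look like this:
probB.MC <- numeric(n)
for (i in 1:10000) {
  y <- runif(n, 0, 1)
  probB.MC[i] <- ((1-0) / n) * sum(0.5 * y)
}
cat('\n Monte Carlo Pr[Y < 1] =', mean(probB.MC), '\n')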
For the joint probability we use fXY.
n <- 10000
probMC <- numeric(n)
for (i in 1:10000) {
  x <- runif(n, 1, 2)
  y <- runif(n, 0, 1)
  probMC[i] <- ((a12-a11) * (a22-a21)) * fXY(x, y)
}
cat('\n Monte Carlo Pr[X,Y] =',mean(probMC),'\n')
This result was Monte Carlo Pr[X,Y] = 0.1882728.
The last calculation you did should read as follows (note the probB$value from the integration result):
# P[A|B] = p[A intersect B]/ P(B)
probAB <- mean(probMC) / probB$value
print(probAB)
This calculation yielded the result 0.7530913.
Trying to wrap my mind around vectorizing while trying to make some simulations faster, I found this very basic epidemic simulation. The code is from the book Introduction to Scientific Programming and Simulation Using R: http://www.amazon.com/Introduction-Scientific-Programming-Simulation-Using/dp/1420068725/ref=sr_1_1?ie=UTF8&qid=1338069156&sr=8-1
#program spuRs/resources/scripts/SIRsim.r
SIRsim <- function(a, b, N, T) {
  # Simulate an SIR epidemic
  # a is the infection rate, b is the removal rate
  # N initial susceptibles, 1 initial infected, simulation length T
  # returns a matrix of size (T+1)*3 with columns S, I, R respectively
  S <- rep(0, T+1)
  I <- rep(0, T+1)
  R <- rep(0, T+1)
  S[1] <- N
  I[1] <- 1
  R[1] <- 0
  for (i in 1:T) {
    S[i+1] <- rbinom(1, S[i], (1 - a)^I[i])
    R[i+1] <- R[i] + rbinom(1, I[i], b)
    I[i+1] <- N + 1 - R[i+1] - S[i+1]
  }
  return(matrix(c(S, I, R), ncol = 3))
}
The core of the simulation is the for loop. My question is: since the code produces the S[i+1] and R[i+1] values from the S[i] and R[i] values, is it possible to vectorize it with an apply function?
Many thanks
It's hard to 'vectorize' iterative calculations, but this is a simulation, and simulations are likely to be run many times. So write this to do all the simulations at the same time, by adding an argument M (the number of simulations to perform), allocating an M x (T + 1) matrix, and then filling in successive columns (times) of each simulation. The changes seem to be remarkably straightforward (so I've probably made a mistake; I'm particularly concerned about the use of vectors in the second and third arguments to rbinom, though this is consistent with the documentation).
SIRsim <- function(a, b, N, T, M) {
  ## Simulate an SIR epidemic
  ## a is the infection rate, b is the removal rate
  ## N initial susceptibles, 1 initial infected, simulation length T
  ## M is the number of simulations to run
  ## returns a list of S, I, R matrices, each M simulations
  ## across T + 1 time points
  S <- I <- R <- matrix(0, M, T + 1)
  S[, 1] <- N
  I[, 1] <- 1
  for (i in seq_len(T)) {  # iterate over the T time steps
    S[, i+1] <- rbinom(M, S[, i], (1 - a)^I[, i])
    R[, i+1] <- R[, i] + rbinom(M, I[, i], b)
    I[, i+1] <- N + 1 - R[, i+1] - S[, i+1]
  }
  list(S = S, I = I, R = R)
}
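An illustrative call (the parameter values are made up for the example): run 50 epidemics of length 100 in one pass and summarize the infected counts.
out <- SIRsim(a = 0.0005, b = 0.1, N = 1000, T = 100, M = 50)
dim(out$I)       # 50 simulations by 101 time points
colMeans(out$I)  # average number infected at each time step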