Calculate the conditional probability of a joint p.d.f. in R

I have a joint p.d.f., and I am comparing the theoretical value of a conditional probability with the empirical value obtained from a Monte Carlo approach.
I have to do this with 10,000, 100,000, and 1,000,000 draws. How do I put these replications into the R code?
Also, for the last step, the conditional probability itself, I used the Monte Carlo approach. Is there R code I can use with the multivariate uniform distribution to calculate the conditional probability?
Any suggestions would be highly appreciated! Thanks!!
My code was as below:
# f(x,y) = (1/4)*x*y, 0 < x < 2, 0 < y < 2
# Find P(A) = P(X > 1)
f <- function(x) { (1/2) * x }  # marginal density of X
probE <- integrate(f, lower = 1, upper = 2)
cat('\n Pr[ 1 < X ] is \n')
print(probE)
n <- 10000
x <- runif(n, 1, 2)
probE.MC <- ((2 - 1) / n) * sum((1/2) * x)
cat('\n Monte Carlo Pr[1 < X ] =', probE.MC, '\n')
# Find P(B) = P(Y < 1)
f <- function(y) { (1/2) * y }  # marginal density of Y
probB <- integrate(f, lower = 0, upper = 1)
cat('\n Pr[ Y < 1 ] is \n')
probB
typeof(probB)
n <- 10000
y <- runif(n, 0, 1)
probB.MC <- ((1 - 0) / n) * sum((1/2) * y)
cat('\n Monte Carlo Pr[Y < 1] =', probB.MC, '\n')
# Pr[A intersect B]
# P[X > 1 and Y < 1]
f <- function(x, y) { return((1/4) * x * y) }
n <- 100000
a11 <- 1; a12 <- 2; a21 <- 0; a22 <- 1
x <- runif(n, a11, a12)
y <- runif(n, a21, a22)
probMC <- ((a12 - a11) * (a22 - a21) / n) * sum(f(x, y))
probMC
typeof(probMC)
# P[A|B] = P[A intersect B] / P(B)
probAB <- probMC / probB

First, I reformatted the functions and gave them separate names.
fX <- function(x) {
  0.5 * x  # marginal density of X
}
fY <- function(y) {
  0.5 * y  # marginal density of Y
}
fXY <- function(x, y) {
  1 / length(x) * sum(0.25 * x * y)  # Monte Carlo average of the joint density over the draws
}
The simplest way to do multiple runs is to wrap the MC code in a for loop and save each calculation in an array. Then, at the end, take the mean of the stored values.
So for P[A] you have:
n <- 10000      # draws per replication
nrep <- 10000   # number of replications
probA.MC <- numeric(nrep)  # create the array of stored estimates
for (i in 1:nrep) {
  x <- runif(n, 1, 2)
  probA.MC[i] <- ((2 - 1) / n) * sum(0.5 * x)
}
cat('\n Monte Carlo Pr[1 < X] =', mean(probA.MC), '\n')
(I assume probE.MC should have been probA.MC.) The result was Monte Carlo Pr[1 < X] = 0.7500088. The code is analogous for P[B], and that result was Monte Carlo Pr[Y < 1] = 0.2499819.
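To vary the number of draws as the question asks (10,000, 100,000, and 1,000,000), a minimal sketch (assuming one estimate per sample size; the replication loop above can be nested inside it if you also want averages):
# One Monte Carlo estimate of P(X > 1) for each sample size
for (n in c(10000, 100000, 1000000)) {
  x <- runif(n, 1, 2)
  est <- ((2 - 1) / n) * sum(0.5 * x)
  cat('n =', n, ': Pr[1 < X] =', est, '\n')
}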
For the joint probability we use fXY.
n <- 10000      # draws per replication
nrep <- 10000   # number of replications
a11 <- 1; a12 <- 2; a21 <- 0; a22 <- 1  # integration limits, as in your code
probMC <- numeric(nrep)
for (i in 1:nrep) {
  x <- runif(n, a11, a12)
  y <- runif(n, a21, a22)
  probMC[i] <- ((a12 - a11) * (a22 - a21)) * fXY(x, y)
}
cat('\n Monte Carlo Pr[X > 1, Y < 1] =', mean(probMC), '\n')
This result was Monte Carlo Pr[X > 1, Y < 1] = 0.1882728 (the exact value is 3/16 = 0.1875).
The last calculation you did should read as follows (note the probB$value from the integration result):
# P[A|B] = p[A intersect B]/ P(B)
probAB <- mean(probMC) / probB$value
print(probAB)
This calculation yielded the result 0.7530913; the exact answer is (3/16)/(1/4) = 0.75. (Here X and Y are independent, so P(A|B) = P(A).)
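As for estimating the conditional probability directly: rather than a multivariate uniform, you can sample from the joint density itself and take a ratio of indicator means. A minimal sketch (assuming inverse-CDF sampling; the marginal CDF here is F(x) = x^2/4 on (0, 2), so X = 2*sqrt(U)):
# Draw from the joint density (X and Y are independent, each with density x/2),
# then estimate P(X > 1 | Y < 1) as a ratio of indicator means
n <- 1000000
x <- 2 * sqrt(runif(n))
y <- 2 * sqrt(runif(n))
mean(x > 1 & y < 1) / mean(y < 1)  # approximately 0.75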

Related

How to fix `Error in hist.default(res) : 'x' must be numeric`?

Following this question: How to get the value of `t` so that my function `h(t)=epsilon` for a fixed `epsilon`?
I first sample the 500 eigenvectors v of a random matrix G, then generate 1,000 random initial vectors of dimension 500 and normalize them in xmats.
# make this example reproducible
set.seed(100001)
n <- 500
# Sample a GOE random matrix
A <- matrix(rnorm(n*n, mean=0, sd=1), n, n)
G <- (A + t(A))/sqrt(2*n)
ev <- eigen(G)
l <- ev$values
v <- ev$vectors
# size of the multivariate distribution
mean <- rep(0, n)
var <- diag(n)
# simulate the multivariate normal distribution
initial <- MASS::mvrnorm(n=1000, mu=mean, Sigma=var)  # 1000 random vectors
# normalize each initial vector, so the initial data are uniformly distributed on the unit sphere
xmats <- lapply(1:1000, function(i) initial[i, ]/norm(initial[i, ], type="2"))
Then I define the functions used to compute `res`:
h1t <- function(t, x_0) {
  h10 <- c(x_0 %*% v[, n])
  denom <- vapply(t, function(.t) {
    sum((x_0 %*% v)^2 * exp(-4*(l - l[n]) * .t))
  }, numeric(1L))
  abs(h10) / sqrt(denom)
}
find_t <- function(x, epsilon = 0.01, range = c(-50, 50)) {
  uniroot(function(t) h1t(t, x) - epsilon, range,
          tol = .Machine$double.eps)$root
}
I want to get res:
res <- lapply(xmats, find_t)
However, it throws the error `Error in uniroot(function(t) h1t(t, x) - epsilon, range, tol = .Machine$double.eps) : f() values at end points not of opposite sign`.
`res` is a list, which is what triggers the `'x' must be numeric` error in `hist()`. I ran `hist(unlist(res))` and it worked well.
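As for the `uniroot` error: it means that `h1t(t, x) - epsilon` has the same sign at both endpoints of `range` for some vectors, so no root is bracketed there. A minimal workaround sketch (assuming recording `NA` for those vectors is acceptable; `find_t_safe` is a hypothetical wrapper, not part of the original code):
find_t_safe <- function(x, epsilon = 0.01, range = c(-50, 50)) {
  # Return NA instead of stopping when [lower, upper] does not bracket a root
  tryCatch(find_t(x, epsilon, range), error = function(e) NA_real_)
}
res <- lapply(xmats, find_t_safe)
hist(unlist(res))  # hist() drops non-finite values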

Derivative of a function with matrices and vectors for Newton-Raphson method

I've tried to find the roots of a nonlinear equation using the Newton-Raphson method.
The problem I'm stuck on is that I don't get the right first derivative of the objective Q, where y is the target variable for my prediction, X is a matrix of the predictor variables, and Theta is a smoothing parameter.
I need to find the arg min of Q.
For that I want to use this Newton-Raphson approach:
newton.raphson <- function(f, a, b, tol = 1e-5, n = 1000) {
  require(numDeriv)  # package for computing f'(x)
  x0 <- a  # set start value to supplied lower bound
  k <- n   # initialize vector for iteration results
  # Check the upper and lower bounds to see if either is already a root
  fa <- f(a)
  if (fa == 0.0) {
    return(a)
  }
  fb <- f(b)
  if (fb == 0.0) {
    return(b)
  }
  for (i in 1:n) {
    dx <- genD(func = f, x = x0)$D[1]  # first-order derivative f'(x0)
    x1 <- x0 - (f(x0) / dx)  # calculate next value x1
    k[i] <- x1  # store x1
    if (abs(x1 - x0) < tol) {
      root.approx <- tail(k, n = 1)
      res <- list('root approximation' = root.approx, 'iterations' = k)
      return(res)
    }
    # If Newton-Raphson has not yet converged, set x1 as x0 and continue
    x0 <- x1
  }
  print('Too many iterations in method')
}
Thanks in advance for your help!
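Since Q is minimized over a parameter vector rather than a scalar, the scalar routine above won't apply directly. A minimal sketch of the multivariate case (assuming Q is a smooth scalar function of the vector Theta; `Q_example` and `newton_argmin` are hypothetical stand-ins, not the actual objective) uses numDeriv's `grad` and `hessian`:
library(numDeriv)

Q_example <- function(theta) sum((theta - c(1, 2))^2)  # stand-in; substitute your own Q

newton_argmin <- function(Q, theta0, tol = 1e-8, maxit = 100) {
  theta <- theta0
  for (i in 1:maxit) {
    g <- grad(Q, theta)     # gradient of Q at theta
    H <- hessian(Q, theta)  # Hessian of Q at theta
    step <- solve(H, g)     # Newton step: solves H %*% step = g
    theta <- theta - step
    if (sqrt(sum(step^2)) < tol) break
  }
  theta
}

newton_argmin(Q_example, theta0 = c(0, 0))  # converges to c(1, 2)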

Looping through functions in R

I'm trying to write a loop in R that should do the following:
Calculate the square root of a given positive number using Newton's method. My idea is something like this:
delta <- 0.0000001
mu <- input_value  # the number whose square root we want
x <- mu            # starting estimate
# DO:
x.new = 0.5*(x + mu/x)
x = x.new
# UNTIL:
abs(x^2 - mu) < delta
It's meant as a quick way to find the root(s) of a given number.
Does anyone have any ideas as to how to write a loop that does this in R?
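A direct translation of that DO/UNTIL pseudocode is a repeat loop, sketched minimally below (`sqrt_newton` is a made-up name; `mu` is the number whose square root we want):
sqrt_newton <- function(mu, delta = 1e-7) {
  x <- mu  # any positive starting value works
  repeat {
    x <- 0.5 * (x + mu / x)  # Newton update for f(x) = x^2 - mu
    if (abs(x^2 - mu) < delta) break
  }
  x
}
sqrt_newton(2)  # approximately 1.414214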
This is how I ended up solving my issue:
# PROGRAM
find_roots <- function(f, a, b, delta = 0.00005, n = 1000) {
  require(numDeriv)  # package for calculating f'(x)
  x_0 <- a  # set start value to supplied lower bound
  k <- n    # initialize vector for iteration results
  for (i in 1:n) {
    dx <- genD(func = f, x = x_0)$D[1]  # first-order derivative f'(x0)
    x_1 <- x_0 - (f(x_0) / dx)  # calculate next value x_1
    k[i] <- x_1  # store x_1
    # Once the difference between x_0 and x_1 becomes sufficiently small, output the results.
    if (abs(x_1 - x_0) < delta) {
      root.approx <- tail(k, n = 1)
      res <- list('root approximation' = root.approx, 'iterations' = k)
      return(res)
    }
    # If Newton-Raphson has not yet converged, set x_1 as x_0 and continue
    x_0 <- x_1
  }
  print('Too many iterations in method')
}
# Example of it working:
func1 <- function(x) {
  x^2 + 3*x + 1
}
# Check out the magic
find_roots(func1, 2, 3)
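As a quick sanity check, the quadratic's closed-form roots agree with what the iteration converges to (from this start it lands on the root nearer zero):
# Closed-form roots of x^2 + 3x + 1 for comparison
(-3 + sqrt(5)) / 2  # -0.3819660
(-3 - sqrt(5)) / 2  # -2.6180340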

How to calculate standardized Pearson residuals by hand in R?

I am trying to calculate the standardized Pearson Residuals by hand in R. However, I am struggling when it comes to calculating the hat matrix.
I have built my own logistic regression and I am trying to calculate the standardized Pearson residuals in the logReg function.
logRegEst <- function(x, y, threshold = 1e-10, maxIter = 100)
{
  calcPi <- function(x, beta)
  {
    beta <- as.vector(beta)
    return(exp(x %*% beta) / (1 + exp(x %*% beta)))
  }
  beta <- rep(0, ncol(x))  # initial guess for beta
  diff <- 1000
  # initial value bigger than threshold so that we can enter our while loop
  iterCount = 0
  # counter for the iterations to ensure we're not stuck in an infinite loop
  while (diff > threshold)  # tests for convergence
  {
    pi <- as.vector(calcPi(x, beta))
    # calculate pi by using the current estimate of beta
    W <- diag(pi * (1 - pi))
    # calculate the matrix of weights W as defined in the Fisher scoring algorithm
    beta_change <- solve(t(x) %*% W %*% x) %*% t(x) %*% (y - pi)
    # calculate the change in beta
    beta <- beta + beta_change  # new beta
    diff <- sum(beta_change^2)
    # calculate how much we changed beta by in this iteration;
    # if this is less than threshold, we'll break the while loop
    iterCount <- iterCount + 1
    if (iterCount > maxIter) {
      stop("This isn't converging.")
    }
    # stop if we have hit the maximum number of iterations
  }
  n <- length(y)
  df <- length(y) - ncol(x)
  # degrees of freedom: the length of y minus the number of x columns
  vcov <- solve(t(x) %*% W %*% x)
  logLik <- sum(y * log(pi / (1 - pi)) + log(1 - pi))
  deviance <- -2 * logLik
  AIC <- -2 * logLik + 2 * ncol(x)
  rank <- ncol(x)
  list(coefficients = beta, vcov = vcov, df = df, deviance = deviance,
       AIC = AIC, iter = iterCount - 1, x = x, y = y, n = n, rank = rank)
  # returning results
}
logReg <- function(formula, data)
{
  if (sum(is.na(data)) > 0) {
    print("missing values in data")
  } else {
    mf <- model.frame(formula = formula, data = data)
    # model.frame() returns a data.frame with the variables needed to use the
    # formula.
    x <- model.matrix(attr(mf, "terms"), data = mf)
    # model.matrix() creates a design matrix. That means that, for example, the
    # "Sex" variable is given as a dummy variable with ones and zeros.
    y <- as.numeric(model.response(mf)) - 1
    # model.response() gives us the response variable.
    est <- logRegEst(x, y)
    # Now we have the starting position to apply our function from above.
    est$formula <- formula
    est$call <- match.call()
    # We add the formula and the call to the list.
    nullModel <- logRegEst(x = as.matrix(rep(1, length(y))), y)
    est$nullDeviance <- nullModel$deviance
    est$nullDf <- nullModel$df
    mu <- exp(as.vector(est$x %*% est$coefficients)) /
      (1 + exp(as.vector(est$x %*% est$coefficients)))
    # computing the fitted values
    est$residuals <- (est$y - mu) / sqrt(mu * (1 - mu))
    est$mu <- mu
    est$x <- x
    est$y <- y
    est$data <- data
    hat <- (t(mu))^(1/2) %*% x %*% (t(x) %*% mu %*% x)^(-1) %*% t(x) %*% mu^(1/2)
    est$stdresiduals <- est$residuals / (sqrt(1 - hat))
    class(est) <- "logReg"
    # defining the class
    est
  }
}
I am struggling when it comes to calculating $H = \hat{V}^{1/2} X (X^T \hat{V} X)^{-1} X^T \hat{V}^{1/2}$. This is called `hat` in my code.
If I try to calculate the hat matrix, I get the error that I cannot multiply the vector `mu` and the matrix `x` in this expression: `t(x) %*% mu %*% x`.
I can see that the dimensions of the matrices do not conform, and therefore I can't multiply them.
Can anyone see where my mistake is? Help is very much appreciated. Thanks!
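A hedged sketch of the intended computation (assuming `mu` holds the fitted probabilities and `x` the design matrix, as inside `logReg` above): $\hat{V}$ is the diagonal matrix $\mathrm{diag}(\mu_i(1-\mu_i))$, so `mu` must be wrapped in `diag()` before it can sit between `t(x)` and `x`; `^(-1)` is elementwise in R, so the matrix inverse needs `solve()`; and only the diagonal of $H$ (the leverages) enters the standardized residuals:
V <- diag(mu * (1 - mu))            # V-hat
Vhalf <- diag(sqrt(mu * (1 - mu)))  # V-hat^{1/2}
H <- Vhalf %*% x %*% solve(t(x) %*% V %*% x) %*% t(x) %*% Vhalf
h <- diag(H)                        # leverages h_ii
est$stdresiduals <- est$residuals / sqrt(1 - h)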

Use the markovchain package to compare two empirically estimated Markov chains

I need to compare two probability matrices to determine how close the chains are, so I would use the resulting p-value of the test.
I tried the markovchain R package, specifically the divergenceTest function. The problem is that the function is not properly implemented. It is based on the test from the book "Statistical Inference Based on Divergence Measures" (page 139). I contacted the package developers, but they have not yet corrected it, so I tried to implement the test myself. I'm having trouble, though; could anyone help me find the error?
Parameters: freq_matrix is a frequency matrix used to estimate the probability matrix; hypothetic is the matrix compared against the estimated matrix.
divergenceTest3 <- function(freq_matrix, hypothetic) {
  n <- sum(freq_matrix)
  empirical <- freq_matrix
  for (i in 1:length(hypothetic)) {
    empirical[i, ] <- freq_matrix[i, ] / rowSums(freq_matrix)[i]
  }
  M <- nrow(empirical)
  v <- numeric()
  out <- 2 * n / .phi2(1)
  sum <- 0
  c <- 0
  for (i in 1:M) {
    sum2 <- 0
    for (j in 1:M) {
      if (hypothetic[i, j] > 0) {
        c <- c + 1
      }
      sum2 <- sum2 + hypothetic[i, j] * .phi(empirical[i, j] / hypothetic[i, j])
    }
    v[i] <- rowSums(freq_matrix)[i]
    sum <- sum + ((v[i] / n) * sum2)
  }
  TStat <- out * sum
  pvalue <- 1 - pchisq(TStat, c - M)
  cat("The Divergence test statistic is: ", TStat, " the Chi-Square d.f. are: ",
      c - M, " the p-value is: ", pvalue, "\n")
  out <- list(statistic = TStat, p.value = pvalue)
  return(out)
}
# phi function for the divergence test
.phi <- function(x) {
  x * log(x) - x + 1
}
# second derivative of phi; the statistic is scaled by 2n / phi''(1)
.phi2 <- function(x) {
  1 / x
}
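One candidate error, offered as a hedged observation rather than a definitive fix: the normalization loop runs over 1:length(hypothetic), which counts every cell of the matrix rather than its rows, so `empirical[i, ]` indexes out of bounds for any matrix with more than one column. Row-normalizing without a loop avoids this:
# Row-normalize the frequency matrix in one step
empirical <- sweep(freq_matrix, 1, rowSums(freq_matrix), "/")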
The divergence test has been replaced by the verifyHomogeneity function. It requires an input list of elements that can be coerced to a raw transition matrix (as with createSequenceMatrix). It then tests whether they belong to the same unknown DTMC.
See the example below:
myMatr1 <- matrix(c(0.2, .8, .5, .5), byrow = TRUE, nrow = 2)
myMatr2 <- matrix(c(0.5, .5, .4, .6), byrow = TRUE, nrow = 2)
mc1 <- as(myMatr1, "markovchain")
mc2 <- as(myMatr2, "markovchain")
mc1
mc2
sample1 <- rmarkovchain(n = 100, object = mc1)
sample2 <- rmarkovchain(n = 200, object = mc2)
# should reject
verifyHomogeneity(inputList = list(sample1, sample2))
# should accept
sample2 <- rmarkovchain(n = 200, object = mc1)
verifyHomogeneity(inputList = list(sample1, sample2))
