How to add conditions ((x-mu >= 0) and (x-mu < 0)) to a bivariate distribution when plotting a contour plot? - r

bivariate.ESEP <- function(x, mu, Sigma, eps)
{
  # k, I and alp are taken from the global environment (defined below)
  sqrt(det(k)) / (2 * gamma(1 + 2/alp) * sqrt(det(Sigma))) *
    exp(-( ((t(x-mu) %*% solve((I-eps)^2) %*% solve(Sigma) %*% k %*% (x-mu))^alp/2) * ifelse(x-mu >= 0, 1, 0) +
           ((t(mu-x) %*% solve((I+eps)^2) %*% solve(Sigma) %*% k %*% (mu-x))^alp/2) * ifelse(x-mu < 0, 1, 0) ))
}
mu <- c(179.81, 22.2)
Sigma <- matrix(c(0.6, 0.35, 0.35, 2), nrow=2)
eps <- matrix(c(-0.186, 0, 0, 0.6), nrow=2)
k <- matrix(c(1, 0, 0, 1), nrow=2)   # 2x2 identity
I <- matrix(c(1, 0, 0, 1), nrow=2)   # 2x2 identity
alp <- 1
x1 <- seq(150, 220)
x2 <- seq(0, 40)
z <- outer(x1, x2, FUN=function(x1, x2, ...) {
  apply(cbind(x1, x2), 1, bivariate.ESEP, ...)
}, mu=mu, Sigma=Sigma, eps=eps)
plot(dat1, xlab="Ht", ylab="BMI", pch=19, cex=.7)   # dat1: the (Ht, BMI) data, not shown here
contour(x1, x2, z, col="blue", drawlabels=FALSE, nlevels=20,
        xlab=expression(x[1]), ylab=expression(x[2]), lwd=1, add=TRUE)
x is a bivariate vector. So if (x >= mu) we take the first part of bivariate.ESEP and set the other part to 0, and similarly if (x < mu) we take the second part and set the first to 0.
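For what it is worth, here is a minimal sketch of one way to encode that logic (an editorial suggestion, not an answer from the thread; it relies on the same global k, I and alp as above). Since x - mu is a length-2 vector, x - mu >= 0 yields two logicals, so each region test is collapsed with all(); note that under this logic a point with mixed signs (say x1 >= mu1 but x2 < mu2) falls into neither branch:
bivariate.ESEP <- function(x, mu, Sigma, eps) {
  d <- x - mu
  ind.pos <- as.numeric(all(d >= 0))   # region x >= mu (both components)
  ind.neg <- as.numeric(all(d < 0))    # region x < mu (both components)
  Q1 <- (t(d)  %*% solve((I - eps)^2) %*% solve(Sigma) %*% k %*% d)^alp / 2
  Q2 <- (t(-d) %*% solve((I + eps)^2) %*% solve(Sigma) %*% k %*% (-d))^alp / 2
  sqrt(det(k)) / (2 * gamma(1 + 2/alp) * sqrt(det(Sigma))) *
    exp(-(Q1 * ind.pos + Q2 * ind.neg))
}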

Related

Is there a way to add a line to a specific subplot in R

I'm new to R, trying to understand how plotting works.
I'm trying to plot a graph consisting of three subgraphs. I use par(mfrow=c(1, 3)) to set that up. But inside a loop, I want to add various lines to the three graphs. How do I choose which of the three subplots the lines command applies to?
As an example, see the code below. All the lines commands get applied to the third subgraph, but this is of course not desired. See also the commands # apply lines to first figure! but how? and # apply lines to second figure! but how?
set.seed(1)
n <- 100
x <- seq(0, 4, length.out = n)
no_datasets <- 50
par(mfrow=c(1, 3))
for (i in 1:no_datasets) {
  x <- seq(0, 4, length.out = n) # x <- seq(-pi, pi, length.out = n)
  y <- sin(x)
  errs <- rnorm(n, mean = 0, sd = 0.25) # rnorm(n) generates random numbers whose distribution is normal
  t <- y + errs
  df <- data.frame(x, y, t, errs)
  model1 <- lm(t ~ poly(x, 1), data = df)
  model5 <- lm(t ~ poly(x, 5), data = df)
  model25 <- lm(t ~ poly(x, 25), data = df)
  if (i == 1) {
    plot(df$x, df$y, xlab = "x", ylab="", col="black", type="l")
    plot(df$x, df$y, xlab = "x", ylab="", col="black", type="l")
    plot(df$x, df$y, xlab = "x", ylab="", col="black", type="l")
  }
  t_hat <- predict(model1, df)
  # apply lines to first figure! but how?
  lines(df$x, t_hat, col="blue")
  t_hat <- predict(model5, df)
  # apply lines to second figure! but how?
  lines(df$x, t_hat, col="blue")
  t_hat <- predict(model25, df)
  # apply lines to third figure!
  lines(df$x, t_hat, col="blue")
}
Since the third plot is drawn last, every subsequent lines() call is added to it. If you instead nest each plot together with its lines() calls inside an if statement, the lines land on their respective plots.
set.seed(1)
n <- 100
x <- seq(0, 4, length.out = n)
no_datasets <- 50
par(mfrow=c(1, 3))
for (i in 1:no_datasets) {
  x <- seq(0, 4, length.out = n) # x <- seq(-pi, pi, length.out = n)
  y <- sin(x)
  errs <- rnorm(n, mean = 0, sd = 0.25) # rnorm(n) generates random numbers whose distribution is normal
  t <- y + errs
  df <- data.frame(x, y, t, errs)
  model1 <- lm(t ~ poly(x, 1), data = df)
  model5 <- lm(t ~ poly(x, 5), data = df)
  model25 <- lm(t ~ poly(x, 25), data = df)
  if (i == 1) {
    plot(df$x, df$y, xlab = "x", ylab="", col="black", type="l")
    t_hat <- predict(model1, df)
    lines(df$x, t_hat, col="blue")
  }
  if (i == 2) {
    plot(df$x, df$y, xlab = "x", ylab="", col="black", type="l")
    t_hat <- predict(model5, df)
    lines(df$x, t_hat, col="blue")
  }
  if (i == 3) {
    plot(df$x, df$y, xlab = "x", ylab="", col="black", type="l")
    t_hat <- predict(model25, df)
    lines(df$x, t_hat, col="blue")
  }
}
OK, if I understand AndrewGB correctly, it is not possible. So the answer should be:
set.seed(1)
n <- 100
no_datasets <- 50
par(mfrow=c(1, 3))
polynomials <- c(1, 5, 25)
x <- seq(0, 4, length.out = n) # x <- seq(-pi, pi, length.out = n)
y <- sin(x)
for (i in 1:length(polynomials)) {
  degree <- polynomials[i]
  for (j in 1:no_datasets) {
    errs <- rnorm(n, mean = 0, sd = 0.25) # rnorm(n) generates random numbers whose distribution is normal
    t <- y + errs
    df <- data.frame(x, y, t, errs)
    model <- lm(t ~ poly(x, degree), data = df)
    if (j == 1) {
      plot(df$x, df$y, xlab = "x", ylab="", col="black", type="l")
    }
    t_hat <- predict(model, df)
    lines(df$x, t_hat, col="blue")
  }
}

Why do MASS::lm.ridge coefficients differ from those calculated manually?

When performing ridge regression manually, following its definition
solve(t(X) %*% X + lbd*I) %*% t(X) %*% y
I get different results from those calculated by MASS::lm.ridge. Why? For ordinary linear regression, the manual method (computing the pseudoinverse) works fine.
Here is my Minimal, Reproducible Example:
library(tidyverse)
ridgeRegression = function(X, y, lbd) {
  Rinv = solve(t(X) %*% X + lbd*diag(ncol(X)))
  t(Rinv %*% t(X) %*% y)
}
# generate some data:
set.seed(0)
tb1 = tibble(
  x0 = 1,
  x1 = seq(-1, 1, by=.01),
  x2 = x1 + rnorm(length(x1), 0, .1),
  y  = x1 + x2 + rnorm(length(x1), 0, .5)
)
X = as.matrix(tb1 %>% select(x0, x1, x2))
# sanity check: force ordinary linear regression
# and compare it with the built-in linear regression:
ridgeRegression(X, tb1$y, 0) - coef(summary(lm(y ~ x1 + x2, data=tb1)))[, 1]
# looks the same: -2.94903e-17 1.487699e-14 -2.176037e-14
# compare manual ridge regression to MASS ridge regression:
ridgeRegression(X, tb1$y, 10) - coef(MASS::lm.ridge(y ~ x0 + x1 + x2 - 1, data=tb1, lambda = 10))
# noticeably different: -0.0001407148 0.003689412 -0.08905392
MASS::lm.ridge scales the data before modelling - this accounts for the difference in the coefficients.
You can confirm this by checking the function code by typing MASS::lm.ridge into the R console.
Here is the lm.ridge function with the scaling portion commented out:
X = as.matrix(tb1 %>% select(x0, x1, x2))
n <- nrow(X); p <- ncol(X)
#Xscale <- drop(rep(1/n, n) %*% X^2)^0.5
#X <- X/rep(Xscale, rep(n, p))
Xs <- svd(X)
rhs <- t(Xs$u) %*% tb1$y
d <- Xs$d
lscoef <- Xs$v %*% (rhs/d)
lsfit <- X %*% lscoef
resid <- tb1$y - lsfit
s2 <- sum(resid^2)/(n - p)
HKB <- (p-2)*s2/sum(lscoef^2)
LW <- (p-2)*s2*n/sum(lsfit^2)
k <- 1                             # one lambda value
dx <- length(d)
div <- d^2 + rep(10, rep(dx, k))   # lambda = 10
a <- drop(d*rhs)/div
dim(a) <- c(dx, k)
coef <- Xs$v %*% a
coef
# x0 x1 x2
#[1,] 0.01384984 0.8667353 0.9452382
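To see this more directly, here is a minimal sketch (my own check, assuming only the scaling step shown above): scale the columns of X to unit root-mean-square as lm.ridge does, apply the manual formula to the scaled matrix, and unscale the coefficients. The result should reproduce MASS::lm.ridge up to rounding.
n <- nrow(X)
Xscale <- drop(rep(1/n, n) %*% X^2)^0.5          # per-column root-mean-square
Xsc <- X / rep(Xscale, rep(n, ncol(X)))          # lm.ridge's internal scaling
coef_scaled <- ridgeRegression(Xsc, tb1$y, 10)   # manual ridge on scaled X
coef_scaled / Xscale                             # back on the original scale
coef(MASS::lm.ridge(y ~ x0 + x1 + x2 - 1, data=tb1, lambda=10))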

R plotting polynomial regression line

I am trying to plot a polynomial regression line, but the plotted line does not make sense. I am using the iris dataset from the datasets package. This is my code:
library(datasets)
data2 <- iris[iris$Species != "setosa", ]
x<- data2$Sepal.Length
y <- data2$Species
fit <- lm(y ~ poly(x, 3)) ## polynomial of degree 3
plot(x, y) ## scatter plot (colour: black)
x0 <- seq(min(x), max(x), length = 20) ## prediction grid
y0 <- predict.lm(fit, newdata = list(x = x0)) ## predicted values
lines(x0, y0, col = 2)
I needed to rescale my y-values. This code works (rescale() comes from the scales package):
library(datasets)
library(scales)
data2 <- iris[iris$Species != "setosa", ]
data2["Species"] <- as.numeric(unlist(data2["Species"]))
x<- data2$Sepal.Length
y <- data2$Species
y <- rescale(data2$Species, to = c(0, 1), from = range(x, na.rm = TRUE, finite = TRUE))
fit <- lm(y ~ poly(x, 3)) ## polynomial of degree 3
plot(x, y) ## scatter plot (colour: black)
x0 <- seq(min(x), max(x), length = 20) ## prediction grid
y0 <- predict.lm(fit, newdata = list(x = x0)) ## predicted values
lines(x0, y0, col = 2)
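An editorial aside, not part of the original answer: since only two species remain after dropping setosa, an arguably simpler route is to code the response directly as 0/1 rather than rescaling the factor codes; a minimal sketch:
library(datasets)
data2 <- iris[iris$Species != "setosa", ]
x <- data2$Sepal.Length
y <- as.numeric(data2$Species == "virginica")  # 0 = versicolor, 1 = virginica
fit <- lm(y ~ poly(x, 3))                      # polynomial of degree 3
plot(x, y)
x0 <- seq(min(x), max(x), length = 20)         # prediction grid
lines(x0, predict(fit, newdata = list(x = x0)), col = 2)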

Bivariate normal with marginal and conditional densities

I am trying to create a figure in R. It consists of the contour plot of a bivariate normal distribution for the vector variable (x,y), along with the marginals f(x) and f(y), the conditional distribution f(y|x), and the line through the conditioning value X=x (a simple abline(v=x)).
I already have the contour and the abline, but I don't know how to continue.
Here is the code I used so far:
bivariate.normal <- function(x, mu, Sigma) {
  # note: the exact normalizing constant is 2*pi*sqrt(det(Sigma)); the
  # constant only rescales the contour levels, so the plot is unaffected
  exp(-.5 * t(x-mu) %*% solve(Sigma) %*% (x-mu)) / sqrt(2 * pi * det(Sigma))
}
mu <- c(0,0)
Sigma <- matrix(c(1,.8,.8,1), nrow=2)
x1 <- seq(-3, 3, length.out=50)
x2 <- seq(-3, 3, length.out=50)
z <- outer(x1, x2, FUN=function(x1, x2, ...) {
  apply(cbind(x1, x2), 1, bivariate.normal, ...)
}, mu=mu, Sigma=Sigma)
contour(x1, x2, z, col="blue", drawlabels=FALSE, nlevels=4,
xlab=expression(x[1]), ylab=expression(x[2]), lwd=1)
abline(v=.7, col=1, lwd=2, lty=2)
text(2, -2, labels=expression(x[1]==0.7))
It would have been helpful if you had provided the function to calculate the marginal distribution. I may have got the marginal distribution function wrong, but I think this gets you what you want:
par(lwd=2,mgp=c(1,1,0))
# Modified to extract diagonal.
bivariate.normal <- function(x, mu, Sigma)
  exp(-.5 * diag(t(x-mu) %*% solve(Sigma) %*% (x-mu))) / sqrt(2 * pi * det(Sigma))
mu <- c(0,0)
Sigma <- matrix(c(1,.8,.8,1), nrow=2)
x1 <- seq(-3, 3, length.out=50)
x2 <- seq(-3, 3, length.out=50)
plot(1:10,axes=FALSE,frame.plot=TRUE,lwd=1)
# z can now be calculated much more easily.
z <- bivariate.normal(t(expand.grid(x1, x2)), mu, Sigma)
dim(z) <- c(length(x1), length(x2))
contour(x1, x2, z, col="#4545FF", drawlabels=FALSE, nlevels=4,
xlab=expression(x[1]), ylab=expression(x[2]), lwd=2,xlim=range(x1),ylim=range(x2),frame.plot=TRUE,axes=FALSE,xaxs = "i", yaxs = "i")
axis(1,labels=FALSE,lwd.ticks=2)
axis(2,labels=FALSE,lwd.ticks=2)
abline(v=.7, col=1, lwd=2, lty=2)
text(2, -2, labels=expression(x[1]==0.7))
# Dotted: joint density along the slice x1 = 0.7 (proportional to f(x2 | x1 = 0.7))
f <- function(x1, x2) bivariate.normal(t(cbind(x1, x2)), mu, Sigma)
x.s <- seq(from=min(x1), to=max(x1), by=0.1)
vals <- f(x1=0.7, x2=x.s)
lines(vals - abs(min(x1)), x.s, lty=2, lwd=2)
# Marginal distributions: for a bivariate normal, x_i ~ N(mu[i], Sigma[i,i]).
# See http://mpdc.mae.cornell.edu/Courses/MAE714/biv-normal.pdf
marginal.x1 <- function(x) dnorm(x, mean=mu[1], sd=sqrt(Sigma[1,1]))
marginal.x2 <- function(x) dnorm(x, mean=mu[2], sd=sqrt(Sigma[2,2]))
# Left side solid
vals<-marginal.x2(x.s)
lines(vals-abs(min(x1)),x.s,lty=1,lwd=2)
# Bottom side solid
vals<-marginal.x1(x.s)
lines(x.s,vals-abs(min(x2)),lty=1,lwd=2)
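A small addition of my own, not part of the original answer: the dotted slice above is the joint density along x1 = 0.7, which is only proportional to the conditional. The exact conditional of a bivariate normal is itself normal, so it can be drawn directly:
# f(x2 | x1 = 0.7) is N(mu2 + (Sigma12/Sigma11)*(0.7 - mu1), Sigma22 - Sigma12^2/Sigma11)
cond.mean <- mu[2] + Sigma[1,2] / Sigma[1,1] * (0.7 - mu[1])
cond.var  <- Sigma[2,2] - Sigma[1,2]^2 / Sigma[1,1]
vals <- dnorm(x.s, mean=cond.mean, sd=sqrt(cond.var))
lines(vals - abs(min(x1)), x.s, lty=3, lwd=2)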
My solution in ggplot2, inspired by this post:
rm(list=ls())
options(max.print=999999)
library(pacman)
p_load(tidyverse)
p_load(mvtnorm)
my_mean <- c(25, 65)
mycors <- seq(-1, 1, by=.25)
sd_vec <- c(5, 7)
i <- 3
temp_cor <- matrix(c(1, mycors[i],
                     mycors[i], 1),
                   byrow=TRUE, ncol=2)
V <- sd_vec %*% t(sd_vec) * temp_cor   # covariance matrix
### data for vertical curve
my_dnorm <- function(x, mean=0, sd=1, log=FALSE, new_loc, multplr) {
  # shift the density to new_loc and stretch it by multplr so it can be
  # overlaid in the data's coordinate system
  new_loc + dnorm(x, mean, sd, log) * multplr
}
## marginal Y distribution
yden <- data.frame(y = seq(48, 82, length.out=100),
                   x = my_dnorm(seq(48, 82, length.out=100), my_mean[2], sd_vec[2],
                                new_loc=8, multplr=100))
## conditional distribution of Y given X = givenX
my_int <- my_mean[2] - (V[1,2] * my_mean[1] / V[1,1])   # intercept of E[Y|X]
my_slp <- V[1,2] / V[1,1]                               # slope of E[Y|X]
givenX <- 34
mu_givenX <- my_int + givenX * my_slp
sigma2_givenX <- (1 - mycors[i]^2) * V[2,2]
y_givenX_range <- seq(mu_givenX - 3*sqrt(sigma2_givenX),
                      mu_givenX + 3*sqrt(sigma2_givenX), length.out=100)
# yden_x <- data.frame(y=y_givenX_range, x=my_dnorm(y_givenX_range, mu_givenX,
#   sqrt(sigma2_givenX), new_loc=givenX, multplr=80))   # superseded by the next line
yden_x <- data.frame(y=y_givenX_range,
                     x=my_dnorm(y_givenX_range, mu_givenX, sqrt(sigma2_givenX),
                                new_loc=8, multplr=80))
### data for drawing ellipse
data.grid <- expand.grid(
  x = seq(my_mean[1] - 3*sd_vec[1], my_mean[1] + 3*sd_vec[1], length.out=200),
  y = seq(my_mean[2] - 3*sd_vec[2], my_mean[2] + 3*sd_vec[2], length.out=200))
q.samp <- cbind(data.grid, prob = dmvnorm(data.grid, mean=my_mean, sigma=V))
### plot
ggplot(q.samp, aes(x=x, y=y, z=prob)) +
  geom_contour() + theme_bw() +
  geom_abline(intercept=my_int, slope=my_slp, color="red", linetype="dashed") +
  stat_function(fun=my_dnorm, n=101,
                args=list(mean=my_mean[1], sd=sd_vec[1], new_loc=35, multplr=100),
                color=1) +
  geom_path(aes(x=x, y=y), data=yden, inherit.aes=FALSE) +
  geom_path(aes(x=x, y=y), data=yden_x, inherit.aes=FALSE, color=1, linetype="dashed") +
  geom_vline(xintercept=givenX, linetype="dashed")
Created on 2020-10-31 by the reprex package (v0.3.0)

Adding confidence intervals to plot from simulation data in R

I've created a probit simulation based on a likelihood function; all of it can be replicated with the code below.
This is the likelihood function:
probit.ll <- function(par, ytilde, x) {
  a <- par[1]
  b <- par[2]
  return( -sum( pnorm(ytilde*(a + b*x), log=TRUE) ))
}
This is the function to do the estimates:
my.probit <- function(y, x) {
  # use OLS to get start values
  par <- lm(y ~ x)$coefficients
  ytilde <- 2*y - 1
  # Run optim
  res <- optim(par, probit.ll, hessian=TRUE, ytilde=ytilde, x=x)
  # Return point estimates and SE based on the inverse of the Hessian
  names(res$par) <- c('a','b')
  se <- sqrt(diag(solve(res$hessian)))
  names(se) <- c('a','b')
  return(list(par=res$par, se=se, cov=solve(res$hessian)))
}
And this is the function to generate the simulated model:
probit.data <- function(N=100, a=1, b=1) {
  x <- rnorm(N)
  y.star <- a + b*x + rnorm(N)
  y <- (y.star > 0)
  return( as.data.frame(cbind(y, x, y.star)) )
}
This simulates a dataset of size N = 100 (with a = 2):
probit.data100 <- function(N=100, a=2, b=1) {
  x <- rnorm(N)
  y.star <- a + b*x + rnorm(N)
  y <- (y.star > 0)
  return( as.data.frame(cbind(y, x, y.star)) )
}
# delta-method standard error of the predicted probability
se.probit.phat100 <- function(x, par, V) {
  z <- par[1] + par[2] * x
  # Derivative of q = pnorm(z) w.r.t. alpha and beta is (dnorm(z), dnorm(z)*x)
  J <- c( dnorm(z), dnorm(z)*x )
  return( sqrt(t(J) %*% V %*% J) )
}
dat100 <- probit.data100()
res100 <- my.probit(dat100$y,dat100$x)
res100
The code below calculates the confidence intervals based on a non-parametric bootstrap (note the use of sample()):
N <- dim(probit.data(N=100, a=1, b=1))[1]
npb.par <- matrix(NA, 100, 2)
colnames(npb.par) <- c("alpha","beta")
npb.eystar <- matrix(NA, 100, N)   # allocated here but not used below
for (t in 1:100) {
  thisdta <- probit.data(N=100, a=1, b=1)[sample(1:N, N, replace=TRUE),]
  npb.par[t,] <- my.probit(thisdta$y, thisdta$x)$par
}
The function below just cleans up the bootstrap output; the confidence intervals it reports are what I would like to plot:
processres <- function(simres) {
  z <- t(apply(simres, 2, function(x) { c(mean(x), median(x), sd(x), quantile(x, c(0.05, 0.95))) } ))
  rownames(z) <- colnames(simres)
  colnames(z) <- c("mean","median","sd","5%","95%")
  z
}
processres(npb.par)
I would like to plot a graph like this (the one below), but add confidence intervals based on the processres function above. How can these confidence intervals be added to the plot?
x <- seq(-5,5,length=100)
plot(x, pnorm(1 - 0.5*x), ty='l', lwd=2, bty='n', xlab='x', ylab="Pr(y=1)")
rug(dat100$x)
I'm also open to a different plot code and/or package. I just want a graph based on this simulation with added confidence intervals.
Thanks!
Here's a way to add a shaded CI based on simulation results:
UPDATE: this now plots the expected curve (i.e. using mean alpha & beta values), and correctly passes these means to rnorm.
x <- seq(-5, 5, length=100)
plot(x, pnorm(1 - 0.5*x), ty='n', lwd=2, bty='n', xlab='x', ylab="Pr(y=1)",
     xaxs='i', ylim=c(0, 1))
params <- processres(npb.par)
sims <- 100000
sim.mat <- matrix(NA, ncol=length(x), nrow=sims)
for (i in 1:sims) {
  alpha <- rnorm(1, params[1, 1], params[1, 3])
  beta <- rnorm(1, params[2, 1], params[2, 3])
  sim.mat[i, ] <- pnorm(alpha - beta*x)
}
CI <- apply(sim.mat, 2, function(x) quantile(x, c(0.05, 0.95)))
polygon(c(x, rev(x)), c(CI[1, ], rev(CI[2, ])), col='gray', border=NA)
lines(x, pnorm(params[1, 1] - params[2, 1]*x), lwd=2)
rug(dat100$x)
box()
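A sketch of my own, not from the answer above: the question already defines se.probit.phat100, and my.probit returns the parameter covariance matrix, so an analytic pointwise 95% band can also be drawn with the delta method (this assumes the fitted model Pr(y=1) = pnorm(a + b*x), as in probit.data):
x.grid <- seq(-5, 5, length=100)
phat <- pnorm(res100$par[1] + res100$par[2] * x.grid)   # fitted probabilities
se <- sapply(x.grid, se.probit.phat100, par=res100$par, V=res100$cov)
plot(x.grid, phat, type='l', lwd=2, bty='n', xlab='x', ylab="Pr(y=1)")
lines(x.grid, pmin(phat + 1.96*se, 1), lty=2)   # upper band, clipped at 1
lines(x.grid, pmax(phat - 1.96*se, 0), lty=2)   # lower band, clipped at 0
rug(dat100$x)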
