Generating iid variates in R

I am working on a question and it reads:
Generate 1000 iid variates of X = (x1, x2) with a specific definition given in an example: x1 follows a standard normal distribution, N(0,1), and x2 equals -x1 when -1 <= x1 <= 1 and equals x1 otherwise.
I have used the following code to generate bivariate random variables in R before, but I do not know how to get X = (x1, x2) here and then plot it.
library(gplots)
library(mvtnorm)  # rmvnorm() comes from mvtnorm, not gplots
n <- 10^6
sigma <- matrix(c(4, 2, 2, 3), ncol = 2)
x <- rmvnorm(n, mean = c(1, 2), sigma = sigma)
h2d <- hist2d(x, show = FALSE, same.scale = TRUE, nbins = c(40, 40))
persp(h2d$x, h2d$y, h2d$counts/n, ticktype = "detailed", theta = 30, phi = 30,
      expand = 0.5, shade = 0.5, col = "cyan", ltheta = -30,
      main = "Bivariate Normal Distribution", zlab = "f(x1,x2)", xlab = "x1", ylab = "x2")
So I know the above is not correct, but I am not sure whether I can do something similar or whether I am going about this all wrong.

It's simpler than you think:
set.seed(1) # Setting a seed
X1 <- rnorm(1000) # Simulating X1
X2 <- ifelse(abs(X1) <= 1, -X1, X1) # If abs(X1) <= 1, then set X2=-X1 and X2=X1 otherwise.
Since the question is about normal marginals rather than a normal bivariate distribution, we can look at a bivariate density estimate:
library(MASS)
image(kde2d(X1,X2))
Clearly the shape is not elliptical, so the bivariate distribution is not normal even though both marginals are normal.
It can also be seen analytically. Let Z = X1 + X2. If (X1, X2) were bivariate normal, then Z would also be normal. But P(Z = 0) >= P(|X1| <= 1) ~= 0.68, i.e., Z has positive mass at zero, which cannot happen for a continuous distribution.
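This point mass is easy to confirm numerically (a quick sketch reusing X1 and X2 from the code above):
mean(X1 + X2 == 0)    # about 0.68: the simulated Z has a point mass at zero
pnorm(1) - pnorm(-1)  # theoretical P(|X1| <= 1), about 0.6827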

You can use the base rnorm function to generate the normal draws, and a simple ifelse call then gives x2, as shown below:
x1 <- rnorm(10, mean = 0, sd = 1)
x2 <- ifelse( ((x1 <= 1) & (x1 >= -1)), -x1, x1)
plot(x1, x2, type='p')
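If you also want the combined object X = (x1, x2) and a surface plot in the spirit of the original hist2d/persp code, something like this sketch works (assuming a larger sample, e.g. 1000 draws, so the 2-D histogram is not mostly empty):
library(gplots)
x1 <- rnorm(1000)
x2 <- ifelse(abs(x1) <= 1, -x1, x1)
X <- cbind(x1, x2)                                # the bivariate sample X = (x1, x2)
h2d <- hist2d(X, show = FALSE, nbins = c(40, 40)) # 2-D histogram of the pairs
persp(h2d$x, h2d$y, h2d$counts / nrow(X), theta = 30, phi = 30,
      xlab = "x1", ylab = "x2", zlab = "relative frequency")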

Related

Quadprog to constrain coefficients of linear regression

I need some help. I need to fit a linear model (y ~ X1) with the following constraints:
Intercept + Beta1 <= 1; and both Intercept and Beta1 need to belong to [0,1]
After looking into how to do this in R, I found that quadprog seems like the best option. However, I do not fully understand how to set those constraints. I took an example from a webpage (http://zoonek.free.fr/blosxom/R/2012-06-01_Optimization.html) and adapted it to my needs, but the sum of the coefficients is greater than 1 in some cases (by the way, I have to run this model on different data frames, and in some the constraint holds, but in others Intercept + Beta1 > 1).
My script is:
library(quadprog)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)  # x2 and x3 are used below but were not defined in the snippet; placeholders so it runs
x3 <- rnorm(n)
y <- .3 * x1 + .2 * x2 + .5 * x3 + rnorm(n)
X <- cbind(1, x1)
Mod <- solve.QP(t(X) %*% X, t(y) %*% X,
                cbind(matrix(1, nrow = 2, ncol = 1),
                      diag(2),
                      -diag(2)),
                c(1, 0, 0, -1, -1),
                meq = 1)
As I said before, my problem is that in some cases Intercept + Beta1 is greater than 1, and I do not understand how to set that constraint in quadprog with that matrix form.
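For what it's worth, one way to write these constraints for solve.QP (a sketch, not from the original post) is to remember that solve.QP enforces t(Amat) %*% coef >= bvec, so the upper bound Intercept + Beta1 <= 1 must be rewritten as -(Intercept + Beta1) >= -1, and the [0,1] bounds become four more columns:
library(quadprog)
# Columns of Amat, read as t(Amat) %*% c(Intercept, Beta1) >= bvec:
#   -(Intercept + Beta1) >= -1      i.e. Intercept + Beta1 <= 1
#   Intercept >= 0,  Beta1 >= 0
#   -Intercept >= -1, -Beta1 >= -1  i.e. both <= 1
Amat <- cbind(c(-1, -1), diag(2), -diag(2))
bvec <- c(-1, 0, 0, -1, -1)
Mod <- solve.QP(t(X) %*% X, t(y) %*% X, Amat, bvec, meq = 0)  # no equality constraints
Mod$solution  # constrained estimates of (Intercept, Beta1)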

How to do a one-sample location, two-way approximate Z test in R using estimates from the delta method?

I used the delta method to estimate the difference between two coefficients from a glm fit (code attached below). Now, I want to compare this estimate to zero (i.e., a null hypothesis of no difference). One article mentions using a one-sample location, two-way approximate Z test to test this difference.
However, I cannot find an easy way to do that in R using the delta-method difference. I looked over the two-sample Z test documentation and thought about substituting the difference into the z-statistic formula, but I am not sure that is the best way to go about it.
##GENERATE DATA SET
y <- c(1:12)
x1 <- rep(c(1000, 4000, 0), each = 4)
x2 <- rep(c(0, 1000, 4000), each = 4)
df <- data.frame(y, x1, x2)
##RUN GLM
library(lmerTest)
g1 <- glm(log(y) ~ x1 + x2, data = df)
##Use delta-method to estimate the difference between coefficients of x1 and x2 (Ritz & Streibig 2008)
library(car)
g1.delta <- deltaMethod(g1,"(-x1) - (-x2)")
                Estimate         SE      2.5 %  97.5 %
(-x1) - (-x2) 2.3217e-04 7.3180e-05 8.8738e-05   4e-04
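For reference, once the delta-method estimate and its standard error are available, the approximate two-sided Z test is just the Wald ratio. A sketch using the deltaMethod output above (assuming its columns are named Estimate and SE, as printed):
z <- g1.delta$Estimate / g1.delta$SE   # Wald z statistic
p <- 2 * pnorm(-abs(z))                # two-sided p-value for H0: difference = 0
c(z = z, p.value = p)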

Simulating conditional distribution in R

I have a question about simulating conditional distribution.
Suppose
X ~ N(0,1)
Y | X ~ N(rX, 1 - r^2)
I want to simulate the distribution of Y conditional on X.
The r here is the correlation, and it can be changed as needed.
The code for the X distribution would be as follows:
sd.x <- 1
mean.x <- 0
z2 <- rnorm(1000)
x <- sd.x*z2 + mean.x
But I have no idea how to simulate the Y distribution.
I'd appreciate any help.
It seems you are in the setting of a linear regression.
You can write Y = rX + epsilon, where epsilon follows N(0, 1 - r^2), i.e. its standard deviation is sqrt(1 - r^2).
You can check that Y then has the conditional distribution you are looking for.
So, in R, to complete your code, something like this should be enough:
r <- 0.8
y <- r*x + rnorm(1000, mean = 0, sd = sqrt(1 - r^2))
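A quick check of this construction (a sketch): the marginal variance of y should be close to 1, since Var(Y) = r^2 * Var(X) + (1 - r^2), and the sample correlation with x should be close to r:
var(y)     # close to 1
cor(x, y)  # close to r = 0.8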
Either use the mvrnorm function from the MASS package with the joint covariance matrix implied by the model (both variances 1, covariance r), like this:
library(MASS)
sample <- mvrnorm(1000, mu = c(0, 0), Sigma = matrix(c(1, r, r, 1), 2, 2))
Or, as a more general approach, simulate X and then simulate Y for each value of X:
sample <- data.frame(X = rnorm(1000))
sample$Y <- sapply(sample$X, function(x) {
  rnorm(1, mean = r*x, sd = sqrt(1 - r^2))
})
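Since rnorm is vectorized over its mean argument, the sapply loop can also be collapsed into a single call (a sketch of the same idea):
sample$Y <- rnorm(nrow(sample), mean = r * sample$X, sd = sqrt(1 - r^2))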

Linear regression with constraints on the coefficients

I am trying to perform linear regression, for a model like this:
Y = aX1 + bX2 + c
So, Y ~ X1 + X2
Suppose I have the following response vector:
set.seed(1)
Y <- runif(100, -1.0, 1.0)
And the following matrix of predictors:
X1 <- runif(100, 0.4, 1.0)
X2 <- sample(rep(0:1,each=50))
X <- cbind(X1, X2)
I want to use the following constraints on the coefficients:
a + c >= 0
c >= 0
So no constraint on b.
I know that the glmc package can be used to apply constraints, but I was not able to determine how to apply it for my constraints. I also know that contr.sum can be used so that all coefficients sum to 0, for example, but that is not what I want to do. solve.QP() seems like another possibility, where setting meq=0 can be used so that all coefficients are >=0 (again, not my goal here).
Note: The solution must be able to handle NA values in the response vector Y, for example with:
Y <- runif(100, -1.0, 1.0)
Y[c(2,5,17,56,37,56,34,78)] <- NA
solve.QP can be passed arbitrary linear constraints, so it can certainly be used to model your constraints a+c >= 0 and c >= 0.
First, we can add a column of 1's to X to capture the intercept term, and then we can replicate standard linear regression with solve.QP:
X2 <- cbind(X, 1)
library(quadprog)
solve.QP(t(X2) %*% X2, t(Y) %*% X2, matrix(0, 3, 0), c())$solution
# [1] 0.08614041 0.21433372 -0.13267403
With the sample data from the question, neither constraint is met using standard linear regression.
By modifying both the Amat and bvec parameters, we can add our two constraints:
solve.QP(t(X2) %*% X2, t(Y) %*% X2, cbind(c(1, 0, 1), c(0, 0, 1)), c(0, 0))$solution
# [1] 0.0000000 0.1422207 0.0000000
Subject to these constraints, the squared residuals are minimized by setting the a and c coefficients to both equal 0.
You can handle missing values in Y or X2 just as the lm function does, by removing the offending observations. You might do something like the following as a pre-processing step:
has.missing <- rowSums(is.na(cbind(Y, X2))) > 0
Y <- Y[!has.missing]
X2 <- X2[!has.missing,]

Adding error variance to output of predict()

I am attempting to take a linear model fitted to empirical data, e.g.:
set.seed(1)
x <- seq(from = 0, to = 1, by = .01)
y <- x + .25*rnorm(101)
model <- (lm(y ~ x))
summary(model)
# R^2 is .6208
Now, what I would like to do is use the predict function (or something similar) to create, from x, a vector of predicted values that shares the error of the original relationship between x and y. Using predict alone gives perfectly fitted values, so R^2 is 1, e.g.:
y2 <- predict(model)
summary(lm(y2 ~ x))
# R^2 is 1
I know that I can use predict(model, se.fit = TRUE) to get the standard errors of the prediction, but I haven't found an option to incorporate those into the prediction itself, nor do I know exactly how to incorporate these standard errors into the predicted values to give the correct amount of error.
Hopefully someone here can point me in the right direction!
How about simulate(model)?
set.seed(1)
x <- seq(from = 0, to = 1, by = .01)
y <- x + .25*rnorm(101)
model <- (lm(y ~ x))
y2 <- predict(model)
y3 <- simulate(model)
matplot(x,cbind(y,y2,y3),pch=1,col=1:3)
If you need to do it by hand you could use
y4 <- rnorm(nobs(model), mean = predict(model), sd = summary(model)$sigma)
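As a quick sanity check (a sketch), refitting on a simulated response should give an R^2 near the original 0.62 rather than 1:
summary(lm(y4 ~ x))$r.squared       # roughly 0.6, varies from run to run
summary(lm(y3[[1]] ~ x))$r.squared  # same idea with the simulate() output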
