How to find the min/max from lm in R

I'm trying to figure out a way to find the minimum/maximum from a fitted quadratic model. In this case the minimum.
x.lm <- lm(Y ~ X + I(X^2))
Edit: To clarify, I can already find the minimum y through min(predict(x.lm)). How can I translate this to its corresponding x value?

Check this out. The idea is to take the fitted values from the x.lm fit:
#example data
X <- 1:100
Y <- 1:100 + rnorm(n = 100, mean = 0, sd = 4)
x.lm <- lm(Y ~ X + I(X^2))
fits <- x.lm$fitted.values #getting fits, you can take residuals,
# and other parameters too
# I guess you are looking for this.
min.fit = min(fits)
max.fit = max(fits)
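After the follow-up question, to get the row (and so the X value) with the smallest fitted value:
df <- cbind(X, Y, fits)
df <- as.data.frame(df)
index <- which.min(df$fits) # very useful command: position of the minimum
row.in.df <- df[index,] # row.in.df$X is the corresponding x value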
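Note that which.min() can only return one of the observed X values. For a quadratic fit, the stationary point can also be read directly off the coefficients as -b1/(2*b2). The following is a minimal sketch under assumptions: the simulated data (with a minimum near x = 3) and the names x.vertex, is.min and num are illustrative and not part of the original question.

```r
# Illustrative data with a clear minimum near x = 3 (an assumption for this sketch)
set.seed(42)
X <- seq(-5, 10, length.out = 200)
Y <- 2 * (X - 3)^2 + 1 + rnorm(200, sd = 0.5)
x.lm <- lm(Y ~ X + I(X^2))

b <- coef(x.lm)                          # order: (Intercept), X, I(X^2)
x.vertex <- unname(-b[2] / (2 * b[3]))   # stationary point of the fitted parabola
is.min <- unname(b[3] > 0)               # positive curvature means it is a minimum

# Numerical cross-check over the observed range of X
num <- optimize(function(x) predict(x.lm, newdata = data.frame(X = x)),
                interval = range(X))
c(analytic = x.vertex, numeric = num$minimum)
```

If the coefficient on I(X^2) is negative, the same formula gives the maximum instead, and the stationary point may fall outside the observed range of X, in which case the extreme fitted value sits at an endpoint of the data.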

Related

How to predict gam model with random effect in R?

I am trying to predict from a gam model with a random effect in order to produce a 3D surface plot with plot_ly.
Here is my code:
library(mgcv) # provides gam()
x <- runif(100)
y <- runif(100)
z <- x^2 + y + rnorm(100)
r <- rep(1,times=100) # random effect
r[51:100] <- 2 # replace 1 into 2, making two groups
df <- data.frame(x, y, z, r)
gam_fit <- gam(z ~ s(x) + s(y) + s(r,bs="re"), data = df) # fit
#create matrix data for `add_surface` function in `plot_ly`
newx <- seq(0, 1, len=20)
newy <- seq(0, 1, len=30)
newxy <- expand.grid(x = newx, y = newy)
z <- matrix(predict(gam_fit, newdata = newxy), 20, 30) # predict data as matrix
However, the last line results in an error:
Error in model.frame.default(ff, data = newdata, na.action = na.act) :
variable lengths differ (found for 'r')
In addition: Warning message:
In predict.gam(gam_fit, newdata = newxy) :
not all required variables have been supplied in newdata!
Thanks to a previous answer, I know that the code above works without the random effect.
How can I predict gam models with random effect?
Assuming you want the surface conditional upon the random effects (but not for a specific level of the random effect), there are two ways.
The first is to provide a level for the random effect but exclude that term from the predicted values using the exclude argument to predict.gam(). The second is to again use exclude but this time to not provide any data for the random effect and instead stop predict.gam() from checking the newdata using the argument newdata.guaranteed = TRUE.
Option 1:
newxy1 <- with(df, expand.grid(x = newx, y = newy, r = 2))
z1 <- predict(gam_fit, newdata = newxy1, exclude = 's(r)')
z1 <- matrix(z1, 20, 30)
Option 2:
z2 <- predict(gam_fit, newdata = newxy, exclude = 's(r)',
newdata.guaranteed=TRUE)
z2 <- matrix(z2, 20, 30)
These produce the same result:
> all.equal(z1, z2)
[1] TRUE
A couple of notes:
Which you use will depend on how complex the rest of your model is. I would generally use the first option as it provides an extra check against me doing something stupid when creating the data. But in this instance, with a simple model and set of covariates, it seems safe enough to trust that newdata is OK.
Your example uses a random slope (was that intended?), not a random intercept as r is not a factor. If your real example uses a factor random effect then you'll need to be a little more careful when creating the newdata as you need to get the levels of the factor right. For example:
expand.grid(x = newx, y = newy,
r = with(df, factor(2, levels = levels(r))))
should get the right set-up for a factor r
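For completeness, here is a hedged sketch of Option 1 when r really is a factor. It assumes the mgcv package is installed; the names z_mat, newxy_alt and z_alt are illustrative. It also checks that, once s(r) is excluded, the level supplied for r does not affect the predictions.

```r
library(mgcv)  # assumed to be installed

set.seed(1)
df <- data.frame(x = runif(100), y = runif(100),
                 r = factor(rep(1:2, each = 50)))   # r is a factor here
df$z <- df$x^2 + df$y + rnorm(100)
gam_fit <- gam(z ~ s(x) + s(y) + s(r, bs = "re"), data = df)

newx <- seq(0, 1, length.out = 20)
newy <- seq(0, 1, length.out = 30)
newxy <- expand.grid(x = newx, y = newy,
                     r = factor(2, levels = levels(df$r)))

# Predict with the random-effect term zeroed out, then reshape for a surface
z_mat <- matrix(predict(gam_fit, newdata = newxy, exclude = "s(r)"), 20, 30)

# Same grid but with the other level of r: predictions should be identical
newxy_alt <- newxy
newxy_alt$r <- factor(rep(1, nrow(newxy)), levels = levels(df$r))
z_alt <- predict(gam_fit, newdata = newxy_alt, exclude = "s(r)")
all.equal(as.vector(z_mat), as.vector(z_alt))
```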

Simulating conditional distribution in R

I have a question about simulating a conditional distribution.
Suppose
X ~ N(0,1)
Y | X ~ N(rX, 1-r^2)
I want to simulate the distribution of Y conditional on X.
Here r is the correlation, and it can be changed as needed.
The code for the X distribution would be as follows:
sd.x <- 1
mean.x <- 0
z2 <- rnorm(1000)
x <- sd.x*z2 + mean.x
But I have no idea how to simulate the Y distribution.
I'd appreciate any help.
It seems you are in the case of a linear regression.
You can write Y = rX + epsilon, where epsilon follows N(0, 1-r^2), i.e. it has standard deviation sqrt(1-r^2).
You can check that Y has the properties you are looking for.
So, in R, to complete your code, something like this should be enough:
r <- 0.8
y <- r*x + rnorm(1000, mean = 0, sd = sqrt(1-r^2)) # rnorm() takes the standard deviation, not the variance
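Either use the mvrnorm function from the MASS package, like this (marginally, Var(Y) = r^2 + (1-r^2) = 1, so the joint covariance matrix has ones on the diagonal and r off it):
library(MASS)
sample <- mvrnorm(1000, mu=c(0,0), Sigma=matrix(c(1, r, r, 1), 2, 2))
Or, as a more general approach, simulate X and then simulate Y for each value of X:
sample <- data.frame(X = rnorm(1000))
sample$Y <- sapply(sample$X, function(x){
rnorm(1, r*x, sqrt(1-r^2)) # sd is the square root of the conditional variance
})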
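A quick empirical check can confirm that a simulation has the intended properties. The sketch below assumes the target is Y | X ~ N(rX, 1 - r^2) as stated in the question, so the conditional standard deviation is sqrt(1 - r^2); the sample size and seed are arbitrary choices.

```r
set.seed(123)
n <- 1e5
r <- 0.8
x <- rnorm(n)                                     # X ~ N(0, 1)
y <- rnorm(n, mean = r * x, sd = sqrt(1 - r^2))   # Y | X ~ N(rX, 1 - r^2)

# Sample correlation, marginal variance of Y, and variance of Y around rX
c(cor = cor(x, y), var_y = var(y), resid_var = var(y - r * x))
```

Up to sampling noise, the correlation should be close to r, the marginal variance of Y close to 1, and the residual variance close to 1 - r^2.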

R language, nonlinear model formula predict

I fit an exponential formula to a set of data (x, y). Then I want to calculate the y values from the formula for x values beyond the actual data set. It doesn't work; it always prints the y values for the actual x values. Here is the code. What have I done wrong, and how can I do this in R?
data <- data.frame(x=seq(1,69), y=othertable[1:69, 2])
nlsxypw <- nls(data$y ~ a*data$x^b, col2_60, start=list(a=2200000, b=0))
predict(nlsxypw)
#here I want to calculate the y values for x = 70-80
xnew <- seq(70, 80, 1)
predict(nlsxypw, xnew)
#it doesn't print these values, still the actual values for x=1~69.
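This is a quirk of predict.nls (and possibly of other predict methods as well): you have to supply the new data under the same variable names that the model formula uses. That means fitting with bare column names plus a data argument, rather than data$x and data$y, and then passing newdata as a named list or data frame: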
set.seed(123)
Data <- data.frame(
x = 1:69,
y = ((1:69)**2)+rnorm(69,0,5))
nlsxypw <- nls(y ~ a*(x^b),
data=Data,
start=list(a=2.5, b=1))
##
xnew <- 70:80
## note how newdata is specified
y.pred <- predict(nlsxypw, newdata=list(x=xnew))
> y.pred
[1] 4900.355 5041.359 5184.364 5329.368 5476.373 5625.377 5776.381 5929.386 6084.390 6241.393 6400.397
##
with(
Data,
plot(x,y,pch=20,
xlim=c(0,90),
ylim=c(0,6700)))
lines(Data$x, fitted(nlsxypw), col="red")
points(
x=xnew,
y=y.pred,
pch=20,
col="blue")
##

Adding error variance to output of predict()

I am attempting to take a linear model fitted to empirical data, e.g.:
set.seed(1)
x <- seq(from = 0, to = 1, by = .01)
y <- x + .25*rnorm(101)
model <- (lm(y ~ x))
summary(model)
# R^2 is .6208
Now, what I would like to do is use the predict function (or something similar) to create, from x, a vector y of predicted values that shares the error of the original relationship between x and y. Using predict alone gives perfectly fitted values, so R^2 is 1, e.g.:
y2 <- predict(model)
summary(lm(y2 ~ x))
# R^2 is 1
I know that I can use predict(model, se.fit = TRUE) to get the standard errors of the prediction, but I haven't found an option to incorporate those into the prediction itself, nor do I know exactly how to incorporate these standard errors into the predicted values to give the correct amount of error.
Hopefully someone here can point me in the right direction!
How about simulate(model)?
set.seed(1)
x <- seq(from = 0, to = 1, by = .01)
y <- x + .25*rnorm(101)
model <- (lm(y ~ x))
y2 <- predict(model)
y3 <- simulate(model)
matplot(x,cbind(y,y2,y3),pch=1,col=1:3)
If you need to do it by hand you could use
y4 <- rnorm(nobs(model),mean=predict(model),
sd=summary(model)$sigma)
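To see the effect on R^2, one can refit the model to a simulated response. This is a minimal sketch reusing the question's setup; the name y_sim and the seed passed to simulate() are arbitrary choices for illustration.

```r
set.seed(1)
x <- seq(from = 0, to = 1, by = .01)
y <- x + .25 * rnorm(101)
model <- lm(y ~ x)

# One simulated response: fitted values plus noise with the residual sd
y_sim <- simulate(model, nsim = 1, seed = 2)$sim_1

# R^2 of the original fit next to R^2 of a fit to the simulated response
c(original  = summary(model)$r.squared,
  simulated = summary(lm(y_sim ~ x))$r.squared)
```

The simulated R^2 varies from draw to draw, but it is no longer 1. Note that this kind of simulation treats the estimated coefficients as fixed, so it reflects residual noise only and not the uncertainty in the fitted line.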

How to generate random Y at specific X from a linear model in R?

Say we have a linear model f1 that was fit to some x and y data points:
f1 <- lm(y ~ x,data=d)
How can I generate new y values at new x values (that are different from the old x values but are within the range of the old x values) using this f1 fit in R?
stats:::simulate.lm allows you to sample from a linear model fitted with lm. (In contrast to the approach of @Bulat, this uses an unbiased estimate of the residual variance.) To simulate at different values of the independent variable, you could hack around like this:
# simulate example data
x <- runif(20, 0, 100)
y <- 5*x + rnorm(20, 0, 10)
df <- data.frame(x, y)
# fit linear model
mod <- lm(y ~ x, data = df)
# new values of the independent variable
x_new <- 1:100
# replace the fitted values of the model object with predictions for the new data
mod$fitted.values <- predict(mod, data.frame(x=x_new)) # "hack"
# simulate() samples appropriate noise and adds it to the model's `fitted.values`
y_new <- simulate(mod)[, 1] # simulate() can return multiple samples (as columns); we only need one
# visualize original data ...
plot(df)
# ... alongside simulated data at new values of the independent variable (x)
points(x_new, y_new, col="red")
(original data in black, simulated in red)
I am looking at the same problem.
In simple terms, it can be done by sampling from the residuals:
mod <- lm(y ~ x, data = df)
x_new <- c(5) # value that you need to simulate for.
pred <- predict(mod, newdata=data.frame(x = x_new))
err <- sample(mod$residuals, 1)
y <- pred + err
There is a simulate(fit, nsim = 10, XX = x_new) function, that is supposed to do it for you.
You can use predict for this:
x <- runif(20, 0, 100)
y <- 5*x + rnorm(20, 0, 10)
df <- data.frame(x, y)
df
plot(df)
mod <- lm(y ~ x, data = df)
x_new <- 1:100
pred <- predict(mod, newdata=data.frame(x = x_new))
plot(df)
points(x_new, pred)
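predict() on its own returns the fitted mean at each new x, with no randomness. To get random y values, one option is to add normal noise with the model's residual standard deviation to those predictions. This is a sketch under the usual lm assumptions (normal, constant-variance errors); it ignores uncertainty in the estimated coefficients, and the names mu and y_new are illustrative.

```r
set.seed(10)
x <- runif(20, 0, 100)
y <- 5 * x + rnorm(20, 0, 10)
df <- data.frame(x, y)
mod <- lm(y ~ x, data = df)

# New x values inside the range of the observed x
x_new <- seq(min(x), max(x), length.out = 50)

mu    <- predict(mod, newdata = data.frame(x = x_new))      # fitted means
y_new <- rnorm(length(x_new), mean = mu, sd = sigma(mod))   # means plus residual noise

plot(df)
points(x_new, y_new, col = "red")
```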
