Loess regression goes to negative values - r

I'm currently trying to fit a loess regression to my dataset (latitudinal distribution of biomasses). I used the following code:
ggplot(data = test) +
  geom_point(aes(y = log10(value + 1), x = lat, colour = variable), alpha = 0.5) +
  stat_smooth(aes(y = log10(value + 1), x = lat, colour = variable, fill = variable), size = 1, alpha = 0.1) +
  scale_y_continuous("Depth-integrated biomass (mgC.m-2)") +
  scale_x_continuous("Latitude", limits = c(-70, 80), breaks = seq(-70, 80, 10)) +
  coord_flip() +
  theme_bw() +
  theme(legend.background = element_rect(colour = "black"))
The problem is that the regression goes below 0, even though I have no values below 0.
Is there a way to force the regression not to cross 0?
I tried changing the span value; that helps, but parts of the loess curve still go negative. xlim = c(0, X) was no good either, since it cut off the curves.
Thanks.

The loess method assumes an unbounded response, so it can easily go below 0 if you have data near 0. One option is to work on the log scale: fit the model to the log of the y-values, then exponentiate the predicted values for plotting, etc.
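A rough sketch of that idea, using the test, value, and lat column names from the question and assuming the biomass values are strictly positive (if they include zeros, a small offset such as the +1 in the question is still needed):
library(ggplot2)

# Fit the smoother on the log scale, then back-transform the predictions;
# 10^x is always positive, so the fitted curve cannot cross 0.
log_fit <- loess(log10(value) ~ lat, data = test)
pred <- data.frame(lat = seq(min(test$lat), max(test$lat), length.out = 200))
pred$value <- 10^predict(log_fit, newdata = pred)

ggplot(test, aes(x = lat, y = value)) +
  geom_point(alpha = 0.5) +
  geom_line(data = pred) +
  coord_flip()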

Why would you set xlim if you want to restrict the y-values? Either way, xlim and ylim are only used to filter the underlying data, so they won't solve your problem. An alternative way to avoid negative fitted values is to use a different model: a linear regression shouldn't interpolate negative values if all observed values are positive, or maybe something like a logistic regression would be appropriate for your data.
Adding these types of fit is actually pretty easy: just set method = "glm" and supply family = binomial (via method.args in current ggplot2 versions), for example, inside stat_smooth.
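Something along these lines, for instance (a sketch; the 0/1 column presence is hypothetical, since the biomass values in the question are not binary as-is):
library(ggplot2)

# Hypothetical presence/absence response fitted with a logistic curve.
ggplot(test, aes(x = lat, y = presence)) +
  geom_point(alpha = 0.5) +
  stat_smooth(method = "glm", method.args = list(family = "binomial"))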

Related

How to easily show the equation behind ggplot's geom_smooth

Is there any simple command to show the geom_smooth equation of a non-linear relationship? Something as simple as "show.equation()". The equation has to be stored somewhere; I just want to call the equation used by default.
ggplot(dataset, aes(x = variablex, y = variabley)) +
  geom_point() +
  geom_smooth() +
  theme_bw()
If you look at the documentation for geom_smooth and stat_smooth, you can see that it uses stats::loess for small data sets (fewer than 1,000 observations) and mgcv::gam otherwise:
For method = NULL the smoothing method is chosen based on the size of the largest group (across all panels). stats::loess() is used for less than 1,000 observations; otherwise mgcv::gam() is used with formula = y ~ s(x, bs = "cs") and method = "REML". Somewhat anecdotally, loess gives a better appearance, but is O(N^2) in memory, so does not work for larger datasets.
So if you want to use the model implied by the geom_smooth fit, you could just call the underlying method (e.g. stats::loess(variabley ~ variablex, data = dataset)) and then use the predict method to calculate values for new data.
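For example, something like this (a sketch, assuming the data set is small enough that geom_smooth falls back to loess):
# Refit the same kind of smoother geom_smooth uses for small data sets,
# then predict on a grid of new x-values.
fit <- stats::loess(variabley ~ variablex, data = dataset)
grid <- data.frame(variablex = seq(min(dataset$variablex), max(dataset$variablex), length.out = 100))
grid$variabley <- predict(fit, newdata = grid)
head(grid)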

How to plot logistic regression with categorical variables as independent variables

As the dependent variable, I have a data frame column of 0s and 1s (using a certain product or not). As independent variables, I have a set of categorical variables (living in a brick house, etc.). I plot the logistic regression using ggplot:
g <- ggplot(decision, aes(x = decision_point, y = use)) +
  geom_point(alpha = .1, size = 2, col = "red") +
  geom_smooth(method = "glm",
              method.args = list(family = "binomial"),
              aes(x = as.numeric(decision_point)),
              se = FALSE)
What happens is that it plots a straight line. It seems the categorical variable is coerced to numeric (as I wrote it) and the line just runs through those numeric codes.
But if I don't use as.numeric, no line shows at all.
What can I do? The line should be a curve. If the independent variable were an incremental numeric value, like 0-100, plotting a curve would be easy. But these are categorical variables, like "Brick House", "Hut", and "Others". Hence the problem. Thank you in advance.

Set y-axis for glm probit regression in effect plot

I am currently fitting a binomial probit glm in R.
For analysis of interaction effects, I use the effects package. I want to plot different interactions, where one of the interacting variables is held constant at a fixed level. I do this for several values of the variable to see how the effect evolves.
I use the following operation:
plot(effect("varL",hx1,given.values=c("varP"=0.7)))
plot(effect("varL",hx1,given.values=c("varP"=0.1)))
However, to compare the plots graphically, the y-axis should be the same for all of them, which is not the case: when plotting for different varP values, the y-axis range changes.
When specifying ylim, the plot is also incorrect and shows a completely different segment than specified.
I tried what was recommended in this post (Scaling axis in a GLM plot with library "effects"); however, it resulted in an error message:
plot(effect("varL",hx1,given.values=c("varP"=0.7)), ylim = qlogis(c(0, 0.20)))
Error in valid.viewport(x, y, width, height, just, gp, clip, xscale, yscale, :
invalid 'yscale' in viewport
Now my question: how can I set the y-axis for plotting interaction effects with the effects package for a probit glm? I am fairly sure the problem is that ylim takes the values as specified without transforming them onto the link scale; qlogis would presumably work for a logit link, but not for probit.
Below is some code to replicate the issue. You can see that the y-axis "jumps around", which I want to avoid.
install.packages("effects")
require(effects)
varL <- rnorm(100, mean = 1000, sd = 10)
varP <- rnorm(100, mean = 5)
entry <- as.factor(sample(0:1, 100, replace = TRUE))
dat <- data.frame(varL, varP, entry)
hx1 <- glm(entry ~ varL*varP, data = dat, family = binomial(link = "probit"))
plot(effect("varL",hx1,given.values=c("varP"=min(dat$varP))))
plot(effect("varL",hx1,given.values=c("varP"=max(dat$varP))))
The resulting plots show the y-axis "jumping" between the two calls (plots omitted here).
I had a similar problem with logistic regression, actually, and I used the advice from Change the y axis on Effect plot in R.
Basically, all I needed to do was add the argument rescale.axis = FALSE in addition to ylim = c(0, 1).
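Applied to the example above, that would look something like this (a sketch; newer versions of the effects package replace rescale.axis with an axes argument, so the exact spelling depends on your version):
# Fix the probability axis to 0-1 on the response scale so the two
# plots are directly comparable.
plot(effect("varL", hx1, given.values = c("varP" = min(dat$varP))),
     ylim = c(0, 1), rescale.axis = FALSE)
plot(effect("varL", hx1, given.values = c("varP" = max(dat$varP))),
     ylim = c(0, 1), rescale.axis = FALSE)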

how to plot estimates through model in R

I'm trying to use R to do some modelling. I've started with the BodyWeight data set (from the nlme package), since I've seen some examples online, just to understand and get used to the commands.
I've arrived at my final model, with estimates, and I was wondering how to plot them, but I haven't found anything online.
Is there a way to plot the estimated values with a line, and dots for each observation?
Where can I find information about how to do this? Do I have to extract the values myself, or is it possible to just plot the estimates of the model?
I'm only starting with R. Any help is welcome.
Thank you
There is no function that just plots the output of a model, since there are usually many different ways the output could be plotted.
Take a look at the predict function for whatever model type you are using (for example, linear regressions fitted with lm have a predict.lm method).
Then choose a plotting system (you will likely want different panels for different levels of Diet, so use either ggplot2 or lattice). Then see if you can describe more clearly in words how you want the plot to look, and update your question if you get stuck.
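The basic predict pattern looks something like this (a generic sketch on a built-in data set, not the poster's model):
library(ggplot2)

# Fit a simple linear model, predict on a grid of new x-values, and
# overlay the predicted line on the raw data points.
fit <- lm(mpg ~ wt, data = mtcars)
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))
grid$mpg <- predict(fit, newdata = grid)

ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_line(data = grid)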
Now that we've identified which data set you are using, here's a possible plot:
library(nlme)
library(ggplot2)

# Run your model
model <- lme(weight ~ Time + Diet, BodyWeight, ~ 1 | Rat)
summary(model)

# Predict the values.
# predict.lme is a pain because you have to specify which rat
# you are interested in, but we don't want that,
# so predict things manually instead.
times <- seq.int(0, 65, 0.1)
mcf <- model$coefficients$fixed
predicted <-
  mcf["(Intercept)"] +
  rep.int(mcf["Time"] * times, nlevels(BodyWeight$Diet)) +
  rep(c(0, mcf["Diet2"], mcf["Diet3"]), each = length(times))
prediction_data <- data.frame(
  weight = predicted,
  Time = rep.int(times, nlevels(BodyWeight$Diet)),
  Diet = rep(levels(BodyWeight$Diet), each = length(times))
)

# Draw the plot (using ggplot2)
(p <- ggplot(BodyWeight, aes(Time, weight, colour = Diet)) +
  geom_point() +
  geom_line(data = prediction_data)
)
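As an aside (an untested sketch): predict.lme can also return population-level predictions directly with level = 0, which sidesteps the per-rat issue noted above.
# Fixed-effects-only predictions for the same grid of times and diets.
new_data <- expand.grid(Time = times, Diet = levels(BodyWeight$Diet))
new_data$weight <- predict(model, newdata = new_data, level = 0)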

How can I superimpose modified loess lines on a ggplot2 qplot?

Background
Right now, I'm creating a multiple-predictor linear model and generating diagnostic plots to assess regression assumptions. (It's for a multiple regression analysis stats class that I'm loving at the moment :-)
My textbook (Cohen, Cohen, West, and Aiken 2003) recommends plotting each predictor against the residuals to make sure that:
The residuals don't systematically covary with the predictor
The residuals are homoscedastic with respect to each predictor in the model
On point (2), my textbook has this to say:
Some statistical packages allow the analyst to plot lowess fit lines at the mean of the residuals (0-line), 1 standard deviation above the mean, and 1 standard deviation below the mean of the residuals....In the present case {their example}, the two lines {mean + 1sd and mean - 1sd} remain roughly parallel to the lowess {0} line, consistent with the interpretation that the variance of the residuals does not change as a function of X. (p. 131)
How can I modify loess lines?
I know how to generate a scatterplot with a "0-line":
# First, I'll make a simple linear model and get its diagnostic stats
library(ggplot2)
data(cars)
mod <- fortify(lm(speed ~ dist, data = cars))
attach(mod)
str(mod)
# Now I want to make sure the residuals are homoscedastic
qplot(x = dist, y = .resid, data = mod) +
  geom_smooth(se = FALSE)  # se = FALSE removes the standard error bands
But does anyone know how I can use ggplot2 and qplot to generate a plot where the 0-line, "mean + 1 sd", and "mean - 1 sd" lines are all superimposed? Is that a weird/complex question to be asking?
Apology
Folks, I want to apologize for my ignorance. Hadley is absolutely right, and the answer was right in front of me all along. As I suspected, my question was born of statistical, rather than programming, ignorance.
We get the 68% Confidence Interval for Free
geom_smooth() defaults to loess smoothing, and it superimposes the +1 sd and -1 sd lines as part of the deal. That's what Hadley meant when he said "Isn't that just a 68% confidence interval?" I had completely forgotten that's what the 68% interval is, and kept searching for something I already knew how to do. It didn't help that I'd actually turned the confidence intervals off in my code by specifying geom_smooth(se = FALSE).
What my Sample Code Should Have Looked Like
# First, I'll make a simple linear model and get its diagnostic stats.
library(ggplot2)
data(cars)
mod <- fortify(lm(speed ~ dist, data = cars))
attach(mod)
str(mod)
# Now I want to make sure the residuals are homoscedastic.
# geom_smooth defaults to loess; level = 0.68 draws the +/- 1 standard error band.
qplot(x = dist, y = .resid, data = mod) +
  geom_abline(slope = 0, intercept = 0) +
  geom_smooth(level = 0.68)
What I've Learned
Hadley implemented a really beautiful and simple way to get what I'd wanted all along. But because I was focused on loess lines, I lost sight of the fact that the 68% confidence interval was bounded by the very lines I needed. Sorry for the trouble, everyone.
Could you calculate the +/- standard deviation values from the data and add fitted curves for them to the plot?
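Expanding on that comment a little (a rough sketch, reusing mod from the question): fit the loess to the residuals, then draw the same curve shifted up and down by one residual standard deviation.
# Loess fit of the residuals, plus the same curve +/- one residual sd.
smooth_fit <- loess(.resid ~ dist, data = mod)
curve_data <- data.frame(dist = sort(mod$dist))
curve_data$center <- predict(smooth_fit, newdata = curve_data)
s <- sd(mod$.resid)

qplot(x = dist, y = .resid, data = mod) +
  geom_line(data = curve_data, aes(y = center)) +
  geom_line(data = curve_data, aes(y = center + s), linetype = "dashed") +
  geom_line(data = curve_data, aes(y = center - s), linetype = "dashed")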
Have a look at my question "modify lm or loess function.."
I am not sure I followed your question very well, but maybe
+ stat_smooth(method = yourfunction)
will work, provided that you define your function as described here.
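For instance, any fitting function with a formula/data interface and a predict method can be passed in; here is a sketch using MASS::rlm purely as a stand-in for "your function":
library(ggplot2)
library(MASS)

# A robust linear fit in place of the default loess smoother.
qplot(x = dist, y = .resid, data = mod) +
  stat_smooth(method = MASS::rlm, se = FALSE)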
