Continuous quantiles of a scatterplot - r

I have a data set for which I graphed a regression (using ggplot2's stat_smooth):
ggplot(data = mydf, aes(x=time, y=pdm)) + geom_point() + stat_smooth(col="red")
I'd also like to have the quantiles (if it's simpler, having only the quartiles will do) using the same method. All I manage to get is the following:
ggplot(data = mydf, aes(x=time, y=pdm, z=surface)) + geom_point() + stat_smooth(col="red") + stat_quantile(quantiles = c(0.25,0.75))
Unfortunately, I can't put method="loess" in stat_quantile(), which, if I'm not mistaken, would solve my problem.
(In case it's not clear, the desired behavior is non-linear regressions for the quantiles, so that the curves for Q25 and Q75 lie below and above (respectively) my red curve, and Q50, if plotted, would coincide with my red curve.)
Thanks

By default, stat_quantile fits straight (linear) quantile-regression lines; with quantiles = c(0.25, 0.75) those are the 25th and 75th percentiles at each x-value. stat_quantile uses the rq function from the quantreg package (implicitly, method = "rq" in the stat_quantile call). As far as I know, rq doesn't do loess regression. However, you can use other flexible formulas for the quantile regression. Here are two examples:
B-Spline:
library(splines)
stat_quantile(formula=y ~ bs(x, df=4), quantiles = c(0.25,0.75))
Second-Order Polynomial:
stat_quantile(formula=y ~ poly(x, 2), quantiles = c(0.25,0.75))
stat_quantile is still using rq, but rq accepts formulas of the type listed above (if you don't supply a formula, stat_quantile implicitly uses formula = y ~ x). If you use the same formula in geom_smooth (with method = "lm") as in stat_quantile, the quantiles and the mean expectation will be estimated with consistent regression methods.
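Putting it together on the original data, a minimal sketch (assuming the mydf data frame with time and pdm columns from the question, and the quantreg package installed for stat_quantile):
library(ggplot2)
library(splines)
# spline quantile regression for Q25/Q75, plus a matching spline fit for the mean
ggplot(data = mydf, aes(x = time, y = pdm)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ bs(x, df = 4), col = "red") +
  stat_quantile(formula = y ~ bs(x, df = 4), quantiles = c(0.25, 0.75))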

Related

Adding fixed effects regression line to ggplot

I am plotting panel data using ggplot and I want to add the regression line for my fixed effects model "fixed" to the plot. This is the current code:
# Fixed Effects Model in plm
fixed <- plm(progenyMean ~ damMean, data=finalDT, model= "within", index = c("sireID", "cropNum"))
# Plotting Function
plotFunction <- function(Data){
  ggplot(Data, aes(x=damMean, y=progenyMean)) +
    geom_point() +
    geom_smooth(method = "lm", se = T, formula=fixed)
}
However, the plot doesn't recognise the geom_smooth() and there is no regression line on the plot.
Is it possible to plot a regression line for a fixed effects model here?
OP: please include a reproducible example in your next question so that we can help you better. In this case, I'll answer using the same dataset that is used in Princeton's panel-data tutorial (the URL is in the code below), since I'm not too familiar with the data structure that the plm() function from the plm package expects. I do wish the dataset were hosted somewhere more dependable, but hopefully this example remains illustrative even if the dataset is no longer available.
library(foreign)
library(plm)
library(ggplot2)
library(dplyr)
library(tidyr)
Panel <- read.dta("http://dss.princeton.edu/training/Panel101.dta")
fixed <- plm(y ~ x1, data=Panel, index=c("country", "year"), model="within")
my_lm <- lm(y ~ x1, data=Panel) # including for some reference
Example: Plotting a Simple Linear Regression
Note that I've also fit a standard linear model (my_lm) - this is to show how you can extract its values and plot a line that matches geom_smooth(). Here's an example plot of that data with the line that geom_smooth() fits via lm().
plot <- Panel %>%
  ggplot(aes(x1, y)) + geom_point() + theme_bw() +
  geom_smooth(method="lm", alpha=0.1, color='gray', size=4)
plot
If you want to plot a line to match the linear regression from geom_smooth(), you can use geom_abline() and specify slope= and intercept=. You can see those come directly from our my_lm fit:
> my_lm
Call:
lm(formula = y ~ x1, data = Panel)
Coefficients:
(Intercept) x1
1.524e+09 4.950e+08
Extracting those values from my_lm$coefficients gives us our slope and intercept (the named vector has the intercept in the first position and the slope in the second). You'll see our new blue line runs directly over top of the geom_smooth() line - which is why I made that one so fat :).
plot + geom_abline(
  slope = my_lm$coefficients[2],
  intercept = my_lm$coefficients[1], color='blue')
Plotting line from plm()
The same strategy can be used to plot the line from your model fit with plm(). Here, it's simpler, since the "within" model from plm() reports no overall intercept, so we can treat it as 0:
> fixed
Model Formula: y ~ x1
Coefficients:
x1
2475617827
Well, then it's pretty easy to plot in the same way:
plot + geom_abline(slope=fixed$coefficients, color='red')
In your case, I'd try this:
ggplot(Data, aes(x=damMean, y=progenyMean)) +
  geom_point() +
  geom_abline(slope=fixed$coefficients)
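Equivalently, a small sketch using the generic coef() accessor instead of reaching into the fitted object (assuming the same fixed model, whose single coefficient is named damMean):
ggplot(Data, aes(x = damMean, y = progenyMean)) +
  geom_point() +
  # coef(fixed) returns the named slope vector of the "within" fit
  geom_abline(slope = coef(fixed)[["damMean"]], intercept = 0)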

Draw fitted Exgaussian density curve in ggplot2

I have a set of estimated parameters for an Ex-gaussian curve (i.e. mu, sigma, tau).
Currently I'm creating a visualization of that distribution by simulating data based on those parameters and plotting them in ggplot.
I would rather create a visualization that is effectively a smooth fitted ex-Gaussian curve - i.e. the estimated curve for data with the parameters I've estimated. The goal is that curves with the same parameters should not appear different from one run to the next.
Here is the current simulation approach I'm utilizing:
library(retimes)
library(ggplot2)
g <- rexgauss(1000, mu = 1, sigma = 1, tau = 1)
g <- as.data.frame(g); colnames(g) <- "obs"
ggplot(g) + geom_density(aes(x = obs), size = 1, alpha = .4)
You can use stat_function from ggplot2. It takes a function in fun, and parameters to pass to that function in args. It works well for situations like this where you want to compare a simulation to a calculated distribution, because the x values you supply to aes will be the ones automatically used in showing the function, without you having to do any work to match them up or calculate the range of x values in your simulation.
Here's an example with retimes::rexgauss. I also simplified your data frame creation, and put the parameters in a vector so you can use them in both the simulation and the calculated function.
My laptop is too slow to do all 1000 observations, so yours is probably smoother and closer to the calculated distribution than mine.
library(ggplot2)
exgauss_params <- c(mu = 1, sigma = 1, tau = 1)
# pass the named parameters individually (a bare vector passed positionally would
# all go to mu), so mu, sigma, and tau are each set as intended
exgauss_sim <- data.frame(
  obs = do.call(retimes::rexgauss, c(list(n = 100), as.list(exgauss_params)))
)
ggplot(exgauss_sim, aes(x = obs)) +
  geom_density(aes(color = "simulated")) +
  stat_function(aes(color = "calculated"),
                fun = retimes::dexgauss, args = as.list(exgauss_params))
Created on 2018-05-18 by the reprex package (v0.2.0).
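If you only want the smooth theoretical curve without any simulation, a minimal sketch along the same lines (reusing exgauss_params; the 0-10 range for x is an arbitrary choice):
# plot just the calculated ex-Gaussian density over a chosen x range
ggplot(data.frame(x = c(0, 10)), aes(x = x)) +
  stat_function(fun = retimes::dexgauss, args = as.list(exgauss_params))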

Plot standard error bars from mixed model using ggplot2

I want to plot (using ggplot2) linear mixed effects model (lmer function from lme4) together with error bars representing standard errors. Here is the model:
m1 <- lmer(repinterv ~ prevoutc * outcome * prevtask + (1|id), p1)
Repinterv is a continuous dependent variable, while three factors are binary, within-subjects. Each line of the data frame is a single experimental trial.
While I have a working line to compute a fitted value for each effect and interaction, I'm really struggling with the error bars.
p1$fit = model.matrix(m1) %*% fixef(m1) # fits
p1$fitse = model.matrix(m1) %*% coef(summary(m1))[,2] # standard errors
The first line here calculates the fitted values for each level of the model. I tried the same approach for the standard errors from the model's summary, but the problem is that while the fixed effects are presented as differences relative to the intercept, the SEs are the actual values (as I understand it). If I use this method, I get summed standard errors for each fit instead of the actual values from coef(summary(m1)).
ggplot(p1, aes(x = outcome, y = fit, fill = prevoutc)) + # grouped bar plot
  facet_wrap(~ prevtask, labeller = gridlab) +
  stat_summary(fun.y = mean, geom = "bar", position = position_dodge(0.9)) +
  geom_errorbar(aes(ymin = fit - fitse, ymax = fit + fitse), width = 0.1, size = 0.5, position = position_dodge(0.9))
Can you please advise whether I should use some other operator or a different method to obtain the SEs for this model?
Edit:
Below are the coefficients of my model. I want to plot estimates and corresponding standard errors.
Estimate Std. Error t value
(Intercept) 335.69881 16.190304 20.734558
prevoutc1 10.74602 7.143445 1.504318
outcome1 37.36665 8.471898 4.410659
prevtask1 12.92135 7.330930 1.762580
prevoutc1:outcome1 -14.39956 9.338283 -1.541992
prevoutc1:prevtask1 17.37322 10.491121 1.655993
outcome1:prevtask1 -29.37134 9.957079 -2.949795
prevoutc1:outcome1:prevtask1 14.75692 13.539756 1.089896
And that's the plot I currently have:
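A minimal sketch of one common way to get those standard errors (assuming the m1 and p1 objects above): the SE of each fitted value X*b comes from the covariance matrix of the fixed effects, sqrt(diag(X V X')), rather than from multiplying the design matrix by the per-coefficient SEs.
# sketch: standard errors of the fixed-effects fitted values
X <- model.matrix(m1)                      # fixed-effects design matrix
V <- as.matrix(vcov(m1))                   # covariance matrix of the fixed-effect estimates
p1$fit   <- as.vector(X %*% fixef(m1))     # fitted values (fixed effects only)
p1$fitse <- sqrt(rowSums((X %*% V) * X))   # row-wise sqrt(diag(X %*% V %*% t(X)))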

Different slope in 'regression' between ggplot (using geom_smooth(method = "lm")) and the lm function

I am using a data-set (Panel).
With this data-set I conduct the following:
1)
ols <- lm(CapNormChange ~ Policychanges, data=Panel)
summary(ols)
plot(Panel$CapNormChange, Panel$Policychanges,
     pch=19, xlab="CapNormChange", ylab="Policychanges")
abline(lm(Panel$CapNormChange~Panel$Policychanges), lwd=3, col="blue")
and 2)
p2 <- ggplot(data = Panel, mapping = aes(x = CapNormChange, y = Policychanges))
p2 + geom_point(alpha=0.3) + geom_smooth(method = "lm", se=F, color="orange")
I thought that the slopes of the lines from geom_smooth and from abline in the first plot would be the same, and would also correspond to the coefficient on Policychanges in the OLS regression.
However, this is not the case! Instead, the ggplot has a higher intercept (I tested it on a different dataset). I really don't understand this; could somebody please give some advice?
In 1) you use CapNormChange as the y-variable and Policychanges as the x-variable (in a formula it's always y ~ x); this doesn't match what you do in the plot command. In 2) you do it the other way around.
OLS regression assumes that only y-values have associated errors. Thus, swapping x and y changes the fit. If you want the same results from both, you'd need orthogonal regression.
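To make the two plots agree, a minimal sketch (assuming the same Panel data and putting Policychanges on the y-axis in both, to match the formula y ~ x):
# fit with Policychanges as the response, matching the plotted axes
ols2 <- lm(Policychanges ~ CapNormChange, data = Panel)
plot(Panel$CapNormChange, Panel$Policychanges,
     pch = 19, xlab = "CapNormChange", ylab = "Policychanges")
abline(ols2, lwd = 3, col = "blue")
ggplot(Panel, aes(x = CapNormChange, y = Policychanges)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", se = FALSE, color = "orange")
Both lines now come from the same regression of Policychanges on CapNormChange.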

Getting vectors out of ggplot2

I am trying to show that there is a weird "bump" in some data I am analysing (it is to do with market share). My code is here:
qplot(Share, Rate, data = Dataset3, geom=c("point", "smooth"))
(I appreciate that this is not very useful code without the dataset).
Is there any way that I can get the numeric vector used to generate the smoothed line out of R? I just need that layer so I can try to fit a model to the smoothed data.
Any help gratefully received.
Yes, there is. ggplot uses the function loess as the default smoother in geom_smooth. This means you can use loess directly to estimate your smoothing parameters.
Here is an example, adapted from ?loess:
qplot(speed, dist, data=cars, geom="smooth")
Use loess to estimate the smoothed data, and predict at the desired values:
cars.lo <- loess(dist ~ speed, cars)
pc <- predict(cars.lo, data.frame(speed = seq(4, 25, 1)), se = TRUE)
The estimates are now in pc$fit and the standard errors in pc$se.fit. The following bit of code extracts the fitted values into a data.frame and then plots them using ggplot:
pc_df <- data.frame(
  x = 4:25,
  fit = pc$fit)
ggplot(pc_df, aes(x=x, y=fit)) + geom_line()
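Alternatively, if you'd rather pull the smoothed values straight out of the plot ggplot has already drawn, ggplot_build() exposes the computed data for every layer. A sketch using the cars example above (the smooth is the second layer in this qplot call):
p <- qplot(speed, dist, data = cars, geom = c("point", "smooth"))
smooth_df <- ggplot_build(p)$data[[2]]   # layer 1 = points, layer 2 = smooth
head(smooth_df[, c("x", "y", "ymin", "ymax")])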
