Different colours in ggplot based on geom_smooth - r

I created a ggplot with a linear geom_smooth. Now I would like the points from geom_point to have a different colour below and above the linear smooth line.
I know I can add the color to the point by doing geom_point(aes(x, y, colour = z)). My problem is how to determine if a point in the plot is below or above the linear line.
Can ggplot2 do this, or do I have to create a new column in the data frame first?
Below is the sample code with geom_smooth but without the different colours above and below the line.
Any help is appreciated.
library(ggplot2)
df <- data.frame(x = rnorm(100),
y = rnorm(100))
ggplot(df, aes(x,y)) +
geom_point() +
geom_smooth(method = "lm")

I believe ggplot2 can't do this for you. As you say, you could create a new variable in df to make the colouring. You can do so, based on the residuals of the linear model.
For example:
library(ggplot2)
set.seed(2015)
df <- data.frame(x = rnorm(100),
y = rnorm(100))
# Fit linear regression
l = lm(y ~ x, data = df)
# Make new group variable based on residuals
df$group = NA
df$group[which(l$residuals >= 0)] = "above"
df$group[which(l$residuals < 0)] = "below"
# Make the plot
ggplot(df, aes(x,y)) +
geom_point(aes(colour = group)) +
geom_smooth(method = "lm")
Note that the colour argument has to be passed to geom_point(), otherwise geom_smooth() will produce a fit to each group separately.
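The same grouping can be written more compactly. The following is a minimal sketch, assuming the line to compare against is the ordinary least-squares fit that geom_smooth(method = "lm") draws; the name fit is only illustrative.

```r
library(ggplot2)

set.seed(2015)
df <- data.frame(x = rnorm(100), y = rnorm(100))

# Fit the same straight line that geom_smooth(method = "lm") draws
fit <- lm(y ~ x, data = df)

# A non-negative residual means the point lies on or above the line
df$group <- ifelse(resid(fit) >= 0, "above", "below")

# colour is mapped only in geom_point(), so the smooth is fitted once
p <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = group)) +
  geom_smooth(method = "lm", formula = y ~ x)
```

Calling print(p) draws the plot. If the smooth were something other than a straight line (for example a loess curve), the residuals would have to come from that model instead.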

Related

ggplot2 geom_qq change theoretical data

I have a set of p-values, i.e. 0 <= pval <= 1.
I want to draw a Q-Q plot using ggplot2.
As in the documentation, the following code will draw a Q-Q plot. However, since my data are p-values, I want the theoretical values to also be probabilities, i.e. 0 <= theoretical value <= 1.
df <- data.frame(y = rt(200, df = 5))
p <- ggplot(df, aes(sample = y))
p + stat_qq() + stat_qq_line()
I am aware of qqplot.pvalues from the gaston package. It does the job, but the plot is not as customizable as the ggplot version.
In the gaston package the theoretical data are plotted as -log10((n:1)/(n + 1)), where n is the number of p-values. How can I pass these values to ggplot as the theoretical data?
Assuming you have some p-values, say from a normal distribution, you could create the plot manually:
library(ggplot2)
data <- data.frame(outcome = rnorm(150))
data$pval <- pnorm(data$outcome)
data <- data[order(data$pval),]
ggplot(data = data, aes(y = pval, x = pnorm(qnorm(ppoints(nrow(data)))))) +
geom_point() +
geom_abline(slope = 1) +
labs(x = 'theoretical p-val', y = 'observed p-val', title = 'qqplot (pval-scale)')
Although I am not sure this plot is sensible to use for conclusions.
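For the -log10 scale mentioned in the question, a similar manual approach should work. This is only a sketch: the p-values are made up (uniform, as under a null hypothesis), and the theoretical values use the -log10((n:1)/(n + 1)) formula quoted in the question rather than anything taken from the gaston source.

```r
library(ggplot2)

set.seed(1)
# Made-up p-values for illustration only
pvals <- runif(200)
n <- length(pvals)

qq <- data.frame(
  # Expected -log10 p-values in increasing order (formula from the question)
  theoretical = -log10((n:1) / (n + 1)),
  # Observed -log10 p-values, sorted so both columns increase together
  observed = sort(-log10(pvals))
)

p <- ggplot(qq, aes(theoretical, observed)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1) +
  labs(x = "expected -log10(p)", y = "observed -log10(p)")
```

Because the result is an ordinary ggplot object, themes, colours and extra layers can be added in the usual way.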

Adding multiple points to a ggplot ecdf plot

I'm trying to generate a ggplot-only CDF plot for some of my data. I would also like to plot an arbitrary number of percentiles as points on top. I have a solution that works for adding a single point to my curve but fails for multiple values.
This works for plotting one percentile value
TestDf <- as.data.frame(rnorm(1000))
names(TestDf) <- c("Values")
percentiles <- c(0.5)
ggplot(data = TestDf, aes(x = Values)) +
stat_ecdf() +
geom_point(aes(x = quantile(TestDf$Values, percentiles),
y = percentiles))
However this fails
TestDf <- as.data.frame(rnorm(1000))
names(TestDf) <- c("Values")
percentiles <- c(0.25,0.5,0.75)
ggplot(data = TestDf, aes(x = Values)) +
stat_ecdf() +
geom_point(aes(x = quantile(TestDf$Values, percentiles),
y = percentiles))
With error
Error: Aesthetics must be either length 1 or the same as the data (1000): x, y
How can I add an arbitrary number of points to a stat_ecdf() plot?
You need to define a new dataset, outside of the aesthetics. aes refers to the original dataframe that you used for making the CDF (in the original ggplot argument).
ggplot(data = TestDf, aes(x = Values)) +
stat_ecdf() +
geom_point(data = data.frame(x=quantile(TestDf$Values, percentiles),
y=percentiles), aes(x=x, y=y))
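A variation on the same idea, as a sketch: build the points in a named data frame first, and read the height from ecdf() so each point sits on the step function. Using quantile(..., type = 1), the inverse of the empirical CDF, is my choice here and not part of the original answer; the names pts, q and ecdf_y are illustrative.

```r
library(ggplot2)

set.seed(42)
TestDf <- data.frame(Values = rnorm(1000))
percentiles <- c(0.1, 0.25, 0.5, 0.75, 0.9)

# One row per percentile; type = 1 is the inverse of the empirical CDF
pts <- data.frame(
  p = percentiles,
  q = unname(quantile(TestDf$Values, percentiles, type = 1))
)
# Height of the empirical CDF at each quantile
pts$ecdf_y <- ecdf(TestDf$Values)(pts$q)

# The points layer gets its own data, so its length need not match TestDf
p <- ggplot(TestDf, aes(x = Values)) +
  stat_ecdf() +
  geom_point(data = pts, aes(x = q, y = ecdf_y), colour = "red")
```

Any number of percentiles can be passed, because the points layer no longer depends on the 1000-row data frame.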

predict x values from simple fitting and annotating it in the plot

I have a very simple question but so far couldn't find an easy solution for it. Let's say I have some data that I want to fit, and I want to show the x-axis value at which y takes a particular value, for example the x value where y = 0. The model is a very simple y ~ x fit, but I don't know how to estimate the x value from it. Anyway,
sample data
library(ggplot2)
library(scales)
df = data.frame(x= sort(10^runif(8,-6,1),decreasing=TRUE), y = seq(-4,4,length.out = 8))
ggplot(df, aes(x = x, y = y)) +
geom_point() +
#geom_smooth(method = "lm", formula = y ~ x, size = 1,linetype="dashed", col="black",se=FALSE, fullrange = TRUE)+
geom_smooth(se=FALSE)+
labs(title = "Made-up data") +
scale_x_log10(breaks = c(1e-6,1e-4,1e-2,1),
labels = trans_format("log10", math_format(10^.x)),limits = c(1e-6,1))+
geom_hline(yintercept=0,linetype="dashed",colour="red",size=0.6)
I would like to convert 1e-10 input to 10^-10 format and annotate it on the plot. As I indicated in the plot.
thanks in advance!
Because geom_smooth() uses R functions to calculate the smooth line, you can obtain the predicted values outside the ggplot() environment. One option is then to use approx() to get a linear approximation of the x-value at which the predicted y-value is 0.
# Fit a loess model
fit <- loess(y ~ x, df)
# Interpolate the x value at which the fitted y equals 0
xval <- approx(x = fit$fitted, y = fit$x, xout = 0)$y
# Add to plot, labelling the point with the estimated x value
ggplot(...) + annotate("text", x = xval, y = 0, label = xval)
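Since the plot uses a log10 x axis and the question asks for a 10^k style label, here is a rough sketch of one way to do both. It swaps in a straight-line fit in log10(x) instead of the loess curve, purely so the crossing point can be solved exactly; fit, log_x0, x0 and lab are illustrative names.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = sort(10^runif(8, -6, 1), decreasing = TRUE),
                 y = seq(-4, 4, length.out = 8))

# Assumption: a straight line in log10(x), not the loess curve from the plot
fit <- lm(y ~ log10(x), data = df)
b <- unname(coef(fit))

# Solve 0 = b[1] + b[2] * log10(x) for log10(x)
log_x0 <- -b[1] / b[2]
x0 <- 10^log_x0

# Plotmath string of the form "10^k"; parse = TRUE renders k as an exponent
lab <- sprintf("10^%.1f", log_x0)

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_x_log10() +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "red") +
  annotate("text", x = x0, y = 0.5, label = lab, parse = TRUE)
```

With scale_x_log10(), geom_smooth() fits on the transformed axis, so the drawn line corresponds to the lm(y ~ log10(x)) fit used for the label.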

Add regression lines from predictive values in ggplot

I've learnt to make this type of plot with base R and to add regression lines predicted from a model.
## Predict values of the model##
p11=predict(model.coh1, data.frame(COH=coh1, espaje=1:4))
p12=predict(model.coh1, data.frame(COH=coh2, espaje=1:4))
p11
1 2 3 4
1.996689 2.419994 2.843298 3.266602
p12
1 2 3 4
1.940247 2.414299 2.888351 3.362403
##PLOT##
plot(espapli~espaje, mydata)
lines(1:4,p11, col="red")
lines(1:4,p12, col="green")
Now, I would like to do something similar using ggplot, is that possible? That is, introducing a regression line for these particular values.
@gennaroTedesco gives an answer using the built-in smoothing method. I'm not sure that follows the OP. You can do this via geom_line:
# example data
set.seed(2125)
x <- rnorm(100)
y <- 1 + 2.5 *x + rnorm(100, sd= 0.5)
lm1 <- lm(y~x)
x2 <- rnorm(100)
p1 <- predict(lm1, data.frame(x= x2), interval= "c")
library(ggplot2)
df <- data.frame(x= x2, yhat= p1[,1], lw= p1[,2], up= p1[,3])
# plot just the fitted points
ggplot(df, aes(x= x, y= yhat)) + geom_line()
# also plot the confidence interval
ggplot(df, aes(x= x, y= yhat)) + geom_line() +
geom_line(aes(x= x, y= up, colour= "red")) +
geom_line(aes(x= x, y= lw, colour= "red")) +
theme(legend.position= "none")
# only the last plot is shown
As a general rule, regression lines can be added to a ggplot with the function geom_smooth; see its documentation for the full set of options. If the values to be fitted are the same ones used in the general aesthetic, then
p <- ggplot(data, aes(x = x, y = y))
p <- p + geom_smooth(method = 'lm')
does the job. Otherwise you need to fully specify the set of data and the model in the geom_smooth aesthetics.
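To mirror the base R plot() plus lines() approach from the question, one option is to put the predictions in their own data frame and draw them with geom_line(). This is a sketch on made-up data; the columns step, grp and resp are hypothetical stand-ins for the variables in the question's model.

```r
library(ggplot2)

set.seed(3)
# Made-up data standing in for mydata; the column names are hypothetical
dat <- data.frame(step = rep(1:4, times = 20),
                  grp  = rep(c("a", "b"), each = 40))
dat$resp <- 1.5 + 0.4 * dat$step + 0.1 * (dat$grp == "b") +
  rnorm(80, sd = 0.3)

model <- lm(resp ~ step * grp, data = dat)

# Grid of values to predict at: every step for each group
newdat <- expand.grid(step = 1:4, grp = c("a", "b"))
newdat$resp <- predict(model, newdata = newdat)

# Raw points from dat, one predicted line per group from newdat
p <- ggplot(dat, aes(step, resp)) +
  geom_point(alpha = 0.4) +
  geom_line(data = newdat, aes(colour = grp))
```

The lines come straight from predict(), so this works for any model with a predict method, not only those geom_smooth() can fit itself.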

Statistical Model Representation with ggplot2

I will ask my question with a case study and then make it more general.
Let's first import some libraries and create some data:
require(visreg)
require(ggplot2)
y = c(rnorm(40,10,1), rnorm(20,11,1), rnorm(5,12,1))
x=c(rep(1,40), rep(2,20), rep(3,5))
dt=data.frame(x=x, y=y)
and run a linear regression of y on x and graph the data and the model with ggplot2
m1 = lm(y~x, data=dt)
ggplot(dt, aes(x,y)) + geom_point() + geom_smooth(formula = y~x, method="lm", data=dt)
Now I would like to treat my x variable as a nominal variable. So I slightly change my data and run the following model.
y = c(rnorm(40,10,1), rnorm(20,11,1), rnorm(5,12,1))
x=factor(c(rep(1,40), rep(2,20), rep(3,5))) # this line has changed!
dt=data.frame(x=x, y=y)
m2 = lm(y~x, data=dt)
How can I plot this model m2 with ggplot2? And more generally, how can I directly tell ggplot to use the object m2 in order to create a representation of the model?
What I aim to do is the kind of things that can be done using the visreg package
visreg(m2)
So, is there any visreg-like solution for ggplot? something like
ggplot(..,aes(..)) + super_geom_smooth(model = m2)
This is not much different from @rnso's idea. geom_jitter() adds more flavour. I also changed the colour of the median bar. Hope this helps you!
ggplot(data = m2$model, aes(x = x, y = y)) +
geom_boxplot(fill = "gray90") +
geom_jitter() +
theme_bw() +
stat_summary(geom = "crossbar", width = 0.65, fatten = 0, color = "blue",
fun.data = function(x){return(c(y=median(x), ymin=median(x), ymax=median(x)))})
The following boxplot is very similar to your desired graph:
ggplot(dt, aes(x,y))+ geom_boxplot(aes(group=x), alpha=0.5)+ geom_jitter()
Just FYI, visreg can now output a gg object:
visreg(m2, gg=TRUE)
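If the goal is to show what the model m2 itself estimates, rather than boxplot summaries of the data, the fitted values and confidence limits can be computed with predict() and drawn as a separate layer. This is a sketch, not a visreg equivalent; est is an illustrative name, and a seed is added only for reproducibility.

```r
library(ggplot2)

set.seed(7)
dt <- data.frame(x = factor(c(rep(1, 40), rep(2, 20), rep(3, 5))),
                 y = c(rnorm(40, 10, 1), rnorm(20, 11, 1), rnorm(5, 12, 1)))
m2 <- lm(y ~ x, data = dt)

# Predict once per factor level, with 95% confidence limits
est <- data.frame(x = factor(levels(dt$x), levels = levels(dt$x)))
est <- cbind(est, predict(m2, newdata = est, interval = "confidence"))

# Raw data as jittered points, model estimates as point ranges
p <- ggplot(dt, aes(x, y)) +
  geom_jitter(width = 0.1, colour = "grey50") +
  geom_pointrange(data = est, aes(y = fit, ymin = lwr, ymax = upr),
                  colour = "blue")
```

For a model with a single factor predictor, the fitted values are just the group means, so the blue points mark the mean of each level with its confidence interval.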
