I will ask my question with a study case and then I'll make my question more general.
Let's first import some libraries and create some data:
require(visreg)
require(ggplot2)
y = c(rnorm(40,10,1), rnorm(20,11,1), rnorm(5,12,1))
x=c(rep(1,40), rep(2,20), rep(3,5))
dt=data.frame(x=x, y=y)
and run a linear regression of y on x and graph the data and the model with ggplot2
m1 = lm(y~x, data=dt)
ggplot(dt, aes(x,y)) + geom_point() + geom_smooth(formula = y~x, method="lm", data=dt)
Now I would like to treat my x variable as a nominal variable, so I slightly change my data and run the following model.
y = c(rnorm(40,10,1), rnorm(20,11,1), rnorm(5,12,1))
x=factor(c(rep(1,40), rep(2,20), rep(3,5))) # this line has changed!
dt=data.frame(x=x, y=y)
m2 = lm(y~x, data=dt)
How can I plot this model m2 with ggplot2? More generally, how can I tell ggplot to use the object m2 directly in order to draw a representation of the model?
What I aim to do is the kind of thing that can be done with the visreg package:
visreg(m2)
So, is there a visreg-like solution for ggplot? Something like:
ggplot(..,aes(..)) + super_geom_smooth(model = m2)
This is not much different from @rnso's idea. geom_jitter() adds more flavour, and I also changed the colour of the median bar. Hope this helps!
ggplot(data = m2$model, aes(x = x, y = y)) +
geom_boxplot(fill = "gray90") +
geom_jitter() +
theme_bw() +
stat_summary(geom = "crossbar", width = 0.65, fatten = 0, color = "blue",
fun.data = function(x){return(c(y=median(x), ymin=median(x), ymax=median(x)))})
The following boxplot is very similar to your desired graph:
ggplot(dt, aes(x,y))+ geom_boxplot(aes(group=x), alpha=0.5)+ geom_jitter()
Just FYI, visreg can now output a gg object:
visreg(m2, gg=TRUE)
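If you would rather build the plot from the model object yourself, one possible sketch is to ask predict() for the fitted value and confidence bounds at each factor level and draw them with geom_pointrange(). The data below is simulated in the same shape as the question, and the names newdat and pred are only illustrative.

```r
library(ggplot2)

set.seed(42)
# Simulated data in the same shape as the question's example
dt <- data.frame(
  x = factor(c(rep(1, 40), rep(2, 20), rep(3, 5))),
  y = c(rnorm(40, 10, 1), rnorm(20, 11, 1), rnorm(5, 12, 1))
)
m2 <- lm(y ~ x, data = dt)

# One row per factor level, then fitted values with 95% confidence bounds
newdat <- data.frame(x = factor(levels(dt$x), levels = levels(dt$x)))
pred <- cbind(newdat, predict(m2, newdata = newdat, interval = "confidence"))

# Raw points in the background, model estimates (fit, lwr, upr) on top
p <- ggplot(dt, aes(x, y)) +
  geom_jitter(width = 0.1, alpha = 0.4) +
  geom_pointrange(data = pred, aes(y = fit, ymin = lwr, ymax = upr),
                  colour = "blue")
p
```

For a model with a single factor predictor the fitted values are simply the group means, so this is mostly useful as a pattern that carries over to models with more terms.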
I have a set of p-values, i.e. 0 <= pval <= 1.
I want to draw a Q-Q plot of them using ggplot2.
As in the documentation, the following code draws a Q-Q plot. However, since my data are p-values, I want the theoretical values to be probabilities as well, i.e. 0 <= theoretical value <= 1.
df <- data.frame(y = rt(200, df = 5))
p <- ggplot(df, aes(sample = y))
p + stat_qq() + stat_qq_line()
I am aware of qqplot.pvalues from the gaston package. It does the job, but the plot is not as customizable as the ggplot version.
In the gaston package the theoretical values are plotted as -log10((n:1)/(n + 1)), where n is the number of p-values. How can I pass these values to ggplot as the theoretical data?
Assuming you have some p-values, say from a normal distribution, you could create the plot manually:
library(ggplot2)
data <- data.frame(outcome = rnorm(150))
data$pval <- pnorm(data$outcome)
data <- data[order(data$pval),]
ggplot(data = data, aes(y = pval, x = pnorm(qnorm(ppoints(nrow(data)))))) +
geom_point() +
geom_abline(slope = 1) +
labs(x = 'theoretical p-val', y = 'observed p-val', title = 'qqplot (pval-scale)')
Although I am not sure this plot is sensible to use for conclusions.
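To get the -log10 scale mentioned in the question, a minimal sketch is to compute both axes by hand using the -log10((n:1)/(n + 1)) formula quoted above and pass them to geom_point(). The p-values here are simulated with runif() purely for illustration, and the data frame name qq is an assumption.

```r
library(ggplot2)

set.seed(1)
pval <- runif(200)   # simulated p-values, for illustration only
n <- length(pval)

qq <- data.frame(
  expected = -log10((n:1) / (n + 1)),  # theoretical quantiles, ascending
  observed = sort(-log10(pval))        # observed values, ascending
)

p <- ggplot(qq, aes(expected, observed)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0) +
  labs(x = "expected -log10(p)", y = "observed -log10(p)")
p
```

Because the result is an ordinary ggplot object, themes, colours and labels can be customized in the usual way.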
For descriptive plots in RStudio, I would like to fit a regression curve over my spaghetti plot. To create the spaghetti plot I used:
library(lattice)
GCIP <- data_head$GCIP
time_since_on <- data_head$time_since_on
Patient <- data_head$Patient
Eye <-data_head$Eye
xyplot(GCIP~time_since_on, groups = Patient, type='b', data=data_head)
and I've got this plot
Then I wanted to fit a polynomial curve, so I used this code:
plot.new<- plot(time_since_on,GCIP)
lines(lowess(GCIP ~ time_since_on))
This is what I've got:
What I want is to fit a curve like the one I've got in the image 2 but over the spaghetti plot (with the longitudinal data for each subject).
I've tried to use this code:
library(ggplot2)
library(reshape2)
GCIP <- data_head$GCIP
time_since_on <- data_head$time_since_on
Patient.ID <- data_head$Patient.ID
Eye <-data_head$Eye
Visit <-data_head$Visit
Patient<-data_head$Patient
ggplot(data = reprex, aes(x,y)) +
geom_point(alpha=1, size=2) +
aes(colour=Patient.ID) +
geom_text(aes(label=label), size=2, colour='white') +
geom_path(aes(group=Patient.ID))
ggplot(data= reprex, aes(x = time_since_on, y = GCIP)) +
geom_point(size = 2, alpha= 1, aes(color = Patient.ID)) + #colour points by group
geom_path(aes(group = Patient.ID)) + #spaghetti plot
stat_smooth(method = "lm", formula = y ~ x, aes(group = Patient.ID, colour = group)) + #line of best fit by group
ylab("GCIP (volume)") + xlab("time_since_on (months)") +
theme_bw()
But I don't get anything from this.
Could anyone help me please?
Here is an example taken from the internet:
Million Thanks.
Lili
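One way this could be approached in ggplot2, sketched here on simulated data because data_head is not available: draw one line per patient with geom_line(aes(group = Patient)) and add a single overall smoother with geom_smooth(). The sim data frame and its values are hypothetical, and the smoother is loess rather than the lowess() call used above, so the curve will not be identical.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for data_head: 10 patients, 6 visits each (hypothetical values)
sim <- data.frame(
  Patient = factor(rep(1:10, each = 6)),
  time_since_on = rep(c(0, 3, 6, 12, 18, 24), times = 10) + runif(60, 0, 2)
)
patient_effect <- rnorm(10, sd = 3)
sim$GCIP <- 80 - 0.3 * sim$time_since_on +
  patient_effect[as.integer(sim$Patient)] + rnorm(60)

p <- ggplot(sim, aes(time_since_on, GCIP)) +
  geom_line(aes(group = Patient), colour = "grey60") +  # spaghetti: one line per patient
  geom_point(size = 1) +
  geom_smooth(aes(group = 1), method = "loess", formula = y ~ x,
              se = FALSE, colour = "red")               # one curve across all patients
p
```

The key point is that the grouping is set only in the layer that should be split by patient, so the smoother is fitted once to all observations.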
I've learnt to make this type of plot in R and to add regression lines predicted from a model.
## Predict values of the model##
p11=predict(model.coh1, data.frame(COH=coh1, espaje=1:4))
p12=predict(model.coh1, data.frame(COH=coh2, espaje=1:4))
p11
1 2 3 4
1.996689 2.419994 2.843298 3.266602
p12
1 2 3 4
1.940247 2.414299 2.888351 3.362403
##PLOT##
plot(espapli~espaje, mydata)
lines(1:4,p11, col="red")
lines(1:4,p12, col="green")
Now, I would like to do something similar using ggplot, is that possible? That is, introducing a regression line for these particular values.
@gennaroTedesco gives an answer using the built-in smoothing method, but I'm not sure that matches what the OP asked for. You can do this via geom_line:
# example data
set.seed(2125)
x <- rnorm(100)
y <- 1 + 2.5 *x + rnorm(100, sd= 0.5)
lm1 <- lm(y~x)
x2 <- rnorm(100)
p1 <- predict(lm1, data.frame(x= x2), interval= "c")
library(ggplot2)
df <- data.frame(x= x2, yhat= p1[,1], lw= p1[,2], up= p1[,3])
# plot just the fitted points
ggplot(df, aes(x= x, y= yhat)) + geom_line()
# also plot the confidence interval
ggplot(df, aes(x= x, y= yhat)) + geom_line() +
geom_line(aes(x= x, y= up, colour= "red")) +
geom_line(aes(x= x, y= lw, colour= "red")) +
theme(legend.position= "none")
# only the last plot is shown
As a general rule, regression lines can be added to a ggplot with the function geom_smooth; please see its documentation for details. If the values to be fitted are the same ones used in the general aesthetic, then
p <- ggplot(data, aes(x = x, y = y))
p <- p + geom_smooth(method = 'lm')
does the job. Otherwise you need to fully specify the set of data and the model in the geom_smooth aesthetics.
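As a rough illustration of that second case, the sketch below gives geom_smooth() its own data and aes(). The data frames plotted and fit_on and the columns u and v are made up for the example. Note that the formula passed to geom_smooth() is written in terms of the aesthetics x and y, not the column names.

```r
library(ggplot2)

set.seed(1)
# Data shown as points (hypothetical)
plotted <- data.frame(x = runif(50), y = runif(50))

# Separate data used only for the regression line (hypothetical)
fit_on <- data.frame(u = runif(50))
fit_on$v <- 2 * fit_on$u + rnorm(50, sd = 0.1)

p <- ggplot(plotted, aes(x, y)) +
  geom_point() +
  geom_smooth(data = fit_on, aes(x = u, y = v),
              method = "lm", formula = y ~ x)
p
```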
I created a ggplot with a linear geom_smooth. Now I would like the points from geom_point to have different colours below and above the fitted line.
I know I can colour the points with geom_point(aes(x, y, colour = z)). My problem is how to determine whether a point in the plot is below or above the line.
Can ggplot2 do this, or do I have to create a new column in the data frame first?
Below is the sample code with geom_smooth but without the different colours above and below the line.
Any help is appreciated.
library(ggplot2)
df <- data.frame(x = rnorm(100),
y = rnorm(100))
ggplot(df, aes(x,y)) +
geom_point() +
geom_smooth(method = "lm")
I believe ggplot2 can't do this for you. As you say, you could create a new variable in df to make the colouring. You can do so, based on the residuals of the linear model.
For example:
library(ggplot2)
set.seed(2015)
df <- data.frame(x = rnorm(100),
y = rnorm(100))
# Fit linear regression
l = lm(y ~ x, data = df)
# Make new group variable based on residuals
df$group = NA
df$group[which(l$residuals >= 0)] = "above"
df$group[which(l$residuals < 0)] = "below"
# Make the plot
ggplot(df, aes(x,y)) +
geom_point(aes(colour = group)) +
geom_smooth(method = "lm")
Note that the colour argument has to be passed to geom_point(), otherwise geom_smooth() will produce a fit to each group separately.
Result:
I have a dataset that looks a little like this:
a <- data.frame(x=rep(c(1,2,3,5,7,10,15,20), 5),
y=rnorm(40, sd=2) + rep(c(4,3.5,3,2.5,2,1.5,1,0.5), 5))
ggplot(a, aes(x=x,y=y)) + geom_point() +geom_smooth()
I want the same output as that plot, but instead of smooth curve, I just want to take line segments between the mean/sd values for each set of x values. The graph should look similar to the above graph, but jagged, instead of curved.
I tried this, but it fails, even though the x values aren't unique:
ggplot(a, aes(x=x,y=y)) + geom_point() +stat_smooth(aes(group=x, y=y, x=x))
geom_smooth: Only one unique x value each group.Maybe you want aes(group = 1)?
?stat_summary is what you should look at.
Here is an example
# functions to calculate the upper and lower CI bounds
uci <- function(y,.alpha){mean(y) + qnorm(1 - abs(.alpha)/2) * sd(y)}
lci <- function(y,.alpha){mean(y) - qnorm(1 - abs(.alpha)/2) * sd(y)}
ggplot(a, aes(x=x,y=y)) + stat_summary(fun.y = mean, geom = 'line', colour = 'blue') +
stat_summary(fun.y = mean, geom = 'ribbon',fun.ymax = uci, fun.ymin = lci, .alpha = 0.05, alpha = 0.5)
You can also use one of the built-in summary functions, mean_sdl. The code is shown below:
ggplot(a, aes(x=x,y=y)) +
stat_summary(fun.y = 'mean', colour = 'blue', geom = 'line') +
stat_summary(fun.data = 'mean_sdl', geom = 'ribbon', alpha = 0.2)
Using ggplot2 0.9.3.1, the following did the trick for me:
ggplot(a, aes(x=x,y=y)) + geom_point() +
stat_summary(fun.data = 'mean_sdl', mult = 1, geom = 'smooth')
'mean_sdl' is a wrapper around the Hmisc package's function 'smean.sdl', and the mult argument sets how many standard deviations (above and below the mean) are displayed.
For detailed info on the original function:
library('Hmisc')
?smean.sdl
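The answers above were written for older ggplot2 releases. As a sketch for more recent versions (assuming ggplot2 3.3.0 or later, where stat_summary() takes fun instead of fun.y and extra arguments go through fun.args), something like the following should produce the jagged mean line with a mean ± SD ribbon. mean_sd is a hypothetical helper written here so the example does not depend on Hmisc.

```r
library(ggplot2)

set.seed(1)
a <- data.frame(x = rep(c(1, 2, 3, 5, 7, 10, 15, 20), 5),
                y = rnorm(40, sd = 2) + rep(c(4, 3.5, 3, 2.5, 2, 1.5, 1, 0.5), 5))

# Hypothetical helper: mean plus/minus `mult` standard deviations,
# returned in the y / ymin / ymax layout that fun.data expects
mean_sd <- function(y, mult = 1) {
  data.frame(y = mean(y),
             ymin = mean(y) - mult * sd(y),
             ymax = mean(y) + mult * sd(y))
}

p <- ggplot(a, aes(x, y)) +
  geom_point() +
  stat_summary(fun.data = mean_sd, fun.args = list(mult = 1),
               geom = "ribbon", alpha = 0.2) +
  stat_summary(fun = mean, geom = "line", colour = "blue")
p
```

Changing mult in fun.args widens or narrows the ribbon without touching the helper.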
You could try writing a summary function as suggested by Hadley Wickham on the website for ggplot2: http://had.co.nz/ggplot2/stat_summary.html. Applying his suggestion to your code:
p <- qplot(x, y, data=a)
stat_sum_df <- function(fun, geom="crossbar", ...) {
stat_summary(fun.data=fun, colour="blue", geom=geom, width=0.2, ...)
}
p + stat_sum_df("mean_cl_normal", geom = "smooth")
This results in this graphic: