Only display correlation coefficient in ggplot stat_cor - r

How do you only display the correlation coefficient in ggpubr::stat_cor, and not the p-value? There doesn't seem to be an argument within stat_cor to specify only one statistic or the other. Is there some other creative work-around?

Following Ben's solution, I am including a reproducible example.
First let's use a simple example:
library('ggplot2')
library('ggpubr')
data(iris)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width )) +
geom_point() + theme_bw() +
stat_cor(method = "pearson")
Now, you wanted to display only the correlation coefficient, meaning R.
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width )) +
geom_point() + theme_bw() +
stat_cor(method = "pearson", aes(label = ..r.label..))
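In recent ggplot2 releases the ..name.. notation has been superseded by after_stat(). A minimal sketch of the same plot in that style, assuming ggplot2 3.3.0 or later and a ggpubr version that exposes r.label as a computed variable (as the code above relies on):

```r
library(ggplot2)
library(ggpubr)

# Same plot as above, but the computed variable r.label is referenced
# through after_stat() instead of the older ..r.label.. notation.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() + theme_bw() +
  stat_cor(method = "pearson", aes(label = after_stat(r.label)))
p
```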
You can also calculate the R value independently and add a text box to the plot using annotate (example using the iris data set):
round(cor(iris$Sepal.Length, iris$Sepal.Width),2)
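The annotate step itself is not shown above. A minimal sketch of how it could look, where the label position (x = 4.5, y = 4.4) and the "R = " prefix are arbitrary choices made for illustration:

```r
library(ggplot2)

# Compute the Pearson correlation separately and round it for display.
r_value <- round(cor(iris$Sepal.Length, iris$Sepal.Width), 2)

# Place the value as plain text; the coordinates are illustrative and
# should be adjusted to an empty region of your own plot.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() + theme_bw() +
  annotate("text", x = 4.5, y = 4.4,
           label = paste0("R = ", r_value), hjust = 0)
p
```

This avoids the ggpubr dependency, at the cost of positioning the label by hand.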

Related

adding significance brackets to ridgeline plot

I am creating a ridge plot to compare a few groups (using ggridges package) and would like to add significance brackets to show comparisons between some group levels (using ggsignif package).
But this doesn't seem to work because the computation fails in stat_ggsignif.
Here is a reproducible example:
set.seed(123)
library(ggsignif)
library(ggridges)
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
geom_density_ridges(scale = 1) +
coord_flip() +
geom_signif(comparisons = list(c("setosa", "versicolor")))
#> Picking joint bandwidth of 0.181
#> Warning in f(..., self = self): NAs introduced by coercion
#> Warning: Computation failed in `stat_signif()`:
#> missing value where TRUE/FALSE needed
Created on 2021-07-29 by the reprex package (v2.0.0)
How can I get these two packages to work with each other? Thanks.
I did not manage to combine A) geom_density_ridges and B) geom_signif. The reason is that (A) requires a numerical variable as x and categories as y, while (B) requires a numerical variable as y and categories as x. And I have not managed to override this behaviour.
But I assume that you have chosen ridge_plots over simple boxplots as you are interested in a more informative visualization of the distribution. To do so, there is a much better solution than ridge_plots, the so called violin plots. See below a standard boxplot (with labelled significance):
ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_boxplot() +
geom_signif(comparisons = list(c("setosa", "versicolor")), test = "t.test")
See below a violin plot (with jitter and labelled significance):
ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_violin(trim = FALSE) + geom_jitter() +
geom_signif(comparisons = list(c("setosa", "versicolor")), test = "t.test")
This does the job unless you are particularly interested in making ggridges and ggsignif work together. Please note that a violin plot is just a folded density plot (see https://en.wikipedia.org/wiki/Violin_plot#:~:text=A%20violin%20plot%20is%20a,by%20a%20kernel%20density%20estimator for more details).
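To make the "folded density" remark concrete, here is a rough sketch that builds a violin outline by hand from a kernel density estimate. It only illustrates the idea and is not how geom_violin is implemented; the data frame name violin is made up for this example.

```r
library(ggplot2)

# Kernel density estimate of sepal length for one species.
d <- density(iris$Sepal.Length[iris$Species == "setosa"])

# Mirror the density around zero: left half is -density, right half is
# +density (reversed so the polygon outline closes cleanly).
violin <- data.frame(
  width = c(-d$y, rev(d$y)),
  value = c(d$x, rev(d$x))
)

p <- ggplot(violin, aes(x = width, y = value)) +
  geom_polygon(fill = "grey80", colour = "black") +
  labs(x = "mirrored density", y = "Sepal.Length (setosa)")
p
```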
For the same purpose, see also the sina plot (suggestion by tjebo):
library(ggforce)
ggplot(iris, aes(x = Species, y = Sepal.Length, colour = Species)) +
geom_sina() +
geom_signif(comparisons = list(c("setosa", "versicolor")), test = "t.test")
Thanks to a new pull request to ggsignif, the following now works:
set.seed(123)
library(ggsignif)
library(ggridges)
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
geom_density_ridges(scale = 1) +
coord_flip() +
geom_signif(comparisons = list(c("setosa", "versicolor")),
y_position = 9)
#> Picking joint bandwidth of 0.181
Created on 2021-08-06 by the reprex package (v2.0.1)

Fit an abline to data

I have 50 data points of temperature and humidity that I would like to plot with geom_point, and I would like to add a linear model to my ggplot. However, I am unable to do so. I have tried abline, geom_line, geom_smooth and lm.
temp_humidity_data <- dplyr::select(data, temperature:humidity)
lm(formula = humidity ~ temperature, data = temp_humidity_data)
ggplot(temp_humidity_data) +
geom_point(aes (x = temperature , y = humidity))
geom_smooth()
How can I go about adding an lm to my ggplot? Any help is appreciated, thank you. Also, how could I differentiate the temperature and humidity points by colour on the plot?
As mentioned in the comment section, you missed a + sign after geom_point. Besides that, you are also missing a few arguments in geom_smooth:
library(ggplot2)
ggplot(iris) +
geom_point(aes(x = Petal.Length , y = Petal.Width)) +
geom_smooth(aes(x = Petal.Length, y = Petal.Width),
method = "lm", formula = y ~ x)
You need to supply "aesthetics" for x and y, otherwise you would get the following error:
Error: stat_smooth requires the following missing aesthetics: x, y
method = "lm" tells geom_smooth to fit a linear model, while formula specifies the model formula to plot. If we don't specify the method, geom_smooth chooses one automatically, which is "loess" for data sets with fewer than 1,000 observations (as noted by @Lyngbakr), and prints the message:
geom_smooth() using method = 'loess' and formula 'y ~ x'
Since we have to supply the same aesthetics in both geom_point and geom_smooth, a more convenient way would be to write:
ggplot(iris, aes(x = Petal.Length , y = Petal.Width)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x)
To answer OP's second question of "how could i differentiate the temperature and humidity points by colour as well on the plot?", we can add the color and size aesthetics to geom_point like the following:
ggplot(iris, aes(x = Petal.Length , y = Petal.Width)) +
geom_point(aes(color = Petal.Length, size = Petal.Width)) +
geom_smooth(method = "lm", formula = y ~ x)
To change the range of sizes and colors, we use scale_fill_continuous (or scale_color_continuous for color) and scale_size_continuous:
ggplot(iris, aes(x = Petal.Length , y = Petal.Width)) +
geom_point(aes(fill = Petal.Length, size = Petal.Width), pch = 21) +
geom_smooth(method = "lm", formula = y ~ x) +
scale_fill_continuous(low = "red", high = "blue") +
scale_size_continuous(range = c(1, 10))
Notice that as you increase the size range, some points start to overlap with each other. To make it less confusing, I've used fill instead of color and added pch = 21 ("plot character" of a circle) to wrap around each point. This gives a nice border that separates each point.

How to draw ggplot of lm(log(y)~) and lm(y~x+x^2) in one plot

Using package ggplot2 and iris, I want to plot a scatterplot with the fitted regression lines.
library(ggplot2)
ggplot(data=iris, aes(x = Petal.Width, y = Petal.Length,color=Species)) +
geom_point(shape=1) +
stat_smooth(method = "lm",formula= 'Petal.Length ~ Petal.Width+I(Petal.Width^2)+SaleType+Petal.Width*Species', data=iris,
aes(x = Petal.Width, y = Petal.Length,color=Species))
Warning message:
Computation failed in `stat_smooth()`:
variable lengths differ (found for '(weights)')
I suspect I get this warning because I have two independent variables and stat_smooth does not pick up Species from the split by colour. How can I draw the two lines that match plot(Petal.Width, fitted(fit))? In addition, suppose I have another regression model fitted to the same data, but with log(y): fit <- lm(log(Petal.Length) ~ Petal.Width + Species + Petal.Width*Species, data = iris). Can I draw both regression models on the same graph?
I don't think it is appropriate to combine a transformed regression with a raw value on the same scale. Rather these should be plotted up on different figures. Using the iris dataset you can plot up the raw data like this:
ggplot(data=iris, aes(color=Species)) +
geom_point(aes(x = Petal.Width, y = Sepal.Width)) +
stat_smooth(method = "lm", aes(x = Petal.Width, y = Sepal.Width,color=Species))
Then log transform Sepal.Width into another variable:
iris$LogSepal.Width <- log(iris$Sepal.Width)
Then plot that transformed variable. I hope this helps.
ggplot(data=iris, aes(color=Species)) +
geom_point(aes(x = Petal.Width, y = LogSepal.Width)) +
stat_smooth(method = "lm", aes(x = Petal.Width, y = LogSepal.Width,color=Species))
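If the goal is to draw exactly the fitted values of the interaction model from the question (rather than letting stat_smooth refit per group), one option is to compute them with fitted() and draw them with geom_line. This is a sketch under the assumption that the intended model is Petal.Length ~ Petal.Width * Species; the column name fitted_length is made up for the example.

```r
library(ggplot2)

# Fit the interaction model once, outside of ggplot.
fit <- lm(Petal.Length ~ Petal.Width * Species, data = iris)

# Attach the fitted values to a copy of the data.
iris_fit <- transform(iris, fitted_length = fitted(fit))

# One fitted line per species, drawn over the raw points.
p <- ggplot(iris_fit, aes(x = Petal.Width, y = Petal.Length, colour = Species)) +
  geom_point(shape = 1) +
  geom_line(aes(y = fitted_length))
p
```

Because the model includes the full Petal.Width by Species interaction, each species gets its own intercept and slope, so the lines are straight within each group.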

Plotting a number of regression lines in a single plot

How do I show 2 regression lines on the same plot?
Here are both models:
data(mtcars)
a <- lm(mpg~wt+hp)
b <- lm(mpg~wt+hp+wt*hp)
I plot wt on the x-axis, mpg on the y-axis and hp as the colour.
Here it is in base R:
cr <- colorRamp(c("yellow", "red"))
with(mtcars, {
plot(wt, mpg, col = rgb(cr(hp / max(hp)), max=255),
xlab="Weight", ylab="Miles per Gallon", pch=20)
})
Also, please show how to accomplish this in ggplot2.
Here's the plot:
library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point(aes(col = hp))
p + scale_colour_gradientn(colours=c("green","black"))
Thanks in advance!
The documentation for geom_smooth practically tells you how to do this.
One can use the regression models to predict new values for y and then plot these on the same graph using geom_smooth().
Below is code for ggplot2 that produces what I think you want. The two lines overlap so much that it looks like only one line is plotted, so I've set one linetype to dot-dashed (linetype = 4) to demonstrate this.
I don't know how to achieve this in base R though.
data(mtcars)
library(ggplot2)
a <- lm(mpg~wt+hp, data = mtcars)
b <- lm(mpg~wt+hp+wt*hp, data = mtcars)
mtcars$pred.a <- predict(a)
mtcars$pred.b <- predict(b)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point(aes(col = hp)) +
scale_colour_gradientn(colours=c("green","black")) +
geom_smooth(aes(x = wt, y = pred.a), method = "lm", colour = "black", fill = NA) +
geom_smooth(aes(x = wt, y = pred.b), method = "lm", colour = "red", fill = NA, linetype = 4)
p
A base R solution:
a <- lm(mpg~wt+hp, data=mtcars)
b <- lm(mpg~wt+hp+wt*hp, data=mtcars)
wt <- mtcars[, "wt"]
idx <- sort(wt, index.return=TRUE)$ix
plot(mpg~wt, data=mtcars)
lines(wt[idx], predict(a)[idx], col="red")
lines(wt[idx], predict(b)[idx], col="blue")
However, it is not the best visualisation conceivable.
You are asking how to add a regression line, but your regression models produce a regression plane and a regression surface, both higher dimensional than a line. You can find a regression line by conditioning on a chosen value of hp, or show multiple lines for different values of hp.
Using base graphics, you can use the Predict.Plot function in the TeachingDemos package to add prediction lines/curves to a plot for a fitted model (or two). The interactive TkPredict function in the same package lets you interact with the plot to choose conditioning values and then produces the call to Predict.Plot that creates the current line. You can then combine the generated commands to include them on the same plot.
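The idea of conditioning on chosen values of hp can also be sketched in ggplot2 with predict() and a prediction grid. The choice of the hp quartiles as conditioning values and the name grid are assumptions made for this illustration.

```r
library(ggplot2)

# Interaction model from the question.
b <- lm(mpg ~ wt * hp, data = mtcars)

# Prediction grid: a range of weights crossed with three fixed hp values
# (here the quartiles of hp, an arbitrary choice).
grid <- expand.grid(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50),
  hp = unname(quantile(mtcars$hp, c(0.25, 0.5, 0.75)))
)
grid$mpg <- predict(b, newdata = grid)

# One regression line per conditioning value of hp, coloured like the points.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = hp)) +
  geom_line(data = grid, aes(colour = hp, group = hp))
p
```

Each line is a slice through the fitted surface at a fixed hp, so the lines differ in slope only because of the wt by hp interaction.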

Adding a regression line on a ggplot

I'm trying hard to add a regression line on a ggplot. I first tried with abline but I didn't manage to make it work. Then I tried this...
data = data.frame(x.plot=rep(seq(1,5),10),y.plot=rnorm(50))
ggplot(data,aes(x.plot,y.plot))+stat_summary(fun.data=mean_cl_normal) +
geom_smooth(method='lm',formula=data$y.plot~data$x.plot)
But it is not working either.
In general, to provide your own formula you should use arguments x and y that will correspond to values you provided in ggplot() - in this case x will be interpreted as x.plot and y as y.plot. You can find more information about smoothing methods and formula via the help page of function stat_smooth() as it is the default stat used by geom_smooth().
ggplot(data,aes(x.plot, y.plot)) +
stat_summary(fun.data=mean_cl_normal) +
geom_smooth(method='lm', formula= y~x)
If you are using the same x and y values that you supplied in the ggplot() call and need to plot the linear regression line then you don't need to use the formula inside geom_smooth(), just supply the method="lm".
ggplot(data,aes(x.plot, y.plot)) +
stat_summary(fun.data= mean_cl_normal) +
geom_smooth(method='lm')
As I just figured out, if your model is a multiple linear regression, the solution above won't work.
You have to create your line manually as a dataframe that contains predicted values for your original dataframe (in your case data).
It would look like this:
# read dataset
df = mtcars
# create multiple linear model
lm_fit <- lm(mpg ~ cyl + hp, data=df)
summary(lm_fit)
# save predictions of the model in the new data frame
# together with variable you want to plot against
predicted_df <- data.frame(mpg_pred = predict(lm_fit, df), hp=df$hp)
# this is the predicted line of multiple linear regression
ggplot(data = df, aes(x = mpg, y = hp)) +
geom_point(color='blue') +
geom_line(color='red',data = predicted_df, aes(x=mpg_pred, y=hp))
# this is predicted line comparing only chosen variables
ggplot(data = df, aes(x = mpg, y = hp)) +
geom_point(color='blue') +
geom_smooth(method = "lm", se = FALSE)
A simple and versatile solution is to draw the line with geom_abline, supplying the slope and intercept. Example usage with a scatterplot and an lm object:
library(tidyverse)
petal.lm <- lm(Petal.Length ~ Petal.Width, iris)
ggplot(iris, aes(x = Petal.Width, y = Petal.Length)) +
geom_point() +
geom_abline(slope = coef(petal.lm)[["Petal.Width"]],
intercept = coef(petal.lm)[["(Intercept)"]])
coef is used to extract the coefficients of the formula provided to lm. If you have some other linear model object or line to plot, just plug in the slope and intercept values similarly.
I found this function on a blog
ggplotRegression <- function (fit) {
require(ggplot2)
ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) +
geom_point() +
stat_smooth(method = "lm", col = "red") +
labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
"Intercept =",signif(fit$coef[[1]],5 ),
" Slope =",signif(fit$coef[[2]], 5),
" P =",signif(summary(fit)$coef[2,4], 5)))
}
Once you have loaded the function, you can simply call
ggplotRegression(fit)
Since the function expects a fitted model, you can also pass one directly: ggplotRegression(lm(y ~ x + z + Q, data))
Hope this helps.
If you want to fit other type of models, like a dose-response curve using logistic models you would also need to create more data points with the function predict if you want to have a smoother regression line:
# fit: your fitted logistic regression (dose-response) model
#Create a range of doses:
mm <- data.frame(DOSE = seq(0, max(data$DOSE), length.out = 100))
#Create a new data frame for ggplot using predict and your range of new
#doses:
fit.ggplot=data.frame(y=predict(fit, newdata=mm),x=mm$DOSE)
ggplot(data=data,aes(x=log10(DOSE),y=log(viability)))+geom_point()+
geom_line(data=fit.ggplot,aes(x=log10(x),y=log(y)))
Another way to use geom_line() to add regression line is to use broom package to get fitted values and use it as shown here
https://cmdlinetips.com/2022/06/add-regression-line-to-scatterplot-ggplot2/
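A minimal sketch of the broom approach, assuming the broom package is installed. augment() returns the model data with a .fitted column, which geom_line can draw directly; the iris model is only a stand-in for your own fit, and fit_df is a name chosen for this example.

```r
library(ggplot2)
library(broom)

# Simple linear model used purely as an example.
fit <- lm(Petal.Length ~ Petal.Width, data = iris)

# augment() adds per-observation columns such as .fitted and .resid.
fit_df <- augment(fit)

# Raw points plus the fitted values drawn as a line.
p <- ggplot(fit_df, aes(x = Petal.Width, y = Petal.Length)) +
  geom_point() +
  geom_line(aes(y = .fitted), colour = "red")
p
```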
