Fit an abline to data - R

I have 50 data points of temperature and humidity that I would like to plot with geom_point and then add a linear model to my ggplot. However, I am unable to do so. I have tried abline, geom_line, geom_smooth and lm.
temp_humidity_data <- dplyr::select(data, temperature:humidity)
lm(formula = humidity ~ temperature, data = temp_humidity_data)
ggplot(temp_humidity_data) +
geom_point(aes (x = temperature , y = humidity))
geom_smooth()
How can I go about adding an lm to my ggplot? Any help is appreciated, thank you. And how could I differentiate the temperature and humidity points by colour as well on the plot?
The code above is what I have currently.

As mentioned in the comment section, you missed a + sign after geom_point. Besides that, you are also missing a few arguments in geom_smooth:
library(ggplot2)
ggplot(iris) +
geom_point(aes(x = Petal.Length , y = Petal.Width)) +
geom_smooth(aes(x = Petal.Length, y = Petal.Width),
method = "lm", formula = y ~ x)
You need to supply "aesthetics" for x and y, otherwise you would get the following error:
Error: stat_smooth requires the following missing aesthetics: x, y
method = "lm" tells geom_smooth that you want to fit a linear model, while formula specifies the model formula to plot. If we don't specify the method, geom_smooth chooses one automatically ("loess" for fewer than 1,000 observations, as stated by @Lyngbakr, and "gam" otherwise) and prints the message:
geom_smooth() using method = 'loess' and formula 'y ~ x'
Since we have to supply the same aesthetics in both geom_point and geom_smooth, a more convenient way is to set them once in the ggplot() call, where every layer inherits them:
ggplot(iris, aes(x = Petal.Length , y = Petal.Width)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x)
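A quick way to convince yourself that geom_smooth(method = "lm", formula = y ~ x) draws the same line as lm() is to compare the layer's computed data with predict(). This is a sketch on iris rather than the asker's data; layer_data() is ggplot2's helper for extracting what a layer computed, and the index 2 refers to the geom_smooth() layer in this particular plot.

```r
library(ggplot2)

# The same model, fitted directly with lm()
fit <- lm(Petal.Width ~ Petal.Length, data = iris)

p <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Data computed for layer 2 (geom_smooth): x positions and fitted y values
smooth_data <- layer_data(p, 2)

# lm() predictions at the same x positions
pred <- predict(fit, newdata = data.frame(Petal.Length = smooth_data$x))

# TRUE: geom_smooth() draws exactly the lm() fit
all.equal(unname(pred), smooth_data$y)
```

By default stat_smooth evaluates the fit at 80 evenly spaced x values, which is why the layer data has 80 rows.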
To answer OP's second question of "how could i differentiate the temperature and humidity points by colour as well on the plot?", we can add the color and size aesthetics to geom_point like the following:
ggplot(iris, aes(x = Petal.Length , y = Petal.Width)) +
geom_point(aes(color = Petal.Length, size = Petal.Width)) +
geom_smooth(method = "lm", formula = y ~ x)
To change the range of sizes and colors, we use scale_fill_continuous (or scale_color_continuous for color) and scale_size_continuous:
ggplot(iris, aes(x = Petal.Length , y = Petal.Width)) +
geom_point(aes(fill = Petal.Length, size = Petal.Width), pch = 21) +
geom_smooth(method = "lm", formula = y ~ x) +
scale_fill_continuous(low = "red", high = "blue") +
scale_size_continuous(range = c(1, 10))
Notice that as you increase the size range, some points start to overlap with each other. To make this less confusing, I've used fill instead of color and added pch = 21, a circle plot character with separate fill and border colours. The border keeps neighbouring points visually separate.

Related

Only display correlation coefficient in ggplot stat_cor

How do you only display the correlation coefficient in ggpubr::stat_cor, and not the p-value? There doesn't seem to be an argument within stat_cor to specify only one statistic or the other. Is there some other creative work-around?
Following Ben's solution, here is a reproducible example.
First let's use a simple example:
library('ggplot2')
library('ggpubr')
data(iris)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width )) +
geom_point() + theme_bw() +
stat_cor(method = "pearson")
Now, to display only the correlation coefficient R, map the label aesthetic to the r.label variable computed by stat_cor:
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width )) +
geom_point() + theme_bw() +
stat_cor(method = "pearson", aes(label = ..r.label..))
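With ggplot2 3.3.0 or later, the ..r.label.. notation is superseded by after_stat(), and ggplot2 3.4.0 and later warn about the dot-dot form. A minimal sketch of the same plot with the newer syntax, assuming r.label is still the name of the variable computed by stat_cor:

```r
library(ggplot2)
library(ggpubr)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() + theme_bw() +
  # after_stat() refers to a variable computed by the stat (here stat_cor's r.label)
  stat_cor(method = "pearson", aes(label = after_stat(r.label)))
p
```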
Alternatively, you can calculate the R value independently and add a text label to the plot with annotate() (example with the iris dataset):
round(cor(iris$Sepal.Length, iris$Sepal.Width),2)
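Continuing that idea, a minimal sketch that places the rounded coefficient on the plot with annotate(). The position x = 7.5, y = 4.3 is an arbitrary choice that happens to fall inside the iris data range; adjust it for your own data.

```r
library(ggplot2)

# Pearson correlation between the plotted variables, rounded to 2 decimals (-0.12 for iris)
r <- round(cor(iris$Sepal.Length, iris$Sepal.Width), 2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() + theme_bw() +
  # annotate() adds a single text label at fixed data coordinates
  annotate("text", x = 7.5, y = 4.3, label = paste0("R = ", r))
p
```

Unlike stat_cor, this label is not recomputed per group or facet, so it suits a single-panel plot.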

How to use color() instead of facet_grid() to 'split' your data but keep it on the same plot

I'm having trouble substituting color() for facet_grid() when I want to 'split' my data by a variable. Instead of generating individual plots with regression lines, I'm looking to generate a single plot with all regression lines.
Here's my code:
ggplot(data, aes(x = Rooms, y = Price)) +
geom_point(size = 1, alpha = 1/100) +
geom_smooth(method = "lm", color = Type) # Single plot with all regression lines
ggplot(data, aes(x = Rooms, y = Price)) +
geom_point(size = 1, alpha = 1/100) +
geom_smooth(method = "lm") + facet_grid(. ~ Type) # Individual plots with regression lines
The first plot doesn't work. Here's the output:
"Error in grDevices::col2rgb(colour, TRUE) : invalid color name 'Type'
In addition: Warning messages:
1: Removed 12750 rows containing non-finite values (stat_smooth).
2: Removed 12750 rows containing missing values (geom_point)."
Here's a link to the data:
Dataset
You need to supply an aesthetic mapping to geom_smooth, not just a parameter, which means you need to put colour inside aes(). This is what you need to do any time you want a graphical element to correspond to something in the data rather than to a fixed parameter.
Here's an example with the built-in iris dataset. If you move colour into the ggplot() call instead, it is inherited by geom_point() as well, so both the points and the lines are coloured.
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point() +
geom_smooth(aes(colour = Species), method = "lm")
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
geom_point() +
geom_smooth(method = "lm")
Created on 2018-07-20 by the reprex package (v0.2.0).
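To check that mapping colour really produces one regression per group, here is a sketch that counts the lines in the geom_smooth() layer and computes a separate lm() slope for each species to compare against them. It relies on the default behaviour where each line spans only its own group's x range.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Computed data of the geom_smooth() layer; 'group' identifies each species' line
smooth_data <- layer_data(p, 2)
length(unique(smooth_data$group))  # one fitted line per species: 3

# Slope of a separate lm() fit for each species, for comparison with the plotted lines
slopes <- sapply(split(iris, iris$Species),
                 function(d) coef(lm(Sepal.Width ~ Sepal.Length, data = d))[["Sepal.Length"]])
slopes
```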

How to plot predicted values on an lm line for a null model using ggplot in R

I'm trying to reproduce the base-graphics code below using ggplot, but my ggplot version gives an incorrect result.
Base code:
library(broom)      # provides augment()
library(openintro)  # provides the bdims data
model1 <- lm(wgt ~ 1, data = bdims)
model1_null <- augment(model1)
plot(bdims$hgt, bdims$wgt)
abline(model1, lwd = 2, col = "blue")
pre_null <- predict(model1)
segments(bdims$hgt, bdims$wgt, bdims$hgt, pre_null, col = "red")
ggplot code
bdims %>%
ggplot(aes(hgt, wgt)) +
geom_point() +
geom_smooth(method = "lm", formula = bdims$hgt ~ 1) +
segments(bdims$hgt, bdims$wgt, bdims$hgt, pre_null, col = "red")
Here's an example using the built-in mtcars data:
library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ 1) +
geom_segment(aes(xend = wt, yend = mean(mpg)), col = "firebrick2")
The formula in geom_smooth references the aesthetics (y and x), not the variable names in the data. You also need geom_segment rather than the base-graphics segments(). In a more complicated case you would pre-compute the model's predicted values for the segments, but for a null model it's easy enough to use mean() inline.
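For the more complicated case, a sketch of pre-computing the predictions: fit a simple linear model (mpg ~ wt, chosen here only for illustration), store fitted(fit) as a column, and point yend at that column. The column name fitted_mpg is hypothetical.

```r
library(ggplot2)

# A non-null model, so each residual segment ends at a different height
fit <- lm(mpg ~ wt, data = mtcars)

# Keep the fitted value for each car next to the observed data
mtcars_fit <- transform(mtcars, fitted_mpg = fitted(fit))

p <- ggplot(mtcars_fit, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Vertical segment from each observed point to its fitted value (the residual)
  geom_segment(aes(xend = wt, yend = fitted_mpg), col = "firebrick2")
p
```

The same pattern works for any model with a predict() or fitted() method, as long as the smooth layer and the stored fitted values come from the same formula.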

How to curve a line in regression (geom_smooth) in R

ggplot(data = wheatX,
aes(x = No.of.species,
y = Weight.of.weed,
color = Treatment)) +
geom_point(shape = 1) +
scale_colour_hue(l = 50) +
geom_smooth(method = glm,
se = FALSE)
This draws a straight line.
But the species number will decrease at some point, so I want to make the line curve. How can I do it? Thanks
This is going to depend on what you mean by "smooth".
One thing you can do is apply a loess curve:
ggplot() + ... + stat_smooth(method = "loess", formula = y ~ x, size = 1)
Or you can manually build a polynomial model using the regular lm method:
ggplot() + ... + stat_smooth(method = "lm", formula = y ~ x + I(x^2), size = 1)
In both cases the formula is written in terms of the aesthetics y and x, not the column names; a formula such as biomass ~ numSpecies makes the fit fail because stat_smooth only sees columns called x and y.
You'll need to figure out the exact model you want to use for the second case, which is what I originally meant by the definition of the term "smooth".
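A runnable sketch of both options on the built-in mtcars data, since wheatX is not available here. For the asker's data you would swap in No.of.species and Weight.of.weed as x and y; with color = Treatment in aes(), each smooth layer would then draw one curve per treatment.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(shape = 1) +
  # Quadratic model fitted with lm(); the formula uses the aesthetics x and y
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE, colour = "blue") +
  # Loess curve for comparison; span is left at its default
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, colour = "red")
p
```

The quadratic curve is a fixed parametric shape you chose, while loess follows the data locally; which is appropriate depends on the question you are asking of the data.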

How to draw ggplot of lm(log(y)~) and lm(y~x+x^2) in one plot

Using package ggplot2 and iris, I want to plot a scatterplot with the fitted regression lines.
library(ggplot2)
ggplot(data=iris, aes(x = Petal.Width, y = Petal.Length,color=Species)) +
geom_point(shape=1) +
stat_smooth(method = "lm",formula= 'Petal.Length ~ Petal.Width+I(Petal.Width^2)+SaleType+Petal.Width*Species', data=iris,
aes(x = Petal.Width, y = Petal.Length,color=Species))
Warning message:
Computation failed in `stat_smooth()`:
variable lengths differ (found for '(weights)')
I think the reason I get this warning is that I have two independent variables, but R can't read Species, split up by colour, inside stat_smooth. How can I draw lines that are the same as plot(Petal.Width, fitted(fit))? In addition, if I have another regression model fitted to the same data but with log(y), fit <- lm(log(Petal.Length) ~ Petal.Width + Species + Petal.Width*Species, data = iris), can I put the two regression models into the same graph?
I don't think it is appropriate to combine a transformed regression with a raw value on the same scale. Rather, these should be plotted in separate figures. Using the iris dataset, you can plot the raw data like this:
ggplot(data=iris, aes(color=Species)) +
geom_point(aes(x = Petal.Width, y = Sepal.Width)) +
stat_smooth(method = "lm", aes(x = Petal.Width, y = Sepal.Width,color=Species))
Then log transform Sepal.Width into another variable:
iris$LogSepal.Width <- log(iris$Sepal.Width)
Then plot that transformed variable. I hope this helps.
ggplot(data=iris, aes(color=Species)) +
geom_point(aes(x = Petal.Width, y = LogSepal.Width)) +
stat_smooth(method = "lm", aes(x = Petal.Width, y = LogSepal.Width,color=Species))
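For the first part of the question (matching plot(Petal.Width, fitted(fit))), one option is to skip stat_smooth() and draw the model's own fitted values with geom_line(). This sketch assumes the untransformed interaction model Petal.Length ~ Petal.Width * Species (SaleType is not an iris column, so it is left out); the column name fitted is just an illustrative choice.

```r
library(ggplot2)

# Interaction model: each species gets its own intercept and slope
fit <- lm(Petal.Length ~ Petal.Width * Species, data = iris)

# Attach the fitted values so ggplot can draw them as a layer
iris_fit <- iris
iris_fit$fitted <- fitted(fit)

p <- ggplot(iris_fit, aes(x = Petal.Width, colour = Species)) +
  geom_point(aes(y = Petal.Length), shape = 1) +
  # geom_line() connects points in order of x, giving one line per species
  geom_line(aes(y = fitted))
p
```

Because the full interaction gives every species its own intercept and slope, these lines coincide with geom_smooth(method = "lm") mapped to colour = Species; for models that do not split that way, plotting fitted values is the more faithful option.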
