Line size by number of observations of a factor - r

I have the following plot created from the mtcars data set and this code:
ggplot(mtcars, aes(x = mpg, y = hp, colour = factor(gear))) + geom_point() + geom_smooth(method = lm, se = FALSE)
I want the line size of the linear regression lines to be proportional to the number of observations in each level of factor(gear):
> count(factor(mtcars$gear))
x freq
3 15
4 12
5 5
I've tried setting size = ..count.. and size = ..n.. inside both the main ggplot call and the geom_smooth call, with no luck.
Is there a way to do this?

Something like this should work:
library(ggplot2)
library(plyr)
line_size <- count(factor(mtcars$gear))
ggplot(mtcars, aes(x = mpg, y = hp, colour = factor(gear))) +
geom_point() +
# map size only in the smooth layer so the points keep their default size
geom_smooth(aes(size = factor(gear)), method = lm, se = FALSE) +
scale_size_manual(values = line_size$freq/4)
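On ggplot2 3.4.0 or later, line thickness is controlled by the linewidth aesthetic rather than size. A minimal sketch of the same idea under that assumption, using base R's table() in place of plyr::count() (the names gear_n and line_width are illustrative, and the divisor 4 is just the scaling used above):

```r
library(ggplot2)

# Observations per gear level; names are "3", "4", "5"
gear_n <- table(mtcars$gear)

# Named vector so each width is matched to its factor level explicitly
line_width <- setNames(as.numeric(gear_n) / 4, names(gear_n))

p <- ggplot(mtcars, aes(x = mpg, y = hp, colour = factor(gear))) +
  geom_point() +
  geom_smooth(aes(linewidth = factor(gear)),
              method = "lm", formula = y ~ x, se = FALSE) +
  scale_linewidth_manual(values = line_width)
```

Printing p draws the plot. Naming the values avoids relying on the order of the factor levels.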

Related

How to use color() instead of facet_grid() to 'split' your data but keep it on the same plot

I'm having trouble substituting color() for facet_grid() when I want to 'split' my data by a variable. Instead of generating individual plots with regression lines, I'm looking to generate a single plot with all regression lines.
Here's my code:
ggplot(data, aes(x = Rooms, y = Price)) +
geom_point(size = 1, alpha = 1/100) +
geom_smooth(method = "lm", color = Type) # Single plot with all regression lines
ggplot(data, aes(x = Rooms, y = Price)) +
geom_point(size = 1, alpha = 1/100) +
geom_smooth(method = "lm") + facet_grid(. ~ Type) # Individual plots with regression lines
(The first plot doesn't work) Here's the output:
"Error in grDevices::col2rgb(colour, TRUE) : invalid color name 'Type'
In addition: Warning messages:
1: Removed 12750 rows containing non-finite values (stat_smooth).
2: Removed 12750 rows containing missing values (geom_point)."
Here's a link to the data:
Dataset
You need to supply an aesthetic mapping to geom_smooth, not just a parameter, which means you need to put colour inside aes(). This is what you need to do any time you want a graphical element to correspond to something in the data rather than to a fixed parameter.
Here's an example with the built-in iris dataset. In fact, if you move colour to the ggplot call so it is inherited by geom_point as well, then you can colour the points as well as the lines.
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point() +
geom_smooth(aes(colour = Species), method = "lm")
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
geom_point() +
geom_smooth(method = "lm")
Created on 2018-07-20 by the reprex package (v0.2.0).
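Applied to the shape of the asker's data, the same fix might look like the sketch below. The houses data frame is a hypothetical stand-in with made-up values, since the real dataset is not shown; only the column names Rooms, Price and Type come from the question.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data (values are simulated)
set.seed(1)
houses <- data.frame(
  Rooms = sample(1:6, 300, replace = TRUE),
  Type  = sample(c("h", "t", "u"), 300, replace = TRUE)
)
houses$Price <- 2e5 + 1.5e5 * houses$Rooms + rnorm(300, sd = 5e4)

# colour = Type goes inside aes(), so it is looked up in the data
p <- ggplot(houses, aes(x = Rooms, y = Price)) +
  geom_point(size = 1, alpha = 1/10) +
  geom_smooth(aes(colour = Type), method = "lm", formula = y ~ x)

# The smooth layer (layer 2) should contain one fitted line per level of Type
smooth_groups <- unique(layer_data(p, 2)$group)
```

Writing color = Type outside aes() makes R look for a variable called Type in the calling environment rather than in the data frame, which is why the original call failed.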

Receive the equation of the stat_smooth in ggplot2 R mtcars example

Hi, I would like to know how I can retrieve the equation of stat_smooth, either on the ggplot2 plot itself, in a vector, or somewhere else. The code that I am using is:
p <- ggplot(data = mtcars, aes(x = disp, y = drat))
p <- p + geom_point() + stat_smooth(method="loess")
p
Thanks
The ggpmisc package can be very useful here. However, it will not work with loess, because loess doesn't give a formula (see the related question "Loess Fit and Resulting Equation").
library(ggplot2)
library(ggpmisc)
p <- ggplot(data = mtcars, aes(x = disp, y = drat)) +
geom_point() +
geom_smooth(method="lm", formula=y~x) +
stat_poly_eq(aes(label = ..eq.label..), formula = y ~ x, parse = TRUE)
p
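If the coefficients are wanted in a vector rather than printed on the plot, one option is to fit the model directly with base R. This is a sketch, assuming the straight-line fit y ~ x used above; the names fit, coefs, eq, lo and lo_at_200 are illustrative.

```r
# Fit the same straight line that geom_smooth(method = "lm", formula = y ~ x) draws
fit <- lm(drat ~ disp, data = mtcars)
coefs <- coef(fit)  # named vector: "(Intercept)" and "disp"

# Assemble a printable equation from the coefficients
eq <- sprintf("drat = %.4f %+.6f * disp",
              coefs[["(Intercept)"]], coefs[["disp"]])

# loess has no closed-form equation, but the fitted curve can still be evaluated
lo <- loess(drat ~ disp, data = mtcars)
lo_at_200 <- predict(lo, newdata = data.frame(disp = 200))
```

For loess, predict() is as close as you get to an equation: it returns the smoothed value at any disp within the range of the data.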

Animate the process of adding layers to a ggplot2 plot

I am starting to get familiar with gganimate, but I want to extend my gifs further.
For instance, I can throw a frame on one variable in gganimate but what if I want to animate the process of adding entirely new layers/geoms/variables?
Here's a standard gganimate example:
library(tidyverse)
library(gganimate)
p <- ggplot(mtcars, aes(x = hp, y = mpg, frame = cyl)) +
geom_point()
gg_animate(p)
But what if I want the gif to animate:
# frame 1
ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_point()
# frame 2
ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_point(aes(color = factor(cyl)))
# frame 3
ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_point(aes(color = factor(cyl), size = wt))
# frame 4
ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_point(aes(color = factor(cyl), size = wt)) +
labs(title = "MTCARS")
How might this be accomplished?
You can manually add a frame aesthetic to each layer, though it will include the legends for all of the frames immediately (intentionally, I believe, to keep ratios, margins, etc. correct):
saveAnimate <-
ggplot(mtcars, aes(x = hp, y = mpg)) +
# frame 1
geom_point(aes(frame = 1)) +
# frame 2
geom_point(aes(color = factor(cyl), frame = 2)) +
# frame 3
geom_point(aes(color = factor(cyl), size = wt, frame = 3)) +
# frame 4
geom_point(aes(color = factor(cyl), size = wt, frame = 4)) +
# I don't think I can add this one
labs(title = "MTCARS")
gg_animate(saveAnimate)
If you want to be able to add things yourself, and even see how legends, titles, etc. move things around, you may need to step back to a lower-level package and construct the images yourself. Here, I am using the animation package, which allows you to loop through a series of plots with no limitations (they need not be related at all, so they can certainly show things moving the plot area around). Note that I believe this requires ImageMagick to be installed on your computer.
p <- ggplot(mtcars, aes(x = hp, y = mpg))
toSave <- list(
p + geom_point()
, p + geom_point(aes(color = factor(cyl)))
, p + geom_point(aes(color = factor(cyl), size = wt))
, p + geom_point(aes(color = factor(cyl), size = wt)) +
labs(title = "MTCARS")
)
library(animation)
saveGIF(
{lapply(toSave, print)}
, "animationTest.gif"
)
The gganimate commands in the earlier answers are deprecated as of 2021 and won't accomplish OP's task.
Building on Mark's code, you can now simply create a static ggplot object with multiple layered geoms and then add the gganimate::transition_layers function to create an animation that transitions from layer to layer within the static plot. Tweening functions like enter_fade() and enter_grow() control how elements change into and out of frames.
library(tidyverse)
library(gganimate)
anim <- ggplot(mtcars, aes(x = hp, y = mpg)) +
# Title
labs(title = "MTCARS") +
# Frame 1
geom_point() +
# Frame 2
geom_point(aes(color = factor(cyl))) +
# Frame 3
geom_point(aes(color = factor(cyl), size = wt)) +
# gganimate functions
transition_layers() + enter_fade() + enter_grow()
# Render animation
animate(anim)
The animation package doesn't force you to specify frames in the data. See the example at the bottom of the linked page, where an animation is wrapped in one big saveGIF() call. You can specify the duration of individual frames and everything.
The drawback is that, unlike the nice gganimate functions, basic frame-by-frame animation won't hold the plot dimensions or legend constant. But if you can hack your way into displaying exactly what you want for each frame, the basic animation package will serve you well.

ggplot mixture model R

I have a dataset with numeric values and a categorical variable. The distribution of the numeric variable differs for each category. I want to plot a density curve for each category so that the curves sit visually below the density curve of the entire dataset.
This is similar to the components of a mixture model, without calculating the mixture model (as I already know the categorical variable that splits the data).
If I let ggplot group by the categorical variable, each of the four densities is a real density and integrates to one.
library(ggplot2)
ggplot(iris, aes(x = Sepal.Width)) + geom_density() + geom_density(aes(x = Sepal.Width, group = Species, colour = 'Species'))
What I want is to have the density of each category as a sub-density (not integrating to 1), similar to the following code (which I only implemented for two of the three iris species):
library(data.table)
myIris <- as.data.table(iris)
# calculate density for entire dataset
dens_entire <- density(myIris[, Sepal.Width], cut = 0)
dens_e <- data.table(x = dens_entire[[1]], y = dens_entire[[2]])
# calculate density for dataset with setosa
dens_setosa <- density(myIris[Species == 'setosa', Sepal.Width], cut = 0)
dens_sa <- data.table(x = dens_setosa[[1]], y = dens_setosa[[2]])
# calculate density for dataset with versicolor
dens_versicolor <- density(myIris[Species == 'versicolor', Sepal.Width], cut = 0)
dens_v <- data.table(x = dens_versicolor[[1]], y = dens_versicolor[[2]])
# plot densities as mixture model
ggplot(dens_e, aes(x=x, y=y)) + geom_line() + geom_line(data = dens_sa, aes(x = x, y = y/2.5, colour = 'setosa')) +
geom_line(data = dens_v, aes(x = x, y = y/1.65, colour = 'versicolor'))
resulting in
Above I hard-coded the numbers used to scale down the y values. Is there any way to do this with ggplot, or to calculate them?
Thanks for your ideas.
Do you mean something like this? You need to change the scale though.
ggplot(iris, aes(x = Sepal.Width)) +
geom_density(aes(y = ..count..)) +
geom_density(aes(x = Sepal.Width, y = ..count..,
group = Species, colour = Species))
Another option may be
ggplot(iris, aes(x = Sepal.Width)) +
geom_density(aes(y = ..density..)) +
geom_density(aes(x = Sepal.Width, y = ..density../3,
group = Species, colour = Species))
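Dividing by 3 works for iris because the three species have equal sizes. A sketch of a more general scaling, assuming ggplot2 3.3.0 or later for after_stat(): stat_density reports count as the density multiplied by the number of points in the group, so dividing count by the total number of rows weights each curve by its group's share of the data. The name group_share is illustrative.

```r
library(ggplot2)

# Share of observations per species; for iris each share is 1/3
group_share <- prop.table(table(iris$Species))

p <- ggplot(iris, aes(x = Sepal.Width)) +
  # overall density, integrates to 1
  geom_density() +
  # per-species curves scaled by n_group / n_total
  geom_density(aes(y = after_stat(count / nrow(iris)), colour = Species))
```

Each per-species curve is then a proper density multiplied by that species' share, so with unequal group sizes the larger groups get proportionally taller curves. The sub-curves will not necessarily add up exactly to the overall curve, because each density is estimated with its own bandwidth.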

Overlapping Trend Lines in scatterplots, R

I am trying to overlay multiple trend lines using geom_smooth() in R. I currently have this code.
ggplot(mtcars2, aes(x=Displacement, y = Variable, color = Variable)) +
geom_point(aes(x=mpg, y = hp, col = "Power")) +
geom_point(aes(x=mpg, y = drat, col = "Drag Coef."))
(mtcars2 is the normalized form of mtcars)
Which gives me this graph.
I am trying to use geom_smooth(method='lm') to draw two trend lines for the two variables. Any ideas?
(Bonus: I would also like to implement the 'shape=1' parameter to differentiate the variables if possible. The following method does not work)
geom_point(aes(x=mpg, y = hp, col = "Power", shape=2))
Update
I managed to do this.
ggplot(mtcars2, aes(x=Displacement, y = Variable, color = Variable)) +
geom_point(aes(x=disp, y = hp, col = "Power")) +
geom_point(aes(x=disp, y = mpg, col = "MPG")) +
geom_smooth(method= 'lm',aes(x=disp, y = hp, col = "Power")) +
geom_smooth(method= 'lm',aes(x=disp, y = mpg, col = "MPG"))
It looks like this.
But this is an ugly piece of code. If anybody can make this code look prettier, it'd be great. Also, I have not yet been able to implement the 'shape=2' parameter.
It seems like you're making your life harder than it needs to be. You can map additional aesthetics such as group and shape inside aes().
I don't know if I got your normalization right, but this should give you enough to get going in the right direction:
library(ggplot2)
library(reshape2)
#Do some normalization
mtcars$disp_norm <- with(mtcars, (disp - min(disp)) / (max(disp) - min(disp)))
mtcars$hp_norm <- with(mtcars, (hp - min(hp)) / (max(hp) - min(hp)))
mtcars$drat_norm <- with(mtcars, (drat - min(drat)) / (max(drat) - min(drat)))
#Melt into long form
mtcars.m <- melt(mtcars, id.vars = "disp_norm", measure.vars = c("hp_norm", "drat_norm"))
#plot
ggplot(mtcars.m, aes(disp_norm, value, group = variable, colour = variable, shape = variable)) +
geom_point() +
geom_smooth(method = "lm")
Yielding:
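The three normalization lines in the answer repeat the same formula, so they can be folded into a small helper. The sketch below also builds the long form with base R's stack() instead of reshape2; rescale01, norm and long are illustrative names, and min-max scaling is an assumption about what "normalized" means in the question.

```r
library(ggplot2)

# Hypothetical helper: min-max rescaling to [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

norm <- as.data.frame(lapply(mtcars[c("disp", "hp", "drat")], rescale01))

# Long form: stack() returns the columns `values` and `ind`
long <- data.frame(
  disp_norm = rep(norm$disp, times = 2),
  stack(norm[c("hp", "drat")])
)

# One colour, shape and trend line per variable
p <- ggplot(long, aes(disp_norm, values, colour = ind, shape = ind)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
```

Mapping shape to the variable inside aes() is what makes the shape differ by variable; a constant such as shape = 2 belongs outside aes().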
