ggplot2 in R: fill underneath a geom_smooth line

I am trying to fill in a portion of a plot underneath a geom_smooth() line.
Example:
In the example the data fits on that curve. My data is not as smooth. I want to use geom_point() and a mix of geom_smooth() and geom_area() to fill in the area under the smoothed line while leaving the points above.
A picture of my data with a geom_smooth():
In other words, I want everything underneath that line to be filled in, like in Image 1.

Use predict() on a model fitted with the same smoother that geom_smooth() uses. With the default method = "auto", geom_smooth() uses loess for fewer than 1,000 observations and gam otherwise.
library(ggplot2)
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth() +
  geom_ribbon(aes(ymin = 0, ymax = predict(loess(hwy ~ displ))),
              alpha = 0.3, fill = 'green')
Which gives a translucent green band from y = 0 up to the loess curve.
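If you would rather not fit the model inside aes(), the same idea can be written with the predictions in their own data frame. This is only a sketch: the names fit, grid and hwy_fit are made up for illustration, and it assumes the loess defaults match what geom_smooth() uses here. Drawing geom_point() last keeps the points on top of the fill.

```r
library(ggplot2)

# Fit the same kind of smoother that geom_smooth() picks for small data sets.
fit <- loess(hwy ~ displ, data = mpg)

# Predict on an evenly spaced grid inside the observed range of displ
# (loess does not extrapolate by default, so stay within that range).
grid <- data.frame(displ = seq(min(mpg$displ), max(mpg$displ), length.out = 100))
grid$hwy_fit <- predict(fit, newdata = grid)

p <- ggplot(mpg, aes(displ, hwy)) +
  # Fill from y = 0 up to the fitted curve, using the grid data only.
  geom_ribbon(data = grid, aes(x = displ, ymin = 0, ymax = hwy_fit),
              inherit.aes = FALSE, fill = "green", alpha = 0.3) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  # Points are added last so they are drawn above the fill.
  geom_point()
p
```

For data sets large enough that geom_smooth() switches to gam, the model passed to predict() would need to change accordingly.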

Related

Overlay a circle on a scatter plot in R

I'm using this link to create the circle: Draw a circle with ggplot2
However, when I try to add my next ggplot which uses data from a CSV, I get two separate graphs. I'd like to have the circle overlay the scatterplot.
ggplot(CSV1, aes(x= Pos.X..µm., y = Pos.Y..µm.)) +
geom_point()
ggplot(dat, aes(x,y)) + geom_path()
Thanks!
You just need to combine them into a single plot object: add both layers to one ggplot() call and give the second layer its own data.
ggplot(CSV1, aes(x = Pos.X..µm., y = Pos.Y..µm.)) +
  geom_point() +
  geom_path(aes(x, y), data = dat)
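Here is a self-contained version of the same pattern using built-in data, since CSV1 is not available. The helper circle_points() is hypothetical (written for this example, not taken from the linked question), and the centre and radius are arbitrary.

```r
library(ggplot2)

# Hypothetical helper: n points on a circle, suitable for geom_path().
circle_points <- function(center = c(0, 0), radius = 1, n = 100) {
  theta <- seq(0, 2 * pi, length.out = n)
  data.frame(x = center[1] + radius * cos(theta),
             y = center[2] + radius * sin(theta))
}

dat <- circle_points(center = c(4, 28), radius = 2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # The circle layer brings its own data and its own aesthetics.
  geom_path(data = dat, aes(x = x, y = y), inherit.aes = FALSE)
p
```

The circle will only look round if both axes use the same scale; adding coord_fixed() is one way to get that.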

Draw a trend line using ggplot

I used ggplot2 to draw a trend line based on my data.
Below is something I've done using a spreadsheet.
But I only want to show the trend line (the black line in the upper plot) rather than all the dots, because the number of observations is > 20,000.
So I tried to do the same thing using ggplot2.
fig_a <- ggplot(df1, aes(data_x, data_y ))
fig_a + stat_smooth(method=lm)
fig_a + stat_smooth(method=gam)
Apparently it does not work well. Can anyone help?
Why does it give so many lines rather than a single trend line?
You can do the following. Add + geom_smooth(method = "lm") to your ggplot script.
Example using built-in data
ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_smooth(method = "lm")
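Since the question asks for the line without the 20,000+ dots, you can simply leave out geom_point(). A minimal sketch with simulated data follows; the column names data_x and data_y come from the question, but the values are made up here.

```r
library(ggplot2)

# Simulated stand-in for the asker's data (values are invented).
set.seed(1)
n <- 20000
df1 <- data.frame(data_x = runif(n, 0, 100))
df1$data_y <- 2 + 0.5 * df1$data_x + rnorm(n, sd = 10)

# Only the fitted line is drawn; no point layer is added.
p <- ggplot(df1, aes(data_x, data_y)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black")
p
```

If several lines still appear, one possible cause (a guess, since the original data is not shown) is a grouping aesthetic such as colour mapped to a factor; adding aes(group = 1) to the smooth layer forces a single fit.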

Plot density with ggplot2 without line on x-axis

I use ggplot2::ggplot for all 2D plotting needs, including density plots, but I find that when plotting a number of overlapping densities with extreme outliers on a single space (in different colors) the line on the x-axis becomes a little distracting.
My question is then, can you remove the bottom section of the density plot from being plotted? If so, how?
You can use this example:
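library(ggplot2)
# movies now lives in the ggplot2movies package (it left ggplot2 in 2.0.0)
data(movies, package = "ggplot2movies")
ggplot(movies, aes(x = rating)) + geom_density()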
Should turn out like this:
How about using stat_density() directly with a line geom? It draws only the curve, with no closed polygon along the x-axis:
ggplot(movies, aes(x = rating)) + stat_density(geom="line")
You can just draw a white line over it:
ggplot(movies, aes(x = rating)) +
geom_density() +
geom_hline(color = "white", yintercept = 0)
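For several overlapping densities in different colours, as in the question, a sketch with built-in data might look like this. It uses geom_line(stat = "density"), which as far as I know is equivalent to the stat_density(geom = "line") approach but keeps the line geom's default position = "identity", so the curves overlap rather than stack.

```r
library(ggplot2)

# One density curve per drive type, drawn as plain lines without a baseline.
p <- ggplot(mpg, aes(x = hwy, colour = drv)) +
  geom_line(stat = "density")
p
```

If you use stat_density(geom = "line") with groups instead, check its position argument: the documented default there is "stack", so you would probably want position = "identity".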

How to change style settings in stacked barchart overlaid with density line (ggplot2)

I am trying to change the style settings of this kind of chart and hope you can help me.
R code:
theme_set(theme_bw())
cglac$pred2 <- as.factor(cglac$pred)
ggplot(cglac, aes(x = depth, colour = pred2)) +
  geom_bar(aes(y = ..density..), binwidth = 3, alpha = .5, position = "stack") +
  geom_density(alpha = .2) +
  xlab("Depth (m)") +
  ylab("Counts & Density") +
  coord_flip() +
  scale_x_reverse() +
  theme_bw()
which produces this graph:
Here are some points:
What I want is to have the density line as black and white lines separated by symbols rather than colour (dashed line, dotted line etc).
The other thing is the histogram itself. How do I get rid of the grey background in the bars?
Can I change the bars also to black and white symbol lines (shaded etc)? So that they would match the density lines?
Last but not least, I want to add a second x-axis (or in this case y-axis, because of coord_flip()). The one I see right now is for the density. The other one I need would show the count data for the pred2 variable.
Thanks for helping.
Best,
Moritz
Different line types: inside aes(), put linetype = pred2. To make the lines black, add the argument color = "black" inside geom_density().
The "background" of the bars is called "fill". Inside geom_bar, you can set fill = NA for no fill. A more common approach is to fill in the bars with the colors, inside aes() specify fill = pred2. You might consider faceting by your variable, + facet_wrap(~ pred2, nrow = 1) might look very nice.
Shaded bars in ggplot? No, you can't do that easily. See the answers to this question for other options and hacks.
Second y-axis: as with shaded bars, the creator of ggplot2 considers a second y-axis a terrible design choice, so there is no easy way to add one. Here's a related question, including Hadley's point of view:
I believe plots with separate y scales (not y-scales that are transformations of each other) are fundamentally flawed.
It's definitely worth considering his point of view, and asking yourself if those design choices are really what you want.
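Note that the quote excludes axes that are transformations of each other. For a single, ungrouped histogram, count and density differ only by a constant factor (count = density * n * binwidth), so a secondary axis is possible there. The following is a sketch under that assumption, not a fix for the stacked, grouped plot in the question; it assumes a ggplot2 version that has sec_axis() and after_stat() (3.3.0 or later), and the names bw and n are just for illustration.

```r
library(ggplot2)

bw <- 20            # bin width, in units of hp
n  <- nrow(mtcars)  # number of observations in the single group

p <- ggplot(mtcars, aes(x = hp)) +
  # Unfilled bars on the density scale.
  geom_histogram(aes(y = after_stat(density)), binwidth = bw,
                 fill = NA, colour = "black") +
  geom_density() +
  # Secondary axis: rescale density back to counts.
  scale_y_continuous(name = "Density",
                     sec.axis = sec_axis(~ . * n * bw, name = "Count")) +
  theme_bw()
p
```

With several groups each density is normalised by its own group size, so one rescaling no longer fits all bars and the objection in the quote applies again.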
Different linetypes for densities
Here's my built-in data version of what you're trying to do:
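# cyl is numeric in mtcars; linetype needs a discrete variable, hence factor()
ggplot(mtcars, aes(x = hp,
                   linetype = factor(cyl),
                   group = factor(cyl),
                   color = factor(cyl))) +
  geom_histogram(aes(y = ..density.., fill = factor(cyl)),
                 alpha = .5, position = "stack") +
  geom_density(color = "black") +
  coord_flip() +
  theme_bw()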
And what I think you should do instead. This version uses facets instead of stacking/colors/linetypes. You seem to be aiming for black and white, which isn't a problem at all in this version.
ggplot(mtcars, aes(x = hp,
                   group = cyl)) +
  geom_histogram(aes(y = ..density..),
                 alpha = .5) +
  geom_density() +
  facet_wrap(~ cyl, nrow = 1) +
  coord_flip() +
  theme_bw()

Splitting distribution visualisations on the y-axis in ggplot2 in r

The most commonly cited example of how to visualize a logistic fit using ggplot2 seems to be something very much like this:
data("kyphosis", package = "rpart")
ggplot(data = kyphosis, aes(x = Age, y = as.numeric(Kyphosis) - 1)) +
  geom_point() +
  # ggplot2 >= 2.0.0 takes model arguments through method.args
  stat_smooth(method = "glm", method.args = list(family = "binomial"))
This visualisation works great if you don't have too much overlapping data, and the first suggestion for crowded data seems to be to use injected jitter in the x and y coordinates of the points then adjust the alpha value of the points. When you get to the point where individual points aren't useful but distributions of points are, is it possible to use geom_density(), geom_histogram(), or something else to visualise the data but continue to split the categorical variable along the y-axis as it is done with geom_point()?
From what I have found, geom_density() and geom_histogram() can easily be split/grouped by the categorical variable and both levels can easily be reversed using scale_y_reverse() but I can't figure out if it is even possible to move only one of the categorical variable distributions to the top of the plot. Any help/suggestions would be appreciated.
The annotate() function in ggplot2 allows you to add geoms to a plot with properties that "are not mapped from variables of a data frame, but are instead passed in as vectors," meaning that you can add layers that are unrelated to your data frame. In this case your two density curves are related to the data frame (since the variables are in it), but because you're trying to position them differently, using annotate() is useful.
Here's one way to go about it:
data("kyphosis", package = "rpart")
# ggplot2 >= 2.0.0 takes model arguments through method.args
model.only <- ggplot(data = kyphosis, aes(x = Age, y = as.numeric(Kyphosis) - 1)) +
  stat_smooth(method = "glm", method.args = list(family = "binomial"))
absents <- subset(kyphosis, Kyphosis == "absent")
presents <- subset(kyphosis, Kyphosis == "present")
dens.absents <- density(absents$Age)
dens.presents <- density(presents$Age)
scaling.factor <- 10 # Make the density plots taller
model.only + annotate("line", x = dens.absents$x, y = dens.absents$y * scaling.factor) +
  annotate("line", x = dens.presents$x, y = dens.presents$y * scaling.factor + 1)
This adds two annotated layers with scaled density plots for each of the kyphosis groups. For the presents variable, y is scaled and increased by 1 to shift it up.
You can also fill the density plots instead of just using a line. Instead of annotate("line"...) you need to use annotate("polygon"...), like so:
model.only + annotate("polygon", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red", colour="black", alpha=0.4) +
annotate("polygon", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1, fill="green", colour="black", alpha=0.4)
Technically you could use annotate("density"...), but that won't work when you shift the present plot up by one. Instead of shifting, it fills the whole plot:
model.only + annotate("density", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red") +
annotate("density", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1, fill="green")
The only way around that problem is to use a polygon instead of a density geom.
One final variant: mirroring the top density plot about the line y = 1, so that it hangs downwards from the top of the plot:
model.only + annotate("polygon", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red", colour="black", alpha=0.4) +
annotate("polygon", x=dens.presents$x, y=(1 - dens.presents$y*scaling.factor), fill="green", colour="black", alpha=0.4)
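As a variation on the annotate() approach, the two densities can also be put into one data frame and drawn with a single geom_ribbon() layer, which gives you a fill legend as well. This is a rough sketch rather than a drop-in replacement: dens_df and its columns base and edge are names invented here, and the scaling factor of 10 is carried over from above.

```r
library(ggplot2)
data("kyphosis", package = "rpart")

scaling.factor <- 10  # make the density shapes taller, as above

# One density per outcome, stored as a band between a baseline and an edge.
dens_df <- do.call(rbind, lapply(levels(kyphosis$Kyphosis), function(lev) {
  d <- density(kyphosis$Age[kyphosis$Kyphosis == lev])
  base <- if (lev == "present") 1 else 0   # "present" hangs from y = 1
  sgn  <- if (lev == "present") -1 else 1  # and grows downwards
  data.frame(Kyphosis = lev, x = d$x, base = base,
             edge = base + sgn * d$y * scaling.factor)
}))

p <- ggplot(kyphosis, aes(x = Age, y = as.numeric(Kyphosis) - 1)) +
  # The ribbon layer uses dens_df only and ignores the plot-level aesthetics.
  geom_ribbon(data = dens_df,
              aes(x = x, ymin = pmin(base, edge), ymax = pmax(base, edge),
                  fill = Kyphosis),
              inherit.aes = FALSE, colour = "black", alpha = 0.4) +
  stat_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = "binomial"))
p
```

Because each band is defined by ymin and ymax, shifting or flipping a density is just arithmetic on the two columns, with no polygon closing to worry about.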
I am not sure I get your point, but here is an attempt:
dat <- rbind(kyphosis, kyphosis)
dat$grp <- factor(rep(c('smooth', 'dens'), each = nrow(kyphosis)),
                  levels = c('smooth', 'dens'))
ggplot(dat, aes(x = Age)) +
  facet_grid(grp ~ ., scales = "free_y") +
  #geom_point(data=subset(dat,grp=='smooth'),aes(y = as.numeric(Kyphosis) - 1)) +
  # ggplot2 >= 2.0.0 takes model arguments through method.args
  stat_smooth(data = subset(dat, grp == 'smooth'), aes(y = as.numeric(Kyphosis) - 1),
              method = "glm", method.args = list(family = "binomial")) +
  geom_density(data = subset(dat, grp == 'dens'))
