Unintended line across x-axis of density plot (R)

I am trying to identify why I have a purple line appearing along the x axis that is the same color as "Prypchan, Lida" from my legend. I took a look at the data and do not see any issues there.
ggplot(LosDoc_Ex, aes(x = LOS)) +
geom_density(aes(colour = AttMD)) +
theme(legend.position = "bottom") +
xlab("Length of Stay") +
ylab("Distribution") +
labs(title = "LOS Analysis * ",
caption = "*excluding Residential and WSH",
color = "Attending MD: ")

Usually I'd wait for a reproducible example, but in this case, I'd say the underlying explanation is really quite straightforward:
geom_density() creates a polygon, not a line.
Using a sample dataset from ggplot2's own package, we can observe the same straight line below the density plots, covering the x-axis & y-axis. The colour of the line simply depends on which plot is on top of the rest:
library(ggplot2)
p <- ggplot(diamonds, aes(carat, colour = cut)) +
  geom_density()
p
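A side note, offered as a sketch rather than as part of the answer above: assuming ggplot2 3.3.0 or later, geom_density() accepts an outline.type argument. Its default, "upper", strokes only the curve, so the baseline may not show up at all on a current installation, while "full" strokes the whole polygon and should reproduce the line described here. The object names p_full and p_upper are made up for this illustration.

```r
library(ggplot2)

# "full" strokes the entire polygon outline, including the baseline along the x-axis
p_full <- ggplot(diamonds, aes(carat, colour = cut)) +
  geom_density(outline.type = "full")

# "upper" strokes only the density curve itself
p_upper <- ggplot(diamonds, aes(carat, colour = cut)) +
  geom_density(outline.type = "upper")

p_full
p_upper
```

Comparing the two plots side by side makes it easy to see whether the stray line comes from the polygon outline.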
Workaround 1: You can manually calculate the density values yourself for each colour group in a new data frame, & plot the results using geom_line() instead of geom_density():
library(dplyr)
library(tidyr)
library(purrr)
# nest(data = ...) and unnest(cols = ...) are the tidyr >= 1.0.0 spellings
diamonds2 <- diamonds %>%
  nest(data = -cut) %>%
  mutate(density = map(data, ~density(.x$carat))) %>%
  mutate(density.x = map(density, ~.x[["x"]]),
         density.y = map(density, ~.x[["y"]])) %>%
  select(cut, density.x, density.y) %>%
  unnest(cols = c(density.x, density.y))
ggplot(diamonds2, aes(x = density.x, y = density.y, colour = cut)) +
geom_line()
Workaround 2: Or you can take the data generated by the original plot, & plot that using geom_line(). The colours would need to be remapped to the legend values though:
lp <- layer_data(p)
if(is.factor(diamonds$cut)) {
col.lev = levels(diamonds$cut)
} else {
col.lev = sort(unique(diamonds$cut))
}
lp$cut <- factor(lp$group, labels = col.lev)
ggplot(lp, aes(x = x, y = ymax, colour = cut)) +
geom_line()

There are two simple workarounds. First, if you only want lines and no filled areas, you can simply use geom_line() with the density stat:
library(ggplot2)
ggplot(diamonds, aes(x = carat, y = after_stat(density), colour = cut)) +
  geom_line(stat = "density")
Note that for this to work, we set the y aesthetic to after_stat(density) (written stat(density) in ggplot2 versions before 3.3.0).
Second, if you want the area under the lines to be filled, you can use geom_density_line() from the ggridges package. It works exactly like geom_density() but draws a line (with filled area underneath) rather than a polygon.
library(ggridges)
ggplot(diamonds, aes(x = carat, colour = cut, fill = cut)) +
geom_density_line(alpha = 0.2)
Created on 2018-12-14 by the reprex package (v0.2.1)
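Applied to the plot from the question, the geom_line() approach might look like the sketch below. The asker's LosDoc_Ex data is not available, so los_example is a made-up stand-in with the same two columns (LOS and AttMD) and invented values; the "Doctor A" and "Doctor B" labels are placeholders too. It assumes ggplot2 3.3.0 or later for after_stat().

```r
library(ggplot2)

# Hypothetical stand-in for LosDoc_Ex: invented lengths of stay for two attendings
set.seed(1)
los_example <- data.frame(
  LOS   = c(rexp(200, rate = 1 / 5), rexp(200, rate = 1 / 8)),
  AttMD = rep(c("Doctor A", "Doctor B"), each = 200)
)

# Same labels and legend position as the original plot, but drawn as lines
p_los <- ggplot(los_example, aes(x = LOS, y = after_stat(density), colour = AttMD)) +
  geom_line(stat = "density") +
  theme(legend.position = "bottom") +
  labs(x = "Length of Stay", y = "Distribution",
       title = "LOS Analysis * ",
       caption = "*excluding Residential and WSH",
       colour = "Attending MD: ")
p_los
```

Because each group is drawn as an open line rather than a closed polygon, nothing is stroked along the x-axis.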

Related

Plot proportion in bar chart grouped by another variable

I am currently reading R for Data Science and trying to create some graphs. I understand that to get proportions in a bar chart, you need to use group = 1. The code below works for counts, with the bars filled by color:
library(ggplot2)
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = color))
But I don't get the same plot for proportions.
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = color, y = ..prop.., group = 1))
I do get proportion but not by color.
Here's one way to do it using ..count..
require(ggplot2)
ggplot(diamonds,aes(cut,..count../sum(..count..),fill=color))+
geom_bar()+
scale_y_continuous(labels=scales::percent)
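If what is wanted is the share of each color within each cut, rather than each segment's share of the whole dataset, one possible sketch uses position = "fill", which rescales every bar to a total height of 1. This is a different technique from the ..count.. answer above, and the name p_share is just for illustration.

```r
library(ggplot2)

# position = "fill" stacks the color groups and normalises each bar to 1
p_share <- ggplot(diamonds, aes(x = cut, fill = color)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  ylab("Share within cut")
p_share
```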

ggplot2 change line type

I've been trying to plot two line graphs, one dashed and the other solid. I succeeded in doing so in the plot area, but the legend is problematic.
I looked at posts such as "Changing the line type in the ggplot legend", but I can't seem to get the solution to work. Where have I gone wrong?
library(ggplot2)
year <- 2005:2015
variablea <- 1000:1010
variableb <- 1010:1020
df = data.frame(year, variablea, variableb)
p <- ggplot(df, aes(x = df$year)) +
geom_line(aes(y = df$variablea, colour="variablea", linetype="longdash")) +
geom_line(aes(y = df$variableb, colour="variableb")) +
xlab("Year") +
ylab("Value") +
scale_colour_manual("", breaks=c("variablea", "variableb")
, values=c("variablea"="red", "variableb"="blue")) +
scale_linetype_manual("", breaks=c("variablea", "variableb")
, values=c("longdash", "solid"))
p
Notice that both lines appear as solid in the legend.
ggplot likes long data, so you can map linetype and color to a variable. For example,
library(tidyverse)
df %>% gather(variable, value, -year) %>%
ggplot(aes(x = year, y = value, colour = variable, linetype = variable)) +
geom_line()
Adjust color and linetype scales with the appropriate scale_*_* functions, if you like.
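As a rough sketch of that last step, assuming the colours and line types from the question are the ones wanted: giving both manual scales the same (empty) title and the same named values lets ggplot2 merge them into a single legend that shows colour and dash pattern together. The names df_long and p_lines are made up for this example.

```r
library(ggplot2)
library(tidyr)

df <- data.frame(year = 2005:2015,
                 variablea = 1000:1010,
                 variableb = 1010:1020)

# Long format: one row per year/variable pair
df_long <- gather(df, variable, value, -year)

p_lines <- ggplot(df_long, aes(x = year, y = value,
                               colour = variable, linetype = variable)) +
  geom_line() +
  # Matching titles and value names keep colour and linetype in one legend
  scale_colour_manual("", values = c(variablea = "red", variableb = "blue")) +
  scale_linetype_manual("", values = c(variablea = "longdash", variableb = "solid")) +
  labs(x = "Year", y = "Value")
p_lines
```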

How to create a heatmap with continuous scale using ggplot2 in R

I have got a data frame with several 1000 rows in the form of
group = c("gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3")
pos = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
color = c(2,2,2,2,3,3,2,2,3,2,1,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,1,1,2,2)
df = data.frame(group, pos, color)
and would like to make a kind of heatmap in which one axis has a continuous scale (position). The color column is categorical. However, due to the large number of data points, I want to use binning, i.e. treat it as a continuous variable.
This is more or less how the plot should look:
I can't think of a way to create such a plot using ggplot2/R. I have tried several geometries, e.g. geom_point()
ggplot(data=df, aes(x=group, y=pos, color=color)) +
geom_point() +
scale_colour_gradientn(colors=c("yellow", "black", "orange"))
Thanks for your help in advance.
Does this help you?
library(ggplot2)
group = c("gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr1","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr2","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3","gr3")
pos = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
color = c(2,2,2,2,3,3,2,2,3,2,1,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,1,1,2,2)
df = data.frame(group, pos, color)
ggplot(data = df, aes(x = group, y = pos)) + geom_tile(aes(fill = color))
Looks like this
An improved version with a three-color gradient, if you like:
library(scales)
ggplot(data = df, aes(x = group, y = pos)) +
  geom_tile(aes(fill = color)) +
  scale_fill_gradientn(colours = c("orange", "black", "yellow"),
                       values = rescale(c(1, 2, 3)),
                       guide = "colorbar")

Shade density plot to the left of vline?

Is it possible to shade a density plot using a vline as cutoff? For example:
df.plot <- data.frame(density=rnorm(100))
library(ggplot2)
ggplot(df.plot, aes(density)) + geom_density() +
geom_vline(xintercept = -0.25)
I tried creating a new variable, but it does not work as I expected
df.plot <- df.plot %>% mutate(color=ifelse(density<(-0.25),"red","NULL"))
ggplot(df.plot, aes(density, fill = color, colour = color)) + geom_density() +
geom_vline(xintercept = -0.25)
I don't know of a way to do that directly with ggplot. But you could calculate the density outside of ggplot over the desired range:
set.seed(4132)
df.plot <- data.frame(density=rnorm(100))
ds <- density(df.plot$density, from = min(df.plot$density), to = -0.25)
ds_data <- data.frame(x = ds$x, y = ds$y)
density() estimates the density for the points given in its first argument. The result will contain x and y values. You can specify the x-range you are interested in with from and to. In order for the density to agree with the one plotted by ggplot(), set the ranges to the minimal and maximal value in df.plot$density. Here, to is set to -0.25, because you only want the part of the density curve to the left of your vline. You can then extract the x and y values with ds$x and ds$y.
The plot is then created by using the same code as you did, but adding an additional geom_area() with the density data that was calculated above:
library(ggplot2)
ggplot(df.plot, aes(density)) + geom_density() +
geom_vline(xintercept = -0.25) +
geom_area(data = ds_data, aes(x = x, y = y))
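An alternative sketch, assuming it is acceptable to reuse the curve that ggplot2 itself computed: layer_data() returns the x and y values behind geom_density(), and these can be filtered to the left of the cutoff before being passed to geom_area(). The names cutoff, curve, left and p_shaded are made up for this example, and the fill colour is an arbitrary choice.

```r
library(ggplot2)

set.seed(4132)
df.plot <- data.frame(density = rnorm(100))
cutoff <- -0.25  # same position as the vline in the question

p_base <- ggplot(df.plot, aes(density)) + geom_density()

# Extract the density curve computed for the first (and only) layer
curve <- layer_data(p_base, 1)

# Keep only the part of the curve to the left of the cutoff
left <- data.frame(x = curve$x, y = curve$y)[curve$x < cutoff, ]

p_shaded <- p_base +
  geom_area(data = left, aes(x = x, y = y), fill = "red", alpha = 0.5) +
  geom_vline(xintercept = cutoff)
p_shaded
```

Since the shaded area uses exactly the values ggplot2 drew, it lines up with the curve without having to match the arguments of density() by hand.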

overlay colored boxplots on parallel coordinate plots with faceting in ggplot2

I have the following example.
require(ggplot2)
# Example Data
x <- data.frame(var1=rnorm(800,0,1),
var2=rnorm(800,0,1),
var3=rnorm(800,0,1),
type=factor(rep(c("x", "y"), length.out=800)),
set=factor(rep(c("A","B","C","D"), each=200))
)
Now, I would like to plot (thin) parallel coordinate plots of these lines, with points for each of the variable values. I would like to overlay a boxplot (each of a different color for each method) on these parallel coordinate plots at the variables values. On top of this, I would like to facet for the groups and types, say using set~type. Is this possible to do using ggplot2?
Any suggestions? Thanks!
You need to put data in long format first. I didn't put in points, since the graph is already cluttered enough, but you can do so by adding a geom_point.
require(tidyr)
x$id <- 1:nrow(x)
x2 <- gather(x, var, value, var1:var3)
Boxplots
ggplot(x2, aes(var, value)) +
geom_line(aes(group = id), size = 0.05, alpha = 0.3) +
geom_boxplot(aes(fill = var), alpha = 0.5) +
facet_grid(set ~ type) +
theme_bw()
Or perhaps violins
Replacing the boxplots with violins looks pretty cool as well.
ggplot(x2, aes(var, value)) +
geom_line(aes(group = id), size = 0.05, alpha = 0.3) +
geom_violin(aes(fill = var), col = NA, alpha = 0.6) +
facet_grid(set ~ type) +
theme_bw()
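For completeness, a sketch of the geom_point() addition mentioned above, using the same example data. The point size and alpha values are arbitrary choices, outlier.shape = NA is added so the boxplot does not draw outliers a second time on top of the points, and the line width is left at its default. The name p_points is made up for this example.

```r
library(ggplot2)
library(tidyr)

set.seed(42)
x <- data.frame(var1 = rnorm(800, 0, 1),
                var2 = rnorm(800, 0, 1),
                var3 = rnorm(800, 0, 1),
                type = factor(rep(c("x", "y"), length.out = 800)),
                set  = factor(rep(c("A", "B", "C", "D"), each = 200)))

# Long format with an id per original row, so each row becomes one line
x$id <- seq_len(nrow(x))
x2 <- gather(x, var, value, var1:var3)

p_points <- ggplot(x2, aes(var, value)) +
  geom_line(aes(group = id), alpha = 0.3) +
  geom_point(size = 0.3, alpha = 0.3) +
  geom_boxplot(aes(fill = var), alpha = 0.5, outlier.shape = NA) +
  facet_grid(set ~ type) +
  theme_bw()
p_points
```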
