R ggplot: Two histograms (based on two different columns) in one graph

I want to put two histograms together in one graph, but each histogram is based on a different column. Currently I can do it like this, but position="dodge" has no effect here, and there is no legend (a different color for each column).
p <- ggplot(data = temp2.11)
p <- p + geom_histogram(aes(x = diff84, y = (..count..)/sum(..count..)),
                        alpha = 0.3, fill = "red", binwidth = 2, position = "dodge")
p <- p + geom_histogram(aes(x = diff08, y = (..count..)/sum(..count..)),
                        alpha = 0.3, fill = "green", binwidth = 2, position = "dodge")

You have to reshape your table into long format, then map the resulting grouping column (here called variable) to an aesthetic such as fill in ggplot. Using the iris data set as an example:
data(iris)
# your method
library(ggplot2)
ggplot(data = iris) +
  geom_histogram(aes(x = Sepal.Length, y = (..count..)/sum(..count..)),
                 alpha = 0.3, fill = "red", binwidth = 2, position = "dodge") +
  geom_histogram(aes(x = Sepal.Width, y = (..count..)/sum(..count..)),
                 alpha = 0.3, fill = "green", binwidth = 2, position = "dodge")
# long-format method
library(reshape2)
iris2 <- melt(iris[, 1:2])
ggplot(data = iris2) +
  geom_histogram(aes(x = value, y = (..count..)/sum(..count..), fill = variable),
                 alpha = 0.3, binwidth = 2, position = "identity")
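For completeness, here is a minimal sketch of the same long-format idea using tidyr::pivot_longer() instead of reshape2::melt(), with position = "dodge" so the bars sit side by side as the question asked. The names iris_long and p_dodge and the binwidth of 0.5 are my own choices for illustration; they do not come from the original posts.

```r
library(ggplot2)
library(tidyr)

# Reshape the two measurement columns into long format:
# one row per observation, with the source column name stored in `variable`.
iris_long <- pivot_longer(
  iris[, 1:2],
  cols = everything(),
  names_to = "variable",
  values_to = "value"
)

# Mapping `variable` to fill produces both the legend and the groups
# that position = "dodge" needs in order to place bars side by side.
p_dodge <- ggplot(iris_long, aes(x = value, fill = variable)) +
  geom_histogram(binwidth = 0.5, position = "dodge")

p_dodge
```

Dodging only has an effect when a single layer contains several groups. Two separate geom_histogram() layers each hold one group, so there is nothing to dodge against, which is why the original attempt overlapped.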

Related

R: Superimposing Two Graphs Together

I am using the R programming language. Following this link (https://bio304-class.github.io/bio304-book/introduction-to-ggplot2.html), I made these two plots for the iris dataset:
library(ggplot2)
library(cowplot)
data(iris)
# sepal.labels is not defined in the post; the line below is an assumed stand-in
sepal.labels <- labs(x = "Sepal length", y = "Sepal width")
#graph1
setosa.only <- subset(iris, Species == "setosa")
setosa.sepals <- ggplot(setosa.only,
                        mapping = aes(x = Sepal.Length, y = Sepal.Width))
graph1 = setosa.sepals + geom_point() + sepal.labels
#graph2
graph2 = setosa.sepals +
  geom_density2d() +
  sepal.labels + labs(subtitle = "I. setosa data only")
cowplot::plot_grid(graph1, graph2, labels = "AUTO")
My question: is it possible to combine both of these graphs together into 1 single plot?
So that it looks something like this? (I tried to draw this by hand):
Thanks
You can add geom_density2d() after geom_point():
library(ggplot2)
ggplot(setosa.only,
       mapping = aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_density2d()

One label for multiple points

I'm making a scatterplot and want to label several points with the same label.
data.frame(label=rep(c("a","b","c"),2), x=rep(c(1:3),2), y=c(5,4,7,2,6,9))
As you can see, the labels occur twice each at the same x values, only y differs. I want both [1,5] and [1,2] to be labeled using a single "a", not one "a" for each coordinate.
I'm using R, ggplot2 and ggrepel.
This can work:
dat <- data.frame(label=rep(c("a","b","c"),2), x=rep(c(1:3),2), y=c(5,4,7,2,6,9))
ggplot() +
  geom_point(data=dat, aes(x=x, y=y)) +
  geom_text(data=dat[duplicated(dat$label),], aes(x=x, y=y, label=label))
I think this is what you want.
I am using the tidyverse package, which loads dplyr and ggplot2.
library(tidyverse)
Dataset
dat1 <- data.frame(label=rep(c("a","b","c"),2), x=rep(c(1:3),2), y=c(5,4,7,2,6,9))
Create a dataset for the labels. For each label, this keeps the original x and places the labeling point at the mean y of that label's points.
lab1 <- dat1 %>% group_by(label) %>% mutate(x = x, y = mean(y))
This creates the plot using the original dataset for the points and the label dataset for the labels.
ggplot() +
  geom_point(data=dat1, aes(x=x, y=y)) +
  geom_text(data=lab1, aes(x=x, y=y, label=label), size = 5) +
  theme_grey()
The above actually plots each label twice, one copy on top of the other, but you can't tell by looking. If you really want each label drawn only once, deduplicate the label data and use lab2 in the plot instead. I also changed the size so you can see the difference.
lab2 <- unique(lab1)
ggplot() +
  geom_point(data=dat1, aes(x=x, y=y)) +
  geom_text(data=lab2, aes(x=x, y=y, label=label), size=10) +
  theme_grey()
If you want the labels shifted to the right or upward, you can add an offset to x and y when building the label dataset.
lab1 <- dat1 %>% group_by(label) %>% mutate(x = x+.3, y = mean(y) + .5)
Or you can accomplish the same within geom_text() itself using nudge_x and nudge_y. Use the un-offset label data (lab2) here so the shift is not applied twice.
ggplot() +
  geom_point(data=dat1, aes(x=x, y=y)) +
  geom_text(data=lab2, aes(x=x, y=y, label=label), size=10, nudge_x = .3, nudge_y = .5) +
  theme_grey()
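As a compact variant of the approach above, summarise() can build the label dataset with exactly one row per label in a single step, which avoids the duplicate rows and the later unique() call. This is only a sketch: the names label_pos and p_labels are mine, and it assumes every point sharing a label also shares the same x, as in the example data.

```r
library(dplyr)
library(ggplot2)

dat <- data.frame(
  label = rep(c("a", "b", "c"), 2),
  x = rep(1:3, 2),
  y = c(5, 4, 7, 2, 6, 9)
)

# One row per label: keep the shared x, place the label at the mean y.
label_pos <- dat %>%
  group_by(label) %>%
  summarise(x = first(x), y = mean(y), .groups = "drop")

# Points come from the full data; text comes from the one-row-per-label data.
p_labels <- ggplot(dat, aes(x = x, y = y)) +
  geom_point() +
  geom_text(data = label_pos, aes(label = label), nudge_x = 0.3)

p_labels
```

Since the question mentions ggrepel, geom_text() could presumably be swapped for ggrepel::geom_text_repel() with the same label data, though I have not shown that here.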

Unintended line across X axis of density plot (r)

I am trying to identify why I have a purple line appearing along the x axis that is the same color as "Prypchan, Lida" from my legend. I took a look at the data and do not see any issues there.
ggplot(LosDoc_Ex, aes(x = LOS)) +
  geom_density(aes(colour = AttMD)) +
  theme(legend.position = "bottom") +
  xlab("Length of Stay") +
  ylab("Distribution") +
  labs(title = "LOS Analysis * ",
       caption = "*excluding Residential and WSH",
       color = "Attending MD: ")
Usually I'd wait for a reproducible example, but in this case, I'd say the underlying explanation is really quite straightforward:
geom_density() creates a polygon, not a line.
Using a sample dataset from ggplot2's own package, we can observe the same straight line below the density plots, covering the x-axis & y-axis. The colour of the line simply depends on which plot is on top of the rest:
p <- ggplot(diamonds, aes(carat, colour = cut)) +
  geom_density()
Workaround 1: You can manually calculate the density values yourself for each colour group in a new data frame, & plot the results using geom_line() instead of geom_density():
library(dplyr)
library(tidyr)
library(purrr)
diamonds2 <- diamonds %>%
  nest(data = -cut) %>%   # tidyr >= 1.0 syntax; older versions used nest(-cut)
  mutate(density = map(data, ~density(.x$carat))) %>%
  mutate(density.x = map(density, ~.x[["x"]]),
         density.y = map(density, ~.x[["y"]])) %>%
  select(cut, density.x, density.y) %>%
  unnest(cols = c(density.x, density.y))
ggplot(diamonds2, aes(x = density.x, y = density.y, colour = cut)) +
  geom_line()
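If you would rather not pull in tidyr and purrr, the same per-group density calculation can be sketched in base R with split() and lapply(). The names dens_list, dens_df and p_lines are my own. Note that density() with default settings extends a little past the data range, so the curves may not match geom_density() exactly at the edges.

```r
library(ggplot2)

# Split carat by cut and estimate one kernel density per group.
dens_list <- lapply(split(diamonds$carat, diamonds$cut), density)

# Stack the x/y coordinates of each density into one data frame.
dens_df <- do.call(rbind, lapply(names(dens_list), function(nm) {
  data.frame(cut = nm, x = dens_list[[nm]]$x, y = dens_list[[nm]]$y)
}))

# Restore the original ordering of the cut levels for the legend.
dens_df$cut <- factor(dens_df$cut, levels = levels(diamonds$cut))

# Lines only, so there is no polygon edge along the baseline.
p_lines <- ggplot(dens_df, aes(x = x, y = y, colour = cut)) +
  geom_line()

p_lines
```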
Workaround 2: Or you can take the data generated by the original plot, & plot that using geom_line(). The colours would need to be remapped to the legend values though:
lp <- layer_data(p)
if(is.factor(diamonds$cut)) {
  col.lev = levels(diamonds$cut)
} else {
  col.lev = sort(unique(diamonds$cut))
}
lp$cut <- factor(lp$group, labels = col.lev)
ggplot(lp, aes(x = x, y = ymax, colour = cut)) +
  geom_line()
There are two simple workarounds. First, if you only want lines and no filled areas, you can simply use geom_line() with the density stat:
library(ggplot2)
ggplot(diamonds, aes(x = carat, y = after_stat(density), colour = cut)) +
  geom_line(stat = "density")
Note that for this to work, we need to set the y aesthetic to after_stat(density) (written stat(density) in ggplot2 versions before 3.3.0).
Second, if you want the area under the lines to be filled, you can use geom_density_line() from the ggridges package. It works exactly like geom_density() but draws a line (with filled area underneath) rather than a polygon.
library(ggridges)
ggplot(diamonds, aes(x = carat, colour = cut, fill = cut)) +
  geom_density_line(alpha = 0.2)
Created on 2018-12-14 by the reprex package (v0.2.1)
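A note for readers on newer ggplot2: as far as I know, since version 3.3.0 geom_density() accepts an outline.type argument whose default, "upper", strokes only the top of the curve, so the baseline should no longer appear and no workaround is needed. The sketch below assumes such a version is installed; the object names p_upper and p_full are mine. Setting outline.type = "full" should bring back the old closed-polygon look for comparison.

```r
library(ggplot2)

# Default in recent ggplot2: only the upper outline of each density is stroked.
p_upper <- ggplot(diamonds, aes(carat, colour = cut)) +
  geom_density(outline.type = "upper")

# Old behaviour: the outline is closed, so a line runs along the baseline too.
p_full <- ggplot(diamonds, aes(carat, colour = cut)) +
  geom_density(outline.type = "full")

p_upper
```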

ggplot geom_boxplot and plotting last value with geom_point

I'm new to R. I am trying to plot the last value of each variable in a data frame on top of a boxplot. I tried the following without success:
ggplot(iris, aes(x=Species,y=Sepal.Length)) +
geom_boxplot() +
geom_point(iris, aes(x=unique(iris$Species), y=tail(iris,n=1)))
Thanks, Bill
One approach is
library(tidyverse)
iris1 <- iris %>%
  group_by(Species) %>%
  summarise(LastVal = last(Sepal.Length))
ggplot(iris, aes(x=Species,y=Sepal.Length)) +
  geom_boxplot() +
  geom_point(data = iris1, aes(x = Species, y = LastVal))
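Another option, sketched here under the assumption that ggplot2 3.3.0 or later is installed (older versions call the argument fun.y rather than fun), is to let stat_summary() pick the last value inside the plot itself, so no separate summary data frame is needed. The name p_last and the red colour are only for illustration.

```r
library(ggplot2)

# stat_summary() applies `fun` to the y values within each x group;
# tail(v, 1) returns the last value of that group in row order.
p_last <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  stat_summary(fun = function(v) tail(v, 1), geom = "point",
               colour = "red", size = 3)

p_last
```

This relies on the rows already being in the order you consider "first to last"; if they are not, sort the data frame before plotting.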

scatterplot with no x variable

My data set has a response variable and a 2-level factor explanatory variable. Is there a function for creating a scatter plot with no x axis variable? I'd like the points to be randomly spread out along the x axis to make them easier to see, and to differentiate the 2 groups by color. I'm able to create a plot by creating an "ID" variable, but I'm wondering if it's possible to do it without one. The "ID" variable is causing problems when I try to add + facet_grid(. ~ other.var) to view the same plot broken out by another factor variable.
#Create dummy data set
response <- runif(500)
group <- c(rep('group1',250), rep('group2',250))
ID <- c(seq(from=1, to=499, by=2), seq(from=2, to=500, by=2))
data <- data.frame(ID, group, response)
#plot results
ggplot() +
  geom_point(data=data, aes(x=ID, y=response, color=group))
How about using geom_jitter, setting the x axis to some fixed value?
ggplot() +
  geom_jitter(data=data, aes(x=1, y=response, color=group))
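Here is a slightly fuller sketch of the jitter idea that also covers the faceting part of the question. The column other_var is a hypothetical stand-in for the asker's other.var, and the data frame df, the jitter width of 0.4 and the theme tweaks are my own choices. Setting height = 0 restricts the jitter to the horizontal direction, so the response values themselves are not altered.

```r
library(ggplot2)

set.seed(1)  # make the simulated data and the jitter reproducible
df <- data.frame(
  group = rep(c("group1", "group2"), each = 250),
  response = runif(500),
  other_var = sample(c("A", "B"), 500, replace = TRUE)  # hypothetical facet variable
)

# A constant x plus horizontal-only jitter spreads the points out
# without needing an ID column, and it works unchanged inside facets.
p_jitter <- ggplot(df, aes(x = 0, y = response, colour = group)) +
  geom_jitter(width = 0.4, height = 0) +
  facet_grid(. ~ other_var) +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank())

p_jitter
```

Because the x positions carry no meaning here, the x axis title, text and ticks are blanked out.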
You could plot x as the row number?
ggplot() +
  geom_point(data=data, aes(x=1:nrow(data), y=response, color=group))
Or randomly order it first?
RandomOrder <- sample(1:nrow(data), nrow(data))
ggplot() +
  geom_point(data=data, aes(x= RandomOrder, y=response, color=group))
Here's how you can scatter plot a variable against its row index without an intermediate variable:
ggplot(data = data, aes(y = response, x = seq_along(response), color = group)) +
  geom_point()
To shuffle the row index, just wrap it in sample(), like this:
ggplot(data = data, aes(y = response, x = sample(seq_along(response)), color = group)) +
  geom_point()
