Scatterplot on top of line plot in ggplot (R)

Example data:
set.seed(245)
cond <- rep( c("control","treatment"), each=10)
xval <- round(10+ rnorm(20), 1)
yval <- round(10+ rnorm(20), 1)
df <- data.frame(cond, xval, yval)
df$xval[cond=="treatment"] <- df$xval[cond=="treatment"] + 1.5
I would like the "treatment" condition to be plotted as a line and the "control" data to be plotted as a scatter plot. So far, I have found a workaround where I specify both to be lines but set the control line to 'blank' in scale_linetype_manual:
library(ggplot2)
plot <- ggplot(data=df, aes(x=xval, y=yval, group=cond, colour=cond))+
geom_line(aes(linetype=cond))+
geom_point(aes(shape=cond))+
scale_linetype_manual(values=c('blank', 'solid'))
However, there must be a more straightforward way of plotting the control as a scatter plot and the treatment as a line plot. Eventually, I'd like to remove the geom_point from the treatment line. The way it is now, that would remove the points from the control as well, leaving me with nothing for the control.
Any insight would be helpful. Thanks.

I hope I have understood you correctly. You may use a "treatment" subset of the data for geom_line, and a "control" subset for geom_point.
After subsetting, there is only one "cond" for geom_line ("treatment") and one for geom_point ("control"). Thus, I have removed the aes mappings from "cond" to colour, linetype and shape. You may wish to set these aesthetics to fixed values instead. Similarly, group is not needed in this solution.
ggplot(data = subset(df, cond == "treatment"), aes(x = xval, y = yval)) +
geom_line() +
geom_point(data = subset(df, cond == "control"))
Update following a comment from the OP: "Now, what if my data had actually three "conditions" where 2 of the conditions would be plotted as lines and the other 1 is scatterplot."
# some data
set.seed(123)
cond <- rep( c("contr","treat", "post-treat"), each = 10)
xval <- rnorm(30)
yval <- rnorm(30)
df <- data.frame(cond, xval, yval)
# plot
ggplot(data = subset(df, cond %in% c("treat", "post-treat")), aes(x = xval, y = yval)) +
geom_line(aes(group = cond, colour = cond)) +
geom_point(data = subset(df, cond == "contr"))
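If you would rather keep cond mapped to colour, so that both conditions still share one legend, a minimal sketch is to pass a function as each layer's data argument: ggplot2 calls it with the plot's data and uses the returned data frame for that layer only. This assumes a reasonably recent ggplot2 that accepts functions for a layer's data; the two-condition data from the question is rebuilt so the block runs on its own.

```r
library(ggplot2)

# Example data from the question
set.seed(245)
cond <- rep(c("control", "treatment"), each = 10)
xval <- round(10 + rnorm(20), 1)
yval <- round(10 + rnorm(20), 1)
df <- data.frame(cond, xval, yval)
df$xval[cond == "treatment"] <- df$xval[cond == "treatment"] + 1.5

p <- ggplot(df, aes(x = xval, y = yval, colour = cond)) +
  # The line layer only receives the treatment rows
  geom_line(data = function(d) subset(d, cond == "treatment")) +
  # The point layer only receives the control rows
  geom_point(data = function(d) subset(d, cond == "control"))
print(p)
```

Compared with subsetting in the ggplot() call, the full data stays attached to the plot, so the colour scale still knows about both conditions.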

Related

R ggplot2 geom_area() colors beneath a single line produce strange connections

Description of the problem with an example
Imagine I have this data.frame:
x <- seq(1, 100) # the indexes
y <- c(seq(1, 25), seq(25, 1), seq(1, 25), seq(25, 1)) # the y values
group_nb <- c(replicate(25, "group1"), replicate(25, "group2"), replicate(25, "group1"), replicate(25, "group2")) # group info
df <- data.frame(x, y, group_nb)
That looks like this:
x y group_nb
-------------------------
1 1 group1
2 2 group1
3 3 group1
My first goal is to plot the line y as a function of x, colored according to group_nb.
My second goal is to shade the area beneath this line correctly.
The problem I am facing is unwanted connections between the group1 points at indexes 1:25 and the group1 points at indexes 51:75.
1) Color the line according to group_nb
To color the same line using ggplot, I first tested this:
ggplot(df, aes(x = x, y = y, group=group_nb, color=group_nb)) +
geom_line()
which gave this plot:
As you can see, there are unwanted connections between points from the same group.
I found that adding aes(group = 1) to geom_line() corrected the problem of the unwanted connections.
ggplot(df, aes(x = x, y = y, group=group_nb, color=group_nb)) +
geom_line(aes(group = 1))
This gave the plot that I wanted:
2) Color beneath the line according to group_nb
Now I want to be able to shade the area beneath the line with the same colors.
So far I have this:
ggplot(df, aes(x = x, y = y, group=group_nb, color=group_nb, fill=group_nb)) +
geom_line(aes(group=1)) +
geom_area()
This produces the same unwanted connections: the region with indexes 51:75 is hidden by the group2 shading but should be in the group1 color.
I tried adding aes(group = 1) to geom_area() but it gave me the error: Error: Aesthetics can not vary with a ribbon
So my questions are:
How can I solve this?
Why does adding aes(group = 1) to geom_line() help?
Thank you for your help.
To address your second question first: ggplot2 draws each group as its own geometric element. That is why geom_line() with 2 groups draws two separate lines, and why setting group = 1 makes ggplot treat all points as parts of a single line. A line may still change colour along its length, whereas a ribbon (which geom_area() draws) cannot vary its aesthetics within one group, hence the error you got when setting group = 1 there.
That brings me to the first question: from one perspective you have two groups, but what you want to plot is slightly different, namely four consecutive runs. We can fix this by assigning an id variable to the data with (for example) run-length encoding.
library(ggplot2)
x <- seq(1, 100) # the indexes
y <- c(seq(1, 25), seq(25, 1), seq(1, 25), seq(25, 1)) # the y values
group_nb <- c(replicate(25, "group1"), replicate(25, "group2"), replicate(25, "group1"), replicate(25, "group2")) # group info
df <- data.frame(x, y, group_nb)
id <- rle(group_nb)
df$id <- rep.int(seq_along(id$lengths), id$lengths)
ggplot(df, aes(x = x, y = y, group=id, color=group_nb, fill = group_nb)) +
geom_line() +
geom_area(alpha = 0.3)
Created on 2020-11-26 by the reprex package (v0.3.0)
If you need the lines/areas to connect, you'd need to replicate the first observation of a group with the ID from the previous group.
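The bridging step described above can be sketched as follows, reusing the answer's data. The names runs, bridge and df_connected are illustrative. Each duplicated row takes both the id and the group_nb of the previous run, because a ribbon cannot vary its fill within one group. Using position = "identity" is a choice made for this sketch: with geom_area()'s default stacking, the duplicated boundary value of one run would be stacked on top of the first value of the next run.

```r
library(ggplot2)

# Data from the question
x <- seq(1, 100)
y <- c(seq(1, 25), seq(25, 1), seq(1, 25), seq(25, 1))
group_nb <- c(replicate(25, "group1"), replicate(25, "group2"),
              replicate(25, "group1"), replicate(25, "group2"))
df <- data.frame(x, y, group_nb)

# One id per consecutive run of the same group
runs <- rle(group_nb)
df$id <- rep.int(seq_along(runs$lengths), runs$lengths)

# Copy the first row of every run except the first one ...
bridge <- df[!duplicated(df$id), ][-1, ]
# ... and give each copy the previous run's id and group, so it closes that run
bridge$id <- bridge$id - 1
bridge$group_nb <- runs$values[bridge$id]

df_connected <- rbind(df, bridge)
df_connected <- df_connected[order(df_connected$id, df_connected$x), ]

p <- ggplot(df_connected, aes(x = x, y = y, group = id,
                              color = group_nb, fill = group_nb)) +
  geom_line() +
  # identity position: do not stack the overlapping boundary rows
  geom_area(alpha = 0.3, position = "identity")
print(p)
```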

Density over histogram using ggplot2

I have a "long" format data frame which contains two columns: the first column holds the values, the second holds sex [Male - 1/Female - 2]. I wrote some code to make a histogram of the entire dataset (code below).
ggplot(kz6, aes(x = values)) +
geom_histogram()
However, I also want to add a density over the histogram to emphasize the difference between sexes, i.e. I want to combine 3 plots: a histogram for the entire dataset, and 2 density plots, one for each sex. I tried to use some examples (one, two, three, four), but it still does not work. The density-only code works, while the histogram + density combinations do not.
density <- ggplot(kz6, aes(x = values, fill = factor(sex))) +
geom_density()
both <- ggplot(kz6, aes(x = values)) +
geom_histogram() +
geom_density()
both_2 <- ggplot(kz6, aes(x = values)) +
geom_histogram() +
geom_density(aes(x = kz6[kz6$sex == 1,]))
P.S. Some examples contain y = ..density..; what does it mean? How should I interpret it?
To plot a histogram and superimpose two densities defined by a categorical variable, use appropriate aesthetics in the call to geom_density, such as group or colour.
ggplot(kz6, aes(x = values)) +
geom_histogram(aes(y = ..density..), bins = 20) +
geom_density(aes(group = sex, colour = sex), adjust = 2)
Data creation code.
I will create a test data set from the built-in data set iris.
kz6 <- iris[iris$Species != "virginica", 4:5]
kz6$sex <- "M"
kz6$sex[kz6$Species == "versicolor"] <- "F"
kz6$Species <- NULL
names(kz6)[1] <- "values"
head(kz6)
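On the P.S.: ..density.. refers to the density variable that the histogram's stat computes for each bin, which is the bin count divided by (total count × bin width). Mapping y to it rescales the bars so their total area is 1, which puts the histogram on the same scale as the density curves. Since ggplot2 3.3.0 the same thing is written after_stat(density), and the dot-dot notation is deprecated from ggplot2 3.4.0. A small sketch on the iris-based test data that checks the bar areas:

```r
library(ggplot2)

# Same iris-based test data as above
kz6 <- iris[iris$Species != "virginica", 4:5]
kz6$sex <- ifelse(kz6$Species == "versicolor", "F", "M")
kz6$Species <- NULL
names(kz6)[1] <- "values"

p <- ggplot(kz6, aes(x = values)) +
  # after_stat(density) is the current spelling of ..density..
  geom_histogram(aes(y = after_stat(density)), bins = 20) +
  geom_density(aes(group = sex, colour = sex), adjust = 2)
print(p)

# With y = density, bar height x bar width summed over all bars gives 1,
# the same total area that each density curve encloses
bars <- layer_data(p, 1)
sum(bars$density * (bars$xmax - bars$xmin))
```

Without the y mapping, the bars show raw counts and the density curves (area 1) would look flat next to them.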

ggplot2: create a plot using selected facets with part data

I would like to create a plot as follows:
1) Use part of the data to create a base plot with facet_grid of two columns.
2) Plot the remaining part of the data on top of the existing facets, but using only a single column.
The sample code:
library(dplyr)
library(ggplot2)
library(gridExtra)
df2 <- data.frame(Class=rep(c('A','B','C'),each=20),
Type=rep(rep(c('T1','T2'),each=10), 3),
X=rep(rep(1:10,each=2), 3),
Y=c(rep(seq(3,-3, length.out = 10),2),
rep(seq(1,-4, length.out = 10),2),
rep(seq(-2,-8, length.out = 10),2)))
g2 <- ggplot() + geom_line(data = df2 %>% filter(Class %in% c('B','C')),
aes(X,Y,color=Class, linetype=Type)) +
facet_grid(Type~Class)
g3 <- ggplot() + geom_line(data = df2 %>% filter(Class == 'A'),
aes(X,Y,color=Class, linetype=Type)) +
facet_wrap(~Type)
grid.arrange(g2, g3)
The output plots:
How can I include the g3 plot in the g2 plot? The resulting plot should show the two g3 lines twice, once in each facet column.
I assume the plot below is what you were looking for.
library(dplyr)
library(ggplot2)
df_1 <- filter(df2, Class %in% c('B','C')) %>%
dplyr::rename(Class_1 = Class)
df_2 <- filter(df2, Class == 'A')
g2 <- ggplot() +
geom_line(data = df_1,
aes(X, Y, color = Class_1, linetype = Type)) +
geom_line(data = df_2,
aes(X, Y, color = Class, linetype = Type)) +
facet_grid(Type ~ Class_1)
g2
Explanation
For tasks like this I found it better to work with two datasets. The variable df2$Class has three unique values, A, B and C, so faceting with Type ~ Class does not give you the desired plot, because you want the data for df2$Class == "A" to be displayed in every facet.
That's why I renamed the variable Class in df_1 to Class_1; in df_1 it contains only two unique values, B and C.
Faceting with Type ~ Class_1 lets you plot the data for df2$Class == "A" on top without it being faceted by class: df_2 has no Class_1 column, so ggplot2 draws its rows in every facet column.
Edit
Based on the comment below, here is a solution using only one dataset:
g2 + geom_line(data = filter(df2, Class == 'A')[, -1],
aes(X, Y, linetype = Type, col = "A"))
Similar / same question: ggplot2:: Facetting plot with the same reference plot in all panels
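Both versions rely on the same facetting rule: when a layer's data lacks one of the facetting variables, facet_grid() repeats that layer in every panel along that dimension. A sketch that checks this with layer_data() on the single-dataset variant; the names base_data and reference_data are illustrative.

```r
library(dplyr)
library(ggplot2)

# Data from the question
df2 <- data.frame(Class = rep(c('A', 'B', 'C'), each = 20),
                  Type = rep(rep(c('T1', 'T2'), each = 10), 3),
                  X = rep(rep(1:10, each = 2), 3),
                  Y = c(rep(seq(3, -3, length.out = 10), 2),
                        rep(seq(1, -4, length.out = 10), 2),
                        rep(seq(-2, -8, length.out = 10), 2)))

base_data <- filter(df2, Class %in% c("B", "C"))
# Dropping Class means this layer has no column-facet variable,
# so facet_grid() repeats its rows in every column
reference_data <- select(filter(df2, Class == "A"), -Class)

p <- ggplot() +
  geom_line(data = base_data, aes(X, Y, colour = Class, linetype = Type)) +
  geom_line(data = reference_data, aes(X, Y, colour = "A", linetype = Type)) +
  facet_grid(Type ~ Class)
print(p)

# Rows of the class-A layer after faceting, and the panels it is drawn in
ref_built <- layer_data(p, 2)
nrow(ref_built)
length(unique(ref_built$PANEL))
```

Type is still present in reference_data, so each class-A line stays in its own Type row and is only repeated across the B and C columns.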

Plotting a time series where color depends on a category with ggplot

Consider this minimum working example:
library(ggplot2)
x <- c(1,2,3,4,5,6)
y <- c(3,2,5,1,3,1)
data <- data.frame(x,y)
pClass <- c(0,1,1,2,2,0)
plottedGraph <- ggplot(data, aes(x = x, y = y, colour = factor(pClass))) + geom_line()
print(plottedGraph)
I have a time series y = f(x) where x is a timestep. Each timestep should have a color which depends on the category of the timestep, recorded in pClass.
This is the result it gives:
It doesn't make any kind of sense to me why ggplot would connect points with the same color together and not points that follow each other (which is what geom_line should do according to the documentation).
How do I make it plot the following:
You should use group = 1 inside aes() to tell ggplot that the different colours in fact belong to the same line (i.e. group).
ggplot(data, aes(x = x, y = y, colour = factor(pClass), group = 1)) +
geom_line()
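With group = 1, geom_line() draws a single path and colours each segment by the point it starts from. If you want that choice to be explicit in your own data (for instance to colour by the end point instead), one sketch is to build one row per segment and draw it with geom_segment(). The segs data frame below is an illustrative construction, not part of the answer.

```r
library(ggplot2)

x <- c(1, 2, 3, 4, 5, 6)
y <- c(3, 2, 5, 1, 3, 1)
pClass <- c(0, 1, 1, 2, 2, 0)
data <- data.frame(x, y, pClass = factor(pClass))

# One row per segment: from point i to point i + 1,
# coloured by the class of point i (the start point)
n <- nrow(data)
segs <- data.frame(
  x = data$x[-n], y = data$y[-n],
  xend = data$x[-1], yend = data$y[-1],
  pClass = data$pClass[-n]
)

p <- ggplot(segs, aes(x = x, y = y, xend = xend, yend = yend, colour = pClass)) +
  geom_segment()
print(p)
```

To colour by the end point instead, use data$pClass[-1] when building segs.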

Plot two graphs in the same plot [duplicate]

This question already has answers here: Plotting two variables as lines using ggplot2 on the same graph (5 answers). Closed 4 years ago.
The solution with ggplot in that question worked really well for my data. However, I am trying to add a legend, and nothing I have tried works...
For example, in the ggplot example in the question above, how can I add a legend to show that the red curve is related to "Ocean" and the green curve to "Soil"? Yes, I want to add text that I will define myself; it is not related to any other variable in my data.frame.
The example below is some of my own data:
Rate     Probability  Stats
1.0e-04  1e-04        891.15
1.0e-05  1e-04        690
...
It has about 400 rows, and I have two data frames similar to this one.
So my code is:
g <- ggplot(Master1MY, aes(Probability))
g <- g + geom_point(aes(y=Master1MY$Stats), colour="red", size=1)
g <- g + geom_point(aes(y=Transposon1MY$Stats), colour="blue", size=1)
g + labs(title= "10,000bp and 1MY", x = "Probability", y = "Stats")
The plot looks like this:
I just want a red and blue legend saying "Master" and "Transposon"
Thanks!
In ggplot it is generally most convenient to keep the data in a 'long' format. Here I use the function melt from the reshape2 package to convert your data from wide to long format. Depending on which aesthetics you map (size, shape, colour, etc.), corresponding legends will appear.
library(ggplot2)
library(reshape2)
# data from the example you were referring to, in a 'wide' format.
x <- seq(-2, 2, 0.05)
ocean <- pnorm(x)
soil <- pnorm(x, 1, 1)
df <- data.frame(x, ocean, soil)
# melt the data to a long format
df2 <- melt(data = df, id.vars = "x")
# plot, using the aesthetics argument 'colour'
ggplot(data = df2, aes(x = x, y = value, colour = variable)) + geom_line()
Edit, set name and labels of legend
# Manually set name of the colour scale and labels for the different colours
ggplot(data = df2, aes(x = x, y = value, colour = variable)) +
geom_line() +
scale_colour_discrete(name = "Type of sample", labels = c("Sea water", "Soil"))
Edit2, following new sample data
Convert your data, assuming the organization shown in your update, to a long format. Again, I believe you make your ggplot life easier if you keep your data in a long format. I relate every step to the simple example data I used in my first answer. Please note that there are many alternative ways to rearrange your data. This is one way, based on the small (non-reproducible) parts of your data you provided in the update.
# x <- seq(-2, 2, 0.05)
# Master1MY$Probability
Probability <- 1:100
# ocean <- pnorm(x)
# Master1MY$Stats
Master1MY <- rnorm(100, mean = 600, sd = 20)
# soil <- pnorm(x,1,1)
# Transposon1MY$Stats
Transposon1MY <- rnorm(100, mean = 100, sd = 10)
# df <- data.frame(x, ocean, soil)
df <- data.frame(Probability, Master1MY, Transposon1MY)
# df2 <- melt(df, id.var = "x")
df2 <- melt(df, id.var = "Probability")
# default
ggplot(data = df2, aes(x = Probability, y = value, col = variable)) +
geom_point()
# change legend name and labels, see previous edit using 'scale_colour_discrete'
# set manual colours scale using 'scale_colour_manual'.
ggplot(data = df2, aes(x = Probability, y = value, col = variable)) +
geom_point() +
scale_colour_manual(values = c("red","blue"), name = "Type of sample", labels = c("Master", "Transposon"))
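Since the question actually has two data frames with the same columns rather than one wide data frame, another way to reach the long format is to stack them with rbind() and add a column naming the source. The two data frames below are simulated stand-ins, because only fragments of the real data were posted, and the column name Source is an assumption made for this sketch.

```r
library(ggplot2)

# Simulated stand-ins for the OP's two data frames (same columns in both)
set.seed(1)
Master1MY <- data.frame(Probability = 1:100,
                        Stats = rnorm(100, mean = 600, sd = 20))
Transposon1MY <- data.frame(Probability = 1:100,
                            Stats = rnorm(100, mean = 100, sd = 10))

# Stack the two data frames and record where each row came from
both <- rbind(transform(Master1MY, Source = "Master"),
              transform(Transposon1MY, Source = "Transposon"))

p <- ggplot(both, aes(x = Probability, y = Stats, colour = Source)) +
  geom_point(size = 1) +
  # Named values tie each label to its colour regardless of level order
  scale_colour_manual(values = c(Master = "red", Transposon = "blue"),
                      name = "Type of sample") +
  labs(title = "10,000bp and 1MY", x = "Probability", y = "Stats")
print(p)
```

Because the legend labels come from the values of Source, no separate labels argument is needed.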

Resources