Connect geom_line only between specified factors - r

I have a dataset that has diameter values for 4 treatment groups for several different months. I am plotting Diameter ~ Treatment for each month, as well as the Diameter changes between months ~ Treatment.
Dataset looks like this:
# the data that contains diameter for each month and diameter differences between months
> head(gatheredDiameterAndTreatmentData)
Treatment Month Diameter
1 Aux_Drop Diameter_mm.Sep01 55.88
2 Aux_Spray Diameter_mm.Sep01 63.50
3 DMSO Diameter_mm.Sep01 66.04
4 Water Diameter_mm.Sep01 43.18
5 Aux_Drop Diameter_mm.Sep01 38.10
6 Aux_Spray Diameter_mm.Sep01 76.20
# data that contains mean diameter and mean diameter changes for each month
> head(subMeansDiameter)
Treatment Month Diameter SEdiam
1 Aux_Drop Diameter_mm.Dec 83.63857 29.62901
2 Aux_Drop Diameter_mm.Feb01 101.20923 24.84024
3 Aux_Drop Diameter_mm.Feb02 110.00154 22.51364
4 Aux_Drop Diameter_mm.Jan 93.00308 25.13485
5 Aux_Drop Diameter_mm.Mar 116.84000 22.19171
6 Aux_Drop Diameter_mm.Nov01 74.50667 17.40454
Here is my code:
# assign the factors name to pick
factorsOnXaxis.DiameterByMonth = c(
"Diameter_mm.Sep01", "DiameterDiff.Sep01ToDec", "Diameter_mm.Dec", "DiameterDiff.DecToMar", "Diameter_mm.Mar")
# assign name to above factors
factorsOnXaxisName = c('Sep','Dec-Sep','Dec', 'Mar-Dec', 'Mar')
# start plotting
gatheredDiameterAndTreatmentData %>%
subset(Diameter != "NA") %>%
ggplot(aes(x = factor(Month), y = Diameter)) +
geom_point(aes(colour = Treatment), na.rm = TRUE,
position = position_dodge(width = 0.2)) +
geom_point(data = subMeansDiameter, size = 4, aes(colour = Treatment),
na.rm = TRUE, position = position_dodge(width = 0.2)) +
theme_bw() + # remove background
# add custom color to the "Treatment" levels
scale_colour_manual(
values = c("Aux_Drop" = "Purple", "Aux_Spray" = "Red",
"DMSO" = "Orange", "Water" = "Green")) +
# rearrange the x-axis
scale_x_discrete(limits = factorsOnXaxis.DiameterByMonth, labels = factorsOnXaxisName) +
# to connect the "subMeans - Diameter" values across time points
geom_line(data = subMeansDiameter, aes(
x = Month, y = Diameter, group = Treatment, colour = Treatment),
position = position_dodge(width = 0.2))
Which gives me a plot like this:
Instead of geom_line connecting every time point, I want the lines to join only specified x-axis factors, i.e.
between Sep, Dec and Mar
between Dec-Sep and Mar-Dec
I tried to manipulate the code line that uses geom_line as:
geom_line(data = subMeansDiameter, aes(
x = c("DiameterDiff.Sep01ToDec", "DiameterDiff.DecToMar"), y = Diameter, group = Treatment, colour = Treatment),
position = position_dodge(width = 0.2))
to connect the line between Dec-Sep to Mar-Dec.
But this does not work. How can I change my code?
Here are the data files, which I stored as *.tsv:
gatheredDiameterAndTreatmentData = http://s000.tinyupload.com/index.php?file_id=38251290073324236098
subMeans = http://s000.tinyupload.com/index.php?file_id=93947954496987393129

You need to define the groups explicitly here, because colour alone is not enough.
Your example is not reproducible, but here is something that will give you the idea. First, a plot with no explicit group:
ggplot(iris,aes(Sepal.Width, Sepal.Length, color = Species)) + geom_line()
And now one with a group aesthetic. I have split the data using the values of Sepal.Length, but you will most likely use an ifelse depending on the month:
ggplot(iris,aes(Sepal.Width, Sepal.Length, color = Species,
group = interaction(Species, Sepal.Length > 5.5))) +
geom_line()
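Applied to the question's data, the same idea might look like the sketch below. It is only an illustration under assumptions: the toy data frame `means` stands in for `subMeansDiameter` (its values are made up), and the helper column `series` is a hypothetical name, not something from the original code.

```r
library(ggplot2)

# Toy stand-in for subMeansDiameter (values are made up for illustration).
months <- c("Diameter_mm.Sep01", "DiameterDiff.Sep01ToDec", "Diameter_mm.Dec",
            "DiameterDiff.DecToMar", "Diameter_mm.Mar")
means <- expand.grid(Treatment = c("Aux_Drop", "Water"), Month = months,
                     stringsAsFactors = FALSE)
means$Diameter <- c(60, 45, 20, 15, 80, 60, 30, 25, 110, 85)

# Hypothetical helper column: "diff" for the DiameterDiff.* months,
# "diameter" for the Diameter_mm.* months.
means$series <- ifelse(grepl("^DiameterDiff", means$Month), "diff", "diameter")

p <- ggplot(means, aes(x = Month, y = Diameter, colour = Treatment)) +
  geom_point(size = 4) +
  # One line per Treatment x series, so Sep, Dec and Mar are joined
  # separately from Dec-Sep and Mar-Dec.
  geom_line(aes(group = interaction(Treatment, series))) +
  scale_x_discrete(limits = months,
                   labels = c("Sep", "Dec-Sep", "Dec", "Mar-Dec", "Mar"))
```

Printing `p` draws the plot. The dodge from the question is left out to keep the sketch short; it can be added back to both layers with the same `position_dodge()` width.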

Related

how to add significance letters from emmeans to a plot with fitted values

I have a dataset that looks like this with 3 more levels for scarification. Germination is my response variable.
scarification  time  germination
Water          0     0
Water          2     0
Water          4     8
Water          8     23
Ethanol        0     0
Ethanol        2     18
Ethanol        4     19
Ethanol        8     22
I have made a glm for the data and plotted the fitted values, and done pairwise contrasts using emmeans. I'd like to add letters to my bar chart to indicate letters of significance, but am having trouble extracting cld data as the cld function does not work with emmGrid objects, and the variable names used in emmeans are different to those used in the plot. I have tried renaming the variables but that does not work. I have also tried using geom_signif but that does not seem to work either.
geom_signif(comparisons = em,
+ test = "emmeans",
+ map_signif_level = TRUE)
Warning message:
Computation failed in `stat_signif()`
Caused by error in `mapped_discrete()`:
! Can't convert `x` <list> to <double>.
Here is the code I have so far
#make a glm
summary(mod_8 <- glm(cbind(germination, total - germination) ~ scarification*time, data = df, family = binomial))
# make a new df with the predicted values from the model, specifying for stratification to just do 0, 2, 4, and 8 from the continuous variable
mydf <- ggpredict(mod_8, terms = c("time [0,2,4,8]", "scarification"))
#add time as a factor to the new df
mydf$x_fact <- as.factor(mydf$x)
#get contrast values
em <- emmeans(mod_8, ~scarification + time,
at = list(time = c(0, 2, 4, 8)),
trans = "response") %>%
contrast(interaction = c("pairwise", "pairwise"),
by = "time")
#make a grouped bar chart with scarification (group) on the x axis, predicted on the y axis, and grouped by the factor version of time (x_fact)
ggplot(mydf, aes(x = group, y = predicted, fill = x_fact)) +
geom_col(position = "dodge") +
geom_bar(stat = "identity", position = "dodge") +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), position = position_dodge(width = 0.9)) +
labs(x = "Scarification", y = "Predicted Germination Proportion", fill = "Time") +
ggtitle("Grouped Bar Chart of Germination by Scarification and Time")
If anyone has any ideas I would appreciate it.

R - (ggplot2 library) - Legends not showing on graphs

What I'm doing
I'm using a library for R called ggplot2, which allows for a lot of different options for creating graphics and other things. I'm using that to display two different data sets on one graph with different colours for each set of data I want to display.
The Problem
I'm also trying to get a legend to show up in my graph that tells the user which set of data corresponds to which colour. So far, I've not been able to get it to show.
What I've tried
I've set the legend position to top/bottom/left/right to make sure nothing was setting its position to none by default, which would have hidden it.
The Code
# PDF/Plot generation
pdf("activity-plot.pdf")
ggplot(data.frame("Time"=times), aes(x=Time)) +
#Data Set 1
geom_density(fill = "#1A3552", colour = "#4271AE", alpha = 0.8) +
geom_text(x=mean(times)-1, y=max(density(times)$y/2), label="Mean {1} Activity", angle=90, size = 4) +
geom_vline(aes(xintercept=mean(times)), color="cyan", linetype="dashed", size=1, alpha = 0.5) +
# Data Set 2
geom_density(data=data.frame("Time"=timesSec), fill = "gray", colour = "orange", alpha = 0.8) +
geom_text(x=mean(timesSec)-1, y=max(density(timesSec)$y/2), label="Mean {2} Activity", angle=90, size = 4) +
geom_vline(aes(xintercept=mean(timesSec)), color="orange", linetype="dashed", size=1, alpha = 0.5) +
# Main Graph Info
labs(title="Activity in the past 48 hours", subtitle="From {DATE 1} to {DATE 2}", caption="{LOCATION}") +
scale_x_continuous(name = "Time of Day", breaks=seq(c(0:23))) +
scale_y_continuous(name = "Activity") +
theme(legend.position="top")
dev.off()
Result
As pointed out by @Ben, you should pass the color inside aes() in order to get the legend displayed.
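A minimal sketch of that point, assuming made-up data in place of the question's `times` and `timesSec`. The legend labels "Set 1" and "Set 2" and the legend title are hypothetical names chosen for illustration.

```r
library(ggplot2)

# Made-up data standing in for the question's `times` and `timesSec`.
set.seed(1)
times    <- sample(0:23, 200, replace = TRUE)
timesSec <- sample(0:23, 200, replace = TRUE)

# Because fill is mapped inside aes(), ggplot2 builds a legend for it.
# scale_fill_manual() then assigns the actual colours to the two labels.
p <- ggplot() +
  geom_density(data = data.frame(Time = times),
               aes(x = Time, fill = "Set 1"), alpha = 0.8) +
  geom_density(data = data.frame(Time = timesSec),
               aes(x = Time, fill = "Set 2"), alpha = 0.8) +
  scale_fill_manual(name = "Data set",
                    values = c("Set 1" = "#1A3552", "Set 2" = "gray")) +
  theme(legend.position = "top")
```

A fill given outside aes(), as in the question, is a fixed setting rather than a mapping, which is why no legend appeared there.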
However, a better way to get a ggplot is to merge your two values "Time" and "Timesec" into a single dataframe and reshape your dataframe into a longer format. Here, to illustrate this, I created this dummy dataframe:
Time = sample(1:24, 200, replace = TRUE)
Timesec = sample(1:24, 200, replace = TRUE)
df <- data.frame(Time, Timesec)
Time Timesec
1 22 23
2 21 9
3 19 9
4 10 6
5 7 24
6 15 9
... ... ...
So, the first step is to reshape your dataframe into a longer format. Here, I'm using pivot_longer function from tidyr package:
library(tidyr)
library(dplyr)
df %>% pivot_longer(everything(), names_to = "var",values_to = "val")
# A tibble: 400 x 2
var val
<chr> <int>
1 Time 22
2 Timesec 23
3 Time 21
4 Timesec 9
5 Time 19
6 Timesec 9
7 Time 10
8 Timesec 6
9 Time 7
10 Timesec 24
# … with 390 more rows
To add geom_vline and geom_text based on the mean of your values, an easy way is to create a second dataframe that gathers the mean and the maximal density values to be plotted:
library(tidyr)
library(dplyr)
df_lab <- df %>% pivot_longer(everything(), names_to = "var",values_to = "val") %>%
group_by(var) %>%
summarise(Mean = mean(val),
Density = max(density(val)$y))
# A tibble: 2 x 3
var Mean Density
<chr> <dbl> <dbl>
1 Time 11.6 0.0555
2 Timesec 12.1 0.0517
So, using df and df_lab, you can generate your entire plot. Here, we pass the color and fill arguments inside aes() and use scale_color_manual and scale_fill_manual to set the appropriate colors:
library(dplyr)
library(tidyr)
library(ggplot2)
df %>% pivot_longer(everything(), names_to = "var",values_to = "val") %>%
ggplot(aes(x = val, fill = var, colour = var))+
geom_density(alpha = 0.8)+
scale_color_manual(values = c("#4271AE", "orange"))+
scale_fill_manual(values = c("#1A3552", "gray"))+
geom_vline(inherit.aes = FALSE, data = df_lab,
aes(xintercept = Mean, color = var), linetype = "dashed", size = 1,
show.legend = FALSE)+
geom_text(inherit.aes = FALSE, data = df_lab,
aes(x = Mean-0.5, y = Density/2, label = var, color = var), angle = 90,
show.legend = FALSE)+
labs(title="Activity in the past 48 hours", subtitle="From {DATE 1} to {DATE 2}", caption="{LOCATION}") +
scale_x_continuous(name = "Time of Day", breaks=seq(c(0:23))) +
scale_y_continuous(name = "Activity") +
theme(legend.position="top")
Does this answer your question?

How to customise the x-axis line in ggplot2 so that it is broken according to factors

I am struggling to replicate the x-axis design of a figure I have seen. Is it possible to use ggplot2 to recreate the x-axis?
I have tried to use the lemon package, but this didn't quite replicate the axis as I wanted.
I want the x-axis to look like the image on the left in this
EDIT
Apologies for any confusion, I have now edited in some example data and code as requested.
tidy_data:
Replicate Group Time
1 Control 0.09997222
2 Control 0.04466667
3 Control 0.08608333
4 Control 0.10712500
5 Control 0.11410000
6 Control 0.69333333
7 Control 0.42383333
8 Control 0.06105556
9 Control 0.08676667
1 Treatment 0.13700000
2 Treatment 0.02983333
3 Treatment 0.49608333
4 Treatment 0.97858333
5 Treatment 0.70900000
6 Treatment 0.18683333
7 Treatment 0.45283333
8 Treatment 1.30220833
9 Treatment 1.39908333
results_tbl:
mean_time sem upper_sem lower_sem treatment
0.1534459 0.03681368 0.1902596 0.1166323 Control
0.8238021 0.15860139 0.9824035 0.6652007 Treatment
Plot Code:
library(ggplot2)
library(ggbeeswarm) # geom_quasirandom()
library(cowplot)    # theme_cowplot()
figure = ggplot() +
geom_quasirandom(
data = tidy_data,
aes(x = Group, y = Time, colour = Replicate),
size = 5,
varwidth = TRUE
) +
geom_point(
data = results_tbl,
aes(x = treatment, y = mean_time),
colour = "black",
size = 4
) +
geom_errorbar(
data = results_tbl,
aes(x = treatment, ymin = lower_sem, ymax = upper_sem),
alpha = 1,
size = 0.1,
width = 0.1,
colour = "black"
) +
scale_y_continuous(breaks = scales::pretty_breaks()) +
labs(x = "", y = "Mean time spent in zone ± SE (mins)") +
theme_cowplot(font_size = 16, line_size = 1) +
theme(axis.title.y = element_text(vjust = 2))

Plot grouped barplot with absolute and percent values + labels

I am quite new to R and especially to ggplot. For my next result I think I have to switch from plot() to ggplot(), and this is where I need your help:
I have a dataframe with numeric values. One column is an absolute number, the other is the corresponding percentage value. I have 3 of these two-column indicators: a, b and c.
The row names are the 7 observations and are stored in the first column "X".
I want to plot them in a kind of grouped barplot, where the absolute and percent columns sit next to each other for each of the 3 indicators.
Sample dataframe:
df = data.frame(X = c("e 1","e 1,5","e 2","e 2,5","e 3","e 3,5","e 4"),
a_abs=c(-0.3693,-0.0735,-0.019,0.0015,0,-0.0224,-0.0135),
a_per=c(-0.4736,-0.0943,-0.0244,0.0019,0,-0.0287,-0.0173),
b_abs=c(-0.384,-0.0733,-0.0173,0.0034,0,-0.0204,-0.0179),
b_per=c(-0.546,-0.1042,-0.0246,0.0048,0,-0.029,-0.0255),
c_abs=c(-0.3876,-0.0738,-0.019,0.0015,0,-0.0225,-0.0137),
c_per=c(-0.4971,-0.0946,-0.0244,0.0019,0,-0.0289,-0.0176))
Thanks to @jonspring I got the following plot by using this code:
df3 <- df %>%
gather(column, value, -X) %>%
mutate(group = str_sub(column, end = 2),
stat = str_sub(column, start = 4)) %>%
select(-column) %>%
spread(stat, value) %>%
mutate(combo_label = paste(sep="\n",
scales::comma(abs, accuracy = 0.001),
scales::percent(per, accuracy = 0.01)))
df3$group = gsub(df3$group,pattern = "CK",replacement = "Cohen's\nKappa")
df3$group = gsub(df3$group,pattern = "JA",replacement = "Jaccard")
df3$group = gsub(df3$group,pattern = "KA",replacement = "Krippen-\ndorff's Alpha")
crg = ifelse(df3$abs< 0,"red","darkgreen")
ggplot(df3, aes(group, abs, label = combo_label)) +
geom_segment(aes(xend = group,
yend = 0),
color = crg) +
geom_point() +
geom_text(vjust = 1.5,
size = 3,
lineheight = 1.2) +
scale_y_continuous(expand = c(0.2,0)) +
facet_grid(~X) +
labs(x= "Exponent", y = "Wert")
plot output
When I zoom in so that the positive values are visible, the labels are drawn on top of the segments. How can I place them above or below depending on whether the value is positive or negative?
Zoom with coord_cartesian(ylim = c(-0.015,0.005))
zoomed plot
Thank you for your helping hands.
EDIT: I have found the solution already. As with the color change from red to green, I used ifelse for the vjust parameter.
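The ifelse approach described in the EDIT could be sketched as follows. This is an assumption about what it might look like, not the asker's actual code: `toy` is a made-up data frame whose `abs` column mirrors the one in df3, and here vjust is mapped inside aes() so that it stays aligned with the rows when facets are used.

```r
library(ggplot2)

# Made-up values; `abs` mirrors the column name used in df3 above.
toy <- data.frame(group = c("a", "b", "c"),
                  abs   = c(-0.019, 0.0015, -0.0135))
toy$label <- as.character(toy$abs)

p <- ggplot(toy, aes(group, abs, label = label)) +
  geom_segment(aes(xend = group, yend = 0),
               colour = ifelse(toy$abs < 0, "red", "darkgreen")) +
  geom_point() +
  # vjust > 1 pushes the text below the point (negative values),
  # vjust < 0 pushes it above the point (positive values).
  geom_text(aes(vjust = ifelse(abs < 0, 1.5, -0.5)), size = 3)
```

The values 1.5 and -0.5 are arbitrary offsets and can be tuned to the text size.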
There are many ways to display this sort of data with ggplot. I highly recommend you check out https://r4ds.had.co.nz/data-visualisation.html if you haven't already.
One suggestion you'll find there is that ggplot almost always works better if you first convert your data into long (aka "tidy") form. This puts each of the dimensions of the data into its own column, so that you can map the dimension to a visual aesthetic. Here's one way to do that:
library(tidyverse)
df2 <- df %>%
gather(column, value, -X) %>%
mutate(group = str_sub(column, end = 1),
stat = str_sub(column, start = 3),
value_label = if_else(stat == "per",
scales::percent(value, accuracy = 0.1),
scales::comma(value, accuracy = 0.01)))
Now, the group a/b/c is in its own column, as is the type of data abs/per, the values are all together in one column, and we also have text labels that suit the type of data.
> head(df2)
X column value group stat value_label
1 e 1 a_abs -0.3693 a abs -0.37
2 e 1,5 a_abs -0.0735 a abs -0.07
3 e 2 a_abs -0.0190 a abs -0.02
4 e 2,5 a_abs 0.0015 a abs 0.00
5 e 3 a_abs 0.0000 a abs 0.00
6 e 3,5 a_abs -0.0224 a abs -0.02
With that out of the way, it's simpler to try out different combinations of ggplot options, which can help highlight different comparisons within the data.
For instance, if you want to compare the different observations within each group, you could put each group into a facet, and each observation along the x axis:
ggplot(df2, aes(X, value, label = value_label)) +
geom_segment(aes(xend = X, yend = 0), color = "blue") +
geom_point() +
geom_text(vjust = 2, size = 2) +
facet_grid(stat~group)
Or if you want to highlight how the different groups compared within each observation, you could swap them, like this:
ggplot(df2, aes(group, value, label = value_label)) +
geom_segment(aes(xend = group, yend = 0), color = "blue") +
geom_point() +
geom_text(vjust = 2, size = 2) +
facet_grid(stat~X)
You might also try combining the abs and per data, since they only vary slightly based on the different denominators applicable to each group and/or observation. To do that, it might be simpler to transform the data to keep each abs and per together:
df3 <- df %>%
gather(column, value, -X) %>%
mutate(group = str_sub(column, end = 1),
stat = str_sub(column, start = 3)) %>%
select(-column) %>%
spread(stat, value) %>%
mutate(combo_label = paste(sep="\n",
scales::comma(abs, accuracy = 0.01),
scales::percent(per, accuracy = 0.1)))
ggplot(df3, aes(group, abs, label = combo_label)) +
geom_segment(aes(xend = group, yend = 0), color = "blue") +
geom_point() +
geom_text(vjust = 1.5, size = 2, lineheight = 0.8) +
scale_y_continuous(expand = c(0.2,0)) +
facet_grid(~X)

Having trouble plotting multiple data sets and their confidence intervals on the same GGplot. Data Frame included

First off, here is my data frame:
> df.combined
MLSupr MLSpred MLSlwr BPLupr BPLpred BPLlwr
1 1.681572 1.392213 1.102854 1.046068 0.8326201 0.6191719
2 3.363144 2.784426 2.205708 2.112885 1.6988250 1.2847654
3 5.146645 4.232796 3.318946 3.201504 2.5999694 1.9984346
4 6.930146 5.681165 4.432184 4.368555 3.6146180 2.8606811
5 8.713648 7.129535 5.545422 5.480557 4.5521112 3.6236659
6 10.497149 8.577904 6.658660 6.592558 5.4896044 4.3866506
7 12.280651 10.026274 7.771898 7.681178 6.3907488 5.1003198
8 14.064152 11.474644 8.885136 8.924067 7.4889026 6.0537381
9 15.847653 12.923013 9.998373 10.125539 8.5444783 6.9634176
10 17.740388 14.429805 11.119222 11.327011 9.6000541 7.8730970
11 19.633122 15.936596 12.240071 12.620001 10.7425033 8.8650055
12 21.525857 17.443388 13.360919 13.821473 11.7980790 9.7746850
13 23.535127 19.010958 14.486789 15.064362 12.8962328 10.7281032
14 25.544397 20.578528 15.612659 16.307252 13.9943865 11.6815215
15 27.553667 22.146098 16.738529 17.600241 15.1368357 12.6734300
16 29.562937 23.713668 17.864399 18.893231 16.2792849 13.6653384
17 31.572207 25.281238 18.990268 20.245938 17.4678163 14.6896948
18 33.581477 26.848807 20.116138 21.538928 18.6102655 15.6816033
19 35.590747 28.416377 21.242008 22.891634 19.7987969 16.7059597
20 37.723961 30.047177 22.370394 24.313671 21.0352693 17.7568676
So, as you can see, I have predicted values along with the upper and lower bounds of their 95% CI. I'd like to plot the lines and their ribbons for MLS and BPL in the same plot, but I'm not quite sure how.
Right now, for a single data set, I am using this command:
ggplot(BULISeason, aes(x = 1:length(BULISeason$`Running fit`), y = `Running fit`)) +
geom_line(aes(fill = "black")) +
geom_ribbon(aes(ymin = `Running lwr`, ymax = `Running upr`, fill = "red"),alpha = 0.25)
Note: The variables are different for the independent data frames.
You can, of course, construct your plot as a series of layers, as you imply in your question. df.combined has no x column, so add a row index first; then you can use the following code:
# add a row index to use on the x axis
df.combined$x <- seq_len(nrow(df.combined))
ggplot(data = df.combined) +
geom_ribbon(aes(x = x, ymin = MLSlwr, ymax = MLSupr),
fill = "blue", alpha = 0.25) +
geom_line(aes(x = x, y = MLSpred), color = "black") +
geom_ribbon(aes(x = x, ymin = BPLlwr, ymax = BPLupr),
fill = "red", alpha = 0.25) +
geom_line(aes(x = x, y = BPLpred), color = "black")
and obtain something like this:
However, reshaping your dataset to a "tidy", or long, format has some advantages. For example, you could map the origin of the predictions to color and the type of prediction to line type in the resulting plot:
You can achieve that using the following code:
library(dplyr)   # mutate()
library(tidyr)   # gather(), separate(), spread()
library(ggplot2)
tidy.data <- df.combined %>%
# add id variable
mutate(x = 1:20) %>%
# reshape to long format
gather("variable", "value", 1:6) %>%
# separate variable names at position 3
separate(variable,
into = c("model", "line"),
sep = 3,
remove = TRUE)
# plot
ggplot(data = tidy.data, aes(x = x,
y = value,
linetype = line,
color = model)) +
geom_line() +
scale_linetype_manual(values = c("dashed", "solid", "dashed"))
You can still use ribbons in your plot by spreading your dataframe back to a wide(r) format:
# back to wide
wide.data <- tidy.data %>%
spread(line, value)
# plot with ribbon
ggplot(data = wide.data, aes(x = x, y = pred)) +
geom_ribbon(aes(ymin = lwr, ymax = upr, fill = model), alpha = .5) +
geom_line(aes(group = model))
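As a side note, the gather/separate/spread round trip can also be written as a single reshaping step. The sketch below swaps in tidyr's pivot_longer() (tidyr 1.0.0 or later) instead of the functions used above. `toy` holds only the first two rows of df.combined, rounded to two decimals, to keep it short.

```r
library(tidyr)

# First two rows of df.combined, rounded, plus a row index.
toy <- data.frame(MLSupr = c(1.68, 3.36), MLSpred = c(1.39, 2.78),
                  MLSlwr = c(1.10, 2.21), BPLupr = c(1.05, 2.11),
                  BPLpred = c(0.83, 1.70), BPLlwr = c(0.62, 1.28))
toy$x <- seq_len(nrow(toy))

# Split each column name after its third character ("MLS" | "upr").
# The special name ".value" turns the second piece into output columns,
# giving one row per x and model with columns upr, pred and lwr.
wide <- pivot_longer(toy, cols = -x,
                     names_to = c("model", ".value"),
                     names_sep = 3)
```

The result has the same shape as wide.data, so it can be passed to the ribbon plot above unchanged.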
Hope this helps!
