Label or Highlight Specific Rows in ggplot2 - r

I have a great looking geom_tile plot, but I need a way to highlight specific rows or label specific rows based on a binary value.
Here is a small subset of data in wide format and resulting output:
df <- structure(list(bin_level = c(0,1), sequence = c("L19088.1", "chr1_43580199_43586187"), X236 = c("G", "."), X237 = c("G", "."), X238 = c("A", "a"),
X239 = c("T", "C"), X240 = c("A", "c"), X241 = c("G", "G"
)), class = "data.frame", row.names = 1:2)
> df
bin_level sequence X236 X237 X238 X239 X240 X241
1 0 L19088.1 G G A T A G
2 1 chr1_43580199_43586187 . . a C c G
The actual dataset is much larger, with 1045 observations of 3096 variables.
My goal is to plot this massive dataset as a heatmap with colors for each different nucleotide and be able to differentiate between rows with bin_levels of 0 and 1.
The following code makes a great plot, but doesn't include the bin_level differences I need to see. I would like to highlight the entire row if the bin_level is 1, but I haven't been able to find anything on how to do such a thing. I am already using nucleotides for the aes fill variable, so I need something else. The best option I've come up with so far is to color the row labels. I used info from this post to try an ifelse statement to color based on the bin_level variable.
The biggest problems here are:
1. The row axis labels are far too long and too numerous to look good.
2. Only 53 of the 1045 rows have a bin_level of 1, so why does there appear to be far more red than that?
3. I want the red labels (bin_level = 1) at the top of the plot, and the mix of black and red makes me think my arrange(bin_level) step isn't working.
Please let me know if you know of a better way to accomplish what I'm trying to accomplish, or can help make my code work better than it is currently. Thank you!
df %>%
## reshape to long table
## (one column each for sequence, position and nucleotide):
pivot_longer(-c("Sequence", "bin_level"), ## stack all columns *except* sequence and bin_level
names_to = 'position',
values_to = 'nucleotide'
) %>%
arrange(bin_level) %>%
## create the plot:
ggplot() +
geom_tile(aes(x = position, y = Sequence, fill = nucleotide),
height = 1 ## adjust to visually separate sequences
) +
scale_fill_manual(values = c('a'='#ea0064', 'c'='#008a3f', 'g'='#116eff',
't'='#cf00dc', '\U00B7'='#000000', 'X' ='#ffffff'
)
) +
labs(x = 'x-axis-title', y='Sequence') +
## remove x-axis (=position) elements: they'll probably be too dense:
theme(axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.ticks.y = element_blank(),
axis.text.y = element_text(colour = ifelse(levels(df$bin_level)==1, "red", "black"))
)

While passing a vector of colors to element_text() is a quick option in some cases, IMHO it is error-prone in general, because the colors have to match the order in which the labels are drawn. Instead I would suggest having a look at the ggtext package, which introduces the theme element element_markdown() and allows styling text with some HTML, CSS and markdown.
Moreover, besides the issue already pointed out by @I_O, another issue is that you combine the data manipulation steps and the plotting code in one pipeline. As a consequence, while you arrange your data by bin_level, the color assignment uses the original, unarranged dataset df, which by the way is still in wide format. That's why I would personally always recommend separating the data wrangling from the plotting, except for very simple cases.
Finally, while you arranged your data by bin_level, what really matters is the order of sequence, i.e. you have to set the order of sequence after arranging, for which I use forcats::fct_inorder.
Note: To make your example more realistic I duplicated your dataset to add two more rows.
library(tidyr)
library(dplyr)
library(ggplot2)
df_long <- df %>%
pivot_longer(-c("sequence", "bin_level"),
names_to = "position",
values_to = "nucleotide"
) %>%
arrange(bin_level) %>%
mutate(
sequence = if_else(bin_level == 1, paste0("<span style='color: red'>", sequence, "</span>"), sequence),
sequence = forcats::fct_inorder(sequence))
ggplot(df_long) +
geom_tile(aes(x = position, y = sequence, fill = nucleotide),
height = 1
) +
scale_fill_manual(values = c(
"a" = "#ea0064", "c" = "#008a3f", "g" = "#116eff",
"t" = "#cf00dc", "\U00B7" = "#000000", "X" = "#ffffff"
)) +
labs(x = "x-axis-title", y = "Sequence") +
theme(
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.ticks.y = element_blank(),
axis.text.y = ggtext::element_markdown()
)
DATA
df <- structure(list(
bin_level = c(0, 1), sequence = c("L19088.1", "chr1_43580199_43586187"), X236 = c("G", "."), X237 = c("G", "."), X238 = c("A", "a"),
X239 = c("T", "C"), X240 = c("A", "c"), X241 = c("G", "G")
), class = "data.frame", row.names = 1:2)
df1 <- structure(list(
bin_level = c(0, 1), sequence = c("L19088.2", "chr1_43580199_43586187.2"), X236 = c("G", "."), X237 = c("G", "."), X238 = c("A", "a"),
X239 = c("T", "C"), X240 = c("A", "c"), X241 = c("G", "G")
), class = "data.frame", row.names = 1:2)
df <- dplyr::bind_rows(df, df1)
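The ordering step can be checked on a toy data frame. This is only a sketch: the names toy and ordered_seq are made up, and base R's factor(x, levels = unique(x)) stands in for forcats::fct_inorder, which yields the same level order for a character vector. A discrete y axis draws the first factor level at the bottom, so after sorting by bin_level the bin_level 1 rows end up at the top.

```r
# Toy data: four sequences, two of them flagged with bin_level 1
toy <- data.frame(
  bin_level = c(0, 1, 0, 1),
  sequence  = c("seq_b", "seq_d", "seq_a", "seq_c"),
  stringsAsFactors = FALSE
)

# Sort by bin_level (order() leaves ties in their original order) ...
toy <- toy[order(toy$bin_level), ]

# ... then freeze that row order into the factor levels
ordered_seq <- factor(toy$sequence, levels = unique(toy$sequence))

# bin_level 0 sequences come first (bottom of the axis), bin_level 1 last (top)
levels(ordered_seq)
```

Sorting alone does not change the plot; it is the level order of the y variable that ggplot2 uses.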

While you arrange the data by bin level before feeding it into ggplot, the plot's vertical arrangement follows the y-value (which is: sequence). You could create a combination of bin_level and sequence to arrange and plot the data by:
df %>%
...
## reformat bin_level to a three-digit character, so that
## 002 properly precedes 011 (otherwise 11 would come before 2)
mutate(dummy = paste(sprintf('%03.0f', bin_level),
Sequence, sep = '_')) %>%
arrange(dummy) %>%
...
## ggplot instructions:
ggplot() + ... +
geom_tile(aes(y = dummy, ...)) +
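## remove the three-digit bin_level prefix ('00x_') for labelling;
## anchoring the pattern keeps underscores inside the sequence names intact:
scale_y_discrete(labels = function(x) sub('^[0-9]{3}_', '', x)) +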
... +
theme(axis.text.y = element_text(
## note: df$bin_level NOT levels(df$bin_level)
colour = ifelse(df$bin_level == 1, "red", "black"))
)
Mind that using element_text() to colour labels might stop working in the future; ggplot2 prints this console warning:
Vectorized input to element_text() is not officially supported.
Results may be unexpected or may change in future versions of ggplot2.
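The prefix-and-strip idea can be tried in isolation on the two sequence names from the question. This is a minimal base R sketch; the variable names are illustrative only. It also shows why the pattern used to strip the prefix matters when the sequence names contain underscores themselves: a greedy pattern such as '.*_' removes everything up to the last underscore.

```r
bin_level <- c(0, 1)
sequence  <- c("L19088.1", "chr1_43580199_43586187")

# Build a sortable key: zero-padded bin_level, underscore, sequence name
dummy <- paste(sprintf("%03.0f", bin_level), sequence, sep = "_")

# Anchored pattern: removes only the leading three digits and underscore
labels_anchored <- sub("^[0-9]{3}_", "", dummy)

# Greedy pattern: removes everything up to the LAST underscore
labels_greedy <- gsub(".*_", "", dummy)

dummy
labels_anchored
labels_greedy
```

With the anchored pattern the original names come back unchanged, whereas the greedy pattern truncates the second name to its last underscore-separated part.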

Related

How can I transform my data frame from wide to long in R?

I have issues with transforming my data frame from wide to long. I'm well aware that there are plenty of excellent vignettes out there that explain gather() or pivot_longer() very precisely (e.g. https://www.storybench.org/pivoting-data-from-columns-to-rows-and-back-in-the-tidyverse/). Nevertheless, I've been stuck for days now and it is driving me crazy. Thus, I decided to ask the internet. You.
I have a data frame that looks like this:
id <- c(1,2,3)
year <- c(2018,2003,2011)
lvl <- c("A","B","C")
item.1 <- factor(c("A","A","C"),levels = lvl)
item.2 <- factor(c("C","B","A"),levels = lvl)
item.3 <- factor(c("B","B","C"),levels = lvl)
df <- data.frame(id,year,item.1,item.2,item.3)
So we have an id variable for each observation (e.g. movies). We have a year variable, indicating when the observation took place (e.g. when the movie was released). And we have three factor variables that assessed different characteristics of the observation (e.g. cast, storyline and film music). Those three factor variables share the same factor levels "A","B" or "C" (e.g. cast of the movie was "excellent", "okay" or "shitty").
But in my wildest dreams, the data would look more like this:
id.II <- c(rep(1, 9), rep(2, 9), rep(3,9))
year.II <- c(rep(2018, 9), rep(2003, 9), rep(2011,9))
item.II <- rep(c(c(1,1,1),c(2,2,2),c(3,3,3)),3)
rating.II <- rep(c("A", "B", "C"), 9)
number.II <- c(1,0,0,0,0,1,0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,0,0,1)
df.II <- data.frame(id.II,year.II,item.II,rating.II,number.II)
So now the data frame would be way more useable for further analysis. For example, the next step would be to calculate for each year the number (or even better percentage) of movies that were rated as "excellent".
year.III <- factor(c(rep(2018, 3), rep(2003, 3), rep(2011,3)))
item.III <- factor(rep(c(1, 2, 3), 3))
number.A.III <- c(1,0,0,1,0,0,0,1,0)
df.III <- data.frame(year.III,item.III,number.A.III)
ggplot(data=df.III, aes(x=year.III, y=number.A.III, group=item.III)) +
geom_line(aes(color=item.III))+
geom_point(aes(color=item.III))+
theme(panel.background = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
legend.position = "bottom")+
labs(colour="Item")
Or even more important to me, show for each item (cast, storytelling, film music) the percentage of being rated as "excellent", "okay" and "shitty".
item.IV <- factor(rep(c(c(1,1,1),c(2,2,2),c(3,3,3)),3))
rating.IV <- factor(rep(c("A", "B", "C"), 9))
number.IV <- c(2,0,1,1,1,1,0,2,1)
df.IV <- data.frame(item.IV,rating.IV,number.IV)
df.IV
ggplot(df.IV,aes(fill=rating.IV,y=number.IV,x=item.IV))+
geom_bar(position= position_fill(reverse = TRUE), stat="identity")+
theme(axis.title.y = element_text(size = rel(1.2), angle = 0),
axis.title.x = element_blank(),
panel.background = element_blank(),
legend.title = element_blank(),
legend.position = "bottom")+
labs(x = "Item")+
coord_flip()+
scale_x_discrete(limits = rev(levels(df.IV$item.IV)))+
scale_y_continuous(labels = scales::percent)
My primary question is: How do I transform the data frame df into df.II?
That would make my day. Wrong. My weekend.
And if you could then also give a hint how to proceed from df.II to df.III and df.IV that would be absolutely mindblowing. However, I don't want to burden you too much with my problems.
Best wishes
Jascha
Does this achieve what you need?
library(tidyverse)
df_long <- df %>%
pivot_longer(cols = item.1:item.3, names_to = "item", values_to = "rating") %>%
mutate(
item = str_remove(item, "item.")
)
df2 <- crossing(
df_long,
rating_all = unique(df_long$rating)
) %>%
mutate(n = rating_all == rating) %>%
group_by(id, year, item, rating_all) %>%
summarise(n = sum(n))
df3 <- df2 %>%
filter(item == "3")
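As a cross-check for the counts the question asks for in df.IV, a base R sketch can tabulate the ratings per item directly from the wide data, because the three item columns share the same factor levels. The object names counts and shares are made up here, and this is an alternative route rather than part of the answer above.

```r
lvl <- c("A", "B", "C")
df <- data.frame(
  id     = c(1, 2, 3),
  year   = c(2018, 2003, 2011),
  item.1 = factor(c("A", "A", "C"), levels = lvl),
  item.2 = factor(c("C", "B", "A"), levels = lvl),
  item.3 = factor(c("B", "B", "C"), levels = lvl)
)

# table() on a factor counts every level, including levels with zero hits,
# so sapply() returns a 3 x 3 matrix: ratings in rows, items in columns
counts <- sapply(df[c("item.1", "item.2", "item.3")], table)

# Column-wise proportions: share of each rating within each item
shares <- prop.table(counts, margin = 2)

counts
shares
```

The columns of counts match number.IV from the question (2, 0, 1 for item 1; 1, 1, 1 for item 2; 0, 2, 1 for item 3).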

Add ticks in-between discrete groups on x-axis

I want to replace one of my grouped boxplots (below) with a before-after plot, but keep it grouped. This one was made using ggboxplot() from ggpubr. I know there's also ggpaired(), but I couldn't manage to make it grouped like this one.
Thanks to this question I was able to create a grouped before-after graph like this one. I would now like to change the axis from 4 marks to just 2 (just "yes" and "no", since "before" and "after" are still in the legend).
Here's my code with dummy data:
library(tidyverse)
set.seed(123)
data.frame(ID = rep(LETTERS[1:10], 2),
consent = rep(sample(c("Yes", "No"), 10, replace = T), 2),
height = sample(rnorm(20, 170, sd = 10)),
ind = rep(c("before", "after"), each = 2)
) %>%
ggplot(aes(x = interaction(ind, consent), y = height, color = ind))+
geom_point()+
geom_line(aes(group = interaction(ID, consent)), color = "black")+
scale_x_discrete("response")
Is it even possible to reduce the number of categories on the axis? Or can I create a grouped plot using ggpaired(), but without using facets?
One solution is to create a dummy numeric variable (with a position in between before and after) and put it on the x-axis. Then you can change its labels.
# Generate OP data
library(tidyverse)
set.seed(123)
df <- data.frame(ID = rep(LETTERS[1:10], 2),
consent = rep(sample(c("Yes", "No"), 10, replace = T), 2),
height = sample(rnorm(20, 170, sd = 10)),
ind = rep(c("before", "after"), each = 2)
)
df$name <- paste(df$consent, df$ind)
# Generate dummy numeric variable for `name` combinations
foo <- data.frame(name = c("Yes before", "Yes", "Yes after",
"No before", "No", "No after"),
X = 1:6)
# name X
# 1 Yes before 1
# 2 Yes 2
# 3 Yes after 3
# 4 No before 4
# 5 No 5
# 6 No after 6
And now we just need to map name to X and put it on x-axis:
df <- merge(foo, df)
ggplot(df, aes(X, height))+
geom_point(aes(color = ind)) +
geom_line(aes(group = interaction(ID, consent))) +
scale_x_continuous(breaks = c(2, 5), labels = foo$name[c(2, 5)])
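The lookup table foo can also be replaced by a small calculation, which may be handier with more groups. The following base R sketch assumes the same layout as above (three slots per consent group, with "before" in the first slot and "after" in the third); group_start and offset are hypothetical helper names.

```r
consent <- c("Yes", "No", "Yes", "No")
ind     <- c("before", "before", "after", "after")

# Each consent group occupies three consecutive x positions
group_start <- c(Yes = 0, No = 3)

# Within a group: before = slot 1, (axis label = slot 2), after = slot 3
offset <- c(before = 1, after = 3)

# Named-vector lookup, then drop the names to get plain numeric positions
X <- unname(group_start[consent] + offset[ind])
X

# The tick marks sit in the middle slot of each group
breaks <- unname(group_start + 2)
breaks
```

The resulting positions agree with the X column of foo, and breaks reproduces the c(2, 5) passed to scale_x_continuous().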
@camille made me think about a facet-based solution. Apparently, it is possible to put facet labels not just at the bottom of the plot, but even under the axis. This solved my problem without having to modify my dataframe:
library(ggpubr) #for theme_pubr and JCO palette
ggplot(df, aes(x = ind, y = height, group = ID))+
geom_point(aes(color = ind), size = 3)+
geom_line()+
labs(y = "Height")+
facet_wrap(~ consent,
strip.position = "bottom", ncol = 5)+ #put facet label to the bottom
theme_pubr()+
color_palette("jco")+
theme(strip.placement = "outside", #move the facet label under axis
strip.text = element_text(size = 12),
strip.background = element_blank(),
axis.title.x = element_blank(),
legend.position = "none")
Result with dataframe from the question:

ggplot geom_bar where x = multiple columns

How can I go about making a bar plot where the X comes from multiple values of a data frame?
Fake data:
data <- data.frame(col1 = rep(c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C")),
col2 = rep(c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015)),
col3 = rep(c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up")),
col4 = rep(c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")))
What I'm trying to do is plot the number (and, ideally, the percentage) of Y's and N's in col4, grouped by col1, col2, and col3.
Overall, if there are 50 rows and 25 of the rows have Y's, I should be able to make a graph that looks like this:
I know a basic barplot with ggplot is:
ggplot(data, aes(x = col1, fill = col4)) + geom_bar()
I'm not looking for how many of col4 is found per col3 by col2, though, so facet_wrap() isn't the trick, I think, but I don't know what to do instead.
You need to first convert your data frame into a long format, and then use the created variable to set the facet_wrap().
data_long <- tidyr::gather(data, key = type_col, value = categories, -col4)
ggplot(data_long, aes(x = categories, fill = col4)) +
geom_bar() +
facet_wrap(~ type_col, scales = "free_x")
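Since tidyr::gather() is superseded, the same reshaping can be sketched with tidyr::pivot_longer(). One difference to be aware of: the stacked columns mix character and numeric values, so they are converted to character explicitly here. The three-row data frame is a shortened stand-in for the data in the question.

```r
library(tidyr)

data <- data.frame(
  col1 = c("A", "B", "C"),
  col2 = c(2012, 2012, 2013),
  col3 = c("Up", "Down", "Up"),
  col4 = c("Y", "N", "Y")
)

data_long <- pivot_longer(
  data,
  cols = -col4,                # stack every column except col4
  names_to = "type_col",
  values_to = "categories",
  # col2 is numeric while col1 and col3 are character, so convert
  # all values to character before they are combined into one column
  values_transform = list(categories = as.character)
)

data_long
```

The result has one row per original cell of col1, col2 and col3, and can be passed to the same ggplot() call as data_long above.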
A very rough approximation, hoping it'll spark conversation and/or give enough to start.
Your data is too small to do much, so I'll extend it.
set.seed(2)
n <- 100
d <- data.frame(
cat1 = sample(c('A','B','C'), size=n, replace=TRUE),
cat2 = sample(c(2012L,2013L,2014L,2015L), size=n, replace=TRUE),
cat3 = sample(c('^','v','<','>'), size=n, replace=TRUE),
val = sample(c('X','Y'), size=n, replace=TRUE)
)
I'm using dplyr and tidyr here to reshape the data a little:
library(ggplot2)
library(dplyr)
library(tidyr)
d %>%
tidyr::gather(cattype, cat, -val) %>%
filter(val=="Y") %>%
head
# Warning: attributes are not identical across measure variables; they will be dropped
# val cattype cat
# 1 Y cat1 A
# 2 Y cat1 A
# 3 Y cat1 C
# 4 Y cat1 C
# 5 Y cat1 B
# 6 Y cat1 C
The next trick is to facet it correctly:
d %>%
tidyr::gather(cattype, cat, -val) %>%
filter(val=="Y") %>%
ggplot(aes(val, fill=cattype)) +
geom_bar() +
facet_wrap(~cattype+cat, nrow=1)
Depending on what you want here, you can also achieve something like what you want using melt from the reshape package.
(NOTE: this solution is very similar to Phil's, and you could convert it to be just like his if you made col4 your fill instead, didn't filter by only "Y"s and included a facet wrap)
Following on from your data setup:
library(reshape)
#Reshape the data to sort it by all the other column's categories
data$col2 <- as.factor(as.character(data$col2))
breakdown <- melt(data, "col4")
#Our x values are the individual values, e.g. A, 2012, Down.
#Our fill is what we want it grouped by, in this case variable, which is our col1, col2, col3 (default column name from melt)
ggplot(subset(breakdown, col4 == "Y"), aes(x = value, fill = variable)) +
geom_bar() +
# scale_x_discrete(drop=FALSE) +
scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
ylab("Number of Yes's")
I'm not 100% sure what you want, but perhaps this is more like it?
EDIT
To show percentages of Yes's instead we can use ddply from the plyr package to create a data frame which has each of the variables with their yes percentages, then make the barplot plot a value rather than a count.
library(plyr) # ddply() lives in plyr; attach it explicitly in case library(reshape) does not
#ddply applies a function to a data frame grouped by columns.
#In this case we group by variable (col1, col2 or col3) as well as by value.
#The function I apply just calculates the percentage, i.e. number of yeses/number of responses
plot_breakdown <- ddply(breakdown, c("variable", "value"), function(x){sum(x$col4 == "Y")/nrow(x)})
#When we plot we now add y = V1 to plot the percentage response
#Also in geom_bar I've now added stat = 'identity' so it doesn't try to plot counts
ggplot(plot_breakdown, aes(x = value, y = V1, fill = variable)) +
geom_bar(aes(group = factor(variable)), position = "dodge", stat = 'identity') +
scale_x_discrete(drop=FALSE) +
scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
ylab("Percentage of Yes's") +
scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25), labels = c("0%", "25%", "50%", "75%", "100%"))
The last line I've added to the ggplot is to just make the y axis look a bit more percentage-y :)
In the comments you've mentioned you want to do this as the sample sizes are different and you want to give some kind of fair comparison between categories. My advice is to be careful here. Percentages look good, but can really mislead if sample sizes are small. Reporting that 0% answered yes when you only got one response is heavily biased, for example. My advice here would be to either exclude columns with what you deem too small a sample size, or take advantage of the colour field.
#Adding an extra column using ddply again which generates a 1 if the sample size is less than 3, and a 0 otherwise
plot_breakdown <- cbind(plot_breakdown,
too_small = factor(ddply(breakdown, c("variable", "value"), function(x){ifelse(nrow(x)<3,1,0)})[,3]))
#Same ggplot as before, except with a colour variable now too (outside line of bar)
#Because of this I also added a way to customise the colours which display, and the names of the colour legend
ggplot(plot_breakdown, aes(x = value, y = V1, fill = variable, colour = too_small)) +
geom_bar(size = 2, position = "dodge", stat = 'identity') +
scale_x_discrete(drop=FALSE) +
labs(fill = "Variable", colour = "Too small?") +
scale_fill_discrete(labels = c("Letters", "Year", "Direction")) +
scale_colour_manual(values = c("black", "red"), labels = c("3+ response", "< 3 responses")) +
ylab("Percentage of Yes's") +
scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25), labels = c("0%", "25%", "50%", "75%", "100%"))
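The share of Yes answers and the sample size behind it can also be computed without plyr. The following base R sketch does this for col1 only, using the data from the question; the names yes_share, n_resp and summary_tab are made up, and the threshold of 3 responses is the one used above.

```r
data <- data.frame(
  col1 = c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C"),
  col4 = c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")
)

# Mean of a logical vector = proportion of TRUE, computed per category
yes_share <- tapply(data$col4 == "Y", data$col1, mean)

# Number of responses per category (same category order as tapply)
n_resp <- table(data$col1)

summary_tab <- data.frame(
  value     = names(yes_share),
  yes_share = as.numeric(yes_share),
  n         = as.integer(n_resp),
  too_small = as.integer(n_resp) < 3   # flag categories with fewer than 3 responses
)

summary_tab
```

Keeping n next to the percentage makes it easy to drop or flag categories whose percentage rests on very few responses.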
If you actually group your Y's and N's by the other three columns, there will be one observation in each group. However, if you had repeated Y's and N's you could recode them to 1's and 0's, and get the percentage. Here's an example:
library(tidyverse)
data <- data.frame(col1 = rep(c("A", "B", "C", "B", "C", "A", "A", "B", "B", "A", "C")),
col2 = rep(c(2012, 2012, 2012, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015)),
col3 = rep(c("Up", "Down", "Up", "Up", "Down", "Left", "Right", "Up", "Right", "Down", "Up")),
col4 = rep(c("Y", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")))
data %>%
dplyr::group_by(col1,col2,col3) %>%
mutate(col4 = ifelse(col4 == "Y", 1,0)) %>%
dplyr::summarise(percentage = mean(col4)) %>%
ggplot(aes(x = col1, y = percentage, color = as.factor(col2), fill = col3)) +
geom_col(position = position_dodge(width = .5))

ggplot2 in R: geom_segment displays different line than geom_line

Say I have this data frame:
library(data.table) # for as.data.table()
library(ggplot2)
treatment <- c(rep("A",6),rep("B",6),rep("C",6),rep("D",6),rep("E",6),rep("F",6))
year <- as.numeric(c(1999:2004,1999:2004,2005:2010,2005:2010,2005:2010,2005:2010))
variable <- c(runif(6,4,5),runif(6,5,6),runif(6,3,4),runif(6,4,5),runif(6,5,6),runif(6,6,7))
se <- c(runif(6,0.2,0.5),runif(6,0.2,0.5),runif(6,0.2,0.5),runif(6,0.2,0.5),runif(6,0.2,0.5),runif(6,0.2,0.5))
id <- 1:36
df1 <- as.data.table(cbind(id,treatment,year,variable,se))
df1$year <- as.numeric(df1$year)
df1$variable <- as.numeric(df1$variable)
df1$se <- as.numeric(df1$se)
As I mentioned in a previous question (draw two lines with the same origin using ggplot2 in R), I wanted to use ggplot2 to display my data in a specific way.
I managed to do so using the following script:
y1 <- df1[df1$treatment=='A'&df1$year==2004,]$variable
y2 <- df1[df1$treatment=='B'&df1$year==2004,]$variable
y3 <- df1[df1$treatment=='C'&df1$year==2005,]$variable
y4 <- df1[df1$treatment=='D'&df1$year==2005,]$variable
y5 <- df1[df1$treatment=='E'&df1$year==2005,]$variable
y6 <- df1[df1$treatment=='F'&df1$year==2005,]$variable
p <- ggplot(df1,aes(x=year,y=variable,group=treatment,color=treatment))+
geom_line(aes(y = variable, group = treatment, linetype = treatment, color = treatment),size=1.5,lineend = "round") +
scale_linetype_manual(values=c('solid','solid','solid','dashed','solid','dashed')) +
geom_point(aes(colour=factor(treatment)),size=4)+
geom_errorbar(aes(ymin=variable-se,ymax=variable+se),width=0.2,size=1.5)+
guides(colour = guide_legend(override.aes = list(shape=NA,linetype = c("solid", "solid",'solid','dashed','solid','dashed'))))
p+labs(title="Title", x="years", y = "Variable 1")+
theme_classic() +
scale_x_continuous(breaks=c(1998:2010), labels=c(1998:2010),limits=c(1998.5,2010.5))+
geom_segment(aes(x=2004, y=y1, xend=2005, yend=y3),colour='blue1',size=1.5,linetype='solid')+
geom_segment(aes(x=2004, y=y1, xend=2005, yend=y4),colour='blue1',size=1.5,linetype='dashed')+
geom_segment(aes(x=2004, y=y2, xend=2005, yend=y5),colour='red3',size=1.5,linetype='solid')+
geom_segment(aes(x=2004, y=y2, xend=2005, yend=y6),colour='red3',size=1.5,linetype='dashed')+
scale_color_manual(values=c('blue1','red3','blue1','blue1','red3','red3'))+
theme(text = element_text(size=12))
As you can see I used both geom_line and geom_segment to display the lines for my graph.
It's almost perfect but if you look closely, the segments that are drawn (between 2004 and 2005) do not display the same line size, even though I used the same arguments values in the script (i.e. size=1.5 and linetype='solid' or dashed).
Of course I could change manually the size of the segments to get similar lines, but when I do that, segments are not as smooth as the lines using geom_line.
Also, I get the same problem (different line shapes) by including the size or linetype arguments within the aes() argument.
Do you have any idea what causes this difference and how I can get the exact same shapes for both my segments and lines ?
It seems to be an anti-aliasing issue with geom_segment, but that seems like a somewhat cumbersome approach to begin with. I think I have resolved your issue by duplicating the A and B treatments in the original data frame.
# First we are going to duplicate and rename the 'shared' treatments
library(dplyr)
library(ggplot2)
df1 %>%
filter(treatment %in% c("A", "B")) %>%
mutate(treatment = ifelse(treatment == "A",
"AA", "BB")) %>%
bind_rows(df1) %>% # This rejoins with the original data
# Now we create `treatment_group` and `line_type` variables
mutate(treatment_group = ifelse(treatment %in% c("A", "C", "D", "AA"),
"treatment1",
"treatment2"), # This variable will denote color
line_type = ifelse(treatment %in% c("AA", "BB", "D", "F"),
"type1",
"type2")) %>% # And this variable denotes the line type
# Now pipe into ggplot
ggplot(aes(x = year, y = variable,
group = interaction(treatment_group, line_type), # grouping by both linetype and color
color = treatment_group)) +
geom_line(aes(x = year, y = variable, linetype = line_type),
size = 1.5, lineend = "round") +
geom_point(size=4) +
# The rest here is more or less the same as what you had
geom_errorbar(aes(ymin = variable-se, ymax = variable+se),
width = 0.2, size = 1.5) +
scale_color_manual(values=c('blue1','red3')) +
scale_linetype_manual(values = c('dashed', 'solid')) +
labs(title = "Title", x = "Years", y = "Variable 1") +
scale_x_continuous(breaks = c(1998:2010),
limits = c(1998.5, 2010.5))+
theme_classic() +
theme(text = element_text(size=12))
Which will give you the following
My numbers are different since they were randomly generated.
You can then modify the legend to your liking, but my recommendation is to label the lines directly with something like geom_text(), and then be sure to set check_overlap = TRUE (geom_label() does not accept that argument).
Hope this helps!
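One plausible contributor to the rough-looking segments in the question is overplotting: when geom_segment() is given constants inside aes() and inherits the plot data, it draws one copy of the segment per data row, and the stacked copies can look thicker and more jagged than a single line. If the segments are kept, annotate("segment", ...) draws each one exactly once. The sketch below uses a small made-up data frame and counts the rows each layer draws with layer_data().

```r
library(ggplot2)

# Small made-up series, only for illustration
d <- data.frame(
  year     = 1999:2004,
  variable = c(4.1, 4.5, 4.3, 4.8, 4.6, 4.9)
)

# Constants inside aes(): the segment is repeated for every row of d
# (recent ggplot2 versions may warn about this and suggest annotate())
p_aes <- ggplot(d, aes(year, variable)) +
  geom_line() +
  geom_segment(aes(x = 2004, y = 4.9, xend = 2005, yend = 5.2))

# annotate(): the segment is drawn once, independent of the plot data
p_once <- ggplot(d, aes(year, variable)) +
  geom_line() +
  annotate("segment", x = 2004, y = 4.9, xend = 2005, yend = 5.2)

# Number of segments each plot draws in its second layer
n_aes  <- nrow(layer_data(p_aes, 2))
n_once <- nrow(layer_data(p_once, 2))
c(n_aes, n_once)
```

In the question's plot the data has 36 rows, so each geom_segment() call would draw 36 overlapping segments.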

Stacked bar plot in violin plot shape

Maybe this is a stupid idea, or maybe it's a brain wave. I have a dataset of lipid classes in 4 different species. The data is proportional, and the sums are 1000. I want to visualise the differences in proportions for each class in each species. Generally a stacked bar would be the way to go here, but there are several classes, and it becomes uninterpretable since only the bottom class shares a baseline (see below).
And this appears to be the best option of a bad bunch, with pie and donut charts being nothing short of sneered at.
I was then inspired by this creation Symmetrical, violin plot-like histogram?, which creates a sort of stacked distribution violin plot (see below).
I am wondering if this could somehow be converted into a stacked violin, such that each segment represents a whole variable. In the case of my data, species A and D would be 'fat' around the TAG segment and 'skinnier' at the STEROL segment. This way the proportions are depicted horizontally, and always have a common baseline. Thoughts?
Data:
structure(list(Sample = c("A", "A", "A", "B", "B", "B", "C",
"C", "C", "D", "D"), WAX = c(83.7179798600773, 317.364310355766,
20.0147496567679, 93.0194886619568, 78.7886829173726, 79.3445694220837,
91.0020522660375, 88.1542855137005, 78.3313314713951, 78.4449591023115,
236.150030864875), TAG = c(67.4640254081232, 313.243238213156,
451.287867136276, 76.308508343969, 40.127554151831, 91.1910102221636,
61.658394708941, 104.617259648364, 60.7502685224869, 80.8373642262043,
485.88633863193), FFA = c(41.0963382465756, 149.264019576272,
129.672579626868, 51.049208042632, 13.7282635713804, 30.0088572108344,
47.8878116348504, 47.9564218319094, 30.3836532949481, 34.8474205480686,
10.9218910757234), `DAG1,2` = c(140.35876401479, 42.4556176551009,
0, 0, 144.993393432366, 136.722412691012, 0, 140.027443968931,
137.579074961889, 129.935353616471, 46.6128854387559), STEROL = c(73.0144390122309,
24.1680929257195, 41.8258704279641, 78.906816661241, 67.5678558060943,
66.7150537517493, 82.4794113296791, 76.7443442992891, 68.9357008866253,
64.5444668132533, 29.8342694785768), AMPL = c(251.446564854412,
57.8713327050339, 306.155806819949, 238.853696442419, 201.783872969561,
175.935515655693, 234.169038776536, 211.986239116884, 196.931330316831,
222.658181144794, 73.8944654414811), PE = c(167.99718650752,
43.3839497916674, 22.1937177530762, 150.315149187176, 153.632530721031,
141.580725482114, 164.215442147509, 155.113323256627, 143.349000132624,
128.504657216928, 50.6281347160092), PC = c(174.904702096271,
52.2494387772846, 28.8494085790995, 191.038328534942, 190.183655117756,
175.33290326259, 199.2632149392, 175.400682364295, 176.64926273487,
163.075864395099, 66.071984352649), LPC = c(0, 0, 0, 120.508804125665,
109.194191312608, 103.16895230176, 119.324634197247, 0, 107.09037767833,
97.151732936871, 0)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -11L), .Names = c("Sample", "WAX", "TAG",
"FFA", "DAG1,2", "STEROL", "AMPL", "PE", "PC", "LPC"))
This is essentially a horizontal bar plot:
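library(reshape2)
library(ggplot2)
# DF is assumed to hold the data from the question (the structure(...) call above)
DFm <- melt(DF, id.vars = "Sample")
# Mirror the values so that every bar extends equally to both sides of zero
DFm1 <- DFm
DFm1$value <- -DFm1$value
DFm <- rbind(DFm, DFm1)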
ggplot(DFm, aes(x = "A", y = value / 10, fill = variable, color = variable)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
theme_minimal() +
facet_wrap(~ Sample, nrow = 1, switch = "x") +
theme(axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank())
