R: connect points on a graph (ggplot2)

Suppose I have data in the following form:
library(ggplot2)
Data <- data.frame(
"ID" = c("ABC111", "ABC111", "ABC111", "ABC111", "ABC112", "ABC112", "ABC112", "ABC113", "ABC113", "ABC114", "ABC115"),
"color" = c("red", "red", "red", "red", "blue", "blue", "blue", "green", "green", "black", "yellow"),
"start_date" = c("2005/01/01", "2006/01/01", "2007/01/01", "2008/01/01", "2009/01/01", "2010/01/01", "2011/01/01", "2012/01/01", "2013/01/01", "2014/01/01", "2015/01/01"),
"end_date" = c("2005/09/01", "2006/06/01", "2007/04/01", "2008/05/07", "2009/06/01", "2010/10/01", "2011/12/12", "2013/05/01", "2013/06/08", "2015/01/01", "2016/08/09")
)
Data$ID = as.factor(Data$ID)
Data$color = as.factor(Data$color)
Now what I want to do is for each row, plot the start_date and the end_date ... and then connect them with a straight line. I believe this can be done with geom_line() in ggplot2.
I want something that looks like this:
I tried using the following code:
q <- qplot(start_date, end_date, data=Data)
q <- q + geom_line(aes(group = ID))
q
But the graph looks completely different from what I expected.
Can anyone please show me what I am doing wrong?
Thanks

Does the following work for you?
ggplot(data = Data, aes(start_date, end_date, color = ID))+
geom_line(aes(group = ID))+
geom_point()
Or maybe geom_segment?
# Parse the text dates into Date objects and give each row its own y position
Data$start <- as.Date(Data$start_date, format = "%Y/%m/%d")
Data$end <- as.Date(Data$end_date, format = "%Y/%m/%d")
Data$y <- seq_len(nrow(Data))
ggplot(data = Data, aes(x = start, y = y, colour = ID))+
geom_segment(aes(xend = end, yend = y))+
# Mark both ends of each interval
geom_point()+
geom_point(aes(x = end))

Here's a solution using the tidyverse package. I used the number of each row in the original data as the y-axis values in the plot. As these values are meaningless, I removed the y-axis title, labels and ticks from the plot.
library(tidyverse)
Data %>%
# Number each row in its order of appearance and
# save these numbers in a new column named order
rowid_to_column("order") %>%
# Change data from wide to long format
pivot_longer(cols = c(start_date, end_date),
names_to = "date_type",
values_to = "date") %>%
# ggplot: use date as x, order as y, ID as colour and order as group
ggplot(aes(x = date,
y = order,
col = ID,
group = order)) +
# Draw points
geom_point()+
# Draw lines
geom_line() +
# Maybe you want to remove the y axis title, text and ticks
theme(axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
# I rotated the x axis labels to vertical;
# they might be easier to read this way
axis.text.x = element_text(angle = 90, vjust = 0.5))
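The answer above keeps the dates as text, so the x-axis is discrete and the gaps between dates are not drawn to scale. A minimal variation, assuming the dates are always written as YYYY/MM/DD, converts them to Date after pivoting so ggplot2 uses a continuous date axis:

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

# Same values as in the question; dates stored as text in YYYY/MM/DD form
Data <- data.frame(
  ID = c("ABC111", "ABC111", "ABC111", "ABC111", "ABC112", "ABC112", "ABC112",
         "ABC113", "ABC113", "ABC114", "ABC115"),
  start_date = c("2005/01/01", "2006/01/01", "2007/01/01", "2008/01/01", "2009/01/01",
                 "2010/01/01", "2011/01/01", "2012/01/01", "2013/01/01", "2014/01/01",
                 "2015/01/01"),
  end_date = c("2005/09/01", "2006/06/01", "2007/04/01", "2008/05/07", "2009/06/01",
               "2010/10/01", "2011/12/12", "2013/05/01", "2013/06/08", "2015/01/01",
               "2016/08/09"),
  stringsAsFactors = FALSE
)

plot_data <- Data %>%
  rowid_to_column("order") %>%
  pivot_longer(cols = c(start_date, end_date),
               names_to = "date_type", values_to = "date") %>%
  # Parse the text into real dates so the x-axis is continuous and to scale
  mutate(date = as.Date(date, format = "%Y/%m/%d"))

p <- ggplot(plot_data, aes(x = date, y = order, colour = ID, group = order)) +
  geom_point() +
  geom_line() +
  scale_x_date(date_labels = "%Y") +
  theme(axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
print(p)
```

Parsing after pivot_longer() means only one conversion is needed, because both start and end dates end up in the single date column.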

Related

match colours with specific variables

I have a colour data frame that I join with my input data to match the colours to categories. The issue is that when using fill=mycolors the legend displays the colour names, not the names of my categories.
I would like fill to be name_assigned while still matching the colours in mycolors.
df %>%
dplyr::left_join(colors.variable, by="name_assigned") %>%
ggplot(aes(reorder(chr,chr,function(x)-length(x)),y=name_assigned, fill=mycolors)) +
geom_bar(aes(y = (..count..))) +
scale_fill_identity()
You don't need the join: from the colors data set, create a named vector of colors and pass it to scale_fill_manual().
Also, you seem to have swapped x and y coordinates.
library(ggplot2)
set.seed(2022)
df <- data.frame(
chr = rbinom(1e3, 1, 0.5),
name_assigned = sample(letters[1:3], 1e3, TRUE)
)
colors.variable <- data.frame(
name_assigned = letters[1:3],
mycolors = c("pink", "purple", "seagreen")
)
mycolors <- with(colors.variable, setNames(mycolors, name_assigned))
ggplot(df, aes(name_assigned, fill = name_assigned)) +
geom_bar() +
scale_fill_manual(values = mycolors)
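The names are what make this work: scale_fill_manual() matches a named values vector to the data by name, so the rows of the colour table do not need to follow the factor-level order. A small check, assuming the same toy colour table with its rows deliberately reversed:

```r
library(ggplot2)

colors.variable <- data.frame(
  name_assigned = c("c", "b", "a"),          # rows in reverse order on purpose
  mycolors = c("seagreen", "purple", "pink"),
  stringsAsFactors = FALSE
)
mycolors <- with(colors.variable, setNames(mycolors, name_assigned))

df <- data.frame(name_assigned = c("a", "b", "b", "c", "c", "c"),
                 stringsAsFactors = FALSE)
p <- ggplot(df, aes(name_assigned, fill = name_assigned)) +
  geom_bar() +
  scale_fill_manual(values = mycolors)

# Inspect the fill colour ggplot2 actually used for each bar
built <- layer_data(p)
print(built[, c("x", "fill")])
```

Bar "a" still gets pink even though pink is the last row of the colour table; an unnamed vector would instead be applied in level order.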

Trying to change colour of one variable in ggplot geom_bar dependent on the string

I have a for loop to run through a tonne of microbiome data (using phyloseq) and generate plots for multiple experiments.
p <- ggplot(data_M1, aes(x = Sample, y = Abundance, fill = get(i))) +
geom_bar(stat = "identity")+
facet_wrap(vars(Status, Time.Point, Treatment), scales = "free", ncol=2)+
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())+
guides(fill = guide_legend(reverse = TRUE, keywidth = 1, keyheight = 1, title = i))+
ylab(yaxisname)+
ggtitle(plotname)
# ggsave() is called on the finished plot, not added with +
ggsave(ggsavename, p)
Example outcome:
What I am trying to do, though, is make all the "_unclassified" samples/sequencing data grey... so maybe I need some kind of if statement with str_contains?
Happy to dput a reproducible example if required but someone might have a simple solution.
Thank you!
@camille's comment about a minimal reproducible example is germane. We don't need to know anything about your facets, guides or call to ggsave to answer your question.
First, generate some test data
library(tidyverse)
d <- tibble(
Species=rep(c("s__reuteri", "s__guilliermondii",
"o__Clostridiales_unclassified", "k__bacteria_unclassified"),
each=4),
Sample=as.factor(rep(1:4, times=4)),
Abundance=runif(16)
)
Generate custom labels and colours
labels <- unique(d$Species)
# availableColours needs at least as many entries as labels; otherwise ifelse() silently recycles colours
availableColours <- c("red", "blue", "green", "orange", "yellow")
legendColours <- ifelse(str_detect(labels, fixed("unclassified")), "grey", availableColours)
# Name the colours so scale_fill_manual() matches them to Species by name, not by factor-level order
names(legendColours) <- labels
Create the plot
d %>%
ggplot(aes(x=Sample, y=Abundance, fill=Species)) +
geom_bar(stat="identity") +
scale_fill_manual(values=legendColours)
Giving
If you want to "pool" all the unclassified species, then
d1 <- d %>%
mutate(
LegendSpecies=ifelse(
str_detect(
Species,
fixed("unclassified")
),
"Unclassified",
Species
)
)
# Name the colours by legend label so each colour is matched to the right species
legendLabels <- unique(d1$LegendSpecies)
legendColours <- setNames(ifelse(str_detect(legendLabels, fixed("Unclassified")), "grey", availableColours), legendLabels)
d1 %>%
ggplot(aes(x=Sample, y=Abundance, fill=LegendSpecies)) +
geom_bar(stat="identity")+
scale_fill_manual(values=legendColours)
Giving
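Because the question builds plots in a loop, it may help to wrap the colour logic in a function. The sketch below uses a hypothetical helper, grey_unclassified() (not part of phyloseq or stringr), that returns a palette named by label; base grepl(fixed = TRUE) stands in for str_detect(x, fixed(...)). The default palette is an assumption:

```r
library(ggplot2)

# Hypothetical helper: palette named by label, with every label that
# contains "unclassified" coloured grey
grey_unclassified <- function(labels,
                              palette = c("red", "blue", "green", "orange", "yellow")) {
  labels <- unique(as.character(labels))
  # fixed = TRUE: plain substring match, no regular expression
  is_unclassified <- grepl("unclassified", labels, fixed = TRUE)
  n_classified <- sum(!is_unclassified)
  if (n_classified > length(palette)) {
    stop("palette has fewer colours than there are classified labels")
  }
  colours <- rep("grey", length(labels))
  # Classified labels take palette colours in order; grey does not use up a slot
  colours[!is_unclassified] <- palette[seq_len(n_classified)]
  setNames(colours, labels)
}

d <- data.frame(
  Species = rep(c("s__reuteri", "s__guilliermondii",
                  "o__Clostridiales_unclassified", "k__bacteria_unclassified"),
                each = 4),
  Sample = factor(rep(1:4, times = 4)),
  Abundance = runif(16),
  stringsAsFactors = FALSE
)
cols <- grey_unclassified(d$Species)
p <- ggplot(d, aes(x = Sample, y = Abundance, fill = Species)) +
  geom_col() +
  scale_fill_manual(values = cols)
print(cols)
```

Inside the question's loop this would become something like scale_fill_manual(values = grey_unclassified(data_M1[[i]])), assuming column i holds the taxon names.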

How to make a Grouped Bar Plot with a line

I have the following dataset:
HIU,0.0833333333,0,0.35,0.0208333333,0.40625,0,0.21875,0.125,0.078125,0.0104166667,1,0.53125,0.4375
TTHY,0,0,0.8,0,0.5,0,0.7083333333,0.2708333333,0,0.6597222222,0,0.1435185185,0
Full,0.0554986339,0.1034836066,0.4620901639,0.0683060109,0.4961577869,0.0696721311,0.222079918,0.1465163934,0.2085040984,0.0476007514,0.893613388,0.396943306,0.4223872951
I made a grouped bar plot from the HIU and TTHY rows (figure 1), but I want to add a line for the "Full" row, as in the second image.
Figure 1:
Figure 2:
How can I do it with R? This is my current code:
df = read.csv('TTR-HIU/resultados.csv',header=FALSE,colClasses=c("NULL",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
df.bar <- barplot(as.matrix(df[-nrow(df),]),beside=TRUE,col=c("darkblue","red"))
Using ggplot2, you could try something like this:
# put data in data frame:
df <- data.frame(HIU = c(0.0833333333,0,0.35,0.0208333333,0.40625,0,0.21875,0.125,0.078125,0.0104166667,1,0.53125,0.4375),
TTHY = c(0,0,0.8,0,0.5,0,0.7083333333,0.2708333333,0,0.6597222222,0,0.1435185185,0),
Full= c(0.0554986339,0.1034836066,0.4620901639,0.0683060109,0.4961577869,0.0696721311,0.222079918,0.1465163934,0.2085040984,0.0476007514,0.893613388,0.396943306,0.4223872951))
library(ggplot2)
library(tidyr) # to make data long (gather)
# create x-values:
df$x <- as.factor(seq_len(nrow(df)))
# make data long for ggplot2:
df_long <- df %>% gather(key, value, -x)
ggplot() +
# plot bars:
geom_col(data = subset(df_long, key %in% c("HIU", "TTHY")),
mapping = aes(x = x, y = value, fill = key),
position = position_dodge()) +
# plot lines:
geom_line(data = subset(df_long, key == "Full"),
mapping = aes(x = x, y = value, group = key, color = key),
linewidth = 2) + # 'linewidth' replaced 'size' for lines in ggplot2 3.4.0
# make plot look a little like your desired output:
scale_color_manual(values = c("Full" = "yellow")) +
scale_fill_manual(values = c("HIU" = "blue", "TTHY" = "red")) +
theme_minimal() +
theme(axis.title = element_blank(),
legend.title = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
However, you might have to reshape your data into a data frame like the one in this example. If you need further help, use dput() to show exactly what your data looks like.
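Since the question already uses base graphics, the same idea also works without ggplot2: barplot(beside = TRUE) returns the x position of every bar, so the centre of each group is the mean of its two bar positions and lines() can draw Full through those centres. A sketch assuming the CSV holds exactly the three rows shown in the question (inlined here instead of reading TTR-HIU/resultados.csv); it also shows the transpose that gives the one-column-per-series shape used above:

```r
# Inline copy of the three CSV rows from the question
csv_text <- "HIU,0.0833333333,0,0.35,0.0208333333,0.40625,0,0.21875,0.125,0.078125,0.0104166667,1,0.53125,0.4375
TTHY,0,0,0.8,0,0.5,0,0.7083333333,0.2708333333,0,0.6597222222,0,0.1435185185,0
Full,0.0554986339,0.1034836066,0.4620901639,0.0683060109,0.4961577869,0.0696721311,0.222079918,0.1465163934,0.2085040984,0.0476007514,0.893613388,0.396943306,0.4223872951"

# First column becomes the row names (HIU, TTHY, Full)
raw <- read.csv(text = csv_text, header = FALSE, row.names = 1)

# Transpose: one row per group, one column per series
df <- as.data.frame(t(raw))

# HIU and TTHY bars side by side; mids is a 2 x 13 matrix of bar centres
mids <- barplot(t(as.matrix(df[, c("HIU", "TTHY")])), beside = TRUE,
                col = c("darkblue", "red"), ylim = c(0, 1.05))
# Centre of each pair of bars
centers <- colMeans(mids)
lines(centers, df$Full, col = "yellow", lwd = 3)
```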

Separate boxes for two grouping variables when color by only one variable

Here is an example from the geom_boxplot man page:
p = ggplot(mpg, aes(class, hwy))
p + geom_boxplot(aes(colour = drv))
which looks like this:
I would like to make a very similar plot, but with (yearmon formatted) dates where the class variable is in the example, and a factor variable where drv is in the example.
Here is some sample data:
library(zoo) # as.yearmon()
library(dplyr) # tibble() and %>%
library(ggplot2)
df_box <- tibble(
Date = sample(
as.yearmon(seq.Date(from = as.Date("2013-01-01"), to = as.Date("2016-08-01"), by = "month")),
size = 10000,
replace = TRUE
),
Source = sample(c("Inside", "Outside"), size = 10000, replace = TRUE),
Value = rnorm(10000)
)
I have tried a bunch of different things:
First, I wrapped the date variable in as.factor(). This gives separate boxes for each Source, but I no longer have the nicely spaced date scale on the x-axis:
df_box %>%
ggplot(aes(
x = as.factor(Date),
y = Value,
# group = Date,
color = Source
)) +
geom_boxplot(outlier.shape = NA) +
theme_bw() +
xlab("Month Year") +
theme(
axis.text.x = element_text(hjust = 1, angle = 50)
)
On the other hand, if I use Date as the group variable, as suggested here, mapping color to Source no longer has any visible effect:
df_box %>%
ggplot(aes(
x = Date,
y = Value,
group = Date,
color = Source
)) +
geom_boxplot() +
theme_bw()
Any ideas on how to achieve the output of the first attempt while still keeping a yearmon-scaled x-axis?
Since you need separate boxes for each combination of Date and Source, use interaction(Source, Date) as the group aesthetic:
ggplot(df_box, aes(x = Date, y = Value,
colour = Source,
group = interaction(Source, Date))) +
geom_boxplot()
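To see why the combined group is needed: with group = Date there is one group, and so one box, per month, whereas interaction(Source, Date) gives one per month and source. A quick count, assuming the same sampling as the question's df_box (seeded here so the result is reproducible):

```r
library(zoo)

set.seed(1)
months <- as.yearmon(seq.Date(from = as.Date("2013-01-01"),
                              to = as.Date("2016-08-01"), by = "month"))
Date <- sample(months, size = 10000, replace = TRUE)
Source <- sample(c("Inside", "Outside"), size = 10000, replace = TRUE)

# group = Date: one box per month
n_date_groups <- length(unique(Date))
# group = interaction(Source, Date): one box per month and source
n_box_groups <- nlevels(droplevels(interaction(Source, Date)))
print(c(n_date_groups, n_box_groups))
```

With 44 months and two sources, the interaction doubles the number of groups, which is why the colour mapping can only split the boxes once Source is part of the group.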

Histogram of stacked boxes in ggplot2

The graph I'm currently trying to make falls a little between two stools. I want to make a histogram that is composed of stacked and labelled boxes. Here's an example of exactly the sort of thing I'm talking about, taken from a recent article in the New York Times:
http://farm8.staticflickr.com/7109/7026409819_1d2aaacd0a.jpg
Is it possible to achieve this using ggplot2?
To amplify the question somewhat, so far what I have is:
dfr <- data.frame(
name = LETTERS[1:26],
percent = rnorm(26, mean=15)
)
ggplot(dfr, aes(x=percent, fill=name)) + geom_bar() +
stat_bin(geom="text", aes(label=name))
...which I'm clearly doing all wrong. Ultimately what I'd ideally like is something along the lines of the manually-modified graph below, with (say) letters A to M filled one shade and N to Z filled another.
http://farm8.staticflickr.com/7116/7026536711_4df9a1aa12.jpg
Here you go!
library(ggplot2)
library(plyr) # for ddply() and .()
set.seed(3421)
# added type to mimic which candidate is supported
dfr <- data.frame(
name = LETTERS[1:26],
percent = rnorm(26, mean=15),
type = sample(c("A", "B"), 26, replace = TRUE)
)
# easier to prepare data in advance. uses two ideas
# 1. calculate histogram bins (quite flexible)
# 2. calculate the stacking position of each name within its bin
dfr <- transform(dfr, perc_bin = cut(percent, 5))
dfr <- ddply(dfr, .(perc_bin), mutate,
pos = seq_along(name) - 0.5)
# start plotting. key steps are
# 1. draw one unit-height box per name at position pos, filled by type
# 2. plot labels using name at position pos
# 3. get rid of grid, border, background, y axis text and labels
ggplot(dfr, aes(x = perc_bin)) +
geom_tile(aes(y = pos, fill = type), height = 1, width = 0.9,
colour = 'gray', show.legend = FALSE) +
geom_text(aes(y = pos, label = name), colour = 'white') +
scale_fill_manual(values = c('red', 'orange')) +
theme_bw() + xlab("") + ylab("") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
axis.ticks = element_blank(), panel.border = element_blank(),
axis.text.y = element_blank())
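On current ggplot2 versions the label positions need not be computed by hand. As an alternative (a different technique from the answer's precomputed positions), each name can be drawn as a bar of height 1 grouped by name, and position_stack(vjust = 0.5) then centres each label in its own box. A sketch using the same simulated data:

```r
library(ggplot2)

set.seed(3421)
dfr <- data.frame(
  name = LETTERS[1:26],
  percent = rnorm(26, mean = 15),
  type = sample(c("A", "B"), 26, replace = TRUE)
)
dfr$perc_bin <- cut(dfr$percent, 5)

# y = 1 gives every name a unit-height box; group = name keeps the boxes separate
p <- ggplot(dfr, aes(x = perc_bin, y = 1, group = name)) +
  geom_col(aes(fill = type), colour = "gray", show.legend = FALSE) +
  # Same grouping, so each label is stacked in step with its box and centred in it
  geom_text(aes(label = name), colour = "white",
            position = position_stack(vjust = 0.5)) +
  scale_fill_manual(values = c("red", "orange")) +
  labs(x = NULL, y = NULL) +
  theme_minimal() +
  theme(panel.grid = element_blank(), axis.text.y = element_blank())

built <- ggplot_build(p)
print(p)
```

Because both layers share the same x and group, they are stacked identically, so the boxes and labels cannot drift apart the way separately computed positions can.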
