Suppose I have data in the following form:
library(ggplot2)
Data <- data.frame(
"ID" = c("ABC111", "ABC111", "ABC111", "ABC111", "ABC112", "ABC112", "ABC112", "ABC113", "ABC113", "ABC114", "ABC115"),
"color" = c("red", "red", "red", "red", "blue", "blue", "blue", "green", "green", "black", "yellow"),
"start_date" = c("2005/01/01", "2006/01/01", "2007/01/01", "2008/01/01", "2009/01/01", "2010/01/01", "2011/01/01", "2012/01/01", "2013/01/01", "2014/01/01", "2015/01/01"),
"end_date" = c("2005/09/01", "2006/06/01", "2007/04/01", "2008/05/07", "2009/06/01", "2010/10/01", "2011/12/12", "2013/05/01", "2013/06/08", "2015/01/01", "2016/08/09")
)
Data$ID = as.factor(Data$ID)
Data$color = as.factor(Data$color)
Now what I want to do is for each row, plot the start_date and the end_date ... and then connect them with a straight line. I believe this can be done with geom_line() in ggplot2.
I want something that looks like this:
I tried using the following code:
q <- qplot(start_date, end_date, data=Data)
q <- q + geom_line(aes(group = ID))
q
But the graph looks completely different than what I expected.
Can anyone please show me what I am doing wrong?
Thanks
Does the following work for you?
ggplot(data = Data, aes(start_date, end_date, color = ID))+
geom_line(aes(group = ID))+
geom_point()
or maybe geom_segment?
# Use the row number as the y coordinate so that each row gets its own horizontal segment
Data$y <- seq_len(nrow(Data))
# Convert the text dates to Date so the segment runs from start_date to end_date on a date axis
ggplot(data = Data, aes(x = as.Date(start_date), y = y, colour = ID))+
geom_segment(aes(xend = as.Date(end_date), yend = y))
Here's a solution using the tidyverse package. I used the number of each row in the original data as the y-axis values in the plot. As these values are meaningless, I removed the y-axis title, labels and ticks from the plot.
library(tidyverse)
Data %>%
# Number each row in its order of appearance,
# save these numbers in a new column named order
rowid_to_column("order") %>%
# Change data from wide to long format
pivot_longer(cols = c(start_date, end_date),
names_to = "date_type",
values_to = "date") %>%
# Ggplot, use date as x, order as y, ID as col and order as group
ggplot(aes(x = date,
y = order,
col = ID,
group = order)) +
# Draw points
geom_point()+
# Draw lines
geom_line() +
# Maybe you want to remove the y axis title, text and ticks
theme(axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
# I rotated the x axis labels to vertical;
# they might be easier to read this way
axis.text.x = element_text(angle = 90, vjust = 0.5))
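Because start_date and end_date are stored as text, the x-axis in the plot above is discrete, so the spacing between dates does not reflect elapsed time. A minimal sketch of one way around this, assuming the dates always follow the "%Y/%m/%d" pattern seen in the question, is to parse them with as.Date() after reshaping. The two-row example_rows data frame and the scale_x_date() settings are illustrative choices, not part of the original answer.

```r
library(ggplot2)
library(tidyr)
library(tibble)

# Two rows in the same shape as Data (illustrative subset of the question's data)
example_rows <- data.frame(
  ID = c("ABC111", "ABC112"),
  start_date = c("2005/01/01", "2009/01/01"),
  end_date = c("2005/09/01", "2009/06/01"),
  stringsAsFactors = FALSE
)

# Number the rows, then reshape from wide to long as in the answer above
long <- pivot_longer(
  rowid_to_column(example_rows, "order"),
  cols = c(start_date, end_date),
  names_to = "date_type",
  values_to = "date"
)

# Parse the text into Date objects so ggplot2 uses a continuous date scale
long$date <- as.Date(long$date, format = "%Y/%m/%d")

p <- ggplot(long, aes(x = date, y = order, colour = ID, group = order)) +
  geom_point() +
  geom_line() +
  # One labelled break per year; adjust to taste
  scale_x_date(date_breaks = "1 year", date_labels = "%Y")
p
```

With a date scale the length of each line is proportional to the time between start_date and end_date.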
I have a colour data frame that I join with my input data to match the colours to categories. The issue is that when using fill=mycolour the legend displays the colour names and not the names of my categories.
I would like fill to be name_assigned while still matching the colours in mycolors.
df %>%
dplyr::left_join(colors.variable, by="name_assigned") %>%
ggplot(aes(reorder(chr,chr,function(x)-length(x)),y=name_assigned, fill=mycolors)) +
geom_bar(aes(y = (..count..))) +
scale_fill_identity()
You don't need the join. From the colors data set, create a named vector of colors and use it in scale_fill_manual.
Also, you seem to have swapped x and y coordinates.
library(ggplot2)
set.seed(2022)
df <- data.frame(
chr = rbinom(1e3, 1, 0.5),
name_assigned = sample(letters[1:3], 1e3, TRUE)
)
colors.variable <- data.frame(
name_assigned = letters[1:3],
mycolors = c("pink", "purple", "seagreen")
)
mycolors <- with(colors.variable, setNames(mycolors, name_assigned))
ggplot(df, aes(name_assigned, fill = name_assigned)) +
geom_bar() +
scale_fill_manual(values = mycolors)
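The named vector is what ties each colour to its category: when values is a named vector, scale_fill_manual matches the entries to the levels of the fill variable by name, so the order of the vector does not matter. A small base R sketch of that lookup; shuffled is a name introduced here for illustration.

```r
colors.variable <- data.frame(
  name_assigned = c("a", "b", "c"),
  mycolors = c("pink", "purple", "seagreen"),
  stringsAsFactors = FALSE
)

# Named character vector: names are the categories, values are the colours
mycolors <- with(colors.variable, setNames(mycolors, name_assigned))

# Look up a colour by category name
mycolors[["b"]]

# Reordering the vector does not change which colour belongs to which name
shuffled <- mycolors[c("c", "a", "b")]
shuffled[["b"]]
```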
I have a for loop to run through a tonne of microbiome data (using phyloseq) and generate plots for multiple experiments.
ggplot(data_M1, aes(x = Sample, y = Abundance, fill = get(i))) +
geom_bar(stat = "identity")+
facet_wrap(vars(Status, Time.Point, Treatment), scales = "free", ncol=2)+
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())+
guides(fill = guide_legend(reverse = TRUE, keywidth = 1, keyheight = 1, title = i))+
ylab(yaxisname)+
ggtitle(plotname)+
ggsave(ggsavename, last_plot())
Example outcome:
What I am trying to do though is make all the "_unclassified" samples/ sequencing data grey... so maybe I need some kind of if statement with str_contains?
Happy to dput a reproducible example if required but someone might have a simple solution.
Thank you!
@camille's comment about a minimal reproducible example is germane. We don't need to know anything about your facets, guides or call to ggsave to answer your question.
First, generate some test data
library(tidyverse)
d <- tibble(
Species=rep(c("s__reuteri", "s__guilliermondii",
"o__Clostridiales_unclassified", "k__bacteria_unclassified"),
each=4),
Sample=as.factor(rep(1:4, times=4)),
Abundance=runif(16)
)
Generate custom labels and colours
labels <- unique(d$Species)
# Make sure availableColours has at least as many entries as labels
availableColours <- c("red", "blue", "green", "orange", "yellow")
# Name the colours by species so that scale_fill_manual matches them by name;
# unnamed values are assigned in the (alphabetical) order of the scale's levels,
# which is not the order of unique(d$Species)
legendColours <- setNames(
ifelse(str_detect(labels, fixed("unclassified")), "grey", availableColours),
labels
)
Create the plot
d %>%
ggplot(aes(x=Sample, y=Abundance, fill=Species)) +
geom_bar(stat="identity") +
scale_fill_manual(values=legendColours)
Giving
If you want to "pool" all the unclassified species, then
d1 <- d %>%
mutate(
LegendSpecies=ifelse(
str_detect(
Species,
fixed("unclassified")
),
"Unclassified",
Species
)
)
legendLabels <- unique(d1$LegendSpecies)
# Again, name the colours so that they are matched to the legend entries by name
legendColours <- setNames(
ifelse(str_detect(legendLabels, fixed("Unclassified")), "grey", availableColours),
legendLabels
)
d1 %>%
ggplot(aes(x=Sample, y=Abundance, fill=LegendSpecies)) +
geom_bar(stat="identity")+
scale_fill_manual(values=legendColours)
Giving
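Since the question loops over several columns, the colour vector can be built by a small helper and rebuilt on each iteration. This is only a sketch: make_fill_colours is a hypothetical helper written for this example, the rainbow() palette is an arbitrary choice, and it assumes unclassified taxa can be recognised by the substring "unclassified".

```r
# Hypothetical helper: returns a named colour vector for scale_fill_manual(values = ...)
make_fill_colours <- function(levels, pattern = "unclassified", grey = "grey") {
  levels <- unique(as.character(levels))
  # TRUE for every level whose name contains the pattern
  is_unclassified <- grepl(pattern, levels, fixed = TRUE)
  colours <- character(length(levels))
  colours[is_unclassified] <- grey
  # Arbitrary palette for the remaining levels
  colours[!is_unclassified] <- grDevices::rainbow(sum(!is_unclassified))
  # Names let ggplot2 match each colour to its level by name
  stats::setNames(colours, levels)
}

taxa <- c("s__reuteri", "s__guilliermondii",
          "o__Clostridiales_unclassified", "k__bacteria_unclassified")
fill_colours <- make_fill_colours(taxa)
fill_colours
```

Inside the loop this could be used as scale_fill_manual(values = make_fill_colours(data_M1[[i]])), assuming i holds the name of the column being plotted.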
I have the following dataset:
HIU,0.0833333333,0,0.35,0.0208333333,0.40625,0,0.21875,0.125,0.078125,0.0104166667,1,0.53125,0.4375
TTHY,0,0,0.8,0,0.5,0,0.7083333333,0.2708333333,0,0.6597222222,0,0.1435185185,0
Full,0.0554986339,0.1034836066,0.4620901639,0.0683060109,0.4961577869,0.0696721311,0.222079918,0.1465163934,0.2085040984,0.0476007514,0.893613388,0.396943306,0.4223872951
I made a grouped bar plot according to the rows of HIU and TTHY (figure 1). But I want to add a line according to the "Full" row, such as the second image.
Figure 1:
Figure 2:
How can I do it with R? This is my current code:
df = read.csv('TTR-HIU/resultados.csv',header=FALSE,colClasses=c("NULL",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
df.bar <- barplot(as.matrix(df[-nrow(df),]),beside=TRUE,col=c("darkblue","red"))
Using ggplot2, you could try something like this:
# put data in data frame:
df <- data.frame(HIU = c(0.0833333333,0,0.35,0.0208333333,0.40625,0,0.21875,0.125,0.078125,0.0104166667,1,0.53125,0.4375),
TTHY = c(0,0,0.8,0,0.5,0,0.7083333333,0.2708333333,0,0.6597222222,0,0.1435185185,0),
Full= c(0.0554986339,0.1034836066,0.4620901639,0.0683060109,0.4961577869,0.0696721311,0.222079918,0.1465163934,0.2085040984,0.0476007514,0.893613388,0.396943306,0.4223872951))
library(ggplot2)
library(tidyr) # to make data long (gather)
# create x-values:
df$x <- as.factor(seq_len(nrow(df)))
# make data long for ggplot2:
df_long <- df %>% gather(key, value, -x)
ggplot() +
# plot bars:
geom_col(data = subset(df_long, key %in% c("HIU", "TTHY")),
mapping = aes(x = x, y = value, fill = key),
position = position_dodge()) +
# plot lines:
geom_line(data = subset(df_long, key == "Full"),
mapping = aes(x = x, y = value, group = key, color = key),
size = 2) +
# make plot look a little like your desired output:
scale_color_manual(values = c("Full" = "yellow")) +
scale_fill_manual(values = c("HIU" = "blue", "TTHY" = "red")) +
theme_minimal() +
theme(axis.title = element_blank(),
legend.title = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
However, you might have to reshape your data into a data frame like the one in this example. If you need further help, use dput to show exactly what your data looks like.
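If the data comes from a CSV laid out as in the question, with one series per row and the series name in the first column, it can be transposed into one column per series. A base R sketch under that assumption; csv_text holds only the first three values of each row from the question, and the variable names are illustrative.

```r
# First three values of each row from the question, inlined for the example
csv_text <- "HIU,0.0833333333,0,0.35
TTHY,0,0,0.8
Full,0.0554986339,0.1034836066,0.4620901639"

# Use the first column as row names, so only the numeric values remain as columns
raw <- read.csv(text = csv_text, header = FALSE, row.names = 1)

# Transpose: rows (HIU, TTHY, Full) become columns
df <- as.data.frame(t(raw))
rownames(df) <- NULL

# x positions as a factor, as in the answer above
df$x <- as.factor(seq_len(nrow(df)))
df
```

For the real file, read.csv(text = csv_text, ...) would be replaced by a call that reads the file path instead.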
Here is an example from the geom_boxplot man page:
p = ggplot(mpg, aes(class, hwy))
p + geom_boxplot(aes(colour = drv))
which looks like this:
I would like to make a very similar plot, but with (yearmon formatted) dates where the class variable is in the example, and a factor variable where drv is in the example.
Here is some sample data:
library(zoo) # provides as.yearmon
library(tidyverse) # provides tibble, %>% and ggplot2
df_box = tibble(
Date = sample(
as.yearmon(seq.Date(from = as.Date("2013-01-01"), to = as.Date("2016-08-01"), by = "month")),
size = 10000,
replace = TRUE
),
Source = sample(c("Inside", "Outside"), size = 10000, replace = TRUE),
Value = rnorm(10000)
)
I have tried a bunch of different things:
If I wrap the date variable in as.factor, I no longer have the nicely spaced date scale on the x-axis:
df_box %>%
ggplot(aes(
x = as.factor(Date),
y = Value,
# group = Date,
color = Source
)) +
geom_boxplot(outlier.shape = NA) +
theme_bw() +
xlab("Month Year") +
theme(
axis.text.x = element_text(hjust = 1, angle = 50)
)
On the other hand, if I use Date as an additional group variable as suggested here, adding color no longer has any additional impact:
df_box %>%
ggplot(aes(
x = Date,
y = Value,
group = Date,
color = Source
)) +
geom_boxplot() +
theme_bw()
Any ideas on how to achieve the output of #1 while still maintaining a yearmon scale on the x-axis?
Since you need separate boxes for each combination of Date and Source, use interaction(Source, Date) as the group aesthetic:
ggplot(df_box, aes(x = Date, y = Value,
colour = Source,
group = interaction(Source, Date))) +
geom_boxplot()
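To see why this works, it can help to look at what interaction() returns: a factor with one level for every combination of its arguments, so each Date and Source pair becomes its own group and gets its own box. A tiny base R sketch with made-up values (plain strings stand in for the yearmon dates):

```r
# Made-up values: two sources observed in two months
Source <- c("Inside", "Outside", "Inside", "Outside")
Date <- c("Jan 2013", "Jan 2013", "Feb 2013", "Feb 2013")

# One factor level per Source/Date combination
grp <- interaction(Source, Date)
nlevels(grp)
table(grp)
```

Grouping by Date alone gives only one group per month, which is why colour had no visible effect in the second attempt.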
The graph I'm currently trying to make falls a little between two stools. I want to make a histogram that is composed of stacked and labelled boxes. Here's an example of exactly the sort of thing I'm talking about, taken from a recent article in the New York Times:
http://farm8.staticflickr.com/7109/7026409819_1d2aaacd0a.jpg
Is it possible to achieve this using ggplot2?
To amplify the question somewhat, so far what I have is:
dfr <- data.frame(
name = LETTERS[1:26],
percent = rnorm(26, mean=15)
)
ggplot(dfr, aes(x=percent, fill=name)) + geom_bar() +
stat_bin(geom="text", aes(label=name))
...which I'm clearly doing all wrong. Ultimately what I'd ideally like is something along the lines of the manually-modified graph below, with (say) letters A to M filled one shade and N to Z filled another.
http://farm8.staticflickr.com/7116/7026536711_4df9a1aa12.jpg
Here you go!
library(ggplot2)
set.seed(3421)
# added type to mimic which candidate is supported
dfr <- data.frame(
  name = LETTERS[1:26],
  percent = rnorm(26, mean=15),
  type = sample(c("A", "B"), 26, replace = TRUE)
)
# easier to prepare data in advance:
# calculate histogram bins with cut() (quite flexible)
dfr <- transform(dfr, perc_bin = cut(percent, 5))
# start plotting. key steps are
# 1. plot one unit-height box per name, stacked within its bin and filled by type
# 2. plot labels using name, centred in each box with position_stack(vjust = 0.5)
# 3. get rid of grid, border, background, y axis text and labels
ggplot(dfr, aes(x = perc_bin, y = 1, group = name)) +
  geom_col(aes(fill = type), colour = 'gray', show.legend = FALSE) +
  geom_text(aes(label = name), colour = 'white',
            position = position_stack(vjust = 0.5)) +
  scale_fill_manual(values = c('red', 'orange')) +
  theme_bw() + xlab("") + ylab("") +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        axis.ticks = element_blank(), panel.border = element_blank(),
        axis.text.y = element_blank())
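The binning step does most of the work here: cut(percent, 5) splits the range of percent into five equal-width intervals and returns a factor, which is what lets the bins act as discrete x-axis categories. A quick base R sketch for inspecting the bins before plotting; bin_counts is a name introduced for this example.

```r
set.seed(3421)
percent <- rnorm(26, mean = 15)

# Five equal-width intervals covering the range of percent
bins <- cut(percent, 5)
levels(bins)

# Number of names that will be stacked in each bin
bin_counts <- table(bins)
bin_counts
```

Passing a vector of break points instead of 5, for example cut(percent, breaks = seq(12, 18, by = 1)), gives control over the bin edges; values outside the breaks become NA.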