Displaying a stacked bar plot with a condition - r

I have this dataframe with numbers being percentages:
`df <- data.frame(spoken = c(10, 90, 30, 70),
lexicon = c(10, 90, 50, 50),
row.names = c("consonant_initial",
"vowel_initial",
"consonant_final", "vowel_final"))`
I want to display that in a nice way so that I get
a stacked barplot for the distribution of vowel vs consonant initial words
and the distribution of vowel vs consonant final words,
including facet_wrap to show the two conditions lexicon vs. spoken.
I have tried to reshape the data:
df$row <- seq_len(nrow(df))
df <- melt(df, id.vars = "row")
However, I can't work out how I would need to reshape the data in order to display it this way.

If I understand your desired graph correctly, you need to split the row names, because they encode both the information needed to color the stacked bars (consonant vs. vowel) and the position on the x-axis (initial vs. final).
library(tidyverse)
df$label <- row.names(df)
df %>%
separate(label, c("lettertype", "position"), "_") %>%
gather(key = 'condition', value = 'prop', -lettertype, -position) %>%
ggplot() +
aes(x = position, y = prop, fill = lettertype) +
geom_bar(stat = 'identity') +
facet_wrap(~condition)

df$row1 <- sapply(strsplit(row.names(df), "_"), function(x) x[1])
df$row2 <- sapply(strsplit(row.names(df), "_"), function(x) x[2])
library(reshape2)
df <- melt(df, id.vars = c("row1", "row2"))
library(ggplot2)
ggplot(df, aes(x = row2, y = value, fill = row1)) +
geom_col() +
facet_wrap(~variable)
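Both answers above do the same two things: split the row names into two columns, then reshape to long format. A minimal sketch of the same idea with tidyr::pivot_longer, assuming tidyr 1.0 or later is installed (the names long, lettertype, position, condition and prop are only illustrative):

```r
library(tidyr)
library(ggplot2)

df <- data.frame(spoken = c(10, 90, 30, 70),
                 lexicon = c(10, 90, 50, 50),
                 row.names = c("consonant_initial", "vowel_initial",
                               "consonant_final", "vowel_final"))

# Move the row names into a column, then split them at the underscore
df$label <- row.names(df)
long <- separate(df, label, into = c("lettertype", "position"), sep = "_")

# One row per lettertype / position / condition combination
long <- pivot_longer(long, cols = c(spoken, lexicon),
                     names_to = "condition", values_to = "prop")

p <- ggplot(long, aes(x = position, y = prop, fill = lettertype)) +
  geom_col() +
  facet_wrap(~condition)
print(p)
```

Each bar stacks to 100 because the consonant and vowel percentages for a given position and condition are complementary.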

Related

How to fix "Breaks and labels are different lengths" when using ggplot2 for faceted plots?

Consider the following example:
library(ggplot2)
library(RColorBrewer)
library(magrittr)
library(dplyr)
df <- data.frame(x = seq(0, 70, 0.5),
y = seq(0, 70, 0.5),
val = rnorm(141),
group = rep(1:3, 47))
max_val_plot <- df$val %>% max() %>% round(0)
# unary minus binds tighter than %>%, so this evaluates to round(min(-df$val))
min_val_plot <- -df$val %>% min() %>% round(0)
breaks_plot <- seq(min_val_plot, max_val_plot, 0.1)
n <- breaks_plot %>% length()
getPalette <- colorRampPalette(brewer.pal(9, "RdBu"))
colors_plot <-getPalette(n)
labels_plot <- breaks_plot %>%
as.character()
labels_plot[!1:0]=' '
df %>%
ungroup() %>%
ggplot(aes(x=x,y=y,fill=val))+
geom_raster()+
facet_grid(~group)+
theme_bw(base_size = 20)+
scale_fill_stepsn(
name = "",
colours = colors_plot,
breaks = breaks_plot,
labels = labels_plot
)
Although labels and breaks are of equal length, the error "Breaks and labels are different lengths" is returned due to the presence of multiple groups and the faceted function in plotting code.
How can I fix this?
Thanks!
One option to fix your issue would be to pass a function to the labels argument of scale_fill_xxx to create the labels on the fly instead of providing the labels as a vector.
library(ggplot2)
library(RColorBrewer)
library(magrittr)
library(dplyr)
set.seed(123)
df %>%
ungroup() %>%
ggplot(aes(x = x, y = y, fill = val)) +
geom_raster() +
facet_grid(~group) +
theme_bw(base_size = 20) +
scale_fill_stepsn(
name = "",
colours = colors_plot,
breaks = breaks_plot,
labels = function(x) { x <- as.character(x); x[!1:0] <- " "; x}
)
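To see what the labelling function does on its own, here is a small sketch outside of ggplot. The vector breaks_kept is made up for illustration; as far as I understand, the scale calls the function on whatever breaks it actually keeps, so the labels always have the same length as the breaks. The second function is an equivalent, more explicit spelling of the !1:0 trick.

```r
# Same function as in the answer: !1:0 is c(FALSE, TRUE), which is
# recycled along x, so every second label is blanked out
label_every_other <- function(x) {
  x <- as.character(x)
  x[!1:0] <- " "
  x
}

# Equivalent, more explicit version (hypothetical helper name)
label_every_other_explicit <- function(x) {
  x <- as.character(x)
  x[seq_along(x) %% 2 == 0] <- " "
  x
}

breaks_kept <- c(-1, -0.5, 0, 0.5, 1)   # illustrative breaks only
labels_a <- label_every_other(breaks_kept)
labels_b <- label_every_other_explicit(breaks_kept)
print(labels_a)
```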

r ggplot barplot with multiple date columns

I have a data frame with multiple date columns and I want to make a single plot with 3 bar charts (one for ID/dat1, ID/dat2 and ID/dat3). Anyone know how to do this?
EDIT: I'm looking for a plot with the date on the x-axis and count of ID on the y-axis.
Example data frame:
dat <- data.frame(ID = c(1:80),
dat1 = sample(seq(as.Date('2021/01/01'), as.Date('2021/04/01'), by="day"), 80),
dat2 = sample(seq(as.Date('2021/01/01'), as.Date('2021/04/01'), by="day"), 80),
dat3 = sample(seq(as.Date('2021/01/01'), as.Date('2021/04/01'), by="day"), 80))
Are you after this?
library(data.table)  # setDT() and melt()
library(ggplot2)
library(magrittr)    # provides %>%
melt(setDT(dat), id.vars = "ID") %>%
ggplot(aes(x = value, fill = variable)) +
geom_bar()
If you want a line plot instead, you can try
melt(setDT(dat),id.vars = "ID") %>%
ggplot(aes(x = value, y = ID, group = variable, color = variable)) +
geom_line()
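If three separate bar charts in one figure are wanted, one option is to reshape to long format and facet by the original column. This is only a sketch using tidyr instead of data.table; the seed and the name long are my additions, and the facet layout is an assumption about the desired result.

```r
library(tidyr)
library(ggplot2)

set.seed(1)  # only so the example is reproducible
days <- seq(as.Date("2021-01-01"), as.Date("2021-04-01"), by = "day")
dat <- data.frame(ID = 1:80,
                  dat1 = sample(days, 80),
                  dat2 = sample(days, 80),
                  dat3 = sample(days, 80))

# One row per ID and date column; the dates end up in a single column
long <- pivot_longer(dat, cols = -ID,
                     names_to = "variable", values_to = "value")

# geom_bar() counts the rows (IDs) per date; one panel per date column
p <- ggplot(long, aes(x = value, fill = variable)) +
  geom_bar() +
  facet_wrap(~variable, ncol = 1)
print(p)
```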

How to plot a(n unknown) number of data series as geom_line in same chart

My first Q here, so please go lightly if I'm out of step anywhere.
I'm trying to write R code that produces a single chart containing a number of data series as lines. The number of data series may vary, but they are all provided in the data frame. I have tried to adapt another thread's code to add the geom_line layers in a loop, but without success.
The logic is:
#desire to replace loop of 1:5 with ncol(df)
print(ggplot(df,aes(x=time))
for (i in 1:5) {
print (+ geom_line(aes(y=df[,i]))
}
#functioning geom point loops ggplot production:
for (i in 1:5) {
print(ggplot(df,aes(x=time,y=df[,i]))+geom_point())
}
#functioning multi-line ggplot where n is explicit:
ggplot(data=df, aes(x=time), group=1) +
geom_line(aes(y=df$`3`))+
geom_line(aes(y=df$`4`))
The functioning example code produces n number of point charts, 5 in this case. I would like just one chart to contain n line series.
This may be similar to "How to plot n dimensional matrix?", for which there are currently no relevant answers.
Any contributions much appreciated, thanks.
You can use gather from the tidyverse to do that.
As you didn't supply sample data, I used mtcars.
I created two data frames, one with 3 variables besides mpg and one with 9. For each of them I plotted all of the variables against mpg.
library(tidyverse)
df3Columns <- mtcars[, 1:4]
df9Columns <- mtcars[, 1:10]
df3Columns %>%
gather(var, value, -mpg) %>%
ggplot(aes(mpg, value, group = var, color = var)) +
geom_line()
df9Columns %>%
gather(var, value, -mpg) %>%
ggplot(aes(mpg, value, group = var, color = var)) +
geom_line()
Edit - using the sample data in comments.
library(tidyverse)
df %>%
rownames_to_column("time") %>%
gather(var, value, -time) %>%
ggplot(aes(time, value, group = var, color = var)) +
geom_line()
Sample data:
df <- structure(list("39083" = c(96, 100, 100), "39090" = c(99, 100, 100), "39097" = c(99, 100, 100)), row.names = 3:5, class = "data.frame")
To strictly answer your question, you can simply store your ggplot in a variable and add the geom_line one by one:
df <- structure(list("39083" = c(96, 100, 100), "39090" = c(99, 100, 100), "39097" = c(99, 100, 100)), row.names = 3:5, class = "data.frame")
g <- ggplot(df, aes(x = 1:nrow(df)))
for (i in colnames(df))
{
g <- g + geom_line(y = df[,i])
}
g <- g + scale_y_continuous(limits = c(min(df), max(df)))
print(g)
However, this is not a very convenient solution. I would highly recommend reshaping your data frame into long format (one row per time and series), which is the layout ggplot works best with.
df.ultimate <- data.frame(time = numeric(), value = numeric(), group = character())
for (i in colnames(df))
{
df.ultimate <- rbind(df.ultimate, data.frame(time = 1:nrow(df), value = df[, i], group = i))
}
g <- ggplot(df.ultimate, aes(x = time, y = value, color = group))
g <- g + geom_line()
print(g)
A one-line solution:
ggplot(data.frame(time = rep(1:nrow(df), ncol(df)),
value = as.vector(as.matrix(df)),
group = rep(colnames(df), each = nrow(df))),
aes(x = time, y = value, color = group)) + geom_line()
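The long-format reshaping used in the answers above can also be sketched with tidyr::pivot_longer, which works for any number of series columns. This assumes tidyr 1.0 or later; the row index is used as time, as in the loop-based answer, and the names long and series are illustrative.

```r
library(tidyr)
library(ggplot2)

df <- structure(list("39083" = c(96, 100, 100),
                     "39090" = c(99, 100, 100),
                     "39097" = c(99, 100, 100)),
                row.names = 3:5, class = "data.frame")

n_series <- ncol(df)            # number of series, whatever it happens to be
df$time <- seq_len(nrow(df))    # assumption: rows are consecutive time steps

# Every column except time becomes one series in long format
long <- pivot_longer(df, cols = -time,
                     names_to = "series", values_to = "value")

p <- ggplot(long, aes(x = time, y = value, colour = series)) +
  geom_line()
print(p)
```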

ggplot, reordering aes and faceting [duplicate]

Let's say, in R, I have a data frame with the columns letters, numbers and animals, and I want to examine the relationship between all three graphically. I could do something like this:
library(dplyr)
library(ggplot2)
library(gridExtra)
set.seed(33)
my_df <- data.frame(
letters = c(letters[1:10], letters[6:15], letters[11:20]),
animals = c(rep('sheep', 10), rep('cow', 10), rep('horse', 10)),
numbers = rnorm(1:30)
)
ggplot(my_df, aes(x = letters, y = numbers)) + geom_point() +
facet_wrap(~animals, ncol = 1, scales = 'free_x')
I'd get something that looks like.
However, I want the order of the x axis to be dependent on the order of the y-axis. This is easy enough to do without facets, as per this example.
I can even make an ordered figure for each animal and then bind them together with grid.arrange as in this example
my_df_shp <- my_df %>% filter(animals == 'sheep')
my_df_cow <- my_df %>% filter(animals == 'cow')
my_df_horse <- my_df %>% filter(animals == 'horse')
my_df_shp1 <- my_df_shp %>% mutate(letters = reorder(letters, numbers))
my_df_cow1 <- my_df_cow %>% mutate(letters = reorder(letters, numbers))
my_df_horse1 <- my_df_horse %>% mutate(letters = reorder(letters, numbers))
p_shp <- ggplot(my_df_shp1, aes(x = letters, y = numbers)) + geom_point()
p_cow <- ggplot(my_df_cow1, aes(x = letters, y = numbers)) + geom_point()
p_horse <- ggplot(my_df_horse1, aes(x = letters, y = numbers)) + geom_point()
grid.arrange(p_shp, p_cow, p_horse, ncol = 1)
I don't particularly like this solution though, because it isn't easily generalizable to cases where there are a lot of facets.
I'd rather do something like
ggplot(my_df, aes(x = y_ordered_by_facet(letters, by = numbers), y = numbers)) + geom_point() +
facet_wrap(~animals, ncol = 1, scales = 'free_x')
where y_ordered_by_facet is some function that cleverly orders the letters factor by numbers within each facet.
Something that gets close to this, but doesn't quite seem to work is
ggplot(my_df, aes(x = reorder(letters, numbers), y = numbers)) +
geom_point() + facet_wrap(~animals, ncol = 1, scales = 'free_x')
That doesn't quite work because the order ends up taking effect before, rather than after the facet wrapping and thus putting the labels in not quite the right order for each panel.
Any clever ideas?
I've found dplyr doesn't work super well with group_by() when dealing with different factor levels in each of the groups. So one workaround is to create a new factor that's unique for each animal-letter combination and order that. First, we create an interaction variable from animal and letter and determine the proper order of the letters within each animal:
new_order <- my_df %>%
group_by(animals) %>%
do(tibble(al=levels(reorder(interaction(.$animals, .$letters, drop=TRUE), .$numbers)))) %>%
pull(al)
Now we create the interaction variable in the data we want to plot, use this new ordering, and finally change the labels so they look like just the letters again
my_df %>%
mutate(al=factor(interaction(animals, letters), levels=new_order)) %>%
ggplot(aes(x = al, y = numbers)) +
geom_point() + facet_wrap(~animals, ncol = 1, scales = 'free_x') +
scale_x_discrete(breaks= new_order, labels=gsub("^.*\\.", "", new_order))
set.seed(33)
my_df <- data.frame(
letters = c(letters[1:10], letters[6:15], letters[11:20]),
animals = c(rep('sheep', 10), rep('cow', 10), rep('horse', 10)),
numbers = rnorm(1:30)
)
my_df %>% group_by(animals) %>%
arrange(numbers, .by_group = TRUE) %>%
mutate(lett = factor(interaction(animals,letters, drop=TRUE))) -> my_df
ggplot(my_df, aes(x = reorder(lett, numbers), y = numbers)) +
geom_point(size = 3) +
facet_wrap(~animals, ncol = 1, scales = 'free_x') +
scale_x_discrete(breaks = my_df$lett, labels=gsub("^.*\\.", "", my_df$lett))
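Both answers rely on the same trick: build a key that is unique per animal-letter pair, order its levels by numbers within each animal, and strip the animal prefix from the axis labels. A minimal base R sketch of that idea (the column name key is my own; it assumes each animal-letter pair occurs only once, as in the example data):

```r
library(ggplot2)

set.seed(33)
my_df <- data.frame(
  letters = c(letters[1:10], letters[6:15], letters[11:20]),
  animals = c(rep("sheep", 10), rep("cow", 10), rep("horse", 10)),
  numbers = rnorm(30)
)

# Sort by animal, then by value, and fix the factor levels in that order
my_df <- my_df[order(my_df$animals, my_df$numbers), ]
my_df$key <- paste(my_df$animals, my_df$letters, sep = ".")
my_df$key <- factor(my_df$key, levels = my_df$key)

p <- ggplot(my_df, aes(x = key, y = numbers)) +
  geom_point() +
  facet_wrap(~animals, ncol = 1, scales = "free_x") +
  # show only the letter part of "animal.letter" on the axis
  scale_x_discrete(labels = function(k) sub("^.*\\.", "", k))
print(p)
```

With scales = "free_x", each panel only shows the levels that occur in it, so the points appear in increasing order of numbers within every facet.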

Sorting factors in multipanel plot in ggplot2 according to the first panel

Is it possible to sort factors in a multipanel plot in ggplot2 according to the first panel? The first panel decides the order and the remaining panels follow that order.
Here is an example:
require(ggplot2)
set.seed(36)
xx<-data.frame(YEAR=rep(c("X","Y"), each=20),
CLONE=rep(c("A","B","C","D","E"), each=4, 2),
TREAT=rep(c("T1","T2","T3","C"), 10),
VALUE=sample(c(1:10), 40, replace=T))
ggplot(xx, aes(x=CLONE, y=VALUE, fill=YEAR)) +
geom_bar(stat="identity", position="dodge") +
facet_wrap(~TREAT)
Which gives me this plot:
Now I would like to sort CLONE based on the VALUE in YEAR X in a descending order (highest to lowest) but only for the Control (C panel). This order should then be maintained for T1, T2, and T3. By looking at the plot above, I want panel C sorted as CLONE C, B or D (both are 5), A and E. This order of CLONE should then be replicated for the remaining panels.
There's no easy way to do this right in ggplot since you have to reorder CLONE by 3 conditions, TREAT, YEAR and VALUE, otherwise forcats::fct_reorder2 could have been an option.
Instead, extract the order of CLONE from the subset of data corresponding to YEAR = "X", TREAT = "C", and re-define your factor levels for the whole data set based on this subset.
library("ggplot2")
library("dplyr")
set.seed(36)
xx <- data.frame(YEAR = rep(c("X","Y"), each = 20),
CLONE = rep(c("A","B","C","D","E"), each = 4, 2),
TREAT = rep(c("T1","T2","T3","C"), 10),
VALUE = sample(c(1:10), 40, replace = TRUE), stringsAsFactors = FALSE)
clone_order <- xx %>% subset(TREAT == "C" & YEAR == "X") %>%
arrange(-VALUE) %>% select(CLONE) %>% unlist()
xx <- xx %>% mutate(CLONE = factor(CLONE, levels = clone_order))
ggplot(xx, aes(x = CLONE, y = VALUE, fill = YEAR)) +
geom_bar(stat = "identity", position = "dodge") +
facet_wrap(~TREAT)