Horizontal bar plot with both stack and cluster bars in R

I am trying to create a horizontal bar plot where each variable has two bars. The first should be a stacked bar with the counts of positive and negative values, the second should show only the neutral values. So far I haven't managed to do that.
Here is an example of the data:
df <- read.table(text = "Positive Negative Neutral
A 4 5 1
B 6 8 3
C 12 3 6
D 10 5 2
E 2 11 7", header = TRUE)
And here is the plot I am aiming for (made with Excel):
Edit:
Thanks everyone for the help! I got great feedback already.
I am planning to continue with the option provided by @Duck, as I feel that ggplot2 is more flexible. However, I figured out I would like to add (and learn) one more thing:
Automatically reorder the y-axis values (id: A, B, C, D, E) based on the number of positive values (I edited the plot above). In this case the order would be C, D, B, A, E:
I tried reorder() and factor(), but without success. Should the ordering be applied to the data or directly in ggplot? The solutions I have found so far do not cover this case with two levels of variable: the name of the variable (id) and the flag (TRUE = positive/negative, FALSE = neutral).

I might be the only person on the planet using base R plotting these days, but it is actually pretty flexible for non-standard plots like this.
xl <- c(0,20)
cols <- c("blue","orange","grey")
bd <- t(as.matrix(df))
bp <- barplot(unname(bd[1:2, rbind(1:5,NA,NA)]), xlim=xl, horiz=TRUE, col=cols[1:2])
barplot(unname(bd[3, rbind(NA,1:5,NA)]), xlim=xl, horiz=TRUE, col=cols[3],
axes=FALSE, add=TRUE)
axis(2, at=colMeans(matrix(bp, nrow=3)[1:2,]), labels=colnames(bd), las=1, lty=0)
par(xpd=NA)
legend("bottom", rownames(bd), fill=cols, horiz=TRUE, inset=-1 / par("pin")[2], bty="n")
Result:
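In case the indexing trick is not obvious, here is a small sketch of what rbind(1:5, NA, NA) does when it is used as a column index. It reuses the data from the question; the names idx_stack, idx_neutral, stacked and neutral are only for illustration.

```r
# Same example data as in the question
df <- read.table(text = "Positive Negative Neutral
A 4 5 1
B 6 8 3
C 12 3 6
D 10 5 2
E 2 11 7", header = TRUE)
bd <- t(as.matrix(df))

# A matrix is read column by column, so this flattens to 1, NA, NA, 2, NA, NA, ...
idx_stack   <- as.vector(rbind(1:5, NA, NA))
# Same idea, shifted by one slot: NA, 1, NA, NA, 2, NA, ...
idx_neutral <- as.vector(rbind(NA, 1:5, NA))

# Indexing with NA gives all-NA columns, i.e. empty slots between the bars
stacked <- bd[1:2, idx_stack]   # 2 x 15 matrix: Positive/Negative in slots 1, 4, 7, ...
neutral <- bd[3, idx_neutral]   # length-15 vector: Neutral in slots 2, 5, 8, ...
```

Each variable therefore takes three slots: the stacked bar, the neutral bar, and an empty slot that acts as the gap between groups.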

Try facets: create a logical reference variable (Flag) that separates the Positive/Negative values from the Neutral ones, so that each id gets two bars, and plot with ggplot2 and a few tidyverse functions:
library(ggplot2)
library(tidyverse)
#Data
df %>% rownames_to_column("id") %>%
pivot_longer(-id) %>%
mutate(Flag=name %in% c('Positive','Negative')) %>%
ggplot(aes(x=Flag,y=value,fill=name))+
geom_bar(stat = 'identity')+
facet_grid(id~., switch = "y") +
scale_fill_manual(values = c("red", "tomato", "cyan3"), name = "") +
coord_flip() +
theme_classic() +
theme(panel.spacing = unit(0, "points"),
strip.background = element_blank(),
axis.text.y = element_blank(),
axis.ticks.length.y = unit(0, "points"),
axis.title = element_blank(),
strip.placement = "outside",
strip.text = element_text(),
legend.position = "bottom",
panel.grid.major.x = element_line())
Output:
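To see what the plot is built from, it may help to look at the reshaped data before it goes into ggplot(). This is just the first part of the pipeline above, stored in a variable I called long (that name is mine, not part of the answer).

```r
library(tibble)
library(tidyr)
library(dplyr)

# Same example data as in the question
df <- read.table(text = "Positive Negative Neutral
A 4 5 1
B 6 8 3
C 12 3 6
D 10 5 2
E 2 11 7", header = TRUE)

long <- df %>%
  rownames_to_column("id") %>%   # row names A..E become a regular column
  pivot_longer(-id) %>%          # one row per id/name pair (default columns: name, value)
  mutate(Flag = name %in% c("Positive", "Negative"))  # TRUE = goes into the stacked bar

head(long, 3)
```

Flag has two values per id, so mapping it to x gives two bars inside every facet: the TRUE bar stacks Positive and Negative, the FALSE bar holds Neutral alone.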
Update: To get the desired order, you could create an auxiliary data frame that holds the ordered labels and then convert the id variable to a factor with those levels. Here is the code:
#Auxiliary data
levs <- df %>% rownames_to_column("id") %>%
pivot_longer(-id) %>%
filter(name=='Positive') %>%
arrange(desc(value))
#Data
df %>% rownames_to_column("id") %>%
pivot_longer(-id) %>%
mutate(Flag=name %in% c('Positive','Negative'),
id=factor(id,levels = levs$id,ordered = TRUE)) %>%
ggplot(aes(x=Flag,y=value,fill=name))+
geom_bar(stat = 'identity')+
facet_grid(id~., switch = "y") +
scale_fill_manual(values = c("red", "tomato", "cyan3"), name = "") +
coord_flip() +
theme_classic() +
theme(panel.spacing = unit(0, "points"),
strip.background = element_blank(),
axis.text.y = element_blank(),
axis.ticks.length.y = unit(0, "points"),
axis.title = element_blank(),
strip.placement = "outside",
strip.text = element_text(),
legend.position = "bottom",
panel.grid.major.x = element_line())
Output:
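As a side note, the level order can probably also be taken straight from the wide data without reshaping it first. This is only a sketch of an alternative, using base R order() instead of the auxiliary data frame; levs and id are illustrative names.

```r
# Same example data as in the question
df <- read.table(text = "Positive Negative Neutral
A 4 5 1
B 6 8 3
C 12 3 6
D 10 5 2
E 2 11 7", header = TRUE)

# Row names sorted by decreasing number of positive values
levs <- rownames(df)[order(df$Positive, decreasing = TRUE)]
levs

# A factor with these levels makes facet_grid() draw the panels in that order
id <- factor(rownames(df), levels = levs)
```

Either way the ordering is applied to the data (as factor levels) rather than inside the ggplot call, because the facets follow the level order of the faceting variable.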

Related

Density timeline graph

I'd love to make a timeline density graph like the New York Times does for COVID-19 cases (screenshot below). I am trying to do it with crime data instead of COVID data. Any ideas on how to use R (ggplot2, plotly, etc) to make a graph like this that shows density by day?
So far, I haven't found a similar style in the R-graph Gallery. Thanks.
update: Here is the closest approximation I have come up with so far:
dat_c_grp <- dat_c %>%
group_by(report_date, month) %>%
summarize(count = n())
p <- ggplot(dat_c_grp, aes(report_date, month, fill = count))+
geom_tile(color= "white",size=0.1) +
scale_fill_viridis(name="Daily",option ="C")
output:
p
I'd like the months collapsed into one row, but I can't figure out how to do that.
Here's a full reprex of one way to do this which emulates the look of the original fairly well (obviously I've had to make the data up):
library(ggplot2)
set.seed(1)
Dates <- rep(seq(as.Date("2020-03-01"), by = "1 week", length.out = 36), 3)
Places <- rep(c("Connecticut", "Fairfield", "New Haven"), each = 36)
Cases <- as.numeric(replicate(3, rpois(36, dgamma((1:36)/3, 2.5) * 100))) +
as.numeric(replicate(3, rpois(36, 0.0002 * exp(1:36)^(1/3))))
df <- data.frame(Dates, Places, Cases)
ggplot(df, aes(Dates, Places, fill = Cases)) +
geom_tile(color = "gray92") +
facet_grid(Places~., scales = "free_y") +
scale_fill_gradientn(colors = c("#f3df8e", "#fdad45", "#ff700a", "#cc0a06")) +
theme_minimal() +
scale_x_date(date_breaks = "month", labels = scales::date_format("%b")) +
theme(panel.spacing = unit(50, "points"),
legend.position = "top",
axis.title.y = element_blank(),
strip.text = element_blank(),
panel.grid = element_blank())
Thanks to both Allan Cameron and Konrad Rudolph.
Here is the code to answer my question:
dat_c_grp <- dat_c %>%
count(report_date, month, name = 'count')
p <- ggplot(dat_c_grp, aes(report_date, 0, fill = count))+
geom_tile(color= "white",size=0.1) +
scale_fill_gradientn(colors = c("#f3df8e", "#fdad45", "#ff700a", "#cc0a06"))+
ggExtra::removeGrid()+
theme(panel.spacing = unit(50, "points"),
legend.position = "top",
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y=element_blank(),
strip.text = element_blank(),
panel.grid = element_blank())
p
And output:
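Since dat_c is not available here, below is a self-contained sketch of the same idea with made-up data. The simulated dat_c (500 random incident dates) is purely an assumption so the code runs; the point is only that a constant y value puts every day on a single row.

```r
library(dplyr)
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for dat_c: one row per reported incident
days  <- seq(as.Date("2020-01-01"), as.Date("2020-03-31"), by = "day")
dat_c <- data.frame(report_date = sample(days, 500, replace = TRUE))

# One row per day with the number of incidents on that day
daily <- dat_c %>% count(report_date, name = "count")

# y = 0 for every tile, so all days end up in one row
p <- ggplot(daily, aes(report_date, 0, fill = count)) +
  geom_tile(color = "white") +
  scale_fill_gradientn(colors = c("#f3df8e", "#fdad45", "#ff700a", "#cc0a06")) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  theme_minimal() +
  theme(legend.position = "top",
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        panel.grid = element_blank())
p
```

In the first attempt month was mapped to y, which is what produced one row per month.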

Group variables in ggplot2 (geom_col)

I want to plot variables from a dataframe into different groups, but cannot figure out how to do this.
Mock data frame:
df<-data.frame(animal=c("cat1","cat2","mouse1","mouse2","dog1","dog2"),size=c(3,4,1,2,6,7))
plot<-ggplot(df)+geom_col(mapping=aes(x=animal,y=size))
plot
My wanted output should look like this:
I would suggest the following approach. You have to create a group variable and then use facets. I learnt most of these tricks from @AllanCameron, who has great answers for problems of this style. Here is the code that does that:
library(tidyverse)
#Data
df<-data.frame(animal=c("cat1","cat2","mouse1","mouse2","dog1","dog2"),
size=c(3,4,1,2,6,7),stringsAsFactors = F)
#Create group
df$Group <- gsub('\\d+','',df$animal)
#Now plot
ggplot(df,aes(x=animal,y=size))+
geom_col()+
facet_wrap(.~Group,scales = 'free', strip.position = "bottom")+
theme(strip.placement = "outside",
panel.spacing = unit(0, "points"),
strip.background = element_blank(),
strip.text = element_text(face = "bold", size = 12),
axis.text.x = element_blank(),
axis.ticks.x = element_blank())
The output:
You need to tell ggplot which group each animal belongs to; then you can map the group variable to the x-axis and use the fill aesthetic to distinguish between the individual animals. position = "dodge" places the bars next to each other.
library(ggplot2)
df <- data.frame(animal = c("cat1","cat2","mouse1","mouse2","dog1","dog2"),
size=c(3,4,1,2,6,7),
group = c("cat", "cat", "mouse", "mouse", "dog", "dog"))
ggplot(df, aes(x = group, y = size, fill = animal)) +
geom_col(position = "dodge")
Created on 2020-08-23 by the reprex package (v0.3.0)
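If typing the group column by hand is not practical, it can probably be derived from the animal names, assuming every name is a group label followed by digits (as in the mock data). A minimal sketch combining that with the dodged bars:

```r
library(ggplot2)

df <- data.frame(animal = c("cat1", "cat2", "mouse1", "mouse2", "dog1", "dog2"),
                 size = c(3, 4, 1, 2, 6, 7))

# Strip the trailing digits to get the group name ("cat1" -> "cat")
df$group <- sub("\\d+$", "", df$animal)

# One x position per group, one dodged bar per animal
p <- ggplot(df, aes(x = group, y = size, fill = animal)) +
  geom_col(position = "dodge")
p
```

Names that do not follow the label-plus-number pattern would need their own rule.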

Split axis plot in ggplot2

I just found this plot in Factfulness (book by Hans Rosling and his children). I find the aesthetics of the split quite appealing.
While it's possible to make something similar using geom_rect(), the look is quite different. Another approach would be to use cowplot or patchwork, but that is quite tricky. Here's as far as I got trying to replicate the top part:
gapminder %>%
filter(year==1997, gdpPercap<16000) %>%
ggplot(aes(gdpPercap, y=lifeExp, size=pop)) +
geom_point(alpha=0.5)+
scale_x_log10()+
ggthemes::theme_base()+
theme(legend.position = "none",
plot.background = element_blank(),
plot.margin = unit(c(0.5, 0, 0, 0), "cm")) -> P1
gapminder %>%
filter(year==1997, gdpPercap>16000) %>%
ggplot(aes(gdpPercap, y=lifeExp, size=pop)) +
geom_point(alpha=0.5)+
scale_x_log10()+
ggthemes::theme_base()+
theme(legend.position = "none",
axis.title.y = element_blank(),
axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
plot.background = element_blank(),
plot.margin = unit(c(0.5, 0.5, 0, 0), "cm"),
axis.title.x = element_blank()) -> P2
cowplot::plot_grid(P1, P2, rel_widths = c(2,1), labels = NULL,
align = "h")
I think all the rest of the text and highlights are possible with existing packages. I am wondering how to get a common x axis (the right side should display the ticks according to the ). Ideally, the x axis title would be centered, but that might be too much to ask. I can also move it inside as text.
There are problems with the axes, as you can see in the plot with the y ticks. I wonder if facets would be a better approach. I am also not sure whether the point sizes are wrongly calculated because I filter the data first.
Here is a solution using facets. You can solve the x-axis breaks problem by precomputing the breaks with the scales package's log10 break calculator. A mutate() in the pipeline creates a new variable that splits the data into facets.
library(tidyverse)
library(gapminder)
breaks <- scales::log10_trans()$breaks(range(gapminder$gdpPercap), n = 6)
gapminder %>%
filter(year==1997) %>%
mutate(facet = factor(ifelse(gdpPercap > 16000, "High", "Low"),
levels = c("Low", "High"))) %>%
ggplot(aes(gdpPercap, y=lifeExp, size=pop)) +
geom_point(alpha=0.5)+
scale_x_log10(breaks = breaks)+
ggthemes::theme_base()+
facet_grid(~ facet,
scales = "free_x", space = "free_x") +
ggtitle("My title") +
theme(legend.position = "none",
plot.title = element_text(hjust = 0.5),
plot.background = element_blank())
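One detail in the mutate() step that is easy to miss is the explicit levels argument. Here is a tiny sketch with made-up GDP values (the numbers are hypothetical, not taken from gapminder) showing what it changes:

```r
# Hypothetical GDP per capita values on both sides of the 16000 cut-off
gdp <- c(500, 2500, 15999, 16001, 40000)

# Without levels, factor() sorts alphabetically: "High" would come first
default_order <- factor(ifelse(gdp > 16000, "High", "Low"))

# With explicit levels, "Low" is the first panel and "High" the second
facet <- factor(ifelse(gdp > 16000, "High", "Low"), levels = c("Low", "High"))

levels(default_order)
levels(facet)
```

facet_grid() lays out panels in level order, so this is what keeps the low-income panel on the left. Because both panels come from one data set and one plot, they also share a single size scale, which should take care of the worry about point sizes after filtering.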

order y-axis of geom_tile plot by variable

I am using geom_tile to visualize random draws
Generate data:
set.seed(1)
df= crossing(sim=1:10,part= 1:10)
df$result = sample(c(1,0),size = nrow(df), replace=T)
df = df %>%
group_by(sim)%>%
# find out how many successful (1) pilots there were in the first 4 participants
summarize(good_pilots = sum(result[1:4])) %>%
arrange(good_pilots) %>%
ungroup() %>%
# add this back into full dataframe
full_join(df)
# plot data
plot = ggplot(df, aes( y=factor(sim), x=part)) +
geom_tile(aes(fill = factor(result)), colour = "black",
show.legend = T)+
scale_fill_manual(values=c("lightgrey", "darkblue"))+# c(0,1)
theme(panel.border = element_rect(size = 2),
plot.title = element_text(size = rel(1.2)),
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
legend.title = element_blank(),
legend.position = "right")+ theme_classic()+ coord_fixed(ratio=1)
This results in:
What I actually want is the y axis to be ordered by the # of blue (ie 1's) in the first four columns of the block (which is calculated in good_pilots).
I tried scale_y_discrete but that cannot be what is intended:
plot + scale_y_discrete(limits=df$sim[order(df$good_pilots)])
resulting in:
From what I can tell it seems like the ordering worked correctly, but using scale_y_discrete caused the plot to be messed up.
You can use reorder here
ggplot(df, aes(y = reorder(sim, good_pilots), x = part)) +
...
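For anyone unsure what reorder() actually returns, here is a minimal sketch with made-up values (three simulations, two rows each). The vectors sim and good_pilots are toy stand-ins for the columns in the question.

```r
# Toy stand-ins: three simulations with two rows each
sim         <- c(1, 1, 2, 2, 3, 3)
good_pilots <- c(3, 3, 1, 1, 2, 2)

# reorder() returns a factor whose levels are sorted by a summary of the
# second argument (the mean by default), computed per level
f <- reorder(factor(sim), good_pilots)
levels(f)
```

The data themselves are not rearranged, only the factor levels, and ggplot uses the level order to place the rows on the discrete y axis.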

Minor grid lines in ggplot2 with discrete values and facet grid

I have a plot created using ggplot2 where I'm trying to modify some of the minor grid lines. Here is the current version:
library(tidyverse)
data(starwars)
starwars = starwars %>%
filter(!is.na(homeworld), !is.na(skin_color)) %>%
mutate(tatooine = factor(if_else(homeworld == "Tatooine", "Tatooine Native", "Other Native")),
skin_color = factor(skin_color))
ggplot(starwars, aes(birth_year, skin_color)) +
geom_point(aes(color = gender), size = 4, alpha = 0.7, show.legend = FALSE) +
facet_grid(tatooine ~ ., scales = "free_y", space = "free_y", switch = "y") +
theme_minimal() +
theme(
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
strip.placement = "outside",
strip.background = element_rect(fill="gray90", color = "white"),
) +
geom_hline(yintercept = seq(0, length(unique(starwars$skin_color))) + .5, color="gray30")
Y axis is a factor and a facet grid is used, with an uneven number of categories in each grid. I added some minor grid lines using geom_hline (my understanding is that panel.grid.minor does not work with categorical data i.e., factors).
I would like to remove the lines highlighted in yellow below, and then ADD a single black line in between the two facet grids (i.e., where the current double lines are that are highlighted in yellow).
Any way to do this? I'd prefer avoiding hard coding the position of any lines, in case the data change. Thanks.
Removing the top and bottom grid lines dynamically is relatively easy. You code the line positions in the data set based on the faceting groups, exclude the highest value in each group (the lowest line, at 0.5, is never generated), and plot the geom_hline with a yintercept inside the aes() statement. That approach is robust to changing the data (to see this, uncomment the # filter(!is.na(birth_year)) line below).
library(tidyverse)
library(grid)
data(starwars)
starwars = starwars %>%
filter(!is.na(homeworld), !is.na(skin_color)) %>%
mutate(tatooine = factor(if_else(homeworld == "Tatooine", "Tatooine Native", "Other Native")),
skin_color = factor(skin_color)) %>%
# filter(!is.na(birth_year)) %>%
group_by(tatooine) %>%
# here we assign the line_positions
mutate(line_positions = as.numeric(factor(skin_color, levels = unique(skin_color))),
line_positions = line_positions + .5,
line_positions = ifelse(line_positions == max(line_positions), NA, line_positions))
plot_out <- ggplot(starwars, aes(birth_year, skin_color)) +
geom_point(aes(color = gender), size = 4, alpha = 0.7, show.legend = FALSE) +
geom_hline(aes(yintercept = line_positions)) +
facet_grid(tatooine ~ ., scales = "free_y", space = "free_y", switch = "y") +
theme_minimal() +
theme(
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_line(colour = "black"),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
strip.placement = "outside",
strip.background = element_rect(fill="gray90", color = "white"),
)
print(plot_out)
gives
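The grouped mutate() is the core of this, so here is the same logic on a tiny made-up data set where the result is easy to check by eye. The toy data frame and its column names (panel, level, pos) are hypothetical.

```r
library(dplyr)

# Toy data: panel "a" has three categories, panel "b" has two
toy <- data.frame(panel = c("a", "a", "a", "b", "b"),
                  level = c("x", "y", "z", "p", "q"))

toy <- toy %>%
  group_by(panel) %>%
  mutate(
    # number the categories 1..k within each panel, then move to the gaps
    pos = as.numeric(factor(level, levels = unique(level))) + 0.5,
    # drop the line above the last category (the top border of the panel)
    pos = ifelse(pos == max(pos), NA, pos)
  ) %>%
  ungroup()

toy$pos
```

Only the set of positions per panel matters, not which row carries which value, so it should not be a problem that the categories are numbered in order of appearance rather than in axis order.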
However, adding a solid line between the facets without any hard coding is difficult. There are some possible ways to add borders between facets (see here), but if we don't know whether the facets change, it is not obvious which value the border should be assigned to. I guess there is a possible solution that draws a hard-coded line in the plot to divide the facets, but the tricky part is determining dynamically where that border is going to be located, based on the data and on how the facets are ultimately drawn (e.g. in which order). I'd be interested in hearing other opinions on this.
