Any way to flip coordinates without reversing everything else?

I'm working on a plotting function that has an option to flip coordinates (using coord_flip). The plot is grouped (using the fill argument), and for some reason coord_flip also reverses the colors, the legend, my value column and my fill column. In practice, this means I have the following piece of code in my function:
if(flip_coord){
colors = c("#CC0000", "#002D73" ) %>% rev
rev_legend = T
table[[col_plot]] = fct_rev(table[[col_plot]]) # value column
table[['origin_table']] = fct_rev(table[['origin_table']]) # fill column
} else{
colors = c("#CC0000", "#002D73" )
rev_legend = F
}
There's also this line in my plot:
{if(flip_coord) coord_flip()} +
This brings back everything else that gets scrambled with coord_flip, but isn't too elegant. Is there a better way to only flip coordinates without reversing everything else?
PS: I know there's no reproducible example here, I'll try to add one, but if someone has already stumbled upon the answer to this problem that might be common, I'll post as is for the moment.
Edit: made some reprex. Let's say my data is this:
df = tibble(origin = c('2000s', '1990s') %>% rep(2),
region = c('South', 'North') %>% rep(2) %>% sort,
value = 1:4) %>%
mutate(origin = factor(origin, levels = c('1990s', '2000s')),
region = factor(region, levels = c('North', 'South')))
colors = c('red', 'blue')
# origin region value
# <fct> <fct> <int>
# 1 2000s North 1
# 2 1990s North 2
# 3 2000s South 3
# 4 1990s South 4
If I plot regularly, everything comes ordered (90s first, 00s second, North first, South second):
df %>%
ggplot(aes(x = region, fill = origin, y = value)) +
geom_bar(stat = "identity", position = 'dodge', color = "white", alpha= 0.8)+
scale_fill_manual(values=colors)
But, if I flip coordinates (just adding + coord_flip() to the code above) I get the following:
South is above North, 00s is above 90s, and the legend isn't in the same order as the bars. This is exactly the same if I input x = value and y = region. So, to fix this I have to do the following:
df2 = df
df2[['region']] = fct_rev(df2[['region']]) # Change 1
df2[['origin']] = fct_rev(df2[['origin']]) # Change 2
df2 %>%
ggplot(aes(x = value, fill = origin, y = region)) +
geom_bar(stat = "identity", position = 'dodge', color = "white", alpha= 0.8) +
guides(fill = guide_legend(reverse = T)) + # Change 3
scale_fill_manual(values=rev(colors)) # Change 4
Bringing the correct orders:
Is there any less cumbersome way to achieve this?

The issue is that coord_flip() changes the ordering of bars within groups in a grouped bar plot.
According to a linked answer, a hacky way to solve this is to make the dodge width negative (the code below passes a negative width to geom_col).
Adding scale_x_discrete(limits = rev) then puts North in the correct position:
library(tidyverse)
df %>%
ggplot(aes(x=region, y=value, fill=origin))+
geom_col(position = position_dodge(), width = -0.4)+
scale_fill_manual(values = c("red", "blue")) +
coord_flip()+
scale_x_discrete(limits=rev)+
theme_minimal(base_size=16)+
theme(axis.title.x=element_blank(),
axis.title.y=element_blank())

coord_flip() does not flip everything around. Factors are plotted starting from the bottom, so 1990s will be below 2000s, and North will be below South.
The simplest way I can see is to reverse your factor levels when creating the factors.
library(tidyverse)
df <- tibble(
origin = c("2000s", "1990s") %>% rep(2),
region = c("South", "North") %>% rep(2) %>% sort(),
value = 1:4
) %>%
mutate(
## just reverse the factor levels
origin = factor(origin, levels = rev(c("1990s", "2000s"))),
region = factor(region, levels = rev(c("North", "South")))
)
colors <- c("red", "blue")
df %>%
# switched x and y
ggplot(aes(y = region, x = value, fill = origin)) +
geom_bar(stat = "identity", position = "dodge", color = "white", alpha = 0.8) +
## this is to set the correct legend order and mapping to your colors
scale_fill_manual(values = colors, breaks = rev(unique(df$origin)))
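A possibly less cumbersome variant, as a sketch only: assuming ggplot2 3.3.0 or later (where bars can be drawn horizontally by mapping the discrete variable to y), the factor levels and the colour vector can stay untouched if the dodge order and the axis order are reversed instead. position_dodge2(reverse = TRUE) and scale_y_discrete(limits = rev) are existing ggplot2 features, but this combination is not taken from either answer above and has not been checked against the asker's real data.

```r
library(ggplot2)

# Same toy data as in the question; factor levels stay in their natural order.
df <- data.frame(
  origin = factor(rep(c("2000s", "1990s"), 2), levels = c("1990s", "2000s")),
  region = factor(rep(c("North", "South"), each = 2), levels = c("North", "South")),
  value  = 1:4
)
colors <- c("red", "blue")

p <- ggplot(df, aes(x = value, y = region, fill = origin)) +
  # reverse = TRUE flips the order of the dodged bars within each region
  geom_col(position = position_dodge2(reverse = TRUE), color = "white", alpha = 0.8) +
  # limits = rev draws the first factor level (North) at the top of the axis
  scale_y_discrete(limits = rev) +
  scale_fill_manual(values = colors)

p
```

With this approach there is no fct_rev(), no rev(colors) and no reversed legend guide to keep in sync, so a flip_coord switch in a function would only need to choose between two aes() mappings and add these two components.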

Related

removing missing values labels from ggplot

What I want appears to be simple, but I can't figure it out: I want to remove the NA values from my legend labels. The problem is that it's my first time using the "na.value" argument, so I'm not quite sure how to proceed.
(btw, I can’t drop the NAs before plotting because the shapes that are not from the tourist regions will also disappear, and I need the full map.)
I have this code:
mun_tur_shape %>%
filter(abbrev_state == "BA") %>%
ggplot() +
geom_sf(aes(fill=TOURIST_REGION, colour=TOURIST_REGION)) +
scale_fill_manual(
na.value = "grey90",
values = c(viridis::inferno(13)),
aesthetics = c("fill", "colour")
) +
labs(fill = "Região Turística",
colour = "Região Turística"
)
And this is how it looks:
Plot with NA value
Does anyone know what I can do to omit them?
# here's an sf for reproducible example:
# install.packages("geobr")
df <- geobr::read_state()
df %>%
# creating NA values like my real dataset has
mutate(name_region=case_when(name_region=="Nordeste"~NA_character_,
TRUE~name_region)) %>%
ggplot() +
geom_sf(aes(fill=name_region, colour=name_region)) +
scale_fill_manual(
na.value = "grey90",
values = (viridis::inferno(4)),
aesthetics = c("fill", "colour")
) +
labs(colour = "Regions",
fill = "Regions")
My solution for this (see also "NA values in choropleth plot legend with ggplot2 in R") is to first add a base layer with all the polygons (no aes), using the color that I want for NA. After that, you can overlay the layer with aes(). In practice, this means adding one geom_sf() line to your code, plus na.translate = FALSE in the scale so that NA is dropped from the legend:
library(dplyr)
library(ggplot2)
library(geobr)
df <- geobr::read_state()
df %>%
# creating NA values like my real dataset has
mutate(name_region = case_when(
name_region == "Nordeste" ~ NA_character_,
TRUE ~ name_region
)) %>%
# Create map
ggplot() +
# Add this line: a base layer with no aes, filled with the color you want for NA
geom_sf(fill = "grey50", color = "grey50") +
# end
geom_sf(aes(fill = name_region, colour = name_region)) +
scale_fill_manual(
na.value = "grey90",
values = (viridis::inferno(4)),
aesthetics = c("fill", "colour"),
na.translate = FALSE
) +
labs(
colour = "Regions",
fill = "Regions"
)
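The same two-layer idea can be tried without downloading any shapefiles. The sketch below uses a hypothetical 2 x 2 grid of tiles (the cells data frame and its colours are made up for illustration) in place of the map polygons: the unmapped base layer supplies the grey for the missing cell, and na.translate = FALSE keeps NA out of the legend.

```r
library(ggplot2)

# Hypothetical toy grid standing in for the map polygons; one cell has no region.
cells <- data.frame(
  x = c(1, 2, 1, 2),
  y = c(1, 1, 2, 2),
  region = c("A", "B", NA, "A")
)

p <- ggplot(cells, aes(x, y)) +
  # Base layer: every cell in the "missing" colour, no fill mapping, so no legend entry
  geom_tile(fill = "grey90", colour = "white") +
  # Mapped layer: NA is not translated, so that cell is not filled here and gets no legend key
  geom_tile(aes(fill = region), colour = "white") +
  scale_fill_manual(values = c(A = "firebrick", B = "steelblue"), na.translate = FALSE)

p
```

Without the base layer, the cell with a missing region would simply be left blank, which is why the extra geom is needed when the full map has to stay visible.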

Adding points to stacked barplot based on conditions in ggplot

I have a stacked bar plot which shows the frequency of size distributions of different kinds of buildings across several companies.
I am replicating this chart for each company in the dataset, and for each company I would like to add a dot in the stacks that represent the sizes of the buildings in that company. I was able to set up the bar chart and add points to the graph, but I can't figure out how to place the dots in the correct places.
Here’s my code:
Data<-data.frame(Location = c("HQ", "Plant", "Warehouse", "Office", "HQ","Plant",
"Warehouse","Office","HQ","Plant","Warehouse","Office"),
Company=c("a","a","a","a","b","b","b","b","c","c","c","c"),
Staff=c("Small","Medium","Large","Medium","Small","Medium","Medium","Large","Large","Large","Small","Small"))
ggplot(Data,aes(x=Location,fill=Staff))+
geom_bar(position = 'fill')+
geom_point(aes(y = stat(..count../sum(..count..)),
color = Staff), stat = 'count',
position = position_fill(0.5), size = 5)+
scale_color_manual(values = 'black', limits = "Medium")
Here’s what I have so far.
I would like to figure out how to do this for company "a" so the chart looks something like this:
I think that, to start, I would need to create a vector that holds the staff bins for that company: Point<-as.character(Data$Staff[Data$Company=="a"])
The following should fit your requirements. (I've also wrapped it in a function for convenience in switching between companies.)
library(dplyr)
library(ggplot2)
show.company.point <- function(company.name) {
p <- ggplot(Data,
aes(x = Location, fill = Staff))+
geom_bar(position = 'fill')+
geom_point(data = . %>%
mutate(alpha = Company == company.name) %>%
group_by(Location, Staff) %>%
summarise(n = n(),
alpha = any(alpha)) %>%
ungroup(),
aes(y = n, alpha = alpha, group = Staff),
position = position_fill(0.5, reverse = F),
size = 5, show.legend = F) +
ggtitle(paste("Company", company.name)) +
scale_alpha_identity()
return(p)
}
show.company.point("a")
show.company.point("b")
show.company.point("c")
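To see what the data argument of geom_point() is doing, the same pipeline can be run on its own. This is only an illustration for company "a": point_data is a name introduced here, and .groups = "drop" (assuming dplyr 1.0.0 or later) stands in for the ungroup() call in the answer.

```r
library(dplyr)

Data <- data.frame(
  Location = rep(c("HQ", "Plant", "Warehouse", "Office"), 3),
  Company  = rep(c("a", "b", "c"), each = 4),
  Staff    = c("Small", "Medium", "Large", "Medium",
               "Small", "Medium", "Medium", "Large",
               "Large", "Large", "Small", "Small")
)

# One row per bar segment (Location x Staff):
#   n     = how many companies fall into that segment (the segment's height)
#   alpha = TRUE if the chosen company is one of them
point_data <- Data %>%
  mutate(alpha = Company == "a") %>%
  group_by(Location, Staff) %>%
  summarise(n = n(), alpha = any(alpha), .groups = "drop")

point_data
```

Every segment gets a point placed at its centre by position_fill(0.5), and the logical alpha column is then used as-is by scale_alpha_identity(), so presumably only the segments containing the chosen company end up with a visible dot.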

Barplot side by side and line charts in the same plot

I want to create in R a plot which contains side by side bars and line charts as follows:
I tried:
Total <- c(584,605,664,711,759,795,863,954,1008,1061,1117,1150)
Infected <- c(366,359,388,402,427,422,462,524,570,560,578,577)
Recovered <- c(212,240,269,301,320,359,385,413,421,483,516,548)
Death <- c(6,6,7,8,12,14,16,17,17,18,23,25)
day <- itemizeDates(startDate="01.04.20", endDate="12.04.20")
df <- data.frame(Day=day, Infected=Infected, Recovered=Recovered, Death=Death, Total=Total)
value_matrix = matrix(, nrow = 2, ncol = 12)
value_matrix[1,] = df$Recovered
value_matrix[2,] = df$Death
plot(c(1:12), df$Total, ylim=c(0,1200), xlim=c(1,12), type = "b", col="peachpuff", xaxt="n", xlab = "", ylab = "")
points(c(1:12), df$Infected, type = "b", col="red")
barplot(value_matrix, beside = TRUE, col = c("green", "black"), width = 0.35, add = TRUE)
But the bar chart does not fit the line chart. I guess it would be easier to use ggplot2, but don't know how. Could anyone help me? Thanks a lot in advance!
With ggplot2, the margins are handled nicely for you, but you'll need the data in two separate long forms. Reshape from wide to long with tidyr::gather, tidyr::pivot_longer, reshape2::melt, reshape, or whatever you prefer.
library(tidyr)
library(ggplot2)
df <- data.frame(
Total = c(584,605,664,711,759,795,863,954,1008,1061,1117,1150),
Infected = c(366,359,388,402,427,422,462,524,570,560,578,577),
Recovered = c(212,240,269,301,320,359,385,413,421,483,516,548),
Death = c(6,6,7,8,12,14,16,17,17,18,23,25),
day = seq(as.Date("2020-04-01"), as.Date("2020-04-12"), by = 'day')
)
ggplot(
tidyr::gather(df, Population, count, Total:Infected),
aes(day, count, color = Population, fill = Population)
) +
geom_line() +
geom_point() +
geom_col(
data = tidyr::gather(df, Population, count, Recovered:Death),
position = 'dodge', show.legend = FALSE
)
Another way to do it is to gather twice before plotting. Not sure if this is easier or harder to understand, but you get the same thing.
df %>%
tidyr::gather(Population, count, Total:Infected) %>%
tidyr::gather(Resolution, count2, Recovered:Death) %>%
ggplot(aes(x = day, y = count, color = Population)) +
geom_line() +
geom_point() +
geom_col(
aes(y = count2, color = Resolution, fill = Resolution),
position = 'dodge', show.legend = FALSE
)
You can actually plot the lines and points without reshaping by making separate calls for each, but to dodge bars (or get legends), you'll definitely need to reshape.
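For readers on a recent tidyr, the two reshaping steps could also be written with pivot_longer(), which the answer lists as an alternative to gather(). A minimal sketch, assuming tidyr 1.0.0 or later; lines_long and bars_long are names introduced here for illustration:

```r
library(tidyr)

df <- data.frame(
  Total     = c(584, 605, 664, 711, 759, 795, 863, 954, 1008, 1061, 1117, 1150),
  Infected  = c(366, 359, 388, 402, 427, 422, 462, 524, 570, 560, 578, 577),
  Recovered = c(212, 240, 269, 301, 320, 359, 385, 413, 421, 483, 516, 548),
  Death     = c(6, 6, 7, 8, 12, 14, 16, 17, 17, 18, 23, 25),
  day       = seq(as.Date("2020-04-01"), as.Date("2020-04-12"), by = "day")
)

# Long form for the line layers: one row per day and series
lines_long <- pivot_longer(df, c(Total, Infected),
                           names_to = "Population", values_to = "count")

# Long form for the dodged bars
bars_long <- pivot_longer(df, c(Recovered, Death),
                          names_to = "Population", values_to = "count")
```

These two data frames can be passed to ggplot() and to geom_col(data = ...) in place of the gather() results in the first example; the row order differs from gather(), but that does not affect the plot.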

ggplot: remove NA factor level in legend

How can I omit the NA level of a factor from a legend?
From the nycflights13 database, I created a new continuous variable called tot_delay, and then created a factor called delay_class with 4 levels. When I plot, I filter out NA values, but they still appear in the legend. Here's my code:
library(nycflights13); library(ggplot2); library(dplyr)
flights$tot_delay = flights$dep_delay + flights$arr_delay
flights$delay_class <- cut(flights$tot_delay,
c(min(flights$tot_delay, na.rm = TRUE), 0, 20 , 120,
max(flights$tot_delay, na.rm = TRUE)),
labels = c("none", "short","medium","long"))
filter(flights, !is.na(tot_delay)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")
The parent example isn't a good illustration of the problem (of course unexpected NA values should be tracked down and eliminated), but this is the top result on Google, so it should be noted that there is now an option in scale_XXX_XXX to prevent NA levels from displaying in the legend: set na.translate = F. For example:
# default
ggplot(data = data.frame(x = c(1,2,NA), y = c(1,1,NA), a = c("A","B",NA)),
aes(x, y, colour = a)) + geom_point(size = 4)
# with na.translate = F
ggplot(data = data.frame(x = c(1,2,NA), y = c(1,1,NA), a = c("A","B",NA)),
aes(x, y, colour = a)) + geom_point(size = 4) +
scale_colour_discrete(na.translate = F)
This works in ggplot2 3.1.0.
You have one data point where delay_class is NA, but tot_delay isn't. This point is not being caught by your filter. Changing your code to:
filter(flights, !is.na(delay_class)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")
does the trick:
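The likely reason for that stray NA, sketched with made-up numbers: cut() builds intervals that are open on the left by default, so a value equal to the lowest break (in the question, min(tot_delay)) falls outside every interval. Passing include.lowest = TRUE is one way to keep it. The vector x below is hypothetical, not the flights data.

```r
# Hypothetical delays; the first value equals the lowest break
x <- c(-5, 0, 10, 50, 200)
breaks <- c(min(x), 0, 20, 120, max(x))
delay_labels <- c("none", "short", "medium", "long")

# Default: intervals are (lo, hi], so min(x) itself is not covered and becomes NA
default_class <- cut(x, breaks, labels = delay_labels)

# include.lowest = TRUE closes the first interval on the left: [lo, hi]
fixed_class <- cut(x, breaks, labels = delay_labels, include.lowest = TRUE)

data.frame(x, default_class, fixed_class)
```

If the same thing is happening in the flights data, adding include.lowest = TRUE to the original cut() call would remove the NA at its source instead of filtering it out afterwards.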
Alternatively, if you absolutely must have that extra point, you can override the fill legend as follows:
filter(flights, !is.na(tot_delay)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill") +
scale_fill_manual( breaks = c("none","short","medium","long"),
values = scales::hue_pal()(4) )
UPDATE: As pointed out in @gatsky's answer, all discrete scales also include the na.translate argument. The feature has actually existed since ggplot2 2.2.0; I just wasn't aware of it at the time I posted my answer. For completeness, its usage in the original question would look like
filter(flights, !is.na(tot_delay)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill") +
scale_fill_discrete(na.translate=FALSE)
I like @Artem's method above, i.e., getting to the bottom of why there are NAs in your df. However, sometimes you know there are NAs, and you just want to exclude them. In that case, simply using na.omit should work (note that it drops every row that has an NA in any column, not only in delay_class):
na.omit(flights) %>% ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")

Stacked Bar Plot for Temperature vs Home Runs

I am trying to make some changes to my plot, but am having difficulty doing so.
(1) I would like warm, avg, and cold to be filled in as the colors red, yellow, and blue, respectively.
(2) I am trying to make the y-axis read "Count" and have it be horizontally written.
(3) In the legend, I would like the title to be Temperatures, rather than variable
Any help making these changes would be much appreciated along with other suggestions to make the plot look nicer.
df <- read.table(textConnection(
'Statistic Warm Avg Cold
Homers(Away) 1.151 1.028 .841
Homers(Home) 1.202 1.058 .949'), header = TRUE)
library(ggplot2)
library(reshape2)
df <- melt(df, id = 'Statistic')
ggplot(
data = df,
aes(
y = value,
x = Statistic,
group = variable,
shape = variable,
fill = variable
)
) +
geom_bar(stat = "identity")
You are on the right lines by trying to reshape the data into long format. My preference is to use gather from the tidyr package for that. You can also create the variable names Temperatures and Count in the gather step.
The next step is to turn the 3 classes of temperature into a factor, ordered from cold, through average, to warm.
Now you can plot. You want position = "dodge" to get the bars side by side, since it makes no sense to stack the values in a single bar. Fill colours you specify using scale_fill_manual.
You rotate the y-axis title by manipulating axis.title.y.
So putting all of that together (plus a black/white theme):
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
gather(Temperatures, Count, -Statistic) %>%
mutate(Temperatures = factor(Temperatures, c("Cold", "Avg", "Warm"))) %>%
ggplot(aes(Statistic, Count)) +
geom_col(aes(fill = Temperatures), position = "dodge") +
scale_fill_manual(values = c("blue", "yellow", "red")) +
theme_bw() +
theme(axis.title.y = element_text(angle = 0, vjust = 0.5))
Result:
I'd question whether Count is a sensible variable name in this case.
You are almost there. To map specific colors to specific factor levels you can use scale_fill_manual and create your own scale:
scale_fill_manual(values=c("Warm"="red", "Avg"="yellow", "Cold"="blue")) +
Changing the y-axis title is also easy in ggplot:
ylab("Count") +
And to change the legend title you can use:
labs(fill='TEMPERATURE') +
Giving us:
ggplot(df, aes(y = value, x = Statistic, group= variable, fill = variable)) +
geom_bar(stat = "identity") +
scale_fill_manual(values=c("Warm"="red", "Avg"="yellow", "Cold"="blue")) +
labs(fill='TEMPERATURE') +
ylab("Count") +
xlab("") +
theme_bw() +
theme(axis.title.y = element_text(angle = 0, vjust = 0.5))
