Overlaying geom_bar and missing values - r

I want to overlay 3 geom_bar to make clear an evolution over 3 years.
My data is as follows for each year:
Example: PerfDist2021 (my data frame for 2021)
Districts  Perf
1          40
2          30
3          60
On my y-axis I have the performance (in %) and on the x-axis I have the number corresponding to the district (from 1 to 25, and there is also a 31st).
I made this :
ggplot(data=NULL, aes(Districts, Perf)) +
geom_bar(aes(fill = "2019"), data = PerfDist2019, stat="identity" ,alpha = 0.5, col="red") +
geom_bar(aes(fill = "2020" ), data = PerfDist2020, stat="identity", alpha = 0.5, col="green") +
geom_bar(aes(fill = "2021" ), data = PerfDist2021, stat="identity", alpha = 0.5, col="blue")
But first, I can't see all my districts. I don't know how to make them
all visible; it's as if R erases some of them or is just not precise with my
x-axis (see picture in link).
Secondly, I don't know how to change the color of the bars. I can
only change the color of the frame of each bar with col=..., and the
data is not very readable this way.
Third, the colours blend together, so it is sometimes hard to
distinguish the three. I tried several combinations of colors and it
is always the same. Is there a way to avoid this mixing of
colors?
Thank you for your help!
PS: You can ask for any clarification!

fill is the aesthetic that controls the color of the bars. color controls colors of lines, in this case, the frames, as you noted. So you want to delete the col = arguments and add scale_fill_manual to associate which color each custom fill name should have.
Also, repeating geom_bar three times isn't ideal: if you had 12 years you wouldn't want to write 12 geom_bar calls. To change that, you can rbind your three datasets, add a column specifying which year each row comes from, and then tell ggplot to fill the bars according to that column.
Lastly, as #h45 said, you can change the type of Districts to factor to avoid the missing levels.
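library(ggplot2)
# Stack the three years and record which year each row came from
PerfDist = rbind(
  cbind(PerfDist2019, Year = "2019"),
  cbind(PerfDist2020, Year = "2020"),
  cbind(PerfDist2021, Year = "2021"))
# A factor makes the x-axis discrete, so every district gets its own label
PerfDist$Districts <- factor(PerfDist$Districts, levels = 1:31)
ggplot(PerfDist, aes(Districts, Perf, fill = Year)) +
  geom_bar(stat="identity", position = "identity", alpha = 0.5) +
  # the colours must be passed through the values argument
  scale_fill_manual(values = c("red", "green", "blue"))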
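The blending of colours mentioned in the question comes from drawing semi-transparent bars on top of each other. One possible alternative, not part of the answer above, is to place the bars side by side with position = "dodge". The sketch below uses made-up numbers (the helper make_year and the data are hypothetical, since the real PerfDist data frames were not posted):

```r
library(ggplot2)

# Made-up data for illustration only: districts 1 to 25 plus 31, three years.
districts <- c(1:25, 31)
set.seed(1)
make_year <- function(year) {   # hypothetical helper, not from the question
  data.frame(Districts = districts,
             Perf = round(runif(length(districts), 20, 80)),
             Year = year)
}
PerfDist <- rbind(make_year("2019"), make_year("2020"), make_year("2021"))

# Discrete axis: one tick per district that actually exists.
PerfDist$Districts <- factor(PerfDist$Districts, levels = districts)

# Dodged bars sit next to each other, so no transparency is needed.
p <- ggplot(PerfDist, aes(Districts, Perf, fill = Year)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("2019" = "red", "2020" = "green", "2021" = "blue"))
```

Printing p draws the chart. Naming the colours by year keeps the mapping explicit rather than relying on the order of the levels.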
That's it for the answer. Now just a note to help you improve your next questions: please read how to make a great R reproducible example.

Related

ggplot2 geom_linerange remove whitespace between rows

I am attempting to create a plot similar to a strip chart recorder showing outage data. Outage Severity is either Major or Minor. The plot has a large amount of vertical white space between the two rows, as well as above and below them, that I would like to remove to create a compact two-row chart.
dataframe is:
> head(dfsub)
StartDateTime EndDateTime Outage.DUR Outage.Severity
1 2021-07-01T00:23:33.0000000 2021-07-01T00:25:26.0000000 1.8833333 Minor
2 2021-07-01T00:25:26.0000000 2021-07-01T00:31:33.0000000 6.1166667 Major
3 2021-07-01T00:31:33.0000000 2021-07-01T00:40:34.0000000 9.0166667 Major
4 2021-07-01T00:40:34.0000000 2021-07-01T00:42:57.0000000 2.3833333 Minor
5 2021-07-01T00:42:57.0000000 2021-07-01T00:43:49.0000000 0.8666667 Minor
6 2021-07-01T00:43:49.0000000 2021-07-01T00:45:35.0000000 1.7666667 Minor
R Code I am running
ggplot(dfsub) +
geom_linerange(aes(y = Outage.Severity,
xmin = StartDateTime,
xmax = EndDateTime,
colour = as.factor(Outage.Severity)
),
show.legend = FALSE,
size = 50) +
scale_color_manual(values = c("red", "yellow")) +
theme(legend.position = "none") +
theme_test()
generates this plot
Two suggestions.
You didn't ask about this, but your x-axis is broken, using time (which is a continuous thing) in a categorical sense. Note that R and ggplot2 are treating the current columns as strings not timestamps. This is easily resolved:
dfsub[c("StartDateTime", "EndDateTime")] <-
lapply(dfsub[c("StartDateTime", "EndDateTime")], as.POSIXct, format="%Y-%m-%dT%H:%M:%OS", tz="UTC")
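As a quick sanity check of that format string, here is a small sketch using the two timestamps from the first row of head(dfsub). The variable names start, end and dur_min are only for illustration:

```r
# Timestamps copied from row 1 of the question's head(dfsub) output.
fmt   <- "%Y-%m-%dT%H:%M:%OS"
start <- as.POSIXct("2021-07-01T00:23:33.0000000", format = fmt, tz = "UTC")
end   <- as.POSIXct("2021-07-01T00:25:26.0000000", format = fmt, tz = "UTC")

# A format that does not match the input yields NA, so check that first.
stopifnot(!is.na(start), !is.na(end))

# Duration in minutes; this should agree with Outage.DUR for that row.
dur_min <- as.numeric(difftime(end, start, units = "mins"))
```

If the parsed duration agrees with the Outage.DUR column, the conversion is most likely doing what you expect.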
I don't think you're going to get the fine control over blank space between the reds and yellows using geom_linerange, I suggest geom_rect as an option. With that, remove size=, and we'll need to control ymin= and ymax=. This benefits from setting Outage.Severity to a factor; while not completely necessary, it's common for this work to then come back with "how do I change the order of the y-axis categories?", for which the only (sane) response is to convert them to factors and control their levels=. We also need to add fill=, which geom_linerange did not need.
dfsub$Outage.Severity <- factor(dfsub$Outage.Severity) # add 'levels=' if you want to control the order
From here, knowing that categorical data are plotted on integers, we'll fill the gap between them by extending their rectangles +/- 0.48 (arbitrary, but should likely be close to but not at/beyond 0.5).
ggplot(dfsub) +
geom_rect(aes(ymin = as.numeric(Outage.Severity)-0.48,
ymax = as.numeric(Outage.Severity)+0.48,
xmin = StartDateTime,
xmax = EndDateTime,
colour = Outage.Severity,
fill = Outage.Severity),
show.legend = FALSE) +
scale_y_continuous(breaks = seq_along(levels(dfsub$Outage.Severity)), labels = levels(dfsub$Outage.Severity)) +
scale_color_manual(values = c("Major"="red", "Minor"="yellow")) +
scale_fill_manual(values = c("Major"="red", "Minor"="yellow")) +
theme(legend.position = "none") +
theme_test()
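To see why the +/- 0.48 offsets work, it may help to look at the integer codes behind a factor. This is a minimal sketch on a made-up vector (sev is a hypothetical name, not the real column):

```r
# Made-up severity values for illustration.
sev <- factor(c("Minor", "Major", "Major", "Minor"))

levels(sev)              # default levels are sorted: "Major" then "Minor"
codes <- as.numeric(sev) # integer position of each value among the levels

# Each rectangle spans almost the whole unit around its code,
# leaving only a thin gap between neighbouring categories.
ymin <- codes - 0.48
ymax <- codes + 0.48
```

With these numbers the Major band runs from 0.52 to 1.48 and the Minor band from 1.52 to 2.48, so the gap between the two rows is 0.04 units.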

Grouped bar plot column width uneven due to no data

I am trying to display a grouped bar plot for my dataset, however, due to some months have no data (no income), the column width is showing up as unequal and I was hoping to have the same column width regardless if some states have no income. Notice how the bar plot is grouped for January, something grouped like that across all months although other states have no income (I'd like to have them spaced out if some states do not have any income). Any help will be much appreciated, thanks.
library(ggplot2)
plot = ggplot(Checkouts, aes(fill=Checkouts$State, x=Checkouts$Month, y=Checkouts$Income)) +
geom_bar(colour = "black", stat = "identity")
My Bar Plot
Checkouts table/data
There are two ways that this can be done.
If you are using a recent version of ggplot2 (3.0.0 or later, I believe), position_dodge has a parameter called preserve. Dodging keeps the vertical position and adjusts only the horizontal position, and preserve = 'single' keeps each bar at the width of a single element instead of stretching it to fill the group. Here is the code for it.
Code:
library(ggplot2)
plot = ggplot(Checkouts, aes(fill=Checkouts$State, x=Checkouts$Month, y=Checkouts$Income)) +
geom_bar(colour = "black", stat = "identity", position = position_dodge(preserve = 'single'))
Another way is to precompute the data and add dummy rows for each missing State/Month combination. Using table is the best way to find them.
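A rough sketch of the dummy-row approach in base R follows. The Checkouts data here is a hypothetical stand-in (the real table was only posted as an image), and all_pairs and filled are names chosen for illustration:

```r
# Hypothetical stand-in for the Checkouts table.
Checkouts <- data.frame(
  Month  = c("Jan", "Jan", "Jan", "Feb", "Mar", "Mar"),
  State  = c("NSW", "VIC", "QLD", "NSW", "VIC", "QLD"),
  Income = c(100, 80, 60, 90, 70, 50)
)

# table() shows which Month/State cells have no rows (count of 0).
counts <- table(Checkouts$Month, Checkouts$State)

# Every Month/State pair, including the ones missing from the data.
all_pairs <- expand.grid(Month = unique(Checkouts$Month),
                         State = unique(Checkouts$State),
                         stringsAsFactors = FALSE)

# Left join on Month and State: missing pairs get Income = NA,
# which is then set to 0 so each pair still occupies a slot.
filled <- merge(all_pairs, Checkouts, all.x = TRUE)
filled$Income[is.na(filled$Income)] <- 0
```

Plotting filled instead of Checkouts with a dodged position should then give every month the same number of bar slots, with zero-height bars where there was no income.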
You are looking for position_dodge2(preserve = "single") (https://ggplot2.tidyverse.org/reference/position_dodge.html).
library(ggplot2)
plot = ggplot(Checkouts, aes(fill = State, x = Month, y= Income)) +
geom_bar(colour = "black", stat = "identity",
position = position_dodge2(preserve = "single"))
Also, you don't need to specify the columns to the data frame with $ in ggplot(). For example, Checkouts$State can be replaced with State.

ggplot - retaining axis label coloring with reordered data

I'm making a horizontal bar chart where each observation has a numeric count variable associated with it. I want to show the bars for each variable ordered by (descending) count, which is no problem. However I also want to highlight the variable name based on a third dichotomous variable. I found how to do the latter in another post on here, but I have been unable to combine the two. Here's an example of what I mean:
library(ggplot2)
testdata<-data.frame("var"=c('V1','V2','V3','V4'),"cat"=c('Y','N','Y','N'),
"count"=c(1,5,2,10))
ggplot(testdata, aes(var,count))+
geom_bar(stat='identity',colour='blue',fill='blue',width=0.3)+
coord_flip(ylim=c(0,10))+
theme(axis.text.y=
element_text(colour=ifelse(testdata$cat=="N","darkgreen","darkred"),
size=15))
That's the horizontal bar chart with highlighting, which works fine - V1/V3 are red and V2/V4 are green.
However when I try to sort it doesn't keep the groups:
ggplot(testdata, aes(reorder(var,count),count))+
geom_bar(stat='identity',colour='blue',fill='blue',width=0.3)+
coord_flip(ylim=c(0,10))+theme_classic()+
theme(axis.ticks.y=element_blank())+
theme(axis.text.y=
element_text(colour=ifelse(testdata$cat=="N","darkgreen","darkred"),
size=15))
In this second graph, V2 and V3 are the wrong color.
I also tried sorting the data by count first, and then using the first ggplot statement, however it still plots the data by variable name instead of count (and even if it did work, I would have to resolve tied count values). Any ideas? What I really need is for the dataframe in the "ifelse" colour to match the dataframe in the aes statement. I tried using the data frame that was sorted by descending count in the colour statement, but that also did not work.
Thanks
edit: more code
testdata$var = with(testdata, reorder(var, count))
ggplot(testdata, aes(var,count))+
geom_bar(stat='identity',colour='blue',fill='blue',width=0.3)+
coord_flip(ylim=c(0,10))+theme_classic()+
theme(axis.ticks.y=element_blank())+
theme(axis.text.y=
element_text(colour=ifelse(testdata$cat=="N","darkgreen","darkred"),
size=15))
My comment was partially incorrect. The order of the levels is the only thing that matters for the order of the axis, but when we do ifelse(testdata$cat == "N", "darkgreen", "darkred") of course it goes in the order of the data! So we need the order of the levels and the order of the data to be the same:
testdata$var = with(testdata, reorder(var, count))
testdata = testdata[order(testdata$var), ]
ggplot(testdata, aes(var, count)) +
geom_bar(
stat = 'identity',
colour = 'blue',
fill = 'blue',
width = 0.3
) +
coord_flip(ylim = c(0, 10)) + theme_classic() +
theme(axis.ticks.y = element_blank()) +
theme(axis.text.y =
element_text(
colour = ifelse(testdata$cat == "N", "darkgreen", "darkred"),
size = 15
))
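An alternative sketch that avoids sorting the data frame: build a colour lookup named by variable, then index it by the factor levels, which is the order the axis is drawn in. The names col_by_var and axis_cols are made up for this example:

```r
testdata <- data.frame(var = c("V1", "V2", "V3", "V4"),
                       cat = c("Y", "N", "Y", "N"),
                       count = c(1, 5, 2, 10))
testdata$var <- with(testdata, reorder(var, count))

# One colour per variable, named by the variable it belongs to.
col_by_var <- setNames(ifelse(testdata$cat == "N", "darkgreen", "darkred"),
                       as.character(testdata$var))

# Look the colours up in level order (V1, V3, V2, V4 for this data).
axis_cols <- unname(col_by_var[levels(testdata$var)])
```

axis_cols can then be passed as element_text(colour = axis_cols, size = 15). Note that passing a vector to element_text is, as far as I know, not officially supported by ggplot2, so treat this as a workaround.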

How to change color of only one stack in ggplot stacked barplot?

I have a ggplot stacked barplot like below
I want to change the color combination of the stacks for Dec only, for example to c("red", "green"). My desired output is
I tried
ggplot() +
geom_bar(data = x1, aes(y = values, x = months, fill = variable), stat="identity") +
scale_fill_manual(values = c("orange", "blue")) +
geom_bar(data = x2, aes(y = values, x = months, fill = variable), stat="identity") +
scale_fill_manual(values = c("red", "green"))
It takes only the last scale_fill_manual values.
If it was a regular barplot, changing fill in geom_bar works. I can't figure out how to do this for stacked plot without creating extra values for legend.
In my code, x1 contains values for jan to nov and x2 contains values for dec. Both are subsets of whole data.
It's hard to give the code without the dataset, but what you want to try to do is create a new type column (for the full data set) that will distinguish between the Dec column and the other ones, so you'll have four types: chickens, eggs, chickens-December, eggs-December. fill based on this new column, and you'll get new colours for the December bars.
You can then use
scale_fill_manual(breaks=c("chickens","eggs"),
values=c("chickens"="green", "chickens-December"="orange", "eggs"="red", "eggs-December"="blue"))
to only include those values you want in the legend (pardon my random colour choice - use better ones for your graph).
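Since the dataset was not posted, here is a minimal sketch of the idea on made-up data. The data frame dat, its numbers and the column name type are all hypothetical:

```r
library(ggplot2)

# Made-up data: two variables for each of the 12 months.
dat <- expand.grid(months = factor(month.abb, levels = month.abb),
                   variable = c("chickens", "eggs"),
                   stringsAsFactors = FALSE)
dat$values <- rep(c(10, 5), each = 12)

# New fill column: December rows get their own category.
dat$type <- ifelse(dat$months == "Dec",
                   paste0(dat$variable, "-December"),
                   dat$variable)

p <- ggplot(dat, aes(x = months, y = values, fill = type)) +
  geom_col() +
  scale_fill_manual(
    breaks = c("chickens", "eggs"),   # only these two show in the legend
    values = c("chickens" = "orange", "eggs" = "blue",
               "chickens-December" = "red", "eggs-December" = "green")
  )
```

Naming the entries of values ties each colour to its category explicitly, so the result does not depend on how the categories happen to be ordered.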

Dark to light colours based on value ggplot2

I am trying to customize the colours using ggplot2. The function I wrote is as follows:
library(tidyverse)
spaghetti_plot_multiple <- function(input, MV, item_level){
MV <- enquo(MV)
titles <- enquo(item_level)
input %>%
filter(!!(MV) == item_level) %>%
mutate(first_answer = first_answer) %>%
ggplot(.,aes( x = time, y = jitter(Answer), group = ID)) +
geom_line(aes(colour = first_answer)) +
labs(title = titles ,x = 'Time', y = 'Answer', colour = 'Answer given at time 0') +
facet_wrap(~ ID, scales = "free_x")+
theme(strip.text = element_text(size = 8)) +
scale_color_manual(values = c('red', 'blue', 'brown', 'purple', 'black'))
}
This however doesn't work, but I can't seem to figure out why scale_color_manual(..) values doesn't work. The current plot I am using is:
This is somewhat in line with what I am trying to achieve: a dark color for values 1-3 (i.e. based on first_answer which ranges from 1 to 5) and lighter ones for 4 and 5. The reason is simply because there are many more lines with a value of 4 or 5 and I want to be able to see the direction of lines across time.
EDIT The image is the plot I currently have. Although it somewhat resembles what I'd like to get, I'd much rather set the colors myself or use some function that chooses colors to enhance the plotting visibility (the lines in the plot) automatically.
You can specify colour gradients with scale_x_gradient, scale_x_gradient2 or scale_x_gradientn (where x can be fill or color).
A caveat when specifying the colour positions with values = c(...): values places each colour at a position within c(0, 1). You therefore need to rescale the values from your vector that you want to use as breaks to the range c(0, 1).
Re your question about which palette is best for 5 distinct lines: I think it is best to specify the colours manually, as you have done. I often use hex codes instead. I personally look those up at HTML Color Codes.
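As a rough illustration of the rescaling caveat, the sketch below maps the five possible values of first_answer to five colours with scale_colour_gradientn. The data frame df and the colour choices are made up, and it assumes first_answer is numeric:

```r
library(ggplot2)

# Made-up data: five IDs, each with a constant first_answer from 1 to 5.
df <- data.frame(time = rep(0:3, times = 5),
                 ID = rep(1:5, each = 4),
                 first_answer = rep(1:5, each = 4))
df$Answer <- df$first_answer

# Dark colours for 1 to 3, lighter ones for 4 and 5 (arbitrary choices).
cols <- c("black", "darkblue", "darkred", "orange", "yellow")

# Rescale the breaks 1..5 to the 0..1 range that `values` expects.
pos <- (1:5 - 1) / (5 - 1)

p <- ggplot(df, aes(time, Answer, group = ID, colour = first_answer)) +
  geom_line() +
  scale_colour_gradientn(colours = cols, values = pos, limits = c(1, 5))
```

If first_answer should be treated as five distinct categories instead, converting it with factor() and using scale_color_manual is the discrete counterpart of this approach.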

Resources