Perform transformation inside ggplot2 function to produce negative values - r

With the code below I can produce a stacked bar chart, as shown in the first graph.
library(ggplot2)
vehicle<- sample(rep(c("Cars","Cycles","Motorbike"),times=c(20,50,30)))
team<-sample(rep(c("TeamA","TeamB"),times=c(50,50)))
df<-data.frame(team,vehicle, stringsAsFactors = FALSE)
ggplot(data = df, aes(x = as.factor (vehicle), fill =team)) +
geom_bar(mapping = aes(y = stat(count)/sum(..count..)*100),
position = "stack")
What I want is a transformation inside the geom_bar(mapping = aes(y = stat(count)/sum(..count..)*100),position = "stack") call so that, for TeamB, the count becomes negative. That would let me reproduce something like the second graph.
Can anyone help amend the code to get the desired result?
Note: the second graph was created with the code below, but I don't want to add two separate geom_bar layers, because each layer then divides by the total of its own subset and the percentages on the y axis are incorrect.
ggplot(data = df, aes(x = as.factor (vehicle), fill =team)) +
geom_bar(data = subset(df, team=="TeamA"),
mapping = aes(y = stat(count)/sum(..count..)*100),
position = "stack")+
geom_bar(data = subset(df, team=="TeamB"),
mapping = aes(y = - stat(count)/sum(..count..)*100),
position = "stack") +
labs(x = "", y="")

I think it's easier to prepare the data before you feed it into ggplot. I realize the numbers don't quite match up here but I'll let you deal with that.
library(tidyverse)
library(ggplot2)
vehicle<- sample(rep(c("Cars","Cycles","Motorbike"),times=c(20,50,30)))
team<-sample(rep(c("TeamA","TeamB"),times=c(50,50)))
df<-data.frame(team,vehicle, stringsAsFactors = FALSE) %>%
group_by(team, vehicle) %>%
summarize(count = n()) %>%
mutate(newcount = if_else(team == 'TeamA', count, -count))
ggplot(data = df, aes(x = as.factor(vehicle), y = newcount, fill =team)) +
geom_bar(position = "stack", stat ='identity')
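If percentages rather than raw counts are wanted on the y axis, the same idea of preparing the data first can be extended. The following is only a sketch: the names signed_pct and pct are made up for illustration, and the seed is borrowed from the other answer purely for reproducibility.

```r
library(dplyr)
library(ggplot2)

set.seed(105)  # only so the sketch is reproducible
vehicle <- sample(rep(c("Cars", "Cycles", "Motorbike"), times = c(20, 50, 30)))
team <- sample(rep(c("TeamA", "TeamB"), times = c(50, 50)))
df <- data.frame(team, vehicle, stringsAsFactors = FALSE)

# One row per team/vehicle combination, as a share of ALL rows,
# with the sign flipped for TeamB
signed_pct <- df %>%
  count(team, vehicle) %>%
  mutate(pct = n / sum(n) * 100,
         pct = if_else(team == "TeamB", -pct, pct))

# Build the plot; print(p) draws it
p <- ggplot(signed_pct, aes(x = vehicle, y = pct, fill = team)) +
  geom_col()
```

Because the denominator is the total number of rows, the absolute values of pct add up to 100 across both teams.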

I managed to do it by using an ifelse directly in the function which achieved what I was after.
set.seed (105)
vehicle<- sample(rep(c("Cars","Cycles","Motorbike"),times=c(20,50,30)))
team<-sample(rep(c("TeamA","TeamB"),times=c(50,50)))
df<-data.frame(team,vehicle, stringsAsFactors = FALSE)
ggplot(data = df, aes(x = as.factor (vehicle), fill =team,
y= ifelse(test = team == "TeamB",
yes = -1/nrow(df)*100, no = 1/nrow(df)*100)))+
geom_bar(stat="identity")
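One way to sanity-check this approach is to add up the per-row contributions by hand: with stat = "identity" every row adds +1/nrow(df)*100 or -1/nrow(df)*100 to its bar, so the bar heights should equal the signed percentages. The sketch below assumes the same data as above; the column name contribution and the object totals are illustrative only.

```r
set.seed(105)
vehicle <- sample(rep(c("Cars", "Cycles", "Motorbike"), times = c(20, 50, 30)))
team <- sample(rep(c("TeamA", "TeamB"), times = c(50, 50)))
df <- data.frame(team, vehicle, stringsAsFactors = FALSE)

# Each row contributes one signed "percentage point" share
df$contribution <- ifelse(df$team == "TeamB", -1, 1) / nrow(df) * 100

# Summing per team and vehicle gives the height of each stacked segment
totals <- aggregate(contribution ~ team + vehicle, data = df, FUN = sum)
```

The TeamA rows of totals are positive, the TeamB rows are negative, and their absolute values sum to 100.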

Related

How can I change the size of a bar in a grouped bar chart when one group has no data? [duplicate]

Is there a way to set a constant width for geom_bar() in the event of missing data in the time series example below? I've tried setting width in aes() with no luck. Compare May '11 to June '11 width of bars in the plot below the code example.
colours <- c("#FF0000", "#33CC33", "#CCCCCC", "#FFA500", "#000000" )
iris$Month <- rep(seq(from=as.Date("2011-01-01"), to=as.Date("2011-10-01"), by="month"), 15)
d<-aggregate(iris$Sepal.Length, by=list(iris$Month, iris$Species), sum)
d$quota<-seq(from=2000, to=60000, by=2000)
colnames(d) <- c("Month", "Species", "Sepal.Width", "Quota")
d$Sepal.Width<-d$Sepal.Width * 1000
g1 <- ggplot(data=d, aes(x=Month, y=Quota, color="Quota")) + geom_line(size=1)
g1 + geom_bar(data=d[c(-1:-5),], aes(x=Month, y=Sepal.Width, width=10, group=Species, fill=Species), stat="identity", position="dodge") + scale_fill_manual(values=colours)
Some new options for position_dodge() and the new position_dodge2(), both introduced in ggplot2 3.0.0, can help.
You can use preserve = "single" in position_dodge() to base the widths off a single element, so the widths of all bars will be the same.
ggplot(data = d, aes(x = Month, y = Quota, color = "Quota")) +
geom_line(size = 1) +
geom_col(data = d[c(-1:-5),], aes(y = Sepal.Width, fill = Species),
position = position_dodge(preserve = "single") ) +
scale_fill_manual(values = colours)
Using position_dodge2() changes the way things are centered, centering each set of bars at each x axis location. It has some padding built in, so use padding = 0 to remove.
ggplot(data = d, aes(x = Month, y = Quota, color = "Quota")) +
geom_line(size = 1) +
geom_col(data = d[c(-1:-5),], aes(y = Sepal.Width, fill = Species),
position = position_dodge2(preserve = "single", padding = 0) ) +
scale_fill_manual(values = colours)
The easiest way is to supplement your data set so that every combination is present, even if it has NA as its value. Taking a simpler example (as yours has a lot of unneeded features):
dat <- data.frame(a=rep(LETTERS[1:3],3),
b=rep(letters[1:3],each=3),
v=1:9)[-2,]
ggplot(dat, aes(x=a, y=v, colour=b)) +
geom_bar(aes(fill=b), stat="identity", position="dodge")
This shows the behavior you are trying to avoid: in group "B", there is no group "a", so the bars are wider. Supplement dat with a dataframe with all the combinations of a and b:
# unique() works whether a and b are character vectors or factors
dat.all <- rbind(dat, cbind(expand.grid(a=unique(dat$a), b=unique(dat$b)), v=NA))
ggplot(dat.all, aes(x=a, y=v, colour=b)) +
geom_bar(aes(fill=b), stat="identity", position="dodge")
I had the same problem but was looking for a solution that works with the pipe (%>%). Using tidyr::spread and tidyr::gather from the tidyverse does the trick. I use the same data as @Brian Diggs, but with uppercase variable names so that I don't end up with duplicate variable names when transforming to wide:
library(tidyverse)
dat <- data.frame(A = rep(LETTERS[1:3], 3),
B = rep(letters[1:3], each = 3),
V = 1:9)[-2, ]
dat %>%
spread(key = B, value = V, fill = NA) %>% # turn data to wide, using fill = NA to generate missing values
gather(key = B, value = V, -A) %>% # go back to long, with the missings
ggplot(aes(x = A, y = V, fill = B)) +
geom_col(position = position_dodge())
Edit:
There is actually an even simpler solution to this problem that works with the pipe. Using tidyr::complete gives the same result in one line:
dat %>%
complete(A, B) %>%
ggplot(aes(x = A, y = V, fill = B)) +
geom_col(position = position_dodge())
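To see what complete() does to the data before it reaches ggplot, it can help to look at the intermediate result. This is a small sketch using the same dat as above; the name dat_full is only for illustration.

```r
library(tidyr)

dat <- data.frame(A = rep(LETTERS[1:3], 3),
                  B = rep(letters[1:3], each = 3),
                  V = 1:9)[-2, ]

# complete() adds a row for every missing A/B combination, with V set to NA
dat_full <- complete(dat, A, B)

# The single row that was added: A == "B", B == "a"
added <- dat_full[is.na(dat_full$V), ]
```

The placeholder row keeps a slot for the missing bar, so the remaining bars in that group keep their normal width.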

Ggplot grouped column chart with two sets of x labels

I have the following dataset which produces a grouped bar plot:
library(ggplot2)
library(dplyr)
expand.grid(gender=c("M","F"),
education=c("HS","College","Advanced")) %>%
mutate(value = sample(1:20, 6, replace = TRUE)) %>% # one value per gender/education pair
ggplot(aes(x = education, y = value, fill = gender))+
geom_col(position = position_dodge())
But instead of having a legend I want the labels to be on the x axis like this:
Is this possible?
Thanks
As camille already mentioned in a comment, you can use facet_wrap:
expand.grid(gender=c("M","F"),
education=c("HS","College","Advanced")) %>%
mutate(value = sample(1:20, 6, replace = TRUE)) %>% # one value per gender/education pair
ggplot(aes(x = gender, y = value, fill = gender))+
geom_col(position = position_dodge()) +
facet_wrap(~education)
The resulting plot looks like this:
If you want to remove the legend, just add theme(legend.position="none")
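A possible variation, if the education labels should sit underneath the gender labels so that they read like a second row of x-axis labels, is to move the facet strips to the bottom. The sketch below assumes one value per gender/education pair; the seed and the object names d and p are arbitrary.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
d <- expand.grid(gender = c("M", "F"),
                 education = c("HS", "College", "Advanced"))
d$value <- sample(1:20, nrow(d), replace = TRUE)

# Strips at the bottom and outside the axis act as a second label row;
# the legend is dropped because the x axis already names the genders
p <- ggplot(d, aes(x = gender, y = value, fill = gender)) +
  geom_col() +
  facet_wrap(~education, strip.position = "bottom") +
  theme(strip.placement = "outside",
        legend.position = "none")
```

Calling print(p) draws the plot.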

Plot many variables

Given a data frame like this one:
data <- data.frame(year = c(2010,2011,2012,2010,2011,2012),
name = c("stock1","stock1","stock1","stock2","stock2","stock2"),
value = c(0,3,1,4,1,3))
I would like to create a plot and I use this:
library(ggplot2)
ggplot(data=data, xName="year", groupName="name", brewerPalette="Blues")
but I don't get a plot. Is something wrong with the call?
I think you need something like this:
library(ggplot2)
library(dplyr)
library(RColorBrewer)
data %>%
group_by(name) %>%
ggplot(aes(year,value,fill=name))+
geom_col()+
scale_fill_brewer(palette = "Blues")
If you want a grouped bar plot (as I guessed from your code), this code may be helpful:
ggplot(data = data, aes(x = as.factor(year), y = value, fill = name)) +
geom_bar(stat = "identity", position = position_dodge(0.8), width = 0.7) +
scale_fill_brewer(palette = "Blues")

ggplot2() bar chart fill argument

I've got a data frame with two categorical variables called verified and procedure.
I'd like to make a bar chart with procedure on the x-axis, and the corresponding percentages rather than counts on the y-axis. Furthermore, I'd like for verified to be the fill of the bars.
The problem is that when I tried using the fill argument, it didn't work. My current code produces bars that are all grey with a black outline (even without a fill argument, the black lines seem to mark the levels of verified). Instead, I'd like the levels to be in different colours.
Thanks!
starting point (df):
df <- data.frame(verified=c("small","large","small","small","large","small","small","large","small"),procedure=c(1,2,1,2,1,2,2,2,2))
current code:
library(dplyr)
library(ggplot2)
df %>%
count(procedure,verified) %>%
mutate(prop = round((n / sum(n))*100, 2)) %>%
group_by(procedure) %>%
ggplot(aes(x = procedure, y = prop)) +
geom_bar(stat = "identity",colour="black")
Just add fill = verified to your initial aes or within your geom_bar:
# common elements
g_df <- df %>%
count(procedure, verified) %>%
mutate(prop = round((n / sum(n)) * 100, 2)) %>%
group_by(procedure)
# fill added to initial aes
g1 <- ggplot(g_df, aes(x = procedure, y = prop, fill = verified)) +
geom_bar(stat = "identity", colour = "black")
# fill added to geom_bar
g2 <- ggplot(g_df, aes(x = procedure, y = prop)) +
geom_bar(aes(fill = verified), stat = "identity", colour = "black")
Both g1 and g2 produce the same plot below
As suggested by eipi10 in the comments to my answer, you could clean up the x-axis by making it a factor. A modification of their code is below.
df %>%
count(procedure, verified) %>%
mutate(prop = n / sum(n)) %>%
ggplot(aes(x = factor(procedure), y = prop, fill = verified)) +
geom_bar(stat = "identity", colour = "black") +
labs(x = "procedure", y = "percent")
to produce
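Note that n / sum(n) after count() divides by the total number of rows, so the bars show each combination's share of the whole data set. If the percentages are instead meant to add up to 100 within each procedure, one option is to group before computing them. This is a sketch under that assumption; the names within_procedure and p are illustrative.

```r
library(dplyr)
library(ggplot2)

df <- data.frame(
  verified = c("small", "large", "small", "small", "large",
               "small", "small", "large", "small"),
  procedure = c(1, 2, 1, 2, 1, 2, 2, 2, 2)
)

# Grouping by procedure first makes sum(n) the per-procedure total
within_procedure <- df %>%
  count(procedure, verified) %>%
  group_by(procedure) %>%
  mutate(prop = n / sum(n) * 100) %>%
  ungroup()

p <- ggplot(within_procedure,
            aes(x = factor(procedure), y = prop, fill = verified)) +
  geom_col(colour = "black") +
  labs(x = "procedure", y = "percent")
```

With this version every stacked bar reaches 100.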

ggplot: remove NA factor level in legend

How can I omit the NA level of a factor from a legend?
From the nycflights13 database, I created a new continuous variable called tot_delay, and then created a factor called delay_class with 4 levels. When I plot, I filter out NA values, but they still appear in the legend. Here's my code:
library(nycflights13); library(ggplot2); library(dplyr)
flights$tot_delay = flights$dep_delay + flights$arr_delay
flights$delay_class <- cut(flights$tot_delay,
c(min(flights$tot_delay, na.rm = TRUE), 0, 20 , 120,
max(flights$tot_delay, na.rm = TRUE)),
labels = c("none", "short","medium","long"))
filter(flights, !is.na(tot_delay)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")
The parent example isn't a good illustration of the problem (unexpected NA values should of course be tracked down and eliminated), but since this is the top result on Google, it is worth noting that there is now an option in scale_XXX_XXX to prevent NA levels from displaying in the legend: set na.translate = F. For example:
# default
ggplot(data = data.frame(x = c(1,2,NA), y = c(1,1,NA), a = c("A","B",NA)),
aes(x, y, colour = a)) + geom_point(size = 4)
# with na.translate = F
ggplot(data = data.frame(x = c(1,2,NA), y = c(1,1,NA), a = c("A","B",NA)),
aes(x, y, colour = a)) + geom_point(size = 4) +
scale_colour_discrete(na.translate = F)
This works in ggplot2 3.1.0.
You have one data point where delay_class is NA, but tot_delay isn't. This point is not being caught by your filter. Changing your code to:
filter(flights, !is.na(delay_class)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")
does the trick:
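A likely reason for that stray NA is how cut() treats its lowest break: by default the intervals are open on the left, so a value exactly equal to the smallest break (here the minimum of tot_delay) falls outside every interval. The toy vector below is made up for illustration and is not taken from the flights data.

```r
x <- c(-5, 0, 10, 50, 200)  # hypothetical delays, including the minimum
breaks <- c(min(x), 0, 20, 120, max(x))
labels <- c("none", "short", "medium", "long")

# Default: intervals are (a, b], so min(x) itself is not covered
default_cut <- cut(x, breaks, labels = labels)

# include.lowest = TRUE closes the first interval on the left
closed_cut <- cut(x, breaks, labels = labels, include.lowest = TRUE)
```

Here default_cut has NA in its first position, while closed_cut assigns that value to "none". Passing include.lowest = TRUE when creating delay_class would be another way to avoid the NA level.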
Alternatively, if you absolutely must have that extra point, you can override the fill legend as follows:
filter(flights, !is.na(tot_delay)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill") +
scale_fill_manual( breaks = c("none","short","medium","long"),
values = scales::hue_pal()(4) )
UPDATE: As pointed out in @gatsky's answer, all discrete scales also include the na.translate argument. The feature has actually existed since ggplot2 2.2.0; I just wasn't aware of it at the time I posted my answer. For completeness, its usage in the original question would look like
filter(flights, !is.na(tot_delay)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill") +
scale_fill_discrete(na.translate=FALSE)
I like @Artem's method above, i.e., getting to the bottom of why there are NAs in your df. However, sometimes you know there are NAs and you just want to exclude them. In that case, simply using na.omit should work:
na.omit(flights) %>% ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")

Resources