Prevent reordering within facet_wrap() - r

I have an issue with my ggplot() reordering the data. I have an example code below. I have data, and reordered the factors in feed to my content, but after the str_extract() in facet_wrap(), the data gets reordered back before I reordered it. Is there a way to prevent that from occurring? For my actual code, it is important for me to use regex within the facet_wrap() in ggplot,
data <- chickwts
data <- mutate(data, time = 1:nrow(data))
lvl <- c("linseed", "meatmeal", "sunflower", "soybean",
"casein", "horsebean")
data$feed <- factor(data$feed, levels = lvl)
ggplot(data, aes(x = time, y = weight, color = feed)) +
geom_line(size = 1) + geom_point(size = 1.75) +
facet_wrap(~str_extract(feed,"[a-z]+"))

You could put the factor inside the facet_wrap:
ggplot(data, aes(x = time, y = weight, color = feed)) +
geom_line(size = 1) + geom_point(size = 1.75) +
facet_wrap(~ factor(str_extract(feed,"[a-z]+"), levels = lvl))

Related

How can I change the size of a bar in a grouped bar chart when one group has no data? [duplicate]

Is there a way to set a constant width for geom_bar() in the event of missing data in the time series example below? I've tried setting width in aes() with no luck. Compare May '11 to June '11 width of bars in the plot below the code example.
colours <- c("#FF0000", "#33CC33", "#CCCCCC", "#FFA500", "#000000" )
iris$Month <- rep(seq(from=as.Date("2011-01-01"), to=as.Date("2011-10-01"), by="month"), 15)
colours <- c("#FF0000", "#33CC33", "#CCCCCC", "#FFA500", "#000000" )
iris$Month <- rep(seq(from=as.Date("2011-01-01"), to=as.Date("2011-10-01"), by="month"), 15)
d<-aggregate(iris$Sepal.Length, by=list(iris$Month, iris$Species), sum)
d$quota<-seq(from=2000, to=60000, by=2000)
colnames(d) <- c("Month", "Species", "Sepal.Width", "Quota")
d$Sepal.Width<-d$Sepal.Width * 1000
g1 <- ggplot(data=d, aes(x=Month, y=Quota, color="Quota")) + geom_line(size=1)
g1 + geom_bar(data=d[c(-1:-5),], aes(x=Month, y=Sepal.Width, width=10, group=Species, fill=Species), stat="identity", position="dodge") + scale_fill_manual(values=colours)
Some new options for position_dodge() and the new position_dodge2(), introduced in ggplot2 3.0.0 can help.
You can use preserve = "single" in position_dodge() to base the widths off a single element, so the widths of all bars will be the same.
ggplot(data = d, aes(x = Month, y = Quota, color = "Quota")) +
geom_line(size = 1) +
geom_col(data = d[c(-1:-5),], aes(y = Sepal.Width, fill = Species),
position = position_dodge(preserve = "single") ) +
scale_fill_manual(values = colours)
Using position_dodge2() changes the way things are centered, centering each set of bars at each x axis location. It has some padding built in, so use padding = 0 to remove.
ggplot(data = d, aes(x = Month, y = Quota, color = "Quota")) +
geom_line(size = 1) +
geom_col(data = d[c(-1:-5),], aes(y = Sepal.Width, fill = Species),
position = position_dodge2(preserve = "single", padding = 0) ) +
scale_fill_manual(values = colours)
The easiest way is to supplement your data set so that every combination is present, even if it has NA as its value. Taking a simpler example (as yours has a lot of unneeded features):
dat <- data.frame(a=rep(LETTERS[1:3],3),
b=rep(letters[1:3],each=3),
v=1:9)[-2,]
ggplot(dat, aes(x=a, y=v, colour=b)) +
geom_bar(aes(fill=b), stat="identity", position="dodge")
This shows the behavior you are trying to avoid: in group "B", there is no group "a", so the bars are wider. Supplement dat with a dataframe with all the combinations of a and b:
dat.all <- rbind(dat, cbind(expand.grid(a=levels(dat$a), b=levels(dat$b)), v=NA))
ggplot(dat.all, aes(x=a, y=v, colour=b)) +
geom_bar(aes(fill=b), stat="identity", position="dodge")
I had the same problem but was looking for a solution that works with the pipe (%>%). Using tidyr::spread and tidyr::gather from the tidyverse does the trick. I use the same data as #Brian Diggs, but with uppercase variable names to not end up with double variable names when transforming to wide:
library(tidyverse)
dat <- data.frame(A = rep(LETTERS[1:3], 3),
B = rep(letters[1:3], each = 3),
V = 1:9)[-2, ]
dat %>%
spread(key = B, value = V, fill = NA) %>% # turn data to wide, using fill = NA to generate missing values
gather(key = B, value = V, -A) %>% # go back to long, with the missings
ggplot(aes(x = A, y = V, fill = B)) +
geom_col(position = position_dodge())
Edit:
There actually is a even simpler solution to that problem in combination with the pipe. Use tidyr::complete gives the same result in one line:
dat %>%
complete(A, B) %>%
ggplot(aes(x = A, y = V, fill = B)) +
geom_col(position = position_dodge())

How do I add data labels to a ggplot histogram with a log(x) axis?

I am wondering how to add data labels to a ggplot showing the true value of the data points when the x-axis is in log scale.
I have this data:
date <- c("4/3/2021", "4/7/2021","4/10/2021","4/12/2021","4/13/2021","4/13/2021")
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)
address <- c("A","B","C","D","E","F")
df <- data.frame(date, amount, address)
And I plot it in ggplot2:
plot <- ggplot(df, aes(x = log(amount))) +
geom_histogram(binwidth = 1)
plot + theme_minimal() + geom_text(label = amount)
... but I get the error
"Error: geom_text requires the following missing aesthetics: y"
I have 2 questions as a result:
Why am I getting this error with geom_histogram? Shouldn't it assume to use count as the y value?
Will this successfully show the true values of the data points from the 'amount' column despite the plot's log scale x-axis?
Perhaps like this?
ggplot(df, aes(x = log(amount), y = ..count.., label = ..count..)) +
geom_histogram(binwidth = 1) +
stat_bin(geom = "text", binwidth = 1, vjust = -0.5) +
theme_minimal()
ggplot2 layers do not (at least in any situations I can think of) take the summary calculations of other layers, so I think the simplest thing would be to replicate the calculation using stat_bin(geom = "text"...
Or perhaps simpler, you could pre-calculate the numbers:
library(dplyr)
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 1) +
geom_text(vjust = -0.5)
EDIT -- to show buckets without the log transform we could use:
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 0.5) +
geom_text(vjust = -0.5) +
scale_x_continuous(labels = ~scales::comma(10^.),
minor_breaks = NULL)

Reordering within 3-factor grouped plot

Using ggplot2, I'm attempting to reorder a data representation with 3 factors: condition, sex, and time.
library(ggplot2)
library(dplyr)
DF <- data.frame(value = rnorm(100, 20, sd = 0.1),
cond = c(rep("a",25),rep("b",25),rep("a",25),rep("b",25)),
sex = c(rep("M",50),rep("F",50)),
time = rep(c("1","2"),50)
)
ggplot(data=DF, aes( x = time,
y = value,
fill = cond,
colour = sex,
)
) +
geom_boxplot(size = 1, outlier.shape = NA) +
scale_fill_manual(values=c("#69b3a2", "#404080")) +
scale_color_manual(values=c("grey10", "grey40")) +
ggtitle("aF,aM,bF,bM") +
theme(legend.position = "top")
Badly ordered plot.
The way ggplot2 automatically orders condition first and interleaves sex poses the issue. It defaults to an interleaved "aF,aM,bF,bM" order regardless of which factor I assign to which aesthetic.
For analysis purposes, my preferred order is "aM,bM,aF,bF". Order sex first and interleave condition. I tried to fix it by converting the 2x2 factor assignments to one group with 4 levels, which gives me complete control over the order:
DF %>% mutate(grp = as.factor(paste0(cond,sex))) -> DF
level_order <- c("aM", "bM", "aF", "bF")
ggplot(data=DF, aes( x = time,
y = value,
fill = factor(grp, level=level_order),
colour = sex
)
) +
geom_boxplot(size = 1, outlier.shape = NA) +
scale_fill_manual(values=c("#69b3a2", "#404080","#69b3a2", "#404080")) +
scale_color_manual(values=c("grey10", "grey40", "grey40", "grey10")) +
ggtitle("aM,bM,aF,bF") +
theme(legend.position = "top")
Ordering OK, bad representation.
However artificial grouping like this has its downsides, subjects are not assigned to a group, they are male/female (can't be changed) and assigned to some condition. Also the plot legend is unnecessarily cluttered, it has 6 keys instead of 4. It doesn't convey that it's 2x2 repeated measures design all that well.
I'm not sure if what I'm trying to do makes sense (I hope this isn't some massive brain fart), any help would be appreciated.
The order in which you place the aesthetics controls the priority of its groupings. Thus if you switch the position of fill and colour you will get the result you are looking for (e.i. you want colour to be grouped first, and then fill)
ggplot(data=DF, aes( x = time,
y = value,
colour = sex,
fill = cond)) +
geom_boxplot(size = 1, outlier.shape = NA) +
scale_fill_manual(values=c("#69b3a2", "#404080")) +
scale_color_manual(values=c("grey10", "grey40")) +
theme(legend.position = "top")

How to highlight a column in ggplot2

I have the following graph and I want to highlight the columns (both) for watermelons as it has the highest juice_content and weight. I know how to change the color of the columns but I would like to WHOLE columns to be highlighted. Any idea on how to achieve this? There doesn't seems to be any similar online.
fruits <- c("apple","orange","watermelons")
juice_content <- c(10,1,1000)
weight <- c(5,2,2000)
df <- data.frame(fruits,juice_content,weight)
df <- gather(df,compare,measure,juice_content:weight, factor_key=TRUE)
plot <- ggplot(df, aes(fruits,measure, fill=compare)) + geom_bar(stat="identity", position=position_dodge()) + scale_y_log10()
An option is to use gghighlight
library(gghighlight)
ggplot(df, aes(fruits,measure, fill = compare)) +
geom_col(position = position_dodge()) +
scale_y_log10() +
gghighlight(fruits == "watermelons")
In response to your comment, how about working with different alpha values
ggplot(df, aes(fruits,measure)) +
geom_col(data = . %>% filter(fruits == "watermelons"),
mapping = aes(fill = compare),
position = position_dodge()) +
geom_col(data = . %>% filter(fruits != "watermelons"),
mapping = aes(fill = compare),
alpha = 0.2,
position = position_dodge()) +
scale_y_log10()
Or you can achieve the same with one geom_col and a conditional alpha (thanks #Tjebo)
ggplot(df, aes(fruits, measure)) +
geom_col(
mapping = aes(fill = compare, alpha = fruits == 'watermelons'),
position = position_dodge()) +
scale_alpha_manual(values = c(0.2, 1)) +
scale_y_log10()
You could use geom_area to highlight behind the bars. You have to force the x scale to discrete first which is why I've used geom_blank (see this answer geom_ribbon overlay when x-axis is discrete) noting that geom_ribbon and geom_area are effectively the same except geom_area always has 0 as ymin
#minor edit so that the level isn't hard coded
watermelon_level <- which(levels(df$fruits) == "watermelons")
AreaDF <- data.frame(fruits = c(watermelon_level-0.5,watermelon_level+0.5))
plot <- ggplot(df, aes(fruits)) +
geom_blank(aes(y=measure, fill=compare))+
geom_area(data = AreaDF, aes( y = max(df$measure)), fill= "yellow")+
geom_bar(aes(y=measure, fill=compare),stat="identity", position=position_dodge()) + scale_y_log10()
Edit to address comment
If you want to highlight multiple fruits then you could do something like this. You need a data.frame with where you want the geom_area x and y, including dropping it to 0 between. I'm sure there's slightly tidier methods of getting the data.frame but this one works
highlight_level <- which(levels(df$fruits) %in% c("apple", "watermelons"))
AreaDF <- data.frame(fruits = unlist(lapply(highlight_level, function(x) c(x -0.51,x -0.5,x+0.5,x+0.51))),
yval = rep(c(1,max(df$measure),max(df$measure),1), length(highlight_level)))
AreaDF <- AreaDF %>% mutate(
yval = ifelse(floor(fruits) %in% highlight_level & ceiling(fruits) %in% highlight_level, max(df$measure), yval)) %>%
arrange(fruits) %>% distinct()
plot <- ggplot(df, aes(fruits)) +
geom_blank(aes(y=measure, fill=compare))+
geom_area(data = AreaDF, aes(y = yval ), fill= "yellow")+
geom_bar(aes(y=measure, fill=compare),stat="identity", position=position_dodge()) + scale_y_log10()
plot

Passing argument to facet grid in function -ggplot

I am trying to write a function to plot graphs in a grid. I am using ggplot and facet grid. I am unable to pass the argument for facet grid. I wonder if anybody can point me in the right direction.
The data example:
Year = as.factor(rep(c("01", "02"), each = 4, times = 1))
Group = as.factor(rep(c("G1", "G2"), each = 2, times = 2))
Gender = as.factor(rep(c("Male", "Female"), times = 4))
Percentage = as.integer(c("80","20","50","50","45","55","15","85"))
df1 = data.frame (Year, Group, Gender, Percentage)
The code for the grid plot without function is:
p = ggplot(data=df1, aes(x=Year, y=Percentage, fill = Gender)) + geom_bar(stat = "identity")
p = p + facet_grid(~ Group, scales = 'free')
p
This produces a plot like the ones I want to do. However, when I put it into a function:
MyGridPlot <- function (df, x_axis, y_axis, bar_fill, fgrid){
p = ggplot(data=df1, aes(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
And then run:
MyGridPlot(df1, df1Year, df1$Percentage, df1$Gender, df1$Group)
It comes up with the error:
Error: At least one layer must contain all faceting variables: `fgrid`.
* Plot is missing `fgrid`
* Layer 1 is missing `fgrid
I have tried using aes_string, which works for the x, y and fill but not for the grid.
MyGridPlot <- function (df, x_axis, y_axis, bar_fill, fgrid){
p = ggplot(data=df1, aes_string(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
and then run:
MyGridPlot(df1, Year, Percentage, Gender, Group)
This produces the same error. If I delete the facet grid, both function code runs well, though no grid :-(
Thanks a lot for helping this beginner.
Gustavo
Your problem is that in your function, ggplot is looking for variable names (x_axis, y_axis, etc), but you're giving it objects (df1$year...).
There are a couple ways you could deal with this. Maybe the simplest would be to rewrite the function so that it expects objects. For example:
MyGridPlot <- function(x_axis, y_axis, bar_fill, fgrid){ # Note no df parameter here
df1 <- data.frame(x_axis = x_axis, y_axis = y_axis, bar_fill = bar_fill, fgrid = fgrid) # Create a data frame from inputs
p = ggplot(data=df1, aes(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
MyGridPlot(Year, Percentage, Gender, Group)
Alternatively, you could set up the function with a data frame and variable names. There isn't really much reason to do this if you're working with individual objects the way you are here, but if you're working with a data frame, it might make your life easier:
MyGridPlot <- function(df, x_var, y_var, fill_var, grid_var){
# Need to "tell" R to treat parameters as variable names.
df <- df %>% mutate(x_var = UQ(enquo(x_var)), y_var = UQ(enquo(y_var)), fill_var = UQ(enquo(fill_var)), grid_var = UQ(enquo(grid_var)))
p = ggplot(data = df, aes(x = x_var, y = y_var, fill = fill_var)) + geom_bar(stat = "identity")
p = p + facet_grid(~grid_var, scales = 'free')
return(p)
}
MyGridPlot(df1, Year, Percentage, Gender, Group)

Resources