How to suppress warnings when plotting with ggplot - r

When passing missing values to ggplot, it's very kind and warns us that they are present. This is acceptable in an interactive session, but when writing reports you do not want the output cluttered with warnings, especially if there are many of them. The example below has one label missing, which produces a warning.
library(ggplot2)
library(reshape2)
mydf <- data.frame(
species = sample(c("A", "B"), 100, replace = TRUE),
lvl = factor(sample(1:3, 100, replace = TRUE))
)
labs <- melt(with(mydf, table(species, lvl)))
names(labs) <- c("species", "lvl", "value")
labs[3, "value"] <- NA
ggplot(mydf, aes(x = species)) +
stat_bin() +
geom_text(data = labs, aes(x = species, y = value, label = value, vjust = -0.5)) +
facet_wrap(~ lvl)
If we wrap suppressWarnings around the last expression, we still get a summary of how many warnings there were. For the sake of argument, let's say that this isn't acceptable (but is indeed very honest and correct). How can I (completely) suppress warnings when printing a ggplot2 object?

You need to wrap suppressWarnings() around the print() call, not around the creation of the ggplot() object:
R> suppressWarnings(print(
+ ggplot(mydf, aes(x = species)) +
+ stat_bin() +
+ geom_text(data = labs, aes(x = species, y = value,
+ label = value, vjust = -0.5)) +
+ facet_wrap(~ lvl)))
R>
It might be easier to assign the final plot to an object and then print().
plt <- ggplot(mydf, aes(x = species)) +
stat_bin() +
geom_text(data = labs, aes(x = species, y = value,
label = value, vjust = -0.5)) +
facet_wrap(~ lvl)
R> suppressWarnings(print(plt))
R>
The reason for the behaviour is that the warnings are only generated when the plot is actually drawn, not when the object representing the plot is created. R will auto print during interactive usage, so whilst
R> suppressWarnings(plt)
Warning message:
Removed 1 rows containing missing values (geom_text).
doesn't work because, in effect, you are calling print(suppressWarnings(plt)), whereas
R> suppressWarnings(print(plt))
R>
does work because suppressWarnings() can capture the warnings arising from the print() call.
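A minimal sketch of the point above, assuming ggplot2 is installed and using a made-up three-row data frame. The helper collect_warnings() and the caught_* names are illustrative and not part of the original answer. It records warnings separately for building the plot object and for printing it, so the two stages can be compared.

```r
library(ggplot2)

# Made-up data with one missing y value.
df <- data.frame(x = 1:3, y = c(1, NA, 3))

# Illustrative helper: collect (and muffle) warnings raised while
# evaluating `expr`, returning their messages as a character vector.
collect_warnings <- function(expr) {
  msgs <- character(0)
  withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  msgs
}

# Stage 1: only build the plot object.
plt <- NULL
caught_build <- collect_warnings(plt <- ggplot(df, aes(x, y)) + geom_point())

# Stage 2: actually draw it on a throwaway graphics device.
pdf(NULL)
caught_print <- collect_warnings(print(plt))
invisible(dev.off())

length(caught_build)  # warnings raised while building the object
caught_print          # warnings raised while drawing
```

Under these assumptions the missing-value warning should only show up in caught_print, which is why suppressWarnings() has to surround print().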

A more targeted, plot-by-plot approach is to add na.rm = TRUE to the geom calls that trigger the warning, e.g.:
ggplot(mydf, aes(x = species)) +
stat_bin() +
geom_text(data = labs, aes(x = species, y = value,
label = value, vjust = -0.5), na.rm=TRUE) +
facet_wrap(~ lvl)

In your question, you mention report writing, so it might be simpler to set the global warning level, which silences all warnings and not just those from ggplot2:
options(warn=-1)
The default is:
options(warn=0)
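Because options(warn = -1) stays in effect until it is changed back, one way to limit its reach is to restore the previous value as soon as the noisy code has run. The sketch below uses a hypothetical helper, with_warn_off(), and a base R call that normally warns as a stand-in for print(plt).

```r
# Hypothetical helper: evaluate `expr` with warnings switched off, then
# restore the caller's previous `warn` setting, even if `expr` fails.
with_warn_off <- function(expr) {
  old <- options(warn = -1)           # options() returns the old values
  on.exit(options(old), add = TRUE)   # restore them on exit
  expr                                # `expr` is evaluated lazily, i.e. here
}

before <- getOption("warn")
value  <- with_warn_off(as.numeric("not a number"))  # would normally warn
after  <- getOption("warn")

identical(before, after)  # the global setting is unchanged afterwards
```

With a plot, the call would be with_warn_off(print(plt)).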

Related

ggplot how to edit axis labels in R

I have a data frame to plot where the y-axis variable is a character column. I want the axis labels to show only the last part of each string, using '_' as the separator.
Here is an example with the iris dataset. As you can see, all y-axis labels come out the same. How can I fix this? Thanks.
iris$trial = paste('hello', 'good_bye', iris$Sepal.Length, sep = '_')
myfun = function(x) {
tail(unlist(strsplit(x, '_')), n = 1)
}
ggplot(iris, aes(x = Species, y = trial, color = Species)) +
geom_point() +
scale_y_discrete(labels = function(x) myfun(x)) +
theme_bw()
It seems to me that your function returns a single value for the whole vector of labels rather than one value per label, and that value is then replicated. Using lapply returns a value for every element. However, I don't know if it makes sense in this example without making it numeric (and sorting it), so you might want to add that as well.
ggplot(iris, aes(x = Species, y = trial, color = Species)) +
geom_point() +
scale_y_discrete(labels = lapply(iris$trial, myfun)) +
theme_bw()
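To see why every label comes out the same, it may help to call the helper on a small character vector. The sketch below repeats myfun from the question and adds a vectorised variant; the name last_piece and the two sample strings are made up for illustration.

```r
# The helper from the question: unlist() flattens the pieces of *all*
# strings into one vector, so tail(n = 1) yields a single value in total.
myfun <- function(x) tail(unlist(strsplit(x, "_")), n = 1)

# Vectorised variant (illustrative name): one result per input string.
last_piece <- function(x) {
  pieces <- strsplit(x, "_", fixed = TRUE)
  vapply(pieces, function(p) p[length(p)], character(1))
}

labels_in <- c("hello_good_bye_5.1", "hello_good_bye_4.9")

length(myfun(labels_in))  # 1: a single value for the whole vector
last_piece(labels_in)     # "5.1" "4.9": one value per label
```

A function like last_piece could then be passed directly as scale_y_discrete(labels = last_piece).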
You can make use of regex instead to extract the required value.
library(ggplot2)
#This removes everything until the last underscore
myfun = function(x) sub('.*_', '', x)
ggplot(iris, aes(x = Species, y = trial, color = Species)) +
geom_point() +
scale_y_discrete(labels = myfun) +
theme_bw()
If you want to extract numbers from y-axis value, you can also use scale_y_discrete(labels = readr::parse_number).

geom_text / geom_label with the bquote function

My data :
dat <- data_frame(x = c(1,2,3,4,5,6), y = c(2,2,2,6,2,2))
I wish to display this expression beside the point (x=4,y=6) :
expression <- bquote(paste(frac(a[z], b[z]), " = ", .(dat[which.max(dat$y),"y"] %>% as.numeric())))
But, when I am using this expression with ggplot :
ggplot() +
geom_point(data = dat, aes(x = x, y = y)) +
geom_label(data = dat[which.max(dat$y),], aes(x = x, y = y, label = expression))
I get this error message :
Error: Aesthetics must be either length 1 or the same as the data (1): label
You could use the following code (keeping your definitions of the data and the expression).
Not related to your question, but it is generally better to define aesthetics in the ggplot() call so that they are reused by the subsequent layers. If needed, you can override them in a layer, as done below in geom_label():
ggplot(data = dat, aes(x = x, y = y)) +
geom_point() +
geom_label(data = dat[4,], label = deparse(expression), parse = TRUE,
hjust = 0, nudge_x = .1)
hjust and nudge_x are used to position the label relative to the point. One could also use nudge_y to keep the whole label inside the plot area.
Please let me know whether this is what you want.
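The reason deparse() plus parse = TRUE works can be checked without any plotting: deparse() turns the bquote() call into a plain string, which is what the label argument expects, and parsing that string gives a call back. A small sketch, where y_max stands in for the value looked up from dat and the other names are illustrative:

```r
y_max <- 6  # stand-in for dat[which.max(dat$y), "y"]

# Build the plotmath call, substituting the value of y_max via .()
lab_expr <- bquote(paste(frac(a[z], b[z]), " = ", .(y_max)))

# deparse() can split long expressions over several strings, so collapse
# them into one to keep the label a length-1 character vector.
lab_text <- paste(deparse(lab_expr), collapse = "")

# Parsing the string back yields a call again, which is roughly what
# parse = TRUE asks ggplot2 to do with the label text.
reparsed <- parse(text = lab_text)[[1]]

lab_text
class(reparsed)
```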

Passing argument to facet grid in function -ggplot

I am trying to write a function to plot graphs in a grid. I am using ggplot and facet grid. I am unable to pass the argument for facet grid. I wonder if anybody can point me in the right direction.
The data example:
Year = as.factor(rep(c("01", "02"), each = 4, times = 1))
Group = as.factor(rep(c("G1", "G2"), each = 2, times = 2))
Gender = as.factor(rep(c("Male", "Female"), times = 4))
Percentage = as.integer(c("80","20","50","50","45","55","15","85"))
df1 = data.frame (Year, Group, Gender, Percentage)
The code for the grid plot without function is:
p = ggplot(data=df1, aes(x=Year, y=Percentage, fill = Gender)) + geom_bar(stat = "identity")
p = p + facet_grid(~ Group, scales = 'free')
p
This produces a plot like the ones I want to do. However, when I put it into a function:
MyGridPlot <- function (df, x_axis, y_axis, bar_fill, fgrid){
p = ggplot(data=df1, aes(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
And then run:
MyGridPlot(df1, df1$Year, df1$Percentage, df1$Gender, df1$Group)
It comes up with the error:
Error: At least one layer must contain all faceting variables: `fgrid`.
* Plot is missing `fgrid`
* Layer 1 is missing `fgrid`
I have tried using aes_string, which works for the x, y and fill but not for the grid.
MyGridPlot <- function (df, x_axis, y_axis, bar_fill, fgrid){
p = ggplot(data=df1, aes_string(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
and then run:
MyGridPlot(df1, Year, Percentage, Gender, Group)
This produces the same error. If I delete the facet_grid line, both versions of the function run fine, though without the grid :-(
Thanks a lot for helping this beginner.
Gustavo
Your problem is that in your function, ggplot is looking for variable names (x_axis, y_axis, etc.), but you're giving it objects (df1$Year, ...).
There are a couple of ways you could deal with this. Maybe the simplest would be to rewrite the function so that it expects objects. For example:
MyGridPlot <- function(x_axis, y_axis, bar_fill, fgrid){ # Note no df parameter here
df1 <- data.frame(x_axis = x_axis, y_axis = y_axis, bar_fill = bar_fill, fgrid = fgrid) # Create a data frame from inputs
p = ggplot(data=df1, aes(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
MyGridPlot(Year, Percentage, Gender, Group)
Alternatively, you could set up the function with a data frame and variable names. There isn't really much reason to do this if you're working with individual objects the way you are here, but if you're working with a data frame, it might make your life easier:
library(dplyr)   # for %>%, mutate() and enquo()
library(ggplot2)
MyGridPlot <- function(df, x_var, y_var, fill_var, grid_var){
  # Need to "tell" R to treat parameters as variable names:
  # enquo() captures each argument and !! unquotes it inside mutate().
  df <- df %>% mutate(x_var = !!enquo(x_var), y_var = !!enquo(y_var), fill_var = !!enquo(fill_var), grid_var = !!enquo(grid_var))
  p = ggplot(data = df, aes(x = x_var, y = y_var, fill = fill_var)) + geom_bar(stat = "identity")
  p = p + facet_grid(~grid_var, scales = 'free')
  return(p)
}
MyGridPlot(df1, Year, Percentage, Gender, Group)
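A further option is to pass the column names straight through to aes() and to the faceting call with the embrace operator instead of copying the columns first. This is a sketch that assumes a ggplot2 release with vars() (3.0.0 or later) and an rlang release that supports {{ }}. The function name my_grid_plot is made up, and geom_col() is used as shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Same example data as in the question.
df1 <- data.frame(
  Year       = factor(rep(c("01", "02"), each = 4)),
  Group      = factor(rep(c("G1", "G2"), each = 2, times = 2)),
  Gender     = factor(rep(c("Male", "Female"), times = 4)),
  Percentage = c(80L, 20L, 50L, 50L, 45L, 55L, 15L, 85L)
)

# Illustrative function: {{ }} forwards the unquoted column names to
# aes() and to vars(), so no intermediate columns are needed.
my_grid_plot <- function(df, x_var, y_var, fill_var, grid_var) {
  ggplot(df, aes(x = {{ x_var }}, y = {{ y_var }}, fill = {{ fill_var }})) +
    geom_col() +
    facet_grid(cols = vars({{ grid_var }}), scales = "free")
}

p <- my_grid_plot(df1, Year, Percentage, Gender, Group)
```

Printing p should give one panel per level of Group, as in the version without a function.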

ggplot: remove NA factor level in legend

How can I omit the NA level of a factor from a legend?
From the nycflights13 database, I created a new continuous variable called tot_delay, and then created a factor called delay_class with 4 levels. When I plot, I filter out NA values, but they still appear in the legend. Here's my code:
library(nycflights13); library(ggplot2); library(dplyr)
flights$tot_delay = flights$dep_delay + flights$arr_delay
flights$delay_class <- cut(flights$tot_delay,
c(min(flights$tot_delay, na.rm = TRUE), 0, 20 , 120,
max(flights$tot_delay, na.rm = TRUE)),
labels = c("none", "short","medium","long"))
filter(flights, !is.na(tot_delay)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")
The parent example isn't a good illustration of the problem (of course unexpected NA values should be tracked down and eliminated), but this is the top result on Google, so it should be noted that there is now an option in the discrete scale functions (scale_XXX_XXX) to prevent NA levels from displaying in the legend: set na.translate = F. For example:
# default
ggplot(data = data.frame(x = c(1,2,NA), y = c(1,1,NA), a = c("A","B",NA)),
aes(x, y, colour = a)) + geom_point(size = 4)
# with na.translate = F
ggplot(data = data.frame(x = c(1,2,NA), y = c(1,1,NA), a = c("A","B",NA)),
aes(x, y, colour = a)) + geom_point(size = 4) +
scale_colour_discrete(na.translate = F)
This works in ggplot2 3.1.0.
You have one data point where delay_class is NA, but tot_delay isn't. This point is not being caught by your filter. Changing your code to:
filter(flights, !is.na(delay_class)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")
does the trick.
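A likely cause of that stray NA, assuming it is the row holding the minimum of tot_delay: by default cut() uses intervals that are open on the left, so a value equal to the lowest break falls outside every interval. The sketch below shows this on a made-up vector (not the flights data) and how include.lowest = TRUE changes it.

```r
# Made-up delays; the breaks start exactly at the minimum, as in the question.
x      <- c(-5, 0, 10, 50, 200)
breaks <- c(min(x), 0, 20, 120, max(x))
labs   <- c("none", "short", "medium", "long")

# Default: intervals are (a, b], so the minimum itself gets no class.
without_lowest <- cut(x, breaks, labels = labs)

# include.lowest = TRUE closes the first interval on the left as well.
with_lowest <- cut(x, breaks, labels = labs, include.lowest = TRUE)

is.na(without_lowest)  # only the first element (the minimum) is NA
anyNA(with_lowest)     # FALSE
```

If this is what happens in the flights data, adding include.lowest = TRUE to the original cut() call would avoid creating the NA in the first place.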
Alternatively, if you absolutely must have that extra point, you can override the fill legend as follows:
filter(flights, !is.na(tot_delay)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill") +
scale_fill_manual( breaks = c("none","short","medium","long"),
values = scales::hue_pal()(4) )
UPDATE: As pointed out in @gatsky's answer, all discrete scales also include the na.translate argument. The feature has actually existed since ggplot2 2.2.0; I just wasn't aware of it at the time I posted my answer. For completeness, its usage in the original question would look like
filter(flights, !is.na(tot_delay)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill") +
scale_fill_discrete(na.translate=FALSE)
I like @Artem's method above, i.e., getting to the bottom of why there are NAs in your df. However, sometimes you know there are NAs, and you just want to exclude them. In that case, simply using na.omit should work (note that it drops every row with an NA in any column, not just in delay_class):
na.omit(flights) %>% ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")
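The difference between na.omit() and filtering on a single column is easy to see on a toy data frame. The data below are made up, with an unrelated column (tailnum) that has its own NA.

```r
# Made-up data: NA in delay_class (row 2) and in an unrelated column (row 1).
toy <- data.frame(
  carrier     = c("AA", "UA", "DL"),
  delay_class = factor(c("none", NA, "long")),
  tailnum     = c(NA, "N1", "N2")
)

# na.omit() removes every row that has an NA in *any* column.
all_complete <- na.omit(toy)

# Filtering on the one column of interest keeps row 1.
class_known <- toy[!is.na(toy$delay_class), ]

nrow(all_complete)  # 1
nrow(class_known)   # 2
```

On a wide table such as flights, na.omit() can therefore discard many more rows than intended.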

ggplot2 warning about missings that I can't disable with `na.rm=T`

When you plot something using ggplot2, it warns you if it auto-removes missings.
I would love to be able to disable that specific warning or to set the default of na.rm to true system-wide, but that's not possible AFAIK.
I know I can disable it by specifying na.rm=T for each geom that I use. But this fails when ggplot generates further geoms that I don't explicitly specify. In the example below I would get three warnings per plot using my original data (10 when I facet it, so you can see this gets annoying in a knitr report).
I can suppress two warnings with na.rm=T, but the third one about geom_segment I can't. Incidentally it also occurs with mtcars, so I used that as an example.
Warning message:
Removed 23 rows containing missing values (geom_segment).
ggplot(data=mtcars, aes(x = disp, y = wt)) +
geom_linerange(stat = "summary", fun.data = "median_hilow", colour = "#aec05d", na.rm=T) +
geom_pointrange(stat = "summary", fun.data = "mean_cl_boot", colour = "#6c92b2", na.rm=T)
Until I figure this out I can use warning=FALSE for the offending chunks, but I don't really like that since it might suppress warnings that I do care about. I could also use na.omit on the dataset, but that means first working out which variables the plot uses, which is a lot of work and extra syntax.
I guess the only way to avoid this is not to use stat_summary, but calculate the summary statistics yourself. For your example that's no problem, but I'll admit that this is not a very satisfactory solution in general.
# load dplyr package used to calculate summary
require(dplyr)
# calculate summary statistics
df <- mtcars %>% group_by(disp) %>% do(mean_cl_boot(.$wt))
# use geom_point and geom_segment with na.rm=TRUE
ggplot(data=mtcars, aes(x = disp, y = wt)) +
geom_linerange(stat = "summary", fun.data = "median_hilow", colour = "#aec05d") +
geom_point(data = df, aes(x = disp, y = y), colour = "#6c92b2") +
geom_segment(data = df, aes(x = disp, xend = disp, y = ymin, yend = ymax), colour = "#6c92b2", na.rm=TRUE)
Alternatively, you can write your own version of mean_cl_boot. If ymin or ymax are NA just set them to the value of y.
# your summary function
my_mean_cl_boot <- function(x, ...){
res <- mean_cl_boot(x, ...)
res[is.na(res$ymin), "ymin"] <- res[is.na(res$ymin), "y"]
res[is.na(res$ymax), "ymax"] <- res[is.na(res$ymax), "y"]
na.omit(res)
}
# plotting command
ggplot(data=mtcars, aes(x = disp, y = wt)) +
geom_linerange(stat = "summary", fun.data = "median_hilow", colour = "#aec05d", na.rm=T) +
geom_pointrange(stat = "summary", fun.data = "my_mean_cl_boot", colour = "#6c92b2", na.rm=T)
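If the concern is that a chunk-wide warning=FALSE hides too much, another option is to muffle only the warnings whose message matches a pattern and let everything else through. This is a sketch using base R condition handling. The helper name muffle_matching and the pattern are assumptions, and noisy() is a stand-in for print(plt) so that the example runs without ggplot2.

```r
# Hypothetical helper: muffle only warnings whose message matches `pattern`;
# any other warning is passed on untouched.
muffle_matching <- function(expr,
                            pattern = "^Removed [0-9]+ rows? containing") {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w))) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Stand-in for print(plt): one ggplot2-style warning plus an unrelated one.
noisy <- function() {
  warning("Removed 23 rows containing missing values (geom_segment).")
  warning("something else went wrong")
  invisible("done")
}

# Only the unrelated warning should surface here.
result <- muffle_matching(noisy())
```

In a report, the call would be muffle_matching(print(plt)); the pattern may need adjusting, since the exact wording of the message differs between ggplot2 versions.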
