Making ggplot geom_boxplot - r

Boxplot in ggplot
df %>%
mutate(Bezettingsgraad = Bezetting_gem / Capaciteit *100 ) %>%
group_by(Stadion)
Code for the boxplot
df %>%
mutate(Bezettingsgraad = Bezetting_gem / Capaciteit *100 ) %>%
group_by(Provincie) %>%
ggplot(Provincie, aes(x=Provincie, y=Bezetting_gem, color=dose)) +
geom_boxplot()
The image highlights in yellow the rows that are being used.
Error

Before the mapping aesthetics, you have put the variable Provincie where the data argument should go. Besides, you are already piping your data into the ggplot() call via the %>% operator.
Try deleting Provincie.
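A minimal corrected sketch, assuming df has the columns Provincie, Bezetting_gem and Capaciteit. The small data frame below is hypothetical stand-in data, since the real one is not shown. The sketch also plots the computed Bezettingsgraad and drops color = dose, because no dose column appears in the question; swap y back to Bezetting_gem if that is what you intended. group_by() is left out because ggplot2 already groups a boxplot by its discrete x variable.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the question's df (the real data is not shown)
df <- data.frame(
  Provincie     = rep(c("Utrecht", "Limburg", "Groningen"), each = 4),
  Bezetting_gem = c(20000, 22000, 18000, 21000, 9000, 9500, 8700, 9900,
                    17000, 16500, 18200, 17800),
  Capaciteit    = rep(c(24000, 12000, 22000), each = 4)
)

p <- df %>%
  mutate(Bezettingsgraad = Bezetting_gem / Capaciteit * 100) %>%
  # the data arrives through the pipe, so ggplot() only needs the mapping
  ggplot(aes(x = Provincie, y = Bezettingsgraad, color = Provincie)) +
  geom_boxplot()

p
```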

Related

ggplot2: Can you access the .data argument in subsequent layers?

I have multiple graphs I'm generating from one data set. I perform many operations on the data (filtering rows, aggregating rows, calculations over columns, etc.) before passing the result on to ggplot(). I want to access the data I passed to ggplot() in subsequent ggplot layers and facets, so that I have more control over the resulting graph and can include some characteristics of the data in the plot itself, for example the number of observations.
Here is a reproducible example:
library(tidyverse)
cars <- mtcars
# Normal scatter plot
cars %>%
filter(
# Many complicated operations
) %>%
group_by(
# More complicated operations
across()
) %>%
summarise(
# Even more complicated operations
n = n()
) %>%
ggplot(aes(x = mpg, y = qsec)) +
geom_point() +
# Join the dots but only if mpg < 20
geom_line(data = .data %>% filter(mpg < 20)) +
# Include the total number of observations in the graph
labs(caption = paste("N. obs =", NROW(.data)))
One could of course create a separate data set before passing it to ggplot and then reference that data set throughout (as in the example below). However, this is much more cumbersome, as you need to save (and later remove) a data set for each graph and run two separate commands for just one graph.
I want to know if there is something that can be done that's more akin to the first example using .data (which obviously doesn't actually work).
library(tidyverse)
cars <- mtcars
tmp <- cars %>%
filter(
# Many complicated operations
) %>%
group_by(
# More complicated operations
across()
) %>%
summarise(
# Even more complicated operations
n = n()
)
tmp %>%
ggplot(aes(x = mpg, y = qsec)) +
geom_point() +
# Join the dots but only if mpg < 20
geom_line(data = tmp %>% filter(mpg < 20)) +
# Include the total number of observations in the graph
labs(caption = paste("N. obs =", NROW(tmp)))
Thanks for your help!
The help page for each geom_*() function helpfully documents a standard way to do this for the data argument:
A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data. A function can be created from a formula (e.g. ~ head(.x, 10)).
For labs(), on the other hand, you can use the magrittr . placeholder, but you have to a) pass . explicitly as the data argument of ggplot() and b) wrap the whole expression in curly braces so that the later . is recognised.
So for example:
library(tidyverse)
cars <- mtcars
# Normal scatter plot
cars %>%
filter() %>%
group_by(across()) %>%
summarise(n = n()) %>%
{
ggplot(., aes(x = mpg, y = qsec)) +
geom_point() +
geom_line(data = ~ filter(.x, mpg < 20)) +
labs(caption = paste("N. obs =", NROW(.)))
}
Or, if you don't like the purrr formula syntax, the newer anonymous-function shorthand introduced in R 4.1.0 works too:
geom_line(data = \(x) filter(x, mpg < 20)) +
Unfortunately, labs() does not seem to have an explicit way to access the data being passed invisibly through the ggplot stack, since by and large it can do its job without touching the main data. The approaches above are some ways around this.
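A self-contained sketch of the answer's approach using the \(x) shorthand (which assumes R >= 4.1.0) and base nrow(). The empty filter(), group_by() and summarise() steps are left out for brevity, so the caption simply counts the rows of mtcars.

```r
library(dplyr)
library(ggplot2)

p <- mtcars %>%
  {
    ggplot(., aes(x = mpg, y = qsec)) +
      geom_point() +
      # layer data given as a function: it receives the plot's data
      geom_line(data = \(x) filter(x, mpg < 20)) +
      # inside {}, `.` is the piped data frame, so labs() can use it
      labs(caption = paste("N. obs =", nrow(.)))
  }

p
```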

How to improve this graph with multiple lines in ggplot2?

Example dataframe: datafame.RData
I would like to create the chart below with automated interactions: for example, when the variable changes, the averages are automatically recalculated and updated in the graph. For example, country 'n' shows NA.
Here is an example of the expected chart as ggplot2 output.
What I managed to do in R was this:
mydata %>%
dplyr::filter(Region %in% 'World median') %>%
dplyr::select(year,value) %>%
ggplot() +
aes(year,value, group=1,color="World median")+
geom_line()+
geom_line(data=mydata %>%
dplyr::filter(Country %in% 'Canada') %>%
dplyr::select(year,value),
aes(year, value, group=1, color="Canada"))+
geom_line(data=mydata %>%
dplyr::filter(Country %in% 'Brazil') %>%
dplyr::select(year,value),
aes(year, value, group=1, color="Brazil"))
The result is shown below. If you have any suggestions on how to do this better using ggplot, I would appreciate it.
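One common ggplot2 idiom, sketched here rather than taken from the question, is to stack all series into a single long data frame with a column naming each series, and map color to that column once instead of adding one geom_line() per country. The mydata below is hypothetical stand-in data using the column names from the question (Region, Country, year, value); countries is a hypothetical vector you would edit to change which lines are drawn.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for mydata (the real datafame.RData is not shown)
mydata <- tibble(
  Region  = rep(c("World median", "Americas", "Americas"), each = 3),
  Country = rep(c(NA, "Canada", "Brazil"), each = 3),
  year    = rep(2018:2020, times = 3),
  value   = c(50, 52, 51, 70, 72, 73, 40, 41, 45)
)

countries <- c("Canada", "Brazil")  # edit this to redraw other countries

# One long data frame; `series` becomes the legend entry for each line
plot_data <- bind_rows(
  mydata %>% filter(Region == "World median") %>% mutate(series = "World median"),
  mydata %>% filter(Country %in% countries) %>% mutate(series = Country)
)

p <- ggplot(plot_data, aes(year, value, color = series)) +
  geom_line()

p
```

Because the lines are driven by the data, adding or removing a country only changes the countries vector, not the plotting code.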

Apply string function to (all) labs-type labels in a plot

I'm looking for a way to apply a function to either specified labels, or to all labels that are included in the plot. The goal is to have neat human readable labels that derive from the default labels, without having to specify each.
To demonstrate what I am looking for in terms of the input variable names and the output, I am including an example based on the starwars data set, that uses the versatile snakecase::to_sentence_case() function, but this could apply to any function, including ones that expand short variable names in pre-determined ways:
library(tidyverse)
library(snakecase)
starwars %>%
filter(mass < 1000) %>%
mutate(species = species %>% fct_infreq %>% fct_lump(5) %>% fct_explicit_na) %>%
ggplot(aes(height, mass, color=species, size=birth_year)) +
geom_point() +
labs(
x = to_sentence_case("height"),
y = to_sentence_case("mass"),
color = to_sentence_case("species"),
size = to_sentence_case("birth_year")
)
Which produces the following graph:
The graph is the desired output, but requires that each of the labels be specified by hand, increasing the possibility of error if the variables are later changed. Note that if I had not specified the labels, all the labels would have been applied automatically, but with the variable names instead of the prettier versions.
This issue seems to be somewhat related to what the labeller() function is intended for, but it seems that it only applies to faceting. Another related issue is raised in this question. However, both of these seem to apply only to values contained within the data, not to the variable names that are being used in the plot, which is what I am looking for.
The very helpful answer by z-lin showed me a simple way to do this by modifying the plot object before printing.
The intended result can be achieved with the help of gg_apply_labs(), a short function that will apply an arbitrary string processing function to the $labels of a plot object. The resulting code should be a self-contained illustration of this approach:
# Packages
library(tidyverse)
library(snakecase)
# This applies fun to each label present in the plot object
#
# fun should accept and return character vectors, it can either be a simple
# prettyfying function or it can perform more complex lookup to replace
# variable names with variable labels
gg_apply_labs <- function(p, fun) {
p$labels <- lapply(p$labels, fun)
p
}
# This gives the intended result
# Note: The plot is assigned to a named variable before piping to gg_apply_labs()
p <- starwars %>%
filter(mass < 1000) %>%
mutate(species = species %>% fct_infreq %>% fct_lump(5) %>% fct_explicit_na) %>%
ggplot(aes(height, mass, color=species, size=birth_year)) +
geom_point()
p %>% gg_apply_labs(to_sentence_case)
# This also gives the intended result, in a single pipeline
# Note: It is important to put in the extra parentheses!
(starwars %>%
filter(mass < 1000) %>%
mutate(species = species %>% fct_infreq %>% fct_lump(5) %>% fct_explicit_na) %>%
ggplot(aes(height, mass, color=species, size=birth_year)) +
geom_point()) %>%
gg_apply_labs(to_sentence_case)
# This DOES NOT give the intended result
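# Note: The issue is operator precedence: %>% binds more tightly than +, so
# gg_apply_labs() is applied to the geom_point() layer, not to the whole plot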
starwars %>%
filter(mass < 1000) %>%
mutate(species = species %>% fct_infreq %>% fct_lump(5) %>% fct_explicit_na) %>%
ggplot(aes(height, mass, color=species, size=birth_year)) +
geom_point() %>%
gg_apply_labs(to_sentence_case)
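The precedence issue in the last example can be reproduced without ggplot2. magrittr's %>%, like every %any% operator in R, binds more tightly than +, so the pipe only applies to the right-hand operand of +. times10() below is a hypothetical helper used just for illustration.

```r
library(magrittr)

times10 <- function(x) x * 10  # hypothetical helper for illustration

a <- 1 + 2 %>% times10()    # parsed as 1 + (2 %>% times10()), i.e. 1 + 20
b <- (1 + 2) %>% times10()  # parentheses pipe the whole sum, i.e. 3 * 10
```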
A simple solution is to pipe through rename_all (or rename_if if you want more control) before plotting:
library(tidyverse)
library(snakecase)
starwars %>%
filter(mass<1000) %>%
mutate(species=species %>% fct_infreq %>% fct_lump(5) %>% fct_explicit_na) %>%
rename_all(to_sentence_case) %>%
#rename_if(is.character, to_sentence_case) %>%
ggplot(aes(Height, Mass, color=Species, size=`Birth year`)) +
geom_point()
#> Warning: Removed 23 rows containing missing values (geom_point).
Created on 2019-11-25 by the reprex package (v0.3.0)
Note, though, that the variables given to aes in ggplot in this case must be modified to match the modified sentence case variable names.
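In dplyr 1.0.0 and later, rename_all() is superseded by rename_with(), which takes the renaming function directly. A minimal sketch, assuming a hypothetical to_label() helper as a dependency-free stand-in for snakecase::to_sentence_case(), and skipping the species lumping for brevity:

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for snakecase::to_sentence_case():
# replaces underscores with spaces and capitalises the first letter
to_label <- function(x) {
  x <- gsub("_", " ", x)
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}

plot_data <- starwars %>%
  filter(mass < 1000) %>%
  select(height, mass, species, birth_year) %>%
  rename_with(to_label)  # applies to every column by default

# aes() must use the renamed columns, as noted above
p <- ggplot(plot_data, aes(Height, Mass, color = Species, size = `Birth year`)) +
  geom_point()
```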
You can modify a ggplot object's appearance at the point of printing / plotting it, without affecting the original plot object, using trace:
trace(what = ggplot2:::ggplot_build.ggplot,
tracer = quote(plot$labels <- lapply(plot$labels,
<whatever string function you desire>)))
This will change the appearance of all existing / new ggplot objects you wish to plot / save, until you turn off the trace via either untrace(...) or tracingState(on = FALSE).
Illustration
Create a normal plot with default labels in lower case:
library(tidyverse)
p <- starwars %>%
filter(mass < 1000) %>%
mutate(species=species %>% fct_infreq %>% fct_lump(5) %>% fct_explicit_na) %>%
ggplot(aes(height, mass, color=species, size=birth_year)) +
geom_point() +
theme_bw()
p # if we print the plot now, all labels will be lower-case
Apply a function to modify the appearance of all labels:
trace(what = ggplot2:::ggplot_build.ggplot,
tracer = quote(plot$labels <- lapply(plot$labels,
snakecase::to_sentence_case)))
p # all labels will be in sentence case
trace(what = ggplot2:::ggplot_build.ggplot,
tracer = quote(plot$labels <- lapply(plot$labels,
snakecase::to_screaming_snake_case)))
p # all labels will be in upper case
trace(what = ggplot2:::ggplot_build.ggplot,
tracer = quote(plot$labels <- lapply(plot$labels,
snakecase::to_random_case)))
p # all letters in all labels may be in upper / lower case randomly
# (exact order can change every time we print the plot again, unless we set the same
# random seed for reproducibility)
trace(what = ggplot2:::ggplot_build.ggplot,
tracer = quote(plot$labels <- lapply(plot$labels,
function(x) paste("!!!", x, "$$$"))))
p # all labels now have "!!!" in front & "$$$" behind (this is a demonstration for
# an arbitrary user-defined function, not a demonstration of good taste in labels)
Toggle between applying & not applying the function:
tracingState(on = FALSE)
p # back to sanity, temporarily
tracingState(on = TRUE)
p # plot labels are affected by the function again
untrace(ggplot2:::ggplot_build.ggplot)
p # back to sanity, permanently

dplyr and ggplot piping is not working as expected

I can't find a solution for the following two issues:
First I try this:
library(tidyverse)
gg <- mtcars %>%
mutate(group=ifelse(gear==3,1,2)) %>%
ggplot(aes(x=carb, y=drat)) + geom_point(shape=group)
Error in layer(data = data, mapping = mapping, stat = stat, geom = GeomPoint, : object 'group' not found
This obviously does not work, but using .$group instead is not successful either. Note that I have to specify the shape outside of aes().
The second problem is this. I'm not able to call a saved ggplot (gg) within a pipe.
gg <- mtcars %>%
mutate(group=ifelse(gear==3,1,2)) %>%
ggplot(aes(x=carb, y=drat)) + geom_point()
mtcars %>%
filter(vs == 0) %>%
gg + geom_point(aes(x=carb, y=drat), size = 4)
Error in gg(.) : could not find function "gg"
Thanks for your help!
Edit
After a long time I found a solution here: one has to wrap the complete ggplot expression in {}.
mtcars %>%
mutate(group=ifelse(gear==3,1,2)) %>% {
ggplot(.,aes(carb,drat)) +
geom_point(shape=.$group)}
If you wrap your shape definition in aes(), you get the desired behavior. To use shape outside of aes(), you can pass it a single value (i.e. shape = 1). Also note that group is converted to a discrete variable (a factor) below, because geom_point() throws an error when you map a continuous variable to shape.
library(tidyverse)
gg <- mtcars %>%
mutate(group=ifelse(gear==3,1,2)) %>%
ggplot(aes(x=carb, y=drat)) +
geom_point(aes(shape=as.factor(group)))
gg
Second, the %>% operator, when called as lhs %>% rhs, assumes that rhs is a function. So, as the error shows, you are calling gg as a function. Calling a plot as a function on a data frame (i.e. gg(mtcars)) isn't a valid operation.
See docendo discimus's comment on the question for how to use {} to add a layer to an existing ggplot object from a magrittr pipeline.
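A minimal sketch of that {} approach, assuming the goal is to overlay the filtered rows on the saved plot. Inside the braces, . is the piped data frame, so it can be passed explicitly as the new layer's data; without data = ., the new layer would reuse gg's full data. gg + ... is then an ordinary expression rather than a function call.

```r
library(dplyr)
library(ggplot2)

gg <- mtcars %>%
  mutate(group = ifelse(gear == 3, 1, 2)) %>%
  ggplot(aes(x = carb, y = drat)) +
  geom_point()

# `.` is the filtered data; it is given to the added layer explicitly
gg2 <- mtcars %>%
  filter(vs == 0) %>%
  { gg + geom_point(data = ., aes(x = carb, y = drat), size = 4) }

gg2
```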

Position_fill function equivalent in ggvis?

I'm trying to replicate ggplot2's position = "fill" option in ggvis. I use this handy option all the time when presenting results. Below is a reproducible example done successfully in ggplot2, plus my ggvis attempt. Can it be done using the scale_numeric function?
library(ggplot2)
p <- ggplot(mtcars, aes(x=factor(cyl), fill=factor(vs)))
p+geom_bar()
p+geom_bar(position="fill")
library(ggvis)
q <- mtcars %>%
ggvis(~factor(cyl), fill = ~factor(vs))%>%
layer_bars()
# Something like this?
q %>% scale_numeric("y", domain = c(0,1))
I think that to do this sort of thing with ggvis, you have to do the heavy lifting of reshaping the data before sending it to ggvis. ggplot2's geom_bar handily does a lot of calculations (counting things up, weighting them, etc.) for you that you need to do explicitly yourself in ggvis. So try something like the following (there may be more elegant ways):
library(dplyr)
library(ggvis)
mtcars %>%
mutate(cyl=factor(cyl), vs=as.factor(vs)) %>%
group_by(cyl, vs) %>%
summarise(count=length(mpg)) %>%
group_by(cyl) %>%
mutate(proportion = count / sum(count)) %>%
ggvis(x= ~cyl, y = ~proportion, fill = ~vs) %>%
layer_bars()
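The same reshaping can be written a little more compactly with dplyr's count(); this is a sketch of an alternative, not the answer's code. Within each cyl group the proportions sum to 1, which is the quantity position = "fill" plots in ggplot2, and the result can be piped into ggvis() exactly as above.

```r
library(dplyr)

props <- mtcars %>%
  # count() groups by the (converted) columns and adds a count column `n`
  count(cyl = factor(cyl), vs = factor(vs)) %>%
  group_by(cyl) %>%
  mutate(proportion = n / sum(n)) %>%
  ungroup()

props
```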