I've got a grouped dataframe generated in dplyr where each group reflects a unique combination of factor variable levels. I'd like to plot the different groups using code similar to this post. However, I can't figure out how to include two (or more) variables in the title of my plots, which is a hassle since I've got a bunch of different combinations.
Fake data and plotting code:
library(dplyr)
library(ggplot2)
spiris<-iris
spiris$site<-as.factor(rep(c("A","B","C")))
spiris$year<-as.factor(rep(2012:2016))
spiris$treatment<-as.factor(rep(1:2))
g<-spiris %>%
group_by(site, Species) %>%
do(plots=ggplot(data=.) +
aes(x=Petal.Width)+geom_histogram()+
facet_grid(treatment~year))
##Need code for title here
g[[3]] ##view plots
I need the title of each plot to reflect both "site" and "Species". Any ideas?
Use split() %>% purrr::map2() instead of group_by() %>% do() like this:
spiris %>%
split(list(.$site, .$Species)) %>%
purrr::map2(.y = names(.),
~ ggplot(data=., aes(x=Petal.Width)) +
geom_histogram()+
facet_grid(treatment~year) +
labs(title = .y) )
You just need to set the title with ggtitle():
g <- spiris %>%
  group_by(site, Species) %>%
  do(plots = ggplot(data = .) +
       aes(x = Petal.Width) + geom_histogram() +
       facet_grid(treatment ~ year) +
       # each group has one site and one Species, so use the first value
       ggtitle(paste(.$Species[1], .$site[1], sep = " - ")))
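For what it's worth, here is a minimal sketch of a third variant, assuming you are happy to leave the tibble behind: split on the interaction of the two factors so that the list names already contain both variables, then let purrr::imap() hand each name to the plot as its title. The " - " separator and the object names groups and plots are just illustrative choices.

```r
library(ggplot2)
library(purrr)

# Same fake data as in the question
spiris <- iris
spiris$site <- factor(rep(c("A", "B", "C"), length.out = nrow(spiris)))
spiris$year <- factor(rep(2012:2016, length.out = nrow(spiris)))
spiris$treatment <- factor(rep(1:2, length.out = nrow(spiris)))

# One data frame per site/Species combination, named e.g. "A - setosa"
groups <- split(spiris, interaction(spiris$site, spiris$Species, sep = " - "))

# imap() passes each element as .x and its name as .y
plots <- imap(groups, ~ ggplot(.x, aes(x = Petal.Width)) +
  geom_histogram(bins = 10) +
  facet_grid(treatment ~ year) +
  labs(title = .y))

plots[["A - setosa"]]  # view a single plot by name
```

Because the result is a plain named list, each plot can be pulled out by its combined name and modified further.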
I have created a set of ggplots using a grouped dataframe and the map function and I would like to extract the plots to be able to manipulate them individually.
library(tidyverse)
plot <- function(df, title){
df %>% ggplot(aes(class)) +
geom_bar() +
labs(title = title)
}
plots <- mpg %>% group_by(manufacturer) %>% nest() %>%
mutate(plots= map(.x=data, ~plot(.x, manufacturer)))
nissan <- plots %>% filter(manufacturer == "nissan") %>% pull(plots)
nissan
nissan + labs(title = "Nissan")
In this case, "nissan" is a list object and I am not able to manipulate it. How do I extract the ggplot?
In terms of data structures, I think retaining a tibble (or data.frame) is suboptimal with respect to the illustrated usage. If you have one plot per manufacturer, and you plan to access them by manufacturer, then I would recommend to transmute and then deframe out to a list object.
That is, I would find it more conceptually clear here to do something like:
library(tidyverse)
plot <- function(df, title){
df %>% ggplot(aes(class)) +
geom_bar() +
labs(title = title)
}
plots <- mpg %>%
group_by(manufacturer) %>% nest() %>%
transmute(plot=map(.x=data, ~plot(.x, manufacturer))) %>%
deframe()
plots[['nissan']]
plots[['nissan']] + labs(title = "Nissan")
Otherwise, if you want to keep the tibble, another option similar to what has been suggested in the comments is to use a first() after the pull.
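A minimal sketch of that tibble-keeping option, assuming the same mpg setup as in the question. The helper is renamed to make_bar here (a hypothetical name) only to avoid masking base plot().

```r
library(tidyverse)

# Hypothetical helper name; same body as the plot() function in the question
make_bar <- function(df, title) {
  df %>% ggplot(aes(class)) +
    geom_bar() +
    labs(title = title)
}

plots <- mpg %>%
  group_by(manufacturer) %>%
  nest() %>%
  mutate(plots = map(.x = data, ~ make_bar(.x, manufacturer)))

# pull() returns a list of length one; first() extracts the ggplot inside it
nissan <- plots %>%
  filter(manufacturer == "nissan") %>%
  pull(plots) %>%
  first()

nissan + labs(title = "Nissan")
```

Using `[[1]]` in place of first() should give the same object.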
I have tried to determine the relationship between the variable "RainTomorrow" and others by the code below. But, seems like the way I coded is not giving me the output. How do I determine the relation of RainTomorrow and all other variables?
rattle::weatherAUS # to load the dataset into R
str(weather)
weather$Date <- as.Date(weather$Date)
weather$RainTomorrow <- as.factor(weather$RainTomorrow)
# exploring all the variables
weather %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
rattle::weatherAUS merely prints the data to console. You need to run weather <- rattle::weatherAUS
After that everything will work fine.
I use facet_grid() to show RainTomorrow in each row and other numeric variables in each column.
library(tidyverse)
library(rattle)
# exploring all the variables
weather %>%
mutate(RainTomorrow = as.integer(RainTomorrow)) %>%
keep(is.numeric) %>%
mutate(RainTomorrow = weather$RainTomorrow) %>%
pivot_longer(-RainTomorrow, names_to = "name", values_to = "value") %>%
ggplot(aes(value)) +
geom_histogram() +
facet_grid(vars(RainTomorrow), vars(name), scales = "free") +
theme_test()
I would like to plot some relative frequency data using ggplot in a more efficient manner.
I have many variables of interest, and want to plot a separate bar chart for each. The following is my current code for one variable of interest, Gender:
chart.gender <- data %>%
count(Gender = factor(Gender)) %>%
mutate(Gender = fct_reorder(Gender,desc(n))) %>%
mutate(pct = prop.table(n)) %>%
ggplot(aes(x=Gender, y=n, fill=Gender)) +
geom_col()
This works, but the variable Gender is repeated many times. Since I need to repeat plots for many variables of interest (Gender, Age, Location, etc.) with similar code, I would like to simplify this by declaring the variable of interest once at the top and using that declared variable for the rest of the code. Intuitively, something like:
var <- "Gender"
chart.gender <- data %>%
count(var = factor(var)) %>%
mutate(var = fct_reorder(var,desc(n))) %>%
mutate(pct = prop.table(n)) %>%
ggplot(aes(x=var, y=n, fill=var)) +
geom_col()
Which does not result in a plot of three-level factor count of gender frequencies, but merely a single column named 'Gender'. I believe I see why it's not working, but I do not know the solution for it: I want R to retrieve the variable name I stored in var, and then use that to retrieve the data for that variable in 'data'.
With some research I've found suggestions like using as.name(var), but there seems to (at the least) be a problem with declaring the variable var as a factor within the count() function.
Some reproducible data:
library(tidyverse)
library(ggplot2)
set.seed(1)
data <- data.frame(sample(c("Male", "Female", "Prefer not to say"),20,replace=TRUE))
colnames(data) <- c("Gender")
I'm using the following packages in R: tidyverse, ggplot2
Use the .data pronoun to subset the column whose name is stored in var.
library(tidyverse)
var <- "Gender"
data %>%
count(var = factor(.data[[var]])) %>%
mutate(var = fct_reorder(var,desc(n))) %>%
mutate(pct = prop.table(n)) %>%
ggplot(aes(x=var, y=n, fill=var)) +
geom_col()
Or another way would be using sym and !!
data %>%
count(var = factor(!!sym(var))) %>%
mutate(var = fct_reorder(var,desc(n))) %>%
mutate(pct = prop.table(n)) %>%
ggplot(aes(x=var, y=n, fill=var)) +
geom_col()
If you use as.name() when you set the variable initially, you can use !! ("bang-bang") to unquote the variable for the count() step.
var <- as.name("Gender")
chart.gender <- data %>%
count(var = factor(!! var)) %>%
mutate(var = fct_reorder(var,desc(n))) %>%
mutate(pct = prop.table(n)) %>%
ggplot(aes(x=var, y=n, fill=var)) +
geom_col()
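Since the goal is to repeat the chart for many variables, the .data approach can be wrapped in a function and mapped over a vector of column names. This is only a sketch: count_chart is a hypothetical helper name, and the Location column is made-up data added so there is a second variable to loop over.

```r
library(tidyverse)

set.seed(1)
data <- data.frame(
  Gender = sample(c("Male", "Female", "Prefer not to say"), 20, replace = TRUE),
  # hypothetical second variable, not part of the original data
  Location = sample(c("North", "South"), 20, replace = TRUE)
)

# Hypothetical helper: var is a column name given as a string
count_chart <- function(df, var) {
  df %>%
    count(value = factor(.data[[var]])) %>%
    mutate(value = fct_reorder(value, desc(n)),
           pct = prop.table(n)) %>%
    ggplot(aes(x = value, y = n, fill = value)) +
    geom_col() +
    labs(x = var, fill = var)  # show the real variable name on the axis and legend
}

vars <- c("Gender", "Location")
charts <- map(set_names(vars), ~ count_chart(data, .x))

charts$Gender
```

set_names() makes the resulting list addressable by variable name, so charts$Location gives the second chart.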
I've seen a lot of people use facets to visualize data. I want to be able to run this on every column in my dataset and then have it grouped by some categorical value within each individual plot.
I've seen others use gather() to plot histogram or densities. I can do that ok, but I guess I fundamentally misunderstand how to use this technique.
I want to be able to do just what I have below - but when I have it grouped by a category. For example, histogram of every column but stacked by the value color. Or dual density plots of every column with these two lines of different colors.
I'd like this - but instead of clarity it is every single column like this...
library(tidyverse)
# what I want but clarity should be replaced with every column except FILL
ggplot(diamonds, aes(x = price, fill = color)) +
geom_histogram(position = 'stack') +
facet_wrap(clarity~.)
# it would look exactly like this, except it would have the fill value by a group.
gathered_data = gather(diamonds %>% select_if(is.numeric))
ggplot(gathered_data , aes(value)) +
geom_histogram() +
theme_classic() +
facet_wrap(~key, scales='free')
tidyr::gather needs four pieces:
1) data (in this case diamonds, passed through the pipe into the first parameter of gather below)
2) key
3) value
4) names of the columns that will be converted to key / value pairs.
gathered_data <- diamonds %>%
gather(key, value,
select_if(diamonds, is.numeric) %>% names())
It's not entirely clear what you are looking for. A picture of your expected output would have been much more illuminating than a description (not all of us are native English speakers...), but perhaps something like this?
diamonds %>%
rename(group = color) %>% # change this line to use another categorical
# column as the grouping variable
group_by(group) %>% # select grouping variable + all numeric variables
select_if(is.numeric) %>%
ungroup() %>%
tidyr::gather(key, value, -group) %>% # gather all numeric variables
ggplot(aes(x = value, fill = group)) +
geom_histogram(position = "stack") +
theme_classic() +
facet_wrap(~ key, scales = 'free')
# alternate example using geom density
diamonds %>%
rename(group = cut) %>%
group_by(group) %>%
select_if(is.numeric) %>%
ungroup() %>%
tidyr::gather(key, value, -group) %>%
ggplot(aes(x = value, color = group)) +
geom_density() +
theme_classic() +
facet_wrap(~ key, scales = 'free')
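As a side note, gather() is superseded in current tidyr, so here is a rough equivalent of the stacked-histogram version using pivot_longer(). It is a sketch under the same assumption as above, namely that color is the grouping column; the object name long and bins = 30 are arbitrary choices.

```r
library(tidyverse)

long <- diamonds %>%
  # keep the grouping column (renamed to group) plus every numeric column
  select(group = color, where(is.numeric)) %>%
  pivot_longer(-group, names_to = "key", values_to = "value")

ggplot(long, aes(x = value, fill = group)) +
  geom_histogram(position = "stack", bins = 30) +
  theme_classic() +
  facet_wrap(~ key, scales = "free")
```

Swapping color for cut in the select() call changes the grouping variable without touching the rest.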
Hi, I usually use code like the following to reorder bars in ggplot or other types of plots.
Normal plot (unordered)
library(tidyverse)
iris.tr <-iris %>% group_by(Species) %>% mutate(mSW = mean(Sepal.Width)) %>%
select(mSW,Species) %>%
distinct()
ggplot(iris.tr,aes(x = Species,y = mSW, color = Species)) +
geom_point(stat = "identity")
Ordering the factor + ordered plot
iris.tr$Species <- factor(iris.tr$Species,
levels = iris.tr[order(iris.tr$mSW),]$Species,
ordered = TRUE)
ggplot(iris.tr,aes(x = Species,y = mSW, color = Species)) +
geom_point(stat = "identity")
The factor line is extremely unpleasant to me and I wonder why arrange() or some other function can't simplify this. Am I missing something?
Note:
This does not work, but I would like to know if something like this exists in the tidyverse.
iris.tr <-iris %>% group_by(Species) %>% mutate(mSW = mean(Sepal.Width)) %>%
select(mSW,Species) %>%
distinct() %>%
arrange(mSW)
ggplot(iris.tr,aes(x = Species,y = mSW, color = Species)) +
geom_point(stat = "identity")
Using ‹forcats›:
iris.tr %>%
ungroup() %>% # iris.tr is still grouped by Species, and mutate() cannot modify a grouping column
mutate(Species = fct_reorder(Species, mSW)) %>%
ggplot() +
aes(Species, mSW, color = Species) +
geom_point()
Reordering the factor using base:
iris.ba = iris
iris.ba$Species = with(iris.ba, reorder(Species, Sepal.Width, mean))
Translating to dplyr:
iris.tr = iris %>% mutate(Species = reorder(Species, Sepal.Width, mean))
After that, you can continue on to summarize and plot as in your question.
A couple of comments: reordering a factor modifies a data column, and the dplyr verb for modifying a data column is mutate. All arrange does is reorder rows; this has no effect on the levels of the factor and hence no effect on the order of a legend or axis in ggplot.
All factors have an order for their levels. The difference between an ordered = TRUE factor and a regular factor is how the contrasts are set up in a model. ordered = TRUE should only be used if your factor levels have a meaningful rank order, like "Low", "Medium", "High", and even then it only matters if you are building a model and don't want the default contrasts comparing everything to a reference level.
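To make the arrange-versus-mutate point concrete, here is a small sketch on the iris means (the object names means, arranged and reordered are just for illustration). Sorting the rows leaves the levels alone, while reorder() inside mutate() changes them.

```r
library(dplyr)

means <- iris %>%
  group_by(Species) %>%
  summarise(mSW = mean(Sepal.Width))

# arrange() only sorts rows; the factor levels stay in their original order
arranged <- means %>% arrange(mSW)
levels(arranged$Species)
# "setosa" "versicolor" "virginica"

# reorder() inside mutate() rewrites the levels by increasing mSW
reordered <- means %>% mutate(Species = reorder(Species, mSW))
levels(reordered$Species)
# "versicolor" "virginica" "setosa"
```

ggplot reads the axis and legend order from the levels, which is why only the second version changes the plot.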
If you happen to have a character vector to order, for example:
iris2 <- iris %>%
mutate(Species = as.character(Species)) %>%
group_by(Species) %>%
mutate(mean_sepal_width = mean(Sepal.Width)) %>%
ungroup()
You can also order the factor level using the behavior of the forcats::as_factor function :
"Compared to base R, this function creates levels in the order in which they appear"
library(forcats)
iris2 %>%
# Change the order
arrange(mean_sepal_width) %>%
# Create factor levels in the order in which they appear
mutate(Species = as_factor(Species)) %>%
ggplot() +
aes(Species, Sepal.Width, color = Species) +
geom_point()
Notice how the species names on the x axis are not ordered alphabetically but by increasing value of their mean_sepal_width. Remove the line containing as_factor to see the difference.
If you'd like to order the levels manually, forcats can do that too with fct_relevel(): https://forcats.tidyverse.org/reference/fct_relevel.html
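A tiny sketch of manual ordering, using a toy factor rather than the full iris data. The levels named in the call are moved to the front in the order given, and any levels not mentioned keep their relative order after them.

```r
library(forcats)

species <- factor(c("setosa", "versicolor", "virginica"))
levels(species)
# "setosa" "versicolor" "virginica"

# Move "virginica" and then "setosa" to the front; "versicolor" follows
manual <- fct_relevel(species, "virginica", "setosa")
levels(manual)
# "virginica" "setosa" "versicolor"
```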