Plotting multiple columns against one column in ggplot2 (additional question) - r

This is a question referring to Plotting multiple columns against one column in ggplot2. Unfortunately I can't comment there so I need to ask it as a new question. Thanks to rnorouzian for asking the question and neilfws for answering.
I'm trying to layer all facets on one scatter plot (with geom_point) with a legend stating the name of the column.
The answer by neilfws contains this code, output was shown in the question:
library(tidyverse)
data <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/vp_cond.csv')
data %>%
pivot_longer(cols = 1:12) %>%
mutate(name = factor(name, levels = paste0("X", 1:12))) %>%
ggplot(aes(x, value)) +
geom_line() +
facet_wrap(~name) +
theme_bw()
I tried this and got a good output but without a legend. Where can I specify "name" being used as info in the legend, also using different colors?
data %>%
pivot_longer(cols = 1:12) %>%
mutate(name = factor(name, levels = paste0("X", 1:12))) %>%
ggplot(aes(x, value)) +
geom_point() +
theme_bw()

Simply map name to the colour aesthetic. To do this, add colour = name inside aes:
data %>%
pivot_longer(cols = 1:12) %>%
mutate(name = factor(name, levels = paste0("X", 1:12))) %>%
ggplot(aes(x, value, colour = name)) +
geom_line() +
theme_bw()
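
Since the question asked for points rather than lines, here is a minimal sketch of the same idea with geom_point. It uses a small made-up data frame so that it runs without downloading anything; the names toy and long, and the three columns X1 to X3, are assumptions standing in for the twelve columns of vp_cond.csv.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for vp_cond.csv: three value columns plus a shared x column.
set.seed(1)
toy <- data.frame(X1 = rnorm(20), X2 = rnorm(20, 1), X3 = rnorm(20, 2), x = 1:20)

# Long format: one row per (x, column) pair; the column name ends up in `name`.
long <- toy %>%
  pivot_longer(cols = -x) %>%
  mutate(name = factor(name, levels = paste0("X", 1:3)))

# Mapping `name` to colour draws every column in one panel and adds the legend.
ggplot(long, aes(x, value, colour = name)) +
  geom_point() +
  theme_bw()
```

Because colour is mapped inside aes(), ggplot2 builds the legend automatically; a scale such as scale_colour_brewer() could then be added to pick a different palette.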

Related

Change factor label ggplot when scale_x_reordered() is present

I am creating a box plot in which I have used scale_x_reordered() after adjusting the order of factors on the x axis.
I am now trying to change the label of one factor. I had previously been doing this using:
scale_x_discrete(labels=c("old_label" = "new_label"))
However, I cannot use both scale_x_discrete() and scale_x_reordered() in the same plot. Does anyone know of a fix so that I can change a label and keep scale_x_reordered?
My ggplot is based off of this very helpful example: linked here
The change I am trying to make is equivalent to manually changing the name "Michael" to "Mike".
To achieve your desired result I would suggest to recode your factor before applying reorder_within.
The reason is that reorder_within transforms the factor levels to make the reordering within facets work. Inside scale_x_reordered a re-transformation is applied via the labels argument to show the original levels or labels. That's why you can't make use of the labels argument.
In the following example taken from the link you posted I make use of dplyr::recode(name, "Michael" = "Mike") just before reorder_within:
library(tidyverse)
library(babynames)
library(tidytext)
top_names <- babynames %>%
filter(year >= 1950,
year < 1990) %>%
mutate(decade = (year %/% 10) * 10) %>%
group_by(decade) %>%
count(name, wt = n, sort = TRUE) %>%
ungroup()
top_names %>%
group_by(decade) %>%
top_n(15) %>%
ungroup() %>%
mutate(decade = as.factor(decade),
name = recode(name, "Michael" = "Mike"),
name = reorder_within(name, n, decade)) %>%
ggplot(aes(name, n, fill = decade)) +
geom_col(show.legend = FALSE) +
facet_wrap(~decade, scales = "free_y") +
coord_flip() +
scale_x_reordered() +
scale_y_continuous(expand = c(0,0)) +
labs(y = "Number of babies per decade",
x = NULL,
title = "What were the most common baby names in each decade?",
subtitle = "Via US Social Security Administration")
#> Selecting by n
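
To see why recoding has to happen before reorder_within, it can help to look at the factor levels it produces. The following is a small sketch on made-up data (the data frame toy is hypothetical); it assumes the default separator "___" that tidytext uses, as far as I know, to attach the facet value to each level.

```r
library(tidytext)

# Hypothetical toy data: two names counted in two decades.
toy <- data.frame(
  name   = c("Michael", "David", "Michael", "David"),
  decade = c("1950", "1950", "1960", "1960"),
  n      = c(10, 20, 30, 5)
)

reordered <- reorder_within(toy$name, toy$n, toy$decade)

# The levels now carry the facet value as a suffix.
levels(reordered)

# scale_x_reordered() strips that suffix again when drawing the axis labels;
# the same clean-up can be approximated by hand like this:
sub("___.+$", "", levels(reordered))
```

A labels vector keyed on "Michael" would therefore not match any level after reorder_within, which is why changing the value itself beforehand is the simpler route.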

Overlaying two quick plot all variables from several data frames

I have loaded a data frame and done a quick plot of all variables using:
df %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
Reference: https://drsimonj.svbtle.com/quick-plot-of-all-variables
I have split this data frame into two data frames based on a binary variable (in my case, Smoker/Non-smoker) in one of the columns. I would like to produce the same quick plot of all variables, but with overlaid, differently coloured histograms for each of the new data frames (to see if they differ significantly).
I found the following:
Overlaying two ggplot facet_wrap histograms
But it only does the facet_wrap over a single variable. Is there a way to do this by filtering the gathered data frame by the binary value, something like:
df %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram(subset(df,Smoker==1), fill = "Red", alpha=0.3) +
geom_histogram(subset(df,Smoker==2), fill = "Blue", alpha=0.3)
Idea would be to overlay the following:
df_s %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram(fill = "Red", alpha=0.3)
df_ns %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram(fill = "Blue", alpha=0.3)
I could do this with a loop but would like to do it with the df key-value pairs if possible.
df %>%
keep(is.numeric) %>% # Smoker must survive this step; drop the line if Smoker is not stored as a number
tidyr::gather(key, value, -Smoker) %>% # preserve Smoker and use it to colour
ggplot(aes(value, fill = factor(Smoker))) + # scale_fill_manual needs a discrete fill
facet_wrap(~ key, scales = "free") +
geom_histogram(alpha = 0.30, position = "identity") + # overlay the groups instead of stacking them
scale_fill_manual(values = c("red","blue"))
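
As a self-contained sketch of the same approach, the following builds a made-up data frame (the columns Age and BMI and the 1/2 coding of Smoker are assumptions for illustration only) and uses pivot_longer, the newer tidyr equivalent of gather:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical data: Smoker coded 1/2 plus two numeric measurements.
set.seed(42)
df <- data.frame(
  Smoker = rep(c(1, 2), each = 50),
  Age    = rnorm(100, mean = 45, sd = 10),
  BMI    = rnorm(100, mean = 26, sd = 4)
)

# Keep Smoker as an identifier column and stack every other column as key/value.
long <- df %>%
  mutate(Smoker = factor(Smoker)) %>%
  pivot_longer(cols = -Smoker, names_to = "key", values_to = "value")

# position = "identity" overlays the two groups instead of stacking them.
ggplot(long, aes(value, fill = Smoker)) +
  facet_wrap(~ key, scales = "free") +
  geom_histogram(alpha = 0.3, position = "identity", bins = 30) +
  scale_fill_manual(values = c("1" = "red", "2" = "blue"))
```

Naming the colour vector by factor level makes the red/blue assignment explicit rather than dependent on level order.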

displaying distribution of categorical variable using ggplot and r

Simplifying my question in terms of generic titanic dataset:
How can I get the following plot for all the attributes in my dataset?
If possible, I would also like to get the count or percentage for each category.
Thank you for your help in advance.
Regards, Trupti
With the Titanic data set this can be accomplished using:
library(tidyverse)
data("Titanic")
Titanic %>%
as.data.frame() %>% # transform from a table to dataframe
gather(variable, value, -Freq) %>% # change to long format
group_by(variable, value) %>%
summarise(Freq = sum(Freq)) %>% # get the freq for each level of each variable
ggplot(aes(variable, Freq, fill = value)) +
geom_col(position = position_stack()) +
geom_text(aes(label = paste0(value, " (", Freq, ")")), vjust = 1,
position = position_stack()) +
theme(legend.position = "none")
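
The question also asked about percentages. One possible variation, sketched here with pivot_longer and count in place of gather and summarise, adds a per-variable percentage column; the names shares and pct are made up for this example.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

shares <- as.data.frame(Titanic) %>%
  mutate(across(-Freq, as.character)) %>%   # avoid combining factors with different levels
  pivot_longer(cols = -Freq, names_to = "variable", values_to = "value") %>%
  count(variable, value, wt = Freq, name = "Freq") %>%   # total count per level
  group_by(variable) %>%
  mutate(pct = round(100 * Freq / sum(Freq), 1)) %>%     # share within each variable
  ungroup()

ggplot(shares, aes(variable, Freq, fill = value)) +
  geom_col() +
  geom_text(aes(label = paste0(value, " (", pct, "%)")),
            position = position_stack(vjust = 0.5)) +    # centre labels in each segment
  theme(legend.position = "none")
```

Swapping pct for Freq inside the label brings back the raw counts.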

How to get legend labels that differ from fill aesthetic in R ggplot2

In ggplot2, how do I get the legend labels for the fill aesthetic to differ from the variable actually used as the fill aesthetic? What I'd like to see from the following plot is the legend labels reflecting the name variable. I know I could just use the name itself as the fill aesthetic; however, in the following example, it's more convenient to set up the colour vector storm_cols (used for ggplot2::scale_fill_manual) using the id column as the vector names rather than typing out each name.
library(dplyr)
library(ggplot2)
dat <-
storms %>%
filter(year >= 2015) %>%
group_by(name, year) %>%
summarize(avg_wind = mean(wind)) %>%
ungroup() %>%
mutate(id = as.character(row_number())) %>%
slice(1:4)
storm_cols <- c("1" = "red", "2" = "blue", "3" = "green", "4" = "yellow")
dat %>%
ggplot(aes(id, avg_wind, fill = id)) +
geom_col() +
scale_fill_manual(values = storm_cols)
You don't need to explicitly type out the names for the color vector. Instead, you can create it programmatically, making it easier to create the desired color assignments and use name directly as the fill aesthetic. For example, in this case you can use the set_names function from the purrr package (or the base R setNames function).
library(tidyverse)
dat %>%
ggplot(aes(id, avg_wind, fill = name)) +
geom_col() +
scale_fill_manual(values = c("red","blue","green","yellow") %>% set_names(dat$name))
With your original example, you could change the legend labels with the labels argument to scale_fill_manual:
dat %>%
ggplot(aes(id, avg_wind, fill = id)) +
geom_col() +
scale_fill_manual(values = storm_cols,
labels = dat$name)
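
For completeness, here is a sketch of the base R setNames variant mentioned above. It uses a small hypothetical data frame (toy, with made-up wind speeds) in place of dat so that it does not depend on which version of the storms data is installed.

```r
library(ggplot2)

# Hypothetical stand-in for `dat`: four storms with made-up average wind speeds.
toy <- data.frame(
  id       = as.character(1:4),
  name     = c("Ana", "Bill", "Claudette", "Danny"),
  avg_wind = c(40, 45, 38, 60)
)

# Base R equivalent of `%>% set_names(...)`: a colour vector named by storm.
storm_cols <- setNames(c("red", "blue", "green", "yellow"), toy$name)

ggplot(toy, aes(id, avg_wind, fill = name)) +
  geom_col() +
  scale_fill_manual(values = storm_cols)
```

This relies on the names being unique; if the same storm name appeared in several rows, the colour vector would need to be built from the distinct names instead.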

using facets on every column with color grouping

I've seen a lot of people use facets to visualize data. I want to be able to run this on every column in my dataset and then have it grouped by some categorical value within each individual plot.
I've seen others use gather() to plot histogram or densities. I can do that ok, but I guess I fundamentally misunderstand how to use this technique.
I want to be able to do just what I have below - but when I have it grouped by a category. For example, histogram of every column but stacked by the value color. Or dual density plots of every column with these two lines of different colors.
I'd like this - but instead of clarity it is every single column like this...
library(tidyverse)
# what I want but clarity should be replaced with every column except FILL
ggplot(diamonds, aes(x = price, fill = color)) +
geom_histogram(position = 'stack') +
facet_wrap(clarity~.)
# it would look exactly like this, except it would have the fill value by a group.
gathered_data = gather(diamonds %>% select_if(is.numeric))
ggplot(gathered_data , aes(value)) +
geom_histogram() +
theme_classic() +
facet_wrap(~key, scales='free')
tidyr::gather needs four pieces:
1) data (in this case diamonds, passed through the pipe into the first parameter of gather below)
2) key
3) value
4) names of the columns that will be converted to key / value pairs.
gathered_data <- diamonds %>%
gather(key, value,
select_if(diamonds, is.numeric) %>% names())
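
In current tidyr, gather is superseded by pivot_longer, which takes the same pieces under different argument names. The following sketch is one way to write the equivalent call while keeping color as an identifier column, so it stays available as a fill; treating color as the grouping column is an assumption based on the question.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

gathered_data <- diamonds %>%
  select(color, where(is.numeric)) %>%        # grouping column + all numeric columns
  pivot_longer(cols = -color,                 # columns to convert to key/value pairs
               names_to = "key",              # former `key` argument
               values_to = "value")           # former `value` argument

ggplot(gathered_data, aes(value, fill = color)) +
  geom_histogram(position = "stack", bins = 30) +
  theme_classic() +
  facet_wrap(~ key, scales = "free")
```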
It's not entirely clear what you are looking for. A picture of your expected output would have been much more illuminating than a description (not all of us are native English speakers...), but perhaps something like this?
diamonds %>%
rename(group = color) %>% # change this line to use another categorical
# column as the grouping variable
group_by(group) %>% # select grouping variable + all numeric variables
select_if(is.numeric) %>%
ungroup() %>%
tidyr::gather(key, value, -group) %>% # gather all numeric variables
ggplot(aes(x = value, fill = group)) +
geom_histogram(position = "stack") +
theme_classic() +
facet_wrap(~ key, scales = 'free')
# alternate example using geom density
diamonds %>%
rename(group = cut) %>%
group_by(group) %>%
select_if(is.numeric) %>%
ungroup() %>%
tidyr::gather(key, value, -group) %>%
ggplot(aes(x = value, color = group)) +
geom_density() +
theme_classic() +
facet_wrap(~ key, scales = 'free')
