How to add legends in this context? - r

songs %>% group_by(year) %>% summarise(count=nth(pop,1))%>%
ggplot(aes(x=factor(year),y=count,fill=year))+geom_bar(stat ='identity' )+theme_classic()
1. How can I adjust my legend to show the years (2010:2019) rather than what it is showing right now?
2. scale_size_manual is not working.

You need to set year as a factor each time (or externally), not just once. I don't have your data, so I'll use mtcars.
library(ggplot2)
library(dplyr)
# first plot
mtcars %>%
ggplot(aes(factor(carb), disp, fill=carb)) +
geom_bar(stat="identity")
# second plot
mutate(mtcars, carb = factor(carb)) %>%
ggplot(aes(carb, disp, fill=carb)) +
geom_bar(stat="identity")
# alternate code for second plot, not shown
mtcars %>%
ggplot(aes(factor(carb), disp, fill=factor(carb))) +
# factor() is applied to both uses of carb above (x and fill)
geom_bar(stat="identity")
(There are numerous ways to convert to a factor. I'm using dplyr here, but it can easily be done in base or data.table.)
I included the "alternate" code above to show factor() applied manually to each use of carb. This is not my preferred method: if you need the factor more than once, convert it once before plotting and reuse it. If you need both the ordinal year and the numeric version, you can add a new field, such as ordinal_year=factor(year).
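Applied to the question's pipeline, the same idea might look like the sketch below. The songs data frame here is made-up stand-in data (the real one isn't available), with only the year and pop columns the question uses. Note also that the mapped aesthetic is fill, so the scale to customize would be a fill scale such as scale_fill_manual() or scale_fill_brewer(); scale_size_manual() targets the size aesthetic and has no effect on this plot.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for the asker's data: two songs per year, 2010-2019
songs <- data.frame(
  year = rep(2010:2019, each = 2),
  pop  = seq(60, 98, by = 2)
)

songs_by_year <- songs %>%
  group_by(year) %>%
  summarise(count = nth(pop, 1)) %>%   # first pop value in each year
  mutate(year = factor(year))          # convert once, reuse for x and fill

p <- ggplot(songs_by_year, aes(x = year, y = count, fill = year)) +
  geom_col() +                         # same as geom_bar(stat = "identity")
  scale_fill_brewer(palette = "Paired", name = "Year") +
  theme_classic()

p
```

Because year is now a factor, the legend should list each year as its own key instead of showing a continuous colour bar.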

Related

Excluding levels/groups within categorical variable (ggplot graph)

I am relatively new to ggplot, and I am interested in visualizing a categorical variable with 11 groups/levels. I ran the code below to produce a bar graph showing the frequency of each group. However, given that some groups within the categorical variable "active" only occur once or zero times, they clutter the graph. Therefore, is it possible to directly exclude groups in ggplot within the categorical variable with < 2 observations?
I am also open to recommendations on how to visualize a categorical variable with multiple groups/levels if a bar graph isn't suitable here.
Data type
sapply(df,class)
username active
"character" "character"
ggplot(data = df, aes(x = active)) +
geom_bar()
You can count() the categories first, and then filter(), before feeding the result to ggplot. Because the counts are then precomputed, you would use geom_col() instead of geom_bar(). The filter keeps groups with at least 2 observations, matching the "< 2" cutoff in the question:
df %>% count(active) %>% filter(n >= 2) %>%
ggplot(aes(x=active,y=n)) +
geom_col()
Alternatively, you could group_by() / filter() directly within your ggplot() call, like this:
ggplot(df %>% group_by(active) %>% filter(n() >= 2), aes(x=active)) +
geom_bar()
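For a quick check of the count-then-filter approach without the original data, here is a small sketch on made-up data. The df contents and the min_n threshold are assumptions for illustration; min_n is set to 2 because the question asks to drop groups with fewer than 2 observations.

```r
library(ggplot2)
library(dplyr)

# Made-up data: "rare" occurs once, so it should be dropped
df <- data.frame(
  username = paste0("user", 1:8),
  active   = c("daily", "daily", "daily", "weekly", "weekly",
               "monthly", "monthly", "rare"),
  stringsAsFactors = FALSE
)

min_n <- 2  # keep groups with at least this many observations

kept <- df %>%
  count(active) %>%      # one row per group, with its frequency in n
  filter(n >= min_n)

ggplot(kept, aes(x = active, y = n)) +
  geom_col()
```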

purrr::pmap() output incompatible with what ggplot::aes() expects

Problem: purrr::pmap() output incompatible with ggplot::aes()
The following reprex boils down to a single question: is there any way to use quoted variable names inside ggplot2::aes() instead of bare names? For example, we typically write ggplot(mpg, aes(displ, cyl)); how can aes() be made to work normally with ggplot(mpg, aes("displ", "cyl"))?
If you understood my question, the remainder of this reprex really adds no information. However, I added it to draw the full picture of the problem.
More details: I want to use purrr functions to create a batch of routine exploratory data analysis plots effortlessly. The problem is that purrr::pmap() passes the variable names as quoted strings, which ggplot2::aes() doesn't understand. As far as I know, cat() and as.name() can take a quoted variable name and return it in the unquoted form that aes() understands. However, neither of them worked. The following reprex reproduces the problem. I commented the code to spare you the pain of figuring out what it does.
library(tidyverse)
# Divide the classes of variables into numeric and non-numeric. Goal: place a combination of numeric variables on the axes while encoding a non-numeric variable.
mpg_numeric <- map_lgl(.x = seq_along(mpg), .f = ~ mpg[[.x]] %>% class() %in% c("numeric","integer"))
mpg_factor <- map_lgl(.x = seq_along(mpg), .f = ~ mpg[[.x]] %>% class() %in% c("factor","character"))
# create all possible combinations of the variables
eda_routine_combinations <- expand_grid(num_1 = mpg[mpg_numeric] %>% names(),
num_2 = mpg[mpg_numeric] %>% names(),
fct = mpg[mpg_factor] %>% names()) %>%
filter(num_1 != num_2) %>% slice_head(n = 2) # for simplicity, keep only the first 2 combinations
# use purrr::pmap() to create all the plots we want in a single call
pmap(.l = list(eda_routine_combinations$num_1,
eda_routine_combinations$num_2,
eda_routine_combinations$fct) ,
.f = ~ mpg %>%
ggplot(aes(..1 , ..2, col = ..3)) +
geom_point() )
Next we pinpoint the problem using a typical ggplot2 call.
This is what we want purrr::pmap() to create in its iterations:
mpg %>%
ggplot(aes(displ , cyl, fill = drv)) +
geom_boxplot()
However, this is what purrr::pmap() passes along, with quoted variable names:
mpg %>%
ggplot(aes("displ" , "cyl", fill = "drv")) +
geom_boxplot()
Failing attempts
Using cat() to transform the quoted variable names from pmap() into unquoted form for aes() to understand fails.
mpg %>%
ggplot(aes(cat("displ") , cat("cyl"), fill = cat("drv"))) +
geom_boxplot()
Using as.name() to transform the quoted variable names from pmap() into unquoted form for aes() to understand fails.
mpg %>%
ggplot(aes(as.name("displ") , as.name("cyl"), fill = as.name("drv"))) +
geom_boxplot()
Bottom line
Is there a way to make ggplot(aes("quoted_var_name")) work properly?
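One approach commonly used for this is the .data pronoun, which ggplot2 supports inside aes(): .data[["displ"]] looks up the column named by a string. The sketch below is only an illustration, not an answer taken from this thread. It assumes a small hand-written table of combinations (combos) in place of the full expand_grid() result, and uses a named function instead of the ~ lambda to keep the argument matching explicit.

```r
library(ggplot2)
library(purrr)

# Hand-picked combinations (an assumption; stands in for eda_routine_combinations)
combos <- data.frame(
  num_1 = c("displ", "displ"),
  num_2 = c("cty", "hwy"),
  fct   = c("drv", "drv"),
  stringsAsFactors = FALSE
)

# pmap() matches the data frame's column names to the function's arguments
plots <- pmap(combos, function(num_1, num_2, fct) {
  ggplot(mpg, aes(.data[[num_1]], .data[[num_2]], colour = .data[[fct]])) +
    geom_point() +
    labs(x = num_1, y = num_2, colour = fct)  # readable labels instead of ".data[[num_1]]"
})

plots[[1]]
```

Each element of plots is an ordinary ggplot object, so it can be printed, saved with ggsave(), or combined with other layers as usual.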

Scatterplot using ggplot

I need to create a scatterplot of count vs. depth of 12 species using ggplot.
This is what I have so far:
library(ggplot2)
ggplot(data = ReefFish, mapping = aes(count, depth))
However, how do I use geom_point(), geom_smooth(), and facet_wrap() to include a smoother as well as include just the 12 species I want from the data (ReefFish)? Since I believe what I have right now includes all species from the data.
Here is an example of part of my data:
Since I don't have access to the ReefFish data set, here's an example using the built-in mpg data set about cars. To make it work with your data set, just edit this code to replace manufacturers with species.
Filter the data
First we filter the data so that it only includes the species/manufacturers we're interested in.
# load our packages
library(ggplot2)
library(magrittr)
library(dplyr)
# set up a character vector of the manufacturers we're interested in
manufacturers <- c("audi", "nissan", "toyota")
# filter our data set to only include the manufacturers we care about
mpg_filtered <- mpg %>%
filter(manufacturer %in% manufacturers)
Plot the data
Now we plot. Your code was just about there! You just needed to add the plot elements you wanted, like so:
mpg_filtered %>%
ggplot(mapping = aes(x = cty,
y = hwy)) +
geom_point() +
geom_smooth() +
facet_wrap(~manufacturer)
Hope that helps, and let me know if you have any issues.

Multiple line subplots in R

I am new to R and am struggling to understand how to create a matrix line plot (or plot with line subplots) given a data set with let's say one x and 5 y-columns such that:
-the first subplot is a plot of variables 1 and 2 (function of x)
-the second subplot variables 1 and 3 and so on
The idea is to use one of the variables (in this example number 1) as a reference and pair it with the rest so that they can be easily compared.
Thank you very much for your help.
Here's an example of one way to do that using tidyr and ggplot. tidyr::gather can pull the non-mpg columns into long format, each matched with its respective mpg. Then the data is mapped in ggplot so that x is mpg and y is the other value, and the name of the column it came from is mapped to facets.
library(tidyverse)
mtcars %>%
rownames_to_column() %>% # mtcars stores car names as row names, not as a "rowname" column
select(rowname, mpg, cyl, disp, hp) %>%
gather(stat, value, cyl:hp) %>%
ggplot(aes(mpg, value)) +
geom_point() +
facet_grid(stat~., scales = "free")
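gather() still works, but tidyr now documents it as superseded by pivot_longer(). Below is a sketch of the same reshaping with pivot_longer(), drawing lines rather than points since the question asked for line plots. The choice of mpg as the reference variable and of cyl, disp and hp as the others simply follows the example above; the names long, stat and value are illustrative.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Reshape so that each row pairs an mpg value with one other variable
long <- mtcars %>%
  select(mpg, cyl, disp, hp) %>%
  pivot_longer(cols = cyl:hp, names_to = "stat", values_to = "value")

# One panel per variable, each plotted against the shared reference (mpg)
ggplot(long, aes(mpg, value)) +
  geom_line() +
  facet_grid(stat ~ ., scales = "free")
```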

Reorder factored count data in ggplot2 geom_bar

I find countless examples of reordering X by the corresponding size of Y if the Dataframe for ggplot2 (geom_bar) is read using stat="identity".
I have yet to find an example of stat="count". The reorder function fails as I have no corresponding y.
I have a factored DF of one column, "count" (see below for a poor example), where there are multiple instances of the data as you would expect. However, I expected factored data to be displayed:
ggplot(df, aes(x=df$count)) + geom_bar()
by the order defined from the quantity of each factor, as it is different for unfactored (character) data i.e., will display alphabetically.
Any idea how to reorder?
This is my current awful effort, sadly I figured this out last night, then lost my R command history:
If you start your project by loading the tidyverse, I suggest you use the tidyverse function fct_infreq() (from the forcats package):
ggplot(df, aes(x=fct_infreq(df$count))) + geom_bar()
As your categories are words, consider adding coord_flip() so that your bars run horizontally.
ggplot(df, aes(x=fct_infreq(df$count))) + geom_bar() + coord_flip()
This is what it looks like with some fish species counts: a horizontal bar chart with species on the y-axis (really the flipped x-axis) and counts on the horizontal axis (really the flipped y-axis). The counts are sorted from least to greatest.
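To see what fct_infreq() does to the level order, here is a minimal sketch on made-up species data (the values in df are assumptions for illustration). The column is referenced inside aes() by its bare name rather than as df$count, which ggplot2 discourages.

```r
library(ggplot2)
library(forcats)

# Made-up species observations, one row per sighting
df <- data.frame(
  count = c("wrasse", "goby", "wrasse", "blenny", "wrasse", "goby"),
  stringsAsFactors = FALSE
)

# fct_infreq() orders levels by how often each value occurs, most frequent first
species_levels <- levels(fct_infreq(df$count))
species_levels

ggplot(df, aes(x = fct_infreq(count))) +
  geom_bar() +
  coord_flip() +
  labs(x = "species")
```

If the opposite order is wanted, wrapping the call as fct_rev(fct_infreq(count)) reverses the levels.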
Converting the counts to a factor and then modifying that factor might help accomplish what you need. In the code below, I'm reversing the order of the counts using fct_rev() from the forcats package (part of the tidyverse).
library(tidyverse)
iris %>%
count(Sepal.Length) %>%
mutate(n=n %>% as.factor %>% fct_rev) %>%
ggplot(aes(n)) + geom_bar()
Alternatively, if you'd like the bars to be arranged large to small, you can use fct_infreq.
iris %>%
count(Sepal.Length) %>%
mutate(n=n %>% as.factor %>% fct_infreq) %>%
ggplot(aes(n)) + geom_bar()
