Plotting continuous versus categorical variable in a bar chart using ggplot - r

I am a newbie to base R. I have gone through similar issues here but didn't get it resolved. I am using the code:
ggplot(combined_Attributes, aes(x = factor(CatAge), y = Total_Expenditure,
fill = "#0073C2FF" +
geom_bar(stat = "identity", position = "dodge"))) + geom_text(aes(label = CatAge))
I do not want text labels written on the plot, only the categories shown on the axis as a reference. I am struggling with this.

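I don't know whether you are looking for something like this, but here I used the mpg demo data frame (loaded with the tidyverse) and plotted one bar per manufacturer, filled by model. Note that frequency(model) returns 1 for every row, so with stat = "identity" the stacked segments add up to the number of rows for each manufacturer.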
library(tidyverse)
data(mpg)
mpg
#> # A tibble: 234 x 11
#> manufacturer model displ year cyl trans drv cty hwy fl class
#> <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
#> 1 audi a4 1.8 1999 4 auto(l~ f 18 29 p comp~
#> 2 audi a4 1.8 1999 4 manual~ f 21 29 p comp~
#> 3 audi a4 2 2008 4 manual~ f 20 31 p comp~
#> 4 audi a4 2 2008 4 auto(a~ f 21 30 p comp~
#> 5 audi a4 2.8 1999 6 auto(l~ f 16 26 p comp~
#> 6 audi a4 2.8 1999 6 manual~ f 18 26 p comp~
#> 7 audi a4 3.1 2008 6 auto(a~ f 18 27 p comp~
#> 8 audi a4 quat~ 1.8 1999 4 manual~ 4 18 26 p comp~
#> 9 audi a4 quat~ 1.8 1999 4 auto(l~ 4 16 25 p comp~
#> 10 audi a4 quat~ 2 2008 4 manual~ 4 20 28 p comp~
#> # ... with 224 more rows
ggplot(mpg, aes(x = manufacturer, y = frequency(model),
fill = model)) + geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 45))
Created on 2021-09-03 by the reprex package (v2.0.1)
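For the data in the question itself, a minimal sketch might look like the following. The data frame here is a made-up stand-in for combined_Attributes (only the column names CatAge and Total_Expenditure come from the question), and it assumes the goal is one bar per age category with the category names on the x axis and no text labels. The constant colour goes outside aes(), and geom_text() is simply left out.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's combined_Attributes data frame;
# only the column names are taken from the question.
combined_Attributes <- data.frame(
  CatAge = c("18-29", "30-44", "45-59", "60+", "18-29", "30-44"),
  Total_Expenditure = c(120, 340, 410, 150, 80, 60)
)

# geom_col() is shorthand for geom_bar(stat = "identity").
# A fixed colour is set as a plain argument, not inside aes().
p <- ggplot(combined_Attributes,
            aes(x = factor(CatAge), y = Total_Expenditure)) +
  geom_col(fill = "#0073C2FF") +
  labs(x = "Age category", y = "Total expenditure")

p
```

Rows that share a category are stacked by default, so each bar's height is the total expenditure for that category.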

Related

I want to use a previously created function in the mutate() function. Yet R doesn't seem to want to let me [duplicate]

I am looking at population data and want to make sure I have enough observations to do county-level analysis. Therefore I would like to generate a variable that assigns each observation the number of observations with the same value in the "county" column.
I want to assign each row in my data frame ("cps") a new variable ("freq") which represents the frequency of its specific value in one specific variable ("county").
I used
f <- function(x)sum(with(cps, county==x))
to generate a function that tells me how often a given county x appears in the data.
Now I want to use
cps <- mutate(cps, freq=f(county))
to assign each row the number of times its county value appears in the data frame.
However, it assigns each row the overall number of observations.
You can get what you want using dplyr::add_count():
library(dplyr)
mpg %>% add_count(cyl, name = "freq")
# A tibble: 234 × 12
manufacturer model displ year cyl trans drv cty hwy fl class freq
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr> <int>
1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compact 81
2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compact 81
3 audi a4 2 2008 4 manual(m6) f 20 31 p compact 81
4 audi a4 2 2008 4 auto(av) f 21 30 p compact 81
5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compact 79
6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compact 79
7 audi a4 3.1 2008 6 auto(av) f 18 27 p compact 79
8 audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26 p compact 81
9 audi a4 quattro 1.8 1999 4 auto(l5) 4 16 25 p compact 81
10 audi a4 quattro 2 2008 4 manual(m6) 4 20 28 p compact 81
# … with 224 more rows
But if you wanted to use your function, you'd need to wrap it in sapply() (or purrr::map_int()) to compare each element of x against every element:
f <- function(x) sapply(x, \(x) sum(with(mpg, cyl == x)))
You can also generalize it to work with any column:
f2 <- function(x) sapply(x, \(x_i) sum(x == x_i))
mutate(mpg, freq=f2(drv))
# A tibble: 234 × 12
manufacturer model displ year cyl trans drv cty hwy fl class freq
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr> <int>
1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compact 106
2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compact 106
3 audi a4 2 2008 4 manual(m6) f 20 31 p compact 106
4 audi a4 2 2008 4 auto(av) f 21 30 p compact 106
5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compact 106
6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compact 106
7 audi a4 3.1 2008 6 auto(av) f 18 27 p compact 106
8 audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26 p compact 103
9 audi a4 quattro 1.8 1999 4 auto(l5) 4 16 25 p compact 103
10 audi a4 quattro 2 2008 4 manual(m6) 4 20 28 p compact 103
# … with 224 more rows
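To see why the original f() returns the total row count, here is a small base R sketch. The cps data frame below is invented for illustration. Because == is vectorised, passing the whole column compares each value with itself, so every comparison is TRUE and the sum is the number of rows. A table() lookup is one way to get per-row counts without looping.

```r
# Hypothetical miniature version of the asker's cps data
cps <- data.frame(
  county = c("A", "B", "A", "C", "A", "B"),
  stringsAsFactors = FALSE
)

f <- function(x) sum(with(cps, county == x))

# A single value is recycled against the whole column: counts the "A" rows
count_a <- f("A")

# The whole column is compared element by element with itself:
# every comparison is TRUE, so the result is the number of rows
count_all <- f(cps$county)

# Per-row counts: tabulate once, then look each row's county up by name
cps$freq <- as.integer(table(cps$county)[cps$county])
cps
```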

Find a specific string with grepl across all columns in R dplyr

In a huge data.frame I am trying to search all columns for a string using dplyr in R
I am unsure what I am doing wrong, but here is an example of what I am trying.
Let's say that I am trying in mpg to find audi, and audi exists in multiple columns, and I want to extract only the rows that contain audi.
The following does not work. Any ideas?
library(tidyverse)
head(mpg)
#> # A tibble: 6 × 11
#> manufacturer model displ year cyl trans drv cty hwy fl class
#> <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
#> 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compa…
#> 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compa…
#> 3 audi a4 2 2008 4 manual(m6) f 20 31 p compa…
#> 4 audi a4 2 2008 4 auto(av) f 21 30 p compa…
#> 5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compa…
#> 6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compa…
mpg |>
filter(if_all(.cols = everything(), ~grepl("audi",.)))
#> # A tibble: 0 × 11
#> # … with 11 variables: manufacturer <chr>, model <chr>, displ <dbl>,
#> # year <int>, cyl <int>, trans <chr>, drv <chr>, cty <int>, hwy <int>,
#> # fl <chr>, class <chr>
Created on 2022-09-09 with reprex v2.0.2
Here is a base R option:
library(ggplot2) # Load for mpg dataset
mpg[Reduce(`|`, lapply(mpg, grepl, pattern = "audi")),]
#> # A tibble: 18 × 11
#> manufacturer model displ year cyl trans drv cty hwy fl class
#> <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
#> 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
#> 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
#> 3 audi a4 2 2008 4 manu… f 20 31 p comp…
#> 4 audi a4 2 2008 4 auto… f 21 30 p comp…
#> 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
#> 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
#> 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
#> 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
#> 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
#> 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
#> 11 audi a4 quattro 2 2008 4 auto… 4 19 27 p comp…
#> 12 audi a4 quattro 2.8 1999 6 auto… 4 15 25 p comp…
#> 13 audi a4 quattro 2.8 1999 6 manu… 4 17 25 p comp…
#> 14 audi a4 quattro 3.1 2008 6 auto… 4 17 25 p comp…
#> 15 audi a4 quattro 3.1 2008 6 manu… 4 15 25 p comp…
#> 16 audi a6 quattro 2.8 1999 6 auto… 4 15 24 p mids…
#> 17 audi a6 quattro 3.1 2008 6 auto… 4 17 25 p mids…
#> 18 audi a6 quattro 4.2 2008 8 auto… 4 16 23 p mids…
Created on 2022-09-09 with reprex v2.0.2
Use if_any to keep a row if any of the columns (i.e. at least one among all) matches the pattern. With if_all, every column would have to match the pattern, which is why the original attempt returns zero rows.
mpg |>
filter(if_any(.cols = everything(), ~ grepl("audi", .)))
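The difference between if_any() and if_all() can be seen on a tiny example. The tibble below is made up for illustration; the third variant, which restricts the test to character columns with where(is.character), is an extra suggestion and not part of the answers above.

```r
library(dplyr)

# Hypothetical toy data: "audi" appears in some cells of a and b, never in n
df <- tibble(
  a = c("audi", "ford", "audi"),
  b = c("audi a4", "f150", "quattro"),
  n = c(1L, 2L, 3L)
)

# Keep rows where at least one column matches
any_match <- df |>
  filter(if_any(everything(), ~ grepl("audi", .x)))

# Keep rows where every column matches; n never does, so nothing is kept
all_match <- df |>
  filter(if_all(everything(), ~ grepl("audi", .x)))

# Keep rows where every character column matches
chr_match <- df |>
  filter(if_all(where(is.character), ~ grepl("audi", .x)))
```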

Dynamically selecting multiple columns for group_by

Data masking for group_by does not work when there is more than one grouping variable.
Pasting code below
grpByCols <- "model"
mpg%>%
group_by(.data[[grpByCols]])
grpByCols <- c("model", "manufacturer")
mpg%>%
group_by(.data[[grpByCols]])
The first group_by works, the second one fails.
Pasting the run output below
> grpByCols <- "model"
>
> mpg%>%
+ group_by(.data[[grpByCols]])
# A tibble: 234 x 11
# Groups: model [38]
manufacturer model displ year cyl trans drv cty hwy fl class
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
3 audi a4 2 2008 4 manual(m6) f 20 31 p compact
4 audi a4 2 2008 4 auto(av) f 21 30 p compact
5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compact
6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compact
7 audi a4 3.1 2008 6 auto(av) f 18 27 p compact
8 audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26 p compact
9 audi a4 quattro 1.8 1999 4 auto(l5) 4 16 25 p compact
10 audi a4 quattro 2 2008 4 manual(m6) 4 20 28 p compact
# … with 224 more rows
>
> grpByCols <- c("model", "manufacturer")
>
> mpg%>%
+ group_by(.data[[grpByCols]])
Error: Problem with `mutate()` input `..1`.
x Must subset the data pronoun with a string.
ℹ Input `..1` is `<unknown>`.
Run `rlang::last_error()` to see where the error occurred.
>
Please let me know if you have any ideas to make this work
A simple way is to use the across() function from dplyr.
mpg %>% group_by(across(all_of(grpByCols)))
# A tibble: 234 × 11
# Groups: model, manufacturer [38]
manufacturer model displ year cyl trans drv cty hwy fl class
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
3 audi a4 2 2008 4 manu… f 20 31 p comp…
4 audi a4 2 2008 4 auto… f 21 30 p comp…
5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
We could unquote the symbol with !!
grpByCols <- "model"
mpg%>%
group_by(!!sym(grpByCols))
manufacturer model displ year cyl trans drv cty hwy fl class
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
3 audi a4 2 2008 4 manual(m6) f 20 31 p compact
4 audi a4 2 2008 4 auto(av) f 21 30 p compact
5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compact
6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compact
7 audi a4 3.1 2008 6 auto(av) f 18 27 p compact
8 audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26 p compact
9 audi a4 quattro 1.8 1999 4 auto(l5) 4 16 25 p compact
10 audi a4 quattro 2 2008 4 manual(m6) 4 20 28 p compact
# ... with 224 more rows
You can use the following solution. rlang::syms takes strings as input and turns them into symbols. Since the output is a list of length 2 (matching the length of the input), we use the big bang operator !!! to splice the elements of the list, so that each one becomes a separate argument to group_by():
library(rlang)
grpByCols <- c("model", "manufacturer")
mpg %>%
group_by(!!!syms(grpByCols))
# A tibble: 234 x 11
# Groups: model, manufacturer [38]
manufacturer model displ year cyl trans drv cty hwy fl class
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
3 audi a4 2 2008 4 manual(m6) f 20 31 p compact
4 audi a4 2 2008 4 auto(av) f 21 30 p compact
5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compact
6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compact
7 audi a4 3.1 2008 6 auto(av) f 18 27 p compact
8 audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26 p compact
9 audi a4 quattro 1.8 1999 4 auto(l5) 4 16 25 p compact
10 audi a4 quattro 2 2008 4 manual(m6) 4 20 28 p compact
# ... with 224 more rows
Using cur_data() (deprecated as of dplyr 1.1.0 in favor of pick()):
library(dplyr)
mpg %>%
group_by(cur_data()[grpByCols])
Output:
# A tibble: 234 x 11
# Groups: model, manufacturer [38]
manufacturer model displ year cyl trans drv cty hwy fl class
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
3 audi a4 2 2008 4 manual(m6) f 20 31 p compact
4 audi a4 2 2008 4 auto(av) f 21 30 p compact
5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compact
6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compact
7 audi a4 3.1 2008 6 auto(av) f 18 27 p compact
8 audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26 p compact
9 audi a4 quattro 1.8 1999 4 auto(l5) 4 16 25 p compact
10 audi a4 quattro 2 2008 4 manual(m6) 4 20 28 p compact
# … with 224 more rows
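The error in the question arises because .data[[...]] accepts exactly one string, so a character vector of length two cannot be used there. As a rough end-to-end sketch of the across(all_of()) approach, the following groups a small invented tibble by a vector of column names and then summarises it. The names grp_cols and mean_hwy, and the data values, are chosen only for illustration.

```r
library(dplyr)

# Hypothetical toy data with the same column names as mpg
df <- tibble(
  model        = c("a4", "a4", "a6", "a4"),
  manufacturer = c("audi", "audi", "audi", "vw"),
  hwy          = c(29, 31, 24, 28)
)

# Grouping columns held as strings, e.g. coming from user input
grp_cols <- c("model", "manufacturer")

# all_of() errors if a name is missing instead of silently ignoring it
grouped <- df |>
  group_by(across(all_of(grp_cols)))

summary_tbl <- grouped |>
  summarise(mean_hwy = mean(hwy), .groups = "drop")

summary_tbl
```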

Boxplot by group and then column in r

How do I make a boxplot such that each group of boxes in the boxplot contains columns of variables from a dataframe.
For example using the mpg dataset:
head(mpg)
# A tibble: 234 x 11
manufacturer model displ year cyl trans drv cty hwy fl class
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
3 audi a4 2 2008 4 manual(m6) f 20 31 p compact
4 audi a4 2 2008 4 auto(av) f 21 30 p compact
5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compact
6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compact
7 audi a4 3.1 2008 6 auto(av) f 18 27 p compact
8 audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26 p compact
9 audi a4 quattro 1.8 1999 4 auto(l5) 4 16 25 p compact
10 audi a4 quattro 2 2008 4 manual(m6) 4 20 28 p compact
# ... with 224 more rows
So within each cyl group (4,5,6,8), I want to have boxplots for each variable/column cty,hwy, and displ.
Usually, one will set the fill in ggplot to be a factor variable but in this case, I have 3 variables.
It should look something like this:
You need to transform your data to long format on your three variables. Here is an example with data.table and its melt function, but you can easily do the same with tidyr:
library(ggplot2)
library(data.table)
mpg <- setDT(copy(mpg))
mpg_plot <- melt(mpg,measure.vars = c("cty","hwy","displ"),value.name = "val",variable.name = "var")
ggplot(mpg_plot, aes(x = as.factor(cyl),y = val,fill = var))+
geom_boxplot()+
theme_light()
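For readers who prefer tidyr, a rough equivalent of the melt() call could look like this. It is a sketch assuming tidyr 1.0 or later for pivot_longer(); the names mpg_long, var and val simply mirror the ones used in the data.table version.

```r
library(ggplot2)
library(tidyr)

# Reshape the three measured columns into name/value pairs:
# one row per original row and variable
mpg_long <- pivot_longer(
  ggplot2::mpg,
  cols = c(cty, hwy, displ),
  names_to = "var",
  values_to = "val"
)

# One group of boxes per cyl value, one box per variable within each group
p <- ggplot(mpg_long, aes(x = factor(cyl), y = val, fill = var)) +
  geom_boxplot() +
  theme_light()

p
```

Because displ is on a different scale from cty and hwy, its boxes will look compressed on a shared y axis.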

Unable to select

I want to select variables which are character and integer type using dplyr's select_if function. But the code below throws an error.
mpg %>% select_if(is.character | is.integer)
How do I solve this?
mpg %>% select_if(is.character) alone works well, how do I apply multiple conditions?
We could use the ~ as well
library(dplyr)
mpg %>%
select_if(~ is.character(.x)|is.integer(.x))
Or with inherits
mpg %>%
select_if(~ inherits(.x, c("character", "integer")))
One way would be to use an anonymous function
library(dplyr)
mpg %>% select_if(function(x) is.character(x) | is.integer(x))
# manufacturer model year cyl trans drv cty hwy fl class
# <chr> <chr> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
# 1 audi a4 1999 4 auto(l5) f 18 29 p compact
# 2 audi a4 1999 4 manual(m5) f 21 29 p compact
# 3 audi a4 2008 4 manual(m6) f 20 31 p compact
# 4 audi a4 2008 4 auto(av) f 21 30 p compact
# 5 audi a4 1999 6 auto(l5) f 16 26 p compact
# 6 audi a4 1999 6 manual(m5) f 18 26 p compact
# 7 audi a4 2008 6 auto(av) f 18 27 p compact
# 8 audi a4 quattro 1999 4 manual(m5) 4 18 26 p compact
# 9 audi a4 quattro 1999 4 auto(l5) 4 16 25 p compact
#10 audi a4 quattro 2008 4 manual(m6) 4 20 28 p compact
# … with 224 more rows
Or using funs() (deprecated since dplyr 0.8.0, so prefer the formula or anonymous-function forms above):
mpg %>% select_if(funs(is.character(.) | is.integer(.)))
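The original attempt fails because is.character | is.integer applies | to the two functions themselves rather than to their results. In current dplyr, select_if() is superseded by select() with where(), which is not shown in the answers above. A minimal sketch on an invented tibble, assuming dplyr 1.0.0 or later:

```r
library(dplyr)

# Hypothetical toy data with one column of each type
df <- tibble(
  name  = c("a", "b"),
  count = c(1L, 2L),
  ratio = c(0.5, 1.5),
  flag  = c(TRUE, FALSE)
)

# Combine two predicates as a union of selections
picked <- df |>
  select(where(is.character) | where(is.integer))

# Or test both conditions inside a single predicate
picked_lambda <- df |>
  select(where(~ is.character(.x) || is.integer(.x)))
```

Both forms keep name and count and drop the double and logical columns.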
