Extract top_n variables name for each group

For each group, I'm trying to collect the names of the top_n cars into a new column as a comma-separated string.
For example, the code below returns the top 2 cars by mpg within each group (cyl). Next, I want to extract those top two cars (or more, if there is a tie) and store them together in a new column called car_summary.
mtcars2 %>%
select(mpg, cyl, car_name) %>%
group_by(cyl) %>%
mutate(Score = rank(mpg, ties.method = "max")) %>%
arrange(desc(Score)) %>% top_n(2,Score)
The expected output looks like this:
cyl <- c(8, 4, 6)
car_summary <- c("Pontiac Firebird, Hornet Sportabout",
"Toyota Corolla, Fiat 128",
"Hornet 4 Drive, Mazda RX4, Mazda RX4 Wag")
data.frame(cyl, car_summary)
cyl car_summary
1 8 Pontiac Firebird, Hornet Sportabout
2 4 Toyota Corolla, Fiat 128
3 6 Hornet 4 Drive, Mazda RX4, Mazda RX4 Wag

You need toString() from base R to collapse the names in each group into one comma-separated string:
mtcars2 %>%
select(mpg, cyl, car_name) %>%
group_by(cyl) %>%
mutate(Score = rank(mpg, ties.method = "max")) %>%
arrange(desc(Score)) %>%
top_n(2,Score) %>%
summarize(car_summary = toString(car_name))
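
The same result can probably be reached without the intermediate Score column. The sketch below is not from the original answer: it assumes that mtcars2 is simply mtcars with its row names copied into a car_name column, and it swaps top_n() for slice_max(), which keeps ties at the cutoff when with_ties = TRUE.

```r
library(dplyr)

# Assumption: mtcars2 is mtcars plus a car_name column built from the row names.
mtcars2 <- data.frame(car_name = rownames(mtcars), mtcars,
                      stringsAsFactors = FALSE)

car_summary <- mtcars2 %>%
  group_by(cyl) %>%
  # keep the rows with the 2 highest mpg values per group, including ties
  slice_max(mpg, n = 2, with_ties = TRUE) %>%
  # collapse the remaining car names into one comma-separated string
  summarise(car_summary = toString(car_name))

car_summary
```

Because of the tie at mpg = 21.0, the cyl == 6 group ends up with three cars rather than two.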

Related

Dplyr: Conditionally rename multiple variables with regex by name

I need to rename multiple variables using a replacement dataframe. This replacement dataframe also contains regular expressions. I would like to use something similar to a previously proposed solution, e.g.
df %>% rename_with(~ newnames, all_of(oldnames))
MWE:
df <- mtcars[, 1:5]
# works without regex
replace_df_1 <- tibble::tibble(
old = df %>% colnames(),
new = df %>% colnames() %>% toupper()
)
df %>% rename_with(~ replace_df_1$new, all_of(replace_df_1$old))
# with regex
replace_df_2 <- tibble::tibble(
old = c("^m", "cyl101|cyl", "disp", "hp", "drat"),
new = df %>% colnames() %>% toupper()
)
old new
<chr> <chr>
1 ^m MPG
2 cyl101|cyl CYL
3 disp DISP
4 hp HP
5 drat DRAT
# does not work
df %>% rename_with(~ replace_df_2$new, all_of(replace_df_2$old))
df %>% rename_with(~ matches(replace_df_2$new), all_of(replace_df_2$old))
EDIT 1:
The solution by @Maël works in general, but there seems to be an indexing issue. Consider the following example:
replace_df_2 <- tibble::tibble(
old = c("xxxx", "cyl101|cyl", "yyy", "xxx", "yyy"),
new = mtcars[,1:5] %>% colnames() %>% toupper()
)
mtcars[, 1:5] %>%
rename_with(~ replace_df_2$new, matches(replace_df_2$old))
Results in
mpg MPG disp hp drat
<dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9
meaning that the rename_with function correctly finds the column, but replaces it with the first item in the replacement column. How can we tell the function to take the respective row where a replacement has been found?
So in this example (edit 1), I only want to substitute the second column with "CYL", the rest should be left untouched. The problem is that the function takes the first replacement (MPG) instead of the second (CYL).
Thank you for any hints!

matches should be on the regex-y column:
df %>%
rename_with(~ replace_df_2$new, matches(replace_df_2$old))
MPG CYL DISP HP DRAT
Mazda RX4 21.0 6 160.0 110 3.90
Mazda RX4 Wag 21.0 6 160.0 110 3.90
Datsun 710 22.8 4 108.0 93 3.85
Hornet 4 Drive 21.4 6 258.0 110 3.08
Hornet Sportabout 18.7 8 360.0 175 3.15
Valiant 18.1 6 225.0 105 2.76
#...

If the task is simply to set all col names to upper-case, then this works:
sub("^(.+)$", "\\U\\1", colnames(df), perl = TRUE)
[1] "MPG" "CYL" "DISP" "HP" "DRAT"
In dplyr (rename_with passes the selected column names to the function as .x):
df %>%
rename_with(~ sub("^(.+)$", "\\U\\1", .x, perl = TRUE))

I found a solution using an idea from a related question on non-standard evaluation and @Maël's answer.
Using map_lgl we create a logical vector that is TRUE for each pattern in replace_df_2$old that matches at least one column name of df. We then use this logical vector to subset replace_df_2$new and get the correct replacement.
library(dplyr)
library(purrr)
library(stringr)
df <- mtcars[, 1:5]
df %>%
rename_with(.fn = ~ replace_df_2$new[map_lgl(replace_df_2$old, ~ any(str_detect(names(df), .x)))],
.cols = matches(replace_df_2$old))
Result:
mpg CYL disp hp drat
Mazda RX4 21.0 6 160.0 110 3.90
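
A possible alternative, sketched here rather than taken from any of the answers, is to resolve the lookup per column name instead of per pattern. The helper rename_by_regex is a hypothetical name. It returns the replacement of the first pattern that matches a column name and leaves unmatched names alone, so the result does not depend on the order of the matched columns.

```r
library(dplyr)

df <- mtcars[, 1:5]
replace_df_2 <- data.frame(
  old = c("xxxx", "cyl101|cyl", "yyy", "xxx", "yyy"),
  new = toupper(colnames(df)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each column name, return the `new` value of the
# first pattern in `old` that matches it; unmatched names stay unchanged.
rename_by_regex <- function(nms, lookup) {
  vapply(nms, function(nm) {
    hit <- which(vapply(lookup$old, grepl, logical(1), x = nm))
    if (length(hit) > 0) lookup$new[hit[1]] else nm
  }, character(1), USE.NAMES = FALSE)
}

renamed <- df %>% rename_with(~ rename_by_regex(.x, replace_df_2))
colnames(renamed)
```

With the lookup table from EDIT 1, only cyl matches a pattern, so only that column is renamed.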

How to rename a variable with spaces in the name dynamically in dplyr?

I want to rename a variable in my dataframe using dplyr so that the new name contains spaces, where the name is a concatenation of a dynamic variable and a static string. In the following example, I need "Test1" to come from a variable:
df <- mtcars %>% select(`Test1 mpg` = "mpg")
So when I try this, I end up with an error:
var <- "Test1"
df <- mtcars %>% select(paste0(var, " mpg") = "mpg")
How could I go about making those new variable names dynamic?

Using the special assignment operator := you could do:
library(dplyr)
df <- mtcars %>% select(`Test1 mpg` = "mpg")
var <- "Test1"
mtcars %>%
select("{var} mpg" := "mpg")
#> Test1 mpg
#> Mazda RX4 21.0
#> Mazda RX4 Wag 21.0
#> Datsun 710 22.8
#> Hornet 4 Drive 21.4
or using !!sym():
mtcars %>%
select(!!sym(paste0(var, " mpg")) := "mpg")
#> Test1 mpg
#> Mazda RX4 21.0
#> Mazda RX4 Wag 21.0
#> Datsun 710 22.8
#> Hornet 4 Drive 21.4

Adding different numbers to different columns of a dataframe

Suppose I am working with the mtcars dataset, and I would like to add:
1 to all values in the column: mpg
2 to all values in the column: cyl
3 to all values in the column: disp
I would like to keep all columns in mtcars, and refer to the columns by their names rather than their index.
Here's my current attempt:
library("tidyverse")
library("rlang")
data(mtcars)
mtcars_colnames <- quo(c("mpg", "cyl", "disp"))
num <- c(1, 2, 3)
mtcars %>% mutate(across(!!! mtcars_colnames, function(x) {x + num[col(.)]}))
I'm stuck on how to dynamically add (1,2,3) to columns (mpg, cyl, disp).
Thanks in advance.

We could change the input by passing a plain character vector instead of quosures, and a named vector for 'num'. Inside across, cur_column() returns the name of the current column, which we use to look up the corresponding value in 'num' and add it:
library(dplyr)
mtcars_colnames <- c("mpg", "cyl", "disp")
num <- setNames(c(1, 2, 3), mtcars_colnames)
mtcars1 <- mtcars %>%
mutate(across(all_of(mtcars_colnames), ~ num[cur_column()] + .))
Check the output:
# // old data
mtcars %>%
select(all_of(mtcars_colnames)) %>%
slice_head(n = 5)
# mpg cyl disp
#Mazda RX4 21.0 6 160
#Mazda RX4 Wag 21.0 6 160
#Datsun 710 22.8 4 108
#Hornet 4 Drive 21.4 6 258
#Hornet Sportabout 18.7 8 360
# // new data
mtcars1 %>%
select(all_of(mtcars_colnames)) %>%
slice_head(n = 5)
# mpg cyl disp
#Mazda RX4 22.0 8 163
#Mazda RX4 Wag 22.0 8 163
#Datsun 710 23.8 6 111
#Hornet 4 Drive 22.4 8 261
#Hornet Sportabout 19.7 10 363
Or, if we prefer to pass an unnamed 'num' vector, match cur_column() against 'mtcars_colnames' inside across to get the index, then use that index to subset 'num':
mtcars1 <- mtcars %>%
mutate(across(all_of(mtcars_colnames),
~ num[match(cur_column(), mtcars_colnames)] + .))

Here are 3 base R approaches:
mtcars_colnames <- c("mpg", "cyl", "disp")
num <- c(1, 2, 3)
df <- mtcars
#option 1
df[mtcars_colnames] <- sweep(df[mtcars_colnames], 2, num, `+`)
#option 2
df[mtcars_colnames] <- Map(`+`, df[mtcars_colnames], num)
#option 3
df[mtcars_colnames] <- t(t(df[mtcars_colnames]) + num)

Using the dot operator in dplyr::bind_cols

I'm seeing some unexpected behavior with dplyr. I have a specific use case, but I will set up a dummy problem to illustrate my point. Why does this work,
library(dplyr)
temp <- bind_cols(mtcars %>% select(-mpg), mtcars %>% select(mpg))
head(temp)
cyl disp hp drat wt qsec vs am gear carb mpg
6 160.0 110 3.90 2.620 16.46 0 1 4 4 21.0
6 160.0 110 3.90 2.875 17.02 0 1 4 4 21.0
But not this,
library(dplyr)
temp <- mtcars %>% bind_cols(. %>% select(-mpg), . %>% select(mpg))
Error in cbind_all(x) : Argument 2 must be length 1, not 32
Thanks for the help.

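Inside a function call, . %>% select(-mpg) does not mean "mtcars piped into select"; a pipeline that starts with . defines a function. In addition, because . is not a direct argument of bind_cols here, %>% still passes mtcars as the first argument. Wrapping the call in {} stops that automatic insertion and lets you use . wherever you need it: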
library(dplyr)
temp1 = mtcars %>% {bind_cols(select(., -mpg), select(., mpg))}
temp2 = bind_cols(mtcars %>% select(-mpg), mtcars %>% select(mpg))
# > identical(temp1, temp2)
# [1] TRUE
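
To see why the original call fails, it may help to check what . %>% select(mpg) evaluates to on its own. A minimal sketch (f is just an illustrative name):

```r
library(dplyr)

# A pipeline whose left-hand side is the bare placeholder `.` is not evaluated
# immediately; magrittr turns it into a function of one argument.
f <- . %>% select(mpg)

is.function(f)

# Calling that function on mtcars gives the same result as the direct call.
identical(f(mtcars), select(mtcars, mpg))
```

So in the failing example bind_cols() receives mtcars followed by two functions, not two data frames.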

Another solution:
myfun <- function(x) {
bind_cols(x %>% select(-mpg), x %>% select(mpg))
}
temp <- mtcars %>% myfun

select and rename stored in variable

I have several similar data frames with many columns in common. I would like to select and rename a subset of those columns from any table.
library(tidyverse)
mtcars %>%
select(my_mpg = mpg,
cylinders = cyl,
gear)
Is it possible to do something like
my_select_rename <- c("my_mpg"="mpg","cylinders"="cyl","gear")
mtcars %>%
select_(.dots = my_select_rename)
but using the tidyeval framework instead?

I think you want:
my_select <- c("mpg","cyl","gear")
my_select_rename <- c("my_mpg","cylinders","gear")
mtcars %>%
select_at(vars(my_select)) %>%
setNames(., my_select_rename)
my_mpg cylinders gear
Mazda RX4 21.0 6 4
Mazda RX4 Wag 21.0 6 4
Datsun 710 22.8 4 4
Hornet 4 Drive 21.4 6 3
Hornet Sportabout 18.7 8 3

lionel's answer to the question "group_by by a vector of characters using tidy evaluation semantics" provides the solution:
mtcars %>%
select(!!! rlang::syms(my_select_rename))
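
As a sketch of the same idea without splicing symbols, recent tidyselect versions also accept a named character vector through all_of() and use the names as the new column names. The example below assumes a reasonably current dplyr and a fully named vector (unlike the vector in the question, where "gear" is unnamed).

```r
library(dplyr)

# Assumption: every element is named; the name is the new column name,
# the value is the existing column name.
my_select_rename <- c(my_mpg = "mpg", cylinders = "cyl", gear = "gear")

out <- mtcars %>%
  select(all_of(my_select_rename))

names(out)
```

Because the mapping is an ordinary character vector, it can be stored once and reused across the similar data frames mentioned in the question.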
