I am working on a project where I created a function to edit column names of a given df:
fix_names <- function(a, b, c) {
if (is.data.frame(a) == TRUE & is.character(b) == TRUE & is.character(c) == TRUE) {
str_replace_all(colnames(a), pattern = b, replacement = c)
} else {
return("invalid inputs")
}
}
And then I have a column, data, that contains four data frames. I am trying to rename the columns of all the data frames in data using my function above inside of a map function. It's successful in fixing the names, but I cannot figure out how to apply it to the df since the output is a list and the data frames are nested. Here's what I have:
map(.x = df$data, ~fix_names(., "OldName", "NewName"))
Thank you!
Edit: adding example df using mtcars
data(mtcars)
mtcars %>%
group_by(cyl) %>%
nest() -> nestMtcars
map(.x = nestMtcars$data, ~fix_names(., "mpg", "MPG"))
You could transpose the nested list, run map over it, and then transpose it back to its original form:
library(stringr)
library(purrr)
fix_names <- function(a, b, c) {
  # && short-circuits and suits the length-one conditions that if() expects
  if (is.data.frame(a) && is.character(b) && is.character(c)) {
    # assign the new names back to the data frame, then return the data frame
    colnames(a) <- str_replace_all(colnames(a), pattern = b, replacement = c)
    a
  } else {
    return("invalid inputs")
  }
}
nestMtcars %>% transpose %>%
  map(~{.x$data <- fix_names(.x$data, "mpg", "MPG"); .x}) %>%
  transpose
$cyl
$cyl[[1]]
[1] 6
$cyl[[2]]
[1] 4
$cyl[[3]]
[1] 8
$data
$data[[1]]
# A tibble: 7 x 10
MPG disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 160 110 3.9 2.62 16.5 0 1 4 4
2 21 160 110 3.9 2.88 17.0 0 1 4 4
3 21.4 258 110 3.08 3.22 19.4 1 0 3 1
4 18.1 225 105 2.76 3.46 20.2 1 0 3 1
5 19.2 168. 123 3.92 3.44 18.3 1 0 4 4
6 17.8 168. 123 3.92 3.44 18.9 1 0 4 4
7 19.7 145 175 3.62 2.77 15.5 0 1 5 6
$data[[2]]
# A tibble: 11 x 10
MPG disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 22.8 108 93 3.85 2.32 18.6 1 1 4 1
2 24.4 147. 62 3.69 3.19 20 1 0 4 2
3 22.8 141. 95 3.92 3.15 22.9 1 0 4 2
4 32.4 78.7 66 4.08 2.2 19.5 1 1 4 1
5 30.4 75.7 52 4.93 1.62 18.5 1 1 4 2
6 33.9 71.1 65 4.22 1.84 19.9 1 1 4 1
7 21.5 120. 97 3.7 2.46 20.0 1 0 3 1
8 27.3 79 66 4.08 1.94 18.9 1 1 4 1
9 26 120. 91 4.43 2.14 16.7 0 1 5 2
10 30.4 95.1 113 3.77 1.51 16.9 1 1 5 2
11 21.4 121 109 4.11 2.78 18.6 1 1 4 2
$data[[3]]
# A tibble: 14 x 10
MPG disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 18.7 360 175 3.15 3.44 17.0 0 0 3 2
2 14.3 360 245 3.21 3.57 15.8 0 0 3 4
3 16.4 276. 180 3.07 4.07 17.4 0 0 3 3
4 17.3 276. 180 3.07 3.73 17.6 0 0 3 3
5 15.2 276. 180 3.07 3.78 18 0 0 3 3
6 10.4 472 205 2.93 5.25 18.0 0 0 3 4
7 10.4 460 215 3 5.42 17.8 0 0 3 4
8 14.7 440 230 3.23 5.34 17.4 0 0 3 4
9 15.5 318 150 2.76 3.52 16.9 0 0 3 2
10 15.2 304 150 3.15 3.44 17.3 0 0 3 2
11 13.3 350 245 3.73 3.84 15.4 0 0 3 4
12 19.2 400 175 3.08 3.84 17.0 0 0 3 2
13 15.8 351 264 4.22 3.17 14.5 0 1 5 4
14 15 301 335 3.54 3.57 14.6 0 1 5 8
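If the goal is to keep the nested tibble rather than end up with a plain list, one alternative (not part of the answer above) is to overwrite the list-column in place with mutate() and map(). This is only a sketch, assuming dplyr, tidyr, purrr and stringr are installed. The fix_names variant below is my own assumption: it uses stopifnot() to raise an error on bad input instead of returning a string.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)

# Assumed variant of fix_names: rename the columns, then return the data frame.
# stopifnot() raises an error on invalid input instead of returning a string.
fix_names <- function(a, b, c) {
  stopifnot(is.data.frame(a), is.character(b), is.character(c))
  colnames(a) <- str_replace_all(colnames(a), pattern = b, replacement = c)
  a
}

nestMtcars <- mtcars %>%
  group_by(cyl) %>%
  nest()

# Replace the list-column element by element; the result is still a nested tibble
renamed <- nestMtcars %>%
  ungroup() %>%
  mutate(data = map(data, ~ fix_names(.x, "mpg", "MPG")))

names(renamed$data[[1]])
```

Because the renamed data frames are written back into the data column, cyl stays alongside them and no transposing is needed.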
Thanks so much for any help. I have read tons of answers but I can't seem to figure this out for my specific case. I'm trying to use mutate() with another function to create a new row with the means of each column, should that column contain numeric variables. So far, I've only been able to add a column, which is not what I want. I tried the following:
x <- y %>%
mutate(Total = colMeans(select_if(., is.numeric), na.rm = TRUE)) %>%
head
This only added a column with the means, instead of a row.
How can I add a row called "Means" with the mean of each column? Thank you so much.
One way is to summarize and then bind_rows with the original data. I'll use mtcars with the rownames augmented.
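library(dplyr)   # summarize, across, mutate, bind_rows, arrange
library(tibble)  # rownames_to_column
mt <- rownames_to_column(mtcars)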
mt %>%
group_by(cyl) %>%
summarize(across(-rowname, mean)) %>%
mutate(rowname = "Means") %>%
bind_rows(mt) %>%
arrange(cyl, rowname != "Means") %>%
print(n=99)
# # A tibble: 35 x 12
# cyl mpg disp hp drat wt qsec vs am gear carb rowname
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
# 1 4 26.7 105. 82.6 4.07 2.29 19.1 0.909 0.727 4.09 1.55 Means
# 2 4 22.8 108 93 3.85 2.32 18.6 1 1 4 1 Datsun 710
# 3 4 24.4 147. 62 3.69 3.19 20 1 0 4 2 Merc 240D
# 4 4 22.8 141. 95 3.92 3.15 22.9 1 0 4 2 Merc 230
# 5 4 32.4 78.7 66 4.08 2.2 19.5 1 1 4 1 Fiat 128
# 6 4 30.4 75.7 52 4.93 1.62 18.5 1 1 4 2 Honda Civic
# 7 4 33.9 71.1 65 4.22 1.84 19.9 1 1 4 1 Toyota Corolla
# 8 4 21.5 120. 97 3.7 2.46 20.0 1 0 3 1 Toyota Corona
# 9 4 27.3 79 66 4.08 1.94 18.9 1 1 4 1 Fiat X1-9
# 10 4 26 120. 91 4.43 2.14 16.7 0 1 5 2 Porsche 914-2
# 11 4 30.4 95.1 113 3.77 1.51 16.9 1 1 5 2 Lotus Europa
# 12 4 21.4 121 109 4.11 2.78 18.6 1 1 4 2 Volvo 142E
# 13 6 19.7 183. 122. 3.59 3.12 18.0 0.571 0.429 3.86 3.43 Means
# 14 6 21 160 110 3.9 2.62 16.5 0 1 4 4 Mazda RX4
# 15 6 21 160 110 3.9 2.88 17.0 0 1 4 4 Mazda RX4 Wag
# 16 6 21.4 258 110 3.08 3.22 19.4 1 0 3 1 Hornet 4 Drive
# 17 6 18.1 225 105 2.76 3.46 20.2 1 0 3 1 Valiant
# 18 6 19.2 168. 123 3.92 3.44 18.3 1 0 4 4 Merc 280
# 19 6 17.8 168. 123 3.92 3.44 18.9 1 0 4 4 Merc 280C
# 20 6 19.7 145 175 3.62 2.77 15.5 0 1 5 6 Ferrari Dino
# 21 8 15.1 353. 209. 3.23 4.00 16.8 0 0.143 3.29 3.5 Means
# 22 8 18.7 360 175 3.15 3.44 17.0 0 0 3 2 Hornet Sportabout
# 23 8 14.3 360 245 3.21 3.57 15.8 0 0 3 4 Duster 360
# 24 8 16.4 276. 180 3.07 4.07 17.4 0 0 3 3 Merc 450SE
# 25 8 17.3 276. 180 3.07 3.73 17.6 0 0 3 3 Merc 450SL
# 26 8 15.2 276. 180 3.07 3.78 18 0 0 3 3 Merc 450SLC
# 27 8 10.4 472 205 2.93 5.25 18.0 0 0 3 4 Cadillac Fleetwood
# 28 8 10.4 460 215 3 5.42 17.8 0 0 3 4 Lincoln Continental
# 29 8 14.7 440 230 3.23 5.34 17.4 0 0 3 4 Chrysler Imperial
# 30 8 15.5 318 150 2.76 3.52 16.9 0 0 3 2 Dodge Challenger
# 31 8 15.2 304 150 3.15 3.44 17.3 0 0 3 2 AMC Javelin
# 32 8 13.3 350 245 3.73 3.84 15.4 0 0 3 4 Camaro Z28
# 33 8 19.2 400 175 3.08 3.84 17.0 0 0 3 2 Pontiac Firebird
# 34 8 15.8 351 264 4.22 3.17 14.5 0 1 5 4 Ford Pantera L
# 35 8 15 301 335 3.54 3.57 14.6 0 1 5 8 Maserati Bora
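The answer above computes the means per cyl group. If a single "Means" row over the whole table is wanted, as the question seems to ask, the same summarize-then-bind idea works without the grouping. This is a minimal sketch, assuming dplyr (1.0.0 or later, for across()) and tibble are installed; the names means_row and with_means are illustrative.

```r
library(dplyr)
library(tibble)

mt <- rownames_to_column(mtcars)

# One summary row holding the mean of every numeric column
means_row <- mt %>%
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>%
  mutate(rowname = "Means")

# Append the summary row below the original rows
with_means <- bind_rows(mt, means_row)
tail(with_means, 3)
```

bind_rows() matches columns by name, so the summary row lines up with the original columns even though rowname was added last.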
The following code should sort the column hp in descending order, but it fails.
Could someone please point out the problem?
data(mtcars)
result = dplyr::arrange( mtcars, !! rlang::expr("desc(hp)") )
head(result)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Additional note
Using str2lang() instead of rlang::expr() works fine. Could someone explain the reason?
data(mtcars)
result = dplyr::arrange( mtcars, !! str2lang("desc(hp)") )
mpg cyl disp hp drat wt qsec vs am gear carb
Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
Ford Pantera L 15.8 8 351 264 4.22 3.170 14.50 0 1 5 4
Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
To explain the problem, we need to look at what rlang::expr does.
It captures its argument without evaluating it. A string literal is already a constant, so when you write rlang::expr("desc(hp)")
the result is just that string, not a call to desc():
# setup:
mtcars <- dplyr::as_tibble(mtcars)
eval(rlang::expr("desc(hp)"))
#> [1] "desc(hp)"
So the original call is no different from sorting by a constant string, which leaves the row order unchanged:
dplyr::arrange(mtcars, "desc(hp)")
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ... with 22 more rows
To make it work with rlang::expr, we need to pass an actual expression rather than a string as the argument:
dplyr::arrange(mtcars, !! rlang::expr(desc(hp)))
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 15 8 301 335 3.54 3.57 14.6 0 1 5 8
#> 2 15.8 8 351 264 4.22 3.17 14.5 0 1 5 4
#> 3 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 4 13.3 8 350 245 3.73 3.84 15.4 0 0 3 4
#> 5 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
#> 6 10.4 8 460 215 3 5.42 17.8 0 0 3 4
#> 7 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4
#> 8 16.4 8 276. 180 3.07 4.07 17.4 0 0 3 3
#> 9 17.3 8 276. 180 3.07 3.73 17.6 0 0 3 3
#> 10 15.2 8 276. 180 3.07 3.78 18 0 0 3 3
#> # ... with 22 more rows
Since you want to start from a string, we can use rlang::parse_expr,
as you already posted in your own answer:
dplyr::arrange( mtcars, !! rlang::parse_expr("desc(hp)"))
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 15 8 301 335 3.54 3.57 14.6 0 1 5 8
#> 2 15.8 8 351 264 4.22 3.17 14.5 0 1 5 4
#> 3 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 4 13.3 8 350 245 3.73 3.84 15.4 0 0 3 4
#> 5 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
#> 6 10.4 8 460 215 3 5.42 17.8 0 0 3 4
#> 7 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4
#> 8 16.4 8 276. 180 3.07 4.07 17.4 0 0 3 3
#> 9 17.3 8 276. 180 3.07 3.73 17.6 0 0 3 3
#> 10 15.2 8 276. 180 3.07 3.78 18 0 0 3 3
#> # ... with 22 more rows
Created on 2021-11-03 by the reprex package (v2.0.1)
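The difference between the three functions can be checked directly by inspecting what each one returns. This is a small sketch, assuming rlang is installed and R is 3.6.0 or later (for str2lang); the variable names are illustrative.

```r
library(rlang)

quoted_string <- expr("desc(hp)")        # a string literal is returned unchanged
parsed_call   <- parse_expr("desc(hp)")  # the text is parsed into a call
base_call     <- str2lang("desc(hp)")    # base R counterpart of parse_expr

is.character(quoted_string)  # TRUE: still just a string
is.call(parsed_call)         # TRUE: an unevaluated call to desc()
is.call(base_call)           # TRUE: which is why str2lang() also works
```

Only the call objects are useful after !!, because arrange() then evaluates desc(hp) against the data instead of sorting by a constant.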
This is my misunderstanding.
What I should have used is rlang::parse_expr(), not rlang::expr(). The following code works.
data(mtcars)
result = dplyr::arrange( mtcars, !! rlang::parse_expr("desc(hp)") )
This question already has answers here:
Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
I know this question has answers in multiple places, but I am unable to figure out where I am going wrong. Suppose I want to find the sum of hp for each group in cyl:
mtcars%>%
group_by(cyl) %>%
mutate(
sum_hp = sum(hp)
)
sum_hp is giving me 4694 for every value. I want the sum for each value of cyl.
It could be a case of plyr::mutate masking dplyr::mutate when both packages are loaded. We can write dplyr::<functionname> explicitly to correct this:
library(dplyr)
mtcars%>%
group_by(cyl) %>%
dplyr::mutate(sum_hp = sum(hp))
# A tibble: 32 x 12
# Groups: cyl [3]
# mpg cyl disp hp drat wt qsec vs am gear carb sum_hp
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 856
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 856
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 909
# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 856
# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 2929
# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 856
# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 2929
# 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 909
# 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 909
#10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 856
# … with 22 more rows
If we use plyr::mutate, the OP's output can be reproduced:
mtcars%>%
group_by(cyl) %>%
plyr::mutate(
sum_hp = sum(hp)
)
# A tibble: 32 x 12
# Groups: cyl [3]
# mpg cyl disp hp drat wt qsec vs am gear carb sum_hp
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 4694
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 4694
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 4694
# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 4694
# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 4694
# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 4694
# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 4694
# 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 4694
# 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 4694
#10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 4694
# … with 22 more rows
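To find out whether masking is actually the cause in a given session, one option is to ask R which namespace the unqualified mutate resolves to. This is a sketch, assuming dplyr is installed; which_pkg is a hypothetical helper written for this example, not a function from either package.

```r
library(dplyr)

# Hypothetical helper: report which package's namespace a function comes from
which_pkg <- function(f) environmentName(environment(f))

# "plyr" here would mean that dplyr::mutate is masked in this session
which_pkg(mutate)

# Namespacing the call sidesteps masking regardless of load order
grouped <- mtcars %>%
  group_by(cyl) %>%
  dplyr::mutate(sum_hp = sum(hp)) %>%
  ungroup()

distinct(grouped, cyl, sum_hp)
```

Loading plyr before dplyr, or not attaching plyr at all and calling its functions as plyr::name, avoids the conflict in the first place.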
I am using the mtcars built-in dataset. My code is as follows:
data("mtcars")
a <- mtcars %>%
group_by(cyl) %>%
arrange(hp)
The output that I get:
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2
2 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
3 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1
4 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1
5 27.3 4 79 66 4.08 1.94 18.9 1 1 4 1
6 26 4 120. 91 4.43 2.14 16.7 0 1 5 2
7 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
8 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
9 21.5 4 120. 97 3.7 2.46 20.0 1 0 3 1
10 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
11 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
12 21 6 160 110 3.9 2.62 16.5 0 1 4 4
13 21 6 160 110 3.9 2.88 17.0 0 1 4 4
14 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
15 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2
16 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
17 17.8 6 168. 123 3.92 3.44 18.9 1 0 4 4
18 15.5 8 318 150 2.76 3.52 16.9 0 0 3 2
19 15.2 8 304 150 3.15 3.44 17.3 0 0 3 2
20 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
21 19.2 8 400 175 3.08 3.84 17.0 0 0 3 2
22 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
23 16.4 8 276. 180 3.07 4.07 17.4 0 0 3 3
24 17.3 8 276. 180 3.07 3.73 17.6 0 0 3 3
25 15.2 8 276. 180 3.07 3.78 18 0 0 3 3
26 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4
27 10.4 8 460 215 3 5.42 17.8 0 0 3 4
28 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
29 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
30 13.3 8 350 245 3.73 3.84 15.4 0 0 3 4
31 15.8 8 351 264 4.22 3.17 14.5 0 1 5 4
32 15 8 301 335 3.54 3.57 14.6 0 1 5 8
As you can see, group_by appears to be redundant in this output: my data is only arranged by the "hp" column. I don't understand what I am doing wrong. I want to see everything grouped by the "cyl" column and then arranged by "hp".
Grouping isn't really related to sorting. Also, group_by isn't redundant (in the sense of being completely ignored), as the second line of the printed tibble is
# Groups: cyl [3]
To see that group_by doesn't do sorting, just try
mtcars %>% group_by(cyl) %>% print(n = Inf)
Hence, what you want is first to arrange by cyl and then by hp:
mtcars %>% arrange(cyl, hp)
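If the grouping should be kept for later steps, arrange() also accepts .by_group = TRUE, which sorts by the grouping variables first. This is a short sketch, assuming a dplyr version that has the .by_group argument; the names by_group and explicit are illustrative.

```r
library(dplyr)

# Keep the grouping and let arrange() sort by cyl first, then by hp
by_group <- mtcars %>%
  group_by(cyl) %>%
  arrange(hp, .by_group = TRUE)

# Same row order, written out explicitly and without grouping
explicit <- mtcars %>%
  arrange(cyl, hp)

head(by_group)
```

Both give the same row order; the first result is still a grouped tibble, the second is a plain data frame.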
I have a large sample of healthcare data called oct:
Providers ID date ICD
Billy 4504 9/11 f.11
Billy 5090 9/10 r.05
Max 4430 9/01 k.11
Mindy 0812 9/30 f.11
etc.
I want a random sample of ID numbers for each provider. I have tried:
review <- oct %>% group_by(Providers) %>% do (sample(oct$ID, size = 5, replace= FALSE, prob = NULL))
Here is an example using dplyr::sample_n on mtcars, with cyl standing in for Providers:
library(dplyr)
set.seed(1)
mtcars %>% group_by(cyl) %>% sample_n(3)
# A tibble: 9 x 11
# Groups: cyl [3]
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
2 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1
3 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1
4 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
5 21 6 160 110 3.9 2.88 17.0 0 1 4 4
6 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
7 15 8 301 335 3.54 3.57 14.6 0 1 5 8
8 15.5 8 318 150 2.76 3.52 16.9 0 0 3 2
9 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
If you'd like to select just one variable (ID in your question; mpg stands in for it here):
set.seed(1)
mtcars %>%
group_by(cyl) %>%
sample_n(3) %>%
pull(mpg)
[1] 22.8 32.4 33.9 19.7 21.0 19.2 15.0 15.5 14.7
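In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(), which works the same way on grouped data. The sketch below applies it to a toy stand-in for oct; the tibble is assembled from the four rows shown in the question, and n = 1 is an assumption chosen because two of the providers have only one row in that excerpt.

```r
library(dplyr)

# Toy stand-in for `oct`, built from the four rows shown in the question
oct <- tibble(
  Providers = c("Billy", "Billy", "Max", "Mindy"),
  ID        = c("4504", "5090", "4430", "0812"),
  date      = c("9/11", "9/10", "9/01", "9/30"),
  ICD       = c("f.11", "r.05", "k.11", "f.11")
)

set.seed(1)
review <- oct %>%
  group_by(Providers) %>%
  slice_sample(n = 1) %>%   # one randomly chosen row per provider
  ungroup() %>%
  select(Providers, ID)

review
```

On the full data, n = 5 would match the sample size in the original attempt. ID is stored as character here so that the leading zero in 0812 is preserved.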