How to use rowSums() in "dplyr" when including missing data? - r

I want to use rowSums() inside dplyr and ran into difficulties with missing data. The example data is mtcars. The objective is to compute the row-wise sum of three variables: mpg, cyl and disp. However, the results seem incorrect with the following R code when a row contains missing values (see variables new1 and new2 in the output). Any comments and suggestions are appreciated!
library(dplyr)
data<-mtcars%>%
mutate(
mpg=case_when(mpg>25~NA_real_,TRUE~as.numeric(mpg)), # generate missing data in "mpg"
new1=rowSums(.[c("mpg","cyl","disp")],na.rm=FALSE), # method1: row sum, treat NA as NA?
new2=rowSums(.[c("mpg","cyl","disp")],na.rm=TRUE), # method2: row sum, treat NA as zero?
new3=mpg+cyl+disp # method3: row sum, by hand
)
data
The output is listed below:
mpg cyl disp hp drat wt qsec vs am gear carb new1 new2 new3
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 187.0 187.0 187.0
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 187.0 187.0 187.0
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 134.8 134.8 134.8
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 285.4 285.4 285.4
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 386.7 386.7 386.7
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 249.1 249.1 249.1
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 382.3 382.3 382.3
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 175.1 175.1 175.1
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 167.6 167.6 167.6
10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 192.8 192.8 192.8
11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 191.4 191.4 191.4
12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 300.2 300.2 300.2
13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 301.1 301.1 301.1
14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 299.0 299.0 299.0
15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 490.4 490.4 490.4
16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 478.4 478.4 478.4
17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 462.7 462.7 462.7
18 NA 4 78.7 66 4.08 2.200 19.47 1 1 4 1 115.1 115.1 NA
19 NA 4 75.7 52 4.93 1.615 18.52 1 1 4 2 110.1 110.1 NA
20 NA 4 71.1 65 4.22 1.835 19.90 1 1 4 1 109.0 109.0 NA
21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 145.6 145.6 145.6
22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 341.5 341.5 341.5
23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 327.2 327.2 327.2
24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 371.3 371.3 371.3
25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 427.2 427.2 427.2
26 NA 4 79.0 66 4.08 1.935 18.90 1 1 4 1 110.3 110.3 NA
27 NA 4 120.3 91 4.43 2.140 16.70 0 1 5 2 150.3 150.3 NA
28 NA 4 95.1 113 3.77 1.513 16.90 1 1 5 2 129.5 129.5 NA
29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 374.8 374.8 374.8
30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 170.7 170.7 170.7
31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 324.0 324.0 324.0
32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 146.4 146.4 146.4

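The results are not what you expect because the NAs are inserted into mpg in the same mutate() call that computes new1 and new2. Inside that call, the dot (.) still refers to the data frame that was piped in, so .[c("mpg","cyl","disp")] reads the original mpg values, whereas the bare name mpg used for new3 already refers to the updated column. Moving the NA insertion into a separate, earlier mutate() step gives the desired results: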
mtcars %>%
mutate(
mpg = case_when(mpg > 25 ~ NA_real_, TRUE ~ as.numeric(mpg)) # generate missing data in "mpg"
) %>%
mutate(
new1 = rowSums(.[c("mpg","cyl","disp")], na.rm = FALSE), # method1: row sum, treat NA as NA?
new2 = rowSums(.[c("mpg","cyl","disp")], na.rm = TRUE), # method2: row sum, treat NA as zero?
new3 = mpg + cyl + disp # method3: row sum, by hand
)
Output
mpg cyl disp hp drat wt qsec vs am gear carb new1 new2 new3
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 187.0 187.0 187.0
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 187.0 187.0 187.0
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 134.8 134.8 134.8
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 285.4 285.4 285.4
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 386.7 386.7 386.7
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 249.1 249.1 249.1
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 382.3 382.3 382.3
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 175.1 175.1 175.1
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 167.6 167.6 167.6
10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 192.8 192.8 192.8
11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 191.4 191.4 191.4
12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 300.2 300.2 300.2
13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 301.1 301.1 301.1
14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 299.0 299.0 299.0
15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 490.4 490.4 490.4
16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 478.4 478.4 478.4
17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 462.7 462.7 462.7
18 NA 4 78.7 66 4.08 2.200 19.47 1 1 4 1 NA 82.7 NA
19 NA 4 75.7 52 4.93 1.615 18.52 1 1 4 2 NA 79.7 NA
20 NA 4 71.1 65 4.22 1.835 19.90 1 1 4 1 NA 75.1 NA
21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 145.6 145.6 145.6
22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 341.5 341.5 341.5
23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 327.2 327.2 327.2
24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 371.3 371.3 371.3
25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 427.2 427.2 427.2
26 NA 4 79.0 66 4.08 1.935 18.90 1 1 4 1 NA 83.0 NA
27 NA 4 120.3 91 4.43 2.140 16.70 0 1 5 2 NA 124.3 NA
28 NA 4 95.1 113 3.77 1.513 16.90 1 1 5 2 NA 99.1 NA
29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 374.8 374.8 374.8
30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 170.7 170.7 170.7
31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 324.0 324.0 324.0
32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 146.4 146.4 146.4
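The difference between the dot and the bare column names can be seen on a much smaller example. The following is only a sketch on a hypothetical three-row data frame (the names df, a, b, from_dot and from_cols are made up for illustration), assuming the magrittr pipe as used above:

```r
library(dplyr)

# Hypothetical stand-in for mtcars
df <- data.frame(a = c(1, 30, 3), b = c(10, 20, 30))

res <- df %>%
  mutate(
    a         = if_else(a > 25, NA_real_, a),  # a is updated here
    from_dot  = rowSums(.[c("a", "b")]),       # the dot is still the original df
    from_cols = rowSums(cbind(a, b))           # bare names see the updated a
  )

res$from_dot   # 11 50 33 (uses the old value a = 30)
res$from_cols  # 11 NA 33 (uses the new value a = NA)
```

Splitting the work into two mutate() calls works because the second call receives a new dot that already contains the NAs.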

Did you try cbind() on the bare column names in place of .[c(...)]?
data<-mtcars%>%
mutate(
mpg=case_when(mpg>25~NA_real_,TRUE~as.numeric(mpg)), # generate missing data in "mpg"
new1=rowSums(cbind(mpg,cyl,disp),na.rm=FALSE), # method1: row sum, treat NA as NA?
new2=rowSums(cbind(mpg,cyl,disp),na.rm=TRUE), # method2: row sum, treat NA as zero?
new3=mpg+cyl+disp # method3: row sum, by hand
)
The resulting data looks like what you'd expect:
mpg cyl disp hp drat wt qsec vs am gear carb new1 new2 new3
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 187.0 187.0 187.0
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 187.0 187.0 187.0
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 134.8 134.8 134.8
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 285.4 285.4 285.4
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 386.7 386.7 386.7
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 249.1 249.1 249.1
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 382.3 382.3 382.3
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 175.1 175.1 175.1
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 167.6 167.6 167.6
10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 192.8 192.8 192.8
11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 191.4 191.4 191.4
12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 300.2 300.2 300.2
13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 301.1 301.1 301.1
14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 299.0 299.0 299.0
15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 490.4 490.4 490.4
16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 478.4 478.4 478.4
17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 462.7 462.7 462.7
18 NA 4 78.7 66 4.08 2.200 19.47 1 1 4 1 NA 82.7 NA
19 NA 4 75.7 52 4.93 1.615 18.52 1 1 4 2 NA 79.7 NA
20 NA 4 71.1 65 4.22 1.835 19.90 1 1 4 1 NA 75.1 NA
21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 145.6 145.6 145.6
22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 341.5 341.5 341.5
23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 327.2 327.2 327.2
24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 371.3 371.3 371.3
25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 427.2 427.2 427.2
26 NA 4 79.0 66 4.08 1.935 18.90 1 1 4 1 NA 83.0 NA
27 NA 4 120.3 91 4.43 2.140 16.70 0 1 5 2 NA 124.3 NA
28 NA 4 95.1 113 3.77 1.513 16.90 1 1 5 2 NA 99.1 NA
29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 374.8 374.8 374.8
30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 170.7 170.7 170.7
31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 324.0 324.0 324.0
32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 146.4 146.4 146.4
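As for the "treat NA as NA?" and "treat NA as zero?" comments in the code, the effect of na.rm can be checked directly in base R. A minimal sketch on a made-up matrix (m, keep_na and drop_na are illustrative names):

```r
# Small matrix with one partially missing row and one fully missing row
m <- cbind(x = c(1, NA, NA), y = c(2, 3, NA))

keep_na <- rowSums(m, na.rm = FALSE)  # any NA in a row makes the sum NA
drop_na <- rowSums(m, na.rm = TRUE)   # NAs are dropped before summing

keep_na  # 3 NA NA
drop_na  # 3  3  0
```

Note that with na.rm = TRUE a row consisting only of NAs sums to 0, which may or may not be what you want.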

Related

In creating a new column to a data frame using dplyr, R overwrites existing variable name

I would like to add new columns to an existing data frame. The column names are generated in a FOR loop so that they are numerically sequential. Here is the code:
NewColumn <- paste("return_date", as.character(i), sep = "_")
When I display NewColumn, this is what I want:
[1] "return_date_2"
When I execute:
mutate(Cima, NewColumn = "01-01-01")
The name of the column is: NewColumn
I can rename it, but is there a way to avoid this step?
Why does R not recognize that NewColumn holds a string?
Do you have to use mutate in your code?
If not, replace mutate(Cima, NewColumn = "01-01-01") with Cima[NewColumn] <- "01-01-01"
Because mutate() treats whatever is on the left of the equals sign as the literal column name. You can get around this with the code below:
library(dplyr)
library(rlang)
i <- 1
NewColumn <- paste("return_date", as.character(i), sep = "_")
mutate(mtcars, !!NewColumn := 5)
mpg cyl disp hp drat wt qsec vs am gear carb return_date_1
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 5
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 5
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 5
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 5
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 5
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 5
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 5
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 5
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 5
10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 5
11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 5
12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 5
13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 5
14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 5
15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 5
16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 5
17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 5
18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 5
19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 5
20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 5
21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 5
22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 5
23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 5
24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 5
25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 5
26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 5
27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 5
28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 5
29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 5
30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 5
31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 5
32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 5
Take a look into this one to understand it better:
Use dynamic variable names in `dplyr`
You can also check advanced R from Hadley Wickham and take a look at the bang bang operator and see what it does.
https://adv-r.hadley.nz/
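Since the question builds the names in a for loop, here is a sketch of how that loop might look with the same !! and := combination. The data frame df and its id column are hypothetical stand-ins for Cima:

```r
library(dplyr)

# Hypothetical stand-in for the Cima data frame
df <- data.frame(id = 1:3)

for (i in 1:2) {
  new_col <- paste("return_date", i, sep = "_")
  # !! unquotes the string so its value, not the word new_col, becomes the name
  df <- mutate(df, !!new_col := "01-01-01")
}

names(df)  # "id" "return_date_1" "return_date_2"
```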

mutate a variable with curly-curly [duplicate]

This question already has answers here:
Use dynamic name for new column/variable in `dplyr`
I've used curly-curly with group_by and summarise as described in the rlang announcement. But I can't get it to work when mutating a variable in place. What's the best way to do this currently with dplyr?
Say I want to supply an unquoted column name and have it mutated, here's a toy example function that doesn't work:
my_fun <- function(dat, var_name){
dat %>%
mutate({{var_name}} = 1)
}
my_fun(mtcars, cyl)
What should that mutate line be to change any column in mtcars to be a constant?
You need to use the assignment operator (:=) if you want to use the curly-curly to specify a name on the left hand side of an assignment in mutate:
my_fun <- function(dat, var_name){
dat %>%
mutate({{var_name}} := 1)
}
Which allows:
my_fun(mtcars, cyl)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 1 160.0 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 1 160.0 110 3.90 2.875 17.02 0 1 4 4
#> 3 22.8 1 108.0 93 3.85 2.320 18.61 1 1 4 1
#> 4 21.4 1 258.0 110 3.08 3.215 19.44 1 0 3 1
#> 5 18.7 1 360.0 175 3.15 3.440 17.02 0 0 3 2
#> 6 18.1 1 225.0 105 2.76 3.460 20.22 1 0 3 1
#> 7 14.3 1 360.0 245 3.21 3.570 15.84 0 0 3 4
#> 8 24.4 1 146.7 62 3.69 3.190 20.00 1 0 4 2
#> 9 22.8 1 140.8 95 3.92 3.150 22.90 1 0 4 2
#> 10 19.2 1 167.6 123 3.92 3.440 18.30 1 0 4 4
#> 11 17.8 1 167.6 123 3.92 3.440 18.90 1 0 4 4
#> 12 16.4 1 275.8 180 3.07 4.070 17.40 0 0 3 3
#> 13 17.3 1 275.8 180 3.07 3.730 17.60 0 0 3 3
#> 14 15.2 1 275.8 180 3.07 3.780 18.00 0 0 3 3
#> 15 10.4 1 472.0 205 2.93 5.250 17.98 0 0 3 4
#> 16 10.4 1 460.0 215 3.00 5.424 17.82 0 0 3 4
#> 17 14.7 1 440.0 230 3.23 5.345 17.42 0 0 3 4
#> 18 32.4 1 78.7 66 4.08 2.200 19.47 1 1 4 1
#> 19 30.4 1 75.7 52 4.93 1.615 18.52 1 1 4 2
#> 20 33.9 1 71.1 65 4.22 1.835 19.90 1 1 4 1
#> 21 21.5 1 120.1 97 3.70 2.465 20.01 1 0 3 1
#> 22 15.5 1 318.0 150 2.76 3.520 16.87 0 0 3 2
#> 23 15.2 1 304.0 150 3.15 3.435 17.30 0 0 3 2
#> 24 13.3 1 350.0 245 3.73 3.840 15.41 0 0 3 4
#> 25 19.2 1 400.0 175 3.08 3.845 17.05 0 0 3 2
#> 26 27.3 1 79.0 66 4.08 1.935 18.90 1 1 4 1
#> 27 26.0 1 120.3 91 4.43 2.140 16.70 0 1 5 2
#> 28 30.4 1 95.1 113 3.77 1.513 16.90 1 1 5 2
#> 29 15.8 1 351.0 264 4.22 3.170 14.50 0 1 5 4
#> 30 19.7 1 145.0 175 3.62 2.770 15.50 0 1 5 6
#> 31 15.0 1 301.0 335 3.54 3.570 14.60 0 1 5 8
#> 32 21.4 1 121.0 109 4.11 2.780 18.60 1 1 4 2
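The same := form also accepts a glue-style string on the left-hand side, which is handy when the new column should get a derived name rather than overwrite the original. A small sketch, assuming a reasonably recent dplyr (1.0 or later); add_half and the _half suffix are invented for illustration:

```r
library(dplyr)

# Hypothetical helper: adds <var>_half next to the column passed as var
add_half <- function(dat, var) {
  dat %>%
    mutate("{{ var }}_half" := {{ var }} / 2)
}

out <- add_half(data.frame(x = c(2, 4)), x)
names(out)  # "x" "x_half"
out$x_half  # 1 2
```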

dplyr mutate() displaying NA values when matched from dataframe

I am trying to replace values found in one column of a dataframe based upon finding a match in another dataframe using mutate(). Here is an example:
rename_ds <- data.frame(
car_name = c("Camaro Z28","AMC Javelin"),
replace_with = c("Camaro","Javelin"),
stringsAsFactors = FALSE)
mt_cars <- mtcars %>%
tibble::rownames_to_column() %>%
dplyr::rename("car_name" = rowname) %>%
dplyr::mutate(car_name = ifelse(car_name %in% rename_ds$car_name,
rename_ds[which(rename_ds$car_name == car_name),2],
car_name))
When I run this, instead of the car names being replaced by their respective replacements in rename_ds$replace_with, they are NA.
21 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
22 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
23 <NA> 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
24 <NA> 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
25 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
26 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
27 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Any suggestions? Thanks in advance.
We could make it simpler with a named vector and coalesce:
library(dplyr)
library(purrr) # provides set_names()
mtcars %>%
tibble::rownames_to_column("car_name") %>%
mutate(car_name = coalesce(set_names(rename_ds$replace_with,
rename_ds$car_name)[car_name], car_name))
# car_name mpg cyl disp hp drat wt qsec vs am gear carb
#1 Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#2 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#3 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#4 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#5 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#6 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#7 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#8 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#9 Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#10 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#11 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#12 Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#13 Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#14 Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#15 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#16 Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#17 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#18 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#19 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#20 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#21 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#22 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#23 Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#24 Camaro 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#25 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#26 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#27 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#28 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#29 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#30 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#31 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#32 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
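In base R, we could look up each name in a named vector and fall back to the original name where there is no match:
nm <- row.names(mtcars)
new <- setNames(rename_ds$replace_with, rename_ds$car_name)[nm]
unname(ifelse(is.na(new), nm, new))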
To me this looks more like a join operation:
mtcars %>%
tibble::rownames_to_column() %>%
dplyr::rename("car_name" = rowname) %>%
left_join(rename_ds, by = "car_name") %>%
mutate(car_name = coalesce(replace_with, car_name)) %>%
select(-replace_with)
# car_name mpg cyl disp hp drat wt qsec vs am gear carb
# 1 Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# 2 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# 3 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# 4 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# 5 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# 6 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# 7 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
# 8 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# 9 Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# 10 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
# 11 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
# 12 Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
# 13 Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
# 14 Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
# 15 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
# 16 Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
# 17 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
# 18 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# 19 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
# 20 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
# 21 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
# 22 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
# 23 Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
# 24 Camaro 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
# 25 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
# 26 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
# 27 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
# 28 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
# 29 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
# 30 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
# 31 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
# 32 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
(Rows 23-24 are updated.)
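As for why the original attempt produced NA: a likely explanation is that rename_ds$car_name == car_name compares the two vectors element by element with recycling, rather than looking each car up, so which() comes back empty and ifelse() has nothing to fill in. The sketch below illustrates this in base R and shows a match()-based lookup as one possible fix (idx and fixed are illustrative names):

```r
rename_ds <- data.frame(
  car_name = c("Camaro Z28", "AMC Javelin"),
  replace_with = c("Camaro", "Javelin"),
  stringsAsFactors = FALSE
)
car_name <- row.names(mtcars)

# Recycled element-wise comparison: no position happens to line up
hits <- which(rename_ds$car_name == car_name)
length(hits)  # 0

# match() gives, for each car, its row in rename_ds (NA when absent)
idx <- match(car_name, rename_ds$car_name)
fixed <- ifelse(is.na(idx), car_name, rename_ds$replace_with[idx])
fixed[23:24]  # "Javelin" "Camaro"
```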
You are on the right track; you can use str_replace_all() with a named vector that maps each pattern to its replacement:
library(dplyr)
library(stringr)
mtcars %>%
tibble::rownames_to_column() %>%
dplyr::rename("car_name" = rowname) %>%
dplyr::mutate(car_name = str_replace_all(car_name,
setNames(rename_ds$replace_with, rename_ds$car_name))) # c(pattern = replacement)
car_name mpg cyl disp hp drat wt qsec vs am gear carb
1 Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
6 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
7 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
8 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
9 Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
10 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
11 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
12 Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
13 Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
14 Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
15 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
16 Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
17 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
18 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
19 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
20 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
21 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
22 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
23 Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 # Changed
24 Camaro 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 # Changed
25 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
26 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
27 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
28 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
29 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
30 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
31 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
32 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2

How to pass a vector with many column names in data table for rowMeans function

I have far too many variables to list them manually inside a rowMeans(cbind()) call. Naturally I tried to pass them packed in a single character vector, but it's not working. I tried eval, .., and mget, yet none of them seems to do the trick:
column_names <- as.vector(summary$variables) #this is where I take the column names from (characters)
dataset[ , means := rowMeans( cbind( eval(column_names) ) , na.rm=TRUE )]
Thanks
You need to use .SD and .SDcols to specify the relevant columns; here is a minimal reproducible example based on mtcars
library(data.table)
dt <- as.data.table(mtcars)
col_names <- c("mpg", "disp", "drat")
dt[, mean := rowMeans(.SD), .SDcols = col_names]
dt
#mpg cyl disp hp drat wt qsec vs am gear carb mean
#1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 61.63333
#2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 61.63333
#3: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 44.88333
#4: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 94.16000
#5: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 127.28333
#6: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 81.95333
#7: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 125.83667
#8: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 58.26333
#9: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 55.84000
#10: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 63.57333
#11: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 63.10667
#12: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 98.42333
#13: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 98.72333
#14: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 98.02333
#15: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 161.77667
#16: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 157.80000
#17: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 152.64333
#18: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 38.39333
#19: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 37.01000
#20: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 36.40667
#21: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 48.43333
#22: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 112.08667
#23: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 107.45000
#24: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 122.34333
#25: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 140.76000
#26: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 36.79333
#27: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 50.24333
#28: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 43.09000
#29: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 123.67333
#30: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 56.10667
#31: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 106.51333
#32: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 48.83667
#mpg cyl disp hp drat wt qsec vs am gear carb mean
So in your case, something like
dataset[ , means := rowMeans(.SD, na.rm = TRUE), .SDcols = column_names]
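To see how na.rm = TRUE interacts with .SDcols, here is a small sketch on a made-up table with a missing value (dt2, cols and the column names are hypothetical):

```r
library(data.table)

# Hypothetical table; column z is deliberately left out of the mean
dt2 <- data.table(a = c(1, NA), b = c(3, 5), z = c(100, 200))
cols <- c("a", "b")

# .SD holds only the columns named in .SDcols
dt2[, means := rowMeans(.SD, na.rm = TRUE), .SDcols = cols]

dt2$means  # 2 5
```

The second row averages only the non-missing value of b, and z never enters the calculation.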

How to get function to work where dplyr is being used?

I get an error when trying to call my function where dplyr is used inside the function. Does dplyr not work inside R functions?
all_df_yoy <- function(all_df, units) {
all_df_yoy <- all_df %>% mutate(
players_units_yoy = units)
}
us_players_all_df_yoy <- all_df_yoy(us_players_all_df, players_units_us)
I get the following error.
Error in compat_lazy_dots(.dots, caller_env(), ..., .named = TRUE) :
object 'players_units_us' not found
However, players_units_us does indeed exist inside the data frame.
Without a minimal reproducible example it's impossible to answer this question to your exact scope, but you need to utilize tidyeval to code functions in the same way that library(dplyr) does. Here is a brief example of what you have to do
library(tidyverse)
create_new_col <- function(df, units) {
units <- enquo(units)
df %>%
mutate(players_units_yoy = !!units)
}
mtcars %>%
create_new_col(cyl)
#> mpg cyl disp hp drat wt qsec vs am gear carb players_units_yoy
#> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 6
#> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 6
#> 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 4
#> 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 6
#> 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 8
#> 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 6
#> 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 8
#> 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 4
#> 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 4
#> 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 6
#> 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 6
#> 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 8
#> 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 8
#> 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 8
#> 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 8
#> 16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 8
#> 17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 8
#> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 4
#> 19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 4
#> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 4
#> 21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 4
#> 22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 8
#> 23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 8
#> 24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 8
#> 25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 8
#> 26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 4
#> 27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 4
#> 28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 4
#> 29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 8
#> 30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 6
#> 31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 8
#> 32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 4
Created on 2019-05-02 by the reprex package (v0.2.1)
You can read more on this here: https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
If you are new to programming in R, realize that this is a hurdle most users go through when beginning to develop their own packages. So don't worry if it doesn't click at first; become more familiar with R (try writing your functions using base R) and then come back to this topic.
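For completeness, more recent versions of rlang and dplyr offer two shorter spellings of the same idea. The following is a sketch, not the answer's original method: the curly-curly form replaces the enquo() and !! pair, and the .data pronoun covers the case where the column name arrives as a string. The function names are hypothetical:

```r
library(dplyr)

# Column passed unquoted, e.g. add_yoy(mtcars, cyl)
add_yoy <- function(df, units) {
  df %>% mutate(players_units_yoy = {{ units }})
}

# Column passed as a string, e.g. add_yoy_chr(mtcars, "cyl")
add_yoy_chr <- function(df, units_name) {
  df %>% mutate(players_units_yoy = .data[[units_name]])
}

res_sym <- add_yoy(mtcars, cyl)
res_chr <- add_yoy_chr(mtcars, "cyl")
identical(res_sym$players_units_yoy, res_chr$players_units_yoy)  # TRUE
```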
