Replacing a value in a data frame (all variables) with NA [duplicate] - r

I am trying to replace certain values in multiple variables of my data frame (8 and 9, or -8 and -9 as in the attempts below) with NA.
Also, does anyone know a quick way to reverse-code a vector? It is a Likert scale where 1 is "strongly agree", and I want the most weight to be at 5.
Any help would be appreciated. Cheers.
naniar::replace_with_na_all(data = amer, condition = ~.x == -8)
data %>% mutate_all(.funs = function(x) replace(var, which(var == -9 | var == -8), NA))
df %>% mutate_each(funs(replace(., . > 7, NA)))
(mutate_each is deprecated, evidently.)

Please see the comment to understand how to make your question reproducible for future posts. It's always a good idea to include sample data; if you can't share your data, provide code to generate representative mock data or use one of the built-in datasets.
As to your question, you can use mutate_all in the following way:
library(dplyr)
data %>% mutate_all(~ifelse(.x %in% c(-8, -9), NA, .x))
Or you can use replace:
data %>% mutate_all(~replace(.x, which(.x %in% c(-8, -9)), NA))
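In dplyr 1.0.0 and later, mutate_all() is superseded by across(). A minimal sketch of the same replacement, assuming a hypothetical mock data frame `survey` in which -8 and -9 are the missing-value codes:

```r
library(dplyr)

# Hypothetical mock data; -8 and -9 stand in for missing-value codes
survey <- data.frame(q1 = c(1, -8, 3), q2 = c(-9, 2, 5))

# Apply the replacement to every column
cleaned <- survey %>%
  mutate(across(everything(), ~ replace(.x, .x %in% c(-8, -9), NA)))

cleaned
```

To restrict the replacement to some columns, swap everything() for a selection such as c(q1, q2).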
Reproducible example
Let's take mtcars as sample data. To replace all 3 and 4 entries across all columns with NA, we can do:
mtcars %>% mutate_all(~ifelse(.x %in% c(3, 4), NA, .x))
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 NA NA
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 NA NA
#3 22.8 NA 108.0 93 3.85 2.320 18.61 1 1 NA 1
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 NA 1
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 NA 2
#6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 NA 1
#7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 NA NA
#8 24.4 NA 146.7 62 3.69 3.190 20.00 1 0 NA 2
#9 22.8 NA 140.8 95 3.92 3.150 22.90 1 0 NA 2
#10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 NA NA
#11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 NA NA
#12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 NA NA
#13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 NA NA
#14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 NA NA
#15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 NA NA
#16 10.4 8 460.0 215 NA 5.424 17.82 0 0 NA NA
#17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 NA NA
#18 32.4 NA 78.7 66 4.08 2.200 19.47 1 1 NA 1
#19 30.4 NA 75.7 52 4.93 1.615 18.52 1 1 NA 2
#20 33.9 NA 71.1 65 4.22 1.835 19.90 1 1 NA 1
#21 21.5 NA 120.1 97 3.70 2.465 20.01 1 0 NA 1
#22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 NA 2
#23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 NA 2
#24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 NA NA
#25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 NA 2
#26 27.3 NA 79.0 66 4.08 1.935 18.90 1 1 NA 1
#27 26.0 NA 120.3 91 4.43 2.140 16.70 0 1 5 2
#28 30.4 NA 95.1 113 3.77 1.513 16.90 1 1 5 2
#29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 NA
#30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#32 21.4 NA 121.0 109 4.11 2.780 18.60 1 1 NA 2
Using replace as
mtcars %>% mutate_all(~replace(.x, which(.x %in% c(3, 4)), NA))
gives the same result.
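The question also asked about reverse-coding a Likert item, which the answer above does not cover. A minimal base R sketch, assuming a 1 to 5 scale; `reverse_code` is a hypothetical helper name, not a function from any package:

```r
# Reverse-code a Likert item: 1 <-> 5, 2 <-> 4, 3 stays 3.
# `reverse_code` is a hypothetical helper, not part of any package.
reverse_code <- function(x, min_score = 1, max_score = 5) {
  (max_score + min_score) - x
}

item <- c(1, 2, 3, 4, 5, NA)
reversed <- reverse_code(item)
reversed
```

Missing values stay NA, so it is probably safest to recode -8 and -9 to NA first and reverse-code afterwards.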

Related

How to use rowSums() in "dplyr" when including missing data?

I want to use the function rowSums in dplyr and came across some difficulties with missing data. The example data is mtcars. The objective is to compute the row-wise sum of the three variables mpg, cyl and disp. However, the results seem incorrect with the following R code when there are missing values within a specific row (see variables new1 and new2 in the output). Any comments and suggestions are appreciated!
library(dplyr)
data <- mtcars %>%
  mutate(
    mpg = case_when(mpg > 25 ~ NA_real_, TRUE ~ as.numeric(mpg)), # generate missing data in "mpg"
    new1 = rowSums(.[c("mpg", "cyl", "disp")], na.rm = FALSE), # method1: row sum, treat NA as NA?
    new2 = rowSums(.[c("mpg", "cyl", "disp")], na.rm = TRUE), # method2: row sum, treat NA as zero?
    new3 = mpg + cyl + disp # method3: row sum, by hand
  )
data
The output is listed below:
mpg cyl disp hp drat wt qsec vs am gear carb new1 new2 new3
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 187.0 187.0 187.0
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 187.0 187.0 187.0
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 134.8 134.8 134.8
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 285.4 285.4 285.4
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 386.7 386.7 386.7
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 249.1 249.1 249.1
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 382.3 382.3 382.3
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 175.1 175.1 175.1
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 167.6 167.6 167.6
10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 192.8 192.8 192.8
11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 191.4 191.4 191.4
12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 300.2 300.2 300.2
13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 301.1 301.1 301.1
14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 299.0 299.0 299.0
15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 490.4 490.4 490.4
16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 478.4 478.4 478.4
17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 462.7 462.7 462.7
18 NA 4 78.7 66 4.08 2.200 19.47 1 1 4 1 115.1 115.1 NA
19 NA 4 75.7 52 4.93 1.615 18.52 1 1 4 2 110.1 110.1 NA
20 NA 4 71.1 65 4.22 1.835 19.90 1 1 4 1 109.0 109.0 NA
21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 145.6 145.6 145.6
22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 341.5 341.5 341.5
23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 327.2 327.2 327.2
24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 371.3 371.3 371.3
25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 427.2 427.2 427.2
26 NA 4 79.0 66 4.08 1.935 18.90 1 1 4 1 110.3 110.3 NA
27 NA 4 120.3 91 4.43 2.140 16.70 0 1 5 2 150.3 150.3 NA
28 NA 4 95.1 113 3.77 1.513 16.90 1 1 5 2 129.5 129.5 NA
29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 374.8 374.8 374.8
30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 170.7 170.7 170.7
31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 324.0 324.0 324.0
32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 146.4 146.4 146.4
The results are not what you expect because the NAs are inserted into mpg in the same mutate() call that computes the new* variables. Inside that call, `.` still refers to the data frame as it was piped in, so rowSums(.[...]) sees the original mpg values, while new3 uses the updated mpg column. Moving the NA insertion into a separate, earlier mutate() step gives the desired results:
mtcars %>%
  mutate(
    mpg = case_when(mpg > 25 ~ NA_real_, TRUE ~ as.numeric(mpg)) # generate missing data in "mpg"
  ) %>%
  mutate(
    new1 = rowSums(.[c("mpg","cyl","disp")], na.rm = FALSE), # method1: row sum, treat NA as NA?
    new2 = rowSums(.[c("mpg","cyl","disp")], na.rm = TRUE), # method2: row sum, treat NA as zero?
    new3 = mpg + cyl + disp # method3: row sum, by hand
  )
Output
mpg cyl disp hp drat wt qsec vs am gear carb new1 new2 new3
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 187.0 187.0 187.0
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 187.0 187.0 187.0
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 134.8 134.8 134.8
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 285.4 285.4 285.4
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 386.7 386.7 386.7
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 249.1 249.1 249.1
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 382.3 382.3 382.3
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 175.1 175.1 175.1
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 167.6 167.6 167.6
10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 192.8 192.8 192.8
11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 191.4 191.4 191.4
12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 300.2 300.2 300.2
13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 301.1 301.1 301.1
14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 299.0 299.0 299.0
15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 490.4 490.4 490.4
16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 478.4 478.4 478.4
17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 462.7 462.7 462.7
18 NA 4 78.7 66 4.08 2.200 19.47 1 1 4 1 NA 82.7 NA
19 NA 4 75.7 52 4.93 1.615 18.52 1 1 4 2 NA 79.7 NA
20 NA 4 71.1 65 4.22 1.835 19.90 1 1 4 1 NA 75.1 NA
21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 145.6 145.6 145.6
22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 341.5 341.5 341.5
23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 327.2 327.2 327.2
24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 371.3 371.3 371.3
25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 427.2 427.2 427.2
26 NA 4 79.0 66 4.08 1.935 18.90 1 1 4 1 NA 83.0 NA
27 NA 4 120.3 91 4.43 2.140 16.70 0 1 5 2 NA 124.3 NA
28 NA 4 95.1 113 3.77 1.513 16.90 1 1 5 2 NA 99.1 NA
29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 374.8 374.8 374.8
30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 170.7 170.7 170.7
31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 324.0 324.0 324.0
32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 146.4 146.4 146.4
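The difference between `.` and a bare column name inside a single mutate() can be seen on a tiny example. This is a sketch with a hypothetical two-row data frame `df`; the column names are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(a = c(1, 2), b = c(10, 20))

out <- df %>%
  mutate(
    a = a * 100,
    via_dot  = rowSums(.[c("a", "b")]), # `.` is df as piped in, so the old a is used
    via_name = a + b                    # bare name refers to the updated a
  )

out
```

Here via_dot is computed from the original a (1 and 2), while via_name is computed from the updated a (100 and 200).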
Did you try cbind() in place of .[c(...)]? cbind(mpg, cyl, disp) refers to the columns by name, so it picks up the updated mpg:
data <- mtcars %>%
  mutate(
    mpg = case_when(mpg > 25 ~ NA_real_, TRUE ~ as.numeric(mpg)), # generate missing data in "mpg"
    new1 = rowSums(cbind(mpg, cyl, disp), na.rm = FALSE), # method1: row sum, treat NA as NA?
    new2 = rowSums(cbind(mpg, cyl, disp), na.rm = TRUE), # method2: row sum, treat NA as zero?
    new3 = mpg + cyl + disp # method3: row sum, by hand
  )
The result in data looks like what you'd expect:
mpg cyl disp hp drat wt qsec vs am gear carb new1 new2 new3
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 187.0 187.0 187.0
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 187.0 187.0 187.0
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 134.8 134.8 134.8
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 285.4 285.4 285.4
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 386.7 386.7 386.7
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 249.1 249.1 249.1
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 382.3 382.3 382.3
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 175.1 175.1 175.1
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 167.6 167.6 167.6
10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 192.8 192.8 192.8
11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 191.4 191.4 191.4
12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 300.2 300.2 300.2
13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 301.1 301.1 301.1
14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 299.0 299.0 299.0
15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 490.4 490.4 490.4
16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 478.4 478.4 478.4
17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 462.7 462.7 462.7
18 NA 4 78.7 66 4.08 2.200 19.47 1 1 4 1 NA 82.7 NA
19 NA 4 75.7 52 4.93 1.615 18.52 1 1 4 2 NA 79.7 NA
20 NA 4 71.1 65 4.22 1.835 19.90 1 1 4 1 NA 75.1 NA
21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 145.6 145.6 145.6
22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 341.5 341.5 341.5
23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 327.2 327.2 327.2
24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 371.3 371.3 371.3
25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 427.2 427.2 427.2
26 NA 4 79.0 66 4.08 1.935 18.90 1 1 4 1 NA 83.0 NA
27 NA 4 120.3 91 4.43 2.140 16.70 0 1 5 2 NA 124.3 NA
28 NA 4 95.1 113 3.77 1.513 16.90 1 1 5 2 NA 99.1 NA
29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 374.8 374.8 374.8
30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 170.7 170.7 170.7
31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 324.0 324.0 324.0
32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 146.4 146.4 146.4
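One caveat for the na.rm = TRUE variant, shown here with a small hypothetical matrix `m`: a row whose values are all NA sums to 0 rather than NA, which may or may not be what you want.

```r
# Hypothetical matrix: row 2 has one NA, row 3 is entirely NA
m <- cbind(x = c(1, NA, NA), y = c(2, 3, NA))

strict  <- rowSums(m, na.rm = FALSE) # NA propagates to the row sum
lenient <- rowSums(m, na.rm = TRUE)  # NAs are dropped before summing

strict
lenient
```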

How to pass a vector with many column names in data table for rowMeans function

I have way too many variables to list them manually inside a rowMeans(cbind()) call. Naturally, I tried to pass them packed in a single character vector, but it's not working. I tried eval, .. and mget, yet none of them seems to do the trick.
column_names <- as.vector(summary$variables) #this is where I take the column names from (characters)
dataset[ , means := rowMeans( cbind( eval(column_names) ) , na.rm=TRUE )]
Thanks
You need to use .SD and .SDcols to specify the relevant columns; here is a minimal reproducible example based on mtcars:
library(data.table)
dt <- as.data.table(mtcars)
col_names <- c("mpg", "disp", "drat")
dt[, mean := rowMeans(.SD), .SDcols = col_names]
dt
#mpg cyl disp hp drat wt qsec vs am gear carb mean
#1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 61.63333
#2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 61.63333
#3: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 44.88333
#4: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 94.16000
#5: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 127.28333
#6: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 81.95333
#7: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 125.83667
#8: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 58.26333
#9: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 55.84000
#10: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 63.57333
#11: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 63.10667
#12: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 98.42333
#13: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 98.72333
#14: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 98.02333
#15: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 161.77667
#16: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 157.80000
#17: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 152.64333
#18: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 38.39333
#19: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 37.01000
#20: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 36.40667
#21: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 48.43333
#22: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 112.08667
#23: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 107.45000
#24: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 122.34333
#25: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 140.76000
#26: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 36.79333
#27: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 50.24333
#28: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 43.09000
#29: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 123.67333
#30: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 56.10667
#31: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 106.51333
#32: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 48.83667
#mpg cyl disp hp drat wt qsec vs am gear carb mean
So in your case, something like:
dataset[ , means := rowMeans(.SD, na.rm = TRUE), .SDcols = column_names]
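To see how .SDcols and na.rm = TRUE interact, here is a small sketch on a hypothetical table `dt` that has a missing value and a non-numeric column left out of the selection:

```r
library(data.table)

# Hypothetical data: one NA in `a`, plus a character column that is not averaged
dt <- data.table(a = c(1, NA), b = c(3, 4), label = c("x", "y"))
cols <- c("a", "b")

# .SD holds only the columns named in .SDcols
dt[, means := rowMeans(.SD, na.rm = TRUE), .SDcols = cols]

dt
```

Only the columns listed in cols enter the mean, and the NA in the second row is dropped instead of turning that row's mean into NA.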

How can I convert df's variable assignment from using for-loop to purrr and dplyr?

The code is from an exercise in r4ds:
trans <- list(
  disp = function(x) x * 0.0163871,
  am = function(x) {
    factor(x, labels = c("auto", "manual"))
  }
)
for (var in names(trans)) {
  mtcars[[var]] <- trans[[var]](mtcars[[var]])
}
I studied the next section and have a question: how can I rewrite this code using purrr and dplyr?
Of course, I can do it like this:
mtcars %>%
  mutate(
    disp = disp * 0.0163871,
    am = factor(am, labels = c("auto", "manual"))
  )
But I want to make the best use of functional programming (FP).
I find it very hard because it combines variable assignment with purrr.
Here is a purrr/dplyr option using imap_dfc:
library(tidyverse)
imap_dfc(trans, ~mtcars %>% transmute_at(vars(.y), funs(.x))) %>%
  bind_cols(mtcars %>% select(-one_of(names(trans)))) %>%
  select(names(mtcars))
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 21.0 6 2.621936 110 3.90 2.620 16.46 0 manual 4 4
#2 21.0 6 2.621936 110 3.90 2.875 17.02 0 manual 4 4
#3 22.8 4 1.769807 93 3.85 2.320 18.61 1 manual 4 1
#4 21.4 6 4.227872 110 3.08 3.215 19.44 1 auto 3 1
#5 18.7 8 5.899356 175 3.15 3.440 17.02 0 auto 3 2
#6 18.1 6 3.687098 105 2.76 3.460 20.22 1 auto 3 1
#7 14.3 8 5.899356 245 3.21 3.570 15.84 0 auto 3 4
#8 24.4 4 2.403988 62 3.69 3.190 20.00 1 auto 4 2
#9 22.8 4 2.307304 95 3.92 3.150 22.90 1 auto 4 2
#10 19.2 6 2.746478 123 3.92 3.440 18.30 1 auto 4 4
#11 17.8 6 2.746478 123 3.92 3.440 18.90 1 auto 4 4
#12 16.4 8 4.519562 180 3.07 4.070 17.40 0 auto 3 3
#13 17.3 8 4.519562 180 3.07 3.730 17.60 0 auto 3 3
#14 15.2 8 4.519562 180 3.07 3.780 18.00 0 auto 3 3
#15 10.4 8 7.734711 205 2.93 5.250 17.98 0 auto 3 4
#16 10.4 8 7.538066 215 3.00 5.424 17.82 0 auto 3 4
#17 14.7 8 7.210324 230 3.23 5.345 17.42 0 auto 3 4
#18 32.4 4 1.289665 66 4.08 2.200 19.47 1 manual 4 1
#19 30.4 4 1.240503 52 4.93 1.615 18.52 1 manual 4 2
#20 33.9 4 1.165123 65 4.22 1.835 19.90 1 manual 4 1
#21 21.5 4 1.968091 97 3.70 2.465 20.01 1 auto 3 1
#22 15.5 8 5.211098 150 2.76 3.520 16.87 0 auto 3 2
#23 15.2 8 4.981678 150 3.15 3.435 17.30 0 auto 3 2
#24 13.3 8 5.735485 245 3.73 3.840 15.41 0 auto 3 4
#25 19.2 8 6.554840 175 3.08 3.845 17.05 0 auto 3 2
#26 27.3 4 1.294581 66 4.08 1.935 18.90 1 manual 4 1
#27 26.0 4 1.971368 91 4.43 2.140 16.70 0 manual 5 2
#28 30.4 4 1.558413 113 3.77 1.513 16.90 1 manual 5 2
#29 15.8 8 5.751872 264 4.22 3.170 14.50 0 manual 5 4
#30 19.7 6 2.376130 175 3.62 2.770 15.50 0 manual 5 6
#31 15.0 8 4.932517 335 3.54 3.570 14.60 0 manual 5 8
#32 21.4 4 1.982839 109 4.11 2.780 18.60 1 manual 4 2
Explanation: imap_dfc(...) column-binds the two modified columns, which in turn are then column-bound to mtcars without the two columns that were modified; the last line re-arranges columns such that they correspond to the original mtcars column ordering.
A possible suggestion, but it is just a different color of the same paint!
result <- mtcars
walk(1:length(trans),
function(i) result <<- result %>% mutate_at(names(trans)[[i]],trans[[i]]))
result
A better one would be:
result <- mtcars
pmap(list(names(trans),trans),
function(n,f) result <<- result %>% mutate_at(n,f))
result
And a shorter one:
result <- mtcars
iwalk(trans,
function(f,n) result <<- result %>% mutate_at(n,f))
result
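The answers above rely either on funs(), which has been deprecated since dplyr 0.8.0, or on `<<-` to modify a variable from inside a function. A sketch that avoids both, assuming purrr's reduce() is acceptable: the data frame is threaded through as the accumulator and each step transforms one column.

```r
library(purrr)

trans <- list(
  disp = function(x) x * 0.0163871,
  am = function(x) factor(x, labels = c("auto", "manual"))
)

# Start from mtcars (.init) and apply one transformation per column name
result <- reduce(
  names(trans),
  function(df, nm) {
    df[[nm]] <- trans[[nm]](df[[nm]])
    df
  },
  .init = mtcars
)

head(result[c("disp", "am")])
```

Because the accumulator is returned from each step, no global state is modified and mtcars itself is left untouched.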

dplyr/rlang: parse_expr with multiple expressions

For example, if I want to parse a string for use in mutate, I can do:
e1 = "vs + am"
mtcars %>% mutate(!!parse_expr(e1))
But when I want to parse text containing special characters like ",", it gives me an error:
e2 = "vs + am , am +vs"
mtcars %>% mutate(!!parse_expr(e2))
Error in parse(text = x) : <text>:1:9: unexpected ','
1: vs + am ,
^
Are there any ways to work around this?
Thanks
We can use the triple-bang operator !!! with the plural form parse_exprs and a modified e2 expression to parse multiple expressions (see ?parse_quosures).
Explanation:
Multiple expressions in e2 need to be separated either by ; or by new lines.
From ?quasiquotation: The !!! operator unquotes and splices its argument. The argument should represent a list or a vector.
e2 = "vs + am ; am +vs";
mtcars %>% mutate(!!!parse_exprs(e2))
# mpg cyl disp hp drat wt qsec vs am gear carb vs + am am + vs
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 1 1
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 1 1
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 2 2
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 1 1
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 0 0
#6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 1 1
#7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 0 0
#8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 1 1
#9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 1 1
#10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 1 1
#11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 1 1
#12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 0 0
#13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 0 0
#14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 0 0
#15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 0 0
#16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 0 0
#17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 0 0
#18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 2 2
#19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 2 2
#20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 2 2
#21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 1 1
#22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 0 0
#23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 0 0
#24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 0 0
#25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 0 0
#26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 2 2
#27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 1 1
#28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 2 2
#29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 1 1
#30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 1 1
#31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 1 1
#32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 2 2
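The singular and plural parsers can be compared directly. A small sketch, assuming rlang is installed; the variable names `single` and `several` are only for illustration:

```r
library(rlang)

# parse_expr() expects exactly one expression and signals an error otherwise
single <- tryCatch(parse_expr("vs + am; am + vs"), error = function(e) e)
inherits(single, "error")

# parse_exprs() returns a list with one element per expression
several <- parse_exprs("vs + am; am + vs")
length(several)
```

The list returned by parse_exprs() is what !!! splices into mutate() as separate arguments.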
You could always split them outside the expressions, for example:
e2 = "vs + am"
e3 = "am +vs"
mtcars %>% mutate(!!parse_expr(e2),!!parse_expr(e3))
You can do this with parse_exprs and a semicolon instead of a comma (thanks @Maurits Evers).
!!! takes a list of elements and splices them into the current call.
e2 = "vs + am ; am +vs"
mtcars %>% mutate(!!!parse_exprs(e2))
Here is a little trick I use to name the new variables (as Genom asked).
Example with 2 named expressions:
across_funs <- function(x, .fns, .cols) {
  stopifnot(length(.fns) == length(.cols))
  stopifnot(all(sapply(.fns, class) == "call"))
  # seq_along() also behaves correctly when .fns is empty
  for (i in seq_along(.fns)) {
    x <- x %>% mutate(!!.cols[i] := !!.fns[[i]])
  }
  return(x)
}
funs = parse_exprs(c("vs+am", "am+vs"))
cols = c("var1", "var2")
mtcars %>% across_funs(.fns = funs, .cols = cols)
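A shorter route to named columns, offered as a sketch and not as a replacement for the helper above: when the list passed to !!! is named, the names are used as the names of the new columns. The names var1 and var2 are arbitrary.

```r
library(dplyr)
library(rlang)

# The names on the list become the names of the new columns
named_exprs <- setNames(parse_exprs("vs + am; am * 2"), c("var1", "var2"))

out <- mtcars %>% mutate(!!!named_exprs)
head(out[c("vs", "am", "var1", "var2")])
```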

Assigning by reference in R data.table with i expression provided in string variable

I am building a Shiny app with plotly, and need to filter data on the basis of a number of parameters. Currently I am doing this with a flag in a data.table, updated by reference. The actual data have many columns, and I would vastly prefer an extensible way of adding columns to be visualised. I am coming up short in one area: the actual filtering of the data on the basis of values.
I store the names of the columns to be filtered in an array of characters, but it seems that I can't use this to define the expression by which rows are selected (i.e. the i expression). Is this possible? Or am I approaching this the wrong way?
library(data.table)
set.seed(12345)
dt = data.table(mtcars)
dt[,filtered := FALSE]
filterColumnNames = c('cyl','gear','carb')
filterValues = list(cyl = c(4,6),
                    gear = c(3),
                    carb = c(1))
for (columnName in filterColumnNames) {
  dt[columnName %in% filterValues[columnName][[1]], filtered := TRUE]
}
# Working, but not loopy enough.
# dt[cyl %in% filterValues['cyl'][[1]], filtered := TRUE]
# dt[gear %in% filterValues['gear'][[1]], filtered := TRUE]
# dt[carb %in% filterValues['carb'][[1]], filtered := TRUE]
print(dt)
Another way to achieve this is to use a join to select the rows:
library(data.table)
dt <- as.data.table(mtcars)
filterValues <- list(cyl = c(4,6),
gear = c(3),
carb = c(1))
dt[do.call(CJ, filterValues), on = names(filterValues), filtered := TRUE][]
mpg cyl disp hp drat wt qsec vs am gear carb filtered
1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 NA
2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 NA
3: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 NA
4: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 TRUE
5: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 NA
6: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 TRUE
7: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 NA
8: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 NA
9: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 NA
10: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 NA
11: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 NA
12: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 NA
13: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 NA
14: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 NA
15: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 NA
16: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 NA
17: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 NA
18: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 NA
19: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 NA
20: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 NA
21: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 TRUE
22: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 NA
23: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 NA
24: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 NA
25: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 NA
26: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 NA
27: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 NA
28: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 NA
29: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 NA
30: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 NA
31: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 NA
32: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 NA
mpg cyl disp hp drat wt qsec vs am gear carb filtered
or
dt <- as.data.table(mtcars)
dt[do.call(CJ, filterValues), on = names(filterValues), nomatch = 0L]
mpg cyl disp hp drat wt qsec vs am gear carb
1: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
2: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
3: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
You only need to specify the list of filterValues. do.call(CJ, filterValues) (cross join) creates a data.table with all combinations to select the rows by:
cyl gear carb
1: 4 3 1
2: 6 3 1
Edit
The OP has asked if this could be extended to inequalities.
This can be done with data.table's non-equi joins but the setup is somewhat different. E.g.,
filterIntervals <- list(disp = c(200, 300),
mpg = c(10, 20))
mDT <- dcast(melt(filterIntervals), . ~ L1 + rowid(L1))
filterCondition <- c("disp>=disp_1", "disp<disp_2", "mpg>mpg_1", "mpg<mpg_2")
dt[mDT, on = filterCondition, filtered := TRUE][]
mpg cyl disp hp drat wt qsec vs am gear carb filtered
1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 NA
2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 NA
3: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 NA
4: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 NA
5: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 NA
6: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 TRUE
7: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 NA
8: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 NA
9: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 NA
10: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 NA
11: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 NA
12: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 TRUE
13: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 TRUE
14: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 TRUE
15: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 NA
16: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 NA
17: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 NA
18: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 NA
19: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 NA
20: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 NA
21: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 NA
22: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 NA
23: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 NA
24: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 NA
25: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 NA
26: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 NA
27: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 NA
28: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 NA
29: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 NA
30: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 NA
31: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 NA
32: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 NA
mpg cyl disp hp drat wt qsec vs am gear carb filtered
The reason is that columnName before %in% is not evaluated to the column of that name; it is just a character string, so the comparison never looks at the column's values. We can either use get:
for (columnName in filterColumnNames) {
  dt[get(columnName) %in% filterValues[columnName][[1]], filtered := TRUE][]
}
or eval(as.name(...)):
for (columnName in filterColumnNames) {
  dt[eval(as.name(columnName)) %in% filterValues[columnName][[1]], filtered := TRUE][]
}
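A self-contained version of the get() loop, as a sketch that reuses the filterValues list from the question. It also shows what the unevaluated string does: compared against the numeric values, the name itself simply yields FALSE.

```r
library(data.table)

dt <- as.data.table(mtcars)
dt[, filtered := FALSE]
filterValues <- list(cyl = c(4, 6), gear = 3, carb = 1)

# The column name as a string is not among the numeric values
string_match <- "cyl" %in% filterValues[["cyl"]]

# get() looks the column up by name inside the data.table
for (columnName in names(filterValues)) {
  dt[get(columnName) %in% filterValues[[columnName]], filtered := TRUE]
}

n_flagged <- dt[, sum(filtered)]
n_flagged
```

Note that this loop flags a row when any one of the conditions matches, whereas the CJ join shown earlier flags only the rows that match all of them.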
You can create a character vector based on the filtering conditions you want to apply. See the following example:
library(data.table)
d <- mtcars
setDT(d)
filtering_condition <- "cyl==6"
d[eval(parse(text=filtering_condition))]
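Several conditions can be combined into one string before parsing. A sketch under the assumption that the condition strings come from a trusted source; the vector `conditions` is hypothetical:

```r
library(data.table)

dt <- as.data.table(mtcars)

# Hypothetical conditions; in an app these should be built from vetted inputs
conditions <- c("cyl == 6", "gear == 4")
combined <- paste(conditions, collapse = " & ")

subset_dt <- dt[eval(parse(text = combined))]
nrow(subset_dt)
```

Since eval(parse()) runs whatever code the string contains, it is worth avoiding for strings typed by users, for example in a Shiny app; the get() and join approaches above do not have that problem.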
