More efficient way than using a 'for' loop

I feel there's a smarter/more efficient way than this code:
df <- mtcars
df$somename <- as.array(rep(c(0), 32))
for (i in 1:32) {
  df$somename[i] <- sd(c(df$wt[i], df$qsec[i]))
}
Maybe with %>%? But how?

An option using purrr::map2
library(tidyverse)
mtcars %>% mutate(somename = map2(wt, qsec, ~sd(c(.x, .y))))
# mpg cyl disp hp drat wt qsec vs am gear carb somename
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 9.786358
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 10.00203
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 11.51877
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 11.47281
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 9.60251
#6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 11.85111
#7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 8.6762
#8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 11.88646
#9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 13.96536
#10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 10.50761
#11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 10.93187
#12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 9.425733
#13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 9.807571
#14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 10.05506
#15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 9.001469
#16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 8.765296
#17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 8.538314
#18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 12.21173
#19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 11.95364
#20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 12.77388
#21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 12.40619
#22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 9.439876
#23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 9.804036
#24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 8.181225
#25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 9.337345
#26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 11.99607
#27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 10.29547
#28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 10.88025
#29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 8.01152
#30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 9.001469
#31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 7.799388
#32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 11.18643
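As a side note (not part of the original answer): map2() stores somename as a list-column. If a plain numeric column is preferred, map2_dbl() returns the same values as a regular double vector:
# same computation, but somename comes back as a plain numeric column rather than a list-column
mtcars %>% mutate(somename = map2_dbl(wt, qsec, ~ sd(c(.x, .y))))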
Update
I re-ran @42-'s microbenchmark analysis (below) using a larger dataset:
library(microbenchmark)
df <- do.call(rbind, lapply(1:100, function(x) mtcars))
res <- microbenchmark(
  orig = {
    df$somename <- as.array(rep(c(0), nrow(df)))
    for (i in 1:nrow(df)) {
      df$somename[i] <- sd(c(df$wt[i], df$qsec[i]))
    }
  },
  tidy = {
    df <- df %>% mutate(somename = map2(wt, qsec, ~sd(c(.x, .y))))
  },
  mapply = {
    df$somename <- mapply(function(x, y) sd(c(x, y)), df$wt, df$qsec)
  },
  rowMeans = {
    df$rm <- rowMeans(df[, c("wt", "qsec")])
    df$sd2col <- sqrt((df$wt - df$rm)^2 + (df$qsec - df$rm)^2)
  }
)
res
#Unit: microseconds
# expr min lq mean median uq max
# orig 331092.86 349754.808 360716.6501 357229.3920 366635.2820 446581.924
# tidy 168701.28 181079.910 189710.1927 187026.6290 194392.5190 273725.354
# mapply 161711.77 172457.395 179326.5484 177263.3045 183688.5365 266102.901
# rowMeans 228.08 315.854 343.9151 334.8975 358.5915 807.847
library(ggplot2)
autoplot(res)
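For exactly two columns, the rowMeans trick above collapses to a closed form: the sample standard deviation of two numbers x and y is |x - y| / sqrt(2). A fully vectorized one-liner (a sketch, not part of the benchmark above) is therefore:
# sd(c(x, y)) equals abs(x - y) / sqrt(2) for any pair of values
df$somename <- abs(df$wt - df$qsec) / sqrt(2)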

More of a comment than an answer:
library(microbenchmark)
microbenchmark(
  orig = {
    df <- mtcars
    df$somename <- as.array(rep(c(0), 32))
    for (i in 1:32) {
      df$somename[i] <- sd(c(df$wt[i], df$qsec[i]))
    }
  },
  tidy = {
    mtcars %>% mutate(somename = map2(wt, qsec, ~sd(c(.x, .y))))
  },
  mapply = {
    mapply(function(x, y) sd(c(x, y)), df$wt, df$qsec)
  }
)
#------------------------------------
Unit: microseconds
   expr      min        lq      mean   median        uq       max neval cld
   orig 5069.391 5161.9270 5555.5886 5236.769 5490.7365 12400.502   100   b
   tidy  910.071  943.9685  986.4419  970.541  998.8075  1241.711   100  a
 mapply  744.639  761.1875  805.6328  773.426  807.2545  2206.393   100  a

Code:
df$somename <- apply(matrix(c(df$wt, df$qsec), ncol=2), MARGIN = 1, FUN=sd)
Output:
> head(df$somename)
somename
1 9.786358
2 10.002025
3 11.518769
4 11.472808
5 9.602510
6 11.851110
7 8.676200
8 11.886465
9 13.965359
10 10.507607
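A slightly shorter equivalent (a sketch; same result, since apply() coerces the selected data frame columns to a matrix itself):
df$somename <- apply(df[, c("wt", "qsec")], MARGIN = 1, FUN = sd)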

Related

Optimize/Vectorize Database Query with R

I am attempting to use R to query a large database. Due to the size of the database, I have written the query to fetch 100 rows at a time. My code looks something like this:
library(RJDBC)
library(DBI)
library(tidyverse)

options(java.parameters = "-Xmx8000m")

drv <- JDBC("driver name", "driver path.jar")
conn <- dbConnect(
  drv,
  "database info",
  "username",
  "password"
)

query <- "SELECT * FROM some_table"

hc <- tibble()
res <- dbSendQuery(conn, query)
repeat {
  chunk <- dbFetch(res, 100)
  if (nrow(chunk) == 0) break
  hc <- bind_rows(hc, chunk)
  print(nrow(hc))
}
Basically, I would like to write something that does the same thing, but via a combination of a function and lapply. In theory, given the way R processes loops, using lapply should speed up the query. Some understanding of the dbFetch function may help, specifically how, in the repeat loop, it doesn't just keep selecting the same initial 100 rows.
I have tried the following, but nothing works:
df_list <- lapply(query, function(x) dbGetQuery(conn, x))

hc <- tibble()
res <- dbSendQuery(conn, query)
test_query <- function(x) {
  chunk <- dbFetch(res, 100)
  if (nrow(chunk) == 0) break
  print(nrow(hc))
}
bind_rows(lapply(test_query, res))
Consider following the example in the dbFetch docs that checks the completion status of the fetch with dbHasCompleted (each dbFetch call advances the result set's cursor, which is why the repeat loop does not keep returning the same initial 100 rows). Then, for memory efficiency, build a list of data frames/tibbles with lapply and row-bind them once outside the loop.
rs <- dbSendQuery(con, "SELECT * FROM some_table")

run_chunks <- function(i, res) {
  # base::transform OR dplyr::mutate
  # base::tryCatch => for empty chunks depending on chunk number
  chunk <- tryCatch(transform(dbFetch(res, 100), chunk_no = i),
                    error = function(e) NULL)
  return(chunk)
}

while (!dbHasCompleted(rs)) {
  # PROVIDE SUFFICIENT NUMBER OF CHUNKS (table rows / fetch rows)
  df_list <- lapply(1:5, run_chunks, res = rs)
}

# base::do.call(rbind, ...) OR dplyr::bind_rows(...)
final_df <- do.call(rbind, df_list)
Demonstration with an in-memory SQLite database of mtcars:
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

run_chunks <- function(i, res) {
  chunk <- dbFetch(res, 10)
  return(chunk)
}

rs <- dbSendQuery(con, "SELECT * FROM mtcars")
while (!dbHasCompleted(rs)) {
  # PROVIDE SUFFICIENT NUMBER OF CHUNKS (table rows / fetch rows)
  df_list <- lapply(1:5, function(i) print(run_chunks(i, res = rs)))
}
do.call(rbind, df_list)

dbClearResult(rs)
dbDisconnect(con)
Output (five chunks of 10, 10, 10, 2, and 0 rows, followed by the full 32 rows from do.call(rbind, df_list)):
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
# 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
# 2 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
# 3 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
# 4 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
# 5 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
# 6 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
# 7 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
# 8 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# 9 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
# 10 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
# 2 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
# 3 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
# 4 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
# 5 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
# 6 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
# 7 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
# 8 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
# 9 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
# 10 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8
# 2 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
# [1] mpg cyl disp hp drat wt qsec vs am gear carb
# <0 rows> (or 0-length row.names)
do.call(rbind, df_list)
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
# 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
# 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
# 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
# 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
# 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
# 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
# 16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
# 17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
# 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# 19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
# 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
# 21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
# 22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
# 23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
# 24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
# 25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
# 26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
# 27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
# 28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
# 29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
# 30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
# 31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
# 32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
The following works well, as it lets the user customize the size and number of chunks. Ideally, the function would be vectorized somehow.
I explored using the number of rows to set the chunk number automatically, but I couldn't find a way to do so without actually running the query first (a sizing sketch follows the solution below). Adding a large number of chunks doesn't add much extra processing time. The performance improvement over the repeat approach depends on the size of the data: the bigger the data, the bigger the improvement.
Chunks of n = 1000 seem to consistently produce the best results. Any suggestions on these points would be much appreciated.
Solution:
library(RJDBC)
library(DBI)
library(dplyr)
library(tidyr)

res <- dbSendQuery(conn, "SELECT * FROM some_table")

## Multiplied together, these need to be greater than N
chunk_size   <- 1000
chunk_number <- 150

run_chunks <- function(chunk_number, res, chunk_size) {
  chunk <- tryCatch(
    dbFetch(res, chunk_size),
    error = function(e) NULL
  )
  if (!is.null(chunk)) {
    return(chunk)
  }
}

dat <- bind_rows(
  lapply(
    1:chunk_number,
    run_chunks,
    res,
    chunk_size
  )
)
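On the open question of sizing chunk_number automatically: one option (a sketch, assuming the backend accepts a plain COUNT(*) and that some_table is the table being fetched) is to run a cheap count query up front and derive the number of chunks from it:
# hypothetical sizing step: one extra aggregate query before the fetch loop
n_rows       <- dbGetQuery(conn, "SELECT COUNT(*) AS n FROM some_table")$n
chunk_size   <- 1000
chunk_number <- ceiling(n_rows / chunk_size)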

How to pass a vector with many column names to data.table for the rowMeans function

I have way too many variables to list them manually inside a rowMeans(cbind()) call. Naturally I tried to pass them packed in a single character vector, but it's not working. I tried eval, .., and mget, yet none of them seems to do the trick:
column_names <- as.vector(summary$variables) #this is where I take the column names from (characters)
dataset[ , means := rowMeans( cbind( eval(column_names) ) , na.rm=TRUE )]
Thanks
You need to use .SD and .SDcols to specify the relevant columns; here is a minimal reproducible example based on mtcars
library(data.table)
dt <- as.data.table(mtcars)
col_names <- c("mpg", "disp", "drat")
dt[, mean := rowMeans(.SD), .SDcols = col_names]
dt
#mpg cyl disp hp drat wt qsec vs am gear carb mean
#1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 61.63333
#2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 61.63333
#3: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 44.88333
#4: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 94.16000
#5: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 127.28333
#6: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 81.95333
#7: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 125.83667
#8: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 58.26333
#9: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 55.84000
#10: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 63.57333
#11: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 63.10667
#12: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 98.42333
#13: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 98.72333
#14: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 98.02333
#15: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 161.77667
#16: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 157.80000
#17: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 152.64333
#18: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 38.39333
#19: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 37.01000
#20: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 36.40667
#21: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 48.43333
#22: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 112.08667
#23: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 107.45000
#24: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 122.34333
#25: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 140.76000
#26: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 36.79333
#27: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 50.24333
#28: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 43.09000
#29: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 123.67333
#30: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 56.10667
#31: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 106.51333
#32: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 48.83667
#mpg cyl disp hp drat wt qsec vs am gear carb mean
So in your case, it would be something like:
dataset[, means := rowMeans(.SD, na.rm = TRUE), .SDcols = column_names]
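If the relevant columns share a naming pattern, .SDcols can also take patterns() in recent data.table versions (a sketch, assuming data.table >= 1.12.0); for example, averaging the columns of dt above whose names start with "d" (disp, drat):
# rowMeans over every column whose name matches the regex
dt[, d_mean := rowMeans(.SD, na.rm = TRUE), .SDcols = patterns("^d")]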

Filter values in multiple dataframes

I cannot get my head around this. I have a dataset that contains one data.frame per day for three years, so I have a list of about 1000 data frames.
I want to filter all the data frames as in the example below. I know I could easily filter first (or use rbindlist) and then do the split, but I'd like a way to apply a filter function to multiple data frames. Can you help me? The code below does not work, but I hope it makes clear what I want to achieve.
dflist <- mtcars %>%
  split(.$cyl)
lapply(dflist, function(x) dplyr::filter(x[["mpg"]] > 10))
filter works on a data.frame/tbl_df. Here, instead, we are extracting a vector (x[["mpg"]]) and applying filter to that, which fails:
library(tidyverse)
filter(mtcars$mpg > 10)
Error in UseMethod("filter_") : no applicable method for 'filter_'
  applied to an object of class "logical"
We need to apply filter to the data.frame itself:
map(dflist, ~ .x %>%
filter(mpg > 10))
#$`4`
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#2 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#3 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#4 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#5 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#6 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#7 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#8 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#9 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#10 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#11 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#$`6`
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#3 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#4 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#5 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#6 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#7 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#$`8`
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#2 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#3 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#4 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#5 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#6 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#7 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#8 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#9 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#10 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#11 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#12 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#13 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#14 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Or using lapply
lapply(dflist, function(x) x %>%
filter(mpg > 10))
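A base R equivalent (a sketch) that skips dplyr for the filtering step:
# plain data.frame subsetting inside lapply
lapply(dflist, function(x) x[x$mpg > 10, ])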

How can I convert df's variable assignment from a for-loop to purrr and dplyr?

The code is from an r4ds exercise:
trans <- list(
  disp = function(x) x * 0.0163871,
  am = function(x) {
    factor(x, labels = c("auto", "manual"))
  }
)
for (var in names(trans)) {
  mtcars[[var]] <- trans[[var]](mtcars[[var]])
}
I studied the next section here, and my question is:
How can I rewrite this code using purrr and dplyr?
Of course, I can do it like this:
mtcars %>%
  mutate(
    disp = disp * 0.0163871,
    am = factor(am, labels = c("auto", "manual"))
  )
But I want to make the best use of FP.
It is very hard for me because it combines variable assignment with purrr.
Here is a purrr/dplyr option using imap_dfc
library(tidyverse)
imap_dfc(trans, ~ mtcars %>% transmute_at(vars(.y), funs(.x))) %>%
  bind_cols(mtcars %>% select(-one_of(names(trans)))) %>%
  select(names(mtcars))
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 21.0 6 2.621936 110 3.90 2.620 16.46 0 manual 4 4
#2 21.0 6 2.621936 110 3.90 2.875 17.02 0 manual 4 4
#3 22.8 4 1.769807 93 3.85 2.320 18.61 1 manual 4 1
#4 21.4 6 4.227872 110 3.08 3.215 19.44 1 auto 3 1
#5 18.7 8 5.899356 175 3.15 3.440 17.02 0 auto 3 2
#6 18.1 6 3.687098 105 2.76 3.460 20.22 1 auto 3 1
#7 14.3 8 5.899356 245 3.21 3.570 15.84 0 auto 3 4
#8 24.4 4 2.403988 62 3.69 3.190 20.00 1 auto 4 2
#9 22.8 4 2.307304 95 3.92 3.150 22.90 1 auto 4 2
#10 19.2 6 2.746478 123 3.92 3.440 18.30 1 auto 4 4
#11 17.8 6 2.746478 123 3.92 3.440 18.90 1 auto 4 4
#12 16.4 8 4.519562 180 3.07 4.070 17.40 0 auto 3 3
#13 17.3 8 4.519562 180 3.07 3.730 17.60 0 auto 3 3
#14 15.2 8 4.519562 180 3.07 3.780 18.00 0 auto 3 3
#15 10.4 8 7.734711 205 2.93 5.250 17.98 0 auto 3 4
#16 10.4 8 7.538066 215 3.00 5.424 17.82 0 auto 3 4
#17 14.7 8 7.210324 230 3.23 5.345 17.42 0 auto 3 4
#18 32.4 4 1.289665 66 4.08 2.200 19.47 1 manual 4 1
#19 30.4 4 1.240503 52 4.93 1.615 18.52 1 manual 4 2
#20 33.9 4 1.165123 65 4.22 1.835 19.90 1 manual 4 1
#21 21.5 4 1.968091 97 3.70 2.465 20.01 1 auto 3 1
#22 15.5 8 5.211098 150 2.76 3.520 16.87 0 auto 3 2
#23 15.2 8 4.981678 150 3.15 3.435 17.30 0 auto 3 2
#24 13.3 8 5.735485 245 3.73 3.840 15.41 0 auto 3 4
#25 19.2 8 6.554840 175 3.08 3.845 17.05 0 auto 3 2
#26 27.3 4 1.294581 66 4.08 1.935 18.90 1 manual 4 1
#27 26.0 4 1.971368 91 4.43 2.140 16.70 0 manual 5 2
#28 30.4 4 1.558413 113 3.77 1.513 16.90 1 manual 5 2
#29 15.8 8 5.751872 264 4.22 3.170 14.50 0 manual 5 4
#30 19.7 6 2.376130 175 3.62 2.770 15.50 0 manual 5 6
#31 15.0 8 4.932517 335 3.54 3.570 14.60 0 manual 5 8
#32 21.4 4 1.982839 109 4.11 2.780 18.60 1 manual 4 2
Explanation: imap_dfc(...) column-binds the two modified columns; these are then column-bound to mtcars minus the two columns that were modified, and the last line rearranges the columns to match the original mtcars column ordering.
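With dplyr >= 1.0.0 the same idea can be written more directly (a sketch, not part of the original answer): across() selects the columns named in trans and cur_column() looks up the matching function for each one:
library(dplyr)
mtcars %>%
  mutate(across(all_of(names(trans)), ~ trans[[cur_column()]](.x)))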
A possible suggestion, though it is just a different coat of the same paint!
result <- mtcars
walk(1:length(trans),
     function(i) result <<- result %>% mutate_at(names(trans)[[i]], trans[[i]]))
result
A better one would be:
result <- mtcars
pmap(list(names(trans), trans),
     function(n, f) result <<- result %>% mutate_at(n, f))
result
And a shorter one:
result <- mtcars
iwalk(trans,
      function(f, n) result <<- result %>% mutate_at(n, f))
result
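A side-effect-free variant of the same idea (a sketch, not from the original answer) threads the data frame through each transformation with purrr::reduce2, avoiding the <<- assignment:
library(purrr)
library(dplyr)
# fold over the (name, function) pairs, starting from mtcars
result <- reduce2(names(trans), trans,
                  function(acc, n, f) mutate_at(acc, n, f),
                  .init = mtcars)
result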

dplyr/rlang: parse_expr with multiple expressions

For example, if I want to parse a string for mutate, I can do:
e1 = "vs + am"
mtcars %>% mutate(!!parse_expr(e1))
But when I want to parse text containing special characters like ",", it gives me an error:
e2 = "vs + am , am +vs"
mtcars %>% mutate(!!parse_expr(e2))
Error in parse(text = x) : <text>:1:9: unexpected ','
1: vs + am ,
^
Are there any ways to work around this?
Thanks
We can use the triple-bang operator with the plural form parse_exprs and a modified e2 expression to parse multiple expressions (see ?parse_quosures):
Explanation:
Multiple expressions in e2 need to be separated either by ; or by new lines.
From ?quasiquotation: The !!! operator unquotes and splices its argument. The argument should represent a list or a vector.
e2 = "vs + am ; am +vs";
mtcars %>% mutate(!!!parse_exprs(e2))
# mpg cyl disp hp drat wt qsec vs am gear carb vs + am am + vs
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 1 1
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 1 1
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 2 2
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 1 1
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 0 0
#6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 1 1
#7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 0 0
#8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 1 1
#9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 1 1
#10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 1 1
#11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 1 1
#12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 0 0
#13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 0 0
#14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 0 0
#15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 0 0
#16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 0 0
#17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 0 0
#18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 2 2
#19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 2 2
#20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 2 2
#21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 1 1
#22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 0 0
#23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 0 0
#24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 0 0
#25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 0 0
#26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 2 2
#27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 1 1
#28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 2 2
#29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 1 1
#30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 1 1
#31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 1 1
#32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 2 2
You could always split them into separate expressions, for example:
e2 = "vs + am"
e3 = "am +vs"
mtcars %>% mutate(!!parse_expr(e2),!!parse_expr(e3))
You can do this with parse_exprs and a semicolon instead of a comma, thanks to @Maurits Evers.
!!! takes a list of elements and splices them into the current call.
e2 = "vs + am ; am +vs"
mtcars %>% mutate(!!!parse_exprs(e2))
Here is a little trick I use to name the variables (as Genom asked).
Example with two named expressions:
across_funs <- function(x, .fns, .cols) {
  stopifnot(length(.fns) == length(.cols))
  stopifnot(all(sapply(.fns, class) == "call"))
  for (i in 1:length(.fns)) {
    x <- x %>% mutate(!!.cols[i] := !!.fns[[i]])
  }
  return(x)
}
funs = parse_exprs(c("vs+am", "am+vs"))
cols = c("var1", "var2")
mtcars %>% across_funs(.fns = funs, .cols = cols)
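A shorter route to the same named columns (a sketch, not from the original answers) is to name the parsed expressions before splicing, since mutate() uses the names of spliced arguments as the new column names:
library(dplyr)
library(rlang)
exprs <- set_names(parse_exprs(c("vs + am", "am + vs")), c("var1", "var2"))
mtcars %>% mutate(!!!exprs)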

Resources