How does the mode formula in R work?

I found that this formula can be used in R to find the mode of any column in a dataset. How does it work?
names(sort(-table(mtcars$wt)))[1]
It can be used to find the mode of the wt column. I need to understand this formula.

To understand what the whole expression does, step through each component.
table tabulates (counts) the occurrences of each unique value in mtcars$wt:
table(mtcars$wt)
# 1.513 1.615 1.835 1.935 2.14 2.2 2.32 2.465 2.62 2.77 2.78 2.875 3.15 3.17 3.19 3.215 3.435 3.44 3.46
# 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1
# 3.52 3.57 3.73 3.78 3.84 3.845 4.07 5.25 5.345 5.424
# 1 2 1 1 1 1 1 1 1 1
Note that the original values of wt are stored as the names of the returned vector.
sort(-table(.)) then brings the most-frequent value to the front (left) and least-frequent value to the back (right).
sort(-table(mtcars$wt))
# 3.44 3.57 1.513 1.615 1.835 1.935 2.14 2.2 2.32 2.465 2.62 2.77 2.78 2.875 3.15 3.17 3.19 3.215 3.435
# -3 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
# 3.46 3.52 3.73 3.78 3.84 3.845 4.07 5.25 5.345 5.424
# -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Sorting on the negative of it is equivalent to sort(table(.), decreasing=TRUE).
names(..) returns the original wt values from this vector, sorted in decreasing order of their counts. Adding [1] returns only the first of those names.
Long story short: this returns the value in mtcars$wt that occurs most often, as a character string (a name) rather than a number. FYI, if multiple values share the top count, this code will not indicate that condition; it silently returns just one of them.
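As a hedged sketch, the same idea can be wrapped in a small helper that also reports ties (get_mode is a made-up name, not from any package):

```r
# Hypothetical helper: returns every value that attains the maximum count,
# converted back to numeric (names(table(x)) are character strings).
get_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

get_mode(mtcars$wt)        # 3.44 (the only weight occurring three times)
get_mode(c(1, 1, 2, 2, 3)) # both 1 and 2, since they tie at two occurrences
```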


Optimize/Vectorize Database Query with R

I am attempting to use R to query a large database. Due to the size of the database, I have written the query to fetch 100 rows at a time. My code looks something like:
library(RJDBC)
library(DBI)
library(tidyverse)

options(java.parameters = "-Xmx8000m")

drv <- JDBC("driver name", "driver path.jar")
conn <- dbConnect(
  drv,
  "database info",
  "username",
  "password"
)

query <- "SELECT * FROM some_table"
hc <- tibble()
res <- dbSendQuery(conn, query)
repeat {
  chunk <- dbFetch(res, 100)
  if (nrow(chunk) == 0) break
  hc <- bind_rows(hc, chunk)
  print(nrow(hc))
}
Basically, I would like to write something that does the same thing, but via a combination of a function and lapply. In theory, given how R handles loops, lapply should speed up the query. Some understanding of the dbFetch function may help, specifically how, inside the repeat loop, it doesn't just keep selecting the same initial 100 rows.
I have tried the following, but nothing works:
df_list <- lapply(query, function(x) dbGetQuery(conn, x))

hc <- tibble()
res <- dbSendQuery(conn, query)
test_query <- function(x) {
  chunk <- dbFetch(res, 100)
  if (nrow(chunk) == 0) {break}
  print(nrow(hc))
}
bind_rows(lapply(test_query, res))
Consider following the example in the dbFetch docs that checks the completed status of the fetch with dbHasCompleted. Then, for memory efficiency, build a list of data frames/tibbles with lapply and row-bind once outside the loop.
rs <- dbSendQuery(con, "SELECT * FROM some_table")

run_chunks <- function(i, res) {
  # base::transform OR dplyr::mutate
  # base::tryCatch => for empty chunks depending on chunk number
  chunk <- tryCatch(transform(dbFetch(res, 100), chunk_no = i),
                    error = function(e) NULL)
  return(chunk)
}

while (!dbHasCompleted(rs)) {
  # PROVIDE SUFFICIENT NUMBER OF CHUNKS (table rows / fetch rows)
  df_list <- lapply(1:5, run_chunks, res = rs)
}

# base::do.call(rbind, ...) OR dplyr::bind_rows(...)
final_df <- do.call(rbind, df_list)
Demonstration with an in-memory SQLite database of mtcars:
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

run_chunks <- function(i, res) {
  chunk <- dbFetch(res, 10)
  return(chunk)
}

rs <- dbSendQuery(con, "SELECT * FROM mtcars")
while (!dbHasCompleted(rs)) {
  # PROVIDE SUFFICIENT NUMBER OF CHUNKS (table rows / fetch rows)
  df_list <- lapply(1:5, function(i)
    print(run_chunks(i, res = rs))
  )
}

do.call(rbind, df_list)
dbClearResult(rs)
dbDisconnect(con)
Output (five chunks of 10, 10, 10, 2, and 0 rows, followed by the full 32 rows after binding):
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
# 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
# 2 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
# 3 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
# 4 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
# 5 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
# 6 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
# 7 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
# 8 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# 9 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
# 10 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
# 2 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
# 3 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
# 4 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
# 5 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
# 6 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
# 7 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
# 8 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
# 9 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
# 10 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8
# 2 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
# [1] mpg cyl disp hp drat wt qsec vs am gear carb
# <0 rows> (or 0-length row.names)
do.call(rbind, df_list)
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
# 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
# 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
# 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
# 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
# 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
# 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
# 16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
# 17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
# 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# 19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
# 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
# 21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
# 22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
# 23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
# 24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
# 25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
# 26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
# 27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
# 28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
# 29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
# 30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
# 31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
# 32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
The following works well, as it allows the user to customize the size and number of chunks. Ideally, the function would be vectorized somehow.
I explored getting the number of rows to set the chunk number automatically, but I couldn't find any way to do so without actually performing the query first. Specifying a generous number of chunks doesn't add much extra processing time. The performance improvement over the repeat approach depends on the size of the data: the bigger the data, the bigger the improvement.
Chunks of n = 1000 seem to consistently produce the best results. Any suggestions on these points would be much appreciated.
Solution:
library(RJDBC)
library(DBI)
library(dplyr)
library(tidyr)

res <- dbSendQuery(conn, "SELECT * FROM some_table")

## chunk_size * chunk_number must be greater than the number of rows, N
chunk_size <- 1000
chunk_number <- 150

run_chunks <- function(chunk_number, res, chunk_size) {
  chunk <- tryCatch(
    dbFetch(res, chunk_size),
    error = function(e) NULL
  )
  if (!is.null(chunk)) {
    return(chunk)
  }
}

dat <- bind_rows(
  lapply(
    1:chunk_number,
    run_chunks,
    res,
    chunk_size
  )
)
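On the open point of setting the chunk number automatically: one hedged option is a cheap SELECT COUNT(*) before the main query, assuming the backend supports it. A minimal sketch with an in-memory SQLite table standing in for the real database (the table name some_table is from the example above):

```r
library(DBI)

# Sketch: derive chunk_number from COUNT(*) instead of guessing a large value.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "some_table", mtcars)

chunk_size   <- 10
n            <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM some_table")$n
chunk_number <- ceiling(n / chunk_size)   # 4 chunks for 32 rows

res <- dbSendQuery(con, "SELECT * FROM some_table")
dat <- do.call(rbind, lapply(seq_len(chunk_number),
                             function(i) dbFetch(res, chunk_size)))
dbClearResult(res)
dbDisconnect(con)

nrow(dat)  # 32
```

Whether the extra round trip pays off depends on how expensive COUNT(*) is for the query in question.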

Find values of a vector to separate data into groups with the same number of data points in each group

Let's see if I can explain this clearly...
Say I have the vector mtcars$mpg. If you do hist(mtcars$mpg), you see that there are 6 values between 10 and 15, 12 between 15 and 20, and so on.
What I'm trying to do is find the values of mtcars$mpg that I can later use to separate the data into groups, where each group has the same number of data points.
For instance, maybe 10, 16 and 22 would give 8 data points between 10 and 16 and another 8 between 16 and 22.
(I looked on SO but can't find any questions/answers that address this)
Since mpg is a continuous variable, you can group the data by sorting the dataframe by its values and then simply adding a grouping variable with rep(x, each = n). For example, using base R with n <- 8 for groups of 8:
n <- 8
df <- mtcars[order(mtcars$mpg), ]
df$group <- rep(1:(nrow(df) / n), each = n)
Calling the following returns the first observation from each group (these are your cutoffs) and joins it to the original dataframe:
cutoffs <- aggregate(df$mpg, list(group = df$group), `[`, 1)
merge(df, cutoffs, by = "group")
#### OUTPUT ####
group mpg cyl disp hp drat wt qsec vs am gear carb x
1 1 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 10.4
2 1 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 10.4
3 1 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 10.4
4 1 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 10.4
5 1 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 10.4
6 1 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 10.4
7 1 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 10.4
8 1 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 10.4
9 2 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 15.5
10 2 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 15.5
11 ...
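A related base-R sketch derives the cutoffs directly with quantile() and cut(); note the group counts can shift by one when ties fall on a boundary:

```r
# Split mpg into n_groups intervals, each holding (roughly) the same
# number of observations; the breaks are the cutoff values themselves.
n_groups <- 4
breaks <- quantile(mtcars$mpg, probs = seq(0, 1, length.out = n_groups + 1))
groups <- cut(mtcars$mpg, breaks = breaks,
              include.lowest = TRUE, labels = FALSE)
table(groups)  # roughly 8 observations per group
breaks         # the cutoff values
```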
If you feel comfortable with dplyr, you can use ntile, left_join, and summarise:
library(dplyr)

mutate(mtcars, group = ntile(mpg, 4)) %>%
  group_by(group) %>%
  left_join(summarise(., cutoff = first(mpg, order_by = mpg)), by = "group") %>%
  arrange(mpg)
#### OUTPUT ####
# A tibble: 32 x 13
mpg cyl disp hp drat wt qsec vs am gear carb group cutoff
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4 1 10.4
2 10.4 8 460 215 3 5.42 17.8 0 0 3 4 1 10.4
3 13.3 8 350 245 3.73 3.84 15.4 0 0 3 4 1 10.4
4 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 1 10.4
5 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4 1 10.4
6 15 8 301 335 3.54 3.57 14.6 0 1 5 8 1 10.4
7 15.2 8 276. 180 3.07 3.78 18 0 0 3 3 1 10.4
8 15.2 8 304 150 3.15 3.44 17.3 0 0 3 2 1 10.4
9 15.5 8 318 150 2.76 3.52 16.9 0 0 3 2 2 15.5
10 15.8 8 351 264 4.22 3.17 14.5 0 1 5 4 2 15.5
# … with 22 more rows

Scope through variables using mutate_at/ ifelse creating new variables

I have this code checking for outliers (pseudo-outliers in this data: only 1.25 sd above the mean in this example) using a function. To scale it up to many variables, is there a way to avoid specifying each ifelse?
library(tidyverse)
meanplusd <- function(var) {
  mean(var, na.rm = TRUE) + (1.25 * sd(var, na.rm = TRUE))
}

mtcars %>%
  mutate_at(vars(drat:qsec), .funs = list(meanplus = ~ meanplusd(.))) %>%
  mutate(outlier_drat = ifelse(drat > drat_meanplus, 1, 0),
         outlier_wt = ifelse(wt > wt_meanplus, 1, 0),
         outlier_qsec = ifelse(qsec > qsec_meanplus, 1, 0)) %>%
  filter_at(vars(outlier_drat:outlier_qsec), any_vars(. == 1)) %>%
  select(-c(drat_meanplus:qsec_meanplus))
mpg cyl disp hp drat wt qsec vs am gear carb outlier_drat outlier_wt outlier_qsec
1 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 0 0 1
2 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 0 0 1
3 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 0 1 0
4 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 0 1 0
5 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 0 1 0
6 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 1 0 0
7 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 1 0 0
Open to non-tidyverse ways too for learning purposes.
You could determine outliers all in one function:
is_outlier <- function(var) {
  as.numeric(var > na.omit(var) %>% {mean(.) + 1.25 * sd(.)})
}

mtcars %>%
  mutate_at(vars(drat:qsec), .funs = list(outlier = ~ is_outlier(.))) %>%
  filter_at(vars(drat_outlier:qsec_outlier), any_vars(. == 1))
mpg cyl disp hp drat wt qsec vs am gear carb drat_outlier wt_outlier qsec_outlier
1 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 0 0 1
2 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 0 0 1
3 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 0 1 0
4 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 0 1 0
5 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 0 1 0
6 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 1 0 0
7 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 1 0 0
If you only want to filter the rows, you can use filter_at directly and apply the meanplusd function:
library(dplyr)
mtcars %>% filter_at(vars(drat:qsec), any_vars(. > meanplusd(.)))
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#2 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#3 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#4 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#5 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#6 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#7 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Or, in base R, we can use sapply over the selected columns and then rowSums:
mtcars[rowSums(sapply(mtcars[5:7], function(x) x > meanplusd(x))) > 0, ]
However, if you want new columns flagging the outliers, you can do something like:
df <- mtcars
cols <- names(df)[5:7]
df[paste0(cols, "_outlier")] <- lapply(mtcars[cols],function(x) +(x > meanplusd(x)))
df[rowSums(df[paste0(cols, "_outlier")]) > 0, ]
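For what it's worth, mutate_at has since been superseded; in dplyr >= 1.0 the same flags can be sketched with across() and its .names argument (this assumes the meanplusd helper from above, and that a recent dplyr is available):

```r
library(dplyr)

meanplusd <- function(var) {
  mean(var, na.rm = TRUE) + 1.25 * sd(var, na.rm = TRUE)
}

# across() applies the flag to each column and names the results
# drat_outlier, wt_outlier, qsec_outlier via the .names template.
flagged <- mtcars %>%
  mutate(across(drat:qsec, ~ +(.x > meanplusd(.x)),
                .names = "{.col}_outlier")) %>%
  filter(if_any(ends_with("_outlier"), ~ .x == 1))
```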

dplyr group_by and arrange together don't group the same values together

I am using the mtcars built-in dataset. My code is as following:
data("mtcars")
a <- mtcars %>%
  group_by(cyl) %>%
  arrange(hp)
The output that I get:
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2
2 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
3 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1
4 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1
5 27.3 4 79 66 4.08 1.94 18.9 1 1 4 1
6 26 4 120. 91 4.43 2.14 16.7 0 1 5 2
7 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
8 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
9 21.5 4 120. 97 3.7 2.46 20.0 1 0 3 1
10 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
11 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
12 21 6 160 110 3.9 2.62 16.5 0 1 4 4
13 21 6 160 110 3.9 2.88 17.0 0 1 4 4
14 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
15 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2
16 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
17 17.8 6 168. 123 3.92 3.44 18.9 1 0 4 4
18 15.5 8 318 150 2.76 3.52 16.9 0 0 3 2
19 15.2 8 304 150 3.15 3.44 17.3 0 0 3 2
20 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
21 19.2 8 400 175 3.08 3.84 17.0 0 0 3 2
22 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
23 16.4 8 276. 180 3.07 4.07 17.4 0 0 3 3
24 17.3 8 276. 180 3.07 3.73 17.6 0 0 3 3
25 15.2 8 276. 180 3.07 3.78 18 0 0 3 3
26 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4
27 10.4 8 460 215 3 5.42 17.8 0 0 3 4
28 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
29 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
30 13.3 8 350 245 3.73 3.84 15.4 0 0 3 4
31 15.8 8 351 264 4.22 3.17 14.5 0 1 5 4
32 15 8 301 335 3.54 3.57 14.6 0 1 5 8
As you can see, group_by appears redundant in this output: I only get my data arranged by the hp column. I don't understand what I am doing wrong. I want to see everything grouped by the cyl column and then arranged by hp.
Grouping isn't really related to sorting. Also, group_by isn't redundant (in the sense of being completely ignored), as the second line of the output is
# Groups: cyl [3]
To see that group_by doesn't sort, just try
mtcars %>% group_by(cyl) %>% print(n = Inf)
Hence, what you want is first to arrange by cyl and then by hp:
mtcars %>% arrange(cyl, hp)
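If you do want arrange() to respect an existing grouping, it also has a .by_group argument that sorts by the grouping variables first (available in reasonably recent dplyr versions; treat that availability as an assumption):

```r
library(dplyr)

# Equivalent to arrange(cyl, hp), but expressed through the grouping:
# rows are sorted by cyl first, then by hp within each cyl group.
b <- mtcars %>%
  group_by(cyl) %>%
  arrange(hp, .by_group = TRUE)
```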

dplyr/rlang: parse_expr with multiple expressions

For example, if I want to parse some string into mutate, I can do
e1 = "vs + am"
mtcars %>% mutate(!!parse_expr(e1))
But when I want to parse text containing special characters like ",", it gives me an error:
e2 = "vs + am , am +vs"
mtcars %>% mutate(!!parse_expr(e2))
Error in parse(text = x) : <text>:1:9: unexpected ','
1: vs + am ,
^
Are there any ways to work around this?
Thanks
We can use the triple-bang operator !!! with the plural form parse_exprs and a modified e2 expression to parse multiple expressions (see ?parse_quosures):
Explanation:
Multiple expressions in e2 need to be separated either by ; or by new lines.
From ?quasiquotation: The !!! operator unquotes and splices its argument. The argument should represent a list or a vector.
e2 = "vs + am ; am + vs"
mtcars %>% mutate(!!!parse_exprs(e2))
# mpg cyl disp hp drat wt qsec vs am gear carb vs + am am + vs
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 1 1
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 1 1
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 2 2
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 1 1
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 0 0
#6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 1 1
#7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 0 0
#8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 1 1
#9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 1 1
#10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 1 1
#11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 1 1
#12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 0 0
#13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 0 0
#14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 0 0
#15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 0 0
#16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 0 0
#17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 0 0
#18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 2 2
#19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 2 2
#20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 2 2
#21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 1 1
#22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 0 0
#23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 0 0
#24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 0 0
#25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 0 0
#26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 2 2
#27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 1 1
#28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 2 2
#29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 1 1
#30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 1 1
#31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 1 1
#32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 2 2
You could always split them outside the expressions, for example:
e2 = "vs + am"
e3 = "am +vs"
mtcars %>% mutate(!!parse_expr(e2),!!parse_expr(e3))
You can do this with parse_exprs and a semicolon instead of a comma, thanks to @Maurits Evers.
!!! takes a list of elements and splices them into the current call.
e2 = "vs + am ; am + vs"
mtcars %>% mutate(!!!parse_exprs(e2))
Here's a little trick I use to name the variables (as Genom asked).
Example with 2 named expressions:
across_funs <- function(x, .fns, .cols) {
  stopifnot(length(.fns) == length(.cols))
  stopifnot(all(sapply(.fns, class) == "call"))
  for (i in 1:length(.fns)) {
    x <- x %>% mutate(!!.cols[i] := !!.fns[[i]])
  }
  return(x)
}
funs = parse_exprs(c("vs+am", "am+vs"))
cols = c("var1", "var2")
mtcars %>% across_funs(.fns = funs, .cols = cols)
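A hedged shortcut for the same naming problem: splicing a named list of expressions into mutate() names the resulting columns directly, so the loop above can often be avoided (set_names is from rlang; the names var1/var2 are arbitrary):

```r
library(dplyr)
library(rlang)

# Name the parsed expressions, then splice: mutate() receives them
# as named arguments, so the new columns get those names.
exprs <- set_names(parse_exprs("vs + am ; am + vs"), c("var1", "var2"))
out <- mtcars %>% mutate(!!!exprs)
names(out)[12:13]  # "var1" "var2"
```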
