Use mutate in purrr workflow - r

I have the following list of data frames:
dflist <- list(mtcars, mtcars)
dflist[[1]] %>%
mutate(cyl2 = cyl * 2)
This works!
dflist %>%
map(.x, ~.x$cyl2 = .x$cyl * 2)
Error: unexpected '=' in:
"dflist %>%
map(.x, ~x$cyl2 ="
This results in an error. I tried other options, but the function does not accept the = sign. What is wrong there?

Try:
library(dplyr)
library(purrr)
dflist %>% map(~.x %>% mutate(cyl2 = cyl * 2))
#[[1]]
# mpg cyl disp hp drat wt qsec vs am gear carb cyl2
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 12
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 12
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 8
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 12
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 16
#....
#[[2]]
# mpg cyl disp hp drat wt qsec vs am gear carb cyl2
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 12
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 12
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 8
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 12
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 16
#...
Or keeping it in base R:
lapply(dflist, function(x) transform(x, cyl2 = cyl * 2))
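As for what is wrong in the question: inside a function call, name = value is only valid when the name is a plain name (or a string), so ~.x$cyl2 = ... is a syntax error. Also, the pipe already supplies dflist as the first argument to map(), so the extra .x is not needed. If you prefer the $<- style, here is a minimal sketch that keeps it, using a braced formula lambda that assigns with <- and then returns the modified data frame:

```r
library(purrr)

dflist <- list(mtcars, mtcars)

# A braced formula lambda can hold several statements: assign the new
# column with `<-`, then return the modified data frame as the last value
out <- map(dflist, ~ {
  .x$cyl2 <- .x$cyl * 2
  .x
})

head(out[[1]], 3)
```

The assignment only changes the lambda's local copy of each data frame, so dflist itself is left untouched; the results are in out.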

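You can also try purrr's modify() with update_list() (recent purrr releases deprecate update_list(), so this may produce a deprecation warning):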
modify(dflist, ~ update_list(., cyl2 = ~ cyl * 2))
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb cyl2
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 12
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 12
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 8
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 12
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 16
[[2]]
mpg cyl disp hp drat wt qsec vs am gear carb cyl2
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 12
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 12
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 8
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 12
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 16

In base R, transform can also be passed to lapply directly, without wrapping it in an anonymous function:
lapply(dflist, transform, cyl2 = cyl * 2)

Related

Mutate specific columns and add pattern to the name

I would like to use mutate to add new columns to a data.frame by dividing specific columns by another column, naming each new column after the original column plus a fixed suffix.
mtcars$mpg_HorsePower = mtcars$mpg / mtcars$hp
mtcars$cyl_HorsePower = mtcars$cyl / mtcars$hp
mtcars$disp_HorsePower = mtcars$disp / mtcars$hp
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb mpg_HorsePower cyl_HorsePower disp_HorsePower
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 0.1909091 0.05454545 1.454545
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 0.1909091 0.05454545 1.454545
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 0.2451613 0.04301075 1.161290
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 0.1945455 0.05454545 2.345455
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 0.1068571 0.04571429 2.057143
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 0.1723810 0.05714286 2.142857
I was hoping that something like this
mtcars %>%
mutate_at(vars(mpg:disp), funs(. / hp))
would work, but it does not add the new columns.
Using dplyr::across you could achieve your desired result like so:
library(dplyr, warn.conflicts = FALSE)
mtcars2 <- mtcars %>%
mutate(across(mpg:disp, list(Horsepower = ~. / hp)))
head(mtcars2)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
#> mpg_Horsepower cyl_Horsepower disp_Horsepower
#> Mazda RX4 0.1909091 0.05454545 1.454545
#> Mazda RX4 Wag 0.1909091 0.05454545 1.454545
#> Datsun 710 0.2451613 0.04301075 1.161290
#> Hornet 4 Drive 0.1945455 0.05454545 2.345455
#> Hornet Sportabout 0.1068571 0.04571429 2.057143
#> Valiant 0.1723810 0.05714286 2.142857
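The question asked for the exact suffix _HorsePower. A sketch, assuming dplyr 1.0.0 or later: across() accepts a .names argument, a glue-style template in which {.col} is replaced by each selected column's name. As for the original attempt, mutate_at() with an unnamed funs() most likely overwrote mpg, cyl and disp in place instead of adding columns, and the result was not assigned back, which would explain why it seemed to do nothing.

```r
library(dplyr, warn.conflicts = FALSE)

# .names is a glue-style template: {.col} becomes the source column's name,
# so mpg -> mpg_HorsePower, cyl -> cyl_HorsePower, disp -> disp_HorsePower
mtcars2 <- mtcars %>%
  mutate(across(mpg:disp, ~ .x / hp, .names = "{.col}_HorsePower"))

head(mtcars2[, c("mpg_HorsePower", "cyl_HorsePower", "disp_HorsePower")])
```

Because new names are supplied, the original mpg, cyl and disp columns are kept unchanged alongside the ratios.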

If column A equals criteria return value of column B in column C

Using the built-in R dataset mtcars, I want to make a column called "want".
mtcars$want<-NA
When column "carb" (column A) is equal to 1, put the value of column "qsec" (column B) into column "want" (column C).
If carb is not equal to 1, do nothing.
The first 5 rows of the new dataset should look like this:
mpg cyl disp hp drat wt qsec vs am gear carb want
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 NA
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 NA
Datsun 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 18.61
Hornet Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 19.44
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 NA
This should do the job:
mtcars$want <- ifelse(mtcars$carb == 1, mtcars$qsec, NA)
head(mtcars, 5)
mpg cyl disp hp drat wt qsec vs am gear carb want
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 NA
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 NA
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 18.61
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 19.44
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 NA
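If you only want blank cells in the printed output, you could use an empty string instead of NA. Be aware that this is not only a display change: because ifelse() returns a single vector, the numbers and "" are coerced to a common type, so want becomes a character column (e.g. "18.61") rather than a numeric one: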
mtcars$want <- ifelse(mtcars$carb == 1, mtcars$qsec, "")
head(mtcars, 5)
mpg cyl disp hp drat wt qsec vs am gear carb want
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 18.61
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 19.44
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
If it is helpful, a loop over the rows should also work. One can modify the loop or add further conditionals as appropriate to fill in the other values of the column. Note that this version starts want at 0 rather than NA, so rows where carb is not 1 show 0 instead of NA:
#written in R version 4.2.1
data(mtcars)
mtcars$want = 0
# visit every row; copy qsec into want only where carb == 1
for (i in seq_len(nrow(mtcars))) {
  if (mtcars$carb[i] == 1) {
    mtcars$want[i] = mtcars$qsec[i]
  }
}
Result:
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb want
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 0.00
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 0.00
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 18.61
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 19.44
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 0.00
#Valiant
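For comparison, the same row-by-row rule can be written without a loop by using a logical index. This is a minimal base R sketch (is_carb1 is just an illustrative name); unlike the loop, it starts from NA, which matches the expected output in the question:

```r
data(mtcars)

# Start from NA so rows that fail the condition stay "empty"
mtcars$want <- NA_real_

# TRUE for the rows where carb equals 1
is_carb1 <- mtcars$carb == 1

# Overwrite only those rows; all other rows keep their NA
mtcars$want[is_carb1] <- mtcars$qsec[is_carb1]

head(mtcars, 5)
```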
What you can do is first give the new column "want" a default value, for example 2. Then use ifelse() to apply your criterion: return qsec where carb == 1, and otherwise return the existing value of "want" (i.e. do nothing), like this:
mtcars$want <- 2
library(dplyr)
mtcars %>%
mutate(want = ifelse(carb == 1, qsec, want)) %>%
head(5)
#> mpg cyl disp hp drat wt qsec vs am gear carb want
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2.00
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 2.00
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 18.61
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 19.44
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 2.00
Created on 2022-06-30 by the reprex package (v2.0.1)

how to provide expression to `purrr::pmap` for function using tidy evaluation

I am trying to write a function using rlang so that I can subset data based on supplied expression. Although the actual function is complicated, here is a minimal version of it that illustrates the problem.
minimal version of needed function
library(rlang)
# define a function
foo <- function(data, expr = NULL) {
if (!quo_is_null(enquo(expr))) {
dplyr::filter(data, !!enexpr(expr))
} else {
data
}
}
# does the function work? yes
head(foo(mtcars, NULL)) # with NULL
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
head(foo(mtcars, mpg > 20)) # with expression
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
problems with purrr::pmap
When used with purrr::pmap(), it works as expected when expr is NULL, but not otherwise. Instead of list, I also tried using alist to supply the input.
library(purrr)
# works when expression is `NULL`
pmap(
.l = list(data = list(head(mtcars)), expr = list(NULL)),
.f = foo
)
#> [[1]]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# but not otherwise
pmap(
.l = list(data = list(head(mtcars)), expr = list("mpg > 20")),
.f = foo
)
#> Error: Problem with `filter()` input `..1`.
#> ℹ Input `..1` is `"mpg > 20"`.
#> x Input `..1` must be a logical vector, not a character.
Created on 2021-07-20 by the reprex package (v2.0.0)
One way to make this work is to wrap the condition in quote(), so that pmap() passes an unevaluated call (mpg > 20) rather than a character string:
purrr::pmap(
.l = list(data = list(head(mtcars)), expr = list(quote(mpg > 20))),
.f = foo
)
Output:
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
This also works with NULL:
pmap(
.l = list(data = list(head(mtcars)), expr = list(quote(NULL))),
.f = foo
)
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
This gives the same rows as base R's subset():
subset(head(mtcars), mpg > 20)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
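Another option is to modify the function so that it parses a string with parse_expr() instead of capturing the argument with enexpr(). Note that foo1() then expects the condition as a string (or NULL), not as a bare expression: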
foo1 <- function(data, expr = NULL) {
if (!quo_is_null(enquo(expr))) {
dplyr::filter(data, !!parse_expr(expr))
} else {
data
}
}
Testing:
> pmap(
+ .l = list(data = list(head(mtcars)), expr = list(NULL)),
+ .f = foo1
+ )
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
>
> pmap(
+ .l = list(data = list(head(mtcars)), expr = list("mpg > 20")),
+ .f = foo1
+ )
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
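A third approach, sketched here under the assumption that you never need to pass bare (unquoted) expressions to the function, is to treat expr as an ordinary evaluated argument. foo2 is a hypothetical name: it accepts NULL, a call created with quote(), or a string, converts strings with rlang::parse_expr(), and splices the resulting call into dplyr::filter() with !!:

```r
library(purrr)

# Hypothetical variant: `expr` is evaluated normally, so pmap() can hand it
# a call, a string, or NULL without any capture tricks
foo2 <- function(data, expr = NULL) {
  if (is.null(expr)) {
    return(data)
  }
  if (is.character(expr)) {
    expr <- rlang::parse_expr(expr)  # "mpg > 20" becomes the call mpg > 20
  }
  dplyr::filter(data, !!expr)        # splice the call into the data mask
}

res <- pmap(
  .l = list(
    data = list(head(mtcars), head(mtcars), head(mtcars)),
    expr = list(NULL, quote(mpg > 20), "mpg > 20")
  ),
  .f = foo2
)

vapply(res, nrow, integer(1))
```

The trade-off is that foo2(mtcars, mpg > 20) no longer works, because mpg > 20 is evaluated before foo2 sees it; you would call foo2(mtcars, quote(mpg > 20)) or foo2(mtcars, "mpg > 20") instead.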

count number of rows needed to have sum greater than a particular value in R

I want to subset the data frame to the number of leading rows needed for the mpg values to sum to at least 100.
library(datasets)
data(mtcars)
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
The output should be the first 5 rows; here the cumulative mpg sum passes 100 at Hornet Sportabout (21.0 + 21.0 + 22.8 + 21.4 + 18.7 = 104.9):
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
In other words, I want the cumulative sum of the mpg column at each row, and then the number of rows it took for that sum to reach at least 100.
I would use cumsum() together with lag(), so that a row is kept while the total of the rows before it is still below 100:
library(dplyr)
mtcars %>%
filter(cumsum(lag(mpg, default = 0)) < 100)
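This should solve it; it keeps the rows whose running total is still below 100 and adds one for the row that crosses the threshold: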
library(tidyverse)
df_answer <- mtcars %>%
rownames_to_column() %>%
tibble() %>%
mutate(cum_sum = cumsum(mpg)) %>%
filter(cum_sum < 100)
df_answer %>%
nrow() + 1
A base R option using subset + cumsum (a row is kept if the running total of the rows before it is still below 100):
subset(mtcars, c(TRUE, head(cumsum(mpg), -1) < 100))
gives
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
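You can also use purrr: accumulate() with `+` builds the running total, and which.max() on the logical vector returns the first position where it exceeds 100: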
library(purrr)
which.max(purrr::accumulate(mtcars$mpg, `+`) > 100)
# 5
If you want the whole dataset you can use dplyr::slice:
library(tidyverse)
dplyr::slice(mtcars, 1 : which.max(purrr::accumulate(mtcars$mpg, `+`) > 100))
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
You can use a filter condition with dplyr:
library(tidyverse)
mtcars %>%
filter(row_number() %in% 1:(max(which(cumsum(mpg) < 100)) + 1))
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
The code can be shortened with slice:
mtcars %>%
slice(1:(max(which(cumsum(mpg) < 100)) + 1))
And packaged as a function:
fnc = function(data, var, cutoff) {
data %>%
slice(1:(max(which(cumsum({{var}}) < cutoff)) + 1))
}
fnc(mtcars, mpg, 100)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
fnc(iris, Sepal.Width, 10)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
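If only the count is needed, a small base R helper avoids the max(which(...)) + 1 arithmetic, which can misbehave when the very first value already reaches the cutoff (which() then returns an empty vector). This is a hypothetical helper, not part of any package: match(TRUE, ...) returns the first position where the running total reaches the cutoff, and NA if it never does.

```r
# Hypothetical helper: how many leading values of `x` are needed for the
# running total to reach at least `cutoff`? Returns NA if it never does.
rows_to_reach <- function(x, cutoff) {
  match(TRUE, cumsum(x) >= cutoff)
}

n <- rows_to_reach(mtcars$mpg, 100)
n
head(mtcars, n)
```

The same helper works for the iris example: rows_to_reach(iris$Sepal.Width, 10) gives 4.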

R: Sort columns by object class

Can you sort the columns of a data frame by their class? Say
data("mtcars")
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$am <- as.factor(mtcars$am)
sapply(mtcars,class)
and I want all numeric variables first and then all factors at the end? I want to be able to do this on a much larger dataset so I prefer solutions that do not rely on subsetting by column number. Cheers.
Maybe this one?
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
x <- mtcars[, names(sort(unlist(lapply(mtcars, class)), decreasing = TRUE))]
head(x)
# mpg disp hp drat wt qsec gear carb cyl vs am
# Mazda RX4 21.0 160 110 3.90 2.620 16.46 4 4 6 0 1
# Mazda RX4 Wag 21.0 160 110 3.90 2.875 17.02 4 4 6 0 1
# Datsun 710 22.8 108 93 3.85 2.320 18.61 4 1 4 1 1
# Hornet 4 Drive 21.4 258 110 3.08 3.215 19.44 3 1 6 1 0
# Hornet Sportabout 18.7 360 175 3.15 3.440 17.02 3 2 8 0 0
# Valiant 18.1 225 105 2.76 3.460 20.22 3 1 6 1 0
In x, as you can see, the columns cyl, vs and am, which are of class factor, are placed at the end and the numeric columns come first. This works because, in decreasing alphabetical order, "numeric" sorts before "factor".
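Sorting class names alphabetically only gives "numeric first" by coincidence, and unlist() can misalign names when a column has more than one class (POSIXct columns, for example). A sketch that states the desired order explicitly, assuming a priority vector of your choosing (class_priority and col_rank are illustrative names):

```r
data(mtcars)
mtcars[c("cyl", "vs", "am")] <- lapply(mtcars[c("cyl", "vs", "am")], as.factor)

# Assumed priority order of classes; any class not listed is placed last
class_priority <- c("numeric", "integer", "character", "factor")

# First class of each column (some columns have several classes)
first_class <- vapply(mtcars, function(col) class(col)[1], character(1))

# Position of each column's class in the priority list
col_rank <- match(first_class, class_priority, nomatch = length(class_priority) + 1L)

# order() leaves ties in their original order, so columns of the same
# class keep the sequence they had before
sorted <- mtcars[, order(col_rank)]
names(sorted)
```

Because the order comes from class_priority rather than from the spelling of the class names, putting factors first is just a matter of reordering that vector.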
