Adding row in dplyr across a selected number of columns - r

Within a dplyr workflow, I would like to add a row that fills only a selected set of columns.
Desired results
Starting with the mtcars data, the goal is to add the string "A" to columns 2:5 in a new first row, which should give the following result:
mpg cyl disp hp drat wt qsec vs am gear carb
NA A A A A NA NA NA NA NA NA
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
The result meets the following criteria:
Columns whose index is passed to the vars() call receive the string "A".
All remaining columns receive NA.
Approach
require(dplyr)
mtcars %>%
mutate_at(.cols = vars(2:5),
.funs = add_row(. = "A", .before = 1))
Naturally, this results in an error message:
Error: Unsupported index type: NULL
Hence my question: how can I use add_row, or a similar approach, to set a value across a set of columns passed via vars()?
Side notes
I don't mind doing this via rbind but I would like to keep my %>% workflow:
%>% - receive object
Add something across first row to columns x:y %>%
Add something across first row to columns m:n %>%
Other manipulations

Add the row then update:
mtcars %>%
head %>%
add_row(.before = 1) %>%
mutate_at(.cols = vars(2:5),
funs(ifelse(is.na(.), "A", .)))
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 NA A A A A NA NA NA NA NA NA
# 2 21.0 6 160 110 3.9 2.620 16.46 0 1 4 4
# 3 21.0 6 160 110 3.9 2.875 17.02 0 1 4 4
# 4 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# 5 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# 6 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# 7 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Note: this replaces every NA in columns 2:5 with "A", not only the NAs in the newly added row.
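In newer dplyr releases (1.0.0 and later) the scoped verbs such as mutate_at() are superseded by across(), and funs() is deprecated. A minimal sketch of the same idea under that assumption follows. It also restricts the replacement to the newly added first row, so pre-existing NAs are left alone. The name out is only illustrative.

```r
library(dplyr)
library(tibble)

out <- mtcars %>%
  head() %>%
  # add an all-NA row at the top
  add_row(.before = 1) %>%
  # convert columns 2:5 to character and overwrite only row 1
  mutate(across(2:5, ~ replace(as.character(.x), 1, "A")))

out
```

Because "A" is a string, columns 2:5 become character columns, as they do in the mutate_at() version.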

Related

Creating a column out of quartiles from another column

I want to make a new column out of quartiles from another column
df <- within(df, quartile <- as.integer(cut(df$x, quantile(df$x, probs=0:4/4), include.lowest=TRUE)))
When I use this code I get
Error in `[<-.data.frame`(`*tmp*`, del, value = NULL) : missing values are not allowed in subscripted assignments of data frames
df$x is numeric and doesn't have any NAs.
> sum(is.na(df$x))
[1] 0
Although in my df there are some NAs in some columns, I do not use those columns in the function mentioned above. I can't figure out where the problem is.
This function actually works.
df$Quartile <-cut(df$x,quantile(df$x),include.lowest=TRUE,labels=FALSE)
But the assigned variable df$Quartile doesn't appear in the dataframe.
I would just use dplyr::mutate to add a new column.
library(dplyr)
df = df %>%
mutate(q_col = cut(x, quantile(x), include.lowest=TRUE, labels=FALSE))
mpg cyl disp hp drat wt qsec vs am gear carb q_col
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 3
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 3
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 3
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 3
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 2
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 1
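The output above looks like it was produced from mtcars rather than the asker's df. As a self-contained sketch, assuming the column being binned is mpg, the same call can be reproduced as follows. dplyr::ntile() is included as a rank-based alternative, which can differ from cut() when values tie at a boundary. The names quartiles and q_ntile are illustrative.

```r
library(dplyr)

quartiles <- mtcars %>%
  mutate(
    # bin mpg at its quartile boundaries; labels = FALSE returns integer codes 1..4
    q_col = cut(mpg, quantile(mpg), include.lowest = TRUE, labels = FALSE),
    # rank-based alternative: four buckets of equal size for 32 rows
    q_ntile = ntile(mpg, 4)
  )

head(quartiles[, c("mpg", "q_col", "q_ntile")])
```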

Curly curly - How to access the variable name [duplicate]

This question already has answers here:
In R, how to get an object's name after it is sent to a function?
(4 answers)
Closed 1 year ago.
I am trying to create a function which summarises a grouped dataset and then adds a column to identify which variable is being summarised (ID column).
I am not sure how to add the ID column using the curly-curly approach.
my_fun <- function(dat, var_name){
dat %>%
mutate(id_column = names({{var_name}}))
}
my_fun(mtcars, cyl)
What I want is for the variable name, in this case cyl, to be recycled.
Just use deparse(substitute()) at the start of the function:
my_fun <- function(dat, var_name){
nm1 <- deparse(substitute(var_name))
dat %>%
mutate(id_column = nm1)
}
Testing:
my_fun(mtcars, cyl)
mpg cyl disp hp drat wt qsec vs am gear carb id_column
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 cyl
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 cyl
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 cyl
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 cyl
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 cyl
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 cyl
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 cyl
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 cyl
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 cyl
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 cyl
...
In the tidyverse, this can also be done directly from a symbol: use ensym to convert the argument to a symbol, then either unquote it with !! to get the column values or convert it to a string with as_string:
my_fun <- function(dat, var_name){
var_name <- rlang::ensym(var_name)
dat %>%
mutate(id_column = rlang::as_string(var_name), val_column = !! var_name)
}
Testing:
my_fun(head(mtcars), cyl)
mpg cyl disp hp drat wt qsec vs am gear carb id_column val_column
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 cyl 6
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 cyl 6
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 cyl 4
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 cyl 6
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 cyl 8
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 cyl 6
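If the function already relies on {{ }} for grouping, the column name can also be recovered from the captured argument. The following is a sketch, assuming rlang's enquo() and as_label(). The function name my_fun_label and the count column n are made up for illustration.

```r
library(dplyr)

my_fun_label <- function(dat, var_name) {
  # capture the unevaluated argument and turn it into a string
  nm <- rlang::as_label(rlang::enquo(var_name))
  dat %>%
    group_by({{ var_name }}) %>%
    summarise(n = n(), .groups = "drop") %>%
    # the length-one string is recycled to every summary row
    mutate(id_column = nm)
}

res <- my_fun_label(mtcars, cyl)
res
```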

Vectorized function for dplyr::mutate()

I am trying to write a vectorized function that compares a string to a vector of regular expressions and returns TRUE if the string matches any of them. The function should work on a single string, on a vector of strings, and within dplyr::mutate. I'm having trouble vectorizing the function. In the reprex below, the function I would like to vectorize is is_favorite. I came up with workarounds using base::Vectorize and purrr::map, but these do not quite accomplish the goal. I also tried using stringi::stri_detect_regex instead of stringr::str_detect, which produced the same results.
Thank you so much for your help!
library(dplyr)
library(tibble)
library(stringr)
favorite_cars <- c("^Merc", "Firebird$")
mtcars <-
mtcars %>%
tibble::rownames_to_column("car")
is_favorite <- function(x) {
any(stringr::str_detect(x, favorite_cars))
}
# Success: Works on single string
is_favorite("Merc 240D")
#> [1] TRUE
is_favorite("Ferrari Dino")
#> [1] FALSE
# Failure: On vector, returns a single boolean instead of a vector of booleans
is_favorite(mtcars$car)
#> [1] TRUE
# Failure: Add column with single boolean instead of vectorized rowwise
mtcars %>%
mutate(fav_car = is_favorite(car)) %>%
head()
#> car mpg cyl disp hp drat wt qsec vs am gear carb fav_car
#> 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 TRUE
#> 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 TRUE
#> 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 TRUE
#> 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 TRUE
#> 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 TRUE
#> 6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 TRUE
# Success...but not ideal
# https://community.rstudio.com/t/vectorise-text-function-for-dplyr-mutate/53387/2
# https://deanattali.com/blog/mutate-non-vectorized/
# https://www.jimhester.com/post/2018-04-12-vectorize/
v_is_favorite <- Vectorize(is_favorite)
mtcars %>%
mutate(fav_car = v_is_favorite(car)) %>%
head()
#> car mpg cyl disp hp drat wt qsec vs am gear carb fav_car
#> 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 FALSE
#> 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 FALSE
#> 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 FALSE
#> 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 FALSE
#> 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 FALSE
#> 6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 FALSE
# Success...better but still not ideal
# https://www.jimhester.com/post/2018-04-12-vectorize/
# http://yoshidk6.hatenablog.com/entry/2018/09/05/222248
mtcars %>%
mutate(fav_car = purrr::map_lgl(car, is_favorite)) %>%
head()
#> car mpg cyl disp hp drat wt qsec vs am gear carb fav_car
#> 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 FALSE
#> 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 FALSE
#> 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 FALSE
#> 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 FALSE
#> 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 FALSE
#> 6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 FALSE
Created on 2020-08-28 by the reprex package (v0.3.0)
I also consulted the following resources:
https://adv-r.hadley.nz/perf-improve.html#vectorise
https://deanattali.com/blog/mutate-non-vectorized/
https://www.jimhester.com/post/2018-04-12-vectorize/
https://community.rstudio.com/t/vectorise-text-function-for-dplyr-mutate/53387/2
http://yoshidk6.hatenablog.com/entry/2018/09/05/222248
https://dplyr.tidyverse.org/articles/programming.html
any() always returns a single logical value, whatever the length of its input. Instead, collapse favorite_cars into a single regular expression of length one:
is_favorite <- function(x) {
stringr::str_detect(x, paste0(favorite_cars, collapse = "|"))
#Will also work with base R grepl
#grepl(paste0(favorite_cars, collapse = "|"), x)
}
and then use:
library(dplyr)
mtcars %>% mutate(fav_car = is_favorite(car))
# car mpg cyl disp hp drat wt qsec vs am gear carb fav_car
#1 Mazda RX4 21.0 6 160.0 110 3.90 2.62 16.5 0 1 4 4 FALSE
#2 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.88 17.0 0 1 4 4 FALSE
#3 Datsun 710 22.8 4 108.0 93 3.85 2.32 18.6 1 1 4 1 FALSE
#4 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.21 19.4 1 0 3 1 FALSE
#5 Hornet Sportabout 18.7 8 360.0 175 3.15 3.44 17.0 0 0 3 2 FALSE
#6 Valiant 18.1 6 225.0 105 2.76 3.46 20.2 1 0 3 1 FALSE
#7 Duster 360 14.3 8 360.0 245 3.21 3.57 15.8 0 0 3 4 FALSE
#8 Merc 240D 24.4 4 146.7 62 3.69 3.19 20.0 1 0 4 2 TRUE
#9 Merc 230 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2 TRUE
#10 Merc 280 19.2 6 167.6 123 3.92 3.44 18.3 1 0 4 4 TRUE
#11 Merc 280C 17.8 6 167.6 123 3.92 3.44 18.9 1 0 4 4 TRUE
#...
#...
where the pattern that we are looking for becomes
paste0(favorite_cars, collapse = "|")
#[1] "^Merc|Firebird$"
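Collapsing with "|" assumes the patterns can safely be joined into one alternation. As a sketch of an alternative that keeps the patterns separate, each one can be tested in turn and the resulting logical vectors combined element-wise. is_favorite_any is a hypothetical name, and only base R is used.

```r
favorite_cars <- c("^Merc", "Firebird$")

is_favorite_any <- function(x, patterns = favorite_cars) {
  # one logical vector per pattern, each the same length as x
  hits <- lapply(patterns, function(p) grepl(p, x))
  # element-wise OR across patterns keeps the result vectorised over x
  Reduce(`|`, hits)
}

is_favorite_any(c("Merc 240D", "Ferrari Dino", "Pontiac Firebird"))
# TRUE FALSE TRUE
```

Since the result has one element per input string, it can be used inside mutate() in the same way as the collapsed-pattern version.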

Using dplyr, how should I create a column of strings repeating a character based on the value of another column?

With mtcars for example, I'd like to create a new column carb_dots such that when carb = 4, carb_dots = "...."
Using dplyr, I've tried
library(dplyr)
mtcars2 <- mtcars %>% mutate(carb_dots = rep(".", carb))
This errors with
Error in mutate_impl(.data, dots) :
Evaluation error: invalid 'times' argument.
What should I do? Thanks for your suggestions.
With the addition of stringr, you can do:
library(dplyr)
library(stringr)
mtcars %>%
  mutate(carb_dots = str_dup(".", carb))
mpg cyl disp hp drat wt qsec vs am gear carb carb_dots
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 ....
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 ....
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 .
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 .
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 ..
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 .
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 ....
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 ..
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 ..
10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 ....
We can use strrep
library(dplyr)
mtcars %>%
mutate(carb_dots = strrep(".", carb))
# mpg cyl disp hp drat wt qsec vs am gear carb carb_dots
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 ....
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 ....
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 .
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 .
#Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 ..
#Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 .
#Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 ....
#...
If we need to use rep, apply it row by row:
mtcars %>%
  rowwise() %>%
  mutate(carb_dots = paste(rep(".", carb), collapse = ""))
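The original attempt fails because rep(".", carb) receives the whole carb column as its times argument while x has length one. A small base R sketch of the difference, using a made-up vector n in place of the column:

```r
n <- c(4, 1, 2)

# strrep() is vectorised over times: one string per element of n
a <- strrep(".", n)
a
# "....", ".", ".."

# rep() has to be called once per element, with the pieces pasted together
b <- vapply(n, function(k) paste(rep(".", k), collapse = ""), character(1))
b
# "....", ".", ".."
```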

How to have NA's displayed first using arrange()

Sample data:
temp = data.frame(col = c(NA, 1, 2, 3))
Using arrange:
temp %>%
arrange(col)
gives
col
1 1
2 2
3 3
4 NA
and
temp %>%
arrange(desc(col))
gives
col
1 3
2 2
3 1
4 NA
I would like
col
1 NA
2 3
3 2
4 1
that is, to put NAs first. Does anyone know how to do this?
You could also do:
m %>%
arrange(!is.na(wt), wt) ##Spacedman's dataset
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 18.7 8 360.0 175 3.15 NA 17.02 0 0 3 2
#2 24.4 4 146.7 62 3.69 NA 20.00 1 0 4 2
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#5 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#6 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#7 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#8 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#9 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#10 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Write a function that sorts a data frame and passes the handy na.last=FALSE option on to order. My original version can be found in the edit history; David Arenburg improved it to this:
> sortNA=function(d,n,...){d[order(d[[deparse(substitute(n))]],...),]}
Then use it like this (the %.% operator from early dplyr versions has since been replaced by %>%):
> m=mtcars[1:10,]
> m$wt[5]=NA
> m$wt[8]=NA
> m %>% sortNA(wt, na.last=FALSE)
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet Sportabout 18.7 8 360.0 175 3.15 NA 17.02 0 0 3 2
Merc 240D 24.4 4 146.7 62 3.69 NA 20.00 1 0 4 2
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Add decreasing=TRUE to sort in the opposite order.
You might also consider posting an issue to the dplyr github issue tracker to suggest a new option to the arrange function to do this.
The order function in base R has an na.last argument:
> temp=data.frame(col=c(NA,1,2,3))
> temp[order(temp[,"col"],na.last=F),]
[1] NA 1 2 3
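To get exactly the ordering asked for in the question (NAs first, then descending values), the !is.na() trick can be combined with desc(). A minimal sketch on the sample data, where the name sorted is illustrative:

```r
library(dplyr)

temp <- data.frame(col = c(NA, 1, 2, 3))

sorted <- temp %>%
  # FALSE sorts before TRUE, so rows where col is NA come first;
  # desc(col) then orders the remaining rows from largest to smallest
  arrange(!is.na(col), desc(col))

sorted$col
# NA 3 2 1
```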

Resources