I want to set column names based on the number of columns.
For example,
#iris1 <- iris[,1:4]
if(ncol(iris)==4) colnames(iris) <- c("a","b","c","d")
if(ncol(iris)==5) colnames(iris) <- c("a","b","c","d","e")
I am looking for a way to do that using the dplyr pipeline. Something like this:
iris1 %>%
setNames(ifelse(ncol(.)==4,c("a","b","c","d"),c("a","b","c","d","e")))
UPDATE:
akrun's answer gave me this idea which works for me in this particular use-case.
cnames <- c("a","b","c","d","e")
iris1 %>% setNames(cnames[1:ncol(.)])
This solution cannot be generalised. Better solutions are welcome.
If this is based on a user input 'n', then we can use rename_at
library(dplyr)
n <- 4
iris %>%
rename_at(seq_len(n), ~ letters[seq_len(n)])
which can be wrapped into a function
rename_fn <- function(dat, n){
dat %>%
rename_at(seq_len(n), ~ letters[seq_len(n)])
}
rename_fn(iris, 4)
rename_fn(iris, 5)
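In dplyr 1.0.0 and later, rename_at() is superseded by rename_with(). The following is a minimal sketch of the same idea under that assumption; the helper name rename_first_n is made up for illustration and is not part of the original answer.

```r
library(dplyr)

# Hypothetical helper: rename the first n columns to the first n letters.
# Assumes dplyr >= 1.0.0, where rename_with() supersedes rename_at().
rename_first_n <- function(dat, n) {
  # Fail early if n exceeds the number of columns or available letters
  stopifnot(n <= ncol(dat), n <= length(letters))
  # .fn receives the selected column names as a character vector
  rename_with(dat, ~ letters[seq_along(.x)], .cols = all_of(seq_len(n)))
}

names(rename_first_n(iris, 4))
names(rename_first_n(iris, 5))
```

Columns beyond the first n keep their original names, which matches the behaviour of the rename_at() version.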
If the goal is to rename all the columns of the dataset, an easier option is set_names (exported by purrr, rlang and magrittr, not by dplyr itself)
iris %>%
set_names(cnames[seq_len(ncol(.))])
Or in base R
setNames(iris, cnames[seq_len(ncol(iris))])
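For context on why the ifelse() attempt in the question does not work: ifelse() returns a result with the same length as its test, so a length-one test yields only the first name. The sketch below shows that, plus a small base R helper that guards against running out of names; name_by_count and its pool argument are hypothetical names chosen for this example.

```r
# ifelse() returns a result the same length as its test, so a length-one
# test yields only the first candidate name:
ifelse(TRUE, c("a", "b", "c", "d"), c("a", "b", "c", "d", "e"))

# Hypothetical helper: take as many names as there are columns,
# and fail early if the pool of names is too short.
name_by_count <- function(dat, pool = c("a", "b", "c", "d", "e")) {
  k <- ncol(dat)
  if (k > length(pool)) stop("not enough names for ", k, " columns")
  setNames(dat, pool[seq_len(k)])
}

names(name_by_count(iris[, 1:4]))
names(name_by_count(iris))
```

Because it only uses setNames(), the helper also works as a step in a pipeline, e.g. iris1 %>% name_by_count().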
If you want to rename all the columns, you should probably use rename_all
library(dplyr)
iris1 %>% rename_all(~cnames[seq_along(.)]) %>% head
# a b c d
#1 5.1 3.5 1.4 0.2
#2 4.9 3.0 1.4 0.2
#3 4.7 3.2 1.3 0.2
#4 4.6 3.1 1.5 0.2
#5 5.0 3.6 1.4 0.2
#6 5.4 3.9 1.7 0.4
I want to use dplyr::mutate() to create multiple new columns in a data frame. The column names and their contents should be dynamically generated.
Example data from iris:
library(dplyr)
iris <- as_tibble(iris)
I've created a function to mutate my new columns from the Petal.Width variable:
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
df <- mutate(df, varname = Petal.Width * n) ## problem arises here
df
}
Now I create a loop to build my columns:
for(i in 2:5) {
iris <- multipetal(df=iris, n=i)
}
However, since mutate thinks varname is a literal variable name, the loop only creates one new variable (called varname) instead of four (called petal.2 - petal.5).
How can I get mutate() to use my dynamic name as variable name?
Since you are dynamically building a variable name as a character value, it makes more sense to do assignment using standard data.frame indexing which allows for character values for column names. For example:
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
df[[varname]] <- with(df, Petal.Width * n)
df
}
The mutate function makes it very easy to name new columns via named parameters. But that assumes you know the name when you type the command. If you want to dynamically specify the column name, then you need to also build the named argument.
dplyr version >= 1.0
With the latest dplyr version you can use glue-style syntax in the name on the left-hand side of :=. Here the {} in the name is replaced by the value of the expression inside it.
multipetal <- function(df, n) {
mutate(df, "petal.{n}" := Petal.Width * n)
}
If you are passing a column name to your function, you can use {{}} in the string as well as for the column name
meanofcol <- function(df, col) {
mutate(df, "Mean of {{col}}" := mean({{col}}))
}
meanofcol(iris, Petal.Width)
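As a usage sketch of the two functions above (assuming dplyr >= 1.0.0), the loop from the question can be replaced by folding multipetal() over the multipliers with base R's Reduce(), and the resulting column names can be inspected directly:

```r
library(dplyr)

multipetal <- function(df, n) {
  # "{n}" is replaced by the value of n when the name is built
  mutate(df, "petal.{n}" := Petal.Width * n)
}
meanofcol <- function(df, col) {
  # "{{col}}" in the string is replaced by the label of the column passed in
  mutate(df, "Mean of {{col}}" := mean({{col}}))
}

# Fold multipetal() over 2:5 instead of writing a for loop:
# multipetal(multipetal(multipetal(multipetal(iris, 2), 3), 4), 5)
out <- Reduce(multipetal, 2:5, iris)
names(out)

names(meanofcol(iris, Petal.Width))
```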
dplyr version >= 0.7
dplyr starting with version 0.7 allows you to use := to dynamically assign parameter names. You can write your function as:
# --- dplyr version 0.7+---
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
mutate(df, !!varname := Petal.Width * n)
}
For more information, see the documentation available from vignette("programming", "dplyr").
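When several dynamically named columns are needed at once, one option is to build a named list of expressions and splice it into a single mutate() call with !!!. This is only a sketch, assuming dplyr >= 0.7 with rlang; the function name multipetal_all is hypothetical.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: create petal.<n> for every n in ns in one mutate() call
multipetal_all <- function(df, ns) {
  # One unevaluated expression per multiplier; !!n inlines the value of n
  exprs_list <- lapply(ns, function(n) expr(Petal.Width * !!n))
  # The list names become the new column names
  names(exprs_list) <- paste0("petal.", ns)
  # !!! splices the named expressions in as if they were typed by hand
  mutate(df, !!!exprs_list)
}

out <- multipetal_all(iris, 2:5)
names(out)
```

This avoids the outer for loop entirely, at the cost of being a little less readable than the single-column version.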
dplyr (>=0.3 & <0.7)
Slightly earlier versions of dplyr (>=0.3 and <0.7) encouraged the use of "standard evaluation" alternatives to many of the functions. See the Non-standard evaluation vignette for more information (vignette("nse")).
So here, the answer is to use mutate_() rather than mutate() and do:
# --- dplyr version 0.3-0.5---
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
varval <- lazyeval::interp(~Petal.Width * n, n=n)
mutate_(df, .dots= setNames(list(varval), varname))
}
dplyr < 0.3
Note this is also possible in older versions of dplyr that existed when the question was originally posed. It requires careful use of quote and setNames:
# --- dplyr versions < 0.3 ---
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
pp <- c(quote(df), setNames(list(quote(Petal.Width * n)), varname))
do.call("mutate", pp)
}
In the new release of dplyr (0.6.0, expected in April 2017), we can also assign with := and pass a variable as the column name by unquoting it with !!, so that its value, not the literal symbol varname, is used as the name
library(dplyr)
multipetalN <- function(df, n){
varname <- paste0("petal.", n)
df %>%
mutate(!!varname := Petal.Width * n)
}
data(iris)
iris1 <- tbl_df(iris)
iris2 <- tbl_df(iris)
for(i in 2:5) {
iris2 <- multipetalN(df=iris2, n=i)
}
Checking the output against @MrFlick's multipetal applied to 'iris1' in the same loop:
identical(iris1, iris2)
#[1] TRUE
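After a lot of trial and error, I found the pattern UQ(rlang::sym("some string here")) really useful for working with strings and dplyr verbs. It seems to work in a lot of surprising situations. (UQ() is the function form of !!; later rlang releases deprecate it in favour of writing !!rlang::sym("some string here") directly.)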
Here's an example with mutate. We want to create a function that adds together two columns, where you pass the function both column names as strings. We can use this pattern, together with the assignment operator :=, to do this.
## Take column `name1`, add it to column `name2`, and call the result `new_name`
mutate_values <- function(new_name, name1, name2){
mtcars %>%
mutate(UQ(rlang::sym(new_name)) := UQ(rlang::sym(name1)) + UQ(rlang::sym(name2)))
}
mutate_values('test', 'mpg', 'cyl')
The pattern works with other dplyr functions as well. Here's filter:
## filter a column by a value
filter_values <- function(name, value){
mtcars %>%
filter(UQ(rlang::sym(name)) != value)
}
filter_values('gear', 4)
Or arrange:
## transform a variable and then sort by it
arrange_values <- function(name, transform){
mtcars %>%
arrange(UQ(rlang::sym(name)) %>% UQ(rlang::sym(transform)))
}
arrange_values('mpg', 'sin')
For select, you don't need to use the pattern. Instead you can use !!:
## select a column
select_name <- function(name){
mtcars %>%
select(!!name)
}
select_name('mpg')
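For comparison, the same three helpers can be sketched with the .data pronoun, which accepts column names as strings without any explicit unquoting. This assumes dplyr >= 1.0.0 (for the glue-style name on the left of :=) and is an alternative formulation, not the pattern the answer above describes.

```r
library(dplyr)

## Take column `name1`, add it to column `name2`, and call the result `new_name`
mutate_values <- function(new_name, name1, name2) {
  mtcars %>%
    mutate("{new_name}" := .data[[name1]] + .data[[name2]])
}

## filter a column by a value
filter_values <- function(name, value) {
  mtcars %>%
    filter(.data[[name]] != value)
}

## transform a variable and then sort by it;
## match.fun() looks up the function named by the string `transform`
arrange_values <- function(name, transform) {
  mtcars %>%
    arrange(match.fun(transform)(.data[[name]]))
}

head(mutate_values('test', 'mpg', 'cyl'))
head(filter_values('gear', 4))
head(arrange_values('mpg', 'sin'))
```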
With rlang 0.4.0 we have the curly-curly operator ({{}}), which makes this very easy. When a dynamic column name shows up on the left-hand side of an assignment, use :=.
library(dplyr)
library(rlang)
iris1 <- tbl_df(iris)
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
mutate(df, {{varname}} := Petal.Width * n)
}
multipetal(iris1, 4)
# A tibble: 150 x 6
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species petal.4
# <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
# 1 5.1 3.5 1.4 0.2 setosa 0.8
# 2 4.9 3 1.4 0.2 setosa 0.8
# 3 4.7 3.2 1.3 0.2 setosa 0.8
# 4 4.6 3.1 1.5 0.2 setosa 0.8
# 5 5 3.6 1.4 0.2 setosa 0.8
# 6 5.4 3.9 1.7 0.4 setosa 1.6
# 7 4.6 3.4 1.4 0.3 setosa 1.2
# 8 5 3.4 1.5 0.2 setosa 0.8
# 9 4.4 2.9 1.4 0.2 setosa 0.8
#10 4.9 3.1 1.5 0.1 setosa 0.4
# … with 140 more rows
We can also pass quoted/unquoted variable names to be assigned as column names.
multipetal <- function(df, name, n) {
mutate(df, {{name}} := Petal.Width * n)
}
multipetal(iris1, temp, 3)
# A tibble: 150 x 6
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species temp
# <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
# 1 5.1 3.5 1.4 0.2 setosa 0.6
# 2 4.9 3 1.4 0.2 setosa 0.6
# 3 4.7 3.2 1.3 0.2 setosa 0.6
# 4 4.6 3.1 1.5 0.2 setosa 0.6
# 5 5 3.6 1.4 0.2 setosa 0.6
# 6 5.4 3.9 1.7 0.4 setosa 1.2
# 7 4.6 3.4 1.4 0.3 setosa 0.900
# 8 5 3.4 1.5 0.2 setosa 0.6
# 9 4.4 2.9 1.4 0.2 setosa 0.6
#10 4.9 3.1 1.5 0.1 setosa 0.3
# … with 140 more rows
It works the same with
multipetal(iris1, "temp", 3)
Here's another version, and it's arguably a bit simpler.
multipetal <- function(df, n) {
varname <- paste("petal", n, sep=".")
df<-mutate_(df, .dots=setNames(paste0("Petal.Width*",n), varname))
df
}
for(i in 2:5) {
iris <- multipetal(df=iris, n=i)
}
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species petal.2 petal.3 petal.4 petal.5
1 5.1 3.5 1.4 0.2 setosa 0.4 0.6 0.8 1
2 4.9 3.0 1.4 0.2 setosa 0.4 0.6 0.8 1
3 4.7 3.2 1.3 0.2 setosa 0.4 0.6 0.8 1
4 4.6 3.1 1.5 0.2 setosa 0.4 0.6 0.8 1
5 5.0 3.6 1.4 0.2 setosa 0.4 0.6 0.8 1
6 5.4 3.9 1.7 0.4 setosa 0.8 1.2 1.6 2
You may enjoy package friendlyeval which presents a simplified tidy eval API and documentation for newer/casual dplyr users.
You are creating strings that you wish mutate to treat as column names. So using friendlyeval you could write:
library(friendlyeval)
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
df <- mutate(df, !!treat_string_as_col(varname) := Petal.Width * n)
df
}
for(i in 2:5) {
iris <- multipetal(df=iris, n=i)
}
Under the hood, this calls rlang functions that check that varname is legal as a column name.
friendlyeval code can be converted to equivalent plain tidy eval code at any time with an RStudio addin.
I am also adding an answer that augments this a little, because I came to this entry when searching for an answer, and it had almost what I needed. I needed a bit more, which I got via @MrFlick's answer and the R lazyeval vignettes.
I wanted to make a function that could take a dataframe and a vector of column names (as strings) that I want to be converted from a string to a Date object. I couldn't figure out how to make as.Date() take an argument that is a string and convert it to a column, so I did it as shown below.
Below is how I did this via SE mutate (mutate_()) and the .dots argument. Criticisms that make this better are welcome.
library(dplyr)
dat <- data.frame(a="leave alone",
dt="2015-08-03 00:00:00",
dt2="2015-01-20 00:00:00")
# This function takes a dataframe and list of column names
# that have strings that need to be
# converted to dates in the data frame
convertSelectDates <- function(df, dtnames=character(0)) {
for (col in dtnames) {
varval <- sprintf("as.Date(%s)", col)
df <- df %>% mutate_(.dots= setNames(list(varval), col))
}
return(df)
}
dat <- convertSelectDates(dat, c("dt", "dt2"))
dat %>% str
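Since mutate_() has been deprecated in later dplyr releases, here is a hedged sketch of the same conversion using across() with all_of(), assuming dplyr >= 1.0.0. The snake_case function name convert_select_dates is only used here to keep it distinct from the original.

```r
library(dplyr)

dat <- data.frame(a = "leave alone",
                  dt = "2015-08-03 00:00:00",
                  dt2 = "2015-01-20 00:00:00")

# Convert every column named in dtnames to Date in a single mutate() call.
# all_of() selects columns from a character vector and errors if one is missing.
convert_select_dates <- function(df, dtnames = character(0)) {
  mutate(df, across(all_of(dtnames), as.Date))
}

res <- convert_select_dates(dat, c("dt", "dt2"))
str(res)
```

No loop is needed because across() applies as.Date to each selected column, and an empty dtnames simply returns the data frame unchanged.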
While I enjoy using dplyr interactively, I find it extraordinarily tricky to do this with dplyr because you have to jump through hoops with workarounds such as lazyeval::interp() and setNames.
Here is a simpler version using base R that extends @MrFlick's solution. To me at least, it seems more intuitive to put the loop inside the function.
multipetal <- function(df, n) {
for (i in 1:n){
varname <- paste("petal", i , sep=".")
df[[varname]] <- with(df, Petal.Width * i)
}
df
}
multipetal(iris, 3)
Another alternative: use {} inside quotation marks to easily create dynamic names. This is similar to other solutions but not exactly the same, and I find it easier.
library(dplyr)
library(tibble)
iris <- as_tibble(iris)
multipetal <- function(df, n) {
df <- mutate(df, "petal.{n}" := Petal.Width * n)
df
}
for(i in 2:5) {
iris <- multipetal(df=iris, n=i)
}
iris
I think this was introduced in dplyr 1.0.0, but I am not sure (I also have rlang 0.4.7, if it matters).
If you need the same operation several times it usually tells you that your data format is not optimal. You want a longer format with n being a column in the data.frame that can be achieved by a cross join:
library(tidyverse)
iris %>% mutate(identifier = 1:n()) %>% #necessary to disambiguate row 102 from row 143 (complete duplicates)
full_join(tibble(n = 1:5), by=character()) %>% #cross join for long format
mutate(petal = Petal.Width * n) %>% #calculation in long format
pivot_wider(names_from=n, values_from=petal, names_prefix="petal.width.") #back to wider format (if desired)
Result:
# A tibble: 150 x 11
Sepal.Length Sepal.Width Petal.Length Petal.Width Species identifier petal.width.1 petal.width.2 petal.width.3
<dbl> <dbl> <dbl> <dbl> <fct> <int> <dbl> <dbl> <dbl>
1 5.1 3.5 1.4 0.2 setosa 1 0.2 0.4 0.6
2 4.9 3 1.4 0.2 setosa 2 0.2 0.4 0.6
3 4.7 3.2 1.3 0.2 setosa 3 0.2 0.4 0.6
4 4.6 3.1 1.5 0.2 setosa 4 0.2 0.4 0.6
5 5 3.6 1.4 0.2 setosa 5 0.2 0.4 0.6
6 5.4 3.9 1.7 0.4 setosa 6 0.4 0.8 1.2
7 4.6 3.4 1.4 0.3 setosa 7 0.3 0.6 0.9
8 5 3.4 1.5 0.2 setosa 8 0.2 0.4 0.6
9 4.4 2.9 1.4 0.2 setosa 9 0.2 0.4 0.6
10 4.9 3.1 1.5 0.1 setosa 10 0.1 0.2 0.3
# ... with 140 more rows, and 2 more variables: petal.width.4 <dbl>, petal.width.5 <dbl>
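The long-format steps above can also be sketched with cross_join(), assuming dplyr >= 1.1.0 is available. The intermediate names long and wide are introduced here only for illustration.

```r
library(dplyr)
library(tidyr)

# Long format: every iris row paired with every multiplier n
long <- iris %>%
  mutate(identifier = row_number()) %>%   # keeps duplicated rows distinct
  cross_join(tibble(n = 1:5)) %>%
  mutate(petal = Petal.Width * n)

# Back to wide format: one petal.width.<n> column per multiplier
wide <- long %>%
  pivot_wider(names_from = n, values_from = petal,
              names_prefix = "petal.width.")

dim(long)
dim(wide)
```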
I have this big dataframe, with species in rows and samples in columns. There are 30 samples, with 12 replicates each. The column names are written as such : sample.S1.01; sample.S1.02.....sample.S30.11; sample.S30.12.
I would like to create 30 new tables containing the 12 replicates for each samples.
I have this command line that works perfectly for one sample at a time :
dt<- tab_sp_sum %>%
select(starts_with("sample.S1."))
assign(paste("tab_sp_1"), dt)
But when I put this in a for loop, it doesn't work anymore.
I think it's due to the fact that the variable i is included in the starts_with quotation, and I don't know how to write it.
for (i in 1:30){
dt<- tab_sp_sum %>%
select(starts_with("sample.S",i,".", sep=""))
assign(paste("tab_sp",i,sep="_"), dt)
}
The last line works well: 30 tables are created with the right names, but they are empty.
Any suggestion ?
Thank you
Instead of using assign to store the results in separate objects, try using a list. Create the prefixes that you want to select with paste0, and then use map to create a list of data frames.
library(dplyr)
library(purrr)
df_names <- paste0("sample.S", 1:30, ".")
df1 <- map(df_names, ~tab_sp_sum %>% select(starts_with(.x)))
You can then use df1[[1]], df1[[2]] to access individual dataframes.
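In base R, we can use lapply with startsWith to select the columns whose names begin with each element of df_names. startsWith matches literally, whereas a regex such as paste0("^", x) would treat each . as a wildcard, so "sample.S1." would also match sample.S10 through sample.S19.
df1 <- lapply(df_names, function(x)
tab_sp_sum[startsWith(names(tab_sp_sum), x)])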
Using it with built-in iris dataset
df_names <- c("Sepal", "Petal")
df1 <- map(df_names, ~iris %>% select(starts_with(.x)))
head(df1[[1]])
# Sepal.Length Sepal.Width
#1 5.1 3.5
#2 4.9 3.0
#3 4.7 3.2
#4 4.6 3.1
#5 5.0 3.6
#6 5.4 3.9
head(df1[[2]])
# Petal.Length Petal.Width
#1 1.4 0.2
#2 1.4 0.2
#3 1.3 0.2
#4 1.5 0.2
#5 1.4 0.2
#6 1.7 0.4
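To connect this back to the loop in the question: starts_with() expects the complete prefix as one string, so the prefix has to be pasted together before it is passed in. The sketch below uses a small made-up stand-in for tab_sp_sum (the real data is not shown in the question) and names the list elements so each table can be retrieved by name.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for tab_sp_sum with a few sample columns
tab_sp_sum <- data.frame(
  sample.S1.01 = 1:3, sample.S1.02 = 4:6,
  sample.S2.01 = 7:9, sample.S10.01 = 10:12
)

ids <- c(1, 2, 10)
# Build each full prefix first, e.g. "sample.S1."
prefixes <- paste0("sample.S", ids, ".")

# A named list of data frames instead of 30 separate objects
tabs <- map(set_names(prefixes, paste0("tab_sp_", ids)),
            ~ select(tab_sp_sum, starts_with(.x)))

names(tabs$tab_sp_1)
names(tabs$tab_sp_10)
```

The trailing dot in each prefix keeps sample S1 from picking up the columns of sample S10, because starts_with() matches the prefix literally.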
We can use split in base R
nm1 <- paste(c("Sepal", "Petal"), collapse="|")
nm2 <- grep(nm1, names(iris), value = TRUE)
out <- split.default(iris[nm2], sub("\\..*", "", nm2))
head(out[[1]])
# Petal.Length Petal.Width
#1 1.4 0.2
#2 1.4 0.2
#3 1.3 0.2
#4 1.5 0.2
#5 1.4 0.2
#6 1.7 0.4
head(out[[2]])
# Sepal.Length Sepal.Width
#1 5.1 3.5
#2 4.9 3.0
#3 4.7 3.2
#4 4.6 3.1
#5 5.0 3.6
#6 5.4 3.9
Or in tidyverse
iris %>%
select(all_of(nm2)) %>%
split.default(str_remove(nm2, "\\..*"))
I have a problem that I can replicate using the iris dataset, where there are many groups of variables (same prefix in the name) with two different suffixes. I want to take a ratio for each of these groups but can't find a tidyverse solution. I would have thought mutate_at() might have been able to help.
In the iris dataset, for example, I want to generate a Petal ratio of Length / Width from the Petal columns, and do the same for Sepal. I don't want to do this manually in a mutate() because I have lots of variable groups, and this could change over time.
I do have a solution that works using base R (in the code below) but I wanted to know if there was a tidyverse solution that achieved the same.
# libs ----
library(tidyverse)
# data ----
df <- iris
glimpse(df)
# set up column vectors ----
length_cols <- names(df) %>% str_subset("Length") %>% sort()
width_cols <- names(df) %>% str_subset("Width") %>% sort()
new_col_names <- names(df) %>% str_subset("Length") %>% str_replace(".Length", ".Ratio") %>% sort()
length_cols
width_cols
new_col_names
# make new cols ----
df[, new_col_names] <- df[, length_cols] / df[, width_cols]
df %>% head()
Thanks,
Gareth
Here is one possibility using purrr::map:
library(tidyverse);
df <- map(c("Petal", "Sepal"), ~ iris %>%
mutate(
!!paste0(.x, ".Ratio") := !!as.name(paste0(.x, ".Length")) / !!as.name(paste0(.x, ".Width")) )) %>%
reduce(left_join);
head(df);
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Ratio
#1 5.1 3.5 1.4 0.2 setosa 7.00
#2 4.9 3.0 1.4 0.2 setosa 7.00
#3 4.7 3.2 1.3 0.2 setosa 6.50
#4 4.6 3.1 1.5 0.2 setosa 7.50
#5 5.0 3.6 1.4 0.2 setosa 7.00
#6 5.4 3.9 1.7 0.4 setosa 4.25
# Sepal.Ratio
#1 1.457143
#2 1.633333
#3 1.468750
#4 1.483871
#5 1.388889
#6 1.384615
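Explanation: We map the prefixes "Petal" and "Sepal" over iris; for each prefix we take the columns with suffixes "Length" and "Width" and compute the corresponding prefix + ".Ratio" column. reduce(left_join) then merges the two data.frames on their shared columns. Because iris contains one pair of fully duplicated rows (102 and 143), this join returns 152 rows rather than 150; add a row identifier before mapping if the row count must be preserved.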
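A join-free variant is to let reduce() carry the data frame along as an accumulator and add one ratio column per prefix. This is a sketch assuming dplyr >= 1.0.0 and purrr; add_ratio is a hypothetical helper name.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: add <prefix>.Ratio = <prefix>.Length / <prefix>.Width
add_ratio <- function(d, prefix) {
  mutate(d, "{prefix}.Ratio" :=
           .data[[paste0(prefix, ".Length")]] / .data[[paste0(prefix, ".Width")]])
}

# Start from iris and apply add_ratio() once per prefix
df <- reduce(c("Petal", "Sepal"), add_ratio, .init = iris)
head(df)
```

Since no join is involved, the row count stays at 150 and the prefix vector can grow without any further changes.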
I want to do something like this
df <- iris %>%
rowwise %>%
mutate(new_var = sum(Sepal.Length, Sepal.Width))
Except I want to do it without typing the variable names, e.g.
names_to_add <- c("Sepal.Length", "Sepal.Width")
df <- iris %>%
rowwise %>%
[some function that uses names_to_add]
I attempted a few things e.g.
df <- iris %>%
rowwise %>%
mutate(new_var = sum(sapply(names_to_add, get, envir = as.environment(.))))
but still can't figure it out. I'll take an answer that plays around with lazyeval or something that's simpler. Note that the sum function here is just a placeholder and my actual function is much more complex, although it returns one value per row. I'd also rather not use data.table
You should check out all the functions that end with _ in dplyr. Example mutate_, summarise_ etc.
names_to_add <- ("sum(Sepal.Length, Sepal.Width)")
df <- iris %>%
rowwise %>% mutate_(names_to_add)
Edit
The results of the code:
df <- iris %>%
rowwise %>% mutate(new_var = sum(Sepal.Length, Sepal.Width))
names_to_add <- ("sum(Sepal.Length, Sepal.Width)")
df2 <- iris %>%
rowwise %>% mutate_(new_var = names_to_add)
identical(df, df2)
[1] TRUE
Edit
I edited the answer and it solves the problem. I wonder why it was downvoted. We use SE (standard evaluation), passing a string as an input to 'mutate_'. More info: vignette("nse","dplyr")
x <- "Sepal.Length + Sepal.Width"
df <- mutate_(iris, x)
head(df)
Output:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length + Sepal.Width
1 5.1 3.5 1.4 0.2 setosa 8.6
2 4.9 3.0 1.4 0.2 setosa 7.9
3 4.7 3.2 1.3 0.2 setosa 7.9
4 4.6 3.1 1.5 0.2 setosa 7.7
5 5.0 3.6 1.4 0.2 setosa 8.6
6 5.4 3.9 1.7 0.4 setosa 9.3
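The underscore verbs such as mutate_() have since been deprecated in dplyr. Assuming dplyr >= 1.0.0, a row-wise computation over a character vector of column names can be sketched with c_across() and all_of(); sum() stands in for the more complex function mentioned in the question.

```r
library(dplyr)

names_to_add <- c("Sepal.Length", "Sepal.Width")

df <- iris %>%
  rowwise() %>%
  # c_across() collects the selected columns of the current row into a vector
  mutate(new_var = sum(c_across(all_of(names_to_add)))) %>%
  ungroup()

head(df)
```

Any function that takes a vector and returns a single value per row can replace sum() here.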