Say I want to rename a column only if the contents of that column include a specific value.
For example, if iris$Species contains "virginica", rename Species to flower.name; otherwise keep the name Species.
This code works:
library(dplyr)
iris <- if ("virginica" %in% iris$Species) {
  rename(iris, flower.name = Species)
} else {
  iris  # without this branch, iris would become NULL when the value is absent
}
iris %>% names
but I was hoping there was a more elegant dplyr way of doing this with one of the existing functions, such as rename_if().
One option could be:
iris %>%
rename_with(~ "Flower.Name",
.cols = Species & where(~ any(. %in% "virginica")))
Sepal.Length Sepal.Width Petal.Length Petal.Width Flower.Name
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
With rename_if (superseded by rename_with() since dplyr 1.0.0, but still available):
library(dplyr)
iris1 <- iris %>%
rename_if(~ is.factor(.) && "virginica" %in% ., ~ 'flower.name')
names(iris1)
#[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "flower.name"
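Because rename_if() is one of the scoped verbs that dplyr 1.0.0 superseded, the same condition can also be written with rename_with() and where(). What follows is only a sketch: rename_if_contains() and its arguments are hypothetical names, not part of dplyr, and it assumes the value should be looked for in factor columns only. The renaming function returns one name per selected column, so the data should come back unchanged when nothing matches.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): rename the factor column
# whose values include `value`; leave the data as-is otherwise.
# Assumes at most one column matches, since names must stay unique.
rename_if_contains <- function(data, value, new_name) {
  data %>%
    rename_with(
      ~ rep(new_name, length(.x)),  # one new name per selected column
      .cols = where(~ is.factor(.x) && value %in% .x)
    )
}

names(rename_if_contains(iris, "virginica", "flower.name"))
names(rename_if_contains(iris, "tulip", "flower.name"))
```

The second call looks for a value that is not in the data, which is the "else keep the name" case from the question.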
Related
Is there any way to supply a vector to {glue} to dynamically choose which columns get "glued"? The desired column below is what I am hoping to see, but I want to be able to provide vars to the glue statement instead of typing the column names.
library(glue)
library(dplyr)
vars <- c("Sepal.Length", "Species")
iris %>%
head() %>% ## just for less data
# mutate(glue_string = glue_data("{vars}")) %>%
mutate(desired = glue("{Sepal.Length}{Species}"))
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species desired
#> 1 5.1 3.5 1.4 0.2 setosa 5.1setosa
#> 2 4.9 3.0 1.4 0.2 setosa 4.9setosa
#> 3 4.7 3.2 1.3 0.2 setosa 4.7setosa
#> 4 4.6 3.1 1.5 0.2 setosa 4.6setosa
#> 5 5.0 3.6 1.4 0.2 setosa 5setosa
#> 6 5.4 3.9 1.7 0.4 setosa 5.4setosa
We may either use the .data pronoun to extract the column named by each element of 'vars' and glue them together
library(dplyr)
library(glue)
iris %>%
head() %>% ## just for less data
mutate(desired = glue("{.data[[vars[1]]]}{.data[[vars[2]]]}"))
-output
Sepal.Length Sepal.Width Petal.Length Petal.Width Species desired
1 5.1 3.5 1.4 0.2 setosa 5.1setosa
2 4.9 3.0 1.4 0.2 setosa 4.9setosa
3 4.7 3.2 1.3 0.2 setosa 4.7setosa
4 4.6 3.1 1.5 0.2 setosa 4.6setosa
5 5.0 3.6 1.4 0.2 setosa 5setosa
6 5.4 3.9 1.7 0.4 setosa 5.4setosa
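Or select the columns named in 'vars' with across(all_of()) and paste them together row by row with str_c. purrr::invoke() is deprecated as of purrr 1.0.0, so base do.call() is used here instead; on dplyr 1.1.0 or later, pick(all_of(vars)) can replace across(all_of(vars)).
library(stringr)
iris %>%
head() %>% ## just for less data
mutate(desired = do.call(str_c, across(all_of(vars))))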
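If 'vars' can have any length, indexing vars[1] and vars[2] by hand does not scale. One possible sketch, assuming every element of 'vars' is a syntactic column name, is to build the glue template as a string first. The variable template is a hypothetical helper introduced only for this example.

```r
library(dplyr)
library(glue)

vars <- c("Sepal.Length", "Species")

# Build the string "{Sepal.Length}{Species}" from the vector of column names
template <- paste0("{", vars, "}", collapse = "")

out <- iris %>%
  head() %>%
  mutate(desired = glue(template))  # glue() evaluates the template against the columns

out$desired
```

Adding or removing names in 'vars' changes the template without touching the mutate() call.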
I want to add a column to a dataframe containing its own name as a string. (This is for inclusion in a function that will bind several of them together...)
Based on this old SO post and my understanding of magrittr pipes I thought this would work:
data(iris)
iris %>%
mutate(df = deparse(substitute(.)))
But that just adds a column called "df" populated with full stops! The desired output is the string "iris" in every row of that df column. Can anyone set me right?
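Inside the pipe the data frame is only known by the placeholder ., so deparse(substitute(.)) returns "." rather than "iris". If the step is wrapped in a function, the name of the argument can be captured instead: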
library(dplyr)
fun1 <- function(data) {
datanm <- deparse(substitute(data))
data %>%
mutate(df = datanm)
}
-testing
> fun1(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species df
1 5.1 3.5 1.4 0.2 setosa iris
2 4.9 3.0 1.4 0.2 setosa iris
3 4.7 3.2 1.3 0.2 setosa iris
4 4.6 3.1 1.5 0.2 setosa iris
5 5.0 3.6 1.4 0.2 setosa iris
6 5.4 3.9 1.7 0.4 setosa iris
7 4.6 3.4 1.4 0.3 setosa iris
...
Maybe this is also helpful:
library(dplyr)
bind_rows(list(iris = iris), .id = 'df')
df Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 iris 5.1 3.5 1.4 0.2 setosa
2 iris 4.9 3.0 1.4 0.2 setosa
3 iris 4.7 3.2 1.3 0.2 setosa
4 iris 4.6 3.1 1.5 0.2 setosa
5 iris 5.0 3.6 1.4 0.2 setosa
6 iris 5.4 3.9 1.7 0.4 setosa
7 iris 4.6 3.4 1.4 0.3 setosa
8 iris 5.0 3.4 1.5 0.2 setosa
9 iris 4.4 2.9 1.4 0.2 setosa
....
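Since the stated goal is to bind several data frames together, the .id approach extends directly to more than one input: the names of the list become the values of the id column. A small sketch, where setosa and versicolor are hypothetical subsets created only for illustration:

```r
library(dplyr)

# Two example data frames (made up here by splitting iris)
setosa <- filter(iris, Species == "setosa")
versicolor <- filter(iris, Species == "versicolor")

# The list names end up in the `df` column of the combined result
combined <- bind_rows(list(setosa = setosa, versicolor = versicolor), .id = "df")

table(combined$df)
```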
I have the following code:
library(dplyr)
df <- data %>%
left_join(., panel_info, by = "PANID") %>%
left_join(., prod_0106, by = "UPC") %>%
left_join(., prod_0106sz, by = "UPC") %>%
left_join(., trips, by = "PANID") %>%
mutate(colnames(.) = gsub(" ", "", colnames(.)))
Everything works except the last line. The df data frame has not been created previously. So using the pipe function I am trying to join all the data together and finally remove all the blank spaces in the column names of the joined together data.
However, the following error occurs;
Error in mutate_impl(.data, dots) :
Column `gsub(" ", "", colnames(.))` must be length 20056 (the number of rows) or one, not 106
Which I assume is due to the (.) in the mutate() part of the code. Just want to see where I am going wrong here.
You can also set column names in a dplyr pipe by piping into `colnames<-`(), the replacement function that R calls behind the scenes when you write colnames(df) <- c('a', 'b', 'c'):
iris %>%
`colnames<-`(gsub('Length', 'LENGTH', names(.))) %>%
head
Sepal.LENGTH Sepal.Width Petal.LENGTH Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
As noted in the comments, there are a number of options. Here are a couple that I think fit in well with chaining:
library(dplyr)
> iris %>% rename_all(~gsub('Length', 'LENGTH', .x)) %>% head()
Sepal.LENGTH Sepal.Width Petal.LENGTH Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
> iris %>% setNames(gsub('Length', 'LENGTH', names(.))) %>% head()
Sepal.LENGTH Sepal.Width Petal.LENGTH Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
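Applied to the original problem of stripping spaces from column names, a sketch with rename_with() (which supersedes rename_all() from dplyr 1.0.0 onward) could look as follows. The data frame df and its column names are made up for illustration and stand in for the joined data.

```r
library(dplyr)

# Toy data with spaces in the column names (hypothetical stand-in for the joined data)
df <- data.frame(`PAN ID` = 1:2, `Unit Price` = c(1.5, 2), check.names = FALSE)

clean <- df %>%
  rename_with(~ gsub(" ", "", .x, fixed = TRUE))  # applied to every column name

names(clean)
```

This step can go at the end of the chain of left_join() calls in place of the failing mutate() line.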
I am trying to use the mutate function to update the values in the first column and achieve the output from the below query -
iris %>% head %>% mutate(Sepal.Length = Sepal.Length + Sepal.Width)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 8.6 3.5 1.4 0.2 setosa
# 7.9 3.0 1.4 0.2 setosa
# 7.9 3.2 1.3 0.2 setosa
# 7.7 3.1 1.5 0.2 setosa
# 8.6 3.6 1.4 0.2 setosa
# 9.3 3.9 1.7 0.4 setosa
But by using column index instead of the column name.
iris %>% head %>% mutate ( .[[1]] <- .[[1]]+.[[2]])
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species .[[1]] <- .[[1]] + .[[2]]
# 5.1 3.5 1.4 0.2 setosa 8.6
# 4.9 3.0 1.4 0.2 setosa 7.9
# 4.7 3.2 1.3 0.2 setosa 7.9
# 4.6 3.1 1.5 0.2 setosa 7.7
# 5.0 3.6 1.4 0.2 setosa 8.6
# 5.4 3.9 1.7 0.4 setosa 9.3
But this creates a new column called .[[1]] <- .[[1]] + .[[2]] instead of updating the first column.
Using dplyr, you can do something like this:
iris %>% head %>% mutate(sum=Sepal.Length + Sepal.Width)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1 5.1 3.5 1.4 0.2 setosa 8.6
2 4.9 3.0 1.4 0.2 setosa 7.9
3 4.7 3.2 1.3 0.2 setosa 7.9
4 4.6 3.1 1.5 0.2 setosa 7.7
5 5.0 3.6 1.4 0.2 setosa 8.6
6 5.4 3.9 1.7 0.4 setosa 9.3
But above, I referenced the columns by their column names. How can I use 1 and 2 , which are the column indices to achieve the same result?
Here I have the following, but I feel it's not as elegant.
iris %>% head %>% mutate(sum=apply(select(.,1,2),1,sum))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1 5.1 3.5 1.4 0.2 setosa 8.6
2 4.9 3.0 1.4 0.2 setosa 7.9
3 4.7 3.2 1.3 0.2 setosa 7.9
4 4.6 3.1 1.5 0.2 setosa 7.7
5 5.0 3.6 1.4 0.2 setosa 8.6
6 5.4 3.9 1.7 0.4 setosa 9.3
You can try:
iris %>% head %>% mutate(sum = .[[1]] + .[[2]])
Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1 5.1 3.5 1.4 0.2 setosa 8.6
2 4.9 3.0 1.4 0.2 setosa 7.9
3 4.7 3.2 1.3 0.2 setosa 7.9
4 4.6 3.1 1.5 0.2 setosa 7.7
5 5.0 3.6 1.4 0.2 setosa 8.6
6 5.4 3.9 1.7 0.4 setosa 9.3
I'm a bit late to the game, but my personal strategy in cases like this is to write my own tidyverse-compliant function that will do exactly what I want. By tidyverse-compliant, I mean that the first argument of the function is a data frame and that the output is a vector that can be added to the data frame.
sum_cols <- function(x, col1, col2){
x[[col1]] + x[[col2]]
}
iris %>%
head %>%
mutate(sum = sum_cols(x = ., col1 = 1, col2 = 2))
An alternative to reusing . in mutate that will respect grouping is to use dplyr::cur_data_all() (deprecated since dplyr 1.1.0 in favor of pick()). From help(cur_data_all):
cur_data_all() gives the current data for the current group (including grouping variables)
Consider the following:
iris %>% group_by(Species) %>% mutate(sum = .[[1]] + .[[2]]) %>% head
#Error: Problem with `mutate()` column `sum`.
#ℹ `sum = .[[1]] + .[[2]]`.
#ℹ `sum` must be size 50 or 1, not 150.
#ℹ The error occurred in group 1: Species = setosa.
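If instead you use cur_data_all(), it works without issue. The example below is ungrouped, but the same call should also work after group_by(Species). Because select() returns a data frame rather than a vector, the new column prints under the name Sepal.Length instead of sum: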
iris %>% mutate(sum = select(cur_data_all(),1) + select(cur_data_all(),2)) %>% head()
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length
#1 5.1 3.5 1.4 0.2 setosa 8.6
#2 4.9 3.0 1.4 0.2 setosa 7.9
#3 4.7 3.2 1.3 0.2 setosa 7.9
#4 4.6 3.1 1.5 0.2 setosa 7.7
#5 5.0 3.6 1.4 0.2 setosa 8.6
#6 5.4 3.9 1.7 0.4 setosa 9.3
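The same approach works with the extract operator ([[), which returns a plain vector, so the new column is named sum as intended. Note that this example uses cur_data(), which excludes the grouping variables.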
iris %>% mutate(sum = cur_data()[[1]] + cur_data()[[2]]) %>% head()
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
#1 5.1 3.5 1.4 0.2 setosa 8.6
#2 4.9 3.0 1.4 0.2 setosa 7.9
#3 4.7 3.2 1.3 0.2 setosa 7.9
#4 4.6 3.1 1.5 0.2 setosa 7.7
#5 5.0 3.6 1.4 0.2 setosa 8.6
#6 5.4 3.9 1.7 0.4 setosa 9.3
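On dplyr 1.1.0 or later, cur_data() and cur_data_all() are deprecated and pick() is the suggested replacement. A hedged sketch of the grouped case follows, assuming the two columns of interest are the first two non-grouping columns (pick() does not return grouping columns). The name grouped_sum is introduced only for this example.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where pick() was introduced

grouped_sum <- iris %>%
  group_by(Species) %>%
  # pick() returns only the current group's rows, so the sizes match within each group
  mutate(sum = pick(everything())[[1]] + pick(everything())[[2]]) %>%
  ungroup()

head(grouped_sum)
```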
What do you think about this version?
Inspired by @SavedByJesus's answer.
applySum <- function(df, ...) {
assertthat::assert_that(...length() > 0, msg = "one or more column indexes are required")
mutate(df, Sum = apply(as.data.frame(df[, c(...)]), 1, sum))
}
iris %>%
head(2) %>%
applySum(1, 2)
#
### output
#
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sum
1 5.1 3.5 1.4 0.2 setosa 8.6
2 4.9 3.0 1.4 0.2 setosa 7.9
#
### you can select and sum more than two columns with the same function
#
iris %>%
head(2) %>%
applySum(1, 2, 3, 4)
#
### output
#
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sum
1 5.1 3.5 1.4 0.2 setosa 10.2
2 4.9 3.0 1.4 0.2 setosa 9.5
To address the issue that @pluke raises in the comments: mutate() has no direct way to name the column to overwrite by its index.
It is not a perfect solution, but you can use base R to get around this:
iris[1] <- iris[1] + iris[2]
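A dplyr-only workaround is to look up the column names from the indices first and then overwrite by name. This is a sketch under assumptions: add_into() and its arguments i and j are hypothetical names, and the glue-style "{target}" := syntax needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helper: overwrite column `i` with the sum of columns `i` and `j`
add_into <- function(data, i = 1, j = 2) {
  target <- names(data)[i]  # translate the positions into names
  other <- names(data)[j]
  # "{target}" := lets the left-hand side of mutate() be computed from a string
  data %>% mutate("{target}" := .data[[target]] + .data[[other]])
}

out <- iris %>% head() %>% add_into(1, 2)
out$Sepal.Length
```

Unlike the base R assignment, this returns a modified copy and leaves iris itself unchanged.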
This can now (packageVersion("dplyr") >= 1.0.0) be done very nicely with the combination of dplyr::rowwise() and dplyr::c_across().
library(dplyr)
packageVersion("dplyr")
#> [1] '1.0.10'
iris %>%
head %>%
rowwise() %>%
mutate(sum = sum(c_across(c(1, 2))))
#> # A tibble: 6 × 6
#> # Rowwise:
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 8.6
#> 2 4.9 3 1.4 0.2 setosa 7.9
#> 3 4.7 3.2 1.3 0.2 setosa 7.9
#> 4 4.6 3.1 1.5 0.2 setosa 7.7
#> 5 5 3.6 1.4 0.2 setosa 8.6
#> 6 5.4 3.9 1.7 0.4 setosa 9.3
Created on 2022-11-01 with reprex v2.0.2
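rowwise() evaluates the expression once per row, which can become slow on large data. A vectorised alternative, sketched here under the assumption that dplyr 1.1.0 or later is installed for pick(), is to hand the columns selected by position to rowSums():

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where pick() was introduced

out <- iris %>%
  head() %>%
  mutate(sum = rowSums(pick(1, 2)))  # columns 1 and 2, summed row by row

out$sum
```

The result stays an ordinary data frame, so no ungroup() is needed afterwards.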