Extract name of dataframe as string in R tidyverse - r

I want to add a column to a dataframe containing its own name as a string. (This is for inclusion in a function that will bind several of them together...)
Based on this old SO post and my understanding of magrittr pipes I thought this would work:
data(iris)
iris %>%
mutate(df = deparse(substitute(.)))
But that just adds a column called "df" populated with full stops! The desired output is the string "iris" in every row of that df column. Can anyone set me right?

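Inside the pipe, the data frame is passed along as the placeholder ., so deparse(substitute(.)) returns "." rather than the original name. If the step is wrapped in a function, the name can be captured from the function argument before the pipe starts: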
library(dplyr)
fun1 <- function(data) {
datanm <- deparse(substitute(data))
data %>%
mutate(df = datanm)
}
-testing
> fun1(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species df
1 5.1 3.5 1.4 0.2 setosa iris
2 4.9 3.0 1.4 0.2 setosa iris
3 4.7 3.2 1.3 0.2 setosa iris
4 4.6 3.1 1.5 0.2 setosa iris
5 5.0 3.6 1.4 0.2 setosa iris
6 5.4 3.9 1.7 0.4 setosa iris
7 4.6 3.4 1.4 0.3 setosa iris
...
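One caveat, sketched here under the assumption that magrittr's %>% is the pipe in use: the function has to be called directly. If the data frame is piped into fun1, the argument expression is again the placeholder, so the captured name is "." instead of "iris".

```r
library(dplyr)

fun1 <- function(data) {
  # capture the argument expression as text before the data is used
  datanm <- deparse(substitute(data))
  data %>%
    mutate(df = datanm)
}

direct <- fun1(iris)        # the argument expression is the symbol iris
piped  <- iris %>% fun1()   # magrittr hands over the placeholder `.`

unique(direct$df)   # "iris"
unique(piped$df)    # "."
```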

Maybe this is also helpful:
library(dplyr)
bind_rows(list(iris = iris), .id = 'df')
df Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 iris 5.1 3.5 1.4 0.2 setosa
2 iris 4.9 3.0 1.4 0.2 setosa
3 iris 4.7 3.2 1.3 0.2 setosa
4 iris 4.6 3.1 1.5 0.2 setosa
5 iris 5.0 3.6 1.4 0.2 setosa
6 iris 5.4 3.9 1.7 0.4 setosa
7 iris 4.6 3.4 1.4 0.3 setosa
8 iris 5.0 3.4 1.5 0.2 setosa
9 iris 4.4 2.9 1.4 0.2 setosa
....
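Since the goal is to bind several data frames together, the same idea extends to more than one input. The sketch below assumes tibble is installed (it comes with dplyr) and uses tibble::lst(), which names each list element after the object passed in; first_part and last_part are placeholder objects made up for illustration.

```r
library(dplyr)

# two illustrative slices of iris; the object names are placeholders
first_part <- head(iris, 3)
last_part  <- tail(iris, 3)

# lst() auto-names the elements, and .id turns those names into a column
combined <- bind_rows(tibble::lst(first_part, last_part), .id = "df")

unique(combined$df)
```

This avoids capturing names with substitute() altogether, because the names travel with the list.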

Related

Dynamically "gluing" with {glue}

Is there any way to supply a vector to {glue} to dynamically choose which columns get "glued"? The desired column below shows what I am hoping to see, but I want to be able to supply vars to the glue statement instead of hard-coding the column names.
library(glue)
library(dplyr)
vars <- c("Sepal.Length", "Species")
iris %>%
head() %>% ## just for less data
# mutate(glue_string = glue_data("{vars}")) %>%
mutate(desired = glue("{Sepal.Length}{Species}"))
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species desired
#> 1 5.1 3.5 1.4 0.2 setosa 5.1setosa
#> 2 4.9 3.0 1.4 0.2 setosa 4.9setosa
#> 3 4.7 3.2 1.3 0.2 setosa 4.7setosa
#> 4 4.6 3.1 1.5 0.2 setosa 4.6setosa
#> 5 5.0 3.6 1.4 0.2 setosa 5setosa
#> 6 5.4 3.9 1.7 0.4 setosa 5.4setosa
We can either use the .data pronoun to extract the column named by each element of 'vars' and glue them together
library(dplyr)
library(glue)
iris %>%
head() %>% ## just for less data
mutate(desired = glue("{.data[[vars[1]]]}{.data[[vars[2]]]}"))
-output
Sepal.Length Sepal.Width Petal.Length Petal.Width Species desired
1 5.1 3.5 1.4 0.2 setosa 5.1setosa
2 4.9 3.0 1.4 0.2 setosa 4.9setosa
3 4.7 3.2 1.3 0.2 setosa 4.7setosa
4 4.6 3.1 1.5 0.2 setosa 4.6setosa
5 5.0 3.6 1.4 0.2 setosa 5setosa
6 5.4 3.9 1.7 0.4 setosa 5.4setosa
Or select the columns named in 'vars' with across(all_of(vars)) and use invoke to pass them to str_c, which pastes them together row by row
library(stringr)
library(purrr)
iris %>%
head() %>% ## just for less data
mutate(desired = invoke(str_c, across(all_of(vars))))
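purrr::invoke() is deprecated in recent purrr releases, so a variant that avoids it may be useful. The following is a minimal sketch, not the answerer's code: it swaps in base R's do.call() and paste0(), which handle any number of columns in vars.

```r
library(dplyr)

vars <- c("Sepal.Length", "Species")

result <- iris %>%
  head() %>%
  # across() selects the columns named in vars as a data frame;
  # do.call() passes each column to paste0() as a separate argument
  mutate(desired = do.call(paste0, across(all_of(vars))))

result$desired
```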

dplyr::rename() if condition about column contents is met

Say I wanted to rename a column based on the condition that the contents of the column contain a specific value.
For example, if iris$Species contains "virginica", rename Species to flower.name; otherwise keep the name Species.
This code works:
library(dplyr)
iris <- if("virginica" %in% iris$Species){
rename(iris, flower.name = Species)
}
iris %>% names
but I was hoping there was a more elegant dplyr way of doing this with one of the existing functions, such as rename_if()?
One option could be:
iris %>%
rename_with(~ "Flower.Name",
.cols = Species & where(~ any(. %in% "virginica")))
Sepal.Length Sepal.Width Petal.Length Petal.Width Flower.Name
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
With rename_if
library(dplyr)
iris1 <- iris %>%
rename_if(~ is.factor(.) && "virginica" %in% ., ~ 'flower.name')
names(iris1)
#[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "flower.name"
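Note that the if() statement in the question has no else branch, so iris would be overwritten with NULL whenever the value is absent. A small base R helper makes the "else keep the name" case explicit. This is only a sketch; rename_if_contains and its arguments are hypothetical names introduced for illustration.

```r
# hypothetical helper: rename `col` to `new_name` only if `value` occurs in it
rename_if_contains <- function(data, col, value, new_name) {
  if (value %in% data[[col]]) {
    names(data)[names(data) == col] <- new_name
  }
  data  # returned unchanged when the value is absent
}

renamed   <- rename_if_contains(iris, "Species", "virginica", "flower.name")
unchanged <- rename_if_contains(iris, "Species", "tulip", "flower.name")

names(renamed)
names(unchanged)
```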

Renaming columns based on condition about their names

I would like to add a prefix to my dataset column names only if they already begin with a certain string, and I would like to do it (if possible) using a dplyr pipeline.
Taking the iris dataset as toy example, I was able to get the expected result with base R (with a quite cumbersome line of code):
data("iris")
colnames(iris)[startsWith(colnames(iris), "Sepal")] <- paste0("YAY_", colnames(iris)[startsWith(colnames(iris), "Sepal")])
head(iris)
YAY_Sepal.Length YAY_Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
In this example, the prefix YAY_ has been added to all the column names starting with Sepal. Is there a way to obtain the same result with a dplyr command/pipeline?
An option would be rename_at
library(tidyverse)
iris %>%
rename_at(vars(starts_with("Sepal")), ~ str_c("YAY_", .))
# YAY_Sepal.Length YAY_Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
#6 5.4 3.9 1.7 0.4 setosa
# ...
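In dplyr 1.0.0 and later, rename_at() is superseded by rename_with(), which takes the renaming function first and a tidyselect expression for the columns second. A minimal sketch of the same prefixing, using paste0() in place of str_c() so that only dplyr is needed:

```r
library(dplyr)

renamed <- iris %>%
  # apply the function only to columns whose names start with "Sepal"
  rename_with(~ paste0("YAY_", .x), .cols = starts_with("Sepal"))

names(renamed)
```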

dplyr mutate colnames in pipe function

I have the following code:
library(dplyr)
df <- data %>%
left_join(., panel_info, by = "PANID") %>%
left_join(., prod_0106, by = "UPC") %>%
left_join(., prod_0106sz, by = "UPC") %>%
left_join(., trips, by = "PANID") %>%
mutate(colnames(.) = gsub(" ", "", colnames(.)))
Everything works except the last line. The df data frame has not been created previously. So using the pipe function I am trying to join all the data together and finally remove all the blank spaces in the column names of the joined together data.
However, the following error occurs;
Error in mutate_impl(.data, dots) :
Column `gsub(" ", "", colnames(.))` must be length 20056 (the number of rows) or one, not 106
Which I assume is due to the (.) in the mutate() part of the code. Just want to see where I am going wrong here.
You can also set colnames in a dplyr pipe by piping into `colnames<-`(), the replacement function that R calls when you write colnames(df) <- c('a', 'b', 'c'):
iris %>%
`colnames<-`(gsub('Length', 'LENGTH', names(.))) %>%
head
Sepal.LENGTH Sepal.Width Petal.LENGTH Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
As noted in the comments, there are a number of options. Here are a couple that I think fit in well with chaining:
library(dplyr)
> iris %>% rename_all(~gsub('Length', 'LENGTH', .x)) %>% head()
Sepal.LENGTH Sepal.Width Petal.LENGTH Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
> iris %>% setNames(gsub('Length', 'LENGTH', names(.))) %>% head()
Sepal.LENGTH Sepal.Width Petal.LENGTH Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
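Applied to the original problem of stripping spaces from column names, the same pattern might look like the sketch below. It assumes dplyr 1.0.0 or later for rename_with(); the toy data frame and its column names are made up, since the asker's joined data is not available.

```r
library(dplyr)

# toy stand-in for the joined data; column names are invented for illustration
joined <- data.frame("PAN ID" = 1:2, "UPC code" = c("a", "b"),
                     check.names = FALSE)

cleaned <- joined %>%
  # rename_with() applies the function to every column name by default
  rename_with(~ gsub(" ", "", .x, fixed = TRUE))

names(cleaned)
```

The mutate() call in the question fails because mutate() creates or modifies columns, which must have one value per row, whereas the column names are an attribute of the data frame.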

Updating column data using column index and dplyr::mutate

I am trying to use the mutate function to update the values in the first column and reproduce the output of the query below:
iris %>% head %>% mutate(Sepal.Length = Sepal.Length + Sepal.Width)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 8.6 3.5 1.4 0.2 setosa
# 7.9 3.0 1.4 0.2 setosa
# 7.9 3.2 1.3 0.2 setosa
# 7.7 3.1 1.5 0.2 setosa
# 8.6 3.6 1.4 0.2 setosa
# 9.3 3.9 1.7 0.4 setosa
But I want to do this using the column index instead of the column name:
iris %>% head %>% mutate ( .[[1]] <- .[[1]]+.[[2]])
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species .[[1]] <- .[[1]] + .[[2]]
# 5.1 3.5 1.4 0.2 setosa 8.6
# 4.9 3.0 1.4 0.2 setosa 7.9
# 4.7 3.2 1.3 0.2 setosa 7.9
# 4.6 3.1 1.5 0.2 setosa 7.7
# 5.0 3.6 1.4 0.2 setosa 8.6
# 5.4 3.9 1.7 0.4 setosa 9.3
But this creates a new column called .[[1]] <- .[[1]] + .[[2]] instead of updating the first column.
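mutate() treats the whole assignment as a single unnamed expression, so it evaluates it and uses its text as the new column's name. One possible way around this, sketched here with a hypothetical helper add_second_to_first, is to look up the names by index first and then use them on both sides of := via !! and the .data pronoun.

```r
library(dplyr)

# hypothetical helper: overwrite column 1 with column 1 + column 2
add_second_to_first <- function(data) {
  first  <- names(data)[1]
  second <- names(data)[2]
  data %>%
    # !!first := ... assigns to the column whose name is stored in `first`
    mutate(!!first := .data[[first]] + .data[[second]])
}

result <- add_second_to_first(head(iris))
result
```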
