renaming column names with variable

I want to rename a specific column in dplyr, where the new name is stored in a variable.
newName = paste0('nameY', 2017)
What I tried was
iris %>%
rename(newName = Petal.Length) %>%
head(2)
Which gives
Sepal.Length Sepal.Width newName Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
The column is named newName rather than nameY2017, which is expected. So I tried
iris %>%
rename_(eval(newName) = 'Petal.Length')
But then I am getting an error.
Error: unexpected '=' in "iris %>% rename_(eval(newName) ="
Is there a proper way to do it with dplyr?
I know I can do something like
names(iris)[3] <- newName
But that wouldn't be a dplyr solution.

Credit and further information: see the post "dplyr 'rename' standard evaluation function not working as expected?"
Your code:
newName = paste0('nameY', 2017)
iris %>%
rename(newName = Petal.Length) %>%
head(2)
Solution:
iris %>%
rename_(.dots = setNames("Petal.Length",newName)) %>%
head(2)
Output:
Sepal.Length Sepal.Width nameY2017 Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
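
Note that rename_() and the other underscore verbs have been deprecated since dplyr 0.7.0. A minimal sketch of the same rename with tidy evaluation, assuming a reasonably recent dplyr is installed (the object name renamed is only for illustration):

```r
library(dplyr)

newName <- paste0("nameY", 2017)

# `!!` unquotes the string stored in newName. `:=` is used instead of `=`
# because R's parser does not accept an expression on the left of `=`.
renamed <- iris %>%
  rename(!!newName := Petal.Length)

head(renamed, 2)
```

The third column should now be called nameY2017, as in the output above.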

Related

Get column value associated to another column maximum in dplyr's across

After grouping by species and taking the maximum of Sepal.Length (column 1) for each group, I need to grab the values of columns 2 to 4 that are associated with that maximum (by group). I'm able to do so one column at a time, but not with across. Any tips?
library(dplyr)
library(datasets)
data(iris)
Summarise by species, taking the values associated with the maximum Sepal.Length (by group), column by column:
iris_summary <- iris %>%
  group_by(Species) %>%
  summarise(
    max_sep_length = max(Sepal.Length),
    sep_w_associated_to = Sepal.Width[which.max(Sepal.Length)],
    pet_l_associated_to = Petal.Length[which.max(Sepal.Length)],
    pet_w_associated_to = Petal.Width[which.max(Sepal.Length)]
  )
Now I would like to obtain the same result using across, but the outcome is different from what I expected (iris_summary now has the same number of rows as iris, and I can't understand why):
iris_summary <- iris %>%
  group_by(Species) %>%
  summarise(
    max_sepa_length = max(Sepal.Length),
    across(
      .cols = Sepal.Width:Petal.Width,
      .funs = ~ .x[which.max(Sepal.Length)]
    )
  )

Or use slice_max
library(dplyr) # dplyr >= 1.1.0 accepts `by`; on older versions use `group_by(Species)` first
iris %>%
slice_max(Sepal.Length, n = 1, by = 'Species')
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.8 4.0 1.2 0.2 setosa
2 7.0 3.2 4.7 1.4 versicolor
3 7.9 3.8 6.4 2.0 virginica
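
One difference from which.max() is worth keeping in mind: slice_max() keeps tied rows by default, whereas which.max() returns only the first maximum. A small sketch, assuming you want exactly one row per group (the name top_per_species is made up for this example):

```r
library(dplyr)

# with_ties = FALSE keeps a single row per group even when several rows
# share the maximum Sepal.Length.
top_per_species <- iris %>%
  group_by(Species) %>%
  slice_max(Sepal.Length, n = 1, with_ties = FALSE) %>%
  ungroup()

top_per_species
```

For iris the maxima happen to be unique within each species, so both variants return three rows here.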

In base R you could do:
merge(aggregate(Sepal.Length~Species, iris, max), iris)
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 5.8 4.0 1.2 0.2
2 versicolor 7.0 3.2 4.7 1.4
3 virginica 7.9 3.8 6.4 2.0

If we want to do the same with across, here is one option:
iris %>%
group_by(Species) %>%
summarise(across(everything(), ~ .[which.max(Sepal.Length)]))
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 5.8 4 1.2 0.2
2 versicolor 7 3.2 4.7 1.4
3 virginica 7.9 3.8 6.4 2
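
If you want to keep the separate maximum column and apply the lookup only to a subset of columns, something along these lines may work. This is a sketch assuming dplyr 1.0.0 or later; the "_at_max" suffix is just an illustrative naming choice, not something from the question:

```r
library(dplyr)

iris_summary <- iris %>%
  group_by(Species) %>%
  summarise(
    max_sep_length = max(Sepal.Length),
    # For each selected column, take the value on the row where
    # Sepal.Length is largest within the current group.
    across(
      Sepal.Width:Petal.Width,
      ~ .x[which.max(Sepal.Length)],
      .names = "{.col}_at_max"
    )
  )

iris_summary
```

Because the lambda returns a single value per group, I would expect one row per species here.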

R convert numeric to percentage the 'tidyverse way' using percent_format

I have a column of numbers that I want to change from a count to a percentage.
This code works:
df <- df %>%
  select(casualty_veh_ref, JourneyPurpose) %>%
  group_by(JourneyPurpose) %>%
  summarise(Number = n()) %>%
  mutate(Percentage = Number / sum(Number) * 100)
df$Percentage <- paste(round(df$Percentage), "%", sep = "")
But if I try to keep the piping using percent_format from the scales package:
df <- df %>%
  select(casualty_veh_ref, JourneyPurpose) %>%
  group_by(JourneyPurpose) %>%
  summarise(Number = n()) %>%
  mutate(Percentage = Number / sum(Number)) %>%
  percent_format(Percentage, suffix = "%")
I receive the error message
Error in force_all(accuracy, scale, prefix, suffix, big.mark, decimal.mark, :
object 'Percentage' not found
I don't understand why the object is not found.

Try this; I've used iris for illustration.
library(dplyr)
iris %>%
slice(1:4) %>%
mutate(Test=Sepal.Length/45,Test=scales::percent(Test))
Result:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Test
1 5.1 3.5 1.4 0.2 setosa 11.33%
2 4.9 3.0 1.4 0.2 setosa 10.89%
3 4.7 3.2 1.3 0.2 setosa 10.44%
4 4.6 3.1 1.5 0.2 setosa 10.22%
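
As for why the original attempt fails: percent_format() does not take data. It is a function factory that returns a formatting function, so piping a data frame into it passes the data frame as its first argument and Percentage is looked up as an ordinary object, which does not exist. A sketch of calling the returned formatter inside mutate(), using a small made-up data frame in place of the real one (the values below are hypothetical):

```r
library(dplyr)
library(scales)

# Hypothetical stand-in for the asker's data.
df <- data.frame(
  JourneyPurpose = c("Commute", "Commute", "Leisure", "Work")
)

out <- df %>%
  count(JourneyPurpose, name = "Number") %>%
  # percent_format() builds the formatter; the trailing (...) applies it
  # to the column of proportions.
  mutate(Percentage = percent_format(accuracy = 1)(Number / sum(Number)))

out
```

scales::percent(x, accuracy = 1) is a shorter equivalent when you do not need to reuse the formatter.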

Using variables for column functions in mutate()

How can I use variables holding strings in place of column names in dplyr? As an example, say I want to add a column to the iris dataset called sum that is the sum of Sepal.Length and Sepal.Width. In short, I want a working version of the code below.
x = "Sepal.Length"
y = "Sepal.Width"
head(iris%>% mutate(sum = x+y))
Currently, running the code outputs "Evaluation error: non-numeric argument to binary operator" as R evaluates x and y as character vectors. How do I instead get R to evaluate x and y as column names of the dataframe? I know that the answer is to use some form of lazy evaluation, but I'm having trouble figuring out exactly how to configure it.
Note that the proposed duplicate: dplyr - mutate: use dynamic variable names does not address this issue. The duplicate answers this question:
Not my question: How do I do:
var = "sum"
head(iris %>% mutate(var = Sepal.Length + Sepal.Width))

I think the recommended way is to use sym():
iris %>% mutate(sum = !!sym(x) + !!sym(y)) %>% head

It also works with get():
> rm(list = ls())
> data("iris")
>
> library(dplyr)
>
> x <- "Sepal.Length"
> y <- "Sepal.Width"
>
> head(iris %>% mutate(sum = get(x) + get(y)))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1 5.1 3.5 1.4 0.2 setosa 8.6
2 4.9 3.0 1.4 0.2 setosa 7.9
3 4.7 3.2 1.3 0.2 setosa 7.9
4 4.6 3.1 1.5 0.2 setosa 7.7
5 5.0 3.6 1.4 0.2 setosa 8.6
6 5.4 3.9 1.7 0.4 setosa 9.3
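
Another option is the .data pronoun, which looks a string up among the data frame's columns only, so it cannot accidentally pick up a same-named object from the surrounding environment the way get() can. A minimal sketch, assuming a dplyr version that supports .data (the name with_sum is only for illustration):

```r
library(dplyr)

x <- "Sepal.Length"
y <- "Sepal.Width"

# .data[[x]] subsets the current data frame by the column name held in x.
with_sum <- iris %>%
  mutate(sum = .data[[x]] + .data[[y]])

head(with_sum)
```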

Create a column at specific location in dataframe

I want to create a new variable at a specific location. I can create the variable with mutate and then reorder with select, but I would rather use tibble::add_column.
This is a simple example with the iris dataset :
library(tidyverse)
## This works fine
iris %>% mutate(With_mutate = ifelse(Sepal.Length > 4 & Sepal.Width > 3 , TRUE, FALSE)) %>%
select(Sepal.Length:Petal.Width, With_mutate, everything()) %>%
head()
## This works also
iris %>% add_column(With_add_column = "Test", .before = "Species") %>%
head()
## This doesn't work
iris %>% add_column(With_add_column = ifelse(Sepal.Length > 4 & Sepal.Width > 3 , TRUE, FALSE), .before = "Species") %>%
head()
Error in ifelse(Sepal.Length > 4 & Sepal.Width > 3, TRUE, FALSE) :
object 'Sepal.Length' not found
I would greatly appreciate it if someone could tell me why my ifelse statement doesn't work with add_column.

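The reason is that mutate, summarise, etc. evaluate their arguments inside the data frame, so a bare symbol such as Sepal.Length resolves to the column. add_column does not do this, so Sepal.Length is looked up as an ordinary object and is not found. We can extract the column explicitly with .$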
iris %>%
add_column(With_add_column = ifelse(.$Sepal.Length > 4 &
.$Sepal.Width > 3 , TRUE, FALSE), .before = "Species") %>%
head()
#Sepal.Length Sepal.Width Petal.Length Petal.Width With_add_column Species
#1 5.1 3.5 1.4 0.2 TRUE setosa
#2 4.9 3.0 1.4 0.2 FALSE setosa
#3 4.7 3.2 1.3 0.2 TRUE setosa
#4 4.6 3.1 1.5 0.2 TRUE setosa
#5 5.0 3.6 1.4 0.2 TRUE setosa
#6 5.4 3.9 1.7 0.4 TRUE setosa
To make it more compact: the logical condition already evaluates to TRUE/FALSE, so we don't need ifelse, i.e.
add_column(With_add_column = .$Sepal.Length > 4 & .$Sepal.Width > 3, .before = "Species")
can replace the second step.
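
If mutate is acceptable after all, newer dplyr versions (1.0.0 and later, as far as I know) let it place the column directly through the .before / .after arguments, which avoids both the select step and the .$ prefix. A sketch under that assumption (the name flagged is made up here):

```r
library(dplyr)

# Column names resolve inside mutate(), and .before positions the new
# column immediately ahead of Species.
flagged <- iris %>%
  mutate(
    With_mutate = Sepal.Length > 4 & Sepal.Width > 3,
    .before = Species
  )

head(flagged)
```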

dplyr rowwise mutate without hardcoding names

I want to do something like this
df <- iris %>%
rowwise %>%
mutate(new_var = sum(Sepal.Length, Sepal.Width))
Except I want to do it without typing the variable names, e.g.
names_to_add <- c("Sepal.Length", "Sepal.Width")
df <- iris %>%
rowwise %>%
[some function that uses names_to_add]
I attempted a few things e.g.
df <- iris %>%
rowwise %>%
mutate(new_var = sum(sapply(names_to_add, get, envir = as.environment(.))))
but I still can't figure it out. I'll take an answer that plays around with lazyeval or something simpler. Note that the sum function here is just a placeholder; my actual function is much more complex, although it returns one value per row. I'd also rather not use data.table.

You should check out the dplyr functions that end with _, for example mutate_, summarise_, etc.
names_to_add <- ("sum(Sepal.Length, Sepal.Width)")
df <- iris %>%
rowwise %>% mutate_(names_to_add)
Edit
The results of the code:
df <- iris %>%
rowwise %>% mutate(new_var = sum(Sepal.Length, Sepal.Width))
names_to_add <- ("sum(Sepal.Length, Sepal.Width)")
df2 <- iris %>%
rowwise %>% mutate_(new_var = names_to_add)
identical(df, df2)
[1] TRUE

Edit
I edited the answer and it now solves the problem; I wonder why it was downvoted. We use SE (standard evaluation), passing a string as input to mutate_. More info: vignette("nse", "dplyr")
x <- "Sepal.Length + Sepal.Width"
df <- mutate_(iris, x)
head(df)
Output:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length + Sepal.Width
1 5.1 3.5 1.4 0.2 setosa 8.6
2 4.9 3.0 1.4 0.2 setosa 7.9
3 4.7 3.2 1.3 0.2 setosa 7.9
4 4.6 3.1 1.5 0.2 setosa 7.7
5 5.0 3.6 1.4 0.2 setosa 8.6
6 5.4 3.9 1.7 0.4 setosa 9.3
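
Since mutate_() and the other underscore verbs are deprecated in current dplyr, here is a sketch of the same row-wise calculation that keeps names_to_add as a character vector of column names, as in the question. It assumes dplyr 1.0.0 or later for c_across(); sum() remains a placeholder for the real function:

```r
library(dplyr)

names_to_add <- c("Sepal.Length", "Sepal.Width")

df <- iris %>%
  rowwise() %>%
  # c_across() collects the selected columns of the current row into one
  # vector; all_of() selects them from the character vector.
  mutate(new_var = sum(c_across(all_of(names_to_add)))) %>%
  ungroup()

head(df)
```

Any function that takes a vector and returns a single value per row could replace sum() here.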
