I want to rename a specific column in dplyr, where the new name comes from a variable.
newName = paste0('nameY', 2017)
What I tried was
iris %>%
rename(newName = Petal.Length) %>%
head(2)
Which gives
Sepal.Length Sepal.Width newName Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
The column is called newName rather than nameY2017, which is expected, because rename treats newName as a literal name. So I tried
iris %>%
rename_(eval(newName) = 'Petal.Length')
But then I am getting an error.
Error: unexpected '=' in "iris %>% rename_(eval(newName) ="
Is there a proper way to do it with dplyr?
I know I can do something like
names(iris)[3] <- newName
But that wouldn't be a dplyr solution.
Credit and further information: see the post "dplyr 'rename' standard evaluation function not working as expected?"
Your code:
newName = paste0('nameY', 2017)
iris %>%
rename(newName = Petal.Length) %>%
head(2)
Solution:
iris %>%
rename_(.dots = setNames("Petal.Length",newName)) %>%
head(2)
Output:
Sepal.Length Sepal.Width nameY2017 Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
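Since rename_ is deprecated in current dplyr releases, here is a minimal sketch of two alternatives that should give the same result, assuming a recent dplyr (1.0.0 or later). The object names renamed_bang, lookup and renamed_lookup are only illustrative and are not from the original post.

```r
library(dplyr)

newName <- paste0("nameY", 2017)

# Option 1: unquote the variable on the left-hand side with !! and :=
renamed_bang <- iris %>%
  rename(!!newName := Petal.Length)

# Option 2: pass a named character vector (new name = old name) via all_of()
lookup <- setNames("Petal.Length", newName)
renamed_lookup <- iris %>%
  rename(all_of(lookup))

head(renamed_bang, 2)
```

The lookup-vector form is convenient when several columns have to be renamed at once, since the vector can hold more than one pair.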
After grouping by species and taking the max of Sepal.Length (column 1) for each group, I need to grab the values of columns 2 to 4 that are associated with the maximum value of column 1 (by group). I'm able to do so one column at a time, but not with across. Any tips?
library(dplyr)
library(datasets)
data(iris)
Summarise by species, with the data associated with the max Sepal.Length (by group), column by column:
iris_summary <- iris %>%
group_by(Species) %>%
summarise(
max_sep_length = max(Sepal.Length),
sep_w_associated_to = Sepal.Width[which.max(Sepal.Length)],
pet_l_associated_to = Petal.Length[which.max(Sepal.Length)],
pet_w_associated_to = Petal.Width[which.max(Sepal.Length)]
)
Now I would like to obtain the same result using across, but the outcome is different from what I expected (the data frame iris_summary now has the same number of rows as iris, and I can't understand why):
iris_summary <- iris %>%
group_by(Species) %>%
summarise(
max_sepa_length = max(Sepal.Length),
across(
.cols = Sepal.Width : Petal.Width,
.funs = ~ .x[which.max(Sepal.Length)]
)
)
Or use slice_max
library(dplyr) # dplyr >= 1.1.0 supports `by` here; otherwise use `group_by(Species)`
iris %>%
slice_max(Sepal.Length, n = 1, by = 'Species')
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.8 4.0 1.2 0.2 setosa
2 7.0 3.2 4.7 1.4 versicolor
3 7.9 3.8 6.4 2.0 virginica
In base R you could do:
merge(aggregate(Sepal.Length~Species, iris, max), iris)
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 5.8 4.0 1.2 0.2
2 versicolor 7.0 3.2 4.7 1.4
3 virginica 7.9 3.8 6.4 2.0
If we want to do the same with across, here is one option:
iris %>%
group_by(Species) %>%
summarise(across(everything(), ~ .[which.max(Sepal.Length)]))
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 5.8 4 1.2 0.2
2 versicolor 7 3.2 4.7 1.4
3 virginica 7.9 3.8 6.4 2
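If the goal is to keep a separate max column and distinct names for the associated values, as in the column-by-column version of the question, a sketch along these lines might work. It assumes dplyr 1.0.0 or later; the "_at_max" suffix is just an illustrative naming choice.

```r
library(dplyr)

iris_summary <- iris %>%
  group_by(Species) %>%
  summarise(
    # maximum of the reference column, stored under a new name so that
    # Sepal.Length itself is still the full per-group vector below
    max_sep_length = max(Sepal.Length),
    # for each remaining measurement, take the value in the row
    # where Sepal.Length is largest within the group
    across(
      Sepal.Width:Petal.Width,
      ~ .x[which.max(Sepal.Length)],
      .names = "{.col}_at_max"
    )
  )

iris_summary
```

Storing the maximum under a new name matters here: if Sepal.Length were overwritten first, later expressions in the same summarise call would see the already summarised value.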
I have a column of numbers that I want to change from a count to a percentage.
This code works:
df <- df %>%
select(casualty_veh_ref, JourneyPurpose ) %>%
group_by(JourneyPurpose) %>%
summarise(Number=n()) %>%
mutate(Percentage=Number/sum(Number)*100)
df$Percentage <- paste(round(df$Percentage), "%", sep="")
But if I try to keep the piping using percent_format from the scales package:
df <- df %>%
select(casualty_veh_ref, JourneyPurpose ) %>%
group_by(JourneyPurpose) %>%
summarise(Number=n()) %>%
mutate(Percentage=Number/sum(Number)) %>%
percent_format(Percentage, suffix = "%")
I receive the error message
Error in force_all(accuracy, scale, prefix, suffix, big.mark, decimal.mark, :
object 'Percentage' not found
I don't understand why the object is not found.
Try this. I've used iris for illustration.
library(dplyr)
iris %>%
slice(1:4) %>%
mutate(Test=Sepal.Length/45,Test=scales::percent(Test))
Result:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Test
1 5.1 3.5 1.4 0.2 setosa 11.33%
2 4.9 3.0 1.4 0.2 setosa 10.89%
3 4.7 3.2 1.3 0.2 setosa 10.44%
4 4.6 3.1 1.5 0.2 setosa 10.22%
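As far as I can tell, the error in the question comes from the fact that percent_format() does not format values itself: it returns a formatting function, so piping the data frame into it passes the data frame as its first argument and Percentage as a bare object that cannot be found. Below is a minimal sketch of the count-to-percentage pipeline kept in one chain, using iris grouped by Species as a stand-in for the original df (which is not available here) and assuming the scales package is installed. The name out is only illustrative.

```r
library(dplyr)
library(scales)

out <- iris %>%
  count(Species, name = "Number") %>%
  # format inside mutate(), where the column names are visible;
  # accuracy = 1 rounds to whole percentages
  mutate(Percentage = percent(Number / sum(Number), accuracy = 1))

out
```

The key point is that the formatting call sits inside mutate(), so Number is looked up in the data frame rather than in the calling environment.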
How can I use variables in place of column names in dplyr strings? As an example say I want to add a column to the iris dataset called sum that is the sum of Sepal.Length and Sepal.Width. In short I want a working version of the below code.
x = "Sepal.Length"
y = "Sepal.Width"
head(iris%>% mutate(sum = x+y))
Currently, running the code outputs "Evaluation error: non-numeric argument to binary operator" as R evaluates x and y as character vectors. How do I instead get R to evaluate x and y as column names of the dataframe? I know that the answer is to use some form of lazy evaluation, but I'm having trouble figuring out exactly how to configure it.
Note that the proposed duplicate: dplyr - mutate: use dynamic variable names does not address this issue. The duplicate answers this question:
Not my question: How do I do:
var = "sum"
head(iris %>% mutate(var = Sepal.Length + Sepal.Width))
I think the recommended way is to use sym:
iris %>% mutate(sum = !!sym(x) + !!sym(y)) %>% head
It also works with get():
> rm(list = ls())
> data("iris")
>
> library(dplyr)
>
> x <- "Sepal.Length"
> y <- "Sepal.Width"
>
> head(iris %>% mutate(sum = get(x) + get(y)))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1 5.1 3.5 1.4 0.2 setosa 8.6
2 4.9 3.0 1.4 0.2 setosa 7.9
3 4.7 3.2 1.3 0.2 setosa 7.9
4 4.6 3.1 1.5 0.2 setosa 7.7
5 5.0 3.6 1.4 0.2 setosa 8.6
6 5.4 3.9 1.7 0.4 setosa 9.3
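Another option, assuming a reasonably recent dplyr (0.8 or later should do), is the .data pronoun, which looks up a column by the string stored in a variable. This is a sketch rather than the only way to do it; res is an illustrative name.

```r
library(dplyr)

x <- "Sepal.Length"
y <- "Sepal.Width"

# .data[[x]] fetches the column whose name is stored in x
res <- iris %>%
  mutate(sum = .data[[x]] + .data[[y]])

head(res)
```

Compared with get(), .data[[x]] only ever looks inside the data frame, so it cannot silently pick up an object of the same name from the global environment.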
I want to create a new variable at a specific location. I can create the variable with mutate and then reorder with select, but I would prefer the tibble::add_column way of doing it.
This is a simple example with the iris dataset :
library(tidyverse)
## This works fine
iris %>% mutate(With_mutate = ifelse(Sepal.Length > 4 & Sepal.Width > 3 , TRUE, FALSE)) %>%
select(Sepal.Length:Petal.Width, With_mutate, everything()) %>%
head()
## This works also
iris %>% add_column(With_add_column = "Test", .before = "Species") %>%
head()
## This doesn't work
iris %>% add_column(With_add_column = ifelse(Sepal.Length > 4 & Sepal.Width > 3 , TRUE, FALSE), .before = "Species") %>%
head()
Error in ifelse(Sepal.Length > 4 & Sepal.Width > 3, TRUE, FALSE) :
object 'Sepal.Length' not found
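I would greatly appreciate it if someone could tell me why my ifelse statement doesn't work with add_column.
The reason is that mutate, summarise, etc. evaluate their arguments inside the data frame, so a bare column name such as Sepal.Length is found there, whereas add_column evaluates its arguments as ordinary R expressions and does not look in the data. So we can extract the columns from the piped data frame with .$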
iris %>%
add_column(With_add_column = ifelse(.$Sepal.Length > 4 &
.$Sepal.Width > 3 , TRUE, FALSE), .before = "Species") %>%
head()
#Sepal.Length Sepal.Width Petal.Length Petal.Width With_add_column Species
#1 5.1 3.5 1.4 0.2 TRUE setosa
#2 4.9 3.0 1.4 0.2 FALSE setosa
#3 4.7 3.2 1.3 0.2 TRUE setosa
#4 4.6 3.1 1.5 0.2 TRUE setosa
#5 5.0 3.6 1.4 0.2 TRUE setosa
#6 5.4 3.9 1.7 0.4 TRUE setosa
To make it more compact: the logical condition already evaluates to TRUE/FALSE, so we don't need ifelse, i.e.
add_column(With_add_column = .$Sepal.Length > 4 & .$Sepal.Width > 3, .before = "Species")
can replace the add_column step above.
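If staying with mutate is acceptable, newer dplyr versions (1.0.0 or later, as far as I know) let you place the new column directly with .before or .after, which avoids both the select step and the .$ workaround. A minimal sketch, with res as an illustrative name:

```r
library(dplyr)

res <- iris %>%
  # column names are visible inside mutate(), and .before controls placement
  mutate(With_mutate = Sepal.Length > 4 & Sepal.Width > 3, .before = Species)

head(res)
```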
I want to do something like this
df <- iris %>%
rowwise %>%
mutate(new_var = sum(Sepal.Length, Sepal.Width))
Except I want to do it without typing the variable names, e.g.
names_to_add <- c("Sepal.Length", "Sepal.Width")
df <- iris %>%
rowwise %>%
[some function that uses names_to_add]
I attempted a few things e.g.
df <- iris %>%
rowwise %>%
mutate(new_var = sum(sapply(names_to_add, get, envir = as.environment(.))))
but I still can't figure it out. I'll take an answer that uses lazyeval or something simpler. Note that the sum function here is just a placeholder; my actual function is much more complex, although it returns one value per row. I'd also rather not use data.table.
You should check out all the functions that end with _ in dplyr, for example mutate_, summarise_, etc.
names_to_add <- ("sum(Sepal.Length, Sepal.Width)")
df <- iris %>%
rowwise %>% mutate_(names_to_add)
Edit
The results of the code:
df <- iris %>%
rowwise %>% mutate(new_var = sum(Sepal.Length, Sepal.Width))
names_to_add <- ("sum(Sepal.Length, Sepal.Width)")
df2 <- iris %>%
rowwise %>% mutate_(new_var = names_to_add)
identical(df, df2)
[1] TRUE
Edit
I edited the answer and it now solves the problem. I wonder why it was downvoted. We use SE (standard evaluation), passing a string as input to mutate_. More info: vignette("nse","dplyr")
x <- "Sepal.Length + Sepal.Width"
df <- mutate_(iris, x)
head(df)
Output:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length + Sepal.Width
1 5.1 3.5 1.4 0.2 setosa 8.6
2 4.9 3.0 1.4 0.2 setosa 7.9
3 4.7 3.2 1.3 0.2 setosa 7.9
4 4.6 3.1 1.5 0.2 setosa 7.7
5 5.0 3.6 1.4 0.2 setosa 8.6
6 5.4 3.9 1.7 0.4 setosa 9.3
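Note that the underscore verbs (mutate_, summarise_, and so on) are deprecated in current dplyr releases. A sketch of how the original request (a character vector of column names, one value per row) might be written today, assuming dplyr 1.0.0 or later, uses rowwise() together with c_across(). Here sum is still only a placeholder for the real per-row function.

```r
library(dplyr)

names_to_add <- c("Sepal.Length", "Sepal.Width")

df <- iris %>%
  rowwise() %>%
  # c_across() collects the selected columns of the current row into a vector;
  # all_of() selects them by the names stored in names_to_add
  mutate(new_var = sum(c_across(all_of(names_to_add)))) %>%
  ungroup()

head(df)
```

Because c_across() returns a plain vector for each row, any function that takes a vector and returns a single value can be swapped in for sum.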