I'm trying to create a new variable with mutate from a row-wise calculation,
say rowSums, as below:
iris %>%
mutate_(sumVar =
iris %>%
select(Sepal.Length:Petal.Width) %>%
rowSums)
The result is that "sumVar" is truncated to its first value (10.2):
Source: local data frame [150 x 6]
Groups: <by row>
Sepal.Length Sepal.Width Petal.Length Petal.Width Species sumVar
1 5.1 3.5 1.4 0.2 setosa 10.2
2 4.9 3.0 1.4 0.2 setosa 10.2
3 4.7 3.2 1.3 0.2 setosa 10.2
4 4.6 3.1 1.5 0.2 setosa 10.2
5 5.0 3.6 1.4 0.2 setosa 10.2
6 5.4 3.9 1.7 0.4 setosa 10.2
..
Warning message:
Truncating vector to length 1
Should it be applied row-wise? Or what is the right verb to use in this kind of calculation?
Edit:
More specifically, is there any way to use an inline custom function with dplyr?
I'm wondering if it is possible to do something like:
iris %>%
mutate(sumVar = colsum_function(Sepal.Length:Petal.Width))
This is more of a workaround, but it could be used:
iris %>% mutate(sumVar = rowSums(.[1:4]))
As written in comments, you can also use a select inside of mutate to get the columns you want to sum up, for example
iris %>%
mutate(sumVar = rowSums(select(., contains("Sepal")))) %>%
head
or
iris %>%
mutate(sumVar = select(., contains("Sepal")) %>% rowSums()) %>%
head
You can use the rowwise() function:
iris %>%
rowwise() %>%
mutate(sumVar = sum(c_across(Sepal.Length:Petal.Width)))
#> # A tibble: 150 x 6
#> # Rowwise:
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species sumVar
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 10.2
#> 2 4.9 3 1.4 0.2 setosa 9.5
#> 3 4.7 3.2 1.3 0.2 setosa 9.4
#> 4 4.6 3.1 1.5 0.2 setosa 9.4
#> 5 5 3.6 1.4 0.2 setosa 10.2
#> 6 5.4 3.9 1.7 0.4 setosa 11.4
#> 7 4.6 3.4 1.4 0.3 setosa 9.7
#> 8 5 3.4 1.5 0.2 setosa 10.1
#> 9 4.4 2.9 1.4 0.2 setosa 8.9
#> 10 4.9 3.1 1.5 0.1 setosa 9.6
#> # ... with 140 more rows
"c_across() uses tidy selection syntax so you can succinctly select many variables."
Finally, if you want, you can add %>% ungroup() at the end to leave row-wise mode.
A more complicated way would be:
iris %>% select(Sepal.Length:Petal.Width) %>%
mutate(sumVar = rowSums(.)) %>% left_join(iris)
Adding @docendodiscimus's comment as an answer. +1 to him!
iris %>% mutate(sumVar = rowSums(select(., contains("Sepal"))))
I am using this simple solution, which is a more robust modification of the answer by Davide Passaretti:
iris %>% select(Sepal.Length:Petal.Width) %>%
transmute(sumVar = rowSums(.)) %>% bind_cols(iris, .)
(But it relies on a defined row order, which should be fine unless you work with remote datasets, perhaps.)
You can also use grep in place of contains or matches, in case you need to get fancy with regular expressions (in my experience, matches doesn't seem to like negative lookaheads and the like).
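One caveat with the join-based variant is worth checking: left_join() without `by` matches on every shared column, so observations with identical measurements match each other more than once. A minimal sketch, assuming the built-in iris data (which contains one duplicated observation) and a reasonably recent dplyr; the object names `sums`, `joined` and `bound` are only illustrative:

```r
library(dplyr)

sums <- iris %>%
  select(Sepal.Length:Petal.Width) %>%
  mutate(sumVar = rowSums(.))

# Join-based variant: joins on all four shared measurement columns.
# Recent dplyr versions report the join columns and may warn about a
# many-to-many relationship, so both are silenced here.
joined <- suppressWarnings(suppressMessages(left_join(sums, iris)))

# Column-binding variant: relies purely on row order, so the row count is kept.
bound <- iris %>%
  select(Sepal.Length:Petal.Width) %>%
  transmute(sumVar = rowSums(.)) %>%
  bind_cols(iris, .)

nrow(joined) > nrow(iris)   # duplicated measurements produce extra rows
nrow(bound) == nrow(iris)   # binding columns keeps one row per observation
```

If the join is still preferred, adding an explicit row identifier before selecting columns and joining on it would avoid the extra rows.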
iris %>% mutate(sumVar = rowSums(select(., grep("Sepal", names(.)))))
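As a sketch of the "fancy" case: base R's grep() accepts lookaheads when perl = TRUE is set. The pattern below is only an illustration (it keeps names that neither start with "Species" nor contain "Width"), and using value = TRUE with all_of() is my own choice here so that the selection is by name rather than by position:

```r
library(dplyr)

# Negative lookaheads require perl = TRUE in base R's grep().
# value = TRUE returns the matching names instead of their positions.
cols <- grep("^(?!Species)(?!.*Width)", names(iris), perl = TRUE, value = TRUE)
cols

result <- iris %>%
  mutate(sumVar = rowSums(select(., all_of(cols))))

head(result)
```

With iris this selects Sepal.Length and Petal.Length, so sumVar is the sum of the two length columns.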
As requested, transforming my comment into an answer:
For operations like sum that already have an efficient vectorised row-wise alternative, the proper way is currently:
df %>% mutate(total = rowSums(across(where(is.numeric))))
across can take anything that select can (e.g. rowSums(across(Sepal.Length:Petal.Width)) also works).
Scroll down the row-wise vignette to find this, and have a look at the documentation for across.
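To see that the vectorised and the row-by-row approaches agree, here is a small comparison sketch, assuming dplyr 1.0.0 or later (where across() and c_across() are available); the names `vectorised` and `by_row` are illustrative:

```r
library(dplyr)

# Vectorised: rowSums() over the columns selected by across().
vectorised <- iris %>%
  mutate(sumVar = rowSums(across(Sepal.Length:Petal.Width)))

# Row by row: sum() is evaluated once per row, then ungroup() drops
# the rowwise state so later verbs behave normally again.
by_row <- iris %>%
  rowwise() %>%
  mutate(sumVar = sum(c_across(Sepal.Length:Petal.Width))) %>%
  ungroup()

all.equal(vectorised$sumVar, by_row$sumVar)
```

Both give the same sums; the rowSums() version avoids evaluating the expression once per row, which is usually the reason to prefer it when a vectorised alternative exists.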
Related
library(tidyverse)
df <- iris %>%
group_by(Species) %>%
mutate(Petal.Dim = Petal.Length * Petal.Width,
rank = rank(desc(Petal.Dim))) %>%
mutate(new_col = rank == 4, Sepal.Width)
table <- df %>%
filter(rank == 4) %>%
select(Species, new_col = Sepal.Width)
correct_df <- left_join(df, table, by = "Species")
df
#> # A tibble: 150 x 8
#> # Groups: Species [3]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Dim
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 0.280
#> 2 4.9 3 1.4 0.2 setosa 0.280
#> 3 4.7 3.2 1.3 0.2 setosa 0.26
#> 4 4.6 3.1 1.5 0.2 setosa 0.3
#> 5 5 3.6 1.4 0.2 setosa 0.280
#> 6 5.4 3.9 1.7 0.4 setosa 0.68
#> 7 4.6 3.4 1.4 0.3 setosa 0.42
#> 8 5 3.4 1.5 0.2 setosa 0.3
#> 9 4.4 2.9 1.4 0.2 setosa 0.280
#> 10 4.9 3.1 1.5 0.1 setosa 0.15
#> # ... with 140 more rows, and 2 more variables: rank <dbl>, new_col <lgl>
I'm basically looking for new_col to show the value that corresponds with rank = 4 from the Sepal.Width column. In this case, those values would be 3.9, 3.3, and 3.8. I'm envisioning this similar to a VLookup, or Index/Match in Excel.
Whenever I think "now I need to use VLOOKUP like I did in the past in Excel", I find the left_join() function helpful. It's part of the dplyr package. Instead of "looking up" values from one table in another, it's easier in R to build one bigger table: one table (the "left" one, i.e. the first argument you pass to the function) remains unchanged, and the columns of the other are added, using a column or columns they have in common as the index.
In your specific example, I can't entirely understand what you want new_col to have in it. If you want to do Excel-style VLOOKUP in R, then left_join() is the best starting point.
The question is not clear, since it does not say what the VLOOKUP or INDEX/MATCH-like operation from Excel is meant to achieve.
Also, you don't mention what value new_col should have if rank is not equal to 4.
Assuming the value is NA, the solution below with a simple ifelse would work:
df <- iris %>%
group_by(Species) %>%
mutate(Petal.Dim = Petal.Length * Petal.Width,
rank = rank(desc(Petal.Dim))) %>%
ungroup() %>%
mutate(new_col = ifelse(rank == 4, Sepal.Width,NA))
df
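If the intent is instead what the question describes (every row of a group showing the Sepal.Width of that group's rank-4 row), a grouped mutate can do the lookup without a separate join. This is a sketch under that reading of the question; first() is used so that a group with no rank exactly equal to 4 gets NA rather than an error, and the name `result` is illustrative:

```r
library(dplyr)

result <- iris %>%
  group_by(Species) %>%
  mutate(Petal.Dim = Petal.Length * Petal.Width,
         rank = rank(desc(Petal.Dim)),
         # Look up Sepal.Width on the row whose rank is 4 within the group.
         # rank() averages tied ranks by default, so a group may have no
         # rank equal to 4; first() then returns NA.
         new_col = first(Sepal.Width[rank == 4])) %>%
  ungroup()

# One lookup value per species
result %>% distinct(Species, new_col)
```

This should match the left_join() result from the question whenever each group has exactly one row with rank 4.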
I am trying to use dplyr to lag some variables (all of which share a common naming convention) for each group in my data set.
I thought mutate_if would work, but I get an error (below). mutate_each works, but for all columns rather than just the selected few.
For example, I was looking to lag only the Sepal measurements:
iris %>%
tbl_df() %>%
group_by(Species) %>%
slice(1:3) %>%
# mutate_each(funs(lag(.)))
mutate_if(contains("Sepal"), funs(lag(.)))
#> Error in get(as.character(FUN), mode = "function", envir = envir) : object 'p' of mode 'function' was not found
to get a final data set like:
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <dbl> <dbl> <dbl> <dbl> <fctr>
# 1 NA NA 1.4 0.2 setosa
# 2 5.1 3.5 1.4 0.2 setosa
# 3 4.9 3.0 1.3 0.2 setosa
# 4 NA NA 4.7 1.4 versicolor
# 5 7.0 3.2 4.5 1.5 versicolor
# 6 6.4 3.2 4.9 1.5 versicolor
# 7 NA NA 6.0 2.5 virginica
# 8 6.3 3.3 5.1 1.9 virginica
# 9 5.8 2.7 5.9 2.1 virginica
This seems to work,
library(dplyr)
iris %>%
tbl_df() %>%
group_by(Species) %>%
slice(1:3) %>%
mutate_if(grepl('Sepal', names(.)), funs(lag(.)))
As @aosmith explains, contains returns the indices of the columns that match the string, whereas mutate_if relies on predicate functions that return logical vectors, which is why the grepl option works.
In addition, as @StevenBeaupre mentions, mutate_at with vars() accepts select helpers such as contains directly:
iris %>%
tbl_df() %>%
group_by(Species) %>%
slice(1:3) %>%
mutate_at(vars(contains('Sepal')), lag)
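For readers on current dplyr: funs(), tbl_df() and mutate_each() have since been deprecated, and the scoped verbs mutate_if/mutate_at are superseded by across(). A minimal equivalent sketch, assuming dplyr 1.0.0 or later (the name `lagged` is illustrative):

```r
library(dplyr)

lagged <- iris %>%
  as_tibble() %>%
  group_by(Species) %>%
  slice(1:3) %>%
  # across() takes the same select helpers as select(); lag is applied
  # within each Species group, so the first row of every group becomes NA.
  mutate(across(contains("Sepal"), dplyr::lag)) %>%
  ungroup()

lagged
```

The Petal columns are left untouched, which matches the desired output shown in the question.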