How to use dplyr::if_else to mutate across a tibble dataframe?

How can I combine mutate and if_else to convert the values of a data frame into TRUE and FALSE?
For example, turn a table into TRUE (value >= 2) and FALSE (value < 2):
> iris %>% as_tibble() %>% select(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
# A tibble: 150 × 4
Sepal.Length Sepal.Width Petal.Length Petal.Width
<dbl> <dbl> <dbl> <dbl>
1 5.1 3.5 1.4 0.2
2 4.9 3 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
5 5 3.6 1.4 0.2
6 5.4 3.9 1.7 0.4
7 4.6 3.4 1.4 0.3
8 5 3.4 1.5 0.2
9 4.4 2.9 1.4 0.2
10 4.9 3.1 1.5 0.1
# … with 140 more rows
into
Sepal.Length Sepal.Width Petal.Length Petal.Width
<dbl> <dbl> <dbl> <dbl>
1 T T F F
2 T T F F
3 T T F F
4 T T F F
5 T T F F
6 T T F F
7 T T F F
Thanks a lot!

iris %>%
mutate(across(where(is.numeric), ~ . >= 2))
You don't need if_else when the result you want is TRUE or FALSE. Generally, ifelse(test, TRUE, FALSE) is a long way of writing test.
Or in base R
iris[1:4] >= 2
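To illustrate the point that wrapping a test in if_else adds nothing, here is a minimal sketch that builds the logical table both ways and compares the results. The object names (num_cols, direct, verbose) are made up for this example, and it assumes dplyr 1.0 or later for across() and where().

```r
library(dplyr)

# Keep only the four numeric measurement columns
num_cols <- iris %>% select(where(is.numeric))

# Direct comparison: every column becomes a logical vector
direct <- num_cols %>%
  mutate(across(everything(), ~ .x >= 2))

# The same thing spelled out with if_else(); the wrapper is redundant
verbose <- num_cols %>%
  mutate(across(everything(), ~ if_else(.x >= 2, TRUE, FALSE)))

# Both approaches should agree
all.equal(direct, verbose)
```

if_else only becomes useful when the two outcomes are something other than TRUE and FALSE, for example labels such as "high" and "low".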

Related

Create a column in the original dataset to indicate whether the row was drawn in a random stratified sample

I would like to draw a stratified random sample (n = 375) from a dataset. Based on the stratified random sample, I would like to add a column to the original dataset indicating whether the row is in the stratified random sample (1) or not (0).
iris <- iris
# Get a random stratified sample
library(tidyverse)
stratified <- iris %>%
group_by(Species) %>%
sample_n(size=1)
# The final result I would like to get:
iris$sample3 <- 0
iris[21,6] <- 1
iris[65,6] <- 1
iris[106,6] <- 1
After doing that, I would like to repeat the procedure by drawing a second stratified random sample (n = 125) from my first stratified random sample (n = 375) and repeat the creation of a column.
You can add a column to your data frame that has the required number of 1s per group (and 0 otherwise).
set.seed(1)
samples <- 1
sample1 <- iris %>%
group_by(Species) %>%
mutate(sampled = as.numeric(row_number() %in% sample(n(), samples)))
sample1
#> # A tibble: 150 x 6
#> # Groups: Species [3]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species sampled
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 0
#> 2 4.9 3 1.4 0.2 setosa 0
#> 3 4.7 3.2 1.3 0.2 setosa 0
#> 4 4.6 3.1 1.5 0.2 setosa 1
#> 5 5 3.6 1.4 0.2 setosa 0
#> 6 5.4 3.9 1.7 0.4 setosa 0
#> 7 4.6 3.4 1.4 0.3 setosa 0
#> 8 5 3.4 1.5 0.2 setosa 0
#> 9 4.4 2.9 1.4 0.2 setosa 0
#> 10 4.9 3.1 1.5 0.1 setosa 0
#> # ... with 140 more rows
To get the sampled values, simply filter to find the 1s:
sample1 %>% filter(sampled == 1)
#> # A tibble: 3 x 6
#> # Groups: Species [3]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species sampled
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 4.6 3.1 1.5 0.2 setosa 1
#> 2 5.6 3 4.1 1.3 versicolor 1
#> 3 6.3 3.3 6 2.5 virginica 1
Created on 2022-05-16 by the reprex package (v2.0.1)
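The question also asks for a second sample drawn from within the first one. A possible sketch, using the same row_number() %in% sample(...) idea, is shown here. The sizes n_first and n_second and the column names sampled1 and sampled2 are assumptions chosen for illustration, not values from the question.

```r
library(dplyr)

set.seed(1)
n_first  <- 6  # hypothetical first-stage sample size per group
n_second <- 2  # hypothetical second-stage sample size per group

two_stage <- iris %>%
  group_by(Species) %>%
  mutate(
    # Stage 1: flag n_first random rows within each group
    sampled1 = as.numeric(row_number() %in% sample(n(), n_first)),
    # Stage 2: draw only from the within-group positions flagged in stage 1.
    # Note: sample(x, ...) treats a length-one x as 1:x, so this assumes
    # n_first is at least 2.
    sampled2 = as.numeric(
      row_number() %in% sample(which(sampled1 == 1), n_second)
    )
  ) %>%
  ungroup()

# Cross-tabulate the two flags per species
count(two_stage, Species, sampled1, sampled2)
```

Because the second draw is restricted to rows where sampled1 is 1, every row flagged in sampled2 is also flagged in sampled1.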

Select grouped random rows, change value in one column

For my study design I need to select a total of 12 rows from each group (10 groups) and change the value of one column from 0 to 1.
How would I go about this? I tried the sample_n already, but then it only gives me the randomly selected rows, not the entire dataset.
test <- test %>% group_by(group) %>% mutate(
change_value = sample_n(12)
) %>% ungroup()
Sorry I am stuck after this.
Thank you in advance
Your requirement is not very clear.
Case 1: you want to select 12 random rows from each group, change the value of one column in those rows, and return the entire dataset.
library(tidyverse)
set.seed(2021)
iris %>% group_by(Species) %>%
mutate(Sepal.Width = ifelse(sample(1:n(), n()) <= 12, 1, Sepal.Width)) %>%
ungroup()
# A tibble: 150 x 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 1 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 1 1.4 0.2 setosa
6 5.4 1 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 1 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
# ... with 140 more rows
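If the goal is to flip a dedicated 0/1 column rather than overwrite a measurement, a variant along these lines may be closer to the question. This is a sketch on iris: the column name change_value is taken from the question's attempt, while the object name flagged is made up here.

```r
library(dplyr)

set.seed(2021)

flagged <- iris %>%
  mutate(change_value = 0) %>%   # start with every row at 0
  group_by(Species) %>%
  mutate(
    # Set 12 random rows per group to 1, leave the others untouched
    change_value = if_else(row_number() %in% sample(n(), 12), 1, change_value)
  ) %>%
  ungroup()

# Check how many rows were changed in each group
count(flagged, Species, change_value)
```

The full dataset is returned, and only the selected rows have change_value set to 1.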

Append dataframe to end of each dataframe in a list of dataframes in r

I would like to add one row to the end of each dataframe in a list of dataframes. In this example, I would like to add the column names as a new row to the bottom of each dataframe in the list of dataframes I created by group_split.
library(dplyr)
col_names1 <- as.data.frame(t(as.data.frame(colnames(iris))))
colnames(col_names1) <- unlist(col_names1[1, ])
rownames(col_names1) <-""
iris %>%
group_split(Species) %>%
bind_rows(col_names1) #errors out: Error: Column `Sepal.Length` can't be converted from numeric to factor
I would like to end up with a list of dataframes, each with their column names as a new row at the bottom of each dataframe in the list.
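One issue is the type difference: the columns of the split data frames are numeric, while the row of column names is not. We can convert everything to the same type and then call bind_rows. Also, because group_split returns a list of data frames, we need to loop over the list (with map) and apply bind_rows to each element.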
library(dplyr)
library(purrr)
iris %>%
group_split(Species) %>%
map(~ bind_rows(.x %>%
mutate_all(factor), col_names1))
#[[1]]
# A tibble: 51 x 5
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# * <fct> <fct> <fct> <fct> <fct>
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa
# 7 4.6 3.4 1.4 0.3 setosa
# 8 5 3.4 1.5 0.2 setosa
# 9 4.4 2.9 1.4 0.2 setosa
#10 4.9 3.1 1.5 0.1 setosa
# … with 41 more rows
#[[2]]
# A tibble: 51 x 5
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# * <fct> <fct> <fct> <fct> <fct>
#...
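A variation on the same idea converts the columns to character instead of factor, which avoids depending on how factor levels are combined. This is only a sketch: header_row and with_names are names invented for this example, and it assumes dplyr 1.0 or later for across().

```r
library(dplyr)
library(purrr)

# One-row tibble whose values are the column names themselves
header_row <- as_tibble(setNames(as.list(names(iris)), names(iris)))

with_names <- iris %>%
  group_split(Species) %>%
  map(~ bind_rows(
    mutate(.x, across(everything(), as.character)),  # same type as header_row
    header_row
  ))

# The last row of each element now holds the column names
tail(with_names[[1]], 2)
```

Each element gains one row, so the three 50-row groups become 51-row tibbles of character columns.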

R - replace variable with a specific values in dplyr

Let's say I want to replace several variables by 1 in a dataset:
data(iris)
put_1 <- function(x){ x = 1}
iris %>%
mutate_at(vars(Petal.Length, Petal.Width), funs(put_1)) %>%
head()
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1 1 setosa
# 2 4.9 3.0 1 1 setosa
# 3 4.7 3.2 1 1 setosa
# 4 4.6 3.1 1 1 setosa
# 5 5.0 3.6 1 1 setosa
# 6 5.4 3.9 1 1 setosa
Question: Is there a way to do the same without declaring a function beforehand?
I tried things like:
mutate_at(vars(...), funs(function(x){ x <- 1 }))
mutate_at(vars(...), funs(~ 1 }))
mutate_at(vars(...), funs(~ . = 1 }))
without success.
Thank you in advance.
This is one of the cases where = and <- don't behave the same:
> iris%>%mutate_at(vars(Petal.Length,Petal.Width),funs(.<-1))%>%head()
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1 1 setosa
2 4.9 3.0 1 1 setosa
3 4.7 3.2 1 1 setosa
4 4.6 3.1 1 1 setosa
5 5.0 3.6 1 1 setosa
6 5.4 3.9 1 1 setosa
And
> iris%>%mutate_at(vars(Petal.Length,Petal.Width),funs(.=1))%>%head()
Error: Can't create call to non-callable object
Call `rlang::last_error()` to see a backtrace
The best answer is from @josemz:
iris %>%
mutate_at(vars(Petal.Length, Petal.Width), ~ 1)
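In current dplyr, mutate_at is superseded by across(), and funs() has been deprecated since dplyr 0.8.0, so the same replacement can be sketched as follows. The object name ones is made up for this example.

```r
library(dplyr)

# Replace both petal columns with the constant 1;
# the lambda ignores its input and the single value is recycled to every row
ones <- iris %>%
  mutate(across(c(Petal.Length, Petal.Width), ~ 1))

head(ones)
```

The other columns are left untouched, and no helper function needs to be declared in advance.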

dplyr summarize: how to include all table columns in the output table

I have the following dataset
# Dataset
x<-tbl_df(data.frame(locus=c(1,2,2,3,4,4,5,5,5,6),v=c(1,1,2,1,1,2,1,2,3,1),rpkm=rnorm(10,10)))
If I use the following command
# Subset
x%>%group_by(locus)%>%summarize(max(rpkm))
I obtain
locus max(rpkm)
1 9.316949
2 10.273270
3 9.879886
4 10.944641
5 10.837681
6 13.450680
While I'd like to obtain
locus v max(rpkm)
1 1 9.316949
2 1 10.273270
3 1 9.879886
4 2 10.944641
5 1 10.837681
6 1 13.450680
So I'd like the output table to include the value of "v" from the row that holds the maximum rpkm.
Is it possible?
Try:
x %>% group_by(locus) %>%
summarize(max(rpkm), v = v[which(rpkm==max(rpkm))])
You can use the top_n function instead
# with set.seed(15)
x %>% group_by(locus) %>% top_n(1, rpkm)
# locus v rpkm
# 1 1 1 10.258823
# 2 2 1 11.831121
# 3 3 1 10.897198
# 4 4 1 10.488016
# 5 5 2 11.090773
# 6 6 1 8.924999
Try this:
x %>% group_by(locus) %>% filter(rpkm==max(rpkm))
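In dplyr 1.0 and later, slice_max() covers the same use case as top_n(). A minimal sketch, rebuilding the example data with tibble() instead of tbl_df() (the seed and the object name best are assumptions for illustration):

```r
library(dplyr)

set.seed(15)
x <- tibble(
  locus = c(1, 2, 2, 3, 4, 4, 5, 5, 5, 6),
  v     = c(1, 1, 2, 1, 1, 2, 1, 2, 3, 1),
  rpkm  = rnorm(10, 10)
)

# Keep, for each locus, the whole row with the largest rpkm
best <- x %>%
  group_by(locus) %>%
  slice_max(rpkm, n = 1, with_ties = FALSE) %>%
  ungroup()

best
```

Because whole rows are kept, the v column comes along without being named. Setting with_ties = FALSE guarantees exactly one row per locus even if two rows share the maximum.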
I assume you're looking for a way to avoid typing all of the column names by hand. You can achieve that by using across within summarize, like so:
iris %>%
group_by(Species) %>%
dplyr::summarize(
across(everything()),
mean_l = mean(Sepal.Length)
) %>%
head()
# A tibble: 6 × 6
# Groups: Species [1]
Species Sepal.Length Sepal.Width Petal.Length Petal.Width mean_l
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa 5.1 3.5 1.4 0.2 5.01
2 setosa 4.9 3 1.4 0.2 5.01
3 setosa 4.7 3.2 1.3 0.2 5.01
4 setosa 4.6 3.1 1.5 0.2 5.01
5 setosa 5 3.6 1.4 0.2 5.01
6 setosa 5.4 3.9 1.7 0.4 5.01
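Since the result keeps every original row, a grouped mutate() is another way to get the same table. As far as I know, newer dplyr releases discourage summarize() calls that return several rows per group, so this sketch may be the safer form. The object name with_mean is made up for this example.

```r
library(dplyr)

# Add the per-species mean as a new column while keeping all rows and columns
with_mean <- iris %>%
  group_by(Species) %>%
  mutate(mean_l = mean(Sepal.Length)) %>%
  ungroup()

head(with_mean)
```

The column order differs slightly from the summarize() version: Species stays in its original position instead of moving to the front.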
