R - replace variables with a specific value in dplyr

Let's say I want to set several variables to 1 in a dataset:
library(dplyr)
data(iris)
put_1 <- function(x) 1
iris %>%
mutate_at(vars(Petal.Length, Petal.Width), funs(put_1)) %>%
head()
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1 1 setosa
# 2 4.9 3.0 1 1 setosa
# 3 4.7 3.2 1 1 setosa
# 4 4.6 3.1 1 1 setosa
# 5 5.0 3.6 1 1 setosa
# 6 5.4 3.9 1 1 setosa
Question: Is there a way to do the same without declaring a function beforehand?
I tried things like:
mutate_at(vars(...), funs(function(x){ x <- 1 }))
mutate_at(vars(...), funs(~ 1 }))
mutate_at(vars(...), funs(~ . = 1 }))
without success.
Thank you in advance.

This is one of the cases where = and <- do not behave the same: inside a function call, = names an argument, while <- is an assignment expression.
> iris%>%mutate_at(vars(Petal.Length,Petal.Width),funs(.<-1))%>%head()
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1 1 setosa
2 4.9 3.0 1 1 setosa
3 4.7 3.2 1 1 setosa
4 4.6 3.1 1 1 setosa
5 5.0 3.6 1 1 setosa
6 5.4 3.9 1 1 setosa
And
> iris%>%mutate_at(vars(Petal.Length,Petal.Width),funs(.=1))%>%head()
Error: Can't create call to non-callable object
Call `rlang::last_error()` to see a backtrace
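A minimal base R sketch of the difference, looking only at how the two calls are parsed. Here funs is merely quoted, never evaluated, so dplyr is not needed; the names call_eq and call_arrow are made up for illustration.

```r
# `=` inside a call names an argument; `<-` builds an assignment expression
call_eq    <- quote(funs(. = 1))
call_arrow <- quote(funs(. <- 1))

# With `=`, funs() receives the plain value 1 as an argument named "."
names(as.list(call_eq))
is.numeric(call_eq[[2]])

# With `<-`, funs() receives one unnamed argument that is itself a call
names(call_arrow)
is.call(call_arrow[[2]])
```

That is consistent with the error above: with =, funs() is handed a bare number rather than something callable.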

The best answer is from @josemz:
iris %>%
mutate_at(vars(Petal.Length, Petal.Width), ~ 1)
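For reference, the same idea can be written with across(), which supersedes the scoped mutate_at() verb. This is a sketch assuming dplyr 1.0.0 or later; the name result is only for illustration.

```r
library(dplyr)

# The lambda ~ 1 returns a single value, which mutate() recycles to every row
result <- iris %>%
  mutate(across(c(Petal.Length, Petal.Width), ~ 1))

head(result)
```

The other columns are left untouched, as with mutate_at().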

Related

How to use dplyr::if_else to mutate across a tibble dataframe?

I wonder how to combine mutate and if_else to transform a data frame into TRUE and FALSE values.
For example, turn a table into TRUE (value >= 2) and FALSE (value < 2):
> iris %>% as_tibble() %>% select(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
# A tibble: 150 × 4
Sepal.Length Sepal.Width Petal.Length Petal.Width
<dbl> <dbl> <dbl> <dbl>
1 5.1 3.5 1.4 0.2
2 4.9 3 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
5 5 3.6 1.4 0.2
6 5.4 3.9 1.7 0.4
7 4.6 3.4 1.4 0.3
8 5 3.4 1.5 0.2
9 4.4 2.9 1.4 0.2
10 4.9 3.1 1.5 0.1
# … with 140 more rows
into
Sepal.Length Sepal.Width Petal.Length Petal.Width
<dbl> <dbl> <dbl> <dbl>
1 T T F F
2 T T F F
3 T T F F
4 T T F F
5 T T F F
6 T T F F
7 T T F F
Thanks a lot!
iris %>%
mutate(across(where(is.numeric), ~ . >= 2))
You don't need if_else when the result you want is TRUE or FALSE. Generally, ifelse(test, TRUE, FALSE) is a long way of writing test.
Or in base R:
iris[1:4] >= 2
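A quick check of the point about ifelse(test, TRUE, FALSE), on a small made-up vector (x, long_form and short_form are illustrative names only):

```r
x <- c(1.4, 2.0, 3.5)

long_form  <- ifelse(x >= 2, TRUE, FALSE)  # the verbose spelling
short_form <- x >= 2                       # the comparison is already logical

identical(long_form, short_form)
```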

Condition in rows, modify all columns without a loop

What I want to do is modify all selected columns of an R data.table according to a condition on the values, i.e.
for all 4 columns selected in the cols variable, if the value is greater than or equal to 1.5, I would like to set it to 1, else 0.
I tried something like this: iris[(cols) > 1.5 , (cols) := 1, .SDcols = cols]
Thanks.
One data.table approach:
iris <- as.data.table(iris)
cols <- names(iris)[1:4]
cols
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
iris[, (cols) := lapply(.SD, function(z) fifelse(z > 1.5, 1, z)), .SDcols = cols]
iris
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <num> <num> <num> <num> <fctr>
# 1: 1 1 1.4 0.2 setosa
# 2: 1 1 1.4 0.2 setosa
# 3: 1 1 1.3 0.2 setosa
# 4: 1 1 1.5 0.2 setosa
# 5: 1 1 1.4 0.2 setosa
# 6: 1 1 1.0 0.4 setosa
# 7: 1 1 1.4 0.3 setosa
# 8: 1 1 1.5 0.2 setosa
# 9: 1 1 1.4 0.2 setosa
# 10: 1 1 1.5 0.1 setosa
# ---
# 141: 1 1 1.0 1.0 virginica
# 142: 1 1 1.0 1.0 virginica
# 143: 1 1 1.0 1.0 virginica
# 144: 1 1 1.0 1.0 virginica
# 145: 1 1 1.0 1.0 virginica
# 146: 1 1 1.0 1.0 virginica
# 147: 1 1 1.0 1.0 virginica
# 148: 1 1 1.0 1.0 virginica
# 149: 1 1 1.0 1.0 virginica
# 150: 1 1 1.0 1.0 virginica
An alternative using set:
for (nm in cols) set(iris, which(iris[[nm]] > 1.5), nm, 1)
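Note that the two snippets above leave values at or below 1.5 unchanged. If the goal is exactly what the question states (1 when the value is >= 1.5, otherwise 0), a possible variant is sketched below. It works on a copy called dt, a name chosen here only to avoid overwriting iris.

```r
library(data.table)

dt <- as.data.table(iris)   # work on a copy rather than on iris itself
cols <- names(dt)[1:4]

# fifelse() needs both branches to share a type, so use 1 and 0 (both double)
dt[, (cols) := lapply(.SD, function(z) fifelse(z >= 1.5, 1, 0)), .SDcols = cols]

head(dt)
```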
Another solution:
library(dplyr)
library(data.table)
iris[,1:4] %>% data.table() %>% mutate_all(~ ifelse(.x>=1.5,1,0))
If you just need to target the numeric columns, across() can be a good fit; it also works with more specific selections such as positions and names.
library(tidyverse)
iris |>
as_tibble() |>
mutate(across(.cols = where(is.numeric),.fns = ~ if_else(.x > 1.5,1,.x)))
#> # A tibble: 150 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 1 1 1.4 0.2 setosa
#> 2 1 1 1.4 0.2 setosa
#> 3 1 1 1.3 0.2 setosa
#> 4 1 1 1.5 0.2 setosa
#> 5 1 1 1.4 0.2 setosa
#> 6 1 1 1 0.4 setosa
#> 7 1 1 1.4 0.3 setosa
#> 8 1 1 1.5 0.2 setosa
#> 9 1 1 1.4 0.2 setosa
#> 10 1 1 1.5 0.1 setosa
#> # ... with 140 more rows
Created on 2021-10-18 by the reprex package (v2.0.1)
A base R option:
data <- iris
cols <- 1:4
data[cols] <- +(data[cols] > 1.5)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 1 1 0 0 setosa
#2 1 1 0 0 setosa
#3 1 1 0 0 setosa
#4 1 1 0 0 setosa
#5 1 1 0 0 setosa
#6 1 1 1 0 setosa
#...
#...
The + at the beginning is used to change the logical values (TRUE/FALSE) to integers (1/0).
With dplyr, we may do:
library(dplyr)
iris %>%
mutate(across(where(is.numeric), ~ +(. > 1.5)))

Select grouped random rows, change value in one column

For my study design I need to select a total of 12 rows from each group (10 groups) and change the value of one column from 0 to 1.
How would I go about this? I already tried sample_n, but it only returns the randomly selected rows, not the entire dataset.
test <- test %>% group_by(group) %>% mutate(
change_value = sample_n(12)
) %>% ungroup()
Sorry I am stuck after this.
Thank you in advance
Your requirement is not very clear.
Case 1: you want to select 12 random rows from each group, change the value of one column, and return the entire dataset.
library(tidyverse)
set.seed(2021)
iris %>% group_by(Species) %>%
mutate(Sepal.Width = ifelse(sample(1:n(), n()) <= 12, 1, Sepal.Width)) %>%
ungroup()
# A tibble: 150 x 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 1 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 1 1.4 0.2 setosa
6 5.4 1 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 1 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
# ... with 140 more rows
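The same idea applied to a layout closer to the question is sketched below. The data frame is a made-up stand-in (10 groups of 30 rows, with change_value starting at 0), since the real test data is not shown.

```r
library(dplyr)

set.seed(2021)

# Hypothetical stand-in for the asker's data
test <- tibble(group = rep(1:10, each = 30), change_value = 0)

test <- test %>%
  group_by(group) %>%
  # sample(n(), 12) draws 12 distinct row positions within each group
  mutate(change_value = if_else(row_number() %in% sample(n(), 12), 1, change_value)) %>%
  ungroup()

# Every group should now have exactly 12 rows flagged
count(test, group, wt = change_value)
```

This assumes every group has at least 12 rows; sample() would fail on smaller groups.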

Append dataframe to end of each dataframe in a list of dataframes in r

I would like to add one row to the end of each dataframe in a list of dataframes. In this example, I would like to add the column names as a new row to the bottom of each dataframe in the list of dataframes I created by group_split.
library(dplyr)
col_names1 <- as.data.frame(t(as.data.frame(colnames(iris))))
colnames(col_names1) <- unlist(col_names1[1, ])
rownames(col_names1) <-""
iris %>%
group_split(Species) %>%
bind_rows(col_names1) #errors out: Error: Column `Sepal.Length` can't be converted from numeric to factor
I would like to end up with a list of dataframes, each with their column names as a new row at the bottom of each dataframe in the list.
One issue is the type difference. We can convert the columns to the same type and then call bind_rows. Also, since we are splitting into a list of data frames, we need to loop over the list (with map) and apply bind_rows to each element.
library(dplyr)
library(purrr)
iris %>%
group_split(Species) %>%
map(~ bind_rows(.x %>%
mutate_all(factor), col_names1))
#[[1]]
# A tibble: 51 x 5
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# * <fct> <fct> <fct> <fct> <fct>
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa
# 7 4.6 3.4 1.4 0.3 setosa
# 8 5 3.4 1.5 0.2 setosa
# 9 4.4 2.9 1.4 0.2 setosa
#10 4.9 3.1 1.5 0.1 setosa
# … with 41 more rows
#[[2]]
# A tibble: 51 x 5
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# * <fct> <fct> <fct> <fct> <fct>
#...
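A possible variant converts everything to character instead of factor and uses across() in place of mutate_all(). This is a sketch assuming dplyr 1.0.0 or later; header_row and out are names introduced here for illustration.

```r
library(dplyr)
library(purrr)

# One-row tibble whose values are the column names (all character)
header_row <- as_tibble(as.list(setNames(names(iris), names(iris))))

out <- iris %>%
  group_split(Species) %>%
  # convert each piece to character so it can be bound to the header row
  map(~ bind_rows(mutate(.x, across(everything(), as.character)), header_row))

tail(out[[1]], 2)
```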

dplyr summarize: how to include all table columns in the output table

I have the following dataset:
# Dataset
x<-tbl_df(data.frame(locus=c(1,2,2,3,4,4,5,5,5,6),v=c(1,1,2,1,1,2,1,2,3,1),rpkm=rnorm(10,10)))
If I use the following command
# Subset
x%>%group_by(locus)%>%summarize(max(rpkm))
I obtain
locus max(rpkm)
1 9.316949
2 10.273270
3 9.879886
4 10.944641
5 10.837681
6 13.450680
While I'd like to obtain
locus v max(rpkm)
1 1 9.316949
2 1 10.273270
3 1 9.879886
4 2 10.944641
5 1 10.837681
6 1 13.450680
So, I'd like the output table to include the "v" value from the row that holds the maximum.
Is it possible?
Try:
x %>% group_by(locus) %>%
summarize(max(rpkm), v = v[which(rpkm==max(rpkm))])
You can use the top_n function instead
# with set.seed(15)
x %>% group_by(locus) %>% top_n(1, rpkm)
# locus v rpkm
# 1 1 1 10.258823
# 2 2 1 11.831121
# 3 3 1 10.897198
# 4 4 1 10.488016
# 5 5 2 11.090773
# 6 6 1 8.924999
Try this:
x %>% group_by(locus) %>% filter(rpkm==max(rpkm))
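In current dplyr, top_n() is superseded by slice_max(), so the same result can be sketched as follows (assuming dplyr 1.0.0 or later; the data is rebuilt from the question with tibble() and res is an illustrative name):

```r
library(dplyr)

set.seed(15)
x <- tibble(
  locus = c(1, 2, 2, 3, 4, 4, 5, 5, 5, 6),
  v     = c(1, 1, 2, 1, 1, 2, 1, 2, 3, 1),
  rpkm  = rnorm(10, 10)
)

# Keep, for each locus, the full row with the largest rpkm
res <- x %>%
  group_by(locus) %>%
  slice_max(rpkm, n = 1) %>%
  ungroup()

res
```

Like filter(rpkm == max(rpkm)), this keeps all columns and returns every tied row by default.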
I assume you're looking for a way to avoid typing all of the column names by hand. You can achieve that by using across within summarize, like so:
iris %>%
group_by(Species) %>%
dplyr::summarize(
across(everything()),
mean_l = mean(Sepal.Length)
) %>%
head()
# A tibble: 6 × 6
# Groups: Species [1]
Species Sepal.Length Sepal.Width Petal.Length Petal.Width mean_l
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa 5.1 3.5 1.4 0.2 5.01
2 setosa 4.9 3 1.4 0.2 5.01
3 setosa 4.7 3.2 1.3 0.2 5.01
4 setosa 4.6 3.1 1.5 0.2 5.01
5 setosa 5 3.6 1.4 0.2 5.01
6 setosa 5.4 3.9 1.7 0.4 5.01
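Recent dplyr releases deprecate summarize() calls that return more than one row per group, so a grouped mutate() may be a more durable way to get the same columns. A sketch (res is an illustrative name; Species keeps its original position here rather than moving to the front):

```r
library(dplyr)

res <- iris %>%
  group_by(Species) %>%
  # mutate() keeps every row and column and adds the per-group mean
  mutate(mean_l = mean(Sepal.Length)) %>%
  ungroup()

head(res)
```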
