How to keep other values unchanged with dplyr's recode_factor


In the example below, recoding one value turns all the other values into NA. How can I keep the other values unchanged?
library(tibble)
library(dplyr)
test <- tibble(
  test_vec = as.factor(c(1, 2, 3))
)
test
#> # A tibble: 3 x 1
#> test_vec
#> <fct>
#> 1 1
#> 2 2
#> 3 3
test %>%
  mutate(test_vec = recode_factor(test_vec, `3` = 4))
#> # A tibble: 3 x 1
#> test_vec
#> <fct>
#> 1 <NA>
#> 2 <NA>
#> 3 4

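You need to make the replacement the same type as the original value. Factor levels are character strings, so with a numeric replacement recode_factor() cannot keep the existing levels for the values you didn't mention, and they become NA. Use a string instead:
test %>%
  mutate(test_vec = recode_factor(test_vec, "3" = "4"))
# A tibble: 3 x 1
# test_vec
# <fct>
#1 1
#2 2
#3 4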
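For comparison, here is a base R sketch (not from the original answers) that works on a standalone factor rather than the tibble column. Because the levels are strings, assigning to one element of levels() renames just that level and leaves the others alone:

```r
# Base R sketch: rename the level "3" to "4" in place;
# the levels "1" and "2" are not touched
test_vec <- factor(c(1, 2, 3))
levels(test_vec)[levels(test_vec) == "3"] <- "4"
print(test_vec)
```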

Using fct_recode() from forcats (the new level name goes on the left, the old one on the right):
library(forcats)
library(dplyr)
test %>%
  mutate(test_vec = fct_recode(test_vec, `4` = '3'))
Output:
# A tibble: 3 x 1
# test_vec
# <fct>
#1 1
#2 2
#3 4

To avoid the NA values, you can also list the other values in the call, so that every value gets a replacement of the same type:
test %>%
  mutate(test_vec = recode_factor(test_vec, `1` = 1, `2` = 2, `3` = 4))
Result:
# A tibble: 3 x 1
# test_vec
# <fct>
#1 1
#2 2
#3 4

Another way is case_when(). In this example I start from numeric values and convert the result to a factor at the end:
test <- tibble(
  test_vec = c(1, 2, 3)
)
test %>%
  mutate(test_vec = case_when(test_vec != 3 ~ test_vec,
                              test_vec == 3 ~ 4)) %>%
  mutate(across(test_vec, factor))
Result:
# A tibble: 3 x 1
# test_vec
# <fct>
#1 1
#2 2
#3 4
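The numeric starting point is not strictly required. A sketch, assuming the factor version of test from the question: compare against the level label, convert the untouched values with as.character() so both branches return strings, and rebuild the factor afterwards:

```r
library(dplyr)
library(tibble)

test <- tibble(test_vec = as.factor(c(1, 2, 3)))

result <- test %>%
  mutate(test_vec = case_when(
    test_vec == "3" ~ "4",           # compare the factor against its level label
    TRUE ~ as.character(test_vec)    # keep every other value, as a string
  )) %>%
  mutate(test_vec = factor(test_vec))  # back to a factor

print(result)
```

Both branches of case_when() return character vectors here, which keeps the types consistent before the final factor() call.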

Related

How can a table be rearranged in one step so that two or more observations are listed in a row in successive columns?

Starting from tab_example:
# A tibble: 4 x 2
# frag treat
# <dbl> <dbl>
#1 1 1
#2 2 1
#3 1 2
#4 2 2
So far I have done this to achieve the desired result:
library(dplyr)
treat_1 <- tab_example %>% filter(treat == "1")
treat_2 <- tab_example %>% filter(treat == "2")
new_tab_example <- full_join(treat_1, treat_2, by = "frag")
new_tab_example
# A tibble: 2 x 3
# frag treat.x treat.y
# <dbl> <dbl> <dbl>
#1 1 1 2
#2 2 1 2
Is there a way to do it in one step?

You can use pivot_wider():
tidyr::pivot_wider(tab_example, names_from = treat,
                   names_prefix = 'treat', values_from = treat)
# frag treat1 treat2
# <dbl> <dbl> <dbl>
#1 1 1 2
#2 2 1 2
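The answer does not show how tab_example is built. Assuming it matches the printed 4 x 2 table, a self-contained version of the same pivot_wider() call looks like this (the object name wide is just for illustration):

```r
library(tibble)
library(tidyr)

# Assumed input, reconstructed from the printed tab_example
tab_example <- tibble(frag = c(1, 2, 1, 2), treat = c(1, 1, 2, 2))

# Each distinct treat value becomes a column named "treat<value>";
# frag is the only remaining column, so it identifies the rows
wide <- pivot_wider(tab_example,
                    names_from = treat,
                    names_prefix = "treat",
                    values_from = treat)
print(wide)
```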

There is also a way using the spread() function (spread() has been superseded by pivot_wider() since tidyr 1.0.0, but it still works):
library(dplyr)
library(tidyr)
# Your data
df = tibble(frag = c(1, 2, 1, 2), treat = c(1, 1, 2, 2))
dfnew = df %>%
  mutate(treat_name = case_when(treat == 1 ~ 'treat.x', # build the column names
                                treat == 2 ~ 'treat.y')) %>%
  spread(treat_name, treat) # use the spread function
If you print the result:
print(dfnew)
# A tibble: 2 x 3
# frag treat.x treat.y
# <dbl> <dbl> <dbl>
#1 1 1 2
#2 2 1 2

How to transform a tibble from one column to two columns with repeated observations

I tried to transform df into df2. I did it in a very patchy way using df3. Is there a simpler and more elegant way of doing it?
library(tidyverse)
# I want to transform df
df <- tibble(id = c(1, 2, 1, 2, 1, 2),
             time = c('t1', 't1', 't2', 't2', 't3', 't3'),
             value = c(2, 3, 6, 4, 5, 7))
df
#> # A tibble: 6 x 3
#> id time value
#> <dbl> <chr> <dbl>
#> 1 1 t1 2
#> 2 2 t1 3
#> 3 1 t2 6
#> 4 2 t2 4
#> 5 1 t3 5
#> 6 2 t3 7
# into df2
df2 <- tibble(id = c(1, 2, 1, 2),
              t = c(2, 3, 6, 4),
              r = c(6, 4, 5, 7))
df2
#> # A tibble: 4 x 3
#> id t r
#> <dbl> <dbl> <dbl>
#> 1 1 2 6
#> 2 2 3 4
#> 3 1 6 5
#> 4 2 4 7
# This is how I did it, but I think it should be a better way
df3 <- df %>% pivot_wider(names_from = time, values_from = value)
b <- tibble(id = numeric(), t = numeric(), r = numeric())
for (i in 2:3){
  a <- df3[, c(1, i, i + 1)]
  colnames(a) <- c('id', 't', 'r')
  b <- bind_rows(a, b)
}
b
#> # A tibble: 4 x 3
#> id t r
#> <dbl> <dbl> <dbl>
#> 1 1 6 5
#> 2 2 4 7
#> 3 1 2 6
#> 4 2 3 4
Created on 2020-11-25 by the reprex package (v0.3.0)

For each id, you can use lead() to take the next value as the r column, then drop the rows with NA (the last row of each id has no next value):
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(t = value,
         r = lead(value)) %>%
  na.omit() %>%
  select(id, t, r)
# id t r
# <dbl> <dbl> <dbl>
#1 1 2 6
#2 2 3 4
#3 1 6 5
#4 2 4 7

We can use summarise() from dplyr version >= 1.0.0. Previously, summarise() could return only a single row per group; since 1.0.0 it can return any number of rows per group, so the result can be shorter or longer than the original data. (In dplyr >= 1.1.0, returning more than one row per group from summarise() is deprecated in favor of reframe().)
library(dplyr)
df %>%
  group_by(id) %>%
  summarise(t = value[-n()], r = value[-1], .groups = 'drop')
Output:
# A tibble: 4 x 3
# id t r
# <dbl> <dbl> <dbl>
#1 1 2 6
#2 1 6 5
#3 2 3 4
#4 2 4 7
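Assuming dplyr >= 1.1.0, the same pairing can be written with reframe(), which always returns an ungrouped result. In this sketch head(value, -1) keeps all but the last value of each group and tail(value, -1) all but the first, which matches value[-n()] and value[-1] above:

```r
library(dplyr)
library(tibble)

df <- tibble(id = c(1, 2, 1, 2, 1, 2),
             time = c("t1", "t1", "t2", "t2", "t3", "t3"),
             value = c(2, 3, 6, 4, 5, 7))

res <- df %>%
  group_by(id) %>%
  # a group with n rows yields n - 1 consecutive (t, r) pairs
  reframe(t = head(value, -1), r = tail(value, -1))

print(res)
```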

How to mutate a variable in a tibble based on columns named as numbers in R

I have a tibble with columns named as numbers (e.g. `1`). I created a function to compute differences between columns, but I don't know how to make it work with that kind of column:
library(tidyverse)
df <- tibble(`1` = c(1,2,3), `2` = c(2,4,6))
# This works
df %>%
mutate(diff = `1` - `2`)
#> # A tibble: 3 x 3
#> `1` `2` diff
#> <dbl> <dbl> <dbl>
#> 1 1 2 -1
#> 2 2 4 -2
#> 3 3 6 -3
# But this doesn't
calc_diffs <- function(x, y){
  df %>%
    mutate(diff := !!x - !!y)
}
calc_diffs(1, 2)
#> # A tibble: 3 x 3
#> `1` `2` diff
#> <dbl> <dbl> <dbl>
#> 1 1 2 -1
#> 2 2 4 -1
#> 3 3 6 -1
Created on 2020-10-14 by the reprex package (v0.3.0)

We can convert the string to a symbol with rlang::sym() and inject it with !!:
calc_diffs <- function(x, y){
  df %>%
    mutate(diff := !!rlang::sym(x) - !!rlang::sym(y))
}
Then we just pass the column names as strings:
calc_diffs("1", "2")
# A tibble: 3 x 3
# `1` `2` diff
# <dbl> <dbl> <dbl>
#1 1 2 -1
#2 2 4 -2
#3 3 6 -3
Column names are strings. We could pass an index to subset the column, but here the column name is non-syntactic (it starts with a number). So we can either wrap it in backticks (e.g. with paste) or just pass a string, convert it to a symbol and inject it with !!.
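Another option, not from the original answers, is the .data pronoun, which looks a column up by its name as a string, so non-syntactic names such as `1` need no backticks or symbol conversion. calc_diffs_data() is a hypothetical name, and it takes the data as an argument instead of reading the global df:

```r
library(dplyr)
library(tibble)

df <- tibble(`1` = c(1, 2, 3), `2` = c(2, 4, 6))

# Hypothetical variant of calc_diffs(): x and y are column names as strings,
# looked up through the .data pronoun
calc_diffs_data <- function(data, x, y) {
  data %>%
    mutate(diff = .data[[x]] - .data[[y]])
}

out <- calc_diffs_data(df, "1", "2")
print(out)
```

Passing the data explicitly also makes the function usable on any tibble, not just the one named df.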

Does this work?
df <- tibble(`1` = c(1,2,3), `2` = c(2,4,6))
df
# A tibble: 3 x 2
# `1` `2`
# <dbl> <dbl>
#1 1 2
#2 2 4
#3 3 6
calc_diffs <- function(x, y){
  df %>%
    mutate(diff = {{x}} - {{y}})
}
calc_diffs(`1`, `2`)
# A tibble: 3 x 3
# `1` `2` diff
# <dbl> <dbl> <dbl>
#1 1 2 -1
#2 2 4 -2
#3 3 6 -3

Winners within pairs; or vector-valued group_by mutate?

I'm trying to assess which unit in a pair is the "winner". group_by() %>% mutate() is close to the right thing, but it's not quite there. In particular,
dat %>% group_by(pair) %>% mutate(winner = ifelse(score[1] > score[2], c(1, 0), c(0, 1))) doesn't work.
The code below does work, but it is clunky because it needs an intermediate summary data frame. Can we improve this?
library(tidyverse)
set.seed(343)
# units within pairs get scores
dat <-
  tibble(pair = rep(1:3, each = 2),
         unit = rep(1:2, 3),
         score = rnorm(6))
# figure out who won in each pair
summary_df <-
  dat %>%
  group_by(pair) %>%
  summarize(winner = which.max(score))
# merge back and determine whether each unit won
dat <-
  left_join(dat, summary_df, "pair") %>%
  mutate(won = as.numeric(winner == unit))
dat
#> # A tibble: 6 x 5
#> pair unit score winner won
#> <int> <int> <dbl> <int> <dbl>
#> 1 1 1 -1.40 2 0
#> 2 1 2 0.523 2 1
#> 3 2 1 0.142 1 1
#> 4 2 2 -0.847 1 0
#> 5 3 1 -0.412 1 1
#> 6 3 2 -1.47 1 0
Created on 2018-09-26 by the reprex package (v0.2.0).
Maybe related to: Weird group_by + mutate + which.max behavior

You could do:
dat %>%
  group_by(pair) %>%
  mutate(won = score == max(score),
         winner = unit[won])
# A tibble: 6 x 5
# Groups: pair [3]
# pair unit score won winner
# <int> <int> <dbl> <lgl> <int>
#1 1 1 -1.40 FALSE 2
#2 1 2 0.523 TRUE 2
#3 2 1 0.142 TRUE 1
#4 2 2 -0.847 FALSE 1
#5 3 1 -0.412 TRUE 1
#6 3 2 -1.47 FALSE 1

Using rank():
dat %>% group_by(pair) %>% mutate(won = rank(score) - 1)
More for fun (and slightly faster), you can use the result of the comparison (score[1] > score[2]) to index a vector of "won" alternatives. When the comparison is TRUE, the indices shift from 1:2 to 2:3, giving c(1, 0) instead of c(0, 1):
dat %>%
  group_by(pair) %>%
  mutate(won = c(0, 1, 0)[1:2 + (score[1] > score[2])])
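As for why the ifelse() attempt in the question fails: ifelse() returns a result the same length as its test, here a single TRUE/FALSE, so only the first element of c(1, 0) or c(0, 1) is used, and mutate() recycles it across both rows of the pair. A plain if/else returns the whole length-2 vector instead. A sketch, rebuilding dat as in the question (with tibble() in place of the deprecated data_frame()) and assuming no tied scores:

```r
library(dplyr)
library(tibble)

set.seed(343)
dat <- tibble(pair = rep(1:3, each = 2),
              unit = rep(1:2, 3),
              score = rnorm(6))

res <- dat %>%
  group_by(pair) %>%
  # `if` yields a length-2 vector per pair, one value per unit
  mutate(won = if (score[1] > score[2]) c(1, 0) else c(0, 1)) %>%
  ungroup()

print(res)
```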

How to separate a column list of fixed size X to X different columns?

I have a tibble with one column being a list column, always holding two numeric values named a and b (e.g. as a result of calling purrr::map with a function that returns a list), say:
df <- tibble(x = 1:3, y = list(list(a = 1, b = 2), list(a = 3, b = 4), list(a = 5, b = 6)))
df
# A tibble: 3 × 2
x y
<int> <list>
1 1 <list [2]>
2 2 <list [2]>
3 3 <list [2]>
How do I separate the list column y into two columns a and b, and get:
df_res <- tibble(x = 1:3, a = c(1,3,5), b = c(2,4,6))
df_res
# A tibble: 3 × 3
x a b
<int> <dbl> <dbl>
1 1 1 2
2 2 3 4
3 3 5 6
Looking for something like tidyr::separate to deal with a list instead of a string.

Using dplyr (0.7.0 at the time of writing):
bind_cols(df[1], bind_rows(df$y))
# # A tibble: 3 x 3
# x a b
# <int> <dbl> <dbl>
# 1 1 1 2
# 2 2 3 4
# 3 3 5 6
Edit based on the OP's comment: to embed this in a pipe when you have many non-list columns, we can try:
df %>% select(-y) %>% bind_cols(bind_rows(df$y))

We could also make use of map_df() from purrr:
library(tidyverse)
df %>%
  summarise(x = list(x), new = list(map_df(.$y, bind_rows))) %>%
  unnest(cols = c(x, new))
# A tibble: 3 x 3
# x a b
# <int> <dbl> <dbl>
#1 1 1 2
#2 2 3 4
#3 3 5 6
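With tidyr >= 1.0.0 there is also unnest_wider(), which is close to the "separate for lists" the question asks for: each named element of the list column becomes its own column. A minimal sketch:

```r
library(tibble)
library(tidyr)

df <- tibble(x = 1:3,
             y = list(list(a = 1, b = 2), list(a = 3, b = 4), list(a = 5, b = 6)))

# The names inside each element of y (a, b) become new columns; y is dropped
res <- unnest_wider(df, y)
print(res)
```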
