In the example below, recoding one value turns all the other values into NA. How can I keep the other values unchanged?
library(tibble)
library(dplyr)
test <- tibble(
test_vec = as.factor(c(1, 2, 3))
)
test
#> # A tibble: 3 x 1
#> test_vec
#> <fct>
#> 1 1
#> 2 2
#> 3 3
test %>%
mutate(test_vec = recode_factor(test_vec, `3` = 4))
#> # A tibble: 3 x 1
#> test_vec
#> <fct>
#> 1 <NA>
#> 2 <NA>
#> 3 4
You need to make the replacement the same type as the original values. Factor levels are character strings, so use "4" rather than the number 4:
test %>%
mutate(test_vec = recode_factor(test_vec, "3" = "4"))
# A tibble: 3 x 1
test_vec
<fct>
1 1
2 2
3 4
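The type mismatch can be checked directly. The following is a minimal sketch in base R only; the variable names f and level_type are just for illustration. It shows that the levels are stored as character strings, and how the same rename looks without dplyr:

```r
# Factor levels are character strings, even when the factor is built from numbers
f <- as.factor(c(1, 2, 3))
level_type <- class(levels(f))   # "character"

# Base R equivalent of the recode: rename the level "3" to "4"
levels(f)[levels(f) == "3"] <- "4"
f
```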
Using fct_recode from forcats, where the new level name goes on the left-hand side:
library(forcats)
library(dplyr)
test %>%
mutate(test_vec = fct_recode(test_vec, `4` = '3'))
Output:
# A tibble: 3 x 1
# test_vec
# <fct>
#1 1
#2 2
#3 4
To avoid the NA values while still using numeric replacements, you can list the other levels in the function as well.
test %>%
mutate(test_vec = recode_factor(test_vec, `1` = 1, `2` = 2, `3` = 4))
Result
# A tibble: 3 x 1
test_vec
<fct>
1 1
2 2
3 4
Another way to do it is with case_when. The version below starts from numeric values,
so the example builds a numeric column, recodes it, and converts it to a factor at the end.
test <- tibble(
test_vec = (c(1, 2, 3)))
test %>%
mutate(test_vec = case_when( test_vec != 3 ~ test_vec,
test_vec == 3 ~ 4)) %>%
mutate(across(test_vec,factor))
Result
# A tibble: 3 x 1
test_vec
<fct>
1 1
2 2
3 4
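If the column is already a factor, a similar case_when call can work on the character representation of the levels instead. This is a sketch under that assumption, not part of the original answer; the name res is only for illustration:

```r
library(dplyr)
library(tibble)

test <- tibble(test_vec = as.factor(c(1, 2, 3)))

# Compare and replace as character, then convert back to a factor
res <- test %>%
  mutate(test_vec = factor(case_when(
    test_vec == "3" ~ "4",
    TRUE ~ as.character(test_vec)
  )))
res
```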
So far I have done this to achieve the desired result:
# A tibble: 4 x 2
frag treat
<dbl> <dbl>
1 1 1
2 2 1
3 1 2
4 2 2
# treat is numeric, so compare against numbers rather than strings
treat_1 <- tab_example %>% filter(treat == 1)
treat_2 <- tab_example %>% filter(treat == 2)
new_tab_example <- full_join(treat_1, treat_2, by = "frag")
> new_tab_example
# A tibble: 2 x 3
frag treat.x treat.y
<dbl> <dbl> <dbl>
1 1 1 2
2 2 1 2
Is there a way to do it in one step?
You can use pivot_wider :
tidyr::pivot_wider(tab_example, names_from = treat,
names_prefix = 'treat', values_from = treat)
# frag treat1 treat2
# <dbl> <dbl> <dbl>
#1 1 1 2
#2 2 1 2
Another way uses the spread() function (superseded by pivot_wider() in current tidyr, but still available):
library(dplyr)
library(tidyr)
# Your data
df = tibble(frag = c(1, 2, 1, 2), treat = c(1,1,2,2) )
dfnew = df %>%
mutate(treat_name = case_when(treat==1 ~ 'treat.x', # Build names of columns
treat==2 ~ 'treat.y')
) %>%
spread(treat_name, treat) # Use spread function
If you print the result:
print(dfnew)
# A tibble: 2 x 3
frag treat.x treat.y
<dbl> <dbl> <dbl>
1 1 1 2
2 2 1 2
I am trying to transform df into df2. I have done it in a very patchy way using df3. Is there a simpler and more elegant way of doing it?
library(tidyverse)
# I want to transform df
df <- tibble(id = c(1, 2, 1, 2, 1, 2),
time = c('t1', 't1', 't2', 't2', 't3', 't3'),
value = c(2, 3, 6, 4, 5, 7))
df
#> # A tibble: 6 x 3
#> id time value
#> <dbl> <chr> <dbl>
#> 1 1 t1 2
#> 2 2 t1 3
#> 3 1 t2 6
#> 4 2 t2 4
#> 5 1 t3 5
#> 6 2 t3 7
# into df2
df2 <- tibble(id = c(1, 2, 1, 2),
t = c(2, 3, 6, 4),
r = c(6, 4, 5, 7))
df2
#> # A tibble: 4 x 3
#> id t r
#> <dbl> <dbl> <dbl>
#> 1 1 2 6
#> 2 2 3 4
#> 3 1 6 5
#> 4 2 4 7
# This is how I did it, but I think there should be a better way
df3 <- df %>% pivot_wider(names_from = time, values_from = value)
b <- tibble(id = numeric(), t = numeric(), r = numeric())
for (i in 2:3){
a <- df3[,c(1,i,i+1)]
colnames(a) <- c('id', 't', 'r')
b <- bind_rows(a, b)
}
b
#> # A tibble: 4 x 3
#> id t r
#> <dbl> <dbl> <dbl>
#> 1 1 6 5
#> 2 2 4 7
#> 3 1 2 6
#> 4 2 3 4
Created on 2020-11-25 by the reprex package (v0.3.0)
For each id, you can use lead() to pick the next value as the r column, then drop the rows where r is NA (the last row of each id).
library(dplyr)
df %>%
group_by(id) %>%
mutate(t = value,
r = lead(value)) %>%
na.omit() %>%
select(id, t, r)
# id t r
# <dbl> <dbl> <dbl>
#1 1 2 6
#2 2 3 4
#3 1 6 5
#4 2 4 7
With dplyr >= 1.0.0 we can use summarise(). Earlier versions required each summary to return a single value per group; from 1.0.0 on, a summary can return any number of rows per group, so the result can be shorter or longer than the input. Note that dplyr 1.1.0 deprecated this multi-row use of summarise() in favour of reframe().
library(dplyr)
df %>%
group_by(id) %>%
summarise(t = value[-n()], r = value[-1], .groups = 'drop')
Output:
# A tibble: 4 x 3
# id t r
# <dbl> <dbl> <dbl>
#1 1 2 6
#2 1 6 5
#3 2 3 4
#4 2 4 7
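On dplyr 1.1.0 or later, the same idea can be written with reframe(), which is designed for results with any number of rows per group. This is a sketch that assumes that dplyr version is installed; the name res is only for illustration:

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for reframe() and .by

df <- tibble(id = c(1, 2, 1, 2, 1, 2),
             time = c('t1', 't1', 't2', 't2', 't3', 't3'),
             value = c(2, 3, 6, 4, 5, 7))

# For each id: t is every value except the last, r is every value except the first
res <- df %>%
  reframe(t = value[-n()], r = value[-1], .by = id)
res
```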
I have a tibble with columns named as numbers (e.g. 1). I created a function to compute differences between columns, but I don't know how to do it with that type of columns:
library(tidyverse)
df <- tibble(`1` = c(1,2,3), `2` = c(2,4,6))
# This works
df %>%
mutate(diff = `1` - `2`)
#> # A tibble: 3 x 3
#> `1` `2` diff
#> <dbl> <dbl> <dbl>
#> 1 1 2 -1
#> 2 2 4 -2
#> 3 3 6 -3
# But this doesn't
calc_diffs <- function(x, y){
df %>%
mutate(diff := !!x - !!y)
}
calc_diffs(1, 2)
#> # A tibble: 3 x 3
#> `1` `2` diff
#> <dbl> <dbl> <dbl>
#> 1 1 2 -1
#> 2 2 4 -1
#> 3 3 6 -1
Created on 2020-10-14 by the reprex package (https://reprex.tidyverse.org) (v0.3.0)
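In the question's function, !!x and !!y unquote the numbers 1 and 2 themselves, so every row gets 1 - 2 = -1. Instead, we can convert each column name to a symbol and unquote that: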
calc_diffs <- function(x, y){
df %>%
mutate(diff := !! rlang::sym(x) - !!rlang::sym(y))
}
Then, we just pass a string as argument
calc_diffs("1", "2")
# A tibble: 3 x 3
# `1` `2` diff
# <dbl> <dbl> <dbl>
#1 1 2 -1
#2 2 4 -2
#3 3 6 -3
Column names are strings. We could pass an index to subset the column, but here the column name is non-syntactic because it starts with a number. So we can either wrap it in backticks (built with paste) or just pass a string, convert it to a symbol, and unquote it with !!.
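Passing strings also works with the .data pronoun, which avoids building symbols by hand. The sketch below assumes the function takes the data frame as an explicit first argument (the data parameter is an addition for illustration, not part of the original function):

```r
library(dplyr)

df <- tibble(`1` = c(1, 2, 3), `2` = c(2, 4, 6))

# x and y are column names given as strings; .data[[ ]] looks them up in the data
calc_diffs <- function(data, x, y) {
  data %>%
    mutate(diff = .data[[x]] - .data[[y]])
}

res <- calc_diffs(df, "1", "2")
res
```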
Does this work? Here the column names are passed unquoted, in backticks, and forwarded with {{ }}:
> df <- tibble(`1` = c(1,2,3), `2` = c(2,4,6))
> df
# A tibble: 3 x 2
`1` `2`
<dbl> <dbl>
1 1 2
2 2 4
3 3 6
> calc_diffs <- function(x, y){
+ df %>%
+ mutate(diff = {{x}} - {{y}})
+ }
> calc_diffs(`1`,`2`)
# A tibble: 3 x 3
`1` `2` diff
<dbl> <dbl> <dbl>
1 1 2 -1
2 2 4 -2
3 3 6 -3
I'm trying to assess which unit in a pair is the "winner". group_by() %>% mutate() is close to the right thing, but it's not quite there. In particular,
dat %>% group_by(pair) %>% mutate(winner = ifelse(score[1] > score[2], c(1, 0), c(0, 1))) doesn't work.
The code below does, but it is clunky because of the intermediate summary data frame. Can we improve this?
library(tidyverse)
set.seed(343)
# units within pairs get scores
dat <-
tibble(pair = rep(1:3, each = 2),  # data_frame() is deprecated; tibble() replaces it
unit = rep(1:2, 3),
score = rnorm(6))
# figure out who won in each pair
summary_df <-
dat %>%
group_by(pair) %>%
summarize(winner = which.max(score))
# merge back and determine whether each unit won
dat <-
left_join(dat, summary_df, "pair") %>%
mutate(won = as.numeric(winner == unit))
dat
#> # A tibble: 6 x 5
#> pair unit score winner won
#> <int> <int> <dbl> <int> <dbl>
#> 1 1 1 -1.40 2 0
#> 2 1 2 0.523 2 1
#> 3 2 1 0.142 1 1
#> 4 2 2 -0.847 1 0
#> 5 3 1 -0.412 1 1
#> 6 3 2 -1.47 1 0
Created on 2018-09-26 by the reprex package (v0.2.0).
maybe related to Weird group_by + mutate + which.max behavior
You could do:
dat %>%
group_by(pair) %>%
mutate(won = score == max(score),
winner = unit[won])
# A tibble: 6 x 5
# Groups: pair [3]
pair unit score won winner
<int> <int> <dbl> <lgl> <int>
1 1 1 -1.40 FALSE 2
2 1 2 0.523 TRUE 2
3 2 1 0.142 TRUE 1
4 2 2 -0.847 FALSE 1
5 3 1 -0.412 TRUE 1
6 3 2 -1.47 FALSE 1
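The attempt in the question fails because ifelse() returns a result as long as its test, and the test score[1] > score[2] is a single value, so only the first element of c(1, 0) or c(0, 1) is used and then recycled. A plain if/else returns the whole vector. The sketch below uses fixed, illustrative scores rather than the rnorm() draws from the question, and assumes exactly two units per pair:

```r
library(dplyr)

# Illustrative scores (not the random draws from the question)
dat <- tibble(pair = rep(1:3, each = 2),
              unit = rep(1:2, 3),
              score = c(-1.4, 0.5, 0.1, -0.8, -0.4, -1.5))

# if/else is evaluated once per group and returns a length-2 vector
res <- dat %>%
  group_by(pair) %>%
  mutate(won = if (score[1] > score[2]) c(1, 0) else c(0, 1)) %>%
  ungroup()
res
```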
Using rank:
dat %>% group_by(pair) %>% mutate(won = rank(score) - 1)
More for fun (and slightly faster), using the outcome of the comparison (score[1] > score[2]) to index a vector with 'won alternatives' :
dat %>% group_by(pair) %>%
mutate(won = c(0, 1, 0)[1:2 + (score[1] > score[2])])
I have a tibble in which one column is a list column whose elements always hold two numeric values named a and b (e.g. as a result of calling purrr::map with a function that returns a list), say:
df <- tibble(x = 1:3, y = list(list(a = 1, b = 2), list(a = 3, b = 4), list(a = 5, b = 6)))
df
# A tibble: 3 × 2
x y
<int> <list>
1 1 <list [2]>
2 2 <list [2]>
3 3 <list [2]>
How do I separate the list column y into two columns a and b, and get:
df_res <- tibble(x = 1:3, a = c(1,3,5), b = c(2,4,6))
df_res
# A tibble: 3 × 3
x a b
<int> <dbl> <dbl>
1 1 1 2
2 2 3 4
3 3 5 6
Looking for something like tidyr::separate to deal with a list instead of a string.
Using dplyr (current release: 0.7.0):
bind_cols(df[1], bind_rows(df$y))
# # A tibble: 3 x 3
# x a b
# <int> <dbl> <dbl>
# 1 1 1 2
# 2 2 3 4
# 3 3 5 6
edit based on OP's comment:
To embed this in a pipe and in case you have many non-list columns, we can try:
df %>% select(-y) %>% bind_cols(bind_rows(df$y))
We could also make use of map_df from purrr:
library(tidyverse)
df %>%
summarise(x = list(x), new = list(map_df(.$y, bind_rows))) %>%
unnest(cols = c(x, new))
# A tibble: 3 x 3
# x a b
# <int> <dbl> <dbl>
#1 1 1 2
#2 2 3 4
#3 3 5 6
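With tidyr 1.0.0 or later, unnest_wider() does this directly by turning each named element of the list column into its own column. This is a minimal sketch assuming that tidyr version is installed; the name res is only for illustration:

```r
library(tibble)
library(tidyr)  # assumes tidyr >= 1.0.0 for unnest_wider()

df <- tibble(x = 1:3,
             y = list(list(a = 1, b = 2), list(a = 3, b = 4), list(a = 5, b = 6)))

# Each named element of y (a, b) becomes a column
res <- unnest_wider(df, y)
res
```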