Imagine we have a tibble like the one shown below. In theory, the first column simply acts as row names, which must have a one-to-one correspondence with the column names.
For example, excluding the first column (row_name), the third column from the left is named G, but the corresponding (third) row is E.
I was wondering how we could re-order the rows (e.g., bring up row titled G two rows up) so the rows and columns match?
out <- tibble(row_name=factor(c("A","B","E","F","G")),`A`=as.character(1:5),`B`=as.character(c(2,NA,0:2)),
`G`=as.character(4:8),`E`=as.character(4:8),`F`=as.character(4:8))
# row_name A B G E F
# <fct> <chr> <chr> <chr> <chr> <chr>
#1 A 1 2 4 4 4
#2 B 2 NA 5 5 5
#3 E 3 0 6 6 6
#4 F 4 1 7 7 7
#5 G 5 2 8 8 8
# EXPECTED OUTPUT:
# row_name A B G E F
# <fct> <chr> <chr> <chr> <chr> <chr>
#1 A 1 2 4 4 4
#2 B 2 NA 5 5 5
#5 G 5 2 8 8 8
#3 E 3 0 6 6 6
#4 F 4 1 7 7 7
If we want to reorder the rows, use match within slice
library(dplyr)
out %>%
slice(match(names(.)[-1], row_name))
-output
# A tibble: 5 x 6
row_name A B G E F
<fct> <chr> <chr> <chr> <chr> <chr>
1 A 1 2 4 4 4
2 B 2 <NA> 5 5 5
3 G 5 2 8 8 8
4 E 3 0 6 6 6
5 F 4 1 7 7 7
Or within arrange
out %>%
arrange(factor(row_name, levels = names(.)[-1]))
-output
# A tibble: 5 x 6
row_name A B G E F
<fct> <chr> <chr> <chr> <chr> <chr>
1 A 1 2 4 4 4
2 B 2 <NA> 5 5 5
3 G 5 2 8 8 8
4 E 3 0 6 6 6
5 F 4 1 7 7 7
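For reference, the same idea can be sketched in base R. match() returns, for each column name, the position of the row that carries that name, and those positions are then used as a row index. The names target, idx and reordered are only illustrative, and the stopifnot() guard is an added assumption that every column name has a matching row.

```r
library(tibble)

out <- tibble(
  row_name = factor(c("A", "B", "E", "F", "G")),
  A = as.character(1:5),
  B = as.character(c(2, NA, 0:2)),
  G = as.character(4:8),
  E = as.character(4:8),
  F = as.character(4:8)
)

# Desired row order: the column names, excluding the row_name column
target <- names(out)[-1]

# Position of each column name within row_name (factors are compared as character)
idx <- match(target, out$row_name)

# Guard (assumption): every column name must appear as a row name
stopifnot(!anyNA(idx))

# Reorder the rows by those positions
reordered <- out[idx, ]
reordered
```

If a column name had no matching row, match() would return NA for it, which is why the guard is there.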
I have a data frame with ten columns, but only five columns of concern: A, B, C, D, E. I also have a list of values. What's the best way to subset the rows whose value in column A, B, C, D, or E is included in the list of values?
If I were only concerned with a single column, I know I can use left_join(list_of_values, df$A) but I'm not sure how to do something similar with multiple columns.
The key here is if_any.
library(tidyverse)
set.seed(26)
sample_df <- tibble(col = rep(LETTERS[1:8], each = 5),
val = sample(1:10, 40, replace = TRUE),
ID = rep(1:5, 8)) |>
pivot_wider(names_from = col, values_from = val)
sample_df
#> # A tibble: 5 x 9
#> ID A B C D E F G H
#> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 8 4 10 7 2 7 4 3
#> 2 2 3 2 3 3 4 10 2 3
#> 3 3 9 6 6 8 2 10 10 3
#> 4 4 7 6 8 9 3 5 8 3
#> 5 5 6 3 4 1 9 7 9 1
vals <- c(1, 7)
#solution
sample_df |>
filter(if_any(A:E, ~. %in% vals))
#> # A tibble: 3 x 9
#> ID A B C D E F G H
#> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 8 4 10 7 2 7 4 3
#> 2 4 7 6 8 9 3 5 8 3
#> 3 5 6 3 4 1 9 7 9 1
or any and apply with base R:
#base solution
indx <- apply(sample_df[,which(colnames(sample_df) %in% LETTERS[1:5])], 1, \(x) any(x %in% vals))
sample_df[indx,]
#> # A tibble: 3 x 9
#> ID A B C D E F G H
#> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 8 4 10 7 2 7 4 3
#> 2 4 7 6 8 9 3 5 8 3
#> 3 5 6 3 4 1 9 7 9 1
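A column-wise base R variant is also possible. The sketch below tests each column of interest against vals and then combines the resulting logical vectors with `|`, which avoids the row-by-row apply call. The data frame is typed in by hand from the first columns of the printed sample_df rather than regenerated with sample(), and the names cols, hits and keep are illustrative.

```r
# Hand-typed copy of the relevant columns of sample_df shown above
sample_df <- data.frame(
  ID = 1:5,
  A = c(8, 3, 9, 7, 6),
  B = c(4, 2, 6, 6, 3),
  C = c(10, 3, 6, 8, 4),
  D = c(7, 3, 8, 9, 1),
  E = c(2, 4, 2, 3, 9),
  F = c(7, 10, 10, 5, 7)
)
vals <- c(1, 7)
cols <- c("A", "B", "C", "D", "E")

# One logical vector per column: is the value in vals?
hits <- lapply(sample_df[cols], `%in%`, vals)

# Combine the columns with OR, so a row is kept if any column matched
keep <- Reduce(`|`, hits)

result <- sample_df[keep, ]
result
```

Column F is deliberately left out of cols, so a match there does not count on its own.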
Given a list of tibbles
library(dplyr)
library(purrr)
ltb <- list(tibble(a=1:10, b=1:10), tibble(a=1:10, b=1:10), tibble(a=1:10, b=1:10))
map(ltb, ~head(., 2))
[[1]]
# A tibble: 2 × 2
a b
<int> <int>
1 1 1
2 2 2
[[2]]
# A tibble: 2 × 2
a b
<int> <int>
1 1 1
2 2 2
[[3]]
# A tibble: 2 × 2
a b
<int> <int>
1 1 1
2 2 2
and another single tibble whose number of rows matches the number of elements in the above list
tib <- tibble(data1 = letters[1:3], data2 = LETTERS[1:3], data3 = letters[1:3])
> tib
# A tibble: 3 × 3
data1 data2 data3
<chr> <chr> <chr>
1 a A a
2 b B b
3 c C c
How can I bind the first row of tib to the first tibble in ltb, the second row of tib to the second tibble in ltb, and so on? Obviously, this should recycle each row of tib to (dynamically) match the number of rows in the corresponding tibble in ltb.
So the result should look something like this
map(newltb, ~head(., 3))
[[1]]
# A tibble: 3 × 5
a b data1 data2 data3
<int> <int> <chr> <chr> <chr>
1 1 1 a A a
2 2 2 a A a
3 3 3 a A a
[[2]]
# A tibble: 3 × 5
a b data1 data2 data3
<int> <int> <chr> <chr> <chr>
1 1 1 b B b
2 2 2 b B b
3 3 3 b B b
[[3]]
# A tibble: 3 × 5
a b data1 data2 data3
<int> <int> <chr> <chr> <chr>
1 1 1 c C c
2 2 2 c C c
3 3 3 c C c
I'm not sure whether to use map2 or pmap; neither has worked for me.
You could split tib by rows and use map2 and bind_cols like so:
library(dplyr, warn.conflicts = FALSE)
library(purrr)
ltb <- list(tibble(a=1:10, b=1:10), tibble(a=1:10, b=1:10), tibble(a=1:10, b=1:10))
tib <- tibble(data1 = letters[1:3], data2 = LETTERS[1:3], data3 = letters[1:3])
tib_split <- tib %>%
split(seq(nrow(.)))
map2(ltb, tib_split, bind_cols)
#> [[1]]
#> # A tibble: 10 × 5
#> a b data1 data2 data3
#> <int> <int> <chr> <chr> <chr>
#> 1 1 1 a A a
#> 2 2 2 a A a
#> 3 3 3 a A a
#> 4 4 4 a A a
#> 5 5 5 a A a
#> 6 6 6 a A a
#> 7 7 7 a A a
#> 8 8 8 a A a
#> 9 9 9 a A a
#> 10 10 10 a A a
#>
#> [[2]]
#> # A tibble: 10 × 5
#> a b data1 data2 data3
#> <int> <int> <chr> <chr> <chr>
#> 1 1 1 b B b
#> 2 2 2 b B b
#> 3 3 3 b B b
#> 4 4 4 b B b
#> 5 5 5 b B b
#> 6 6 6 b B b
#> 7 7 7 b B b
#> 8 8 8 b B b
#> 9 9 9 b B b
#> 10 10 10 b B b
#>
#> [[3]]
#> # A tibble: 10 × 5
#> a b data1 data2 data3
#> <int> <int> <chr> <chr> <chr>
#> 1 1 1 c C c
#> 2 2 2 c C c
#> 3 3 3 c C c
#> 4 4 4 c C c
#> 5 5 5 c C c
#> 6 6 6 c C c
#> 7 7 7 c C c
#> 8 8 8 c C c
#> 9 9 9 c C c
#> 10 10 10 c C c
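As a variation on the same idea, the split step can be skipped by iterating over row positions and slicing tib inside the mapped function. This is only a sketch: newltb is the name used in the question, and the stopifnot() check is an added assumption that the list and tib line up one-to-one.

```r
library(dplyr)
library(purrr)

ltb <- list(tibble(a = 1:10, b = 1:10),
            tibble(a = 1:10, b = 1:10),
            tibble(a = 1:10, b = 1:10))
tib <- tibble(data1 = letters[1:3], data2 = LETTERS[1:3], data3 = letters[1:3])

# Assumption: one row of tib per list element
stopifnot(length(ltb) == nrow(tib))

# .x is a tibble from ltb, .y the matching row position in tib;
# bind_cols() recycles the one-row slice to the length of .x
newltb <- map2(ltb, seq_len(nrow(tib)), ~ bind_cols(.x, tib[.y, ]))

map(newltb, ~ head(., 3))
```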
In base R, we can use a for loop
for(i in seq_along(ltb)) ltb[[i]][names(tib)] <- tib[i,]
-output
> ltb
[[1]]
# A tibble: 10 × 5
a b data1 data2 data3
<int> <int> <chr> <chr> <chr>
1 1 1 a A a
2 2 2 a A a
3 3 3 a A a
4 4 4 a A a
5 5 5 a A a
6 6 6 a A a
7 7 7 a A a
8 8 8 a A a
9 9 9 a A a
10 10 10 a A a
[[2]]
# A tibble: 10 × 5
a b data1 data2 data3
<int> <int> <chr> <chr> <chr>
1 1 1 b B b
2 2 2 b B b
3 3 3 b B b
4 4 4 b B b
5 5 5 b B b
6 6 6 b B b
7 7 7 b B b
8 8 8 b B b
9 9 9 b B b
10 10 10 b B b
[[3]]
# A tibble: 10 × 5
a b data1 data2 data3
<int> <int> <chr> <chr> <chr>
1 1 1 c C c
2 2 2 c C c
3 3 3 c C c
4 4 4 c C c
5 5 5 c C c
6 6 6 c C c
7 7 7 c C c
8 8 8 c C c
9 9 9 c C c
10 10 10 c C c
I need to know by how many integers two numeric ranges overlap. I tried using DescTools::Overlap, but the output is not what I expected.
library(DescTools)
library(dplyr) # needed for filter() below
library(tidyr)
df1 <- data.frame(ID = c('a', 'b', 'c', 'd', 'e'),
var1 = c(1, 2, 3, 4, 5),
var2 = c(9, 3, 5, 7, 11))
df1 %>% setNames(paste0(names(.), '_2')) %>% tidyr::crossing(df1) %>% filter(ID != ID_2) -> pairwise
pairwise$overlap <- DescTools::Overlap(c(pairwise$var1,pairwise$var2),c(pairwise$var1_2,pairwise$var2_2))
The output (entire column) is '10' for each row in the test dataset created above. I want the row-specific overlap for each pair, so the first 3 rows would be 2, 3, 4, respectively.
I find the easiest way to do it is using rowwise, so that Overlap is called once per row instead of once on the whole columns. This operation used to be discouraged, but rowwise was reworked in the dplyr 1.0.0 release.
pairwise %>%
rowwise() %>%
mutate(overlap = Overlap(c(var1, var2), c(var1_2, var2_2))) %>%
ungroup()
#> # A tibble: 20 x 7
#> ID_2 var1_2 var2_2 ID var1 var2 overlap
#> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 a 1 9 b 2 3 1
#> 2 a 1 9 c 3 5 2
#> 3 a 1 9 d 4 7 3
#> 4 a 1 9 e 5 11 4
#> 5 b 2 3 a 1 9 1
#> 6 b 2 3 c 3 5 0
#> 7 b 2 3 d 4 7 0
#> 8 b 2 3 e 5 11 0
#> 9 c 3 5 a 1 9 2
#> 10 c 3 5 b 2 3 0
#> 11 c 3 5 d 4 7 1
#> 12 c 3 5 e 5 11 0
#> 13 d 4 7 a 1 9 3
#> 14 d 4 7 b 2 3 0
#> 15 d 4 7 c 3 5 1
#> 16 d 4 7 e 5 11 2
#> 17 e 5 11 a 1 9 4
#> 18 e 5 11 b 2 3 0
#> 19 e 5 11 c 3 5 0
#> 20 e 5 11 d 4 7 2
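Note that the values above are interval lengths (1, 2, 3 in the first three rows), whereas the question asks for the number of shared integers (2, 3, 4). If that count is what is needed, a vectorised sketch without Overlap could look like the following. It assumes integer-valued, inclusive endpoints with var1 <= var2 in every row; the column name n_shared and the helper df1_2 are made up for illustration.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(ID = c("a", "b", "c", "d", "e"),
                  var1 = c(1, 2, 3, 4, 5),
                  var2 = c(9, 3, 5, 7, 11))

# Copy of df1 with "_2" suffixes, then all pairs of different IDs
df1_2 <- setNames(df1, paste0(names(df1), "_2"))
pairwise <- crossing(df1_2, df1) %>%
  filter(ID != ID_2)

# Shared integers = (smaller upper end) - (larger lower end) + 1,
# floored at 0 for ranges that do not overlap at all
pairwise <- pairwise %>%
  mutate(n_shared = pmax(0, pmin(var2, var2_2) - pmax(var1, var1_2) + 1))

head(pairwise, 3)
```

Because pmin() and pmax() work element-wise on whole columns, no rowwise() step is needed here.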
My version with apply function
pairwise$overlap <- apply(pairwise, 1,
function(x) DescTools::Overlap(as.numeric(c(x[5], x[6])),
as.numeric(c(x[2],x[3]))))
pairwise
# A tibble: 20 x 7
ID_2 var1_2 var2_2 ID var1 var2 overlap
<chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 a 1 9 b 2 3 1
2 a 1 9 c 3 5 2
3 a 1 9 d 4 7 3
4 a 1 9 e 5 11 4
5 b 2 3 a 1 9 1
6 b 2 3 c 3 5 0
7 b 2 3 d 4 7 0
8 b 2 3 e 5 11 0
9 c 3 5 a 1 9 2
10 c 3 5 b 2 3 0
11 c 3 5 d 4 7 1
12 c 3 5 e 5 11 0
13 d 4 7 a 1 9 3
14 d 4 7 b 2 3 0
15 d 4 7 c 3 5 1
16 d 4 7 e 5 11 2
17 e 5 11 a 1 9 4
18 e 5 11 b 2 3 0
19 e 5 11 c 3 5 0
20 e 5 11 d 4 7 2
Here's the example data I have.
df1 <- tibble(a=1:4,
b=1:4,
c=1:4,
d=1:4,
e=1:4)
# A tibble: 4 x 5
a b c d e
<int> <int> <int> <int> <int>
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4 4
df2 <- tibble(b=1:4,
d=1:4,
e=1:4)
b d e
<int> <int> <int>
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
I would like to add the columns in common so that I can get a data frame like this
a b c d e
<int> <dbl> <int> <dbl> <dbl>
1 1 2 1 2 2
2 2 4 2 4 4
3 3 6 3 6 6
4 4 8 4 8 8
Is there an easy way to do this in R with tools like dplyr?
An easier option is to subset the first dataset 'df1' by the column names of 'df2' (assuming all the columns in 'df2' are present in 'df1'), add 'df2' to that subset, and assign the result back to those columns in 'df1'
df1[names(df2)] <- df1[names(df2)] + df2
Or using dplyr
library(dplyr)
df1 %>%
mutate(c_across(names(df2)) + df2)
-output
# A tibble: 4 x 5
# a b c d e
# <int> <int> <int> <int> <int>
#1 1 2 1 2 2
#2 2 4 2 4 4
#3 3 6 3 6 6
#4 4 8 4 8 8
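Another way to write the dplyr version, shown here only as a sketch, is to loop over the shared columns with across() and look up the matching column of df2 by name with cur_column(). It assumes, as above, that every column of df2 exists in df1 and that both have the same number of rows; result is an illustrative name.

```r
library(dplyr)

df1 <- tibble(a = 1:4, b = 1:4, c = 1:4, d = 1:4, e = 1:4)
df2 <- tibble(b = 1:4, d = 1:4, e = 1:4)

# For each column of df1 that also exists in df2,
# add the df2 column with the same name
result <- df1 %>%
  mutate(across(all_of(names(df2)), ~ .x + df2[[cur_column()]]))

result
```

all_of() makes the selection fail loudly if df2 has a column that df1 lacks, rather than silently skipping it.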
I've got some data in two columns:
# A tibble: 16 x 2
code niveau
<chr> <dbl>
1 A 1
2 1 2
3 2 2
4 3 2
5 4 2
6 5 2
7 B 1
8 6 2
9 7 2
My desired output is:
# A tibble: 16 x 3
code niveau cat
<chr> <dbl> <chr>
1 A 1 A
2 1 2 A
3 2 2 A
4 3 2 A
5 4 2 A
6 5 2 A
7 B 1 B
8 6 2 B
Is there a tidy way to convert these data without looping through them?
Here some dummy data:
data<-tibble(code=c('A', 1,2,3,4,5,'B', 6,7,8,9,'C',10,11,12,13), niveau=c(1, 2,2,2,2,2,1,2,2,2,2,1,2,2,2,2))
desired_output<-tibble(code=c('A', 1,2,3,4,5,'B', 6,7,8,9,'C',10,11,12,13), niveau=c(1, 2,2,2,2,2,1,2,2,2,2,1,2,2,2,2),
cat=c(rep('A', 6),rep('B', 5), rep('C', 5)))
Nicolas
You can create a new column cat by replacing the code values with NA wherever they contain a digit. We can then use fill to replace each missing value with the previous non-NA value.
library(dplyr)
data %>% mutate(cat = replace(code, grepl('\\d', code), NA)) %>% tidyr::fill(cat)
# A tibble: 16 x 3
# code niveau cat
# <chr> <dbl> <chr>
# 1 A 1 A
# 2 1 2 A
# 3 2 2 A
# 4 3 2 A
# 5 4 2 A
# 6 5 2 A
# 7 B 1 B
# 8 6 2 B
# 9 7 2 B
#10 8 2 B
#11 9 2 B
#12 C 1 C
#13 10 2 C
#14 11 2 C
#15 12 2 C
#16 13 2 C
We can use str_detect from stringr
library(dplyr)
library(stringr)
library(tidyr)
data %>%
mutate(cat = replace(code, str_detect(code, '\\d'), NA)) %>%
fill(cat)
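If the niveau column is a reliable marker, the category can also be derived from it instead of from a regular expression on code. The sketch below assumes that every row with niveau == 1 starts a new category and that the data begin with such a row; grp is a temporary helper column introduced here for illustration.

```r
library(dplyr)

data <- tibble(code = c('A', 1, 2, 3, 4, 5, 'B', 6, 7, 8, 9, 'C', 10, 11, 12, 13),
               niveau = c(1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2))

result <- data %>%
  # Running count of header rows: increases by one at every niveau == 1
  group_by(grp = cumsum(niveau == 1)) %>%
  # Within each block, the first code is the category label
  mutate(cat = first(code)) %>%
  ungroup() %>%
  select(-grp)

result
```

This version keeps working if a category label ever contains a digit, at the cost of depending on niveau being filled in correctly.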