I want to use the column grp to create suffixes when using pivot_longer from the tidyverse.
Say I have data like this:
dta <- tibble(grp = rep(c('one', 'two'), each = 3),
date = rep(c('2022-12-31', '2021-12-31'), 3),
a = 1:6, b = 12:7)
dta
# A tibble: 6 x 4
grp date a b
<chr> <chr> <int> <int>
1 one 2022-12-31 1 12
2 one 2021-12-31 2 11
3 one 2022-12-31 3 10
4 two 2021-12-31 4 9
5 two 2022-12-31 5 8
6 two 2021-12-31 6 7
and want to get to something like this:
# A tibble: 6 x 5
date names.one values.one names.two values.two
<chr> <chr> <int> <chr> <int>
1 2022-12-31 a 1 a 4
2 2022-12-31 b 12 b 9
3 2021-12-31 a 2 a 5
4 2021-12-31 b 11 b 8
5 2022-12-31 a 3 a 6
6 2022-12-31 b 10 b 7
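One possible approach, offered as a sketch rather than a definitive answer: reshape to long form with pivot_longer, then spread by grp with pivot_wider so that each group gets its own suffixed columns. This assumes the rows of the two groups should be paired by their position within each group, which the question does not state. The helper column pos is hypothetical, and both date columns are kept because the two groups carry different dates at the same position.

```r
library(dplyr)
library(tidyr)

dta <- tibble(grp = rep(c('one', 'two'), each = 3),
              date = rep(c('2022-12-31', '2021-12-31'), 3),
              a = 1:6, b = 12:7)

wide <- dta %>%
  # stack a and b into name/value pairs
  pivot_longer(c(a, b), names_to = "names", values_to = "values") %>%
  # pos (hypothetical helper) numbers the long rows within each group
  group_by(grp) %>%
  mutate(pos = row_number()) %>%
  ungroup() %>%
  # spread each group into its own columns, suffixed with the group name
  pivot_wider(id_cols = pos, names_from = grp,
              values_from = c(date, names, values), names_sep = ".")

wide
```

If only one date column is wanted, date.two could be dropped with select() afterwards.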
I have a data frame with ten columns, but only five are of concern: A, B, C, D, and E. I also have a list of values. What's the best way to subset the rows whose value in column A, B, C, D, or E is included in the list of values?
If I were only concerned with a single column, I know I can use left_join(list_of_values, df$A) but I'm not sure how to do something similar with multiple columns.
The key here is if_any.
library(tidyverse)
set.seed(26)
sample_df <- tibble(col = rep(LETTERS[1:8], each = 5),
val = sample(1:10, 40, replace = TRUE),
ID = rep(1:5, 8)) |>
pivot_wider(names_from = col, values_from = val)
sample_df
#> # A tibble: 5 x 9
#> ID A B C D E F G H
#> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 8 4 10 7 2 7 4 3
#> 2 2 3 2 3 3 4 10 2 3
#> 3 3 9 6 6 8 2 10 10 3
#> 4 4 7 6 8 9 3 5 8 3
#> 5 5 6 3 4 1 9 7 9 1
vals <- c(1, 7)
#solution
sample_df |>
filter(if_any(A:E, ~. %in% vals))
#> # A tibble: 3 x 9
#> ID A B C D E F G H
#> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 8 4 10 7 2 7 4 3
#> 2 4 7 6 8 9 3 5 8 3
#> 3 5 6 3 4 1 9 7 9 1
Or use any and apply in base R:
#base solution
indx <- apply(sample_df[,which(colnames(sample_df) %in% LETTERS[1:5])], 1, \(x) any(x %in% vals))
sample_df[indx,]
#> # A tibble: 3 x 9
#> ID A B C D E F G H
#> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 8 4 10 7 2 7 4 3
#> 2 4 7 6 8 9 3 5 8 3
#> 3 5 6 3 4 1 9 7 9 1
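To make the "any column" versus "every column" distinction concrete, here is a small sketch contrasting if_any with its sibling if_all. The tibble small is made up for illustration and is not the data from the question.

```r
library(dplyr)

# hypothetical toy data, small enough to check by eye
small <- tibble(ID = 1:3, A = c(1, 2, 7), B = c(7, 3, 4))
vals <- c(1, 7)

# keep rows where at least one of A, B is in vals
any_match <- small |>
  filter(if_any(A:B, \(x) x %in% vals))

# keep rows where both A and B are in vals
all_match <- small |>
  filter(if_all(A:B, \(x) x %in% vals))
```

Here any_match keeps the rows with ID 1 and 3, while all_match keeps only ID 1.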
Given a list of tibbles
library(dplyr)
library(purrr)
ltb <- list(tibble(a=1:10, b=1:10), tibble(a=1:10, b=1:10), tibble(a=1:10, b=1:10))
map(ltb, ~head(., 2))
[[1]]
# A tibble: 2 × 2
a b
<int> <int>
1 1 1
2 2 2
[[2]]
# A tibble: 2 × 2
a b
<int> <int>
1 1 1
2 2 2
[[3]]
# A tibble: 2 × 2
a b
<int> <int>
1 1 1
2 2 2
and another single tibble whose number of rows matches the number of elements in the above list
tib <- tibble(data1 = letters[1:3], data2 = LETTERS[1:3], data3 = letters[1:3])
> tib
# A tibble: 3 × 3
data1 data2 data3
<chr> <chr> <chr>
1 a A a
2 b B b
3 c C c
how can I bind the first row of tib to the first tibble in ltb, the second row of tib to the second tibble in ltb? Obviously, this should recycle the rows in tib to (dynamically) match the number of rows in each tibble in ltb.
So the result should look something like this
map(newltb, ~head(., 3))
[[1]]
# A tibble: 3 × 5
a b data1 data2 data3
<int> <int> <chr> <chr> <chr>
1 1 1 a A a
2 2 2 a A a
3 3 3 a A a
[[2]]
# A tibble: 3 × 5
a b data1 data2 data3
<int> <int> <chr> <chr> <chr>
1 1 1 b B b
2 2 2 b B b
3 3 3 b B b
[[3]]
# A tibble: 3 × 5
a b data1 data2 data3
<int> <int> <chr> <chr> <chr>
1 1 1 c C c
2 2 2 c C c
3 3 3 c C c
I am unsure whether to use map2 or pmap; neither has worked for me.
You could split tib by rows and use map2 and bind_cols like so:
library(dplyr, warn.conflicts = FALSE)
library(purrr)
ltb <- list(tibble(a=1:10, b=1:10), tibble(a=1:10, b=1:10), tibble(a=1:10, b=1:10))
tib <- tibble(data1 = letters[1:3], data2 = LETTERS[1:3], data3 = letters[1:3])
tib_split <- tib %>%
split(seq(nrow(.)))
map2(ltb, tib_split, bind_cols)
#> [[1]]
#> # A tibble: 10 × 5
#> a b data1 data2 data3
#> <int> <int> <chr> <chr> <chr>
#> 1 1 1 a A a
#> 2 2 2 a A a
#> 3 3 3 a A a
#> 4 4 4 a A a
#> 5 5 5 a A a
#> 6 6 6 a A a
#> 7 7 7 a A a
#> 8 8 8 a A a
#> 9 9 9 a A a
#> 10 10 10 a A a
#>
#> [[2]]
#> # A tibble: 10 × 5
#> a b data1 data2 data3
#> <int> <int> <chr> <chr> <chr>
#> 1 1 1 b B b
#> 2 2 2 b B b
#> 3 3 3 b B b
#> 4 4 4 b B b
#> 5 5 5 b B b
#> 6 6 6 b B b
#> 7 7 7 b B b
#> 8 8 8 b B b
#> 9 9 9 b B b
#> 10 10 10 b B b
#>
#> [[3]]
#> # A tibble: 10 × 5
#> a b data1 data2 data3
#> <int> <int> <chr> <chr> <chr>
#> 1 1 1 c C c
#> 2 2 2 c C c
#> 3 3 3 c C c
#> 4 4 4 c C c
#> 5 5 5 c C c
#> 6 6 6 c C c
#> 7 7 7 c C c
#> 8 8 8 c C c
#> 9 9 9 c C c
#> 10 10 10 c C c
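Since the question mentions pmap, here is a hedged variant that uses it for the row-splitting step: pmap applied to a data frame calls the function once per row, so pmap(tib, tibble) should yield a list of one-row tibbles. The names tib_rows and newltb are illustrative, and the length check is an added safeguard that is not part of the original answer.

```r
library(dplyr)
library(purrr)

ltb <- list(tibble(a = 1:10, b = 1:10), tibble(a = 1:10, b = 1:10), tibble(a = 1:10, b = 1:10))
tib <- tibble(data1 = letters[1:3], data2 = LETTERS[1:3], data3 = letters[1:3])

# the pairing only makes sense if there is one row of tib per list element
stopifnot(length(ltb) == nrow(tib))

# one single-row tibble per row of tib
tib_rows <- pmap(tib, tibble)

# bind_cols recycles the single row to the length of each tibble in ltb
newltb <- map2(ltb, tib_rows, bind_cols)
```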
In base R, we can use a for loop:
for(i in seq_along(ltb)) ltb[[i]][names(tib)] <- tib[i,]
-output
> ltb
[[1]]
# A tibble: 10 × 5
a b data1 data2 data3
<int> <int> <chr> <chr> <chr>
1 1 1 a A a
2 2 2 a A a
3 3 3 a A a
4 4 4 a A a
5 5 5 a A a
6 6 6 a A a
7 7 7 a A a
8 8 8 a A a
9 9 9 a A a
10 10 10 a A a
[[2]]
# A tibble: 10 × 5
a b data1 data2 data3
<int> <int> <chr> <chr> <chr>
1 1 1 b B b
2 2 2 b B b
3 3 3 b B b
4 4 4 b B b
5 5 5 b B b
6 6 6 b B b
7 7 7 b B b
8 8 8 b B b
9 9 9 b B b
10 10 10 b B b
[[3]]
# A tibble: 10 × 5
a b data1 data2 data3
<int> <int> <chr> <chr> <chr>
1 1 1 c C c
2 2 2 c C c
3 3 3 c C c
4 4 4 c C c
5 5 5 c C c
6 6 6 c C c
7 7 7 c C c
8 8 8 c C c
9 9 9 c C c
10 10 10 c C c
I want to filter out all rows with row_number > 12 from a data frame like this:
head(dat1)
# A tibble: 6 × 7
date order_id product_id row_number shelf_number shelf_level position
<date> <chr> <chr> <dbl> <chr> <chr> <dbl>
1 2020-01-02 ES100025694747 000072489501 6 01 C 51
2 2020-01-02 ES100025694747 000058155401 2 39 B 51
3 2020-01-02 ES100025694747 000067694201 21 28 B 51
4 2020-01-02 ES100025699052 000057235001 9 05 B 31
5 2020-01-02 ES100025699052 000050456101 5 29 D 31
6 2020-01-02 ES100025699052 000067091601 2 17 D 11
The row_number column originally contains values like this:
dat1 %>% distinct(row_number)
# A tibble: 15 × 1
row_number
<dbl>
1 6
2 2
3 21
4 9
5 5
6 1
7 10
8 3
9 4
10 8
11 7
12 20
13 22
14 11
15 12
I filtered like this: dat1 <- dat1 %>% filter(row_number < '13')
The result: instead of keeping all values <13, it removes values from 2 to 9.
dat1 %>% distinct(row_number)
# A tibble: 4 × 1
row_number
<dbl>
1 1
2 10
3 11
4 12
What's wrong with my code?
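A likely explanation, sketched below: when a number is compared with a character string, R converts the number to a string and compares the two as text, so 2 < '13' is evaluated as "2" < "13", which is FALSE because "2" sorts after "1". Comparing against the number 13 without quotes avoids this. The tibble dat is a stand-in built from the distinct values shown in the question, not the original dat1.

```r
library(dplyr)

# stand-in for dat1: just the distinct row_number values from the question
dat <- tibble(row_number = c(6, 2, 21, 9, 5, 1, 10, 3, 4, 8, 7, 20, 22, 11, 12))

# quoted '13': numbers are coerced to text and compared character by character
as_text <- dat %>% filter(row_number < '13')

# unquoted 13: an ordinary numeric comparison
as_number <- dat %>% filter(row_number < 13)

as_text$row_number
as_number$row_number
```

With the quoted value only 1, 10, 11 and 12 survive, which matches the result in the question; with the unquoted value every row_number from 1 to 12 is kept.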
Imagine we have a tibble like the one shown below. In theory, the first column acts simply as row names, which must have a one-to-one correspondence with the column names.
For example, excluding the first column (row_name), the third column from the left is named G, but the corresponding row is E.
I was wondering how we could reorder the rows (e.g., move the row titled G two rows up) so that the rows and columns match?
out <- tibble(row_name=factor(c("A","B","E","F","G")),`A`=as.character(1:5),`B`=as.character(c(2,NA,0:2)),
`G`=as.character(4:8),`E`=as.character(4:8),`F`=as.character(4:8))
# row_name A B G E F
# <fct> <chr> <chr> <chr> <chr> <chr>
#1 A 1 2 4 4 4
#2 B 2 NA 5 5 5
#3 E 3 0 6 6 6
#4 F 4 1 7 7 7
#5 G 5 2 8 8 8
# EXPECTED OUTPUT:
# row_name A B G E F
# <fct> <chr> <chr> <chr> <chr> <chr>
#1 A 1 2 4 4 4
#2 B 2 NA 5 5 5
#5 G 5 2 8 8 8
#3 E 3 0 6 6 6
#4 F 4 1 7 7 7
If we want to reorder the rows, use match within slice:
library(dplyr)
out %>%
slice(match(names(.)[-1], row_name))
-output
# A tibble: 5 x 6
row_name A B G E F
<fct> <chr> <chr> <chr> <chr> <chr>
1 A 1 2 4 4 4
2 B 2 <NA> 5 5 5
3 G 5 2 8 8 8
4 E 3 0 6 6 6
5 F 4 1 7 7 7
Or within arrange
out %>%
arrange(factor(row_name, levels = names(.)[-1]))
-output
# A tibble: 5 x 6
row_name A B G E F
<fct> <chr> <chr> <chr> <chr> <chr>
1 A 1 2 4 4 4
2 B 2 <NA> 5 5 5
3 G 5 2 8 8 8
4 E 3 0 6 6 6
5 F 4 1 7 7 7
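For comparison, the same match idea can be sketched in base R without dplyr verbs. The names idx and reordered are illustrative only.

```r
library(tibble)

out <- tibble(row_name = factor(c("A", "B", "E", "F", "G")),
              `A` = as.character(1:5), `B` = as.character(c(2, NA, 0:2)),
              `G` = as.character(4:8), `E` = as.character(4:8), `F` = as.character(4:8))

# for each column name (skipping row_name), find the row that carries that label
idx <- match(names(out)[-1], out$row_name)

# index the rows in that order
reordered <- out[idx, ]
reordered
```

If a column name had no matching row, match would return NA for it, so checking anyNA(idx) before indexing may be worthwhile.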
Here's the example data I have.
df1 <- tibble(a=1:4,
b=1:4,
c=1:4,
d=1:4,
e=1:4)
# A tibble: 4 x 5
a b c d e
<int> <int> <int> <int> <int>
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4 4
df2 <- tibble(b=1:4,
d=1:4,
e=1:4)
b d e
<int> <int> <int>
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
I would like to add the columns in common so that I can get a data frame like this
a b c d e
<int> <dbl> <int> <dbl> <dbl>
1 1 2 1 2 2
2 2 4 2 4 4
3 3 6 3 6 6
4 4 8 4 8 8
Is there an easy way to do this in R with tools like dplyr?
An easier option is to subset the first dataset 'df1' by the column names of 'df2' (assuming all the columns in 'df2' are present in 'df1'), add 'df2' to that subset, and assign the result back to those columns in 'df1':
df1[names(df2)] <- df1[names(df2)] + df2
Or using dplyr
library(dplyr)
df1 %>%
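# across() returns the selected columns as a data frame; the sum is unpacked back into b, d, e
mutate(across(all_of(names(df2))) + df2)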
-output
# A tibble: 4 x 5
# a b c d e
# <int> <int> <int> <int> <int>
#1 1 2 1 2 2
#2 2 4 2 4 4
#3 3 6 3 6 6
#4 4 8 4 8 8
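The base R answer above assumes every column of 'df2' exists in 'df1'. If that is not guaranteed, one hedged variation is to restrict both sides to the shared names with intersect. The extra column z in df2 below is hypothetical, added only to show that unmatched columns are ignored.

```r
library(tibble)

df1 <- tibble(a = 1:4, b = 1:4, c = 1:4, d = 1:4, e = 1:4)
# z is a hypothetical column that has no counterpart in df1
df2 <- tibble(b = 1:4, d = 1:4, e = 1:4, z = 1:4)

# names present in both data frames, in df1's order
common <- intersect(names(df1), names(df2))

# add only the shared columns and write them back into df1
df1[common] <- df1[common] + df2[common]
df1
```

This still assumes the two data frames have the same number of rows in the same order.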