R: How to keep names while unnesting a doubly nested tibble?

At the moment I'm trying to figure out how to keep the names of an inner and an outer list nested within a tibble while unnesting.
The .id parameter of the unnest function is the closest I found, but it numbers the values instead of using the given names.
Here is an MWE with my idea of the final tibble:
library(dplyr)
library(tidyr)
df.1 <- tibble(
  x = list("Foo","Bar"),
  y = list(
    list(a = list(aa = 1, ab = 2), b = list(ba = 6, bb = 22)),
    list(c = list(ca = 561, cb = 35), d = list(da = 346, db = 17))
  )
)
df.2 <- unnest(df.1, .id = "name.outher")
df.3 <- unnest(df.2, .id = "name.inner")
# How do I get from this:
#
#-----------------------------------------------------------------------
# x | y |
#-----+----------------------------------------------------------------+
# Foo | list(a = list(aa = 1, ab = 2), b = list(ba = 6, bb = 22)) |
#-----+----------------------------------------------------------------+
# Bar | list(c = list(ca = 561, cb = 35), d = list(da = 346, db = 17)) |
#-----------------------------------------------------------------------
#
# to this:
#
#---------------------------------------
# x | name.outher | y | name.inner |
#-----+-------------+-----+------------+
# Foo | a | 1 | aa |
#-----+-------------+-----+------------+
# Foo | a | 2 | ab |
#-----+-------------+-----+------------+
# Foo | b | 6 | ba |
#-----+-------------+-----+------------+
# Foo | b | 22 | bb |
#-----+-------------+-----+------------+
# Bar | c | 561 | ca |
#-----+-------------+-----+------------+
# Bar | c | 35 | cb |
#-----+-------------+-----+------------+
# Bar | d | 346 | da |
#-----+-------------+-----+------------+
# Bar | d | 17 | db |
#---------------------------------------
#
# instead of this:
#
#---------------------------------------
# x | name.outher | y | name.inner |
#-----+-------------+-----+------------+
# Foo | 1 | 1 | 1 |
#-----+-------------+-----+------------+
# Foo | 1 | 2 | 1 |
#-----+-------------+-----+------------+
# Foo | 1 | 6 | 2 |
#-----+-------------+-----+------------+
# Foo | 1 | 22 | 2 |
#-----+-------------+-----+------------+
# Bar | 2 | 561 | 3 |
#-----+-------------+-----+------------+
# Bar | 2 | 35 | 3 |
#-----+-------------+-----+------------+
# Bar | 2 | 346 | 4 |
#-----+-------------+-----+------------+
# Bar | 2 | 17 | 4 |
#---------------------------------------
Do you have any idea how I can preserve the names while unnesting this data structure?

We can use melt from reshape2:
library(reshape2)
library(dplyr)
df.1 %>%
  .$y %>%
  melt %>%
  select(x = L1, name.outher = L2, y = value, name.inner = L3)
# x name.outher y name.inner
#1 1 a 1 aa
#2 1 a 2 ab
#3 1 b 6 ba
#4 1 b 22 bb
#5 2 c 561 ca
#6 2 c 35 cb
#7 2 d 346 da
#8 2 d 17 db
Or use map_df and as_tibble:
library(tidyverse)
df.1 %>%
  pull(y) %>%
  map_df(~ as_tibble(.x) %>%
           map_df(~ as_tibble(.x) %>%
                    gather(name.inner, y), .id = 'name.outer'),
         .id = 'x')
# A tibble: 8 x 4
# x name.outer name.inner y
# <chr> <chr> <chr> <dbl>
#1 1 a aa 1
#2 1 a ab 2
#3 1 b ba 6
#4 1 b bb 22
#5 2 c ca 561
#6 2 c cb 35
#7 2 d da 346
#8 2 d db 17
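Both answers above return x as the list position (1, 2) rather than "Foo"/"Bar". A minimal base R sketch, assuming the same nested structure as df.1 is rebuilt as a plain vector and list (the names x_vals, y_vals and res are illustrative, not from the question), walks both levels of names explicitly so every label survives:

```r
# Same data as df.1, rebuilt without tibble so the sketch is self-contained
x_vals <- c("Foo", "Bar")
y_vals <- list(
  list(a = list(aa = 1, ab = 2), b = list(ba = 6, bb = 22)),
  list(c = list(ca = 561, cb = 35), d = list(da = 346, db = 17))
)

rows <- lapply(seq_along(y_vals), function(i) {
  outer <- y_vals[[i]]
  # one small data frame per outer name, carrying both levels of names
  parts <- lapply(names(outer), function(o) {
    inner <- outer[[o]]
    data.frame(
      x = x_vals[i],                          # recycled to every row
      name.outher = o,                        # outer list name
      y = unlist(inner, use.names = FALSE),   # the leaf values
      name.inner = names(inner),              # inner list names
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, parts)
})
res <- do.call(rbind, rows)
rownames(res) <- NULL
res
```

If df.1 itself is available, x_vals can be taken as unlist(df.1$x) and y_vals as df.1$y.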

Related

How to conditionally duplicate and edit rows in R

I need to add 2 rows to a data frame that have the same values as existing rows. For example, below I would need to add rows with "a" = 3 and the same "b" values as "a" = 2, going from this:
| a | b |
| --| ------|
| 1 | higha |
| 1 | lowa |
| 2 | highb |
| 2 | lowb |
to this:
| a | b |
| --| ------|
| 1 | higha |
| 1 | lowa |
| 2 | highb |
| 2 | lowb |
| 3 | highb |
| 3 | lowb |
A one-liner in base R would be:
`rownames<-`(rbind(df, within(df[df$a == 2,], a <- 3)), NULL)
#> a b
#> 1 1 higha
#> 2 1 lowa
#> 3 2 highb
#> 4 2 lowb
#> 5 3 highb
#> 6 3 lowb
We may use uncount from tidyr:
library(dplyr)
library(tidyr)
df %>%
  uncount((a == 2) + 1) %>%
  mutate(a = replace(a, duplicated(b) & a == 2, 3)) %>%
  arrange(a)
Output:
# A tibble: 6 × 2
a b
<dbl> <chr>
1 1 higha
2 1 lowa
3 2 highb
4 2 lowb
5 3 highb
6 3 lowb
Or with base R:
i1 <- df$a == 2  # rows to copy
# append copies of those rows, with a set to 3
df[nrow(df) + seq_len(sum(i1)),] <- data.frame(a = 3, b = df$b[i1])
Data:
df <- data.frame(a = rep(1:2, each = 2),
b = c("higha", "lowa", "highb", "lowb"))
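The base R approach can also be wrapped in a small reusable function. This is a hypothetical helper (dup_group is not from any package), sketched under the assumption that you always want to copy every row matching one value and relabel the copies with another:

```r
df <- data.frame(a = rep(1:2, each = 2),
                 b = c("higha", "lowa", "highb", "lowb"))

# Hypothetical helper: duplicate the rows where df[[col]] == from,
# set their `col` value to `to`, and append them at the bottom
dup_group <- function(df, col, from, to) {
  extra <- df[df[[col]] == from, , drop = FALSE]
  extra[[col]] <- to
  out <- rbind(df, extra)
  rownames(out) <- NULL  # renumber rows 1..n
  out
}

res <- dup_group(df, "a", 2, 3)
res
```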

Aggregate rows into new column based on common value in another column in R

I have two data frames.
df1 looks like this:
| NOC | 2007 | 2008 |
|:---- |:------:| -----:|
| A | 100 | 5 |
| B | 100 | 5 |
| C | 100 | 5|
| D | 20 | 2 |
| E | 10 | 12 |
| F | 2 | 1 |
df2
| NOC | GROUP |
|:---- |:------:|
| A | aa|
| B | aa |
| C | aa |
| D | bb |
| E | bb |
| F | cc |
I would like to create a new df3 that aggregates the columns 2007 and 2008 by GROUP, assigning each row the sum over all rows with the same group identity, so my df3 would look like this:
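| NOC | 2007 | 2008 | GROUP | s2007 | s2008 |
| --- | ---- | ---- | ----- | ----- | ----- |
| A | 100 | 5 | aa | 300 | 15 |
| B | 100 | 5 | aa | 300 | 15 |
| C | 100 | 5 | aa | 300 | 15 |
| D | 20 | 2 | bb | 30 | 14 |
| E | 10 | 12 | bb | 30 | 14 |
| F | 2 | 1 | cc | 2 | 1 |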
My code is not very efficient. I first merged df1 with df2 by NOC into df3:
df3<-merge(df1, df2, by="NOC",all.x=TRUE)
Then I used dplyr to summarise into df4, creating s2007 and s2008:
library(dplyr)
df3 %>%
  group_by(GROUP) %>%
  summarise(num = n(),
            s2007 = sum(`2007`), s2008 = sum(`2008`)) -> df4
Then I merged df1 with df4 again to create my final data frame.
I have two questions:
1. Is there a more efficient way?
2. Since my data frame contains annual data for 2007-2030, I currently write out the summarise call for each year. Is there a faster way to summarise all the columns except NOC?
Thank you!
First, a small piece of advice: never use numbers as column names; it can cause many glitches.
library(tidyverse)
df1 %>%
  left_join(df2, by = 'NOC') %>%
  group_by(GROUP) %>%
  mutate(across(c(`2007`, `2008`), ~ sum(.), .names = 's.{.col}'))
# A tibble: 6 x 6
# Groups: GROUP [3]
NOC `2007` `2008` GROUP s.2007 s.2008
<chr> <int> <int> <chr> <int> <int>
1 A 100 5 aa 300 15
2 B 100 5 aa 300 15
3 C 100 5 aa 300 15
4 D 20 2 bb 30 14
5 E 10 12 bb 30 14
6 F 2 1 cc 2 1
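For the follow-up about 2007-2030, one way to avoid writing out every year is to loop over all columns except NOC. A minimal base R sketch, assuming df1 and df2 as shown in the question (only 2007 and 2008 here) and using ave() to repeat each group sum on every row of its group; the "s" prefix for new columns follows the question's expected output:

```r
df1 <- data.frame(NOC = c("A", "B", "C", "D", "E", "F"),
                  `2007` = c(100, 100, 100, 20, 10, 2),
                  `2008` = c(5, 5, 5, 2, 12, 1),
                  check.names = FALSE)  # keep the numeric column names as-is
df2 <- data.frame(NOC = c("A", "B", "C", "D", "E", "F"),
                  GROUP = c("aa", "aa", "aa", "bb", "bb", "cc"))

df3 <- merge(df1, df2, by = "NOC", all.x = TRUE)

# every year column, however many there are
year_cols <- setdiff(names(df1), "NOC")
for (col in year_cols) {
  # ave() returns the group sum repeated on every row of that group
  df3[[paste0("s", col)]] <- ave(df3[[col]], df3$GROUP, FUN = sum)
}
df3
```

Because year_cols is derived from names(df1), adding columns up to 2030 needs no change to the loop.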

R: How to filter groups as long as they contain a combination of values?

I have a df like this:
VisitID | Item |
1 | A |
1 | B |
1 | C |
1 | D |
2 | A |
2 | D |
2 | B |
3 | B |
3 | C |
4 | D |
4 | C |
In R, how do I filter for VisitIDs as long as they contain Item A & B?
Expected Outcome:
VisitID | Item |
1 | A |
1 | B |
1 | C |
1 | D |
2 | A |
2 | D |
2 | B |
I tried df %>% group_by(VisitID) %>% filter(any(Item == 'A' & Item == 'B')), but it doesn't work.
library(readr)
df <- read_delim("ID | Item
1 | A
1 | B
2 | A
3 | B
1 | C
4 | C
5 | B
3 | A
4 | A
5 | D", delim = "|", trim_ws = TRUE)
Since you want both "A" and "B", you can use all:
library(dplyr)
df %>% group_by(VisitID) %>% filter(all(c("A", "B") %in% Item))
# VisitID Item
# <int> <chr>
#1 1 A
#2 1 B
#3 1 C
#4 1 D
#5 2 A
#6 2 D
#7 2 B
Or, if you want to use any, apply it to each item separately:
df %>% group_by(VisitID) %>% filter(any(Item == 'A') && any(Item == 'B'))
An option with data.table:
library(data.table)
setDT(df)[, .SD[all(c("A", "B") %in% Item)], VisitID]
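The original attempt fails because Item == 'A' & Item == 'B' is evaluated row by row, and a single row can never hold both values, so any() is always FALSE. A base R sketch of the same all(... %in% ...) idea, assuming the question's table is entered as df with columns VisitID and Item:

```r
df <- data.frame(
  VisitID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
  Item    = c("A", "B", "C", "D", "A", "D", "B", "B", "C", "D", "C")
)

# one logical per visit: does it contain every required item?
required <- c("A", "B")
items_by_visit <- split(df$Item, df$VisitID)
has_all <- vapply(items_by_visit,
                  function(items) all(required %in% items),
                  logical(1))

# keep every row of the qualifying visits
res <- df[df$VisitID %in% names(has_all)[has_all], ]
res
```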

Add Previous Row to Corresponding Column by Group in R

Here is a reproducible example:
id <- c(1,1,1,1,2,2,1,1)
group <- c("a","b","c","d","a","b","c","d")
df <- data.frame(id, group)
I want something like this as the end result:
+====+========+========+
| id | group1 | group2 |
+====+========+========+
| 1 | a | b |
+----+--------+--------+
| 1 | b | c |
+----+--------+--------+
| 1 | c | d |
+----+--------+--------+
| 1 | d | - |
+----+--------+--------+
| 2 | a | b |
+----+--------+--------+
| 2 | b | - |
+----+--------+--------+
| 1 | c | d |
+----+--------+--------+
| 1 | d | - |
+----+--------+--------+
Note that the order of the IDs matters; I also have a timestamp column.
One solution with dplyr and rleid from data.table:
library(dplyr)
df %>%
  mutate(id2 = data.table::rleid(id)) %>%
  group_by(id2) %>%
  mutate(group2 = lead(group))
# A tibble: 8 x 4
# Groups: id2 [3]
id group id2 group2
<dbl> <fct> <int> <fct>
1 1.00 a 1 b
2 1.00 b 1 c
3 1.00 c 1 d
4 1.00 d 1 NA
5 2.00 a 2 b
6 2.00 b 2 NA
7 1.00 c 3 d
8 1.00 d 3 NA
If I understood your question correctly, you can use the following function:
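id <- c(1,1,1,1,2,2,1,1)
group <- c("a","b","c","d","a","b","c","d")
df <- data.frame(id, group)
add_group2 <- function(df) {
  n <- nrow(df)  # use the data frame's own row count, not the global `group`
  # shift group up by one row; the last row has no successor
  group2 <- c(as.character(df$group[2:n]), "-")
  # where id changes between row i and row i + 1, row i has no successor in its run
  group2[which(c(df$id[-n] - df$id[2:n], 0) != 0)] <- "-"
  return(data.frame(df, group2))
}
add_group2(df)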
Result should be:
id group group2
1 1 a b
2 1 b c
3 1 c d
4 1 d -
5 2 a b
6 2 b -
7 1 c d
8 1 d -
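Both answers split on runs of consecutive id values rather than on id itself, which is why the repeated id 1 at the end starts a new block. A base R sketch of the same idea, using cumsum() over changes in id as a stand-in for data.table::rleid() and ave() to shift group within each run (the last row of a run gets NA, as in the dplyr answer):

```r
df <- data.frame(id = c(1, 1, 1, 1, 2, 2, 1, 1),
                 group = c("a", "b", "c", "d", "a", "b", "c", "d"),
                 stringsAsFactors = FALSE)

# run id: starts at 1 and increases whenever id differs from the previous row
run <- cumsum(c(TRUE, diff(df$id) != 0))

# within each run, take the next row's group; the last row of a run gets NA
df$group2 <- ave(df$group, run, FUN = function(g) c(g[-1], NA))
df
```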

R data.table get unique id with maximum other id

I have a data.table like so:
id | id2 | val
--------------
1 | 1 | A
1 | 2 | B
2 | 3 | C
2 | 4 | D
3 | 5 | E
3 | 6 | F
I want to group by the id column and return the row with the maximum id2 for each id, like so:
id | id2 | val
--------------
1 | 2 | B
2 | 4 | D
3 | 6 | F
It's easy in SQL:
SELECT id, MAX(id2) FROM tbl GROUP BY id;
But I want to know how to do this with data.table. So far I have:
tbl[, .(id2 = max(id2)), by = id]
but I don't know how to get the val part.
df <- read.table(header = TRUE, text = "id id2 val
1 1 A
1 2 B
2 3 C
2 4 D
3 5 E
3 6 F")
library(data.table)
setDT(df)
df[, max_id2 := max(id2), by = id]  # per-id maximum of id2
df <- df[id2 == max_id2, ]           # keep the rows that reach it
df[, max_id2 := NULL]                # drop the helper column
id id2 val
1: 1 2 B
2: 2 4 D
3: 3 6 F
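A variant that needs no helper column is to sort so the largest id2 comes first within each id, then keep the first row per id. A base R sketch on the same data (res is an illustrative name):

```r
df <- read.table(header = TRUE, text = "id id2 val
1 1 A
1 2 B
2 3 C
2 4 D
3 5 E
3 6 F")

# order by id, then by id2 descending
ord <- df[order(df$id, -df$id2), ]
# the first row of each id is now the one with the maximum id2
res <- ord[!duplicated(ord$id), ]
rownames(res) <- NULL
res
```

Unlike the id2 == max_id2 filter above, this keeps exactly one row per id when several rows tie for the maximum.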
