Matching of positive to negative numbers in the same group - R - r

I would like to do the following thing:
id calendar_week value
1 1 10
2 2 2
3 2 -2
4 2 3
5 3 10
6 3 -10
The output I want is the list of ids (or the rows) that have a positive-to-negative match within a given calendar_week. For example, I want ids 2 and 3 because -2 matches 2 in calendar week 2. I don't want id 4 because there is no -3 value in calendar week 2, and so on.
output:
id calendar_week value
2 2 2
3 2 -2
5 3 10
6 3 -10

Could also do:
library(dplyr)
df %>%
group_by(calendar_week, ab = abs(value)) %>%
filter(n() > 1) %>% ungroup() %>%
select(-ab)
Output:
# A tibble: 4 x 3
id calendar_week value
<int> <int> <int>
1 2 2 2
2 3 2 -2
3 5 3 10
4 6 3 -10
Given your additional clarifications, you could do:
df %>%
group_by(calendar_week, value) %>%
mutate(idx = row_number()) %>%
group_by(calendar_week, idx, ab = abs(value)) %>%
filter(n() > 1) %>% ungroup() %>%
select(-idx, -ab)
On a modified data frame:
id calendar_week value
1 1 1 10
2 2 2 2
3 3 2 -2
4 3 2 2
5 4 2 3
6 5 3 10
7 6 3 -10
8 7 4 10
9 8 4 10
This gives:
# A tibble: 4 x 3
id calendar_week value
<int> <int> <int>
1 2 2 2
2 3 2 -2
3 5 3 10
4 6 3 -10
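To try the pipeline above, the modified data has to be built first. The following is a minimal sketch that assumes the first column of the printed table is a row number (so the `id` values are 1, 2, 3, 3, 4, ...) and that a plain `data.frame` is an acceptable stand-in for the original object; the name `matched` is only illustrative.

```r
library(dplyr)

# Modified example data, typed in from the table above
# (assumption: the leading column in the printout is the row number)
df <- data.frame(
  id            = c(1, 2, 3, 3, 4, 5, 6, 7, 8),
  calendar_week = c(1, 2, 2, 2, 2, 3, 3, 4, 4),
  value         = c(10, 2, -2, 2, 3, 10, -10, 10, 10)
)

matched <- df %>%
  group_by(calendar_week, value) %>%
  mutate(idx = row_number()) %>%                      # number repeats of the same value within a week
  group_by(calendar_week, idx, ab = abs(value)) %>%   # pair the k-th +x with the k-th -x
  filter(n() > 1) %>%                                 # drop values without a partner
  ungroup() %>%
  select(-idx, -ab)

matched
```

The extra `idx` column is what stops the second `2` in week 2 and the two `10`s in week 4 from being kept: each of them ends up alone in its group.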

Using tidyverse :
library(tidyverse)
df %>%
group_by(calendar_week) %>%
nest() %>%
mutate(values = map_chr(data, ~ str_c(.x$value, collapse = ', '))) %>%
unnest(data) %>%
filter(str_detect(values, as.character(-value))) %>%
select(-values)
Output :
calendar_week id value
<dbl> <int> <dbl>
1 2 2 2
2 2 3 -2
3 3 5 10
4 3 6 -10
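One caveat with the string-based approach above is that `str_detect()` matches substrings, so a value of 2 would presumably also "find" its negative inside a `-20` in the same week. A possible alternative, sketched here under the assumption that any opposite-signed value in the same week counts as a match (no one-to-one pairing), compares the numbers directly with `%in%`. This is not the answer author's method, and the name `matched` is illustrative.

```r
library(dplyr)

# Original example data from the question
df <- data.frame(
  id            = 1:6,
  calendar_week = c(1, 2, 2, 2, 3, 3),
  value         = c(10, 2, -2, 3, 10, -10)
)

matched <- df %>%
  group_by(calendar_week) %>%
  # keep a row if its negated value also occurs in the same week;
  # zeros are excluded because 0 would trivially match itself
  filter(value != 0, (-value) %in% value) %>%
  ungroup()

matched
```

Because the comparison is numeric, values such as 2 and -20 can no longer be confused with each other.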

If as stated in the comments only a single match is required you could try:
library(dplyr)
df %>%
group_by(calendar_week, nvalue = abs(value)) %>%
filter(!duplicated(value)) %>%
filter(sum(value) == 0) %>%
ungroup() %>%
select(-nvalue)
id calendar_week value
<int> <int> <int>
1 2 2 2
2 3 2 -2
3 5 3 10
4 6 3 -10

Related

How to count data frame elements grouped by multiple conditions in dplyr?

I am trying to use dplyr to count elements grouped by multiple conditions (columns) in a data frame. In the below example (dataframe output is at the top (except that I manually inserted the 2 right-most columns to explain what I am trying to do), and R code is underneath), I am trying to count the joint groupings of the Element and Group columns. My multiple condition grouping attempt is eleGrpCnt. Any recommendations for the correct way to do this in dplyr? I thought that group_by a combined (Element, Group) would work.
desired
Element Group origOrder eleCnt eleGrpCnt eleGrpCnt explanation
<chr> <dbl> <int> <int> <int> <comment> <comment>
1 B 0 1 1 1 1 1st grouping of B where Group = 0
2 R 0 2 1 1 1 1st grouping of R where Group = 0
3 R 1 3 2 1 2 2nd grouping of R where Group = 1
4 R 1 4 3 2 2 2nd grouping of R where Group = 1
5 B 0 5 2 2 1 1st grouping of B where Group = 0
6 X 2 6 1 1 1 1st grouping of X where Group = 2
7 X 2 7 2 2 1 1st grouping of X where Group = 2
8 X 0 8 3 1 2 2nd grouping of X where Group = 0
9 X 0 9 4 2 2 2nd grouping of X where Group = 0
10 X -1 10 5 1 3 3rd grouping of X where Group = -1
library(dplyr)
myData6 <-
data.frame(
Element = c("B","R","R","R","B","X","X","X","X","X"),
Group = c(0,0,1,1,0,2,2,0,0,-1)
)
myData6 %>%
mutate(origOrder = row_number()) %>%
group_by(Element) %>%
mutate(eleCnt = row_number()) %>%
ungroup() %>%
group_by(Element, Group) %>%
mutate(eleGrpCnt = row_number())%>%
ungroup()
If you group by element then the numbers you are looking for are simply the matches of Group against the unique values of Group:
library(dplyr)
myData6 %>%
mutate(origOrder = row_number()) %>%
group_by(Element) %>%
mutate(eleCnt = row_number()) %>%
ungroup() %>%
group_by(Element) %>%
mutate(eleGrpCnt = match(Group, unique(Group)))
#> # A tibble: 10 x 5
#> # Groups: Element [3]
#> Element Group origOrder eleCnt eleGrpCnt
#> <chr> <dbl> <int> <int> <dbl>
#> 1 B 0 1 1 1
#> 2 R 0 2 1 1
#> 3 R 1 3 2 2
#> 4 R 1 4 3 2
#> 5 B 0 5 2 1
#> 6 X 2 6 1 1
#> 7 X 2 7 2 1
#> 8 X 0 8 3 2
#> 9 X 0 9 4 2
#> 10 X -1 10 5 3
Created on 2022-09-11 with reprex v2.0.2
Here's one approach; I'm sorting by Group value but if you want to change the order to match original appearance order we could add a step.
myData6 %>%
mutate(origOrder = row_number()) %>%
group_by(Element) %>%
mutate(eleCnt = row_number()) %>%
ungroup() %>%
arrange(Element, Group) %>%
group_by(Element) %>%
mutate(eleGrpCnt = cumsum(Group != lag(Group, default = -999))) %>%
ungroup() %>%
arrange(origOrder)
# A tibble: 10 × 5
Element Group origOrder eleCnt eleGrpCnt
<chr> <dbl> <int> <int> <int>
1 B 0 1 1 1
2 R 0 2 1 1
3 R 1 3 2 2
4 R 1 4 3 2
5 B 0 5 2 1
6 X 2 6 1 3
7 X 2 7 2 3
8 X 0 8 3 2
9 X 0 9 4 2
10 X -1 10 5 1
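The two answers number the groups differently: the first by order of appearance within each Element, the second by sorted Group value. As a small sketch, both numberings can be put side by side; `dense_rank()` is used here as a stand-in for the `arrange()` plus `cumsum()` steps, and the column names `byAppearance` and `bySortedValue` are made up for illustration.

```r
library(dplyr)

myData6 <- data.frame(
  Element = c("B", "R", "R", "R", "B", "X", "X", "X", "X", "X"),
  Group   = c(0, 0, 1, 1, 0, 2, 2, 0, 0, -1)
)

numbered <- myData6 %>%
  group_by(Element) %>%
  mutate(
    byAppearance  = match(Group, unique(Group)),  # 1 = first Group value seen for this Element
    bySortedValue = dense_rank(Group)             # 1 = smallest Group value for this Element
  ) %>%
  ungroup()

numbered
```

For Element X the first column gives 1, 1, 2, 2, 3 (the desired output in the question), while the second gives 3, 3, 2, 2, 1.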

DPLYR - merging rows together using a column value as a conditional

I have a series of rows in a single dataframe. I'm trying to aggregate the first two rows for each ID, i.e. I want to combine events 1 and 2 for ID 1 into a single row, events 1 and 2 for ID 2 into a single row, etc., but leave event 3 completely untouched.
id <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
event <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
score <- c(3,NA,1,3,NA,2,6,NA,1,8,NA,2,4,NA,1)
score2 <- c(NA,4,1,NA,5,2,NA,0,3,NA,5,6,NA,8,7)
df <- tibble(id, event, score, score2)
# A tibble: 15 x 4
id event score score2
<dbl> <dbl> <dbl> <dbl>
1 1 1 3 NA
2 1 2 NA 4
3 1 3 1 1
4 2 1 3 NA
5 2 2 NA 5
6 2 3 2 2
7 3 1 6 NA
8 3 2 NA 0
9 3 3 1 3
10 4 1 8 NA
11 4 2 NA 5
12 4 3 2 6
13 5 1 4 NA
14 5 2 NA 8
15 5 3 1 7
I've tried :
df_merged <- df %>% group_by(id) %>% summarise_all(funs(min(as.character(.), na.rm = TRUE)))
This aggregates the rows nicely, but then I struggle to merge the result back into the original dataframe/tibble (there are really about 300 different "score" columns in the full dataset, so a right_join is a headache with score.x, score.y, score2.x, score2.y all over the place...)
Ideally, the situation would need to be dplyr as the rest of my code runs on this!
EDIT:
Ideally, my expected output would be:
# A tibble: 10 x 4
id event score score2
<dbl> <dbl> <dbl> <dbl>
1 1 1 3 4
3 1 3 1 1
4 2 1 3 5
6 2 3 2 2
7 3 1 6 0
9 3 3 1 3
10 4 1 8 5
12 4 3 2 6
13 5 1 4 8
15 5 3 1 7
We may change the order of NA elements with replace
library(dplyr)
df %>%
group_by(id) %>%
mutate(across(starts_with('score'),
~replace(., 1:2, .[1:2][order(is.na(.[1:2]))]))) %>%
ungroup %>%
filter(if_all(starts_with('score'), Negate(is.na)))
-output
# A tibble: 10 x 4
id event score score2
<dbl> <dbl> <dbl> <dbl>
1 1 1 3 4
2 1 3 1 1
3 2 1 3 5
4 2 3 2 2
5 3 1 6 0
6 3 3 1 3
7 4 1 8 5
8 4 3 2 6
9 5 1 4 8
10 5 3 1 7
Here is an alternative way to achieve your task with fill from tidyr package:
library(dplyr)
library(tidyr)
df %>%
group_by(id) %>%
fill(everything(), .direction = "down") %>%
fill(everything(), .direction = "up") %>%
slice(1,3)
id event score score2
<dbl> <dbl> <dbl> <dbl>
1 1 1 3 4
2 1 3 1 1
3 2 1 3 5
4 2 3 2 2
5 3 1 6 0
6 3 3 1 3
7 4 1 8 5
8 4 3 2 6
9 5 1 4 8
10 5 3 1 7
How about this?
library(dplyr)
df_e12 <- df %>%
filter(event %in% c(1, 2)) %>%
group_by(id) %>%
mutate(across(starts_with("score"), ~min(.x, na.rm = TRUE))) %>%
ungroup() %>%
distinct(id, .keep_all = TRUE)
df_e3 <- df %>%
filter(event == 3)
df <- bind_rows(df_e12, df_e3) %>%
arrange(id, event)
df
# A tibble: 10 x 4
id event score score2
<dbl> <dbl> <dbl> <dbl>
1 1 1 3 4
2 1 3 1 1
3 2 1 3 5
4 2 3 2 2
5 3 1 6 0
6 3 3 1 3
7 4 1 8 5
8 4 3 2 6
9 5 1 4 8
10 5 3 1 7
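Another way to look at the task is to relabel event 2 as event 1 and then collapse each (id, event) pair by taking the first non-missing value per column. The sketch below assumes that events 1 and 2 never both hold a value in the same score column (true for the example data); `merged` is an illustrative name, and `starts_with("score")` would have to match the roughly 300 real column names.

```r
library(dplyr)

df <- data.frame(
  id     = rep(1:5, each = 3),
  event  = rep(1:3, times = 5),
  score  = c(3, NA, 1, 3, NA, 2, 6, NA, 1, 8, NA, 2, 4, NA, 1),
  score2 = c(NA, 4, 1, NA, 5, 2, NA, 0, 3, NA, 5, 6, NA, 8, 7)
)

merged <- df %>%
  mutate(event = if_else(event == 2L, 1L, event)) %>%   # fold event 2 into event 1
  group_by(id, event) %>%
  summarise(
    across(starts_with("score"), ~ .x[!is.na(.x)][1]),  # first non-NA value, NA if none
    .groups = "drop"
  )

merged
```

Since everything happens in one grouped `summarise()`, no join back to the original data (and no `.x`/`.y` suffixes) is needed.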

dplyr: replace grouping values with 1 through N groups [duplicate]

I have data where each row represents one observation from one person. For example:
library(dplyr)
dat <- tibble(ID = rep(sample(1111:9999, 3), each = 3),
X = 1:9)
# A tibble: 9 x 2
ID X
<int> <int>
1 9573 1
2 9573 2
3 9573 3
4 7224 4
5 7224 5
6 7224 6
7 7917 7
8 7917 8
9 7917 9
I want to replace these IDs with a different value. It can be anything, but the easiest (and preferred) solution is just to replace them with 1:n group numbers. So the desired result would be:
# A tibble: 9 x 2
ID X
<int> <int>
1 1 1
2 1 2
3 1 3
4 2 4
5 2 5
6 2 6
7 3 7
8 3 8
9 3 9
Probably something that starts with:
dat %>%
group_by(ID) %>%
???
A fast option would be match
library(dplyr)
dat %>%
mutate(ID = match(ID, unique(ID)))
-output
# A tibble: 9 x 2
# ID X
# <int> <int>
#1 1 1
#2 1 2
#3 1 3
#4 2 4
#5 2 5
#6 2 6
#7 3 7
#8 3 8
#9 3 9
Or use as.integer on a factor
dat %>%
mutate(ID = as.integer(factor(ID, levels = unique(ID))))
In tidyverse, we can also use cur_group_id()
dat %>%
group_by(ID = factor(ID, levels = unique(ID))) %>%
mutate(ID = cur_group_id()) %>%
ungroup
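The `levels = unique(ID)` argument in the snippets above matters: without it, a factor sorts its levels, so the group numbers follow the sorted IDs rather than their order of appearance. A short base R sketch with the IDs from the question (fixed here instead of sampled, as an assumption, so the result is reproducible):

```r
# IDs in the order shown in the question
ID <- rep(c(9573, 7224, 7917), each = 3)

# Numbered by first appearance: 9573 -> 1, 7224 -> 2, 7917 -> 3
by_appearance <- match(ID, unique(ID))

# Numbered by sorted value: 7224 -> 1, 7917 -> 2, 9573 -> 3
by_sorted <- as.integer(factor(ID))

by_appearance
by_sorted
```

Either numbering anonymises the IDs; which one to use depends only on whether the original row order should determine the new labels.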

bootstrap by group in tibble

Suppose I have a tibble tbl_
tbl_ <- tibble(id = c(1,1,2,2,3,3), dta = 1:6)
tbl_
# A tibble: 6 x 2
id dta
<dbl> <int>
1 1 1
2 1 2
3 2 3
4 2 4
5 3 5
6 3 6
There are 3 id groups. I want to resample entire id groups 3 times with replacement. For example the resulting tibble can be:
id dta
<dbl> <int>
1 1 1
2 1 2
3 1 1
4 1 2
5 3 5
6 3 6
but not
id dta
<dbl> <int>
1 1 1
2 1 2
3 1 1
4 2 4
5 3 5
6 3 6
or
id dta
<dbl> <int>
1 1 1
2 1 1
3 2 3
4 2 4
5 3 5
6 3 6
Here is one option with sample_n and distinct
library(tidyverse)
distinct(tbl_, id) %>%
sample_n(nrow(.), replace = TRUE) %>%
pull(id) %>%
map_df( ~ tbl_ %>%
filter(id == .x)) %>%
arrange(id)
# A tibble: 6 x 2
# id dta
# <dbl> <int>
#1 1.00 1
#2 1.00 2
#3 1.00 1
#4 1.00 2
#5 3.00 5
#6 3.00 6
Another option is to get the minimum row number for each id. Those row numbers are then sampled with replace = TRUE, and each sampled row is expanded to cover the full group.
library(dplyr)
tbl_ %>% mutate(rn = row_number()) %>%
group_by(id) %>%
summarise(minrow = min(rn)) ->min_row
indx <- rep(sample(min_row$minrow, nrow(min_row), replace = TRUE), each = 2) +
rep(c(0,1), 3)
tbl_[indx,]
# # A tibble: 6 x 2
# id dta
# <dbl> <int>
# 1 1.00 1
# 2 1.00 2
# 3 3.00 5
# 4 3.00 6
# 5 2.00 3
# 6 2.00 4
Note: the answer above assumes exactly 2 rows per id, but it can handle any number of ids. The hard-coded each = 2 and c(0, 1) (and the 3 in rep(c(0, 1), 3)) need to be modified in order to handle more than 2 rows per id.
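To avoid the hard-coded group size, the ids can be resampled first and the rows of each drawn id appended as a whole block. This is a base R sketch, not either answer author's code; the uneven example data, the seed, and the names `ids`, `draw` and `boot` are assumptions made for illustration.

```r
set.seed(42)  # only so the sketch is reproducible

# Example with groups of different sizes (id 3 has three rows)
tbl_ <- data.frame(id = c(1, 1, 2, 2, 3, 3, 3), dta = 1:7)

ids <- unique(tbl_$id)

# Sample positions rather than the ids themselves, which sidesteps
# sample()'s special handling of a single numeric value
draw <- ids[sample.int(length(ids), replace = TRUE)]

# Append every row of each drawn id, so groups are always kept whole
boot <- do.call(rbind, lapply(draw, function(i) tbl_[tbl_$id == i, ]))

boot
```

Each element of `draw` contributes all rows of that id, so an id drawn twice appears twice in full and an id not drawn does not appear at all.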

How to group_by(everything())

I want to count unique combinations in a dataframe using dplyr
I tried the following:
require(dplyr)
set.seed(314)
dat <- data.frame(a = sample(1:3, 100, replace = T),
b = sample(1:2, 100, replace = T),
c = sample(1:2, 100, replace = T))
dat %>% group_by(a,b,c) %>% summarise(n = n())
But to make this generic (unrelated to the names of the columns) I tried:
dat %>% group_by(everything()) %>% summarise(n = n())
The first call, which groups by a, b and c explicitly, results in:
a b c n
<int> <int> <int> <int>
1 1 1 1 6
2 1 1 2 8
3 1 2 1 13
4 1 2 2 8
5 2 1 1 7
6 2 1 2 12
7 2 2 1 14
8 2 2 2 10
9 3 1 1 3
10 3 1 2 4
11 3 2 1 7
12 3 2 2 8
The group_by(everything()) call, however, gives the error
Error in mutate_impl(.data, dots) : `c(...)` must be a character vector
I fiddled around with different things but cannot get it to work. I know I could use names(dat), but the columns in the dataframe that need to be in the group_by() depend on previous steps in the dplyr chain.
There is a function called group_by_all() (and, in the same sense, group_by_at() and group_by_if()) which does exactly that.
library(dplyr)
dat %>%
group_by_all() %>%
summarise(n = n())
which gives the same result,
# A tibble: 12 x 4
# Groups: a, b [?]
a b c n
<int> <int> <int> <int>
1 1 1 1 6
2 1 1 2 8
3 1 2 1 13
4 1 2 2 8
5 2 1 1 7
6 2 1 2 12
7 2 2 1 14
8 2 2 2 10
9 3 1 1 3
10 3 1 2 4
11 3 2 1 7
12 3 2 2 8
PS
packageVersion('dplyr')
#[1] ‘0.7.2’
We can use .dots
dat %>%
group_by(.dots = names(.)) %>%
summarise(n = n())
# A tibble: 12 x 4
# Groups: a, b [?]
# a b c n
# <int> <int> <int> <int>
#1 1 1 1 6
#2 1 1 2 8
#3 1 2 1 13
#4 1 2 2 8
#5 2 1 1 7
#6 2 1 2 12
#7 2 2 1 14
#8 2 2 2 10
#9 3 1 1 3
#10 3 1 2 4
#11 3 2 1 7
#12 3 2 2 8
Another option would be to use the unquote, sym approach
dat %>%
group_by(!!! rlang::syms(names(.))) %>%
summarise(n = n())
In dplyr version 1.0.0 and later, you would now use across().
library(dplyr)
dat %>%
group_by(across(everything())) %>%
summarise(n = n())
Package version:
> packageVersion("dplyr")
[1] ‘1.0.5’
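With `across()`, the counting step can also be written with `count()`, which groups, tallies and ungroups in one call. The sketch below assumes dplyr 1.0.0 or later and compares both spellings on the data from the question; `res_summarise` and `res_count` are illustrative names.

```r
library(dplyr)

set.seed(314)
dat <- data.frame(a = sample(1:3, 100, replace = TRUE),
                  b = sample(1:2, 100, replace = TRUE),
                  c = sample(1:2, 100, replace = TRUE))

# Group by every column, count, and drop the grouping afterwards
res_summarise <- dat %>%
  group_by(across(everything())) %>%
  summarise(n = n(), .groups = "drop")

# Same counts via count(), which accepts across() as well
res_count <- dat %>%
  count(across(everything()))

res_count
```

Passing `.groups = "drop"` avoids the leftover grouping (shown as `# Groups: a, b [?]` in the older output above) that `summarise()` would otherwise keep.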
