R Mutate and Count Backwards with a group

For the following dataframe:
A<-c('A','A','A','B','B','B','B','B','C','C','C','C','D','D','D','D','D','D')
A<-data.frame(A)
How do you add a column that counts backwards within each group of 'A', restarting each time the group changes, as in:
Desired Output:
desired_output<-c(3,2,1,5,4,3,2,1,4,3,2,1,6,5,4,3,2,1)
desired_output<-data.frame(desired_output)
Thanks for your help.

We can use rev on the row_number() after grouping by 'A'
library(dplyr)
A <- A %>%
group_by(A) %>%
mutate(desired = rev(row_number())) %>%
ungroup
-output
# A tibble: 18 x 2
# A desired
# <chr> <int>
# 1 A 3
# 2 A 2
# 3 A 1
# 4 B 5
# 5 B 4
# 6 B 3
# 7 B 2
# 8 B 1
# 9 C 4
#10 C 3
#11 C 2
#12 C 1
#13 D 6
#14 D 5
#15 D 4
#16 D 3
#17 D 2
#18 D 1
Another option is to create the sequence from n() down to 1 with the : operator
A %>%
group_by(A) %>%
mutate(desired = n():1) %>%
ungroup
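For readers who prefer to avoid dplyr, here is a minimal base R sketch of the same idea. It assumes the data frame A from the question and uses ave() to reverse the within-group positions; the column name desired is just carried over from the answer above.

```r
# Same input as in the question
A <- data.frame(A = c('A','A','A','B','B','B','B','B','C','C','C','C','D','D','D','D','D','D'))

# ave() hands the row positions of each group to FUN and writes the
# results back in the original row order; rev(seq_along(i)) counts down to 1
A$desired <- ave(seq_along(A$A), A$A, FUN = function(i) rev(seq_along(i)))
```

Because ave() works per group value rather than per run, this counts down over all rows sharing the same value of A, which matches the grouped dplyr versions.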

Related

Cumulative count by group over time in r

I know this might be a simple operation, but I can't find a solution. I know it should be some form of group_by and sum or cumsum, but I can't figure out how. I want to plot a cumulative count of something by group over time. I have multiple rows per group and time that need to be counted (and some missing data).
My dataset looks somewhat like this
df <- data.frame(group = c("A","A","A","A","B","B","B","C","C","C","C","C"),
time = c(1,1,2,3,1,2,2,1,2,2,3,3))
and I want this result:
group time count
A 1 2
A 2 3
A 3 4
B 1 1
B 2 3
C 1 1
C 2 3
C 3 5
I usually use dplyr, but I am also happy with base R.
How do I do that?
You can use the following solution:
library(dplyr)
df %>%
group_by(group, time) %>%
add_count() %>%
distinct() %>%
group_by(group) %>%
mutate(n = cumsum(n))
# A tibble: 8 x 3
# Groups: group [3]
group time n
<chr> <dbl> <int>
1 A 1 2
2 A 2 3
3 A 3 4
4 B 1 1
5 B 2 3
6 C 1 1
7 C 2 3
8 C 3 5
We can use summarise with group_by
library(dplyr)
df %>%
group_by(group, time) %>%
summarise(count = n()) %>%
group_by(group) %>%
mutate(count = cumsum(count)) %>%
ungroup
-output
# A tibble: 8 x 3
group time count
<chr> <dbl> <int>
1 A 1 2
2 A 2 3
3 A 3 4
4 B 1 1
5 B 2 3
6 C 1 1
7 C 2 3
8 C 3 5
You can use count and cumsum:
library(dplyr)
df %>%
count(group, time, name = 'count') %>%
group_by(group) %>%
mutate(count = cumsum(count)) %>%
ungroup
# group time count
# <chr> <dbl> <int>
#1 A 1 2
#2 A 2 3
#3 A 3 4
#4 B 1 1
#5 B 2 3
#6 C 1 1
#7 C 2 3
#8 C 3 5
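Since the question also allows base R, the following is a rough sketch of the same two steps (count per group and time, then cumulative sum per group) without dplyr. The helper column n and the result name counts are illustrative choices, not part of the original answers.

```r
df <- data.frame(group = c("A","A","A","A","B","B","B","C","C","C","C","C"),
                 time = c(1,1,2,3,1,2,2,1,2,2,3,3))

# Step 1: count rows per (group, time) by summing a column of ones
counts <- aggregate(n ~ group + time, data = transform(df, n = 1), FUN = sum)

# aggregate() does not return rows in group-then-time order, so sort explicitly
counts <- counts[order(counts$group, counts$time), ]

# Step 2: running total of the counts within each group
counts$count <- ave(counts$n, counts$group, FUN = cumsum)
counts$n <- NULL
```

The explicit sort matters: cumsum depends on row order, so the times must be ascending within each group before the running total is taken.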

Keeping all NAs in dplyr distinct function

I have a data.frame (the eBird basic dataset) where several observers may upload a record of the same sighting to the database. In that case the event is given a "group identifier"; when the record is not from a group session, an NA appears instead. I'm trying to filter out the duplicates from group events while keeping all the NA rows, and I'd like to do this without splitting the data frame in two:
library(dplyr)
set.seed(1)
df <- tibble(
x = sample(c(1:6, NA), 30, replace = TRUE),
y = sample(c(letters[1:4]), 30, replace = TRUE)
)
df %>% count(x,y)
gives:
> df %>% count(x,y)
# A tibble: 20 x 3
x y n
<int> <chr> <int>
1 1 a 1
2 1 b 2
3 2 a 1
4 2 b 1
5 2 c 1
6 2 d 3
7 3 a 1
8 3 b 1
9 3 c 4
10 4 d 1
11 5 a 1
12 5 b 2
13 5 c 1
14 5 d 1
15 6 a 1
16 6 c 2
17 NA a 1
18 NA b 2
19 NA c 2
20 NA d 1
I don't want rows with NA in x to be collapsed together, as happened here with the "NA b" and "NA c" combinations. The distinct function has no option for leaving NAs out of the comparison. Is splitting the data frame the only solution?
With distinct an option is to create a new column based on the NA elements in 'x'
library(dplyr)
df %>%
mutate(x1 = row_number() * is.na(x)) %>%
distinct %>%
select(-x1)
Or we can use duplicated with an OR (|) condition to return all NA elements in 'x' with filter
# note: cur_data() is deprecated as of dplyr 1.1.0; pick(everything()) replaces it there
df %>%
filter(is.na(x)|!duplicated(cur_data()))
# A tibble: 20 x 2
# x y
# <int> <chr>
# 1 1 b
# 2 4 b
# 3 NA a
# 4 1 d
# 5 2 c
# 6 5 a
# 7 NA d
# 8 3 c
# 9 6 b
#10 2 b
#11 3 b
#12 1 c
#13 5 d
#14 2 d
#15 6 d
#16 2 a
#17 NA c
#18 NA a
#19 1 a
#20 5 b
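The same "keep every NA row, deduplicate the rest" condition can also be written in base R. The sketch below uses a small hand-made data frame (toy, an illustrative name) rather than the random data from the question, so the effect is easy to check by eye.

```r
# Small illustrative data: rows 1-2 are duplicates, rows 3-4 are duplicates with NA in x
toy <- data.frame(x = c(1, 1, NA, NA, 2),
                  y = c("a", "a", "b", "b", "c"))

# Keep a row if x is NA, or if the row has not been seen before
kept <- toy[is.na(toy$x) | !duplicated(toy), ]
```

Here the second (1, "a") row is dropped, while both (NA, "b") rows survive because the is.na(x) test takes precedence over the duplicate check.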

How to create column, from the cumulative column in r?

df <- data.frame(dat=c("11-03","12-03","13-03"),
c=c(0,15,20,4,19,21,2,10,14), d=rep(c("A","B","C"),each=3))
Suppose c holds the cumulative values. I want to create a column daily that looks like this:
dat c d daily
1 11-03 0 A 0
2 12-03 15 A 15
3 13-03 20 A 5
4 11-03 4 B 4
5 12-03 19 B 15
6 13-03 21 B 2
7 11-03 2 C 2
8 12-03 10 C 8
9 13-03 14 C 4
That is, for each value of d, the daily change should be computed date by date (dat) from the cumulative values in column c.
We can get the diff of 'c' after grouping by 'd'
library(dplyr)
df %>%
group_by(d) %>%
mutate(daily = c(first(c), diff(c)))
# A tibble: 9 x 4
# Groups: d [3]
# dat c d daily
# <fct> <dbl> <fct> <dbl>
#1 11-03 0 A 0
#2 12-03 15 A 15
#3 13-03 20 A 5
#4 11-03 4 B 4
#5 12-03 19 B 15
#6 13-03 21 B 2
#7 11-03 2 C 2
#8 12-03 10 C 8
#9 13-03 14 C 4
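Or take the difference between 'c' and the lag of 'c', with default = 0 so that the first row of each group keeps its own value instead of becoming NA
df %>%
group_by(d) %>%
mutate(daily = c - lag(c, default = 0))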
data.table solution:
library(data.table)
df <- as.data.table(df)
df[, daily := c - shift(c, fill = 0), by = d]
shift is data.table's lag operator, so we subtract from c its previous value within each group.
fill = 0 replaces the NA that shift(c) would otherwise produce for the first element of each group, where there is no previous value.
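As a cross-check on the approaches above, the same per-group differencing can be sketched in base R with ave(). This assumes the rows are already ordered by date within each value of d, as they are in the question's data.

```r
df <- data.frame(dat = c("11-03","12-03","13-03"),
                 c = c(0,15,20,4,19,21,2,10,14),
                 d = rep(c("A","B","C"), each = 3))

# Within each group of d: keep the first cumulative value as-is,
# then take successive differences for the remaining rows
df$daily <- ave(df$c, df$d, FUN = function(v) c(v[1], diff(v)))
```

Applying cumsum to daily within each group should give back the original column c, which is a quick way to confirm the result.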

restructuring multiple columns in R

Here is a sample of my data:
dat<-read.table(text=" id bx1 Z1A Z1B Z1C QR1 bx2 Z2A Z2B Z2C QR2
1 1 1 2 3 C 18 2 2 1 E
2 11 2 3 3 B 14 3 3 3 A
",header=TRUE)
I want to get the following table:
id bx Z QR Score
1 1 Z1A C 1
1 1 Z1B C 2
1 1 Z1C C 3
1 18 Z2A E 2
1 18 Z2B E 2
1 18 Z2C E 1
2 11 Z1A B 2
2 11 Z1B B 3
2 11 Z1C B 3
2 14 Z2A A 3
2 14 Z2B A 3
2 14 Z2C A 3
In the real data I have more bx and Z columns. I tried the following, but it does not work. I would like to do this with the tidyverse or other packages, but I was unable to find a solution.
df1<-melt(dat, id.var= "id")
Thanks for your help
In this case, we can use a left_join after separately doing the pivot_longer
library(dplyr)
library(tidyr)
library(stringr)
dat %>%
select(id, starts_with('Z')) %>%
pivot_longer(cols = starts_with('Z'), values_to = 'Score',
names_to = 'Z') %>%
group_by(id) %>%
mutate(group = as.character(as.integer(factor(str_remove(Z, "[A-Z]$"))))) %>%
left_join(dat %>%
select(id, matches('^[^Z]')) %>%
pivot_longer(cols = -id, names_to = c(".value", "group"),
names_pattern = "^([A-Za-z]+)([0-9]+)")) %>%
select(-group)
# A tibble: 12 x 5
# Groups: id [2]
# id Z Score bx QR
# <int> <chr> <int> <int> <fct>
# 1 1 Z1A 1 1 C
# 2 1 Z1B 2 1 C
# 3 1 Z1C 3 1 C
# 4 1 Z2A 2 18 E
# 5 1 Z2B 2 18 E
# 6 1 Z2C 1 18 E
# 7 2 Z1A 2 11 B
# 8 2 Z1B 3 11 B
# 9 2 Z1C 3 11 B
#10 2 Z2A 3 14 A
#11 2 Z2B 3 14 A
#12 2 Z2C 3 14 A
Or another option is to do a single pivot_longer and then fill the selected columns
dat %>%
pivot_longer(cols = -id, names_to = c(".value", "group"),
names_pattern = "^([A-Za-z]+)([0-9]+[A-Z]?)") %>%
group_by(id) %>%
fill(bx, QR) %>%
ungroup %>%
filter(!is.na(Z)) %>%
rename_at(vars(Z, group), ~ c('Score', 'Z')) %>%
mutate(Z = str_c('Z', Z))
# A tibble: 12 x 5
# id Z bx Score QR
# <int> <chr> <int> <int> <fct>
# 1 1 Z1A 1 1 C
# 2 1 Z1B 1 2 C
# 3 1 Z1C 1 3 C
# 4 1 Z2A 18 2 E
# 5 1 Z2B 18 2 E
# 6 1 Z2C 18 1 E
# 7 2 Z1A 11 2 B
# 8 2 Z1B 11 3 B
# 9 2 Z1C 11 3 B
#10 2 Z2A 14 3 A
#11 2 Z2B 14 3 A
#12 2 Z2C 14 3 A
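To see why the single pivot_longer call produces NA rows that then need fill and filter, it can help to look at how the names_pattern regex splits the column names. The sketch below applies the same pattern with base R's sub(); the vector names nms, value and group are illustrative only.

```r
# Column names from the question, excluding id
nms <- c("bx1", "Z1A", "Z1B", "Z1C", "QR1", "bx2", "Z2A", "Z2B", "Z2C", "QR2")
pattern <- "^([A-Za-z]+)([0-9]+[A-Z]?)"

# First capture group: the name of the output column (the ".value" part)
value <- sub(pattern, "\\1", nms)
# Second capture group: the key that ends up in the 'group' column
group <- sub(pattern, "\\2", nms)

data.frame(nms, value, group)
```

The bx and QR columns get the keys "1" and "2", while the Z columns get "1A", "1B" and so on. Since those keys never coincide, each long row has either bx/QR or Z filled in, which is why the answer fills bx and QR downward and then drops the rows where Z is NA.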

Summarise all using which on other column in dplyr [duplicate]

For some reason, I could not find a solution using the summarise_all function for the following problem:
df <- data.frame(A = c(1,2,2,3,3,3,4,4), B = 1:8, C = 8:1, D = c(1,2,3,1,2,5,10,9))
desired results:
df %>%
group_by(A) %>%
summarise(B = B[which.min(D)],
C = C[which.min(D)],
D = D[which.min(D)])
# A tibble: 4 x 4
A B C D
<dbl> <int> <int> <dbl>
1 1 1 8 1
2 2 2 7 2
3 3 4 5 1
4 4 8 1 9
What I tried:
df %>%
group_by(A) %>%
summarise_all(.[which.min(D)])
In words, I want to group by a variable and find for each column the value that belongs to the minimum value of another column. I could not find a solution for this using summarise_all. I am searching for a dplyr approach.
You can just filter down to the row that has a minimum value of D for each level of A. The code below assumes there is only one minimum row in each group.
df %>%
group_by(A) %>%
arrange(D) %>%
slice(1)
A B C D
1 1 1 8 1
2 2 2 7 2
3 3 4 5 1
4 4 8 1 9
If there can be multiple rows with minimum D, then:
df <- data.frame(A = c(1,2,2,3,3,3,4,4), B = 1:8, C = 8:1, D = c(1,2,3,1,2,5,9,9))
df %>%
group_by(A) %>%
filter(D == min(D))
A B C D
1 1 1 8 1
2 2 2 7 2
3 3 4 5 1
4 4 7 2 9
5 4 8 1 9
You need filter - any time you're trying to drop some rows and keep others, that's the verb you want.
df %>% group_by(A) %>% filter(D == min(D))
#> # A tibble: 4 x 4
#> # Groups: A [4]
#> A B C D
#> <dbl> <int> <int> <dbl>
#> 1 1 1 8 1
#> 2 2 2 7 2
#> 3 3 4 5 1
#> 4 4 8 1 9
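For comparison, the filter(D == min(D)) idea can be sketched in base R by computing the per-group minimum with ave() and keeping the rows that match it. Like the filter version, this keeps all tied rows; the name result is illustrative.

```r
df <- data.frame(A = c(1,2,2,3,3,3,4,4), B = 1:8, C = 8:1,
                 D = c(1,2,3,1,2,5,10,9))

# ave() returns, for every row, the minimum of D within that row's group of A
group_min <- ave(df$D, df$A, FUN = min)

# Keep the rows whose D equals their group's minimum
result <- df[df$D == group_min, ]
```

With the question's data each group has a unique minimum, so this returns one row per value of A.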

Resources