By group relative order - r

I have a data set that looks like this
ID
Week
1
3
1
5
1
5
1
8
1
11
1
16
2
2
2
2
2
3
2
3
2
9
Now, what I would like to do is to add another column to the DataFrame so that, for every ID I will mark the week's relative position. More elaborately, I would like to the mark ID's earliest week (smallest number) as 1, then the next week for the ID as 2 and so forth, where if there are two observations of the same week they get the same number.
So, in the above example I should get:
ID
Week
Order
1
3
1
1
5
2
1
5
2
1
8
3
1
11
4
1
16
5
2
2
1
2
2
1
2
3
2
2
3
2
2
9
3
How could I achieve this?
Thank you very much!

A base R option using ave + match
transform(
df,
Order = ave(Week,
ID,
FUN = function(x) match(x, sort(unique(x)))
)
)
or ave + order (thank #IRTFM for comments)
transform(
df,
Order = ave(Week,
ID,
FUN = order
)
)
gives
ID Week Order
1 1 3 1
2 1 5 2
3 1 5 2
4 1 8 3
5 1 11 4
6 1 16 5
7 2 2 1
8 2 2 1
9 2 3 2
10 2 3 2
11 2 9 3
A data.table option with frank
> setDT(df)[, Order := frank(Week, ties.method = "dense"), ID][]
ID Week Order
1: 1 3 1
2: 1 5 2
3: 1 5 2
4: 1 8 3
5: 1 11 4
6: 1 16 5
7: 2 2 1
8: 2 2 1
9: 2 3 2
10: 2 3 2
11: 2 9 3
Data
> dput(df)
structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L), Week = c(3L, 5L, 5L, 8L, 11L, 16L, 2L, 2L, 3L, 3L, 9L)), class = "data.frame", row.names =
c(NA,
-11L))

You can use dense_rank in dplyr :
library(dplyr)
df %>% group_by(ID) %>% mutate(Order = dense_rank(Week)) %>% ungroup
# ID Week Order
# <int> <int> <int>
# 1 1 3 1
# 2 1 5 2
# 3 1 5 2
# 4 1 8 3
# 5 1 11 4
# 6 1 16 5
# 7 2 2 1
# 8 2 2 1
# 9 2 3 2
#10 2 3 2
#11 2 9 3

Related

Pasting values from a vector to a new column in a for loop with nested data

I have a dataframe that currently looks like this:
subjectID
Trial
1
3
1
3
1
3
1
4
1
4
1
5
1
5
1
5
2
1
2
1
2
3
2
3
2
3
2
5
2
5
2
6
3
1
Etc., where trial number is nested under subject ID. I need to make a new column in which column "NewTrial" is simply what order the trials now appear in. For example:
subjectID
Trial
NewTrial
1
3
1
1
3
1
1
3
1
1
4
2
1
4
2
1
5
3
1
5
3
1
5
3
2
1
1
2
1
1
2
3
2
2
3
2
2
3
2
2
5
3
2
5
3
2
6
4
3
1
1
So far, I have a for-loop written that looks like this:
for (myperson in unique(data$subjectID)){
#This line creates a vector of the number of unique trials per subject: for subject 1, c(1, 2, 3)
triallength=1:length(unique(data$Trial[data$subID==myperson]))
I'm having trouble now finding a way to paste the numbers from the created triallength vector as a column in the dataframe. Does anyone know of a way to accomplish this? I am lacking some experience with for-loops and hoping to gain more. If anyone has a tidyverse/dplyr solution, however, I am open to that as well as an alternative to a for-loop. Thanks in advance, and let me know if any clarification is needed!
Converting to factor with unique values as levels, then as.numeric in an ave should be nice.
transform(dat, NewTrial=ave(Trial, subjectID, FUN=\(x) as.numeric(factor(x, levels=unique(x)))))
# subjectID Trial NewTrial
# 1 1 3 1
# 2 1 3 1
# 3 1 3 1
# 4 1 4 2
# 5 1 4 2
# 6 1 5 3
# 7 1 5 3
# 8 1 5 3
# 9 2 1 1
# 10 2 1 1
# 11 2 3 2
# 12 2 3 2
# 13 2 3 2
# 14 2 5 3
# 15 2 5 3
# 16 2 6 4
# 17 3 1 1
Data:
dat <- structure(list(subjectID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L), Trial = c(3L, 3L, 3L, 4L,
4L, 5L, 5L, 5L, 1L, 1L, 3L, 3L, 3L, 5L, 5L, 6L, 1L)), class = "data.frame", row.names = c(NA,
-17L))
We could use match on the unique values after grouping by 'subjectID'
library(dplyr)
df1 <- df1 %>%
group_by(subjectID) %>%
mutate(NewTrial = match(Trial, unique(Trial))) %>%
ungroup
We could use rleid:
library(dplyr)
library(data.table)
df %>%
group_by(subjectID) %>%
mutate(NewTrial = rleid(subjectID, Trial))
subjectID Trial NewTrial
<int> <int> <int>
1 1 3 1
2 1 3 1
3 1 3 1
4 1 4 2
5 1 4 2
6 1 5 3
7 1 5 3
8 1 5 3
9 2 1 1
10 2 1 1
11 2 3 2
12 2 3 2
13 2 3 2
14 2 5 3
15 2 5 3
16 2 6 4
17 3 1 1

R: Count number of days before specific date by ID

I have data frame with multiple columns, most importantly id and date. I would like to make another column in R which will fill in every row which day of date interval is by id.
Something like this.
id date
1 12/31/2019
1 12/30/2019
2 12/26/2019
2 12/25/2019
2 12/24/2019
3 12/22/2019
3 12/21/2019
3 12/20/2019
3 12/19/2019
4 12/15/2019
4 12/14/2019
4 12/13/2019
to make like this
id date date count
1 12/31/2019 2
1 12/30/2019 1
2 12/26/2019 3
2 12/25/2019 2
2 12/24/2019 1
3 12/22/2019 4
3 12/21/2019 3
3 12/20/2019 2
3 12/19/2019 1
4 12/15/2019 3
4 12/14/2019 2
4 12/13/2019 1
One dplyr possibility could be:
df %>%
group_by(id) %>%
mutate(date_count = dense_rank(as.Date(date, format = "%m/%d/%Y")))
id date date_count
<int> <chr> <int>
1 1 12/31/2019 2
2 1 12/30/2019 1
3 2 12/26/2019 3
4 2 12/25/2019 2
5 2 12/24/2019 1
6 3 12/22/2019 4
7 3 12/21/2019 3
8 3 12/20/2019 2
9 3 12/19/2019 1
10 4 12/15/2019 3
11 4 12/14/2019 2
12 4 12/13/2019 1
We can use data.table methods
library(data.table)
setDT(df)[, date_count := frank(as.IDate(date, format = "%m/%d/%Y"),
ties.method = 'dense'), id][]
# id date date_count
# 1: 1 12/31/2019 2
# 2: 1 12/30/2019 1
# 3: 2 12/26/2019 3
# 4: 2 12/25/2019 2
# 5: 2 12/24/2019 1
# 6: 3 12/22/2019 4
# 7: 3 12/21/2019 3
# 8: 3 12/20/2019 2
# 9: 3 12/19/2019 1
#10: 4 12/15/2019 3
#11: 4 12/14/2019 2
#12: 4 12/13/2019 1
data
df <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L,
4L, 4L), date = c("12/31/2019", "12/30/2019", "12/26/2019", "12/25/2019",
"12/24/2019", "12/22/2019", "12/21/2019", "12/20/2019", "12/19/2019",
"12/15/2019", "12/14/2019", "12/13/2019")),
class = "data.frame", row.names = c(NA,
-12L))
Another data.table option:
DT[order(id, as.IDate(date, format="%m/%d/%Y")), dc := rowid(id)]

Order values within column according to values within different column by group in R

I have the following panel data set:
group i f r d
1 4 8 3 3
1 9 4 5 1
1 2 2 2 2
2 5 5 3 2
2 3 9 3 3
2 9 1 3 1
I want to reorder column i in this data frame according to values in column d for each group. So the highest value for group 1 in column i should correspond to the highest value in column d. In the end my data.frame should look like this:
group i f r d
1 9 8 3 3
1 2 4 5 1
1 4 2 2 2
2 5 5 3 2
2 9 9 3 3
2 3 1 3 1
Here is a dplyr solution.
First, group by group. Then get the permutation rearrangement of column d in a temporary new column, ord and use it to reorder i.
library(dplyr)
df1 %>%
group_by(group) %>%
mutate(ord = order(d),
i = i[ord]) %>%
ungroup() %>%
select(-ord)
## A tibble: 6 x 5
# group i f r d
# <int> <int> <int> <int> <int>
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 9 5 3 2
#5 2 5 9 3 3
#6 2 3 1 3 1
original (wrong)
You can achieve this using dplyr and rank:
library(dplyr)
df1 %>% group_by(group) %>%
mutate(i = i[rev(rank(d))])
Edit
This question is actually trickier than it first seems and the original answer I posted is incorrect. The correct solution orders by i before subsetting by the rank of d. This gives OP's desired output which my previous answer did not (not paying attention!)
df1 %>% group_by(group) %>%
mutate(i = i[order(i)][rank(d)])
# A tibble: 6 x 5
# Groups: group [2]
# group i f r d
# <int> <int> <int> <int> <int>
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 5 5 3 2
#5 2 9 9 3 3
#6 2 3 1 3 1
There is some confusion regarding the expected output. Here I am showing a way to get both the versions of the output.
A base R using split and mapply
df$i <- c(mapply(function(x, y) sort(y)[x],
split(df$d, df$group), split(df$i, df$group)))
df
# group i f r d
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 5 5 3 2
#5 2 9 9 3 3
#6 2 3 1 3 1
Or another version
df$i <- c(mapply(function(x, y) y[order(x)],
split(df$d, df$group), split(df$i, df$group)))
df
# group i f r d
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 9 5 3 2
#5 2 5 9 3 3
#6 2 3 1 3 1
We can also use dplyr for this :
For 1st version
library(dplyr)
df %>%
group_by(group) %>%
mutate(i = sort(i)[d])
2nd version is already shown by #Rui using order
df %>%
group_by(group) %>%
mutate(i = i[order(d)])
An option with data.table
library(data.table)
setDT(df1)[, i := i[order(d)], group]
df1
# group i f r d
#1: 1 9 8 3 3
#2: 1 2 4 5 1
#3: 1 4 2 2 2
#4: 2 9 5 3 2
#5: 2 5 9 3 3
#6: 2 3 1 3 1
If we need the second version
setDT(df1)[, i := sort(i)[d], group]
data
df1 <- structure(list(group = c(1L, 1L, 1L, 2L, 2L, 2L), i = c(4L, 9L,
2L, 5L, 3L, 9L), f = c(8L, 4L, 2L, 5L, 9L, 1L), r = c(3L, 5L,
2L, 3L, 3L, 3L), d = c(3L, 1L, 2L, 2L, 3L, 1L)), class = "data.frame",
row.names = c(NA,
-6L))

Longest consecutive count of the same value per group

I have a data.frame as below and I want to add a variable describing the longest consecutive count of 1 in the VALUE variable observed in the group (i.e. longest consecutive rows with 1 in VALUE per group).
GROUP_ID VALUE
1 0
1 1
1 1
1 1
1 1
1 0
2 1
2 1
2 0
2 1
2 1
2 1
3 1
3 0
3 1
3 0
So the output would look like this:
GROUP_ID VALUE CONSECUTIVE
1 0 4
1 1 4
1 1 4
1 1 4
1 1 4
1 0 4
2 1 3
2 1 3
2 0 3
2 1 3
2 1 3
2 1 3
3 1 1
3 0 1
3 1 1
3 0 1
Any help would be greatly appreciated!
Using dplyr:
library(dplyr)
dat %>%
group_by(GROUP_ID) %>%
mutate(CONSECUTIVE = {rl <- rle(VALUE); max(rl$lengths[rl$values == 1])})
which gives:
# A tibble: 16 x 3
# Groups: GROUP_ID [3]
GROUP_ID VALUE CONSECUTIVE
<int> <int> <int>
1 1 0 4
2 1 1 4
3 1 1 4
4 1 1 4
5 1 1 4
6 1 0 4
7 2 1 3
8 2 1 3
9 2 0 3
10 2 1 3
11 2 1 3
12 2 1 3
13 3 1 1
14 3 0 1
15 3 1 1
16 3 0 1
Or with data.table:
library(data.table)
setDT(dat) # convert to a 'data.table'
dat[, CONSECUTIVE := {rl <- rle(VALUE); max(rl$lengths[rl$values == 1])}
, by = GROUP_ID][]
We can use ave with rle and get maximum occurrence of consecutive 1's for each group. (GROUP_ID)
df$Consecutive <- ave(df$VALUE, df$GROUP_ID, FUN = function(x) {
y <- rle(x == 1)
max(y$lengths[y$values])
})
df
# GROUP_ID VALUE Consecutive
#1 1 0 4
#2 1 1 4
#3 1 1 4
#4 1 1 4
#5 1 1 4
#6 1 0 4
#7 2 1 3
#8 2 1 3
#9 2 0 3
#10 2 1 3
#11 2 1 3
#12 2 1 3
#13 3 1 1
#14 3 0 1
#15 3 1 1
#16 3 0 1
Here is another option with data.table
library(data.table)
library(dplyr)
setDT(df1)[, CONSECUTIVE := max(table(na_if(rleid(VALUE)*VALUE, 0))), .(GROUP_ID)]
df1
# GROUP_ID VALUE CONSECUTIVE
# 1: 1 0 4
# 2: 1 1 4
# 3: 1 1 4
# 4: 1 1 4
# 5: 1 1 4
# 6: 1 0 4
# 7: 2 1 3
# 8: 2 1 3
# 9: 2 0 3
#10: 2 1 3
#11: 2 1 3
#12: 2 1 3
#13: 3 1 1
#14: 3 0 1
#15: 3 1 1
#16: 3 0 1
data
df1 <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L), VALUE = c(0L, 1L, 1L, 1L, 1L, 0L,
1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-16L))

Add sequence along blocks [duplicate]

This question already has answers here:
Create counter within consecutive runs of values
(3 answers)
Closed 5 years ago.
I would like to have a sequence along each Blocks as such:
Blocks MySeq
1 1
1 2
2 1
2 2
1 1
1 2
1 3
1 4
3 1
3 2
3 3
4 1
4 2
4 3
4 4
Based on this I have try
myDf %>% dplyr::mutate(MySeq= seq(1:length(unique(Blocks)),rle(Blocks)$"lengths")
However, the sequence is not resetting with each new block. See below:
Blocks MySeq
1 1
1 2
2 1
2 2
1 3
1 4
1 5
1 6
3 1
3 2
3 3
4 1
4 2
4 3
4 4
How can I make a new sequence from each individual Blocks?
Try this
unlist(sapply(rle(df1$Blocks)$lengths,seq_len))
We can use rleid from data.table by grouping the rleid of 'Blocks' and assign (:=) 'MySeq' as the sequence of rows.
library(data.table)
setDT(df1)[, MySeq := seq_len(.N) , .(rleid(Blocks))]
df1
# Blocks MySeq
# 1: 1 1
# 2: 1 2
# 3: 2 1
# 4: 2 2
# 5: 1 1
# 6: 1 2
# 7: 1 3
# 8: 1 4
# 9: 3 1
#10: 3 2
#11: 3 3
#12: 4 1
#13: 4 2
#14: 4 3
#15: 4 4
Or if we are using base R, then sequence of lengths will get the expected output
sequence(rle(df1$Blocks)$lengths)
#[1] 1 2 1 2 1 2 3 4 1 2 3 1 2 3 4
data
df1 <- structure(list(Blocks = c(1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 3L,
3L, 3L, 4L, 4L, 4L, 4L)), .Names = "Blocks", row.names = c(NA,
-15L), class = "data.frame")

Resources