Add sequence along blocks [duplicate] - r

I would like a sequence that restarts at 1 within each consecutive run of Blocks, like this:
Blocks MySeq
1 1
1 2
2 1
2 2
1 1
1 2
1 3
1 4
3 1
3 2
3 3
4 1
4 2
4 3
4 4
Based on this I have tried:
myDf %>% dplyr::mutate(MySeq= seq(1:length(unique(Blocks)),rle(Blocks)$"lengths")
However, the sequence is not resetting with each new block. See below:
Blocks MySeq
1 1
1 2
2 1
2 2
1 3
1 4
1 5
1 6
3 1
3 2
3 3
4 1
4 2
4 3
4 4
How can I restart the sequence for each individual run of Blocks?

Try this (lapply rather than sapply, since sapply returns a matrix when all runs have the same length):
unlist(lapply(rle(df1$Blocks)$lengths, seq_len))

We can use rleid from data.table: group by the run-length id of 'Blocks' and assign (:=) 'MySeq' as the row sequence within each group.
library(data.table)
setDT(df1)[, MySeq := seq_len(.N) , .(rleid(Blocks))]
df1
# Blocks MySeq
# 1: 1 1
# 2: 1 2
# 3: 2 1
# 4: 2 2
# 5: 1 1
# 6: 1 2
# 7: 1 3
# 8: 1 4
# 9: 3 1
#10: 3 2
#11: 3 3
#12: 4 1
#13: 4 2
#14: 4 3
#15: 4 4
Or, in base R, sequence applied to the run lengths gives the expected output:
sequence(rle(df1$Blocks)$lengths)
#[1] 1 2 1 2 1 2 3 4 1 2 3 1 2 3 4
data
df1 <- structure(list(Blocks = c(1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 3L,
3L, 3L, 4L, 4L, 4L, 4L)), .Names = "Blocks", row.names = c(NA,
-15L), class = "data.frame")
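
For anyone who wants to see what the run-length grouping is doing, here is a minimal base R sketch. It assumes the same Blocks values as df1 above; the names blocks, run_id and my_seq are made up for illustration. The run id is built by hand with cumsum, as a rough stand-in for what rleid returns.

```r
# Same values as df1$Blocks above, as a plain vector
blocks <- c(1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 4L, 4L, 4L, 4L)

# Run id: goes up by one each time the value differs from the previous one
run_id <- cumsum(c(TRUE, diff(blocks) != 0))

# Counter that restarts at 1 inside every run
my_seq <- ave(seq_along(blocks), run_id, FUN = seq_along)

data.frame(Blocks = blocks, run_id = run_id, MySeq = my_seq)
```

Grouping on run_id rather than on Blocks itself is what lets the second run of 1s start again at 1.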

Related

Pasting values from a vector to a new column in a for loop with nested data

I have a dataframe that currently looks like this:
subjectID Trial
1 3
1 3
1 3
1 4
1 4
1 5
1 5
1 5
2 1
2 1
2 3
2 3
2 3
2 5
2 5
2 6
3 1
Etc., where trial number is nested under subject ID. I need to make a new column, "NewTrial", that simply records the order in which the trials now appear within each subject. For example:
subjectID Trial NewTrial
1 3 1
1 3 1
1 3 1
1 4 2
1 4 2
1 5 3
1 5 3
1 5 3
2 1 1
2 1 1
2 3 2
2 3 2
2 3 2
2 5 3
2 5 3
2 6 4
3 1 1
So far, I have a for-loop written that looks like this:
for (myperson in unique(data$subjectID)) {
# This line creates a vector counting the unique trials per subject: for subject 1, c(1, 2, 3)
triallength <- seq_along(unique(data$Trial[data$subjectID == myperson]))
}
I'm having trouble now finding a way to paste the numbers from the created triallength vector as a column in the dataframe. Does anyone know of a way to accomplish this? I am lacking some experience with for-loops and hoping to gain more. If anyone has a tidyverse/dplyr solution, however, I am open to that as well as an alternative to a for-loop. Thanks in advance, and let me know if any clarification is needed!
Converting Trial to a factor whose levels are its unique values in order of appearance, then calling as.numeric on it inside ave, works nicely.
transform(dat, NewTrial=ave(Trial, subjectID, FUN=\(x) as.numeric(factor(x, levels=unique(x)))))
# subjectID Trial NewTrial
# 1 1 3 1
# 2 1 3 1
# 3 1 3 1
# 4 1 4 2
# 5 1 4 2
# 6 1 5 3
# 7 1 5 3
# 8 1 5 3
# 9 2 1 1
# 10 2 1 1
# 11 2 3 2
# 12 2 3 2
# 13 2 3 2
# 14 2 5 3
# 15 2 5 3
# 16 2 6 4
# 17 3 1 1
Data:
dat <- structure(list(subjectID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L), Trial = c(3L, 3L, 3L, 4L,
4L, 5L, 5L, 5L, 1L, 1L, 3L, 3L, 3L, 5L, 5L, 6L, 1L)), class = "data.frame", row.names = c(NA,
-17L))
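
Since the question asked about a for-loop specifically, here is a rough sketch of how the loop could write its result back into the data frame. It uses a shortened, made-up version of the data, and the helper names rows and trials are illustrative only; it is one possible way to finish the loop, not necessarily what the asker had in mind.

```r
# Shortened, made-up version of the data from the question
dat <- data.frame(
  subjectID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L),
  Trial     = c(3L, 3L, 4L, 5L, 1L, 3L, 3L, 1L)
)

# Create the column first so the loop can fill it in piece by piece
dat$NewTrial <- NA_integer_

for (myperson in unique(dat$subjectID)) {
  rows   <- dat$subjectID == myperson          # rows belonging to this subject
  trials <- dat$Trial[rows]                    # that subject's trial numbers
  # Position of each trial among the subject's unique trials, in order of appearance
  dat$NewTrial[rows] <- match(trials, unique(trials))
}

dat
```

The key step is indexing the new column with the same logical vector used to pick out the subject's rows.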
We could use match on the unique values after grouping by 'subjectID'
library(dplyr)
df1 <- df1 %>%
group_by(subjectID) %>%
mutate(NewTrial = match(Trial, unique(Trial))) %>%
ungroup
We could use rleid:
library(dplyr)
library(data.table)
df %>%
group_by(subjectID) %>%
mutate(NewTrial = rleid(subjectID, Trial))
subjectID Trial NewTrial
<int> <int> <int>
1 1 3 1
2 1 3 1
3 1 3 1
4 1 4 2
5 1 4 2
6 1 5 3
7 1 5 3
8 1 5 3
9 2 1 1
10 2 1 1
11 2 3 2
12 2 3 2
13 2 3 2
14 2 5 3
15 2 5 3
16 2 6 4
17 3 1 1
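
One caveat worth checking on your own data: match(Trial, unique(Trial)) numbers trials by first appearance, while rleid numbers consecutive runs, so the two can differ if a trial value comes back later within a subject. A small sketch with a made-up vector (not from the question), using a hand-rolled base R stand-in for rleid:

```r
# Hypothetical trials for one subject: trial 3 shows up again after trial 4
trial <- c(3L, 3L, 4L, 3L, 5L)

# Numbering by first appearance: the repeated 3 keeps its original number
by_first_appearance <- match(trial, unique(trial))

# Numbering by consecutive run (what a run-length id does): the repeated 3 gets a new number
by_run <- cumsum(c(TRUE, diff(trial) != 0))

rbind(trial, by_first_appearance, by_run)
```

On the data in the question the two agree, because each trial value forms a single contiguous run per subject.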

By group relative order

I have a data set that looks like this
ID Week
1 3
1 5
1 5
1 8
1 11
1 16
2 2
2 2
2 3
2 3
2 9
Now, what I would like to do is add another column to the DataFrame that marks, for every ID, each week's relative position. More precisely, I would like to mark the ID's earliest week (smallest number) as 1, the next week for that ID as 2, and so forth, where two observations of the same week get the same number.
So, in the above example I should get:
ID Week Order
1 3 1
1 5 2
1 5 2
1 8 3
1 11 4
1 16 5
2 2 1
2 2 1
2 3 2
2 3 2
2 9 3
How could I achieve this?
Thank you very much!
A base R option using ave + match
transform(
df,
Order = ave(Week,
ID,
FUN = function(x) match(x, sort(unique(x)))
)
)
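or ave + order (thanks @IRTFM for the comment). Note that order returns a sorting permutation rather than a rank, so tied weeks get different numbers (ID 1 here would get 1, 2, 3, 4, 5, 6); it only agrees with the match version when Week is sorted and has no ties within an ID
transform(
df,
Order = ave(Week,
ID,
FUN = order
)
)
The ave + match version gives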
ID Week Order
1 1 3 1
2 1 5 2
3 1 5 2
4 1 8 3
5 1 11 4
6 1 16 5
7 2 2 1
8 2 2 1
9 2 3 2
10 2 3 2
11 2 9 3
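
To see how ties are treated by the base R functions involved, here is a small sketch on the weeks of ID 1 plus a made-up unsorted vector. The variable names are illustrative only.

```r
# Weeks for ID 1 from the question (week 5 appears twice)
week <- c(3L, 5L, 5L, 8L, 11L, 16L)

# Dense rank via match: tied weeks share a number, with no gaps
dense <- match(week, sort(unique(week)))

# Same idea via factor levels, which are sorted by default
via_factor <- as.integer(factor(week))

# order() returns the permutation that sorts the vector, so ties get distinct numbers
perm <- order(week)

# Made-up unsorted input: the dense rank still follows the values, not the positions
unsorted <- c(5L, 3L, 5L)
dense_unsorted <- match(unsorted, sort(unique(unsorted)))
```

Here dense and via_factor give the tied week the same number, while perm does not.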
A data.table option with frank
> setDT(df)[, Order := frank(Week, ties.method = "dense"), ID][]
ID Week Order
1: 1 3 1
2: 1 5 2
3: 1 5 2
4: 1 8 3
5: 1 11 4
6: 1 16 5
7: 2 2 1
8: 2 2 1
9: 2 3 2
10: 2 3 2
11: 2 9 3
Data
> dput(df)
structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L), Week = c(3L, 5L, 5L, 8L, 11L, 16L, 2L, 2L, 3L, 3L, 9L)), class = "data.frame", row.names =
c(NA,
-11L))
You can use dense_rank from dplyr:
library(dplyr)
df %>% group_by(ID) %>% mutate(Order = dense_rank(Week)) %>% ungroup
# ID Week Order
# <int> <int> <int>
# 1 1 3 1
# 2 1 5 2
# 3 1 5 2
# 4 1 8 3
# 5 1 11 4
# 6 1 16 5
# 7 2 2 1
# 8 2 2 1
# 9 2 3 2
#10 2 3 2
#11 2 9 3

Expand an R Column Values To Column Headers with Another Column's values

I'm trying to expand an R data table that looks like this:
a step_num duration
1 1 5
1 2 4
1 3 1
2 1 7
2 2 2
2 3 9
3 1 1
3 2 1
3 3 3
Into something that looks like this:
a | step_num | duration | 1_duration | 2_duration | 3_duration |
----------------------------------------------------------------
1 1 5 5 - -
1 2 4 - 4 -
1 3 1 - - 1
2 1 7 7 - -
2 2 2 - 2 -
2 3 9 - - 9
3 1 1 1 - -
3 2 1 - 1 -
3 3 3 - - 3
I'm wondering if there's an 'expand' function, so to speak, that would do this.
Thanks!
We can do this in base R.
cbind(df,
reshape(df, idvar = c("a","step_num"), timevar = "step_num", direction = "wide")[,-1])
#> a step_num duration duration.1 duration.2 duration.3
#> 1 1 1 5 5 NA NA
#> 2 1 2 4 NA 4 NA
#> 3 1 3 1 NA NA 1
#> 4 2 1 7 7 NA NA
#> 5 2 2 2 NA 2 NA
#> 6 2 3 9 NA NA 9
#> 7 3 1 1 1 NA NA
#> 8 3 2 1 NA 1 NA
#> 9 3 3 3 NA NA 3
Created on 2019-05-21 by the reprex package (v0.2.1)
Simple tidyverse solution:
library(tidyverse)
df %>%
mutate(step = step_num) %>%
spread(step, duration, fill = '-') %>%
rename_all( ~ gsub('(\\d+)', 'duration_\\1', .))
# a step_num duration_1 duration_2 duration_3
# 1 1 1 5 - -
# 2 1 2 - 4 -
# 3 1 3 - - 1
# 4 2 1 7 - -
# 5 2 2 - 2 -
# 6 2 3 - - 9
# 7 3 1 1 - -
# 8 3 2 - 1 -
# 9 3 3 - - 3
Or an option with dcast from data.table
library(data.table)
dcast(setDT(df), a + step_num ~
paste0("duration_", step_num), value.var = 'duration')
# a step_num duration_1 duration_2 duration_3
#1: 1 1 5 NA NA
#2: 1 2 NA 4 NA
#3: 1 3 NA NA 1
#4: 2 1 7 NA NA
#5: 2 2 NA 2 NA
#6: 2 3 NA NA 9
#7: 3 1 1 NA NA
#8: 3 2 NA 1 NA
#9: 3 3 NA NA 3
NOTE: It is better to have NA instead of - as NA is easily removable with is.na/complete.cases/na.omit and it wouldn't change the class of the column to character
data
df <- structure(list(a = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), step_num = c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), duration = c(5L, 4L, 1L, 7L,
2L, 9L, 1L, 1L, 3L)), class = "data.frame", row.names = c(NA,
-9L))
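
If you would rather not depend on a reshaping function, the same layout can be sketched by hand in base R. This is only an illustration under the assumption that the data match the df above; steps, wide and out are made-up names. It keeps NA for the empty cells, so the new columns stay integer.

```r
# Same data as df above
df <- data.frame(
  a        = rep(1:3, each = 3),
  step_num = rep(1:3, times = 3),
  duration = c(5L, 4L, 1L, 7L, 2L, 9L, 1L, 1L, 3L)
)

steps <- sort(unique(df$step_num))

# One column per step: the duration where step_num matches, NA elsewhere
wide <- sapply(steps, function(s) ifelse(df$step_num == s, df$duration, NA_integer_))
colnames(wide) <- paste0("duration_", steps)

# Attach the new columns to the original rows
out <- cbind(df, wide)
out
```

Replacing the NAs with "-" afterwards would turn every new column into character, which is the trade-off the note above describes.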
Here's an approach using dplyr and tidyr.
We take the original data and add on some columns by first adding a new column col which holds the column header we want, based on the step_num. Then we use tidyr::spread to put the durations into different columns depending on which col they go with. fill = "-" fills all the empty columns with dashes. Finally, we drop the a and step_num columns since they're already there in the original data and we don't want to have copies of them.
(Note, we needed step_num to still exist at the spread step, because we wanted to keep each row aligned with the original rows. Without step_num, the data would get spread into a wider, shorter format that would have misaligned rows.)
library(dplyr); library(tidyr)
df %>%
mutate(col = paste0(step_num, "_duration")) %>%
spread(col, duration, fill = "-") %>%
select(-a, -step_num) %>%
bind_cols(df, .) # Edit, per excellent suggestion from M-M
a step_num duration 1_duration 2_duration 3_duration
1 1 1 5 5 - -
2 1 2 4 - 4 -
3 1 3 1 - - 1
4 2 1 7 7 - -
5 2 2 2 - 2 -
6 2 3 9 - - 9
7 3 1 1 1 - -
8 3 2 1 - 1 -
9 3 3 3 - - 3

Longest consecutive count of the same value per group

I have a data.frame as below and I want to add a variable describing the longest consecutive count of 1 in the VALUE variable observed in the group (i.e. longest consecutive rows with 1 in VALUE per group).
GROUP_ID VALUE
1 0
1 1
1 1
1 1
1 1
1 0
2 1
2 1
2 0
2 1
2 1
2 1
3 1
3 0
3 1
3 0
So the output would look like this:
GROUP_ID VALUE CONSECUTIVE
1 0 4
1 1 4
1 1 4
1 1 4
1 1 4
1 0 4
2 1 3
2 1 3
2 0 3
2 1 3
2 1 3
2 1 3
3 1 1
3 0 1
3 1 1
3 0 1
Any help would be greatly appreciated!
Using dplyr:
library(dplyr)
dat %>%
group_by(GROUP_ID) %>%
mutate(CONSECUTIVE = {rl <- rle(VALUE); max(rl$lengths[rl$values == 1])})
which gives:
# A tibble: 16 x 3
# Groups: GROUP_ID [3]
GROUP_ID VALUE CONSECUTIVE
<int> <int> <int>
1 1 0 4
2 1 1 4
3 1 1 4
4 1 1 4
5 1 1 4
6 1 0 4
7 2 1 3
8 2 1 3
9 2 0 3
10 2 1 3
11 2 1 3
12 2 1 3
13 3 1 1
14 3 0 1
15 3 1 1
16 3 0 1
Or with data.table:
library(data.table)
setDT(dat) # convert to a 'data.table'
dat[, CONSECUTIVE := {rl <- rle(VALUE); max(rl$lengths[rl$values == 1])}
, by = GROUP_ID][]
We can use ave with rle to get the longest run of consecutive 1s for each group (GROUP_ID).
df$Consecutive <- ave(df$VALUE, df$GROUP_ID, FUN = function(x) {
y <- rle(x == 1)
max(y$lengths[y$values])
})
df
# GROUP_ID VALUE Consecutive
#1 1 0 4
#2 1 1 4
#3 1 1 4
#4 1 1 4
#5 1 1 4
#6 1 0 4
#7 2 1 3
#8 2 1 3
#9 2 0 3
#10 2 1 3
#11 2 1 3
#12 2 1 3
#13 3 1 1
#14 3 0 1
#15 3 1 1
#16 3 0 1
Here is another option with data.table
library(data.table)
library(dplyr)
setDT(df1)[, CONSECUTIVE := max(table(na_if(rleid(VALUE)*VALUE, 0))), .(GROUP_ID)]
df1
# GROUP_ID VALUE CONSECUTIVE
# 1: 1 0 4
# 2: 1 1 4
# 3: 1 1 4
# 4: 1 1 4
# 5: 1 1 4
# 6: 1 0 4
# 7: 2 1 3
# 8: 2 1 3
# 9: 2 0 3
#10: 2 1 3
#11: 2 1 3
#12: 2 1 3
#13: 3 1 1
#14: 3 0 1
#15: 3 1 1
#16: 3 0 1
data
df1 <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L), VALUE = c(0L, 1L, 1L, 1L, 1L, 0L,
1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-16L))
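
One edge case the answers above do not cover: if a group contains no 1 at all, max() is called on an empty vector and returns -Inf with a warning. A minimal sketch of a guard, assuming you want 0 in that case; longest_run is a hypothetical helper name and the third group is made up to trigger the case.

```r
# Hypothetical helper: length of the longest run of `value` in x, or 0 if it never occurs
longest_run <- function(x, value = 1) {
  rl   <- rle(x == value)          # runs of TRUE (matches) and FALSE (everything else)
  hits <- rl$lengths[rl$values]    # lengths of the matching runs only
  if (length(hits) == 0) 0L else max(hits)
}

# Groups 1 and 2 as in the question; group 3 is made up and has no 1s
value <- c(0L, 1L, 1L, 1L, 1L, 0L,  1L, 1L, 0L, 1L, 1L, 1L,  0L, 0L)
group <- c(rep(1L, 6), rep(2L, 6), rep(3L, 2))

# ave() recycles the single number returned per group across that group's rows
consecutive <- ave(value, group, FUN = longest_run)
consecutive
```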

Panel data sequence adding for a particular value

I am really new to R and Stack Overflow. Apologies in advance for this novice question.
I have a panel data set like the following table.
ID Choice
1 1
1 1
1 2
1 5
1 1
2 1
2 1
2 5
2 1
2 1
3 3
3 1
3 1
3 2
3 4
I want to add another column, as in the following table, that counts the rows where Choice is 1. Basically, it is a running sequence of the Choice 1 rows within each ID.
ID Choice BUS
1 1 0 (The first 1 will be considered as 0)
1 1 1
1 2 1
1 5 1
1 1 2
2 1 0
2 1 1
2 5 1
2 1 2
2 1 3
3 3 0
3 1 0
3 1 1
3 2 1
3 4 1
with(df, ave(Choice == 1, ID, FUN = cumsum))
This almost gives you what you want, but since you want the first 1 to count as 0, it needs a small modification.
df$BUS <- with(df, ave(Choice == 1, ID, FUN = function(x) {
inds = cumsum(x)
ifelse(inds > 0, inds - 1, inds)
}))
df
# ID Choice BUS
#1 1 1 0
#2 1 1 1
#3 1 2 1
#4 1 5 1
#5 1 1 2
#6 2 1 0
#7 2 1 1
#8 2 5 1
#9 2 1 2
#10 2 1 3
#11 3 3 0
#12 3 1 0
#13 3 1 1
#14 3 2 1
#15 3 4 1
Here we subtract 1 from the cumulative sum once the first 1 has appeared; rows before the first 1 stay at 0.
Using the same logic in dplyr
library(dplyr)
df %>%
group_by(ID) %>%
mutate(inds = cumsum(Choice == 1),
BUS = ifelse(inds > 0, inds - 1, inds)) %>%
select(-inds)
We can also use data.table
library(data.table)
setDT(df1)[, BUS := pmax(0, cumsum(Choice == 1)-1), ID]
df1
# ID Choice BUS
# 1: 1 1 0
# 2: 1 1 1
# 3: 1 2 1
# 4: 1 5 1
# 5: 1 1 2
# 6: 2 1 0
# 7: 2 1 1
# 8: 2 5 1
# 9: 2 1 2
#10: 2 1 3
#11: 3 3 0
#12: 3 1 0
#13: 3 1 1
#14: 3 2 1
#15: 3 4 1
data
df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L), Choice = c(1L, 1L, 2L, 5L, 1L, 1L, 1L, 5L,
1L, 1L, 3L, 1L, 1L, 2L, 4L)), class = "data.frame", row.names = c(NA,
-15L))
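
To make the cumsum and pmax steps easier to follow, here is a small base R walk-through. It assumes the same ID and Choice values as df1 above; the plain vectors and their names are just for illustration.

```r
# Step by step for ID 3, where the first row is not a 1
choice3 <- c(3L, 1L, 1L, 2L, 4L)
running <- cumsum(choice3 == 1)    # how many 1s have been seen so far
bus3    <- pmax(0, running - 1)    # shift down by one, but never below 0

# The same idea for every ID at once, using ave() for the grouping
id     <- rep(1:3, each = 5)
choice <- c(1L, 1L, 2L, 5L, 1L,  1L, 1L, 5L, 1L, 1L,  3L, 1L, 1L, 2L, 4L)
bus    <- ave(as.integer(choice == 1), id,
              FUN = function(x) pmax(0, cumsum(x) - 1))

data.frame(ID = id, Choice = choice, BUS = bus)
```

The pmax(0, ...) call is what keeps rows before the first 1 at 0 instead of -1.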

Resources