Group-ID according to numbering reset [duplicate] - r

I have the following data:
library(dplyr)
d <- tibble(value = c(1,2,1,2,3,4,5,1,2,3,4,1,2,3,4,5,6,7))
Each run of increasing numbers is one group, and every time the
numbering resets I need to start a new group. What I need is a
group ID that increments at every numbering reset; hence:
d$ID <- c(1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,4,4,4)
To visualize it:
value ID
1 1
2 1
1 2
2 2
3 2
4 2
5 2
1 3
2 3
3 3
4 3
1 4
2 4
3 4
4 4
5 4
6 4
7 4
I have tried using group_indices from dplyr, but
that doesn't do the trick because it groups identical values together:
d$ID <- d %>% group_indices(value)

We can use diff to subtract each value from the one that follows it and increment a counter whenever the difference is negative, i.e. whenever the numbering resets.
cumsum(c(TRUE, diff(d$value) < 0))
#[1] 1 1 2 2 2 2 2 3 3 3 3 4 4 4 4 4 4 4
In dplyr, we can use lag to compare each value with the previous one.
library(dplyr)
d %>% mutate(ID = cumsum(value < lag(value, default = first(value))) + 1)
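To see why the cumsum approach works, the intermediate steps can be spelled out in base R. This is only a minimal sketch; the names `steps` and `is_reset` are illustrative and do not appear in the original answer.

```r
# Same values as in the question
value <- c(1, 2, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7)

# diff() returns value[i + 1] - value[i]; a negative step marks a reset
steps <- diff(value)
is_reset <- steps < 0

# Prepend TRUE so the first row opens group 1, then count the resets cumulatively
id <- cumsum(c(TRUE, is_reset))
print(id)
```

Note that this assumes every reset drops to a lower value than the one before it. If each run is known to start at exactly 1, and 1 appears nowhere else, `cumsum(value == 1)` would be an alternative.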

Related

Calculate difference between current row and first row within group [duplicate]

I would like to create a new column in my dataset that shows the difference in the values (column b in example dataset) between the current row and the first row within a group (column a in example dataset) in R. How would I go about doing this?
a<-c(1,1,1,1,2,2,2,2)
b<-c(2,4,6,8,10,12,14,16)
have<-as.data.frame(cbind(a,b))
> have
a b
1 2
1 4
1 6
1 8
2 10
2 12
2 14
2 16
> want
a b c
1 2 0
1 4 2
1 6 4
1 8 6
2 10 0
2 12 2
2 14 4
2 16 6
You can use first() to address the first member in the group:
library(dplyr)
as.data.frame(cbind(a,b)) %>%
group_by(a) %>%
mutate(c = b - first(b)) %>%
ungroup()
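The same result can probably be reached without dplyr. The following is a sketch in base R, assuming the data from the question; the helper name `first_b` is made up for illustration.

```r
a <- c(1, 1, 1, 1, 2, 2, 2, 2)
b <- c(2, 4, 6, 8, 10, 12, 14, 16)
have <- data.frame(a, b)

# ave() applies FUN within each level of `a` and returns a vector
# aligned with the original rows; here FUN picks each group's first value
first_b <- ave(have$b, have$a, FUN = function(x) x[1])

# Difference between the current row and the first row of its group
have$c <- have$b - first_b
print(have)
```

Both versions rely on the rows already being in the intended order within each group, since "first" means the first row encountered.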

Create a ID value based on an incremental value when a value in a column changes in R [duplicate]

I would like to create a 'segment' ID so that:
If the value (in one column) is the same as the row before you maintain the same segment ID
However, if the value (in one column) is different than the row before the segment ID increments by one
I am currently trying to achieve this via:
require(dplyr)
person <- c("Mark","Mark","Mark","Mark","Mark","Steve","Steve","Tim", "Tim", "Tim","Mark")
df <- data.frame(person,stringsAsFactors = FALSE)
df$segment = 1
df$segment <- ifelse(df$person == dplyr::lag(df$person),dplyr::lag(df$segment),dplyr::lag(df$segment)+1)
But I am not getting the desired result through this method.
Any help would be appreciated
If you want to increment on change, try this:
df %>% mutate(segment = cumsum(person != lag(person, default="")))
# person segment
# 1 Mark 1
# 2 Mark 1
# 3 Mark 1
# 4 Mark 1
# 5 Mark 1
# 6 Steve 2
# 7 Steve 2
# 8 Tim 3
# 9 Tim 3
# 10 Tim 3
# 11 Mark 4
A base R solution might look like this:
c(1, cumsum(person[-1] != person[-length(person)]) + 1)
# [1] 1 1 1 1 1 2 2 3 3 3 4
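Another base R option, shown here as a sketch rather than as part of the original answer, is run-length encoding with `rle()`: each run of identical consecutive values gets its own index, which is then repeated once per row of that run.

```r
person <- c("Mark", "Mark", "Mark", "Mark", "Mark",
            "Steve", "Steve", "Tim", "Tim", "Tim", "Mark")

# rle() collapses consecutive duplicates into runs (values + lengths)
runs <- rle(person)

# Number the runs 1, 2, 3, ... and repeat each number by its run length
segment <- rep(seq_along(runs$lengths), runs$lengths)
print(segment)
```

Because the last "Mark" is a separate run from the first five, it receives its own segment ID, which matches the behaviour asked for in the question.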

Apply a maximum value to whole group [duplicate]

I have a df like this:
Id count
1 0
1 5
1 7
2 5
2 10
3 2
3 5
3 4
and I want to get the maximum count and apply that to the whole "group" based on ID, like this:
Id count max_count
1 0 7
1 5 7
1 7 7
2 5 10
2 10 10
3 2 5
3 5 5
3 4 5
I've tried pmax, slice etc. I'm generally having trouble working with data that is in interval-specific form; if you could direct me to tools well-suited to that type of data, would really appreciate it!
Figured it out with help from Gavin Simpson here: Aggregate a dataframe on a given column and display another column
maxcount <- aggregate(count ~ Id, data = df, FUN = max)
names(maxcount)[2] <- "max_count"  # rename so merge() joins on Id only, not on count
new_df <- merge(df, maxcount, by = "Id")
Better way:
df$max_count <- with(df, ave(count, Id, FUN = max))
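For a runnable end-to-end check, the sketch below rebuilds the data frame from the printed table (the construction is an assumption, since the question shows only the output) and compares the `ave()` result with an aggregate-then-merge version that joins on `Id` alone.

```r
# Data reconstructed from the table in the question (assumed layout)
df <- data.frame(Id    = c(1, 1, 1, 2, 2, 3, 3, 3),
                 count = c(0, 5, 7, 5, 10, 2, 5, 4))

# ave() broadcasts each group's maximum back onto every row of the group
df$max_count <- with(df, ave(count, Id, FUN = max))

# Equivalent two-step route: one row per Id, then join back on Id only
maxcount <- aggregate(count ~ Id, data = df, FUN = max)
names(maxcount)[2] <- "group_max"   # avoid a name clash with `count`
merged <- merge(df, maxcount, by = "Id")

print(merged)
```

The rename matters because `merge()` joins on all shared column names by default; leaving the aggregated column named `count` would keep only the rows that hold each group's maximum.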

How to create a new row that would show me the number of observations in a group in an unbalanced panel dataset in R? [duplicate]

I have a dataset that looks like this:
id time
1 1
1 2
2 5
2 3
3 2
3 7
3 8
And I want to add another column to show me how many observations there are in a group.
id time label
1 1 1
1 2 2
2 5 1
2 3 2
3 2 1
3 7 2
3 8 3
We can use ave:
df1$label <- with(df1, ave(seq_along(id), id, FUN=seq_along))
Or with dplyr:
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(label = row_number())
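The question's wording ("how many observations there are in a group") could also be read as asking for the group size rather than a running counter. The sketch below shows both side by side in base R; the data frame is reconstructed from the printed table, and the column name `n_obs` is an illustrative choice, not from the original post.

```r
# Data reconstructed from the table in the question (assumed layout)
df1 <- data.frame(id   = c(1, 1, 2, 2, 3, 3, 3),
                  time = c(1, 2, 5, 3, 2, 7, 8))

# Running counter within each id: 1, 2, ... restarting for every group
df1$label <- with(df1, ave(seq_along(id), id, FUN = seq_along))

# Total number of rows in each id, repeated on every row of that group
df1$n_obs <- with(df1, ave(seq_along(id), id, FUN = length))

print(df1)
```

The `label` column reproduces the desired output shown in the question, while `n_obs` covers the alternative reading.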

Subtract from the previous row R [duplicate]

I have a dataframe like so:
df <- data.frame(start=c(5,4,2),end=c(2,6,3))
start end
5 2
4 6
2 3
And I want the following result:
start end diff
5 2
4 6 1
2 3 -1
Essentially it is:
end[2] (second row) - start[1] = 6-5=1
and end[3] - start[2] = 3-4 = -1
What is a good way of doing this in R?
Just a simple vector subtraction should work:
df$diff <- c(NA,df[2:nrow(df), 2] - df[1:(nrow(df)-1), 1])
start end diff
1 5 2 NA
2 4 6 1
3 2 3 -1
Alternatively, with data.table, shift() returns the previous row's start value:
library(data.table)
setDT(df)[,value:=end-shift(start,1,type="lag")]
start end value
1: 5 2 NA
2: 4 6 1
3: 2 3 -1
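Both answers amount to lining up each `end` with the `start` of the row before it. A sketch of that idea in base R, using a hypothetical helper vector `prev_start`, might look like this:

```r
df <- data.frame(start = c(5, 4, 2), end = c(2, 6, 3))

# Shift `start` down by one row: drop the last element, pad the front with NA
prev_start <- c(NA, head(df$start, -1))

# end of the current row minus start of the previous row
df$diff <- df$end - prev_start
print(df)
```

The first row has no predecessor, so its difference is `NA`, matching the blank cell in the desired output.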
