Create an ID value based on an incremental value when a value in a column changes in R [duplicate]

This question already has answers here:
Is there a dplyr equivalent to data.table::rleid?
(6 answers)
Closed 5 years ago.
I would like to create a 'segment' ID so that:
If the value (in one column) is the same as in the previous row, the segment ID stays the same.
However, if the value (in one column) differs from the previous row, the segment ID increments by one.
I am currently trying to achieve this via:
require(dplyr)
person <- c("Mark","Mark","Mark","Mark","Mark","Steve","Steve","Tim", "Tim", "Tim","Mark")
df <- data.frame(person,stringsAsFactors = FALSE)
df$segment = 1
df$segment <- ifelse(df$person == dplyr::lag(df$person),dplyr::lag(df$segment),dplyr::lag(df$segment)+1)
But I am not getting the desired result through this method.
Any help would be appreciated

If you want to increment on change, try this
df %>% mutate(segment = cumsum(person != lag(person, default="")))
# person segment
# 1 Mark 1
# 2 Mark 1
# 3 Mark 1
# 4 Mark 1
# 5 Mark 1
# 6 Steve 2
# 7 Steve 2
# 8 Tim 3
# 9 Tim 3
# 10 Tim 3
# 11 Mark 4

A base R solution might look like this
c(1, cumsum(person[-1] != person[-length(person)]) +1)
# [1] 1 1 1 1 1 2 2 3 3 3 4
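The same run-based numbering can also be sketched with base R's rle(), which encodes consecutive runs of equal values. This is an alternative to the cumsum approaches above, not the method from either answer, and the name segment_rle is chosen here purely for illustration.

```r
# Same input vector as in the question
person <- c("Mark", "Mark", "Mark", "Mark", "Mark",
            "Steve", "Steve", "Tim", "Tim", "Tim", "Mark")

# rle() returns the length of each run of identical consecutive values
runs <- rle(person)

# Number the runs 1, 2, 3, ... and repeat each number for the length of its run
segment_rle <- rep(seq_along(runs$lengths), times = runs$lengths)
segment_rle
# [1] 1 1 1 1 1 2 2 3 3 3 4
```

Because the ID is tied to runs rather than to distinct values, the final "Mark" gets a new ID (4) instead of rejoining segment 1.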

Related

Repeat rows making each repeated rows following the original rows and assign new variables for each row [duplicate]

This question already has answers here:
Repeat rows of a data.frame [duplicate]
(10 answers)
Closed 1 year ago.
I know there are a lot of posts about repeating rows so that the whole block of repeated rows follows the whole original data. My question is a bit different: I want to repeat each row so that the newly created row comes directly after the row it copies, and I would also like to create a new variable that labels each row.
To make my example clear, you can use this example data frame:
data.frame(a = c(1,2,3),b = c(1,2,3))
a b
1 1 1
2 2 2
3 3 3
What I want to get is some data frame like this:
a b type
1 1 1 origin
2 1 1 destination
3 2 2 origin
4 2 2 destination
5 3 3 origin
6 3 3 destination
Any hint will be much appreciated! Thanks for your help in advance
You can repeat each row twice and repeat c('origin', 'destination') for each row.
In base R, you can do -
transform(df[rep(seq(nrow(df)), each = 2), ], type = c('origin', 'destination'))
Or in tidyverse -
library(dplyr)
library(tidyr)
df %>%
uncount(2) %>%
mutate(type = rep(c('origin', 'destination'), length.out = n()))
# a b type
#1 1 1 origin
#2 1 1 destination
#3 2 2 origin
#4 2 2 destination
#5 3 3 origin
#6 3 3 destination
Have a look at Repeat rows of a data.frame N times for ways to repeat rows. To bind another column, you can rely on vector recycling.
cbind(df[rep(seq_len(nrow(df)), each = 2), ], type = c("origin", "destination"))
# a b type
#1 1 1 origin
#1.1 1 1 destination
#2 2 2 origin
#2.1 2 2 destination
#3 3 3 origin
#3.1 3 3 destination
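As a minimal sketch of the idea behind these answers, the steps can be spelled out one at a time in base R. The names idx and out are introduced here for illustration only, and resetting the row names is an optional extra that the answers above do not do.

```r
# Example data from the question
df <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))

# each = 2 repeats every row index in place: 1 1 2 2 3 3
idx <- rep(seq_len(nrow(df)), each = 2)
out <- df[idx, ]

# Label the pairs explicitly instead of relying on recycling
out$type <- rep(c("origin", "destination"), times = nrow(df))

# Optional: replace row names such as 1.1, 2.1 with 1..6
rownames(out) <- NULL
out
```

Using each = 2 (rather than times = 2) is what keeps every copy directly below its original row.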

R: Matching and repeating occurence [duplicate]

This question already has answers here:
Complete dataframe with missing combinations of values
(2 answers)
Closed 2 years ago.
(sample code below) I have two data sets. One is a library of products; the other contains customer id, date, viewed product, and one more detail. I want a merge where, for each id AND date, I see the whole library of products as well as where the match was. I have tried full_join, merge, and right and left joins, but they do not repeat the rows. Below is a sample of what I am trying to achieve.
id=c(1,1,1,1,2,2)
date=c(1,1,2,2,1,3)
offer=c('a','x','y','x','y','a')
section=c('general','kitchen','general','general','general','kitchen')
t=data.frame(id,date,offer,section)
offer=c('a','x','y','z')
library=data.frame(offer)
######
t table
id date offer section
1 1 1 a general
2 1 1 x kitchen
3 1 2 y general
4 1 2 x general
5 2 1 y general
6 2 3 a kitchen
library table
offer
1 a
2 x
3 y
4 z
and I want to get this:
id date offer section
1 1 1 a general
2 1 1 x kitchen
3 1 1 y NA
4 1 1 z general
...
(there would have to be 6*4 observations)
I realize because I match by offer it is not going to repeat the values like so, but what is another option to do that? Thanks a lot!!
You can use complete to get all combinations of library$offer for each id and date.
tidyr::complete(t, id, date, offer = library$offer)
# A tibble: 24 x 4
# id date offer section
# <dbl> <dbl> <chr> <chr>
# 1 1 1 a general
# 2 1 1 x kitchen
# 3 1 1 y NA
# 4 1 1 z NA
# 5 1 2 a NA
# 6 1 2 x general
# 7 1 2 y general
# 8 1 2 z NA
# 9 1 3 a NA
#10 1 3 x NA
# … with 14 more rows
You can use tidyr and dplyr to get the data. The crossing() function will create all combinations of the variables you pass in
library(dplyr)
library(tidyr)
t %>%
select(id, date) %>%
{crossing(id=.$id, date=.$date, library)} %>%
left_join(t)
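Both answers above expand over every id and every date, which includes pairs such as id 1 on date 3 that never occur in the data. If only the observed id/date pairs should be expanded, one possible base R sketch is a cross join followed by a left merge. The names views, offers_lib, pairs, grid and out are hypothetical and chosen to avoid masking the base functions t() and library(); in tidyr, wrapping the columns in nesting() inside complete() serves a similar purpose.

```r
# Same data as in the question, under non-clashing (hypothetical) names
views <- data.frame(
  id      = c(1, 1, 1, 1, 2, 2),
  date    = c(1, 1, 2, 2, 1, 3),
  offer   = c("a", "x", "y", "x", "y", "a"),
  section = c("general", "kitchen", "general", "general", "general", "kitchen"),
  stringsAsFactors = FALSE
)
offers_lib <- data.frame(offer = c("a", "x", "y", "z"), stringsAsFactors = FALSE)

# Keep only the id/date pairs that actually occur
pairs <- unique(views[, c("id", "date")])

# by = NULL makes merge() return the Cartesian product (pairs x offers)
grid <- merge(pairs, offers_lib, by = NULL)

# Left merge: every grid row is kept, section is NA where nothing was viewed
out <- merge(grid, views, by = c("id", "date", "offer"), all.x = TRUE)
nrow(out)
# [1] 16
```

This yields 4 observed pairs times 4 offers, so 16 rows rather than the 24 produced by expanding all ids against all dates.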

Group-ID according to numbering reset [duplicate]

This question already has answers here:
Group variable based on continuous values
(1 answer)
Group a dataframe based on sequence breaks in a column?
(2 answers)
Something like conditional seq_along on grouped data
(1 answer)
How do I create a variable that increments by 1 based on the value of another variable?
(3 answers)
Closed 3 years ago.
I have following data:
d <- as_tibble(c(1,2,1,2,3,4,5,1,2,3,4,1,2,3,4,5,6,7))
The running numbers form one group, and for every reset
I need a new group. What I need is a group-ID for
every numbering reset; hence:
d$ID <- c(1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,4,4,4)
To visualize it:
value ID
1 1
2 1
1 2
2 2
3 2
4 2
5 2
1 3
2 3
3 3
4 3
1 4
2 4
3 4
4 4
5 4
6 4
7 4
I have tried using group_indices from dplyr, but
that doesn't do the trick as it groups by identical values:
d$ID <- d %>% group_indices(value)
We can use diff to subtract the previous value from the current value and increment the counter whenever the difference is negative, i.e. whenever the numbering resets.
cumsum(c(TRUE, diff(d$value) < 0))
#[1] 1 1 2 2 2 2 2 3 3 3 3 4 4 4 4 4 4 4
In dplyr, we can use lag to compare each value with the previous one.
library(dplyr)
d %>% mutate(ID = cumsum(value < lag(value, default = first(value))) + 1)
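To check the reset logic without any packages, the sketch below compares the diff-based ID against the IDs given in the question. It also shows a simpler variant that only works under the assumption that every run restarts at exactly 1, which holds for this example but is not stated as a general rule in the question. The names value, expected, id_diff and id_one are illustrative.

```r
# Values and expected IDs taken from the question
value    <- c(1, 2, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7)
expected <- c(1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4)

# A reset is any step where the value drops; the first element starts group 1
id_diff <- cumsum(c(TRUE, diff(value) < 0))

# Assumption: every group starts at exactly 1, so count the 1s seen so far
id_one <- cumsum(value == 1)

all(id_diff == expected)
# [1] TRUE
all(id_one == expected)
# [1] TRUE
```

The diff-based version is the more general of the two, since it also handles runs that restart at a value other than 1.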

How to sort a column from ascending order for EACH ID in R [duplicate]

This question already has answers here:
Sort (order) data frame rows by multiple columns
(19 answers)
Closed 7 years ago.
I want to sort Chrom# in ascending order (1 to 23) within each unique ID (as shown below, there are multiple rows with the same ID). How do I write the R code for this? E.g. for MB-0002, chrom should run 1,1,1,2,4,22... etc., with 1 chrom per row. I am new to R so any help would be appreciated. Thanks so much!
sample dataset
If you can use dplyr::arrange then you can easily sort by two variables.
tmp <- data.frame(id=c("a","a","b","a","b","c","a","b","c"),
value=c(3,2,4,1,2,1,7,4,3))
tmp
# id value
# 1 a 3
# 2 a 2
# 3 b 4
# 4 a 1
# 5 b 2
# 6 c 1
# 7 a 7
# 8 b 4
# 9 c 3
library(dplyr)
tmp %>% arrange(id, value)
# id value
# 1 a 1
# 2 a 2
# 3 a 3
# 4 a 7
# 5 b 2
# 6 b 4
# 7 b 4
# 8 c 1
# 9 c 3
FYI, an image doesn't work as a usable sample dataset.
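If dplyr is not available, a base R sketch of the same two-key sort uses order(), which breaks ties in its first argument with its second. The name sorted is illustrative, and resetting the row names is optional.

```r
# Same toy data as in the answer above
tmp <- data.frame(id = c("a", "a", "b", "a", "b", "c", "a", "b", "c"),
                  value = c(3, 2, 4, 1, 2, 1, 7, 4, 3))

# order() sorts by id first, then by value within each id
sorted <- tmp[order(tmp$id, tmp$value), ]

# Optional: renumber the rows 1..9 after reordering
rownames(sorted) <- NULL
sorted
```

For a descending sort on a numeric key, negating it inside order(), e.g. order(tmp$id, -tmp$value), is a common idiom.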

Extract rows from two data.frames that are similar in a column?

For the following two data.frames
Set1 <- data.frame(Object=c("one","two","three","four"),
Age=c(1,1,1,1),
Value=c(1,2,4,8))
Set2 <- data.frame(Object=c("one","two","three","five"),
Age=c(2,2,2,2),
Value=c(4,8,2,7))
I want to get the entries that are repeated (according to column "Object") in both Set1 and Set2, i.e.
Object Age Value
1 one 1 1
2 two 1 2
3 three 1 4
4 one 2 4
5 two 2 8
6 three 2 2
How would I go about doing this?
> x = intersect(Set1$Object, Set2$Object)
> rbind(Set1[Set1$Object %in% x,], Set2[Set2$Object %in% x,])
Object Age Value
1 one 1 1
2 two 1 2
3 three 1 4
4 one 2 4
5 two 2 8
6 three 2 2
As I am not sure I understand your question correctly (your example does not seem to fit the question), I can only suggest a hint:
Set <- rbind(Set1, Set2)
# duplicated() returns a logical vector, so look up the duplicated Object values first
rv <- Set[Set[, "Object"] %in% Set[duplicated(Set[, "Object"]), "Object"], ]
