I would like to create a new column that sequentially labels groups of rows. Original data:
> dt = data.table(index=(1:10), group = c("apple","apple","orange","orange","orange","orange","apple","apple","orange","apple"))
> dt
index group
1: 1 apple
2: 2 apple
3: 3 orange
4: 4 orange
5: 5 orange
6: 6 orange
7: 7 apple
8: 8 apple
9: 9 orange
10: 10 apple
Desired output:
index group id
1: 1 apple 1
2: 2 apple 1
3: 3 orange 1
4: 4 orange 1
5: 5 orange 1
6: 6 orange 1
7: 7 apple 2
8: 8 apple 2
9: 9 orange 2
10: 10 apple 3
dplyr attempt:
dt %>% group_by(group) %>% mutate( id= row_number())
# A tibble: 10 x 3
# Groups: group [2]
index group id
<int> <chr> <int>
1 1 apple 1
2 2 apple 2
3 3 orange 1
4 4 orange 2
5 5 orange 3
6 6 orange 4
7 7 apple 3
8 8 apple 4
9 9 orange 5
10 10 apple 5
How can I edit this to label the first group of apples as 1, the first group of oranges as 1, the second group of apples as 2, and so on (see desired output above)? I am also open to a data.table solution.
library(data.table)
dt[, id := cumsum(c(TRUE, diff(index) > 1)), by="group"]
dt
# index group id
# 1: 1 apple 1
# 2: 2 apple 1
# 3: 3 orange 1
# 4: 4 orange 1
# 5: 5 orange 1
# 6: 6 orange 1
# 7: 7 apple 2
# 8: 8 apple 2
# 9: 9 orange 2
# 10: 10 apple 3
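To see why this works, it can help to trace the expression for a single group. The sketch below uses base R only and takes the index values of the apple rows from the example. It assumes, as in the question, that index is a consecutive row number, so a gap larger than 1 within a group means rows of another group sat in between.

```r
# index values of the "apple" rows in the example data
apple_index <- c(1, 2, 7, 8, 10)

# gaps between successive apple rows
gaps <- diff(apple_index)        # 1 5 1 2

# a gap > 1 marks the start of a new run; the first row always starts one
new_run <- c(TRUE, gaps > 1)     # TRUE FALSE TRUE FALSE TRUE

# the running count of run starts is the run id
apple_id <- cumsum(new_run)      # 1 1 2 2 3
```

If index were not a gap-free row number, this test would not be reliable, and a run-based approach such as rleid would be the safer choice.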
Starting from original dt:
library(dplyr)
dt %>%
  group_by(group) %>%
  mutate(id = cumsum(c(TRUE, diff(index) > 1))) %>%
  ungroup()
# # A tibble: 10 x 3
# index group id
# <int> <chr> <int>
# 1 1 apple 1
# 2 2 apple 1
# 3 3 orange 1
# 4 4 orange 1
# 5 5 orange 1
# 6 6 orange 1
# 7 7 apple 2
# 8 8 apple 2
# 9 9 orange 2
# 10 10 apple 3
Base R, perhaps a little clunky:
out <- do.call(rbind, by(dt, dt$group,
  function(x) transform(x, id = cumsum(c(TRUE, diff(index) > 1)))))
out[order(out$index), ]
# index group id
# apple.1 1 apple 1
# apple.2 2 apple 1
# orange.3 3 orange 1
# orange.4 4 orange 1
# orange.5 5 orange 1
# orange.6 6 orange 1
# apple.7 7 apple 2
# apple.8 8 apple 2
# orange.9 9 orange 2
# apple.10 10 apple 3
The names can be removed easily with rownames(out) <- NULL. The order part isn't necessary, but I wanted to present it in the same order as the other solutions, and do.call/by does not preserve the original order.
Another option using data.table::rleid twice:
dt[, gid := rleid(group)][, id := rleid(gid), .(group)]
We can also use rle from base R
with(dt, with(rle(group), rep(ave(seq_along(values),
values, FUN = seq_along), lengths)))
#[1] 1 1 1 1 1 1 2 2 2 3
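The nested one-liner above is easier to follow when unpacked. This is only a step-by-step sketch of the same idea on the group column from the question; the intermediate names (runs, run_no, id) are introduced here for illustration.

```r
group <- c("apple", "apple", "orange", "orange", "orange", "orange",
           "apple", "apple", "orange", "apple")

# run-length encoding: one entry per run of equal consecutive values
runs <- rle(group)
runs$values    # "apple" "orange" "apple" "orange" "apple"
runs$lengths   # 2 4 2 1 1

# number the runs separately within each distinct value
run_no <- ave(seq_along(runs$values), runs$values, FUN = seq_along)  # 1 1 2 2 3

# expand each run number back to one entry per original row
id <- rep(run_no, runs$lengths)
id             # 1 1 1 1 1 1 2 2 2 3
```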
Related
I need some help with grouping data by continuous values.
If I have this data.table
dt <- data.table::data.table( a = c(1,1,1,2,2,2,2,1,1,2), b = seq(1:10), c = seq(1:10)+1 )
a b c
1: 1 1 2
2: 1 2 3
3: 1 3 4
4: 2 4 5
5: 2 5 6
6: 2 6 7
7: 2 7 8
8: 1 8 9
9: 1 9 10
10: 2 10 11
I need one group for each run of consecutive equal values in column a. For each group I need the first (which is also the minimum) value of column b and the last (which is also the maximum) value of column c.
Like this:
a b c
1: 1 1 4
2: 2 4 8
3: 1 8 10
4: 2 10 11
Thank you very much for your help. I have not been able to solve this on my own.
Probably we can try
> dt[, .(a = a[1], b = b[1], c = c[.N]), rleid(a)][, -1]
a b c
1: 1 1 4
2: 2 4 8
3: 1 8 10
4: 2 10 11
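For readers without data.table, the same run-based grouping can be sketched in base R. This is an illustrative alternative, not part of the answer above; the names run, first_row, last_row and c_col are made up for the example (c_col only avoids masking base::c).

```r
a <- c(1, 1, 1, 2, 2, 2, 2, 1, 1, 2)
b <- 1:10
c_col <- 1:10 + 1

# run id: increases by one whenever a differs from the previous value
run <- cumsum(c(TRUE, a[-1] != a[-length(a)]))

# first and last row of each run
first_row <- !duplicated(run)
last_row  <- !duplicated(run, fromLast = TRUE)

res <- data.frame(a = a[first_row], b = b[first_row], c = c_col[last_row])
res
#   a  b  c
# 1 1  1  4
# 2 2  4  8
# 3 1  8 10
# 4 2 10 11
```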
An option with dplyr
library(dplyr)
dt %>%
  group_by(grp = cumsum(c(TRUE, diff(a) != 0))) %>%
  summarise(across(a:b, first), c = last(c)) %>%
  select(-grp)
Output:
# A tibble: 4 × 3
a b c
<dbl> <int> <dbl>
1 1 1 4
2 2 4 8
3 1 8 10
4 2 10 11
I have this data frame
df <- data.frame(id=c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5),
school=c('school_1','school_2','school_3','school_3','school_4','school_5','school_1','school_1','school_1','school_4','school_6','school_7','school_5','school_5','school_8','school_10','school_10','school_10','school_12','school_13','school_7','school_2','school_2','school_13','school_2'))
and I would like to sequentially order it by id and school. If the school should repeat, I would want the same number used. Once the school changes, the seq should too. Ideal output is below. (I added breaks by id just so it's easier to read)
id school seq
1 school_1 1
1 school_2 2
1 school_3 3
1 school_3 3
1 school_4 4
2 school_5 1
2 school_1 2
2 school_1 2
2 school_1 2
2 school_4 3
3 school_6 1
3 school_7 2
3 school_5 3
3 school_5 3
3 school_8 4
4 school_10 1
4 school_10 1
4 school_10 1
4 school_12 2
4 school_13 3
5 school_7 1
5 school_2 2
5 school_2 2
5 school_13 3
5 school_2 4
I've tried:
setDT(df)[, sequence := seq_len(.N), by = c("id", "school")]
and per this question
df[, .id := sequence(.N), by = "id,school"]
and neither produced what I wanted. Many of the suggestions don't have numbers repeating if the second variable doesn't change.
You could use data.table:
library(data.table)
setDT(df)
df[, seq := rleid(school), by = id]
df
id school seq
1: 1 school_1 1
2: 1 school_2 2
3: 1 school_3 3
4: 1 school_3 3
5: 1 school_4 4
6: 2 school_5 1
7: 2 school_1 2
8: 2 school_1 2
9: 2 school_1 2
10: 2 school_4 3
11: 3 school_6 1
.....
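You can use match + unique to number the distinct schools within each id. Note that a school that reappears later within the same id reuses its earlier number, so the last row of id 5 (school_2) gets 2 rather than the 4 shown in the desired output; if a reappearance should get a new number, use rleid as in the answer above.
This can be done using dplyr: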
library(dplyr)
df %>% group_by(id) %>% mutate(seq = match(school, unique(school)))
# id school seq
# <dbl> <chr> <int>
# 1 1 school_1 1
# 2 1 school_2 2
# 3 1 school_3 3
# 4 1 school_3 3
# 5 1 school_4 4
# 6 2 school_5 1
# 7 2 school_1 2
# 8 2 school_1 2
# 9 2 school_1 2
#10 2 school_4 3
# … with 15 more rows
Base R :
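# ave() returns the type of its first argument, so pass row indices to get integers
df$seq <- with(df, ave(seq_along(school), id, FUN = function(i) match(school[i], unique(school[i]))))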
and data.table :
library(data.table)
setDT(df)[, seq := match(school, unique(school)), id]
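The match/unique numbering and the run-based numbering (what rleid computes) differ as soon as a school comes back after a different one. A small base R sketch on the schools of id 5 from the question shows the difference; the variable names are illustrative only.

```r
# schools of id 5 in the example data
school <- c("school_7", "school_2", "school_2", "school_13", "school_2")

# numbering by first appearance: a returning school reuses its old number
by_first_appearance <- match(school, unique(school))
by_first_appearance   # 1 2 2 3 2

# numbering by run: the counter increases at every change of value
by_run <- cumsum(c(TRUE, school[-1] != school[-length(school)]))
by_run                # 1 2 2 3 4
```

Only the run-based numbering reproduces the 4 that the desired output shows for the last row.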
I'm trying to rank groups by their counts using dense_rank, but it doesn't assign distinct ranks to groups that are tied. Any other ranking function I've tried that has some sort of ties.method doesn't give me the rankings in consecutive 1, 2, 3 order either. Example:
library(dplyr)
id <- c(rep(1, 8),
rep(2, 8))
fruit <- c(rep('apple', 4), rep('orange', 1), rep('banana', 2), 'orange',
rep('orange', 4), rep('banana', 1), rep('apple', 2), 'banana')
df <- data.frame(id, fruit, stringsAsFactors = FALSE)
df2 <- df %>%
  mutate(counter = 1) %>%
  group_by(id, fruit) %>%
  mutate(fruitCnt = sum(counter)) %>%
  ungroup() %>%
  group_by(id) %>%
  mutate(fruitCntRank = dense_rank(desc(fruitCnt))) %>%
  select(id, fruit, fruitCntRank)
df2
id fruit fruitCntRank
1 1 apple 1
2 1 apple 1
3 1 apple 1
4 1 apple 1
5 1 orange 2
6 1 banana 2
7 1 banana 2
8 1 orange 2
9 2 orange 1
10 2 orange 1
11 2 orange 1
12 2 orange 1
13 2 banana 2
14 2 apple 2
15 2 apple 2
16 2 banana 2
It doesn't matter whether orange or banana is ranked 3, and it doesn't even need to be consistent. I just need the groups to be ranked 1, 2, 3.
Desired result:
id fruit fruitCntRank
1 1 apple 1
2 1 apple 1
3 1 apple 1
4 1 apple 1
5 1 orange 2
6 1 banana 3
7 1 banana 3
8 1 orange 2
9 2 orange 1
10 2 orange 1
11 2 orange 1
12 2 orange 1
13 2 banana 2
14 2 apple 3
15 2 apple 3
16 2 banana 2
We can add count for each id and fruit combination, arrange them in descending order of count and get the rank using match.
library(dplyr)
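df %>%
  add_count(id, fruit) %>%
  arrange(id, desc(n)) %>%
  group_by(id) %>%
  mutate(n = match(fruit, unique(fruit)))
#Another option with cumsum and duplicated; it only gives the same ranks when
#equal fruits are adjacent, so sort with arrange(id, desc(n), fruit) first
#mutate(n = cumsum(!duplicated(fruit)))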
# id fruit n
# <dbl> <chr> <int>
# 1 1 apple 1
# 2 1 apple 1
# 3 1 apple 1
# 4 1 apple 1
# 5 1 orange 2
# 6 1 banana 3
# 7 1 banana 3
# 8 1 orange 2
# 9 2 orange 1
#10 2 orange 1
#11 2 orange 1
#12 2 orange 1
#13 2 banana 2
#14 2 apple 3
#15 2 apple 3
#16 2 banana 2
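The commented-out cumsum(!duplicated(fruit)) alternative is not a drop-in replacement for match here, because tied fruits are not adjacent after sorting by count alone. A short base R check on the rows of id 1, in the order shown above, illustrates this; the variable names are for illustration only.

```r
# fruit column of id 1, in the row order of the output above
fruit <- c("apple", "apple", "apple", "apple",
           "orange", "banana", "banana", "orange")

# position of each fruit among the distinct fruits
rank_match <- match(fruit, unique(fruit))
rank_match    # 1 1 1 1 2 3 3 2

# running count of first appearances: the trailing orange inherits banana's 3
rank_cumsum <- cumsum(!duplicated(fruit))
rank_cumsum   # 1 1 1 1 2 3 3 3
```

The two agree once equal fruits sit next to each other, for example after also sorting by fruit within each count.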
I want to copy values from one column to a new variable, and then add these values to other rows based on conditions.
Minimal Example would be
VP <- c("1","1","2","1","1","2","2","1", "1")
Group <- c("1","1","1","2","2","2","3","3", "3")
Value<-c("6","4","7","2","3","8","4","3", "5")
df <- data.frame(cbind(VP, Group, Value))
The goal would be a result like this:
VP Group Value NewVariable
1 1 6 7
1 1 4 7
2 1 7
1 2 2 8
1 2 3 8
2 2 8
2 3 4
1 3 3 4
1 3 5 4
So the Value of the VP == 2 row should be copied to every other row in the corresponding group, but not to its own row.
One possible approach is updating in a join:
library(data.table)
setDT(df)[df[VP == "2"][, VP := "1"], on = .(VP, Group), NewVariable := i.Value]
df
VP Group Value NewVariable
1: 1 1 6 7
2: 1 1 4 7
3: 2 1 7 <NA>
4: 1 2 2 8
5: 1 2 3 8
6: 2 2 8 <NA>
7: 2 3 4 <NA>
8: 1 3 3 4
9: 1 3 5 4
Or, with NA replaced:
setDT(df)[df[VP == "2"][, VP := "1"], on = .(VP, Group), NewVariable := i.Value][
is.na(NewVariable), NewVariable := ""]
df
VP Group Value NewVariable
1: 1 1 6 7
2: 1 1 4 7
3: 2 1 7
4: 1 2 2 8
5: 1 2 3 8
6: 2 2 8
7: 2 3 4
8: 1 3 3 4
9: 1 3 5 4
Assuming you would have one value for VP = 2 for every group we could do
library(dplyr)
df %>%
  group_by(Group) %>%
  mutate(NewVar = ifelse(VP == 2, NA, Value[VP == 2]))
# VP Group Value NewVar
# <chr> <chr> <chr> <chr>
#1 1 1 6 7
#2 1 1 4 7
#3 2 1 7 NA
#4 1 2 2 8
#5 1 2 3 8
#6 2 2 8 NA
#7 2 3 4 NA
#8 1 3 3 4
#9 1 3 5 4
I am returning NA here instead of empty string. You could choose based on your preference.
data
VP <- c("1","1","2","1","1","2","2","1", "1")
Group <- c("1","1","1","2","2","2","3","3", "3")
Value<-c("6","4","7","2","3","8","4","3", "5")
df <- data.frame(VP, Group, Value, stringsAsFactors = FALSE)
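The same result can be sketched in base R with a named lookup vector. Like the dplyr answer, this assumes exactly one VP == "2" row per group; ref and lookup are names introduced here for illustration, and the new column is called NewVariable as in the question.

```r
VP    <- c("1", "1", "2", "1", "1", "2", "2", "1", "1")
Group <- c("1", "1", "1", "2", "2", "2", "3", "3", "3")
Value <- c("6", "4", "7", "2", "3", "8", "4", "3", "5")
df <- data.frame(VP, Group, Value, stringsAsFactors = FALSE)

# one reference row per group: the row where VP is "2"
ref <- df[df$VP == "2", ]

# lookup vector: Value of the reference row, named by its Group
lookup <- setNames(ref$Value, ref$Group)

# every other row gets its group's reference value; the reference row gets NA
df$NewVariable <- ifelse(df$VP == "2", NA, unname(lookup[df$Group]))
df$NewVariable
# "7" "7" NA "8" "8" NA NA "4" "4"
```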
I have a data.frame:
ID <-c(2,2,2,2,3,3,5,5)
Pur<-c(0,1,2,3,1,2,4,5)
df<-data.frame(ID,Pur)
I would like to push the Pur up for each ID to get the up.Pur as follows:
ID Pur up.Pur
2 0 1
2 1 2
2 2 3
2 3 NA
3 1 2
3 2 NA
5 4 5
5 5 NA
Would appreciate your help with this.
Here is a dplyr approach
library(dplyr)
ID <-c(2,2,2,2,3,3,5,5)
Pur<-c(0,1,2,3,1,2,4,5)
df<-data.frame(ID,Pur)
df %>%
  group_by(ID) %>%
  mutate(up.Pur = lead(Pur))
# Source: local data frame [8 x 3]
# Groups: ID [3]
#
# ID Pur up.Pur
# <dbl> <dbl> <dbl>
# 1 2 0 1
# 2 2 1 2
# 3 2 2 3
# 4 2 3 NA
# 5 3 1 2
# 6 3 2 NA
# 7 5 4 5
# 8 5 5 NA
For completeness, I've added a base R approach, just in case you don't feel like installing any packages.
dfList = split(df, df$ID)
dfList = lapply(dfList, function(x){
  x$up.Pur = c(x$Pur[-1], NA)
  return(x)
})
unsplit(dfList, df$ID)
# ID Pur up.Pur
# 1 2 0 1
# 2 2 1 2
# 3 2 2 3
# 4 2 3 NA
# 5 3 1 2
# 6 3 2 NA
# 7 5 4 5
# 8 5 5 NA
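A more compact base R variant of the same idea uses ave, which applies a function per group and puts the results back in the original row order. This is a sketch of an alternative, not a change to the split/lapply version above.

```r
ID  <- c(2, 2, 2, 2, 3, 3, 5, 5)
Pur <- c(0, 1, 2, 3, 1, 2, 4, 5)
df <- data.frame(ID, Pur)

# within each ID, drop the first value and pad the end with NA
df$up.Pur <- ave(df$Pur, df$ID, FUN = function(x) c(x[-1], NA))
df$up.Pur
# 1 2 3 NA 2 NA 5 NA
```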
We can use shift from data.table
library(data.table)
setDT(df)[, up.Pur := shift(Pur, type = "lead"), by = ID]
df
# ID Pur up.Pur
#1: 2 0 1
#2: 2 1 2
#3: 2 2 3
#4: 2 3 NA
#5: 3 1 2
#6: 3 2 NA
#7: 5 4 5
#8: 5 5 NA