R: Count number of days before specific date by ID

I have a data frame with multiple columns, most importantly id and date. I would like to add another column in R that gives, for every row, the position of that day within the date interval of its id.
Something like this:
id date
1 12/31/2019
1 12/30/2019
2 12/26/2019
2 12/25/2019
2 12/24/2019
3 12/22/2019
3 12/21/2019
3 12/20/2019
3 12/19/2019
4 12/15/2019
4 12/14/2019
4 12/13/2019
should become this:
id date date_count
1 12/31/2019 2
1 12/30/2019 1
2 12/26/2019 3
2 12/25/2019 2
2 12/24/2019 1
3 12/22/2019 4
3 12/21/2019 3
3 12/20/2019 2
3 12/19/2019 1
4 12/15/2019 3
4 12/14/2019 2
4 12/13/2019 1

One dplyr possibility could be:
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(date_count = dense_rank(as.Date(date, format = "%m/%d/%Y")))
id date date_count
<int> <chr> <int>
1 1 12/31/2019 2
2 1 12/30/2019 1
3 2 12/26/2019 3
4 2 12/25/2019 2
5 2 12/24/2019 1
6 3 12/22/2019 4
7 3 12/21/2019 3
8 3 12/20/2019 2
9 3 12/19/2019 1
10 4 12/15/2019 3
11 4 12/14/2019 2
12 4 12/13/2019 1

We can use data.table methods
library(data.table)
setDT(df)[, date_count := frank(as.IDate(date, format = "%m/%d/%Y"),
ties.method = 'dense'), id][]
# id date date_count
# 1: 1 12/31/2019 2
# 2: 1 12/30/2019 1
# 3: 2 12/26/2019 3
# 4: 2 12/25/2019 2
# 5: 2 12/24/2019 1
# 6: 3 12/22/2019 4
# 7: 3 12/21/2019 3
# 8: 3 12/20/2019 2
# 9: 3 12/19/2019 1
#10: 4 12/15/2019 3
#11: 4 12/14/2019 2
#12: 4 12/13/2019 1
data
df <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L,
4L, 4L), date = c("12/31/2019", "12/30/2019", "12/26/2019", "12/25/2019",
"12/24/2019", "12/22/2019", "12/21/2019", "12/20/2019", "12/19/2019",
"12/15/2019", "12/14/2019", "12/13/2019")),
class = "data.frame", row.names = c(NA,
-12L))

Another data.table option, where DT is the data converted with setDT(df):
DT[order(id, as.IDate(date, format="%m/%d/%Y")), dc := rowid(id)]
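The answers above mix two ideas: dense ranking (dense_rank, frank with ties.method = 'dense') and plain row numbering in date order (rowid). They agree on the sample data because no id has a repeated date. Here is a minimal base R sketch of the difference, using a made-up data frame with one duplicated date (the data and the names dense and rown are only for illustration):

```r
# Hypothetical data: id 2 has the date 12/26/2019 twice
df <- data.frame(
  id   = c(1L, 1L, 2L, 2L, 2L),
  date = c("12/31/2019", "12/30/2019", "12/26/2019", "12/26/2019", "12/24/2019")
)
d <- as.Date(df$date, format = "%m/%d/%Y")

# Dense rank within id: tied dates share one number
dense <- ave(as.integer(d), df$id,
             FUN = function(x) match(x, sort(unique(x))))

# Row number within id in date order: every row gets its own number
o <- order(df$id, d)
rown <- integer(nrow(df))
rown[o] <- ave(seq_along(o), df$id[o], FUN = seq_along)

cbind(df, dense, rown)
```

If duplicate dates within an id can occur in the real data, which of the two you want depends on whether repeated days should share a count.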

Related

By group relative order

I have a data set that looks like this
ID Week
1 3
1 5
1 5
1 8
1 11
1 16
2 2
2 2
2 3
2 3
2 9
Now, what I would like to do is add another column to the data frame so that, for every ID, the week's relative position is marked. More precisely, I would like to mark each ID's earliest week (smallest number) as 1, the next week for that ID as 2, and so forth, where two observations of the same week get the same number.
So, in the above example I should get:
ID Week Order
1 3 1
1 5 2
1 5 2
1 8 3
1 11 4
1 16 5
2 2 1
2 2 1
2 3 2
2 3 2
2 9 3
How could I achieve this?
Thank you very much!
A base R option using ave + match
transform(
df,
Order = ave(Week,
ID,
FUN = function(x) match(x, sort(unique(x)))
)
)
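or ave + order (thanks to @IRTFM for the comments). Note that order returns a sorting permutation rather than a rank, so it reproduces the output below only when the weeks are already sorted and have no ties within each ID; with tied weeks, use the match version above.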
transform(
df,
Order = ave(Week,
ID,
FUN = order
)
)
gives
ID Week Order
1 1 3 1
2 1 5 2
3 1 5 2
4 1 8 3
5 1 11 4
6 1 16 5
7 2 2 1
8 2 2 1
9 2 3 2
10 2 3 2
11 2 9 3
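To check which helper gives tied weeks the same number, it can help to run both on a single group. This is only a small base R sketch; the vectors x and y are illustrative (x is the Week column for ID 1, y is a made-up unsorted vector):

```r
# Weeks for ID 1 from the question (week 5 appears twice)
x <- c(3L, 5L, 5L, 8L, 11L, 16L)

dense <- match(x, sort(unique(x)))  # position among the distinct sorted values
perm  <- order(x)                   # indices that would sort x

dense  # ties share a number
perm   # every element gets a distinct index

# On unsorted input the two differ even without ties
y <- c(9L, 2L, 3L)
dense_y <- match(y, sort(unique(y)))
perm_y  <- order(y)
```

Here dense is 1 2 2 3 4 5 while perm is 1 2 3 4 5 6, so only the match-based version meets the "same week, same number" requirement.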
A data.table option with frank
> setDT(df)[, Order := frank(Week, ties.method = "dense"), ID][]
ID Week Order
1: 1 3 1
2: 1 5 2
3: 1 5 2
4: 1 8 3
5: 1 11 4
6: 1 16 5
7: 2 2 1
8: 2 2 1
9: 2 3 2
10: 2 3 2
11: 2 9 3
Data
> dput(df)
structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L), Week = c(3L, 5L, 5L, 8L, 11L, 16L, 2L, 2L, 3L, 3L, 9L)), class = "data.frame", row.names =
c(NA,
-11L))
You can use dense_rank in dplyr :
library(dplyr)
df %>% group_by(ID) %>% mutate(Order = dense_rank(Week)) %>% ungroup
# ID Week Order
# <int> <int> <int>
# 1 1 3 1
# 2 1 5 2
# 3 1 5 2
# 4 1 8 3
# 5 1 11 4
# 6 1 16 5
# 7 2 2 1
# 8 2 2 1
# 9 2 3 2
#10 2 3 2
#11 2 9 3

Order values within column according to values within different column by group in R

I have the following panel data set:
group i f r d
1 4 8 3 3
1 9 4 5 1
1 2 2 2 2
2 5 5 3 2
2 3 9 3 3
2 9 1 3 1
I want to reorder column i in this data frame according to values in column d for each group. So the highest value for group 1 in column i should correspond to the highest value in column d. In the end my data.frame should look like this:
group i f r d
1 9 8 3 3
1 2 4 5 1
1 4 2 2 2
2 5 5 3 2
2 9 9 3 3
2 3 1 3 1
Here is a dplyr solution.
First, group by group. Then get the permutation rearrangement of column d in a temporary new column, ord and use it to reorder i.
library(dplyr)
df1 %>%
group_by(group) %>%
mutate(ord = order(d),
i = i[ord]) %>%
ungroup() %>%
select(-ord)
## A tibble: 6 x 5
# group i f r d
# <int> <int> <int> <int> <int>
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 9 5 3 2
#5 2 5 9 3 3
#6 2 3 1 3 1
original (wrong)
You can achieve this using dplyr and rank:
library(dplyr)
df1 %>% group_by(group) %>%
mutate(i = i[rev(rank(d))])
Edit
This question is actually trickier than it first seems and the original answer I posted is incorrect. The correct solution orders by i before subsetting by the rank of d. This gives OP's desired output which my previous answer did not (not paying attention!)
df1 %>% group_by(group) %>%
mutate(i = i[order(i)][rank(d)])
# A tibble: 6 x 5
# Groups: group [2]
# group i f r d
# <int> <int> <int> <int> <int>
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 5 5 3 2
#5 2 9 9 3 3
#6 2 3 1 3 1
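The two readings of the question can be compared on group 2 alone, where they give different results. This is a base R sketch using just the i and d values of that group (the names by_rank and by_order are only for illustration):

```r
# Group 2 from the question
i <- c(5L, 3L, 9L)
d <- c(2L, 3L, 1L)

# Largest i goes where d is largest: sort i, then pick by the rank of d
by_rank <- sort(i)[rank(d)]

# Move i along with the sorting permutation of d
by_order <- i[order(d)]

by_rank   # 5 9 3, the output the question asks for
by_order  # 9 5 3
```

One assumption worth stating: rank(d) returns fractional ranks when d has ties, so if ties are possible within a group, something like rank(d, ties.method = "first") is probably needed before using it as an index.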
There is some confusion regarding the expected output, so here is a way to get both versions of the output.
A base R option using split and mapply
df$i <- c(mapply(function(x, y) sort(y)[x],
split(df$d, df$group), split(df$i, df$group)))
df
# group i f r d
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 5 5 3 2
#5 2 9 9 3 3
#6 2 3 1 3 1
Or another version
df$i <- c(mapply(function(x, y) y[order(x)],
split(df$d, df$group), split(df$i, df$group)))
df
# group i f r d
#1 1 9 8 3 3
#2 1 2 4 5 1
#3 1 4 2 2 2
#4 2 9 5 3 2
#5 2 5 9 3 3
#6 2 3 1 3 1
We can also use dplyr for this :
For 1st version
library(dplyr)
df %>%
group_by(group) %>%
mutate(i = sort(i)[d])
The 2nd version is already shown by @Rui using order
df %>%
group_by(group) %>%
mutate(i = i[order(d)])
An option with data.table
library(data.table)
setDT(df1)[, i := i[order(d)], group]
df1
# group i f r d
#1: 1 9 8 3 3
#2: 1 2 4 5 1
#3: 1 4 2 2 2
#4: 2 9 5 3 2
#5: 2 5 9 3 3
#6: 2 3 1 3 1
If we need the second version
setDT(df1)[, i := sort(i)[d], group]
data
df1 <- structure(list(group = c(1L, 1L, 1L, 2L, 2L, 2L), i = c(4L, 9L,
2L, 5L, 3L, 9L), f = c(8L, 4L, 2L, 5L, 9L, 1L), r = c(3L, 5L,
2L, 3L, 3L, 3L), d = c(3L, 1L, 2L, 2L, 3L, 1L)), class = "data.frame",
row.names = c(NA,
-6L))

concatenate data based on a certain sequence

My data look like this. This is just a shortened version: the real data have over 20000000 rows, with the variable 'day' ranging from 1 to 232.
day time
1 2
1 2
2 2
2 3
3 4
3 5
4 4
4 2
and I have a vector that contains 1000 randomly selected values of the variable day (1-232), say
df=c(3,4,1,2,...,4,1,3)
I want to create a new dataset that is ordered according to this sequence. That is, we first extract the rows with day=3 from the data, then the rows with day=4, then the rows with day=1, and rbind them in that order. For example, the result for the first 4 elements of the sequence should look like this:
day time
3 4
3 5
4 4
4 2
1 2
1 2
2 2
2 3
Base R method:
x <- structure(list(day = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L), time = c(2L,
2L, 2L, 3L, 4L, 5L, 4L, 2L)), class = "data.frame", row.names = c(NA,
-8L))
df <- c(3,4,1,2,4,1,3)
do.call("rbind.data.frame", lapply(df, function(i) subset(x, day == i)))
# day time
# 5 3 4
# 6 3 5
# 7 4 4
# 8 4 2
# 1 1 2
# 2 1 2
# 3 2 2
# 4 2 3
# 71 4 4
# 81 4 2
# 11 1 2
# 21 1 2
# 51 3 4
# 61 3 5
do.call("rbind.data.frame", ...) goes through the usual data.frame construction, meaning that if your real data has any columns of type character they may be converted to factors, so you will likely want to do
do.call("rbind.data.frame", c(lapply(df, function(i) subset(x, day == i)), stringsAsFactors = FALSE))
Also, it could easily be replaced (without the risk of factors) with data.table::rbindlist or dplyr::bind_rows.
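With 20 million rows, calling subset once per element of the sequence scans the whole data each time. One possible alternative, sketched here in base R, is to split the row indices by day once and then look them up in sequence order (the names day_seq, rows_by_day and idx are made up for this example):

```r
x <- data.frame(
  day  = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L),
  time = c(2L, 2L, 2L, 3L, 4L, 5L, 4L, 2L)
)
day_seq <- c(3, 4, 1, 2)  # first four elements of the sequence

# Row numbers for each day, computed in a single pass over the data
rows_by_day <- split(seq_len(nrow(x)), x$day)

# Look up each requested day by name and concatenate the row numbers
idx <- unlist(rows_by_day[as.character(day_seq)], use.names = FALSE)

res <- x[idx, ]
res
```

This assumes every value in the sequence actually occurs in the day column; a day that is missing from the data would simply contribute no rows.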
If I understand correctly, you can do this in a pretty straightforward manner with a data.table join:
library(data.table)
df <- fread(text = "day time
1 2
1 2
2 2
2 3
3 4
3 5
4 4
4 2", header = TRUE)
seqs <- data.table(day = c(3,4,1,2,4,1,3))
df[seqs, on = "day"]
#> day time
#> 1: 3 4
#> 2: 3 5
#> 3: 4 4
#> 4: 4 2
#> 5: 1 2
#> 6: 1 2
#> 7: 2 2
#> 8: 2 3
#> 9: 4 4
#> 10: 4 2
#> 11: 1 2
#> 12: 1 2
#> 13: 3 4
#> 14: 3 5
Created on 2019-02-10 by the reprex package (v0.2.1)

Split column name and its values into row and column [duplicate]

This question already has answers here:
Reshaping multiple sets of measurement columns (wide format) into single columns (long format)
(8 answers)
Closed 5 years ago.
Heyho,
I want to split my dataframe which looks like:
ID name1_attr1 name1_attr2 name2_attr2 ...
1 2 3 1
2 1 3 4
3 3 4 2
4 6 7 5
into:
ID  name  attr1  attr2
1   1     2      3
2   1     1      3
3   1     3      4
4   1     6      7
1   2            1
2   2            4
3   2            2
4   2            5
I am really not sure how to do that? Do you have any hint or start for me?
Thanks in advance :)
We can use melt from data.table which can take multiple patterns
library(data.table)
melt(setDT(df),measure = patterns("attr1", "attr2"),
value.name = c("attr1", "attr2"), variable.name = "name")
# ID name attr1 attr2
#1: 1 1 2 3
#2: 2 1 1 3
#3: 3 1 3 4
#4: 4 1 6 7
#5: 1 2 NA 1
#6: 2 2 NA 4
#7: 3 2 NA 2
#8: 4 2 NA 5
data
df <- structure(list(ID = 1:4, name1_attr1 = c(2L, 1L, 3L, 6L), name1_attr2 = c(3L,
3L, 4L, 7L), name2_attr2 = c(1L, 4L, 2L, 5L)), .Names = c("ID",
"name1_attr1", "name1_attr2", "name2_attr2"), class = "data.frame", row.names = c(NA,
-4L))
You can try a tidyverse solution using a combination of gather, separate and spread.
d <- read.table(text="ID name1_attr1 name1_attr2 name2_attr2
1 2 3 1
2 1 3 4
3 3 4 2
4 6 7 5", header=T)
library(tidyverse)
d %>%
gather(k, v, -ID) %>%
separate(k, c("name","b"), sep = "_") %>%
spread(b, v, fill = "") %>%
arrange(name)
ID name attr1 attr2
1 1 name1 2 3
2 2 name1 1 3
3 3 name1 3 4
4 4 name1 6 7
5 1 name2 1
6 2 name2 4
7 3 name2 2
8 4 name2 5
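In more recent versions of tidyr (1.0.0 and later), gather and spread have been superseded by pivot_longer and pivot_wider, and the same reshaping can probably be done in one step. This is a sketch assuming the column names always have the form name_attr with a single underscore:

```r
library(tidyr)

d <- data.frame(
  ID = 1:4,
  name1_attr1 = c(2L, 1L, 3L, 6L),
  name1_attr2 = c(3L, 3L, 4L, 7L),
  name2_attr2 = c(1L, 4L, 2L, 5L)
)

# ".value" tells pivot_longer that the part after "_" names the output
# columns (attr1, attr2), while the part before "_" goes into `name`
long <- pivot_longer(
  d,
  cols = -ID,
  names_to = c("name", ".value"),
  names_sep = "_"
)
long
```

Because there is no name2_attr1 column, attr1 is NA for the name2 rows instead of an empty string.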

Add sequence along blocks [duplicate]

This question already has answers here:
Create counter within consecutive runs of values
(3 answers)
Closed 5 years ago.
I would like to have a sequence along each Blocks as such:
Blocks MySeq
1 1
1 2
2 1
2 2
1 1
1 2
1 3
1 4
3 1
3 2
3 3
4 1
4 2
4 3
4 4
Based on this I have tried
myDf %>% dplyr::mutate(MySeq= seq(1:length(unique(Blocks)),rle(Blocks)$"lengths")
However, the sequence is not resetting with each new block. See below:
Blocks MySeq
1 1
1 2
2 1
2 2
1 3
1 4
1 5
1 6
3 1
3 2
3 3
4 1
4 2
4 3
4 4
How can I make a new sequence from each individual Blocks?
Try this
unlist(sapply(rle(df1$Blocks)$lengths,seq_len))
We can use rleid from data.table by grouping the rleid of 'Blocks' and assign (:=) 'MySeq' as the sequence of rows.
library(data.table)
setDT(df1)[, MySeq := seq_len(.N) , .(rleid(Blocks))]
df1
# Blocks MySeq
# 1: 1 1
# 2: 1 2
# 3: 2 1
# 4: 2 2
# 5: 1 1
# 6: 1 2
# 7: 1 3
# 8: 1 4
# 9: 3 1
#10: 3 2
#11: 3 3
#12: 4 1
#13: 4 2
#14: 4 3
#15: 4 4
Or if we are using base R, then sequence of lengths will get the expected output
sequence(rle(df1$Blocks)$lengths)
#[1] 1 2 1 2 1 2 3 4 1 2 3 1 2 3 4
data
df1 <- structure(list(Blocks = c(1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 3L,
3L, 3L, 4L, 4L, 4L, 4L)), .Names = "Blocks", row.names = c(NA,
-15L), class = "data.frame")
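The idea behind rleid (give each consecutive run its own id, then count within the run) can also be sketched in base R without rle. The names run_id and my_seq are only for illustration:

```r
blocks <- c(1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 4L, 4L, 4L, 4L)

# A new run starts at the first element and wherever the value changes
run_id <- cumsum(c(TRUE, blocks[-1] != blocks[-length(blocks)]))

# Count rows within each run
my_seq <- ave(seq_along(blocks), run_id, FUN = seq_along)

data.frame(Blocks = blocks, MySeq = my_seq)
```

Grouping by Blocks alone would not work here, because block 1 appears in two separate runs; the run id is what keeps them apart.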
