Incrementally add seconds of a timestamp column grouped by ID in R

I have a dataframe that is essentially time series data.
Timestamp <- c("1/27/2015 18:28:16", "1/27/2015 18:28:17", "1/27/2015 18:28:19",
               "1/27/2015 18:28:20", "1/27/2015 18:28:23", "1/28/2015 22:43:08",
               "1/28/2015 22:43:09", "1/28/2015 22:43:13", "1/28/2015 22:43:15",
               "1/28/2015 22:43:16")
ID <- c("A","A","A","A","A","B","B","B","B","B")
v1 <- c(1.70,1.71,1.77,1.79,1.63,7.20,7.26,7.16,7.18,7.18)
df <- data.frame(Timestamp, ID, v1)
Timestamp ID v1
1/27/2015 18:28:16 A 1.70
1/27/2015 18:28:17 A 1.71
1/27/2015 18:28:19 A 1.77
1/27/2015 18:28:20 A 1.79
1/27/2015 18:28:23 A 1.63
1/28/2015 22:43:08 B 7.20
1/28/2015 22:43:09 B 7.26
1/28/2015 22:43:13 B 7.16
1/28/2015 22:43:15 B 7.18
1/28/2015 22:43:16 B 7.18
Since I don't really care about the timestamp itself, I was thinking of creating a column called interval so I can plot this data in one plot.
I am wrongly creating the interval column by doing this:
df$interval <- cut(df$Timestamp, breaks="sec")
I want to incrementally add the seconds of the timestamps and put them in the interval column, grouped by ID. By this I mean: every time a new ID starts, the interval column resets to 1 and then grows with the elapsed seconds of the timestamps.
My desired output
Timestamp ID v1 Interval
1/27/2015 18:28:16 A 1.70 1
1/27/2015 18:28:17 A 1.71 2
1/27/2015 18:28:19 A 1.77 4
1/27/2015 18:28:20 A 1.79 5
1/27/2015 18:28:23 A 1.63 8
1/28/2015 22:43:08 B 7.20 1
1/28/2015 22:43:09 B 7.26 2
1/28/2015 22:43:13 B 7.16 6
1/28/2015 22:43:15 B 7.18 8
1/28/2015 22:43:16 B 7.18 9
I also would like to plot this using ggplot, interval vs v1 by ID, so that we get the 2 time series in the same plot; I will then extract features from it.
Please help me work around this problem so that I can apply it to a larger dataset.

One solution with data.table:
For the data:
library(data.table)
df <- as.data.table(df)
df$Timestamp <- as.POSIXct(df$Timestamp, format='%m/%d/%Y %H:%M:%S')
df[, Interval := as.numeric(difftime(Timestamp, .SD[1, Timestamp], units = 'secs') + 1), by = ID]
which outputs:
> df
Timestamp ID v1 Interval
1: 2015-01-27 18:28:16 A 1.70 1
2: 2015-01-27 18:28:17 A 1.71 2
3: 2015-01-27 18:28:19 A 1.77 4
4: 2015-01-27 18:28:20 A 1.79 5
5: 2015-01-27 18:28:23 A 1.63 8
6: 2015-01-28 22:43:08 B 7.20 1
7: 2015-01-28 22:43:09 B 7.26 2
8: 2015-01-28 22:43:13 B 7.16 6
9: 2015-01-28 22:43:15 B 7.18 8
10: 2015-01-28 22:43:16 B 7.18 9
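For comparison, a dplyr version of the same per-ID offset (a sketch, not part of the original answer; it assumes Timestamp has already been converted to POSIXct as above):
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(Interval = as.numeric(difftime(Timestamp, first(Timestamp),
                                        units = "secs")) + 1) %>%
  ungroup()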
Then for ggplot:
library(ggplot2)
ggplot(df, aes(x=Interval, y=v1, color=ID)) + geom_line()
and the graph (image not preserved): a line plot of v1 against Interval, one line per ID.

Related

Group weekly data and summarize by month in R with dplyr

I have a dataset of weekly mortgage rate data.
The data looks very simple:
library(tibble)
library(lubridate)
df <- tibble(
  Date = as_date(c("2/7/2008", "2/14/2008", "2/21/2008", "2/28/2008", "3/6/2008"),
                 format = "%m/%d/%Y"),
  Rate = c(5.67, 5.72, 6.04, 6.24, 6.03)
)
I am trying to group it and summarize by month.
This blogpost and this answer are not what I want, because they just add the month column.
They give me the output:
month Date summary_variable
2008-02-01 2008-02-07 5.67
2008-02-01 2008-02-14 5.72
2008-02-01 2008-02-21 6.04
2008-02-01 2008-02-28 6.24
My desired output (ideally the last day of the month):
Month Average rate
2/28/2008 6
3/31/2008 6.1
4/30/2008 5.9
The numbers in the output above are made up, not real calculations.
We can extract the month as a column and do a group-by mean:
library(dplyr)
library(lubridate)
library(zoo)
# df1 below is the poster's full weekly dataset (columns DATE and MORTGAGE30US),
# not the five-row example df above
df1 %>%
  group_by(Month = as.Date(as.yearmon(mdy(DATE)), 1)) %>%
  summarise(Average_rate = mean(MORTGAGE30US))
Output:
# A tibble: 151 x 2
# Month Average_rate
# <date> <dbl>
# 1 2008-02-29 5.92
# 2 2008-03-31 5.97
# 3 2008-04-30 5.92
# 4 2008-05-31 6.04
# 5 2008-06-30 6.32
# 6 2008-07-31 6.43
# 7 2008-08-31 6.48
# 8 2008-09-30 6.04
# 9 2008-10-31 6.2
#10 2008-11-30 6.09
# … with 141 more rows
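Applied to the small df from the question (columns Date and Rate, with Date already a Date, so no mdy() is needed), the same yearmon idea would read, as a sketch:
library(dplyr)
library(zoo)
df %>%
  group_by(Month = as.Date(as.yearmon(Date), frac = 1)) %>%  # frac = 1 -> last day of month
  summarise(Average_rate = mean(Rate))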

AUC based on Riemann sum in R

I am dealing with a dataset with dates and various response values at different time intervals as shown below
Id Date Response
1 2008-03-12 4.88
1 2009-06-06 5.39
2 2015-10-22 8.61
2 2019-09-26 6.20
3 2006-09-28 7.40
3 2009-07-15 7.25
3 2011-01-19 9.50
Dates are the x values, Response the y values.
I am interested in estimating the AUC (area under the curve) for each Id. Any suggestions for accomplishing this are much appreciated.
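No answer is preserved for this question. As a starting point, here is a sketch of a trapezoidal-rule AUC per Id using dplyr; dates are treated as numeric day counts, and the data.frame simply re-creates the table above:
library(dplyr)
df <- data.frame(
  Id       = c(1, 1, 2, 2, 3, 3, 3),
  Date     = as.Date(c("2008-03-12", "2009-06-06", "2015-10-22", "2019-09-26",
                       "2006-09-28", "2009-07-15", "2011-01-19")),
  Response = c(4.88, 5.39, 8.61, 6.20, 7.40, 7.25, 9.50)
)
df %>%
  arrange(Id, Date) %>%
  group_by(Id) %>%
  summarise(AUC = sum(diff(as.numeric(Date)) *                    # interval widths in days
                        (head(Response, -1) + tail(Response, -1)) / 2))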

R - find change across same days of week for multiple variables and aggregate

With data like below
text = "
date,weekday,hour,a,b
12/2/2019,Mon,8,18.17183824,0.017741935
12/2/2019,Mon,9,18.11228506,0.020967742
12/9/2019,Mon,8,16.77932274,0.020322581
12/9/2019,Mon,9,16.97327971,0.019677419
12/3/2019,Tue,8,18.17183824,0.017741935
12/3/2019,Tue,10,18.11228506,0.020967742
12/10/2019,Tue,8,16.77932274,0.020322581
12/10/2019,Tue,10,16.97327971,0.019677419
"
df = read.table(textConnection(text), sep=",", header = T)
I need to find the week-over-week change in the variables a and b, matched by weekday and hour.
For example, for a the changes would be calculated as follows:
Change for hour 8 on Mondays = (16.77932274 - 18.17183824)/18.17183824
Change for hour 9 on Mondays = (16.97327971 - 18.11228506)/18.11228506
Change for hour 8 on Tuesdays = (16.77932274 - 18.17183824)/18.17183824
Change for hour 10 on Tuesdays = (16.97327971 - 18.11228506)/18.11228506
Average change for variable a in the dataset = average of the four changes above
Would appreciate help.
For one variable, I would have converted from long to wide format and computed the gain for each pair of matching weekdays, using the week number as a label for the values of a. The challenge is doing this for multiple variables (a and b here); my real data has more than these two.
We can group_by weekday and hour, use lead/lag to get the next/previous value, and use mutate_at to apply this to multiple columns.
library(dplyr)
df %>%
  group_by(weekday, hour) %>%
  mutate_at(vars(a:b), list(change = ~ (lead(.) - .)/.))
# date weekday hour a b a_change b_change
# <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>
#1 12/2/2019 Mon 8 18.2 0.0177 -0.0766 0.145
#2 12/2/2019 Mon 9 18.1 0.0210 -0.0629 -0.0615
#3 12/9/2019 Mon 8 16.8 0.0203 NA NA
#4 12/9/2019 Mon 9 17.0 0.0197 NA NA
#5 12/3/2019 Tue 8 18.2 0.0177 -0.0766 0.145
#6 12/3/2019 Tue 10 18.1 0.0210 -0.0629 -0.0615
#7 12/10/2019 Tue 8 16.8 0.0203 NA NA
#8 12/10/2019 Tue 10 17.0 0.0197 NA NA
Here is an option with data.table
library(data.table)
setDT(df)[, c('a_change', 'b_change') := (shift(.SD, type = 'lead') - .SD)/.SD,
          .(weekday, hour), .SDcols = a:b]
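The question also asks for the average change per variable. Assuming the dplyr result above has been stored as df2 (a name introduced here for illustration), a sketch using across() (requires dplyr >= 1.0):
library(dplyr)
df2 %>%
  ungroup() %>%
  summarise(across(ends_with("_change"), ~ mean(.x, na.rm = TRUE)))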

Select rows of one level of a category, conditional on a previous row of another level of the same category less than 180 s earlier

I have a dataframe (df1 below) that summarizes activity or depth for several fish individuals (ID) over time (DateTime). Here is an example:
df1 <- data.frame(
  ID = c(1,1,2,3,1,2,3,1,2,2,3,1,3,2,3),
  DateTime = c("2017-05-08 10:15:23","2017-05-08 10:19:31","2017-05-08 10:11:12",
               "2017-05-08 10:02:23","2017-05-08 10:21:32","2017-05-08 10:15:52",
               "2017-05-08 10:13:23","2017-05-08 10:22:19","2017-05-08 10:19:42",
               "2017-05-08 10:21:27","2017-05-08 10:16:07","2017-05-08 10:24:53",
               "2017-05-08 10:28:39","2017-05-08 10:23:48","2017-05-08 10:33:01"),
  DataType = c("Activity","Depth","Depth","Activity","Activity","Activity","Depth",
               "Depth","Activity","Depth","Activity","Depth","Depth","Activity","Activity"),
  Value = c(0.89,24,19,1.8,1.1,0.7,17,28,2.1,20,1.35,12,19,0.4,0.97)
)
df1
ID DateTime DataType Value
1 1 2017-05-08 10:15:23 Activity 0.89
2 1 2017-05-08 10:19:31 Depth 24.00
3 2 2017-05-08 10:11:12 Depth 19.00
4 3 2017-05-08 10:02:23 Activity 1.80
5 1 2017-05-08 10:21:32 Activity 1.10
6 2 2017-05-08 10:15:52 Activity 0.70
7 3 2017-05-08 10:13:23 Depth 17.00
8 1 2017-05-08 10:22:19 Depth 28.00
9 2 2017-05-08 10:19:42 Activity 2.10
10 2 2017-05-08 10:21:27 Depth 20.00
11 3 2017-05-08 10:16:07 Activity 1.35
12 1 2017-05-08 10:24:53 Depth 12.00
13 3 2017-05-08 10:28:39 Depth 19.00
14 2 2017-05-08 10:23:48 Activity 0.40
15 3 2017-05-08 10:33:01 Activity 0.97
For methodological reasons, I need to select the activity values that meet one condition: there is a previous depth record less than 3 minutes earlier for the same individual. That is, I need the activity data for which a depth record exists within the preceding 3 minutes. The resulting dataframe should contain those activity values as well as the preceding depth values.
I would expect something like this:
> df2
ID DateTime DataType Value
1 1 2017-05-08 10:19:31 Depth 24.00
2 1 2017-05-08 10:21:32 Activity 1.10 # Activity value within 3 minutes of a depth record
3 2 2017-05-08 10:21:27 Depth 20.00
4 2 2017-05-08 10:23:48 Activity 0.40 # Activity value within 3 minutes of a depth record
5 3 2017-05-08 10:13:23 Depth 17.00
6 3 2017-05-08 10:16:07 Activity 1.35 # Activity value within 3 minutes of a depth record
Does anyone know how to do it?
We first convert DateTime to POSIXct, create a new column holding the latest "Depth" time, subtract that "Depth" time from the current DateTime within each group (ID), and select the rows where DataType == 'Activity' and the time difference is less than 180 seconds.
library(dplyr)
df1 %>%
  mutate(DateTime = as.POSIXct(DateTime),
         diffTime = replace(DateTime, DataType != "Depth", NA)) %>%
  arrange(ID, DateTime) %>%
  group_by(ID) %>%
  tidyr::fill(diffTime) %>%
  mutate(diffTime = difftime(DateTime, diffTime, units = "secs")) %>%
  slice({i1 <- which(DataType == 'Activity' & diffTime < 180); c(i1 - 1, i1)}) %>%
  select(-diffTime)
# ID DateTime DataType Value
# <dbl> <dttm> <fct> <dbl>
#1 1 2017-05-08 10:19:31 Depth 24
#2 1 2017-05-08 10:21:32 Activity 1.1
#3 2 2017-05-08 10:21:27 Depth 20
#4 2 2017-05-08 10:23:48 Activity 0.4
#5 3 2017-05-08 10:13:23 Depth 17
#6 3 2017-05-08 10:16:07 Activity 1.35
Here is an option using a non-equi join in data.table: for each Depth row with a match, rbind that Depth row with the Activity row that falls within 3 minutes of it:
library(data.table)
cols <- names(df1)
setDT(df1)[, DateTime := as.POSIXct(DateTime, format = "%Y-%m-%d %H:%M:%S")][,
  c("start", "end") := .(DateTime, DateTime + 3*60)]
ans <- df1[DataType == "Activity"][df1[DataType == "Depth"],
  on = .(ID, start >= start, start <= end), nomatch = 0L,
  by = .EACHI, rbindlist(use.names = FALSE,
    list(mget(paste0("i.", cols)), mget(cols)))
][, (1:3) := NULL]   # remove unwanted columns
# set column names as desired
setnames(ans, gsub("i.", "", names(ans), fixed = TRUE))[]
output:
ID DateTime DataType Value
1: 1 2017-05-08 10:19:31 Depth 24.00
2: 1 2017-05-08 10:21:32 Activity 1.10
3: 3 2017-05-08 10:13:23 Depth 17.00
4: 3 2017-05-08 10:16:07 Activity 1.35
5: 2 2017-05-08 10:21:27 Depth 20.00
6: 2 2017-05-08 10:23:48 Activity 0.40
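The row order differs from the dplyr result above; if that matters, ans can be reordered in place (a small sketch using data.table's setorder):
# reorder the non-equi-join result by ID and time to match the dplyr output
setorder(ans, ID, DateTime)[]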

How to convert day-wise (daily) data to monthly data using R? [duplicate]

This question already has answers here:
Aggregate Daily Data to Month/Year intervals
(9 answers)
Closed 7 years ago.
I have day-wise interest-rate data covering 15 years, from 01-01-2000 to 01-01-2015.
I want to convert this to monthly data, keeping only month and year.
I want to take the mean of the values of all the days in a month and make it that month's single value.
How can I do this in R?
> str(mibid)
'data.frame': 4263 obs. of 6 variables:
$ Days: int 1 2 3 4 5 6 7 8 9 10 ...
$ Date: Date, format: "2000-01-03" "2000-01-04" "2000-01-05" "2000-01-06" ...
$ BID : num 8.82 8.82 8.88 8.79 8.78 8.8 8.81 8.82 8.86 8.78 ...
$ I.S : num 0.092 0.0819 0.0779 0.0801 0.074 0.0766 0.0628 0.0887 0.0759 0.073 ...
$ BOR : num 9.46 9.5 9.52 9.36 9.33 9.37 9.42 9.39 9.4 9.33 ...
$ R.S : num 0.0822 0.0817 0.0828 0.0732 0.084 0.0919 0.0757 0.0725 0.0719 0.0564 ...
> head(mibid)
Days Date BID I.S BOR R.S
1 1 2000-01-03 8.82 0.0920 9.46 0.0822
2 2 2000-01-04 8.82 0.0819 9.50 0.0817
3 3 2000-01-05 8.88 0.0779 9.52 0.0828
4 4 2000-01-06 8.79 0.0801 9.36 0.0732
5 5 2000-01-07 8.78 0.0740 9.33 0.0840
6 6 2000-01-08 8.80 0.0766 9.37 0.0919
I'd do this with xts:
set.seed(21)
mibid <- data.frame(Date = Sys.Date() - 100:1,
                    BID = rnorm(100, 8, 0.1), I.S = rnorm(100, 0.08, 0.01),
                    BOR = rnorm(100, 9, 0.1), R.S = rnorm(100, 0.08, 0.01))
require(xts)
# convert to xts
xmibid <- xts(mibid[,-1], mibid[,1])
# aggregate
agg_xmibid <- apply.monthly(xmibid, colMeans)
# convert back to data.frame
agg_mibid <- data.frame(Date=index(agg_xmibid), agg_xmibid, row.names=NULL)
head(agg_mibid)
# Date BID I.S BOR R.S
# 1 2015-04-30 8.079301 0.07189111 9.074807 0.06819096
# 2 2015-05-31 7.987479 0.07888328 8.999055 0.08090253
# 3 2015-06-30 8.043845 0.07885779 9.018338 0.07847999
# 4 2015-07-31 7.990822 0.07799489 8.980492 0.08162038
# 5 2015-08-07 8.000414 0.08535749 9.044867 0.07755017
A small example of how this might be done using dplyr and lubridate
set.seed(321)
dat <- data.frame(day = seq.Date(as.Date("2010-01-01"), length.out = 200, by = "day"),
                  x = rnorm(200),
                  y = rexp(200))
head(dat)
day x y
1 2010-01-01 1.7049032 2.6286754
2 2010-01-02 -0.7120386 0.3916089
3 2010-01-03 -0.2779849 0.1815379
4 2010-01-04 -0.1196490 0.1234461
5 2010-01-05 -0.1239606 2.2237404
6 2010-01-06 0.2681838 0.3217511
require(dplyr)
require(lubridate)
dat %>%
  mutate(year = year(day),
         monthnum = month(day),
         month = month(day, label = TRUE)) %>%
  group_by(year, month) %>%
  arrange(year, monthnum) %>%
  select(-monthnum) %>%
  summarise(x = mean(x),
            y = mean(y))
Source: local data frame [7 x 4]
Groups: year
year month x y
1 2010 Jan 0.02958633 0.9387509
2 2010 Feb 0.07711820 1.0985411
3 2010 Mar -0.06429982 1.2395438
4 2010 Apr -0.01787658 1.3627864
5 2010 May 0.19131861 1.1802712
6 2010 Jun -0.04894075 0.8224855
7 2010 Jul -0.22410057 1.1749863
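A more compact variant of the same idea, grouping directly on lubridate::floor_date (a sketch; the group key becomes the first day of each month instead of separate year/month columns):
library(dplyr)
library(lubridate)
dat %>%
  group_by(month = floor_date(day, "month")) %>%
  summarise(x = mean(x), y = mean(y))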
Another option is data.table, which has several very convenient datetime functions. Using the data of @SamThomas:
library(data.table)
setDT(dat)[, lapply(.SD, mean), by=.(year(day), month(day))]
this gives:
year month x y
1: 2010 1 0.02958633 0.9387509
2: 2010 2 0.07711820 1.0985411
3: 2010 3 -0.06429982 1.2395438
4: 2010 4 -0.01787658 1.3627864
5: 2010 5 0.19131861 1.1802712
6: 2010 6 -0.04894075 0.8224855
7: 2010 7 -0.22410057 1.1749863
On the data of @JoshuaUlrich:
setDT(mibid)[, lapply(.SD, mean), by=.(year(Date), month(Date))]
gives:
year month BID I.S BOR R.S
1: 2015 5 7.997178 0.07794925 8.999625 0.08062426
2: 2015 6 8.034805 0.07940600 9.019823 0.07823314
3: 2015 7 7.989371 0.07822263 8.996015 0.08195401
4: 2015 8 8.010541 0.08364351 8.982793 0.07748399
If you want the names of the months instead of numbers, include [, Date := as.IDate(Date)] after the setDT() part and use months instead of month (shown here on the mibid data):
setDT(mibid)[, Date:=as.IDate(Date)][, lapply(.SD, mean), by=.(year(Date), months(Date))]
Note: especially on larger datasets, data.table will probably be (a lot) faster than the other two solutions.
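For completeness, base R's aggregate() can produce the same monthly means without extra packages (a sketch on the dat object from above):
# group by a year-month label and average each remaining column
aggregate(cbind(x, y) ~ format(day, "%Y-%m"), data = dat, FUN = mean)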
