How to merge date and time into one variable - r

I want to combine a date variable and a time variable into one variable, like this: 2012-05-02 07:30
This code does the job, but I need to store the combined variable as a new column in the data frame, and this code only prints it to the console:
as.POSIXct(paste(data$Date, data$Time), format="%Y-%m-%d %H:%M")
This code is supposed to combine date and time, but it doesn't seem to: only the date appears in the column "Combined".
data$Combined = as.POSIXct(paste0(data$Date,data$Time))
Here's the data:
structure(list(Date = structure(c(17341, 18198, 17207, 17023,
17508, 17406, 18157, 17931, 17936, 18344), class = "Date"), Time = c("08:40",
"10:00", "22:10", "18:00", "08:00", "04:30", "20:00", "15:40",
"11:00", "07:00")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))

We could use the ymd_hm function from the lubridate package:
library(lubridate)
# paste() puts a space between date and time before parsing
df$Date_time <- ymd_hm(paste(df$Date, df$Time))
Date Time Date_time
<date> <chr> <dttm>
1 2017-06-24 08:40 2017-06-24 08:40:00
2 2019-10-29 10:00 2019-10-29 10:00:00
3 2017-02-10 22:10 2017-02-10 22:10:00
4 2016-08-10 18:00 2016-08-10 18:00:00
5 2017-12-08 08:00 2017-12-08 08:00:00
6 2017-08-28 04:30 2017-08-28 04:30:00
7 2019-09-18 20:00 2019-09-18 20:00:00
8 2019-02-04 15:40 2019-02-04 15:40:00
9 2019-02-09 11:00 2019-02-09 11:00:00
10 2020-03-23 07:00 2020-03-23 07:00:00
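For comparison, here is a minimal base R sketch of the same fix. It assumes the data frame is called `data`, as in the question, and that the times are UTC; the question does not give a time zone. Two things matter. The result is assigned with `<-` to a new column, so it is stored instead of only printed. `paste()` with a space plus an explicit `format` also avoids depending on how `as.POSIXct()` guesses the format of a glued string such as "2017-06-2408:40".

```r
# Minimal stand-in for the question's data (first three rows only)
data <- data.frame(
  Date = as.Date(c(17341, 18198, 17207), origin = "1970-01-01"),
  Time = c("08:40", "10:00", "22:10"),
  stringsAsFactors = FALSE
)

# paste() puts a space between date and time; the explicit format matches it.
# tz = "UTC" is an assumption: the question does not say which time zone applies.
data$Combined <- as.POSIXct(paste(data$Date, data$Time),
                            format = "%Y-%m-%d %H:%M", tz = "UTC")
data
```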


R Populate a datetime column starting from last row

I have a data frame that I need to add a datetime column to. It records water levels every hour for 2 years. The original data frame has the wrong dates and times, e.g. the dates say 2015 instead of 2020, and the day and month are also wrong. I do not know the original start date and time. However, I know the date and time of the very last recording (28-03-2022 14:00:00). I need to calculate the column from the bottom to the top to work out the original start date.
Current Code
I have this code, which populates the dates from a known start date (i.e. top down), but I want to populate the data from the bottom up. Is there a way to alter this, or another solution?
# recalculate date to correct date
# set start dates
startDate5 <- as.POSIXct("2020-03-05 17:00:00")
startDateMere <- as.POSIXct("2020-07-06 17:00:00")
# find length of dataframe to populate required rows.
len5 <- max(dataList$`HMB 5`$Rec)
lenMere <- max(dataList$`HM SSSI 4`$Rec)
# calculate new date column
dataList$`HMB 5`$DateTimeNew <- seq(startDate5, by='hour', length.out=len5)
dataList$`HM SSSI 4`$DateTimeNew <-seq(startDateMere, by='hour', length.out=lenMere)
Current dataframe - top 10 rows
structure(list(Rec = 1:10, DateTime = structure(c(1436202000,
1436205600, 1436209200, 1436212800, 1436216400, 1436220000, 1436223600,
1436227200, 1436230800, 1436234400), class = c("POSIXct", "POSIXt"
), tzone = "GMT"), Temperature = c(16.59, 16.49, 16.74, 17.14,
17.47, 17.71, 18.43, 18.78, 19.06, 19.18), Pressure = c(1050.64,
1050.86, 1051.28, 1051.56, 1051.48, 1051.2, 1051.12, 1050.83,
1050.83, 1050.76), DateTimeNew = structure(c(1594051200L, 1594054800L,
1594058400L, 1594062000L, 1594065600L, 1594069200L, 1594072800L,
1594076400L, 1594080000L, 1594083600L), class = c("POSIXct",
"POSIXt"), tzone = "")), row.names = c(NA, 10L), class = "data.frame")
Desired Output
In the desired output, the date I know to be correct is, for example, '2020-07-07 02:00:00' (the value in the 10th row, final column), and I need to work out the rest of the column from this value.
NB: I do not actually know what the original start date (2020-07-06 17:00:00) should be; it's just illustrative.
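Here's a sequence method: start from the known timestamp of the last row, step backwards one hour per row, then reverse the sequence so it runs forwards in time:
# known (correct) timestamp of the LAST recording; illustrative value
endDateMere <- as.POSIXct("2020-07-06 17:00:00")
new_date <- seq(endDateMere, length.out = nrow(data), by = "-1 hour")
data$result <- rev(new_date)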
data
# Rec DateTime Temperature Pressure DateTimeNew result
# 1 1 2015-07-06 17:00:00 16.59 1050.64 2020-07-06 12:00:00 2020-07-06 08:00:00
# 2 2 2015-07-06 18:00:00 16.49 1050.86 2020-07-06 13:00:00 2020-07-06 09:00:00
# 3 3 2015-07-06 19:00:00 16.74 1051.28 2020-07-06 14:00:00 2020-07-06 10:00:00
# 4 4 2015-07-06 20:00:00 17.14 1051.56 2020-07-06 15:00:00 2020-07-06 11:00:00
# 5 5 2015-07-06 21:00:00 17.47 1051.48 2020-07-06 16:00:00 2020-07-06 12:00:00
# 6 6 2015-07-06 22:00:00 17.71 1051.20 2020-07-06 17:00:00 2020-07-06 13:00:00
# 7 7 2015-07-06 23:00:00 18.43 1051.12 2020-07-06 18:00:00 2020-07-06 14:00:00
# 8 8 2015-07-07 00:00:00 18.78 1050.83 2020-07-06 19:00:00 2020-07-06 15:00:00
# 9 9 2015-07-07 01:00:00 19.06 1050.83 2020-07-06 20:00:00 2020-07-06 16:00:00
# 10 10 2015-07-07 02:00:00 19.18 1050.76 2020-07-06 21:00:00 2020-07-06 17:00:00
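An arithmetic alternative, sketched under assumptions: row i lies (n - i) hours before the last row, so the whole column can be computed in one vectorised step without `seq()` and `rev()`. The small `df` below is a hypothetical stand-in for one element of `dataList`. The last timestamp comes from the question (28-03-2022 14:00:00). Treating it as UTC is an assumption. If the logger recorded local clock time, a daylight-saving change inside the record would shift the earlier timestamps by an hour, so the time zone choice matters.

```r
# Hypothetical stand-in for one logger's data frame (10 hourly records)
df <- data.frame(Rec = 1:10)

# Known timestamp of the final recording; tz = "UTC" is an assumption
last_time <- as.POSIXct("2022-03-28 14:00:00", tz = "UTC")

# Row i is (n - i) hours, i.e. 3600 * (n - i) seconds, before the last row
n <- nrow(df)
df$DateTimeNew <- last_time - 3600 * (n - seq_len(n))
df
```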

How to sum or average multiple overlapping time intervals with lubridate?

I have a log of many years of meditation sittings, each with a start and end time. I want to create nice plots of my most active times of the day. (In other words, how often relatively I am meditating at 7am versus other times of day?)
ID StartTime EndTime
1 2679 2019-03-23 07:00:00 2019-03-23 07:30:00
2 2678 2019-03-22 07:00:00 2019-03-22 07:30:00
3 2677 2019-03-21 07:00:00 2019-03-21 07:30:00
4 2676 2019-03-20 07:00:00 2019-03-20 07:30:00
5 2675 2019-03-19 07:00:00 2019-03-19 07:30:00
6 2674 2019-03-18 09:00:00 2019-03-18 09:30:00
7 2673 2019-03-18 09:00:00 2019-03-18 09:30:00
8 2672 2019-03-18 09:00:00 2019-03-18 10:00:00
9 2671 2019-03-15 07:00:00 2019-03-15 08:00:00
10 2670 2019-03-14 07:00:00 2019-03-14 08:00:00
dput version:
structure(list(ID = 2679:2670, StartTime = structure(c(1553324400,
1553238000, 1553151600, 1553065200, 1552978800, 1552899600, 1552899600,
1552899600, 1552633200, 1552546800), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), EndTime = structure(c(1553326200, 1553239800,
1553153400, 1553067000, 1552980600, 1552901400, 1552901400, 1552903200,
1552636800, 1552550400), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,
-10L), class = "data.frame")
I can hack this by turning each day into a 1440-element array of included/excluded minutes and summing these up.
But I feel there must be a better way, probably using lubridate's Interval objects and/or dplyr, but I haven't worked out how to do this.
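For reference, the minute-array approach described above can be done without an explicit per-day loop. The following is a minimal base R sketch, not a lubridate Interval solution. It expands each sitting into the minutes of the day it covers (end-exclusive), counts how many sittings cover each minute, and then rolls the counts up to hours. It uses three rows from the question's data and assumes UTC, as in the `dput` output.

```r
# Three sittings from the question's data (times in UTC)
sittings <- data.frame(
  StartTime = as.POSIXct(c("2019-03-23 07:00:00", "2019-03-18 09:00:00",
                           "2019-03-15 07:00:00"), tz = "UTC"),
  EndTime   = as.POSIXct(c("2019-03-23 07:30:00", "2019-03-18 10:00:00",
                           "2019-03-15 08:00:00"), tz = "UTC")
)

# Minute of the day (0..1439) at which each sitting starts
start_min <- as.numeric(format(sittings$StartTime, "%H")) * 60 +
             as.numeric(format(sittings$StartTime, "%M"))
# Length of each sitting in whole minutes
dur_min <- as.numeric(difftime(sittings$EndTime, sittings$StartTime, units = "mins"))

# Minutes covered by each sitting (end-exclusive), wrapping past midnight
covered <- unlist(mapply(function(s, d) (s + seq_len(d) - 1) %% 1440,
                         start_min, dur_min, SIMPLIFY = FALSE))

# How many sittings cover each minute of the day (index 1 = 00:00)
minute_counts <- tabulate(covered + 1, nbins = 1440)

# Total meditated minutes falling in each hour of the day (0..23)
hourly_profile <- tapply(minute_counts, rep(0:23, each = 60), sum)
```

`minute_counts` or `hourly_profile` can then be passed to a plotting function such as `barplot()`.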

Calculation of the maximum duration over threshold in R (timeseries)

I have an xts time series of temperature data at 5-minute resolution:
head(dataset)
Time Temp
2016-04-26 10:00:00 6.877
2016-04-26 10:05:00 6.877
2016-04-26 10:10:00 6.978
2016-04-26 10:15:00 6.978
2016-04-26 10:20:00 6.978
I want to calculate the longest period during which the temperature exceeds a certain threshold (let's say 20 °C).
More generally, I want to find all periods in which the temperature exceeds the threshold, along with their durations.
I create a data.frame from my xts-data:
df=data.frame(Time=index(dataset),coredata(dataset))
head(df)
Time Temp
1 2016-04-26 10:00:00 6.877
2 2016-04-26 10:05:00 6.877
3 2016-04-26 10:10:00 6.978
4 2016-04-26 10:15:00 6.978
5 2016-04-26 10:20:00 6.978
6 2016-04-26 10:25:00 7.079
Then I create a subset with only the data that exceed the threshold:
sub=(subset(x=df,subset = df$Temp>20))
head(sub)
Time Temp
7514 2016-05-22 12:05:00 20.043
7515 2016-05-22 12:10:00 20.234
7516 2016-05-22 12:15:00 20.329
7517 2016-05-22 12:20:00 20.424
7518 2016-05-22 12:25:00 20.615
7519 2016-05-22 12:30:00 20.805
But now I'm having trouble calculating the duration of each event in which the temperature exceeds the threshold. How do I identify a connected period and calculate its duration?
I would be happy if you have a solution for this question (it's my first thread, so please excuse minor mistakes). If you need more information on my data, feel free to ask.
This may work. Take this example data:
df <- structure(list(Time = structure(c(1463911500, 1463911800, 1463912100,
1463912400, 1463912700, 1463913000), class = c("POSIXct", "POSIXt"
), tzone = ""), Temp = c(20.043, 20.234, 6.329, 20.424, 20.615,
20.805)), row.names = c(NA, -6L), class = "data.frame")
> df
Time Temp
1 2016-05-22 12:05:00 20.043
2 2016-05-22 12:10:00 20.234
3 2016-05-22 12:15:00 6.329
4 2016-05-22 12:20:00 20.424
5 2016-05-22 12:25:00 20.615
6 2016-05-22 12:30:00 20.805
library(dplyr)
df %>%
# add an id for the different periods/events; rleid() comes from the data.table package
mutate(tmp_Temp = Temp > 20, id = data.table::rleid(tmp_Temp)) %>%
# keep only periods with high temperature
filter(tmp_Temp) %>%
# for each period/event, get its duration
group_by(id) %>%
summarise(event_duration = difftime(last(Time), first(Time)))
id event_duration
<int> <time>
1 1 5 mins
2 3 10 mins
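If you would rather avoid the data.table dependency, base R's `rle()` finds the same runs. This is a minimal sketch using the example data above, assuming UTC. Like the dplyr answer, it measures duration from the first to the last sample of each event, so a single-sample event gets 0 minutes. Adding the 5-minute sampling step would be a different, equally defensible convention.

```r
# Example data from the answer above (5-minute samples); tz = "UTC" is an assumption
df <- data.frame(
  Time = as.POSIXct("2016-05-22 12:05:00", tz = "UTC") + 300 * 0:5,
  Temp = c(20.043, 20.234, 6.329, 20.424, 20.615, 20.805)
)
threshold <- 20

# Run-length encode the above/below-threshold flag
runs <- rle(df$Temp > threshold)
# Row index where each run ends, and where it starts
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
# Keep only the runs that are above the threshold
above <- runs$values

events <- data.frame(
  start = df$Time[starts[above]],
  end   = df$Time[ends[above]]
)
# Duration from first to last sample, as in the dplyr answer
events$duration <- difftime(events$end, events$start, units = "mins")

# The longest event above the threshold
longest <- events[which.max(as.numeric(events$duration)), ]
events
longest
```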

Changing quarter-hourly data into hourly data

I have the data below, covering 01.01.2015 to 31.12.2015.
The data is at 15-minute (quarter-hourly) resolution, but I want to sum, for example, the 0:00, 0:15, 0:30 and 0:45 values to get one value per hour. How can I convert this into hourly data?
Thank you in advance.
Date Hour Day-ahead Total Load Forecast [MW] - Germany (DE)
01.01.2015 0:00 42955
01.01.2015 0:15 42412
01.01.2015 0:30 41901
01.01.2015 0:45 41355
01.01.2015 1:00 40710
01.01.2015 1:15 40204
01.01.2015 1:30 39640
01.01.2015 1:45 39324
01.01.2015 2:00 39002
01.01.2015 2:15 38869
01.01.2015 2:30 38783
01.01.2015 2:45 38598
01.01.2015 3:00 38626
01.01.2015 3:15 38459
01.01.2015 3:30 38414
...
> dput(head(new3))
structure(list(Date = structure(c(16436, 16436, 16436, 16436,
16436, 16436), class = "Date"), Hour = c("0:00", "0:15", "0:30",
"0:45", "1:00", "1:15"), Dayahead = c("42955", "42412", "41901",
"41355", "40710", "40204"), Actual = c(42425L, 42021L, 42068L,
41874L, 41230L, 40810L), Difference = c("530", "391", "-167",
"-519", "-520", "-606")), .Names = c("Date", "Hour", "Dayahead",
"Actual", "Difference"), row.names = c(NA, 6L), class = "data.frame")
I've created a small example data set:
df <- read.csv(text = "Date,Hour,Val
2013-06-03,06:01,0
2013-06-03,12:08,-1
2013-06-03,12:48,3.3
2013-06-03,13:58,2
2013-06-03,13:01,12
2013-06-03,13:08,3
2013-06-03,14:48,4
2013-06-03,14:58,8
2013-06-03,15:01,9.2
2013-06-03,15:08,12.3
2013-06-03,16:48,0
2013-06-03,19:58,-10", stringsAsFactors = FALSE)
With group_by and summarize from dplyr and floor_date from lubridate this can be done:
library(dplyr)
library(lubridate)
df %>%
group_by(Hours=floor_date(ymd_hm(paste(Date, Hour)), "1 hour")) %>%
summarize(Val=sum(Val))
# # A tibble: 7 x 2
# Hours Val
# <dttm> <dbl>
# 1 2013-06-03 06:00:00 0
# 2 2013-06-03 12:00:00 2.30
# 3 2013-06-03 13:00:00 17.0
# 4 2013-06-03 14:00:00 12.0
# 5 2013-06-03 15:00:00 21.5
# 6 2013-06-03 16:00:00 0
# 7 2013-06-03 19:00:00 -10.0
Let's say your data frame is called df:
> head(df)
Date Hour Forecast
1 01.01.2015 12:00:00 AM 42955
2 01.01.2015 12:15:00 AM 42412
3 01.01.2015 12:30:00 AM 41901
4 01.01.2015 12:45:00 AM 41355
5 01.01.2015 01:00:00 AM 40710
6 01.01.2015 01:15:00 AM 40204
You can aggregate your forecast to an hourly basis with the following code:
library(lubridate)
library(magrittr)  # provides %>%
library(plyr)      # provides ddply() and .()
df$DateTime <- paste(df$Date, df$Hour, sep = " ") %>% dmy_hms() %>% floor_date(unit = "hour")
result <- ddply(df, .(DateTime), summarize, x = sum(Forecast))
> result
DateTime x
1 2015-01-01 00:00:00 168623
2 2015-01-01 01:00:00 159878
3 2015-01-01 02:00:00 155252
4 2015-01-01 03:00:00 115499
Column x holds the sum of the forecasts for each hour; the 00:00:00 timestamp aggregates 00:00, 00:15, 00:30 and 00:45.
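A base R sketch that works directly on the question's own `dput` output, with no extra packages. It assumes UTC. In that data `Dayahead` is stored as text, so it has to be converted with `as.numeric()` before summing. Each timestamp is labelled with the start of its hour, and `aggregate()` sums within each label.

```r
# First six rows of the question's data (dput(head(new3))); Dayahead is character there
new3 <- data.frame(
  Date = as.Date(rep(16436, 6), origin = "1970-01-01"),
  Hour = c("0:00", "0:15", "0:30", "0:45", "1:00", "1:15"),
  Dayahead = c("42955", "42412", "41901", "41355", "40710", "40204"),
  stringsAsFactors = FALSE
)

# Combine date and time; tz = "UTC" is an assumption
stamp <- as.POSIXct(paste(new3$Date, new3$Hour), format = "%Y-%m-%d %H:%M", tz = "UTC")
# Label every quarter-hour with the start of its hour
new3$HourStart <- format(stamp, "%Y-%m-%d %H:00")

# Convert the text column to numbers before summing
new3$Dayahead <- as.numeric(new3$Dayahead)
hourly <- aggregate(Dayahead ~ HourStart, data = new3, FUN = sum)
hourly
```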

Check if Posixct time is within interval

The problem:
I have two dataframes that I would like to merge depending on the date/time of one dataframe being in the interval of the other dataframe.
traffic: Date and Time (POSIXct), Frequency
mydata: Interval, Sum of Frequency
I would like to check whether each POSIXct time in traffic falls within an interval in mydata and, if it does, add its frequency to the "Sum of Frequency" column of mydata.
The two problems I encountered:
1. The traffic data frame has significantly more rows than mydata. I don't know how to tell R to loop through every observation in traffic and check it against each row of mydata.
2. More than one traffic observation can fall within an interval of mydata. I want R to add up the frequencies of all those observations to get a total. Also, the intervals overlap.
Here is the data:
DateTime <- c("2014-11-01 04:00:00", "2014-11-01 04:03:00", "2014-11-01 04:06:00", "2014-11-01 04:08:00", "2014-11-01 04:10:00", "2014-11-01 04:12:00", "2015-08-01 04:13:00", "2015-08-01 04:45:00", "2015-08-01 14:15:00", "2015-08-01 14:13:00")
DateTime <- as.POSIXct(DateTime)
Frequency <- c(1,2,3,5,12,1,2,2,1,1)
traffic <- data.frame(DateTime, Frequency)
library(lubridate)
DateTime1 <- c("2014-11-01 04:00:00", "2015-08-01 04:03:00", "2015-08-01 14:00:00")
DateTime2 <- c("2014-11-01 04:15:00", "2015-08-01 04:13:00", "2015-08-01 14:15:00")
DateTime1 <- as.POSIXct(DateTime1)
DateTime2 <- as.POSIXct(DateTime2)
mydata <- data.frame(DateTime1, DateTime2)
mydata$Interval <- as.interval(DateTime1, DateTime2)
mydata$SumFrequency <- NA
The expected outcome should be something like this:
mydata$SumFrequency <- c(24, 2, 2)
head(mydata)
I tried int_overlaps from the lubridate package.
Any tips on how to solve this are highly appreciated!
A short solution with foverlaps from the data.table package:
library(data.table)
mydata <- data.table(DateTime1, DateTime2, key = c("DateTime1", "DateTime2"))
traffic <- data.table(start = DateTime, end = DateTime, Frequency, key = c("start","end"))
foverlaps(traffic, mydata, type="within", nomatch=0L)[, .(sumFreq = sum(Frequency)),
by = .(DateTime1, DateTime2)]
which gives:
DateTime1 DateTime2 sumFreq
1: 2014-11-01 04:00:00 2014-11-01 04:15:00 24
2: 2015-08-01 04:03:00 2015-08-01 04:13:00 2
3: 2015-08-01 14:00:00 2015-08-01 14:15:00 2
A data.table approach that uses %between% to filter the traffic dataset by time:
library(data.table)
setDT(traffic)
setDT(mydata)
mydata[,SumFrequency := as.numeric(SumFrequency)] # coerce logical to numeric for next step.
mydata[,SumFrequency := sum( traffic[ DateTime %between% c(DateTime1, DateTime2), Frequency] ), by=1:nrow(mydata)]
which gives:
DateTime1 DateTime2 Interval SumFrequency
1: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:00:00 CET--2014-11-01 04:15:00 CET 24
2: 2015-08-01 04:03:00 2015-08-01 04:13:00 2015-08-01 04:03:00 CEST--2015-08-01 04:13:00 CEST 2
3: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:00:00 CEST--2015-08-01 14:15:00 CEST 2
If mydata has many rows, it may be better to create an index column and use it in the by clause:
mydata[, idx := .I]
mydata[, SumFrequency := sum( traffic[DateTime %between% c(DateTime1, DateTime2),Frequency] ),by=idx]
And this gives:
DateTime1 DateTime2 Interval SumFrequency idx
1: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:00:00 CET--2014-11-01 04:15:00 CET 24 1
2: 2015-08-01 04:03:00 2015-08-01 04:13:00 2015-08-01 04:03:00 CEST--2015-08-01 04:13:00 CEST 2 2
3: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:00:00 CEST--2015-08-01 14:15:00 CEST 2 3
I see two solutions:
With data.frame and plyr
You could do it with lubridate's %within% operator, combined with either a for loop or a plyr looping function such as dlply:
DateTime <- c("2014-11-01 04:00:00", "2014-11-01 04:03:00", "2014-11-01 04:06:00", "2014-11-01 04:08:00", "2014-11-01 04:10:00", "2014-11-01 04:12:00", "2015-08-01 04:13:00", "2015-08-01 04:45:00", "2015-08-01 14:15:00", "2015-08-01 14:13:00")
DateTime <- as.POSIXct(DateTime)
Frequency <- c(1,2,3,5,12,1,2,2,1,1)
traffic <- data.frame(DateTime, Frequency)
library(lubridate)
DateTime1 <- c("2014-11-01 04:00:00", "2015-08-01 04:03:00", "2015-08-01 14:00:00")
DateTime2 <- c("2014-11-01 04:15:00", "2015-08-01 04:13:00", "2015-08-01 14:15:00")
DateTime1 <- as.POSIXct(DateTime1)
DateTime2 <- as.POSIXct(DateTime2)
mydata <- data.frame(DateTime1, DateTime2)
mydata$Interval <- as.interval(DateTime1, DateTime2)
library(plyr)
# Create a group-by variable
mydata$NumInt <- 1:nrow(mydata)
mydata$SumFrequency <- dlply(mydata, .(NumInt),
function(row){
sum(
traffic[traffic$DateTime %within% row$Interval, "Frequency"]
)
})
mydata
#> DateTime1 DateTime2
#> 1 2014-11-01 04:00:00 2014-11-01 04:15:00
#> 2 2015-08-01 04:03:00 2015-08-01 04:13:00
#> 3 2015-08-01 14:00:00 2015-08-01 14:15:00
#> Interval NumInt SumFrequency
#> 1 2014-11-01 04:00:00 CET--2014-11-01 04:15:00 CET 1 24
#> 2 2015-08-01 04:03:00 CEST--2015-08-01 04:13:00 CEST 2 2
#> 3 2015-08-01 14:00:00 CEST--2015-08-01 14:15:00 CEST 3 2
With data.table and the function foverlaps
data.table implements overlapping joins, which you could use in your case with a little trick.
The function is foverlaps (I use data.table 1.9.6 below).
(See "How to perform join over date ranges using data.table?" and this presentation.)
Note that you do not need to create an interval with lubridate.
DateTime <- c("2014-11-01 04:00:00", "2014-11-01 04:03:00", "2014-11-01 04:06:00", "2014-11-01 04:08:00", "2014-11-01 04:10:00", "2014-11-01 04:12:00", "2015-08-01 04:13:00", "2015-08-01 04:45:00", "2015-08-01 14:15:00", "2015-08-01 14:13:00")
DateTime <- as.POSIXct(DateTime)
Frequency <- c(1,2,3,5,12,1,2,2,1,1)
library(data.table)
traffic <- data.table(DateTime, Frequency)
library(lubridate)
DateTime1 <- c("2014-11-01 04:00:00", "2015-08-01 04:03:00", "2015-08-01 14:00:00")
DateTime2 <- c("2014-11-01 04:15:00", "2015-08-01 04:13:00", "2015-08-01 14:15:00")
mydata <- data.table(DateTime1 = as.POSIXct(DateTime1), DateTime2 = as.POSIXct(DateTime2))
# Use function `foverlaps` for overlapping joins
# Here's the trick : create a dummy variable to artificially have an interval
traffic[, dummy:=DateTime]
setkey(mydata, DateTime1, DateTime2)
# do the join
mydata2 <- foverlaps(traffic, mydata, by.x=c("DateTime", "dummy"), type ="within", nomatch=0L)[, dummy := NULL][]
mydata2
#> DateTime1 DateTime2 DateTime Frequency
#> 1: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:00:00 1
#> 2: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:03:00 2
#> 3: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:06:00 3
#> 4: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:08:00 5
#> 5: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:10:00 12
#> 6: 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:12:00 1
#> 7: 2015-08-01 04:03:00 2015-08-01 04:13:00 2015-08-01 04:13:00 2
#> 8: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:15:00 1
#> 9: 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:13:00 1
# summarise with a sum by grouping by each line of mydata
setkeyv(mydata2, key(mydata))
mydata2[mydata, .(SumFrequency = sum(Frequency)), by = .EACHI]
#> DateTime1 DateTime2 SumFrequency
#> 1: 2014-11-01 04:00:00 2014-11-01 04:15:00 24
#> 2: 2015-08-01 04:03:00 2015-08-01 04:13:00 2
#> 3: 2015-08-01 14:00:00 2015-08-01 14:15:00 2
As far as point 2 is concerned, you can use aggregate, for instance:
traffic$Minute <- format(traffic$DateTime, "%Y%m%d %H:%M")
aggData <- aggregate(Frequency ~ Minute, data = traffic, FUN = sum)
This sums all frequencies within each one-minute interval.
And for point 1, wouldn't a merge work?
merge(x = mydata, y = aggData, by = "DateTime", all.x = TRUE)
The outer merge is explained here
Using a for loop, we could do something like this:
for(i in 1:nrow(mydata)) {
mydata$SumFrequency[i] <- sum(traffic$Frequency[traffic$DateTime %within% mydata$Interval[i]])
}
> mydata
# DateTime1 DateTime2 Interval SumFrequency
#1 2014-11-01 04:00:00 2014-11-01 04:15:00 2014-11-01 04:00:00 CET--2014-11-01 04:15:00 CET 24
#2 2015-08-01 04:03:00 2015-08-01 04:13:00 2015-08-01 04:03:00 CEST--2015-08-01 04:13:00 CEST 2
#3 2015-08-01 14:00:00 2015-08-01 14:15:00 2015-08-01 14:00:00 CEST--2015-08-01 14:15:00 CEST 2
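A loop-free base R sketch of the same computation, offered as an alternative rather than as any answerer's method. Every traffic timestamp is compared against every interval at once with `outer()`, giving a logical matrix. A matrix product with `Frequency` then sums the matching frequencies per interval. Both interval ends are inclusive, as with `%within%` and `foverlaps(type = "within")`. The matrix has nrow(mydata) × nrow(traffic) cells, so this suits moderate data sizes. Parsing the timestamps as UTC is an assumption made to keep the example reproducible.

```r
# Example data from the question; tz = "UTC" is an assumption
traffic <- data.frame(
  DateTime = as.POSIXct(c("2014-11-01 04:00:00", "2014-11-01 04:03:00",
                          "2014-11-01 04:06:00", "2014-11-01 04:08:00",
                          "2014-11-01 04:10:00", "2014-11-01 04:12:00",
                          "2015-08-01 04:13:00", "2015-08-01 04:45:00",
                          "2015-08-01 14:15:00", "2015-08-01 14:13:00"), tz = "UTC"),
  Frequency = c(1, 2, 3, 5, 12, 1, 2, 2, 1, 1)
)
mydata <- data.frame(
  DateTime1 = as.POSIXct(c("2014-11-01 04:00:00", "2015-08-01 04:03:00",
                           "2015-08-01 14:00:00"), tz = "UTC"),
  DateTime2 = as.POSIXct(c("2014-11-01 04:15:00", "2015-08-01 04:13:00",
                           "2015-08-01 14:15:00"), tz = "UTC")
)

# Compare on the underlying seconds since the epoch
traffic_sec <- as.numeric(traffic$DateTime)
# inside[i, j] is TRUE when traffic row j lies in interval i (both ends inclusive)
inside <- outer(as.numeric(mydata$DateTime1), traffic_sec, "<=") &
          outer(as.numeric(mydata$DateTime2), traffic_sec, ">=")

# The matrix product adds up the frequencies of the matching traffic rows per interval
mydata$SumFrequency <- as.vector(inside %*% traffic$Frequency)
mydata
```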
