Convert datetime column time zone from CET to CEST in R - r

I have two columns, DateTime (e.g., 2021-03-01 01:30:26) and Temperature, covering the start of March to the end of May. However, all of the times are in UTC+1, and I need DateTime to be converted to CEST (summer time; UTC+2) from the appropriate time shift onward (March 28th, when 2 AM turns into 3 AM).
Does anybody know how to code this in R?
Example data:
#dput(head(Re1292))
Re1292 <- structure(list(DateTime = structure(c(1615899000, 1615899600, 1615900200, 1615900800, 1615901400, 1615902000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Temp = c(5.9, 5.1, 4.9, 4.9, 4.9, 4.9)),
row.names = c(NA, 6L), class = "data.frame")
# DateTime Temp
# 1 2021-03-16 12:50:00 5.9
# 2 2021-03-16 13:00:00 5.1
# 3 2021-03-16 13:10:00 4.9
# 4 2021-03-16 13:20:00 4.9
# 5 2021-03-16 13:30:00 4.9
# 6 2021-03-16 13:40:00 4.9

I found out that passing tz = "Europe/Copenhagen" to lubridate fixed this problem!
E.g.
library(dplyr)
Re1292 %>% mutate(DateTime = lubridate::ymd_hms(DateTime, tz = "Europe/Copenhagen"))
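Re-parsing with tz = "Europe/Copenhagen" treats the stored clock readings as if they were already Copenhagen local time. If the logger instead stayed on a fixed UTC+1 offset for the whole period (an assumption, and the timestamps below are made up), a base R sketch of the two-step alternative is: first reinterpret the readings as fixed UTC+1, then display the same instants in Europe/Copenhagen.

```r
# Made-up clock readings around the 2021-03-28 changeover,
# recorded at a fixed UTC+1 offset but labelled as UTC.
logged <- as.POSIXct(c("2021-03-28 01:30:00", "2021-03-28 02:30:00"), tz = "UTC")

# Step 1: reinterpret the same clock readings as fixed UTC+1.
# Zone names of this form have an inverted sign: "Etc/GMT-1" means UTC+1.
fixed <- as.POSIXct(format(logged, "%Y-%m-%d %H:%M:%S"), tz = "Etc/GMT-1")

# Step 2: keep the instants, but display them in Copenhagen local time.
local <- fixed
attr(local, "tzone") <- "Europe/Copenhagen"

format(local, "%Y-%m-%d %H:%M:%S")
# "2021-03-28 01:30:00" "2021-03-28 03:30:00"
```

Readings before the changeover are unchanged, and readings after it move forward by one hour.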

Related

R Populate a datetime column starting from last row

I have a dataframe that I need to add a column of datetime to. It is recording water levels every hour for 2 years. The original data frame has the wrong dates and times. i.e. the dates say 2015 instead of 2020. The date and month are also wrong. I do not know the original start date and time. However, I know the date and time of the very last recording (28-03-2022 14:00:00). I need to calculate a column from the bottom to the top to figure out the original start date.
Current Code
I have this code, which populates the dates from a known start date (i.e. top down), but I want to populate the dates from the bottom up. Is there a way to alter this, or is there another solution?
# recalculate date to correct date
# set start dates
startDate5 <- as.POSIXct("2020-03-05 17:00:00")
startDateMere <- as.POSIXct("2020-07-06 17:00:00")
# find length of dataframe to populate required rows.
len5 <- max(dataList$`HMB 5`$Rec)
lenMere <- max(dataList$`HM SSSI 4`$Rec)
# calculate new date column
dataList$`HMB 5`$DateTimeNew <- seq(startDate5, by='hour', length.out=len5)
dataList$`HM SSSI 4`$DateTimeNew <-seq(startDateMere, by='hour', length.out=lenMere)
Current dataframe - top 10 rows
structure(list(Rec = 1:10, DateTime = structure(c(1436202000,
1436205600, 1436209200, 1436212800, 1436216400, 1436220000, 1436223600,
1436227200, 1436230800, 1436234400), class = c("POSIXct", "POSIXt"
), tzone = "GMT"), Temperature = c(16.59, 16.49, 16.74, 17.14,
17.47, 17.71, 18.43, 18.78, 19.06, 19.18), Pressure = c(1050.64,
1050.86, 1051.28, 1051.56, 1051.48, 1051.2, 1051.12, 1050.83,
1050.83, 1050.76), DateTimeNew = structure(c(1594051200L, 1594054800L,
1594058400L, 1594062000L, 1594065600L, 1594069200L, 1594072800L,
1594076400L, 1594080000L, 1594083600L), class = c("POSIXct",
"POSIXt"), tzone = "")), row.names = c(NA, 10L), class = "data.frame")
Desired Output
The desired output looks like this: the date I know to be correct is, for example, '2020-07-07 02:00:00' (the value in the 10th row, final column), and I need to work out the rest of the column from this value.
NB: I do not actually know what the original start date should be; the value used above (2020-07-06 17:00:00) is just illustrative.
Here's a sequence method:
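# Known timestamp of the last row (the question's illustrative value is reused here)
startDateMere <- as.POSIXct("2020-07-06 17:00:00")
# Step backwards one hour at a time, then reverse so the known value ends up in the last row
new_date = seq(startDateMere, length.out = nrow(data), by = "-1 hour")
data$result = rev(new_date)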
data
# Rec DateTime Temperature Pressure DateTimeNew result
# 1 1 2015-07-06 17:00:00 16.59 1050.64 2020-07-06 12:00:00 2020-07-06 08:00:00
# 2 2 2015-07-06 18:00:00 16.49 1050.86 2020-07-06 13:00:00 2020-07-06 09:00:00
# 3 3 2015-07-06 19:00:00 16.74 1051.28 2020-07-06 14:00:00 2020-07-06 10:00:00
# 4 4 2015-07-06 20:00:00 17.14 1051.56 2020-07-06 15:00:00 2020-07-06 11:00:00
# 5 5 2015-07-06 21:00:00 17.47 1051.48 2020-07-06 16:00:00 2020-07-06 12:00:00
# 6 6 2015-07-06 22:00:00 17.71 1051.20 2020-07-06 17:00:00 2020-07-06 13:00:00
# 7 7 2015-07-06 23:00:00 18.43 1051.12 2020-07-06 18:00:00 2020-07-06 14:00:00
# 8 8 2015-07-07 00:00:00 18.78 1050.83 2020-07-06 19:00:00 2020-07-06 15:00:00
# 9 9 2015-07-07 01:00:00 19.06 1050.83 2020-07-06 20:00:00 2020-07-06 16:00:00
# 10 10 2015-07-07 02:00:00 19.18 1050.76 2020-07-06 21:00:00 2020-07-06 17:00:00
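The same idea with the last timestamp mentioned in the question (28-03-2022 14:00:00) can be sketched as follows. The time zone ("UTC") and the row count n are assumptions for illustration; in practice n would be nrow() of the real data frame.

```r
# Known timestamp of the final record; the time zone is an assumption.
end_time <- as.POSIXct("2022-03-28 14:00:00", tz = "UTC")
n <- 10  # hypothetical number of hourly records

# Count backwards from the last record, then reverse into row order.
stamps <- rev(seq(end_time, by = "-1 hour", length.out = n))

stamps[1]  # implied timestamp of the first record
stamps[n]  # equals end_time
```

Using a fixed-offset zone such as UTC avoids gaps or repeats at daylight saving changes when the logger itself records every hour.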

How to create new column with timestamps, using old column of date/time in 24 hour format

I have a data frame with a column that has date/time. I have extracted the month and day to create 2 new columns, but I am trying to create another column with time and it won't work now that I've converted my time to 24 hour format
Here is what the column in my data frame looks like
# A tibble: 6 x 1
df
<dttm>
1 2021-06-01 08:00:00
2 2021-06-01 08:15:00
3 2021-06-01 08:30:00
4 2021-06-01 08:45:00
5 2021-11-25 21:46:40
6 2021-11-25 22:01:40
Here is the data frame column
df = structure(list(df = structure(c(1622552400, 1622553300, 1622554200,
1622555100, 1637894800, 1637895700), class = c("POSIXct", "POSIXt"
), tzone = "EST")), row.names = c(NA, 6L), class = "data.frame")
When my time was in 12-hour format, I used this code and it worked:
mutate(month = lubridate::month(Date_Time_GMT_3),
year = lubridate::year(Date_Time_GMT_3),
day = lubridate::day(Date_Time_GMT_3),
#CODE FOR TIME COLUMN
time = lubridate::hms(substr(Date_Time_GMT_3, 11,
nchar(Date_Time_GMT_3))))
Now that I've changed my time format to 24-hour, I get this warning:
Warning message:
Problem with `mutate()` column `time`.
i `time = lubridate::hms(substr(Date_Time_GMT_3, 11, nchar(Date_Time_GMT_3)))`.
i Some strings failed to parse, or all strings are NAs
Any idea how to fix this?
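Try sub instead of substr. sub(".* ", "", df) removes everything up to and including the space, so only the time part is left for hms() to parse.
library(dplyr)
df %>% mutate(month = lubridate::month(df),
year = lubridate::year(df),
day = lubridate::day(df),
time = lubridate::hms(sub(".* ","",df)))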
df month year day time
1 2021-06-01 08:00:00 6 2021 1 8H 0M 0S
2 2021-06-01 08:15:00 6 2021 1 8H 15M 0S
3 2021-06-01 08:30:00 6 2021 1 8H 30M 0S
4 2021-06-01 08:45:00 6 2021 1 8H 45M 0S
5 2021-11-25 21:46:40 11 2021 25 21H 46M 40S
6 2021-11-25 22:01:40 11 2021 25 22H 1M 40S
Data
df <- structure(list(df = structure(c(1622552400, 1622553300, 1622554200,
1622555100, 1637894800, 1637895700), class = c("POSIXct", "POSIXt"
), tzone = "EST")), row.names = c(NA, 6L), class = "data.frame")
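As an alternative to string surgery, the time of day can be formatted directly from the POSIXct values. This is a minimal base R sketch using two of the timestamps from the data above; the variable names are illustrative only.

```r
# Two of the timestamps from the example data, in the same "EST" zone.
x <- .POSIXct(c(1622552400, 1637894800), tz = "EST")

# Format only the time-of-day part, using the zone stored on the object.
time_chr <- format(x, "%H:%M:%S")
time_chr
# "08:00:00" "21:46:40"
```

The resulting character vector can then be passed to lubridate::hms() if a Period column is wanted.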

R xts does not recognize my POSIXct dates [duplicate]

I have created a dataframe with two columns.
> head(data_frame)
Date Rainfall
1 1992-01-06 14:00:00 0.3
2 1992-01-06 15:00:00 0.2
3 1992-01-06 16:00:00 0.3
4 1992-01-06 18:00:00 0.1
5 1992-01-06 19:00:00 0.3
6 1992-01-06 20:00:00 0.8
Rainfall is numeric and Date is POSIXct.
> class(data_frame$Date)
[1] "POSIXct"
> class(data_frame$Rainfall)
[1] "numeric"
When I try to create a time series using xts function, I get the following error:
> time_series <- xts::xts(data_frame$Rainfall, order.by = data_frame$Date)
Error in xts::xts(data_frame$Rainfall, order.by = data_frame$Date) :
order.by requires an appropriate time-based object
xts should be able to handle POSIXct. I went through a similar question posted here, where the solution was to convert date into the above format. Looking at those answers, my code should work. I can't figure out why it is not.
Reproducible example:
head_data_frame = structure(list(
Date = structure(
c(
694659600,
694663200,
694666800,
694674000,
694677600,
694681200
),
class = "POSIXct"
),
Rainfall = c(0.3,
0.2, 0.3, 0.1, 0.3, 0.8)
),
row.names = c(NA, 6L),
class = "data.frame")
The class appears to be broken. Did you use a package? Normally it's c("POSIXct", "POSIXt"), but yours is just "POSIXct".
class(head_data_frame$Date)
# [1] "POSIXct"
Fix:
class(head_data_frame$Date) <- c("POSIXct", "POSIXt")
Test:
xts::xts(head_data_frame$Rainfall, order.by = head_data_frame$Date)
# [,1]
# 1992-01-06 02:00:00 0.3
# 1992-01-06 03:00:00 0.2
# 1992-01-06 04:00:00 0.3
# 1992-01-06 06:00:00 0.1
# 1992-01-06 07:00:00 0.3
# 1992-01-06 08:00:00 0.8
Works! :)
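A quick way to see the difference, assuming the error really does come from the missing "POSIXt" class as suggested above, is to test the class with inherits() before and after the fix. The object d below is a stand-alone illustration, not the original data.

```r
# A date-time value carrying only the "POSIXct" class, as in the question's dput.
d <- structure(694659600, class = "POSIXct")
before <- inherits(d, "POSIXt")  # FALSE: the parent class is missing

# Restore the usual two-element class vector.
class(d) <- c("POSIXct", "POSIXt")
after <- inherits(d, "POSIXt")   # TRUE

c(before = before, after = after)
```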
I can get it to work if I change the timezone to UTC.
head_data_frame$Date <- lubridate::force_tz(head_data_frame$Date, tzone = "UTC")
xts::xts(head_data_frame$Rainfall, order.by = head_data_frame$Date)
# [,1]
#1992-01-06 09:00:00 0.3
#1992-01-06 10:00:00 0.2
#1992-01-06 11:00:00 0.3
#1992-01-06 13:00:00 0.1
#1992-01-06 14:00:00 0.3
#1992-01-06 15:00:00 0.8
#Warning message:
#timezone of object (UTC) is different than current timezone ().

How to extract data from a time series based on start and end dates from a different dataframe?

I am working with water quality data and I have a list of storm events I extracted from the streamflow time series.
head(Storms)
PeakNumber PeakTime PeakHeight PeakStartTime PeakEndTime DurationHours
1 1 2019-07-21 22:15:00 81.04667 2019-07-21 21:30:00 2019-07-22 04:45:00 7.25
2 2 2019-07-22 13:45:00 66.74048 2019-07-22 13:00:00 2019-07-22 23:45:00 10.75
3 3 2019-07-11 11:30:00 49.08663 2019-07-11 10:45:00 2019-07-11 19:00:00 8.25
4 4 2019-05-29 18:45:00 37.27926 2019-05-29 18:30:00 2019-05-29 20:45:00 2.25
5 5 2019-06-27 16:30:00 33.12268 2019-06-27 16:00:00 2019-06-27 17:15:00 1.25
6 6 2019-07-11 08:15:00 31.59931 2019-07-11 07:45:00 2019-07-11 09:00:00 1.25
I would like to use these PeakStartTime and PeakEndTime points to subset my other data. The other data is 15-minute time series data in xts or data.table format (I am constantly going back and forth for various functions/plots)
> head(Nitrogen)
[,1]
2019-03-20 10:00:00 2.12306
2019-03-20 10:15:00 2.13538
2019-03-20 10:30:00 2.14180
2019-03-20 10:45:00 2.14704
2019-03-20 11:00:00 2.14464
2019-03-20 11:15:00 2.15548
So I would like to create a new dataframe for each storm that is just the Nitrogen data between those PeakStartTime and PeakEndTime points. And then hopefully loop this, so it will do so for each of the peaks in the Storms dataframe.
One option is to build an xts range string ("start/end") from each corresponding pair of PeakStartTime and PeakEndTime, subset Nitrogen with it, and rbind the pieces together:
library(xts)
do.call(rbind, Map(function(x, y) Nitrogen[paste( x, y, sep="/")],
Storms$PeakStartTime, Storms$PeakEndTime))
# [,1]
#2019-05-29 18:30:00 -0.07102752
#2019-05-29 18:45:00 -0.19454811
#2019-05-29 19:00:00 -1.69684540
#2019-05-29 19:15:00 1.09384970
#2019-05-29 19:30:00 0.20019572
#2019-05-29 19:45:00 -0.76086259
# ...
data
set.seed(24)
Nitrogen <- xts(rnorm(20000), order.by = seq(as.POSIXct('2019-03-20 10:00:00'),
length.out = 20000, by = '15 min'))
Storms <- structure(list(PeakNumber = 1:6, PeakTime = structure(c(1563761700,
1563817500, 1562859000, 1559169900, 1561667400, 1562847300), class = c("POSIXct",
"POSIXt"), tzone = ""), PeakHeight = c(81.04667, 66.74048, 49.08663,
37.27926, 33.12268, 31.59931), PeakStartTime = structure(c(1563759000,
1563814800, 1562856300, 1559169000, 1561665600, 1562845500), class = c("POSIXct",
"POSIXt"), tzone = ""), PeakEndTime = structure(c(1563785100,
1563853500, 1562886000, 1559177100, 1561670100, 1562850000), class = c("POSIXct",
"POSIXt"), tzone = ""), DurationHours = c(7.25, 10.75, 8.25,
2.25, 1.25, 1.25)), row.names = c("1", "2", "3", "4", "5", "6"
), class = "data.frame")
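If a separate data frame per storm is wanted rather than one combined object, the pieces can be kept in a list. The following is a base R sketch on small made-up data; nitrogen, storms and per_storm are illustrative names, and the series is assumed to be an ordinary data frame with a time column rather than an xts object.

```r
# Made-up 15-minute series and two made-up storm windows.
nitrogen <- data.frame(
  time  = seq(as.POSIXct("2019-07-21 20:00:00", tz = "UTC"),
              by = "15 min", length.out = 48),
  value = seq_len(48)
)
storms <- data.frame(
  PeakStartTime = as.POSIXct(c("2019-07-21 21:30:00", "2019-07-22 04:00:00"), tz = "UTC"),
  PeakEndTime   = as.POSIXct(c("2019-07-21 22:15:00", "2019-07-22 05:00:00"), tz = "UTC")
)

# One data frame per storm: rows whose time falls inside [start, end].
per_storm <- lapply(seq_len(nrow(storms)), function(i) {
  in_window <- nitrogen$time >= storms$PeakStartTime[i] &
               nitrogen$time <= storms$PeakEndTime[i]
  nitrogen[in_window, ]
})

sapply(per_storm, nrow)
# 4 5
```

Each list element can then be plotted or summarised on its own, e.g. per_storm[[1]].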

How to sum or average multiple overlapping time intervals with lubridate?

I have a log of many years of meditation sittings, each with a start and end time. I want to create nice plots of my most active times of the day. (In other words, how often relatively I am meditating at 7am versus other times of day?)
ID StartTime EndTime
1 2679 2019-03-23 07:00:00 2019-03-23 07:30:00
2 2678 2019-03-22 07:00:00 2019-03-22 07:30:00
3 2677 2019-03-21 07:00:00 2019-03-21 07:30:00
4 2676 2019-03-20 07:00:00 2019-03-20 07:30:00
5 2675 2019-03-19 07:00:00 2019-03-19 07:30:00
6 2674 2019-03-18 09:00:00 2019-03-18 09:30:00
7 2673 2019-03-18 09:00:00 2019-03-18 09:30:00
8 2672 2019-03-18 09:00:00 2019-03-18 10:00:00
9 2671 2019-03-15 07:00:00 2019-03-15 08:00:00
10 2670 2019-03-14 07:00:00 2019-03-14 08:00:00
dput version:
structure(list(ID = 2679:2670, StartTime = structure(c(1553324400,
1553238000, 1553151600, 1553065200, 1552978800, 1552899600, 1552899600,
1552899600, 1552633200, 1552546800), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), EndTime = structure(c(1553326200, 1553239800,
1553153400, 1553067000, 1552980600, 1552901400, 1552901400, 1552903200,
1552636800, 1552550400), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,
-10L), class = "data.frame")
I can hack this by turning each day into a 1440-element array of included/excluded minutes and summing these up.
But I feel like there must be a better way, probably using the Interval object from lubridate and/or dplyr. I haven't worked out how to do this.
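The minute-counting idea described above can be written fairly compactly in base R. This is only a sketch of that approach, not a lubridate Interval solution; it uses three of the sittings from the example and assumes no sitting crosses midnight and every end time is after its start time.

```r
# Three sittings taken from the example data.
start <- as.POSIXct(c("2019-03-23 07:00:00", "2019-03-18 09:00:00",
                      "2019-03-15 07:00:00"), tz = "UTC")
end   <- as.POSIXct(c("2019-03-23 07:30:00", "2019-03-18 10:00:00",
                      "2019-03-15 08:00:00"), tz = "UTC")

# Minute of the day (0..1439) for a date-time value.
minute_of_day <- function(x) {
  as.integer(format(x, "%H")) * 60L + as.integer(format(x, "%M"))
}

# All minutes covered by each sitting (end minute excluded), pooled together.
covered <- unlist(Map(function(s, e) seq(s, e - 1L),
                      minute_of_day(start), minute_of_day(end)))

# Number of sittings active in each minute of the day.
counts <- tabulate(covered + 1L, nbins = 1440L)

# Total minutes per hour of day, e.g. for a bar plot.
hourly <- tapply(counts, (0:1439) %/% 60, sum)
```

Here counts[421] (07:00) is 2 because two of the three sittings cover that minute, and hourly can be passed straight to barplot().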
