I have a log of many years of meditation sittings, each with a start and end time. I want to create nice plots of my most active times of day. (In other words: relatively how often am I meditating at 7am compared with other times of day?)
ID StartTime EndTime
1 2679 2019-03-23 07:00:00 2019-03-23 07:30:00
2 2678 2019-03-22 07:00:00 2019-03-22 07:30:00
3 2677 2019-03-21 07:00:00 2019-03-21 07:30:00
4 2676 2019-03-20 07:00:00 2019-03-20 07:30:00
5 2675 2019-03-19 07:00:00 2019-03-19 07:30:00
6 2674 2019-03-18 09:00:00 2019-03-18 09:30:00
7 2673 2019-03-18 09:00:00 2019-03-18 09:30:00
8 2672 2019-03-18 09:00:00 2019-03-18 10:00:00
9 2671 2019-03-15 07:00:00 2019-03-15 08:00:00
10 2670 2019-03-14 07:00:00 2019-03-14 08:00:00
dput version:
structure(list(ID = 2679:2670, StartTime = structure(c(1553324400,
1553238000, 1553151600, 1553065200, 1552978800, 1552899600, 1552899600,
1552899600, 1552633200, 1552546800), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), EndTime = structure(c(1553326200, 1553239800,
1553153400, 1553067000, 1552980600, 1552901400, 1552901400, 1552903200,
1552636800, 1552550400), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,
-10L), class = "data.frame")
I can hack this by turning each day into a 1440-element array of included/excluded minutes and summing these up, which gets me roughly the plot I am after.
But I feel like there must be a better way, probably using an Interval object from lubridate and/or dplyr. I haven't worked out how to do this, though.
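As a point of comparison, the minute-by-minute tally described above can be sketched in base R without building one array per day. This is only a sketch under a few assumptions: the data frame name `sittings` and the helper `minute_of_day()` are made up for illustration, all times are treated as UTC, and sittings are assumed to last whole minutes.

```r
# Sample sittings from the question's dput output (times are UTC)
sittings <- data.frame(
  ID = 2679:2670,
  StartTime = as.POSIXct(c(1553324400, 1553238000, 1553151600, 1553065200,
                           1552978800, 1552899600, 1552899600, 1552899600,
                           1552633200, 1552546800),
                         origin = "1970-01-01", tz = "UTC"),
  EndTime = as.POSIXct(c(1553326200, 1553239800, 1553153400, 1553067000,
                         1552980600, 1552901400, 1552901400, 1552903200,
                         1552636800, 1552550400),
                       origin = "1970-01-01", tz = "UTC")
)

# Hypothetical helper: minute of the day (0..1439) for each timestamp
minute_of_day <- function(t) {
  lt <- as.POSIXlt(t, tz = "UTC")
  lt$hour * 60L + lt$min
}

start_min <- minute_of_day(sittings$StartTime)
duration_min <- as.integer(round(as.numeric(
  difftime(sittings$EndTime, sittings$StartTime, units = "mins"))))

# Every minute covered by every sitting, wrapped around midnight
covered <- unlist(Map(function(s, d) (s + seq_len(d) - 1L) %% 1440L,
                      start_min, duration_min))

# Number of sittings in progress during each minute of the day
counts <- tabulate(covered + 1L, nbins = 1440L)

# Total meditated minutes per hour of the day (names "0" to "23")
hourly_minutes <- tapply(counts, (seq_len(1440L) - 1L) %/% 60L, sum)

# barplot(hourly_minutes, xlab = "Hour of day", ylab = "Minutes meditated")
```

Dividing `counts` by the number of days in the log would turn the tally into a relative frequency, which is closer to the "how often relatively" wording of the question.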
Related
I have two columns, DateTime (e.g., 2021-03-01 01:30:26) and Temperature, covering the start of March to the end of May. However, the times are all in UTC+1, and I need the DateTime to be converted to CEST (summer time; UTC+2) from the appropriate time shift onwards (which is March 28th, when 2 AM turns into 3 AM).
Does anybody know how to code this in R?
Example data:
#dput(head(Re1292))
Re1292 <- structure(list(DateTime = structure(c(1615899000, 1615899600, 1615900200, 1615900800, 1615901400, 1615902000),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Temp = c(5.9, 5.1, 4.9, 4.9, 4.9, 4.9)),
row.names = c(NA, 6L), class = "data.frame")
# DateTime Temp
# 1 2021-03-16 12:50:00 5.9
# 2 2021-03-16 13:00:00 5.1
# 3 2021-03-16 13:10:00 4.9
# 4 2021-03-16 13:20:00 4.9
# 5 2021-03-16 13:30:00 4.9
# 6 2021-03-16 13:40:00 4.9
I found out that adding tz = "Europe/Copenhagen" within lubridate fixed this problem!
E.g.
mutate(DateTime=lubridate::ymd_hms(DateTime, tz="Europe/Copenhagen"))
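For readers who want to see the shift itself, here is a base R sketch of the conversion the question describes. It rests on assumptions that are not in the original: the three timestamps are invented to straddle the change on 2021-03-28, and the logger is assumed to run on a fixed UTC+1 clock, which is written as the zone name "Etc/GMT-1" (the sign in these names is inverted). Note that this is a different operation from parsing with a `tz` argument, which keeps the clock readings as they are and only labels them with the zone.

```r
# Hypothetical clock readings logged in fixed UTC+1 but labelled UTC,
# as in the question's dput output
logged <- as.POSIXct(c("2021-03-28 00:30:00", "2021-03-28 01:30:00",
                       "2021-03-28 02:30:00"), tz = "UTC")

# Step 1: keep the clock readings and relabel them as fixed UTC+1
fixed_offset <- as.POSIXct(format(logged, "%Y-%m-%d %H:%M:%S"),
                           tz = "Etc/GMT-1")

# Step 2: show the same instants in a zone that observes summer time
local_clock <- format(fixed_offset, "%Y-%m-%d %H:%M",
                      tz = "Europe/Copenhagen")
local_clock
```

The first two readings are unchanged because CET is also UTC+1; the third falls after the switch and moves forward by one hour.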
I have hms data for appointment times. Here is the sample data:
library(hms)
library(dplyr)
dput(SampleData[1:5,])
structure(list(AppointmentDate = structure(c(1634097600, 1600228800,
1603080000, 1604552400, 1606107600), class = c("POSIXct", "POSIXt"
), tzone = ""), AppointmentTime = structure(c(5400, 28800, 28800,
28800, 28800), class = c("hms", "difftime"), units = "secs"),
CheckinTime = structure(c(48060, 29460, 28740, 28800, 29160
), class = c("hms", "difftime"), units = "secs")), row.names = c(NA,
5L), class = "data.frame")
There is a typo in AppointmentTime (it is obvious from CheckinTime). I want to detect and change it using ifelse and mutate in R. I tried the following code, but it turns the AppointmentTime column into numbers.
RawData %>%
mutate(AppointmentTime = ifelse(AppointmentTime == parse_hms("01:30:00"), parse_hms("13:30:00"), as_hms(AppointmentTime)))
I want the following result.
AppointmentDate AppointmentTime CheckinTime
1 2021-10-13 13:30:00 13:21:00
2 2020-09-16 08:00:00 08:11:00
3 2020-10-19 08:00:00 07:59:00
4 2020-11-05 08:00:00 08:00:00
5 2020-11-23 08:00:00 08:06:00
This is a known type-coercion issue with ifelse: it drops attributes such as the class, so the result falls back to the underlying storage mode, which is numeric for hms:
> str(parse_hms("01:30:00"))
'hms' num 01:30:00
- attr(*, "units")= chr "secs"
> mode(parse_hms('01:30:00'))
[1] "numeric"
> as.numeric(parse_hms("01:30:00"))
[1] 5400
Use case_when or if_else instead; they check the types of the inputs and return a result of the same type:
library(hms)
library(dplyr)
RawData %>%
mutate(AppointmentTime = if_else(AppointmentTime == parse_hms("01:30:00"),
parse_hms("13:30:00"), as_hms(AppointmentTime)))
-output
AppointmentDate AppointmentTime CheckinTime
1 2021-10-13 13:30:00 13:21:00
2 2020-09-16 08:00:00 08:11:00
3 2020-10-19 08:00:00 07:59:00
4 2020-11-05 08:00:00 08:00:00
5 2020-11-23 08:00:00 08:06:00
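The class loss can be reproduced without any packages. The sketch below uses a plain base R difftime as a stand-in for hms (the dput output shows hms objects carry the classes "hms" and "difftime"), so it is an illustration of the mechanism rather than of hms itself. The vector names are made up, and the logical-index assignment shown as a fix is a base R alternative, not the if_else approach from the answer.

```r
# Stand-in for the AppointmentTime column: seconds since midnight
appt <- as.difftime(c(5400, 28800, 28800), units = "secs")

# Rows holding the mistyped 01:30:00 (5400 seconds)
is_typo <- as.numeric(appt, units = "secs") == 5400

# ifelse() builds its result from the logical test, so the difftime
# class is dropped and only bare numbers remain
bad <- ifelse(is_typo, as.difftime(48600, units = "secs"), appt)
class(bad)

# Assigning into a copy by logical index keeps the class and units
fixed <- appt
fixed[is_typo] <- as.difftime(48600, units = "secs")
class(fixed)
```

Here 48600 seconds corresponds to 13:30:00, the intended appointment time.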
I have a dataframe that I need to add a datetime column to. It records water levels every hour for 2 years. The original data frame has the wrong dates and times, e.g. the dates say 2015 instead of 2020; the day and month are also wrong. I do not know the original start date and time. However, I know the date and time of the very last recording (28-03-2022 14:00:00). I need to calculate the column from the bottom to the top to figure out the original start date.
Current Code
I have this code, which populates the dates from a known start date (i.e. top down), but I want to populate the dates from the bottom up. Is there a way to alter this, or is there another solution?
# recalculate date to correct date
# set start dates
startDate5 <- as.POSIXct("2020-03-05 17:00:00")
startDateMere <- as.POSIXct("2020-07-06 17:00:00")
# find length of dataframe to populate required rows.
len5 <- max(dataList$`HMB 5`$Rec)
lenMere <- max(dataList$`HM SSSI 4`$Rec)
# calculate new date column
dataList$`HMB 5`$DateTimeNew <- seq(startDate5, by='hour', length.out=len5)
dataList$`HM SSSI 4`$DateTimeNew <-seq(startDateMere, by='hour', length.out=lenMere)
Current dataframe - top 10 rows
structure(list(Rec = 1:10, DateTime = structure(c(1436202000,
1436205600, 1436209200, 1436212800, 1436216400, 1436220000, 1436223600,
1436227200, 1436230800, 1436234400), class = c("POSIXct", "POSIXt"
), tzone = "GMT"), Temperature = c(16.59, 16.49, 16.74, 17.14,
17.47, 17.71, 18.43, 18.78, 19.06, 19.18), Pressure = c(1050.64,
1050.86, 1051.28, 1051.56, 1051.48, 1051.2, 1051.12, 1050.83,
1050.83, 1050.76), DateTimeNew = structure(c(1594051200L, 1594054800L,
1594058400L, 1594062000L, 1594065600L, 1594069200L, 1594072800L,
1594076400L, 1594080000L, 1594083600L), class = c("POSIXct",
"POSIXt"), tzone = "")), row.names = c(NA, 10L), class = "data.frame")
Desired Output
This is what the desired output looks like: the date I know to be correct is, for example, '2020-07-07 02:00:00' (the value in the 10th row, final column), and I need to work out the rest of the column from this value.
NB: I do not actually know what the original start date should be; 2020-07-06 17:00:00 is just illustrative.
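Here's a sequence method: build the sequence backwards from the known timestamp with a negative step, then reverse it (startDateMere is used here as the known last timestamp):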
startDateMere <- as.POSIXct("2020-07-06 17:00:00")
new_date = seq(startDateMere, length.out = nrow(data), by = "-1 hour")
data$result = rev(new_date)
data
# Rec DateTime Temperature Pressure DateTimeNew result
# 1 1 2015-07-06 17:00:00 16.59 1050.64 2020-07-06 12:00:00 2020-07-06 08:00:00
# 2 2 2015-07-06 18:00:00 16.49 1050.86 2020-07-06 13:00:00 2020-07-06 09:00:00
# 3 3 2015-07-06 19:00:00 16.74 1051.28 2020-07-06 14:00:00 2020-07-06 10:00:00
# 4 4 2015-07-06 20:00:00 17.14 1051.56 2020-07-06 15:00:00 2020-07-06 11:00:00
# 5 5 2015-07-06 21:00:00 17.47 1051.48 2020-07-06 16:00:00 2020-07-06 12:00:00
# 6 6 2015-07-06 22:00:00 17.71 1051.20 2020-07-06 17:00:00 2020-07-06 13:00:00
# 7 7 2015-07-06 23:00:00 18.43 1051.12 2020-07-06 18:00:00 2020-07-06 14:00:00
# 8 8 2015-07-07 00:00:00 18.78 1050.83 2020-07-06 19:00:00 2020-07-06 15:00:00
# 9 9 2015-07-07 01:00:00 19.06 1050.83 2020-07-06 20:00:00 2020-07-06 16:00:00
# 10 10 2015-07-07 02:00:00 19.18 1050.76 2020-07-06 21:00:00 2020-07-06 17:00:00
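The same idea can also be written as plain arithmetic on the known last timestamp, which may be easier to check by eye. This is a sketch under assumptions: the names `last_time`, `n` and `new_times` are made up, the row count of 10 matches the sample rather than the full two-year record, and the GMT time zone is assumed from the `tzone` of the original DateTime column.

```r
# Known timestamp of the very last recording, from the question
last_time <- as.POSIXct("2022-03-28 14:00:00", tz = "GMT")

# Number of hourly rows to fill (10 in the sample data)
n <- 10L

# Row i is (n - i) hours before the last row
new_times <- last_time - 3600 * rev(seq_len(n) - 1L)

# The first element is the recovered start time
format(new_times[1], "%Y-%m-%d %H:%M:%S")
```

With the real data, `n` would be `nrow()` of the logger's data frame and `new_times` would be assigned to the new column.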
I want to combine a date variable and a time variable into one variable, like this: 2012-05-02 07:30
This code does the job, but I need the new combined variable in the data frame, and this code only prints it to the console:
as.POSIXct(paste(data$Date, data$Time), format="%Y-%m-%d %H:%M")
This code is supposed to combine time and date, but apparently doesn't. Only the date appears in the "Combined" column:
data$Combined = as.POSIXct(paste0(data$Date,data$Time))
Here's the data
structure(list(Date = structure(c(17341, 18198, 17207, 17023,
17508, 17406, 18157, 17931, 17936, 18344), class = "Date"), Time = c("08:40",
"10:00", "22:10", "18:00", "08:00", "04:30", "20:00", "15:40",
"11:00", "07:00")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
We could use the ymd_hm function from the lubridate package:
library(lubridate)
df$Date_time <- ymd_hm(paste(df$Date, df$Time))
Date Time Date_time
<date> <chr> <dttm>
1 2017-06-24 08:40 2017-06-24 08:40:00
2 2019-10-29 10:00 2019-10-29 10:00:00
3 2017-02-10 22:10 2017-02-10 22:10:00
4 2016-08-10 18:00 2016-08-10 18:00:00
5 2017-12-08 08:00 2017-12-08 08:00:00
6 2017-08-28 04:30 2017-08-28 04:30:00
7 2019-09-18 20:00 2019-09-18 20:00:00
8 2019-02-04 15:40 2019-02-04 15:40:00
9 2019-02-09 11:00 2019-02-09 11:00:00
10 2020-03-23 07:00 2020-03-23 07:00:00
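A base R variant is also possible: the `as.POSIXct(paste(...), format = ...)` call from the question already produces the right values, so it only needs to be assigned to a column. The sketch below uses the first three rows of the posted data; the explicit `tz = "UTC"` is an assumption added so the result does not depend on the local time zone.

```r
# First three rows of the data from the question
data <- data.frame(
  Date = as.Date(c(17341, 18198, 17207), origin = "1970-01-01"),
  Time = c("08:40", "10:00", "22:10"),
  stringsAsFactors = FALSE
)

# paste() puts a space between date and time, matching the format string;
# assigning the result stores it in the data frame
data$Combined <- as.POSIXct(paste(data$Date, data$Time),
                            format = "%Y-%m-%d %H:%M", tz = "UTC")
data
```

The `paste0()` version in the question produces strings such as "2017-06-2408:40" with no separator, which is a likely reason only the date was parsed.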
I am working with water quality data and I have a list of storm events I extracted from the streamflow time series.
head(Storms)
PeakNumber PeakTime PeakHeight PeakStartTime PeakEndTime DurationHours
1 1 2019-07-21 22:15:00 81.04667 2019-07-21 21:30:00 2019-07-22 04:45:00 7.25
2 2 2019-07-22 13:45:00 66.74048 2019-07-22 13:00:00 2019-07-22 23:45:00 10.75
3 3 2019-07-11 11:30:00 49.08663 2019-07-11 10:45:00 2019-07-11 19:00:00 8.25
4 4 2019-05-29 18:45:00 37.27926 2019-05-29 18:30:00 2019-05-29 20:45:00 2.25
5 5 2019-06-27 16:30:00 33.12268 2019-06-27 16:00:00 2019-06-27 17:15:00 1.25
6 6 2019-07-11 08:15:00 31.59931 2019-07-11 07:45:00 2019-07-11 09:00:00 1.25
I would like to use these PeakStartTime and PeakEndTime points to subset my other data. The other data is a 15-minute time series in xts or data.table format (I am constantly going back and forth between the two for various functions/plots).
> head(Nitrogen)
[,1]
2019-03-20 10:00:00 2.12306
2019-03-20 10:15:00 2.13538
2019-03-20 10:30:00 2.14180
2019-03-20 10:45:00 2.14704
2019-03-20 11:00:00 2.14464
2019-03-20 11:15:00 2.15548
So I would like to create a new dataframe for each storm that contains only the Nitrogen data between its PeakStartTime and PeakEndTime, and ideally loop this so that it is done for each of the peaks in the Storms dataframe.
One option is to build an xts range string ("start/end") from each pair of PeakStartTime and PeakEndTime and use it to subset the data:
library(xts)
do.call(rbind, Map(function(x, y) Nitrogen[paste( x, y, sep="/")],
Storms$PeakStartTime, Storms$PeakEndTime))
# [,1]
#2019-05-29 18:30:00 -0.07102752
#2019-05-29 18:45:00 -0.19454811
#2019-05-29 19:00:00 -1.69684540
#2019-05-29 19:15:00 1.09384970
#2019-05-29 19:30:00 0.20019572
#2019-05-29 19:45:00 -0.76086259
# ...
data
set.seed(24)
Nitrogen <- xts(rnorm(20000), order.by = seq(as.POSIXct('2019-03-20 10:00:00'),
length.out = 20000, by = '15 min'))
Storms <- structure(list(PeakNumber = 1:6, PeakTime = structure(c(1563761700,
1563817500, 1562859000, 1559169900, 1561667400, 1562847300), class = c("POSIXct",
"POSIXt"), tzone = ""), PeakHeight = c(81.04667, 66.74048, 49.08663,
37.27926, 33.12268, 31.59931), PeakStartTime = structure(c(1563759000,
1563814800, 1562856300, 1559169000, 1561665600, 1562845500), class = c("POSIXct",
"POSIXt"), tzone = ""), PeakEndTime = structure(c(1563785100,
1563853500, 1562886000, 1559177100, 1561670100, 1562850000), class = c("POSIXct",
"POSIXt"), tzone = ""), DurationHours = c(7.25, 10.75, 8.25,
2.25, 1.25, 1.25)), row.names = c("1", "2", "3", "4", "5", "6"
), class = "data.frame")
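If one data frame per storm is wanted rather than a single combined series, a list keyed by storm is one way to hold the results. The following is a base R sketch on made-up toy data (the lower-case `nitrogen` and `storms` objects and the `per_storm` list are hypothetical and much smaller than the real series); it uses an ordinary data frame with a time column instead of xts.

```r
# Toy 15-minute series: 16 readings starting at 18:00 (values are 1..16)
nitrogen <- data.frame(
  time = seq(as.POSIXct("2019-05-29 18:00:00", tz = "UTC"),
             by = "15 min", length.out = 16),
  value = seq_len(16)
)

# Toy storm table with the same column names as in the question
storms <- data.frame(
  PeakNumber = 1:2,
  PeakStartTime = as.POSIXct(c("2019-05-29 18:30:00", "2019-05-29 20:00:00"),
                             tz = "UTC"),
  PeakEndTime = as.POSIXct(c("2019-05-29 19:15:00", "2019-05-29 20:45:00"),
                           tz = "UTC")
)

# One data frame per storm, keeping rows inside [start, end] inclusive
per_storm <- lapply(seq_len(nrow(storms)), function(i) {
  in_window <- nitrogen$time >= storms$PeakStartTime[i] &
    nitrogen$time <= storms$PeakEndTime[i]
  nitrogen[in_window, ]
})
names(per_storm) <- paste0("storm_", storms$PeakNumber)

per_storm$storm_1
```

Keeping the pieces in a named list avoids creating many separately named objects and still allows `lapply()` over the storms for plotting or summaries.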