Subset data by timestamp irrespective of date in R

The Dataset
head(data)
Date OPEN
2015-11-30 10:00:00 951.15
2015-11-30 10:30:00 949.90
2015-11-30 11:00:00 943.45
2015-11-30 11:30:00 944.30
2015-11-30 12:00:00 942.00
2015-11-30 12:30:00 940.60
2015-01-01 10:00:00 951.15
2015-01-01 10:30:00 949.90
2015-01-02 10:30:00 943.45
2015-01-02 11:30:00 944.30
2015-01-03 10:00:00 943.45
2015-01-03 10:30:00 943.45
2015-01-03 11:30:00 944.30
2015-01-06 10:00:00 942.00
2015-01-06 10:30:00 940.60
2015-01-06 11:00:00 940.60
2015-01-06 11:30:00 942.00
str(data)
'data.frame': 32023 obs. of 2 variables:
$ Date : POSIXct, format: "2015-11-30 10:00:00" "2015-11-30 10:30:00" "2015-11-30 11:00:00" ...
$ OPEN : num 951 950 943 944 942 ...
Hi,
The data frame is shown above. I want to extract the OPEN prices with timestamps 10:00 and 10:30 for all available dates, i.e. the filter condition should keep only the 10:00 to 10:30 timestamps, irrespective of the date. Please suggest how to do this in R.
Thanks.

We can format the 'Date' column to extract the HH:MM part, use %in% to get a logical vector, and subset based on that.
subset(data, format(Date, "%H:%M") %in% c("10:00", "10:30"), select="OPEN")
# OPEN
#1 951.15
#2 949.90
#7 951.15
#8 949.90
#9 943.45
#11 943.45
#12 943.45
#14 942.00
#15 940.60
If instead you want every timestamp between those two times:
library(chron)
library(data.table)   # provides between()
subset(data, between(times(format(Date, "%H:%M:%S")),
                     times("10:00:00"), times("10:30:00")))

You can use the lubridate package to write a more readable subset:
library(lubridate)
res <- subset(data, minute(Date) <= 30 & hour(Date) == 10)
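For completeness, the same filter can be written with dplyr; this is just a minimal sketch of an equivalent pipeline, assuming data and its POSIXct Date column are as shown in the question:
library(dplyr)
library(lubridate)
# keep only the 10:00 and 10:30 bars, whatever the calendar date
res <- data %>%
  filter(hour(Date) == 10, minute(Date) %in% c(0, 30)) %>%
  select(OPEN)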


How to plot 24 hours for 365 days

How can I plot hourly time series data so that the x-axis runs over the hours 1:24, if I have, let's say, one year of data, i.e. 365 days and 8000+ rows? I tried with ggplot2 but didn't get it to work.
The head looks like this:
Value DateTime
1 104 2018-01-01 01:00:00
2 104 2018-01-01 02:00:00
3 108 2018-01-01 03:00:00
4 106 2018-01-01 04:00:00
5 117 2018-01-01 05:00:00
6 166 2018-01-01 06:00:00
And the tail:
Value DateTime
8754 160.10 2018-12-31 19:00:00
8755 156.11 2018-12-31 20:00:00
8756 139.11 2018-12-31 21:00:00
8757 112.11 2018-12-31 22:00:00
8758 96.10 2018-12-31 23:00:00
8759 90.11 2019-01-01 00:00:00
Here is an image of what I'm trying to achieve.
What about having the time of day and the date as separate variables? You can use the hms package to do this.
library(hms)
timeOfDay <- as.hms(df$DateTime)   # time of day (spelled as_hms() in recent hms versions)
date <- as.Date(df$DateTime)       # calendar date
Now you can use timeOfDay on the x-axis and date as your grouping aesthetic.
This works for me:
library(ggplot2)
ggplot(df, aes(x = timeOfDay, y = Value)) +
  geom_line(aes(group = date))
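For a fully self-contained illustration, here is a sketch on simulated hourly data; the data values and the plot styling are invented for demonstration only, and as_hms() is used as the current spelling of as.hms():
library(ggplot2)
library(hms)
# simulated hourly series covering one year, for illustration only
df <- data.frame(
  DateTime = seq(as.POSIXct("2018-01-01 01:00", tz = "UTC"),
                 as.POSIXct("2018-12-31 23:00", tz = "UTC"), by = "hour")
)
df$Value <- 100 + 30 * sin(2 * pi * as.numeric(format(df$DateTime, "%H")) / 24) +
  rnorm(nrow(df), sd = 5)
df$timeOfDay <- as_hms(df$DateTime)   # time of day on the 24-hour clock
df$date <- as.Date(df$DateTime)       # calendar date, used for grouping
# one semi-transparent line per day, x-axis restricted to the 24-hour clock
ggplot(df, aes(x = timeOfDay, y = Value, group = date)) +
  geom_line(alpha = 0.2)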

R: calculate number of occurrences which have started but not ended - count if within a datetime range

I've got a dataset with the following shape
ID Start Time End Time
1 01/01/2017 00:15:00 01/01/2017 07:15:00
2 01/01/2017 04:45:00 01/01/2017 06:15:00
3 01/01/2017 10:20:00 01/01/2017 20:15:00
4 01/01/2017 02:15:00 01/01/2017 00:15:00
5 02/01/2017 15:15:00 03/01/2017 00:30:00
6 03/01/2017 07:00:00 04/01/2017 09:15:00
I would like to count, every 15 minutes for an entire year, how many items have started but not yet finished, i.e. count the rows with a start time less than or equal to the time I'm looking at and an end time greater than the time I'm looking at.
I'm looking for an approach using tidyverse/dplyr if possible.
Any help or guidance would be very much appreciated.
If I understand correctly, the OP wants to count the number of simultaneously active events.
One possibility to tackle this question is the coverage() function from Bioconductor's IRanges package. Another is to aggregate in a non-equi join, which is available in the data.table package.
Non-equi join
# create a sequence of datetimes (limited to 4 days for demonstration)
seq15 <- seq(lubridate::as_datetime("2017-01-01"),
             lubridate::as_datetime("2017-01-05"), by = "15 mins")
# aggregate within a non-equi join
library(data.table)
result <- periods[.(time = seq15), on = .(Start.Time <= time, End.Time > time),
                  .(time, count = sum(!is.na(ID))), by = .EACHI][, .(time, count)]
result
time count
1: 2017-01-01 00:00:00 0
2: 2017-01-01 00:15:00 1
3: 2017-01-01 00:30:00 1
4: 2017-01-01 00:45:00 1
5: 2017-01-01 01:00:00 1
---
381: 2017-01-04 23:00:00 0
382: 2017-01-04 23:15:00 0
383: 2017-01-04 23:30:00 0
384: 2017-01-04 23:45:00 0
385: 2017-01-05 00:00:00 0
The result can be visualized graphically:
library(ggplot2)
ggplot(result) + aes(time, count) + geom_step()
Data
periods <- readr::read_table(
"ID Start.Time End.Time
1 01/01/2017 00:15:00 01/01/2017 07:15:00
2 01/01/2017 04:45:00 01/01/2017 06:15:00
3 01/01/2017 10:20:00 01/01/2017 20:15:00
4 01/01/2017 02:15:00 01/01/2017 00:15:00
5 02/01/2017 15:15:00 03/01/2017 00:30:00
6 03/01/2017 07:00:00 04/01/2017 09:15:00"
)
# convert the date-time strings to class POSIXct
library(data.table)
cols <- names(periods)[names(periods) %like% "Time$"]
setDT(periods)[, (cols) := lapply(.SD, lubridate::dmy_hms), .SDcols = cols]
periods
ID Start.Time End.Time
1: 1 2017-01-01 00:15:00 2017-01-01 07:15:00
2: 2 2017-01-01 04:45:00 2017-01-01 06:15:00
3: 3 2017-01-01 10:20:00 2017-01-01 20:15:00
4: 4 2017-01-01 02:15:00 2017-01-01 00:15:00
5: 5 2017-01-02 15:15:00 2017-01-03 00:30:00
6: 6 2017-01-03 07:00:00 2017-01-04 09:15:00
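Since the OP asked for a tidyverse/dplyr approach, here is a minimal sketch of the same counting logic; it iterates over the grid with purrr, so it will be slower than the non-equi join on large data (periods and seq15 are the objects created above):
library(dplyr)
library(purrr)
# for each grid time t, count rows that have started (Start.Time <= t) but not ended (End.Time > t)
result_tidy <- tibble::tibble(time = seq15) %>%
  mutate(count = map_int(time, ~ sum(periods$Start.Time <= .x & periods$End.Time > .x)))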

Erase space in splitting - R

I have a data frame where I split the datetime column into separate date and time columns. However, when I group by time I get duplicate time values. To check this I used table() on the time column, and it also showed duplicates. This is a sample of it:
> table(df$time)
00:00:00 00:00:00 00:15:00 00:15:00 00:30:00 00:30:00
2211 1047 2211 1047 2211 1047
As you can see, after the split one of the "unique" values kept an extra space (" ") inside. Is there an easy way to solve this?
PS: the time column is of type character.
EDIT: Code added
library(reshape2)   # for colsplit()
df$datetime <- as.character.Date(df$datetime)
x <- colsplit(df$datetime, ' ', names = c('Date','Time'))
df <- cbind(df, x)
There are a number of approaches. One of them is to use appropriate functions to extract the Date and Time from the datetime column:
df <- data.frame(datetime = seq(from = as.POSIXct("2018-5-15 0:00", tz = "UTC"),
                                to   = as.POSIXct("2018-5-16 24:00", tz = "UTC"),
                                by   = "30 min"))
head(df$datetime)
#[1] "2018-05-15 00:00:00 UTC" "2018-05-15 00:30:00 UTC" "2018-05-15 01:00:00 UTC" "2018-05-15 01:30:00 UTC"
#[5] "2018-05-15 02:00:00 UTC" "2018-05-15 02:30:00 UTC"
df$Date <- as.Date(df$datetime)
df$Time <- format(df$datetime,"%H:%M:%S")
head(df)
# datetime Date Time
# 1 2018-05-15 00:00:00 2018-05-15 00:00:00
# 2 2018-05-15 00:30:00 2018-05-15 00:30:00
# 3 2018-05-15 01:00:00 2018-05-15 01:00:00
# 4 2018-05-15 01:30:00 2018-05-15 01:30:00
# 5 2018-05-15 02:00:00 2018-05-15 02:00:00
# 6 2018-05-15 02:30:00 2018-05-15 02:30:00
table(df$Time)
#00:00:00 00:30:00 01:00:00 01:30:00 02:00:00 02:30:00 03:00:00 03:30:00 04:00:00 04:30:00 05:00:00 05:30:00
#3 2 2 2 2 2 2 2 2 2 2 2
#06:00:00 06:30:00 07:00:00 07:30:00 08:00:00 08:30:00 09:00:00 09:30:00 10:00:00 10:30:00 11:00:00 11:30:00
#2 2 2 2 2 2 2 2 2 2 2 2
#12:00:00 12:30:00 13:00:00 13:30:00 14:00:00 14:30:00 15:00:00 15:30:00 16:00:00 16:30:00 17:00:00 17:30:00
#2 2 2 2 2 2 2 2 2 2 2 2
#18:00:00 18:30:00 19:00:00 19:30:00 20:00:00 20:30:00 21:00:00 21:30:00 22:00:00 22:30:00 23:00:00 23:30:00
#2 2 2 2 2 2 2 2 2 2 2 2
# If the data were given as character strings and contain extra spaces, the above approach still works
df <- data.frame(datetime=c("2018-05-15 00:00:00","2018-05-15 00:30:00",
"2018-05-15 01:00:00", "2018-05-15 02:00:00",
"2018-05-15 00:00:00","2018-05-15 00:30:00"),
stringsAsFactors=FALSE)
df$Date <- as.Date(df$datetime)
df$Time <- format(as.POSIXct(df$datetime, tz="UTC"),"%H:%M:%S")
head(df)
# datetime Date Time
# 1 2018-05-15 00:00:00 2018-05-15 00:00:00
# 2 2018-05-15 00:30:00 2018-05-15 00:30:00
# 3 2018-05-15 01:00:00 2018-05-15 01:00:00
# 4 2018-05-15 02:00:00 2018-05-15 02:00:00
# 5 2018-05-15 00:00:00 2018-05-15 00:00:00
# 6 2018-05-15 00:30:00 2018-05-15 00:30:00
table(df$Time)
#00:00:00 00:30:00 01:00:00 02:00:00
# 2 2 1 1
reshape2::colsplit accepts regular expressions, so you could split on '\s+' which matches 1 or more whitespace characters.
You can find out more about regular expressions in R with ?base::regex. The syntax is largely consistent across languages, so you can use pretty much any regex tutorial. Take a look at https://regex101.com/: this site evaluates your regular expressions in real time and shows you exactly what each part matches. It is extremely helpful!
Keep in mind that in R, compared to most other languages, you must double the backslashes: \s (which matches one whitespace character) must be written as \\s in R.
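Applied to the OP's colsplit call, that would look like the sketch below, assuming df$datetime is a character column with one or more whitespace characters between the date and time parts:
library(reshape2)
# split on one or more whitespace characters instead of a single literal space
x <- colsplit(df$datetime, '\\s+', names = c('Date', 'Time'))
df <- cbind(df, x)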

Populating missing Date and Time in time-series data in R, with zoo package

I have quarter-hour (15-minute interval) frequency data.
sasan<-read.csv("sasanhz.csv", header = TRUE)
head(sasan)
Timestamp Avg.Hz
1 12/27/2017 12:15:00 AM 50.05
2 12/27/2017 12:30:00 AM 49.99
3 12/27/2017 12:45:00 AM 49.98
4 12/27/2017 01:00:00 AM 50.01
5 12/27/2017 01:15:00 AM 49.97
6 12/27/2017 01:30:00 AM 49.98
str(sasan)
'data.frame': 5501 obs. of 2 variables:
$ Timestamp: Factor w/ 5501 levels "01/01/2018 00:00:00 AM",..: 5112 5114 5116 5023 5025
5027 5029 5031 5033 5035 ...
$ Avg.Hz : num 50 50 50 50 50 ...
# change to POSIXct
sasan$Timestamp<-as.POSIXct(sasan$Timestamp, format="%m/%d/%Y %I:%M:%S %p")
In this time series I have some missing date-times in the column "Timestamp"; I want to impute them.
I have tried with zoo.
z<-zoo(sasan)
> head(z[1489:1497])
Timestamp Avg.Hz
1489 2018-01-11 12:15:00 50.02
1490 2018-01-11 12:30:00 49.99
1491 2018-01-11 12:45:00 49.94
1492 <NA> 49.98
1493 <NA> 50.02
1494 <NA> 49.95
While imputing the NA date-time values with the na.locf function from the zoo package, I get the following error.
sasan_mis<-seq(start(z), end(z), by = times("00:15:00"))
> na.locf(z, xout = sasan_mis)
Error in approx(x[!na], y[!na], xout, ...) : zero non-NA points
In addition: Warning message:
In xy.coords(x, y, setLab = FALSE) : NAs introduced by coercion
How can I overcome this error and impute the missing date-times? I'd appreciate your suggestions.
dput(head(z))
structure(c("2017-12-27 00:15:00", "2017-12-27 00:30:00", "2017-12-27 00:45:00",
"2017-12-27 01:00:00", "2017-12-27 01:15:00", "2017-12-27 01:30:00",
"50.05", "49.99", "49.98", "50.01", "49.97", "49.98"), .Dim = c(6L,
2L), .Dimnames = list(NULL, c("Timestamp", "Avg.Hz")), index = 1:6, class = "zoo")
The packages I have loaded are
library(ggplot2)
library(forecast)
library(tseries)
library(xts)
library(zoo)
library(dplyr)
Assuming that the OP has missing values in the Timestamp variable and is looking for a way to populate them:
na.approx from the zoo package comes in very handy in such cases.
# na.approx from zoo to populate missing values of Timestamp
sasan$Timestamp <- as.POSIXct(na.approx(sasan$Timestamp), origin = "1970-1-1")
sasan
# 1 2017-12-27 00:15:00 50.05
# 2 2017-12-27 00:30:00 49.99
# 3 2017-12-27 00:45:00 49.98
# 4 2017-12-27 01:00:00 50.01
# 5 2017-12-27 01:15:00 49.97
# 6 2017-12-27 01:30:00 49.98
# 7 2017-12-27 01:45:00 49.98
# 8 2017-12-27 02:00:00 50.02
# 9 2017-12-27 02:15:00 49.95
# 10 2017-12-27 02:30:00 49.98
Data
# OP's data has been slightly modified to include NAs
sasan <- read.table(text =
"Timestamp Avg.Hz
1 '12/27/2017 12:15:00 AM' 50.05
2 '12/27/2017 12:30:00 AM' 49.99
3 '12/27/2017 12:45:00 AM' 49.98
4 '12/27/2017 01:00:00 AM' 50.01
5 '12/27/2017 01:15:00 AM' 49.97
6 '12/27/2017 01:30:00 AM' 49.98
7 <NA> 49.98
8 <NA> 50.02
9 <NA> 49.95
10 '12/27/2017 02:30:00 AM' 49.98",
header = TRUE, stringsAsFactors = FALSE)
# convert to POSIXct
sasan$Timestamp<-as.POSIXct(sasan$Timestamp, format="%m/%d/%Y %I:%M:%S %p")
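Once the timestamps are populated, the series can also be put on a regular 15-minute grid and any remaining gaps carried forward with na.locf, which is roughly what the OP's original attempt was aiming for; a sketch under the assumption that sasan now has complete, valid timestamps:
library(zoo)
# full 15-minute index spanning the observation period
grid <- seq(min(sasan$Timestamp), max(sasan$Timestamp), by = "15 min")
# zoo series indexed by timestamp, merged onto the complete grid
z_full <- merge(zoo(sasan$Avg.Hz, order.by = sasan$Timestamp), zoo(, grid))
# carry the last observation forward over any missing readings
z_filled <- na.locf(z_full)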

as.POSIXct gives inexplicable NA value [duplicate]

I have a large dataset (21683 records) and I've managed to combine date and time into a datetime correctly using as.POSIXct. Nevertheless, this did not work for 6 records (17463:17468). This is the dataset I'm using:
> head(solar.angle)
Date Time sol.elev.angle ID Datetime
1 2016-11-24 15:00:00 41.32397 1 2016-11-24 15:00:00
2 2016-11-24 15:10:00 39.11225 2 2016-11-24 15:10:00
3 2016-11-24 15:20:00 36.88180 3 2016-11-24 15:20:00
4 2016-11-24 15:30:00 34.63507 4 2016-11-24 15:30:00
5 2016-11-24 15:40:00 32.37418 5 2016-11-24 15:40:00
6 2016-11-24 15:50:00 30.10096 6 2016-11-24 15:50:00
> solar.angle[17460:17470,]
Date Time sol.elev.angle ID Datetime
17488 2017-03-26 01:30:00 -72.01821 17460 2017-03-26 01:30:00
17489 2017-03-26 01:40:00 -69.53832 17461 2017-03-26 01:40:00
17490 2017-03-26 01:50:00 -67.05409 17462 2017-03-26 01:50:00
17491 2017-03-26 02:00:00 -64.56682 17463 <NA>
17492 2017-03-26 02:10:00 -62.07730 17464 <NA>
17493 2017-03-26 02:20:00 -59.58609 17465 <NA>
17494 2017-03-26 02:30:00 -57.09359 17466 <NA>
17495 2017-03-26 02:40:00 -54.60006 17467 <NA>
17496 2017-03-26 02:50:00 -52.10572 17468 <NA>
17497 2017-03-26 03:00:00 -49.61071 17469 2017-03-26 03:00:00
17498 2017-03-26 03:10:00 -47.11515 17470 2017-03-26 03:10:00
This is the code I'm using:
solar.angle$Datetime <- as.POSIXct(paste(solar.angle$Date,solar.angle$Time), format="%Y-%m-%d %H:%M:%S")
I've already tried to fill them in manually but this did not make any difference:
> solar.angle$Datetime[17463] <- as.POSIXct('2017-03-26 02:00:00', format = "%Y-%m-%d %H:%M:%S")
> solar.angle$Datetime[17463]
[1] NA
Any help will be appreciated!
The problem here is that these timestamps fall in the hour that is skipped when the clocks move forward to summer time, so they do not exist in your local time zone; you need to specify a time zone in which they are valid.
If you specify a time zone, it will work:
as.POSIXct('2017-03-26 02:00:00', format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
Which returns:
"2017-03-26 02:00:00 GMT"
You can check ?timezones for more information.
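Applied to the OP's data, the whole column can be rebuilt with an explicit time zone; this is a sketch that uses GMT simply because it has no daylight-saving gap, so pick whichever zone matches how the data were recorded:
solar.angle$Datetime <- as.POSIXct(
  paste(solar.angle$Date, solar.angle$Time),
  format = "%Y-%m-%d %H:%M:%S",
  tz = "GMT"   # GMT has no DST transition, so 2017-03-26 02:00:00 exists
)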
