Date time am/pm in R

I'm having an issue with the datetime field of a timeseries:
> CO1temp[163:169,]
Date OPEN HIGH LOW CLOSE
163 7/11/2011 11:45:00 PM 116.30 116.30 116.09 116.18
164 7/11/2011 11:50:00 PM 116.16 116.78 116.13 116.70
165 7/11/2011 11:55:00 PM 116.69 116.83 116.51 116.65
166 7/12/2011 116.65 116.79 116.44 116.50
167 7/12/2011 12:05:00 AM 116.50 116.60 116.39 116.47
168 7/12/2011 12:10:00 AM 116.49 116.55 116.38 116.52
169 7/12/2011 12:15:00 AM 116.52 116.67 116.39 116.44
As you can see, the midnight time (row 166) is missing from the Date field.
This creates an NA when I create my xts object:
CO1 <- as.xts(CO1temp[, 2:5], order.by = as.POSIXct(CO1temp[,1],format='%m/%d/%Y %r'),frequency="5 minutes")
> CO1[163:169,]
OPEN HIGH LOW CLOSE
2011-07-11 23:45:00 116.30 116.30 116.09 116.18
2011-07-11 23:50:00 116.16 116.78 116.13 116.70
2011-07-11 23:55:00 116.69 116.83 116.51 116.65
<NA> 116.65 116.79 116.44 116.50
2011-07-12 00:05:00 116.50 116.60 116.39 116.47
2011-07-12 00:10:00 116.49 116.55 116.38 116.52
2011-07-12 00:15:00 116.52 116.67 116.39 116.44
This later leads to more problems when I want to analyze this time series.
?strptime is quite specific about it:
The default for the format methods is "%Y-%m-%d %H:%M:%S" if any component has a time component which is not midnight, and "%Y-%m-%d" otherwise.
However my datetime is not in the standard format.
I would greatly appreciate any help.

This is kind of a hack, but it works.
You just have to append "12:00:00 AM" to your vector of dates. Entries that lack the time information will then be read as midnight. For entries that already include a time, the appended text is simply ignored as trailing characters, and only the time that was already there is read.
CO1 <- as.xts(CO1temp[, 2:5],
order.by = as.POSIXct(paste(CO1temp$Date,"12:00:00 AM", sep=" "),
format='%m/%d/%Y %r'),
frequency="5 minutes")
CO1
OPEN HIGH LOW CLOSE
2011-07-11 23:45:00 116.30 116.30 116.09 116.18
2011-07-11 23:50:00 116.16 116.78 116.13 116.70
2011-07-11 23:55:00 116.69 116.83 116.51 116.65
2011-07-12 00:00:00 116.65 116.79 116.44 116.50
2011-07-12 00:05:00 116.50 116.60 116.39 116.47
2011-07-12 00:10:00 116.49 116.55 116.38 116.52
2011-07-12 00:15:00 116.52 116.67 116.39 116.44
That being said, if your data frame ended up like this after using strptime, then your date column is already a date-time (POSIXt) object, and the following should work directly:
as.xts(CO1temp[, 2:5], order.by = CO1temp$Date, frequency = "5 minutes")
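The padding trick can be checked on a small reproducible vector. This is only a sketch: the three sample strings are copied from the rows in the question, `tz = "UTC"` is an assumption added to keep the result independent of the local time zone, and `%I:%M:%S %p` is spelled out in place of `%r` (the two are documented as equivalent on input). Parsing `%p` relies on the locale defining AM/PM, so an English or C locale is assumed.

```r
# Three sample strings from the question; the middle one has no time part
raw <- c("7/11/2011 11:55:00 PM", "7/12/2011", "7/12/2011 12:05:00 AM")

# Append a midnight time to every entry; for entries that already carry a
# time, the appended text is left over after the format has been matched
padded <- paste(raw, "12:00:00 AM")

# tz = "UTC" is an assumption made only to keep the example reproducible
parsed <- as.POSIXct(padded, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M:%S")
# [1] "2011-07-11 23:55:00" "2011-07-12 00:00:00" "2011-07-12 00:05:00"
```

If `anyNA(parsed)` is still TRUE after this step, some entries have a problem other than a missing midnight time.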

Related

dealing with "missing" times when setting data to xts

I have some data which looks like the following:
Dates Open Close
1000 06/06/2019 0:05 244.599 244.524
1001 06/06/2019 0:04 244.592 244.599
1002 06/06/2019 0:03 244.564 244.592
1003 06/06/2019 0:02 244.809 244.564
1004 06/06/2019 0:01 244.849 244.809
1005 06/06/2019 245.080 244.849
1006 05/06/2019 23:59 245.092 245.080
1007 05/06/2019 23:58 245.253 245.092
1008 05/06/2019 23:57 244.858 245.253
1009 05/06/2019 23:56 244.643 244.863
1010 05/06/2019 23:55 244.720 244.643
Row 1005 doesn't have a time stamp. I tried to convert my dates to POSIXlt format:
data$Dates <- gsub("/", "-", data$Dates)
data$Dates <- as.POSIXlt(strptime(data$Dates, format="%d-%m-%Y %H:%M"))
Now my data looks like:
Dates Open Close
1000 2019-06-06 00:05:00 244.599 244.524
1001 2019-06-06 00:04:00 244.592 244.599
1002 2019-06-06 00:03:00 244.564 244.592
1003 2019-06-06 00:02:00 244.809 244.564
1004 2019-06-06 00:01:00 244.849 244.809
1005 <NA> 245.080 244.849
1006 2019-06-05 23:59:00 245.092 245.080
1007 2019-06-05 23:58:00 245.253 245.092
1008 2019-06-05 23:57:00 244.858 245.253
1009 2019-06-05 23:56:00 244.643 244.863
1010 2019-06-05 23:55:00 244.720 244.643
I am wondering if there is a way to convert the entries that have no hour or minute data. This only occurs at 0:00.
Data:
data <- structure(list(Dates = c("06/06/2019 0:05", "06/06/2019 0:04",
"06/06/2019 0:03", "06/06/2019 0:02", "06/06/2019 0:01", "06/06/2019",
"05/06/2019 23:59", "05/06/2019 23:58", "05/06/2019 23:57", "05/06/2019 23:56",
"05/06/2019 23:55"), Open = c(244.599, 244.592, 244.564, 244.809,
244.849, 245.08, 245.092, 245.253, 244.858, 244.643, 244.72),
Close = c(244.524, 244.599, 244.592, 244.564, 244.809, 244.849,
245.08, 245.092, 245.253, 244.863, 244.643)), row.names = 1000:1010, class = "data.frame")
EDIT:
I just thought that perhaps I should first split the column into two (one for dates and another for times), fill in the blank cells in the second column with 0:00, and paste them back together.
parse_date_time in the lubridate package will successively check alternative formats until it succeeds if you give it a vector of formats. The separators and percent signs can be omitted from the format strings.
library(lubridate)
parse_date_time(data$Dates, c("dmYHM", "dmY"), tz = "")
giving:
[1] "2019-06-06 00:05:00 EDT" "2019-06-06 00:04:00 EDT"
[3] "2019-06-06 00:03:00 EDT" "2019-06-06 00:02:00 EDT"
[5] "2019-06-06 00:01:00 EDT" "2019-06-06 00:00:00 EDT"
[7] "2019-06-05 23:59:00 EDT" "2019-06-05 23:58:00 EDT"
[9] "2019-06-05 23:57:00 EDT" "2019-06-05 23:56:00 EDT"
[11] "2019-06-05 23:55:00 EDT"
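The same "try one format, then fall back to another" idea can be sketched in base R without lubridate. This is an illustration only: the three sample strings are taken from the data in the question, and `tz = "UTC"` is an assumption added so the result does not depend on the local time zone.

```r
# A few entries from the question's data, including the one with no time
dates <- c("06/06/2019 0:01", "06/06/2019", "05/06/2019 23:59")

# First attempt: full date-time format; entries without a time become NA
parsed <- as.POSIXct(dates, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Fallback: re-parse only the failures with the date-only format,
# which leaves the time at midnight
missing <- is.na(parsed)
parsed[missing] <- as.POSIXct(dates[missing], format = "%d/%m/%Y", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M")
# [1] "2019-06-06 00:01" "2019-06-06 00:00" "2019-06-05 23:59"
```

Entries that match neither format stay NA, which makes genuinely malformed input easy to spot.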
Using dplyr, one possibility could be:
library(dplyr)
data %>%
mutate(Dates = ifelse(nchar(Dates) == 10, paste(Dates, "0:00", sep = " "), Dates),
Dates = as.POSIXct(Dates, format = "%d/%m/%Y %H:%M"))
Dates Open Close
1 2019-06-06 00:05:00 244.599 244.524
2 2019-06-06 00:04:00 244.592 244.599
3 2019-06-06 00:03:00 244.564 244.592
4 2019-06-06 00:02:00 244.809 244.564
5 2019-06-06 00:01:00 244.849 244.809
6 2019-06-06 00:00:00 245.080 244.849
7 2019-06-05 23:59:00 245.092 245.080
8 2019-06-05 23:58:00 245.253 245.092
9 2019-06-05 23:57:00 244.858 245.253
10 2019-06-05 23:56:00 244.643 244.863
11 2019-06-05 23:55:00 244.720 244.643
Here, for entries that are only 10 characters long (a date with no time), it appends 0:00 to the date before parsing.
The same with base R:
data$Dates <- ifelse(nchar(data$Dates) == 10, paste(data$Dates, "0:00", sep = " "), data$Dates)
as.POSIXct(data$Dates, format = "%d/%m/%Y %H:%M")

Finding date based on time

I am trying to work out how to find the date (which does not exist in the table) based on the time.
Example (remember, I only have the time):
time = c("9:44","15:30","23:48","00:30","05:30", "15:30", "22:00", "00:45")
I know for a fact that the start date is 2014-08-28, but how do I get the date, which changes after midnight?
Expected outcome would be
9:44 2014-08-28
15:30 2014-08-28
23:48 2014-08-28
00:30 2014-08-29
05:30 2014-08-29
15:30 2014-08-29
22:00 2014-08-29
00:45 2014-08-30
Here's an example using the ITime class from the data.table package, which lets you manipulate times (once the times are converted to this class, you can add or subtract minutes, hours, etc.):
library(data.table)
time <- as.ITime(time)
Date <- as.IDate("2014-08-28") + c(0, cumsum(diff(time) < 0))
data.table(time, Date)
# time Date
# 1: 09:44:00 2014-08-28
# 2: 15:30:00 2014-08-28
# 3: 23:48:00 2014-08-28
# 4: 00:30:00 2014-08-29
# 5: 05:30:00 2014-08-29
# 6: 15:30:00 2014-08-29
# 7: 22:00:00 2014-08-29
# 8: 00:45:00 2014-08-30
Using the chron package we assume that a later time is on the same day and an earlier time is on the next day:
library(chron)
date <- as.Date("2014-08-28") + cumsum(c(0, diff(times(paste0(time, ":00"))) < 0))
data.frame(time, date)
giving:
time date
1 9:44 2014-08-28
2 15:30 2014-08-28
3 23:48 2014-08-28
4 00:30 2014-08-29
5 05:30 2014-08-29
6 15:30 2014-08-29
7 22:00 2014-08-29
8 00:45 2014-08-30
Here's one way to do it:
time = c("9:44","15:30","23:48","00:30","05:30", "15:30", "22:00", "00:45")
# seconds since midnight for each "H:M" string
times <- sapply(strsplit(time, ":", TRUE), function(x) sum(as.numeric(x) * c(3600, 60)))
# add one day every time the clock time goes backwards
as.POSIXct("2014-08-28") + times + 60*60*24*cumsum(c(FALSE, diff(times) < 0))
# [1] "2014-08-28 09:44:00 CEST" "2014-08-28 15:30:00 CEST" "2014-08-28 23:48:00 CEST" "2014-08-29 00:30:00 CEST"
# [5] "2014-08-29 05:30:00 CEST" "2014-08-29 15:30:00 CEST" "2014-08-29 22:00:00 CEST" "2014-08-30 00:45:00 CEST"
You can also concatenate the system date with a time to get the result. For example, in Oracle we can get the date with the time as:
to_char(sysdate,'DD-MM-RRRR')|| ' ' || To_char(sysdate,'HH:MIAM')
This gives a result such as 12-09-2015 09:50 AM.
For your requirement, use it as:
to_char(sysdate,'DD-MM-RRRR')|| ' 00:45' and so on.

Time series is throwing up error message

I am trying to conduct a time series analysis based on this dataset:
time POINT_Y POINT_X
00:00 106.78 207.44
00:30 106.61 207.6
01:00 103.72 208.33
01:30 102.57 207.35
02:00 102.27 206.3
02:30 101.6 206.43
03:00 100.66 206.73
03:30 101.11 206.5
04:00 100.95 206.63
04:30 102.02 206.27
05:00 105.83 207.93
05:30 106.98 207.15
06:00 107.32 206.28
06:30 108.36 204.7
07:00 107.97 203.41
07:30 107.76 202.63
08:00 107.85 201.13
08:30 107.6 198.74
It was read in with:
austriacus <- read.table("austriacus.txt", header = TRUE)
The time series call x.ts <- ts(POINT_X, time) is not working and produces the following error message:
Error in is.data.frame(data) : object 'POINT_X' not found
Any ideas on this?
Try the zoo and chron packages:
Lines <- "time POINT_Y POINT_X
00:00 106.78 207.44
00:30 106.61 207.6
01:00 103.72 208.33
01:30 102.57 207.35
02:00 102.27 206.3
02:30 101.6 206.43
03:00 100.66 206.73
03:30 101.11 206.5
04:00 100.95 206.63
04:30 102.02 206.27
05:00 105.83 207.93
05:30 106.98 207.15
06:00 107.32 206.28
06:30 108.36 204.7
07:00 107.97 203.41
07:30 107.76 202.63
08:00 107.85 201.13
08:30 107.6 198.74
"
library(zoo)
library(chron)
to.times <- function(x) times(paste0(x, ":00"))
# z <- read.zoo("myfile", header = TRUE, FUN = to.times)
z <- read.zoo(text = Lines, header = TRUE, FUN = to.times)
plot(z)
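As for the error itself: POINT_X is a column of austriacus, not an object in the workspace, so ts() cannot find it unless it is referenced through the data frame. A minimal base R sketch follows, assuming the first four rows of the data in the question. The choice of `start = 0` and `frequency = 2` (two observations per hour, so the time axis reads in hours) is illustrative and not something the question specifies.

```r
# First four rows of the question's data, rebuilt by hand for the example
austriacus <- data.frame(
  time    = c("00:00", "00:30", "01:00", "01:30"),
  POINT_Y = c(106.78, 106.61, 103.72, 102.57),
  POINT_X = c(207.44, 207.60, 208.33, 207.35)
)

# Refer to the column through the data frame; a bare POINT_X is not found
# because no object of that name exists in the workspace
x.ts <- ts(austriacus$POINT_X, start = 0, frequency = 2)

time(x.ts)  # 0.0 0.5 1.0 1.5, i.e. hours since midnight
```

with(austriacus, ts(POINT_X, start = 0, frequency = 2)) is an equivalent way to make the column visible to ts().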

Error plotting a zoo class series data with R

I am new to the community of stackoverflow and this is the first question I ask. Please let me know if I did something wrong.
Here is the situation with my problem. I am dealing with Australian electricity prices, and my time series look like this. This is a high frequency data that is sampled every 30 minutes.
> head(t)
VIC NSW QLD SNOWY SA
1999-01-01 00:00:00 26.84 24.29 26.52 26.20 29.87
1999-01-01 00:30:00 30.52 27.64 19.34 29.74 36.01
1999-01-01 01:00:00 28.74 26.64 17.47 28.34 35.70
1999-01-01 01:30:00 27.94 25.81 17.08 27.43 31.67
1999-01-01 02:00:00 20.90 19.94 15.84 20.86 22.42
1999-01-01 02:30:00 20.26 19.48 15.68 20.28 21.38
> tail(t)
VIC NSW QLD SNOWY SA
2006-12-31 21:00:00 14.59 15.10 13.72 15.35 29.60
2006-12-31 21:30:00 14.77 15.42 14.12 15.61 28.79
2006-12-31 22:00:00 14.12 15.01 13.54 15.06 20.59
2006-12-31 22:30:00 15.15 16.19 15.10 16.21 17.44
2006-12-31 23:00:00 15.17 16.14 15.48 16.18 17.84
2006-12-31 23:30:00 16.96 17.14 16.37 17.63 20.20
> class(t)
[1] "xts" "zoo"
I was trying to come up with mean prices for each time point across a day. So I did this:
> half.hourly.means <- aggregate(t$VIC, list(format(index(t), "%H:%M")), FUN = mean)
> head(half.hourly.means)
00:00 26.99938
00:30 24.67273
01:00 21.78190
01:30 26.46662
02:00 21.27931
02:30 18.57727
> tail(half.hourly.means)
21:00 27.86881
21:30 26.65468
22:00 23.51793
22:30 25.68527
23:00 23.26385
23:30 30.01726
> class(half.hourly.means)
[1] "zoo"
The outcome is the average price at each time point across the sample period. Everything worked fine so far, but when I tried to plot it, an error occurred.
> plot(half.hourly.means)
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
When I tried this
> plot(as.numeric(half.hourly.means), type = "l")
(I could not post an image for lack of reputation, sorry)
It yields the correct plot, but with meaningless x axis values. So my question is: what could be done to produce the above graph with x axis values being "00:00", "00:30", "01:00", ... ?
Thanks for your patience!
Best regards,
Wei
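One possible approach, offered as a sketch rather than a definitive answer: the warnings suggest that the index of half.hourly.means is a character vector ("00:00", "00:30", ...), which cannot be turned into x coordinates. Plotting against integer positions and drawing the axis labels by hand avoids that. The six values below are the head() output shown in the question; for the real object, the values and labels could be taken with coredata() and index() from zoo.

```r
# Values and labels copied from head(half.hourly.means) in the question;
# with the real zoo object: vals <- as.numeric(coredata(half.hourly.means))
#                           labs <- index(half.hourly.means)
labs <- c("00:00", "00:30", "01:00", "01:30", "02:00", "02:30")
vals <- c(26.99938, 24.67273, 21.78190, 26.46662, 21.27931, 18.57727)

# Plot against integer positions and suppress the default x axis
pos <- seq_along(vals)
plot(pos, vals, type = "l", xaxt = "n",
     xlab = "Time of day", ylab = "Mean price")

# Draw the axis with the time-of-day strings as tick labels
axis(1, at = pos, labels = labs)
```

With all 48 half-hour labels, drawing only every fourth tick, for example at = pos[seq(1, length(pos), by = 4)] with the matching labels, keeps the axis readable.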

Intraday high/low clustering

I am attempting to perform a study on the clustering of high/low points based on time. I managed to achieve the above by using to.daily on intraday data and merging the two using:
intraday.merge <- merge(intraday,daily)
intraday.merge <- na.locf(intraday.merge)
intraday.merge <- intraday.merge["T08:30:00/T16:30:00"] # remove record at 00:00:00
Next, I tried to obtain the records where the high == daily.high/low == daily.low using:
intradayhi <- test[test$High == test$Daily.High]
intradaylo <- test[test$Low == test$Daily.Low]
Resulting data resembles the following:
Open High Low Close Volume Daily.Open Daily.High Daily.Low Daily.Close Daily.Volume
2012-06-19 08:45:00 258.9 259.1 258.5 258.7 1424 258.9 259.1 257.7 258.7 31523
2012-06-20 13:30:00 260.8 260.9 260.6 260.6 1616 260.4 260.9 259.2 260.8 35358
2012-06-21 08:40:00 260.7 260.8 260.4 260.5 493 260.7 260.8 257.4 258.3 31360
2012-06-22 12:10:00 255.9 256.2 255.9 256.1 626 254.5 256.2 253.9 255.3 50515
2012-06-22 12:15:00 256.1 256.2 255.9 255.9 779 254.5 256.2 253.9 255.3 50515
2012-06-25 11:55:00 254.5 254.7 254.4 254.6 1589 253.8 254.7 251.5 253.9 65621
2012-06-26 08:45:00 253.4 254.2 253.2 253.7 5849 253.8 254.2 252.4 253.1 70635
2012-06-27 11:25:00 255.6 256.0 255.5 255.9 973 251.8 256.0 251.8 255.2 53335
2012-06-28 09:00:00 257.0 257.3 256.9 257.1 601 255.3 257.3 255.0 255.1 23978
2012-06-29 13:45:00 253.0 253.4 253.0 253.4 451 247.3 253.4 246.9 253.4 52539
The subset contains more than one record for some days (for example 2012-06-22). How do I keep only the first record of each day? I would then be able to plot the count of records for each period of the day.
Also, are there alternate methods to get the results I want? Thanks in advance.
Edit:
Sample output should look like this, count could either be 1st result for day or aggregated (more than 1 occurrence in that day):
Time Count
08:40:00 60
08:45:00 54
08:50:00 60
...
14:00:00 20
14:05:00 12
14:10:00 30
You can get the first observation of each day via:
y <- apply.daily(x, first)
Then you can count the observations by hour and minute (length counts the rows in each group, whereas sum would add up the row numbers):
z <- aggregate(1:NROW(y), by=list(Time=format(index(y),"%H:%M")), length)
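The "first record per day, then count by time of day" idea can also be sketched in base R on plain timestamps. This is an illustration under assumptions: the first three timestamps come from the sample data in the question, the fourth is made up so that one time occurs twice, and `tz = "UTC"` is added so the date boundaries do not depend on the local time zone.

```r
# Timestamps of rows where High == Daily.High; 2012-06-22 appears twice.
# The last entry is invented for the example.
stamps <- as.POSIXct(c("2012-06-22 12:10:00", "2012-06-22 12:15:00",
                       "2012-06-25 11:55:00", "2012-06-26 12:10:00"),
                     tz = "UTC")

# Keep only the first timestamp seen on each calendar day
first_per_day <- stamps[!duplicated(as.Date(stamps))]

# Count how often each time of day occurs among those first records
counts <- table(Time = format(first_per_day, "%H:%M"))
counts
# Time
# 11:55 12:10
#     1     2
```

The counts can be passed to barplot(counts) to show how the daily highs cluster by time of day.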

Resources