dealing with "missing" times when setting data to xts - r

I have some data which looks like the following:
Dates Open Close
1000 06/06/2019 0:05 244.599 244.524
1001 06/06/2019 0:04 244.592 244.599
1002 06/06/2019 0:03 244.564 244.592
1003 06/06/2019 0:02 244.809 244.564
1004 06/06/2019 0:01 244.849 244.809
1005 06/06/2019 245.080 244.849
1006 05/06/2019 23:59 245.092 245.080
1007 05/06/2019 23:58 245.253 245.092
1008 05/06/2019 23:57 244.858 245.253
1009 05/06/2019 23:56 244.643 244.863
1010 05/06/2019 23:55 244.720 244.643
Row 1005 doesn't have a time stamp. I try to convert my dates to POSIXlt format:
data$Dates <- gsub("/", "-", data$Dates)
data$Dates <- as.POSIXlt(strptime(data$Dates, format="%d-%m-%Y %H:%M"))
Now my data looks like:
Dates Open Close
1000 2019-06-06 00:05:00 244.599 244.524
1001 2019-06-06 00:04:00 244.592 244.599
1002 2019-06-06 00:03:00 244.564 244.592
1003 2019-06-06 00:02:00 244.809 244.564
1004 2019-06-06 00:01:00 244.849 244.809
1005 <NA> 245.080 244.849
1006 2019-06-05 23:59:00 245.092 245.080
1007 2019-06-05 23:58:00 245.253 245.092
1008 2019-06-05 23:57:00 244.858 245.253
1009 2019-06-05 23:56:00 244.643 244.863
1010 2019-06-05 23:55:00 244.720 244.643
I am wondering whether there is a way to convert the entries that have no hour or minute data. It only happens at midnight (0:00).
Data:
data <- structure(list(Dates = c("06/06/2019 0:05", "06/06/2019 0:04",
"06/06/2019 0:03", "06/06/2019 0:02", "06/06/2019 0:01", "06/06/2019",
"05/06/2019 23:59", "05/06/2019 23:58", "05/06/2019 23:57", "05/06/2019 23:56",
"05/06/2019 23:55"), Open = c(244.599, 244.592, 244.564, 244.809,
244.849, 245.08, 245.092, 245.253, 244.858, 244.643, 244.72),
Close = c(244.524, 244.599, 244.592, 244.564, 244.809, 244.849,
245.08, 245.092, 245.253, 244.863, 244.643)), row.names = 1000:1010, class = "data.frame")
EDIT:
I just thought that perhaps I should first split the column into two (one for dates and another for times), fill in the blank cells in the second column with 0:00, and paste them back together.

parse_date_time in the lubridate package will successively check alternative formats until it succeeds if you give it a vector of formats. The separators and percent signs can be omitted from the format strings.
library(lubridate)
parse_date_time(data$Dates, c("dmYHM", "dmY"), tz = "")
giving:
[1] "2019-06-06 00:05:00 EDT" "2019-06-06 00:04:00 EDT"
[3] "2019-06-06 00:03:00 EDT" "2019-06-06 00:02:00 EDT"
[5] "2019-06-06 00:01:00 EDT" "2019-06-06 00:00:00 EDT"
[7] "2019-06-05 23:59:00 EDT" "2019-06-05 23:58:00 EDT"
[9] "2019-06-05 23:57:00 EDT" "2019-06-05 23:56:00 EDT"
[11] "2019-06-05 23:55:00 EDT"
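For readers who would rather not add a package, the same "try one format, then fall back to the next" idea can be sketched in base R. This is only an illustration under assumptions: the vector `dates` is a made-up three-element subset of the question's data, and `tz = "UTC"` is chosen here purely to make the result reproducible.

```r
# Hypothetical sample taken from the question's data; one entry has no time.
dates <- c("06/06/2019 0:01", "06/06/2019", "05/06/2019 23:59")

# First pass: the full date-time format. Entries without a time become NA.
parsed <- as.POSIXct(dates, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Second pass: retry only the failures with the date-only format,
# which yields midnight for those entries.
failed <- is.na(parsed)
parsed[failed] <- as.POSIXct(dates[failed], format = "%d/%m/%Y", tz = "UTC")

print(parsed)
```

Retrying only the failed entries keeps the first format authoritative, so a value that already parsed is never overwritten by the looser date-only format.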

Using dplyr, one possibility could be:
library(dplyr)
data %>%
  mutate(Dates = ifelse(nchar(Dates) == 10, paste(Dates, "0:00", sep = " "), Dates),
         Dates = as.POSIXct(Dates, format = "%d/%m/%Y %H:%M"))
Dates Open Close
1 2019-06-06 00:05:00 244.599 244.524
2 2019-06-06 00:04:00 244.592 244.599
3 2019-06-06 00:03:00 244.564 244.592
4 2019-06-06 00:02:00 244.809 244.564
5 2019-06-06 00:01:00 244.849 244.809
6 2019-06-06 00:00:00 245.080 244.849
7 2019-06-05 23:59:00 245.092 245.080
8 2019-06-05 23:58:00 245.253 245.092
9 2019-06-05 23:57:00 244.858 245.253
10 2019-06-05 23:56:00 244.643 244.863
11 2019-06-05 23:55:00 244.720 244.643
Here, for rows that contain only the 10 characters of the date, it appends 0:00 to the date.
The same with base R:
data$Dates <- ifelse(nchar(data$Dates) == 10, paste(data$Dates, "0:00", sep = " "), data$Dates)
as.POSIXct(data$Dates, format = "%d/%m/%Y %H:%M")

Related

min and max over time range on each day of xts

I have an xts object with intraday OHLC price data over several years. I'd like to write a function that calculates the min and max value between 04:00:00 and 05:00:00 every day and includes them as columns in the xts object. I'm not really familiar with manipulating xts objects. Can anyone point me in the right direction? Here's the head of the xts object.
Open High Low Close Volume
2017-01-01 00:00:00 968.29 968.76 966.74 966.97 106562
2017-01-01 00:05:00 966.97 967.00 966.89 966.89 13731
2017-01-01 00:10:00 966.89 966.89 964.86 964.86 124137
2017-01-01 00:15:00 964.86 964.99 964.80 964.80 3001
2017-01-01 00:20:00 964.80 964.80 964.80 964.80 0
2017-01-01 00:25:00 964.80 965.09 964.54 964.91 48000
2017-01-01 00:30:00 964.91 965.01 964.91 965.01 2501
2017-01-01 00:35:00 965.01 967.82 965.57 967.82 71501
2017-01-01 00:40:00 967.82 967.82 967.08 967.08 50
2017-01-01 00:45:00 967.08 967.40 967.40 967.40 50
2017-01-01 00:50:00 967.40 968.08 967.40 968.08 14000
2017-01-01 00:55:00 968.08 968.08 966.89 968.00 1008
2017-01-01 01:00:00 968.00 968.10 968.00 968.10 1002
2017-01-01 01:05:00 968.10 968.10 967.62 967.62 5200
2017-01-01 01:10:00 967.62 967.70 966.29 966.29 35476
2017-01-01 01:15:00 966.29 966.29 966.28 966.28 3068
2017-01-01 01:20:00 966.28 966.66 965.00 965.00 30471
2017-01-01 01:25:00 965.00 965.01 964.00 964.00 77884
2017-01-01 01:30:00 964.00 964.76 964.76 964.76 500
2017-01-01 01:35:00 964.76 967.48 964.69 965.00 134129
2017-01-01 01:40:00 965.00 965.00 963.67 963.67 59676
2017-01-01 01:45:00 963.67 963.67 963.67 963.67 0
2017-01-01 01:50:00 963.67 964.56 963.66 964.55 5531
2017-01-01 01:55:00 964.55 963.43 963.40 963.40 3000
2017-01-01 02:00:00 963.40 964.60 963.40 964.60 1301
2017-01-01 02:05:00 964.60 964.60 964.60 964.60 0
2017-01-01 02:10:00 964.60 964.60 964.00 964.11 49954
2017-01-01 02:15:00 964.11 964.60 964.59 964.60 5000
2017-01-01 02:20:00 964.60 964.60 964.60 964.60 0
2017-01-01 02:25:00 964.60 964.60 964.51 964.51 2000
2017-01-01 02:30:00 964.51 964.51 964.51 964.51 0
2017-01-01 02:35:00 964.51 964.51 963.23 963.99 16667
2017-01-01 02:40:00 963.99 963.99 963.65 963.66 10000
2017-01-01 02:45:00 963.66 964.26 963.16 964.26 75500
2017-01-01 02:50:00 964.26 964.26 964.26 964.26 0
2017-01-01 02:55:00 964.26 964.26 964.26 964.26 0
2017-01-01 03:00:00 964.26 964.61 963.98 964.61 13000
2017-01-01 03:05:00 964.61 964.61 964.61 964.61 0
2017-01-01 03:10:00 964.61 964.61 964.61 964.61 0
2017-01-01 03:15:00 964.61 964.61 964.61 964.61 0
2017-01-01 03:20:00 964.61 964.82 964.48 964.82 16666
2017-01-01 03:25:00 964.82 965.00 963.99 964.97 50500
2017-01-01 03:30:00 964.97 964.97 964.02 964.02 56000
2017-01-01 03:35:00 964.02 964.29 964.29 964.29 500
2017-01-01 03:40:00 964.29 963.53 963.52 963.52 24000
2017-01-01 03:45:00 963.52 963.52 963.43 963.43 16500
2017-01-01 03:50:00 963.43 963.67 963.42 963.42 25002
2017-01-01 03:55:00 963.42 963.42 961.69 961.69 84507
2017-01-01 04:00:00 961.69 961.69 960.90 960.93 57909
2017-01-01 04:05:00 960.93 960.93 960.93 960.93 0
2017-01-01 04:10:00 960.93 961.19 961.19 961.19 400
2017-01-01 04:15:00 961.19 962.09 961.19 962.09 7001
2017-01-01 04:20:00 962.09 962.09 962.09 962.09 0
2017-01-01 04:25:00 962.09 962.10 961.14 961.14 32000
2017-01-01 04:30:00 961.14 961.14 960.93 960.93 41900
2017-01-01 04:35:00 960.93 961.94 960.93 961.64 640
2017-01-01 04:40:00 961.64 961.71 961.64 961.71 1
2017-01-01 04:45:00 961.71 962.00 961.90 961.99 5499
2017-01-01 04:50:00 961.99 961.99 961.99 961.99 0
2017-01-01 04:55:00 961.99 961.99 961.99 961.99 1
2017-01-01 05:00:00 961.99 961.99 961.99 961.99 0
2017-01-01 05:05:00 961.99 961.99 961.99 961.99 40
2017-01-01 05:10:00 961.99 961.99 961.99 961.99 0
2017-01-01 05:15:00 961.99 961.99 961.99 961.99 0
2017-01-01 05:20:00 961.99 961.99 961.99 961.99 0
2017-01-01 05:25:00 961.99 961.99 961.99 961.99 0
2017-01-01 05:30:00 961.99 962.10 961.99 962.10 1382
2017-01-01 05:35:00 962.10 968.84 962.10 968.84 122909
2017-01-01 05:40:00 968.84 968.86 963.78 965.53 161263
2017-01-01 05:45:00 965.53 964.81 963.11 963.81 18021
2017-01-01 05:50:00 963.81 964.39 963.85 964.39 40006
2017-01-01 05:55:00 964.39 964.47 964.00 964.47 39966
2017-01-01 06:00:00 964.47 964.47 964.47 964.47 0
You can do this by filtering on the hour of the index and then using the period.max and period.min functions. The values are placed in the last record of the chosen hour. The example below uses intraday MSFT data and takes the max and min between 15:00 and 16:00.
library(xts)
# max of high values between 15 and 16. (excluding 16:00)
msft$max <- period.max(msft$high[.indexhour(msft) == 15], endpoints(msft$high[.indexhour(msft) == 15], on = "hour"))
# min of low values between 15 and 16. (excluding 16:00)
msft$min <- period.min(msft$low[.indexhour(msft) == 15], endpoints(msft$low[.indexhour(msft) == 15], on = "hour"))
head(msft[8:24], 16)
open high low close volume max min
2020-01-23 14:50:00 166.180 166.2300 166.090 166.1050 87934 NA NA
2020-01-23 14:55:00 166.105 166.2200 166.103 166.1700 92280 NA NA
2020-01-23 15:00:00 166.160 166.3500 166.160 166.3400 114359 NA NA
2020-01-23 15:05:00 166.335 166.3400 166.285 166.2850 102633 NA NA
2020-01-23 15:10:00 166.290 166.3050 166.170 166.2550 125558 NA NA
2020-01-23 15:15:00 166.250 166.2750 166.210 166.2400 103938 NA NA
2020-01-23 15:20:00 166.230 166.2500 166.180 166.2350 99649 NA NA
2020-01-23 15:25:00 166.240 166.3000 166.225 166.2850 93846 NA NA
2020-01-23 15:30:00 166.270 166.4164 166.175 166.3600 183154 NA NA
2020-01-23 15:35:00 166.360 166.5000 166.320 166.4600 177178 NA NA
2020-01-23 15:40:00 166.450 166.4650 166.380 166.3800 112174 NA NA
2020-01-23 15:45:00 166.385 166.4050 166.290 166.3875 152806 NA NA
2020-01-23 15:50:00 166.382 166.5200 166.362 166.4500 205667 NA NA
2020-01-23 15:55:00 166.450 166.6900 166.305 166.6700 508469 166.69 166.16
2020-01-23 16:00:00 166.660 166.7200 166.589 166.7200 934090 NA NA
2020-01-24 09:35:00 167.510 167.5300 166.890 166.8918 1152646 NA NA
data:
msft <- structure(c(166.224, 166.29, 166.29, 166.2456, 166.165, 166.1446,
166.1601, 166.18, 166.105, 166.16, 166.335, 166.29, 166.25, 166.23,
166.24, 166.27, 166.36, 166.45, 166.385, 166.382, 166.45, 166.66,
167.51, 167.03, 167.265, 167.325, 167.37, 167.16, 167.405, 167.35,
167.31, 167.39, 167.17, 167.1, 166.845, 167.03, 167.1223, 167.125,
167.21, 167.34, 167.235, 167.3, 167.37, 167.1977, 166.9814, 166.8499,
166.99, 166.93, 166.83, 166.64, 166.775, 166.85, 166.71, 166.6838,
166.46, 166.35, 165.765, 166.2269, 166.01, 166.19, 166.13, 166.31,
166.36, 166.42, 166.3682, 165.99, 166.1328, 165.85, 165.74, 165.8439,
165.655, 165.5434, 165.47, 165.3227, 165.0627, 165.03, 165.2546,
165.14, 165.1, 164.91, 164.75, 164.65, 164.53, 164.81, 164.8979,
164.6, 164.89, 164.94, 165.03, 165.12, 165.17, 165.24, 165.4,
165.335, 165.2734, 164.985, 164.9, 164.61, 164.93, 165.18, 166.315,
166.29, 166.3, 166.265, 166.22, 166.2201, 166.2, 166.23, 166.22,
166.35, 166.34, 166.305, 166.275, 166.25, 166.3, 166.4164, 166.5,
166.465, 166.405, 166.52, 166.69, 166.72, 167.53, 167.34, 167.39,
167.495, 167.47, 167.48, 167.4251, 167.3699, 167.42, 167.41,
167.2, 167.1, 167.03, 167.21, 167.23, 167.255, 167.35, 167.35,
167.33, 167.405, 167.38, 167.25, 167.01, 167, 167.02, 167.02,
166.8384, 166.9056, 166.86, 166.94, 166.75, 166.6844, 166.47,
166.42, 166.22, 166.4049, 166.221, 166.2003, 166.3749, 166.3999,
166.43, 166.43, 166.375, 166.175, 166.16, 165.96, 165.93, 165.86,
165.671, 165.64, 165.49, 165.4, 165.08, 165.27, 165.26, 165.34,
165.12, 165, 164.825, 164.765, 164.82, 164.89, 165, 164.89, 164.99,
165.041, 165.293, 165.23, 165.27, 165.44, 165.6046, 165.37, 165.295,
165.18, 164.93, 164.945, 165.185, 165.24, 166.22, 166.225, 166.25,
166.15, 166.145, 166.13, 166.1015, 166.09, 166.103, 166.16, 166.285,
166.17, 166.21, 166.18, 166.225, 166.175, 166.32, 166.38, 166.29,
166.362, 166.305, 166.589, 166.89, 167.03, 167.22, 167.32, 167.225,
167.16, 167.2, 167.23, 167.2801, 167.145, 167.05, 166.84, 166.77,
167, 167.1, 167.02, 167.18, 167.23, 167.223, 167.28, 167.1843,
166.862, 166.85, 166.821, 166.8121, 166.85, 166.55, 166.6303,
166.69, 166.7, 166.54, 166.4, 166.31, 165.76, 165.74, 165.8966,
165.91, 166.07, 166.09, 166.171, 166.32, 166.22, 165.96, 165.97,
165.82, 165.73, 165.72, 165.64, 165.49, 165.45, 165.32, 165.045,
164.89, 164.91, 165.09, 165.1, 164.91, 164.74, 164.53, 164.529,
164.53, 164.735, 164.59, 164.54, 164.88, 164.938, 165.01, 165.0792,
165.12, 165.22, 165.335, 165.263, 164.88, 164.89, 164.58, 164.58,
164.87, 164.87, 166.29, 166.275, 166.26, 166.16, 166.145, 166.155,
166.19, 166.105, 166.17, 166.34, 166.285, 166.255, 166.24, 166.235,
166.285, 166.36, 166.46, 166.38, 166.3875, 166.45, 166.67, 166.72,
166.8918, 167.27, 167.325, 167.371, 167.2251, 167.4, 167.34,
167.29, 167.3988, 167.2, 167.1047, 166.86, 167.025, 167.11, 167.12,
167.2, 167.345, 167.23, 167.29, 167.37, 167.1916, 167.0027, 166.85,
167, 166.94, 166.85, 166.64, 166.7738, 166.85, 166.72, 166.68,
166.4672, 166.3512, 165.79, 166.21, 165.9969, 166.18, 166.14,
166.2968, 166.36, 166.43, 166.36, 165.99, 166.13, 165.83, 165.73,
165.8405, 165.65, 165.545, 165.48, 165.33, 165.05, 165.0227,
165.26, 165.1425, 165.101, 164.91, 164.74, 164.6581, 164.5292,
164.805, 164.89, 164.59, 164.8801, 164.9498, 165.04, 165.12,
165.16, 165.2302, 165.4, 165.34, 165.28, 164.987, 164.89, 164.605,
164.94, 165.185, 165.04, 158120, 165333, 101115, 78491, 123999,
76037, 82733, 87934, 92280, 114359, 102633, 125558, 103938, 99649,
93846, 183154, 177178, 112174, 152806, 205667, 508469, 934090,
1152646, 558627, 277325, 321651, 255494, 333848, 272126, 395463,
194593, 211910, 193131, 242112, 210240, 193265, 139617, 204182,
179146, 159259, 237888, 410982, 213787, 233082, 188071, 193742,
132377, 118994, 264247, 182490, 109514, 138164, 221052, 194127,
169059, 458214, 247712, 169523, 115531, 161259, 263230, 155536,
82474, 87549, 109057, 101772, 130642, 171988, 117235, 134507,
236662, 219303, 217698, 219808, 420288, 208087, 149358, 197435,
218090, 267667, 320279, 422434, 340478, 273866, 258938, 212451,
268017, 323657, 267686, 214060, 222314, 293731, 288867, 219687,
304733, 251063, 425450, 455311, 741208, 1429645),
.Dim = c(100L, 5L),
.Dimnames = list(NULL, c("open", "high", "low", "close", "volume")),
index = structure(c(1579785300, 1579785600, 1579785900, 1579786200, 1579786500,
1579786800, 1579787100, 1579787400, 1579787700, 1579788000,
1579788300, 1579788600, 1579788900, 1579789200, 1579789500,
1579789800, 1579790100, 1579790400, 1579790700, 1579791000,
1579791300, 1579791600, 1579854900, 1579855200, 1579855500,
1579855800, 1579856100, 1579856400, 1579856700, 1579857000,
1579857300, 1579857600, 1579857900, 1579858200, 1579858500,
1579858800, 1579859100, 1579859400, 1579859700, 1579860000,
1579860300, 1579860600, 1579860900, 1579861200, 1579861500,
1579861800, 1579862100, 1579862400, 1579862700, 1579863000,
1579863300, 1579863600, 1579863900, 1579864200, 1579864500,
1579864800, 1579865100, 1579865400, 1579865700, 1579866000,
1579866300, 1579866600, 1579866900, 1579867200, 1579867500,
1579867800, 1579868100, 1579868400, 1579868700, 1579869000,
1579869300, 1579869600, 1579869900, 1579870200, 1579870500,
1579870800, 1579871100, 1579871400, 1579871700, 1579872000,
1579872300, 1579872600, 1579872900, 1579873200, 1579873500,
1579873800, 1579874100, 1579874400, 1579874700, 1579875000,
1579875300, 1579875600, 1579875900, 1579876200, 1579876500,
1579876800, 1579877100, 1579877400, 1579877700, 1579878000),
tzone = "",
tclass = c("POSIXct", "POSIXt")),
class = c("xts", "zoo"))

How to deal with one column of two formats and single class?

I have one column with two different formats but the same class 'factor'.
D$date
2009-05-12 11:30:00
2009-05-13 11:30:00
2009-05-14 11:30:00
2009-05-15 11:30:00
42115.652
2876
8765
class(D$date)
factor
What I need is to convert the numbers to dates.
D$date <- as.character(D$date)
D$date=ifelse(!is.na(as.numeric(D$date)),
as.POSIXct(as.numeric(D$date) * (60*60*24), origin="1899-12-30", tz="UTC"),
D$date)
Now the number was converted, but to a strange value: "1429630800".
I tried without ifelse:
as.POSIXct(as.numeric(42115.652) * (60*60*24), origin="1899-12-30", tz="UTC")
[1] "2015-04-21 15:38:52 UTC"
It was converted nicely.
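The problem is that you are mixing classes in the true/false halves of your ifelse. ifelse does not keep the POSIXct class, so the converted values come back as the underlying number of seconds since 1970-01-01. You can fix this by adding as.character, like this: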
D$date = ifelse(!is.na(as.numeric(D$date)),
as.character(as.POSIXct(as.numeric(D$date) * (60*60*24), origin="1899-12-30", tz="UTC")),
D$date)
#D
# date
#1 2009-05-12 11:30:00
#2 2009-05-13 11:30:00
#3 2009-05-14 11:30:00
#4 2009-05-15 11:30:00
#5 2015-04-21 15:38:52
#6 1907-11-15 00:00:00
#7 1923-12-30 00:00:00
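The class loss inside ifelse can be seen in isolation with a single value. The snippet below is a small demonstration only; `stripped` and `restored` are names introduced here for illustration.

```r
x <- as.POSIXct("2015-04-21 15:38:52", tz = "UTC")

# ifelse() builds its result from the (logical) test, so the POSIXct
# class is not carried over and only the underlying seconds remain.
stripped <- ifelse(TRUE, x, NA)
print(class(stripped))   # "numeric"

# Re-attaching the class recovers the date-time.
restored <- as.POSIXct(stripped, origin = "1970-01-01", tz = "UTC")
print(restored)
```

This is why wrapping the converted branch in as.character works: both branches are then plain character strings, and nothing depends on an attribute that ifelse would discard.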
You can also write a function that converts a single value to POSIXct, then apply it with lapply and combine the results with do.call.
b <- c("2009-05-12 11:30:00", "2009-05-13 11:30:00", "2009-05-14 11:30:00",
"2009-05-15 11:30:00", "42115.652", "2876", "8765")
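foo <- function(x){
  # values that parse as numbers are day counts since 1899-12-30
  num <- suppressWarnings(as.numeric(x))
  if(!is.na(num)){
    as.POSIXct(num * (60*60*24), origin="1899-12-30", tz="UTC")
  }else{
    # everything else is already a date-time string
    as.POSIXct(x, tz="UTC")
  }
}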
do.call("c", lapply(b, foo))
[1] "2009-05-12 13:30:00 CEST" "2009-05-13 13:30:00 CEST" "2009-05-14 13:30:00 CEST" "2009-05-15 13:30:00 CEST"
[5] "2015-04-21 17:38:52 CEST" "1907-11-15 01:00:00 CET" "1923-12-30 01:00:00 CET"
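A vectorised variant avoids both ifelse and the per-element loop, and keeps the result in UTC throughout. This is a sketch under the same assumption as the answers above, namely that the numeric entries are day counts from 1899-12-30; `num`, `is_serial` and `out` are names introduced here.

```r
b <- c("2009-05-12 11:30:00", "2009-05-13 11:30:00", "2009-05-14 11:30:00",
       "2009-05-15 11:30:00", "42115.652", "2876", "8765")

# Entries that parse as numbers are treated as day counts; the rest as
# date-time strings. Coercion warnings for the strings are expected.
num <- suppressWarnings(as.numeric(b))
is_serial <- !is.na(num)

# Start from an all-NA POSIXct vector and fill the two groups separately.
out <- as.POSIXct(rep(NA_real_, length(b)), origin = "1970-01-01", tz = "UTC")
out[is_serial]  <- as.POSIXct(num[is_serial] * 86400, origin = "1899-12-30", tz = "UTC")
out[!is_serial] <- as.POSIXct(b[!is_serial], tz = "UTC")
print(out)
```

Because the assignments go into an existing POSIXct vector, the class and time zone are preserved, and the result can be stored directly as a date-time column.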

R time series missing values

I was working with a time series dataset having hourly data. The data contained a few missing values so I tried to create a dataframe (time_seq) with the correct time value and do a merge with the original data so the missing values become 'NA'.
> data
date value
7980 2015-03-30 20:00:00 78389
7981 2015-03-30 21:00:00 72622
7982 2015-03-30 22:00:00 65240
7983 2015-03-30 23:00:00 47795
7984 2015-03-31 08:00:00 37455
7985 2015-03-31 09:00:00 70695
7986 2015-03-31 10:00:00 68444
# converting the date in the data to POSIXct format.
> data$date <- format.POSIXct(data$date,'%Y-%m-%d %H:%M:%S')
# creating a dataframe with the correct sequence of dates.
> time_seq <- seq(from = as.POSIXct("2014-05-01 00:00:00"),
to = as.POSIXct("2015-04-30 23:00:00"), by = "hour")
> df <- data.frame(date=time_seq)
> df
date
8013 2015-03-30 20:00:00
8014 2015-03-30 21:00:00
8015 2015-03-30 22:00:00
8016 2015-03-30 23:00:00
8017 2015-03-31 00:00:00
8018 2015-03-31 01:00:00
8019 2015-03-31 02:00:00
8020 2015-03-31 03:00:00
8021 2015-03-31 04:00:00
8022 2015-03-31 05:00:00
8023 2015-03-31 06:00:00
8024 2015-03-31 07:00:00
# merging with the original data
> a <- merge(data,df, x.by = data$date, y.by = df$date ,all=TRUE)
> a
date value
4005 2014-07-23 07:00:00 37003
4006 2014-07-23 07:30:00 NA
4007 2014-07-23 08:00:00 37216
4008 2014-07-23 08:30:00 NA
The values I get after merging are incorrect, and they contain half-hourly values. What would be the correct approach for solving this?
Why is the merge result in 30-minute intervals when both my dataframes are hourly?
PS: I looked into this question: Fastest way for filling-in missing dates for data.table, and followed the steps, but it didn't help.
You can use the padr package to solve this problem.
library(padr)
library(dplyr) #for the pipe operator
data %>%
pad() %>%
fill_by_value()
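The merge approach from the question can also be made to work in base R. Two things are worth noting about the original attempt: format.POSIXct returns character strings, not POSIXct values, and merge takes `by`, `by.x` and `by.y` rather than `x.by` and `y.by`. The sketch below assumes both date columns are POSIXct in the same time zone; the three-row `data` frame is a made-up stand-in for the real data.

```r
# Hypothetical hourly data with a gap between 23:00 and 02:00.
data <- data.frame(
  date  = as.POSIXct(c("2015-03-30 22:00:00", "2015-03-30 23:00:00",
                       "2015-03-31 02:00:00"), tz = "UTC"),
  value = c(65240, 47795, 37455)
)

# Complete hourly grid covering the observed range.
full <- data.frame(date = seq(min(data$date), max(data$date), by = "hour"))

# Left join on the shared POSIXct column; hours with no observation get NA.
filled <- merge(full, data, by = "date", all.x = TRUE)
print(filled)
```

Keeping both keys as POSIXct in one explicit time zone means the rows are matched on the same instants rather than on differently formatted text.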

Finding date based on time

I am trying to work out the date (which does not exist in the table) based on the time.
Example (remember, I only have the time):
time = c("9:44","15:30","23:48","00:30","05:30", "15:30", "22:00", "00:45")
I know for a fact that the start date is 2014-08-28, but how do I get the date, which changes after midnight?
Expected outcome would be
9:44 2014-08-28
15:30 2014-08-28
23:48 2014-08-28
00:30 2014-08-29
05:30 2014-08-29
15:30 2014-08-29
22:00 2014-08-29
00:45 2014-08-30
Here's an example using the data.table package's ITime class, which lets you manipulate times (once converted to this class, you can add or subtract minutes, hours, etc.):
library(data.table)
time <- as.ITime(time)
Date <- as.IDate("2014-08-28") + c(0, cumsum(diff(time) < 0))
data.table(time, Date)
# time Date
# 1: 09:44:00 2014-08-28
# 2: 15:30:00 2014-08-28
# 3: 23:48:00 2014-08-28
# 4: 00:30:00 2014-08-29
# 5: 05:30:00 2014-08-29
# 6: 15:30:00 2014-08-29
# 7: 22:00:00 2014-08-29
# 8: 00:45:00 2014-08-30
Using the chron package we assume that a later time is on the same day and an earlier time is on the next day:
library(chron)
date <- as.Date("2014-08-28") + cumsum(c(0, diff(times(paste0(time, ":00"))) < 0))
data.frame(time, date)
giving:
time date
1 9:44 2014-08-28
2 15:30 2014-08-28
3 23:48 2014-08-28
4 00:30 2014-08-29
5 05:30 2014-08-29
6 15:30 2014-08-29
7 22:00 2014-08-29
8 00:45 2014-08-30
Here's one way to do it in base R:
time = c("9:44","15:30","23:48","00:30","05:30", "15:30", "22:00", "00:45")
# seconds since midnight for each "H:M" string
secs <- sapply(strsplit(time, ":", fixed = TRUE), function(x) sum(as.numeric(x) * c(3600, 60)))
# add one day each time the clock goes backwards
as.POSIXct("2014-08-28") + secs + 60*60*24*cumsum(c(FALSE, diff(secs) < 0))
# [1] "2014-08-28 09:44:00 CEST" "2014-08-28 15:30:00 CEST" "2014-08-28 23:48:00 CEST" "2014-08-29 00:30:00 CEST" "2014-08-29 05:30:00 CEST" "2014-08-29 15:30:00 CEST" "2014-08-29 22:00:00 CEST" "2014-08-30 00:45:00 CEST"
You can concatenate the system date with the time to get the result. For example, in Oracle we can get the date with the time as:
to_char(sysdate,'DD-MM-RRRR')|| ' ' || To_char(sysdate,'HH:MIAM')
This gives, for example, 12-09-2015 09:50 AM.
For your requirement, use it as:
to_char(sysdate,'DD-MM-RRRR')|| ' 00:45' and so on.

Date time am/pm in R

I'm having an issue with the datetime field of a timeseries:
> CO1temp[163:169,]
Date OPEN HIGH LOW CLOSE
163 7/11/2011 11:45:00 PM 116.30 116.30 116.09 116.18
164 7/11/2011 11:50:00 PM 116.16 116.78 116.13 116.70
165 7/11/2011 11:55:00 PM 116.69 116.83 116.51 116.65
166 7/12/2011 116.65 116.79 116.44 116.50
167 7/12/2011 12:05:00 AM 116.50 116.60 116.39 116.47
168 7/12/2011 12:10:00 AM 116.49 116.55 116.38 116.52
169 7/12/2011 12:15:00 AM 116.52 116.67 116.39 116.44
As you can see, the midnight time (row 166) is not shown properly,
which creates an NA when I create my xts object:
CO1 <- as.xts(CO1temp[, 2:5], order.by = as.POSIXct(CO1temp[,1],format='%m/%d/%Y %r'),frequency="5 minutes")
> CO1[163:169,]
OPEN HIGH LOW CLOSE
2011-07-11 23:45:00 116.30 116.30 116.09 116.18
2011-07-11 23:50:00 116.16 116.78 116.13 116.70
2011-07-11 23:55:00 116.69 116.83 116.51 116.65
<NA> 116.65 116.79 116.44 116.50
2011-07-12 00:05:00 116.50 116.60 116.39 116.47
2011-07-12 00:10:00 116.49 116.55 116.38 116.52
2011-07-12 00:15:00 116.52 116.67 116.39 116.44
This later leads to more problems when I want to analyze this time series.
?strptime is quite specific about it:
The default for the format methods is "%Y-%m-%d %H:%M:%S" if any component has a time component which is not midnight, and "%Y-%m-%d" otherwise.
However my datetime is not in the standard format.
I would greatly appreciate any help.
This is a bit of a hack, but it works.
You just have to append "12:00:00 AM" to every element of your date vector. Entries that lack the time information will then be read correctly, while for entries that already have a time the appended text is simply ignored and only the original time is read.
CO1 <- as.xts(CO1temp[, 2:5],
order.by = as.POSIXct(paste(CO1temp$Date,"12:00:00 AM", sep=" "),
format='%m/%d/%Y %r'),
frequency="5 minutes")
CO1
OPEN HIGH LOW CLOSE
2011-07-11 23:45:00 116.30 116.30 116.09 116.18
2011-07-11 23:50:00 116.16 116.78 116.13 116.70
2011-07-11 23:55:00 116.69 116.83 116.51 116.65
2011-07-12 00:00:00 116.65 116.79 116.44 116.50
2011-07-12 00:05:00 116.50 116.60 116.39 116.47
2011-07-12 00:10:00 116.49 116.55 116.38 116.52
2011-07-12 00:15:00 116.52 116.67 116.39 116.44
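A slightly more explicit variant appends midnight only to the entries that have no time, instead of relying on the parser to ignore trailing text. This is a sketch rather than the answer's method: the `dates` vector is a made-up sample in the question's layout, `tz = "UTC"` is an assumption for reproducibility, and the format spells out `%I:%M:%S %p` in place of `%r`.

```r
# Hypothetical sample in the question's layout; the second entry has no time.
dates <- c("7/11/2011 11:55:00 PM", "7/12/2011", "7/12/2011 12:05:00 AM")

# An entry without a colon carries no time of day; give only those midnight.
no_time <- !grepl(":", dates, fixed = TRUE)
dates[no_time] <- paste(dates[no_time], "12:00:00 AM")

# %I is the 12-hour clock hour and %p the AM/PM marker (locale-dependent).
parsed <- as.POSIXct(dates, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
print(parsed)
```

The resulting vector can be passed as `order.by` when building the xts object, in the same way as in the answer above.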
That being said, if the date column of your data frame has already been converted with strptime, it is already a date-time object (POSIXlt), so the following should work directly:
as.xts(CO1temp[, 2:5], order.by = CO1temp$Date, frequency = "5 minutes")
