Transforming data into xts format - r

I have some data, and the Date column includes the time too. I am trying to get this data into xts format. I have tried the code below, but I get an error. Can anyone see anything wrong with this code? TIA
Date Open High Low Close
1 2017.01.30 07:00 1.25735 1.25761 1.25680 1.25698
2 2017.01.30 08:00 1.25697 1.25702 1.25615 1.25619
3 2017.01.30 09:00 1.25618 1.25669 1.25512 1.25533
4 2017.01.30 10:00 1.25536 1.25571 1.25093 1.25105
5 2017.01.30 11:00 1.25104 1.25301 1.25093 1.25262
6 2017.01.30 12:00 1.25260 1.25479 1.25229 1.25361
7 2017.01.30 13:00 1.25362 1.25417 1.25096 1.25177
8 2017.01.30 14:00 1.25177 1.25219 1.24900 1.25071
9 2017.01.30 15:00 1.25070 1.25307 1.24991 1.25238
10 2017.01.30 16:00 1.25238 1.25358 1.25075 1.25159
df = read.table(file = "GBPUSD60.csv", sep="," , header = TRUE)
dates = as.character(df$Date)
df$Date = NULL
Sept17 = xts(df, as.POSIXct(dates, format="%Y-%m-%d %H:%M"))
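A likely culprit, offered as a sketch rather than a confirmed diagnosis: the Date strings use dots ("2017.01.30 07:00"), but the format string uses dashes ("%Y-%m-%d %H:%M"). As a result, as.POSIXct() returns NA for every row, and xts() cannot use that as an index. The sketch uses the first three rows typed in by hand instead of reading GBPUSD60.csv. It also assumes the UTC time zone, because the question does not say which zone the data is recorded in.

```r
# First three rows from the question, typed in instead of read from GBPUSD60.csv
df <- data.frame(
  Date  = c("2017.01.30 07:00", "2017.01.30 08:00", "2017.01.30 09:00"),
  Open  = c(1.25735, 1.25697, 1.25618),
  High  = c(1.25761, 1.25702, 1.25669),
  Low   = c(1.25680, 1.25615, 1.25512),
  Close = c(1.25698, 1.25619, 1.25533),
  stringsAsFactors = FALSE
)

# Format from the question: expects dashes, so every value fails to parse
bad <- as.POSIXct(df$Date, format = "%Y-%m-%d %H:%M", tz = "UTC")

# Format matching the data: dots between year, month and day
# (tz = "UTC" is an assumption; use the zone your data is recorded in)
good <- as.POSIXct(df$Date, format = "%Y.%m.%d %H:%M", tz = "UTC")

# Build the xts object only if the xts package is installed
if (requireNamespace("xts", quietly = TRUE)) {
  Sept17 <- xts::xts(df[, c("Open", "High", "Low", "Close")], order.by = good)
  print(Sept17)
}
```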


Converting mixed times into 24 hour format

I currently have a dataset with several different time formats (AM/PM, numeric, and 24-hour), and I'm trying to convert them all to 24-hour format. Is there a way to standardize a column with mixed formats?
Current sample data
time
12:30 PM
03:00 PM
0.961469907
0.913622685
0.911423611
09:10 AM
18:00
Desired output
new_time
12:30:00
15:00:00
23:04:31
21:55:37
21:52:27
09:10:00
18:00:00
I know how to convert each format individually (example below), but is there a way to do it all in one go? I have a large amount of data and can't go line by line.
#for numeric time
> library(chron)
> x <- c(0.961469907, 0.913622685, 0.911423611)
> times(x)
[1] 23:04:31 21:55:37 21:52:27
The decimal times are a pain, but we can parse them first with chron::times(), convert them back to character, and then use lubridate's parse_date_time() to parse all the formats at once:
library(tidyverse)
library(chron)
# Create reproducible dataframe
df <-
tibble::tibble(
time = c(
"12:30 PM",
"03:00 PM",
0.961469907,
0.913622685,
0.911423611,
"09:10 AM",
"18:00")
)
# Parse times
df <-
df %>%
dplyr::mutate(
time_chron = chron::times(as.numeric(time)),
time_chron = if_else(
is.na(time_chron),
time,
as.character(time_chron)),
time_clean = lubridate::parse_date_time(
x = time_chron,
orders = c(
"%I:%M %p", # HH:MM AM/PM 12 hour format
"%H:%M:%S", # HH:MM:SS 24 hour format
"%H:%M")), # HH:MM 24 hour format
time_clean = hms::as_hms(time_clean)) %>%
select(-time_chron)
Which gives us
> df
# A tibble: 7 × 2
time time_clean
<chr> <time>
1 12:30 PM 12:30:00
2 03:00 PM 15:00:00
3 0.961469907 23:04:31
4 0.913622685 21:55:37
5 0.911423611 21:52:27
6 09:10 AM 09:10:00
7 18:00 18:00:00
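If you would rather avoid the chron and lubridate dependencies, the same idea can be sketched in base R. A numeric value is treated as a fraction of a day (which is how chron::times() reads it), and each text format is then tried in turn on whatever is still unparsed. The AM/PM format is tried first, because a plain "%H:%M" would also match "12:30 PM" and silently ignore the " PM". to_24h() is a hypothetical helper, not a package function, and "%p" is assumed to match AM/PM only in an English or C locale.

```r
# Hypothetical helper (not from any package): mixed time strings -> "HH:MM:SS"
to_24h <- function(x) {
  secs <- rep(NA_real_, length(x))
  # Numeric strings are fractions of a day, as chron::times() interprets them
  num <- suppressWarnings(as.numeric(x))
  is_num <- !is.na(num)
  secs[is_num] <- round(num[is_num] * 86400)
  # Try each text format on the values that are still unparsed
  for (fmt in c("%I:%M %p", "%H:%M:%S", "%H:%M")) {
    todo <- is.na(secs)
    if (!any(todo)) break
    lt <- strptime(x[todo], format = fmt, tz = "UTC")
    secs[todo] <- lt$hour * 3600 + lt$min * 60 + lt$sec
  }
  sprintf("%02d:%02d:%02d", secs %/% 3600, (secs %% 3600) %/% 60, secs %% 60)
}

to_24h(c("12:30 PM", "0.961469907", "18:00"))
# [1] "12:30:00" "23:04:31" "18:00:00"
```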

Calculate time difference between 2 timestamps in hours using R

I'm trying to obtain the time difference between 2 timestamps in hours.
I have the data:
ID Lat Long Traffic Start_Time End_Time
1 -80.424 40.4242 54 2018-01-01 01:00 2018-01-01 01:10
2 -80.114 40.4131 30 2018-01-01 02:30 2018-01-01 02:40
3 -80.784 40.1142 12 2018-01-01 06:15 2018-01-01 07:20
I want to get the data like this:
ID Lat Long Traffic Start_Time End_Time differ_hrs
1 -80.424 40.4242 54 2018-01-01 01:00 2018-01-01 01:10 00:50
2 -80.114 40.4131 30 2018-01-02 08:30 2018-01-02 08:40 01:10
3 -80.784 40.1142 12 2018-01-04 19:26 2018-01-04 20:11 01:15
I tried this code to capture the difference in hours:
df$differ_hrs<- difftime(df$End_Time, df$Start_Time, units = "hours")
However, it captures the difference like this:
ID Lat Long Traffic Start_Time End_Time differ_hrs
1 -80.424 40.4242 54 2018-01-01 01:00 2018-01-01 01:10 0.5
2 -80.114 40.4131 30 2018-01-02 08:30 2018-01-02 08:40 0.70
3 -80.784 40.1142 12 2018-01-04 19:26 2018-01-04 20:11 0.75
Then I tried to convert the difference in hours to "%H%M" format using this code:
df$differ_HHMM<- format(strptime(df$differ_hrs, format="%H%M"), format = "%H:%M")
But it produces all NAs.
So I tried a different approach, calculating the difference and setting the format in the same command by adding format="%H%M", like this:
df$differ_HHMM<- as.numeric(difftime(strptime(paste(df[,6]),"%Y-%m-%d %H:%M:%S"), strptime(paste(df[,5]),"%Y-%m-%d %H:%M:%S"),format="%H%M", units = "hours"))
but I keep getting the error message:
Error in difftime(strptime(paste(df[, 6]), "%Y-%m-%d %H:%M:%S"), strptime(paste(df[, :
unused argument (format = "%H:%M:%S")
Is there any way to calculate the time difference in %H:%M format?
I really appreciate your suggestions
The result of difftime is a difftime object, which is built on top of numeric. We can specify the units in difftime as seconds and then use seconds_to_period() from lubridate:
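library(lubridate)
# Difference in seconds, stored as a plain number
df$differ_secs <- as.numeric(difftime(df$End_Time, df$Start_Time,
units = "secs"))
# Split the seconds into a Period, then print its hour and minute parts as HH:MM
out <- seconds_to_period(df$differ_secs)
df$differ_HHMM <- sprintf('%02d:%02d', out$hour, out$minute)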
NOTE: date-time conversion specifications such as "%H:%M" only work on Date and date-time classes (POSIXct, POSIXlt), not on numeric or difftime objects.
data
df <- structure(list(ID = 1:3, Lat = c(-80.424, -80.114, -80.784),
Long = c(40.4242, 40.4131, 40.1142), Traffic = c(54L, 30L,
12L), Start_Time = structure(c(1514786400, 1514791800, 1514805300
), class = c("POSIXct", "POSIXt"), tzone = ""), End_Time = structure(c(1514787000,
1514792400, 1514809200), class = c("POSIXct", "POSIXt"), tzone = "")), row.names = c(NA,
-3L), class = "data.frame")
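A base-R variant of the same idea, sketched under the assumption that Start_Time and End_Time are already POSIXct. hhmm() is a hypothetical helper, and the sample timestamps are the three rows from the question, parsed in UTC. It works in whole minutes, so hours keep counting past 24 (for example "26:30"). By contrast, seconds_to_period() moves whole days into the period's day component, so out$hour alone would wrap for spans longer than a day.

```r
# Hypothetical helper: format the gap between two POSIXct vectors as HH:MM
hhmm <- function(start, end) {
  # Whole minutes between the timestamps
  mins <- round(as.numeric(difftime(end, start, units = "mins")))
  # Hours keep counting past 24 instead of rolling over into days
  sprintf("%02d:%02d", mins %/% 60, mins %% 60)
}

start <- as.POSIXct(c("2018-01-01 01:00", "2018-01-01 02:30", "2018-01-01 06:15"),
                    format = "%Y-%m-%d %H:%M", tz = "UTC")
end   <- as.POSIXct(c("2018-01-01 01:10", "2018-01-01 02:40", "2018-01-01 07:20"),
                    format = "%Y-%m-%d %H:%M", tz = "UTC")
hhmm(start, end)
# [1] "00:10" "00:10" "01:05"
```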

NA errors when converting datetime column in R

I'm trying to convert a column to a date-time format in R. I've tried the following conversion but it fills my output as NA:
migtimes$mig_start<- format(migtimes$mig_start, "%Y-%m-%d %H:%M:%S")
migtimes$mig_start<-strptime(x = as.character(migtimes$mig_start), format = "%Y-%m-%d %H:%M:%S")
migtimes$mig_start <- as.POSIXct(strptime(migtimes$mig_start , format = "%Y-%m-%d %H:%M:%S"), tz ="MST")
migtimes$mig_start<- strptime(x = as.character(migtimes$mig_start),
format = "%Y-%m-%d %H:%M:%S")
ymd_hms( as.character(migtimes$mig_start),tz ="MST" )
For the ymd_hms conversion I also get an NA warning:
Warning message:
All formats failed to parse. No formats found.
Here's what my data frame looks like. When I read in my csv file, mig_start (my date field) comes in as a factor. I want to convert this field to the 2018-12-13 22:00:00 format. I'm at a loss as to what else I can try. Any suggestions?
X mig_start
1 3/20/2019 11:00
2 4/3/2019 15:00
3 3/17/2019 22:00
4 3/6/2019 12:00
5 3/6/2019 12:00
6 5/3/2019 5:01
I think it's just a matter of the format string you provided. You want it to match the strings you are converting, not the format you want the dates to print with. Try this:
migtimes <- data.frame(
X = 1:6,
mig_start = c("3/20/2019 11:00", "4/3/2019 15:00", "3/17/2019 22:00",
"3/6/2019 12:00", "3/6/2019 12:00", "5/3/2019 5:01")
)
# The format string must describe the input strings (month/day/year hour:minute).
# Replace "MST" with whatever time zone is appropriate for your data.
migtimes$mig_start <- as.POSIXct(migtimes$mig_start, format = "%m/%d/%Y %H:%M",
tz = "MST")
You could also try as.POSIXlt instead of as.POSIXct, whichever you're more comfortable dealing with.
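To see the practical difference between the two classes, here is a small sketch that parses one of the question's values both ways. It uses UTC as an assumed time zone so the stored number is predictable. POSIXct keeps a single count of seconds, while POSIXlt keeps a list of calendar fields.

```r
s <- "3/20/2019 11:00"

# POSIXct: a single number, seconds since 1970-01-01 00:00 UTC (tz is an assumption)
ct <- as.POSIXct(s, format = "%m/%d/%Y %H:%M", tz = "UTC")
as.numeric(ct)
# [1] 1553079600

# POSIXlt: a list of calendar fields (month is 0-based, year counts from 1900)
lt <- as.POSIXlt(s, format = "%m/%d/%Y %H:%M", tz = "UTC")
c(year = lt$year + 1900, month = lt$mon + 1, day = lt$mday, hour = lt$hour)
```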
Your format string is wrong: your data is in month/day/year hour:minute order.
Using lubridate, you can use mdy_hm():
library(lubridate)
library(dplyr)
migtimes<- migtimes %>%
mutate(dt = mdy_hm(mig_start))
Result:
X mig_start dt
1 1 3/20/2019 11:00 2019-03-20 11:00:00
2 2 4/3/2019 15:00 2019-04-03 15:00:00
3 3 3/17/2019 22:00 2019-03-17 22:00:00
4 4 3/6/2019 12:00 2019-03-06 12:00:00
5 5 3/6/2019 12:00 2019-03-06 12:00:00
6 6 5/3/2019 5:01 2019-05-03 05:01:00
Data:
migtimes <- structure(list(X = 1:6,
mig_start = c("3/20/2019 11:00", "4/3/2019 15:00", "3/17/2019 22:00",
"3/6/2019 12:00", "3/6/2019 12:00", "5/3/2019 5:01")),
class = "data.frame", row.names = c(NA, -6L))
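Both answers come down to the same point: the format string describes the text you are parsing, and the "2018-12-13 22:00:00" look is applied afterwards with format(). The sketch below also starts from a factor, since that is what the question reports (read.csv() converted strings to factors by default before R 4.0.0). It uses UTC as an assumed time zone in place of the question's "MST".

```r
# mig_start as a factor, as read.csv() produced by default before R 4.0.0
mig_start <- factor(c("3/20/2019 11:00", "4/3/2019 15:00", "5/3/2019 5:01"))

# Using the desired OUTPUT layout as the input format fails for every value
wrong <- as.POSIXct(as.character(mig_start), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# The format string describes the INPUT text: month/day/year hour:minute
# (tz = "UTC" is an assumption; the question used "MST")
parsed <- as.POSIXct(as.character(mig_start), format = "%m/%d/%Y %H:%M", tz = "UTC")

# The desired look is an OUTPUT format, applied afterwards
out_str <- format(parsed, "%Y-%m-%d %H:%M:%S")
out_str
# [1] "2019-03-20 11:00:00" "2019-04-03 15:00:00" "2019-05-03 05:01:00"
```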

Dataframe datetime value row filling

I have a CSV file that contains the following:
ts1<-read.table(header = TRUE, sep=",", text="
start, end, value
1,26/11/2014 13:00,26/11/2014 20:00,decreasing
2,26/11/2014 20:00,27/11/2014 09:00,increasing ")
I would like to transform the above data frame into one in which each row's time range is expanded into hourly rows, each filled in with that row's value. The range runs from the start time up to the end time minus one hour, as follows:
date hour value
1 26/11/2014 13:00 decreasing
2 26/11/2014 14:00 decreasing
3 26/11/2014 15:00 decreasing
4 26/11/2014 16:00 decreasing
5 26/11/2014 17:00 decreasing
6 26/11/2014 18:00 decreasing
7 26/11/2014 19:00 decreasing
8 26/11/2014 20:00 increasing
9 26/11/2014 21:00 increasing
10 26/11/2014 22:00 increasing
11 26/11/2014 23:00 increasing
12 27/11/2014 00:00 increasing
13 27/11/2014 01:00 increasing
14 27/11/2014 02:00 increasing
15 27/11/2014 03:00 increasing
16 27/11/2014 04:00 increasing
17 27/11/2014 05:00 increasing
18 27/11/2014 06:00 increasing
19 27/11/2014 07:00 increasing
20 27/11/2014 08:00 increasing
I tried to start with separating the hours from the dates:
> t <- strftime(ts1$end, format="%H:%M:%S")
> t
[1] "00:00:00" "00:00:00"
We can use data.table. Convert the data.frame to a data.table (setDT(ts1)) and group by the row sequence (1:nrow(ts1)). Within each group, convert the 'start' and 'end' columns to date-time class with dmy_hm from lubridate, build the sequence in steps of 1 hour, and drop its last element (the end time). Then format the result as expected, split it on the space (tstrsplit), and concatenate it with the 'value' column. Finally, remove the 'rn' column by assigning it NULL and, if needed, rename the columns.
library(lubridate)
library(data.table)
res <- setDT(ts1)[,{st <- dmy_hm(start)
et <- dmy_hm(end)
c(tstrsplit(format(head(seq(st, et, by = "1 hour"),-1),
"%d/%m/%Y %H:%M"), "\\s+"), as.character(value))} ,
by = .(rn=1:nrow(ts1))
][, rn := NULL][]
setnames(res, c("date", "hour", "value"))[]
# date hour value
# 1: 26/11/2014 13:00 decreasing
# 2: 26/11/2014 14:00 decreasing
# 3: 26/11/2014 15:00 decreasing
# 4: 26/11/2014 16:00 decreasing
# 5: 26/11/2014 17:00 decreasing
# 6: 26/11/2014 18:00 decreasing
# 7: 26/11/2014 19:00 decreasing
# 8: 26/11/2014 20:00 increasing
# 9: 26/11/2014 21:00 increasing
#10: 26/11/2014 22:00 increasing
#11: 26/11/2014 23:00 increasing
#12: 27/11/2014 00:00 increasing
#13: 27/11/2014 01:00 increasing
#14: 27/11/2014 02:00 increasing
#15: 27/11/2014 03:00 increasing
#16: 27/11/2014 04:00 increasing
#17: 27/11/2014 05:00 increasing
#18: 27/11/2014 06:00 increasing
#19: 27/11/2014 07:00 increasing
#20: 27/11/2014 08:00 increasing
Here is a solution using lubridate and plyr. It processes each row of the data to make a sequence from the start to the end time and returns this with the value. Results from each row are combined into one data.frame. If you need to process the results further, you might be better off not separating the datetime into date and time.
library(plyr)
library(lubridate)
ts1$start <- dmy_hm(ts1$start)
ts1$end <- dmy_hm(ts1$end)
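adply(.data = ts1, .margin = 1, .fun = function(x){
# Hourly sequence from start to end, dropping the end time itself
datetime <- head(seq(x$start, x$end, by = "hour"), -1)
# To keep date and time in one column instead: data.frame(datetime, value = x$value)
data.frame(date = as.Date(datetime), time = format(datetime, "%H:%M"), value = x$value)
})[, -(1:2)]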
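For comparison, here is a dependency-free sketch of the same row expansion using only base R (lapply plus do.call(rbind, ...)). It assumes the data can be parsed in UTC and retypes the two input rows as character columns. Like the data.table answer, it drops the end time of each range.

```r
ts1 <- data.frame(
  start = c("26/11/2014 13:00", "26/11/2014 20:00"),
  end   = c("26/11/2014 20:00", "27/11/2014 09:00"),
  value = c("decreasing", "increasing"),
  stringsAsFactors = FALSE
)

# Parse day/month/year hour:minute (tz = "UTC" is an assumption)
st <- as.POSIXct(ts1$start, format = "%d/%m/%Y %H:%M", tz = "UTC")
et <- as.POSIXct(ts1$end,   format = "%d/%m/%Y %H:%M", tz = "UTC")

# For each row: hourly steps from start, dropping the end time itself
pieces <- lapply(seq_len(nrow(ts1)), function(i) {
  dt <- head(seq(st[i], et[i], by = "hour"), -1)
  data.frame(date  = format(dt, "%d/%m/%Y"),
             hour  = format(dt, "%H:%M"),
             value = ts1$value[i],
             stringsAsFactors = FALSE)
})
res <- do.call(rbind, pieces)
res
```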

obtain hour from DateTime vector

I have a DateTime vector within a data.frame where the data frame is made up of 8760 observations representing hourly intervals throughout the year e.g.
2010-01-01 00:00
2010-01-01 01:00
2010-01-01 02:00
2010-01-01 03:00
and so on.
I would like to create a data.frame which has the original DateTime vector as the first column and then the hourly values in the second column e.g.
2010-01-01 00:00 00:00
2010-01-01 01:00 01:00
How can this be achieved?
Use format or strptime to extract the time information.
Create a POSIXct vector:
x <- seq(as.POSIXct("2012-05-21"), by=("+1 hour"), length.out=5)
Extract the time:
data.frame(
date=x,
time=format(x, "%H:%M")
)
date time
1 2012-05-21 00:00:00 00:00
2 2012-05-21 01:00:00 01:00
3 2012-05-21 02:00:00 02:00
4 2012-05-21 03:00:00 03:00
5 2012-05-21 04:00:00 04:00
If the input vector is a character vector, then you have to convert to POSIXct first:
Create some data
dat <- data.frame(
DateTime=format(seq(as.POSIXct("2012-05-21"), by=("+1 hour"), length.out=5), format="%Y-%m-%d %H:%M")
)
dat
DateTime
1 2012-05-21 00:00
2 2012-05-21 01:00
3 2012-05-21 02:00
4 2012-05-21 03:00
5 2012-05-21 04:00
Split time out:
data.frame(
DateTime=dat$DateTime,
time=format(as.POSIXct(dat$DateTime, format="%Y-%m-%d %H:%M"), format="%H:%M")
)
DateTime time
1 2012-05-21 00:00 00:00
2 2012-05-21 01:00 01:00
3 2012-05-21 02:00 02:00
4 2012-05-21 03:00 03:00
5 2012-05-21 04:00 04:00
Alternatively, without treating the values as dates at all, you can take the last five characters, provided the dates and times are consistently zero-padded.
library(stringr)
df <- data.frame(DateTime = c("2010-01-01 00:00", "2010-01-01 01:00", "2010-01-01 02:00", "2010-01-01 03:00"))
df <- data.frame(df, Time = str_sub(df$DateTime, -5, -1))
It depends on your needs really.
Using lubridate:
library(stringr)
library(lubridate)
library(plyr)
df <- data.frame(DateTime = c("2010-01-01 00:00", "2010-01-01 01:00", "2010-01-01 02:00", "2010-01-01 03:00"))
# Pad hour and minute on the left with zeros so that, e.g., hour 0 prints as "00"
df <- mutate(df, DateTime = ymd_hm(DateTime),
time = str_c(str_pad(hour(DateTime), 2, pad = '0'), str_pad(minute(DateTime), 2, pad = '0'), sep = ':'))
On a more general note, for anyone who comes here from Google and wants to group by hour:
The key here is: lubridate::hour(datetime)
See the hour() entry in the lubridate reference manual on CRAN (p. 22 at the time of writing): https://cran.r-project.org/web/packages/lubridate/lubridate.pdf
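A minimal sketch of grouping by hour of day. The half-hourly readings and the "load" column are made up for illustration, and the data is assumed to be in UTC. To keep the example free of dependencies, it uses base R's POSIXlt $hour component, which gives the same 0-23 values as lubridate::hour().

```r
# Hypothetical half-hourly readings; "load" is an illustrative column name
dat <- data.frame(
  DateTime = seq(as.POSIXct("2010-01-01 00:00", format = "%Y-%m-%d %H:%M", tz = "UTC"),
                 by = "30 min", length.out = 6),
  load = c(1, 3, 2, 4, 5, 7)
)

# Hour of day (0-23); lubridate::hour(dat$DateTime) returns the same values
dat$hour <- as.POSIXlt(dat$DateTime)$hour

# Mean load for each hour of the day
agg <- aggregate(load ~ hour, data = dat, FUN = mean)
agg
```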
