I am using R. I have a tibble of values and a datetime index. I want to convert the tibble into an xts object.
Here is some sample data and the code I use:
Date <- c("2010-01-04" , "2010-01-04")
Time <- c("04:00:00", "06:00:00")
value <- c(1, 2)
df <- as_tibble(value) %>% add_column(Date = Date, Time = Time)
df <- df %>% mutate(datetime = as.POSIXct(paste(Date, Time), format="%Y-%m-%d %H:%M:%S"))
library(xts)
dfxts <- as.xts(df[,1], order.by=df[,4])
Nevertheless, I get the following error:
Error in xts(x, order.by = order.by, frequency = frequency, ...) :
order.by requires an appropriate time-based object
Any idea what is driving this? Datetime should be an appropriate time-based object... Many thanks.
The order.by argument must be a time-based vector. When you extract a column from a tbl_df using foo[, bar], the returned object is not a vector; it is still a tbl_df. Use df[[4]] instead.
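A minimal sketch of the difference, assuming the tibble and xts packages are installed. The column names and the UTC time zone are illustrative choices, not taken from the question:

```r
library(tibble)
library(xts)

# Small tibble with a numeric column and a POSIXct column
df <- tibble(
  value    = c(1, 2),
  datetime = as.POSIXct(c("2010-01-04 04:00:00", "2010-01-04 06:00:00"),
                        tz = "UTC")
)

class(df[, "datetime"])   # still a tibble, so xts() rejects it as order.by
class(df[["datetime"]])   # a POSIXct vector, which xts() accepts

# Pass plain vectors for both the data and the index
dfxts <- xts(df[["value"]], order.by = df[["datetime"]])
dfxts
```

df$datetime would work equally well, since `$` also returns the column as a vector.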
You should re-examine each step and check what you are getting. I actually find that easiest to do in one container. You could use tbl, I happen to like data.frame.
So let's first build a data.frame from your data:
R> Date <- c("2010-01-04" , "2010-01-04")
R> Time <- c("04:00:00", "06:00:00")
R> value <- c(1, 2)
R> df <- data.frame(Date=Date, Time=Time, value=value)
R> df
Date Time value
1 2010-01-04 04:00:00 1
2 2010-01-04 06:00:00 2
R>
Let's then collate and parse the date and time info and check it:
R> df[,"pt"] <- as.POSIXct(paste(Date, Time))
R> df
Date Time value pt
1 2010-01-04 04:00:00 1 2010-01-04 04:00:00
2 2010-01-04 06:00:00 2 2010-01-04 06:00:00
R>
After that it is just a matter of calling xts with the correct components:
R> x <- xts(df[,"value"], order.by=df[,"pt"])
R> x
[,1]
2010-01-04 04:00:00 1
2010-01-04 06:00:00 2
R>
Edit: Or you could do it all in one step, without any other package, but forgoing the ability to inspect the intermediate steps:
R> x2 <- xts(value, order.by=as.POSIXct(paste(Date, Time)))
R> x2
V1
2010-01-04 04:00:00 1
2010-01-04 06:00:00 2
R> all.equal(x, x2)
[1] TRUE
R>
Related
I'm having trouble converting character values into date (hour + minutes), I have the following codes:
start <- c("2022-01-10 9:35PM","2022-01-10 10:35PM")
end <- c("2022-01-11 7:00AM","2022-01-11 8:00AM")
dat <- data.frame(start,end)
These are all in character form. I would like to:
Convert all the datetimes into date format and into 24hr format like: "2022-01-10 9:35PM" into "2022-01-10 21:35",
and "2022-01-11 7:00AM" into "2022-01-11 7:00" because I would like to calculate the difference between the dates in hrs.
Also, I would like to add an ID column with a specific ID. The desired data would look like this:
ID <- c(101,101)
start <- c("2022-01-10 21:35","2022-01-10 22:35")
end <- c("2022-01-11 7:00","2022-01-11 8:00")
diff <- c(9,10) # I'm not sure how the calculations would turn out to be
dat <- data.frame(ID,start,end,diff)
I would appreciate all the help there is! Thanks!!!
You can use lubridate::ymd_hm. Don't use floor if you want the exact value.
library(dplyr)
library(lubridate)
dat %>%
mutate(ID = 101,
across(c(start, end), ymd_hm),
diff = floor(end - start))
start end ID diff
1 2022-01-10 21:35:00 2022-01-11 07:00:00 101 9 hours
2 2022-01-10 22:35:00 2022-01-11 08:00:00 101 9 hours
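The base R approach uses strptime. Note that %p only takes effect together with %I (12-hour clock); with %H the PM marker is ignored and the times come out as morning hours:
strptime(dat$start, "%Y-%m-%d %I:%M%p")
[1] "2022-01-10 21:35:00 CET" "2022-01-10 22:35:00 CET"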
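For the difference in hours, a base R sketch could look like the following. It assumes an English or C locale (so that %p recognizes AM/PM) and fixes the time zone to UTC for reproducibility; the variable names s, e and diff_hours are made up for illustration:

```r
start <- c("2022-01-10 9:35PM", "2022-01-10 10:35PM")
end   <- c("2022-01-11 7:00AM", "2022-01-11 8:00AM")

# %I is the 12-hour clock; %p (AM/PM) is only used together with %I
fmt <- "%Y-%m-%d %I:%M%p"
s <- as.POSIXct(start, format = fmt, tz = "UTC")
e <- as.POSIXct(end,   format = fmt, tz = "UTC")

# Exact difference in hours as a plain number (no floor)
diff_hours <- as.numeric(difftime(e, s, units = "hours"))

dat <- data.frame(
  ID    = 101,
  start = format(s, "%Y-%m-%d %H:%M"),
  end   = format(e, "%Y-%m-%d %H:%M"),
  diff  = diff_hours
)
dat
```

Both rows span 9 hours and 25 minutes, so diff is about 9.42 rather than the 9 and 10 guessed in the question.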
I have a time column in R as:
22:34:47
06:23:15
7:35:15
5:45
How can I convert all the time values in the column to hh:mm:ss format? I have used
as_date(a$time, tz=NULL), but I am not able to get the format I want.
Here is an option with parse_date_time which can take multiple formats
library(lubridate)
format(parse_date_time(time, c("HMS", "HM"), tz = "GMT"), "%H:%M:%S")
#[1] "22:34:47" "06:23:15" "07:35:15" "05:45:00"
data
time <- c("22:34:47", "06:23:15", "7:35:15", "5:45")
Nothing a bit of formatting can't take care of:
x <- c("22:34:47","06:23:15","7:35:15","5:45")
format(
pmax(
as.POSIXct(x, format="%T", tz="UTC"),
as.POSIXct(x, format="%R", tz="UTC"), na.rm=TRUE
),
"%T"
)
#[1] "22:34:47" "06:23:15" "07:35:15" "05:45:00"
The pmax means any additional seconds will be taken in preference to just hh:mm.
You could get functional if you wanted to get a similar result with less typing, and more opportunity for turning it into a repeatable function.
do.call(pmax, c(lapply(c("%T","%R"), as.POSIXct, x=x, tz="UTC"), na.rm=TRUE))
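As a sketch of the "repeatable function" idea, the call above could be wrapped as follows. The name normalize_times and its formats argument are hypothetical, not part of any package:

```r
# Hypothetical helper: try each format, keep the most specific parse per
# element, and return the times as "HH:MM:SS" strings.
normalize_times <- function(x, formats = c("%T", "%R")) {
  parsed <- lapply(formats, function(f) as.POSIXct(x, format = f, tz = "UTC"))
  # pmax() prefers the parse that includes seconds; na.rm = TRUE falls back
  # to whichever format succeeded
  best <- do.call(pmax, c(parsed, na.rm = TRUE))
  format(best, "%T")
}

x <- c("22:34:47", "06:23:15", "7:35:15", "5:45")
result <- normalize_times(x)
result
#[1] "22:34:47" "06:23:15" "07:35:15" "05:45:00"
```

Inputs that match none of the formats come back as NA, which makes bad rows easy to spot.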
Using a tidyverse approach with dplyr and hms verbs.
library(dplyr)
library(hms)
a <- tibble(time = c("22:34:47", "06:23:15", "7:35:15", "5:45"))
a %>%
mutate(
time = case_when(
is.na(parse_hms(time)) ~ parse_hm(time),
TRUE ~ parse_hms(time)
)
)
# # A tibble: 4 x 1
# time
# <time>
# 1 22:34
# 2 06:23
# 3 07:35
# 4 05:45
Note that case_when could be replaced with dplyr::if_else (base ifelse would drop the hms class). The conditional is needed because parse_hms returns NA for values without seconds.
If you want the output to be a POSIXct value instead, you can adapt the previous solution:
a %>%
mutate(
time = case_when(
is.na(parse_hms(time)) ~ as.POSIXct(parse_hm(time)),
TRUE ~ as.POSIXct(parse_hms(time))
)
)
# # A tibble: 4 x 1
# time
# <dttm>
# 1 1970-01-01 22:34:47
# 2 1970-01-01 06:23:15
# 3 1970-01-01 07:35:15
# 4 1970-01-01 05:45:00
Note this will set the date to origin, which is 1970-01-01 by default.
I have time-series data in xts representation as
library(xts)
xtime <-timeBasedSeq('2015-01-01/2015-01-30 23')
df <- xts(rnorm(length(xtime),30,4),xtime)
Now I want to calculate the correlation between different days, and hence I want to represent df in matrix form, with one row per day and one column per hour.
To achieve this I used
p_mat= split(df,f="days",drop=FALSE,k=1)
Using this I get a list of days, but I am not able to arrange this list in matrix form. Also I used
p_mat<- df[.indexday(df) %in% c(1:30) & .indexhour(df) %in% c(1:24)]
With this I do not get any output.
Also I tried to use rollapply(), but was not able to arrange it properly.
May I get help to form the matrix using xts/zoo objects?
Maybe you could use something like this:
#convert to a data.frame with an hour column and a day column
df2 <- data.frame(value = df,
hour = format(index(df), '%H:%M:%S'),
day = format(index(df), '%Y:%m:%d'),
stringsAsFactors=FALSE)
#then use xtabs which outputs a matrix in the format you need
tab <- xtabs(value ~ day + hour, df2)
Output:
hour
day 00:00:00 01:00:00 02:00:00 03:00:00 04:00:00 05:00:00 06:00:00 07:00:00 08:00:00 09:00:00 10:00:00 11:00:00 12:00:00
2015:01:01 28.15342 35.72913 27.39721 29.17048 28.42877 28.72003 28.88355 31.97675 29.29068 27.97617 35.37216 29.14168 29.28177
2015:01:02 23.85420 28.79610 27.88688 27.39162 29.77241 22.34256 34.70633 23.34011 28.14588 25.53632 26.99672 38.34867 30.06958
2015:01:03 37.47716 31.70040 29.04541 34.23393 33.54569 27.52303 38.82441 28.97989 24.30202 29.42240 30.83015 39.23191 30.42321
2015:01:04 24.13100 32.08409 29.36498 35.85835 26.93567 28.27915 26.29556 29.29158 31.60805 27.07301 33.32149 25.16767 25.80806
2015:01:05 32.16531 29.94640 32.04043 29.34250 31.68278 28.39901 24.51917 33.95135 36.07898 28.76504 24.98684 32.56897 29.82116
2015:01:06 18.44432 27.43807 32.28203 29.76111 29.60729 32.24328 25.25417 34.38711 29.97862 32.82924 34.13643 30.89392 26.48517
2015:01:07 34.58491 20.38762 32.29096 31.49890 28.29893 33.80405 28.44305 28.86268 33.42964 36.87851 31.08022 28.31126 25.24355
2015:01:08 33.67921 31.59252 28.36989 35.29703 27.19507 27.67754 25.99571 27.32729 33.78074 31.73481 34.02064 28.43953 31.50548
2015:01:09 28.46547 36.61658 36.04885 30.33186 32.26888 25.90181 31.29203 34.17445 30.39631 28.18345 27.37687 29.85631 34.27665
2015:01:10 30.68196 26.54386 32.71692 28.69160 23.72367 28.53020 35.45774 28.66287 32.93100 33.78634 30.01759 28.59071 27.88122
2015:01:11 32.70907 31.51985 29.22881 36.31157 32.38494 25.30569 29.37743 22.32436 29.21896 19.63069 35.25601 27.45783 28.28008
2015:01:12 29.96676 30.51542 29.41650 29.34436 37.05421 33.05035 34.44572 26.30717 30.65737 34.61930 29.77391 21.48256 31.37938
2015:01:13 33.46089 34.29776 37.58262 27.58801 28.43653 28.33511 28.49737 28.53348 28.81729 35.76728 27.20985 28.44733 32.61015
2015:01:14 22.96213 32.27889 36.44939 23.45088 26.88173 27.43529 27.27547 21.86686 32.00385 23.87281 29.90001 32.37194 29.20722
2015:01:15 28.30359 30.94721 20.62911 33.84679 27.58230 26.98849 23.77755 24.18443 30.22533 32.03748 21.60847 25.98255 32.14309
2015:01:16 23.52449 29.56138 31.76356 35.40398 24.72556 31.45754 30.93400 34.77582 29.88836 28.57080 25.41274 27.93032 28.55150
2015:01:17 25.56436 31.23027 25.57242 31.39061 26.50694 30.30921 28.81253 25.26703 30.04517 33.96640 36.37587 24.50915 29.00156
...and so on
Here's one way to do it using a helper function that will account for days that do not have 24 observations.
library(xts)
xtime <- timeBasedSeq('2015-01-01/2015-01-30 23')
set.seed(21)
df <- xts(rnorm(length(xtime),30,4), xtime)
tHourly <- function(x) {
# initialize result matrix for all 24 hours
dnames <- list(format(index(x[1]), "%Y-%m-%d"),
paste0("H", 0:23))
res <- matrix(NA, 1, 24, dimnames = dnames)
# transpose day's rows and set colnames
tx <- t(x)
colnames(tx) <- paste0("H", .indexhour(x))
# update result object and return
res[,colnames(tx)] <- tx
res
}
# split on days, apply tHourly to each day, rbind results
p_mat <- split(df, f="days", drop=FALSE, k=1)
p_list <- lapply(p_mat, tHourly)
p_hmat <- do.call(rbind, p_list)
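Once a day-by-hour matrix exists, the correlation between days asked about in the question is cor() applied to its transpose. The sketch below is base R only and rebuilds a comparable matrix with tapply instead of xts; the names times, values, day_hour and day_cor are illustrative, and the data are random as in the question:

```r
set.seed(21)

# Hourly timestamps for 30 days, mirroring the question's series
times  <- seq(as.POSIXct("2015-01-01 00:00", tz = "UTC"),
              as.POSIXct("2015-01-30 23:00", tz = "UTC"), by = "hour")
values <- rnorm(length(times), 30, 4)

# Rows are days, columns are hours; missing hours would show up as NA
day_hour <- tapply(values,
                   list(day  = format(times, "%Y-%m-%d"),
                        hour = format(times, "%H")),
                   mean)

# cor() correlates columns, so transpose to correlate days with each other
day_cor <- cor(t(day_hour))
dim(day_cor)
```

If some days have missing hours, cor(t(day_hour), use = "pairwise.complete.obs") is one way to keep those days in the result.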
I have a dataframe like the one given by:
x <- c(1:6)
y <- c("06/01/13 16:00:00",
"06/01/13 16:00:00",
"06/03/13 20:00:00",
"06/03/13 20:00:00",
"06/07/13 20:00:00",
"06/08/13 20:00:00")
dfrm <- data.frame(x,y)
dfrm
x y
1 06/01/13 16:00:00
2 06/01/13 16:00:00
3 06/03/13 20:00:00
4 06/03/13 20:00:00
5 06/07/13 20:00:00
6 06/08/13 20:00:00
I want to make y a chron object:
dfrm$y <- as.chron(dfrm$y, "%m/%d/%y %H:%M")
Then I have a vector of dates:
intensives <- c("06/01/13", "06/07/13")
Then I want to subset the data frame "dfrm" by the dates in the "intensives" vector.
What I tried was something like:
subset(dfrm, y==dates(intensives))
or
subset(dfrm, y %in% dates(intensives))
but both give me a null result.
Note: In most setups where stringsAsFactors=TRUE (the default before R 4.0.0), that conversion to chron would have failed. Those users would need to do this:
dfrm$y <- as.chron(as.character(dfrm$y), "%m/%d/%y %H:%M")
dates objects are not chron objects, but chron objects can be coerced with the dates function:
subset(dfrm, dates(y) %in% dates(intensives))
x y
1 1 (06/01/13 16:00:00)
2 2 (06/01/13 16:00:00)
5 5 (06/07/13 20:00:00)
That's because you're comparing datetimes to dates.
Do subset(dfrm, dates(y) %in% dates(intensives)) instead.
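Your first subset, using ==, will never work regardless of data type, because == compares element by element (recycling the shorter vector) instead of testing membership.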
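The same pitfall can be reproduced without chron. The sketch below uses base POSIXct and Date objects as stand-ins (an assumption made for illustration; the question itself uses chron), with as.Date() playing the role of dates():

```r
y <- as.POSIXct(c("2013-06-01 16:00:00", "2013-06-01 16:00:00",
                  "2013-06-03 20:00:00", "2013-06-03 20:00:00",
                  "2013-06-07 20:00:00", "2013-06-08 20:00:00"), tz = "UTC")
dfrm <- data.frame(x = 1:6, y = y)
intensives <- as.Date(c("2013-06-01", "2013-06-07"))

# == recycles intensives along y (06-01, 06-07, 06-01, ...), so only
# positions that happen to line up match
eq_match <- as.Date(dfrm$y) == intensives
eq_match
#[1]  TRUE FALSE FALSE FALSE FALSE FALSE

# %in% tests each date for membership in the whole vector
in_match <- as.Date(dfrm$y) %in% intensives
in_match
#[1]  TRUE  TRUE FALSE FALSE  TRUE FALSE

picked <- dfrm[in_match, ]
picked
```

Dropping the time of day before comparing is the essential step in both variants; comparing full datetimes against dates never matches unless the time is exactly midnight.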
My code reads in a CSV file and converts the timestamp column to the R time format:
DF <- read.csv("DF.CSV",head=TRUE,sep=",")
DF[51082,1]
[1] 03/01/2012 19:29
DF[1,1]
[1] 02/24/12 00:29
It reads it in properly and the above 2 rows are displayed as expected
DF$START <- as.POSIXct(strptime(paste(DF$START),format="%m/%d/%y %H:%M"))
DF[1,1]
[1] "2012-02-24 00:29:00 GMT"
DF[51082,1]
[1] NA
After converting them to the R time format using strptime and displaying them again, some of the values are NA. No error message was displayed, and I cannot figure out the reason.
You have (at least) two different date formats,
one in %Y (4-digit years), one in %y (2-digit years).
Unless 12 really means 12AD, you need to try both.
DF <- data.frame(
START = c(
"03/01/2012 19:29",
"02/24/12 00:29"
),
stringsAsFactors = FALSE
)
coalesce <- function (x, ...) {
z <- class(x)
for (y in list(...)) {
x <- ifelse(is.na(x), y, x)
}
class(x) <- z
x
}
DF$START <- coalesce(
as.POSIXct(strptime(DF$START, format="%m/%d/%y %H:%M")),
as.POSIXct(strptime(DF$START, format="%m/%d/%Y %H:%M"))
)
# START
# 1 2012-03-01 19:29:00
# 2 2012-02-24 00:29:00
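An alternative sketch picks the format per row from the shape of the string instead of parsing twice. It relies on strptime recycling its format argument alongside x; the names four_digit, fmt and parsed are illustrative, and the UTC time zone is an assumption:

```r
START <- c("03/01/2012 19:29", "02/24/12 00:29")

# Rows whose year field has four digits get %Y, the rest get %y
four_digit <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{4} ", START)
fmt <- ifelse(four_digit, "%m/%d/%Y %H:%M", "%m/%d/%y %H:%M")

# One format string per element of START
parsed <- as.POSIXct(strptime(START, format = fmt, tz = "UTC"))
parsed
```

This keeps the class and time zone intact without a custom coalesce, at the cost of a regular expression that has to be kept in sync with the formats found in the file.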
Try this:
> DF$START <- as.POSIXct(strptime(paste(DF$START),format="%m/%d/%Y %H:%M"))
%Y parses the year with century (four digits).