My code reads in a CSV file and converts the timestamp column to R's date-time format:
DF <- read.csv("DF.CSV",head=TRUE,sep=",")
DF[51082,1]
[1] 03/01/2012 19:29
DF[1,1]
[1] 02/24/12 00:29
The file is read in properly, and the two rows above are displayed as expected.
DF$START <- as.POSIXct(strptime(paste(DF$START),format="%m/%d/%y %H:%M"))
DF[1,1]
[1] "2012-02-24 00:29:00 GMT"
DF[51082,1]
[1] NA
After converting with strptime and displaying the values again, some of them are NA. No error message was displayed, and I cannot figure out the reason.
You have (at least) two different date formats,
one in %Y (4-digit years), one in %y (2-digit years).
Unless 12 really means 12AD, you need to try both.
DF <- data.frame(
START = c(
"03/01/2012 19:29",
"02/24/12 00:29"
),
stringsAsFactors = FALSE
)
coalesce <- function (x, ...) {
z <- class(x)
for (y in list(...)) {
x <- ifelse(is.na(x), y, x)
}
class(x) <- z
x
}
DF$START <- coalesce(
as.POSIXct(strptime(DF$START, format="%m/%d/%y %H:%M")),
as.POSIXct(strptime(DF$START, format="%m/%d/%Y %H:%M"))
)
# START
# 1 2012-03-01 19:29:00
# 2 2012-02-24 00:29:00
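If it helps, a quick sanity check after trying both formats is to list whatever still fails to parse. This is only a sketch: the vector raw and its third, deliberately broken value are made up for illustration, and tz = "UTC" is an assumption to keep the result independent of the local time zone.

```r
raw <- c("03/01/2012 19:29", "02/24/12 00:29", "not a date")

# Try the 2-digit-year format first; "%Y" would read "12" as the year 12
two_digit  <- as.POSIXct(strptime(raw, "%m/%d/%y %H:%M", tz = "UTC"))
four_digit <- as.POSIXct(strptime(raw, "%m/%d/%Y %H:%M", tz = "UTC"))

# Fill the gaps left by the first format with the second one
parsed <- two_digit
needs_retry <- is.na(parsed)
parsed[needs_retry] <- four_digit[needs_retry]

# Rows that neither format could parse
unparsed <- raw[is.na(parsed)]
print(unparsed)
```

Anything left in unparsed points to a third format (or genuinely bad data) that needs its own format string.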
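Try this:
> DF$START <- as.POSIXct(strptime(paste(DF$START),format="%m/%d/%Y %H:%M"))
%Y parses the year with century (four digits). Note that rows with a two-digit year, such as 02/24/12, would then be read as the year 12.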
I have a problem in R that is killing me! Can you help me?
I found a question in StackOverflow that gave me a very good explanation.
Here is the link: How to parse milliseconds?
I was able to implement the following code that works very well.
z2 <- strptime("10/2/20 11:16:17.682", "%d/%m/%y %H:%M:%OS")
z1 <- strptime("10/2/20 11:16:16.683", "%d/%m/%y %H:%M:%OS")
When I calculate z2-z1, I get
Time difference of 0.9989998 secs
Similarly, with
z3 <- strptime("130 11:16:16.683", "%j %H:%M:%OS")
z4 <- strptime("130 11:16:18.682", "%j %H:%M:%OS")
When I calculate z4-z3, I get
Time difference of 1.999 secs
What is my problem?
The first column has the format 130 18:25:50.408, with millions of rows!!!
The second column has the format 2020 130 18:25:51.357 that is like the first column but has the year 2020.
The first column is also from 2020, but because the year is missing, R uses the current year.
First question:
How can I subtract the two columns? I know how to subtract columns in general.
What I do not know is how to subtract these two times.
For example, the second time is 2020 130 18:25:51.357
and the first time is 130 18:25:50.408
I guess that I could do it programmatically by converting to a string and removing the 2020. However, I am hoping that a quicker solution is available using base R or the lubridate package.
Second question,
"%j %H:%M:%OS" is the format for 130 11:16:16.683
What is the format for 2020 130 18:25:51.357?
As explained before, this works very well:
z3 <- strptime("130 11:16:16.683", "%j %H:%M:%OS")
But this is NOT working:
z7 <- strptime("2020 130 11:16:16.683", "%y %j %H:%M:%OS")
UPDATE 1
I solved the second question!
However, I have not yet figured out the first question.
For the second question, the mistake in the format was using %y; I need %Y, with an upper-case Y.
Here is one example:
later <- strptime("2020 130 11:16:17.683", "%Y %j %H:%M:%OS")
earlier <- strptime("2020 130 11:16:16.684", "%Y %j %H:%M:%OS")
difftime(later,earlier,units="secs")
The R result is:
Time difference of 0.9990001 secs
UPDATE 2
At this point, what is pending is the following:
I need to subtract two times that were recorded on the same day in 2020.
The second time includes the year; the first does not.
later <- strptime("2020 130 11:16:17.683", "%Y %j %H:%M:%OS")
earlier <- strptime("130 11:16:16.684", "%j %H:%M:%OS")
difftime(later,earlier,units="secs")
R produces the following result:
Time difference of -31622399 secs
Why? Because we are in 2021 and the year is missing, R assigns the current year, 2021, to earlier.
My columns have millions of rows.
At this point, my guess is that I would need to prepend 2020 by concatenation or something similar. Is there any other method?
Thank you for your help!
Your object z2 is a POSIXlt object (a "POSIX list"). This means it is stored as a list of the components of your time.
print.default(z2)
# $sec
# [1] 17.682
#
# $min
# [1] 16
#
# $hour
# [1] 11
#
# $mday
# [1] 10
#
# $mon
# [1] 1
#
# $year
# [1] 120
#
# $wday
# [1] 1
#
# $yday
# [1] 40
#
# $isdst
# [1] 0
#
# $zone
# [1] "GMT"
#
# $gmtoff
# [1] NA
#
# attr(,"class")
# [1] "POSIXlt" "POSIXt"
When you do a subtraction, z2 - z1, R dispatches this operation to a function called -.POSIXt, which itself calls difftime. This function converts z2 to a POSIXct object (a "POSIX count"). This means that it is converted to a count of seconds since the beginning of the epoch, 1970-01-01 UTC.
options("digits" = 16)
print.default(as.POSIXct(z2))
# [1] 1581333377.682
# attr(,"class")
# [1] "POSIXct" "POSIXt"
# attr(,"tzone")
# [1] ""
difftime(z2, z1)
# Time difference of 0.9989998340606689 secs
R, like most software, stores numbers in double-precision floating point, so arithmetic is approximate rather than exact. Most software hides this imprecision by reducing the number of digits shown. The size of the error depends on the magnitude of the numbers involved, so you might prefer working directly with the sec elements of z2 and z1.
print.default(z2$sec - z1$sec)
# [1] 0.9989999999999988
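If only millisecond resolution matters, one option (a sketch, not the only way) is to round the difference to three decimals so the floating-point noise disappears. The tz = "UTC" argument is an assumption added here to keep the example independent of the local time zone.

```r
z2 <- strptime("10/2/20 11:16:17.682", "%d/%m/%y %H:%M:%OS", tz = "UTC")
z1 <- strptime("10/2/20 11:16:16.683", "%d/%m/%y %H:%M:%OS", tz = "UTC")

# Raw difference in seconds, carrying double-precision noise
raw_diff <- as.numeric(difftime(z2, z1, units = "secs"))

# Round to the millisecond resolution of the input
ms_diff <- round(raw_diff, 3)
print(ms_diff)
```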
You could therefore apply the time difference using your favourite data.frame tools.
options("digits" = 6)
# character columns
df1 <- data.frame(
col1 = c("10/2/20 11:16:17.682", "10/2/20 11:16:16.683"),
col2 = c("130 11:16:16.683", "130 11:16:18.682"),
stringsAsFactors = FALSE)
library(dplyr)
# convert columns to POSIXlt
df2 <- mutate(df1,
col1 = strptime(col1, "%d/%m/%y %H:%M:%OS"),
col2 = strptime(stringr::str_c("2020 ", col2), "%Y %j %H:%M:%OS"),
diff_days = unclass(difftime(col2, col1, units = "days")))
df2
# col1 col2 diff_days
# 1 2020-02-10 11:16:17 2020-05-09 11:16:16 88.9583
# 2 2020-02-10 11:16:16 2020-05-09 11:16:18 88.9584
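For the exact case in the question (one column without a year, one with), a base-R sketch along the same lines is to prepend the year with paste, which is vectorised and should cope with millions of rows. The vectors first and second are illustrative stand-ins for the two columns, the year 2020 is taken from the question, and tz = "UTC" is an assumption to avoid daylight-saving effects.

```r
first  <- c("130 18:25:50.408", "130 11:16:16.684")            # no year
second <- c("2020 130 18:25:51.357", "2020 130 11:16:17.683")  # with year

# Prepend the known year so both columns share one format
t1 <- as.POSIXct(strptime(paste("2020", first), "%Y %j %H:%M:%OS", tz = "UTC"))
t2 <- as.POSIXct(strptime(second, "%Y %j %H:%M:%OS", tz = "UTC"))

# Difference in seconds, rounded to the millisecond resolution of the data
diff_secs <- round(as.numeric(difftime(t2, t1, units = "secs")), 3)
print(diff_secs)
```

Converting to POSIXct before storing the result keeps each column as a plain numeric vector underneath, which tends to be lighter than POSIXlt for large data.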
I am using R. I have a tibble of values and a datetime index. I want to convert the tibble into an xts object.
Here is some sample data and the code I use:
Date <- c("2010-01-04" , "2010-01-04")
Time <- c("04:00:00", "06:00:00")
value <- c(1, 2)
df <- as_tibble(value) %>% add_column(Date = Date, Time = Time)
df <- df %>% mutate(datetime = as.POSIXct(paste(Date, Time), format="%Y-%m-%d %H:%M:%S"))
library(xts)
dfxts <- as.xts(df[,1], order.by=df[,4])
Nevertheless, I get the following error:
Error in xts(x, order.by = order.by, frequency = frequency, ...) :
order.by requires an appropriate time-based object
Any idea what is driving this? Datetime should be an appropriate time-based object... Many thanks.
The argument to order.by must be a vector. When you extract from a tbl_df using foo[,bar], the returned object is not a vector; it is a tbl_df. Use df[[4]].
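The difference can be seen without loading any packages. In this sketch a plain data.frame with drop = FALSE stands in for the tibble, on the assumption that it mimics what a tibble's single-column [ returns; the column names are illustrative.

```r
df <- data.frame(
  value = c(1, 2),
  datetime = as.POSIXct(c("2010-01-04 04:00:00", "2010-01-04 06:00:00"), tz = "UTC")
)

# Single-bracket extraction kept as a table (what a tibble always does)
as_column <- df[, "datetime", drop = FALSE]

# Double-bracket extraction returns the underlying time vector
as_vector <- df[["datetime"]]

print(class(as_column))
print(class(as_vector))
```

Only as_vector is a time-based object, which is the kind of thing order.by expects.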
You should re-examine each step and check what you are getting. I actually find that easiest to do in one container. You could use tbl, I happen to like data.frame.
So let's first build a data.frame from your data:
R> Date <- c("2010-01-04" , "2010-01-04")
R> Time <- c("04:00:00", "06:00:00")
R> value <- c(1, 2)
R> df <- data.frame(Date=Date, Time=Time, value=value)
R> df
Date Time value
1 2010-01-04 04:00:00 1
2 2010-01-04 06:00:00 2
R>
Let's then collate and parse the date and time info and check it:
R> df[,"pt"] <- as.POSIXct(paste(Date, Time))
R> df
Date Time value pt
1 2010-01-04 04:00:00 1 2010-01-04 04:00:00
2 2010-01-04 06:00:00 2 2010-01-04 06:00:00
R>
After that it is just a matter of calling xts with the correct components:
R> x <- xts(df[,"value"], order.by=df[,"pt"])
R> x
[,1]
2010-01-04 04:00:00 1
2010-01-04 06:00:00 2
R>
Edit: Or you could do it all in one step without any other package, at the cost of forgoing the ability to step through intermediate steps:
R> x2 <- xts(value, order.by=as.POSIXct(paste(Date, Time)))
R> x2
V1
2010-01-04 04:00:00 1
2010-01-04 06:00:00 2
R> all.equal(x, x2)
[1] TRUE
R>
I have a data frame containing dates as characters in dd.mm.yyyy format. I want to convert them to class Date, in yyyy-mm-dd format. as.Date() is not working; it returns the error: do not know how to convert 'dates' to class “Date”
dates <- data.frame(cbind(c("5.1.2015", "6.1.2014", "17.2.2014", "28.10.2014")))
colnames(dates) <- c("dates")
as.Date(dates, format = "%Y-%m-%d")
new_format_dates <- cbind(gsub("[[:punct:]]", "", dates[1:nrow(dates),1]))
as.Date(new_format_dates, format = "%Y-%m-%d")
So I tried removing the . characters and converting the reformatted dates in new_format_dates, but the result is [1] NA NA NA NA
Firstly, make your data.frames properly; don't use cbind in data.frame. Next, set the format argument of as.Date to the format you've got, including separators. If you don't know the symbol you need, check out ?strptime.
dates <- data.frame(dates = c("5.1.2015", "6.1.2014", "17.2.2014", "28.10.2014"))
dates$dates_new <- as.Date(dates$dates, format = "%d.%m.%Y")
dates
# dates dates_new
# 1 5.1.2015 2015-01-05
# 2 6.1.2014 2014-01-06
# 3 17.2.2014 2014-02-17
# 4 28.10.2014 2014-10-28
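To make the point about format concrete: the format argument describes the text coming in, not the display you want going out. A small sketch with illustrative variable names:

```r
raw <- c("5.1.2015", "28.10.2014")

# format describes the input text, separators included
parsed <- as.Date(raw, format = "%d.%m.%Y")

# A format that does not match the input gives NA rather than an error
mismatched <- as.Date(raw, format = "%Y-%m-%d")

# Display formatting is a separate step, done with format()
shown <- format(parsed, "%d/%m/%Y")
print(parsed)
print(shown)
```

A Date always prints as yyyy-mm-dd by default; format() only produces a character vector for display.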
dates <- data.frame(cbind(c("5.1.2015", "6.1.2014", "17.2.2014", "28.10.2014")))
colnames(dates) <- c("dates")
dates$new_Dates <- gsub("[.]","-",dates$dates)
dates$dates <- NULL
dates_new <- as.Date(dates$new_Dates, format = "%d-%m-%Y")
dates_new <- data.frame(dates_new)
print(dates_new)
I have time-series data in xts representation as
library(xts)
xtime <-timeBasedSeq('2015-01-01/2015-01-30 23')
df <- xts(rnorm(length(xtime),30,4),xtime)
Now I want to calculate the correlation between different days, so I want to represent df as a matrix with one row per day and one column per hour.
To achieve this I used
p_mat= split(df,f="days",drop=FALSE,k=1)
Using this I get a list of days, but I am not able to arrange this list in matrix form. Also I used
p_mat<- df[.indexday(df) %in% c(1:30) & .indexhour(df) %in% c(1:24)]
With this I do not get any output.
I also tried to use rollapply(), but was not able to arrange the result properly.
Could I get help forming the matrix using xts/zoo objects?
Maybe you could use something like this:
#convert to a data.frame with an hour column and a day column
df2 <- data.frame(value = df,
hour = format(index(df), '%H:%M:%S'),
day = format(index(df), '%Y:%m:%d'),
stringsAsFactors=FALSE)
#then use xtabs which outputs a matrix in the format you need
tab <- xtabs(value ~ day + hour, df2)
Output:
hour
day 00:00:00 01:00:00 02:00:00 03:00:00 04:00:00 05:00:00 06:00:00 07:00:00 08:00:00 09:00:00 10:00:00 11:00:00 12:00:00
2015:01:01 28.15342 35.72913 27.39721 29.17048 28.42877 28.72003 28.88355 31.97675 29.29068 27.97617 35.37216 29.14168 29.28177
2015:01:02 23.85420 28.79610 27.88688 27.39162 29.77241 22.34256 34.70633 23.34011 28.14588 25.53632 26.99672 38.34867 30.06958
2015:01:03 37.47716 31.70040 29.04541 34.23393 33.54569 27.52303 38.82441 28.97989 24.30202 29.42240 30.83015 39.23191 30.42321
2015:01:04 24.13100 32.08409 29.36498 35.85835 26.93567 28.27915 26.29556 29.29158 31.60805 27.07301 33.32149 25.16767 25.80806
2015:01:05 32.16531 29.94640 32.04043 29.34250 31.68278 28.39901 24.51917 33.95135 36.07898 28.76504 24.98684 32.56897 29.82116
2015:01:06 18.44432 27.43807 32.28203 29.76111 29.60729 32.24328 25.25417 34.38711 29.97862 32.82924 34.13643 30.89392 26.48517
2015:01:07 34.58491 20.38762 32.29096 31.49890 28.29893 33.80405 28.44305 28.86268 33.42964 36.87851 31.08022 28.31126 25.24355
2015:01:08 33.67921 31.59252 28.36989 35.29703 27.19507 27.67754 25.99571 27.32729 33.78074 31.73481 34.02064 28.43953 31.50548
2015:01:09 28.46547 36.61658 36.04885 30.33186 32.26888 25.90181 31.29203 34.17445 30.39631 28.18345 27.37687 29.85631 34.27665
2015:01:10 30.68196 26.54386 32.71692 28.69160 23.72367 28.53020 35.45774 28.66287 32.93100 33.78634 30.01759 28.59071 27.88122
2015:01:11 32.70907 31.51985 29.22881 36.31157 32.38494 25.30569 29.37743 22.32436 29.21896 19.63069 35.25601 27.45783 28.28008
2015:01:12 29.96676 30.51542 29.41650 29.34436 37.05421 33.05035 34.44572 26.30717 30.65737 34.61930 29.77391 21.48256 31.37938
2015:01:13 33.46089 34.29776 37.58262 27.58801 28.43653 28.33511 28.49737 28.53348 28.81729 35.76728 27.20985 28.44733 32.61015
2015:01:14 22.96213 32.27889 36.44939 23.45088 26.88173 27.43529 27.27547 21.86686 32.00385 23.87281 29.90001 32.37194 29.20722
2015:01:15 28.30359 30.94721 20.62911 33.84679 27.58230 26.98849 23.77755 24.18443 30.22533 32.03748 21.60847 25.98255 32.14309
2015:01:16 23.52449 29.56138 31.76356 35.40398 24.72556 31.45754 30.93400 34.77582 29.88836 28.57080 25.41274 27.93032 28.55150
2015:01:17 25.56436 31.23027 25.57242 31.39061 26.50694 30.30921 28.81253 25.26703 30.04517 33.96640 36.37587 24.50915 29.00156
...and so on
Here's one way to do it using a helper function that will account for days that do not have 24 observations.
library(xts)
xtime <- timeBasedSeq('2015-01-01/2015-01-30 23')
set.seed(21)
df <- xts(rnorm(length(xtime),30,4), xtime)
tHourly <- function(x) {
# initialize result matrix for all 24 hours
dnames <- list(format(index(x[1]), "%Y-%m-%d"),
paste0("H", 0:23))
res <- matrix(NA, 1, 24, dimnames = dnames)
# transpose day's rows and set colnames
tx <- t(x)
colnames(tx) <- paste0("H", .indexhour(x))
# update result object and return
res[,colnames(tx)] <- tx
res
}
# split on days, apply tHourly to each day, rbind results
p_mat <- split(df, f="days", drop=FALSE, k=1)
p_list <- lapply(p_mat, tHourly)
p_hmat <- do.call(rbind, p_list)
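Once the data is in day-by-hour form, the correlation between days is a one-liner. Here is a base-R sketch that does not need xts; it assumes complete hourly data, uses three simulated days instead of thirty to stay small, and all variable names are illustrative.

```r
set.seed(1)
times <- seq(as.POSIXct("2015-01-01 00:00", tz = "UTC"),
             as.POSIXct("2015-01-03 23:00", tz = "UTC"), by = "hour")
values <- rnorm(length(times), 30, 4)

# One row per day, one column per hour (each cell holds a single value)
day_hour <- tapply(values,
                   list(day = format(times, "%Y-%m-%d"),
                        hour = format(times, "%H")),
                   mean)

# cor() correlates columns, so transpose to correlate days with each other
day_cor <- cor(t(day_hour))
print(dim(day_hour))
print(round(day_cor, 2))
```

With missing hours the matrix would contain NA, and cor() would need something like use = "pairwise.complete.obs".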
I have been provided a dataset with an ambiguous date format, e.g:
d_raw <- c("1102001 23:00", "1112001 0:00")
I would like to try to parse this date into a POSIXlt object in R. The source of the file assures me that the file is in chronological order, that the date format is month, then day, then year, and that there are no gaps in the time series.
Is there any way to parse this date format, using the ordering to resolve ambiguities? E.g. the vector above should parse to c("2001-01-10 23:00:00", "2001-01-11 00:00:00") rather than c("2001-01-10 23:00:00", "2001-11-01 00:00:00").
How about this (using regular expressions)
d_raw <- c("192001 16:00", "1102001 23:00", "1112001 0:00")
re <- "^(.+?)([1-9]|[1-3][0-9])(\\d{4}) (\\d{1,2}):(\\d{2})$"
m <- regexec(re, d_raw)
parts <- regmatches(d_raw, m)
lapply(parts, function(x) {
x<-as.numeric(x[-1])
ISOdate(x[3], x[1], x[2], x[4], x[5])
})
# [[1]]
# [1] "2001-01-09 16:00:00 GMT"
#
# [[2]]
# [1] "2001-01-10 23:00:00 GMT"
#
# [[3]]
# [1] "2001-01-11 GMT"
More test cases would be helpful, just to make sure the regular expression works correctly.
I pity you for your horrible data vendor, so I decided to try and fix this for you.
# make up some horrid data
d_bad <- as.POSIXlt(seq(as.Date("2014-01-01"), as.Date("2014-12-31"), by=1))
d_raw <- paste0(d_bad$mon+1, d_bad$mday, d_bad$year+1900)
d_new <- d_raw
# not ambiguous when nchar is 6
d_new <- ifelse(nchar(d_new)==6,
paste0("0", substr(d_new,1,1), "0", substr(d_new,2,nchar(d_new))), d_new)
# now not ambiguous when nchar is 7 and it doesn't begin with a "1"
d_new <- ifelse(nchar(d_new)==7 & substr(d_new,1,1) != "1",
paste0("0",d_new), d_new)
# now guess a leading zero and parse
d_new <- ifelse(nchar(d_new)==7, paste0("0",d_new), d_new)
d_try <- as.Date(d_new, "%m%d%Y")
# now only days in October, November, and December might be wrong
bad <- cumsum(c(1L,as.integer(diff(d_try)))-1L) < 0L
# put the leading zero in the day, but remember "bad" rows have an
# extra leading zero, so make sure to skip it
d_try2 <- ifelse(bad,
paste0(substr(d_new,2,3),"0", substr(d_new,4,nchar(d_new))), d_new)
# convert to Date, POSIXlt, whatever and do a happy dance
d_YAY <- as.Date(d_try2, "%m%d%Y")
data.frame(d_raw, d_new, d_try, bad, d_try2, d_YAY)
# d_raw d_new d_try bad d_try2 d_YAY
# 1 112014 01012014 2014-01-01 FALSE 01012014 2014-01-01
# 2 122014 01022014 2014-01-02 FALSE 01022014 2014-01-02
# 3 132014 01032014 2014-01-03 FALSE 01032014 2014-01-03
# 4 142014 01042014 2014-01-04 FALSE 01042014 2014-01-04
# 5 152014 01052014 2014-01-05 FALSE 01052014 2014-01-05
# 6 162014 01062014 2014-01-06 FALSE 01062014 2014-01-06
I only did this with Dates in order to keep the example data set small. Doing this for POSIXlt would be very similar, except you would need to change the as.Date calls to as.POSIXlt and adjust the format accordingly.
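A different way to use the chronological ordering, sketched here as a heuristic rather than a tested solution: enumerate every valid month/day split of each string and keep the earliest candidate that is not before the previous row. The helpers candidates and resolve_dates are hypothetical names, the time of day is stripped off first, and the approach assumes no leading zeros and that the first row has an unambiguous (or earliest-is-correct) reading.

```r
# All valid dates a string like "1112001" (month, day, 4-digit year) could mean
candidates <- function(s) {
  n <- nchar(s)
  year <- substr(s, n - 3, n)
  md <- substr(s, 1, n - 4)
  found <- as.Date(character(0))
  for (k in 1:2) {                       # the month takes 1 or 2 digits
    day_len <- nchar(md) - k
    if (day_len < 1 || day_len > 2) next
    month <- substr(md, 1, k)
    day <- substr(md, k + 1, nchar(md))
    if (startsWith(month, "0") || startsWith(day, "0")) next
    d <- as.Date(paste(year, month, day, sep = "-"), format = "%Y-%m-%d")
    if (!is.na(d)) found <- c(found, d)  # invalid dates come back as NA
  }
  found
}

# Pick, row by row, the earliest candidate not before the previous result
resolve_dates <- function(strings) {
  result <- rep(as.Date(NA), length(strings))
  for (i in seq_along(strings)) {
    opts <- candidates(strings[i])
    if (i > 1 && !is.na(result[i - 1])) opts <- opts[opts >= result[i - 1]]
    if (length(opts) > 0) result[i] <- min(opts)
  }
  result
}

d_raw <- c("192001 16:00", "1102001 23:00", "1112001 0:00")
dates <- resolve_dates(sub(" .*$", "", d_raw))  # drop the time of day
print(dates)
```

The loop is not vectorised, so for very large files it may be worth running it only on the rows where candidates returns more than one date.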