I have 2 data tables and I would like to find the rows that overlap using foverlaps. I think I am getting tripped up because some of the dates have fractional seconds.
library(data.table)
First create a data table of shift times
On <- as.POSIXct(c("2017-08-01 00:05:54", "2017-08-01 00:07:20", "2017-08-01 00:21:53"), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
Off <- as.POSIXct(c("2017-08-01 00:05:54", "2017-08-01 00:07:20", "2017-08-01 00:21:53"), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
shifts <- data.table(On, Off)
Now create a data table of observations times
The first batch of observation times comes from Matlab, so they need to be converted to POSIXct first. The conversion leaves me with fractional seconds
timestamp <- c(736908.0041, 736908.0051, 736908.009, 736908.012, 736908.0152)
Obs = data.table(SightingTime = as.POSIXct((timestamp-719529)*86400, origin = "1970-01-01", tz = "UTC"))
#add a variable for the "date type"
Obs$DateType = "Long"
Add a row to the data table that does not have fractional seconds (for the purpose of this example)
Obs <- rbind(Obs, data.table(SightingTime=as.POSIXct("2017-08-01 00:05:54", format = "%Y-%m-%d %H:%M:%S", tz = "UTC"), DateType = "Short"))
create point intervals so can use foverlaps
Obs[, SightingTime2 := SightingTime]
get ready for foverlaps
setkey(Obs, SightingTime, SightingTime2)
setkey(shifts, On, Off)
do the overlap join
Obs.ov <- foverlaps(shifts, Obs ,type="any",nomatch=0L)
This results in Obs.ov having a single row: the overlap with the "Short" date format. Rows with the "Long" date format don't get included in the overlap. I would have expected three rows to overlap (assuming the fractional seconds would be rounded off, I would expect overlaps with the 00:05:54 and 00:21:53 "Long" timestamps as well).
I think this might be due to the fractional seconds in the dates I converted from Matlab, but I don't know how to get rid of the fractional bit. I did try using
attributes(Obs$SightingTime)$format <- "%Y-%m-%d %H:%M:%OS"
as well as including the "format" argument when the SightingTime variable was created from the "timestamp" variable early on, but I have had no luck with either.
I did look at "How to format fractional seconds in POSIXct in r", but I can't quite figure out what change I need to make based on it.
I found what I needed in "Remove seconds from time in R".
I just needed to round off the seconds after creating the SightingTime variable, but before creating the "SightingTime2" variable.
Obs$SightingTime <- as.POSIXct(round(Obs$SightingTime, units="secs"))
Now when I do the overlaps, I get the 3 overlapping rows as expected.
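To see why the rounding matters, here is a minimal base R sketch (no data.table needed) using the same timestamps. The names sighting, sighting_rounded and shift_on are made up for this illustration, and the comparison on as.numeric() is only a stand-in for the exact matching that foverlaps performs on point intervals.

```r
# Matlab datenums from the question; 719529 is the datenum of 1970-01-01
timestamp <- c(736908.0041, 736908.0051, 736908.009, 736908.012, 736908.0152)
sighting <- as.POSIXct((timestamp - 719529) * 86400, origin = "1970-01-01", tz = "UTC")

# Default printing hides the sub-second part; %OS2 asks for two decimals
format(sighting, "%H:%M:%OS2")

# round() on a POSIXct returns POSIXlt, so convert back to POSIXct
sighting_rounded <- as.POSIXct(round(sighting, "secs"))

shift_on <- as.POSIXct(c("2017-08-01 00:05:54", "2017-08-01 00:07:20",
                         "2017-08-01 00:21:53"), tz = "UTC")

# Exact comparison on the underlying seconds since the epoch
matched_raw <- as.numeric(sighting) %in% as.numeric(shift_on)
matched_rounded <- as.numeric(sighting_rounded) %in% as.numeric(shift_on)
matched_raw      # no matches while the fractional seconds are still there
matched_rounded  # the 00:05:54 and 00:21:53 sightings now match
```

Note that the second timestamp is 00:07:20.64, which rounds up to 00:07:21, so it still does not match the 00:07:20 shift. If truncation is wanted instead, trunc(sighting, "secs") would be the equivalent call.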
Related
I know this question has probably been answered in different ways, but I am still struggling with this. I am working with a dataset where the date format for date1 is '2/1/2000', '5/12/2000', '6/30/2015' and the class() is character. The second column of dates, date2, is in the format '2015-07-06', '2015-08-01', '2017-10-09' and the class() is "POSIXct" "POSIXt".
I am attempting to standardize both columns so I can compute the difference in days between them using something like this
abs(difftime(date1 ,date2 , units = c("days")))
I have tried numerous ways of converting date1 into the same class using strptime, lubridate etc. What's the best way for me to standardize both and compute the difference in days?
sample data
x <- c('2/1/2000', '5/12/2000', '6/30/2015')
y <- as.POSIXct(c('2015-07-06', '2015-08-01', '2017-10-09'))
code
#make both posixct
x2 <- as.POSIXct(x, format = "%m/%d/%Y")
abs(x2 - y)
# Time differences in days
# [1] 5633.958 5559.000 832.000
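The fractional 5633.958 in the first element is most likely a daylight-saving artifact: with no tz given, as.POSIXct uses the local time zone, and a February date and a July date can differ by an hour of clock offset. A sketch that avoids this, assuming the values are plain calendar dates with no meaningful time of day (my assumption, not stated in the question), is to parse both columns in UTC:

```r
x <- c('2/1/2000', '5/12/2000', '6/30/2015')
# Parse both columns in UTC so no daylight-saving shift enters the difference
y <- as.POSIXct(c('2015-07-06', '2015-08-01', '2017-10-09'), tz = "UTC")
x2 <- as.POSIXct(x, format = "%m/%d/%Y", tz = "UTC")

d <- abs(difftime(x2, y, units = "days"))
d
# Time differences in days
# [1] 5634 5559  832
```

Converting both columns with as.Date() and subtracting would be another way to get whole days.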
Create a variable of value 15Aug1947 and 15Aug2018 in POSIX Date format.
Find the number of days elapsed since Independence as of 15th August 2018.
Need to code in R language.
DATE1 <- c("15Aug1947")
DATE2 <- c("15Aug2018")
X <- as.Date(DATE1, "%d/%m/%y") - as.Date(DATE2 , "%d/%m/%y")
print(X)
You are close, but are missing a small detail. The second argument of as.Date requires you to specify exactly what format your dates are in. Right now, you are saying your date looks like 15/08/1947. Two things are wrong with this: your date has no slashes, and the month is not an integer but an abbreviation of the month name. (Note also that %y is a two-digit year; a four-digit year is %Y.) The correct way to parse this date would be
> ps <- "%d%b%Y"
> DATE1 <- c("15Aug1947")
> DATE2 <- c("15Aug2018")
> X <- as.Date(DATE1, ps) - as.Date(DATE2 , ps)
>
> print(X)
Time difference of -25933 days
For more information on how to construct the string for parsing, see ?strptime.
You can use a package to parse dates automatically, such as lubridate.
The following code may help!
#Create a variable of value 15Aug1947 and 15Aug2018 in POSIX Date format
dt <- c(as.POSIXct("15Aug1947", format = "%d%b%Y"),as.POSIXct("15Aug2018", format = "%d%b%Y"))
#Finding the number of days elapsed
difftime(dt[2], dt[1], units = "days")
#Time difference of 25933 days
I want to create date-time index for my xts time series.
I am starting with a date in this format:
I can't share my entire dataset with you, but basically what I am doing is:
data$date <- strptime(data$date,
format = "%Y%m%d %H:%M:%OS",
tz = "GMT")
and later
EURGBP.xts <- xts(EURGBP[,-1],
EURGBP$date,
tzone = "GMT")
When I do this, my date-time index looks something like this (index on the left; the column I am modifying to create the index is on the right):
Why do I have this strange X at the beginning? What may be the reason for that? Any ideas? What am I doing wrong?
EDIT:
My initial date format looks like this:
date
1 20170922 00:00:00.067
2 20170922 00:00:04.582
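For reference, here is a minimal base R sketch of the index-parsing step alone, using the two sample values above. It only illustrates that %OS reads the fractional seconds and that printing hides them by default; it is not a diagnosis of the X prefix. The names raw, idx and gap are made up for this example.

```r
raw <- c("20170922 00:00:00.067", "20170922 00:00:04.582")

# strptime() returns POSIXlt; on input, %OS reads seconds including the fraction.
# Convert to POSIXct, which is the more usual class for a time-series index.
idx <- as.POSIXct(strptime(raw, format = "%Y%m%d %H:%M:%OS", tz = "GMT"))

# Fractional seconds are not printed unless digits.secs is set
op <- options(digits.secs = 3)
print(idx)
options(op)

# The fraction is still present in the underlying numeric value
gap <- as.numeric(diff(idx), units = "secs")
gap  # roughly 4.515 seconds between the two rows
```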
I have several years of data that I'm trying to work into a zoo object (.csv at Dropbox). I get a warning once the data is coerced into a zoo object, but I cannot find any duplicates in the index.
df <- read.csv(choose.files(default = "", caption = "Select data source", multi = FALSE), na.strings="*")
df <- read.zoo(df, format = "%Y/%m/%d %H:%M", regular = TRUE, row.names = FALSE, col.names = TRUE, index.column = 1)
Warning message:
In zoo(rval3, ix) :
some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
I've tried:
sum(duplicated(df$NST_DATI))
But the result is 0.
Thanks for your help!
You are using read.zoo(...) incorrectly. According to the documentation:
To process the index, read.zoo calls FUN with the index as the first
argument. If FUN is not specified then if there are multiple index
columns they are pasted together with a space between each. Using the
index column or pasted index column: 1. If tz is specified then the
index column is converted to POSIXct. 2. If format is specified then
the index column is converted to Date. 3. Otherwise, a heuristic
attempts to decide among "numeric", "Date" and "POSIXct". If format
and/or tz is specified then they are passed to the conversion function
as well.
You are specifying format=... so read.zoo(...) converts everything to Date, not POSIXct. Obviously, there are many, many duplicated dates.
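The effect is easy to reproduce in base R. In this small sketch the three timestamps are made up (they are not taken from the asker's file): they are distinct as date-times but collapse to the same value once converted to Date.

```r
# Hypothetical timestamps in the same "%Y/%m/%d %H:%M" layout as the question
stamps <- c("2013/03/10 01:00", "2013/03/10 02:00", "2013/03/10 03:00")

# Converting to Date discards the time of day, so all three become 2013-03-10
as_dates <- as.Date(stamps, format = "%Y/%m/%d %H:%M")
dup_dates <- anyDuplicated(as_dates) > 0
dup_dates  # TRUE

# Converting to POSIXct keeps the time of day, so the values stay distinct
as_times <- as.POSIXct(stamps, format = "%Y/%m/%d %H:%M", tz = "UTC")
dup_times <- anyDuplicated(as_times) > 0
dup_times  # FALSE
```

This is also why duplicated() on the raw character column reports nothing: the strings differ, and only the converted Date index repeats.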
Simplistically, the correct solution is to use:
df <- read.zoo(df, FUN=as.POSIXct, format = "%Y/%m/%d %H:%M")
# Error in read.zoo(df, FUN = as.POSIXct, format = "%Y/%m/%d %H:%M") :
# index has bad entries at data rows: 507 9243 18147 26883 35619 44355
but as you can see this does not work either. Here the problem is much more subtle. The index is converted using as.POSIXct, but in the system time zone (which on my system is US Eastern). The referenced rows have timestamps that coincide with the changeover from Standard to DST, so these times do not exist in the US Eastern time zone. If you use:
df <- read.zoo(df, FUN=as.POSIXct, format = "%Y/%m/%d %H:%M", tz="UTC")
the data imports correctly.
EDIT:
As @G.Grothendieck points out, this would also work, and is simpler:
df <- read.zoo(df, tz="UTC")
You should set tz to whatever time zone is appropriate for the dataset.
Is there a way I can change the default format for how POSIXct labels appear when using plot and when they are part of a dataframe (Date HH:MM instead of just HH:MM)?
It would be nice if I could do this without having to issue an axis command each time or converting the dataframe to an xts object.
Answer goes to Vincent Zoonekynd.
You can use the format argument of the plot function to label the axis in "%Y-%m-%d %H:%M" format.
Please see the code below:
df <- data.frame(
ms = c(10485849612, 10477641600, 10561104000, 10562745600),
value = 1:4
)
df$posix_time <- as.POSIXct(df$ms, origin = "1582-10-14", tz = "GMT")
plot(df$posix_time, df$value, format = "%Y-%m-%d %H:%M")
Output: