difftime for multiple dates in r - r

I have chemistry water data taken from a river. Normally, the sample dates were on a Wednesday every two weeks. The data record starts in 1987 and ends in 2013.
Now, I want to re-check if there are any inconsistencies within the data, that is if the samples are really taken every 14 days. For that task I want to use the r function difftime. But I have no idea on how to do that for multiple dates.
Here is some data:
Date Value
1987-04-16 12:00:00 1,5
1987-04-30 12:00:00 1,2
1987-06-25 12:00:00 1,7
1987-07-14 12:00:00 1,3
Can you tell me on how to use the function difftime properly in that case or any other function that does the job. The result should be the number of days between the samplings and/or a true and false for the 14 days.
Thanks to you guys in advance. Any google-fu was to no avail!

Assuming your data.frame is named dd, you'll want to verify that the Date column is being treated as a date. Most times R will read them as a character which gets converted to a factor in a data.frame. If class(df$Date) is "character" or "factor", run
dd$Date<-as.POSIXct(as.character(dd$Date), format="%Y-%m-%d %H:%M:%S")
Then you can so a simple diff() to get the time difference in days
diff(dd$Date)
# Time differences in days
# [1] 14 56 19
# attr(,"tzone")
# [1] ""
so you can check which ones are over 14 days.

Related

Getting the same day across different years in R

I have a dataset for a time series spanning a couple of years with daily observations. I'm trying to smooth some clearly wrong data inserted there (for example, negative values when the variable cannot take values below zero) and what I came up with was trying to smooth it or "interpolate" it by using both the mean of the days around that observation and the mean of the same day or couple of days from previous years, as I have yearly seasonality (I'm still unsure about this part, any comment would be greatly appreciated).
So my question is whether I can easily access the same day acrosss different years.
Here's a dummy example of my data:
library(tidyverse)
library(lubridate)
date value
2016-10-01 00:00:00 28
2016-10-02 00:00:00 25
2016-10-03 00:00:00 24
2016-10-04 00:00:00 22
2016-10-05 00:00:00 -6
2016-10-06 00:00:00 26
I have that for years 2016 through 2020. So in this example I would use the dates around 2016-10-05 AND I would like to use the dates around the 5th of October from years 2017 to 2020 to kind of maintain the seasonality, but maybe this is incorrect.
I tried to use +years() from lubridate but I still have to do things manually and I would like to kind of autimatize things.
If your question is solely "whether [you] can easily access the same day [across] different years", you could do that as follows:
# say your data frame is called df
library(lubridate)
day(df$date)
This will return the day part of the date for every entry in that column of your data frame.
Edit to reply to comment from asker:
This is a very basic way to specify the day and month for which you would like to obtain the corresponding rows in your data frame:
df[day(df$dates) == 5 & month(df$dates) == 10, ]

Extract time from factor column in R

I would like to extract the time from a table column sd_data$start in R with the following characteristics:
str(sd_data$start)
Factor w/ 122 levels "01/03/2017 08:00",..: 1 2 5 10 12 14 18 19 20 21 ...
I found similar questions on the forum but so far all the answers have only given me NAs or blank values (00:00:00) so I see no other option than raise the question again specifically for my dataset.
I have managed to extract the dates and move them to a new column in the table with little effort and I am very surprised how difficult it is (for me at least) to do the same for hours, minutes and seconds. I must be overlooking something.
sd_data$start_date <- as.Date(sd_data$start,format='%d/%m/%Y')
sd_data$start_time <-
Thanks in advance for helping me to find the right lines of code to complete this task.
Here an example of what I am trying to do and where I am failing to get the time out.
smpldata <- "01/03/2017 08:00"
smpltime <-as.Date(as.character(smpldata),format='%d/%m/%Y %M:%S')
smpltime
# [1] 08:00 = what I would like to see
# [1] "2017-03-01" = what I am seeing
Maybe using as.character() to convert to character before convert to date, because the factor type is not well transformed. And including the other string elements on the date format as suggested above by Sotos.
sd_data$start_date <-
as.Date(as.character(sd_data$start),
format='%d/%m/%Y %H:%M:%S')
Another tip is to take a look at lubridate package. It's very usefull for this kind of task.
library(lubridate)
smpldata <- as.factor("01/03/2017 08:00")
(smpltime <-dmy_hm(as.character(smpldata)))
[1] "2017-03-01 08:00:00 UTC"
Here you still see the date. You can handle just the time for plots and other needs using hour() and minute().
hour(smpltime)
[1] 8
minute(smpltime)
[1] 0
Or you can use the format() function to get exactly what you want.
format(smpltime, "%H:%M:%S")
[1] "08:00:00"
format(smpltime, "%H:%M")
[1] "08:00"

R: create index for xts time object from calendar week , e.g. 201501 ... 201553

I know how to get the week from an index, but don't know the other way around: how to create an index if I have the calendar weeks (in this case, from an SAP system with 0CALWEEK as 201501, 201502 ... 201552, 201553.
Found this:
How to Parse Year + Week Number in R?
but the day is needed and it's not clear how to set it, especially at the end of the year (Year - week - day: YEAR-53-01 does not always exist, since the first day of week 53 might be Monday, then 01 (Sunday) is not in that week.
I could try to get in the source system the first day of the corresponding week (through SQL) but thought R might do it easier...
Do you have any suggestions?
(Which first day of the week would be not important , since I will create all objects the same way and then merge/cbind them, then continue the analysis. If zoo is easier, I'll go with it)
Thanks!
The problem is that all indices end in 2015-07-29:
data <- 1:4
weeks <- c('201501','201502','201552','201553')
weeks_2 <- as.Date(weeks,format='%Y%w')
xts(data, order.by = weeks_2)
[,1]
2015-07-29 1
2015-07-29 2
2015-07-29 3
2015-07-29 4
test <- xts(data, order.by = weeks_2)
index(test)
[1] "2015-07-29" "2015-07-29" "2015-07-29" "2015-07-29"
You can use as.Date() function, I think is the easiest way:
weeks <- c('201501','201502','201552','201553')
as.Date(paste0(weeks,'1'),format='%Y%W%w') # paste a dummy day
## [1] "2015-01-05" "2015-01-12" "2015-12-28" NA
Where:
%W: Week 00-53 with Monday as first day of the week
or
%U: Week 01-53 with Sunday as first day of the week
%w: Weekday 0-6 Sunday is 0
For this year, week number 53 doesn't exist. And If you want to start with 2015-01-01, just set the right week day:
weeks <- c('201500','201501','201502','201551','201552')
as.Date(paste0(weeks,'4'),format='%Y%W%w')
## [1] "2015-01-01" "2015-01-08" "2015-01-15" "2015-12-24" "2015-12-31"
You may try with substr() and lubridate
library(lubridate)
# a number from your list: 201502
# set the year
x <- ymd("2015-01-1")
# retrieve second week
week(x) <- 2
x
[1] "2015-01-08"
you can use the result for your Index or rownames().
zoo and xts are great for time series once you have set the names,
be sure to remove any column with dates from your data frame

how to iterate based on a condition, and assign aggregated value to a row in new dataframe in R

I have a large dataset of stock prices with 203615 rows and 2 columns(price and Timestamp). in below format
price(USD) | Timestamp
3.5 | 2014-01-01 20:00:00
2 | 2014-01-01 20:15:00
5 | 2014-01-01 20:15:00
----
4 | 2014-01-31 23:00:00
5 | 2014-01-31 23:00:00
4.5 | 2014-01-31 23:00:00
203615 2.3 | 2014-01-31 23:00:00
Time stamp varies from "2014-01-01 20:00:00" to "2014-01-31 23:00:00" with intervals of 15min(rounded to 15min). i have several transactions on same timestamp.
I have to group rows based on timestamp with difference of one day, and caluclate min,max and mean of the price and no of rows within the timestamp limits and assign them to a row in a new dataframe for every iteration until it reaches the end timestamp("2014-01-31 23:00:00") from starting date('2014-01-02 20:00:00")
note: iteration has to be done for every 15min
i have tried while loop. please help me with this and suggest me if i can use any packages
This is my own code which I used as a way of creating a window of time (the prior 24 hours) to iterate over and create min and max values for a project I am working on...
inter is the inteval I worked on in the loop
raw is the data frame name
i is the specific row from which the datetime column was selected from raw
I started my intervals at 97th row ( (i in 97:nrow(raw) ) because the stamps were taken at 15 minute intervals and I wanted a 24 hour backward window, so I needed to leave 96 intervals to pull from...I could not reach back into time I had no data for...so I started far enough into my data to leave room for those intervals.
for (i in 97:nrow(raw)){
inter=raw$datetime[i] - as.difftime(24, unit='hours')
raw$deltaAirTemp_24[i] <-max(temp$Air.Temperature)- min(temp$Air.Temperature)
}
The key is getting into a real date time format. Run str() on the field with the dates, if the come back as anything but Factor, use:
as.POSIXct(yourdate$field, %Y-%m-%d %H:%M:%S)
If they come back from str(yourdatecolumn here) as FACTOR then wrap it in as.POSIXct(as.character(yourdate$field), %Y-%m-%d %H:%M:%S) to be sure it does not coerce the date into a Level number then time..
Get them into a consistent date format, then construct something like above to extract the periods you need. difftime is in the base package and works well you can use positive and negative intervals with it. I hope his helps!

How do i extract a specific, recurring time from 1 minute tick data in R?

For instance, let's say I want to extract the price at 09:04:00 everyday from a timeseries that is formatted as:
DateTime | Price
2011-04-09 09:01:00 | 100.00
2011-04-09 09:02:00 | 100.10
2011-04-09 09:03:00 | 100.13
(NB: there is no | in the actual data, i've just included it here to illustrate that the DateTime is the index and Price is the coredata and that the two are distinct within the xts object)
and put those extracted values into an xts vector...what is the most efficient way to do this?
Also, if i have a five year time series of a cross-border spread, where - due to time differences - the spread opens at different times during the year (say 9am during winter, and 10am during summer) how can I get R to take account of those time differences and recognise either 9am-16:30 or 10am-16:30 as the same "day" interval.
In other words, I want to convert an intraday, 1m tick data file to daily OHLC data. Normally would just use xts and to.period to do this, but - given the time difference noted above - gives odd / strange day start/end times due
Any advice greatly appreciated!
You can use the "T" prefix with xts subsetting to specify a time interval for each day. You must specify an interval; a single time will not work.
set.seed(21)
x <- xts(cumprod(1+rnorm(2*60*24)/100),
as.POSIXct("2011-04-09 09:01:00")+60*(1:(2*60*24)))
x["T09:01:59/T09:02:01"]
# [,1]
# 2011-04-09 09:02:00 0.9980737
# 2011-04-10 09:02:00 1.0778835

Resources