R: Date function in sqldf giving unusual answer (wrong date format?)

I am trying to add a day to a date using sqldf. I know it should be simple, but I can't figure out what is wrong with my date format. Using:
sqldf("select date(model_date, '+1 day') from lapse_test")
gives answers like '-4666-01-23'.
The model_date values are of class Date and look like 2015-01-01.
I previously created them from a character string ('12/1/2015') using
lapse_test$model_date <- as.Date(lapse_test$date1, format = "%m/%d/%Y")
or
lapse_test$model_date <- as.POSIXct(lapse_test$date1, format = "%m/%d/%Y")
I'm guessing this is the problem? Any ideas?

Passing a character variable to the date() function seems to work:
df <- data.frame(a=as.Date("2010-10-01"))
df$b <- as.character(df$a)
sqldf("select date(a) from df")
# date(a)
# 1 -4672-08-24
sqldf("select date(b) from df")
# date(b)
# 1 2010-10-01
sqldf("select date(b, '+1 day') from df")
# date(b, '+1 day')
# 1 2010-10-02
Note that you can do (some) arithmetic on Date objects in R directly, without needing SQL:
df$a <- df$a + 1
df
# a b
# 1 2010-10-02 2010-10-01
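If you still want to run the query through sqldf, the character workaround above can be applied to a whole data frame at once. This is only a sketch: dates_to_char() is a hypothetical helper name, not something sqldf provides, and it assumes every Date column should become ISO "YYYY-MM-DD" text.

```r
# Hypothetical helper (not part of sqldf): convert every Date column to
# ISO "YYYY-MM-DD" text so that SQLite's date() can parse it.
dates_to_char <- function(df) {
  is_date <- vapply(df, inherits, logical(1), what = "Date")
  df[is_date] <- lapply(df[is_date], format, "%Y-%m-%d")
  df
}

df <- data.frame(a = as.Date("2010-10-01"), n = 1)
df_sql <- dates_to_char(df)
str(df_sql)  # column a is now character, column n is untouched

# With sqldf loaded, the text column can then be used as in the answer above:
# sqldf("select date(a, '+1 day') from df_sql")
```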

SQLite's date functions treat a bare number as a Julian day number, i.e. days since noon on Nov 24, 4714 BC. R stores a Date as the number of days since 1970-01-01, so the example date 2015-12-01 reaches SQLite as the integer 16770, which it reads as an ancient date in the year -4667.
The difference between the R origin of 1970-01-01 and the SQLite origin is 2440588 days (rounded up from 2440587.5, because Julian days start at noon), so you can add this constant to account for it:
test <- data.frame(model_date=as.Date("12/1/2015",format="%m/%d/%Y"))
sqldf("select date(model_date + 2440588, '+1 day') as select_date from test")
# select_date
#1 2015-12-02
@HongOoi's answer is probably better, but I thought the underlying workings might be interesting to know.
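The arithmetic behind that constant can be checked in base R without touching SQLite. A minimal sketch, using the example date and the 2440588 offset from the answer (the variable names are just for illustration):

```r
d <- as.Date("2015-12-01")

r_days <- as.numeric(d)         # days since 1970-01-01, as R stores a Date
offset <- 2440588               # constant quoted in the answer above
julian_day <- r_days + offset   # the number SQLite's date() needs to see

# Going the other way recovers the original R date.
back <- as.Date(julian_day - offset, origin = "1970-01-01")

print(c(r_days = r_days, julian_day = julian_day))
print(back)
```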

Related

How to split a data frame in R based on date when multiple rows have identical date stamp [duplicate]

I am working with daily returns from a Brazilian index (IBOV) since 1993, and I am trying to figure out the best way to subset for periods between two dates.
The data frame (IBOV_RET) is as follows:
head(IBOV_RET)
DATE 1D_RETURN
1 1993-04-28 -0.008163265
2 1993-04-29 -0.024691358
3 1993-04-30 0.016877637
4 1993-05-03 0.000000000
5 1993-05-04 0.033195021
6 1993-05-05 -0.012048193
...
I set 2 variables DATE1 and DATE2 as dates
DATE1 <- as.Date("2014-04-01")
DATE2 <- as.Date("2014-05-05")
I was able to create a new subset using this code:
TEST <- IBOV_RET[IBOV_RET$DATE >= DATE1 & IBOV_RET$DATE <= DATE2,]
It worked, but I was wondering if there is a better way to subset the data between two dates, maybe using subset.
As already pointed out by @MrFlick, you can't get around the basic logic of subsetting. One way to make it easier to subset your specific data.frame is to define a function that takes two inputs, like DATE1 and DATE2 in your example, and returns the subset of IBOV_RET between those dates.
myfunc <- function(x,y){IBOV_RET[IBOV_RET$DATE >= x & IBOV_RET$DATE <= y,]}
DATE1 <- as.Date("1993-04-29")
DATE2 <- as.Date("1993-05-04")
Test <- myfunc(DATE1,DATE2)
#> Test
# DATE X1D_RETURN
#2 1993-04-29 -0.02469136
#3 1993-04-30 0.01687764
#4 1993-05-03 0.00000000
#5 1993-05-04 0.03319502
You can also enter the specific dates directly into myfunc:
myfunc(as.Date("1993-04-29"),as.Date("1993-05-04")) #will produce the same result
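myfunc above reads IBOV_RET from the global environment. A slightly more reusable variant passes the data frame and the column name in as arguments. This is a sketch only: subset_dates and the RETURN column name are made up for the example, and the six rows are copied from the question.

```r
# Hypothetical helper: keep the rows of df whose date column lies in
# [from, to], both ends included.
subset_dates <- function(df, from, to, col = "DATE") {
  keep <- df[[col]] >= from & df[[col]] <= to
  df[keep, , drop = FALSE]
}

IBOV_RET <- data.frame(
  DATE = as.Date(c("1993-04-28", "1993-04-29", "1993-04-30",
                   "1993-05-03", "1993-05-04", "1993-05-05")),
  RETURN = c(-0.008163265, -0.024691358, 0.016877637,
             0.000000000, 0.033195021, -0.012048193)
)

res <- subset_dates(IBOV_RET, as.Date("1993-04-29"), as.Date("1993-05-04"))
print(res)
```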
You can use the subset() function with the & operator:
subset(IBOV_RET, DATE >= DATE1 & DATE <= DATE2)
Updating for a more "tidyverse-oriented" approach:
library(dplyr)
IBOV_RET %>%
  filter(DATE >= DATE1, DATE <= DATE2) # comma is the same as &
There is no real other way to extract date ranges. The logic is the same as extracting a range of numeric values as well, you just need to do the explicit Date conversion as you've done. You can make your subsetting shorter as you would with any other subsetting task with subset or with. You can break ranges into intervals with cut (there is a specific cut.Date overload). But base R does not have any way to specify Date literals so you cannot avoid the conversion. I can't imagine what other sort of syntax you may have had in mind.
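As a rough illustration of the cut() suggestion, the sketch below bins the six dates from the question by calendar month. The variable names are made up for the example; breaks = "month" is handled by the cut method for Date objects.

```r
dates <- as.Date(c("1993-04-28", "1993-04-29", "1993-04-30",
                   "1993-05-03", "1993-05-04", "1993-05-05"))

# Each date is assigned to a factor level labelled with the first day
# of its month.
month_bin <- cut(dates, breaks = "month")
counts <- table(month_bin)
print(counts)

# split() then yields one group of dates per month.
by_month <- split(dates, month_bin)
print(names(by_month))
```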
What about:
DATE1 <- as.Date("1993-04-29")
DATE2 <- as.Date("1993-05-04")
# creating a date range with the start and end date:
dates <- seq(DATE1, DATE2, by="days")
# assign to a new object so the full IBOV_RET is not overwritten
TEST <- subset(IBOV_RET, DATE %in% dates)
I believe lubridate could help here:
library(lubridate)
daterange <- interval(DATE1, DATE2)
TEST <- IBOV_RET[which(IBOV_RET$DATE %within% daterange),]
I rather like the dplyr package. Load it:
library("dplyr")
and then, as you did:
Date1 <- as.Date("2014-04-01")
Date2 <- as.Date("2014-05-05")
Finally:
test <- filter(IBOV_RET, DATE > Date1 & DATE < Date2)
You can use dplyr's between() function (both bounds are inclusive) after converting the strings to dates:
df %>%
filter(between(date_column, as.Date("string-date-lower-bound"), as.Date("string-date-upper-bound")))
Test <- IBOV_RET[IBOV_RET$DATE >= as.Date("2014-04-01") | IBOV_RET$DATE <= as.Date("1993-05-04"), ]
Here I am using the "or" operator |, which keeps rows where the date is on or after the first date or on or before the second. To keep only the rows between two dates, use & instead.
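The difference between & and | is easy to get wrong with date ranges, so here is a small base R comparison. It is only a sketch, using the six dates from the question and made-up variable names:

```r
dates <- as.Date(c("1993-04-28", "1993-04-29", "1993-04-30",
                   "1993-05-03", "1993-05-04", "1993-05-05"))
lo <- as.Date("1993-04-29")
hi <- as.Date("1993-05-04")

# & keeps dates that satisfy both conditions: the range between lo and hi.
inside <- dates[dates >= lo & dates <= hi]

# | keeps dates that satisfy at least one condition. With lo before hi,
# every date passes one of the two tests, so nothing is filtered out.
either <- dates[dates >= lo | dates <= hi]

print(length(inside))
print(length(either))
```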

add_months function in Spark R

I have a variable of the form "2020-09-01". I need to increase and decrease it by 3 months and 5 months and store the results in other variables. I need the syntax in SparkR, but any other method will also work. Thanks.
In R, the following code works fine:
y <- as.Date(load_date,"%Y-%m-%d") %m+% months(i)
The code below didn't work. The error says:
unable to find an inherited method for function ‘add_months’ for signature ‘"Date", "numeric"’
loaddate = 202009
year <- substr(loaddate,1,4)
month <- substr(loaddate,5,6)
load_date <- paste(year,month,"01",sep = "-")
y <- as.Date(load_date,"%Y%m%d")
y1 <- add_months(y,-3)
Expected Result - 2020-06-01
The lubridate package makes dealing with dates much easier. Here I have moved as.Date up a step and then simply subtracted 3 months.
library(lubridate)
loaddate = 202009
year <- substr(loaddate,1,4)
month <- substr(loaddate,5,6)
load_date <- as.Date(paste(year,month,"01",sep = "-"))
new_date <- load_date - months(3)
new_date
Output:
Date[1:1], format: "2020-06-01"
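Since any other method is acceptable, here is a base R sketch that needs neither lubridate nor Spark. The error message in the question suggests add_months has no method for a plain R Date, so this stays on the R side entirely. shift_months is a made-up helper name. It relies on seq() accepting steps such as "-3 months", and it assumes a first-of-month date as in the question; dates late in the month may roll over into the following month with this approach.

```r
load_date <- as.Date("2020-09-01")

# Hypothetical helper: step n months forward (n > 0) or back (n < 0)
# by generating a two-element sequence and taking the second element.
shift_months <- function(d, n) {
  seq(d, by = paste(n, "months"), length.out = 2)[2]
}

minus3 <- shift_months(load_date, -3)
plus5  <- shift_months(load_date, 5)

print(minus3)
print(plus5)
```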

How can I keep a date formatted in R using sqldf?

How do I rename a date field in SQLDF without changing the format?
See my example below where my renamed date field "dt" converts the date to a number. How do I avoid this, or convert it back to a date?
#Question for Stack Exchange
df <- data.frame (date = c("2014-12-01","2014-12-02","2014-12-03"),
acct = c(1,2,3))
df$date = as.Date(df$date)
library("sqldf")
sqldf('
select
date as dt,
date,
acct
from df ')
dt date acct
1 16405 2014-12-01 1
2 16406 2014-12-02 2
3 16407 2014-12-03 3
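Specify method = "name__class" and give each output column a name of the form name__class (two underscores). sqldf then converts that column to the named R class and drops the __class suffix from the column name: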
sqldf('select date as dt__Date,
date as date__Date,
acct
from df',
method = "name__class")
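For the second part of the question, converting a number that has already come back from the query into a date again, base R is enough. A minimal sketch using the three values from the output shown in the question (dt_num and dt are illustrative names):

```r
# Numbers as returned in the dt column of the question's output.
dt_num <- c(16405, 16406, 16407)

# R counts Date values in days from 1970-01-01, so that is the origin.
dt <- as.Date(dt_num, origin = "1970-01-01")
print(dt)
```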

Number of overlapping intervals over time

Let's say I have a set of partly overlapping intervals:
require(lubridate)
date1 <- as.POSIXct("2000-03-08 01:59:59")
date2 <- as.POSIXct("2001-02-29 12:00:00")
date3 <- as.POSIXct("1999-03-08 01:59:59")
date4 <- as.POSIXct("2002-02-29 12:00:00")
date5 <- as.POSIXct("2000-03-08 01:59:59")
date6 <- as.POSIXct("2004-02-29 12:00:00")
# interval() replaces the deprecated new_interval() in current lubridate
int1 <- interval(date1, date2)
int2 <- interval(date3, date4)
int3 <- interval(date5, date6)
Does anyone have an idea how one could construct a time series plot that provides, for every point in time, the number of overlapping intervals at that point?
So, for instance, to take the above example: For a given date in January 2000, the function I'm looking for would return the value "1" (the date is only within int2) while for a date in January 2001, it would return "3" (since that date is within int1, int2 and int3). Etc.
Any ideas?
Here's one way using the foverlaps() function from the data.table package:
Please install the development version 1.9.5 by following the installation instructions, as a bug that affects overlap joins on numeric types has been fixed there.
require(data.table) ## 1.9.5+
intervals = data.table(start = c(date1, date3, date5),
end = c(date2, date4, date6))
# assuming your query is:
query = as.POSIXct(c("2000-01-01 00:00:00", "2001-01-01 00:00:00"))
We'll construct the query data.table with both start and end intervals as well:
querydt = data.table(start=query, end=query) # identical start,end
Then we can use foverlaps() as follows:
setkeyv(intervals, c("start", "end"))
ans = foverlaps(querydt, intervals, which=TRUE, nomatch=0L, type="within")
# xid yid
# 1: 1 1
# 2: 2 1
# 3: 2 2
# 4: 2 3
We first set the key, which sorts the data.table intervals by the columns provided, in increasing order, and marks those columns as the key columns on which we want to perform the overlap join.
Then we use foverlaps() to find which rows of querydt fall within (type="within") the ranges in intervals. In this case, querydt consists of just points, as the start and end points are identical. This returns all matching indices for those rows in querydt that fall within intervals (nomatch=0L removes all rows with no matches and which=TRUE returns indices instead of the merged result).
Now all we have to do is to aggregate by xid and count the number of observations to get the count:
ans[, .N, by=xid]
# xid N
# 1: 1 1
# 2: 2 3
Check ?foverlaps for more info.
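For a handful of intervals, the same count can be sketched in base R without an overlap join, which also makes it easy to evaluate on a regular grid for a time series plot. This is an illustrative alternative, not the foverlaps() method above. count_overlaps is a made-up name, the time zone is fixed to UTC as an assumption, and the end dates of the first two intervals are moved to Feb 28 because 2001 and 2002 are not leap years.

```r
starts <- as.POSIXct(c("2000-03-08 01:59:59", "1999-03-08 01:59:59",
                       "2000-03-08 01:59:59"), tz = "UTC")
ends   <- as.POSIXct(c("2001-02-28 12:00:00", "2002-02-28 12:00:00",
                       "2004-02-29 12:00:00"), tz = "UTC")

# Hypothetical helper: for each time point, count the intervals
# [start, end] that contain it.
count_overlaps <- function(t, starts, ends) {
  vapply(seq_along(t),
         function(i) sum(starts <= t[i] & t[i] <= ends),
         integer(1))
}

# The two query points from the answer above.
query <- as.POSIXct(c("2000-01-01", "2001-01-01"), tz = "UTC")
print(count_overlaps(query, starts, ends))

# A monthly grid gives the series to plot.
grid <- seq(as.POSIXct("1999-01-01", tz = "UTC"),
            as.POSIXct("2004-12-01", tz = "UTC"), by = "month")
counts <- count_overlaps(grid, starts, ends)
# plot(grid, counts, type = "s")
```

This loops over every time point and compares it with every interval, so it is fine for small inputs; for many intervals the keyed join is likely the better choice.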
