Date math does not give the same results in R vs. Redshift

I am having an issue using the RPostgres package to connect to Redshift. I am unsure whether it is an issue with our database setup or a known issue with the package.
I am getting different results when I run the exact same query in Redshift and in R with the RPostgres package.
It appears to be entirely due to the date math: my overall row counts, and everything else, match whenever dates are not involved.
As an example, this is a query I might run against Redshift (using Metabase). If I run the exact same query in R with the RPostgres package, I get completely different results.
SELECT
orders.*
FROM
orders
WHERE
orders.date >= current_date-3
AND
orders.date < current_date-2
The dates in Metabase make sense: the query returns only one day, three days ago. In R, however, it returns two days.
For the purposes of this example, consider date a timestamp column.
Has anyone run into this, or does anyone know of an existing issue and workaround?
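One way to narrow this down is to ask the session what it thinks "now" is. Below is a minimal diagnostic sketch; the connection details are placeholders for your own, and getdate() is Redshift's current-timestamp function:
library(DBI)
library(RPostgres)
# hypothetical connection details -- substitute your own
con <- dbConnect(Postgres(),
                 host = "my-cluster.example.redshift.amazonaws.com",
                 dbname = "mydb", user = "me", password = "...")
# If this disagrees with what Metabase shows for the same expressions,
# the two sessions have different ideas of the current date/time,
# and current_date - 3 will shift accordingly.
dbGetQuery(con, "SELECT current_date AS today, getdate() AS now")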

It's usually better to be absolutely explicit with dates to make sure that session settings are not unintentionally affecting the query. Try this:
SELECT
orders.*
FROM
orders
WHERE
orders.date >= date_trunc('day', current_timestamp at time zone 'utc') - '3 days'::interval
AND
orders.date < date_trunc('day', current_timestamp at time zone 'utc') - '2 days'::interval
You may have to change the time zone from utc if your dates are implicitly stored in a different time zone.
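For completeness, a sketch of running the explicit version from R; con is assumed to be an existing RPostgres connection as above:
library(DBI)
sql <- "
SELECT orders.*
FROM orders
WHERE orders.date >= date_trunc('day', current_timestamp at time zone 'utc') - '3 days'::interval
  AND orders.date <  date_trunc('day', current_timestamp at time zone 'utc') - '2 days'::interval
"
orders_one_day <- dbGetQuery(con, sql)
range(orders_one_day$date)  # should now span exactly one UTC day in both tools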

Related

SQLite timestamp export

I have a SQLite 3 database on macOS with timestamp data (typically something like 279020203.539467).
The documentation says that dates can be stored as
REAL - as Julian day numbers, the number of days since noon in Greenwich
on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
And it looks like this is what I've got.
I want to export this and import it into other databases.
I am assuming the timestamp datatype is not compatible between database engines, so some conversion has to happen somewhere, and it seems best to do that while exporting the data from the SQLite database.
But I can't figure out how to do this conversion.
I've looked at this answer which refers to this forum post which indicates that in SQLite you could do something like
select datetime('40660.9454658044', '+2415018 days', '+12 hours', 'localtime');
(Maybe 2415018 is the number of days between November 24, 4714 B.C. and some other magical date...)
However, replacing the timestamp string in this example with what I have results in null, presumably because '279020203.539467' is some other kind of timestamp. It is also several orders of magnitude larger than the example.
But how to convert this to a usable date? I know it should be around 2011/2012.
Interpreting the data as an "integer" (seconds since 1970-01-01) gives a date in 1978, so that is not correct either.
UPDATE: I've found that
288773834.371606 should be 2010-02-25 07.57
296636121.950064 should be 2010-05-27 08.55
(CET if that matters).
The good news is: to convert a Julian date to a "regular" date format you could use datetime(strftime('%J',jtime)). FYI, here's the doc for SQLite date and time functions. But there's bad news.
A NASA calculator computes the Julian date of 2010-02-25 to be 2455246. It computes the civil date of Julian day 288773834 as Sept 2, 785907 A.D. SQLite doesn't give that same result using the above notation, but it doesn't give a usable date either.
Even though the numbers look like Julian date notation, they are not any dates in our lifetimes.
Taking the difference between the two sample dates in a spreadsheet,
DATEDIF("2010-02-25 07.57"; "2010-05-27 08.55"; "d")
This gives 91 days which works out to be 7862400 seconds, almost exactly the same as the difference between the two timestamps (296636121.950064 - 288773834.371606 = 7862287.578458).
On macOS, native timestamps are seconds since 2001-01-01 (Jan 1 2001).
And
DATEDIF("2001-01-01 00:00"; "2010-02-25 07.57"; "d")
gives 3342 days, which is 288748800 seconds: near, but not exactly, the timestamp. The difference is caused by two things:
DATEDIF only caters for whole days
CET is one hour off
Correcting for that, we get around 25000 more seconds to add, which takes us almost exactly to the timestamp for 2010-02-25 07.57.
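The same arithmetic can be checked in R. This is just a sanity-check sketch, assuming the Mac epoch is 2001-01-01 00:00 UTC:
mac_epoch <- as.POSIXct("2001-01-01 00:00:00", tz = "UTC")
t1 <- as.POSIXct("2010-02-25 07:57:00", tz = "CET")
# seconds elapsed since the Mac epoch
as.numeric(difftime(t1, mac_epoch, units = "secs"))
# ~288773820, within seconds of the stored 288773834.371606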
So the gist of this is that SQLite on macOS stores timestamps as macOS native timestamps, i.e. seconds since the start of the macOS epoch (2001-01-01 00.00). This is probably because the application that created the data was not using SQLite timestamps but macOS native dates, storing them in the database like any other data.
Converting this to some other format should be trivial, either during the export, a conversion of the exported file or during imports.
Possibly it would be easiest to convert the date on export from the original database using
select ...
datetime(table.timestamp_field, 'unixepoch', '+31 years')
from ...
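A quick way to sanity-check this against the sample values from the question (a sketch using RSQLite; the exact offset between the Unix and Mac epochs is 978307200 seconds, which avoids any calendar edge cases the '+31 years' modifier might hit):
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbGetQuery(con, "
  SELECT datetime(288773834.371606, 'unixepoch', '+31 years')  AS approx_utc,
         datetime(288773834.371606 + 978307200, 'unixepoch')   AS exact_utc
")
# both come out as 2010-02-25 06:57 UTC, i.e. 07:57 CET
dbDisconnect(con)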

Why does filter( as.Date(opp_date) == Sys.Date() ) bring in yesterday's data?

I love using dplyr; I use it for everything. But the problem I'm running into today is the following:
I'm trying to simply filter all rows from my opps table where opp_date is from today. When I use filter(opps, as.Date(opp_date) == Sys.Date()), it returns today's data but also yesterday's, from 19:00:00 onwards.
To clarify any possible problem:
opp_date field is POSIXct class
Sys.Date() correctly returns my current date (and, just to check, Sys.time() returns the correct time and date: "2017-07-21 10:06:04 COT")
Any idea here? Thanks to the community for all the great inputs :)
The issue must be due to different time zones: by default, as.Date() on a POSIXct does the conversion in UTC, while Sys.Date() uses your system's local time zone.
Try to explicitly set the environment variable as follows:
Sys.setenv(TZ='UTC')
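To see why, here is a small sketch; the Bogota timestamp is made up for illustration:
# 19:30 in Bogota (COT, UTC-5) is already 00:30 the next day in UTC
x <- as.POSIXct("2017-07-20 19:30:00", tz = "America/Bogota")
as.Date(x)                          # "2017-07-21" -- converted via UTC
as.Date(x, tz = "America/Bogota")   # "2017-07-20" -- converted in local time
So passing tz explicitly to as.Date(), or pinning the session time zone as above, removes the mismatch.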
I cannot add a comment due to low reputation, so I'm posting this as an answer.
As I learned recently, R dates are a formatting nightmare, especially through the base functions. Check out the lubridate package. You may want to convert your date column using the dmy_hms function. It's easy, and vectors are supported by default. Try it and let me know if the problem persists.
And please always try to provide sample data; otherwise people cannot reproduce your problem.

UPDATE an SQLite datetime field with fractional seconds

I'm synchronizing historical data between two systems and I've found a small clock problem between their logs.
I've loaded the data into an SQLite database and need to shift one of the sets by a small amount (~40 milliseconds). However, I'm unable to do so, as SQLite seems to always round the time to the nearest second.
For example, with something like
UPDATE my_table SET my_datetime = DATETIME(my_datetime, '+0.04 seconds')
the time always comes back in whole seconds, and I can't find any fractional/millisecond modifier option.
Is there a way to do this that I'm overlooking?
Thanks.
SQLite doesn't have a datetime type. See http://sqlite.org/datatype3.html
Using datetime(...) you are storing your dates as strings. This is equivalent to strftime('%Y-%m-%d %H:%M:%S', ...).
One option is to use a strftime with fractions of seconds:
UPDATE my_table SET my_datetime = STRFTIME('%Y-%m-%d %H:%M:%f',my_datetime, '+0.04 seconds')
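You can check the effect before touching the table. A sketch via RSQLite, with a made-up literal timestamp:
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbGetQuery(con, "
  SELECT strftime('%Y-%m-%d %H:%M:%f', '2020-01-01 12:00:00', '+0.04 seconds') AS shifted
")
# shifted = "2020-01-01 12:00:00.040"
dbDisconnect(con)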

Using Dates in SQLite

I have a TEXT column called "time" in a table meal and in a table pain, formatted as YYYY-MM-DDTHH:MM. I'm trying to search for other times that are within 12 hours of a given time, but I can't figure out how to do that.
I've tried testing
WHERE pain.time < meal.time + "1:00" AND pain.time > meal.time
but this approach alters the year instead of the hour. I also tried testing the same query adding "0000-00-00T01:00", but it doesn't seem to do anything.
I'm not sure what else to test.
SQLite has no built-in date/time data type, so you have to use either numbers or strings and handle them correctly.
To do calculations, you have to either do them directly on the numerical value (which might require conversions into a number and back), or use the modifiers of the built-in date/time functions:
... WHERE datetime(meal.time) BETWEEN datetime(pain.time, '-12 hours') AND datetime(pain.time)
(Wrapping both sides in datetime() also normalizes the 'T' separator in your stored values to SQLite's space-separated text output, so the string comparison is consistent.)
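If "within 12 hours" should go both ways, the same idea extends to a symmetric window. A sketch run from R via RSQLite, using the table and column names from the question (con is assumed to be a connection to your database):
library(DBI)
within_12h <- dbGetQuery(con, "
  SELECT meal.*, pain.time AS pain_time
  FROM meal
  JOIN pain
    ON datetime(meal.time) BETWEEN datetime(pain.time, '-12 hours')
                               AND datetime(pain.time, '+12 hours')
")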

Alternative to sqlite OR a better way to handle date / time fields in sqlite

My data tends to be medium to large but never qualifies as "BIG" data. The data is almost always complexly relational. For the purposes I'm talking about here, 10-50 tables with a total size of 1-10 GB. Nothing more. When I deal with data bigger than this, I'll stick it into Postgres or SQL Server.
Overall, I like SQLite, but the data I work with has lots and lots of date/datetime fields, and dealing with date fields in SQLite makes my head hurt. When I move data back and forth between R and SQLite, my dates often get mangled.
I am looking for either a file-based alternative to SQLite that is easy to work with from R,
OR
Better techniques/packages for moving data in/out of SQLite and R without mangling the dates. My goal is to stop mangling my dates. For example, when I use dbWriteTable from the RSQLite package my dates are usually messed up in a way that makes them impossible to work with.
My primary workstation is running Ubuntu but I work in an office dominated by Windows. If suggesting an alternative to SQLite, +++ for an alternative that works on both platforms (or more).
Use epoch times and dates (days since the origin, seconds since the origin). Converting epochs into R POSIXct or Date is fast (strings are very slow).
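A sketch of the round trip (made-up column names; the origin handling is the key part):
df <- data.frame(ts = as.POSIXct("2010-02-25 07:57:00", tz = "UTC"))
# write numbers, not strings
df$ts_epoch  <- as.numeric(df$ts)            # seconds since 1970-01-01
df$day_epoch <- as.integer(as.Date(df$ts))   # days since 1970-01-01
# read them back: numeric -> date-time is fast and unambiguous
as.POSIXct(df$ts_epoch, origin = "1970-01-01", tz = "UTC")
as.Date(df$day_epoch, origin = "1970-01-01")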
Edit: Another alternative, after re-reading and considering the size of your data:
You could simply save the tables directly in R format, perhaps with a small piece of extra metadata describing the key relationships between tables. You would have to create your own conventions and all, but it's definitely smoother (no impedance mismatches).
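For example, a minimal sketch of that convention (object and file names are made up):
orders <- data.frame(when = Sys.time())   # any table with date-time columns
saveRDS(orders, "orders.rds")             # Date/POSIXct columns survive intact
orders <- readRDS("orders.rds")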
Also, I'm personally very partial to the data.table package. It's fast and has a syntax which is pure R but maps nicely onto SQL concepts: in dt[i, j, by=list(...)], i corresponds to "where", j corresponds to "select", and by to "group by", and there are facilities for joins as well, although I wrote infix wrappers around those so they were easier to remember.
I typically do my data processing work exclusively in R (after an initial pull from SQLite), and I find data.table faster and more practical than massive sqldf queries.
http://datatable.r-forge.r-project.org/
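A tiny illustration of that mapping, with made-up data:
library(data.table)
dt <- data.table(cust = c("a", "a", "b"), amount = c(10, 20, 5))
# roughly: SELECT cust, sum(amount) AS total FROM dt WHERE amount > 4 GROUP BY cust
dt[amount > 4, .(total = sum(amount)), by = cust]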
SQLite wants to read the data in the standard format "YYYY-MM-DD HH:MM:SS" (you can omit the time part if you don't need it); I don't know of a way to read arbitrary date strings. This results in a normalized date being stored.
On output, you format the date using SQLite functions into whatever your other software needs; check the options of strftime().
For instance, Octave likes the day number since year 0, so if I have a table mydata with a column mydate, I'd do
select julianday(mydate)-1721059.666667 from mydata
The magic number is julianday("0000-01-01T00:00:00-04:00") and compensates for the fact that julianday starts in 4714 B.C. (see the SQLite documentation quoted above), whereas Octave counts from year 0.
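You can inspect the raw numbers the same way (a sketch via RSQLite; it's worth validating the offset against a date your Octave build agrees on before trusting it):
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbGetQuery(con, "
  SELECT julianday('2010-02-25')                   AS jd,
         julianday('2010-02-25') - 1721059.666667  AS octave_day
")
dbDisconnect(con)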
