BigQuery get difference between event_timestamp from firebase analytics and current timestamp - firebase

I'm using BigQuery integrated with Firebase Analytics and I'm trying to query the difference between an event_timestamp and the current timestamp in hours. I'm doing something like this:
SELECT TIMESTAMP_DIFF(event_timestamp, CURRENT_TIMESTAMP(), HOUR)
FROM my-firebase-analytics-table
WHERE event_name = 'session_start'
With this query I'm getting an error in TIMESTAMP_DIFF(event_timestamp, CURRENT_TIMESTAMP(), HOUR). The error is:
No matching signature for function TIMESTAMP_DIFF for argument types: INT64, TIMESTAMP, DATE_TIME_PART. Supported signature: TIMESTAMP_DIFF(TIMESTAMP, TIMESTAMP, DATE_TIME_PART)
From the tests I made, it seems that the event_timestamp field is not a TIMESTAMP field. Is there a way I can transform it into a TIMESTAMP?

If the column is an INT64, it holds a count of time units since the Unix epoch. In the Firebase Analytics export, event_timestamp is in microseconds.
To convert it to a TIMESTAMP, use TIMESTAMP_MICROS(int64_expression), as in the example below
#standardSQL
SELECT TIMESTAMP_DIFF(TIMESTAMP_MICROS(event_timestamp), CURRENT_TIMESTAMP(), HOUR)
FROM `project.dataset.my-firebase-analytics-table`
WHERE event_name = 'session_start'
Note that with this argument order the result is negative for past events; swap the first two arguments to get a positive value.
If an INT64 column holds seconds or milliseconds instead, use the respective counterparts: TIMESTAMP_SECONDS or TIMESTAMP_MILLIS
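To sanity-check the conversion outside BigQuery, here is a minimal Python sketch of the same arithmetic. It assumes event_timestamp counts microseconds since the Unix epoch, as in the Firebase export; the function names are made up for illustration and are not part of any BigQuery API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_micros(event_timestamp: int) -> datetime:
    """Interpret an INT64 count of microseconds since the Unix epoch as a UTC datetime."""
    return EPOCH + timedelta(microseconds=event_timestamp)


def hours_since(event_timestamp: int, now: datetime) -> int:
    """Whole hours elapsed between the event and `now`, rounded down."""
    elapsed = now - timestamp_micros(event_timestamp)
    return elapsed // timedelta(hours=1)


if __name__ == "__main__":
    # Hypothetical event value, used only for illustration.
    event = 1549769400000000
    print(timestamp_micros(event))
    print(hours_since(event, datetime.now(timezone.utc)))
```

If the printed date is decades off, the column is most likely in a different unit (seconds or milliseconds) than assumed.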

Related

Daily schedule in BigQuery using data from Firebase analytics

I have created a daily schedule in BigQuery using the "Append to table" preference, so every day it adds yesterday's data to my specified table. I have scheduled this query to run every day at 9AM, but the issue is that sometimes Firebase creates the previous day's table in BigQuery later than 9AM.
The example of daily scheduled SELECT I would be using is:
SELECT * FROM `analytics.events_*` WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
What would be the best practice to schedule a daily update for the previous day in BigQuery from Firebase, so there are no times where I am missing days?
BigQuery schedules are set to run at fixed times. If your incoming data varies in delivery time, then BigQuery schedules are not what you're looking for.
But if you insist on using BigQuery schedules, you could relax the WHERE condition and catch "missing" days the next time the schedule runs. That flips the problem: you now need to avoid appending rows that were already appended (which also increases query cost):
SELECT E.*
FROM `analytics.events_*` AS E
LEFT JOIN `[target dataset].[target table]` AS T
USING (event_name, event_timestamp, user_pseudo_id)
WHERE T.event_name IS NULL
AND T.event_timestamp IS NULL
AND T.user_pseudo_id IS NULL
AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY))
Or you could alternatively modify the query into an INSERT statement where you insert records and handle duplications similarly:
INSERT `[target dataset].[target table]`
SELECT E.*
FROM `analytics.events_*` AS E
LEFT JOIN `[target dataset].[target table]` AS T
USING (event_name, event_timestamp, user_pseudo_id)
WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY))
AND T.event_name IS NULL
AND T.event_timestamp IS NULL
AND T.user_pseudo_id IS NULL
Then you wouldn't need to configure a destination table for the schedule.
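Furthermore, if your target table is timestamp partitioned, you can reduce the amount of data scanned by limiting the range you scan in the target table, adding a condition that restricts it to a single date instead of the entire table. Apply this condition to the target table before the join (in the join condition or a subquery); in the outer WHERE clause it would contradict the IS NULL checks and no rows would be returned:
...
AND DATE(T.event_timestamp) = DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
...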
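The anti-join idea from this answer (insert only rows whose key is not yet in the target) can be tried locally before scheduling anything. The sketch below uses Python's sqlite3 module as a stand-in for BigQuery, so it only illustrates the logic. The table names source_events and target_events are hypothetical, and the join is written with an explicit ON clause rather than USING.

```python
import sqlite3

# Insert only those source rows whose key is missing from the target.
APPEND_NEW_ROWS = """
INSERT INTO target_events
SELECT E.*
FROM source_events AS E
LEFT JOIN target_events AS T
  ON  E.event_name = T.event_name
  AND E.event_timestamp = T.event_timestamp
  AND E.user_pseudo_id = T.user_pseudo_id
WHERE T.event_name IS NULL
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two identically shaped tables."""
    conn = sqlite3.connect(":memory:")
    for table in ("source_events", "target_events"):
        conn.execute(
            f"CREATE TABLE {table} "
            "(event_name TEXT, event_timestamp INTEGER, user_pseudo_id TEXT)"
        )
    return conn


def count_rows(conn: sqlite3.Connection, table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def append_new_rows(conn: sqlite3.Connection) -> int:
    """Run the anti-join insert and return how many rows were added."""
    before = count_rows(conn, "target_events")
    conn.execute(APPEND_NEW_ROWS)
    conn.commit()
    return count_rows(conn, "target_events") - before


if __name__ == "__main__":
    conn = make_db()
    conn.executemany(
        "INSERT INTO source_events VALUES (?, ?, ?)",
        [("session_start", 1, "u1"), ("session_start", 2, "u2")],
    )
    print(append_new_rows(conn))  # first run appends both rows
    print(append_new_rows(conn))  # second run finds nothing new
```

Running the insert twice over the same source data leaves the target unchanged, which is the property the relaxed two-day window relies on.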

Google cloud Datastore GQL - Get current datetime

Is it possible to make a google datastore GQL query with a comparison on the current DateTime?
Example:
select * from MY_KIND where created_at < DATETIME.NOW
I haven't seen anything about that in any post or in the Google documentation:
https://cloud.google.com/datastore/docs/reference/gql_reference
I need to make an operation on data older than 30 days.
You can use an alternative approach by trying to fetch entries after a particular date.
SELECT * FROM Entry WHERE date > DATETIME(yyyy,mm,dd)
The right-hand side of a comparison can be one of the following:
A datetime, date, or time literal, with either numeric values or a string representation, in the following forms:
DATETIME(year, month, day, hour, minute, second)
DATETIME('YYYY-MM-DD HH:MM:SS')
DATE(year, month, day)
DATE('YYYY-MM-DD')
TIME(hour, minute, second)
TIME('HH:MM:SS')
Hope this answers your question!!
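Since the query itself cannot ask for the current time, one option is to compute the cutoff in client code and embed it as a literal. The sketch below builds such a query string in Python. The helper name is hypothetical, and the 'YYYY-MM-DD HH:MM:SS' literal form is taken from the list quoted above; whether your GQL dialect accepts exactly this form is an assumption to check against the reference.

```python
from datetime import datetime, timedelta


def older_than_query(kind: str, prop: str, days: int, now: datetime) -> str:
    """Build a GQL string selecting entities whose `prop` is older than `days` days."""
    cutoff = now - timedelta(days=days)
    literal = cutoff.strftime("%Y-%m-%d %H:%M:%S")
    return f"SELECT * FROM {kind} WHERE {prop} < DATETIME('{literal}')"


if __name__ == "__main__":
    # Kind and property names come from the question; the 30-day window too.
    print(older_than_query("MY_KIND", "created_at", 30, datetime.utcnow()))
```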

Firebase vs BigQuery Active Users Discrepancies

I've integrated my Firebase project with BigQuery. Now I'm facing a data discrepancy while trying to get 1-day active users for a selected date, i.e. 20190210, with the following BigQuery query:
SELECT COUNT(DISTINCT user_pseudo_id) AS 1_day_active_users_count
FROM `MY_TABLE.events_*`
WHERE event_name = 'user_engagement' AND _TABLE_SUFFIX = '20190210'
But the figures returned from BigQuery don't match the ones reported on the Firebase Analytics dashboard for the same date. Any clue what's possibly going wrong here?
The following sample query mentioned by the Firebase team, here https://support.google.com/firebase/answer/9037342?hl=en&ref_topic=7029512, is not so helpful, as it takes the current time into consideration and selects users accordingly.
N-day active users
/**
* Builds an audience of N-Day Active Users.
*
* N-day active users = users who have logged at least one user_engagement
* event in the last N days.
*/
SELECT
COUNT(DISTINCT user_id) AS n_day_active_users_count
FROM
-- PLEASE REPLACE WITH YOUR TABLE NAME.
`YOUR_TABLE.events_*`
WHERE
event_name = 'user_engagement'
-- Pick events in the last N = 20 days.
AND event_timestamp >
UNIX_MICROS(TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 20 DAY))
-- PLEASE REPLACE WITH YOUR DESIRED DATE RANGE.
AND _TABLE_SUFFIX BETWEEN '20180521' AND '20240131';
So given the small discrepancy here, I believe the issue is one of timezones.
When you're looking at a "day" in the Firebase Console, you're looking at the time interval from midnight to midnight in whatever time zone you've specified when you first set up your project. When you're looking at a "day" in BigQuery, you're looking at the time interval from midnight to midnight in UTC.
If you want to make sure you're looking at the events that match up with what's in your console, you should query the event_timestamp value in your BigQuery table (and remember that it might span multiple tables) to match up with what's in your timezone.
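To see how the same event can land on different "days", here is a small Python sketch that buckets one event_timestamp (microseconds since the Unix epoch) by calendar date in UTC and in a project time zone. The fixed UTC-8 offset is a hypothetical project setting chosen for illustration; a real project time zone may also observe daylight saving time.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical reporting time zone, modelled as a fixed UTC-8 offset.
HYPOTHETICAL_PROJECT_TZ = timezone(timedelta(hours=-8))


def event_day(event_timestamp_micros: int, tz: timezone) -> str:
    """Return the YYYYMMDD calendar day of the event as seen in time zone `tz`."""
    moment = EPOCH + timedelta(microseconds=event_timestamp_micros)
    return moment.astimezone(tz).strftime("%Y%m%d")


if __name__ == "__main__":
    event = 1549769400000000  # 2019-02-10 03:30:00 UTC
    print(event_day(event, timezone.utc))
    print(event_day(event, HYPOTHETICAL_PROJECT_TZ))
```

An event at 03:30 UTC on the 10th still belongs to the 9th for a UTC-8 console, so a count filtered only by _TABLE_SUFFIX = '20190210' covers a different window than the dashboard's day.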

Is it possible to get google analytics event timestamp in bigquery?

I'm trying to get the event timestamp from BigQuery (Google Analytics 360) but I can't seem to find the correct export field to get it.
I have tried hits.eventInfo.timestamp and hits.eventInfo.datetime but neither of them seems to work.
My query is
SELECT
hits.eventInfo.timestamp as purchaseDate,
fullVisitorId as visitorId
FROM (tables)
WHERE LOWER(hits.eventInfo.eventAction) == 'purchase'
GROUP BY 2
ORDER BY 1 DESC
LegacySQL
DATE_ADD(TIMESTAMP(FORMAT_UTC_USEC(visitStartTime*1000000)), (hits.time/1000), "SECOND")
This will return YYYY-MM-DD HH:MM:SS in UTC.
Hope it helps.
There is no field for the hit timestamp; however, you can calculate it from hits.time and visitStartTime. Here is the relevant row from the schema description:
hits.time INTEGER The number of milliseconds after the visitStartTime when this hit was registered. The first hit has a hits.time of 0
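The calculation described in these answers is just an addition once the units are lined up. A minimal Python sketch, assuming visitStartTime is in seconds since the Unix epoch (as the multiplication by 1000000 in the legacy SQL expression implies) and hits.time is in milliseconds as the schema row states; the function name is made up for illustration:

```python
from datetime import datetime, timedelta, timezone


def hit_timestamp(visit_start_time: int, hit_time_ms: int) -> datetime:
    """Combine a session start (epoch seconds) and a hit offset (ms) into a UTC datetime."""
    start = datetime.fromtimestamp(visit_start_time, tz=timezone.utc)
    return start + timedelta(milliseconds=hit_time_ms)


if __name__ == "__main__":
    # Hypothetical values: a session starting at 2019-02-10 03:30:00 UTC
    # and a hit registered 90.5 seconds into it.
    print(hit_timestamp(1549769400, 90500))
```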

How can I store the current timestamp in SQLite as ticks?

I have a SQLite database where I store the dates as ticks. I am not using the default ISO8601 format. Let's say I have a table defined as follows:
CREATE TABLE TestDate (LastModifiedTime DATETIME)
Using SQL, I wish to insert the current date and time. If I execute any of the below statements, I end up getting the date and time stored as a string and not in ticks.
INSERT INTO TestDate (LastModifiedTime) VALUES(CURRENT_TIMESTAMP)
INSERT INTO TestDate (LastModifiedTime) VALUES(DateTime('now'))
I have looked at the SQLite documentation, but I cannot find any option to obtain the current timestamp in ticks.
I can of course define a parameter in C# and store the value as a System.DateTime. This does result in the datetime getting stored to the database in ticks.
What I would like to do is be able to insert and update the current timestamp directly from within the SQL statement. How would I do this?
Edit:
The reason I want the data stored as ticks in the database is that the dates are then in the same format as stored by the ADO.Net data provider, so that when the data is queried using the ADO.Net provider it is correctly retrieved as a System.DateTime .Net type.
This particular oddity of SQLite caused me much anguish.
Easy way - store and retrieve as regular timestamp
create table TestDate (
LastModifiedTime datetime
);
insert into TestDate (LastModifiedTime) values (datetime('now'));
select datetime(LastModifiedTime), strftime('%s.%f', LastModifiedTime) from TestDate;
Output: 2011-05-10 21:34:46|1305063286.46.000
Painful way - store and retrieve as a UNIX timestamp
You can use strftime to retrieve the value in ticks. Additionally, to store a UNIX timestamp (roughly equivalent to ticks), you can surround the number of seconds in single quotes.
insert into TestDate (LastModifiedTime) values ('1305061354');
SQLite stores this as a plain number and does not know that it represents a date. On retrieval, you need to explicitly tell SQLite to interpret it as a UNIX timestamp.
select datetime(LastModifiedTime, 'unixepoch') FROM TestDate;
To store the current date and time, use strftime('%s', 'now').
insert into TestDate (LastModifiedTime) VALUES (strftime('%s', 'now'));
Full example:
create table TestDate (
LastModifiedTime datetime
);
insert into TestDate (LastModifiedTime) values (strftime('%s', 'now'));
select datetime(LastModifiedTime, 'unixepoch') from TestDate;
When executed by sqlite3, this script will print:
2011-05-10 21:02:34 (or your current time)
After further study of the SQLite documentation and other information found on date number conversions, I have come up with the following formula, which appears to produce correct results:
INSERT INTO TestDate(LastModifiedTime)
VALUES(CAST((((JulianDay('now', 'localtime') - 2440587.5)*86400.0) + 62135596800) * 10000000 AS BIGINT))
This seems like a painful way to produce something that I would expect to be available as a built-in datetime format, especially since the database supports storing datetime values in ticks. Hopefully, this becomes useful for others too.
Update:
The above formula is not perfect when it comes to daylight saving time. See the section Caveats And Bugs in the SQLite docs regarding local time calculation.
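The ticks formula can be checked against a known value: the Unix epoch corresponds to 621355968000000000 .NET ticks (62135596800 seconds times 10000000). The Python sketch below runs the same expression through the sqlite3 module. As an assumption that differs from the original statement, it drops the 'localtime' modifier and works in UTC, which sidesteps the daylight saving caveat; the function name is hypothetical.

```python
import sqlite3

# Same arithmetic as the INSERT above, but in UTC and with the
# time value passed in as a parameter so fixed dates can be tested.
TICKS_SQL = """
SELECT CAST(
    ((julianday(?) - 2440587.5) * 86400.0 + 62135596800) * 10000000
    AS INTEGER)
"""


def dotnet_ticks(when: str = "now") -> int:
    """Return the tick count SQLite computes for the given time value."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(TICKS_SQL, (when,)).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(dotnet_ticks("1970-01-01 00:00:00"))
    print(dotnet_ticks())  # current time
```

Because the Julian day is a floating-point number, values for arbitrary times are accurate to well under a millisecond but not down to a single tick.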
The following will return the number of milliseconds since the UNIX Epoch:
SELECT (strftime('%s', 'now') - strftime('%S', 'now') + strftime('%f', 'now')) * 1000 AS ticks
It works by grabbing the number of seconds since the Unix Epoch (%s), subtracting the number of seconds in the current time (%S), adding the number of seconds with decimal places (%f), and multiplying the result by 1000 to convert from seconds to milliseconds.
The subtraction and addition are to add precision to the value without skewing the result. As stated in the SQLite Documentation, all uses of 'now' within the same step will return the same value.
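As a quick check of the millisecond expression, the following Python sketch runs it through the sqlite3 module and compares the result with the system clock. The function names are made up for illustration, and the comparison tolerance in the usage example is arbitrary.

```python
import sqlite3
import time

MILLIS_SQL = (
    "SELECT (strftime('%s', 'now') - strftime('%S', 'now') "
    "+ strftime('%f', 'now')) * 1000 AS ticks"
)


def sqlite_epoch_millis() -> int:
    """Milliseconds since the Unix epoch, as computed by SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        value = conn.execute(MILLIS_SQL).fetchone()[0]
    finally:
        conn.close()
    # The expression yields a floating-point number; round to whole ms.
    return round(value)


def python_epoch_millis() -> int:
    """Milliseconds since the Unix epoch, from the system clock."""
    return round(time.time() * 1000)


if __name__ == "__main__":
    a, b = sqlite_epoch_millis(), python_epoch_millis()
    print(a, b, abs(a - b) < 5000)
```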
