Group by month Google Analytics / BigQuery

I am pretty new to BigQuery and have a question about grouping by date using Google Analytics data (Standard SQL). The data is currently at the daily level; how can I aggregate it to the year/month level?
Desired outcome: data at the year/month level, plus a selection of only the last 12 months.
#StandardSQL
SELECT
  TIMESTAMP(PARSE_DATE('%Y%m%d', date)) AS Date,
  SUM(totals.visits) AS Visits,
  totals.timeOnSite AS TimeOnSite,
  totals.newVisits AS NewVisit
FROM
  `XXXX.ga_sessions_20*`
WHERE
  _TABLE_SUFFIX >= '180215'
GROUP BY
  Date,
  TimeOnSite,
  NewVisit
Thanks in advance!

Since you limit the data selection to the previous year, and assuming your table has a field recording the date of the visit, you can get your aggregated results per month with a query like this (note the backticks around the table wildcard, and that the GA export stores `date` as a 'YYYYMMDD' string, so it has to be parsed first):
#StandardSQL
SELECT
  EXTRACT(MONTH FROM PARSE_DATE('%Y%m%d', date)) AS Month,
  SUM(totals.visits) AS Visits
FROM
  `XXXX.ga_sessions_20*`
WHERE
  _TABLE_SUFFIX >= '170312'
GROUP BY Month
Be aware that grouping by MONTH alone merges the same month of different years; include the year in the grouping if your window spans more than twelve months.

You can use the DATE_TRUNC function (https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#date_trunc) for that (the GROUP BY has to reference the new alias, MonthStart):
#StandardSQL
SELECT
  DATE_TRUNC(PARSE_DATE('%Y%m%d', date), MONTH) AS MonthStart,
  SUM(totals.visits) AS Visits,
  totals.timeOnSite AS TimeOnSite,
  totals.newVisits AS NewVisit
FROM
  `XXXX.ga_sessions_20*`
WHERE
  _TABLE_SUFFIX >= '180215'
GROUP BY
  MonthStart,
  TimeOnSite,
  NewVisit
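To also restrict the scan to only the last 12 months, as in the desired outcome, one option is to compute the _TABLE_SUFFIX lower bound dynamically instead of hard-coding it. A sketch, assuming the same `XXXX.ga_sessions_20*` wildcard naming (so the suffix is the 'YYMMDD' part after the fixed '20' prefix):

```sql
#StandardSQL
SELECT
  DATE_TRUNC(PARSE_DATE('%Y%m%d', date), MONTH) AS MonthStart,
  SUM(totals.visits) AS Visits
FROM
  `XXXX.ga_sessions_20*`
WHERE
  -- keep only tables from the last 12 months; %y is the two-digit year
  _TABLE_SUFFIX >= FORMAT_DATE('%y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 12 MONTH))
GROUP BY MonthStart
ORDER BY MonthStart
```

The suffix filter also prunes which daily tables are scanned, so it keeps query cost down compared to filtering on the parsed date alone.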

Related

How to add time to the output of this max/min select?

I have a simple sqlite3 database for recording temperatures; the schema is trivially simple:
CREATE TABLE temperatures (DateTime, Temperature);
To output maximum and minimum temperatures over one month I have the following query:
SELECT datetime, max(temperature), min(temperature) from temperatures
WHERE datetime(DateTime) > datetime('now', '-1 month')
GROUP BY strftime('%d-%m', DateTime)
ORDER BY DateTime;
How can I get the times for maxima and minima as well? Does it need a sub-query or something like that?
Use window functions MIN(), MAX() and FIRST_VALUE() instead of aggregation:
SELECT DISTINCT
  date(DateTime) AS date,
  MAX(Temperature) OVER (PARTITION BY date(DateTime)) AS max_temperature,
  FIRST_VALUE(time(DateTime)) OVER (PARTITION BY date(DateTime) ORDER BY Temperature DESC) AS time_of_max_temperature,
  MIN(Temperature) OVER (PARTITION BY date(DateTime)) AS min_temperature,
  FIRST_VALUE(time(DateTime)) OVER (PARTITION BY date(DateTime) ORDER BY Temperature) AS time_of_min_temperature
FROM temperatures
WHERE datetime(DateTime) > datetime('now', '-1 month')
ORDER BY date;
If your DateTime column contains values in the ISO format YYYY-MM-DD hh:mm:ss, there is no need for datetime(DateTime); you can use DateTime directly. (Note that window functions require SQLite 3.25 or later.)
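For older SQLite versions without window functions, the sub-query the question guesses at also works. A sketch using correlated sub-queries to look up the time of each day's extreme (ties are broken arbitrarily by LIMIT 1):

```sql
SELECT date(DateTime) AS day,
       MAX(Temperature) AS max_temperature,
       (SELECT time(t2.DateTime) FROM temperatures t2
        WHERE date(t2.DateTime) = date(t.DateTime)
        ORDER BY t2.Temperature DESC LIMIT 1) AS time_of_max_temperature,
       MIN(Temperature) AS min_temperature,
       (SELECT time(t2.DateTime) FROM temperatures t2
        WHERE date(t2.DateTime) = date(t.DateTime)
        ORDER BY t2.Temperature ASC LIMIT 1) AS time_of_min_temperature
FROM temperatures t
WHERE datetime(DateTime) > datetime('now', '-1 month')
GROUP BY date(DateTime)
ORDER BY day;
```

The correlated sub-queries re-scan the table once per day-group, so the window-function version is preferable when available.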

How to get sessions by day, by userid (as custom dimension)?

I can get sessions by user ID, but I'm having trouble also selecting the date. The end result should have 3 columns: date, userId, sessions (where sessions is the total number of sessions for that day).
Sessions by user id, missing the date column:
SELECT
  (SELECT value FROM UNNEST(customDimensions) WHERE index=2) AS userId,
  SUM(totals.visits) AS sessions
FROM
  `ga-360-tvgo.76246634.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN "20200701" AND "20200723"
GROUP BY
  1
I think you can simply add the date to your SELECT list and GROUP BY:
SELECT
  date,
  (SELECT value FROM UNNEST(customDimensions) WHERE index=2) AS userId,
  SUM(totals.visits) AS sessions
FROM
  `ga-360-tvgo.76246634.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN "20200701" AND "20200723"
GROUP BY
  1, 2

BigQuery data matching GA on daily basis but not over larger time frame

I am calculating bounced sessions (sessions with only one pageview) via BQ.
The query joins a subquery that gives the number of all sessions with one that gives bounced sessions.
When I run my query on just one specific date, my numbers match the numbers in GA, but if I select a bigger time frame, for example a month, the numbers (only for bounced sessions) are off.
Also, if I run each subquery separately, I get correct numbers for any time frame.
Here is my query:
SELECT
A.date AS Date,
A.Landing_Content_Group AS Landing_Content_Group,
MAX(A.sessions) AS Sessions,
MAX(B.Bounced_Sessions) AS Bounced_Sessions
FROM (
SELECT
date,
hits.contentGroup.contentGroup2 AS Landing_Content_Group,
COUNT(DISTINCT CONCAT(CAST(visitStartTime AS string),fullVisitorId)) AS sessions
FROM
`122206032.ga_sessions_201808*`,
UNNEST(hits) AS hits
WHERE
hits.type="PAGE"
AND hits.isEntrance = TRUE
GROUP BY
date,
Landing_Content_Group
ORDER BY
date DESC,
sessions DESC ) A
LEFT JOIN (
SELECT
date,
hits.contentGroup.contentGroup2 AS Landing_Content_Group,
COUNT(DISTINCT CONCAT(CAST(visitStartTime AS string),fullVisitorId)) AS Bounced_Sessions
FROM
`122206032.ga_sessions_201808*`,
UNNEST(hits) AS hits
WHERE
hits.type="PAGE"
AND totals.pageviews = 1
AND hits.isEntrance = TRUE
GROUP BY
date,
Landing_Content_Group
ORDER BY
date DESC,
Bounced_Sessions DESC ) B
ON
a.Landing_Content_Group = b.Landing_Content_Group
GROUP BY
Date,
Landing_Content_Group
ORDER BY
Date DESC,
Sessions DESC
What I should get: GA results (screenshot omitted)
What I get in BQ for that date when the time frame is a month: BQ results (screenshot omitted)
I tried different JOINs and aggregations but so far I'm still in the dark :)
Ok, I solved it: the solution was to also join the tables on the date. Without the date in the join condition, each row of A matched bounced sessions from every date sharing the same content group, and MAX() then picked the largest of those, which inflates the counts as soon as the time frame covers more than one day.
ON
a.date = b.date
AND a.Landing_Content_Group = b.Landing_Content_Group
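An alternative that sidesteps the join (and the fragile MAX() workaround) entirely is to compute both counts in a single pass with a conditional inside COUNT(DISTINCT ...). A sketch against the same export tables, keeping the original session-key expression:

```sql
SELECT
  date AS Date,
  hits.contentGroup.contentGroup2 AS Landing_Content_Group,
  COUNT(DISTINCT CONCAT(CAST(visitStartTime AS STRING), fullVisitorId)) AS Sessions,
  -- IF() yields NULL for non-bounced sessions, and COUNT(DISTINCT ...) ignores NULLs
  COUNT(DISTINCT IF(totals.pageviews = 1,
                    CONCAT(CAST(visitStartTime AS STRING), fullVisitorId),
                    NULL)) AS Bounced_Sessions
FROM
  `122206032.ga_sessions_201808*`,
  UNNEST(hits) AS hits
WHERE
  hits.type = "PAGE"
  AND hits.isEntrance = TRUE
GROUP BY
  Date,
  Landing_Content_Group
ORDER BY
  Date DESC,
  Sessions DESC
```

Because both metrics come from the same grouped scan, they can never drift apart across time frames.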

SQLite strftime Group By

I have a table consisting of a date field and a barcode field; I want the number of barcodes grouped by day for the previous month.
This looked like it would work:
SELECT
COUNT(*) AS count,
strftime('%d-%m-%Y',date) AS day
FROM barcodes
WHERE date >= datetime('now', '-1 month')
GROUP BY day
ORDER BY date ASC;
But that gives me incorrect counts. E.g.:
341|30-01-2017
274|31-01-2017
288|01-02-2017
332|02-02-2017
224|03-02-2017
35|04-02-2017
1009|06-02-2017
1481|07-02-2017
1626|08-02-2017
507|09-02-2017
428|10-02-2017
125|11-02-2017
1838|13-02-2017
2591|
Whereas:
SELECT COUNT(*) FROM barcodes WHERE date LIKE '2017-02-10%';
579
If I do this:
SELECT
COUNT(*) AS count,
strftime('%d-%m-%Y',date) AS day
FROM barcodes
WHERE date LIKE '2017-02-10%'
GROUP BY day
ORDER BY date ASC;
I get:
428|10-02-2017
151|
So my question is: why is SQLite providing the result as two lines when I use strftime()?
%d-%m-%Y is not one of SQLite's supported date formats. Rows whose stored date text cannot be parsed make the built-in date functions return NULL, and all of those NULLs are grouped together into the extra, day-less row you are seeing; string comparisons such as date >= datetime('now', '-1 month') do not work correctly against such values either.
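A hedged rewrite of the query, grouping on a day string in the supported ISO order (this assumes date is stored as ISO text such as YYYY-MM-DD hh:mm:ss):

```sql
SELECT
  COUNT(*) AS count,
  strftime('%Y-%m-%d', date) AS day   -- ISO order, sorts and compares correctly
FROM barcodes
WHERE date >= datetime('now', '-1 month')
GROUP BY day
ORDER BY day ASC;

-- inspect the rows that fell into the day-less NULL group
SELECT date FROM barcodes
WHERE strftime('%Y-%m-%d', date) IS NULL
LIMIT 10;
```

The second statement shows which stored values SQLite cannot parse, which is usually the quickest way to find the rows producing the stray count.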

BigQuery & Firebase - how to determine weekly growth

I'm trying to build a dashboard for following key metrics of my Android application. To do so, I am using Firebase analytics backed by BigQuery.
I'm trying to get weekly growth of first_open event count and ratio for
the current week
the previous week
the best week ever
I'm able to get the current-week and previous-week first_open event counts as separate queries in BigQuery (the following, for example, is the previous-week query):
SELECT
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL CAST ( FORMAT_DATE("%u", CURRENT_DATE()) as INT64 ) + 6 DAY)) AS previousMonday,
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL CAST ( FORMAT_DATE("%u", CURRENT_DATE()) as INT64 ) DAY)) AS previousSunday,
COUNT(*) as counter,
h.name as event
FROM `com_package_app_ANDROID.app_events_*`, UNNEST(event_dim) as h
WHERE _TABLE_SUFFIX
BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL CAST ( FORMAT_DATE("%u", CURRENT_DATE()) as INT64 ) + 6 DAY))
AND
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL CAST ( FORMAT_DATE("%u", CURRENT_DATE()) as INT64 ) DAY))
AND
h.name='first_open'
GROUP BY event
ORDER BY counter DESC
But I'm unable to combine the two queries into a ratio (current week vs. previous week), and I'm also unable to get the best-week-ever first_open count.
To get the "best week ever", you can group the events by week and order by count. Something like this...
SELECT
DATE_TRUNC(event_date, WEEK) AS week,
COUNT(*) AS count
FROM BQ_TABLE
WHERE BQ_TABLE.name = 'first_open'
GROUP BY week
ORDER BY count DESC
To get the current/previous week, you can turn the above query into a subquery and filter it by your target weeks.
Note that the "best week ever" query will always be a full table scan and could get expensive depending on the number of events and how frequently you need to perform the query.
