I am trying to plot DAU/MAU in Google Data Studio, but whenever I try to create a formula it says "invalid formula".
Additionally, if I try to insert a scorecard with DAU, it always fails.
Please advise.
I am afraid you may be running into the fact that GA data is not presented to GDS as a single data set containing every column, but as separate purpose-specific subsets that can't all be joined together. What you could do (as recommended by my son, who does this on a daily basis) is use the GA plugin in Google Sheets, use formulas there to shape the data the way you want it, and then point GDS at your Google Sheet.
To create a scorecard for daily or monthly actives:
Click the pencil next to the metric and make sure it's set to Sum.
Set the default date range to Custom and set the interval to Yesterday.
You might also find this article interesting. It shows how to use Google Sheets to combine the DAU / MAU stats.
Once Firebase, BigQuery, and Data Studio are linked, you can run the query below from Data Studio:
SELECT
(
SELECT count(distinct user_pseudo_id) as count
FROM `projectName.events_*` AS A
WHERE A._TABLE_SUFFIX = CONCAT( SUBSTR(CAST(DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 DAY) AS string), 0 , 4),
SUBSTR(CAST(DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 DAY) AS string), 6 , 2),
SUBSTR(CAST(DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 DAY) AS string), 9 , 2))
AND PARSE_DATE('%Y%m%d', event_date) = DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 DAY)
)AS DAU,
(
SELECT count(distinct user_pseudo_id) as count
FROM `projectName.events_*` AS A
WHERE A._TABLE_SUFFIX BETWEEN CONCAT( SUBSTR(CAST(DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 WEEK) AS string), 0 , 4),
SUBSTR(CAST(DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 WEEK) AS string), 6 , 2),
SUBSTR(CAST(DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 WEEK) AS string), 9 , 2))
AND CONCAT( SUBSTR(CAST(DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 DAY) AS string), 0 , 4),
SUBSTR(CAST(DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 DAY) AS string), 6 , 2),
SUBSTR(CAST(DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 DAY) AS string), 9 , 2))
AND PARSE_DATE('%Y%m%d', event_date) BETWEEN DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 WEEK)
AND DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 DAY)
)AS WAU,
(
SELECT count(distinct user_pseudo_id) as count
FROM `projectName.events_*` AS A
WHERE A._TABLE_SUFFIX BETWEEN CONCAT( SUBSTR(CAST(DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 MONTH) AS string), 0 , 4),
SUBSTR(CAST(DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 MONTH) AS string), 6 , 2),
SUBSTR(CAST(DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 MONTH) AS string), 9 , 2))
AND CONCAT( SUBSTR(CAST(DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 DAY) AS string), 0 , 4),
SUBSTR(CAST(DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 DAY) AS string), 6 , 2),
SUBSTR(CAST(DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 DAY) AS string), 9 , 2))
AND PARSE_DATE('%Y%m%d', event_date) BETWEEN DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 MONTH)
AND DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 1 DAY)
)AS MAU
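To make the three windows in the query above easier to check, here is a minimal Python sketch of the same counting logic on an in-memory list. It is not BigQuery code: the function names and sample events are made up for illustration, and the month window is approximated as 30 days (an assumption; the query uses INTERVAL 1 MONTH).

```python
from datetime import date, timedelta


def distinct_users(events, start, end):
    """Count distinct user ids with at least one event in [start, end]."""
    return len({user for day, user in events if start <= day <= end})


def active_users(events, end_date):
    """Return DAU/WAU/MAU for windows that end the day before end_date."""
    last = end_date - timedelta(days=1)
    return {
        "DAU": distinct_users(events, last, last),
        "WAU": distinct_users(events, end_date - timedelta(days=7), last),
        # Assumption: a 30-day window stands in for INTERVAL 1 MONTH.
        "MAU": distinct_users(events, end_date - timedelta(days=30), last),
    }


# Hypothetical sample data: (event_date, user_pseudo_id)
events = [
    (date(2019, 6, 20), "u1"),
    (date(2019, 6, 20), "u2"),
    (date(2019, 6, 15), "u3"),
    (date(2019, 5, 25), "u4"),
    (date(2019, 6, 20), "u1"),  # repeat event, counted once
]

stats = active_users(events, date(2019, 6, 21))
stickiness = stats["DAU"] / stats["MAU"]  # the DAU/MAU ratio
print(stats, stickiness)
```

The DAU/MAU ratio is then just one count divided by the other, which is the number the original question wanted to chart.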
My calendar year runs from 07-01-(of one year) to 06-30-(of the next year).
My SQLite DB has a Timestamp column whose data type is datetime; it stores timestamps as 2023-09-01 00:00:00.
What I'm trying to do is get the MAX date of the latest snowfall. For example, with my seasonal years beginning July 1 (earliest) and ending June 30 (latest), I want to find only the latest (MAX) date on which snowfall was recorded, regardless of the year, based on the month.
Say the database holds five years of data (2017 to 2022) and it snowed on Mar 15, 2020. If no later month and day appears in any other year, then Mar 15 is the latest date, regardless of which year it fell in.
I've been trying many variations of the query below. It runs without errors but returns NULL values. I'm using DB Browser for SQLite to write and test the query.
SELECT Timestamp, MAX(strftime('%m-%d-%Y', Timestamp)) AS lastDate,
snowDepth AS lastDepth FROM DiaryData
WHERE lastDepth <> 0 BETWEEN strftime('%Y-%m-%d', Timestamp,'start of year', '+7 months')
AND strftime('%Y-%m-%d', Timestamp, 'start of year', '+1 year', '+7 months', '- 1 day')
ORDER BY lastDate LIMIT 1
and this is what's in my test database:
Timestamp snowFalling snowLaying snowDepth
2021-11-10 00:00:00 0 0 7.2
2022-09-15 00:00:00 0 0 9.5
2022-12-01 00:00:00 1 0 2.15
2022-10-13 00:00:00 1 0 0.0
2022-05-19 00:00:00 0 0 8.82
2023-01-11 00:00:00 0 0 3.77
If it's running properly I should expect:
Timestamp lastDate lastDepth
2022-05-19 00:00:00 05-19-2022 8.82
What am I missing, or is this not possible in SQLite? Any help would be appreciated.
Use aggregation by fiscal year utilizing SQLite's feature of bare columns:
SELECT Timestamp,
strftime('%m-%d-%Y', MAX(Timestamp)) AS lastDate,
snowDepth AS lastDepth
FROM DiaryData
WHERE snowDepth <> 0
GROUP BY strftime('%Y', Timestamp, '+6 months');
See the demo.
I'd first get the season for each record, then the snowfall date relative to that record's season start date, and finally the largest such relative date:
with
data as (
select
*
, case
when cast(strftime('%m', "Timestamp") as int) < 7
then strftime('%Y-%m-%d', "Timestamp", 'start of year', '-1 year', '+6 months')
else strftime('%Y-%m-%d', "Timestamp", 'start of year', '+6 months')
end as "Season start date"
from DiaryData
where 1==1
and "snowDepth" <> 0.0
)
, data2 as (
select
*
, julianday("Timestamp") - julianday("Season start date")
as "Snowfall date relative to season start date"
from data
)
, data3 as (
select
"Timestamp"
, "snowFalling"
, "snowLaying"
, "snowDepth"
from data2
group by null
having max("Snowfall date relative to season start date")
)
select
*
from data3
demo
You can use the ROW_NUMBER window function to address this problem, with one subtle tweak. To account for fiscal years, partition on the year of each timestamp shifted 6 months forward. In this way, all timestamps in a season such as [2021-07-01, 2022-06-30] land in the same shifted calendar year (2022).
WITH cte AS (
SELECT *, ROW_NUMBER() OVER(
PARTITION BY STRFTIME('%Y', DATE(Timestamp_, '+6 months'))
ORDER BY Timestamp_ DESC ) AS rn
FROM tab
)
SELECT Timestamp_,
STRFTIME('%d-%m-%Y', Timestamp_) AS lastDate,
snowDepth AS lastDepth
FROM cte
WHERE rn = 1
Check the demo here.
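For the single "latest in the season, regardless of year" row that the question asks for, another option is to sort by each date's position within the July-to-June season and keep the top row. The sketch below is only an illustration run against the question's test data through Python's sqlite3 module; the sort key (months counted from July, times 100, plus the day of month) is my own assumption and not taken from any of the answers above.

```python
import sqlite3

# Test rows copied from the question: Timestamp, snowFalling, snowLaying, snowDepth
rows = [
    ("2021-11-10 00:00:00", 0, 0, 7.2),
    ("2022-09-15 00:00:00", 0, 0, 9.5),
    ("2022-12-01 00:00:00", 1, 0, 2.15),
    ("2022-10-13 00:00:00", 1, 0, 0.0),
    ("2022-05-19 00:00:00", 0, 0, 8.82),
    ("2023-01-11 00:00:00", 0, 0, 3.77),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DiaryData ("
    "Timestamp TEXT, snowFalling INTEGER, snowLaying INTEGER, snowDepth REAL)"
)
conn.executemany("INSERT INTO DiaryData VALUES (?, ?, ?, ?)", rows)

# (month + 5) % 12 maps July -> 0, August -> 1, ..., June -> 11,
# so the largest key is the date closest to the end of the season.
query = """
SELECT Timestamp,
       strftime('%m-%d-%Y', Timestamp) AS lastDate,
       snowDepth AS lastDepth
FROM DiaryData
WHERE snowDepth <> 0
ORDER BY ((CAST(strftime('%m', Timestamp) AS INTEGER) + 5) % 12) * 100
         + CAST(strftime('%d', Timestamp) AS INTEGER) DESC
LIMIT 1
"""
latest = conn.execute(query).fetchone()
print(latest)
```

Because the key ignores the year, the May 19 row wins over every autumn and winter row, which matches the expected output in the question.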
I want to build a chart that shows the active users in Firebase.
I wrote this query:
SELECT event_date, COUNT(DISTINCT user_pseudo_id) AS user_count
FROM `mark-3314e.analytics_197261162.events_*`
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
AND event_name = 'session_start'
GROUP BY event_date
ORDER BY event_date ASC
And this is the response:
Row event_date user_count
1 20190617 1
2 20190621 3
Is there any way to fill the missing dates between the 17th and the 21st with the previous value? For example:
event_date user_count
20190617 1
20190618 1
20190619 1
20190620 1
20190621 3
You may join with a calendar table which contains the full date range of interest:
WITH dates AS (
SELECT '20190617' AS dt UNION ALL
SELECT '20190618' UNION ALL
SELECT '20190619' UNION ALL
SELECT '20190620' UNION ALL
SELECT '20190621'
)
SELECT
t1.dt AS event_date,
COUNT(DISTINCT t2.user_pseudo_id) AS user_count
FROM dates t1
LEFT JOIN `mark-3314e.analytics_197261162.events_*` t2
ON t1.dt = t2.event_date AND
t2._TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
AND t2.event_name = 'session_start'
GROUP BY
t1.dt
ORDER BY
t1.dt;
For a more general way to generate a date range in BigQuery, see this SO question.
Here is a possible solution using the GENERATE_DATE_ARRAY function in BigQuery.
with data as (
SELECT parse_date('%Y%m%d', event_date) AS event_date, COUNT(DISTINCT user_pseudo_id) AS user_count
FROM `mark-3314e.analytics_197261162.events_*`
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
AND event_name = 'session_start'
GROUP BY event_date
ORDER BY event_date ASC
)
select dt as event_date, user_count from (
select user_count,
if(
previousdate is null,
generate_date_array(date, date_sub(nextdate, interval 1 day), interval 1 day),
generate_date_array(date, if(nextdate is null, date, date_sub(nextdate, interval 1 day)), interval 1 day)
) as dates
from (
select
lag(event_date) over(order by event_date) as previousdate,
event_date as date,
lead(event_date) over(order by event_date) as nextdate,
user_count
from data
)
), unnest(dates) dt
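The fill rule both answers implement (every missing day takes the count of the most recent day that has data) can be sketched in a few lines of Python. This is just an illustration of the logic on the question's two sample rows, not something to run in BigQuery, and the function name is made up.

```python
from datetime import date, timedelta


def fill_forward(counts):
    """Expand {date: user_count} to one row per day, carrying the last count forward."""
    if not counts:
        return []
    filled = []
    day, last_day = min(counts), max(counts)
    previous = None
    while day <= last_day:
        # Use the day's own count if present, otherwise the last one seen.
        previous = counts.get(day, previous)
        filled.append((day, previous))
        day += timedelta(days=1)
    return filled


# The two rows returned by the query in the question
observed = {date(2019, 6, 17): 1, date(2019, 6, 21): 3}
filled = fill_forward(observed)
for day, user_count in filled:
    print(day.strftime("%Y%m%d"), user_count)
```

On the sample rows this produces one line per day from 20190617 to 20190621, with the count 1 repeated until the 21st.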
I'm trying to set up rolling 7-day users and rolling 31-day users in BigQuery (with Firebase) using the following query. For each day, I want it to look back over the previous 31 days as well as the previous 7 days. I've been stuck, getting the message:
LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
The query:
With events AS (
SELECT PARSE_DATE("%Y%m%d", event_date) as event_date, user_pseudo_id FROM `my_data_table.analytics_178206500.events_*`
Where _table_suffix NOT LIKE "i%" AND event_name = "user_engagement"
GROUP BY 1, 2
),
DAU AS (
SELECT event_date as date, COUNT(DISTINCT(user_pseudo_id)) AS dau
From events
GROUP BY 1
)
SELECT DAU.date, DAU.dau,
(
SELECT count(distinct(user_pseudo_id))
FROM events
WHERE events.event_date BETWEEN DATE_SUB(DAU.date, INTERVAL 29 DAY) and dau.date
) as mau,
(
SELECT count(distinct(user_pseudo_id))
FROM events
WHERE events.event_date BETWEEN DATE_SUB(DAU.date, INTERVAL 7 DAY) and dau.date
) as wau
FROM DAU
ORDER BY 1 DESC
I'm able to get the DAU part, but the last 7 day users (WAU) and last 31 day users (MAU) aren't coming through. I have tried to CROSS JOIN DAU with events, but I get the following results: GraphResults
Any pointers would be greatly appreciated.
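To pin down what the rolling counts should be for each day, here is a minimal Python sketch of the intended logic. It is not a fix for the BigQuery error: the function name and sample events are hypothetical, and each window is taken as exactly window_days calendar days ending on the day itself. Note that BETWEEN DATE_SUB(d, INTERVAL 7 DAY) AND d in the query spans eight calendar days, not seven.

```python
from datetime import date, timedelta


def rolling_active_users(events, window_days):
    """Map each active day to the distinct users seen in the window ending on it."""
    users_by_day = {}
    for day, user in events:
        users_by_day.setdefault(day, set()).add(user)

    result = {}
    for day in sorted(users_by_day):
        window = set()
        # Union the user sets of the day itself and the preceding days.
        for offset in range(window_days):
            window |= users_by_day.get(day - timedelta(days=offset), set())
        result[day] = len(window)
    return result


# Hypothetical sample data: (event_date, user_pseudo_id)
events = [
    (date(2019, 1, 1), "a"),
    (date(2019, 1, 3), "b"),
    (date(2019, 1, 8), "a"),
    (date(2019, 1, 8), "c"),
    (date(2019, 1, 10), "c"),
]

wau = rolling_active_users(events, 7)
mau = rolling_active_users(events, 30)
print(wau[date(2019, 1, 10)], mau[date(2019, 1, 10)])
```

Having reference numbers like these for a small sample makes it easier to verify whatever join or window construct ends up being used in BigQuery.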
I wrote the query below against our mobile app's data. Because of our large user base, I get a 400 request error "Resources exceeded during query execution: The query could not be executed in the allotted memory" when I add the ORDER BY at the bottom.
Question: Is there anything I can do to optimize the query while still keeping the ORDER BY at the bottom?
The query below is written against Firebase's demo dataset, but I think that dataset is too small to reproduce the problem (my own dataset has 5-10 million records).
SELECT
f.user_pseudo_id,
f.event_timestamp,
DATE(TIMESTAMP_MICROS(f.event_timestamp)) as event_timestamp_date,
f.event_name,
f.user_first_touch_timestamp,
DATE(TIMESTAMP_MICROS(f.user_first_touch_timestamp)) as user_first_touch_date,
CASE WHEN r.has_appRemove >= 1 THEN "removed" ELSE "not-removed" END AS status_after_first7days
FROM `firebase-analytics-sample-data.ios_dataset.app_events_*` f
LEFT JOIN (
SELECT user_pseudo_id, 1 has_appRemove
FROM `firebase-analytics-sample-data.ios_dataset.app_events_*`
WHERE DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) >= DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY)
AND DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) < DATE_SUB(CURRENT_DATE(), INTERVAL 9 DAY)
AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY))
AND _TABLE_SUFFIX < FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
AND platform = "ANDROID"
AND event_name = "app_remove"
GROUP BY user_pseudo_id
) r on f.user_pseudo_id = r.user_pseudo_id
WHERE
DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) >= DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY)
AND DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) < DATE_SUB(CURRENT_DATE(), INTERVAL 9 DAY)
AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY))
AND _TABLE_SUFFIX < FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
AND platform = "ANDROID"
ORDER BY 1,2 ASC
You can apply a window (analytic) function instead of joining, as in the example below (not tested):
#standardSQL
SELECT
user_pseudo_id,
event_timestamp,
DATE(TIMESTAMP_MICROS(event_timestamp)) AS event_timestamp_date,
event_name,
user_first_touch_timestamp,
DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) AS user_first_touch_date,
COUNTIF(event_name = "app_remove") OVER(PARTITION BY user_pseudo_id) > 0 isRemoved
FROM `firebase-analytics-sample-data.ios_dataset.app_events_*`
WHERE
DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) >= DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY)
AND DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) < DATE_SUB(CURRENT_DATE(), INTERVAL 9 DAY)
AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 10 DAY))
AND _TABLE_SUFFIX < FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
AND platform = "ANDROID"
ORDER BY 1,2 ASC
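To show what COUNTIF(...) OVER (PARTITION BY user_pseudo_id) > 0 computes, here is a small Python sketch of the same idea: every row of a user gets the same flag, set when that user has at least one app_remove event, and no self-join is needed. The function name and sample rows are hypothetical, and this says nothing about BigQuery memory usage.

```python
def flag_removed(events):
    """Add isRemoved to every event: True if that user has any app_remove event."""
    removed_users = {
        e["user_pseudo_id"] for e in events if e["event_name"] == "app_remove"
    }
    flagged = [
        dict(e, isRemoved=e["user_pseudo_id"] in removed_users) for e in events
    ]
    # Same ordering as ORDER BY 1,2 in the query: user id, then timestamp.
    return sorted(flagged, key=lambda e: (e["user_pseudo_id"], e["event_timestamp"]))


# Hypothetical sample rows
events = [
    {"user_pseudo_id": "u1", "event_timestamp": 2, "event_name": "session_start"},
    {"user_pseudo_id": "u1", "event_timestamp": 1, "event_name": "first_open"},
    {"user_pseudo_id": "u2", "event_timestamp": 1, "event_name": "first_open"},
    {"user_pseudo_id": "u2", "event_timestamp": 3, "event_name": "app_remove"},
]

flagged = flag_removed(events)
for row in flagged:
    print(row["user_pseudo_id"], row["event_timestamp"], row["isRemoved"])
```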
The Problem: Given a day of the week (1, 2, 3, 4, 5, 6, 7), a starting date and an ending date, compute the number of times the given day of the week appears between the starting and ending dates, excluding any date on which there were no sales.
Context:
Table "Ticket" has the following structure and sample content:
i_ticket_id c_items_total dt_create_time dt_close_time
----------------------------------------------------------------------------
1 8.50 '10/1/2012 10:23:00' '10/1/2012 11:05:05'
2 10.50 '10/1/2012 11:00:00' '10/1/2012 11:45:05'
3 8.50 '10/2/2012 08:00:00' '10/2/2012 09:25:05'
4 8.50 '10/4/2012 08:00:00' '10/4/2012 09:25:05'
5 7.50 '10/5/2012 13:22:23' '10/5/2012 14:33:27'
.
.
233 6.75 '10/31/2012 23:20:00' '10/31/2012 23:55:39'
Details
There may or may not be any tickets for one or more days during a month. (i.e. the place was closed that/those day/s)
Days in which the business is closed are not regular. There is no predictable pattern.
Based on Get number of weekdays (Sundays, Mondays, Tuesdays) between two dates SQL,
I have derived a query which returns the number of times a given day of the week occurs between the start date and the end date:
DECLARE @dtStart DATETIME = '10/1/2013 04:00:00'
DECLARE @dtEnd DATETIME = '11/1/2013 03:59:00'
DECLARE @day_number INTEGER = 1
DECLARE @numdays INTEGER
SET @numdays = (SELECT 1 + DATEDIFF(wk, @dtStart, @dtEnd)-
CASE WHEN DATEPART(weekday, @dtStart) > @day_number THEN 1 ELSE 0 END -
CASE WHEN DATEPART(weekday, @dtEnd) <= @day_number THEN 1 ELSE 0 END)
Now I just need to filter this so that any zero-dollar days are not included in the count. Any help you can provide to add this filter based on the contents of the tickets table is greatly appreciated!
If I understand correctly, you can use a calendar table to count the days that fall on weekday n, lie between the start and end dates, and have ticket sales, which I take to mean that the date exists in tickets with sum(c_items_total) > 0:
WITH cal AS
(
SELECT cast('2012-01-01' AS DATE) dt, datepart(weekday, '2012-01-01') dow
UNION ALL
SELECT dateadd(day, 1, dt), datepart(weekday, dateadd(day, 1, dt))
FROM cal
WHERE dt < getdate()
)
SELECT COUNT(1)
FROM cal
WHERE dow = 5
AND dt BETWEEN '2012-04-01' AND '2012-12-31'
AND EXISTS (
SELECT 1
FROM tickets
WHERE cast(dt_create_time AS DATE) = dt
GROUP BY cast(dt_create_time AS DATE)
HAVING sum(c_items_total) > 0
)
OPTION (MAXRECURSION 0)
SQLFiddle
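The counting rule itself (dates in range, on the requested weekday, with a positive sales total) can be checked with a short Python sketch. The function name and sample tickets are hypothetical, and weekdays use ISO numbering (1 = Monday through 7 = Sunday), which is an assumption: in SQL Server the numbers returned by DATEPART(weekday, ...) depend on the DATEFIRST setting.

```python
from datetime import date, datetime


def count_sales_weekdays(tickets, weekday, start, end):
    """Count dates in [start, end] on the given ISO weekday with total sales > 0."""
    totals = {}
    for created_at, items_total in tickets:
        day = created_at.date()
        totals[day] = totals.get(day, 0) + items_total
    return sum(
        1
        for day, total in totals.items()
        if start <= day <= end and total > 0 and day.isoweekday() == weekday
    )


# Hypothetical tickets: (dt_create_time, c_items_total)
tickets = [
    (datetime(2012, 10, 1, 10, 23), 8.50),   # Monday
    (datetime(2012, 10, 1, 11, 0), 10.50),   # same Monday, counted once
    (datetime(2012, 10, 2, 8, 0), 8.50),     # Tuesday
    (datetime(2012, 10, 4, 8, 0), 8.50),     # Thursday
    (datetime(2012, 10, 8, 9, 0), 0.0),      # Monday with zero sales, excluded
    (datetime(2012, 10, 15, 9, 0), 5.00),    # Monday
]

mondays = count_sales_weekdays(tickets, 1, date(2012, 10, 1), date(2012, 10, 31))
print(mondays)
```

Days with no tickets at all never enter the totals, so closed days are skipped the same way as zero-dollar days.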