I recently found out that visitNumber in the BigQuery export of Google Analytics starts over at 1 if a user has not visited the site for 183 days or more. I am now trying to understand whether the same lookback window is applied when Google Analytics defines new users.
The result of SUM(totals.newVisits) in BigQuery is exactly the same as the new-user count reported in the Google Analytics Audience report, for a day in my exported data that has users marked as new visitors even though they have visited our site earlier. I therefore conclude that Google Analytics uses the same lookback window.
I found that in order to count new users depending on their actual first visit (cookie creation date) it's possible to use the last part of the client id. As an example, this query finds the number of new users for 20181025:
#StandardSQL
SELECT SUM(CASE WHEN cookie_date = '2018-10-25' THEN 1 ELSE 0 END) AS new_visitors,
count(*) AS all_visitors
FROM (SELECT clientId,
DATE(TIMESTAMP_ADD("1970-01-01 00:00:00 UTC", INTERVAL MIN(CAST(REGEXP_EXTRACT(clientId, r'[0-9]*$') AS INT64)) SECOND), "Europe/Berlin") AS cookie_date
FROM `xxx.ga_sessions_20181025`
GROUP BY clientId)
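The idea behind this query can also be checked outside BigQuery. The following is a minimal Python sketch, assuming the clientId has the form "<random number>.<Unix timestamp in seconds>". The function names cookie_date and count_new_visitors and the sample IDs are made up for illustration, and the sketch defaults to UTC rather than Europe/Berlin:

```python
import re
from datetime import date, datetime, timezone, tzinfo


def cookie_date(client_id: str, tz: tzinfo = timezone.utc) -> date:
    """Return the calendar date encoded in the trailing digits of a clientId."""
    match = re.search(r"[0-9]+$", client_id)
    if match is None:
        raise ValueError(f"no trailing timestamp in clientId: {client_id!r}")
    # Assumption: the trailing digits are seconds since the Unix epoch.
    return datetime.fromtimestamp(int(match.group()), tz=tz).date()


def count_new_visitors(client_ids, day: date, tz: tzinfo = timezone.utc):
    """Return (new_visitors, all_visitors) over the distinct clientIds."""
    distinct_ids = set(client_ids)
    new_visitors = sum(1 for cid in distinct_ids if cookie_date(cid, tz) == day)
    return new_visitors, len(distinct_ids)


# Hypothetical clientIds; the second one appears in two sessions.
sample_ids = [
    "1111111111.1540461600",  # cookie created on 2018-10-25 (UTC)
    "2222222222.1500000000",  # cookie created on 2017-07-14 (UTC)
    "2222222222.1500000000",
]
print(count_new_visitors(sample_ids, date(2018, 10, 25)))
```

To mirror the time zone used in the query, a zoneinfo.ZoneInfo("Europe/Berlin") object (Python 3.9+) could be passed as tz.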
I'm trying to count the correct number of Google Ads clicks in the Google Analytics BigQuery table, but the numbers differ from those in the Google Analytics web interface.
I tried two different approaches. The first counts only the entrance hits of each visit, grouped by channelGrouping:
SELECT
channelGrouping,
count(*) as clicks,
FROM `ga_sessions_table`, UNNEST(hits) as hits
WHERE hits.isEntrance
GROUP BY channelGrouping
Or counting every PAGE hit instead:
SELECT
channelGrouping,
count(*) as clicks,
FROM `ga_sessions_table`, UNNEST(hits) as hits
WHERE hits.type = 'PAGE'
GROUP BY channelGrouping
I compare one specific day in our data.
With the first query I get around 10% fewer clicks than in the web UI.
With the second query I get more than 3 times as many clicks as in the web UI.
How do I correctly count the clicks with that table?
Google Analytics announced their new data retention policy today:
Google Analytics Data Retention
Along with the option for choosing how long data will be stored (I chose do not delete data, obviously), there is an option "Reset on new activity".
I cannot find any information about what this option does. What do they mean by 'activity'? If I don't switch it off, could the retention setting be reset and data deleted?
It means that the data retention counter is reset when the user visits the page again.
Imagine your retention period is set to 15 months.
The user visits once: the data is purged after 15 months.
The user visits, then visits a second time one month later: the second visit resets the counter, so the user's data is effectively retained for 16 months in total.
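The arithmetic in this example can be sketched in Python. This is only a simplified model of the behaviour described above, not Google's implementation: the helper names add_months and purge_date are hypothetical, and the branch for the option being switched off (counter runs from the first visit) is an assumption.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the month's length."""
    year, month_index = divmod(d.year * 12 + (d.month - 1) + months, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))


def purge_date(visits, retention_months: int, reset_on_new_activity: bool = True) -> date:
    """Date on which the user's data would be purged under this simple model."""
    if not visits:
        raise ValueError("at least one visit is required")
    # With the option on, every new visit restarts the retention counter.
    # Assumption: with the option off, the counter runs from the first visit.
    anchor = max(visits) if reset_on_new_activity else min(visits)
    return add_months(anchor, retention_months)


first_visit = date(2018, 1, 10)
second_visit = date(2018, 2, 10)

# One visit only: purged 15 months after that visit.
print(purge_date([first_visit], 15))
# Second visit one month later: 16 months after the first visit in total.
print(purge_date([first_visit, second_visit], 15))
```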
I am trying to get the total of totals.pageviews for people who go through the booking page on our website. Here is my query:
SELECT sum( totals.pageviews ) AS Searches,Date
FROM `table*`
WHERE exists (
select 1 from unnest(hits) as hits
where hits.page.pagePath ='booking'
)
and date='20161109'
GROUP BY DATE
But I got a far higher number than what Google Analytics reports.
BigQuery result: around 1M
GA: around 300,000
This is the GA page that I am trying to match: [image: GA result]
After looking a bit more into the Google Analytics data, I think that you actually want to count the entries in hits that match the condition directly instead of relying on totals.pageviews. The problem is that totals.pageviews represents the total number of pageviews within a particular session, which includes pages that don't match your filter. I think you want something like this instead:
SELECT
COUNT(*) AS Searches,
Date
FROM `table*`, UNNEST(hits) AS hit
WHERE hit.page.pagePath = 'booking'
GROUP BY Date;
This counts the matched pages directly, and will hopefully give the expected numbers.
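To see why the two approaches diverge, here is a small Python sketch over made-up session records. The nested layout only imitates the export's totals and hits fields, and the function names and sample data are hypothetical:

```python
def session_pageviews_total(sessions, path):
    """Sum totals.pageviews over sessions that contain at least one hit on path."""
    return sum(
        s["totals"]["pageviews"]
        for s in sessions
        if any(h["page"]["pagePath"] == path for h in s["hits"])
    )


def matching_hits(sessions, path, hit_type=None):
    """Count individual hits on path, optionally restricted to one hit type."""
    return sum(
        1
        for s in sessions
        for h in s["hits"]
        if h["page"]["pagePath"] == path and hit_type in (None, h["type"])
    )


def hit(hit_type, path):
    return {"type": hit_type, "page": {"pagePath": path}}


# Three invented sessions; totals.pageviews equals the number of PAGE hits.
sessions = [
    {"totals": {"pageviews": 3},
     "hits": [hit("PAGE", "/home"), hit("PAGE", "/booking"), hit("PAGE", "/confirm")]},
    {"totals": {"pageviews": 2},
     "hits": [hit("PAGE", "/home"), hit("PAGE", "/about")]},
    {"totals": {"pageviews": 4},
     "hits": [hit("PAGE", "/booking"), hit("EVENT", "/booking"),
              hit("PAGE", "/booking"), hit("PAGE", "/home"), hit("PAGE", "/home")]},
]

# Every pageview of any session that touched /booking (overcounts).
print(session_pageviews_total(sessions, "/booking"))
# Every hit on /booking, including non-pageview hits such as events.
print(matching_hits(sessions, "/booking"))
# Only pageview hits on /booking.
print(matching_hits(sessions, "/booking", hit_type="PAGE"))
```

The first number grows with every other page viewed in the same session, which is consistent with the much larger figure seen in the question.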
Try below
SELECT
date,
COUNT(*) AS Searches,
SUM(totals.pageviews) as PageViews
FROM `table*`, UNNEST(hits) AS hit
WHERE hit.page.pagePath = 'booking'
AND hit.hitNumber = 1
GROUP BY date
Searches - number of sessions started with booking page as an entry point to website;
PageViews - number of pageviews in those (above) sessions
I would like to have total(totals.pageview) for the booking page on the website, i.e. how many times the booking page has been viewed
First, total(totals.pageview) doesn't help in identifying what you really need, because you are assuming that the totals.pageviews field is the correct one to use, which it seems not to be, at least based on the rest of your wording.
Secondly, assuming that what you need is the count of pageviews of the booking page on the website, the only reasonable answer is below:
SELECT
date,
COUNT(1) AS BookingPageViews
FROM `table*`, UNNEST(hits) AS hit
WHERE hit.page.pagePath = 'booking'
GROUP BY date
Finally, if you are still getting numbers different from what you expect, you need to revisit what you are actually looking for. It might be that the number you see in GA represents a metric that is different from what you think it represents. This is the only explanation I can see.
I found a solution to this problem:
SELECT count(totals.pageviews) AS Searches,Date
FROM table, UNNEST(hits) as hits
WHERE hits.page.pagePath ='/booking' and hits.type='PAGE'
GROUP BY DATE
I hope this answer helps other people.
I'm querying page views by page from BigQuery. My query is:
SELECT hits.page.pagePath, COUNT(*) as pageViews FROM `bigquery-refresh.refresh.ga_sessions_2015*`,
UNNEST(hits) as hits
WHERE date >= '20150101' AND date < '20150701'
AND geoNetwork.country = "United States"
AND hits.type="PAGE"
GROUP BY hits.page.pagePath
ORDER BY pageViews DESC
I'm comparing this query to the total page views reported from within GA (for the same country and date range), and am finding that the total number of page views in GA is ~0.4% larger than in BigQuery. Is there a reason for this small discrepancy?
I'm not familiar with GA, but here are my random guesses:
(1) As Elliott pointed out, maybe GA includes some extra data
(2) Or maybe GA uses a different rule than count(*)
(3) I happen to know that AdWords will adjust the report data even several days later. Maybe GA has the same feature.
Are you sure you're counting the right thing?
The schema documentation says that each row in BQ corresponds to a session (not a hit, nor a pageview), so count(*) wouldn't be correct and would thus show a different number from GA's UI.
The schema also shows that for pageviews you have these totals:
totals.pageviews
totals.hits
So, every interaction with the page is a hit. Can you confirm that by using the totals.pageviews you get to the correct number?
I'm working with Firebase Analytics for my app, so the following question is in that context: does Firebase have a concept of a "retained user who did not open the app but had the app on the device"? If so, does it appear on the Firebase dashboard?
Also, how can I get a count of freshly installed users (new users) for each day?
All help is appreciated.
No, there is no way to track this in Firebase Analytics. When your users use your app, the Firebase SDK sends events to Firebase Analytics, which aggregates this data to generate reports.
This way it can extract active users, but there is no way to determine users who have the app installed but don't use it.
You can determine new users based on the "first_open" event. This event shows you how many users opened the app for the first time.
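As a rough illustration of counting new users per day from "first_open", here is a Python sketch over invented event records. It assumes each event carries event_name, a user_id, and an event_timestamp in microseconds since the Unix epoch; the function name new_users_per_day and the sample data are hypothetical, and days are bucketed in UTC:

```python
from collections import defaultdict
from datetime import date, datetime, timezone


def new_users_per_day(events):
    """Count distinct users that logged a first_open event on each UTC day."""
    users_by_day = defaultdict(set)
    for event in events:
        if event["event_name"] != "first_open":
            continue
        # Assumption: event_timestamp is in microseconds since the Unix epoch.
        seconds = event["event_timestamp"] // 1_000_000
        day = datetime.fromtimestamp(seconds, tz=timezone.utc).date()
        users_by_day[day].add(event["user_id"])
    return {day: len(users) for day, users in sorted(users_by_day.items())}


# Invented events: two installs on 2018-10-25 and one on 2018-10-26.
events = [
    {"event_name": "first_open", "user_id": "u1", "event_timestamp": 1540461600000000},
    {"event_name": "first_open", "user_id": "u2", "event_timestamp": 1540465200000000},
    {"event_name": "user_engagement", "user_id": "u1", "event_timestamp": 1540468800000000},
    {"event_name": "first_open", "user_id": "u3", "event_timestamp": 1540548000000000},
]
print(new_users_per_day(events))
```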
It is possible to compute N-day inactive users in BigQuery after linking Firebase with BigQuery (Source):
-- N-Day Inactive Users = users in the last M days who have not logged a user_engagement event in the last N days where M > N.
SELECT
COUNT(DISTINCT M_days.user_id)
FROM (
SELECT
user_id
FROM
/* PLEASE REPLACE WITH YOUR TABLE NAME */
`YOUR_TABLE.events_*`
WHERE
event_name = 'user_engagement'
/* Has engaged in last M = 7 days */
AND event_timestamp > UNIX_MICROS(TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 7 DAY))
/* PLEASE REPLACE WITH YOUR DESIRED DATE RANGE */
AND _TABLE_SUFFIX BETWEEN '20180521' AND '20240131') AS M_days
/* EXCEPT ALL is not yet implemented in BigQuery. Use LEFT JOIN in the interim.*/
LEFT JOIN (
SELECT
user_id
FROM
/* PLEASE REPLACE WITH YOUR TABLE NAME */
`YOUR_TABLE.events_*`
WHERE
event_name = 'user_engagement'
/* Has engaged in last N = 2 days */
AND event_timestamp > UNIX_MICROS(TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 2 DAY))
/* PLEASE REPLACE WITH YOUR DESIRED DATE RANGE */
AND _TABLE_SUFFIX BETWEEN '20180521' AND '20240131') AS N_days
ON
M_days.user_id = N_days.user_id
WHERE
N_days.user_id IS NULL
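The LEFT JOIN ... IS NULL pattern in the query above is an anti-join, which amounts to a set difference. A minimal Python sketch of the same logic, using invented (user_id, event_name, timestamp) tuples and a hypothetical function name n_day_inactive_users:

```python
from datetime import datetime, timedelta, timezone


def n_day_inactive_users(events, now, m_days=7, n_days=2):
    """Users engaged in the last m_days but not in the last n_days (m_days > n_days)."""
    if m_days <= n_days:
        raise ValueError("m_days must be greater than n_days")
    m_cutoff = now - timedelta(days=m_days)
    n_cutoff = now - timedelta(days=n_days)
    engaged_in_m = set()
    engaged_in_n = set()
    for user_id, event_name, timestamp in events:
        if event_name != "user_engagement":
            continue
        if timestamp > m_cutoff:
            engaged_in_m.add(user_id)
        if timestamp > n_cutoff:
            engaged_in_n.add(user_id)
    # Anti-join: keep users from the M-day set that are absent from the N-day set.
    return engaged_in_m - engaged_in_n


now = datetime(2018, 10, 25, 12, 0, tzinfo=timezone.utc)
events = [
    ("a", "user_engagement", now - timedelta(days=5)),   # engaged, then went quiet
    ("b", "user_engagement", now - timedelta(days=5)),
    ("b", "user_engagement", now - timedelta(days=1)),   # still active
    ("c", "user_engagement", now - timedelta(days=10)),  # outside the M-day window
    ("d", "first_open", now - timedelta(days=3)),        # not an engagement event
]
print(n_day_inactive_users(events, now))
```

The query reports COUNT(DISTINCT ...) of this set, so len() of the returned set corresponds to its result.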