I'm trying to count the number of Google Ads clicks in the Google Analytics BigQuery export table, but the numbers differ from those in the Google Analytics web interface.
I tried two different approaches. The first counts only the entrance hits of each visit, grouped by channelGrouping:
SELECT
channelGrouping,
count(*) as clicks,
FROM `ga_sessions_table`, UNNEST(hits) as hits
WHERE hits.isEntrance
GROUP BY channelGrouping
The second counts each PAGE hit:
SELECT
channelGrouping,
count(*) as clicks,
FROM `ga_sessions_table`, UNNEST(hits) as hits
WHERE hits.type = 'PAGE'
GROUP BY channelGrouping
I compared one specific day in our data.
With the first query I get about 10% fewer clicks than in the web UI.
With the second query I get more than three times as many clicks as the web UI shows.
How do I correctly count the clicks with that table?
Related
I recently found out that visitNumber in the BigQuery Google Analytics export starts over at 1 if a user has not visited the site in 183 days or more. I am now trying to understand whether the same lookback window is applied when Google Analytics defines new users.
The result of SUM(totals.newVisits) in BigQuery is exactly the same as the new user count reported in the Google Analytics Audience report for a day in my exported data that has users marked as new visitors even though they have visited our site earlier. I therefore conclude that Google Analytics also uses the same lookback window.
I found that in order to count new users depending on their actual first visit (cookie creation date) it's possible to use the last part of the client id. As an example, this query finds the number of new users for 20181025:
#StandardSQL
SELECT SUM(CASE WHEN cookie_date = '2018-10-25' THEN 1 ELSE 0 END) AS new_visitors,
count(*) AS all_visitors
FROM (SELECT clientId,
DATE(TIMESTAMP_ADD("1970-01-01 00:00:00 UTC", INTERVAL min(CAST(REGEXP_EXTRACT(clientId, r'[0-9]*$') AS INT64)) SECOND), "Europe/Berlin") as cookie_date
FROM `xxx.ga_sessions_20181025`
GROUP BY clientId)
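To check the clientId arithmetic outside BigQuery, here is a minimal Python sketch of the same idea. It assumes the clientId has the common `<random>.<unix seconds>` shape; the function name `cookie_date` and the sample IDs are made up for illustration, and a fixed UTC+2 offset stands in for "Europe/Berlin" (which observed summer time on 2018-10-25).

```python
import re
from datetime import date, datetime, timedelta, timezone
from typing import Optional


def cookie_date(client_id: str, tz: timezone = timezone.utc) -> Optional[date]:
    """Return the cookie creation date encoded in the trailing digits of a clientId.

    Assumption: the trailing digits are a Unix timestamp in seconds.
    """
    match = re.search(r"[0-9]+$", client_id)
    if match is None:
        return None
    created = datetime.fromtimestamp(int(match.group()), tz=tz)
    return created.date()


# Assumption: a fixed offset approximating Europe/Berlin during summer time.
BERLIN_SUMMER = timezone(timedelta(hours=2))

# Hypothetical clientIds, not taken from any real export.
sample_ids = ["1234567890.1540422000", "987654321.1500000000"]
target = date(2018, 10, 25)

# Count the clients whose cookie was created on the target date.
new_visitors = sum(cookie_date(cid, BERLIN_SUMMER) == target for cid in sample_ids)
all_visitors = len(sample_ids)
```

The first sample ID falls on 2018-10-24 in UTC but on 2018-10-25 at UTC+2, which is why the time zone argument in the query matters for clients created around midnight.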
Using the GA UI I am able to view how many users viewed a specific "page view" that is being fired. Is it possible to find this data in BigQuery as well?
totals.pageviews
gives me the overall page views during a session, but my interest is in a specific page view.
From reading the export schema I know I can query for a specific page path, but there is no mention of a specific page view.
--Using Legacy SQL
Yes, it's certainly possible. Here's probably the most efficient way of doing so:
#standardSQL
SELECT
COUNT(1)
FROM `dataset.ga_sessions_tableid`
WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE REGEXP_CONTAINS(page.pagepath, r'/home/') AND type = 'PAGE')
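This example counts how many sessions included at least one pageview of a page whose path had '/home/' somewhere. Each row in the export is one session, so the result is a count of sessions rather than of distinct users; to count users, use COUNT(DISTINCT fullVisitorId) instead.
Notice this counts sessions and not total page views. If the same page was seen twice within a session it still counts as 1.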
This query gives the total page views:
#standardSQL
SELECT
SUM((SELECT COUNTIF(REGEXP_CONTAINS(page.pagepath, r'/home/')) FROM UNNEST(hits) WHERE type = 'PAGE'))
FROM `dataset.ga_sessions_tableid`
WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE REGEXP_CONTAINS(page.pagepath, r'/home/') AND type = 'PAGE')
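The difference between the two queries can be illustrated with a small in-memory sketch in Python. The rows below are hypothetical and heavily simplified (one dict per session, with only `fullVisitorId` and a flat list of hits); the helper name `matching_pageviews` is an assumption for illustration, not part of any API.

```python
import re

# Hypothetical sessions: visitor A has two sessions, visitor B has one.
sessions = [
    {"fullVisitorId": "A", "hits": [
        {"type": "PAGE", "pagePath": "/home/"},
        {"type": "PAGE", "pagePath": "/home/news"},
        {"type": "EVENT", "pagePath": "/home/"},  # not a pageview, ignored
    ]},
    {"fullVisitorId": "A", "hits": [
        {"type": "PAGE", "pagePath": "/home/"},
    ]},
    {"fullVisitorId": "B", "hits": [
        {"type": "PAGE", "pagePath": "/about"},
    ]},
]


def matching_pageviews(session, pattern=r"/home/"):
    """Return the PAGE hits of one session whose path contains the pattern."""
    return [
        hit for hit in session["hits"]
        if hit["type"] == "PAGE" and re.search(pattern, hit["pagePath"])
    ]


# Like COUNT(1) ... WHERE EXISTS(...): sessions with at least one match.
sessions_with_match = sum(1 for s in sessions if matching_pageviews(s))

# Like SUM((SELECT COUNTIF(...) ...)): every matching pageview.
total_pageviews = sum(len(matching_pageviews(s)) for s in sessions)

# Like COUNT(DISTINCT fullVisitorId) with the same filter: distinct visitors.
users_with_match = len(
    {s["fullVisitorId"] for s in sessions if matching_pageviews(s)}
)
```

On this toy data the three numbers are all different (2 sessions, 3 pageviews, 1 visitor), which is worth keeping in mind when comparing against a specific GA metric.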
To get the number of page views for a specific page, use this:
select hits.page.pagePath , count(*)
from [<insert data set name>]
where
hits.type='PAGE' and
hits.page.pagePath = '<insert page path>'
group by hits.page.pagePath
I am trying to get the sum of totals.pageviews for people who go through the booking page on our website. Here is my query.
SELECT sum( totals.pageviews ) AS Searches,Date
FROM `table*`
WHERE exists (
select 1 from unnest(hits) as hits
where hits.page.pagePath ='booking'
)
and date='20161109'
GROUP BY DATE
But I got far higher numbers than what I see in Google Analytics.
BigQuery result: around 1M
GA: around 300,000
This is the GA page that I am trying to match (screenshot: GA result).
After looking a bit more into the Google Analytics data, I think that you actually want to count the entries in hits that match the condition directly instead of relying on totals.pageviews. The problem is that totals.pageviews is the total number of pageviews within a particular session, which includes views of pages that don't match your filter. I think you want something like this instead:
SELECT
COUNT(*) AS Searches,
Date
FROM `table*`, UNNEST(hits) AS hit
WHERE hit.page.pagePath = 'booking'
GROUP BY Date;
This counts the matched pages directly, and will hopefully give the expected numbers.
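To see why summing totals.pageviews over the matching sessions overshoots, here is a small Python sketch on made-up data. The session layout (a `totals_pageviews` number plus a list of `(type, pagePath)` tuples) is a simplification invented for illustration, not the real export schema.

```python
# Hypothetical sessions; totals_pageviews equals the number of PAGE hits.
booking_sessions = [
    {"totals_pageviews": 4, "hits": [
        ("PAGE", "/"), ("PAGE", "booking"), ("PAGE", "/faq"), ("PAGE", "booking"),
    ]},
    {"totals_pageviews": 2, "hits": [
        ("PAGE", "/"), ("PAGE", "/faq"),
    ]},
    {"totals_pageviews": 3, "hits": [
        ("PAGE", "booking"), ("PAGE", "/"), ("PAGE", "/contact"),
    ]},
]


def is_booking_view(hit):
    """True for a pageview hit on the booking page."""
    hit_type, page_path = hit
    return hit_type == "PAGE" and page_path == "booking"


# Original approach: add up session totals for sessions that touched booking.
# This also counts the views of "/", "/faq" and "/contact" in those sessions.
sum_of_session_totals = sum(
    s["totals_pageviews"]
    for s in booking_sessions
    if any(is_booking_view(h) for h in s["hits"])
)

# Suggested approach: count only the hits that are booking pageviews.
booking_pageviews = sum(
    1 for s in booking_sessions for h in s["hits"] if is_booking_view(h)
)
```

Here the session-total approach gives 7 while only 3 hits are actual booking pageviews, the same kind of inflation described in the question.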
Try below
SELECT
date,
COUNT(*) AS Searches,
SUM(totals.pageviews) as PageViews
FROM `table*`, UNNEST(hits) AS hit
WHERE hit.page.pagePath = 'booking'
AND hit.hitNumber = 1
GROUP BY date
Searches - number of sessions that started with the booking page as the entry point to the website;
PageViews - number of pageviews in those sessions
I would like to have total(totals.pageview ) for the booking page on
the website. how many times that the booking page has been viewed
First, total(totals.pageview) doesn't help identify what you really need: you are assuming that the totals.pageviews field is the right one to use, which, judging by the rest of your wording, it is not.
Second, assuming that what you need is the count of pageviews of the booking page on the website, the only reasonable answer is below
SELECT
date,
COUNT(1) AS BookingPageViews
FROM `table*`, UNNEST(hits) AS hit
WHERE hit.page.pagePath = 'booking'
GROUP BY date
Finally, if you are still getting numbers different from what you expect, you need to revisit what you are actually looking for. It might be that the number you see in GA represents a metric different from what you think it represents. This is the only explanation I can see.
I found a solution to this problem:
SELECT count(totals.pageviews) AS Searches,Date
FROM table, UNNEST(hits) as hits
WHERE hits.page.pagePath ='/booking' and hits.type='PAGE'
GROUP BY DATE
Hope this answer can help other people.
I'm querying page views by page from BigQuery. My query is:
SELECT hits.page.pagePath, COUNT(*) as pageViews FROM `bigquery-refresh.refresh.ga_sessions_2015*`,
UNNEST(hits) as hits
WHERE date >= '20150101' AND date < '20150701'
AND geoNetwork.country = "United States"
AND hits.type="PAGE"
GROUP BY hits.page.pagePath
ORDER BY pageViews DESC
I'm comparing this query to the total page views reported from within GA (for the same country and date range), and am finding that the total number of page views in GA is ~0.4% larger than in BigQuery. Is there a reason for this small discrepancy?
I'm not familiar with GA, but here are my guesses:
(1) As Elliott pointed out, maybe GA includes some extra data
(2) Or maybe GA uses a different rule than count(*)
(3) I happen to know that AdWords will adjust the report data even several days later. Maybe GA has the same feature.
Are you sure you're counting the right thing?
The schema documentation says that each row in BQ corresponds to a session (not a hit or a pageview), so the count(*) wouldn't be correct and would thus show a different number when compared to GA's UI.
The schema also shows that for pageviews you have the totals:
totals.pageviews
totals.hits
So, every interaction with the page is a hit. Can you confirm that by using the totals.pageviews you get to the correct number?
I am a GA Premium user and new to BigQuery. I am doing some data exploration and want to pull page views by title. I don't think you can do something like the following query because the totals.pageviews values are already aggregates:
SELECT hits.page.pageTitle, sum(totals.pageviews) FROM [sample.ga_sessions_20150125]
GROUP BY hits.page.pageTitle
Can someone explain how I can recreate pulling pageviews from raw hits data? Thanks!
Be aware that totals contains values aggregated across the whole visit, so the context for those numbers is the visit (session), not an individual hit.
SELECT hits.page.pageTitle,
count(DISTINCT fullVisitorId)
FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
GROUP BY hits.page.pageTitle
This query calculates the unique visitors per page title.
You can run the above query on BigQuery sample dataset.
Ahh, okay, I got it. I used:
select hits.page.pageTitle, count(*)
from [sample.ga_sessions_20150125]
where hits.type='PAGE'
group by hits.page.pageTitle
Just needed the hits.type value, and to not use the totals number. Thanks!