I am a GA Premium user and new to BigQuery. I am doing some data exploration and want to pull page views by title. I don't think a query like the following works, because the totals.pageviews values are already aggregated:
SELECT hits.page.pageTitle, sum(totals.pageviews) FROM [sample.ga_sessions_20150125]
GROUP BY hits.page.pageTitle
Can someone explain how I can recreate pulling pageviews from raw hits data? Thanks!
Be aware that totals contains values aggregated across the whole visit, so the context for those numbers is the session, not an individual hit.
SELECT hits.page.pageTitle,
count(DISTINCT fullVisitorId)
FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
GROUP BY hits.page.pageTitle
This query calculates the unique visitors per page.
You can run the above query on the BigQuery sample dataset.
Ahh, okay, I got it. I used:
select hits.page.pageTitle, count(*)
from [sample.ga_sessions_20150125]
where hits.type='PAGE'
group by hits.page.pageTitle
Just needed the hits.type value, and to not use the totals number. Thanks!
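To see why summing the session total per title overcounts, here is a minimal Python sketch on made-up data. The `sessions` list and its field names are assumptions that only loosely mimic the export (one record per session with nested hits), and it assumes that flattened rows repeat the session-level total on every hit; it is not BigQuery code.

```python
from collections import Counter

# Hypothetical toy data: one dict per session, hits nested inside.
# "pageviews" stands in for totals.pageviews.
sessions = [
    {"pageviews": 3, "hits": [
        {"type": "PAGE", "title": "Home"},
        {"type": "EVENT", "title": "Home"},
        {"type": "PAGE", "title": "Pricing"},
        {"type": "PAGE", "title": "Home"},
    ]},
    {"pageviews": 1, "hits": [
        {"type": "PAGE", "title": "Home"},
    ]},
]

# Roughly SUM(totals.pageviews) GROUP BY title on flattened rows:
# every hit row carries the whole session's total.
summed_totals = Counter()
# Roughly COUNT(*) WHERE hits.type = 'PAGE' GROUP BY title.
page_hits = Counter()

for s in sessions:
    for h in s["hits"]:
        summed_totals[h["title"]] += s["pageviews"]
        if h["type"] == "PAGE":
            page_hits[h["title"]] += 1

print(dict(page_hits))      # {'Home': 3, 'Pricing': 1}
print(dict(summed_totals))  # {'Home': 10, 'Pricing': 3}
```

In this toy data there are only four pageviews in total, yet the summed totals add up to 13, because each hit re-adds its session's total.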
Related
I'm playing around with BigQuery google_analytics_sample data.
I'm trying to retrieve the number of Total Unique Searches I'm seeing from the Google Analytics UI.
I'm running the following query:
SELECT
hits.page.searchKeyword AS Search
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*` AS GA,
UNNEST(GA.hits) AS hits
WHERE
(_TABLE_SUFFIX BETWEEN '20170101'
AND '20171231')
and hits.page.searchKeyword IS NOT NULL
I get 441, while the UI shows 607 Total Unique Searches.
What am I missing?
Thanks.
It looks like the linked table doesn't contain data for all dates in 2017.
SELECT
max(_TABLE_SUFFIX) as max_suffix
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*` AS GA
WHERE
(_TABLE_SUFFIX BETWEEN '20170101'
AND '20171231')
Try adjusting your date filters in the GA report.
I am trying to get the total of totals.pageviews for people who go through the booking page on the website. Here is my query:
SELECT sum( totals.pageviews ) AS Searches,Date
FROM `table*`
WHERE exists (
select 1 from unnest(hits) as hits
where hits.page.pagePath ='booking'
)
and date='20161109'
GROUP BY DATE
But I got far higher numbers than what I got from Google Analytics.
BigQuery result: around 1M
GA: around 300,000
This is the GA page that I am trying to match:
[image: GA result]
After looking a bit more into Google Analytics data, I think that you actually want to count the entries in hits that match the condition directly instead of relying on totals.pageviews. The problem is that totals.pageviews represents the total number of pageviews within a particular session, which includes pages that don't match your filter. I think you want something like this instead:
SELECT
COUNT(*) AS Searches,
date
FROM `table*`, UNNEST(hits) AS hit
WHERE hit.page.pagePath = 'booking'
GROUP BY date;
This counts the matched pages directly, and will hopefully give the expected numbers.
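The gap between the two approaches can be sketched in Python on made-up data. The `sessions` list and its `paths`/`pageviews` fields are hypothetical stand-ins for the exported hits and totals.pageviews, not the real schema.

```python
# Hypothetical toy sessions; "pageviews" stands in for totals.pageviews
# and "paths" for the pagePath of each pageview hit, in order.
sessions = [
    {"pageviews": 5, "paths": ["home", "booking", "cart", "booking", "thanks"]},
    {"pageviews": 2, "paths": ["home", "about"]},
    {"pageviews": 3, "paths": ["booking", "home", "faq"]},
]

# A session-level filter (like WHERE EXISTS ... pagePath = 'booking')
# keeps whole sessions, so summing the totals adds every page they viewed.
sum_of_totals = sum(s["pageviews"] for s in sessions if "booking" in s["paths"])

# Counting the matching hits directly counts only booking pageviews.
booking_views = sum(p == "booking" for s in sessions for p in s["paths"])

print(sum_of_totals, booking_views)  # 8 3
```

The first number grows with session length, which would be consistent with the BigQuery figure being several times larger than the one in GA.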
Try the query below:
SELECT
date,
COUNT(*) AS Searches,
SUM(totals.pageviews) as PageViews
FROM `table*`, UNNEST(hits) AS hit
WHERE hit.page.pagePath = 'booking'
AND hit.hitNumber = 1
GROUP BY date
Searches - the number of sessions that started with the booking page as the entry point to the website;
PageViews - the number of pageviews in those sessions
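As an illustration of these two metrics, here is a small Python sketch on made-up sessions. It assumes that the first element of each hypothetical `paths` list corresponds to the hit with hitNumber = 1; the data layout is invented for the example.

```python
# Hypothetical toy sessions; paths are listed in hit order, so paths[0]
# plays the role of the hit with hitNumber = 1.
sessions = [
    {"pageviews": 3, "paths": ["booking", "cart", "thanks"]},
    {"pageviews": 2, "paths": ["home", "booking"]},
    {"pageviews": 1, "paths": ["booking"]},
]

# Sessions whose entry point is the booking page.
entrance_sessions = [s for s in sessions if s["paths"] and s["paths"][0] == "booking"]

searches = len(entrance_sessions)                            # sessions entering on booking
page_views = sum(s["pageviews"] for s in entrance_sessions)  # all pageviews in those sessions

print(searches, page_views)  # 2 4
```

The second session views the booking page but does not start on it, so it contributes to neither number.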
I would like to have total(totals.pageview ) for the booking page on
the website. how many times that the booking page has been viewed
First, total(totals.pageview) doesn't help identify what you really need, because you are assuming that the totals.pageviews field is the right one to use, which it doesn't seem to be, at least based on the rest of your wording.
Secondly, assuming that what you need is the count of pageviews of the booking page on the website, the only reasonable answer is below:
SELECT
date,
COUNT(1) AS BookingPageViews
FROM `table*`, UNNEST(hits) AS hit
WHERE hit.page.pagePath = 'booking'
GROUP BY date
Finally, if you are still getting numbers different from what you expect, you need to revisit what you are actually looking for. It might be that the number you see in GA represents a metric different from the one you think it represents. This is the only explanation I can see.
I found a solution to this problem:
SELECT count(totals.pageviews) AS Searches,Date
FROM table, UNNEST(hits) as hits
WHERE hits.page.pagePath ='/booking' and hits.type='PAGE'
GROUP BY DATE
Hope this answer can help other people.
I'm querying page views by page from BigQuery. My query is:
SELECT hits.page.pagePath, COUNT(*) as pageViews FROM `bigquery-refresh.refresh.ga_sessions_2015*`,
UNNEST(hits) as hits
WHERE date >= '20150101' AND date < '20150701'
AND geoNetwork.country = "United States"
AND hits.type="PAGE"
GROUP BY hits.page.pagePath
ORDER BY pageViews DESC
I'm comparing this query to the total page views reported from within GA (for the same country and date range), and am finding that the total number of page views in GA is ~0.4% larger than in BigQuery. Is there a reason for this small discrepancy?
I'm not familiar with GA, but here are my random guesses:
(1) As Elliott pointed out, maybe GA includes some extra data
(2) Or maybe GA uses a different rule than count(*)
(3) I happen to know that AdWords adjusts its report data even several days later. Maybe GA does the same.
Are you sure you're counting the right thing?
The schema documentation says that each row in BQ corresponds to a session (not a hit, nor a pageview), so the count(*) wouldn't be correct and would show a different number when compared to GA's UI.
The schema also shows that for pageviews you have the totals:
totals.pageviews (check definition here)
totals.hits (check definition here)
So, every interaction with the page is a hit. Can you confirm that by using the totals.pageviews you get to the correct number?
I have been trying to count the sessions for each page using BigQuery, where the data is exported to BigQuery from GA. The schema of the data can be found here.
I have tried the following query:
SELECT
hits.page.pagePath AS page,
COUNT(totals.visits) AS sessions
FROM
[xxxxxxx.ga_sessions_20160801]
WHERE
REGEXP_MATCH(hits.page.pagePath, r'(orderComplete|checkout)')
AND hits.type = 'PAGE'
GROUP BY
page
ORDER BY
sessions DESC
I compared the result of the query with the numbers that I get from GA, but the result is quite different. I expected the above query to give the total sessions for each page, but it gives the total pageviews for each page. In other words, the result of the above query exactly matches the pageviews of each page instead of the sessions of each page.
I also tried the following query:
SELECT
hits.page.pagePath AS page,
COUNT(hits.isEntrance) AS sessions
FROM
[xxxxxxx.ga_sessions_20160801]
WHERE
REGEXP_MATCH(hits.page.pagePath, r'(orderComplete|checkout)')
AND hits.type = 'PAGE'
GROUP BY
page
ORDER BY
sessions DESC
This time the result is very close to, but not exactly the same as, the numbers that I am getting from GA. The BigQuery result is slightly lower than GA's for some pages.
There is no sampling in GA in my case. Otherwise the result would be acceptable, because the error is between 0.5% and 4%.
I am working with raw data without any filter on the GA profile, and the same data is exported to BigQuery.
Question: How are sessions counted when we count sessions by page?
When I don't group the result by hits.page.pagePath, there is no mismatch between the results that I get from GA and those from BigQuery.
Instead of using COUNT(totals.visits), what if you use COUNT(1)? The results of COUNT will vary depending on whether you are using a repeated field. Possibly relevant question with some in depth answers: BigQuery flattens when using field with same name as repeated field
As an aside, standard SQL (uncheck "Use Legacy SQL" under "Show Options") has less surprising semantics around counting, although it would require you to be more explicit with operations on arrays in this case.
To count sessions, I use COUNT(visitId) instead of COUNT(totals.visits). This seems to give me numbers identical--or very, very close--to what I see in GA.
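The quantities being mixed up in this thread can be told apart with a small Python sketch. Everything here is hypothetical toy data (`sessions`, `counts_for`), and it deliberately does not claim which of the three numbers GA reports as sessions per page.

```python
# Hypothetical toy sessions; "pages" lists the pagePath of each PAGE hit in order.
sessions = [
    {"id": "s1", "pages": ["checkout", "checkout", "orderComplete"]},
    {"id": "s2", "pages": ["home", "checkout"]},
    {"id": "s3", "pages": ["checkout"]},
]

def counts_for(page):
    """Return (pageviews, sessions_with_page, entrances) for one page."""
    # Flattened view: one (session id, hit position) row per matching pageview.
    rows = [(s["id"], i) for s in sessions
            for i, p in enumerate(s["pages"]) if p == page]
    pageviews = len(rows)                               # a COUNT over flattened rows
    sessions_with_page = len({sid for sid, _ in rows})  # distinct sessions that saw the page
    entrances = sum(1 for _, i in rows if i == 0)       # sessions that started on the page
    return pageviews, sessions_with_page, entrances

print(counts_for("checkout"))       # (4, 3, 2)
print(counts_for("orderComplete"))  # (1, 1, 0)
```

Counting any non-null session-level field over flattened rows lands on the first number, which matches the observation that the first query returned pageviews rather than sessions.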
I am trying to get figures related to search performance based on goal completions in Google Analytics. These goals are based on URLs, so as a first step I got the total completions by adding as many ORs as we have goal URLs, and that works fine.
The problem comes when we have to segment by "visits with search". This is also based on the URL (pagePath like "%search_parameter%"), but this time as a separate condition from the previous goal URLs:
SELECT sum(totals.visits)
FROM [XXXXXX.ga_sessions_20150101]
WHERE
(
REGEXP_MATCH (hits.page.pagePath,r'/goal1/')
or REGEXP_MATCH (hits.page.pagePath,r'/goal2/')
or REGEXP_MATCH (hits.page.pagePath,r'/goal3/')
or REGEXP_MATCH (hits.page.pagePath,r'/goal4/')
)
and REGEXP_MATCH (hits.page.pagePath,r'/search/')
In the Google Analytics interface I do, of course, have goals completed by people doing searches, so I don't understand what is missing from this query.
Any help?
Many Thanks!
If I correctly understood that "visits with search" means that at least one URL in hits.page.pagePath matches '/search/', then I think the following should work for you:
SELECT sum(totals.visits)
FROM
(SELECT
totals.visits,
hits.page.PagePath,
SOME(REGEXP_MATCH(hits.page.pagePath,r'/search/'))
WITHIN RECORD AS has_search
FROM [XXXXXX.ga_sessions_20150101]
WHERE REGEXP_MATCH(hits.page.pagePath,r'/goal1/|/goal2/|/goal3/|/goal4/')
)
WHERE has_search
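The difference between requiring both patterns on the same hit and requiring them anywhere in the same session can be sketched in Python. The `sessions` data and the names `GOAL`, `SEARCH`, `hit_level`, and `session_level` are made up for illustration, and Python's `re.search` is used as a stand-in for a partial regex match.

```python
import re

# Hypothetical patterns mirroring the goal and search URLs from the question.
GOAL = re.compile(r"/goal1/|/goal2/|/goal3/|/goal4/")
SEARCH = re.compile(r"/search/")

# Hypothetical toy sessions; "visits" stands in for totals.visits.
sessions = [
    {"visits": 1, "paths": ["/home/", "/search/?q=bike", "/goal2/"]},
    {"visits": 1, "paths": ["/home/", "/goal1/"]},
    {"visits": 1, "paths": ["/search/?q=helmet", "/home/"]},
]

# Hit-level AND: a single page path must match both patterns at once.
hit_level = sum(
    s["visits"] for s in sessions
    if any(GOAL.search(p) and SEARCH.search(p) for p in s["paths"])
)

# Session-level: some hit matches a goal, and some (possibly other) hit matches search.
session_level = sum(
    s["visits"] for s in sessions
    if any(GOAL.search(p) for p in s["paths"])
    and any(SEARCH.search(p) for p in s["paths"])
)

print(hit_level, session_level)  # 0 1
```

No single path contains both a goal URL and the search URL, so the hit-level condition finds nothing, while the session-level condition finds the one visit that searched and then completed a goal.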