I am trying out the new Google Analytics 4 with BigQuery and would like to know whether there is a way to handle the "rogue referral" problem described by Simo Ahava at this link:
https://www.simoahava.com/gtm-tips/fix-rogue-referral-problem-single-page-sites/
The idea is that on a single-page application the document location has to stay the same, so we created a JavaScript variable holding the original location (the landing page) and passed it through Fields to Set in Google Tag Manager when sending hits to Universal Analytics.
1 - Is there a way to do the same in Google Tag Manager, but sending the data to the new Google Analytics 4?
2 - Is there a way to do referral exclusion? My own domain name consistently shows up in the utm medium / source / campaign values when I run this query:
SELECT
event_name,
DATETIME(TIMESTAMP_MICROS(event_timestamp)),
user_pseudo_id,
(select value.string_value from UNNEST(event_params) where key = 'page_location' ) as page_location,
(select value.string_value from UNNEST(event_params) where key = 'page_referrer' ) as page_referrer,
traffic_source.name,
traffic_source.medium,
traffic_source.source,
(select value.string_value from UNNEST(event_params) where key = 'source' ) as source_,
(select value.string_value from UNNEST(event_params) where key = 'medium' ) as medium_,
(select value.string_value from UNNEST(event_params) where key = 'campaign' ) as campaign_
FROM `mytable`
where user_pseudo_id ='mycookie'
group by 1,2,3,4,5,6,7,8,9,10,11
order by 2 desc
GA4 and Rogue Referrer
It looks like this issue does not occur in GA4, because of how sessions are calculated.
From Google's documentation:
Differences in session counts between Google Analytics 4 and Universal Analytics
You may see lower session counts in Google Analytics 4 because Google Analytics 4 does not create a new session when the campaign source changes mid session, while Universal Analytics does create a new session under that circumstance.
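For the second question (your own domain showing up as the source), one option is to flag self-referrals after the export. The following is only a minimal sketch in Python, assuming the rows have already been fetched with the page_location and page_referrer columns from the query above. The helper names `host` and `is_self_referral` and the example.com URLs are made up for illustration.

```python
from urllib.parse import urlparse


def host(url):
    """Return the lowercase hostname of a URL, without a leading 'www.'."""
    name = (urlparse(url or "").hostname or "").lower()
    return name[4:] if name.startswith("www.") else name


def is_self_referral(page_location, page_referrer):
    """True when the referrer is on the same host as the page itself."""
    ref = host(page_referrer)
    return ref != "" and ref == host(page_location)


# Hypothetical rows shaped like the query result (placeholder URLs).
rows = [
    {"page_location": "https://www.example.com/pricing", "page_referrer": "https://www.example.com/"},
    {"page_location": "https://www.example.com/", "page_referrer": "https://www.google.com/"},
    {"page_location": "https://www.example.com/", "page_referrer": None},
]

# Keep only rows whose referrer is external or missing.
external = [r for r in rows if not is_self_referral(r["page_location"], r["page_referrer"])]
```

Comparing hostnames rather than whole URLs keeps the check independent of path and query string. Subdomains are treated as different hosts here, which may or may not be what you want.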
Related
In Google Analytics (GA4) GUI, under the traffic acquisition report, it is possible to see app visits split by source.
However, I cannot see the same info in BigQuery.
According to the [GA4] BigQuery Export schema documentation, traffic_source is the "Name of the traffic source that first acquired the user". I have checked, and the value of traffic_source does seem to change only when the user_pseudo_id changes, which means it persists until the app is reinstalled.
Scenario:
User A installed the app with a google-play campaign, then visits the app a second time following a Google cpc campaign and then a third time following a push notification.
Question:
In BigQuery, how can I see that the second visit was from cpc and the third from the push notification?
traffic_source does indeed persist across sessions: it only captures the source that first acquired the user (here, the first app install).
To get attribution at the visit level, you need to use the parameters inside the firebase_campaign event, for example:
SELECT
user_id,
user_pseudo_id,
event_date,
event_timestamp,
event_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source_,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium_
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20210316' AND '20210319'
AND event_name IN('firebase_campaign')
AND user_id = 'XXXXX'
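Once the firebase_campaign rows are exported, they can be grouped per user in time order to read off the source of each later visit. This is a rough sketch in Python on in-memory rows, assuming each row is a dict with the columns selected above. The function name `campaigns_by_user` and the sample values are hypothetical.

```python
from collections import defaultdict


def campaigns_by_user(events):
    """Return {user_pseudo_id: [(event_timestamp, source, medium), ...]} in time order."""
    out = defaultdict(list)
    for e in sorted(events, key=lambda e: e["event_timestamp"]):
        # Only firebase_campaign events carry visit-level source/medium.
        if e["event_name"] == "firebase_campaign":
            out[e["user_pseudo_id"]].append(
                (e["event_timestamp"], e.get("source"), e.get("medium"))
            )
    return dict(out)


# Made-up sample rows, deliberately out of order.
events = [
    {"user_pseudo_id": "A", "event_name": "firebase_campaign", "event_timestamp": 200,
     "source": "google", "medium": "cpc"},
    {"user_pseudo_id": "A", "event_name": "session_start", "event_timestamp": 210},
    {"user_pseudo_id": "A", "event_name": "firebase_campaign", "event_timestamp": 100,
     "source": "google-play", "medium": "organic"},
]

timeline = campaigns_by_user(events)
```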
I have Firebase connected to BigQuery and I want to extract the daily user engagement figures shown on the Analytics dashboard.
Firebase Analytics Dashboard Daily User Engagement
I've already tried to reproduce the numbers by filtering event_name on 'user_engagement' and 'screen_view', and event_params.key on 'engagement_time_msec' and 'engaged_session_event'.
With those filters I couldn't get anywhere near the Firebase values.
Does anybody know how to arrive at them?
According to the documentation, user engagement data collection is triggered periodically by the Firebase SDK while the app is running in the foreground. So for user_engagement, you can go through the events, pick the entry where event_params.key = "engagement_time_msec", and read event_params.value.int_value from it:
SELECT
  event_timestamp,
  user_pseudo_id,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = "engagement_time_msec") AS engagement_time
FROM `firebase*.events_*`
WHERE event_name = "user_engagement"
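To get from those rows to a daily figure, the engagement time still has to be summed per day and divided by the number of users. Here is a minimal sketch in Python, assuming the rows come back as dicts with the three columns selected above and that days are cut in UTC (the dashboard may use the property's reporting time zone, so the numbers can still differ). `daily_engagement` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_engagement(rows):
    """Return {date: (total_seconds, active_users, avg_seconds_per_user)}."""
    per_day_user = defaultdict(lambda: defaultdict(int))
    for r in rows:
        if r["engagement_time"] is None:
            continue  # event without an engagement_time_msec parameter
        # event_timestamp is in microseconds since the epoch.
        day = datetime.fromtimestamp(
            r["event_timestamp"] / 1_000_000, tz=timezone.utc
        ).date().isoformat()
        per_day_user[day][r["user_pseudo_id"]] += r["engagement_time"]

    result = {}
    for day, users in per_day_user.items():
        total_s = sum(users.values()) / 1000  # milliseconds to seconds
        result[day] = (total_s, len(users), total_s / len(users))
    return result


# Made-up sample rows: 1666483200 s is 2022-10-23 00:00:00 UTC.
DAY0 = 1666483200 * 1_000_000
HOUR = 3600 * 1_000_000
rows = [
    {"event_timestamp": DAY0 + 1 * HOUR, "user_pseudo_id": "u1", "engagement_time": 30000},
    {"event_timestamp": DAY0 + 2 * HOUR, "user_pseudo_id": "u1", "engagement_time": 15000},
    {"event_timestamp": DAY0 + 3 * HOUR, "user_pseudo_id": "u2", "engagement_time": 15000},
    {"event_timestamp": DAY0 + 25 * HOUR, "user_pseudo_id": "u1", "engagement_time": 5000},
    {"event_timestamp": DAY0 + 26 * HOUR, "user_pseudo_id": "u2", "engagement_time": None},
]

summary = daily_engagement(rows)
```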
I noticed that thousands of events are duplicated in the BigQuery events tables (from an integration with Firebase).
My definition of duplicated is: 2 or more events that share the same data in all these fields:
event_timestamp, event_name, user_pseudo_id, app_info.id, device.advertising_id
It happens for automatically collected events and for custom events alike. The parameters I found that can differ between the copies (that is, what makes those events different) are:
event_server_timestamp_offset, geo.continent, geo.country
I can see no reason for an event to be duplicated at the same moment, for the same user, app, and device, with one copy showing geo.continent=America and the other geo.continent=Asia.
Any thoughts why this is happening? Thanks in advance.
Google's explanation is that Firebase data duplication in BigQuery is mostly caused by network issues on the client side, which lead to events being buffered and sent twice.
However, you can deduplicate these events using event_server_timestamp_offset. This field is the difference between the time the event was sent to Google's servers and the time it was received.
This means that, among events sharing the same event_timestamp, event_name and user_pseudo_id, you can keep only the one with the lowest event_server_timestamp_offset to get a correct result.
You can also safely delete the duplicate records from your events table.
Sorry, I can't share sources for this: the answer came from Google Analytics support when I ran into the same issue.
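The rule above (same event_timestamp, event_name and user_pseudo_id, keep the lowest event_server_timestamp_offset) can be sketched as follows. This is only an illustration in Python on in-memory rows, not a replacement for doing it in BigQuery. `dedupe_lowest_offset` and the sample values are made up.

```python
def dedupe_lowest_offset(events):
    """Keep, per (event_timestamp, event_name, user_pseudo_id), the row with the lowest offset."""
    best = {}
    for e in events:
        key = (e["event_timestamp"], e["event_name"], e["user_pseudo_id"])
        kept = best.get(key)
        if kept is None or e["event_server_timestamp_offset"] < kept["event_server_timestamp_offset"]:
            best[key] = e
    return list(best.values())


# Made-up sample: the first two rows are duplicates that differ only in offset and geo.
events = [
    {"event_timestamp": 1, "event_name": "login", "user_pseudo_id": "u1",
     "event_server_timestamp_offset": 900, "continent": "Asia"},
    {"event_timestamp": 1, "event_name": "login", "user_pseudo_id": "u1",
     "event_server_timestamp_offset": 300, "continent": "America"},
    {"event_timestamp": 2, "event_name": "login", "user_pseudo_id": "u1",
     "event_server_timestamp_offset": 500, "continent": "America"},
]

unique_events = dedupe_lowest_offset(events)
```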
We use the QUALIFY clause to deduplicate Firebase events in BigQuery:
SELECT
*
FROM
`project.dataset.events_*`
QUALIFY
ROW_NUMBER() OVER (
PARTITION BY
user_pseudo_id,
event_name,
event_timestamp,
TO_JSON_STRING(event_params)
) = 1
Qualifying columns:
- name: user_pseudo_id
description: Autogenerated pseudonymous ID for the user -
Unique identifier for a specific installation of application on a client device,
e.g. "938642951.1666427135".
All events generated by that device will be tagged with this pseudonymous ID,
so that you can relate events from the same user together.
- name: event_name
description: Event name, e.g. "app_launch", "session_start", "login", "logout" etc.
- name: event_timestamp
description: The time (in microseconds, UTC) at which the event was logged on the client,
e.g. "1666529002225262".
- name: event_params
description: A repeated record (ARRAY) of the parameters associated with this event.
Is visitNumber the number of times a user has visited a site to date?
But in the data (see screenshot below), I'm seeing visit numbers skipped, and the visitorId is also null.
I assume you are dealing with Google Analytics data exported to BigQuery.
If so:
visitorId is deprecated (thus nulls) and fullVisitorId should be used instead.
visitNumber is an INTEGER that represents session number for the user. If this is the first session, then this is set to 1.
fullVisitorId is a STRING that represents unique visitor ID (also known as client ID).
See more at BigQuery Export schema
I'm under the impression that, since fullVisitorId is just a hash of the client ID, there should be a one-to-one mapping between the two. But I have a situation where a few fullVisitorId values are mapped to two different client IDs (we collect the GA Client ID in a user-scoped custom dimension).
Is that possible? Under what circumstances?
Thanks for any clarification on this
Cheers!
You may be interested in reading about the Google Analytics schema for BigQuery. Some of the relevant parts are:
fullVisitorId: The unique visitor ID (also known as client ID).
visitId: An identifier for this session. This is part of the value usually stored as the _utmb cookie. This is only unique to the user. For a completely unique ID, you should use a combination of fullVisitorId and visitId.
So client ID and full visitor ID are synonymous, and if you want a unique ID for a particular visit, you should use a combination of fullVisitorId and visitId.
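As a small illustration of why the combination is needed, here is a sketch in Python that builds a session key from the two fields. The rows and the `session_key` helper are made up. The point is only that two different users can share the same visitId.

```python
def session_key(row):
    """Combine fullVisitorId and visitId into one session identifier."""
    return f'{row["fullVisitorId"]}-{row["visitId"]}'


# Made-up rows: two different users happen to share visitId 1600000000.
rows = [
    {"fullVisitorId": "111", "visitId": 1600000000},
    {"fullVisitorId": "222", "visitId": 1600000000},
    {"fullVisitorId": "111", "visitId": 1600003600},
    {"fullVisitorId": "111", "visitId": 1600003600},  # second hit of the same session
]

unique_sessions = {session_key(r) for r in rows}
visit_ids_only = {r["visitId"] for r in rows}
```

Counting by visitId alone would give two sessions here, while the combined key gives three.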