I was under the impression that, since fullVisitorId is just a hash of clientId, there should be a one-to-one mapping between the two. But I have a situation where a few fullVisitorIds are mapped to two different client IDs (we're collecting the GA client ID into a user-scoped custom dimension).
Is that possible? Under what circumstances?
Thanks for any clarification on this
Cheers!
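For reference, a check like the following surfaces the affected fullVisitorIds. It is a minimal sketch that assumes you have already pulled (fullVisitorId, client ID) pairs out of the export, with the client ID taken from the user-scoped custom dimension; the function name and sample IDs are made up for illustration.

```python
from collections import defaultdict


def visitors_with_multiple_client_ids(rows):
    """Return {fullVisitorId: sorted client IDs} for visitors seen with more than one client ID.

    `rows` is assumed to be an iterable of (fullVisitorId, client_id) pairs,
    e.g. read from the user-scoped custom dimension; rows with None are skipped.
    """
    seen = defaultdict(set)
    for full_visitor_id, client_id in rows:
        if full_visitor_id is None or client_id is None:
            continue
        seen[full_visitor_id].add(client_id)
    # Keep only visitors mapped to two or more distinct client IDs.
    return {fvid: sorted(cids) for fvid, cids in seen.items() if len(cids) > 1}


# Hypothetical sample data.
rows = [
    ("111", "938642951.1666427135"),
    ("111", "938642951.1666427135"),
    ("222", "123.456"),
    ("222", "789.012"),
]
print(visitors_with_multiple_client_ids(rows))  # {'222': ['123.456', '789.012']}
```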
You may be interested in reading about the Google Analytics schema for BigQuery. Some of the relevant parts are:
fullVisitorId: The unique visitor ID (also known as client ID).
visitId: An identifier for this session. This is part of the value usually stored as the _utmb cookie. This is only unique to the user. For a completely unique ID, you should use a combination of fullVisitorId and visitId.
So client ID and full visitor ID are synonymous, and if you want a unique ID for a particular visit, you should use a combination of fullVisitorId and visitId.
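As a small illustration of why the combination matters, here is a sketch that builds a session key from both fields. It assumes visitId is an integer and fullVisitorId a numeric string, as in the export; the "-" separator and the sample values are arbitrary choices, not part of the schema.

```python
def session_key(full_visitor_id: str, visit_id: int) -> str:
    """Build a session identifier from fullVisitorId and visitId.

    visitId alone is only unique per visitor, so both parts are combined.
    The '-' separator is an arbitrary choice for this sketch.
    """
    return f"{full_visitor_id}-{visit_id}"


# Two visitors happen to share a visitId; the combined key keeps them apart.
sessions = [("111", 1666427135), ("222", 1666427135), ("111", 1666430000)]
keys = {session_key(fvid, vid) for fvid, vid in sessions}
print(len(keys))  # 3
```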
I am trying to match each user with the respective advertising_id, vendor_id, user_id and user_pseudo_id.
I have always used user_pseudo_id as the trusted unique identifier, since it is always present in the returned data (the other IDs are sometimes NULL).
But lately I have noticed that user_pseudo_id can be the same for multiple users (the symptom is that events from multiple devices share the same user_pseudo_id, though with different vendor_id and advertising_id values).
I cannot use a CONCAT of those three IDs as a unique key, since most of the time one of them is NULL or blank.
Do you have any ideas for handling this problem, or should I just ignore it, since it doesn't account for much compared to the total number of users?
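In BigQuery, CONCAT returns NULL as soon as any argument is NULL, which is what breaks the plain concatenation; wrapping each argument in IFNULL or COALESCE is the SQL-side workaround. The idea can be sketched as follows, substituting an explicit placeholder for NULL or blank IDs before joining. The separator, placeholder, and function name are arbitrary choices for illustration. One caveat: the same device can end up with two keys if an ID is present on some of its events and missing on others.

```python
from typing import Optional


def composite_device_key(user_pseudo_id: Optional[str],
                         vendor_id: Optional[str],
                         advertising_id: Optional[str]) -> str:
    """Combine three IDs into one key without letting a NULL/blank value erase it.

    Missing or blank values are replaced by an explicit placeholder instead of
    making the whole key NULL. '|' and '<none>' are arbitrary choices.
    """
    parts = []
    for value in (user_pseudo_id, vendor_id, advertising_id):
        value = (value or "").strip()
        parts.append(value if value else "<none>")
    return "|".join(parts)


print(composite_device_key("abc", None, " "))  # abc|<none>|<none>
```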
I noticed thousands of duplicated events in the BigQuery events tables (from a Firebase integration).
My definition of duplicated is: two or more events that share the same data in all of these fields:
event_timestamp, event_name, user_pseudo_id, app_info.id, device.advertising_id
It happens for both automatically collected events and custom events. The fields I found that can differ between the copies (i.e., what makes those events different) are:
event_server_timestamp_offset, geo.continent, geo.country
I can't see any reason for duplicated events at the same moment, for the same user, app and device, where one event has geo.continent=America and the other geo.continent=Asia.
Any thoughts on why this is happening? Thanks in advance.
Google's explanation is that Firebase data duplication in BigQuery is mostly related to network issues on the client's side that cause events to be buffered and sent twice.
However, there is a way to deduplicate these events using event_server_timestamp_offset. This field is the difference between the time the event was sent to Google's server and the time it was received.
This means that, given the same event_timestamp, event_name and user_pseudo_id, you can keep only the event with the lowest event_server_timestamp_offset to get a correct result.
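The rule above can be sketched in memory like this. It assumes events are dicts using the export's field names (with geo as a nested dict); the function name and the sample events are made up for illustration.

```python
def dedupe_by_lowest_offset(events):
    """Keep one event per (event_timestamp, event_name, user_pseudo_id).

    Among copies sharing that key, the one with the lowest
    event_server_timestamp_offset is kept.
    """
    best = {}
    for event in events:
        key = (event["event_timestamp"], event["event_name"], event["user_pseudo_id"])
        current = best.get(key)
        if (current is None
                or event["event_server_timestamp_offset"]
                < current["event_server_timestamp_offset"]):
            best[key] = event
    return list(best.values())


# Hypothetical sample: the first two events are duplicates that differ only
# in offset and geo, the third is a distinct event.
events = [
    {"event_timestamp": 1666529002225262, "event_name": "session_start",
     "user_pseudo_id": "938642951.1666427135",
     "event_server_timestamp_offset": 1200, "geo": {"continent": "Americas"}},
    {"event_timestamp": 1666529002225262, "event_name": "session_start",
     "user_pseudo_id": "938642951.1666427135",
     "event_server_timestamp_offset": 98000, "geo": {"continent": "Asia"}},
    {"event_timestamp": 1666529002225999, "event_name": "login",
     "user_pseudo_id": "938642951.1666427135",
     "event_server_timestamp_offset": 500, "geo": {"continent": "Americas"}},
]
deduped = dedupe_by_lowest_offset(events)
print(len(deduped))  # 2
```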
You can also safely delete duplicate records from your event table.
Sorry, I can't share sources for this: the answer came from Google Analytics support when I was encountering the same issue.
We use the QUALIFY clause to deduplicate Firebase events in BigQuery:
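```sql
SELECT
  *
FROM
  `project.dataset.events_*`
QUALIFY
  -- Number the rows within each group of identical events and keep only the first.
  -- There is no ORDER BY in the window, so which copy is kept is arbitrary.
  ROW_NUMBER() OVER (
    PARTITION BY
      user_pseudo_id,
      event_name,
      event_timestamp,
      -- event_params is an ARRAY, which cannot be used as a partitioning key
      -- directly, so it is serialized to a JSON string.
      TO_JSON_STRING(event_params)
  ) = 1
```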
Qualifying columns:

```yaml
- name: user_pseudo_id
  description: >-
    Autogenerated pseudonymous ID for the user: a unique identifier for a specific
    installation of the application on a client device, e.g. "938642951.1666427135".
    All events generated by that device are tagged with this pseudonymous ID,
    so that you can relate events from the same user together.
- name: event_name
  description: Event name, e.g. "app_launch", "session_start", "login", "logout", etc.
- name: event_timestamp
  description: >-
    The time (in microseconds, UTC) at which the event was logged on the client,
    e.g. "1666529002225262".
- name: event_params
  description: A repeated record (ARRAY) of the parameters associated with this event.
```
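To make the partitioning behaviour of the query above concrete, here is an in-memory sketch of the same idea in Python. It assumes events are dicts with event_params as a list of dicts; json.dumps stands in for TO_JSON_STRING, and like it, it preserves element order, so two events whose params differ only in order count as different. The function name and sample values are made up.

```python
import json


def dedupe_like_qualify(events):
    """Keep the first event per (user_pseudo_id, event_name, event_timestamp, params).

    event_params is a list of dicts, which is unhashable, so it is serialized
    to a JSON string first, mirroring TO_JSON_STRING(event_params).
    """
    seen = set()
    kept = []
    for event in events:
        key = (
            event["user_pseudo_id"],
            event["event_name"],
            event["event_timestamp"],
            json.dumps(event["event_params"]),
        )
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


# Hypothetical sample: two identical events and one with a different param value.
params_a = [{"key": "ga_session_id", "value": {"int_value": 1666529002}}]
params_b = [{"key": "ga_session_id", "value": {"int_value": 1666529999}}]
base = {"user_pseudo_id": "938642951.1666427135", "event_name": "login",
        "event_timestamp": 1666529002225262}
events = [dict(base, event_params=params_a),
          dict(base, event_params=params_a),
          dict(base, event_params=params_b)]
print(len(dedupe_like_qualify(events)))  # 2
```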
Is visitNumber the number of times a user has visited a site to date?
But in the data, I'm seeing skipped visit numbers, and visitorId is also null.
I assume you are dealing with Google Analytics data exported to BigQuery.
If so:
visitorId is deprecated (thus nulls) and fullVisitorId should be used instead.
visitNumber is an INTEGER that represents the session number for the user. If this is the first session, then it is set to 1.
fullVisitorId is a STRING that represents the unique visitor ID (also known as client ID).
See the BigQuery Export schema for more.
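If you want to see exactly which session numbers are missing per visitor, a sketch like this works on (fullVisitorId, visitNumber) pairs. It relies only on visitNumber starting at 1, as described above, and can only see the rows you pass in (i.e. the tables you queried); the function name and sample rows are made up.

```python
from collections import defaultdict


def visit_number_gaps(rows):
    """Return {fullVisitorId: missing visitNumbers} for visitors whose sessions skip numbers.

    visitNumber starts at 1 for a visitor's first session, so any number between
    1 and the visitor's maximum that never appears in `rows` is reported.
    """
    numbers = defaultdict(set)
    for full_visitor_id, visit_number in rows:
        numbers[full_visitor_id].add(visit_number)
    gaps = {}
    for fvid, seen in numbers.items():
        missing = sorted(set(range(1, max(seen) + 1)) - seen)
        if missing:
            gaps[fvid] = missing
    return gaps


rows = [("111", 1), ("111", 2), ("111", 4), ("222", 1)]
print(visit_number_gaps(rows))  # {'111': [3]}
```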
I have been trying to understand the concept behind the Google client ID, which is exported as fullVisitorId in the BigQuery Export schema.
I know that a session is identified by the unique combination of fullVisitorId and visitId.
However, I couldn't find a good explanation of how Google defines this ID, and how permanent it is across sessions.
Thank you very much!
I'm not sure which part is unclear, but the client ID is generated in the user's browser when the GA script is initialised, and it is stored in a cookie.
After that, every hit sent by that instance of the script (the one that has the cookie) carries that value as the cid (client ID) parameter.
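To see the value in practice, you can read it from the cookie. The sketch below assumes the analytics.js _ga cookie layout "GA1.<n>.<random>.<timestamp>", where the client ID is the last two dot-separated fields; treat that layout as an assumption and check it against your own cookies. The function name and sample value are made up.

```python
def client_id_from_ga_cookie(cookie_value: str) -> str:
    """Extract the client ID from a _ga cookie value.

    Assumes the layout "GA1.<n>.<random>.<timestamp>", where the client ID
    is the last two dot-separated fields.
    """
    parts = cookie_value.split(".")
    if len(parts) < 4:
        raise ValueError(f"unexpected _ga cookie format: {cookie_value!r}")
    return ".".join(parts[-2:])


print(client_id_from_ga_cookie("GA1.2.938642951.1666427135"))  # 938642951.1666427135
```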
When implementing GA, you can also set that value yourself if you want, although there are usually better ways to do that.
Here's some documentation on the cid, and here is how a session is defined (and thus how the session ID is derived during data collection).
I have Firebase Analytics connected to BigQuery; the BQ table schema is described here:
https://support.google.com/firebase/answer/7029846
I would like to find out how each event record can be uniquely identified.
Originally I thought that the combination of user_pseudo_id and event_timestamp would be unique, but I found out that it is not.
I added event_date, event_name, event_previous_timestamp, stream_id, etc. to the GROUP BY clause, but nothing helps.
Can anybody advise me on what makes an event record unique?
We use the Google Advertising ID (GAID) as a unique device ID. A user may log in to your app on multiple devices with the same account, so user_id is not unique per device, and user_pseudo_id for the same device changes if the user reinstalls the app. The only assumption here is that the user has not intentionally reset their GAID. The GAID field can be found under event_params with the key gaid in BigQuery. Hope this helps!
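Pulling that value out of event_params means looking up the element with the right key and reading whichever typed field is set. In the Firebase export, each element's value has string_value, int_value, float_value and double_value fields, with typically only one populated. The sketch below assumes gaid is stored as a string; the function name and the sample GAID are made up.

```python
from typing import Any, Dict, List, Optional


def get_event_param(event_params: List[Dict[str, Any]], key: str) -> Optional[Any]:
    """Return the first non-null typed value for `key` in an event_params array."""
    for param in event_params:
        if param.get("key") != key:
            continue
        value = param.get("value", {})
        for field in ("string_value", "int_value", "float_value", "double_value"):
            if value.get(field) is not None:
                return value[field]
    return None


# Hypothetical event_params for one event.
params = [
    {"key": "ga_session_id",
     "value": {"string_value": None, "int_value": 1666529002,
               "float_value": None, "double_value": None}},
    {"key": "gaid",
     "value": {"string_value": "3f2b8c1e-0000-4000-8000-000000000001",
               "int_value": None, "float_value": None, "double_value": None}},
]
print(get_event_param(params, "gaid"))  # 3f2b8c1e-0000-4000-8000-000000000001
```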