Is visitNumber the number of times a user has visited a site to date?
But in the data (see screenshot below), I'm seeing visit numbers skipped, and the visitorId is also null.
I assume you are dealing with Google Analytics data exported to BigQuery.
If so:
visitorId is deprecated (thus nulls) and fullVisitorId should be used instead.
visitNumber is an INTEGER that represents session number for the user. If this is the first session, then this is set to 1.
fullVisitorId is a STRING that represents unique visitor ID (also known as client ID).
See more at BigQuery Export schema
I am trying to match each user with the respective advertising_id, vendor_id, user_id and user_pseudo_id.
I had always used user_pseudo_id as the trusted unique identifier, since it is always present in the returned data (the other IDs are sometimes NULL).
But lately I have noticed that user_pseudo_id can be the same for multiple users (the symptom is events from multiple devices sharing the same user_pseudo_id, though with different vendor_id and advertising_id values).
I cannot use a CONCAT of those 3 IDs as a unique key, since most of the time one of them is NULL or blank.
Do you have any ideas for handling this problem, or should I just ignore it, since it doesn't account for much compared to the total number of users?
User Id implementation
Hello everybody, I implemented User Id tracking for GA4. I am pretty sure I am collecting it correctly, because I can see the User Id in the Realtime report and also in the Add comparison tool.
I have 3 problems with using User Id in GA4 reports:
#Problem 1
I can't figure out how to use User Id as a user_property or custom dimension in my reports. I tried it in the Acquisition report, but the User Id column shows (not set) everywhere. When I add a comparison with the Comparison tool, it shows all the numeric User Id values in its column, next to the (not set) values of the predefined column...
#Problem 2
I also tried to use the User Id in the Analysis Hub, but there too every user id shows as (not set).
#Problem 3
I ended up thinking that GA4 doesn't allow using User Id in its own interface, so I turned to BigQuery. But I went crazy today when I saw that in BigQuery my user_id field has no values at all; it shows "null" in the table.
Could someone help me? I would just like to create a report with a list of all User Ids in rows (dimension) and the total event count in columns (metric). How could I do it?
I noticed there are thousands of events duplicated in the events tables of BigQuery (in an integration with Firebase).
My definition of duplicated is: 2 or more events that share the same data in all these fields:
event_timestamp, event_name, user_pseudo_id, app_info.id, device.advertising_id
It happens for automatically collected events as well as custom events. The fields I found that can differ between otherwise identical events are:
event_server_timestamp_offset, geo.continent, geo.country
I guess there is no reason for a duplicated event at the same moment, same user, same app, same device, but one event is geo.continent=America and the other geo.continent=Asia.
Any thoughts why this is happening? Thanks in advance.
Google's explanation is that Firebase data duplication in BigQuery is mostly related to network issues on the client's side that cause events to be buffered and sent twice.
However, there is a way to deduplicate these events by using event_server_timestamp_offset. This field is the difference between the time the event was sent to Google's servers and the time it was received.
This means that, given the same event_timestamp, event_name and user_pseudo_id, you can keep only the event with the lowest event_server_timestamp_offset to get a correct result.
You can also safely delete the duplicate records from your events table.
Sorry I can't share sources for this because the answer came from Google Analytics support, as I was encountering the same issue.
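To make the rule above concrete, here is a minimal Python sketch of the same logic on in-memory rows. The sample rows, the flattened geo_continent key and the dedupe_events helper are made up for illustration, and it assumes event_server_timestamp_offset is populated on every row; it is not something taken from Google's documentation.

```python
def dedupe_events(events):
    """Keep one row per (user_pseudo_id, event_name, event_timestamp):
    the row with the lowest event_server_timestamp_offset."""
    best = {}
    for event in events:
        key = (
            event["user_pseudo_id"],
            event["event_name"],
            event["event_timestamp"],
        )
        kept = best.get(key)
        # Replace the kept row only if this one has a strictly lower offset.
        if (
            kept is None
            or event["event_server_timestamp_offset"]
            < kept["event_server_timestamp_offset"]
        ):
            best[key] = event
    return list(best.values())


# Hypothetical sample: the first two rows are the "same" event sent twice.
events = [
    {"user_pseudo_id": "938642951.1666427135", "event_name": "login",
     "event_timestamp": 1666529002225262,
     "event_server_timestamp_offset": 981000, "geo_continent": "Asia"},
    {"user_pseudo_id": "938642951.1666427135", "event_name": "login",
     "event_timestamp": 1666529002225262,
     "event_server_timestamp_offset": 5200, "geo_continent": "Americas"},
    {"user_pseudo_id": "938642951.1666427135", "event_name": "logout",
     "event_timestamp": 1666529009000000,
     "event_server_timestamp_offset": 4100, "geo_continent": "Americas"},
]

deduped = dedupe_events(events)
for row in deduped:
    print(row["event_name"], row["event_server_timestamp_offset"])
```

In BigQuery itself the same idea would presumably be a ROW_NUMBER() over those three columns ordered by event_server_timestamp_offset, keeping row 1.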
We use the QUALIFY clause to deduplicate Firebase events in BigQuery:
SELECT
  *
FROM
  `project.dataset.events_*`
QUALIFY
  ROW_NUMBER() OVER (
    PARTITION BY
      user_pseudo_id,
      event_name,
      event_timestamp,
      TO_JSON_STRING(event_params)
  ) = 1
Qualifying columns:
- name: user_pseudo_id
  description: Autogenerated pseudonymous ID for the user -
    Unique identifier for a specific installation of application on a client device,
    e.g. "938642951.1666427135".
    All events generated by that device will be tagged with this pseudonymous ID,
    so that you can relate events from the same user together.
- name: event_name
  description: Event name, e.g. "app_launch", "session_start", "login", "logout" etc.
- name: event_timestamp
  description: The time (in microseconds, UTC) at which the event was logged on the client,
    e.g. "1666529002225262".
- name: event_params
  description: A repeated record (ARRAY) of the parameters associated with this event.
I am importing Google Analytics data into a BigQuery session_streaming table using OWOX BI. I need to count returning visits from this data, but the result does not match GA.
Business logic: if newVisits is null, then it's a returning visitor
Date Range: 10th June 2018
Source : Google
Medium: CPC
BigQuery Result: 136 Returning Visits
GA Account: 95 (Total Users - New Users)
SELECT
  COUNT(DISTINCT clientId) AS returningvisits
FROM `test.Test.session_streaming_20180610`
WHERE trafficSource.medium = 'cpc'
  AND trafficSource.source = 'google'
  AND newVisits IS NULL
Schema of session streaming table
user RECORD NULLABLE
user.id STRING NULLABLE
user.phone STRING NULLABLE
user.email STRING NULLABLE
clientId STRING NULLABLE
date STRING NULLABLE
sessionId STRING NULLABLE
visitNumber INTEGER NULLABLE
newVisits INTEGER NULLABLE
There are a few more fields.
Could you please help me find what's wrong with this query?
Mayank!
You've already contacted our support service and we've replied in the support chat.
Just to double-check everything, we're sending you the reply here too.
First of all, it's not the best idea to count returning users in GA as Total Users - New Users.
That's because a single user can be both a New Visitor and a Returning Visitor in the same time period.
New Users are first-time visitors who initiated their first session within the given date range. If the same users come back to your website on the same day or any other day, GA also counts them as returning visitors.
We recommend using the same counting logic for the segment on both sides, e.g. count New Users in both GA and BQ.
Also, use GROUP BY instead of COUNT(DISTINCT ...); it gives a more accurate result in most cases.
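To illustrate why the two numbers can differ, here is a small Python sketch with made-up sessions (the client IDs and the user_counts helper are hypothetical). It follows the business logic from the question: newVisits = 1 marks a first session, newVisits = null a returning one.

```python
def user_counts(sessions):
    """Count distinct users overall, with a new session, with a returning
    session, and with both kinds of session in the same period."""
    all_users = {s["clientId"] for s in sessions}
    new_users = {s["clientId"] for s in sessions if s["newVisits"] == 1}
    returning_users = {
        s["clientId"] for s in sessions if s["newVisits"] is None
    }
    return {
        "total": len(all_users),
        "new": len(new_users),
        "returning": len(returning_users),
        "both": len(new_users & returning_users),
    }


# Hypothetical sessions for a single day.
sessions = [
    {"clientId": "A", "visitNumber": 1, "newVisits": 1},     # first visit
    {"clientId": "A", "visitNumber": 2, "newVisits": None},  # back same day
    {"clientId": "B", "visitNumber": 5, "newVisits": None},  # returning only
    {"clientId": "C", "visitNumber": 1, "newVisits": 1},     # new only
]

counts = user_counts(sessions)
print(counts)
print("total - new =", counts["total"] - counts["new"])
```

In this sample, client A is counted both as new and as returning, so Total Users - New Users comes out lower than the number of users that have a returning session. That is the same direction as the gap in the question (95 in GA vs 136 in BigQuery).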
I'm under the impression that, fullVisitorId being just a hash of clientId, there should be a one-to-one mapping between the two. But I have a situation where a few fullVisitorId values are mapped to two different client IDs (we're collecting the GA Client ID in a user-scoped custom dimension).
Is that possible? Under what circumstances?
Thanks for any clarification on this
Cheers!
[edit: ] attaching screenshot
You may be interested in reading about the Google Analytics schema for BigQuery. Some of the relevant parts are:
fullVisitorId: The unique visitor ID (also known as client ID).
visitId: An identifier for this session. This is part of the value usually stored as the _utmb cookie. This is only unique to the user. For a completely unique ID, you should use a combination of fullVisitorId and visitId.
So client ID and full visitor ID are synonymous, and if you want a unique ID for a particular visit, you should use a combination of fullVisitorId and visitId.
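As a rough sketch of that last point, a session key can be built by joining the two fields. The session_key helper, the "-" separator and the sample values below are assumptions for illustration only.

```python
def session_key(full_visitor_id, visit_id):
    """Combine fullVisitorId (STRING) and visitId (INTEGER) into one key."""
    return f"{full_visitor_id}-{visit_id}"


# Hypothetical rows: two different visitors share the same visitId.
rows = [
    {"fullVisitorId": "1111", "visitId": 1528600000},
    {"fullVisitorId": "2222", "visitId": 1528600000},
    {"fullVisitorId": "1111", "visitId": 1528603600},
]

keys = [session_key(r["fullVisitorId"], r["visitId"]) for r in rows]
print(keys)
```

Here visitId alone takes only two distinct values across three sessions, while the combined key separates all three.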