Is ga_session_id a unique identifier in Google Analytics? - firebase

I checked the official docs here:
https://support.google.com/firebase/answer/7061705?hl=en
But when I checked my data, there were several user_id and user_pseudo_id values under a single ga_session_id.
How is that possible?

ga_session_id is not supposed to be globally unique (as far as I know, it is based on an in-device timestamp, which can be skewed). Apart from edge cases, though, it should be unique within a given user_pseudo_id.

session_id is not unique: two or more users can have the same session_id.
It is just the timestamp at which the session started, so to identify a session we need to concatenate session_id with user_pseudo_id or user_id.
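A minimal sketch of why the composite key matters, using invented rows rather than a real export: two devices whose sessions start at the same time share a ga_session_id, and only the pair (user_pseudo_id, ga_session_id) tells them apart.

```python
# Hypothetical rows: (user_pseudo_id, ga_session_id). All values are invented.
events = [
    ("111.aaa", 1617218965),
    ("111.aaa", 1617218965),
    ("222.bbb", 1617218965),  # different device, same session start time
    ("222.bbb", 1617219999),
]

# Counting ga_session_id alone merges sessions from different devices.
naive_sessions = len({session_id for _, session_id in events})

# Counting the (user_pseudo_id, ga_session_id) pair keeps them apart.
true_sessions = len(set(events))

print(naive_sessions, true_sessions)
```

The naive count is lower than the composite count whenever two users happen to share a session start time.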

I think the problem is the query used. With this one, I see the values associated correctly with each user in my data:
SELECT event_timestamp, user_id, user_pseudo_id, event_name, event_params.value.int_value AS session_id
FROM `MYTABLE`,
UNNEST (event_params) AS event_params
WHERE event_params.key = "ga_session_id" LIMIT 1000
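To illustrate what the UNNEST with the key filter is doing, here is a small Python sketch over a hand-written event shaped like an export row. The dictionary layout mirrors the key / value.int_value fields used in the query; the helper name `int_param` and the sample values are made up for this example.

```python
from typing import Optional


def int_param(event: dict, key: str) -> Optional[int]:
    """Return value.int_value of the first event_params entry matching key."""
    for param in event.get("event_params", []):
        if param["key"] == key:
            return param["value"].get("int_value")
    return None


# Hypothetical event; the structure is assumed from the query above.
event = {
    "event_name": "page_view",
    "user_pseudo_id": "111.aaa",
    "event_params": [
        {"key": "page_title", "value": {"string_value": "Home"}},
        {"key": "ga_session_id", "value": {"int_value": 1617218965}},
    ],
}

print(int_param(event, "ga_session_id"))
```

Each event carries many parameters, so the session ID has to be picked out by its key before it can be paired with user_pseudo_id.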

Related

Is there a way to remove duplicate sessions caused by the app-warming issue in Firebase < 8.11.0 via BigQuery?

The Firebase Apple SDK release notes (version 8.12.1) describe an issue with session_start events being logged during app prewarming on iOS 15+, which inserts additional session_start events. I've noticed that these additional session_start rows insert additional ga_session_id values into the BigQuery table.
ga_session_id is a unique session identifier associated with each event that occurs within a session, so a new one is created each time this additional session_start fires during app prewarming. Using the session_number field and calculating session length, it is possible to remove sessions that contain just one session_start and have a small session length, but this does not seem to reduce the overall session count by much.
This has affected the number of sessions reported when querying the BigQuery table and counting distinct user_pseudo_id||ga_session_id.
Is there a way to isolate these sessions in a separate table, or to exclude them with an additional clause in the query, so that sessions which are not truly sessions are removed?
https://github.com/firebase/firebase-ios-sdk/issues/6161
https://firebase.google.com/support/release-notes/ios
A simplified version of said query I'm using:
with windowTemp as
(
select
PARSE_DATE("%Y%m%d",event_date) as date_formatted,
event_name,
user_pseudo_id,
(select value.int_value from unnest(event_params) where key = 'ga_session_id') as session_id
from
`firebase-XXXX.analytics_XXX.events_*`
where
_table_suffix between '20210201' and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
group by
1,2,3,4
)
SELECT
date_formatted,
Count(DISTINCT user_pseudo_id) AS users,
Count(DISTINCT Concat(user_pseudo_id,session_id)) AS sessions,
FROM
windowTemp
GROUP by 1
ORDER BY 1
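The heuristic described in the question (a session that contains only a session_start and has a very small length) can be sketched as follows. This is only an illustration on invented rows, not a confirmed fix for the prewarming issue; the threshold `MIN_LENGTH_US` and the helper `looks_prewarmed` are assumptions.

```python
from collections import defaultdict

# Hypothetical rows:
# (user_pseudo_id, ga_session_id, event_name, event_timestamp in microseconds)
rows = [
    ("u1", 100, "session_start", 1_000_000),
    ("u1", 100, "screen_view", 4_000_000),
    ("u1", 100, "user_engagement", 65_000_000),
    ("u1", 200, "session_start", 900_000_000),  # lone session_start
    ("u2", 100, "session_start", 2_000_000),
    ("u2", 100, "screen_view", 2_500_000),
]

MIN_LENGTH_US = 1_000_000  # assumed threshold; tune against your own data

# Group events by the composite session key.
sessions = defaultdict(list)
for user, session_id, name, ts in rows:
    sessions[(user, session_id)].append((name, ts))


def looks_prewarmed(events):
    """True if the session has only session_start events and is very short."""
    names = {name for name, _ in events}
    timestamps = [ts for _, ts in events]
    length = max(timestamps) - min(timestamps)
    return names == {"session_start"} and length < MIN_LENGTH_US


kept = sorted(k for k, ev in sessions.items() if not looks_prewarmed(ev))
suspect = sorted(k for k, ev in sessions.items() if looks_prewarmed(ev))
print(kept, suspect)
```

Writing the suspect keys to their own table first makes it possible to check how many sessions the rule removes before excluding them from the session count.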

Streaming Google Analytics 4 data to BigQuery causing data collection issues

We have linked our GA4 property to Google BigQuery through the GA interface (without any additional code). It works, and we see the exported data in the BigQuery tables, but we have an issue with how this data is written to those tables.
Looking at any table, we can see that events from different users are recorded in one session, with different client IDs and even different user IDs (which we pass when a user is authorized). See an example.
This is the result of executing the following query:
SELECT
event_name,
user_pseudo_id,
user_id,
device.category,
device.mobile_brand_name,
device.mobile_model_name,
device.operating_system_version,
geo.region,
geo.city,
params.key,
params.value.int_value
FROM `%project_name%.analytics_256374149.events_20210331`, unnest(event_params) AS params
WHERE event_name="page_view"
AND params.value.int_value=1617218965
ORDER BY event_timestamp
As a result, you can see that within one session different users from different regions, with different devices and identifiers are combined. It is, of course, impossible to use such data for reporting purposes. Once again, it is a default GA4 → BigQuery setup in the GA4 interface (no add-ons).
We do not understand what the error is (in import, in requests, or somewhere else) and would like to get advice on this issue.
Thanks.
You should look at the combination of user_pseudo_id and the event_param ga_session_id. This combination is unique and used for measuring unique sessions across a property.
For example, this query counts the number of unique event names in each session:
SELECT
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
COUNT(DISTINCT event_name) AS unique_event_name_count
FROM `<project>.<dataset>.events_*`
GROUP BY user_pseudo_id, ga_session_id

How to get gender and age in BigQuery from Firebase Analytics?

I am using Analytics Events and trying to take advantage of the user data.
I can get quite a lot of data with this query:
SELECT
*
FROM
`test-project-23471.analytics_205774787.events_20191120`,
UNNEST(event_params) AS event_params
WHERE
event_name ='select_content'
AND event_params.value.string_value = 'a_item_open'
However, I don't need all of it, so I ran:
SELECT
event_params.value.string_value,
event_previous_timestamp,
device,
geo,
app_info
FROM
`test-project-23471.analytics_205774787.events_20191120`,
UNNEST(event_params) AS event_params
WHERE
event_name ='select_content'
AND event_params.value.string_value = 'a_item_open'
Then I realized that the result doesn't contain gender or age data, even though the documentation says Firebase collects this information automatically. I'd like to combine sex and age (or age group) with the result of the query above.
How can I get it?
Note that this document is just an example on how to query BigTable data by using BigQuery and not from Firebase.
The Firebase layout mentions that it has a RECORD field named "user_properties" which has a "key" STRING field.
Thus, you could try:
SELECT DISTINCT user_properties.key
FROM
`test-project-23471.analytics_205774787.events_20191120`,
UNNEST(user_properties) AS user_properties
This lists the property names, so you can find the correct name of the gender/sex property and include it in your query. For instance:
SELECT
event_params.value.string_value,
event_previous_timestamp,
device,
geo,
app_info,
user_properties.value.string_value as gender
FROM
`test-project-23471.analytics_205774787.events_20191120`,
UNNEST(event_params) AS event_params,
UNNEST(user_properties) AS user_properties
WHERE
event_name ='select_content'
AND event_params.value.string_value = 'a_item_open'
AND user_properties.key = "Gender"
Nevertheless, if you don't find the Gender info, please consider this. Otherwise, I suggest reaching out to Firebase support.
Hope it helps.
For privacy reasons these fields are not available in the BigQuery export. You can only see aggregated gender and age data in the Firebase Analytics console.
You can't even use them for targeting in other Firebase features, such as Remote Config, so user-level granularity is not possible.

Firebase events dedup in BigQuery - best practices?

There seem to be 1-2% duplicates in the Firebase Analytics events exported to BigQuery. What are the best practices for removing them?
At the moment the client does not send a per-session counter with the events. Such a counter would provide an unambiguous way of removing duplicate events, so I recommend that Firebase implement one. In the meantime, what would be a good way to remove the duplicates? Look at the user_pseudo_id, event_timestamp, and event_name fields and remove all but one row with the same triple?
How does the event_bundle_sequence_id field work? Will duplicates have the same value in this field, or different values? That is, are duplicate events sent within the same bundle, or in different bundles?
Is Firebase planning to remove these duplicates earlier in the processing, either for Firebase Analytics itself or in the export to BigQuery?
Standard SQL to check for duplicates in one day's events:
with n_dups as
(
SELECT event_name, event_timestamp, user_pseudo_id, count(1)-1 as n_duplicates
FROM `project.dataset.events_20190610`
group by event_name, event_timestamp, user_pseudo_id
)
select n_duplicates, count(1) as n_cases
from n_dups
group by n_duplicates
order by n_cases desc
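For readers who want to see what this histogram computes, here is the same logic as a small Python sketch on invented rows: count the rows per (event_name, event_timestamp, user_pseudo_id) triple, subtract one to get the number of duplicates, then count how many triples have each duplicate count.

```python
from collections import Counter

# Hypothetical rows: (event_name, event_timestamp, user_pseudo_id)
rows = [
    ("login", 1666529002225262, "u1"),
    ("login", 1666529002225262, "u1"),   # one duplicate of the row above
    ("login", 1666529009000000, "u1"),
    ("logout", 1666529002225262, "u2"),
    ("logout", 1666529002225262, "u2"),  # two duplicates in total
    ("logout", 1666529002225262, "u2"),
]

# Rows per triple, like the GROUP BY in the n_dups subquery.
per_key = Counter(rows)

# n_duplicates -> n_cases, like the outer query.
n_cases = Counter(count - 1 for count in per_key.values())

print(sorted(n_cases.items()))
```

A large share of triples with zero duplicates, plus a small tail with one or more, would match the 1-2% figure mentioned in the question.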
We use the QUALIFY clause to deduplicate Firebase events in BigQuery:
SELECT
*
FROM
`project.dataset.events_*`
QUALIFY
ROW_NUMBER() OVER (
PARTITION BY
user_pseudo_id,
event_name,
event_timestamp,
TO_JSON_STRING(event_params)
) = 1
Qualifying columns:
- name: user_pseudo_id
description: Autogenerated pseudonymous ID for the user -
Unique identifier for a specific installation of application on a client device,
e.g. "938642951.1666427135".
All events generated by that device will be tagged with this pseudonymous ID,
so that you can relate events from the same user together.
- name: event_name
description: Event name, e.g. "app_launch", "session_start", "login", "logout" etc.
- name: event_timestamp
description: The time (in microseconds, UTC) at which the event was logged on the client,
e.g. "1666529002225262".
- name: event_params
description: A repeated record (ARRAY) of the parameters associated with this event.
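The same keep-one-row-per-key idea can be sketched in Python on invented events. Here `json.dumps` stands in for TO_JSON_STRING as a way to turn the parameter array into a comparable string; the two are not guaranteed to serialize identically, and the function name `dedup` is made up for this example.

```python
import json


def dedup(events):
    """Keep the first event for each (user, name, timestamp, params) key."""
    seen = set()
    unique = []
    for e in events:
        key = (
            e["user_pseudo_id"],
            e["event_name"],
            e["event_timestamp"],
            json.dumps(e["event_params"], sort_keys=True),
        )
        if key not in seen:
            seen.add(key)
            unique.append(e)
    return unique


# Hypothetical events; values are illustrative only.
base = {
    "user_pseudo_id": "938642951.1666427135",
    "event_name": "login",
    "event_timestamp": 1666529002225262,
    "event_params": [{"key": "method", "value": {"string_value": "email"}}],
}
events = [
    base,
    dict(base),  # exact duplicate
    dict(base, event_params=[{"key": "method", "value": {"string_value": "sso"}}]),
    dict(base, event_name="logout"),
]

unique = dedup(events)
print(len(events), len(unique))
```

Including the serialized parameters in the key means that two events with the same user, name, and timestamp but different parameters are both kept.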

SQLite: SELECT from grouped and ordered result

I'm new to SQL(ite), so I'm sorry if there is a simple answer that I just couldn't find the right search terms for.
I have 2 tables: one for user information and another holding the points a user has achieved. It's a simple one-to-many relation (a user can achieve points multiple times).
table1 contains "userID" and "Username" ...
table2 contains "userID" and "Amount" ...
Now I want to get the highscore rank for a given username.
To get the highscore I did:
SELECT Username, SUM(Amount) AS total FROM table2 JOIN table1 USING (userID) GROUP BY Username ORDER BY total DESC
How can I select a single Username and get its position in the grouped and ordered result? I have no idea what a subselect would have to look like for this. Is it even possible in a single query?
You cannot calculate the position of the user without referencing the other data. SQLite versions before 3.25.0 do not have a ranking function, which would be ideal for your use case, nor a row-number feature that would serve as an acceptable substitute.
I suppose the closest you could get would be to drop this data into a temp table that has an incrementing ID, but I think you'd get very messy there.
It's best to handle this within the application. Get all the users and calculate rank. Cache individual user results as necessary.
Without knowing anything more about the operating context of the app/DB it's hard to provide a more specific recommendation.
For a specific user, this query gets the total amount:
SELECT SUM(Amount)
FROM Table2
WHERE userID = ?
To get the rank, count how many users have a total at least as high as that single user's (the user is included in the count, so the top user gets rank 1):
SELECT COUNT(*)
FROM table1
WHERE (SELECT SUM(Amount)
FROM Table2
WHERE userID = table1.userID)
>=
(SELECT SUM(Amount)
FROM Table2
WHERE userID = ?);
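A runnable sketch of this counting approach, using Python's built-in sqlite3 module and invented sample data. Since the question asks for the rank by username, the parameter here is resolved through an extra subquery on table1; that lookup and the helper name `rank_of` are adaptations for this example, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (userID INTEGER PRIMARY KEY, Username TEXT);
    CREATE TABLE table2 (userID INTEGER, Amount INTEGER);
    INSERT INTO table1 VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO table2 VALUES (1, 50), (1, 30), (2, 100), (3, 20), (3, 10);
""")

# Count the users whose total is at least as high as the given user's total.
RANK_SQL = """
    SELECT COUNT(*)
    FROM table1
    WHERE (SELECT SUM(Amount) FROM table2 WHERE userID = table1.userID)
          >=
          (SELECT SUM(Amount) FROM table2
           WHERE userID = (SELECT userID FROM table1 WHERE Username = ?))
"""


def rank_of(username):
    """Return the 1-based highscore rank of the given username."""
    return conn.execute(RANK_SQL, (username,)).fetchone()[0]


for name in ("alice", "bob", "carol"):
    print(name, rank_of(name))
```

With ties, every tied user gets the lowest rank of the group, because each of them counts all the others as "at least as high".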
