Difference ClientId - FullVisitorId - google-analytics

The documentation of fullVisitorId states that it is "the unique visitor ID (also known as client ID)". However, when I compare the clientId and the fullVisitorId in BigQuery, the two values are structured differently. One question stated that the fullVisitorId is a hashed version of the clientId, but I cannot find any official documentation stating that.
My question is: why are there two parameters carrying the same information, and if they are not the same, what is the difference?

They now also added clientId to the export schema. There it says:
Unhashed version of the Client ID for a given user associated with any given visit/session.
In the measurement protocol reference they state:
This field is required if User ID (uid) is not specified in the request. This anonymously identifies a particular user, device, or browser instance. For the web, this is generally stored as a first-party cookie with a two-year expiration. For mobile apps, this is randomly generated for each particular instance of an application install. The value of this field should be a random UUID (version 4) as described in http://www.ietf.org/rfc/rfc4122.txt.
So, it's randomly generated - for web it's a cookie, for app it's set per install (or if the IDFA changes).
In the User ID documentation they confirm this randomization by contrasting the two IDs. About the client ID it says:
Randomly generated and automatically sent with all hits by Analytics libraries.
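As a minimal sketch of what "a random UUID (version 4)" means in practice, the snippet below generates a value in the format the Measurement Protocol reference recommends for the client ID. The function name new_client_id is a hypothetical helper, not something from the Analytics libraries.

```python
import uuid


def new_client_id() -> str:
    """Return a random UUID (version 4) as a string.

    This is only the format the Measurement Protocol reference recommends
    for the client ID; the helper name is made up for this example.
    """
    return str(uuid.uuid4())


cid = new_client_id()
print(cid)
```

A web client would normally persist this value (for example in a first-party cookie) so the same ID is sent with every hit from that browser.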
The hashing algorithm is not known. But in principle the only difference between the two is the hashing.
However, since July 17, 2018 you can apply the hashing algorithm to your own client IDs using the hashClientId method provided by the Management API. The API takes a client ID and a web property ID. Although requiring the web property ID could suggest that it is used to salt the hash, that does not appear to be the case: the resulting fullVisitorId is the same across different properties (tested using the Management API).
This basically means that they originally did not want you to connect data using the client ID, so they pseudonymized it. Now they allow it, and you can connect data from different sources by either
creating the fullvisitorid using the Management API (for older data)
or using the client id from the new field (for newer data)

Related

Firebase custom authentication how to choose unique user ID?

Firebase is great as it offers a lot of authentication providers. In one of my apps, I use four different providers provided by Firebase (Email, Twitter, Facebook and Google), but I also need to let users sign in via LinkedIn.
As the Firebase SDK does not offer LinkedIn, I need to implement the login flow manually, which doesn't seem to be difficult, but there is one huge issue that I see. During the creation of a custom JWT token, I need to assign a user ID. And I have no idea how to generate one while making sure that my approach will not conflict with the user IDs which Firebase generates on its own for other providers.
For example, let's imagine that a user Andriy Gordiychuk signs in via LinkedIn and his email address is andriy#gordiychuk.com. A simple way to create a user ID would be to take an email address (andriy#gordiychuk.com) and to randomise it using some hashing function. I would get some random id such as aN59nlphs... which I would be able to recreate as long as the same user signs in. So far, so good.
However, how can I be sure that the ID which I get is not already used by another user who signed in via Twitter, for example?
One way to mitigate this issue is to store LinkedIn user IDs in a Firestore collection. Then, when I need to create a token, I first check whether I already have an ID for this user. If not, I would hash the email address, and I would try to create a user with this ID. If this ID is already occupied, I would then try to create another ID until I stumble upon an ID which is not occupied, and I would then use it.
I don't like this approach for two reasons:
Although the chance that I would generate an already occupied ID is small, theoretically the process of finding an "available ID" can take a lot of steps (an infinite loop in a worst-case scenario).
Once I find an available ID, I must store it. Given that all these calls are asynchronous there is a real chance that I would create a user with a suitable ID, but because the save operation fails, I would not be able to use this ID.
So, does anyone know how to choose user IDs for such use case correctly?
It's fairly common to generate a string with enough entropy (randomness) to statistically guarantee it will never be duplicated. This is for example behind the UUID generators that exist in many platforms, and similarly behind Firebase Realtime Database's push keys, and Cloud Firestore's add() keys. If there's one in your platform, I recommend starting with that.
Also see:
The 2^120 Ways to Ensure Unique Identifiers, which explains how Firebase Realtime Database's push() works.
Universally unique identifier, Version 4 on Wikipedia
the uuid npm module
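A minimal sketch of the high-entropy approach described above: generate a random UUID once per LinkedIn user and use it as the uid of the custom token. The helper name new_custom_uid and the "linkedin" prefix are assumptions made for illustration; the prefix only keeps custom IDs visually distinct, and the length check reflects the 1 to 128 character limit Firebase documents for custom token uids.

```python
import uuid


def new_custom_uid(prefix: str = "linkedin") -> str:
    """Build a uid from a hypothetical provider prefix plus a random UUID4.

    uuid4() carries 122 random bits, so a collision with another ID is
    statistically negligible and no retry loop is needed.
    """
    uid = f"{prefix}:{uuid.uuid4().hex}"
    # Assumption: custom token uids must be 1 to 128 characters long.
    if not 1 <= len(uid) <= 128:
        raise ValueError("uid must be between 1 and 128 characters")
    return uid


uid = new_custom_uid()
print(uid)
```

Because the value is random rather than derived from the email address, the mapping from the LinkedIn account to the generated uid still has to be stored on first sign-in and looked up on later ones.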

Firebase browser key API restrictions

When creating a new project Firebase generates browser API keys automatically in the GCP API credentials. This is the same API key that is set in the Firebase Web client SDKs and is publicly available.
By default the key has no restrictions, so it's prone to quota stealing for every API enabled for that project. Surprisingly I have not found information about securing this key in the Firebase documentation.
So I took these extra steps to secure the key:
1. Added HTTP referrer restriction to allow requests from my domain only.
2. Added Identity Toolkit API to the list of allowed APIs. Experimentally I've figured out that it's enough for Firebase Auth and Firestore to work.
3. Added Token Service API. This is needed for refresh tokens to work and keep the authentication.
My question is mostly related to points #2-3. What are the APIs that need to be enabled for various components of Firebase to work on the web?
I also enabled those same two APIs, but I used the Metrics Explorer to see what the various Firebase-created keys had been using based on actual traffic.
In GCP,
Go to Monitoring -> Metrics Explorer
Click 6W in the time range above the graph
Resource Type, start typing consumed_api and select it
Metric, choose Request Count
Group By, type credential_id, select it, then type service, and select it
Aggregator, select sum
By now, the legend for the graph should list all the credential ids and which services they used in the last 6 weeks. You should be able to figure out the APIs from the service.
You can use Filter to filter by credential_id if the results are too noisy.
By default the key has no restrictions, so it's prone to quota stealing for every API enabled for that project.
This is indeed possible, and I am able to make, for example, a Google Maps API call with the auto-generated Firebase API key.
Such preconfigured behaviour was certainly unexpected and I am now experimenting with the restrictions as per the extra steps described in the original question.

Erase firebase instance with app_instance_id from BigQuery

It is possible to erase a customer with InstanceID to comply with GDPR: https://godoc.org/firebase.google.com/go/iid#Client.DeleteInstanceID
However we do not have historical Firebase Instance IDs. BigQuery has a field app_info.app_instance_id but this is not a valid instance ID.
Is it possible to erase a customer with app_instance_id?
An app instance ID identifies (as its name implies) an app instance. It does not identify a specific user. While it is quite common to associate IIDs with users, Firebase has nothing built in for that. This means that, unless you have the data in your database, there is no way to find out the associated IIDs for a user by calling the API.

Tracking userId across multiple Alexa skills

If I have created multiple Alexa skills, is there a userId that would remain the same across all the skills? Specifically, if a user performs an action in Skill 1, I'd like to be aware of it in Skill 2 and Skill 3, and essentially allow the skills to share the same DynamoDB table.
Ideally I wouldn't require the user to do any sort of login, but it would know it's the same user based on a unique identifier tied to their Amazon account.
No. About a year ago Amazon made specific changes to prevent you from doing that. You also can't identify a user who uninstalls your skill and then reinstalls it. You always get a new random user id.
The same thing has been happening for mobile development: Google and Apple are blocking access to anything that would allow you to ID a physical device, or to ID a user of different app installs without doing some sort of account linking - so I doubt Amazon is going to relax about this.
According to this part of the documentation, it should be possible by using the deviceId instead of userId:
device
An object providing information about the device used to send the request. The device object contains both deviceId and supportedInterfaces properties. The deviceId property uniquely identifies the device. [...]
You find it as:
deviceId = this.event.context.System.device.deviceId
It is the same value you need to use for requesting the device address with the new address API. Hence, I don't see why it shouldn't work for your needs as well.
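For a skill backend that receives the raw request as JSON, the same lookup can be sketched in Python. The path context.System.device.deviceId comes from the documentation excerpt above; the helper name get_device_id and the sample event (including its placeholder device ID) are made up for illustration.

```python
from typing import Optional


def get_device_id(event: dict) -> Optional[str]:
    """Return context.System.device.deviceId from a request, or None if absent."""
    try:
        return event["context"]["System"]["device"]["deviceId"]
    except (KeyError, TypeError):
        # The request did not carry a device object in the expected place.
        return None


# Hypothetical, heavily trimmed request body used only for this example.
sample_event = {
    "context": {
        "System": {
            "device": {
                "deviceId": "amzn1.ask.device.EXAMPLE",
                "supportedInterfaces": {},
            }
        }
    }
}

print(get_device_id(sample_event))
```

Note that this identifies a device rather than a person, so several people sharing one device would be treated as the same key.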

Google Measurement Protocol Client ID Server Side

When my server gets a request via a web service, I want to send a pageview hit to my Google Analytics property.
The Measurement Protocol requires a client ID; in this case I want to use the server itself as the client.
Should I just fake the data, like &cid=1, or is there another way?
The client ID is there to aggregate interactions into sessions and users. If you have no need to separate visits into different users, you can generate a static client ID and use that. You will then have only a single user, which makes the reports look a bit strange, but as far as the technology goes there is nothing wrong with simply using a random integer. The recommended format is a UUID, but that is only to avoid situations where different users get the same ID, which does not seem to be an issue here.
However, if your web service users send along a token that is unique per user, I suggest you use that (or rather a hash value based on it). This will allow you, among other things, to see how many different users use your web service.
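A minimal sketch of the "hash value based on the token" idea: derive a stable, UUID-shaped client ID from the per-user token and build a pageview payload from it. The parameter names v, tid, cid, t and dp are the standard Measurement Protocol fields; the namespace, the helper names, the tracking ID placeholder and the sample token are assumptions. The derived value is a name-based (version 5) UUID rather than the random version 4 the reference recommends, which trades randomness for repeatability.

```python
import uuid
from urllib.parse import urlencode

# Hypothetical namespace; any fixed UUID works as long as it never changes.
CID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def client_id_from_token(token: str) -> str:
    """Derive a stable UUID (version 5, SHA-1 based) from a per-user token."""
    return str(uuid.uuid5(CID_NAMESPACE, token))


def build_pageview_payload(tracking_id: str, token: str, page_path: str) -> str:
    """Return a URL-encoded Measurement Protocol pageview payload."""
    params = {
        "v": "1",                            # protocol version
        "tid": tracking_id,                  # property / tracking ID
        "cid": client_id_from_token(token),  # hashed per-user client ID
        "t": "pageview",                     # hit type
        "dp": page_path,                     # document path
    }
    return urlencode(params)


payload = build_pageview_payload("UA-XXXXX-Y", "user-token-123", "/api/orders")
print(payload)
```

The resulting string would be sent as the body of a POST to the Measurement Protocol collect endpoint; the same token always maps to the same cid, so hits from one user are grouped together without exposing the token itself.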
