We use Firebase/Google Analytics in our Android app. Each event is saved with a lot of extra information (user id, device info, timestamps, user properties, geographical location …). The extra info is there by default, but we don't want it to be collected.
We tried 2 things:
1) Update the BigQuery schema
Delete the unwanted columns from BigQuery. Unfortunately, BigQuery creates a new export every day, so the deleted columns come right back; to stop them we would need to know where those fields are coming from, and we don't.
2) Default event parameters within the app
We tried to use default event parameters from inside the app, so the city would always be null. Here is an example with the user's city:
Bundle defaultValues = new Bundle();
// Intended to force the city to null on every logged event.
defaultValues.putString("geo.city", null);
// Register the bundle as default parameters attached to all subsequent events.
FirebaseAnalytics.getInstance(ctx).setDefaultEventParameters(defaultValues);
Unfortunately, we still see geo.city populated in our BigQuery data.
Is there a way of changing what is collected by default?
There is no way to disable the geography information: Analytics uses IP addresses to derive the geolocation of a visitor, so it is filled in server-side and cannot be overridden from the app. Updating the BigQuery schema is probably the viable approach, but you would have to build a system that carries out this update on a daily basis, precisely because the export takes place every day.
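For example, a scheduled daily job with the BigQuery Java client could copy each new export into a sanitized table that drops the unwanted columns. This is a minimal sketch; the project and dataset names (my_project, analytics_123456789, analytics_sanitized) and the columns listed in EXCEPT are assumptions you would adapt:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;

public class ExportSanitizer {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        String day = args[0]; // e.g. "20240101", the suffix of that day's events_ table
        // SELECT * EXCEPT(...) keeps everything but the listed columns.
        String sql = "CREATE OR REPLACE TABLE `my_project.analytics_sanitized.events_" + day + "` AS "
                + "SELECT * EXCEPT(geo, device) "
                + "FROM `my_project.analytics_123456789.events_" + day + "`";
        bigquery.query(QueryJobConfiguration.newBuilder(sql).build());
    }
}

Writing into a second dataset keeps the original export intact while consumers read only the sanitized tables.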
I would appreciate some guidance on how to structure data stored within an app. While there are some reasons to prefer the first option below, I'm concerned it wouldn't be able to operate efficiently for the second use case.
Simplified, the app would contain a list of Places by State. The main use case would be viewing Places within a selected State. The second use case would be letting individual users save specific Places they like into their profile and view them all at once (showing Places from all States in one list).
Option 1- Places saved in a single "places" collection, where each document has a "state" field.
Main use: To show these places by state, the app would query where the "state" field matches the state.
Secondary use: When a user saved a place, the app would save the docID of each place into the user's profile; each of those documents would need to be retrieved to show the list of saved places.
Option 2- Have one collection per state.
Main use: To show these places by state, the app would pull all documents in that state's collection and list them out.
Secondary use: When a user saved a place to their profile, the app would save the docID of each place into the user's profile. These IDs would be spread across the different collections, and each document would need to be retrieved to show the list of saved places.
Goals:
Have the same place document appear in both the State lists and the user's profile.
Minimize the number of calls/slowness as much as possible in the Secondary use case.
I have been reviewing Firestore data storage guidelines, but I would appreciate any thoughts from experienced developers regarding this data structure.
There is no "perfect", "the best" or "the correct" solution for structuring a Firestore database. We are usually structuring the database according to the queries that we intend to perform.
Regarding storing all the places in a single collection vs. having one collection per state, please note that there is no difference in terms of speed or cost: you'll always pay a number of reads equal to the number of documents your query returns. However, if you ever need to display, for example, all places of all states, then having a collection for each state will require a separate query for each state.
Furthermore, regarding saving a list of full place objects in a user's profile vs. storing only their IDs, it's a matter of measurement: you should measure how often the details within the places change. Remember that if a place changes, you have to update that data everywhere it is duplicated. So if places don't change often, you can save the entire place object; otherwise, save only the ID.
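As a minimal sketch of the single-collection approach with the Firestore Android SDK (the "places" collection and "state" field come from the question; the concrete values and IDs are placeholders):

import com.google.firebase.firestore.FieldPath;
import com.google.firebase.firestore.FirebaseFirestore;
import java.util.Arrays;
import java.util.List;

FirebaseFirestore db = FirebaseFirestore.getInstance();

// Main use case: one query, billed one read per returned place.
db.collection("places")
        .whereEqualTo("state", "Colorado")
        .get()
        .addOnSuccessListener(snapshot -> { /* render snapshot.getDocuments() */ });

// Secondary use case: resolve the docIDs saved in the user's profile.
// whereIn() accepts only a limited number of values per query
// (10 in older SDK versions, 30 in newer ones), so chunk longer lists.
List<String> savedIds = Arrays.asList("placeId1", "placeId2"); // hypothetical IDs
db.collection("places")
        .whereIn(FieldPath.documentId(), savedIds)
        .get()
        .addOnSuccessListener(snapshot -> { /* render the saved places */ });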
I'm wondering if there's a possibility to fetch disaggregated, session-level data from Google Analytics using their APIs.
Currently I'm already able to receive quite detailed segmentation by selecting ga:source, ga:dateHourMinute, ga:country and other dimensions, but of course these are still groups of sessions.
Thanks a lot!
Not by default - there is no dimension for sessions in the API, and not even the client id is exposed via the API.
An easy way to obtain a session marker is to store a random number in a session-scoped custom dimension. Since a session-scoped dimension by definition stores only the last value in the session, this will give you a unique (well, not technically unique, but unique enough) value per session, which can be used in conjunction with the client id, which you'd need to store in another custom dimension.
Of course, since this will give you a lot of single rows, you will run into API limits pretty soon.
In a GA360 account you could use BigQuery - the BQ export schema includes session identifiers.
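On Android, the tagging side of the custom-dimension workaround described above might look like the sketch below, using the (legacy) Google Analytics SDK. The dimension indexes (1 for the session marker, 2 for the client id) and the tracker id are assumptions; they must match session- and user-scoped custom dimensions configured in the GA property:

import com.google.android.gms.analytics.GoogleAnalytics;
import com.google.android.gms.analytics.HitBuilders;
import com.google.android.gms.analytics.Tracker;
import java.util.UUID;

// context: your Application or Activity context; "UA-XXXXX-Y" is a placeholder id.
Tracker tracker = GoogleAnalytics.getInstance(context).newTracker("UA-XXXXX-Y");
String sessionMarker = UUID.randomUUID().toString(); // "unique enough" per session
String clientId = tracker.get("&cid"); // store the client id, since the API won't expose it

tracker.send(new HitBuilders.ScreenViewBuilder()
        .setCustomDimension(1, sessionMarker) // session-scoped dimension
        .setCustomDimension(2, clientId)      // user-scoped dimension
        .build());

You would then query the Reporting API with ga:dimension1 and ga:dimension2 alongside your other dimensions to break the rows down per session.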
I thought Datastore's keys were ordered by insertion date, but apparently I was wrong. I need to periodically look for new entities in the Datastore, fetch them, and process them.
Until now, I would simply store the last fetched key and (wrongly) query for anything greater than it.
Is there a way of doing so?
Thanks in advance.
Datastore's automatically generated keys are created with a uniform distribution, in order to spread the load and keep access performant. You will not be able to tell which entities were added last by looking at the keys.
Instead, you can try a couple of different approaches:
Use Pub/Sub and architect your app so that a separate background task consumes the newly added entities: whenever an entity is added to the DB, publish an event to Pub/Sub with its key id, and your event listener (a separate routine) will receive it.
Use key names and generate your own custom names. But since you would be creating sequentially growing names, this will cause a performance hit even on not-so-big ranges of data. You can find more about this in the Google Datastore best practices:
https://cloud.google.com/datastore/docs/best-practices#keys
Add an additional creation-time property and keep using automatic key generation, then query on that property, as in the sketch below.
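A minimal sketch of that last approach with the Cloud Datastore Java client; the "Task" kind and the "createdAt" property name are assumptions:

import com.google.cloud.Timestamp;
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.QueryResults;
import com.google.cloud.datastore.StructuredQuery;

Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

// Write: keep the auto-generated key, but stamp each entity with a creation time.
Entity task = Entity.newBuilder(datastore.allocateId(
                datastore.newKeyFactory().setKind("Task").newKey()))
        .set("createdAt", Timestamp.now())
        .build();
datastore.put(task);

// Read: everything created after the last timestamp you processed.
Timestamp lastSeen = Timestamp.MIN_VALUE; // replace with the value persisted from the previous run
Query<Entity> query = Query.newEntityQueryBuilder()
        .setKind("Task")
        .setFilter(StructuredQuery.PropertyFilter.gt("createdAt", lastSeen))
        .setOrderBy(StructuredQuery.OrderBy.asc("createdAt"))
        .build();
QueryResults<Entity> newEntities = datastore.run(query);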
Let's say I'm making an app with Firebase where users can create permanent lobbies to which they can send permanent, dated messages. The lobby's name is a key in my data structure. What I want is that each time a new lobby is created, an index is automatically created on the server side to sort that lobby's messages by date.
That could probably be done if I had another server listening for the creation of new lobbies, but is there a way to do this without an additional server? Just through the client, without compromising the security of the app?
(Note: I'm using the Unity SDK.)
There is no way to programmatically add an index, short of updating a rules.json file and uploading it with the Firebase tools/CLI, which I'd highly recommend against.
If you find you need to dynamically add indexes, you've probably structured your data wrong. But without seeing a minimal sample of the JSON (as text, no screenshots please) that reproduces the problem, it is impossible to say more than that.
You can use the Push() function on a database reference. This will create a unique key based on the timestamp, so all values can easily be sorted chronologically.
Use Push() any time you need to generate a new unique key in your database. You can use this for the lobby itself and even for the conversations within the lobby, as in the sketch below.
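A minimal sketch with the Android SDK (the "lobbies" path and the message fields are assumptions; the Unity SDK mirrors these calls with Push() and SetValueAsync()):

import com.google.firebase.database.DatabaseReference;
import com.google.firebase.database.FirebaseDatabase;
import java.util.HashMap;
import java.util.Map;

DatabaseReference lobby = FirebaseDatabase.getInstance()
        .getReference("lobbies")
        .child("my-lobby"); // hypothetical lobby name

// push() generates a chronologically ordered key on the client, so no extra
// server and no dynamically created index are needed for date ordering.
Map<String, Object> message = new HashMap<>();
message.put("text", "hello");
message.put("sentAt", System.currentTimeMillis());
lobby.child("messages").push().setValue(message);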
I'm writing a simple WordPress plugin for work and am wondering if using the Transients API is practical in this case, or if I should seek out another way.
The plugin's purpose is simple. I'm making a call to USZip Web Service (http://www.webservicex.net/uszip.asmx?op=GetInfoByZIP) to retrieve data. Our sales team is using a Lead Intake sheet that the plugin will run on.
I wanted to reduce the number of API calls, so I thought of setting a transient for each zip code, with the zip as the key, storing the incoming data (city and zip). If the corresponding data for a given zip code already exists, there is no need to make an API call.
Here are my concerns:
1. After a quick search, I realized that transient data is stored in the wp_options table, and storing this data would balloon that table in no time. Would this cause a significant performance issue if the DB becomes huge?
2. Is it horrible practice to create this many transient keys? They could easily number in the thousands within a few months.
If using transients is not the best way, could you please help point me in the right direction? Thanks!
P.S. I opted for the Transients API vs. the Options API. I know zip codes don't change often, but they sometimes do, so I set an expiration time of 3 months.
A less-inflated solution would be:
Store a single option called uszip with a serialized array inside the option
Grab the entire array each time and simply check if the zip code exists
If it doesn't exist, grab the data and save the whole option again
You should make sure you don't hit the upper bounds of a serialized array in this table (9,000 elements), considering that 43,000 zip codes exist in the US. However, you will most likely be dealing with a very localized subset of zip codes.
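The lookup pattern itself is simple. Here is a minimal sketch, in Java to match the other examples here, although the actual plugin would use PHP's get_option()/update_option(); ZipCache and fetchFromUszip are hypothetical names:

import java.util.HashMap;
import java.util.Map;

public class ZipCache {
    // Stand-in for the single serialized "uszip" option.
    private final Map<String, String> cache = new HashMap<>();

    public String cityForZip(String zip) {
        String city = cache.get(zip); // check the stored array first
        if (city == null) {
            city = fetchFromUszip(zip); // web-service call only on a cache miss
            cache.put(zip, city);       // save the whole structure back
        }
        return city;
    }

    private String fetchFromUszip(String zip) {
        // Placeholder for the GetInfoByZIP web-service call.
        return "UNKNOWN";
    }
}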