Google Cloud Datastore avoid entity overhead when fetching - google-cloud-datastore

Is it possible to avoid the entity overhead when fetching data from Cloud Datastore? Ideally I'd love to have SQL-like results: simple arrays/objects with key:value pairs instead of
{
"batch": {
"entityResultType": "FULL",
"entityResults": [
{
"entity": {
"key": {
"partitionId": {
"datasetId": "e~******************"
},
"path": [
{
"kind": "Answer",
"id": "*******************"
}
]
},
"properties": {
"value": {
"stringValue": "12",
"indexed": false
},
"question": {
"integerValue": "120"
}
}
}
}
],
"endCursor": "********************************************",
"moreResults": "MORE_RESULTS_AFTER_LIMIT",
"skippedResults": null
}
}
which has just too much overhead for me (I plan on running queries over thousands of entities). I couldn't find in the docs whether that's possible or not.

You can use projection queries to query for specific entity properties.
In your example, you could use a projection query so that only the properties you need (e.g. value and question) are returned for each entity.
The other fields are part of the API's data representation and are required to be returned; however, many of them (e.g. endCursor) are only returned once per query batch, so the overhead is low when there are many results.
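If you go through a client library rather than the raw JSON API, the projection is set on the query itself. A minimal sketch with the Node.js client (@google-cloud/datastore), reusing the Answer kind and property names from the example above; note that only indexed properties can be projected, so a property stored with "indexed": false would first need to be indexed:
// Sketch: projection query with the Node.js Datastore client.
const {Datastore} = require('@google-cloud/datastore');
const datastore = new Datastore();

async function fetchAnswers() {
  const query = datastore
    .createQuery('Answer')
    .select(['question', 'value']) // projection: only these properties come back
    .limit(1000);
  const [rows] = await datastore.runQuery(query);
  // rows is a plain array of objects, e.g. [{ question: 120, value: '12' }, ...]
  return rows;
}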

Related

Discrepancy between UI and API for transactions/purchase events for GA4

My goal is to pull in purchase transaction count and transaction revenue from a client, segmented by Google Ads campaign ID. My current query looks like the following:
{
"propertyId": "*********",
"query": {
"dateRanges": [
{
"startDate": "2022-09-30",
"endDate": "2022-10-06"
}
],
"dimensions": [
{
"name": "googleAdsCampaignId"
},
{
"name": "googleAdsCampaignName"
}
],
"metrics": [
{
"name": "advertiserAdClicks"
},
{
"name": "advertiserAdCost"
},
{
"name": "transactions"
},
{
"name": "purchaseRevenue"
}
],
"dimensionFilter": {
"andGroup": {
"expressions": [
{
"filter": {
"fieldName": "googleAdsCustomerId",
"stringFilter": {
"matchType": "EXACT",
"value": "*********",
"caseSesnsitive": false
}
}
}
]
}
}
}
}
What I expect when querying the 'transactions' metric is, as the API schema describes, "The count of transaction events with purchase revenue. Transaction events are in_app_purchase, ecommerce_purchase, purchase, app_store_subscription_renew, app_store_subscription_convert, and refund."
The response from my query comes back with these numbers, for an example campaign:
{
"dimensionValues": [
{
"value": "***********",
"oneValue": "value"
},
{
"value": "Example Campaign",
"oneValue": "value"
}
],
"metricValues": [
{
"value": "2482480",
"oneValue": "value"
},
{
"value": "6492393600000",
"oneValue": "value"
},
{
"value": "331",
"oneValue": "value"
},
{
"value": "31374.205645000002",
"oneValue": "value"
}
]
}
However, if, in the GA4 dashboard, I attempt to view a report of purchase conversions by campaign over the same date range, this is what is displayed for the 'example campaign':
6 Example Campaign 239.47 31,981.63
Where 239.47 is the number of transactions, and 31,981.63 is the event value (transaction revenue). Notably, the transactions are off by over 25%. The revenue/event value is similar but also off by a slight amount. This is consistent for all campaigns under the client, with the API response being significantly (but by varying percentages) higher. The dashboard value is always lower. These numbers don't change if the report is run on different dates.
Additionally, I suspected that there may be some additional event being tracked under the transactions API field that was not displaying in the dashboard, so I also tried adding this filter to my query:
{
"filter": {
"fieldName": "eventName",
"stringFilter": {
"matchType": "EXACT",
"value": "purchase",
"caseSesnsitive": false
}
}
}
The transactions field still came back as 331.
What I want to figure out is whether I'm querying the wrong field, the frontend is under-reporting data, or the API is over-reporting data. I found that I was not able to post on the official GA4 issue tracker, so I've come here.
Someone better versed in GA4 at my company explained that the issue here had to do with dimension scopes. The transactions field is session-scoped, but the dimensions I was pulling in were event-scoped, which means the numbers being output were basically meaningless. In fact, the dashboard won't let you create a free-form Explore report using the combination of fields I used in my query, explaining that the metrics are incompatible. It appears to just be a bug that the API allows it.
As an alternative, I'm now pulling the conversions field, with a filter on eventName as listed above, including only purchase events.
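For reference, a rough sketch of that alternative request with the Node.js Data API client (@google-analytics/data); the property ID is a placeholder, and the googleAdsCustomerId filter from the original query could be added back into the andGroup:
// Sketch: conversions metric filtered to purchase events.
const {BetaAnalyticsDataClient} = require('@google-analytics/data');
const analyticsData = new BetaAnalyticsDataClient();

async function purchaseConversionsByCampaign() {
  const [response] = await analyticsData.runReport({
    property: 'properties/000000000', // placeholder property ID
    dateRanges: [{startDate: '2022-09-30', endDate: '2022-10-06'}],
    dimensions: [{name: 'googleAdsCampaignId'}, {name: 'googleAdsCampaignName'}],
    metrics: [{name: 'conversions'}], // compatible with event-scoped dimensions, per the explanation above
    dimensionFilter: {
      andGroup: {
        expressions: [
          {filter: {fieldName: 'eventName', stringFilter: {matchType: 'EXACT', value: 'purchase'}}},
        ],
      },
    },
  });
  return response.rows;
}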

Non Recursive Index CosmosDb

As a CosmosDB (SQL API) user I would like to index all non-object, non-array properties inside of an object.
By default the Cosmos index path /* will index every property; our data set is getting extremely large (and expensive), and this strategy is no longer optimal. We store our metadata at the root and our customer data wrapped inside an object property, data.
Our platform restricts queries on the data path to value-type properties, which means that indexing the objects and arrays nested under the data path just slows down writes and costs RUs to store, but the index is never used.
I have tried several iterations of index policies but cannot find one that fits. Example:
{
"partitionKey": "f402a704-19bb-4f4d-93e6-801c50280cf6",
"id": "4a7a11e5-00b5-4def-8e80-132a8c083f24",
"data": {
"country": "Belgium",
"employee": 250,
"teammates": [
{ "name": "Jake", "id": 123 ...},
{ "name": "kyle", "id": 3252352 ...}
],
"user": {
"name": "Brian",
"addresses": [{ "city": "Moscow" ...}, { "city": "Moscow" ...}]
}
}
}
In this case I want to only index the root properties as well as /data/employee and /data/country.
Policies like /data/* will not work because they would then index /data/teammates/name ... and so on.
/data/? => assumes data is a value type which it never will be so this doesn't work.
/data/ and /data/*/? and /data/*? are not accepted by cosmos as valid policies.
Additionally, I can't simply exclude /data/teammates/ and /data/user/, because what is inside data is completely dynamic; while that might cover this use case, there are several hundred thousand others that it would not.
I have tried many iterations, but the options don't work for various reasons. Is there a way to support what I am trying to do?
This indexing policy will index the properties you are asking for.
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/partitionKey/?"
},
{
"path": "/data/country/?"
},
{
"path": "/data/employee/?"
}
],
"excludedPaths": [
{
"path": "/*"
}
]
}
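If you manage containers from code, the policy can be applied with a container replace. A rough sketch with the JavaScript SDK (@azure/cosmos); the database and container names are placeholders:
// Sketch: apply the indexing policy above via @azure/cosmos.
const { CosmosClient } = require('@azure/cosmos');
const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING);

async function applyIndexingPolicy() {
  const container = client.database('my-db').container('my-container');
  const { resource: definition } = await container.read();

  definition.indexingPolicy = {
    indexingMode: 'consistent',
    automatic: true,
    includedPaths: [
      { path: '/partitionKey/?' },
      { path: '/data/country/?' },
      { path: '/data/employee/?' },
    ],
    excludedPaths: [{ path: '/*' }],
  };

  await container.replace(definition); // the index transformation then runs in the background
}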

Google Firestore - REST API without pagination

I've set up a database in Google Firebase - Firestore and already uploaded just about 500 documents into a single collection.
Now, I've also figured out the REST API to read "all" of the documents from that collection:
GET https://firestore.googleapis.com/v1/projects/my-project/databases/(default)/documents/my-collection
This request returns the following response structure:
{
"documents": [
{
"name": "projects/my-project/databases/(default)/documents/my-collection/3D-Technologien7a69809a-dc97-4663-a661-2df3f6f034dd",
"fields": {
"start": {
"stringValue": "28.10.2021"
},
"location": {
"stringValue": "Hb"
},
"name": {
"stringValue": "3D-Technologien"
},
"end": {
"stringValue": "30.10.2021"
}
},
"createTime": "2020-10-09T16:22:53.865356Z",
"updateTime": "2020-10-09T16:22:53.865356Z"
}
],
"nextPageToken": "AFTOeJymPSjChtwIgEMwViyHTAnXjWr_wNAQ7t8h9ZioAI-tJRX7WxdS7TBCeHvREsNgmEGOSLtuDI_YCjIWU2BdlT1-QDoIPHtG4QCxxcjJaSB5NRA-o7smLhr6Pxil0tPUlLBdiLwyjOD-c7qm7QNTR9SqL7HNI3uNKbZi7MG3ng-WMLpoN6MWZ7uGjQGpgfysbwk"
}
It's pretty clear that the response is paginated, but I can't find any documentation on this and, even worse, on how to adjust it. Currently, at most 20 documents are returned per request. I want all 500 to be returned.
Is there any configuration option for this?
According to the docs, you can use the query params pageSize and pageToken:
?pageSize=20&pageToken=ABCDEF1234567890
https://cloud.google.com/firestore/docs/reference/rest/v1beta1/projects.databases.documents/list
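If the collection is larger than a single page allows, you still have to loop, passing each response's nextPageToken back in. A rough sketch (the project, collection, page size, and auth token are placeholders):
// Sketch: page through documents.list until nextPageToken is no longer returned.
async function listAllDocuments(accessToken) {
  const base = 'https://firestore.googleapis.com/v1/projects/my-project/databases/(default)/documents/my-collection';
  const documents = [];
  let pageToken;

  do {
    const url = new URL(base);
    url.searchParams.set('pageSize', '300'); // placeholder page size
    if (pageToken) url.searchParams.set('pageToken', pageToken);

    const res = await fetch(url, { headers: { Authorization: `Bearer ${accessToken}` } });
    const body = await res.json();

    documents.push(...(body.documents || []));
    pageToken = body.nextPageToken; // absent on the last page
  } while (pageToken);

  return documents;
}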

Redux + Normalizr : Adding and deleting normalized entities in Redux state

I have an API response which has a lot of nested entities. I use normalizr to keep the Redux state as flat as possible. For example, the API response looks like this:
{
"id": 1,
"docs": [
{
"id": 1,
"name": "IMG_0289.JPG"
},
{
"id": 2,
"name": "IMG_0223.JPG"
}
],
"tags": [
{
"id": "1",
"name": "tag1"
},
{
"id": "2",
"name": "tag2"
}
]
}
This response is normalized with normalizr using the schema given below:
const OpeningSchema = new schema.Entity('openings', {
tags: [new schema.Entity('tags')],
docs: [new schema.Entity('docs')]
});
and below is how it looks then:
{
result: "1",
entities: {
"openings": {
"1": {
"id": 1,
"docs": [1,2],
"tags": [1,2]
}
},
"docs": {
"1": {
id: "1",
"name": "IMG_0289.JPG"
},
"2": {
id: "2",
"name": "IMG_0223.JPG"
}
},
"tags": {
"1": {
"id": 1,
"name": "tag1"
},
"2": {
"id": 2,
"name": "tag2"
}
}
}
}
The redux state now looks something like below:
state = {
"opening" : {
id: 1,
tags: [1,2],
docs: [1,2]
},
"tags": [
{
"id":1,
"name": "tag1"
},
{
"id":2,
"name": "tag2"
}
],
"docs": [
{
"id":1,
"name": "IMG_0289.JPG"
},
{
"id":2,
"name": "IMG_0223.JPG"
}
]
}
Now if I dispatch an action to add a tag, it adds a tag object to state.tags but doesn't update the state.opening.tags array. The same happens when deleting a tag.
I keep opening, tags and docs in three different reducers.
This is an inconsistency in the state. I can think of the following ways to keep the state consistent:
Dispatch an action to update tags, listen for it in both the tags reducer and the opening reducer, and update tags in both places.
The PATCH request that updates the opening with tags returns the opening response. I can again dispatch the action that normalizes the response and sets tags, opening, etc. consistently.
What is the right way to do this? Shouldn't the entities observe changes to related entities and update themselves? Or are there other patterns that could be followed for such actions?
First, to summarise how normalizr works: it flattens a nested API response into entities defined by your schemas. So when you made your initial GET openings API request, normalizr flattened the response and created your Redux entities: openings, docs, tags.
Your suggestions are viable, but I find normalizr's real benefit to be in separating API data from UI state, so I don't update the data in the Redux store myself. All my API data are kept in entities and are not altered by me; they are vanilla back-end data. All I do is issue a GET after state-changing API operations and normalise the GET response. (There is a small exception for the DELETE case that I'll expand on below.) A middleware will deal with such cases, so you should use one if you aren't already. I created my own middleware, but I know redux-promise-middleware is quite popular.
In your data set above, when you add a new tag, I assume you are making an API POST, which in turn updates the back-end. Then you should do another GET openings, which will update the entities for openings and all its nested schemas.
When you delete a tag, e.g. tag[2], upon sending the DELETE request to the back-end you should nullify the deleted object in your entities state, i.e. entities.tags[2] = null, before making the GET openings request again to update your normalizr entities.
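To make that concrete, here is a rough sketch of an entities reducer following this approach; the action types and payload shapes are illustrative, not from the original post:
// Sketch: entities reducer that merges normalized GET responses and nullifies deleted tags.
const initialState = { openings: {}, docs: {}, tags: {} };

function entities(state = initialState, action) {
  switch (action.type) {
    // Dispatched by the middleware after GET /openings, with the output of
    // normalize(response, [OpeningSchema]) as the payload.
    case 'OPENINGS_FETCH_SUCCESS': {
      const { openings, docs, tags } = action.payload.entities;
      return {
        openings: { ...state.openings, ...openings },
        docs: { ...state.docs, ...docs },
        tags: { ...state.tags, ...tags },
      };
    }

    // Dispatched when DELETE /tags/:id succeeds; null out the entity locally,
    // then re-fetch openings to bring the id references back in sync.
    case 'TAG_DELETE_SUCCESS':
      return { ...state, tags: { ...state.tags, [action.payload.id]: null } };

    default:
      return state;
  }
}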

How to form inner SubQuery in Gremlin Server (Titan 1.0)?

I'm using the following query:
g.V(741440).outE('Notification').order().by('PostedDateLong', decr).range(0,1).as('notificationInfo').match(
__.as('notificationInfo').inV().as('postInfo'),
).select('notificationInfo','postInfo')
It is giving the following result:
{
"requestId": "9846447c-4217-4103-ac2e-de3536a3c62a",
"status": {
"message": "",
"code": ​200,
"attributes": { }
},
"result": {
"data": [
{
"notificationInfo": {
"id": "c0zs-fw3k-347p-g2g0",
"label": "Notification",
"type": "edge",
"inVLabel": "Comment",
"outVLabel": "User",
"inV": ​749664,
"outV": ​741440,
"properties": {
"ParentPostId": "823488",
"PostedDate": "2016-05-26T02:35:52.3889982Z",
"PostedDateLong": ​635998269523889982,
"Type": "CommentedOnPostNotification",
"NotificationInitiatedByVertexId": "1540312"
}
},
"postInfo": {
"id": ​749664,
"label": "Comment",
"type": "vertex",
"properties": {
"PostImage": [
{
"id": "amto-g2g0-2wat",
"value": ""
}
],
"PostedByUser": [
{
"id": "am18-g2g0-2txh",
"value": "orbitpage#gmail.com"
}
],
"PostedTime": [
{
"id": "amfg-g2g0-2upx",
"value": "2016-05-26T02:35:39.1489483Z"
}
],
"PostMessage": [
{
"id": "aln0-g2g0-2t51",
"value": "hi"
}
]
}
}
}
],
"meta": { }
}
}
I want to get the information of the vertex referenced by "NotificationInitiatedByVertexId" (an edge property) in the response as well.
For that I tried the following query:
g.V(741440).outE('Notification').order().by('PostedDateLong', decr).range(0,2).as('notificationInfo').match(
__.as('notificationInfo').inV().as('postInfo'),
g.V(1540312).next().as('notificationByUser')
).select('notificationInfo','postInfo','notificationByUser')
Note: I tried the vertex ID directly in the subquery, as I wasn't aware how to dynamically get the value from the edge property within the query itself.
It is giving an error. I have tried a lot but am not able to find a solution.
I'm assuming that you are storing a Titan-generated identifier in that edge property called NotificationInitiatedByVertexId. If so, please consider the following, even though this first part doesn't really answer your question. I don't think you should store a vertex identifier on the edge. Your graph model should explicitly track the relationship of NotificationInitiatedBy with an edge, and by storing the identifier of the vertex on the edge itself you are bypassing that. Also, if you ever have to migrate your data in some way, the ids won't be preserved (Titan will generate new ones) and trying to sort that out will be a mess.
Even if that is not a Titan-generated identifier but a logical one you created, I would still look to adjust your graph schema and promote that Notification to a vertex. Then your Gremlin traversals would flow more easily.
Now, assuming you don't change that, I don't see a reason not to just issue two queries in the same request and combine the results into one data structure. You just need to do a lookup with the vertex id, which is going to be pretty fast and inexpensive:
edgeStuff = g.V(741440).outE('Notification').
order().by('PostedDateLong', decr).range(0,1).as('notificationInfo').
... // whatever logic you have
select('notificationInfo','postInfo').next()
vertexStuff = g.V(edgeStuff.get('notificationInfo').value('NotificationInitiatedByVertexId')).next()
[notificationInitiatedBy: vertexStuff, notification: edgeStuff]
