Looking at datastore field that is not indexed - google-cloud-datastore

I have an id field that is indexed and a boolean field x that is not indexed. Is there any way to view all the entities with x set to true without the following?
having a set of ids to filter by
scrolling through the UI page by page

Unfortunately, no. Cloud Datastore requires an index on a property in order to query against it. You could, however, write a script to generate the list of IDs. For example, in Python:
from google.cloud import datastore

client = datastore.Client()
query = client.query(kind='foo')
results = list(query.fetch())
for entity in results:
    # Filter client-side, since the unindexed property cannot be queried
    if entity.get('x') is True:
        print('Entity {} with id {} has x = True'.format(entity.key, entity['id']))
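If you expect to run this query again, one option is to make x indexed going forward. Here is a minimal sketch, assuming x was originally written with exclude_from_indexes and reusing the kind foo from above: index entries are rebuilt on write, so re-saving each entity without the exclusion re-indexes the property, after which a normal filter works.

from google.cloud import datastore

client = datastore.Client()

# Re-put each entity without exclude_from_indexes so 'x' gets indexed.
for old in client.query(kind='foo').fetch():
    fresh = datastore.Entity(key=old.key)  # no properties excluded
    fresh.update(old)
    client.put(fresh)

# Once re-indexed, filtering server-side works:
query = client.query(kind='foo')
query.add_filter('x', '=', True)
for entity in query.fetch():
    print(entity.key)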

Related

Firebase query subcollections by documentId

I have a doc tree which looks like:
connections(collection):
  connection1(docId):
    ...fields...
    intentItems(subcollection):
      intentItem1(docId):
        ...fields...
      intentItem2(docId):
        ...fields...
  connection2(docId):
    ...fields...
    intentItems(subcollection):
      intentItem3(docId):
        ...fields...
      intentItem4(docId):
        ...fields...
As an admin, I want to read an intentItem by id without knowing its connection id.
I tried:
import { doc, documentId, collectionGroup, query, where } from "firebase/firestore";

// FirebaseError: Invalid query. When querying a collection group by documentId(), the value provided must result in a valid document path, but 'XXXX' is not because it has an odd number of segments (1).
query(collectionGroup(db, Collections.intentItems), where(documentId(), "==", term))

// This returns empty ('**' used as a wildcard in rules):
useDocument(doc(db, Collections.connections, '**', Collections.intentItems, term))
Do I really need to index an id field in the intentItems if I want to get them back as a result?
Each entry in a Firestore index has to be unique. For indexes on a single collection, Firestore automatically adds the document ID to ensure this. But in a collection group, multiple documents might have the same ID. That's why a collection group index actually stores the entire document path instead of just the document ID.
So that means this code won't work:
where(documentId(), "==", term)
As a workaround, I recommend storing the document ID in the document itself too, creating a separate index on that field, and then filtering on that field rather than on the built-in documentId().
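A rough sketch of that workaround with the Python client (the field name docId is an assumption, and you still have to enable a collection-group scope index for that field, e.g. via a single-field index exemption):

from google.cloud import firestore

db = firestore.Client()

# On write, store the intentItem's own ID in a regular field.
conn_ref = db.collection('connections').document('connection1')
item_ref = conn_ref.collection('intentItems').document()
item_ref.set({'docId': item_ref.id, 'name': 'example'})

# Admin lookup by ID across every connection:
term = item_ref.id
docs = db.collection_group('intentItems').where('docId', '==', term).stream()
for snap in docs:
    print(snap.reference.path, snap.to_dict())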

Why is project id attached to my Datastore Key object?

For some inexplicable reason, my project id is attached to the Key of my User entity:
<Key('User', 5703358593630208), project=my-project-id>
This is giving me issues, such as when I try to use this same key as an ancestor of another entity; I get this error:
google.cloud.ndb.exceptions.BadValueError: Expected Key instance, got <Key('User', 5703358593630208), project=my-project-id>
I created the User entity like this:
import datetime

from google.cloud import datastore

datastore_client = datastore.Client()

def save_user(name):
    key = datastore_client.key('User')
    user = datastore.Entity(key=key)
    user.update({
        'name': name,
        'created': datetime.datetime.utcnow()
    })
    datastore_client.put(user)
Additional example: making an ancestor query
query = MyEntity.query(ancestor=user_key)
TypeError: ancestor must be a Key; received <Key('User', 5752652897976320), project=my-project-id>
What could be the explanation for this?
I believe the issue is that you are mixing google.cloud.datastore and the NDB library, and their key objects are not compatible. Here's an example of converting a Datastore client key to an NDB key:
from google.cloud import datastore
from google.cloud import ndb

# Start with a google.cloud.datastore key
datastore_client = datastore.Client()
datastore_key = datastore_client.key('Parent', 'foo', 'User', 1234)

def key_to_ndb_key(key):
    # Use the flat_path property to create an ndb.Key
    key_path = key.flat_path
    ndb_key = ndb.Key(*key_path)
    return ndb_key

# Convert to an ndb key (NDB calls must run inside a client context)
ndb_client = ndb.Client()
with ndb_client.context():
    ndb_key = key_to_ndb_key(datastore_key)
    print(ndb_key)
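With the converted key, the ancestor query from the question should then be accepted. A short sketch, reusing the question's MyEntity model inside the same NDB context:

with ndb_client.context():
    ancestor_key = key_to_ndb_key(datastore_key)
    # No more BadValueError: the query now receives an ndb.Key
    results = MyEntity.query(ancestor=ancestor_key).fetch()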
Entities are partitioned into subsets, currently identified by a project ID and namespace ID, which is why the project ID shows up in the key's repr.
For more background, see the Google documentation.

Boto3 dynamodb query on a table with 10Gb size

I have been trying to fetch all the records from one of my GSIs and have seen that there is an option to loop through using the LastEvaluatedKey in the response, but only if I do a scan. I did not find a better way to paginate a query in boto3. Is it possible to paginate using a query?
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('myTable')

res = table.query(
    IndexName='my-index',
    KeyConditionExpression=Key('myVal').eq(1)
)
# res may contain a 'LastEvaluatedKey', but a single query call
# returns only a subset of the items.
for item in res['Items']:
    print(item)
The documentation mentions the limit of boto3.dynamodb.table.query(): 1 MB of data per call.
To go beyond that, use the query paginator, which returns an iterator (which makes sense). It seems you can replace your table.query with the paginator's paginate call. Try it out.
Notes:
There is a catch with boto3.resource(): not all service features are implemented on the resource interface, and the DynamoDB pagination generator is one of those cases, so you need the low-level client.
import boto3
dyno_client = boto3.client('dynamodb')
paginator = dyno_client.get_paginator('query')
response_iterator = paginator.paginate(.....)
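For instance, a minimal sketch (note that the low-level client takes expression strings and typed attribute values rather than the boto3.dynamodb.conditions helpers; the table and index names reuse the question's):

import boto3

dyno_client = boto3.client('dynamodb')
paginator = dyno_client.get_paginator('query')

# Each page transparently feeds LastEvaluatedKey back in as ExclusiveStartKey.
for page in paginator.paginate(
        TableName='myTable',
        IndexName='my-index',
        KeyConditionExpression='myVal = :v',
        ExpressionAttributeValues={':v': {'N': '1'}}):
    for item in page['Items']:
        print(item)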

Google Cloud Endpoints adding extra parameters

I'm using the 'endpoints-proto-datastore' library and am a bit lost on how to add extra parameters to my requests.
Basically I want to add the fields [id, token], with id being required. Blossom.io does something similar; see the Blossom.io API.
Here's my POST method:
@Doctor.method(path='doctor', http_method='POST', name='doctor.insert')
def DoctorInsert(self, doctor):
    ...
Edit:
Without the proto-datastore library, I would do it like this:
request = endpoints.ResourceContainer(
    message_types.VoidMessage,
    id=messages.IntegerField(1, variant=messages.Variant.INT32),
    token=messages.IntegerField(2, variant=messages.Variant.INT32)
)

@endpoints.method(request, response,
                  path='doctor/{id}', http_method='POST',
                  name='doctor.insert')
How can I do the same using the proto-datastore library?
The way I do it is to add another property to the model, decorated with @EndpointsAliasProperty and given a setter. I wouldn't call it ID, because that may be confused with the App Engine built-in ID.
class Doctor(EndpointsModel):
    ...

    def set_doctorid(self, value):
        # The ID arrives in value; assign and store it on your model
        self._doctorid = value

    @EndpointsAliasProperty(
        setter=set_doctorid, property_type=messages.StringField
    )
    def doctorid(self):
        # Logic to retrieve the ID
        return self._doctorid
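A hedged sketch of how the alias property might then surface on a method; this assumes endpoints-proto-datastore's usual pattern, where alias properties can appear in paths and request bodies like ordinary fields (a token alias would be added the same way):

@Doctor.method(path='doctor/{doctorid}', http_method='POST',
               name='doctor.insert')
def DoctorInsert(self, doctor):
    # set_doctorid has already populated the entity by the time this runs
    doctor.put()
    return doctor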

NHibernate - Duplicate Records with lazily mapped collection

All,
I have an entity that has several collections; each collection is mapped lazily. When I run a criteria query, I get duplicate results for my root entity in the result set. How is that possible when all my collections are mapped lazily?
I verified that my collections load lazily.
Here's my mapping:
Root entity 'Project':
[Bag(0, Lazy = CollectionLazy.True, Inverse = true, Cascade = "all-delete-orphan")]
[Key(1, Column = "job_id")]
[OneToMany(2, ClassType = typeof(ProjectPlan))]
public virtual IList<ProjectPlan> PlanList
{
    get { return _planList; }
    set { _planList = value; }
}
The criteria query is:
var projects = session.Session.CreateCriteria<Entities.Project>()
    .Add(Restrictions.Eq(Entities.Project.PROP_STATUS, !Entities.Project.STATUS_DELETED_FLAG))
    .CreateAlias(Entities.Project.PROP_PLANLIST, "p")
    .Add(Restrictions.Eq("p.County", "MIDDLSEX"))
    .SetFirstResult(start).SetMaxResults(pageSize)
    .List<Entities.Project>();
I know I can correct this problem with the distinct-root-entity result transformer; I just want to know if this is normal behavior for lazy collections.
EDIT: I found the cause of this. Looking at the raw SQL, the join and the where clause are correct, but the generated select clause baffles me: it contains not only columns from the Project entity (the root entity) but also columns from the ProjectPlan entity, which causes the issue I described above. I am not at work right now, but I'll try .SetProjection(Projections.RootEntity()) so that I only get Project's columns in the select clause.
One way to solve this (I'd say the usual scenario) is to: 1) avoid fetching collections inside of the query, and 2) use batch fetching as part of the mapping.
That way we always query only the root entity, which gives us a flat result set that can be paged correctly.
To get the collection data for each received row, while avoiding the 1 + N problem (one query per record's collection), we use 19.1.5. Using batch fetching.
The mapping would look like this:
[Bag(0, Lazy = CollectionLazy.True
    , Inverse = true
    , Cascade = "all-delete-orphan"
    , BatchSize = 25)] // or something similar to batch-size="25"
[Key(1, Column = "job_id")]
[OneToMany(2, ClassType = typeof(ProjectPlan))]
public virtual IList<ProjectPlan> PlanList
{
...
Some other similar Q&As (with almost the same details):
How to Eager Load Associations without duplication in NHibernate?
NHibernate QueryOver with Fetch resulting multiple sql queries and db hits
Is this the right way to eager load child collections in NHibernate
And we can still filter over the collection items! But we have to use subqueries; for an example, see Query on HasMany reference.
