Most efficient way to run a recursive query in a container - recursion

I’m relatively new to Azure Cosmos DB, and I am struggling with how to approach this problem due to some conflicting documentation.
I have a single container, with JSON data.
Each JSON document has a root-level array called opcos which can contain N GUIDs (typically fewer than 5).
These opcos GUIDs are the IDs of child items, which are separate documents.
If a parent document links to a child, then I need to check the child for more children in its opcos node.
What's the best way to get all the related items? There could be approx. 100 related documents.
I need to keep each document separate, so I can't store them as sub-documents; the link between parent and child is fluid, and a child can belong to multiple parents.
I am looking for a recursive solution, and I am trying to do this from within Cosmos DB, as I am assuming that running potentially 100 calls from outside of Cosmos DB carries a performance overhead with all the connecting etc.
Advice is welcomed. I took a snippet off another article and tried editing it, but it immediately errors on var context = getContext();
Also, any tips on debugging functions and stored procedures are welcome. I have 15 years of T-SQL behind me, but this is very different.
When I tried using a function in Cosmos DB it says
ReferenceError: 'getContext' is not defined
If I try the following code
var context = getContext();
var collection = context.getCollection();

function userDefinedFunction(id) {
    var context = getContext();
    var collection = context.getCollection();
    var metadataQuery = 'SELECT company.opcos FROM company where company.id in (' + id + ')';
    var metadata = collection.queryDocuments(collection.getSelfLink(), metadataQuery, {}, function (err, documents, options) {
        if (err) throw new Error('Error: ' + err.message);
        if (!documents || !documents.length) {
            throw new Error('Unable to find any documents');
        } else {
            var response = getContext().getResponse();
            /*for (var i = 0; i < documents.length; i++) {
                var children = documents[i]['$1'].Children;
                if (children.length) {
                    for (var j = 0; j < children.length; j++) {
                        var child = children[j];
                        children[j] = GetWikiChildren(child);
                    }
                }
            }*/
            response.setBody(documents);
        }
    });
}

The answer really comes down to your partitioning strategy.
First and foremost, your UDF doesn't run because UDFs don't have the execution context as part of their API. Your function will run, but you need to create it as a stored procedure, not a user-defined function.
Now you have to keep in mind that stored procedures can be executed only against a single logical partition, and this is their transaction scope. Your technique will work as long as you pass an array of ids into the stored procedure and the documents you're manipulating are in the same partition. If they are not, then it's impossible to use a stored proc (well, except if you run one call per document, which probably isn't worth it at this point).
On a side note, you want to parameterize the way you add the ids to the query, to prevent potential SQL injection.
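To make that concrete, here is a minimal sketch of the same traversal written as a stored procedure with a parameterized query. It assumes all the related documents live in the same logical partition and carry the root-level opcos array from the question; the function name, the cycle guard, and the breadth-first batching are illustrative, and continuation tokens are ignored for brevity.

function getRelatedDocuments(rootId) {
    var collection = getContext().getCollection();
    var results = [];
    var seen = {};
    fetchLevel([rootId]);

    function fetchLevel(ids) {
        if (!ids.length) {
            // Nothing left to resolve; return everything collected so far.
            getContext().getResponse().setBody(results);
            return;
        }
        // Parameterized query: no string concatenation, no SQL injection.
        var querySpec = {
            query: 'SELECT * FROM c WHERE ARRAY_CONTAINS(@ids, c.id)',
            parameters: [{ name: '@ids', value: ids }]
        };
        var accepted = collection.queryDocuments(collection.getSelfLink(), querySpec, {},
            function (err, documents) {
                if (err) throw new Error('Error: ' + err.message);
                var nextIds = [];
                documents.forEach(function (doc) {
                    if (seen[doc.id]) return; // cycle guard
                    seen[doc.id] = true;
                    results.push(doc);
                    (doc.opcos || []).forEach(function (childId) {
                        if (!seen[childId]) nextIds.push(childId);
                    });
                });
                fetchLevel(nextIds); // recurse into the next level of children
            });
        if (!accepted) throw new Error('Query was not accepted by the server.');
    }
}

You would then execute it from the client with the partition key value, e.g. something like container.scripts.storedProcedure('getRelatedDocuments').execute(partitionKey, [rootId]) in the v3 JavaScript SDK.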

Related

Flutter & Firebase Get more than 10 Firebase Documents into a Stream<List<Map>>

With Flutter and Firestore, I am trying to get more than 10 documents into a Stream<List<Map>>. I can do this with a .where clause on a collection, mapping the QuerySnapshot. However, the 10 limit is a killer.
I'm using the provider package in my app. So, in building a stream in Flutter with a StreamProvider, I can return:
1. A Stream<List<Map>> from the entire collection. Too expensive: 200-plus docs on these collections and too many users. I need to get more efficient.
2. A Stream<List<Map>> using a .where on a Collection, which returns a stream of a list capped at 10 items...doesn't cut the mustard.
3. A Stream<Map> from a Document, which returns one stream of one document.
I need something in between 1 and 2.
I have a Collection with up to 500 Documents, and the user will choose any possible combination of those 500 to view. The user assembles class rosters to view their lists of users.
So I'm looking for a way to get individual streams of, say, 30 documents, and then compile them into a List. But I need this List<Stream<Map>> to be a Stream itself so each individual doc is live, and I can also filter and sort this list of Streams. I'm using the Provider package, and if possible would like to stay consistent with that. Here's where I am currently stuck:
So, my current effort:
Future<Stream<List<AttendeeData>>> getStreams() async {
    List<Stream<AttendeeData>> getStreamsOutput = [];
    for (var i = 0; i < teacherRosterList.length; i++) {
        Stream thisStream = await returnTeacherRosterListStream(facility, teacherRosterList[i]);
        getStreamsOutput.add(thisStream);
    }
    return StreamZip(getStreamsOutput).asBroadcastStream();
}
It feels like I'm cheating below: I get an error if I put the snapshot directly into thisStream above, since a Stream is not a Future if I await it, and if I don't await, it moves too fast and I get a null error.
Future<Stream<AttendeeData>> returnTeacherRosterListStream(String thisFacility, String thisID) async {
    return facilityList.doc(thisFacility).collection('attendance').doc(thisID).snapshots().map(_teacherRosterListFromSnapshot);
}
Example of how I'm mapping in _teacherRosterListFromSnapshot (not having any problem here):
AttendeeData _teacherRosterListFromSnapshot(DocumentSnapshot doc) {
    // return snapshot.docs.map((doc) {
    return AttendeeData(
        id: doc.data()['id'] ?? '',
        authorCreatedUID: doc.data()['authorCreatedUID'] ?? '',
    );
}
My StreamProvider Logic and the error:
return MultiProvider(
    providers: [
        StreamProvider<List<AttendeeData>>.value(
            value: DatabaseService(
                teacherRosterList: programList,
                facility: user.claimsFacility,
            ).getStreams()),
    ]
Error: The argument type 'Future<Stream<List<AttendeeData>>>' can't be assigned to the parameter type 'Stream<List<AttendeeData>>'.
AttendeeData is my Map Class name.
So, the summary of questions:
Can I even do this? I'm basically Streaming a List of Streams of Maps....is this a thing?
If I can, how do I do it?
a. I can't get this into the StreamProvider because getStreams is a Future...how can I overcome this?
I can get the data in using another method from StreamProvider, but it's not behaving like a Stream and the state isn't updating. I'm hoping to just get this into Provider, as I'm comfortable there, and I can manage state very easily that way. However, beggars can't be choosers.
Solved this myself, and since there is a dearth of good start-to-finish answers, I submit my example for the poor souls who come after me trying to learn these things on their own. I'm a beginner, so this was a slog:
Objective:
You have any number of docs in a collection and you want to submit a list of any number of docs by their doc number and return a single stream of a list of those mapped documents. You want more than 10 (the Firestore limit on a .where query), and fewer than all the docs...so somewhere between a QuerySnapshot and a DocumentSnapshot.
Solution: We're going to get a list of QuerySnapshots, combine them, map them, and spit them out as a single stream. So we're getting 10 each in chunks (the max) and then some odd number left over. I plug mine into a Provider so I can get it whenever and wherever I want.
So from my provider I call this as the Stream value:
Stream<List<AttendeeData>> filteredRosterList() {
    var chunks = [];
    for (var i = 0; i < teacherRosterList.length; i += 10) {
        chunks.add(teacherRosterList.sublist(i, i + 10 > teacherRosterList.length ? teacherRosterList.length : i + 10));
    } // break a list of whatever size into chunks of 10
    List<Stream<QuerySnapshot>> combineList = [];
    for (var i = 0; i < chunks.length; i++) {
        combineList.add(*[point to your collection]*.where('id', whereIn: chunks[i]).snapshots());
    } // get a list of the streams, which will have 10 each
    // CombineLatestStream comes from the rxdart package.
    CombineLatestStream<QuerySnapshot, List<QuerySnapshot>> mergedQuerySnapshot = CombineLatestStream.list(combineList);
    // now we combine all the streams....but it'll be a list of QuerySnapshots,
    // and you'll want to look closely at the map, as it iterates, consolidates and returns a single stream of List<AttendeeData>
    return mergedQuerySnapshot.map(rosterListFromTeacherListDocumentSnapshot);
}
Here's a look at how I mapped it for your reference (took out all the fields for brevity):
List<AttendeeData> rosterListFromTeacherListDocumentSnapshot(List<QuerySnapshot> snapshot) {
    List<AttendeeData> listToReturn = [];
    snapshot.forEach((element) {
        listToReturn.addAll(element.docs.map((doc) {
            return AttendeeData(
                id: doc.data()['id'] ?? '',
                authorCreatedUID: doc.data()['authorCreatedUID'] ?? '',
            );
        }).toList());
    });
    return listToReturn;
}

DynamoDB how to get items count for a partition keys using .net core?

How can I get the item count for a particular partition key using .NET Core, preferably using the Object Persistence or Document interfaces?
Since I do not see any docs anywhere, I currently get the item count by retrieving all the items and counting them, but the reads are very expensive.
What are the best practices for such an item-count request? Thank you.
DynamoDB is mostly a document-oriented key-value DB, so it's not optimized for common relational DB functions (like item counts).
To minimize the data that is transmitted and to improve speed, you may want to do the following:
Create a Lambda function that returns the item count
This avoids transmitting data outside of AWS, which is slow and expensive.
Query options
Use only keys in your projection-expression, reducing the data that is transmitted from the DB.
Use the max page-size, reducing the number of calls needed.
Stream Option
Streams could also be used for keeping counts; e.g. as described in
https://medium.com/signiant-engineering/real-time-aggregation-with-dynamodb-streams-f93547cfb244
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-aggregation.html
Related SO Question
Complexity of finding total records count with partition key in nosql dynamodb table?
I just realized that, using the low-level interface, one can set Select = "COUNT" in the QueryRequest; then calling QueryAsync() or Query() will return only the count as an integer. Please refer to the code sample below.
private static QueryRequest getStockRecordCountQueryRequest(string tickerSymbol, string prefix)
{
    string partitionName = ":v_PartitionKeyName";
    string sortKeyPrefix = ":v_sortKeyPrefix";
    var request = new QueryRequest
    {
        TableName = Constants.TableName,
        ReturnConsumedCapacity = ReturnConsumedCapacity.TOTAL,
        Select = "COUNT",
        KeyConditionExpression = $"{Constants.PartitionKeyName} = {partitionName} and begins_with({Constants.SortKeyName}, {sortKeyPrefix})",
        ExpressionAttributeValues = new Dictionary<string, AttributeValue>
        {
            { $"{partitionName}", new AttributeValue {
                S = tickerSymbol
            }},
            { $"{sortKeyPrefix}", new AttributeValue {
                S = prefix
            }}
        },
        // Optional parameters.
        ConsistentRead = false,
        ExclusiveStartKey = null,
    };
    return request;
}
I would like to point out that this still consumes the same read units as retrieving all the items and counting them yourself. But since it returns only the count as an integer, it is a lot more efficient than transmitting the entire item list across the wire.
I think using DynamoDB Streams is a more proper way to get the counts for a large project. It is just a lot more complicated to implement.

How do I query a Firebase database for a nested property?

Hi, I have a NoSQL DB in Firebase.
I want to get the object where userId is 288.
I've tried many combinations but I can't figure out how it's done.
This is my code so far:
var refTest = database.ref('conversation');
var query = refTest.orderByChild('messages');
query.on('value', function(data) {
    var a = data.val();
    console.log(a.messages.userId);
    console.log(data.val());
});
This is an image of my "schema".
I'm obviously a noob when it comes to NoSQL. I do understand SQL.
All help is appreciated.
You can order/filter on a nested value like this:
var refTest = database.ref('conversation');
var query = refTest.orderByChild('messages/userId').equalTo("288");
query.on('value', function(snapshot) {
    snapshot.forEach(function(child) {
        console.log(child.key);
        console.log(child.val());
    });
});
The forEach is needed, since there may be multiple child nodes with messages/userId equal to 288.
The key named "messages" doesn't make sense in your schema, because if you want to have another message under that conversation, you wouldn't be able to add it with the same key name, and you also couldn't add it under "messages" because it would overwrite the other one. My suggestion is to use the push() method for adding a new message. This way you uniquely identify each message.
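For example, a minimal sketch of that push() suggestion (the conversationId variable and the message fields are illustrative, not taken from your schema):

var messagesRef = database.ref('conversation/' + conversationId + '/messages');
// push() generates a unique key for each message, so nothing gets overwritten
messagesRef.push({ userId: 288, text: 'hello' });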
Regarding your question, an easy-to-understand way of parsing your schema is this: you loop through each message of each conversation, looking for the messages with the wanted userId.
refTest.on('value', function(data) {
    var conversations = data.val();
    for (var conversationKey in conversations) {
        var conversation = conversations[conversationKey];
        for (var messageKey in conversation) {
            var message = conversation[messageKey];
            if (message.userId == 288) {
                // do whatever you need
                // and eventually return something to break the loops
            }
        }
    }
});
Of course, you can adapt it based on your needs

Asynchronous with indexeddb problems

I am having a problem with a function in IndexedDB, where I need to change the status of some meetings. The search feature collects the ID of each checked meeting; right after that, in a for() loop, I walk the vector that contains the IDs and, on each database access, do a get() passing the ID of that iteration. The following code is an example:
var val = [];
var checkbox = $('input:checkbox[class^=checkReunioes]:checked');
if (checkbox.length > 0) {
    checkbox.each(function() {
        val.push($(this).val());
    });
}
for (var i = 0; i < val.length; i++) {
    var transaction = db.transaction(["tbl_REUNIOES"], "readwrite").objectStore("tbl_REUNIOES");
    var request = transaction.get(val[i]);
    request.onerror = function(event) {
        alert("BAD");
    };
    request.onsuccess = function(event) {
        var data = request.result;
        data.FLG_STATU_REUNI = 'I';
        var codigo_igreja = localStorage.getItem("igreja");
        var dataJSON = JSON.stringify(data);
        enviarFilaSincronismo("tbl_REUNIOES", "U", dataJSON, " WHERE COD_IDENT_REUNI = '" + val[i] + "' and COD_IDENT_IGREJ = '" + codigo_igreja + "'");
        var requestUpdate = transaction.put(data);
        requestUpdate.onerror = function(event) {
            alert("OK");
        };
        requestUpdate.onsuccess = function(event) {
            $("#listReunioes").html("");
            serchAll(w_key_celula);
        };
    };
}
In my view the problem occurs because IndexedDB is asynchronous: it moves on to the next lookup even before the first one finishes.
But how can I handle this?
What is the good practice for a case like this?
If you are inexperienced with writing asynchronous code, a good general rule to consider is to never define functions inside loops. Do not set request.onsuccess to a function from within the for loop.
You can perform multiple get and put requests on the same transaction when you do not expect the individual requests to fail for data-related reasons, such as the violation of a uniqueness constraint of an index, or because you are performing many thousands of requests on the same transaction and reaching processing limits.
You might find that using IDBObjectStore.prototype.openCursor together with IDBCursor.prototype.update is more convenient than using IDBObjectStore.prototype.get and IDBObjectStore.prototype.put.
Your example code indicates that a successful get request means that data was retrieved, when in fact, this is not what actually happens. A successful get request just means that a request occurred without errors (e.g. against an object store that exists, against a database that is not blocked by other requests, against a database connection that is still valid). It does not mean that an object matched your get request query. You should be checking for whether the request's result object is defined, and use that check as a determination of whether an object matched your get query, and not simply that a successful request occurred.
You might want to spend more time organizing your code into smaller functions that use clearer names. Your example code is difficult to read.
It looks like you are using some type of global db variable. If you are not well experienced with writing asynchronous code, avoid using a global db variable. There is no guarantee the db variable will be defined and open when you decide to access it, which could lead to an unexpected error.
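To illustrate the cursor-based suggestion above, here is a minimal sketch using the store and field names from your question (tbl_REUNIOES, FLG_STATU_REUNI); the function name and the completion callback are illustrative:

function updateStatuses(db, ids, onComplete) {
    var transaction = db.transaction(['tbl_REUNIOES'], 'readwrite');
    var store = transaction.objectStore('tbl_REUNIOES');
    ids.forEach(function(id) {
        var request = store.openCursor(IDBKeyRange.only(id));
        request.onsuccess = function(event) {
            var cursor = event.target.result;
            // The cursor is only defined when a record actually matched the id.
            if (cursor) {
                var data = cursor.value;
                data.FLG_STATU_REUNI = 'I';
                cursor.update(data);
            }
        };
    });
    // Run follow-up work once, after the whole transaction commits,
    // instead of inside every request's onsuccess handler.
    transaction.oncomplete = onComplete;
}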

How to delete all but most recent X children in a Firebase node?

Given a Firebase node lines filled with unique-ID children (from push() operations), such as this:
Firebase
--lines
    --K3qx02jslkdjNskjwLDK
    --K3qx23jakjdz9Nlskjja
    --K3qxRdXhUFmEJdifOdaj
    --etc...
I want to be able to delete all children of lines except the most recently added 200 (or 100, or whatever). Basically this is a cleanup operation. Now I know I could do this by grabbing a snapshot of all children of lines on the client side, counting the entries, then using an endAt(totalChildren - numToKeep) to grab the relevant data and run remove(). But I want to avoid grabbing all that data to the client.
Is there an alternative to my idea above?
Keeping the most recent N items is one of the trickier use-cases to implement. If you have any option to change it into "keep items from the past N hours", I recommend going that route; a sketch follows below.
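A minimal sketch of that time-based route, assuming each line stores a timestamp child set when it is created (the field name and the N_HOURS constant are illustrative):

var cutoff = Date.now() - N_HOURS * 60 * 60 * 1000;
ref.child('lines').orderByChild('timestamp').endAt(cutoff)
    .once('value', function(snapshot) {
        var updates = {};
        snapshot.forEach(function(child) {
            updates[child.key()] = null; // mark every line older than the cutoff
        });
        ref.child('lines').update(updates); // remove them in one update() call
    });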
The reason the use-case is tricky, is that you're counting items and Firebase does (intentionally) not have any count-based operations. Because of this, you will need to retrieve the first N items to know which item is N+1.
ref.child('lines').once('value', function(snapshot) {
    if (snapshot.numChildren() > MAX_COUNT) {
        var childCount = 0;
        var updates = {};
        snapshot.forEach(function(child) {
            if (++childCount < snapshot.numChildren() - MAX_COUNT) {
                updates[child.key()] = null;
            }
        });
        ref.child('lines').update(updates);
    }
});
A few things to note here:
this will download all lines
it performs a single update() call to remove the extraneous lines
One way to optimize this (aside from picking a different/time-based truncating strategy) is to keep a separate list of the "line ids".
lineids
    --K3qx02jslkdjNskjwLDK
    --K3qx23jakjdz9Nlskjja
    --K3qxRdXhUFmEJdifOdaj
So you'll still keep the data for each line in lines, but also keep a list of just the ids. The code to then delete the extra ones then becomes:
ref.child('lineids').once('value', function(snapshot) {
    if (snapshot.numChildren() > MAX_COUNT) {
        var childCount = 0;
        var updates = {};
        snapshot.forEach(function(child) {
            if (++childCount < snapshot.numChildren() - MAX_COUNT) {
                updates['lineids/' + child.key()] = null;
                updates['lines/' + child.key()] = null;
            }
        });
        ref.update(updates);
    }
});
This last snippet is slightly more involved, but avoids having to download all the lines data by downloading just the line ids.
There are many variations you can choose, but I hope this serves as enough inspiration to get started.
