XQuery: finding records with a specific security role

I'm trying to find records with a specific security role, and I can't seem to find a way to do it using cts:search (which should be faster than a for loop). Here is the for loop:
let $validRoleList := (
  xdmp:role("myRole1"),
  xdmp:role("myRole2")
)
for $recordUri in cts:uris((), (), cts:collection-query("bigCollection"))
let $documentPermissions := xdmp:document-get-permissions($recordUri)/sec:role-id/fn:string()
let $intPermissions :=
  for $permissionValue in $documentPermissions
  return xs:unsignedLong($permissionValue)
(: "=" compares sequences; "eq" only accepts single values :)
where $intPermissions = $validRoleList
return $recordUri
With my "bigCollection" being in the 15 million record range, even on the task server it's taking over an hour. Is there any easier way to find a record by its permission role name?

I found this function somewhere years ago, and I don't know how it works, but it does. I've used it in production systems for years, and it works great for your question of "How do I query for documents that have a particular permission?" It's in XQuery, but I believe there's a JS equivalent for each XQuery function.
declare function permission-query($role, $capability)
{
  cts:term-query(
    xdmp:add64(
      xdmp:mul64(xdmp:add64(xdmp:mul64(xdmp:role($role), 5), xdmp:hash64($capability)), 5),
      xdmp:hash64("permission()")
    )
  )
};

This looping approach is inherently slow because it's going to pull every document off disk to extract its permissions. 15 million docs means 15 million disk fetches. No matter the code, that's slow.
The fastest and easiest way to answer this would be to create a user with those two roles, run as that user, and do a cts:uris query for all the URIs in the database; the answer will be automatically and efficiently limited to the URIs visible to those two roles.
If you need it more dynamic without creating such a user, it's possible for an admin user to xdmp:login with a list of roles.

Related

How can I get a document at a specific index after orderBy

I have some code like this:
...
const snapshot = firestore().collection("orders").orderBy("deliveryDate")
...
I want to access only the 100th order in the returned documents. So far, the only way I have found to achieve this is firestore().collection("orders").orderBy("deliveryDate").limit(100), which returns the first 100 documents so that I can access the last one. But I end up fetching 99 unwanted documents, and this could become much slower if I want the 200th document or higher.
So, I basically want to know if there's a possible way of getting just the index I want after sorting.
As far as I know, startAt() and startAfter() only accept a doc reference or field values, not an index/offset.
Firestore does not offer any way to offset by some numeric amount to web and mobile clients (and doing so would end up having the exact same cost as what you're doing now).
If you need to impose some sort of offset into your collection, you will need to maintain that in the document itself for querying, or use some other type of storage that gives you fast cheap access by index.
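One way to read "maintain that in the document itself" is to store an explicit ordinal on each order and query it with an equality filter. The sketch below is plain JavaScript over an in-memory array and makes no Firestore calls; the field name `position` and both helper names are assumptions made for illustration.

```javascript
// Assign a 1-based `position` to each order, sorted by deliveryDate.
// `position` is a hypothetical field name, not something Firestore provides.
function assignPositions(orders) {
  return [...orders]
    .sort((a, b) => a.deliveryDate - b.deliveryDate)
    .map((order, i) => ({ ...order, position: i + 1 }));
}

// Look up one order by its stored position (an equality match).
function orderAt(indexedOrders, position) {
  return indexedOrders.find((o) => o.position === position) || null;
}

const indexed = assignPositions([
  { id: "c", deliveryDate: 30 },
  { id: "a", deliveryDate: 10 },
  { id: "b", deliveryDate: 20 },
]);
console.log(orderAt(indexed, 2).id); // "b"
```

With such a field stored, the Firestore query becomes an equality filter such as where("position", "==", 100), which reads a single document. The cost is keeping the positions up to date whenever orders are inserted or deleted.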

How to query one field then order by another one in Firebase cloud Firestore?

I'm struggling to make a (not so) complex query in firebase cloud firestore.
I simply need to get all docs where the id field == a given id.
Then order the results by their date and limit to 10 results.
So this is my current situation
db.firestore().collection('comments')
  .where("postId",'==',idOfThePost)
  .orderBy('date','asc').limit(10).get().then( snapshot => {
    // Nothing happens; the request is never even executed
  })
I can get the result only if I don't use orderBy, but I need this sorting for my application.
Does anyone have an idea how to fix this?
Thanks
You can do this by creating an index in Firestore.
The first field of the index should be the equality field and the second field of the index should be the order by field.
Given your problem, you would need the following index:
first field: postId, 'asc'
second field: date, 'asc'
Please check the doc. It says
However, if you have a filter with a range comparison (<, <=, >, >=), your first ordering must be on the same field
you can try this code
db.firestore().collection('comments')
  .where("postId",'==',idOfThePost)
  .orderBy('postId')
  .orderBy('date','asc').limit(10).get().then( snapshot => {
    .....
  })
My Workaround
If you're googling this you've probably realized it can't be done traditionally. Depending on your problem though there may be some viable workarounds, I just finished creating this one.
Scenario
We have an app that has posts that appear in a feed (kind of like Reddit). Each post has an algorithmic score 'score', and we needed a way to get the 'top posts' from 12-24 hours ago. Trying to query sorted by 'score' where timestamp uses > and < to build the 12-24 hour ago range fails, since Firebase doesn't allow range conditions on multiple fields, or a range condition on one field combined with a descending sort on another field.
Solution
What we ended up doing is using a second field that was an array, since you can combine an array-contains filter with a descending sort in one query. At the time a post was made we knew the current hour; suppose it was hour 10000 since the server epoch (i.e. floor(serverTime/60.0/60.0)). We would create an array called arrayOfHoursWhenPostIsTwelveToTwentyFourHoursOld and populate it with the following values:
int hourOffset = 12;
while (hourOffset <= 24) {
    [arrayOfHoursWhenPostIsTwelveToTwentyFourHoursOld addObject:@(currentHour+hourOffset)];
    hourOffset++;
}
Then, when making the post we would store that array under the field hoursWhenPostIsTwelveToTwentyFourHoursOld
THEN, if it had been, say, 13 hours since the post was made (the post was made at hour 10000) then the current hour would be 10013, so we could use the array-contains query to see if our array contained the value 10013 while also sorting by algorithm score at the same time
Like so:
FIRFirestore *firestore = [Server sharedFirestore];
FIRCollectionReference *collection = [firestore collectionWithPath:@"Posts"];
FIRQuery *query = [collection queryOrderedByField:@"postsAlgorithmScore" descending:YES];
query = [query queryWhereField:@"hoursWhenPostIsTwelveToTwentyFourHoursOld" arrayContains:@(currentHour)];
query = [query queryLimitedTo:numberToLoad];
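The hour-bucket idea can be checked without Firestore. The following is a minimal in-memory sketch in JavaScript, not the author's code: the function names are made up for illustration, and `topPosts` only imitates what the array-contains filter plus descending sort would return.

```javascript
const SECONDS_PER_HOUR = 60 * 60;

// Whole hours since the epoch for a timestamp given in seconds.
function hourOf(epochSeconds) {
  return Math.floor(epochSeconds / SECONDS_PER_HOUR);
}

// Hours during which a post created at `createdHour` is 12 to 24 hours old.
function hoursWhenTwelveToTwentyFourHoursOld(createdHour) {
  const hours = [];
  for (let offset = 12; offset <= 24; offset++) {
    hours.push(createdHour + offset);
  }
  return hours;
}

// In-memory stand-in for: array-contains currentHour, order by score desc, limit n.
function topPosts(posts, currentHour, limit) {
  return posts
    .filter((p) => p.hoursWhenPostIsTwelveToTwentyFourHoursOld.includes(currentHour))
    .sort((a, b) => b.postsAlgorithmScore - a.postsAlgorithmScore)
    .slice(0, limit);
}

const posts = [
  { id: "a", postsAlgorithmScore: 5, hoursWhenPostIsTwelveToTwentyFourHoursOld: hoursWhenTwelveToTwentyFourHoursOld(10000) },
  { id: "b", postsAlgorithmScore: 9, hoursWhenPostIsTwelveToTwentyFourHoursOld: hoursWhenTwelveToTwentyFourHoursOld(10001) },
  // Created at hour 10005, so only 8 hours old at hour 10013: filtered out.
  { id: "c", postsAlgorithmScore: 7, hoursWhenPostIsTwelveToTwentyFourHoursOld: hoursWhenTwelveToTwentyFourHoursOld(10005) },
];
console.log(topPosts(posts, 10013, 10).map((p) => p.id)); // ids in order: b, a
```

Each post carries 13 precomputed hour values, so the range condition on the timestamp turns into a single membership test and the sort field is free to be the score.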
Almost Done
The above code will not run properly at first, since it needs a compound index that has to be created in Firebase. The easiest way to do this is to run the query and look at the error in the logs: the Firebase SDK generates a link that you can follow to auto-generate the compound index for your database. Otherwise, you can navigate to firebase>database>index>compound>new and build it yourself using hoursWhenPostIsTwelveToTwentyFourHoursOld: Arrays, postsAlgorithmScore: Descending
Enjoy!
Same here; it is weird that this can't be done. Below is another sample that gets no result. Hoping Firebase can reply about this and update the documentation.
dbFireStore.collection('room').where('user_id.'+global.obj_user.user_id,'==',true).orderBy('last_update').get().then((qs)=>{
console.log(qs);
});
Another workaround is to load the results into a JavaScript array and use array.sort().
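That client-side workaround amounts to running only the equality filter on the server and sorting the returned documents locally. A minimal sketch over plain objects follows; the `rooms` array stands in for the data mapped out of the query snapshot, and its shape is an assumption.

```javascript
// Sort already-fetched documents by last_update, oldest first.
// Returns a new array so the fetched data is left untouched.
function sortByLastUpdate(docs) {
  return [...docs].sort((a, b) => a.last_update - b.last_update);
}

const rooms = [
  { id: "r1", last_update: 300 },
  { id: "r2", last_update: 100 },
  { id: "r3", last_update: 200 },
];
console.log(sortByLastUpdate(rooms).map((r) => r.id)); // ids in order: r2, r3, r1
```

This only makes sense when the filtered result set is small enough to fetch in full, since a limit applied on the server would otherwise cut the data before it is sorted.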
I ran into the same issue yesterday on Android. The Callback was just not called. Today I suddenly got an error message. FAILED_PRECONDITION: The query requires an index. It even contains a URL in the error message to generate that index with one click.
It seems that if you want to use orderBy on your data, you need to create an index for that field. The index also needs to be in the correct order (DESC, ASC).
As per firestore document,
If you attempt a compound query with a range clause that doesn't map to an existing index, you receive an error. The error message includes a direct link to create the missing index in the Firebase console.
So just click the link you get in Logcat; it will take you to the create-index page, where you just create the index. It will take some time to build. Once the composite index is enabled, your query will return the requested result.
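On the web, the "nothing happens" symptom usually means the rejected promise was never handled. A hedged sketch of the question's query with a rejection handler attached is shown here; `firestore` is assumed to be a Firestore instance exposing the chained API used in the question, and `loadComments` is a made-up helper name.

```javascript
// `firestore` is assumed to expose the chained
// collection().where().orderBy().limit().get() API from the question.
function loadComments(firestore, postId) {
  return firestore
    .collection("comments")
    .where("postId", "==", postId)
    .orderBy("date", "asc")
    .limit(10)
    .get()
    .catch((err) => {
      // A missing composite index is reported here; the message
      // normally includes the console link that creates the index.
      console.error("Query failed:", err.code, err.message);
      throw err;
    });
}
```

Without the catch handler the failure can pass unnoticed, which matches the behaviour described in the question.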
Stumbled across this looking for help when I found that the orderBy function didn't work, and the documentation still says it is not supported. A bit weird and unclear to be honest, because it is supported as long as you index your Firestore database. For example, this query now works fine for me having set up indexing:
const q = query(docRef, where("category", "==", 'Main'), orderBy('title', 'asc'));
Indexing in Firestore
The console log even gives you the URL to create the index automatically if you run the query above without one.
Maybe I am missing something, or a later version of Firebase (I am using v9) simply does support it.

How can I use a gremlin query to filter based on a user's permissions?

I am fairly new to graph databases, however I have used SQL Server and document databases (Lucene, DocumentDb, etc.) extensively. It's completely possible that I am approaching this query the wrong way, since I am new to graph databases. I am trying to convert some logic that we currently use SQL Server for to a graph database (CosmosDB Graph via Gremlin, to be specific). The reason for the change is that this problem set is not really what SQL Server is great at, and so our SQL query (which we have optimized as well as we can) is really starting to be the hot spot of our application.
To give a very brief overview of our logic, we run a web shop that allows admins to configure products and users with several levels of granular permissions (described below). Based on these permissions, we show the user only the products they are allowed to see.
Entities:
Region: A region consists of multiple countries
Country: A country has many markets and many regions
Market: A market is a group of stores in a single country
Store: A store belongs to a single market
Users have the following set of permissions and each set can contain multiple values:
can-view-region
can-view-country
can-view-market
can-view-store
Products have the following set of permissions and each set can contain multiple values:
visible-to-region
visible-to-country
visible-to-market
visible-to-store
After trying for a few days, this is the query that I have come up with. This query does work and returns the correct products for the given user, however it takes about 25 seconds to execute.
g.V().has('user','username', 'john.doe').union(
__.out('can-view-region').out('contains-country').in('in-market').hasLabel('store'),
__.out('can-view-country').in('in-market').hasLabel('store'),
__.out('can-view-market').in('in-market').hasLabel('store'),
__.out('can-view-store')
).dedup().union(
__.out('in-market').in('contains-country').in('visible-to-region').hasLabel('product'),
__.out('in-market').in('visible-to-country').hasLabel('product'),
__.out('in-market').in('visible-to-market').hasLabel('product'),
__.in('visible-to-store').hasLabel('product')
).dedup()
Is there a better way to do this? Is this problem maybe not best suited with a graph database?
Any help would be greatly appreciated!
Thanks,
Chris
I don't think this is going to help a lot, but here's an improved version of your query:
g.V().has('user','username', 'john.doe').union(
__.out('can-view-region').out('contains-country').in('in-market').hasLabel('store'),
__.out('can-view-country','can-view-market').in('in-market').hasLabel('store'),
__.out('can-view-store')
).dedup().union(
__.out('in-market').union(
__.in('contains-country').in('visible-to-region'),
__.in('visible-to-country','visible-to-market')).hasLabel('product'),
__.in('visible-to-store').hasLabel('product')
).dedup()
I wonder if the hasLabel() checks are really necessary. If, for example, .in('in-market') can only lead to a store vertex, then remove the extra check.
Furthermore, it might be worth creating shortcut edges. This would increase write times whenever you mutate the permissions, but should significantly reduce read times for the given query. Since reads are likely to occur far more often than permission updates, this might be a good trade-off.
The CosmosDB Graph team is looking into improvements that can be made to the union step in particular.
Other options that haven't already been suggested:
Reduce the number of edges that are traversed per hop with additional predicates. e.g:
g.V('1').outE('market').has('prop', 'value').inV()
Would it be possible to split the traversal up and issue parallel requests in your client code? Since you are using .NET, you could take each result of the first union and execute parallel requests for the traversals in the second union. Something like this (untested code):
string firstUnion = @"g.V().has('user','username', 'john.doe').union(
__.out('can-view-region').out('contains-country').in('in-market').hasLabel('store'),
__.out('can-view-country').in('in-market').hasLabel('store'),
__.out('can-view-market').in('in-market').hasLabel('store'),
__.out('can-view-store')
).dedup()";
string[] secondUnionTraversals = new[] {
"g.V({0}).out('in-market').in('contains-country').in('visible-to-region').hasLabel('product')",
"g.V({0}).out('in-market').in('visible-to-country').hasLabel('product')",
"g.V({0}).out('in-market').in('visible-to-market').hasLabel('product')",
"g.V({0}).in('visible-to-store').hasLabel('product')",
};
var response = client.CreateGremlinQuery(col, firstUnion);
while (response.HasMoreResults)
{
var results = await response.ExecuteNextAsync<Vertex>();
foreach (Vertex v in results)
{
Parallel.ForEach(secondUnionTraversals, (traversal) =>
{
var secondResponse = client.CreateGremlinQuery<Vertex>(col, string.Format(traversal, v.Id));
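while (secondResponse.HasMoreResults)
{
// Fetch the next page; without this call HasMoreResults never changes.
// concurrentColl is assumed to be a thread-safe collection declared elsewhere.
foreach (Vertex product in secondResponse.ExecuteNextAsync<Vertex>().GetAwaiter().GetResult())
{
concurrentColl.Add(product);
}
}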
});
}
}

Firebase - Structuring Data For Efficient Indexing

I've read almost everywhere about structuring one's Firebase Database for efficient querying, but I am still a little confused between two alternatives that I have.
For example, let's say I want to get all of a user's "maxBenchPressSessions" from the past 7 days or so.
I'm stuck between picking between these two structures:
In the first array, I use the user's id as an attribute to index on whether true or false. In the second, I use userId as the attribute NAME whose value would be the user's id.
Is one faster than the other, or would they be indexed in roughly the same manner? I'm kind of new to database design, so I want to make sure that I'm following correct practices.
PROGRESS
I have come up with a solution that will both flatten my database AND allow me to add a ListenerForSingleValueEvent using orderBy ONLY once, but only when I want to check if a user has a session saved for a specific day.
I can have each maxBenchPressSession object have a key in the format of userId_dateString. However, if I want to get all the user's sessions from the last 7 days, I don't know how to do it in one query.
Any ideas?
I recommend watching these videos; they explain structuring the data very well.
Links to the Firebase 3 playlist:
Firebase 3.0: Data Modelling
Firebase 3.0: Node Client
As I understand it, the way to use Firebase effectively is to keep the data returned by each query as small as possible; the number of requests matters much less.
For the request you describe, you will have to add another field, "negativeDate" (the date negated), to each record.
Sorting on this field puts the newest entries first, which lets you fetch the seven most recent. Here's the video:
https://www.youtube.com/watch?v=nMR_JPfL4qg&feature=youtu.be&t=4m36s
.orderByChild('negativeDate') - sort by date, newest first
.limitToFirst(7) - 7 entries
Example of a request:
const ref = firebase.database().ref('maxBenchPressSession');
ref.orderByChild('negativeDate').limitToFirst(7).on('value', function(snap){ })
To scope this to one user, add the user id to the path and store all of that user's sessions under it:
const ref = firebase.database().ref('maxBenchPressSession/' + userId);
ref.orderByChild('negativeDate').limitToFirst(7).on('value', function(snap){ })
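The `negativeDate` field is simply the timestamp negated, so an ascending sort lists the newest sessions first. A small in-memory sketch of that ordering follows; it makes no Firebase calls, and it assumes `date` is a millisecond timestamp and that the helper names are free to choose.

```javascript
// Add the derived sort key; `date` is assumed to be a millisecond timestamp.
function withNegativeDate(session) {
  return { ...session, negativeDate: -session.date };
}

// In-memory stand-in for orderByChild('negativeDate'): ascending order,
// which puts the most recent session at the start.
function orderByNegativeDate(sessions) {
  return [...sessions].sort((a, b) => a.negativeDate - b.negativeDate);
}

const sessions = [
  { id: "monday", date: 1000 },
  { id: "wednesday", date: 3000 },
  { id: "tuesday", date: 2000 },
].map(withNegativeDate);
console.log(orderByNegativeDate(sessions).map((s) => s.id)); // wednesday, tuesday, monday
```

Because ascending is the only sort direction available to orderByChild, negating the value is what produces a newest-first listing; the most recent entries then sit at the start of the ordered list.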

do document IDs in Meteor need to be random or just unique?

i'm migrating data from a rails system, and it would be really convenient to assign the migrated objects IDs like post0000000000001, etc.
i've read here
Creating Meteor-friendly id's in Mongo?
that Meteor creates random 17 character strings from
23456789ABCDEFGHJKLMNPQRSTWXYZabcdefghijkmnopqrstuvwxyz
which looks to be chosen to avoid possibly ambiguous characters (omits 1 and I, etc.)
do the IDs need to be random for some reason? are there security implications to being able to guess a Meteor document's ID?! or it is just an easy way of generating unique IDs?
Mongo seems fine with sequential ids:
http://docs.mongodb.org/manual/core/document/#the-id-field
http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
so i would guess this would have to be a Meteor constraint if it exists.
The IDs just need to be unique.
Typically IDs carry some element of order, such as integers, timestamps, or anything else sequential.
This can't work in Meteor because inserts can come from the client: clients may be disconnected for a period, their clocks may be off, and their latency may vary. It is also not possible to know the previous _id (in the case of a sequential _id) at the time an _id is written, owing to latency compensation (instant inserts).
The consequence of this lack of order in the DDP protocol is the decision to use entirely random ids. That is not to say you can't use your own _ids.
While there is a risk of a collision with this strategy, it is minimal: on the order of [number of docs in your collection] / 55^17 * 100 %, or nearly impossible. If a collision does occur, the client will temporarily insert the document and then cancel the insert once the server reports a Mongo duplicate key error.
As for the security point raised in the other answer: it is not much of an issue if a user's _id is known. It is not possible to log in without a valid hashed login token, or to retrieve any information with the _id alone. This applies to the users collection only, of course. If you have your own collection, an easily guessable URL containing an id, without publish-method checks on who may read the data, is a risk that Meteor's high-entropy random ids can mitigate.
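To put a number on that estimate, here is a short JavaScript sketch of the arithmetic. It uses the 55-character alphabet and 17-character length quoted in the question; the function names are illustrative only, and the one-billion-document collection is an arbitrary example.

```javascript
const ALPHABET = "23456789ABCDEFGHJKLMNPQRSTWXYZabcdefghijkmnopqrstuvwxyz";
const ID_LENGTH = 17;

// Number of distinct ids: 55^17 (BigInt keeps the value exact).
function idSpaceSize() {
  return BigInt(ALPHABET.length) ** BigInt(ID_LENGTH);
}

// Chance, as a fraction, that one new random id matches any of
// `docCount` existing ids.
function collisionChance(docCount) {
  return docCount / Number(idSpaceSize());
}

console.log(ALPHABET.length);      // 55
console.log(collisionChance(1e9)); // on the order of 1e-21
```

Even with a billion documents already stored, the chance that the next insert collides is vanishingly small, which is why the duplicate-key path described above is treated as an edge case.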
As long as they are unique it should be ok to use your own ids.
I am not an expert, but I suppose Mongo needs a unique ID so that when it updates a document, it in fact creates a new version of the document with that same ID.
The real question, which I too wish to have answered, is whether we can change the ID without breaking Mongo's mechanisms and reliability, or whether we need to create a secondary attribute (which could also make for a smaller index, I suppose).
I can also imagine that, security-wise, it is better if document IDs are difficult to guess, especially user IDs. Otherwise, might it be easy or possible to impersonate a user by knowing the ID? Anybody, correct me if I am wrong.
I don't think it's possible or desirable to change the ID in Mongo.
But you can easily create an auto-incrementing ID with http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
function getNextSequence(name) {
var ret = db.counters.findAndModify(
{
query: { _id: name },
update: { $inc: { seq: 1 } },
new: true
}
);
return ret.seq;
}
I have created a package that does just that and that is configurable.
https://atmospherejs.com/stivaugoin/fluid-refno
var refNo = generateRefNo({
name: 'invoices', // default: 'counter'
prefix: 'I-', // default: ''
size: 5, // default: 5
filling: '0' // default: '0'
});
console.log(refNo); // output: "I-00001"
You can now use refNo in your document on insert.
Maybe it will help you.
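For readers who only need the formatting step, the options shown above presumably combine as prefix plus a zero-padded counter. The helper below is a hypothetical stand-in written for illustration, not the package's source, and it takes the sequence number as an argument instead of reading a counter from the database.

```javascript
// Hypothetical formatter; not the package's actual implementation.
// `seq` would come from a counter such as getNextSequence("invoices").
function formatRefNo(seq, { prefix = "", size = 5, filling = "0" } = {}) {
  return prefix + String(seq).padStart(size, filling);
}

console.log(formatRefNo(1, { prefix: "I-" })); // "I-00001"
```

Keeping the readable reference number in its own field leaves the document _id free to stay whatever Meteor or the migration assigned.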
