Paginating chronologically prioritized Firebase children

tl;dr Performing basic pagination in Firebase via startAt, endAt and limit is terribly complicated; there must be an easier way.
I'm building an administration interface for a large number of user submissions. My initial (and current) idea is to simply fetch everything and paginate on the client. There is, however, a noticeable delay when fetching 2000+ records (each containing 5-6 small number/string fields), which I attribute to a data payload of over 1.5 MB.
Currently all entries are added via push, but I'm a bit lost as to how to paginate through the huge list.
To fetch the first page of data, I'm using endAt with a limit of 5:
ref.endAt().limit(5).on('child_added', function(snapshot) {
  console.log(snapshot.name(), snapshot.val().name);
});
which results in the following:
-IlNo79stfiYZ61fFkx3 #46 John
-IlNo7AmMk0iXp98oKh5 #47 Robert
-IlNo7BeDXbEe7rB6IQ3 #48 Andrew
-IlNo7CX-WzM0caCS0Xp #49 Frank
-IlNo7DNA0SzEe8Sua16 #50 Jimmmy
First, to figure out how many pages there are, I keep a separate counter that's updated whenever someone adds or removes a record.
Second, since I'm using push, I have no way of navigating to a specific page, because I don't know the name of the last record on that page. This means an interface with links to individual page numbers is not currently possible.
To keep it simpler, I decided on just next/previous buttons. This, however, presents a new problem: if I use the name of the first record in the previous result set, I can paginate to the next page using the following:
ref.endAt(null, '-IlNo79stfiYZ61fFkx3').limit(5).on('child_added', function(snapshot) {
  console.log(snapshot.name(), snapshot.val().name);
});
The result of this operation is as follows:
-IlNo76KDsN53rB1xb-K #42 William
-IlNo77CtgQvjuonF2nH #43 Christian
-IlNo7857XWfMipCa8bv #44 Jim
-IlNo78z11Bkj-XJjbg_ #45 Richard
-IlNo79stfiYZ61fFkx3 #46 John
Now I have the next page, except that it's shifted by one position (#46 John appears again), meaning I have to increase my limit by one and ignore the last record.
To move back one page I'll have to keep a separate list on the client of every record I've received so far and figure out what name to pass to startAt.
Is there an easier way of doing this or should I just go back to fetching everything?

We're working on adding an "offset()" query to allow for easy pagination. We'll also be adding a special endpoint to allow you to read the number of children at a location without actually loading them from the server.
Both of these are going to take a while, though. In the meantime, the method you describe (or doing it all on the client) is probably your best bet.
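A minimal in-memory sketch of the method described in the question, for next/previous buttons on a newest-first list. `endAtLimit` is a hypothetical stand-in for `ref.endAt(null, key).limit(n)`, and short zero-padded strings stand in for push IDs. It requests one extra record because endAt is inclusive, drops the duplicate boundary record, and remembers only each page's end key (rather than every record) so that "previous" can re-run an earlier query.

```javascript
// Hypothetical in-memory stand-in for ref.endAt(null, endKey).limit(n):
// returns the last n keys that sort <= endKey, oldest first
// (endKey === undefined means "start from the newest record").
function endAtLimit(sortedKeys, endKey, n) {
  const eligible =
    endKey === undefined ? sortedKeys : sortedKeys.filter((k) => k <= endKey);
  return eligible.slice(Math.max(0, eligible.length - n));
}

class Pager {
  constructor(keys, pageSize) {
    this.keys = [...keys].sort(); // push IDs sort chronologically as strings
    this.pageSize = pageSize;
    this.endKeys = [undefined]; // end key used for each page visited so far
    this.current = this.load(undefined);
  }

  // Runs the query for an end key; endAt is inclusive, so fetch one extra
  // record and drop the boundary record that was already shown.
  load(endKey) {
    if (endKey === undefined) return endAtLimit(this.keys, undefined, this.pageSize);
    return endAtLimit(this.keys, endKey, this.pageSize + 1).slice(0, -1);
  }

  next() {
    if (this.current.length === 0) return this.current;
    const boundary = this.current[0]; // oldest record on the current page
    const page = this.load(boundary);
    if (page.length === 0) return this.current; // already on the oldest page
    this.endKeys.push(boundary);
    this.current = page;
    return page;
  }

  previous() {
    if (this.endKeys.length > 1) {
      this.endKeys.pop(); // forget the current page's end key
      this.current = this.load(this.endKeys[this.endKeys.length - 1]);
    }
    return this.current;
  }
}
```

Each page comes back oldest-first, so a UI that lists newest-first would reverse it before rendering. Jumping straight to page N is still not possible with this approach, because the end key of a page is only known after visiting the page before it.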
If you have a data structure that is append-only, you could potentially also do pagination when you write the data. For example: put the first 50 in /page1, put the second 50 in /page2, etc.
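A small sketch of the write-time bucketing idea, assuming the zero-based position of each new record is known when it is written. The helper names are hypothetical; in Firebase the position would have to come from a shared counter (for example one updated with a transaction) so that concurrent writers don't compute the same position.

```javascript
// Maps the zero-based position of an appended record to its page bucket,
// e.g. positions 0-49 -> "page1", 50-99 -> "page2", ...
function pageBucketFor(index, pageSize = 50) {
  if (!Number.isInteger(index) || index < 0) {
    throw new RangeError("index must be a non-negative integer");
  }
  return "page" + (Math.floor(index / pageSize) + 1);
}

// Number of page buckets needed for a given number of records.
function pageCount(totalRecords, pageSize = 50) {
  return Math.ceil(totalRecords / pageSize);
}
```

Writing record number i would then look something like `ref.child(pageBucketFor(i)).push(record)`, and reading page n is a single read of `ref.child("page" + n)`, which also makes direct links to any page possible.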

Another way to accomplish this is with two trees:
Tree of ids: {
  id1: id1,
  id2: id2,
  id3: id3
}
Tree of data: {
  id1: ...,
  id2: ...,
  id3: ...
}
You can then load the entire tree of ids (or a big chunk of it) to the client and do fancy pagination with that tree of ids, fetching full records from the data tree only for the ids on the current page.
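A sketch of paginating over the id tree once it is on the client. `pageOfIds` is a hypothetical helper; it assumes the tree has the `{ id1: id1, ... }` shape shown above and that ids are push IDs, which sort chronologically as plain strings.

```javascript
// Returns the ids for a 1-based page number plus the total page count,
// so the client can render direct links to any page.
function pageOfIds(idTree, page, pageSize) {
  const ids = Object.keys(idTree).sort(); // chronological for push IDs
  const totalPages = Math.ceil(ids.length / pageSize);
  const start = (page - 1) * pageSize;
  return { ids: ids.slice(start, start + pageSize), totalPages };
}
```

Each returned id could then be read from the data tree on its own (for example with `dataRef.child(id).once('value', ...)`, where `dataRef` is a hypothetical reference to the data tree), so only one page of full records is downloaded.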

Here's a hack for paginating in each direction.
// get 1-5
ref.startAt().limit(5)
// get 6-10 from 5
ref.startAt(null, '5th-firebase-id' + 1).limit(5)
// get 11-15 from 10
ref.startAt(null, '10th-firebase-id' + 1).limit(5)
Basically it's a hack for startAtExclusive(): you can append anything to the end of the id.
I also figured out endAtExclusive() for going backwards.
// get 6-10 from 11
ref.endAt(null, '11th-firebase-id'.slice(0, -1)).limit(5)...
// get 1-5 from 6
ref.endAt(null, '6th-firebase-id'.slice(0, -1)).limit(5)...
I'll play with this some more, but it seems to work with push ids. Replace limit with limitToFirst or limitToLast if you're using the newer Firebase query API.
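A quick in-memory check of why the two string tricks work, assuming keys are ordered lexicographically (as Firebase orders non-numeric keys such as push IDs) and all keys have the same length. The short keys are hypothetical stand-ins for push IDs. It also shows a caveat with the trimmed end key: it sorts before any previous id that shares all but its last character, which can happen when push IDs are generated in quick succession. Requesting one extra record and dropping the boundary, as in the question, avoids that case.

```javascript
// In-memory models of ref.startAt(null, key).limit(n) and ref.endAt(null, key).limit(n).
function startAtLimit(sortedKeys, startKey, n) {
  return sortedKeys.filter((k) => k >= startKey).slice(0, n);
}
function endAtLimit(sortedKeys, endKey, n) {
  const eligible = sortedKeys.filter((k) => k <= endKey);
  return eligible.slice(Math.max(0, eligible.length - n));
}

const keys = ["-Ab1", "-Ab2", "-Ac1", "-Ad1"]; // hypothetical equal-length ids

// "Exclusive" start: key + "1" sorts just after key but before the next id.
const afterAb2 = startAtLimit(keys, "-Ab2" + "1", 2); // ["-Ac1", "-Ad1"]
// "Exclusive" end: trimming the last character sorts just before the key...
const beforeAd1 = endAtLimit(keys, "-Ad1".slice(0, -1), 2); // ["-Ab2", "-Ac1"]
// ...but also before "-Ab1", which differs from "-Ab2" only in the last character.
const beforeAb2 = endAtLimit(keys, "-Ab2".slice(0, -1), 2); // []
```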

Related

Function of Rows, Rowsets in PeopleCode

I'm trying to get a better understanding of what Rows and Rowsets are used for in PeopleCode. I've read through PeopleBooks and still don't feel like I have a good understanding. I'm looking to understand them better as they pertain to Application Engine programs. Perhaps walking through an example would help. Here are some specific questions I have:
I understand that Rowsets, Rows, Records, and Fields are used to access component buffer data, but is this still the case for standalone Application Engine programs run via Process Scheduler?
What would be the need for, or advantage of, using these as opposed to SQL objects/functions (CreateSQL, SQLExec, etc.)? I often see AE programs where a rowset is instantiated with CreateRowset and populated with the Fill method and a SQL WHERE clause, and I don't quite understand why a SQL object wasn't used instead.
I've seen in PeopleBooks that a Row object represents a row in a component scroll. How does a component scroll relate to the row? I've seen references to rows at different scroll levels; is this just a way of grouping and nesting related data?
After you have instantiated a rowset with CreateRowset, what are its typical uses in the rest of the program? How would you perform logic (If, Then, Else, etc.) on data retrieved by the rowset, or use it to update data?
I appreciate any insight you can share.
You can still use Rowsets, Rows, Records and Fields in standalone Application Engines. Application Engines do not have component buffer data because they are not running within the context of a component. Therefore, to use these objects you need to populate them using built-in methods like Fill() on a rowset or SelectByKey() on a record.
The advantage of using rowsets over SQL is that they make CRUD easier: there are built-in methods for selecting, updating, inserting and deleting. Additionally, you don't have to declare a large number of variables when there are multiple fields, as you would with a SQL object. Another advantage is that Fill reads the data into memory, whereas if you looped through a SQL object, the SQL cursor would stay open longer. The rowset, row, record and field objects also have many other useful methods, such as ExecuteEdits (validation) or copying from one rowset/row/record to another.
This question is a bit less clear to me, but I'll try to explain. If you have a page, it has a level 0 row. That row can have multiple level 1 rowsets, and each row in those can in turn have level 2 rowsets.
           Level0
          /      \
     Level1      Level1
     /    \      /    \
Level2 Level2 Level2 Level2
If one of your level 1 scrolls had 3 rows, you would find 3 rows in the rowset associated with that level 1 scroll. I'm not sure this answers what you need; please clarify if I can provide more info.
Typically, after I create a rowset, I loop through it, access the record on each row, and do some processing with it. In the example below, I loop through all locked accounts, prefix their description with LOCKED, and then update the database.
Local boolean &updateResult;
Local integer &i;
Local Record &lockedAccount;
Local Rowset &lockedAccounts;

/* Load every locked operator definition into memory */
&lockedAccounts = CreateRowset(Record.PSOPRDEFN);
&lockedAccounts.Fill("WHERE ACCTLOCK = 1");

For &i = 1 To &lockedAccounts.ActiveRowCount
   &lockedAccount = &lockedAccounts(&i).PSOPRDEFN;
   /* Only prefix descriptions that are not already marked */
   If Left(&lockedAccount.OPRDEFNDESC.Value, 6) <> "LOCKED" Then
      &lockedAccount.OPRDEFNDESC.Value = "LOCKED " | &lockedAccount.OPRDEFNDESC.Value;
      /* Update() writes the changed record back to the database */
      &updateResult = &lockedAccount.Update();
      If Not &updateResult Then
         /* Handle the failed update here */
      End-If;
   End-If;
End-For;

How to use cursors for navigating to previous pages using GQL and the new gcloud-java API?

I'm using the new gcloud-java API (https://github.com/GoogleCloudPlatform/gcloud-java/tree/master/gcloud-java-datastore/src/main/java/com/google/cloud/datastore) for working with Cloud Datastore. My specific question is about using GQL for pagination with cursors. I was able to page through the results one page at a time in the forward direction using cursors, but I haven't had any luck paging backwards.
Example Scenario:
Let's say I have 20 entities in a Kind with IDs 1 through 20, and a page size of 5. Once I'm on the 3rd page (IDs 11 through 15), if I need to go back one page, i.e. retrieve IDs 6 through 10, what would be the correct GQL/sample code? Again, I prefer not to use offset with a number, but would like to use cursors.
From what I can tell (and have actually tested), it looks like one needs to keep track of the start/end cursors for each page while navigating forward, then use the saved cursors when there is a need to go back. I just want to make sure whether this is the correct/only way, or whether there is a simpler way to accomplish this.
Thanks in advance for your help.
If you add to your original query a sort by key (appended to the end of your "order by" clause), you should be able to reverse each property's sort order and use the latest cursor from your original query to get results in reverse.
Suppose you've iterated through some of the values from your forward query's QueryResults. You can call QueryResults's cursorAfter() method, which will return a cursor pointing right after the last result you saw from your original query. Now you can issue a new query (with the opposite sort order on each property, including the key property) using that cursor as the start cursor. You'll probably want to skip the first result, since it will be the last result you saw from the original query.
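A plain in-memory model of the reverse-order idea from the example scenario, not gcloud-java code: integer ids stand in for entity keys, and the strict comparison plays the role of skipping the boundary result. Using the first entity of the current page as the boundary is an assumption of this sketch; with Datastore, the boundary would be a cursor rather than an id comparison.

```javascript
// Forward page: the next pageSize ids after the last one on the current page.
function nextPage(sortedIds, lastIdOfCurrentPage, pageSize) {
  return sortedIds.filter((id) => id > lastIdOfCurrentPage).slice(0, pageSize);
}

// Previous page: walk the ids in the opposite order from the first id on the
// current page, take pageSize of them, then restore ascending order for display.
function previousPage(sortedIds, firstIdOfCurrentPage, pageSize) {
  const descending = sortedIds.filter((id) => id < firstIdOfCurrentPage).reverse();
  return descending.slice(0, pageSize).reverse();
}

const ids = Array.from({ length: 20 }, (_, i) => i + 1); // ids 1..20
const page3 = nextPage(ids, 10, 5); // [11, 12, 13, 14, 15]
const page2 = previousPage(ids, page3[0], 5); // [6, 7, 8, 9, 10]
```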

Dynamodb data model for process/transaction monitoring

I want to keep track of a multi-stage processing job.
I likely just need the following fields:
batchId (guid) | eventId (guid) | statusId (int) | timestamp | message (string)
There is a relatively small number of events per batch.
I want to be able to easily query events that have a statusId less than n (still being processed or didn't finish processing).
Would using multiple rows for each status change, and querying for the latest status, be the best approach? I would use a global secondary index, but statusId does not seem like a good candidate for a hash key (fewer than 10 statuses).
Instead of using multiple rows for every status change, if you updated the same event row instead, you could use a technique described in the DynamoDB documentation in the section 'Use a Calculated Value'. Basically this would involve adding another attribute (say 'derivedStatusId') which would be derived by appending a random number to statusId at the time of writing to DynamoDB. For example, for a statusId of 2, derivedStatusId could be one of {"2-00", "2-01", .. "2-99"}. Setting up a Global Secondary Index on derivedStatusId would give you some fan-out that will help in preventing the index from becoming hot.
If you are sure that you will use this index only for unfinished events, then removing the derivedStatusId attribute from the record when it transitions to a finished status will remove it from the index as well. That may be a good property if events are expected to finish processing eventually and stay around forever. This technique is called a "sparse index" and is described in more detail in the DynamoDB documentation.
From your question, it seems that recording status history is a desired property (I assume this because you want multiple rows for status changes). Consider putting this historical information in the same item. DynamoDB supports list data types and has a generous 400 KB item size limit, which may allow you to capture all the desired historical information in the same record.
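A sketch combining the calculated-value and sparse-index ideas, with no AWS calls. The helper names are hypothetical, and two assumptions are placeholders rather than facts from the question: statusIds are integers starting at 0, and any status at or above `FINISHED` counts as finished.

```javascript
const SHARDS = 100; // suffixes "00".."99", as in the example above
const FINISHED = 5; // placeholder: statuses >= 5 are treated as finished

function shardKey(statusId, shard) {
  return `${statusId}-${String(shard).padStart(2, "0")}`;
}

// Item to write: unfinished events get a random derivedStatusId, spreading
// writes across index partitions; finished events omit it, which keeps them
// out of the sparse GSI.
function toItem(event, rand = Math.random) {
  const { derivedStatusId, ...item } = event; // drop any stale value
  if (event.statusId < FINISHED) {
    item.derivedStatusId = shardKey(event.statusId, Math.floor(rand() * SHARDS));
  }
  return item;
}

// GSI partition key values to query (then merge) to find events with statusId < n.
function indexKeysBelow(n) {
  const keys = [];
  for (let s = 0; s < n; s++) {
    for (let shard = 0; shard < SHARDS; shard++) keys.push(shardKey(s, shard));
  }
  return keys;
}
```

Omitting the attribute this way fits a full-item put; an in-place update of an existing item would instead need a `REMOVE derivedStatusId` clause. The cost of the fan-out is on the read side: finding unfinished events means one query per partition key value.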
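A sketch of keeping the history in the same item: it only builds the parameters for a DocumentClient-style update and sends nothing to AWS. `list_append` and `if_not_exists` are DynamoDB update-expression functions; the key schema (batchId as hash key, eventId as range key) and the `history` attribute name are assumptions. Attribute-name placeholders are used so reserved words can't cause problems.

```javascript
// Sets the current status and appends a history entry in a single update.
function statusUpdateParams(tableName, batchId, eventId, statusId, message, now = new Date()) {
  return {
    TableName: tableName,
    Key: { batchId, eventId },
    UpdateExpression:
      "SET #status = :s, #history = list_append(if_not_exists(#history, :empty), :entry)",
    ExpressionAttributeNames: { "#status": "statusId", "#history": "history" },
    ExpressionAttributeValues: {
      ":s": statusId,
      ":empty": [], // used when the item has no history yet
      ":entry": [{ statusId, message, timestamp: now.toISOString() }],
    },
  };
}
```

Because every status change grows the list, it is worth keeping an eye on the 400 KB item limit if batches can produce long histories.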

Top-50 click&time scored posts

We all know Microscope from Discover Meteor. The app is fine, but it ranks posts only by the number of upvotes. On the Best page, it sorts posts by upvotes in descending order; the upvote count is stored for each post and updated each time a user upvotes it.
Now imagine we want to implement something like Hacker News has: not only a click-based rating, but also a time-based rating. Let's define the word 'click' to mean a user clicking on a post in the post list. Each click increases the post's total number of clicks by 1.
For those who don't know how the Hacker News algorithm works, I'll briefly explain. In short, the total number of clicks on a link (post) is divided by:
(T+2)^g
where T is the number of hours elapsed since the post was published, and g is a "sensitivity" constant (let's call it that), which is just a number such as 1.6 or 1.8; the exact value doesn't matter. This decreases the influence of clicks as time goes by. You can read more at http://amix.dk/blog/post/19574, for example.
Now, we want the top 50 click&time-rated posts, so we need to query MongoDB for posts sorted by the score calculated with the formula above.
I can see two major approaches, and I find both of them quite bad.
The first (the way I do it now) is to subscribe to all posts and, in a template helper, prepare the data for rendering:
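The formula as a small function. Note that in JavaScript `^` is bitwise XOR, not exponentiation, so the power has to be written with `Math.pow` (or `**`). The default g of 1.8 is just one of the example values above.

```javascript
// score = clicks / (T + 2)^g, where T is hours since publishing.
function hnScore(clicks, hoursSincePublished, g = 1.8) {
  return clicks / Math.pow(hoursSincePublished + 2, g);
}

hnScore(100, 0, 2); // 25: a brand-new post
hnScore(100, 8, 2); // 1: the same clicks, 8 hours later
```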
rankedPosts: function() {
  var g = 1.8, now = Date.now(); // assumes each post stores clicks and a submitted date
  var rawPosts = posts.find().map(function(item) {
    var T = (now - item.submitted) / 3600000; // hours since publishing
    item.score = item.clicks / Math.pow(T + 2, g); // ^ is XOR in JavaScript, so use Math.pow
    return item; // map needs the item back
  });
  rawPosts = _.sortBy(rawPosts, function(item) { return -item.score; }); // highest score first
  return _.first(rawPosts, 50); // only the top 50
}
I then use rankedPosts for rendering. The bottleneck is that I have to run through all posts every time.
The second is to somehow (I don't know how, or whether it's even possible) subscribe to an already scored, sorted and filtered collection, letting Meteor/MongoDB do the scoring, sorting and filtering (and recalculate the score every hour or on each new click) for me.
So, the obvious question: what would you recommend?
Thanks in advance.
Think about the numbers. On a live site, you can have thousands of posts, or millions if the site is successful. Fetching all of them just to find the top 50 doesn't make sense.
I'd recommend storing the final calculated rating in a field. Then, in the subscription, you sort by that field and apply the desired limit. When a post gains a new click, you simply recalculate the value and save it to the database. Finally, in a cron job or a Meteor interval, you periodically update the rating of all items in the database.
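An in-memory sketch of this recommendation. The field names (`clicks`, `submittedMs`, `score`) and the gravity value are hypothetical. In Meteor, `recordClick` would correspond to an update on the Posts collection, `refreshScores` to a job run from cron or `Meteor.setInterval`, and `topPosts` to a publication returning something like `Posts.find({}, { sort: { score: -1 }, limit: 50 })`, ideally backed by an index on `score`.

```javascript
const GRAVITY = 1.8;
const HOUR_MS = 60 * 60 * 1000;

function computeScore(post, nowMs) {
  const hours = (nowMs - post.submittedMs) / HOUR_MS;
  return post.clicks / Math.pow(hours + 2, GRAVITY);
}

// On each click: increment and recompute just this post's stored score.
function recordClick(post, nowMs) {
  post.clicks += 1;
  post.score = computeScore(post, nowMs);
}

// Periodic job: refresh every stored score so older posts decay.
function refreshScores(posts, nowMs) {
  for (const post of posts) post.score = computeScore(post, nowMs);
}

// Read side: sort by the stored field and limit, without recomputing anything.
function topPosts(posts, n) {
  return [...posts].sort((a, b) => b.score - a.score).slice(0, n);
}
```

The trade-off is that ranks are only as fresh as the last refresh, but reads stay cheap no matter how many posts exist.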

Firebase reading ordered data reversed order?

I have an ordered list of Firebase locations. I'm using a property ut (update time) as their priority. I want to structure the list so that it's easy to get the most recently updated documents.
So I set the priority to be negative ut.
var query = fb.child('view/documents').limit(20)
query.on('child_added', function(child) {
console.log(child.val())
console.log(child.getPriority())
})
I expect something like this to return the latest 20 documents, but it doesn't; it returns the oldest 20. In Forge I see the list the way I expect it, with the latest documents on top, but the query is sending me the bottom 20. It seems contrary to expectations for the query to send me the bottom 20 instead of the top 20.
What really confuses me is that child_added fires in the expected order, latest (smallest priority) first. But again, these are the oldest documents in the list.
Am I doing something wrong, or is this a bug in Firebase?
Thanks.
I understand your confusion, but that's really how it's supposed to work: limit(20) returns the 20 greatest-priority children, starting with the 20th-greatest-priority child and ending with the absolute-greatest-priority child (and then updating whenever a new child is added whose priority is great enough to make the list).
You can see the example at https://www.firebase.com/docs/queries.html, where the priority is the Unix timestamp of when the message was sent, and messageListRef.limit(100) is used to get the 100 most recent messages (i.e., the 100 greatest-priority messages).
I think what you are looking for is .startAt() before the limit(). That will return the children from the beginning of the priority order (here, the latest documents first); without it, you will always get the last specified number of children.
Here is the reference : https://www.firebase.com/docs/javascript/query/limit.html
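An in-memory model of which children the two queries return, to make the behaviour concrete. The documents are hypothetical, with priority set to negative ut as in the question; ties on priority are broken by key. In the newer query API, the same choice is made explicitly with `orderByPriority().limitToFirst(20)` versus `orderByPriority().limitToLast(20)`.

```javascript
// Children ordered by priority, ties broken by key.
function byPriority(children) {
  return [...children].sort(
    (a, b) => a.priority - b.priority || (a.key < b.key ? -1 : a.key > b.key ? 1 : 0)
  );
}
function limitOnly(children, n) {
  return byPriority(children).slice(-n); // like ref.limit(n): the last n
}
function startAtThenLimit(children, n) {
  return byPriority(children).slice(0, n); // like ref.startAt().limit(n): the first n
}

// Hypothetical documents with update times ut = 1..30 and priority -ut.
const docs = Array.from({ length: 30 }, (_, i) => ({ key: "doc" + (i + 1), ut: i + 1, priority: -(i + 1) }));
```

With these documents, `startAtThenLimit(docs, 20)` yields ut 30 down to 11 (the latest), while `limitOnly(docs, 20)` yields ut 20 down to 1 (the oldest), matching what the question observed.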
