Paginate data with offset on an unordered Firestore query - firebase

If I use offset and limit to paginate a query, do I need to specify an order?
In other words, does a query with no order specified use some implicit order, such as key order?
In a SQL database, if I don't specify an order, the SQL engine returns the results in whatever order it pleases. So the second time the query is run, the results may come back in a different order, and offset and limit would not cut the result set as wanted.

Assuming you want to order your documents by creation date, Firestore needs an extra field for this, as explained in the Firestore documentation:
Unlike "push IDs" in the Firebase Realtime Database, Cloud Firestore
auto-generated IDs do not provide any automatic ordering. If you want
to be able to order your documents by creation date, you should store
a timestamp as a field in the documents.
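To illustrate why offset/limit paging needs an explicit order, here is a minimal in-memory sketch in Python. It does not call Firestore. The field names created_at and id and the helper page are hypothetical, chosen only for this example. The id is used as a tiebreaker so that documents sharing a timestamp still have a fixed position.

```python
from operator import itemgetter


def page(docs, offset, limit):
    """Return one page after sorting by an explicit, total order.

    Sorting on (created_at, id) keeps the order deterministic even when
    two documents share the same timestamp.
    """
    ordered = sorted(docs, key=itemgetter("created_at", "id"))
    return ordered[offset:offset + limit]


# Hypothetical documents; the storage order here is deliberately arbitrary.
docs = [
    {"id": "c", "created_at": 3},
    {"id": "a", "created_at": 1},
    {"id": "d", "created_at": 3},
    {"id": "b", "created_at": 2},
]

first = page(docs, offset=0, limit=2)
# Even if the backing store hands the documents back in another order,
# the second page continues exactly where the first one stopped.
second = page(list(reversed(docs)), offset=2, limit=2)
```

Without the explicit sort, the two slices could overlap or skip documents whenever the underlying order changes between requests.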

Related

How to filter DynamoDb by object property value

I have a DynamoDB table:
How should I filter entries in the DB table where access.role = "ADMIN"?
You would be best served by setting up a Global Secondary Index (GSI). You set the Partition Key equal to that attribute, and the Sort Key equal to some other attribute that you can guarantee will be unique. Then you use your SDK of choice or the Query option in the console, select the index, and query for partition_key = ADMIN.
However, be aware: indexes are a complete replication of the table. DynamoDB is very good at this and relatively fast at doing so, but reads from a GSI are eventually consistent, so there is still the possibility that your index will be out of sync with the actual data. If you are not making the call against the index very often, you are pretty much fine. If you are calling it very often, then you should restructure your table.
DynamoDB is not a SQL database. When setting up a DynamoDB schema you have to consider how you will access your data, that is, your access patterns. You should design your data with your Partition Key as the data you will always have when looking something up (e.g. I will always have a user ID number) and your Sort Keys as the individual documents related to that PK (e.g. a user has a document that is their profile data, a document that is their profile picture URL, a document that is a list of their friends' user numbers, and so on).
Then you use indexes for things like your question that you won't be doing very often.
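As a rough sketch of the GSI query described above, the following Python builds the request parameters for a DynamoDB Query against an index. Several things here are assumptions, not taken from the question: the table name Users, the index name role-index, and a top-level string attribute role that duplicates access.role (index key attributes must be top-level, so a nested attribute cannot serve as the partition key directly). The function only builds a dict; sending it is left to whichever SDK you use.

```python
def build_role_query(table_name, index_name, role):
    """Build keyword arguments for a DynamoDB Query against a GSI.

    The dict follows the low-level Query request shape. The attribute name
    is aliased through the #r placeholder so it cannot clash with a
    reserved word.
    """
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#r = :role",
        "ExpressionAttributeNames": {"#r": "role"},  # hypothetical top-level attribute
        "ExpressionAttributeValues": {":role": {"S": role}},
    }


# Hypothetical table and index names.
params = build_role_query("Users", "role-index", "ADMIN")
# With boto3 this could be sent as: boto3.client("dynamodb").query(**params)
```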

Are Azure CosmosDB indexes split by partition

I am sending some IoT events into Azure Cosmos DB. I am partitioning by device id and I am always querying by device id. I want to know if the automatically created indexes are separated by partition key. Specifically, if I run a query like
SELECT TOP 5 ... FROM events WHERE deviceId = X ORDER BY timeStamp DESC
will it use the automatically created index on timeStamp, and if so, is it effective? Basically, what I am asking is whether there are separate indexes on timeStamp for each partition key (deviceId in my case), because otherwise the index will be relatively useless: the range will contain a lot of irrelevant data from other devices. If this were SQL Server, I would create an index on deviceId followed by timeStamp, but I am not sure how Cosmos DB works by default.
Indexes sit within the partition, so yes.
For the query you have, you should also create a composite index with DESC sort order for the best performance.

Amazon SimpleDB & DynamoDB for storing blog posts

Consider a simple blog post schema has the following columns
ID
Author
Category
Status
CreatedDateTime
UpdatedDateTime
So assume the following queries
query by ID
query by Author, paginated
query by (Author, Status), sorted by CreatedDateTime, paginated
query by (Category, Status), sorted by CreatedDateTime, paginated
So it seems that, without much work, SimpleDB would make this easier to implement?
SimpleDB is barely supported by AWS any more (you can't even find it in the AWS console), so while it may work for you, personally I would be deciding between DynamoDB and DocumentDB (assuming you want NoSQL). I don't think there is any reason to start a new project on such an old offering at this point.
You should use DynamoDB because it has a lot of useful features such as Point in Time Recovery, transactions, encryption-at-rest, and activity streams that SimpleDB does not have.
If you're operating on a small scale, DynamoDB has the advantage that it allows you to set a maximum capacity for your table, which means you can make sure you stay in the free tier.
If you're operating at a larger scale, DynamoDB automatically handles all of the partitioning of your data (and has, for all practical purposes, limitless capacity), whereas SimpleDB has a limit of 10 GB per domain (aka "table") and you are required to manage any horizontal partitioning across domains that you might need.
Finally, there are signs that SimpleDB is already on a deprecation path. For example, if you look at the SimpleDB release notes, you will see that the last update was in 2011, whereas DynamoDB had several new features announced at the last re:Invent conference. Also, there are a number of Reddit posts where the general consensus is that SimpleDB is already deprecated, and in some of the threads, Jeff Barr even commented and did not contradict any of the assertions that SimpleDB is deprecated.
That being said, in DynamoDB, you can support your desired queries.
You will need two Global Secondary Indexes, which use a composite sort key. Your queries can be supported with the following schema:
ID — hash key of your table
Author — hash key of the Author-Status-CreatedDateTime-index
Category — hash key of the Category-Status-CreatedDateTime-index
Status
CreatedDateTime
UpdatedDateTime
Status-CreatedDateTime — sort key of Author-Status-CreatedDateTime-index and Category-Status-CreatedDateTime-index. This is a composite attribute that exists to enable some of your queries. It is simply the value of Status with a separator character (I'll assume it's # for the rest of this answer), and CreatedDateTime appended to the end. (Personal opinion here: use ISO-8601 timestamps instead of unix timestamps. It will make troubleshooting a lot easier.)
Using this schema, you can satisfy all of your queries.
query by ID:
Simply perform a GetItem request on the main table using the blog post Id.
query by Author, paginated:
Perform a Query on the Author-Status-CreatedDateTime-index with a key condition expression of Author = :author.
query by (Author, Status), sorted by CreatedDateTime, paginated:
Perform a Query on the Author-Status-CreatedDateTime-index with a key condition expression of Author = :author and begins_with(Status-CreatedDateTime, :status). The results will be returned in order of ascending CreatedDateTime.
query by (Category, Status), sorted by CreatedDateTime, paginated:
Perform a Query on the Category-Status-CreatedDateTime-index with a key condition expression of Category = :category and begins_with(Status-CreatedDateTime, :status). The results will be returned in order of ascending CreatedDateTime. (Additionally, if you wanted to get all the blog posts in the "technology" category that have the status published and were created in 2019, you could use a key condition expression of Category = "technology" and begins_with(Status-CreatedDateTime, "published#2019").)
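The composite sort key and the begins_with query above can be sketched in Python as follows. This is an illustration under assumptions rather than a definitive implementation: the table name BlogPosts and both helper functions are hypothetical, timestamps are assumed to be stored as ISO-8601 UTC strings, and the dict follows the low-level Query request shape. The sort key name is passed through a placeholder because it contains hyphens.

```python
from datetime import datetime, timezone

SEPARATOR = "#"


def status_created_key(status, created):
    """Join Status and an ISO-8601 UTC timestamp into one sortable string."""
    stamp = created.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{status}{SEPARATOR}{stamp}"


def category_status_query(category, status, year=None):
    """Build Query parameters for the Category-Status-CreatedDateTime-index."""
    # Including the separator keeps "published" from also matching a
    # longer status value that merely starts with the same letters.
    prefix = f"{status}{SEPARATOR}" + (str(year) if year is not None else "")
    return {
        "TableName": "BlogPosts",  # hypothetical table name
        "IndexName": "Category-Status-CreatedDateTime-index",
        "KeyConditionExpression": "#c = :category AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {
            "#c": "Category",
            "#sk": "Status-CreatedDateTime",
        },
        "ExpressionAttributeValues": {
            ":category": {"S": category},
            ":prefix": {"S": prefix},
        },
    }


key = status_created_key("published", datetime(2019, 5, 1, 12, 0, tzinfo=timezone.utc))
params = category_status_query("technology", "published", year=2019)
```

Because ISO-8601 UTC strings sort lexicographically in chronological order, items that share a status prefix come back ordered by creation time.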
The sort order of the results can be controlled using the ScanIndexForward field of the Query request. The default is true (sort ascending); but by setting it to false DynamoDB will return results in descending order.
DynamoDB has built-in support for paginating the results of a Query operation. Basically, any time there are more results that were not returned, the query response will contain a LastEvaluatedKey, which you can pass as the ExclusiveStartKey of your next query request to pick up where you left off. (See Query Pagination for more details about how it works.)
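The pagination loop can be sketched like this in Python. The generator works with any callable that takes Query keyword arguments and returns a response dict. The fake_query function is a hypothetical in-memory stand-in used only so the example runs without AWS; with boto3 you could pass a client's query method instead.

```python
def query_all_pages(query_fn, **params):
    """Yield every item from a paginated Query.

    query_fn is any callable that accepts Query keyword arguments and returns
    a dict with "Items" and, when more results remain, "LastEvaluatedKey".
    """
    while True:
        response = query_fn(**params)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break
        # Resume the next request right after the last item already seen.
        params["ExclusiveStartKey"] = last_key


def fake_query(Limit=2, ExclusiveStartKey=None, **_):
    """Hypothetical in-memory stand-in for a DynamoDB Query call."""
    data = [{"ID": n} for n in range(5)]
    start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["ID"] + 1
    chunk = data[start:start + Limit]
    response = {"Items": chunk}
    if start + Limit < len(data):
        response["LastEvaluatedKey"] = chunk[-1]
    return response


items = list(query_all_pages(fake_query, Limit=2))
```

For a user-facing page-by-page view you would typically stop after one response and hand the LastEvaluatedKey back to the caller, instead of draining every page as this loop does.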
On the other hand, if you're already familiar with SQL, and you want to make this as easy for yourself as possible, consider just using the Aurora Serverless Data API.

Does a logical partition scan in CosmosDB always returns items in the same order?

In Cosmos DB, using the SQL API (I hope the API does not matter), for queries that do not use ORDER BY and run over a specific logical partition (e.g. WHERE CustomerId = 123), I am wondering if the response will always return the results in the same order.
A use case could be something like an audit log, where the _ts timestamp may not be granular enough, so the same value is likely to appear twice at some point, and the source of the events doesn't allow creating a sequence that can be used for ordering.
wondering if the response will return the results always in the same
order.
Based on my previous tests, if you do not specify any sort order, the results are sorted by default by the time the items were created in the database, whether the container is partitioned or not.
In the sample documents above, the order does not change if I change the id, the partition key (that's name), or ts.

Cloud Firestore whereNotEqual

Does Firestore support something like whereNotEqual?
For example, I need to get exact documents where key "xyz" is missing.
In the Firebase Realtime Database, we could get it by calling *.equalTo(null).
Thanks.
Firestore does not support a direct equivalent of !=. The supported query operators are <, <=, ==, >, or >= so there's no "whereNotEqual".
You can test if a field exists at all, because all filters and order bys implicitly create a filter on whether or not a field exists. For example, in the Android SDK:
collection.orderBy("name")
would return only those rows that contain a "name" field.
As with explicit comparison there's no way to invert this query to return those rows where a value does not exist.
There are a few work-arounds. The most direct replacement is to explicitly store null then query collection.whereEqualTo("name", null). This is somewhat annoying though because if you don't populate this from the outset you have to backfill existing data once you want to do this. If you can't upgrade all your clients you'll need to deploy a function to keep this field populated.
Another possibility is to observe that usually missing fields indicate that a document is only partially assembled perhaps because it goes through some state machine or is a sort of union of two non-overlapping types. If you explicitly record the state or type as a discriminant you can query on that rather than field non-presence. This works really well when there are only two states/types but gets messy if there are many states.
Cloud Firestore now supports whereNotEqualTo in database queries.
Keep in mind if you have more than one field in your query you may have to create a composite index in Cloud Firestore.
