DynamoDB Set order - amazon-dynamodb

From DynamoDB docs:
An attribute of type String Set. For example:
"SS": ["Giraffe", "Hippo", "Zebra"]
Type: Array of strings
Required: No
This is all I could find. I did some testing, but that's clearly not enough for production environments, and I would like confirmation or refutation from people who have actually worked with these Sets.
Do DynamoDB Sets maintain insertion order? Can I count on that and build logic around it?
I'm mainly interested in String Sets, but the answer probably applies to all set types (String, Number, Binary).

Here is the documentation: the Set data types do not preserve order.
SET: The order of the values within a set are not preserved; therefore, your applications must not rely on any particular order of elements within the set.
LIST - A list type attribute can store an ordered collection of values
Similar discussion on AWS forum
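If your logic depends on element order, the documented alternative is a List ("L") attribute. Below is a minimal sketch in Python. It assumes you work with DynamoDB's low-level attribute-value JSON, which is the same "SS" shape quoted in the question. The helper names are hypothetical, and no AWS call is made.

```python
# Hypothetical helpers for DynamoDB's low-level attribute-value JSON.
# They only build/compare dicts; nothing here talks to AWS.

def to_list_attr(values):
    """Encode strings as an "L" (List) attribute, which preserves order."""
    return {"L": [{"S": v} for v in values]}


def to_string_set_attr(values):
    """Encode strings as an "SS" (String Set) attribute.

    Duplicates collapse, and DynamoDB makes no promise about element order,
    so callers should treat the order of the returned list as meaningless.
    """
    unique = set(values)
    if not unique:
        raise ValueError("DynamoDB does not allow empty sets")
    return {"SS": list(unique)}


def same_string_set(a, b):
    """Compare two "SS" attributes by membership only, never by position."""
    return set(a["SS"]) == set(b["SS"])


stored = {"SS": ["Giraffe", "Hippo", "Zebra"]}
read_back = {"SS": ["Zebra", "Giraffe", "Hippo"]}  # same set, different order
print(same_string_set(stored, read_back))            # True
print(to_list_attr(["Zebra", "Giraffe"]))            # order kept as given
```

The design choice is simple. Compare sets by membership, and reach for "L" only when position actually carries meaning.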

Related

How can I limit and sort on document ID in firestore?

I have a collection where the documents are uniquely identified by a date, and I want to get the n most recent documents. My first thought was to use the date as a document ID, and then my query would sort by ID in descending order. Something like .orderBy(FieldPath.documentId, descending: true).limit(n). This does not work, because it requires an index, which can't be created because __name__ only indexes are not supported.
My next attempt was to use .limitToLast(n) with the default sort, which is documented here.
By default, Cloud Firestore retrieves all documents that satisfy the query in ascending order by document ID
According to that snippet from the docs, .limitToLast(n) should work. However, because I didn't specify a sort, it says I can't limit the results. To fix this, I tried .orderBy(FieldPath.documentId).limitToLast(n), which should be equivalent. This, for some reason, gives me an error saying I need an index. I can't create it for the same reason I couldn't create the previous one, but I don't think I should need to because they must already have an index like that in order to implement the default ordering.
Should I just give up and copy the document ID into the document as a field, so I can sort that way? I know it should be easy from an algorithms perspective to do what I'm trying to do, but I haven't been able to figure out how to do it using the API. Am I missing something?
Edit: I didn't realize this was important, but I'm using the FlutterFire Firestore library.
A few points. It is ALWAYS good practice to use random, well-distributed document IDs in Firestore for scale and efficiency. Related to that, there is effectively NO practical way to query by documentId. The few circumstances where you can use it (especially for a range, which is possible but VERY tricky) require inequalities, and you can only apply inequalities to one field. IF there's a reason to search on an ID, then yes, it is PERFECTLY appropriate to store it in the document as well; in fact, my wrapper library always does this.
The correct notation, by the way, would be FieldPath.documentId() (a method, not a constant), or alternatively __name__, but I believe this only works in queries. It requested a new index because, without the (), it assumed you had a field named FieldPath with a subfield named documentId.
Further: FieldPath.documentId() does NOT resolve to the bare documentId on the server; it resolves to the FULL PATH of the document. See "Firestore collection group query on documentId" for a more complete explanation.
So net:
=> document IDs should be as random as possible within a collection; it's generally best to let Firestore generate them for you.
=> a valid exception is when you have ONE AND ONLY ONE sub-document under another - for example, every "user" document might have one and only one "forms of Id" document as a subcollection. It is valid to use the SAME ID as the parent document in this exceptional case.
=> anything you want to query should be a FIELD in a document, and generally a simple field.
=> WORD TO THE WISE: Firestore "arrays" are ABSOLUTELY NOT ARRAYS. They are ORDERED LISTS, generally in the order the elements were added. The SDK presents them to the CLIENT as arrays, but Firestore itself does not STORE them as ACTUAL ARRAYS; THE NUMBER YOU SEE IN THE CONSOLE is the order, not an index. Matching elements in an array (arrayContains, e.g.) requires matching the WHOLE element. If you store an ordered list of objects, you CANNOT query the "array" on sub-elements.
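Here is a plain-Python model of the whole-element matching described above. It is illustrative only, not the Firestore SDK, and it uses a hypothetical "forms of ID" array. The workaround at the end is an assumption on my part, in line with the "anything you want to query should be a FIELD" point. It copies the queried sub-field into its own simple array, here given the hypothetical name `idTypes`.

```python
# Plain-Python model of Firestore's whole-element array matching
# (illustrative only; not the Firestore SDK).

def array_contains(array_field, value):
    """True if some element equals `value` as a whole (dicts compare by all keys)."""
    return any(element == value for element in array_field)


forms_of_id = [
    {"type": "passport", "number": "X123"},
    {"type": "license", "number": "D456"},
]

# Matching the complete object works...
print(array_contains(forms_of_id, {"type": "passport", "number": "X123"}))  # True
# ...but a partial object (one sub-field) does not match anything.
print(array_contains(forms_of_id, {"type": "passport"}))                    # False

# Workaround (an assumption, not a Firestore feature): copy the sub-field you
# need to query into its own simple array field, e.g. a hypothetical "idTypes".
id_types = [element["type"] for element in forms_of_id]
print(array_contains(id_types, "passport"))                                 # True
```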
From what I've found:
FieldPath.documentId does not match on the documentId, but on the refPath (which it gets automatically if passed a document reference).
Since the documents are to be sorted by time, it would be better to store createdAt as a Timestamp field value than as a human-readable string. Strings are compared character by character, not by the date they represent.
From there, you can simply sort by that date field and limit to last. You can keep the document IDs as you intend.
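The sketch below is a plain-Python model, not the FlutterFire API. It shows why sorting on a date string goes wrong and what `.orderBy(field).limitToLast(n)` amounts to: sort ascending, then keep the last n. The documents and the `limit_to_last` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical documents keyed by a human-readable date string, with the
# same date also stored as a real timestamp field ("createdAt").
docs = [
    {"id": "9/1/2021", "createdAt": datetime(2021, 9, 1)},
    {"id": "10/1/2021", "createdAt": datetime(2021, 10, 1)},
    {"id": "11/1/2021", "createdAt": datetime(2021, 11, 1)},
]


def limit_to_last(items, key, n):
    """Model of .orderBy(field).limitToLast(n): sort ascending, keep the last n."""
    ordered = sorted(items, key=key)
    return ordered[max(len(ordered) - n, 0):]


by_string = limit_to_last(docs, key=lambda d: d["id"], n=2)
by_time = limit_to_last(docs, key=lambda d: d["createdAt"], n=2)
print([d["id"] for d in by_string])  # ['11/1/2021', '9/1/2021']  (wrong: "9" > "1")
print([d["id"] for d in by_time])    # ['10/1/2021', '11/1/2021'] (the two newest)
```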

Can I reuse existing fields as Sharded timestamps in Firestore?

I was looking for a solution to Firestore's limit on sequential indexed fields, which this doc describes as follows:
"Sequential indexed fields" means any collection of documents that contains a monotonically increasing or decreasing indexed field. In many cases, this means a timestamp field, but any monotonically increasing or decreasing field value can trigger the write limit of 500 writes per second.
As per the solution, I can add a shard field to my collection that contains a random value, and create a composite index with the timestamp. I am trying to achieve the same thing with the fields that already exist in my documents.
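The explicit shard-field approach from the doc (the thing the users array would be standing in for) can be sketched in plain Python. Two readings are assumptions: that "1..n distinct values" means any n distinct values (0..n-1 here), and that the limit is 500 writes/s per value, as quoted. The `shard` field name and both helpers are hypothetical.

```python
import math
import random

# Per the quoted doc, a sequential indexed field tops out around 500 writes/s.
WRITES_PER_SECOND_PER_SHARD = 500


def shards_needed(target_writes_per_second):
    """Smallest n such that 500 * n covers the target write rate."""
    return max(1, math.ceil(target_writes_per_second / WRITES_PER_SECOND_PER_SHARD))


def with_shard(doc, n_shards, rng=random):
    """Copy `doc` and add a hypothetical "shard" field holding one of n values."""
    return {**doc, "shard": rng.randrange(n_shards)}


rng = random.Random(7)  # seeded only so the example is repeatable
n = shards_needed(1200)  # 1200 / 500 = 2.4, rounded up
doc = with_shard({"users": ["uidA", "uidB"], "createdDate": 1700000000}, n, rng)
print(n, doc["shard"] in range(n))  # 3 True
```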
My document has the following fields:
{
users: string[],
createdDate: Firebase Timestamp
....
}
I already have a composite index: users (Arrays), createdDate (Descending). I have also created exemptions for these fields from the automatic index settings. The users field will contain a list of Firebase auto-generated IDs, so it's definitely random. What I'm not sure about is whether the users field will do the job of the shard field from the example doc. That way we could avoid adding a new field and still increase the write rate. Can someone please help me with this?
While I don't have specific experience that says what you're trying to do definitely will or will not work the way you expect, I would assume that it works, based on the fact that the documentation says (emphasis mine):
Add a shard field alongside the timestamp field. Use 1..n distinct values for the shard field. This raises the write limit for the collection to 500*n, but you must aggregate n queries.
If each users array contains different and essentially random user IDs, then the array field values would be considered "distinct" (as two arrays are only equal if their elements are all equal to each other), and therefore suitable for sharding.
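The cost side of sharding is "you must aggregate n queries". The sketch below assumes each shard (or each per-user `array-contains` query) was already run with createdDate descending. It merges those results client-side into the k newest documents, in plain Python with a hypothetical helper. Deduplicating by ID is my addition: with a users array as the shard, one document can match several per-user queries.

```python
import heapq


def newest_across_shards(per_shard_results, k):
    """Merge per-shard query results into the k newest distinct documents.

    Each input list is assumed to be one shard's query result, already sorted
    newest-first by "createdDate" (i.e. orderBy createdDate descending).
    Duplicates are dropped by "id", which matters if the shard value is a
    users array: one document can match several per-user queries.
    """
    if k <= 0:
        return []
    merged = heapq.merge(
        *per_shard_results, key=lambda d: d["createdDate"], reverse=True
    )
    seen, newest = set(), []
    for doc in merged:
        if doc["id"] in seen:
            continue
        seen.add(doc["id"])
        newest.append(doc)
        if len(newest) == k:
            break
    return newest


shard_a = [{"id": "d1", "createdDate": 50}, {"id": "d3", "createdDate": 30}]
shard_b = [{"id": "d1", "createdDate": 50}, {"id": "d2", "createdDate": 40}]
print([d["id"] for d in newest_across_shards([shard_a, shard_b], 3)])  # ['d1', 'd2', 'd3']
```

`heapq.merge` with `reverse=True` expects every input to be sorted largest-first. That ordering is exactly what a descending per-shard query returns, so the full result set never has to be sorted again.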

How are firestore arrays implemented?

I need to store a list of strings in a field and be able to easily access each string by its value, but there isn't a "set" or "1D map" option in Firestore. As a hack, I used a map and stored each string as "value": value. But Firestore's array methods and functionality seem to behave like a set, and if so, arrays would be perfect for storing this type of data. Are Firestore arrays implemented as sets or more like traditional arrays?
Firestore arrays are just arrays, but come with some special operators that allow you to use them as sets too.
Specifically, you can perform unions on arrays, which add an item to the array if it is not in there yet.
You can also query on documents where an array contains a specific, or one of a number of values.
Note that in all these cases (unions and queries), you need to specify the entire array element; you cannot specify only one property of it. With simple string values like yours that is natural, but the same applies when you store objects in arrays: you must specify the entire object for the query to match it.
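Here is a plain-Python model (not the SDK) of the set-like behavior described above. `array_union` appends only values that are absent, `array_remove` drops every equal element, and `array_contains_any` is the "one of a number of values" query. All three compare whole elements. The function names are hypothetical stand-ins for `arrayUnion`, `arrayRemove` and the `array-contains-any` filter.

```python
# Plain-Python models of Firestore's set-like array operations (illustrative,
# not the SDK). Equality is whole-element, so dicts must match on every key.

def array_union(current, *values):
    """Like arrayUnion: append each value that is not already present."""
    result = list(current)
    for value in values:
        if value not in result:
            result.append(value)
    return result


def array_remove(current, *values):
    """Like arrayRemove: drop every element equal to any of the values."""
    return [element for element in current if element not in values]


def array_contains_any(current, candidates):
    """Like an array-contains-any filter evaluated against one document."""
    return any(candidate in current for candidate in candidates)


tags = array_union(["red", "blue"], "blue", "green")
print(tags)                                       # ['red', 'blue', 'green']
print(array_remove(tags, "red"))                  # ['blue', 'green']
print(array_contains_any(tags, ["pink", "red"]))  # True
```

Note that the order of the list is still kept, which is why it behaves like a set only for these operations, not in general.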

CosmosDB index. Should I exclude it?

In my SQL-CosmosDB collection I am not running any queries with a WHERE condition other than by partition key plus a sort on an additional field: a streamId, which is the partition key, and the event position, as I use Cosmos to store my aggregate roots.
I wonder what will happen if I just exclude all paths from indexing in that collection, except maybe keeping the field I am using for sorting.
Alexander, given your requirements, I think you could consider setting the indexing mode to None. Please refer to the explanation in this link:
If a container's indexing policy is set to None, indexing is effectively disabled on that container. This is commonly used when a container is used as a pure key-value store without the need for secondary indexes. It can also help speeding up bulk insert operations.
Of course, if you have special needs, you could exclude the root path and selectively include only the paths that need to be indexed. BTW, as @DraganB mentioned in the comments, changing the indexing policy later does not immediately apply to existing records (see the statements in this link). So it's better to decide this up front.
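Below are the two options written as indexing-policy JSON and built from Python dicts so the shape is easy to inspect. The `indexingMode`, `includedPaths` and `excludedPaths` keys are the standard policy properties. The property name `position` is a hypothetical stand-in for the event-position field. The question still sorts by position, so option 2 is probably the closer fit, because an ORDER BY needs that path indexed. Treat this as a sketch to check against your own query metrics.

```python
import json

# Option 1: indexing disabled, as in the quoted docs (pure key-value access).
NO_INDEXING = {"indexingMode": "none"}

# Option 2: exclude the root path, then opt back in only for the sort field.
# "position" is a hypothetical name for the event-position property.
SORT_FIELD_ONLY = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/position/?"}],
    "excludedPaths": [{"path": "/*"}],
}

# Print the JSON you would paste into the container's indexing policy.
print(json.dumps(SORT_FIELD_ONLY, indent=2))
```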

Using timestamp as an Attribute in DynamoDB

I'm quite new to DynamoDB, but have some experience in Cassandra. I'm trying to adapt a pattern I followed in Cassandra, where each column represented a timestamped event, and wondering if it will carry over gracefully into DynamoDB or if I need to change my approach.
My goal is to query a set of documents within a date range by using the milliseconds-since-epoch timestamp as an Attribute name. I'm successfully storing the following as each report is generated with each new report being added under its own column:
{ PartitionKey:customerId,
SortKey:reportName_yyyymm,
'#millis_1#':{'report':doc_1},
'#millis_2#':{'report':doc_2},
. . .
'#millis_n#':{'report':doc_n}
}
My question is, given a millisecond-based date range, and the accompanying Partition and Sort keys, is it possible to query the set of Attributes that fall within that range or must I retrieve all columns for the matching keys and filter them at the client?
Welcome to the most powerful NoSQL database ;)
To kick off with the positive news, there is no way to query out specific attributes. You can project certain attributes in a query, but you would have to write your own logic to determine which attributes or columns should be included in the projection. To get close to your solution, you could use a map attribute inside an item with the milliseconds as keys. But there is another thing you have to be aware of when starting down this path.
There is a maximum item size of 400 KB for each item in DynamoDB, and attribute names count toward it as well as values (Limits in DynamoDB: Items). This means you can only store so many attributes in an item, especially if you intend to put the actual report inside the attribute. I would advise against that, also because you will burn read capacity units for the whole item every time you read one attribute out of it. You would be better off putting this data in a separate table, with its keys in the map. But truthfully, in DynamoDB I would split this whole thing up: just add the milliseconds to the sort key and make every document its own item. That way you can query these items directly and use a BETWEEN key condition to select specific date-time ranges. Please let me know if you meant something else.
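Below is an in-memory sketch of the "one item per report, milliseconds in the sort key" design. The `PK`/`SK` attribute names and the `reportName#millis` layout are hypothetical. With boto3, the query would roughly correspond to `table.query(KeyConditionExpression=Key("PK").eq(pk) & Key("SK").between(low, high))`, using `boto3.dynamodb.conditions.Key`. The zero-padding matters because a string sort key compares character by character.

```python
# In-memory model of the one-item-per-report design (not the AWS SDK).
# "PK"/"SK" and the "#" separator are hypothetical naming choices.

def make_sort_key(report_name, millis):
    """reportName#millis, zero-padded to 13 digits so string order = time order."""
    return f"{report_name}#{millis:013d}"


def query_between(items, pk, low_sk, high_sk):
    """Model of Query: PK = :pk AND SK BETWEEN :low AND :high (both inclusive)."""
    hits = [i for i in items if i["PK"] == pk and low_sk <= i["SK"] <= high_sk]
    return sorted(hits, key=lambda i: i["SK"])  # Query returns items in SK order


items = [
    {"PK": "cust-1", "SK": make_sort_key("sales", 1_600_000_000_000), "report": "doc_1"},
    {"PK": "cust-1", "SK": make_sort_key("sales", 1_600_000_500_000), "report": "doc_2"},
    {"PK": "cust-1", "SK": make_sort_key("sales", 1_600_001_000_000), "report": "doc_3"},
    {"PK": "cust-2", "SK": make_sort_key("sales", 1_600_000_500_000), "report": "doc_x"},
]

hits = query_between(
    items,
    "cust-1",
    make_sort_key("sales", 1_600_000_000_000),
    make_sort_key("sales", 1_600_000_600_000),
)
print([i["report"] for i in hits])  # ['doc_1', 'doc_2']
```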
