Riak inserting a list and querying a list - riak

I was wondering if there was an efficient way of handling arrays/lists in Riak. Right now I'm storing the whole array as a string and searching the string to find out whether an element exists in the array.
ID (key) : int[] (Value)
Also, how do I write a map/reduce query to return all the keys for which the value array contains a given element?
For example 1 : 2,3,4
2 : 2,5
How would I write an M/R query to give me all the keys whose value contains 2? The result would be 1,2 in this case.
Any help is appreciated

If you are searching for a specific element in the list and are using the LevelDB backend, you could create a secondary index that will contain the values of the array. Secondary indexes in Riak may contain multiple values and can be searched for equality, which should allow you to search for single elements in the array without having to resort to MapReduce.
If you need to make more complicated queries based on either several elements in the list or other parameters, you could retrieve a subset of records based on the secondary index and then process them further on the client side or perhaps even through a MapReduce job.
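As a rough sketch of that approach using the official Python client (bucket and index names here are chosen for illustration), each array element can be written as a separate integer index term on the object and then queried for equality:

import riak

client = riak.RiakClient()          # assumes a local node with default ports
bucket = client.bucket('arrays')

# Store the array under key "1" and tag the object with one
# secondary-index term per element (requires the LevelDB backend).
obj = bucket.new('1', data=[2, 3, 4])
for element in [2, 3, 4]:
    obj.add_index('element_int', element)
obj.store()

# Fetch every key whose array contains 2 -- no MapReduce needed.
matching_keys = bucket.get_index('element_int', 2)

If further processing is required, the keys returned by the index query could also serve as the input set for a MapReduce job, keeping the M/R work limited to the pre-filtered records.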

Related

Querying on Global Secondary indexes with a usage of contains operator

I've been reading the DynamoDB docs and was unable to work out whether it makes sense to query a Global Secondary Index using the 'contains' operator.
My problem is as follows: my DynamoDB document has a list of embedded objects, and every object has a 'code' field which is unique:
{
  "entities": [
    {"code": "entity1Code", "name": "entity1Name"},
    {"code": "entity2Code", "name": "entity2Name"}
  ]
}
I want to be able to get all documents that contain entities with entity.code = X.
For this purpose I'm considering adding a Global Secondary Index that would contain all entity.codes present in the current db document, separated by commas. So the example above would look like:
{
  "entities": [
    {"code": "entity1Code", "name": "entity1Name"},
    {"code": "entity2Code", "name": "entity2Name"}
  ],
  "entitiesGlobalSecondaryIndex": "entityCode1,entityCode2"
}
And then I would like to apply a filter expression on entitiesGlobalSecondaryIndex, something like: entitiesGlobalSecondaryIndex contains entityCode1.
Would this be efficient, or does using a global secondary index in this way not make sense, so that DynamoDB will simply check the condition against every document, which is similar to a scan?
Any help is very much appreciated,
Thanks
The contains operator of a query cannot be run on a partition key. In order for a query to use any sort of operator (contains, begins with, >, <, etc.) you must have a range attribute, a.k.a. your sort key.
You can very well set up a GSI with some value as your PK and this code as your SK. However, GSIs are replicas of the table - there is a slight potential for the data in a GSI to lag behind that of the master copy. If the query you're running against this GSI isn't very frequent, then you're probably safe from that.
However, if you are trying to do this to the entire table at once then it's no better than a scan.
If what you need is a specific Code to return all its documents at once, then you could do a GSI with that as the PK. If you add a date field as the SK of this GSI it would even be time sorted. If you query against that code in that index, you'll get every single one of them.
Since you may have multiple codes, if there aren't too many per document, you could maybe use a Sparse Index - if you have an entity with code "AAAA" then you also have an attribute named AAAA (or AAAAflag or something). It is always null/does not exist unless the entities list contains that code. If you create a GSI on this AAAAflag attribute, it will only contain documents that contain that entity code, and ignore all documents where this attribute does not exist. This may work for you if you can also provide a good PK on this to keep the numbers well partitioned and if you don't have too many codes.
Filter expressions, by the way, are different from all of the above. Filter expressions are run on the data that would be returned, after it is already read out of the table. This is useful if you have a multi-access-pattern setup but don't want a particular call to get all the documents associated with a particular PK - in the interests of keeping the data your code is working with concise. The query with a filter expression still retrieves everything from that query, but only presents what makes it past the filter.
If you are only querying against a particular PK at any given time and you want to know whether it contains any entities of x, then a filter expression would work perfectly. Of course, this is only per PK and not for your entire table.
If all you need is counts, then you could keep a count attribute on the document, or a meta document on that partition that contains these values and can be queried directly.
Lastly, and I have no idea whether this would work, if your entities attribute is a map type you might very well be able to filter against entities.code - and maybe even with entities.code.contains(value) if it were an SK - but I do not know if this is possible or not.
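To make the filter-expression idea concrete, here is a minimal boto3 sketch (the table name, partition key name, and code value are hypothetical). Note that the filter is applied after the read, so the whole partition's items are still consumed from the query:

import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('documents')   # hypothetical table name

# Query a single partition, then filter on the comma-separated codes string.
response = table.query(
    KeyConditionExpression=Key('pk').eq('some-partition-value'),
    FilterExpression=Attr('entitiesGlobalSecondaryIndex').contains('entityCode1'),
)
items = response['Items']             # only items whose codes string contains entityCode1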

dynamoDB conditional write only when there are N or less items in a map

We are storing a map in DynamoDB (not my design choice). So, basically a key can contain a list of values. Something like map[string][]someStruct.
We can only append a value to a given key if there are fewer than N values for that key. For example, with N = 3: if "key1" already has 3 values, I cannot append another value, but if it has fewer than 3 values, I can append one more.
I looked at Conditional Writes, but couldn't find a conditional expression that would help with this. Any help is greatly appreciated. Thanks!
You would need to store the number of records in the list alongside the list; perhaps something like this would work:
{
  "key1": {
    "list": [1, 2, 3, 4],
    "listLength": 4
  }
}
You'll need to make sure that the list and listLength are kept in sync.
Alternatively you can get the item, check the length, and then update with a condition to make sure it hasn't been modified between your get and update operations.
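A minimal boto3 sketch of the first approach (table name, key names, and N = 3 are assumptions for illustration): the update appends to the list and bumps listLength in the same write, and the condition rejects the write once the stored length reaches the limit.

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('my-table')    # hypothetical table with partition key 'pk'

try:
    table.update_item(
        Key={'pk': 'key1'},
        # Append the new value and keep the counter in sync in one write.
        UpdateExpression='SET #lst = list_append(#lst, :new), #len = #len + :one',
        # Reject the write if the list is already at the limit.
        ConditionExpression='#len < :max',
        ExpressionAttributeNames={'#lst': 'list', '#len': 'listLength'},
        ExpressionAttributeValues={':new': [5], ':one': 1, ':max': 3},
    )
except ClientError as err:
    if err.response['Error']['Code'] == 'ConditionalCheckFailedException':
        print('key1 already holds the maximum number of values')
    else:
        raise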

Use RocksDB to support key-key-value (RowKey->Containers) by splitting the container

Suppose I have a key/value pair where the value is a logical list of strings to which I can append. To avoid the situation where inserting a single string item causes a rewrite of the entire list, I'd use multiple key-value pairs to represent it:
Key -> metadata of the value such as length and subkey format
Key-l1 -> value of item 1 in list
Key-l2 -> value of item 2 in list
Key-ln -> the latest value in the list
I'd override the key comparator in RocksDB so that Key-ln formatted keys sort by the Key part first and ln second (i.e. group by Key, and within the same Key sort by ln). This way, all the list items, along with their root key and metadata, are grouped together in the SST during the initial bulk insert and during later SST compactions.
Appending a new list item becomes: (1) read Key-metadata to get the current list size n; (2) insert Key-l(n+1) with the new value. Deleting a list item works as it normally does in RocksDB, by deleting Key-ln and updating the metadata.
To ensure consistency, (1) and (2) will be done inside a RocksDB transaction.
Does this design seem OK?
Now, if I want to add another feature, a TTL for the entire key-value (list), I'd use the TTL support already in RocksDB. My understanding is that removal of expired items via TTL happens during compaction. However, such compaction is not done under a transaction, and RocksDB doesn't know that the Key-metadata and Key-ln entries are related. It is entirely possible that there is a time window where Key->metadata (the root node) is deleted while the child nodes (Key-ln) are not deleted yet (or the reverse order). If during this time window someone reads or updates the list, they will get an inconsistent view of the Key-list. Any remedy for it?
Thanks
You should use a Merge Operator; it's designed for exactly this value-append use case. Your design is read-before-write, which carries a performance penalty and in general should be avoided if possible: What's read-before-write in NoSQL?
Options options;
options.merge_operator.reset(new StringAppendOperator(','));
DB* db;
DB::Open(options, kDBPath, &db);
...
db->Merge(WriteOptions(), "key", "value1");
db->Merge(WriteOptions(), "key", "value2");
std::string result;
db->Get(ReadOptions(), "key", &result); // result == "value1,value2"
The above example uses the predefined StringAppendOperator, which simply appends new values at the end. You can define your own MergeOperator to customize the merge operation.
Under the hood, the merge operation is applied on the read path (and during compaction, to reduce the number of stored versions); details: Merge Operator Implementation.

Dealing with PL/SQL Collections

I have the following declaration for a collection:
TYPE T_TABLE1 IS TABLE OF TABLE_1%ROWTYPE INDEX BY BINARY_INTEGER;
tbl1_u T_TABLE1;
tbl1_i T_TABLE1;
This table will keep growing and, at the end, will be used in a FORALL loop to do an insert or update on TABLE_1.
Now there might be cases where I want to delete a certain element, so I am planning to create a procedure which will take the KEY (unique) and delete the matching element if that key is found.
PSEUDO CODE
FOR i IN tbl1_u.FIRST .. tbl1_u.LAST LOOP
  IF tbl1_u(i).key = key THEN
    tbl1_u.DELETE(i);
  END IF;
END LOOP;
My question is,
Once I delete a particular element, would the collection adjust automatically, i.e. would the index i be taken over by the next element, or would that particular index remain null/invalid and possibly give me an exception if I use it in a FORALL INSERT/UPDATE?
I don't think that I can pass a TABLE_1%ROWTYPE object to a procedure; do I have to create a record type?
Any other tips on managing collections for bulk delete/update/insert would be appreciated. Remember, I would be dealing with 2 tables: if I am inserting/updating in table_1 then I am deleting from table_2, and vice versa.
Given that TABLE_1.KEY is unique you might consider using that as the index to your associative arrays. That way you can delete from the collections using the KEY value, which according to the pseudocode is available when doing the deletions. This would also save you having to iterate through the table to find the KEY you want, as the KEY would be the index - so your "deletion" pseudo-code would become:
tbl1_u.delete(key);
To answer your questions:
Since you're using associative arrays, when an element is deleted there is no "empty" space in the collection. The indexes for the elements, however, don't actually change. Therefore you need to use the collection.PRIOR and collection.NEXT methods to loop through the collection. But again, if you use the KEY value as the index you may not need to loop through the collections at all.
You can pass a TABLE_1%ROWTYPE as a parameter to a PL/SQL procedure or function.
You might want to consider using a MERGE statement which could handle doing the inserts and updates in one step. This might allow you to maintain only a single collection. Might be worth looking in to.
Share and enjoy.

DynamoDB ordered list

I'm trying to store a List as a DynamoDB attribute but I need to be able to retrieve the list in its original order. At the moment the only solution I have come up with is to create a custom hash map by appending a key to the value, converting the complete value to a String, and then storing that as a list.
e.g. key = position1, value = value1, String to be stored in the DB = "position1#value1"
To use the list I then need to filter, organise, substring, and reconvert to the original type. It seems like a long way round but at the moment it's the only solution I can come up with.
Does anybody have any better solutions or ideas?
The List type in the newly added Document Types should help.
Document Data Types
DynamoDB supports List and Map data types, which can be nested to represent complex data structures.
A List type contains an ordered collection of values.
A Map type contains an unordered collection of name-value pairs.
Lists and maps are ideal for storing JSON documents. The List data type is similar to a JSON array, and the Map data type is similar to a JSON object. There are no restrictions on the data types that can be stored in List or Map elements, and the elements do not have to be of the same type.
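A minimal boto3 sketch of using a List attribute (the table name and hash key are hypothetical): the list is written and read back in its original order, so no position prefix is needed.

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('my-table')    # hypothetical table with hash key 'id'

# List attributes preserve insertion order.
table.put_item(Item={'id': 'doc1', 'values': ['value1', 'value2', 'value3']})

item = table.get_item(Key={'id': 'doc1'})['Item']
print(item['values'])                 # ['value1', 'value2', 'value3'], in order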
I don't believe it is possible to store an ordered list as an attribute, as DynamoDB only supports single-valued and (unordered) set attributes. However, the performance overhead of storing a string of comma-separated values (or some other separator scheme) is probably pretty minimal given the fact that all the attributes for a row must together be under 64 KB.
(source: http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/DataModel.html)
Add a range attribute to your primary keys.
Composite Primary Key for Range Queries
A composite primary key enables you to specify two attributes in a table that collectively form a unique primary index. All items in the table must have both attributes. One serves as a “hash partition attribute” and the other as a “range attribute.” For example, you might have a “Status Updates” table with a composite primary key composed of “UserID” (hash attribute, used to partition the workload across multiple servers) and a “Time” (range attribute). You could then run a query to fetch either: 1) a particular item uniquely identified by the combination of UserID and Time values; 2) all of the items for a particular hash “bucket” – in this case UserID; or 3) all of the items for a particular UserID within a particular time range. Range queries against “Time” are only supported when the UserID hash bucket is specified.
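As a sketch of the third query pattern above, assuming a boto3 table named 'StatusUpdates' with hash key 'UserID' and range key 'Time' (all names and values here are illustrative):

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('StatusUpdates')   # hash key 'UserID', range key 'Time'

# All updates for one user within a time range, returned sorted by 'Time'.
response = table.query(
    KeyConditionExpression=Key('UserID').eq('user-123') &
                           Key('Time').between('2020-01-01', '2020-12-31'),
)
items = response['Items']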
