Is it possible not to overwrite earlier values in RocksDB?

For example, the current behavior of RocksDB is:
put(key1, value1);
put(key1, value2);
and afterwards (key1, value1) can be found nowhere.
Is there a property or something like it that can keep the earlier key-value pair, such as (key1, value1), whether in the memtable or the SST files?
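For reference, a minimal C++ sketch (path and options are placeholders, error handling omitted) of the overwrite behavior described, plus one built-in mechanism not mentioned in the thread that keeps the earlier version readable: a snapshot taken between the two puts pins the old version in the memtable/SST files until it is released.

#include <string>
#include "rocksdb/db.h"

int main() {
  rocksdb::DB* db;
  rocksdb::Options options;
  options.create_if_missing = true;
  rocksdb::DB::Open(options, "/tmp/testdb", &db);

  db->Put(rocksdb::WriteOptions(), "key1", "value1");
  const rocksdb::Snapshot* snap = db->GetSnapshot();   // pin the current version
  db->Put(rocksdb::WriteOptions(), "key1", "value2");  // hides value1 on the normal read path

  std::string v;
  db->Get(rocksdb::ReadOptions(), "key1", &v);         // v == "value2"

  rocksdb::ReadOptions ropts;
  ropts.snapshot = snap;
  db->Get(ropts, "key1", &v);                          // v == "value1" -- still readable
  db->ReleaseSnapshot(snap);                           // the old version may now be compacted away
  delete db;
  return 0;
}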

Related

Use RocksDB to support key-key-value (RowKey->Containers) by splitting the container

Suppose I have a key/value pair where the value is a logical list of strings that I can append to. To avoid rewriting the entire list when inserting a single item, I use multiple key-value pairs to represent it:
Key -> metadata of the value such as length and subkey format
Key-l1 -> value of item 1 in list
Key-l2 -> value of item 2 in list
Key-ln -> the latest value in the list
I'd override the key comparator in RocksDB so that keys of the form Key-ln sort by the Key part first and by ln second (i.e., group by Key, and within the same Key sort by ln). This way, all the list items, along with their root key and metadata, are grouped together in the SSTs during the initial bulk insert and during later compaction.
Appending a new list item becomes: (1) read Key-metadata to get the current list size n; (2) insert Key-l(n+1) with the new value. Deleting a list item works as it already does in RocksDB, by deleting Key-ln and updating the metadata.
To ensure consistency, (1) and (2) are done inside a RocksDB transaction.
Does this design seem OK?
Now, if I want to add another feature, a TTL for the entire key-value list, I'd use the TTL support already in RocksDB. My understanding is that removal of expired items happens during compaction. However, such compaction is not done under a transaction, and RocksDB doesn't know that the Key-metadata and Key-ln entries are related. It is entirely possible that there is a time window where Key->metadata (the root node) has been deleted while the child nodes (Key-ln) have not been deleted yet (or the reverse). If someone reads or updates the list during this window, they will get an inconsistent view of the Key list. Is there any remedy for it?
Thanks
You should use the Merge Operator; it's designed for exactly this value-append use case. Your design is read-before-write, which has a performance penalty and in general should be avoided if possible: What's read-before-write in NoSQL?
Options options;
options.merge_operator.reset(new StringAppendOperator(','));
DB* db;
DB::Open(options, kDBPath, &db);
...
db->Merge(WriteOptions(), "key", "value1");
db->Merge(WriteOptions(), "key", "value2");
std::string result;
db->Get(ReadOptions(), "key", &result);  // result == "value1,value2"
The example above uses the predefined StringAppendOperator, which simply appends new values at the end. You can define your own MergeOperator to customize the merge operation.
Under the hood, the merge is applied on the read path (and during compaction, to reduce the number of accumulated versions); details: Merge Operator Implementation.
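As a sketch of the "define your own" route (the class name here is made up; the override follows RocksDB's AssociativeMergeOperator interface), a hand-rolled equivalent of StringAppendOperator(','):

#include <string>
#include "rocksdb/merge_operator.h"

class CommaAppendOperator : public rocksdb::AssociativeMergeOperator {
 public:
  bool Merge(const rocksdb::Slice& key, const rocksdb::Slice* existing_value,
             const rocksdb::Slice& value, std::string* new_value,
             rocksdb::Logger* logger) const override {
    new_value->clear();
    if (existing_value != nullptr) {  // nullptr on the first merge for a key
      new_value->assign(existing_value->data(), existing_value->size());
      new_value->push_back(',');
    }
    new_value->append(value.data(), value.size());
    return true;  // returning false would signal a merge failure
  }

  const char* Name() const override { return "CommaAppendOperator"; }
};

// Usage: options.merge_operator.reset(new CommaAppendOperator());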

DynamoDB query 1 field greater than

I have a games table.
To keep it simple, I will include only two fields for the question:
gameId:
deadlineToPlay:
I want to query for all games with deadlineToPlay greater than today.
How would I set up the index for this? I thought I could create an index with just deadlineToPlay, but if I understand correctly, when querying on a hash key it has to be an exact value; you can't use >.
I would also prefer not to use a scan, due to cost.
A way to work around this is to create (or reuse) a field that holds a constant value (for example, a field hasDeadline with the value true).
Now you can define the table key like this: hasDeadline as the HASH key and deadlineToPlay as the SORT key (if the table is already created, you can define this key in a new GSI).
This way you will be able to query by hasDeadline = true and deadlineToPlay > today.
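A sketch of that query using boto3 (the GSI name and date format are assumptions; note that key attributes must be scalars, so hasDeadline is stored as the string "true", and dates as ISO-8601 strings so that > compares correctly):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("games")

resp = table.query(
    IndexName="hasDeadline-deadlineToPlay-index",  # hypothetical GSI name
    KeyConditionExpression=(
        Key("hasDeadline").eq("true")              # the constant-value HASH key
        & Key("deadlineToPlay").gt("2024-06-01")   # SORT key: deadlines after "today"
    ),
)
games = resp["Items"]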

Erlang determine if record has a field

I am updating a record schema that I keep in mnesia. The new schema contains a new field, and I would like, after reading a record by id, to check whether the record has the field and, if not, upgrade the record to the new schema.
So, for example our old record is like so:
-record(cust, {id, name, street_address, city, state, zip}).
The new record adds the field street_address2:
-record(cust, {id, name, street_address, street_address2, city, state, zip}).
I would like to be able to upgrade the schema of existing records on the fly. To do so with the current logic, I would need to look up the record by id and check it for the existence of the street_address2 field. If the field doesn't exist, set it to the atom undefined and save the record back to mnesia. For some reason I am having a hard time finding a good way to do this.
Any guidance would be appreciated.
According to a reply from Ulf Wiger at https://groups.google.com/forum/#!topic/erlang-programming/U6Q0-_Usb50, you do need to transform the table, using the mnesia:transform_table(Tab, Fun, NewAttributeList) call.
http://erldocs.com/R16B03-1/mnesia/mnesia.html?i=1&search=mnesia#mnesia
This function applies the argument Fun to all records in the table. Fun is a function which takes a record of the old type and returns a transformed record of the new type.
Alex is correct. Here's an example of using transform_table for what you described:
-record(cust, {id, name, street_address, street_address2, city, state, zip}). % This should be the record definition
mnesia:transform_table(
    cust,
    fun({cust,
         Id,
         Name,
         StreetAddress,
         City,
         State,
         Zip}) ->
            {cust,
             Id,
             Name,
             StreetAddress,
             undefined, % This sets the new field to the atom undefined. You could also use "", or anything you want.
             City,
             State,
             Zip}
    end,
    record_info(fields, cust)
).
What happens is that the variables in the first tuple (Id, Name, StreetAddress, etc.) are bound automatically from the existing record. Then the record is transformed into the second tuple (the fun's return value), using those bound variables to fill in the new record. This process is applied to every existing record in the table.
Keep in mind the fun isn't magical in any way, so you can do anything inside it that you need to, for example checking ids or whatever. But for simply adding a field to the record, you can do it as shown here.
If you're doing this from the console, be sure to load the record definition first, using rr() or something similar.
Here's the docs for transform_table: http://www.erlang.org/doc/man/mnesia.html#transform_table-3

sqlite copy values within a table

I'm completely new to SQLite, so bear with me. I am updating a database and need to copy values within the same table, named "custom". I used PRAGMA table_info to get the table info:
0|ticket|integer|0||0
1|name|text|0||0
2|value||0||0
Using select * from custom where ticket = (some value) I get, among other results:
(some value)|block|
(some value)|required|(another value)
I want to copy (another value) into the "block" row's value wherever such a value exists in the "required" row. How do I make that happen? Everything I've tried so far has failed miserably.
My pseudo code version would be something like
update custom
where required has a value
copy it to block
How do I turn that into actual sqlite commands?
UPDATE custom
SET value = (SELECT value
             FROM custom AS c2
             WHERE c2.ticket = custom.ticket   -- same ticket as the row being updated
               AND c2.name = 'required')       -- take the value from its 'required' row
WHERE name = 'block'
  AND value IS NULL;
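A minimal way to sanity-check the statement in the sqlite3 shell (the table definition and values below are made up for illustration):

CREATE TABLE custom (ticket INTEGER, name TEXT, value);

INSERT INTO custom VALUES (1, 'block',    NULL);
INSERT INTO custom VALUES (1, 'required', 'another value');

UPDATE custom
SET value = (SELECT value FROM custom AS c2
             WHERE c2.ticket = custom.ticket
               AND c2.name = 'required')
WHERE name = 'block'
  AND value IS NULL;

SELECT * FROM custom WHERE name = 'block';
-- 1|block|another value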

SqliteDataReader and duplicate column names when using LEFT JOIN

Is there documentation or a specification for Sqlite3 that describes what is supposed to happen in the following case?
Take this query:
var cmd = new SqliteCommand("SELECT Items.*, Files.* FROM Items LEFT JOIN Files ON Files.strColName = Items.strColName");
Both Items and Files have a column named "strColName". If a matching entry exists in Files, it is joined into the result; if not, its columns are NULL.
Let's assume I always need the value of strColName, no matter whether it comes from Items or from Files. If I execute a reader:
var reader = cmd.ExecuteReader();
If there is a match in Files, reader["strColName"] will obviously contain the correct result, because the value is set and is the same in both tables. But if there wasn't a match in Files, will the NULL value from Files overwrite the non-NULL value from Items?
I'm really looking for some specification that defines how a Sqlite3 implementation has to deal with this case, so that I can trust either result.
SQLite has no problem returning multiple columns labelled with the same name.
However, the columns will always be returned in exactly the same order they are written in the SELECT statement.
So, when you search for "strColName", you will find the first one, which comes from Items.
It is recommended to use explicit column names instead of *, so that the order is clear and you can access values by their column index if needed (and so you can detect incompatible changes in the table structure).
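A sketch along those lines, aliasing the two columns so neither can shadow the other (the aliases and the null check are illustrative assumptions, not part of the original question):

var cmd = new SqliteCommand(
    "SELECT Items.strColName AS ItemsCol, Files.strColName AS FilesCol " +
    "FROM Items LEFT JOIN Files ON Files.strColName = Items.strColName");

using (var reader = cmd.ExecuteReader())
{
    while (reader.Read())
    {
        var fromItems = reader["ItemsCol"];        // always the Items value
        int filesOrdinal = reader.GetOrdinal("FilesCol");
        object fromFiles = reader.IsDBNull(filesOrdinal)
            ? null                                 // no match in Files
            : reader.GetValue(filesOrdinal);       // the joined Files value
    }
}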
