How do I specify a Range including everything in a RocksDB column? - rocksdb

I want to publish metrics about my RocksDB instance, including the disk size of each column over time. I've come across a likely API method to calculate disk size for a column:
GetApproximateSizes():
Status s = GetApproximateSizes(options, column_family, ranges.data(), NUM_RANGES, sizes.data());
This is a nice API, but I don't know how to provide a Range that will specify my entire column. Is there a way to do so without finding the min/max key in the column?

For the whole database, you can approximate it by using the empty byte string (or 0x00) as the start key and an arbitrarily large key, such as 0xFFFFFF, as the end key.
Otherwise, if the keys in the column share a common prefix, use the following function to compute the end key:
from six import int2byte  # on Python 3, bytes([b]) works as well

def strinc(key):
    # strip trailing 0xff bytes, then increment the last remaining byte
    key = key.rstrip(b"\xff")
    return key[:-1] + int2byte(ord(key[-1:]) + 1)
strinc computes the first byte string that is not a prefix of key; together, key and strinc(key) describe the whole keyspace that has key as its prefix.
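For instance, here is a minimal C++ sketch of covering one whole column family with a single Range (this assumes the GetApproximateSizes overload that takes a SizeApproximationOptions, and assumes no real key sorts above the 0xFF-padded end key):
#include <cstdint>
#include "rocksdb/db.h"

// Approximate the on-disk size of a column family by spanning the whole
// keyspace with one Range: "" sorts before every key, and the 0xFF-padded
// end key is assumed to sort after every real key.
uint64_t ApproximateColumnSize(rocksdb::DB* db,
                               rocksdb::ColumnFamilyHandle* cf) {
  rocksdb::Range whole(rocksdb::Slice(""),
                       rocksdb::Slice("\xff\xff\xff\xff\xff\xff\xff\xff", 8));
  uint64_t size = 0;
  rocksdb::SizeApproximationOptions options;  // choose which files/memtables to include
  db->GetApproximateSizes(options, cf, &whole, 1, &size);
  return size;
}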

Related

Use RocksDB to support key-key-value (RowKey->Containers) by splitting the container

Suppose I have a key/value where the value is a logical list of strings to which I can append. To avoid the situation where appending a single string item causes a rewrite of the entire list, I'm using multiple key-value pairs to represent it:
Key -> metadata of the value such as length and subkey format
Key-l1 -> value of item 1 in list
Key-l2 -> value of item 2 in list
Key-ln -> the latest value in the list
I'd override the key comparator in RocksDB so that keys in the Key-ln format sort by the Key part first and by ln second (i.e., group by Key and, within the same Key, sort by ln). This way, all the list items, along with the root key and metadata, are grouped together in the SST files during the initial bulk insert and during later SST compaction.
Appending a new list item becomes: (1) read Key-metadata to get the current list size n; (2) insert Key-l(n+1) with the new value. Deleting a list item works as it does normally in RocksDB: delete Key-ln and update the metadata.
To ensure consistency, (1) and (2) are done inside a RocksDB transaction.
Does this design seem OK?
Now, if I want to add another feature, a TTL for the entire key-value (list), I'd use the TTL support already in RocksDB. My understanding is that removal of expired items happens during compaction. However, such compaction is not done under a transaction, and RocksDB doesn't know that the Key-metadata and Key-ln entries are related. It is entirely possible that there is a time window where Key->metadata (the root node) has been deleted while the child nodes (Key-ln) have not been deleted yet (or the reverse). If someone reads or updates the list during that window, they will get an inconsistent view of the Key list. Is there any remedy for it?
Thanks
You should use a Merge Operator; it's designed for exactly this value-append use case. Your design is read-before-write, which carries a performance penalty and in general should be avoided if possible: see What's read-before-write in NoSQL?.
Options options;
options.merge_operator.reset(new StringAppendOperator(','));
DB::Open(options, kDBPath, &db);
...
db->Merge(WriteOptions(), "key", "value1");
db->Merge(WriteOptions(), "key", "value2");
db->Get(ReadOptions(), "key", &result); // result == "value1,value2"
The above example uses the predefined StringAppendOperator, which simply appends new values at the end. You can define your own MergeOperator to customize the merge operation.
Under the hood, the merge operation is applied on the read path (and during compaction, to reduce the number of stacked operands); see Merge Operator Implementation for details.
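As an illustration, here is a minimal sketch of a custom associative merge operator (the class name and separator are made up; the AssociativeMergeOperator interface is the one described in the Merge Operator wiki):
#include <string>
#include "rocksdb/merge_operator.h"
#include "rocksdb/slice.h"

// Hypothetical operator that appends each new list item after a ';'
// separator, so appending is a single Merge() call with no read-before-write.
class ListAppendOperator : public rocksdb::AssociativeMergeOperator {
 public:
  bool Merge(const rocksdb::Slice& /*key*/,
             const rocksdb::Slice* existing_value,
             const rocksdb::Slice& value,
             std::string* new_value,
             rocksdb::Logger* /*logger*/) const override {
    new_value->clear();
    if (existing_value != nullptr) {
      new_value->assign(existing_value->data(), existing_value->size());
      new_value->append(";");
    }
    new_value->append(value.data(), value.size());
    return true;  // merge always succeeds
  }
  const char* Name() const override { return "ListAppendOperator"; }
};
You would register it with options.merge_operator.reset(new ListAppendOperator()), just like the StringAppendOperator above.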

Redis Multiple Key Set Counts

So I add keys to my Redis implementation for wallpaper view counts like this...
(the values are there for demonstration purposes but the overall format is the same)
SADD wallpapers:100:2015-12-31 "127.0.0.1"
SADD wallpapers:100:2016-01-01 "127.0.0.1"
SADD wallpapers:100:2016-01-01 "192.168.1.1"
SADD wallpapers:100:2016-01-02 "127.0.0.1"
That should add the IPs to the associated sets. My question is: does Redis allow some sort of pattern-based count?
SCARD wallpapers:100:2016-01-01
For example, the above command would return 2, as there are two IPs stored in that set, but is there a way to run something like the command below to get counts for all the dates?
SCARD wallpapers:100:*
Actually it's easier than you think: store less specific sets to be able to get what you want.
For example, if you need wallpapers:100:*, it means that you just need a set called wallpapers:100 where you store the unique IP addresses.
That is, whenever you add an IP address to one of the specific (i.e. daily) sets, also add it to the global set for the given wallpaper identifier, as shown in the commands below.
Redis is like working with a manual index. Index your data in a way you can efficiently use it. That's all! This means that data redundancy is a good approach.
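For example, using the same keys as in the question (the extra wallpapers:100 set is the global one):
SADD wallpapers:100:2016-01-01 "127.0.0.1"
SADD wallpapers:100 "127.0.0.1"
SADD wallpapers:100:2016-01-01 "192.168.1.1"
SADD wallpapers:100 "192.168.1.1"
SCARD wallpapers:100
SCARD wallpapers:100 then returns 2, the number of unique IPs across all dates for wallpaper 100.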
EVAL "local total = 0 for _, key in ipairs(redis.call('keys', ARGV[1])) do total = total + redis.call('scard', key) end return total" 0 wallpapers:100:*
This command returns the total number of elements across all keys matching wallpapers:100:*.
If you want the total number of unique values from all keys combined:
EVAL "return redis.call('SUNIONSTORE', 'wallpapers:temp', unpack(redis.call('keys', ARGV[1])))" 0 wallpapers:100:*
This will return the number of unique values from all keys combined, and it also creates a key wallpapers:temp.
You can delete this key later with DEL wallpapers:temp.
I used SUNIONSTORE for the second command.
Refer EVAL.

Delete a key-value pair in BerkeleyDB

Is there any way to delete a key-value pair where the key starts with sub-string1 and ends with sub-string2 in BerkeleyDB, without iterating through all the keys in the DB?
For ex:
$sub1 = "B015";
$sub2 = "5646";
I want to delete
$key = "B015HGUJJ75646"
Note: It is guaranteed that there will be only one key for the combination of $sub1 and $sub2.
This can be done by taking an iterator over the DB and checking every key against the condition, but that will be very inefficient for large DBs. Is there any way to do it without iterating through the complete DB?
If you're using a RECNO database, you're probably out of luck. But, if you can use a BTREE, you have a couple of options.
First, and probably easiest is to iterate over only the portion of the database that makes sense. Assuming you're using the default key comparison function, you can use DB_SET_RANGE to position the starting cursor (iterator) at the start of your partial key string. In your example, this might be "B0150000000000". You then scan forwards with DB_NEXT, looking at each key in turn. When either you find the key you're looking for, or if the key you find doesn't start with "B015", you're done.
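Here is a rough sketch of that scan using the BerkeleyDB Perl module (not DB_File); the file name and the hard-coded prefix/suffix are made up for illustration:
use BerkeleyDB;

# Position a cursor at the first key >= "B015", scan forward while the key
# still starts with "B015", and delete the entry whose key ends in "5646".
my $db = BerkeleyDB::Btree->new(-Filename => 'keys.db', -Flags => DB_CREATE)
    or die "open failed: $BerkeleyDB::Error";
my ($key, $value) = ("B015", "");
my $cursor = $db->db_cursor();
my $status = $cursor->c_get($key, $value, DB_SET_RANGE);
while ($status == 0 && substr($key, 0, 4) eq "B015") {
    if (substr($key, -4) eq "5646") {
        $cursor->c_del();
        last;
    }
    $status = $cursor->c_get($key, $value, DB_NEXT);
}
$cursor->c_close();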
Another technique that could be applicable to your situation is to redefine the key comparison function. If, as you state, there is only one combination of $sub1 and $sub2, then perhaps you only need to compare those sections of the keys to guarantee uniqueness? Here's an example of a full string comparison (I'm assuming you're using perl, just from the syntax you supplied above) from https://www2.informatik.hu-berlin.de/Themen/manuals/perl/DB_File.html :
sub Compare
{
    my ($key1, $key2) = @_ ;
    "\L$key1" cmp "\L$key2" ;
}
$DB_BTREE->{'compare'} = \&Compare ;
So, if you can rig things such that you're only comparing the starting and ending four characters, you should be able to drop the database iterator directly onto the key you're interested in.
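For instance, a comparison sub along these lines (hypothetical; it orders keys only by their first four and last four characters) would let a lookup with any key of the form "B015...5646" land directly on your record:
sub ComparePartial
{
    my ($key1, $key2) = @_ ;
    (substr($key1, 0, 4) . substr($key1, -4)) cmp (substr($key2, 0, 4) . substr($key2, -4)) ;
}
$DB_BTREE->{'compare'} = \&ComparePartial ;
Be careful, though: with such a comparator, two different full keys that share the same first and last four characters are considered equal by the BTREE.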

What exactly are hashtables?

What are they and how do they work?
Where are they used?
When should I (not) use them?
I've heard the word over and over again, yet I don't know its exact meaning.
What I heard is that they implement associative arrays by sending the array key through a hash function that converts it into an int, and then using a regular array as the backing store. Am I right about that?
(Note: this is not my homework; I do go to school, but they only teach us the basics of informatics.)
Wikipedia seems to have a pretty nice answer to what they are.
You should use them when you want to look up values by some index.
As for when you shouldn't use them... when you don't want to look up values by some index (for example, if all you want to ever do is iterate over them.)
You've about got it. They're a very good way of mapping from arbitrary things (keys) to arbitrary things (values). The idea is that you apply a function (a hash function) that translates the key to an index into the array where you store the values; the hash function's speed is typically linear in the size of the key, which is great when key sizes are much smaller than the number of entries (i.e., the typical case).
The tricky bit is that hash functions are usually imperfect. (Perfect hash functions exist, but tend to be very specific to particular applications and particular datasets; they're hardly ever worthwhile.) There are two approaches to dealing with this, and each requires storing the key with the value: one (open addressing) uses a predetermined probe pattern to look onward from the hashed location in the array for a slot that is free; the other (chaining) stores a linked list hanging off each entry in the array, so you do a linear lookup over what is hopefully a short list. The production code whose source I've read has always used chaining, with dynamic rebuilding of the hash table when the load factor gets too high.
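A toy sketch of the chaining approach in C++ (illustrative only; the class name is made up, and a real implementation would also rehash when the load factor grows):
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Each bucket holds a list of (key, value) pairs that hashed to the same slot.
class ChainedMap {
 public:
  explicit ChainedMap(size_t buckets = 16) : buckets_(buckets) {}

  void put(const std::string& key, int value) {
    auto& bucket = buckets_[index(key)];
    for (auto& kv : bucket) {
      if (kv.first == key) { kv.second = value; return; }  // overwrite existing
    }
    bucket.emplace_back(key, value);  // append to the chain
  }

  int* get(const std::string& key) {
    for (auto& kv : buckets_[index(key)]) {
      if (kv.first == key) return &kv.second;
    }
    return nullptr;  // not found
  }

 private:
  size_t index(const std::string& key) const {
    return std::hash<std::string>{}(key) % buckets_.size();
  }
  std::vector<std::list<std::pair<std::string, int>>> buckets_;
};
In practice you would just use std::unordered_map (or your language's built-in dictionary), which handles resizing and collisions for you.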
Good hash functions are one-way functions that produce a well-distributed value from any given input. Therefore, you will get somewhat unique values for each input value. They are also repeatable, such that a given input will always generate the same output.
An example of a good hash function is SHA1 or SHA256.
Let's say that you have a database table of users. The columns are id, last_name, first_name, telephone_number, and address.
While any of these columns could have duplicates, let's assume that no rows are exactly the same.
In this case, id is simply a unique primary key of our making (a surrogate key). The id field doesn't actually contain any user data because we couldn't find a natural key that was unique for users, but we use the id field for building foreign key relationships with other tables.
We could look up the user record like this from our database:
SELECT * FROM users
WHERE last_name = 'Adams'
AND first_name = 'Marcus'
AND address = '1234 Main St'
AND telephone_number = '555-1212';
We have to search through 4 different columns, using 4 different indexes, to find my record.
However, you could create a new "hash" column, and store the hash value of all four columns combined.
String myHash = myHashFunction("Marcus" + "Adams" + "1234 Main St" + "555-1212");
You might get a hash value like AE32ABC31234CAD984EA8.
You store this hash value as a column in the database and index on that. You now only have to search one index.
SELECT * FROM users
WHERE hash_value = 'AE32ABC31234CAD984EA8';
Once we have the id for the requested user, we can use that value to look up related data in other tables.
The idea is that the hash function offloads work from the database server.
Collisions are not likely. If two users have the same hash, it's most likely that they have duplicate data.

Sqlite3: Disabling primary key index while inserting?

I have an Sqlite3 database with a table and a primary key consisting of two integers, and I'm trying to insert lots of data into it (i.e. around 1 GB or so).
The issue I'm having is that creating primary key also implicitly creates an index, which in my case bogs down inserts to a crawl after a few commits (and that would be because the database file is on NFS.. sigh).
So, I'd like to somehow temporarily disable that index. My best plan so far involved dropping the primary key's automatic index; however, it seems that SQLite doesn't like that and throws an error if I attempt it.
My second-best plan would involve the application making transparent copies of the database on the network drive, making the modifications, and then merging them back. Note that, as opposed to most SQLite/NFS questions, I don't need concurrent access.
What would be a correct way to do something like that?
UPDATE:
I forgot to specify the flags I'm already using:
PRAGMA synchronous = OFF
PRAGMA journal_mode = OFF
PRAGMA locking_mode = EXCLUSIVE
PRAGMA temp_store = MEMORY
UPDATE 2:
I'm in fact inserting items in batches; however, every batch is slower to commit than the previous one (I'm assuming this has to do with the size of the index). I tried batches of between 10k and 50k tuples, each tuple being two integers and a float.
You can't remove the implicit index, since it's the only way rows are addressed.
Merge your two integer keys into a single long key = (key1 << 32) + key2, and make that the INTEGER PRIMARY KEY in your schema (in that case you will have only one index).
Set the page size for the new DB to at least 4096.
Remove ANY additional index except the primary one.
Insert data in SORTED order so that the primary key is always growing.
Reuse prepared statements; don't recreate them from strings each time.
Set the page cache size to as much memory as you have left (remember that the cache size is in number of pages, not bytes).
Commit every 50000 items.
If you need additional indexes, create them only AFTER ALL the data is in the table.
If you're able to merge the keys (I think you're using 32-bit keys while SQLite uses 64-bit, so it's possible) and fill the data in sorted order, I bet you will fill your first GB with the same performance as the second, and both will be fast enough.
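A small sketch of what that schema and the batched, sorted inserts might look like (table and column names are made up):
-- the two 32-bit keys packed into the 64-bit INTEGER PRIMARY KEY
CREATE TABLE samples (
    id    INTEGER PRIMARY KEY,   -- (key1 << 32) + key2
    value REAL
);

BEGIN TRANSACTION;
INSERT INTO samples (id, value) VALUES ((1 << 32) + 7, 0.5);
INSERT INTO samples (id, value) VALUES ((1 << 32) + 8, 0.7);
-- ... keep inserting in ascending id order ...
COMMIT;   -- commit every ~50000 rows, then BEGIN the next batch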
Are you doing the INSERT of each new row as an individual transaction?
If you use BEGIN TRANSACTION and INSERT rows in batches, then I think the index only gets rebuilt at the end of each transaction.
See faster-bulk-inserts-in-sqlite3.

Resources