Values that can be stored in LMDB - openldap

LMDB is a key-value store. What types of keys and values can be stored in it? The examples show either int or char arrays. I would also like to know whether it is possible to store related data in LMDB, the way we store all data related to a student in a table in an RDBMS.

It sounds like you're not a C programmer, but in the context of the C language, the data for keys and values in LMDB are both (void *) - that is, generic pointers to anything. That means any data type that can be expressed in the programming language can be used. In other languages, through whatever bindings are provided, your options may be more limited.
LMDB doesn't care what you store; all it sees are blobs. Obviously it is not an RDBMS. If you want to store structured data, you need to manage that structure yourself. You could take a complex data structure, serialize it into a blob, and store it under a single key. This is the approach used in OpenLDAP's slapd. Or you could set up individual columns of your data as separate named DBs in LMDB and store individual values in their respective DBs. (For example, indices in OpenLDAP and SQLite/SQLightning are handled this way.) So yes, while LMDB does not provide any functions for managing relations itself, you can certainly use it as the backing store of an RDBMS if you want to. (Again, see SQLightning for an example. Backends for MySQL/MariaDB or Postgres are doable as well, but involve a lot more glue code between their frontends and the LMDB API.)
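As a rough illustration of the "serialize it into a blob" approach, here is a minimal Python sketch. The record layout (id, birth year, name) and the helper names pack_student and unpack_student are hypothetical, and a plain dict stands in for the database so the example runs without any LMDB binding.

```python
import struct

# Hypothetical fixed header: unsigned 32-bit id, unsigned 16-bit birth year,
# unsigned 16-bit name length. "<" means little-endian with no padding.
_HEADER = struct.Struct("<IHH")

def pack_student(student_id, birth_year, name):
    """Serialize one record into a single bytes blob."""
    raw_name = name.encode("utf-8")
    return _HEADER.pack(student_id, birth_year, len(raw_name)) + raw_name

def unpack_student(blob):
    """Inverse of pack_student: returns (student_id, birth_year, name)."""
    student_id, birth_year, name_len = _HEADER.unpack_from(blob, 0)
    start = _HEADER.size
    name = blob[start:start + name_len].decode("utf-8")
    return student_id, birth_year, name

# A plain dict stands in for the key-value store; keys and values are bytes.
store = {}
key = struct.pack(">I", 3)  # big-endian, so byte order matches numeric order
store[key] = pack_student(3, 1983, "jenny")
```

Reading the record back is then unpack_student(store[key]). With a real binding, the same bytes would be passed to its put and get calls unchanged.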

Both keys and data can contain any combination of zero or more bytes.
So your data blobs could be comma-separated text strings, JSON, or some form of packed binary data (like FlatBuffers).
It also means you have to do the work yourself if you want multiple keys that locate the same data, or if you want to store multiple types of data, similar to multiple tables in an RDBMS.
One simple way to store data could look like this:
born;1980-01-01;2 -> <null>
born;1983-05-17;1 -> <null>
born;1983-05-17;3 -> <null>
born;1992-09-11;4 -> <null>
db;nextuser -> 5
names;benny;2 -> <null>
names;jenny;3 -> <null>
names;john;1 -> <null>
names;sue;4 -> <null>
occupation;student;3 -> <null>
occupation;student;4 -> <null>
occupation;teacher;1 -> <null>
occupation;teacher;2 -> <null>
students;1;3 -> <null>
students;2;3 -> <null>
students;2;4 -> <null>
teachers;3;1 -> <null>
teachers;3;2 -> <null>
teachers;4;2 -> <null>
users;0001 -> {"name":"john","born":"1983-05-17","occupation":"teacher"}
users;0002 -> {"name":"benny","born":"1980-01-01","occupation":"teacher"}
users;0003 -> {"name":"jenny","born":"1983-05-17","occupation":"student"}
users;0004 -> {"name":"sue","born":"1992-09-11","occupation":"student"}
The data above lets you iterate over users in insertion order (by id) using the users;<id> keys, or by date of birth using the born;<date>;<id> keys.
Iterating through students;<teacher-id>;<student-id> with teacher-id = 2 gives the ids of all students mapped to user "benny".
Iterating through teachers;<student-id>;<teacher-id> with student-id = 4 gives the ids of all teachers for user "sue".
Instead of using key prefixes for different types of data, you can also create multiple named key/value databases in LMDB.
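The prefix iteration described above can be sketched in Python as follows. OrderedKV is a hypothetical in-memory stand-in for a store that keeps keys in byte order, and students_of is an illustrative helper; in LMDB itself the equivalent would be positioning a cursor with MDB_SET_RANGE and stepping forward while the key still starts with the prefix.

```python
import bisect

class OrderedKV:
    """Tiny in-memory stand-in for a store that keeps keys in byte order."""

    def __init__(self):
        self._keys = []   # sorted list of bytes keys
        self._vals = {}   # key -> value

    def put(self, key, value=b""):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def scan_prefix(self, prefix):
        """Yield (key, value) for every key starting with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1

def students_of(db, teacher_id):
    """Collect student ids from students;<teacher-id>;<student-id> keys."""
    prefix = b"students;%d;" % teacher_id
    return [int(k[len(prefix):]) for k, _ in db.scan_prefix(prefix)]

db = OrderedKV()
for k in (b"students;1;3", b"students;2;3", b"students;2;4", b"teachers;3;1"):
    db.put(k)
```

Note that the trailing separator in the prefix matters: without it, a scan for teacher 2 would also match keys for teacher 20. Ids embedded in keys should also be fixed-width (as in users;0001) if numeric order must match byte order.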

Related

Brute-Force encryption algo/key based on mapping encrypted-value <-> unencrypted-value (hashcat)

I have a list of encrypted values, and I know the unencrypted value for each entry.
Is it possible to brute-force the encryption mechanism so that I can decrypt new, unknown encrypted values?
This is my list of "unencrypted -> encrypted" pairs:
XXXXXXXXXXXXXXXXX -> AAPBbXxBtdNhUH2nc3w3DWajRHeG5OmunJQ97n9/Ooccih07+8EsMNNW2zqbzXvQ1bl+yBwUcj1ZzcxNIem0zPr1TeiphXPh/UF9r7XzRfI4w7bMuyM=
YYYYYYYYYYYYYYYYY -> AALKq4wVvSIbsn3h5azSGT7Z5HKGH1YNGKC1+MVPLWKaEMHR+VbdcVcwnZYB32OHjYf/T7tpo1FjFV8qEPltpzdWxe4OFwLiB9nJe6HIan0zn4Jsf2Q=
ZZZZZZZZZZZZZZZZZ -> AANdNV4zeqvH7jVi0HjnMBkSvvAXcQavyNDOJVYUGKT/LKC97iPDB1t3xTnz/9T5kkeHxtH2lXjRnPChY3AwfVuPImQ4CF8/7sHvpQQCM3fSHAy+lV0=
...
The mechanism is the same for each entry (no salts).
Every single value can be encrypted or decrypted without additional values.
Is this possible using hashcat?
BR
John

Optimal data struct for k-k-v mapping

I have a large mapping table with 1.4 billion records. The data structure currently looks like {<Key1, Key2>: List<Value>}.
Key1 and Value come from the same set, say A, with ~0.1 billion unique elements.
Key2 comes from another set, say B, with only 32 unique elements.
List<Value> is a variable-length list with at most 200 elements.
Can someone recommend a better data structure or retrieval algorithm for fast online retrieval and reasonable space consumption?
You could use an extendible hash table for this:
https://en.wikipedia.org/wiki/Extendible_hashing
If you don't want to implement it yourself, you could use something like Redis or Memcached as an external hash table implementation (Redis can persist to disk; Memcached is in-memory only).
To create the hash key, just combine Key1 and Key2 (concatenate? XOR?) and use the result as the key.
If the data fits in RAM, use a hash table with a dynamic array for each list. That should work well.
Unless you care about the order of the keys, hash tables should do the job.
If you want to get all Key1s associated with a Key2, you can do that as well by maintaining a separate hash table for it. Or, if you're implementing the table yourself, you could link the entries so that all keys containing a given Key2 form a linked list.
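One way to combine the two keys, sketched in Python under the assumption that Key1 and Key2 have already been mapped to small integer ids (that mapping, and the names combine and split, are not part of the question): since Key2 has only 32 distinct values, it fits in 5 bits, so shifting Key1 left by 5 and OR-ing in Key2 gives a single integer key. Unlike XOR, this cannot map two different pairs to the same key.

```python
from collections import defaultdict

KEY2_BITS = 5  # 32 distinct Key2 values fit in 5 bits

def combine(key1_id, key2_id):
    """Pack (key1_id, key2_id) into one integer without collisions."""
    if not 0 <= key2_id < (1 << KEY2_BITS):
        raise ValueError("key2_id out of range")
    return (key1_id << KEY2_BITS) | key2_id

def split(combined):
    """Inverse of combine: returns (key1_id, key2_id)."""
    return combined >> KEY2_BITS, combined & ((1 << KEY2_BITS) - 1)

# Hash table from combined key to a dynamic array of values.
table = defaultdict(list)
table[combine(1234, 7)].extend([42, 99])
table[combine(1234, 8)].append(5)
```

With ~0.1 billion Key1 ids (under 27 bits), the combined key stays within 32 bits, which may help if the table is later moved to a more compact representation.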

Query latest record for each ID in DynamoDB

We have a table like this:
user_id | video_id | timestamp
1       | 2        | 3
1       | 3        | 4
1       | 3        | 5
2       | 1        | 1
And we need to query latest timestamp for each video viewed by a specific user.
Currently it's done like this:
response = self.history_table.query(
    KeyConditionExpression=Key('user_id').eq(int(user_id)),
    IndexName='WatchHistoryByTimestamp',
    ScanIndexForward=False,
)
It queries all timestamps for all videos of the specified user, which puts a heavy load on the database, because there can be thousands of timestamps for thousands of videos.
I tried to find a solution on the Internet, but as far as I can see, all SQL solutions use GROUP BY, and DynamoDB has no such feature.
There are 2 ways I know of doing this:
Method 1: Global Secondary Index (GSI)
GROUP BY is loosely analogous to a partition in DynamoDB (but not really). I assume your partition key is currently user_id, but for this query you want video_id as the partition key and timestamp as the sort key. You can get that by creating a new GSI with video_id as its partition key and timestamp as its sort key. You can then query the latest timestamp for a given video. Reading in descending order and stopping after the first item (ScanIndexForward=False with Limit=1 in boto3, or --max-items 1 --page-size 1 on the AWS CLI) keeps the query fast and cheap, on the order of 1 RCU. But you will need to supply the video_id.
Method 2: Sparse Index
The problem with method 1 is that you need to supply an ID, whereas you might just want a list of videos with their latest timestamps. There are a couple of ways to do this; one I like is a sparse index. If you add an attribute called latest and set it to true only on the item with the latest timestamp, you can create a GSI keyed on that attribute. Note, however, that you have to set and unset this value yourself, either in a Lambda function fed by DynamoDB Streams or in your app.
This may seem odd, but it is how NoSQL works compared with SQL. I am wrestling with it myself on a current project and have had to use some of these techniques; it never quite feels right, but hopefully we'll get used to it.
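The set-and-unset bookkeeping behind the sparse index can be sketched in plain Python. This is only a simulation over a list of dicts, not DynamoDB API calls; the function names record_view and latest_views are made up for illustration, and the sample rows are the ones from the question.

```python
def record_view(items, user_id, video_id, timestamp):
    """Append a view and move the 'latest' marker for (user_id, video_id) to it."""
    is_newest = True
    for item in items:
        same_pair = (item["user_id"], item["video_id"]) == (user_id, video_id)
        if same_pair and "latest" in item:
            if item["timestamp"] > timestamp:
                is_newest = False    # out-of-order write: keep the old marker
            else:
                del item["latest"]   # unset: item drops out of the sparse index
    new_item = {"user_id": user_id, "video_id": video_id, "timestamp": timestamp}
    if is_newest:
        new_item["latest"] = "true"
    items.append(new_item)

def latest_views(items, user_id):
    """What a query on the sparse index would return: only marked items."""
    return {i["video_id"]: i["timestamp"]
            for i in items if i["user_id"] == user_id and "latest" in i}

items = []
for row in [(1, 2, 3), (1, 3, 4), (1, 3, 5), (2, 1, 1)]:
    record_view(items, *row)
```

Because only one item per (user, video) pair carries the attribute, the index holds one row per watched video instead of one per view, which is what makes the per-user query cheap.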

How can I query system information and metadata?

In a data warehouse built on Teradata, how can I find out how many databases exist in the whole warehouse, how many data marts exist, which databases have the most tables, and which databases are most frequently used? This is certainly a programming question, because I am asking how to query the data warehouse to get the desired information.
I would like to get a look and feel for the data warehouse. Similar information or suggestions would certainly help: what should I keep an eye on? What is the "heart" of the data warehouse? What is the first thing you need to look at when you start working with a completely new data warehouse?
Go to the Teradata Documentation web site and find the "Data Dictionary" book for the version of Teradata you are using. There are numerous dictionary views available.
The one in particular that includes all databases in the environment is called "dbc.databases", so run this:
select *
from dbc.databases
where DBKind = 'D'
The other value for DBKind is 'U', which would include users on the system.
Information about tables is in dbc.tables and other views. I'm not aware of any Teradata concept of "data mart" so I can't help you there.
Answering a question like "most frequently used" would require using one of the query log tables (DBQL). However, you should ask your system DBA if these views are available to you.
-- how many databases exist
SEL COUNT(*)
FROM dbc.databases
WHERE dbkind = 'D'
-- which databases have the most tables?
SEL databasename, COUNT(*)
FROM dbc.tables
WHERE tablekind = 'T' GROUP BY 1 ORDER BY 2 DESC
TABLEKIND definitions
A: AGGREGATE UDF
B: COMBINED AGGREGATE AND ORDERED ANALYTICAL FUNCTION
E: EXTERNAL STORED PROCEDURE
F: SCALAR UDF
G: TRIGGER
H: INSTANCE OR CONSTRUCTOR METHOD
I: JOIN INDEX
J: JOURNAL
M: MACRO
N: HASH INDEX
P: STORED PROCEDURE
Q: QUEUE TABLE
R: TABLE FUNCTION
S: ORDERED ANALYTICAL FUNCTION
T: TABLE
U: USER-DEFINED DATA TYPE
V: VIEW
X: AUTHORIZATION
-- which databases are most frequently used?
SEL DatabaseName, AccessCount, LastAccessTimeStamp
FROM dbc.databases ORDER BY AccessCount DESC
Also be sure to check out the dbc.columns view for information on which columns are in each table, their data types, etc.

Oracle: Export related data

I'm looking for a way to export related data spread over several tables, and to import that data in another schema. I'm working with an Oracle 11g Database.
To simplify my case: I have tables A, B and C, where B has a foreign key to A, and C has a foreign key to B. Given 1 entry in A, I would like to extract all entries relating to this entry from A, B and C and insert them into another schema. Please keep in mind that in my real-world scenario it's not A, B and C, but 102 separate tables (don't ask, not my design ;-)).
What I am looking for is a tool that will use the knowledge of the relations between the tables to do the export, without the need for me to specify which tables are connected through which fields.
Is there a way to do that and stay sane?
Data Pump will let you supply a predicate per table for extracting the data, so it's a "simple" matter of relating each table to the one that specifies the data for which related data is to be exported. Typically the predicate would be something like "customer_id in (select customer_id from customers)".
