does lmdb support "recno" type, or workalike?

I'm playing with LMDB (coming from a BDB background) and wondering whether LMDB supports "recno"-style operation. "recno" (record number) is a logical indexing method in which records are addressed by position: if the 20th record is deleted, the next record becomes the new 20th record, and if a record is inserted after record 7, the records previously numbered 8 through n become 9 through n+1.
Ideas?

No.
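For readers who want the semantics anyway, here is a minimal sketch of what recno-style renumbering means, emulated over a plain in-memory list (illustrative only, not an LMDB API; a real key-value implementation would have to rewrite every following key on each insert or delete, which hints at the cost such an access method carries):

# Hypothetical sketch: recno semantics over an in-memory list.
class Recno:
    def __init__(self):
        self._records = []                 # slot i holds logical record i+1

    def get(self, recno):
        return self._records[recno - 1]

    def insert_after(self, recno, value):
        # Records previously numbered recno+1 .. n become recno+2 .. n+1.
        self._records.insert(recno, value)

    def delete(self, recno):
        # The following record takes over this record number.
        del self._records[recno - 1]

db = Recno()
for v in ("a", "b", "c"):
    db.insert_after(len(db._records), v)   # append as records 1, 2, 3
db.delete(2)                               # "b" is gone; "c" is now record 2
print(db.get(2))                           # -> 'c'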

Related

sqlite using blob for epoch datetime

I'm trying to decide the best way to store a datetime in SQLite. The date will be in epoch (Unix time).
I've been reading on Wikipedia about the 2038 problem (it's very much like the year 2000 problem). Taking this into account with what I've been reading on Tutorialspoint:
From https://www.tutorialspoint.com/sqlite/sqlite_data_types.htm
Tutorialspoint suggests using the below data types for datetime.
SQLite does not have a separate storage class for storing dates and/or times, but SQLite is capable of storing dates and times as TEXT, REAL or INTEGER values.
But when I looked at the type descriptions, BLOB has no size limit and stores the data exactly as it is inserted into the database:
BLOB The value is a blob of data, stored exactly as it was input.
INTEGER The value is a signed integer, stored in 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.
I saw on Tutorialspoint that they suggest the SQLite type INTEGER for datetime. But given the 2038 problem, I'm thinking BLOB is the better choice if I'm focusing on future-proofing, because BLOB is not tied to a specific number of bytes the way INTEGER is.
I'm new to database design, so I'm wondering what's best to do?
INTEGER, as it says, can be up to 8 bytes, i.e. a 64-bit signed integer, so SQLite itself can store values far beyond the 32-bit 2038 limit. Your issue will be in retrieving the time from something that is not itself subject to the issue, unless you are trying to protect against the year 292,277,026,596 problem (when 64-bit signed seconds overflow).
There is no need to use a BLOB and take on the added complexity and extra processing of converting between a BLOB and a time.
It may even be that you can use SQLite itself to generate suitable values, if you want to store the current time or a time based upon the current time, aka now.
Perhaps consider the following :-
DROP TABLE IF EXISTS timevalues;
/* Create the table with 1 column with a weird type and a default value as now (seconds since Jan 1st 1970)*/
CREATE TABLE IF NOT EXISTS timevalues (dt typedoesnotmatterthatmuch DEFAULT (strftime('%s','now')));
/* INSERT 2 rows with dates of 1000 years from now */
INSERT INTO timevalues VALUES
(strftime('%s','now','+1000 years')),
((julianday('now','+1000 years') - 2440587.5)*86400.0);
/* INSERT a row using the DEFAULT */
INSERT INTO timevalues (rowid) /* specify the rowid column so there is no need to supply value for the dt column */
VALUES ((SELECT count() FROM timevalues)+1 /* get the highest rowid + 1 */);
/* Retrieve the data rowid, the value as stored in the dt column and the dt column converted to a user friendly format */
SELECT rowid,*, datetime(dt,'unixepoch') AS userfriendly FROM timevalues;
/* Cleanup the Environment */
DROP TABLE IF EXISTS timevalues;
Which results in three rows: the rowid, the raw value stored in the dt column (two of them roughly 1000 years in the future), and the user-friendly datetime produced by the final SELECT.
You would probably want to have a read of Date And Time Functions, e.g. for strftime, julianday and now.
rowid is a special, normally hidden column that exists for every table unless it is a WITHOUT ROWID table. It wouldn't typically be used directly; when it is, it is usually aliased by declaring a column as INTEGER PRIMARY KEY.
See SQLite Autoincrement to find out about rowid and aliases thereof, and why not to use AUTOINCREMENT.
For why a column type of typedoesnotmatterthatmuch is allowed at all, see Datatypes In SQLite Version 3.
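The same point carries over to application code. A minimal Python sketch (the table and column names here are my own illustrative choices) showing a 64-bit epoch value well past 2038 round-tripping through an INTEGER column:

import sqlite3, time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (dt INTEGER)")
con.execute("INSERT INTO events VALUES (?)", (int(time.time()),))
con.execute("INSERT INTO events VALUES (?)", (4102444800,))  # 2100-01-01T00:00:00Z
for dt, friendly in con.execute(
        "SELECT dt, datetime(dt,'unixepoch') FROM events"):
    print(dt, friendly)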

How to encode SQLite FTS3 strings

I'm working with a SQLite FTS3 table as explained here: https://www.sqlite.org/fts3.html
I'm interested in the field end_block, described as:
This field may contain either an integer or a text field consisting of
two integers separated by a space character (unicode codepoint 0x20).
The first, or only, integer is the blockid that corresponds to the
interior node with the largest blockid that belongs to this segment
b-tree. Or zero if the entire segment b-tree fits on the root node. If
it exists, this node is always an interior node.
The second integer, if it is present, is the aggregate size of all
data stored on leaf pages in bytes. If the value is negative, then the
segment is the output of an unfinished incremental-merge operation,
and the absolute value is the current size in bytes.
I'm trying to make a consistency checker to make sure some FTS3 tables haven't been modified.
I need a way to encode strings the way FTS3 does, so I can get the block_number, but I haven't been able to find anything on the internet. Some examples of the encoding:
Good morning! How is it going? - 0 96
Everything is okey - 0 71
Okay I will get back to you once everything is in place - 0 167
EDIT: To clarify the question: what I really need is some method that takes a string such as "Everything is okey" and yields the second integer of the end_block field (71).
Any idea?
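Not an answer on the encoding itself, but one way to observe the value for a given string, without re-implementing FTS3's segment serialization, is to insert the string into a scratch FTS3 table and read end_block out of the %_segdir shadow table. A minimal sketch, assuming a sqlite3 build with FTS3 compiled in (the table name scratch is illustrative):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE scratch USING fts3(content)")
con.execute("INSERT INTO scratch (content) VALUES (?)",
            ("Everything is okey",))
# Each segment b-tree is described by one row of the %_segdir shadow
# table; end_block is the field quoted above (e.g. '0 71').
for row in con.execute("SELECT level, idx, end_block FROM scratch_segdir"):
    print(row)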

Is it possible to find out a value that is the most different with pure Sqlite3?

Let's say I have a list of URLs and I want to find the domain that appears the fewest times. Here is an example of the database:
3598 ('www.emp.de/blog/tag/fear-factory/')
3599 ('www.emp.de/blog/tag/white-russian/')
3600 ('www.emp.de/blog/musik/die-emp-plattenkiste-zum-07-august-2015/')
3601 ('www.emp.de/Warenkorb/car_/')
3602 ('www.emp.de/ter_dataprotection/')
3603 ('hilfe.monster.de/my20/faq.aspx#help_1_211589')
3604 ('jobs.monster.de/l-nordrhein-westfalen.aspx')
3605 ('karriere-beratung.monster.de')
3606 ('karriere-beratung.monster.de')
In this case it should return jobs.monster.de or hilfe.monster.de. I only want one return value. Is that possible with pure Sqlite3?
It should involve some kind of counting of the host part before the ".de".
At this moment I do it this way:
con.execute("select url, date from urls_to_visit ORDER BY RANDOM() LIMIT 1")
Here's a query which should handle this correctly:
SELECT substr(url, 1, instr(url, '.de')-1)
FROM urls_to_visit
WHERE url LIKE '%.de%'
-- insurance, can leave out if you're sure the whole table matches
GROUP BY substr(url, 1, instr(url, '.de')-1)
ORDER BY count(*) ASC, RANDOM()
LIMIT 1;
Group on the thing we want to sort by, then order by count(*). This expression extracts the part of the URL before the '.de':
substr(url, 1, instr(url, '.de')-1)
The RANDOM() ensures that ties are broken randomly instead of by following the table's natural ordering.* It only comes into play if there is a tie, as described in the SQLite documentation.
* Technically, the rows would not appear in natural order, but in arbitrary order. That means whatever order is most convenient for the query planner. Database systems often use merge sort or a variant, which is a stable sort, so ties will be consistently broken in the order the rows were fed into the sorting algorithm. Unless the query can benefit significantly from index lookups, which this one almost certainly can't, the most likely query plan is a full table scan, so the sort will typically end up following natural order. But you can't rely on any of this, since the standard does not formally require it.
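Plugged into the asker's existing connection (assuming con is the sqlite3 connection from the question), usage might look like this; fetchone() returns None when nothing matches:

row = con.execute("""
    SELECT substr(url, 1, instr(url, '.de')-1)
    FROM urls_to_visit
    WHERE url LIKE '%.de%'
    GROUP BY substr(url, 1, instr(url, '.de')-1)
    ORDER BY count(*) ASC, RANDOM()
    LIMIT 1
""").fetchone()
print(row[0] if row else None)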

SQLite integer size: individually sized or for the entire group

Taken straight off of SQLite's site "The value is a signed integer, stored in 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value."
Does this mean that if you have one value that requires 8 bytes, ALL values in that column will be stored as 8 bytes? Or, if the rest are all 1 byte and one value is 8 bytes, will only that value use 8 bytes while the rest remain at 1?
I'm more used to SQL in which you specify the integer size up front.
I know the question seems trivial, but the answer will determine how I handle a piece of the database.
SQLite handles data types differently from most engines: each value carries its own type, and each integer is sized individually, so one 8-byte value does not force the other values in the column to 8 bytes.
Here is the documentation from sqlite:
Most SQL database engines use static typing. A datatype is associated with each column
in a table and only values of that particular datatype are allowed to be stored in that
column. SQLite relaxes this restriction by using manifest typing. In manifest typing, the
datatype is a property of the value itself, not of the column in which the value is
stored. SQLite thus allows the user to store any value of any datatype into any column
regardless of the declared type of that column. (There are some exceptions to this rule:
An INTEGER PRIMARY KEY column may only store integers. And SQLite attempts to coerce
values into the declared datatype of the column when it can.)
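A small sketch of manifest typing in action (illustrative; typeof() reports the storage class of each stored value, and each integer below is stored at its own size):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (v INTEGER)")
# A tiny integer, a huge integer, and a value that cannot be coerced.
con.executemany("INSERT INTO t VALUES (?)",
                [(1,), (2**62,), ("not a number",)])
for v, ty in con.execute("SELECT v, typeof(v) FROM t"):
    print(v, ty)   # -> integer, integer, text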

Hbase schema design -- to make sorting easy?

I have 1M words in my dictionary. Whenever a user issues a query on my website, I check whether the query contains words from my dictionary and increment the counter corresponding to each of them individually. For example, if a user types in "Obama is a president" and "Obama" and "president" are in my dictionary, then I should increment the counters for "Obama" and "president" by 1.
And from time to time, I want to see the top 100 words (most queried words). If I use Hbase to store the counters, what schema should I use? I have not come up with an efficient one yet.
If I use the dictionary word as the row key and "counter" as the column key, then updating (incrementing) the counter is very efficient, but it's very hard to sort and return the top 100.
Anyone can give a good advice? Thanks.
You can use the natural schema (row key as word and column as count) and use IHBase to get a secondary index on the count column. See https://issues.apache.org/jira/browse/HBASE-2037 for the initial implementation; the current code lives at http://github.com/ykulbak/ihbase.
From Adobe's presentation at HBaseCon 2012 (slide 28 in particular), I suggest using two tables and this sort of data structure for the row keys:
name table:
President => 1000
Test => 900
count table:
429461296:President => dummyvalue
429461396:Test => dummyvalue
The second table's row keys are derived by using Long.MAX_VALUE - count at that point in time.
As you get new words, just add "count:word" as a row key to the count table. That way, the top words are always returned first when you scan the table; see the sketch below.
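A sketch of that pattern from Python using the happybase Thrift client (the connection details, table names and column families are assumptions, and the cross-table update is not atomic; the original suggestion is HBase-native Java):

import happybase

MAX = 2**63 - 1  # Long.MAX_VALUE, so larger counts sort first

conn = happybase.Connection("localhost")   # assumed Thrift gateway
words = conn.table("name")                 # word -> running count
index = conn.table("count")                # (MAX-count):word -> dummy

def record(word):
    w = word.encode()
    new = words.counter_inc(w, b"f:count")   # atomic increment
    if new > 1:
        # Remove the word's previous slot in the index table.
        index.delete(b"%020d:%s" % (MAX - (new - 1), w))
    index.put(b"%020d:%s" % (MAX - new, w), {b"f:d": b""})

def top(n=100):
    # Row keys scan in ascending order, i.e. by descending count.
    return [key.split(b":", 1)[1] for key, _ in index.scan(limit=n)]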
Sorting 1M longs can be done in memory, so this doesn't really need schema support at all.
Store the words x, y, z issued at time t as key: t, cols: word:x=1 word:y=1 word:z=1 in a table. Then use a MapReduce job to sum up the counts per word and take the top 100.
This also enables further analysis.
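The aggregation step boils down to summing per-word counts and keeping the 100 largest; a single-process Python sketch of just that step (the scanned rows are simulated):

from collections import Counter

# Simulated scan of the raw table: each row is (timestamp, {word: 1, ...}).
rows = [
    (1000, {"obama": 1, "president": 1}),
    (1001, {"president": 1}),
    (1002, {"obama": 1, "news": 1}),
]

totals = Counter()
for _t, words in rows:
    totals.update(words)

print(totals.most_common(100))   # top 100 (word, count) pairs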
