This is a very general question:
Can you think of a reason why the following would break on very large tables (> 1 billion rows)?
sqlite3 sample_DB.db "CREATE INDEX IF NOT EXISTS sample_index ON sample_table(sample_row)"
I have tried this a couple of times; it does not print any error message, but the processing stops at some point and no index shows up in .schema.
The hard disk is not filling up. The process uses essentially no memory, and there would be plenty available if it did.
The database is more than 800 GB in size, but I thought the file-size limit for ext4 was 2 TB.
In the current state of the database:
PRAGMA page_size returns 4096
PRAGMA page_count returns 185974887
PRAGMA max_page_count returns 1073741823
PRAGMA freelist_count returns 0
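For reference, a minimal sketch of reading these values from the shell, mirroring the CREATE INDEX invocation above (page_size multiplied by max_page_count is SQLite's hard ceiling on the database file size):
sqlite3 sample_DB.db "PRAGMA page_size; PRAGMA page_count; PRAGMA max_page_count; PRAGMA freelist_count;"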
Version: 10.4.8-MariaDB
engine: ROCKSDB
I have a table labor with 40 million rows and a table map with 200,000 rows, and I want to update some columns of labor from map. Since I ran into performance problems as the table grew, I decided to migrate from the InnoDB to the RocksDB engine:
ALTER TABLE some.labor ENGINE=RocksDB;
When I wanted to partition:
ALTER TABLE some.labor PARTITION BY KEY() PARTITIONS 10;
I got this error:
SQL error (1296): Got error 10 'Operation aborted: Failed to acquire lock due to rocksdb_max_row_locks limit' from ROCKSDB
I found this solution on SO:
SET session rocksdb_bulk_load=1;
After I changed this setting, the table could be partitioned.
After that I wanted to do an update:
UPDATE some.labor r
INNER JOIN some.map m
ON r.analysis_1 <=> m.analysis_1
AND r.analysis_2 <=> m.analysis_2
AND r.unit <=> m.unit
AND r.praxis_id <=> m.praxis_id
SET r.analysis = m.analysis_new
, r.unit = m.unit_new
;
<=> is needed since some fields contain NULL. All four join columns are indexed.
I got the same error as before:
SQL error (1296): Got error 10 'Operation aborted: Failed to acquire lock due to rocksdb_max_row_locks limit' from ROCKSDB
although the setting was still in place:
SET session rocksdb_bulk_load=1;
Any idea how I can cope with this?
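(For reference, the limit named in the error, rocksdb_max_row_locks, is itself a server variable; a hedged sketch of raising it for the current session before retrying the UPDATE, with a purely illustrative value and no guarantee that it is sufficient:)
SET SESSION rocksdb_max_row_locks = 100000000;  -- illustrative value; raises the per-transaction row-lock budget
-- then re-run the UPDATE ... INNER JOIN statement above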
I am not sure whether this is possible with a SQL statement.
I am looking for a SQL query that returns the number of CPUs available on the hardware, to be used in a PARALLEL clause. We do not have access to our Linux environment, so we are looking for any way to obtain this value. Is it possible using SQL? Kindly suggest.
My index creation script is actually taking longer than expected; it was implemented with the "NOLOGGING PARALLEL COMPRESS" clause.
Kindly suggest whether leaving out the number "N" in the PARALLEL and COMPRESS clauses is OK.
How does Oracle manage the degree of parallelism if we omit the number of CPUs in the PARALLEL clause?
In SQL*Plus you can use the command below to see the number of CPUs.
show parameter CPU_COUNT;
SQL> show parameter CPU_COUNT;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cpu_count integer 2
Alternatively, you can query v$parameter to get the same value:
SQL> select value from v$parameter where name like 'cpu_count';
VALUE
--------------------------------------------------------------------------------
2
Creating an index with NOLOGGING PARALLEL COMPRESS is optional, but these clauses add value when you use them. Compressed indexes save space and reduce CPU time because they occupy fewer blocks. If you have to scan 100 blocks, you do 100 latches and 100 consistent gets, which costs a certain amount of CPU. If the compressed index needs only, say, 60 blocks, you do roughly 60% of that work. You also store more index entries per leaf block.
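For illustration, a hedged sketch of an index build using these clauses (index, table, and column names are made up; the explicit degree and prefix length are only examples):
CREATE INDEX sample_idx ON sample_table (col1, col2)
  NOLOGGING       -- skip most redo generation during the build
  PARALLEL 4      -- explicit DOP; omit the number to let Oracle choose the default DOP
  COMPRESS 1;     -- prefix-compress on the first key column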
For how Oracle handles parallelism, see:
https://docs.oracle.com/cd/E11882_01/server.112/e25523/parallel002.htm
If the PARALLEL clause is specified but no degree of parallelism is listed, the object gets the default DOP. Default parallelism uses a formula to determine the DOP based on the system configuration, as in the following:
For a single instance, DOP = PARALLEL_THREADS_PER_CPU x CPU_COUNT
For an Oracle RAC configuration, DOP = PARALLEL_THREADS_PER_CPU x CPU_COUNT x INSTANCE_COUNT
You can query v$parameter to get the above two parameters, cpu_count and parallel_threads_per_cpu:
SQL> SELECT name, value FROM v$parameter WHERE UPPER(name) IN ('CPU_COUNT','PARALLEL_THREADS_PER_CPU');
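For convenience, a small sketch that multiplies the two values to approximate the default single-instance DOP (on RAC you would additionally multiply by the number of instances):
SELECT TO_NUMBER(c.value) * TO_NUMBER(p.value) AS default_dop
  FROM v$parameter c, v$parameter p
 WHERE c.name = 'cpu_count'
   AND p.name = 'parallel_threads_per_cpu';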
I am using the pi to record video surveillance data in a single, but highly populated table per camera. The table consists of 3 columns - Timestamp, offset and frame length. The video data is stored in a separate file on the filesystem. My code is written in C.
Timestamp is the date/time for a video frame in the stream, offset is the fseek/ftell offsets into the streaming data file and frame length is the length of the frame. Pretty self explanatory. The primary and only index is on the timestamp column.
There is one database writer forked process per camera and there could be multiple forked read-only processes querying the database at any time.
These processes are created by socket listeners in the classic client/server architecture which accept video streams from other processes that manage the surveillance cameras and clients that query it.
When a read-only client connects, it selects the first row in the database for the selected camera. For some reason, this first select takes > 60 seconds, while subsequent executions of the same query are very snappy (well under 1 second). I've debugged the code to confirm this is where the time goes.
I have these pragmas configured for both the reader and writer forked processes, and have tried larger and smaller values with minimal, if any, impact:
pragma busy_timeout=7000
pragma cache_size=-4096
pragma mmap_size=4194304
I assume the cause is the SQLite3 caches being populated when a read-only client connects, but I'm not sure what else to try.
I've implemented my own write caching/buffering strategy to help prevent locks, which helped significantly, but it did not solve the delay at startup problem.
I've also split the table by weekday in an attempt to help control the table population size. It seems once the population nears 100,000 rows, the problem starts occurring. The population for a table can be around 2.5 million rows per day.
Here is the query:
sprintf(sql, "select * from %s_%s.VIDEO_FRAME where TIME_STAMP = "
"(select min(TIME_STAMP) from %s_%s.VIDEO_FRAME)",
cam_name, day_of_week, cam_name, day_of_week);
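(For reference, since TIME_STAMP is the primary key, the subquery form should be equivalent to ordering on the key and taking the first row; a sketch of that variant, written against the bare table name used in the EXPLAIN output below:)
select * from VIDEO_FRAME order by TIME_STAMP limit 1;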
(edit)
$ uname -a
Linux raspberrypi3 4.1.19-v7+ #858 SMP Tue Mar 15 15:56:00 GMT 2016 armv7l GNU/Linux
$ sqlite3
sqlite> .open Video_Camera_02__Belkin_NetCam__WED.db
sqlite> .tables
VIDEO_FRAME
sqlite> .schema VIDEO_FRAME
CREATE TABLE VIDEO_FRAME(TIME_STAMP UNSIGNED BIG INT NOT NULL,FRAME_OFFSET BIGINT, FRAME_LENGTH INTEGER,PRIMARY KEY(TIME_STAMP));
sqlite> explain query plan
...> select * from VIDEO_FRAME where TIME_STAMP = (select min(TIME_STAMP) from VIDEO_FRAME);
0|0|0|SEARCH TABLE VIDEO_FRAME USING INDEX sqlite_autoindex_VIDEO_FRAME_1 (TIME_STAMP=?)
0|0|0|EXECUTE SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE VIDEO_FRAME USING COVERING INDEX sqlite_autoindex_VIDEO_FRAME_1
After some further troubleshooting, the culprit seems to be the forked DB writer process. I tried starting the read-only clients with no streaming data being written, and the select returned immediately. I haven't found the root cause yet, but at least I have isolated where it is coming from.
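(For reference, a sketch of SQLite's write-ahead-log mode, which is documented to let readers proceed while a single writer is active; whether it helps in this particular setup is untested:)
PRAGMA journal_mode=WAL;        -- readers no longer queue behind the writer's rollback-journal locks
PRAGMA wal_autocheckpoint=1000; -- default checkpoint interval, in pages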
Thanks!
I run into "stack depth limit exceeded" when trying to store a row from R in PostgreSQL. To handle bulk upserts I have been using a query like this:
sql_query_data <- sprintf("BEGIN;
CREATE TEMPORARY TABLE
ts_updates(ts_key varchar, ts_data hstore, ts_frequency integer) ON COMMIT DROP;
INSERT INTO ts_updates(ts_key, ts_data) VALUES %s;
LOCK TABLE %s.timeseries_main IN EXCLUSIVE MODE;
UPDATE %s.timeseries_main
SET ts_data = ts_updates.ts_data,
ts_frequency = ts_updates.ts_frequency
FROM ts_updates
WHERE ts_updates.ts_key = %s.timeseries_main.ts_key;
INSERT INTO %s.timeseries_main
SELECT ts_updates.ts_key, ts_updates.ts_data, ts_updates.ts_frequency
FROM ts_updates
LEFT OUTER JOIN %s.timeseries_main ON (%s.timeseries_main.ts_key = ts_updates.ts_key)
WHERE %s.timeseries_main.ts_key IS NULL;
COMMIT;",
values, schema, schema, schema, schema, schema, schema, schema)
}
So far this query worked quite well for updating millions of records while keeping the number of inserts low. Whenever I ran into stack size problems, I simply split my records into multiple chunks and carried on from there.
However, this strategy is now running into trouble. I no longer have a lot of records, just a handful in which the hstore is a little bigger, but it is not really 'large' by any means. I read suggestions by Craig Ringer, who advises not to go near the 1 GB limit. So I assume the size of the hstore itself is not the problem, yet I receive this message:
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: stack depth limit exceeded
HINT: Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring the platform's stack depth limit is adequate.
)
EDIT: I did increase the limit to 7 MB and ran into the same error, stating that 7 MB is not enough. This is really odd to me, because the query itself is only 1.7 MB (I checked by pasting it into a text file). Can anybody shed some light on this?
Increase max_stack_depth as suggested by the hint. From the official documentation (http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html):
The ideal setting for this parameter is the actual stack size limit enforced by the kernel (as set by ulimit -s or local equivalent), less a safety margin of a megabyte or so.
and
The default setting is two megabytes (2MB), which is conservatively small and unlikely to risk crashes.
Superusers can alter this setting per connection, or it can be set for all users through the postgresql.conf file (requires a postgres server restart).
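A minimal sketch of checking and raising the setting from a superuser session (the target value is illustrative and should stay below the kernel stack limit reported by ulimit -s, minus a safety margin):
SHOW max_stack_depth;           -- e.g. 2MB by default
SET max_stack_depth = '6MB';    -- per-connection change, superuser only; illustrative value
-- for every connection, put max_stack_depth = '6MB' in postgresql.conf instead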
I have an SQLite3 database with a table whose primary key consists of two integers, and I'm trying to insert lots of data into it (around 1 GB or so).
The issue I'm having is that creating the primary key also implicitly creates an index, which in my case bogs inserts down to a crawl after a few commits (and that would be because the database file is on NFS... sigh).
So, I'd like to somehow temporarily disable that index. My best plan so far involved dropping the primary key's automatic index, but it seems SQLite doesn't like that and throws an error if I attempt it.
My second-best plan would involve the application making transparent copies of the database on the network drive, making the modifications, and then merging them back. Note that, as opposed to most SQLite/NFS questions, I don't need concurrent access.
What would be a correct way to do something like that?
UPDATE:
I forgot to specify the flags I'm already using:
PRAGMA synchronous = OFF
PRAGMA journal_mode = OFF
PRAGMA locking_mode = EXCLUSIVE
PRAGMA temp_store = MEMORY
UPDATE 2:
I'm in fact inserting items in batches; however, each successive batch is slower to commit than the previous one (I'm assuming this has to do with the size of the index). I tried batches of between 10k and 50k tuples, each tuple being two integers and a float.
You can't remove the implicit index, since it is the only way a row is addressed.
Merge your two integer keys into a single 64-bit key, key = (key1 << 32) + key2, and make that the INTEGER PRIMARY KEY in your schema (that way you will have only one index); a sketch follows below.
Set the page size for the new DB to at least 4096.
Remove ANY additional indexes except the primary key.
Insert the data in SORTED order so that the primary key is always growing.
Reuse prepared statements; don't rebuild them from strings each time.
Set the page cache size to as much memory as you have left (remember that the cache size is given in pages, not bytes).
Commit every 50,000 items.
If you need additional indexes, create them only AFTER ALL the data is in the table.
If you can merge the keys (I think you're using 32-bit values while SQLite uses 64-bit, so it's possible) and insert the data in sorted order, I bet you will fill your first GB with the same performance as the second, and both will be fast enough.
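As mentioned above, a minimal sketch of the merged-key layout (table and column names are invented; the application computes the key before binding it):
PRAGMA page_size = 4096;               -- must be set before the first table is created
CREATE TABLE samples(
    id    INTEGER PRIMARY KEY,         -- id = (key1 << 32) + key2, so one index covers both keys
    value REAL
);
-- the shift/add can also be written in SQL, e.g. for key1 = 1, key2 = 7:
INSERT INTO samples(id, value) VALUES ((1 << 32) + 7, 3.14);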
Are you doing the INSERT of each new row as an individual transaction?
If you use BEGIN TRANSACTION and INSERT rows in batches, then I think the index will only get rebuilt at the end of each transaction.
See faster-bulk-inserts-in-sqlite3.
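A minimal sketch of that batching pattern, using a hypothetical samples(id, value) table (batch size and values are illustrative):
BEGIN TRANSACTION;
INSERT INTO samples(id, value) VALUES ((1 << 32) + 1, 0.5);
INSERT INTO samples(id, value) VALUES ((1 << 32) + 2, 0.7);
-- ... the rest of the batch, e.g. 10k-50k rows ...
COMMIT;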