Fast batch executions in PostgreSQL - qt

I have a lot of data and I want to insert it into the DB in the least time possible. I did some tests. I created a table (using the script below) in PostgreSQL:
CREATE TABLE test_table
(
id serial NOT NULL,
item integer NOT NULL,
count integer NOT NULL,
CONSTRAINT test_table_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE test_table OWNER TO postgres;
I wrote test code that creates 1000 random values and inserts them into test_table in two different ways. First, using QSqlQuery::exec():
int insert() {
    QSqlDatabase db = QSqlDatabase::addDatabase("QPSQL");
    db.setHostName("127.0.0.1");
    db.setDatabaseName("TestDB");
    db.setUserName("postgres");
    db.setPassword("1234");
    if (!db.open()) {
        qDebug() << "can not open DB";
        return -1;
    }
    QString queryString = QString("INSERT INTO test_table (item, count)"
                                  " VALUES (:item, :count)");
    QSqlQuery query;
    query.prepare(queryString);
    QDateTime start = QDateTime::currentDateTime();
    for (int i = 0; i < 1000; i++) {
        query.bindValue(":item", qrand());
        query.bindValue(":count", qrand());
        if (!query.exec()) {
            qDebug() << query.lastQuery();
            qDebug() << query.lastError();
        }
    } //end of for i
    QDateTime end = QDateTime::currentDateTime();
    int diff = start.msecsTo(end);
    return diff;
}
Second using QSqlQuery::execBatch:
int batchInsert() {
    QSqlDatabase db = QSqlDatabase::addDatabase("QPSQL");
    db.setHostName("127.0.0.1");
    db.setDatabaseName("TestDB");
    db.setUserName("postgres");
    db.setPassword("1234");
    if (!db.open()) {
        qDebug() << "can not open DB";
        return -1;
    }
    QString queryString = QString("INSERT INTO test_table (item, count)"
                                  " VALUES (:item, :count)");
    QSqlQuery query;
    query.prepare(queryString);
    QVariantList itemList;
    QVariantList CountList;
    QDateTime start = QDateTime::currentDateTime();
    for (int i = 0; i < 1000; i++) {
        itemList.append(qrand());
        CountList.append(qrand());
    } //end of for i
    query.addBindValue(itemList);
    query.addBindValue(CountList);
    if (!query.execBatch())
        qDebug() << query.lastError();
    QDateTime end = QDateTime::currentDateTime();
    int diff = start.msecsTo(end);
    return diff;
}
I found that there is no difference between them:
int main() {
    qDebug() << insert() << batchInsert();
    return 0;
}
Result:
14270 14663 (milliseconds)
How can I improve it?
The documentation at http://doc.qt.io/qt-5/qsqlquery.html#execBatch says:
If the database doesn't support batch executions, the driver will
simulate it using conventional exec() calls.
I'm not sure whether my DBMS supports batch executions.
How can I test it?

I'm not sure what the Qt driver does, but PostgreSQL supports running multiple statements in one transaction. Just do it manually instead of trying to use the built-in feature of the driver.
Before the loop, issue this statement:
BEGIN TRANSACTION;
For every iteration of the loop, run an insert statement:
INSERT HERE;
Once the loop has finished for all 1000 records, issue this on the same connection:
COMMIT TRANSACTION;
Also, 1000 rows is not much to test with; you might want to try 100,000 or more to make sure the Qt batch really wasn't helping.

By issuing 1000 insert statements, you have 1000 round trips to the database. This takes quite some time (network and scheduling latency). So try to reduce the number of insert statements!
Let's say you want to:
insert into test_table(item, count) values (1000, 10);
insert into test_table(item, count) values (1001, 20);
insert into test_table(item, count) values (1002, 30);
Transform it into a single query and the query will need less than half of the time:
insert into test_table(item, count) values (1000, 10), (1001, 20), (1002, 30);
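If the rows are already in memory, the single statement above can be generated rather than typed by hand. Here is a minimal sketch in plain C++; the function name buildMultiRowInsert is made up for illustration, and it only handles integer columns (text values should be bound as parameters or escaped, not concatenated):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds one multi-row INSERT for test_table from two
// equally sized integer columns. Integers only; do not use it for text.
std::string buildMultiRowInsert(const std::vector<int>& items,
                                const std::vector<int>& counts)
{
    assert(items.size() == counts.size() && !items.empty());
    std::string sql = "insert into test_table(item, count) values ";
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i != 0)
            sql += ", ";  // separator between row tuples
        sql += "(" + std::to_string(items[i]) + ", "
                   + std::to_string(counts[i]) + ")";
    }
    sql += ";";
    return sql;
}
```

The resulting string can then be sent with a single exec() call, so the whole batch costs one round trip.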
In PostgreSQL, there is another way to write it:
insert into test_table(item, count) values (
unnest(array[1000, 1001, 1002]),
unnest(array[10, 20, 30]));
My reason for presenting the second way is that you can pass the whole content of a big array in a single parameter (tested in C# with the database driver "Npgsql"):
insert into test_table(item, count) values (unnest(:items), unnest(:counts));
items is a query parameter with the value int[]{1000, 1001, 1002}
counts is a query parameter with the value int[]{10, 20, 30}
Today, I have cut down the running time of 10,000 inserts in C# from 80s to 550ms with this technique. It's easy. Furthermore, there is not any hassle with transactions, as a single statement is never split into multiple transactions.
I hope this works with the Qt PostgreSQL driver, too. On the server side, you need PostgreSQL >= 8.4, as older versions do not provide unnest (but there may be workarounds).
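If the driver cannot bind a native array, one possible fallback (an untested assumption on my side, not something from the Npgsql test above) is to bind the array as a text parameter in PostgreSQL's array literal form, e.g. {1000,1001,1002}, and cast it in the statement with something like unnest(cast(:items as int[])). A minimal sketch of producing that literal; the helper name toPgArrayLiteral is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: formats integers as a PostgreSQL array literal,
// e.g. {1000,1001,1002}. Integers need no quoting inside the braces.
std::string toPgArrayLiteral(const std::vector<int>& values)
{
    std::string out = "{";
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0)
            out += ",";
        out += std::to_string(values[i]);
    }
    out += "}";
    return out;
}
```

An empty vector gives {}, which PostgreSQL reads as an empty array, so unnest would then produce no rows.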

You can use QSqlDriver::hasFeature with the argument QSqlDriver::BatchOperations to check.
In the Qt 4.8 sources, I found that only the OCI (Oracle) driver supports BatchOperations. I don't know why the PostgreSQL driver doesn't use the COPY statement for this.


Is there a way to workaround the limit of 255 types in a flatbuffers union?

I am using flatbuffers to serialize rows from sql tables. I have a Statement.fbs that defines a statement as Insert, Update, Delete, etc. The statement has a member "Row" that is a union of all sql table types. However, I have more than 255 tables and I get this error when compiling with flatc:
$ ~/flatbuffers/flatc --cpp -o gen Statement.fbs
error: /home/jkl/fbtest/allobjects.fbs:773: 18: error: enum value does not fit [0; 255]
I looked through the flatbuffers code and I see that an enum is automatically created for union types and that the underlying type of this enum is uint8_t.
I do not see any options for changing this behavior.
I am able to create an enum that handles all my tables by specifying the underlying type to be uint16 in my flatbuffer schema file.
The statement schema:
include "allobjects.fbs";
namespace Database;
enum StatementKind : byte { Unknown = 0, Insert, Update, Delete, Truncate }
table Statement {
kind:StatementKind;
truncate:[TableKind];
row:Row;
}
root_type Statement;
The allobjects Row union is a bit large to include here.
union Row {
TypeA,
TypeB,
TypeC,
Etc,
...
}
I suppose this is a design decision for flatbuffers that union types should only use one byte. I can accept that, but I would really like a workaround.
This sadly is a bit of a design mistake, and there is no workaround yet. Fixing this to be configurable is possible, but would be a fair bit of work given the amount of language ports that rely on it being a byte. See e.g. here: https://github.com/google/flatbuffers/issues/4209
Yes, multiple unions is a clumsy workaround.
An alternative could be to define the type as an enum. Now you have the problem that you don't have a typesafe way to store the table, though. That could be achieved with a "nested flatbuffer", i.e. storing the union value as a vector of bytes, which you can then cheaply call GetRoot on with the correct type, once you inspected the enum.
Another option may be an enum + a union, if the number of unique kinds of records is < 256. For example, you may have multiple row types that, even though they have different names, contain just a string, so they can be merged for the union type.
Another hack could be to declare a table RowBaseClass {} or whatever, which would be the type of the field, but which you would never actually instantiate. You then cast back and forth to that type to store the actual table, depending on the language you're using.
The nested buffer solution to the 255 limit of unions is pretty straightforward.
allobjects.fbs:
namespace Database;
table Garbage {
gid:ulong;
type:string;
weight:uint;
}
... many more ...
Statement.fbs:
include "allobjects.fbs";
namespace Database;
enum StatementKind : byte { Unknown = 0, Insert, Update, Delete, Truncate }
// suppose this enum holds the > 255 Row types
enum TableKind : uint16 { Unknown = 0, Garbage, Etc... }
// this is the "union", but with a type enum beyond ubyte size
table Row {
kind:TableKind;
// this payload will be the nested flatbuffer
payload:[ubyte];
}
table Statement {
kind:StatementKind;
truncate:[TableKind];
row:Row;
}
root_type Statement;
main.c:
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
#include "Statement_generated.h"
void encodeInsertGarbage(unsigned long gid,
                         const std::string& type,
                         unsigned int weight,
                         std::vector<uint8_t>& retbuf)
{
    flatbuffers::FlatBufferBuilder fbb;
    // create Garbage flatbuffer
    // I used the "Direct" version so I didn't have to create a flatbuffer string object
    auto garbage = Database::CreateGarbageDirect(fbb, gid, type.c_str(), weight);
    fbb.Finish(garbage);
    // make [ubyte] from encoded "Garbage" object
    auto payload = fbb.CreateVector(fbb.GetBufferPointer(), fbb.GetSize());
    // make the generic Row homebrewed union
    auto obj = Database::CreateRow(fbb, Database::TableKind_Garbage, payload);
    fbb.Finish(obj);
    // create the Statement - 0 for "truncate" since that is not used for Insert
    auto statement = Database::CreateStatement(fbb, Database::StatementKind_Insert, 0, obj);
    fbb.Finish(statement);
    // copy the resulting flatbuffer to output vector
    // just for this test program, typically you write to a file or socket.
    retbuf.assign(fbb.GetBufferPointer(), fbb.GetBufferPointer() + fbb.GetSize());
}
void decodeInsertGarbage(std::vector<uint8_t>& retbuf)
{
    auto statement = Database::GetStatement(retbuf.data());
    auto tableType = statement->row()->kind();
    auto payload = statement->row()->payload();
    // just using a simple "if" statement here, but a full solution
    // could use an array of getters, indexed by TableKind, then
    // wrap it up nice with a template function to cast the return type
    // like rowGet<Garbage>(payload);
    if (tableType == Database::TableKind_Garbage)
    {
        auto garbage = Database::GetGarbage(payload->Data());
        std::cout << " gid: " << garbage->gid() << std::endl;
        std::cout << " type: " << garbage->type()->c_str() << std::endl;
        std::cout << " weight: " << garbage->weight() << std::endl;
    }
}
int main()
{
    std::vector<uint8_t> iobuf;
    encodeInsertGarbage(0, "solo cups", 12, iobuf);
    decodeInsertGarbage(iobuf);
    return 0;
}
Output:
$ ./fbtest
gid: 0
type: solo cups
weight: 12
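The comment in decodeInsertGarbage mentions replacing the "if" with a table of getters indexed by TableKind. A rough sketch of that idea in plain C++, without FlatBuffers: the names DecoderRegistry, Payload and Decoder, and the enum values, are all made up for illustration. In real code each decoder body would presumably call something like flatbuffers::GetRoot<Database::Garbage>(payload.data()).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for the generated 16-bit enum; 300 shows a value beyond 255.
enum class TableKind : std::uint16_t { Unknown = 0, Garbage = 1, Other = 300 };

using Payload = std::vector<std::uint8_t>;                  // nested buffer bytes
using Decoder = std::function<std::string(const Payload&)>; // one per row type

// Hypothetical registry: maps a TableKind tag to the code that decodes it.
class DecoderRegistry {
public:
    void add(TableKind kind, Decoder decoder)
    {
        assert(static_cast<bool>(decoder));  // reject empty std::function
        decoders_[static_cast<std::uint16_t>(kind)] = std::move(decoder);
    }

    // Returns false (and leaves out untouched) if the kind is not registered.
    bool decode(TableKind kind, const Payload& payload, std::string& out) const
    {
        const auto it = decoders_.find(static_cast<std::uint16_t>(kind));
        if (it == decoders_.end())
            return false;
        out = it->second(payload);
        return true;
    }

private:
    std::unordered_map<std::uint16_t, Decoder> decoders_;
};
```

A map is used instead of an array so that sparse or large enum values do not cost memory; with dense values a std::vector indexed by the enum would work as well.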

QSqlRelationalTableModel - insert record greater than 256

I have a table node={id,name}, and a table segment={id,nodeFrom,nodeTo} in a SQLite db, where node.id and segment.id are AUTOINCREMENT fields.
I'm creating a QSqlTableModel for Node, as follows:
nodeModel = new QSqlTableModel(this,db);
nodeModel->setTable("Node");
nodeModel->setEditStrategy(QSqlTableModel::OnFieldChange);
and I use the following code for inserting nodes:
int addNode(QString name) {
QSqlRecord newRec = nodeModel->record();
newRec.setGenerated("id",false);
newRec.setValue("name",name);
if (not nodeModel->insertRecord(-1,newRec))
qDebug() << nodeModel->lastError();
if (not nodeModel->submit())
qDebug() << nodeModel->lastError();
return nodeModel->query().lastInsertId().toInt();
}
This seems to work. Now, for segments I define a QSqlRelationalTableModel, as follows:
segModel = new QSqlRelationalTableModel(this,db);
segModel->setTable("Segment");
segModel->setEditStrategy(QSqlTableModel::OnManualSubmit);
segModel->setRelation(segModel->fieldIndex("nodeFrom"),
QSqlRelation("Node","id","name"));
segModel->setRelation(segModel->fieldIndex("nodeTo"),
QSqlRelation("Node","id","name"));
And then I have the following code for inserting segments:
int addSegment(int nodeFrom, int nodeTo) {
QSqlRecord newRec = segModel->record();
newRec.setGenerated("id",false);
newRec.setValue(1,nodeFrom);
newRec.setValue(2,nodeTo);
if (not segModel->insertRecord(-1,newRec)) // (*)
qDebug() << segModel->lastError();
if (not segModel->submitAll())
qDebug() << segModel->lastError(); // (*)
}
I can successfully add 280 nodes using addNode(). I can also add segments successfully if nodeFrom<=256 and nodeTo<=256. For any segment referencing a node greater than or equal to 256 I get a
QSqlError("19", "Unable to fetch row", "Segment.nodeTo may not be NULL")
in one of the lines marked with a (*) of the addSegment function.
I've googled and found out that people are having other (apparently unrelated) problems when they hit the magical 256 record count. No solution seems to work with this particular problem.
What am I doing wrong?
Thanks!
The cause of this error lies in the void QRelation::populateDictionary() method, which uses a loop of the form for (int i=0; i < model->rowCount(); ++i). If you use a database that does not report the size of the query back (e.g. SQLite), the rowCount() method will return this magical 256 value.
You can solve this by fully populating the relation model before using data(...) or setData(...). As a first attempt, try:
setRelation(nodeFromCol, QSqlRelation("Node", "id", "name"));
QSqlTableModel *model = relationModel(nodeFromCol);
while(model->canFetchMore())
model->fetchMore();
Try fixing it this way:
newRec.setValue(1,QVariant(nodeFrom));
newRec.setValue(2,QVariant(nodeTo));

Access all fields of an SQLite query record result?

I'm learning Sqlite in Qt but have run into a problem accessing record values returned by a QSqlQuery.
The details are below but the gist is: I get a QSqlRecord back from a query and want to access all fields of the record but QSqlRecord.count is reporting only one column when there clearly are two (in the example they are id and keyword).
Am I misunderstanding SQLite and what a query does, or is this a problem with how I am trying to access the records?
This is my schema:
This is my test data:
Full code:
void MainWindow::on_addKeywordBtn_clicked()
{
// find a matching keyword
QSqlQuery query(db);
query.prepare("SELECT keyword FROM keywords WHERE keyword = ?");
query.addBindValue(QString("blue"));
query.exec();
while (query.next()) {
QString k = query.value(0).toString();
qDebug() << "found" << k;
QSqlRecord rec = query.record();
qDebug() << "Number of columns: " << rec.count();
int idIndex = rec.indexOf("id");
int keywordIndex = rec.indexOf("keyword");
qDebug() << query.value(idIndex).toString() << query.value(keywordIndex).toString();
}
}
Console output:
found "blue"
Number of columns: 1
QSqlQuery::value: not positioned on a valid record
"" "blue"
Your mistake is in the query itself, in this line:
query.prepare("SELECT keyword FROM keywords WHERE keyword = ?");
Here you explicitly instruct the database to return only one column. Proper solutions would be:
query.prepare("SELECT * FROM keywords WHERE keyword = ?");
or
query.prepare("SELECT id, keyword FROM keywords WHERE keyword = ?");

multiple sql statements in QSqlQuery using the sqlite3 driver

I have a file containing several SQL statements that I'd like to use to initialize a new sqlite3 database file. Apparently, sqlite3 only handles multiple statements in one query via the sqlite3_exec() function, and not through the prepare/step/finalize functions. That's all fine, but I'd like to use the QtSQL api rather than the c api directly. Loading in the same initializer file via QSqlQuery only executes the first statement, just like directly using the prepare/step/finalize functions from the sqlite3 api. Is there a way to get QSqlQuery to run multiple queries without having to have separate calls to query.exec() for each statement?
As clearly stated in Qt Documentation for QSqlQuery::prepare() and QSqlQuery::exec(),
For SQLite, the query string can contain only one statement at a time.
If more than one statement is given, the function returns false.
As you have already guessed, the only known workaround for this limitation is to have all the SQL statements separated by some string, split the statements, and execute each of them in a loop.
See the following example code (which uses ";" as the separator and assumes that character is not used inside the queries; this lacks generality, as the character may appear in string literals in where/insert/update statements):
QSqlDatabase database;
QSqlQuery query(database);
QFile scriptFile("/path/to/your/script.sql");
if (scriptFile.open(QIODevice::ReadOnly))
{
    // The SQLite driver executes only a single (the first) query in the QSqlQuery;
    // if the script contains more queries, it needs to be split.
    QStringList scriptQueries = QTextStream(&scriptFile).readAll().split(';');
    foreach (QString queryTxt, scriptQueries)
    {
        if (queryTxt.trimmed().isEmpty()) {
            continue;
        }
        if (!query.exec(queryTxt))
        {
            // Pass the message as an argument so a '%' in it is not
            // interpreted as a format specifier.
            qFatal("One of the queries failed to execute. Error detail: %s",
                   qPrintable(query.lastError().text()));
        }
        query.finish();
    }
}
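The caveat about ";" inside string literals can be reduced by tracking whether the scanner is currently inside a single-quoted literal. A minimal sketch in plain C++ (std::string instead of QString so it stands alone; the name splitSqlStatements is made up). It assumes the script has no comments, double-quoted identifiers, or trigger bodies containing semicolons:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Strips leading and trailing whitespace.
static std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Splits a script on ';' but ignores semicolons inside single-quoted
// literals. An escaped quote ('') toggles the flag twice, so the scanner
// is back inside the literal afterwards without special handling.
std::vector<std::string> splitSqlStatements(const std::string& script)
{
    std::vector<std::string> statements;
    std::string current;
    bool inString = false;
    for (char c : script) {
        if (c == '\'')
            inString = !inString;
        if (c == ';' && !inString) {
            const std::string stmt = trim(current);
            if (!stmt.empty())
                statements.push_back(stmt);
            current.clear();
        } else {
            current += c;
        }
    }
    const std::string tail = trim(current);  // last statement without ';'
    if (!tail.empty())
        statements.push_back(tail);
    return statements;
}
```

Each returned statement could then be passed to query.exec() in the loop shown above.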
I wrote a simple function to read SQL from a file and execute it one statement at a time.
/**
 * @brief executeQueriesFromFile Read each line from a .sql QFile
 * (assumed to not have been opened before this function), and when ; is reached, execute
 * the SQL gathered until then on the query object. Then do this until a COMMIT SQL
 * statement is found. In other words, this function assumes each file is a single
 * SQL transaction, ending with a COMMIT line.
 */
void executeQueriesFromFile(QFile *file, QSqlQuery *query)
{
    while (!file->atEnd()) {
        QByteArray readLine = "";
        QString cleanedLine;
        QString line = "";
        bool finished = false;
        // stop at end of file too, so a script without a final ; or COMMIT
        // cannot loop forever
        while (!finished && !file->atEnd()) {
            readLine = file->readLine();
            cleanedLine = readLine.trimmed();
            // remove comments at end of line (and the whitespace before them)
            QStringList strings = cleanedLine.split("--");
            cleanedLine = strings.at(0).trimmed();
            // remove lines with only comment, and DROP lines
            if (!cleanedLine.startsWith("--")
                    && !cleanedLine.startsWith("DROP")
                    && !cleanedLine.isEmpty()) {
                line += cleanedLine;
                // keep tokens from consecutive lines separated
                line += QLatin1Char(' ');
            }
            if (cleanedLine.endsWith(";")) {
                break;
            }
            if (cleanedLine.startsWith("COMMIT")) {
                finished = true;
            }
        }
        if (!line.isEmpty()) {
            query->exec(line);
        }
        if (!query->isActive()) {
            qDebug() << QSqlDatabase::drivers();
            qDebug() << query->lastError();
            qDebug() << "test executed query:" << query->executedQuery();
            qDebug() << "test last query:" << query->lastQuery();
        }
    }
}
http://www.fluxitek.fi/2013/10/reading-sql-text-file-sqlite-database-qt/
https://gist.github.com/savolai/6852986

SQLite3: Insert BLOB with NULL characters in C++

I'm working on the development of a C++ API which uses custom-designed plugins
to interface with different database engines using their APIs and specific SQL
syntax.
Currently, I'm attempting to find a way of inserting BLOBs, but since NULL is
the terminating character in C/C++, the BLOB becomes truncated when constructing
the INSERT INTO query string. So far, I've worked with
//...
char* sql;
void* blob;
int len;
//...
blob = some_blob_already_in_memory;
len = length_of_blob_already_known;
sql = sqlite3_malloc(2*len+1);
sql = sqlite3_mprintf("INSERT INTO table VALUES (%Q)", (char*)blob);
//...
I expect that, if it is at all possible to do it in the SQLite3 interactive console, it should be possible to construct the query string with properly escaped NULL characters. Maybe there's a way to do this with standard SQL which is also supported by SQLite SQL syntax?
Surely someone must have faced the same situation before. I've googled and found some answers, but they were for other programming languages (Python).
Thank you in advance for your feedback.
Thank you all again for your feedback. This time I'm reporting how I solved the problem with the help of the indications provided here. Hopefully this will help others in the future.
As suggested by the first three posters, I did use prepared statements — additionally because I was also interested in getting the columns' data types, and a simple sqlite3_get_table() wouldn't do.
After preparing the SQL statement in the form of the following constant string:
INSERT INTO table VALUES(?,?,?,?);
what remains is to bind the corresponding values. This is done by issuing one sqlite3_bind_blob() call per column. (I also resorted to sqlite3_bind_text() for other "simple" data types because the API I'm working on can translate integers/doubles/etc. into a string.) So:
#include <stdio.h>
#include <string.h>
#include <sqlite3.h>
/* ... */
void* blobvalue[4] = { NULL, NULL, NULL, NULL };
int blobsize[4] = { 0, 0, 0, 0 };
const char* tail = NULL;
const char* sql = "INSERT INTO tabl VALUES(?,?,?,?)";
sqlite3_stmt* stmt = NULL;
sqlite3* db = NULL;
/* ... */
sqlite3_open("sqlite.db", &db);
sqlite3_prepare_v2(db,
                   sql, strlen(sql) + 1,
                   &stmt, &tail);
for(unsigned int i = 0; i < 4; i++) {
    sqlite3_bind_blob(stmt,
                      i + 1, blobvalue[i], blobsize[i],
                      SQLITE_TRANSIENT);
}
if(sqlite3_step(stmt) != SQLITE_DONE) {
    printf("Error message: %s\n", sqlite3_errmsg(db));
}
sqlite3_finalize(stmt);
sqlite3_close(db);
Note also that some functions (sqlite3_open_v2(), sqlite3_prepare_v2()) are only available in later SQLite versions (I suppose 3.5.x and later).
The SQLite table tabl in file sqlite.db can be created with (for example)
CREATE TABLE tabl(a TEXT PRIMARY KEY, b TEXT, c TEXT, d TEXT);
You'll want to use this function with a prepared statement.
int sqlite3_bind_blob(sqlite3_stmt*, int, const void*, int n, void(*)(void*));
In C/C++, the standard way of dealing with NULLs in strings is to either store the beginning of the string and a length, or store a pointer to the beginning of a string and one to the end of the string.
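To make the difference concrete, here is a small sketch in plain C++ (the helper names are made up) comparing what a C string function sees with what a pointer-plus-length copy keeps:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as C string functions see it: counting stops at the first NUL byte.
std::size_t lengthAsCString(const char* data)
{
    assert(data != nullptr);
    return std::strlen(data);
}

// Copy made from a pointer and an explicit length: embedded NUL bytes
// are kept, because nothing scans for a terminator.
std::string copyWithLength(const char* data, std::size_t len)
{
    assert(data != nullptr);
    return std::string(data, len);
}
```

For a 5-byte buffer holding 'a', 'b', NUL, 'c', 'd', the first function reports 2 while the second returns all 5 bytes. That is the same reason a %Q-formatted query string is truncated while a blob bound with an explicit size is not.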
You want to precompile the statement with sqlite3_prepare_v2(), and then bind the blob in using sqlite3_bind_blob(). Note that the statement you bind into will be INSERT INTO table VALUES (?).
