How to export a large table from Teradata

What would be the best way to export a large table (e.g. over 10 billion rows) from Teradata, in terms of speed and resource consumption?
I know the FastExport utility, which uses the FastExport protocol, but it still requires the results to be placed in spool before being sent to the client. It used to be possible to avoid spooling by forcing NoSpoolOnly mode, but this seems to be broken in recent releases. Therefore, if I use it with a select * from table query, the whole table will be copied into spool, which would require a massive spool space allowance.
I also came across the JDBC driver and the PT API; however, they seem to use the same underlying mechanisms.
Is there a better way?

Related

Can SQL Server in-memory tables be a replacement for the ASP.NET cache?

Our current system is architected like this:
We have around 5 million records in a DB table. Depending on the need, we fetch, say, a result set of 1 million records and keep them in cache throughout the application; when we are done, we get rid of them.
Now, instead of using the .NET application's memory, is it possible to use in-memory tables to keep those 1 million records while the disk-based table still keeps all 5 million records?
That's possible, but performance will still be lower than with in-process data. One of the most expensive parts of executing a cheap SQL statement is the overhead of executing anything at all (network, serialization, ...).
You will need to measure (or have a good understanding of) whether the reduced performance is still good enough.
If the existing system works without problems there is no need to change anything.
SQL Server already has advanced caching algorithms built in. How big is your table? 5 million rows is not that big these days; the entire table might already be cached in memory, and you can just use SELECT queries by primary key.

Is it possible and meaningful to execute an Array DML command containing BLOB data?

Is it possible to execute an Array DML INSERT or UPDATE statement passing BLOB field data in the parameter array? And, the more important part of my question: if it is possible, will an Array DML command containing BLOB data still be more efficient than executing the commands one by one?
I have noticed that TADParam has an AsBlobs indexed property, so I assume it might be possible, but I haven't tried it yet because there is no mention of performance, no example showing this, and the indexed property is of type RawByteString, which is not well suited to my needs.
I'm using FireDAC and working with an SQLite database (Params.BindMode = pbByNumber, so I'm using native SQLite INSERT with multiple VALUES). My aim is to store about 100k records containing pretty small BLOB data (about 1 kB per record) as fast as possible (at the cost of FireDAC's abstraction).
The main point in your case is that you are using an SQLite3 database.
With SQLite3, Array DML is "emulated" by FireDAC. Since it is a local instance, not a client-server instance, there is no need to prepare a bunch of rows, then send them at once to avoid network latency (as with Oracle or MS SQL).
Using Array DML may speed up your insertion process a little bit with SQLite3, but I doubt the gain will be very high. A good plain INSERT with binding by number will work just fine.
The main performance tips in your case are:
Nest your process within a single transaction (or even better, use one transaction per 1000 rows of data);
Prepare the INSERT statement once, then re-execute it with newly bound parameters each time (see the sketch below);
By default, FireDAC initializes SQLite3 with the fastest options (e.g. disabling LOCK), so leave those settings as they are.
SQLite3 is very good at handling BLOBs.
From my tests, FireDAC's insertion timing is pretty good, very close to direct SQLite3 access. Only reading is slower than a direct SQLite3 link, due to the overhead of the Delphi TDataSet class.
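For reference, the insert loop these tips describe boils down to roughly the following at the SQLite3 C API level, which is ultimately what FireDAC drives underneath. This is only a minimal sketch: the table blobs(id INTEGER, data BLOB) is made up for illustration and error handling is omitted.

#include <sqlite3.h>
#include <string.h>

/* One transaction, one prepared INSERT re-executed per row. */
int insert_blobs(sqlite3 *db, int row_count)
{
    sqlite3_stmt *stmt = NULL;
    unsigned char payload[1024];                 /* ~1 kB of BLOB data per record */
    memset(payload, 0xAB, sizeof payload);

    sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
    sqlite3_prepare_v2(db, "INSERT INTO blobs(id, data) VALUES(?1, ?2)", -1, &stmt, NULL);

    for (int i = 0; i < row_count; i++) {
        sqlite3_bind_int(stmt, 1, i);
        sqlite3_bind_blob(stmt, 2, payload, sizeof payload, SQLITE_STATIC);
        sqlite3_step(stmt);                      /* real code should check for SQLITE_DONE */
        sqlite3_reset(stmt);                     /* keep the prepared statement, rebind next time */
    }

    sqlite3_finalize(stmt);
    return sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
}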

Multiple files for a single SQLite database

AFAIK, SQLite stores a single database in a single file. Since this can decrease performance when working with large databases, is it possible to explicitly tell SQLite not to store the whole DB in a single file and to store different tables in different files instead?
I found out that it is possible.
Use:
sqlite3.exe MainDB.db
ATTACH DATABASE 'SomeTableFile.db' AS stf;
Access the table from the other database file:
SELECT * FROM stf.SomeTable;
You can even join over several files:
SELECT *
FROM MainTable mt
JOIN stf.SomeTable st
ON (mt.id = st.mt_id);
https://www.sqlite.org/lang_attach.html
tameera said there is a limit of 62 attached databases, but I never hit that limit, so I can't confirm it.
The big advantage, besides some special cases, is that you limit fragmentation in the database files, and you can run the VACUUM command separately on each file (and thus effectively per table)!
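The same works from application code, since ATTACH is just SQL. A minimal sketch using the SQLite C API (reusing the hypothetical file and table names from above, with error handling omitted):

#include <sqlite3.h>
#include <stdio.h>

/* Callback invoked by sqlite3_exec() for every result row. */
static int print_row(void *unused, int ncols, char **vals, char **names)
{
    (void)unused;
    for (int i = 0; i < ncols; i++)
        printf("%s=%s ", names[i], vals[i] ? vals[i] : "NULL");
    printf("\n");
    return 0;
}

int main(void)
{
    sqlite3 *db = NULL;
    sqlite3_open("MainDB.db", &db);
    sqlite3_exec(db, "ATTACH DATABASE 'SomeTableFile.db' AS stf", NULL, NULL, NULL);
    sqlite3_exec(db,
                 "SELECT * FROM MainTable mt JOIN stf.SomeTable st ON (mt.id = st.mt_id)",
                 print_row, NULL, NULL);
    sqlite3_close(db);
    return 0;
}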
If you don't need a join between these tables, you can manually split the DB and decide which tables live in which DB (= file).
I don't think it's possible to have SQLite split your DB into multiple files automatically, because you connect to a DB by giving its filename.
SQLite database files can grow quite large without any performance penalties.
The things that might degrade performance are:
file-locking contention
table size (if using indexes and issuing write queries)
Also, by default, SQLite limits the number of attached databases to 10.
Anyway, try partitioning your tables. You'll see that SQLite can grow enormously this way.

How can I improve the performance of the SQLite database?

Background: I am using an SQLite database in my Flex application. The size of the database is 4 MB and it has 5 tables:
table 1 has 2500 records
table 2 has 8700 records
table 3 has 3000 records
table 4 has 5000 records
table 5 has 2000 records.
Problem: Whenever I run a select query on any table, it takes around 50 seconds to fetch the data from the database tables. This makes the application quite slow and unresponsive while it fetches data from a table.
How can I improve the performance of the SQLite database so that the time taken to fetch data from the tables is reduced?
Thanks
As I told you in a comment, without knowing what structures your database consists of and what queries you run against the data, there is nothing we can infer to suggest why your queries take so much time.
However, here is some interesting reading about indexes: Use The Index, Luke!. It tells you what an index is, how you should design your indexes, and what benefits you can harvest.
Also, if you can post the queries, the table schemas, and the cardinalities (not the contents), that might help.
Are you using asynchronous or synchronous execution modes? The difference between them is that asynchronous execution runs in the background while your application continues to run. Your application will then have to listen for a dispatched event and then carry out any subsequent operations. In synchronous mode, however, the user will not be able to interact with the application until the database operation is complete since those operations run in the same execution sequence as the application. Synchronous mode is conceptually simpler to implement, but asynchronous mode will yield better usability.
The first time SQLStatement.execute() is called on a SQLStatement instance, the statement is prepared automatically before executing. Subsequent calls will execute faster as long as the SQLStatement.text property has not changed. Reusing the same SQLStatement instance is better than creating new instances again and again. If you need to change your queries, then consider using parameterized statements.
You can also use techniques such as deferring what data you need at runtime. If you only need a subset of data, pull that back first and then retrieve other data as necessary. This may depend on your application scope and what needs you have to fulfill though.
Specifying the database along with the table name prevents the runtime from checking each database to find a matching table if you have multiple databases. It also prevents the runtime from choosing the wrong database. Write SELECT email FROM main.users; instead of SELECT email FROM users;, even if you only have a single database. (main is automatically assigned as the database name when you call SQLConnection.open.)
If you happen to be writing lots of changes to the database (multiple INSERT or UPDATE statements), then consider wrapping them in a transaction. The changes will be made in memory by the runtime and then written to disk once. If you don't use a transaction, each statement will result in multiple disk writes to the database file, which can be slow and consume a lot of time.
Try to avoid any schema changes. The table definition data is kept at the start of the database file, and the runtime loads these definitions when the database connection is opened. Data added to tables is kept after the table definition data in the database file. If you make changes such as adding columns or tables, the new table definitions will be mixed in with the table data in the database file. The effect is that the runtime has to read the table definition data from different parts of the file rather than just from the beginning. The SQLConnection.compact() method restructures the table definition data so that it is at the beginning of the file, but its downside is that this method can also consume a lot of time, more so if the database file is large.
Lastly, as Benoit pointed out in his comment, consider improving the SQL queries and table structure that you're using. It would be helpful to know whether your database structure and queries are the actual cause of the slow performance. My guess is that you're using synchronous execution; if you switch to asynchronous mode, you'll see better performance, but that doesn't mean it has to stop there.
The Adobe Flex documentation online has more information on improving database performance and best practices for working with local SQL databases.
You could try indexing some of the columns used in the WHERE clause of your SELECT statements. You might also try minimizing usage of the LIKE keyword.
If you are joining your tables together, you might try simplifying the table relationships.
Like others have said, it's hard to get specific without knowing more about your schema and the SQL you are using.

sqlite3 bulk insert from C?

I came across the .import command to do this (bulk insert), but is there a query version of it that I can execute using sqlite3_exec()?
I would just like to copy the contents of a small text file into a table.
A query version of the command below:
".import demotab.txt mytable"
SQLite's performance doesn't benefit from bulk insert. Simply performing the inserts separately (but within a single transaction!) provides very good performance.
You might benefit from increasing SQLite's page cache size; that depends on the number of indexes and/or the order in which the data is inserted. If you don't have any indexes, the cache size is likely not to matter much for a pure insert.
Be sure to use a prepared query, as opposed to regenerating a query plan in the innermost loop. It's extremely important to wrap the statements in a transaction, since this avoids the need for the filesystem to sync the database to disk after every statement; after all, a partially written transaction is rolled back atomically anyhow, meaning that all fsync() calls are deferred until the transaction completes.
Finally, indexes will limit your insert performance, since maintaining them during inserts is somewhat expensive. If you're really dealing with a lot of data and start off with an empty table, it may be beneficial to add the indexes after loading the data, though this isn't a huge factor.
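For what it's worth, .import is a feature of the sqlite3 command-line shell, not a SQL statement, so there is no query form of it that sqlite3_exec() can run. Following the advice above, a small loader can look roughly like this; a minimal sketch where the table mytable(value TEXT) is an assumption and error checking is mostly omitted:

#include <sqlite3.h>
#include <stdio.h>
#include <string.h>

/* Reads one value per line from a text file and inserts it as a row,
   inside a single transaction, reusing one prepared statement. */
int import_file(sqlite3 *db, const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f) return SQLITE_ERROR;

    sqlite3_stmt *stmt = NULL;
    char line[1024];

    sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
    sqlite3_prepare_v2(db, "INSERT INTO mytable(value) VALUES(?1)", -1, &stmt, NULL);

    while (fgets(line, sizeof line, f)) {
        line[strcspn(line, "\r\n")] = '\0';      /* strip the trailing newline */
        sqlite3_bind_text(stmt, 1, line, -1, SQLITE_TRANSIENT);
        sqlite3_step(stmt);                      /* real code should check for SQLITE_DONE */
        sqlite3_reset(stmt);
    }

    sqlite3_finalize(stmt);
    fclose(f);
    return sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
}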
Oh, and you might want to get one of those Intel X25-E SSDs and ensure you have an AHCI controller ;-).
I'm maintaining an app with SQLite DBs containing about 500,000,000 rows (spread over several tables), much of which was bulk inserted using plain old begin-insert-commit: it works fine.
