Import BLOBs from a CSV to an SQLite table - sqlite

How should I put UUIDs into a CSV file so that SQLite's .import command loads them into a table as 128-bit BLOBs?

As far as I know, the only ways to generate a blob from the sqlite3 shell are the zeroblob(), randomblob() and readfile() SQL functions, CASTing a value, or a base-16 blob literal (X'1234ABCD').
If your UUIDs are already represented as big-endian 128-bit binary values in the CSV file, you might be able to do something like UPDATE table SET uuid = CAST(uuid AS BLOB); after the import. If they're a textual representation like 123e4567-e89b-12d3-a456-426655440000, you could write a user-defined function to do the conversion and use it in a similar post-import UPDATE, as sketched below.
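A minimal sketch of that user-defined-function approach with Python's sqlite3 module, assuming the CSV was imported with the UUIDs as hyphenated text; the database file name and the table/column names (mytable, uuid) are placeholders:

import sqlite3
import uuid

def uuid_to_blob(text):
    # Convert a textual UUID such as 123e4567-e89b-12d3-a456-426655440000
    # into its 16-byte big-endian binary form.
    return uuid.UUID(text).bytes

conn = sqlite3.connect("database.db")
conn.create_function("uuid_to_blob", 1, uuid_to_blob)
# Rewrite the imported text values in place as 128-bit BLOBs.
conn.execute("UPDATE mytable SET uuid = uuid_to_blob(uuid)")
conn.commit()
conn.close()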

The sqlite3 shell's .import command cannot load BLOBs from CSV.
The solution is to convert CSV to an SQL-statement file and execute it:
sqlite3 database.db < database.sql
If you pipe from an application, chunks of about 100,000 rows per sqlite3 process invocation work well.
If you try to pipe many gigabytes at once, sqlite3 will crash with an out-of-memory error.
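A minimal sketch of that CSV-to-SQL conversion in Python; the input file name, the target table mytable and the assumption that the CSV holds a single column of hyphenated textual UUIDs are all placeholders to adjust to your schema:

import csv
import uuid

# Read uuids.csv and emit INSERT statements with X'...' blob literals,
# wrapped in a transaction so sqlite3 can execute the file quickly.
with open("uuids.csv", newline="") as src, open("database.sql", "w") as dst:
    dst.write("BEGIN;\n")
    for (text_uuid,) in csv.reader(src):
        hex_blob = uuid.UUID(text_uuid).hex  # 32 hex digits, big-endian
        dst.write("INSERT INTO mytable(uuid) VALUES (X'%s');\n" % hex_blob)
    dst.write("COMMIT;\n")

The resulting file is then executed as above with sqlite3 database.db < database.sql.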

Related

Loading a UCS-2LE file in Netezza

I have multiple 30 GB / 1-billion-record files which I need to load into Netezza. I am connecting using pyodbc and running the following commands.
create temp table tbl1(id bigint, dt varchar(12), ctype varchar(20), name varchar(100)) distribute on (id)
insert into tbl1
select * from external 'C:\projects\tmp.CSV'
using (RemoteSource 'ODBC' Delimiter '|' SkipRows 1 MaxErrors 10 QuotedValue DOUBLE)
Here's a snippet from the nzlog file
Found bad records
bad #: input row #(byte offset to last char examined) [field #, declaration] diagnostic,
"text consumed"[last char examined]
----------------------------------------------------------------------------------------
1: 2(0) [1, INT8] contents of field, ""[0x00<NUL>]
2: 3(0) [1, INT8] contents of field, ""[0x00<NUL>]
and the nzbad file has "NUL" between every character.
I created a new file with the first 2million rows. Then I ran iconv on it
iconv -f UCS-2LE -t UTF-8 tmp.CSV > tmp_utf.CSV
The new file loads perfectly with no errors using the same commands. Is there any way for me to load the files without the iconv transformation? It is taking a really long time to run iconv.
UCS-2LE is not supported by Netezza. I hope for your sake that UTF-8 is enough for the data you have (no ancient languages or the like?).
You need to focus on doing the conversion faster by:
searching the internet for a more CPU-efficient implementation of (or alternative to) iconv
converting multiple files in parallel (your number of CPU cores minus one is probably the useful maximum; see the sketch after this list). You may need to split the original files before you do it. The Netezza loader prefers relatively large files though, so you may want to put them back together while loading for extra speed in that step :)
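A minimal sketch of the parallel-conversion idea in Python, driving iconv from a process pool; the input glob pattern and the assumption that the files have already been split into manageable pieces are placeholders:

import glob
import os
import subprocess
from multiprocessing import Pool

def convert(path):
    # Run iconv on one file, writing a UTF-8 copy alongside the original.
    out_path = path + ".utf8"
    with open(out_path, "wb") as out:
        subprocess.run(["iconv", "-f", "UCS-2LE", "-t", "UTF-8", path],
                       stdout=out, check=True)
    return out_path

if __name__ == "__main__":
    files = glob.glob("split_parts/*.CSV")       # pre-split input files (assumed)
    workers = max(1, (os.cpu_count() or 2) - 1)  # CPU cores minus one
    with Pool(workers) as pool:
        for done in pool.imap_unordered(convert, files):
            print("converted", done)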

How to read an XML CLOB field of more than 250,000 characters from Oracle into R or SAS?

I need to read in this XML CLOB column from an Oracle table. I tried the simple read-in below:
xmlbefore <- dbGetQuery(conn, "select ID, XML_TXT from XML_table")
But I can only read in about 225,000 characters. When I compare with the sample XML file, it only reads in maybe 2/3 or 3/4 of the entire field.
I assume R has a limit of maybe 225,000 characters, and SAS has even less, maybe only about 1,000 characters.
How can I read in the entire field with all its characters (I think it is about 250,000-270,000)?
SAS dataset variables have a 32k char limit, macro variables 64k. LUA variables in SAS however have no limit (other than memory) so you can read your entire XML file into a single variable in one go.
PROC LUA is available from SAS 9.4M3 (check &sysvlong for details). If you have an earlier version of SAS, you can still process your XML by parsing it a single character at a time (RECFM=N).
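If a Python client happens to be an option alongside R or SAS, here is a hedged sketch using the cx_Oracle driver (a swapped-in alternative, not part of the original answer; the connection details are placeholders). CLOB columns come back as LOB objects whose full contents can be read without the truncation seen above:

import cx_Oracle

# Connection parameters are placeholders.
conn = cx_Oracle.connect("user", "password", "host/service")
cur = conn.cursor()
cur.execute("SELECT ID, XML_TXT FROM XML_table")
for row_id, xml_clob in cur:
    # Read the whole CLOB; lengths well beyond 250,000 characters are fine.
    xml_text = xml_clob.read() if isinstance(xml_clob, cx_Oracle.LOB) else xml_clob
    print(row_id, len(xml_text))
conn.close()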

Hi, I need to know how to insert a large amount of text into SQLite because I am creating an application for a book with Ionic

"i am developing a mobile application with Ionic for a book that has a lot of content, grouping in
several parts, subpart, title and chapter. I chose to use SQLite, and I'm looking for a way to
load all its contents into my SQLite database and if anyone has an idea or suggestion to help me
do things well I'll be delighted."
There are several ways to import a text file, e.g. as a BLOB, on a line-to-row basis, using sqlite's "archive" mode, and so on.
If you want the entire file as a single TEXT cell in a table, then (subject to the limitation described below) you can use .import by carefully selecting values for the separators.
Multibyte separators are not supported when importing, but control characters such as Control-A and Control-B can be used.
So you could proceed like so:
CREATE TABLE text("text" TEXT);
.separator ^A ^B
.import text.txt text
where ^A should be replaced by a literal control-A, and similarly for ^B.
Limitation
The maximum number of bytes in a string or BLOB in SQLite is defined by the preprocessor macro SQLITE_MAX_LENGTH. The default value of this macro is 1 billion (1 thousand million or 1,000,000,000).
The following illustrates how to use the python sqlite3 module to import a (UTF-8) text file:
import sqlite3

def create_table():
    conn.execute("""CREATE TABLE IF NOT EXISTS mytext( mytext TEXT );""")

def insert_file(filename):
    sql = "INSERT INTO mytext(mytext) VALUES(?)"
    cursor.execute(sql, (open(filename, "r").read(),))
    conn.commit()

db_file_name = 'text-file.db'
conn = sqlite3.connect(db_file_name)
cursor = conn.cursor()
create_table()
insert_file("text.txt")
conn.close()
(Tested with python3.)

Easiest way to recast variables in SQLite

New to databasing, so please let me know if I'm going about this entirely wrong.
I want to use databases to store large datasets (I use R to analyze data, which cannot load datasets larger than available RAM), and I'm using SQLite-Manager in Firefox to import .csv files. 99% of the time I use reals, but would like to avoid all the clicking to manually cast each of 100 columns as REAL (the default in SQLite-Manager is TEXT).
Is there a way I can quickly/easily cast all columns as REAL? Thanks!
Why don't you write a script to be interpreted by the SQLite shell?
Run sqlite3 my_db < script.txt with the contents of script.txt as follows:
CREATE TABLE foo(
col1 REAL,
col2 REAL,
[...] generate those lines with a decent text editor
);
.separator ;
.import 'my/csv/file.csv' foo
.quit
Note that the dot-commands of the SQLite shell are listed by ".help". Imports are rudimentary and won't work if you have double quotes (remove them). Only the separator character (, by default) is interpreted as a delimiter, and it cannot be escaped; if needed you can use a multi-character separator.
Also be sure that file.csv is UTF-8 encoded.
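If hand-editing 100 column definitions is a pain, here is a minimal sketch of generating script.txt from the CSV header in Python; the file names and the all-columns-are-REAL assumption are placeholders:

import csv

# Read the header row of the CSV and emit a CREATE TABLE with every column
# typed as REAL, followed by the .import dot-commands.
with open("my/csv/file.csv", newline="") as src:
    header = next(csv.reader(src))

with open("script.txt", "w") as out:
    cols = ",\n".join('  "%s" REAL' % name for name in header)
    out.write("CREATE TABLE foo(\n%s\n);\n" % cols)
    out.write(".separator ,\n")
    out.write(".import 'my/csv/file.csv' foo\n")
    out.write(".quit\n")

# Note: depending on the sqlite3 version, the header row may also be imported
# as data and need to be deleted from the table afterwards.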

Fixing Unicode Byte Sequences

Sometimes when copying stuff into PostgreSQL I get errors about invalid byte sequences.
Is there an easy way, using either vim or other utilities, to detect byte sequences that cause errors such as invalid byte sequence for encoding "UTF8": 0xde70 and whatnot, and possibly an easy way to do a conversion?
Edit:
What my workflow is:
Dumped sqlite3 database (from trac)
Trying to replay it in postgresql
Perhaps there's an easier way?
More Edit:
Also tried these:
Running enca to detect encoding of the file
Told me it was ASCII
Tried iconv to convert from ASCII to UTF-8; got an error
What did work was deleting the couple of erroneous lines it complained about. But that didn't really solve the real problem.
Based on one short sentence, it sounds like you have text in one encoding (e.g. ANSI/ASCII) and you are telling PostgreSQL that it's actually in another encoding (Unicode UTF-8). All the different tools you are using (PostgreSQL, Bash, one or more programming languages, other data from somewhere else, the text editor, the IDE, etc.) have default encodings which may differ, and at some step along the way the proper conversions are not being done. I would check the flow of data wherever it crosses these kinds of boundaries, to ensure that either the encodings line up or the encodings are properly detected and the text is properly converted.
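As a way to locate the offending spots before converting, here is a minimal Python sketch that reports which lines of a dump file are not valid UTF-8 (the file name is a placeholder):

# Scan a file in binary mode and report lines that fail UTF-8 decoding,
# which are the ones PostgreSQL would reject.
with open("dump_file", "rb") as f:
    for line_no, raw in enumerate(f, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            print("line %d, byte offset %d: %s" % (line_no, err.start, err.reason))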
If you know the encoding of the dump file, you can convert it to utf-8 by using recode. For example, if it is encoded in latin-1:
recode latin-1..utf-8 < dump_file > new_dump_file
If you are not sure about the encoding, you should see how sqlite was configured, or maybe try some trial-and-error.
I figured it out. It wasn't really an encoding issue.
SQLite's dump output escapes strings differently than Postgres expects. There were some cases where 'asdf\xd\foo' was output. I believe the '\x' was causing Postgres to treat the following characters as an encoded byte value.
The solution is to dump each table individually in CSV mode with sqlite3.
First
sqlite3 db/trac.db .schema | psql
Now, this does the trick, for the most part, to copy the data back in:
for table in `sqlite3 db/trac.db .schema | grep TABLE | sed 's/.*TABLE \(.*\) (/\1/'`
do
echo ".mode csv\nselect * from $table;" | sqlite3 db/trac.db | psql -c "copy $table from stdin with csv"
done
Yeah, kind of a hack, but it works.
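For completeness, roughly the same per-table copy can be scripted in Python. This is a sketch assuming the psycopg2 driver and a reachable trac Postgres database (connection parameters are placeholders), not a drop-in replacement for the loop above:

import csv
import io
import sqlite3

import psycopg2

src = sqlite3.connect("db/trac.db")
dst = psycopg2.connect(dbname="trac")  # connection parameters are placeholders

# Skip SQLite's internal tables such as sqlite_sequence.
tables = [row[0] for row in src.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type='table' AND name NOT LIKE 'sqlite_%'")]

with dst, dst.cursor() as cur:
    for table in tables:
        # Render the table as CSV in memory, then stream it to Postgres COPY.
        buf = io.StringIO()
        csv.writer(buf).writerows(src.execute('SELECT * FROM "%s"' % table))
        buf.seek(0)
        cur.copy_expert('COPY "%s" FROM STDIN WITH CSV' % table, buf)

src.close()
dst.close()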
