Easiest way to recast variables in SQLite

New to databasing, so please let me know if I'm going about this entirely wrong.
I want to use databases to store large datasets (I use R to analyze data, and it can't load datasets larger than available RAM), and I'm using SQLite-Manager in Firefox to import .csv files. 99% of the time I use reals, but I'd like to avoid all the clicking needed to manually cast each of 100 columns as REAL (the default in SQLite-Manager is TEXT).
Is there a way I can quickly/easily cast all columns as REAL? Thanks!

Why not make a script to be interpreted by the SQLite shell?
Run sqlite3 my_db < script.txt, with the contents of script.txt as follows:
CREATE TABLE foo(
col1 REAL,
col2 REAL,
[...] generate those lines with a decent text editor
);
.separator ;
.import 'my/csv/file.csv' foo
.quit
Note that the dot-commands of the SQLite shell are listed by “.help”. The import facility is rudimentary and won't work if your values contain double quotes (remove them first). The separator is matched literally and cannot be escaped inside the data; if needed you can use a multi-character separator.
Also make sure that file.csv is UTF-8 encoded.
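If you don't want to hand-build the column list at all, you can generate the whole script from the CSV header. A rough shell sketch, assuming a comma-separated, UTF-8 file named data.csv and a target table foo (both placeholders), using the sqlite3 command-line shell:
#!/bin/sh
# Sketch: type every column REAL based on the CSV header row, then import.
CSV=data.csv
TABLE=foo
COLS=$(head -1 "$CSV" | tr -d '\r' | tr ',' '\n' | sed 's/.*/& REAL/' | paste -sd, -)
{
  echo "CREATE TABLE $TABLE ($COLS);"
  echo ".separator ,"
  echo ".import '$CSV' $TABLE"
} | sqlite3 my_db
# Note: older sqlite3 shells also import the header row as data; strip it first
# (e.g. tail -n +2 into a temporary file) or delete that row from the table afterwards.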

Related

loading a UCS-2LE file in Netezza

I have multiple 30 GB / 1-billion-record files that I need to load into Netezza. I am connecting using pyodbc and running the following commands.
create temp table tbl1(id bigint, dt varchar(12), ctype varchar(20), name varchar(100)) distribute on (id)
insert into tbl1
select * from external 'C:\projects\tmp.CSV'
using (RemoteSource 'ODBC' Delimiter '|' SkipRows 1 MaxErrors 10 QuotedValue DOUBLE)
Here's a snippet from the nzlog file
Found bad records
bad #: input row #(byte offset to last char examined) [field #, declaration] diagnostic,
"text consumed"[last char examined]
----------------------------------------------------------------------------------------
1: 2(0) [1, INT8] contents of field, ""[0x00<NUL>]
2: 3(0) [1, INT8] contents of field, ""[0x00<NUL>]
and the nzbad file has "NUL" between every character.
I created a new file with the first 2 million rows, then ran iconv on it:
iconv -f UCS-2LE -t UTF-8 tmp.CSV > tmp_utf.CSV
The new file loads perfectly with no errors using the same commands. Is there any way for me to load the files without the iconv transformation? It is taking a really long time to run iconv.
UCS-2LE is not supported by Netezza; I hope for your sake that UTF-8 is enough for the data you have (no ancient languages or the like?).
You need to focus on doing the conversion faster, by:
searching the internet for a more CPU-efficient alternative to iconv
converting multiple files in parallel (your number of CPU cores minus one is probably the most worth running; see the sketch below). You may need to split the original files before you do it. The Netezza loader prefers relatively large files, though, so you may want to put them back together while loading for extra speed in that step :)
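A rough sketch of the parallel conversion, assuming the extracts are separate files matching part*.CSV (a placeholder pattern), GNU xargs, and a 4-core box (hence -P 3):
# Run one iconv per file, at most three at a time (cores minus one on a 4-core machine).
ls part*.CSV | xargs -P 3 -I{} sh -c 'iconv -f UCS-2LE -t UTF-8 "{}" > "{}.utf8"'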

teradata: how to cast using the length of a column

I need to use the cast function with the length of a column in Teradata.
Say I have a table with the following data:
id | name
1|dhawal
2|bhaskar
I need to use a cast operation, something like:
select cast(name as CHAR(<length of column>)) from table
How can I do that?
thanks
Dhawal
You have to find the length by looking at the table definition - either manually (show table) or by writing dynamic SQL that queries dbc.ColumnsV.
Update:
You can find the maximum length of the actual data using
select max(length(cast(... as varchar(<large enough value>)))) from TABLE
But if this is for FastExport, I think casting as varchar(large-enough-value) and postprocessing to remove the 2-byte length info that FastExport includes is a better solution (since exporting a CHAR() will result in a fixed-length output file with lots of spaces in it).
You may know this already, but just in case: Teradata usually recommends switching to TPT instead of the legacy fexp.
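For reference, a rough sketch of pulling the defined length out of the data dictionary from the shell, assuming BTEQ is available; the logon string, database, table and column names are all placeholders (dbc.ColumnsV exposes the defined length as ColumnLength, but verify the dictionary column names on your release):
# Look up the defined length of column "name" in the dictionary (placeholders throughout).
bteq <<'EOF'
.LOGON mytdpid/myuser,mypassword
SELECT ColumnName, ColumnLength
FROM dbc.ColumnsV
WHERE DatabaseName = 'mydb'
  AND TableName = 'mytable'
  AND ColumnName = 'name';
.LOGOFF
.QUIT
EOF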

Exporting a SAS data set to UNIX as a text file with delimiter '~|~'

I'm trying to export a SAS data set to a UNIX folder as a text file with '~|~' as the delimiter.
Here is the code I'm using:
PROC EXPORT DATA=Exp_TXT
OUTFILE="/fbrms01/dev/projects/tadis003/Export_txt_OF_New.txt"
DBMS=DLM REPLACE;
DELIMITER="~|~";
PUTNAMES=YES;
RUN;
Here is the output I'm getting on UNIX. Only part of the delimiter shows up in the data rows, but the variable names get the whole delimiter:
Num~|~Name~|~Age
1~A~10
2~B~11
3~C~12
Any idea why I'm getting only part of the delimiter in the data?
Thanks,
Sam.
My guess is that PROC EXPORT does not support using multiple character delimiters. Normally, column delimiters are just a single character. So, you will probably need to write your own code to do this.
PROC EXPORT for delimited files generates plain old SAS code that is then executed. You should see the code in the SAS log, from where you can grab it and alter it as needed.
Please see my answer to this other question for a SAS macro that might help you. You cannot use it exactly as written, but it should help you create a version that meets your needs.
The problem is referenced on the SAS manual page for the FILE statement
http://support.sas.com/documentation/cdl/en/lestmtsref/63323/HTML/default/viewer.htm#n15o12lpyoe4gfn1y1vcp6xs6966.htm
Restriction: Even though a character string or character variable is accepted, only the first character of the string or variable is used as the output delimiter. The FILE DLM= processing differs from INFILE DELIMITER= processing.
However, there is (as of some version, anyhow) a newer FILE statement option, DLMSTR=. Unfortunately you can't use DLMSTR in PROC EXPORT, but if you can't easily write the variables out yourself, you can take the generated code from the log of a PROC EXPORT run, paste it into your program, and change DELIMITER to DLMSTR. You could even do so dynamically: use PROC PRINTTO to send the log to a file, read that file in, parse out the line numbers and the non-code, change DELIMITER to DLMSTR, and %include the result.
Since you are using UNIX, why not make use of UNIX tools to fix this?
You can call a UNIX command from your SAS program with the X statement:
http://support.sas.com/documentation/cdl/en/hostunx/61879/HTML/default/viewer.htm#xcomm.htm
After your export, use sed to fix the file:
PROC EXPORT DATA=Exp_TXT
OUTFILE="/fbrms01/dev/projects/tadis003/Export_txt_OF_New.txt"
DBMS=DLM REPLACE;
DELIMITER="~";
PUTNAMES=YES;
RUN;
X sed 's/~/~|~/g' /fbrms01/dev/projects/tadis003/Export_txt_OF_New.txt > /fbrms01/dev/projects/tadis003/Export_txt_OF_New_v2.txt ;
It might take tweaking depending on your unix, but this works on AIX. Some versions of sed can use the -i flag to edit in place so you don't have to type out the filename twice.
It is a much simpler and easier single-line solution than a big macro.
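For example, the in-place variant would look like this (a sketch assuming GNU sed; the stock AIX sed may not support -i):
# Edit the exported file in place instead of writing a second copy.
sed -i 's/~/~|~/g' /fbrms01/dev/projects/tadis003/Export_txt_OF_New.txt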

creating text files from sqlite data

I have started to use (or, try to use) SQLite for a simple catalog. What I want to do is take the information for each catalogued item out of the SQLite database and export it to a text file.
e.g.
Title1, Genre1, Author1
Title2, Genre2, Author2
Title3, Genre3, Author3
I don't want these to be in aligned columns, just a single line per item. Also, is there a way to use multiple different separators?
This seems like it should be relatively easy to do, but I am totally new to this and can't figure it out.
sqlite3 -list -separator ', ' db.db 'select * from thetable'
should do.
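To send that to a text file, and to mix different separators, you can also build each line in the SELECT itself and redirect the output. A sketch, assuming the table is called thetable with columns title, genre and author (placeholder names):
# One line per item, with different separators, written to catalog.txt.
sqlite3 db.db "select title || ', ' || genre || ' - ' || author from thetable" > catalog.txt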
Like #hacker said, you can use sqlite3 -list when you need to specify the delimiter character. You can also use a simpler output format such as sqlite3 -csv if you don't need to be terribly specific.
You probably want to refer to the SQLite command-line documentation for more information about generating textual output from the interactive SQLite shell, or you could man sqlite3, which covers similar information.

Fixing Unicode Byte Sequences

Sometimes when copying stuff into PostgreSQL I get errors about invalid byte sequences.
Is there an easy way, using vim or other utilities, to detect the byte sequences that cause errors such as invalid byte sequence for encoding "UTF8": 0xde70 and whatnot, and possibly an easy way to do a conversion?
Edit:
What my workflow is:
Dumped sqlite3 database (from trac)
Trying to replay it in postgresql
Perhaps there's an easier way?
More Edit:
Also tried these:
Running enca to detect encoding of the file
Told me it was ASCII
Tried iconv to convert from ASCII to UTF8. Got an error
What did work was deleting the couple of erroneous lines it complained about, but that didn't really solve the underlying problem.
Based on one short sentence, it sounds like you have text in one encoding (e.g. ANSI/ASCII) and you are telling PostgreSQL that it's actually in another encoding (UTF-8). All the tools involved (PostgreSQL, Bash, one or more programming languages, other data from somewhere else, the text editor, the IDE, etc.) have default encodings which may differ, and somewhere along the way the proper conversions are not being done. I would check the flow of data wherever it crosses these kinds of boundaries, to ensure that either the encodings line up, or the encodings are properly detected and the text is properly converted.
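To quickly locate the offending lines in the dump, one option (a sketch, assuming GNU grep and a UTF-8 locale; dump.sql is a placeholder name) is:
# Print, with line numbers, every line that is not valid UTF-8.
LC_ALL=en_US.UTF-8 grep -naxv '.*' dump.sql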
If you know the encoding of the dump file, you can convert it to utf-8 by using recode. For example, if it is encoded in latin-1:
recode latin-1..utf-8 < dump_file > new_dump_file
If you are not sure about the encoding, you should see how sqlite was configured, or maybe try some trial-and-error.
I figured it out. It wasn't really an encoding issue.
SQLite escapes strings in its output differently than Postgres expects. There were some cases where 'asdf\xd\foo' was output; I believe the '\x' was causing Postgres to expect the following characters to be a Unicode escape.
The solution is to dump each table individually in CSV mode from sqlite3.
First
sqlite3 db/trac.db .schema | psql
Now, this does the trick, for the most part, to copy the data back in:
for table in `sqlite3 db/trac.db .schema | grep TABLE | sed 's/.*TABLE \(.*\) (/\1/'`
do
printf '.mode csv\nselect * from %s;\n' "$table" | sqlite3 db/trac.db | psql -c "copy $table from stdin with csv"
done
Yeah, kind of a hack, but it works.
