SQLite3 Import CSV & exclude/skip header

I'm trying to get my data files (of which there are a dozen or so) into tables within SQLite. Each file has a header and I'll be receiving them a few times over the coming year so I'd like to:
Avoid editing each file to remove the header when I receive them;
Avoid falling back on shell scripts or Python to do this.
I define my table and import data...
> .separator "\t"
> .headers on
> CREATE TABLE clinical(
patid VARCHAR(20),
eventdate CHAR(10),
sysdate CHAR(10),
constype INT,
consid INT,
medcode INT,
staffid VARCHAR(20),
textid INT,
episode INT,
enttype INT,
adid INT);
> .import "Sample_Clinical001.txt" clinical
> SELECT * FROM clinical LIMIT 10;
patid eventdate sysdate constype consid medcode staffid textid episode enttype adid
patid eventdate sysdate constype consid medcode staffid textid episode enttype adid
471001 30/01/1997 09/03/1997 4 68093 180 0 0 0 20 11484
471001 30/01/1997 09/03/1997 2 68093 60 0 0 0 4 11485
My first thought was to DELETE the offending row, but that didn't work as expected. Instead, it deleted every row in the table...
> DELETE FROM clinical WHERE patid = "patid";
> SELECT * FROM clinical LIMIT 3;
>
Did I get the syntax for testing equality wrong? I'm not sure; the docs don't seem to distinguish between = and ==. I thought I'd try again ...
> .import "Sample_Clinical001.txt" clinical
> SELECT * FROM clinical LIMIT 3;
patid eventdate sysdate constype consid medcode staffid textid episode enttype adid
patid eventdate sysdate constype consid medcode staffid textid episode enttype adid
471001 30/01/1997 09/03/1997 4 68093 180 0 0 0 20 11484
471001 30/01/1997 09/03/1997 2 68093 60 0 0 0 4 11485
> DELETE FROM clinical WHERE patid == "patid";
> SELECT * FROM clinical LIMIT 3;
>
Am I even on the correct track here or am I doing something stupid?
I would have expected there to be an easy option to skip the header row when calling .import as having header rows in text files is a fairly common situation.

patid is a column name.
"patid" is a quoted column name.
'patid' is a string.
The condition WHERE patid = "patid" compares the value in the patid column with itself.
(SQLite allows strings with double quotes for compatibility with MySQL, but only where a string cannot be confused with a table/column name.)
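The difference between the two quoting styles can be checked with a small script. This is a minimal sketch using Python's standard sqlite3 module and a cut-down, two-column version of the clinical table (the reduced schema and the sample rows are assumptions for illustration only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down version of the clinical table (illustrative assumption).
conn.execute("CREATE TABLE clinical (patid VARCHAR(20), constype INT)")
conn.executemany(
    "INSERT INTO clinical VALUES (?, ?)",
    [("patid", "constype"), ("471001", 4), ("471001", 2)],
)

# Double quotes: "patid" resolves to the column, so every non-NULL row matches.
matched_identifier = conn.execute(
    'SELECT COUNT(*) FROM clinical WHERE patid = "patid"'
).fetchone()[0]

# Single quotes: 'patid' is a string literal, so only the header row matches.
matched_literal = conn.execute(
    "SELECT COUNT(*) FROM clinical WHERE patid = 'patid'"
).fetchone()[0]

# Deleting with the string literal removes just the imported header row.
conn.execute("DELETE FROM clinical WHERE patid = 'patid'")
remaining = conn.execute(
    "SELECT patid, constype FROM clinical ORDER BY rowid"
).fetchall()
print(matched_identifier, matched_literal, remaining)
```

In the sqlite3 shell the same fix is simply DELETE FROM clinical WHERE patid = 'patid';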

This worked for me:
.read schema.sql
.mode csv
.import --skip 1 artist_t.csv artist_t
or if you just have one file to import, you can do it like this:
.import --csv --skip 1 artist_t.csv artist_t
https://sqlite.org/cli.html#importing_csv_files

An alternative to @steven-penny's answer:
You can also pipe the file through a shell command during the import, so that the header line is dropped before SQLite sees it:
.mode csv
.import '| tail -n +2 artist_t.csv' artist_t

Import the CSV into a new table, then copy that table's data into the original target table. Will that work?
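One way the staging-table idea could look is sketched below with Python's sqlite3 module. The table name clinical_staging, the three-column schema, and the in-memory rows standing in for the imported file are all hypothetical; in the sqlite3 shell the staging table would be filled by .import and only the INSERT ... SELECT would be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Target table (shortened to three columns for the sketch).
conn.execute(
    "CREATE TABLE clinical (patid VARCHAR(20), eventdate CHAR(10), constype INT)"
)
# Hypothetical staging table that receives the raw file, header row included.
conn.execute(
    "CREATE TABLE clinical_staging (patid TEXT, eventdate TEXT, constype TEXT)"
)
raw_rows = [
    ("patid", "eventdate", "constype"),  # header line from the file
    ("471001", "30/01/1997", "4"),
    ("471001", "30/01/1997", "2"),
]
conn.executemany("INSERT INTO clinical_staging VALUES (?, ?, ?)", raw_rows)

# Copy everything except the header row into the real table, then clean up.
conn.execute(
    "INSERT INTO clinical "
    "SELECT patid, eventdate, constype FROM clinical_staging "
    "WHERE patid <> 'patid'"
)
conn.execute("DROP TABLE clinical_staging")

copied = conn.execute(
    "SELECT patid, eventdate, constype FROM clinical ORDER BY rowid"
).fetchall()
print(copied)
```

The INT column affinity on the target table converts the text values '4' and '2' to integers during the copy.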

Related

Error with csv file while importing it into sqlite3

I am trying to import a CSV file into a table with sqlite3 and I get an error. Below are the table schema, the command I use to import the file, and the error, in that order:
CREATE TABLE Artist(
Art_ID int not null primary key,
Art_Name varchar(30),
Followers int,
Art_Genres varchar(200),
NumAlbums int,
YearFirstAlbum int,
Gender char(1),
Group_Solo varchar(5)
);
sqlite>
sqlite> .import '| tail -n +2 /Users/adrianogiunta/Desktop/artistDF.csv' Artist
<pipe>:1: expected 8 columns but found 1 - filling the rest with NULL
<pipe>:2: expected 8 columns but found 1 - filling the rest with NULL
<pipe>:3: expected 8 columns but found 1 - filling the rest with NULL
<pipe>:4: expected 8 columns but found 1 - filling the rest with NULL
<pipe>:5: expected 8 columns but found 1 - filling the rest with NULL
<pipe>:6: expected 8 columns but found 1 - filling the rest with NULL
and this for the rest of the 1035 rows.
These are the first lines of the csv file:
X,Artist,Followers,Genres,NumAlbums,YearFirstAlbum,Gender,Group.Solo
0,Ed Sheeran,52698756,"pop,uk pop",8,2011,M,Solo
1,Justin Bieber,30711450,"canadian pop,dance pop,pop,post-teen pop",10,2009,M,Solo
2,Jonas Brothers,3069527,"boy band,dance pop,pop,post-teen pop",10,2006,M,Group
3,Drake,41420478,"canadian hip hop,canadian pop,hip hop,pop rap,rap,toronto rap",11,2010,M,Solo
4,Chris Brown,9676862,"dance pop,pop,pop rap,r&b,rap",6,2005,M,Solo
5,Taylor Swift,23709128,"dance pop,pop,post-teen pop",10,2006,F,Solo
This is what my table shows afterward:
sqlite> SELECT * FROM Artist LIMIT 5;
0,Ed Sheeran,52698756,"pop,uk pop",8,2011,M,Solo|||||||
1,Justin Bieber,30711450,"canadian pop,dance pop,pop,post-teen pop",10,2009,M,Solo|||||||
2,Jonas Brothers,3069527,"boy band,dance pop,pop,post-teen pop",10,2006,M,Group|||||||
3,Drake,41420478,"canadian hip hop,canadian pop,hip hop,pop rap,rap,toronto rap",11,2010,M,Solo|||||||
4,Chris Brown,9676862,"dance pop,pop,pop rap,r&b,rap",6,2005,M,Solo|||||||
sqlite>
Thanks in advance for the help!
By default, .import splits each line on the shell's current column separator, which is "|" in the default list mode, so every CSV line ends up as a single field. Enter .mode csv before importing a CSV file, as the documentation says.
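The "expected 8 columns but found 1" messages are consistent with each line being split on a separator it does not contain. A rough illustration in Python, assuming the shell's default "|" separator was in effect and using the first two lines of the file from the question:

```python
import csv
import io

sample = (
    "X,Artist,Followers,Genres,NumAlbums,YearFirstAlbum,Gender,Group.Solo\n"
    '0,Ed Sheeran,52698756,"pop,uk pop",8,2011,M,Solo\n'
)
first_data_line = sample.splitlines()[1]

# Splitting on "|" finds no separator, so the whole line is one field.
pipe_fields = first_data_line.split("|")

# A CSV parser splits on commas and keeps the quoted genre list together.
csv_rows = list(csv.reader(io.StringIO(sample)))
print(len(pipe_fields), len(csv_rows[1]), csv_rows[1][3])
```

Python's csv module is only a stand-in here for what .mode csv asks the shell to do; it is not how the shell is implemented.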

How can you add multiple rows in SQLcl?

I am trying to add multiple rows to my table. I tried to follow some of the online solutions, but I keep getting ORA-00933: SQL command not properly ended.
How do I add multiple rows at once?
insert into driver_detail values(1003,'sajuman','77f8s0990',1),
(1004,'babu ram coi','2g64s8877',8);
INSERT ALL is one way to go.
SQL> create table driver_detail (id integer, text1 varchar2(20), text2 varchar2(20), some_num integer);
Table DRIVER_DETAIL created.
SQL> insert all
2 into driver_detail (id, text1, text2, some_num) values (1003, 'sajuman', '77f8s0090', 1)
3 into driver_detail (id, text1, text2, some_num) values (1004, 'babu ram coi', '2g64s887', 8)
4* select * from dual;
2 rows inserted.
SQL> commit;
Commit complete.
SQL> select * from driver_detail;
ID TEXT1 TEXT2 SOME_NUM
_______ _______________ ____________ ___________
1003 sajuman 77f8s0090 1
1004 babu ram coi 2g64s887 8
But SQLcl is a modern CLI for the Oracle Database, surely there might be a better way?
Yes.
Put your rows into a CSV.
Use the LOAD command.
SQL> delete from driver_detail;
0 rows deleted.
SQL> help load
LOAD
-----
Loads a comma separated value (csv) file into a table.
The first row of the file must be a header row. The columns in the header row must match the columns defined on the table.
The columns must be delimited by a comma and may optionally be enclosed in double quotes.
Lines can be terminated with standard line terminators for windows, unix or mac.
File must be encoded UTF8.
The load is processed with 50 rows per batch.
If AUTOCOMMIT is set in SQLCL, a commit is done every 10 batches.
The load is terminated if more than 50 errors are found.
LOAD [schema.]table_name[#db_link] file_name
SQL> load hr.driver_detail /Users/thatjeffsmith/load_example.csv
--Number of rows processed: 4
--Number of rows in error: 0
0 - SUCCESS: Load processed without errors
SQL> select * from driver_detail;
ID TEXT1 TEXT2 SOME_NUM
_______ _________________ ______________ ___________
1003 'sajuman' '77f8s0990' 1
1004 'babu ram coi' '2g64s8877' 8
1 'hello' 'there' 2
2 'nice to' 'meet you' 3
SQL>

Will a SQLite join select necessarily fully scan a table?

Here is my case:
CREATE TABLE estimateperiod(estimatePeriodId int, periodTypeId int, companyId int, fiscalChainSeriesId int, fiscalQuarter int,fiscalYear int, calendarQuarter int, calendarYear int, periodEndDate datetime,advanceDate datetime);
CREATE INDEX estimateperiod_estimateperiodid_companyid on estimateperiod(estimateperiodid, companyid);
CREATE TABLE isinenhancedsymbol(symbolid int, symboltypeid int, symbolvalue char(64), relatedcompanyid char(64), exchangeid int, objectid int, symbolstartdate date, symbolenddate date, activeflag int);
CREATE INDEX isinenhancedsymbol_relatedcompanyid_isin on isinenhancedsymbol(relatedcompanyid, symbolvalue);
when I run this:
sqlite> explain query plan select ep.estimateperiodid, ep.companyid , isin.symbolvalue from estimateperiod ep, isinenhancedsymbol isin where ep.estimateperiodid = 100 and ep.companyid = isin.relatedcompanyid;
orde from deta
---- ------------- ----
0 1 TABLE isinenhancedsymbol AS isin
1 0 TABLE estimateperiod AS ep WITH INDEX estimateperiod_estimateperiodid_companyid
So the isinenhancedsymbol table is fully scanned, which takes a long time. All the selected fields are in a covering index, so why can't isinenhancedsymbol be searched using the index?
SQLite version 3.6.20 is quite a few years out of date.
Covering indexes are supported with any somewhat current version:
sqlite> .eqp on
sqlite> select ep.estimateperiodid, ep.companyid , isin.symbolvalue from estimateperiod ep, isinenhancedsymbol isin where ep.estimateperiodid = 100 and ep.companyid = isin.relatedcompanyid;
--EQP-- 0,0,0,SEARCH TABLE estimateperiod AS ep USING COVERING INDEX estimateperiod_estimateperiodid_companyid (estimatePeriodId=?)
--EQP-- 0,1,1,SCAN TABLE isinenhancedsymbol AS isin USING COVERING INDEX isinenhancedsymbol_relatedcompanyid_isin

Trouble with Sqlite subquery

My CustomTags table may have a series of "temporary" records where Tag_ID is 0, and Tag_Number will have some five digit value.
Periodically, I want to clean up my Sqlite table to remove these temporary values.
For example, I might have:
Tag_ID Tag_Number
0 12345
0 67890
0 45678
1 12345
2 67890
In this case, I want to remove the first two records because they are duplicated with actual Tag_ID 1 and 2. But I don't want to remove the third record yet because it hasn't been duplicated yet.
I have tried a number of different types of subqueries, but I just can't get it working. This is the last thing I tried, but my database client complains of an unknown syntax error. (I have tried with and without AS as an alias)
DELETE FROM CustomTags t1
WHERE t1.Tag_ID = 0
AND (SELECT COUNT(*) FROM CustomTags t2 WHERE t1.Tag_Number = t2.Tag_Number) > 1
Can anyone offer some insight? Thank you
There are many options, but the simplest is probably to use EXISTS:
DELETE FROM CustomTags
WHERE Tag_ID = 0
AND EXISTS(
SELECT 1 FROM CustomTags c
WHERE c.Tag_ID <> 0 AND c.Tag_Number = CustomTags.Tag_Number
)
...or IN...
DELETE FROM CustomTags
WHERE Tag_ID = 0
AND Tag_Number IN (
SELECT Tag_Number FROM CustomTags WHERE Tag_ID <> 0
)
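As a quick check, the EXISTS variant can be run against the sample rows from the question. This sketch uses Python's sqlite3 module; the INTEGER column types are an assumption, since the question does not show the table definition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the question does not give the schema.
conn.execute("CREATE TABLE CustomTags (Tag_ID INTEGER, Tag_Number INTEGER)")
conn.executemany(
    "INSERT INTO CustomTags VALUES (?, ?)",
    [(0, 12345), (0, 67890), (0, 45678), (1, 12345), (2, 67890)],
)

# Delete temporary rows (Tag_ID = 0) whose Tag_Number also exists on a real tag.
conn.execute(
    """
    DELETE FROM CustomTags
    WHERE Tag_ID = 0
      AND EXISTS (
        SELECT 1 FROM CustomTags c
        WHERE c.Tag_ID <> 0 AND c.Tag_Number = CustomTags.Tag_Number
      )
    """
)

remaining = conn.execute(
    "SELECT Tag_ID, Tag_Number FROM CustomTags ORDER BY Tag_ID, Tag_Number"
).fetchall()
print(remaining)
```

The temporary row for 45678 is kept because no row with a non-zero Tag_ID shares its Tag_Number.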
With your dataset like so:
sqlite> select * from test;
tag_id tag_number
---------- ----------
1 12345
1 67890
0 12345
0 67890
0 45678
You can run:
delete from test
where rowid not in (
select a.rowid
from test a
inner join (select tag_number, max(tag_id) as mt from test group by tag_number) b
on a.tag_number = b.tag_number
and a.tag_id = b.mt
);
Result:
sqlite> select * from test;
tag_id tag_number
---------- ----------
1 12345
1 67890
Please do test this out with a few more test cases than you have to be entirely sure that's what you want. I'd recommend creating a copy of your database before you run this on a large dataset.

SQLite date compare very strange

I store dates as String in my database, in this format:
YYYY-MM-DD HH:MM
and in my db I have rows (all columns are strings):
COL1 | COL2
----------------------------------
'2012-06-21 18:53' | 'item1'
'2012-06-21 18:54' | 'item2'
'2012-06-21 18:55' | 'item3'
Now I want to compare these stored dates (well, strings), and this is very very strange:
this query
select *
from MyTable
where col1 > Datetime('2012-06-21 18:53')
returns 2 rows (all except first) - this is correct.
but this query
select *
from MyTable
where col1 >= Datetime('2012-06-21 18:53')
also returns only 2 rows, but it should return all 3 rows, as I used >= instead of >.
What did I do wrong?
sqlite> SELECT datetime('2012-06-21 18:53');
2012-06-21 18:53:00
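datetime() returns a string in a different format from the values stored in your table: it appends the seconds, and as text '2012-06-21 18:53' sorts before '2012-06-21 18:53:00', so the >= test fails for the first row. You can use just the string in the WHERE clause, e.g.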
select *
from MyTable
where col1 >= '2012-06-21 18:53'
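The effect can be reproduced with the three rows from the question. A minimal sketch with Python's sqlite3 module, assuming both columns are declared as TEXT (the question only says they are strings):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Both columns assumed to be TEXT, as the question describes them as strings.
conn.execute("CREATE TABLE MyTable (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [
        ("2012-06-21 18:53", "item1"),
        ("2012-06-21 18:54", "item2"),
        ("2012-06-21 18:55", "item3"),
    ],
)

# datetime() normalizes its argument and appends the seconds.
normalized = conn.execute(
    "SELECT datetime('2012-06-21 18:53')"
).fetchone()[0]

# Comparing against the normalized value skips the 18:53 row.
with_datetime = conn.execute(
    "SELECT COUNT(*) FROM MyTable WHERE col1 >= datetime('2012-06-21 18:53')"
).fetchone()[0]

# Comparing against a literal in the stored format matches all three rows.
with_string = conn.execute(
    "SELECT COUNT(*) FROM MyTable WHERE col1 >= '2012-06-21 18:53'"
).fetchone()[0]
print(normalized, with_datetime, with_string)
```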
