SQLite: Stop insertion on "expected n columns but found m" error

I'm using the .import command to insert a large CSV into an SQLite db. Unfortunately, some of the lines contain the delimiter inside the value for a field. For example, each line is in the form:
id, title, first name, last name, location
but some lines have values like:
1, Mr, Bob, Saget, Sydney, Australia
and the comma in the location field causes an expected 5 columns but found 6 error. Luckily, it's not important to me to insert every line, so I'd like to just not insert any lines that raise this error. Is this possible?
Parsing the csv with regex is a last resort as the file is very large and could take minutes every time I need to import it.

Related

Error loading data to neo4j with a CASE statement in a Cypher query

I'm trying to upload CSV data with a CASE statement in the query, but the following error appears:
cypher:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///test.csv' as line
MATCH(a:test_t{tid:line.pid})
CASE
WHEN line.key !='NA' THEN
WITH split(line.key,",") as name
UNWIND name as x
MERGE(k:test_key{key_term:toLower(x)})
MERGE(a)-[:contains]->(k)
END
Error
Neo.ClientError.Statement.SyntaxError: Invalid input 'S': expected 'l/L' (line 5, column 3 (offset: 137))
"CASE"
Can anyone help me?
CASE is an expression, so it does not support embedding other Cypher clauses (but it can invoke functions). In fact, a CASE expression is not actually needed for your use case.
This query should work (the :auto at the beginning is needed in neo4j 4.0+):
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///test.csv' as line FIELDTERMINATOR ';'
WITH line
WHERE line.key <> 'NA'
MATCH (a:test_t {tid: line.pid})
UNWIND split(line.key, ',') as x
MERGE (k:test_key {key_term: toLower(x)})
MERGE (a)-[:contains]->(k)
This query filters out all unwanted lines as soon as they are obtained from the file. Reducing the number of rows of data being worked on as early as possible is good practice.
Also, you have a second issue. Your data file cannot use the comma as both the (default) field terminator AND as the delimiter between your x values.
To resolve this ambiguity, the query above uses the FIELDTERMINATOR ';' option to specify that the ";" character is used as the field terminator. A sample data file would look like this:
pid;key
123;NA
234;Foo,Bar
345;Bar,Baz
456;NA
567;Baz
You are using CASE incorrectly. You cannot have update clauses inside a CASE expression. Instead you can use a WHERE clause to filter the rows of the file. For instance, adding WHERE line.key <> 'NA' while processing the file, before you move on to the update, will work. Something like this should fit the bill.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///test.csv' as line
MATCH (a:test_t {tid: line.pid})
WITH line
WHERE line.key <> 'NA'
WITH split(line.key, ",") as name
UNWIND name as x
MERGE (k:test_key {key_term: toLower(x)})
MERGE (a)-[:contains]->(k)
It looks like, from your logic, you could even move the test up above the MATCH. So this might be better (fewer unnecessary matches).
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///test.csv' as line
WITH line
WHERE line.key <> 'NA'
MATCH (a:test_t {tid: line.pid})
WITH split(line.key, ",") as name
UNWIND name as x
MERGE (k:test_key {key_term: toLower(x)})
MERGE (a)-[:contains]->(k)

Adding values to an existing database table in RSQLite

I am new to RSQLite.
I have an input document in text format in which values are separated by '|'.
I created a table with the required variables (dummy code as follows)
library(RSQLite)
db <- dbConnect(SQLite(), dbname = "test.sqlite")
dbSendQuery(conn = db,
"CREATE TABLE TABLE1(
MARKS INTEGER,
ROLLNUM INTEGER,
NAME CHAR(25),
DATED DATE)"
)
However I am stuck at how to import values into the created table.
I cannot use an INSERT INTO ... VALUES command as there are thousands of rows and more than 20 columns in the original data file, and it is impossible to manually type in each data point.
Can someone suggest an alternative efficient way to do so?
You are using a scripting language. The whole point of that is literally to avoid manually typing each data point. Sorry.
You have two routes:
1: You have correctly opened a database connection and created an empty table in your SQLite database. Nice!
To load data into the table, load your text file into R using e.g. df <- read.table('textfile.txt', sep='|') (modify arguments to fit your text file).
To have a 'dynamic' INSERT statement, you can use placeholders. RSQLite allows both named and positional placeholders. To insert a single row, you can do:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (?, ?, ?);', params = list(1, 16, 'Big fellow'))
You see? The first ? got value 1, the second ? got value 16, and the last ? got the string Big fellow. Also note that you do not enclose placeholders for text in quotation marks (' or ")!
Now, you have thousands of rows. Or just more than one. Either way, you can send in your data frame. dbSendQuery has some requirements. 1) That each vector has the same number of entries (not an issue when providing a data.frame). And 2) You may only submit the same number of vectors as you have placeholders.
I assume your data frame, df, contains columns mark, roll, and name, corresponding to the table's columns. Then you may run:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (:mark, :roll, :name);', params = df)
This will execute an INSERT statement for each row in df!
TIP! Because an INSERT statement is executed for each row, inserting thousands of rows can take a long time: after each insert, data is written to file and indices are updated. Instead, enclose the inserts in a transaction:
dbBegin(db)
res <- dbSendQuery(db, 'INSERT ...;', df)
dbClearResult(res)
dbCommit(db)
and SQLite will save the data to a journal file and only commit the result when you execute dbCommit(db). Try both methods and compare the speed!
2: Ah, yes. The second way. This can be done in SQLite entirely.
With the SQLite command-line utility (sqlite3 from your command line, not R), you can attach a text file as a table and simply run an INSERT INTO ... SELECT ...; command. Alternatively, read the text file in sqlite3 into a temporary table and run an INSERT INTO ... SELECT ...; (a sketch follows below).
Useful site to remember: http://www.sqlite.com/lang.html
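A minimal sketch of that second route, assuming a pipe-delimited file named data.txt whose first line holds the column names MARKS, ROLLNUM, NAME, DATED (the file name and the header assumption are illustrative, not from the question):
.separator |
.import data.txt staging
-- .import created the staging table from the file's header row; copy it across
INSERT INTO TABLE1 (MARKS, ROLLNUM, NAME, DATED)
SELECT MARKS, ROLLNUM, NAME, DATED FROM staging;
DROP TABLE staging;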
A little late to the party, but DBI provides dbAppendTable() which will write the contents of a dataframe to an SQL table. Column names in the dataframe must match the field names in the database. For your example, the following code would insert the contents of my random dataframe into your newly created table.
library(DBI)
db <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbExecute(db,
"CREATE TABLE TABLE1(
MARKS INTEGER,
ROLLNUM INTEGER,
NAME TEXT
)"
)
df <- data.frame(MARKS = sample(1:100, 10),
ROLLNUM = sample(1:100, 10),
NAME = stringi::stri_rand_strings(10, 10))
dbAppendTable(db, "TABLE1", df)
I don't think there is a nice way to do a large number of inserts directly from R. SQLite does have a bulk insert functionality, but the RSQLite package does not appear to expose it.
From the command line you may try the following:
.separator |
.import your_file.csv your_table
where your_file.csv is the CSV (or pipe delimited) file containing your data and your_table is the destination table.
See the documentation under CSV Import for more information.

SQL*Loader: how to use SUBSTR in a WHEN clause correctly?

I'm having a problem that I thought to be rather common, but trying to look it up in the "Oracle Database 10g2 Utilities_b14215.pdf" didn't help. After that I've surfed through numerous threads but no luck so far.
I have a tab-delimited file (x'09') with e.g. name, userid, persnr. The values for the userids begin with either P, R or T, e.g. P2198, P2199, R7288, T1229.
I want to load only the records with userids beginning with P.
Isolating a single record with a control file like this works splendidly:
OPTIONS (SKIP=1)
LOAD DATA
INFILE UserlistLoader.dat
APPEND
INTO TABLE Z_USERLIST
WHEN USERID = 'P2198'
FIELDS TERMINATED BY x'09'
TRAILING NULLCOLS
(name, userid, persnr)
But every attempt at using SUBSTR in the WHEN clause fails.
This:
OPTIONS (SKIP=1)
LOAD DATA
INFILE UserlistLoader.dat
APPEND
INTO TABLE Z_USERLIST
WHEN SUBSTR(USERID, 1, 1) = 'P'
FIELDS TERMINATED BY x'09'
TRAILING NULLCOLS
(name, userid, persnr)
ends in an SQL*Loader-350 syntax error.
This
OPTIONS (SKIP=1)
LOAD DATA
INFILE UserlistLoader.dat
APPEND
INTO TABLE Z_USERLIST
WHEN "SUBSTR(:USERID, 1, 1)" = 'P'
FIELDS TERMINATED BY x'09'
TRAILING NULLCOLS
(name, userid, persnr)
ends in an SQL*Loader-403: Referenced column USERID not present in table Z_USERLIST.
But IT IS PRESENT - as the first example proves. I've found that the column should be preceded by : but that obviously isn't the issue.
What am I doing wrong?
From the SQL*Loader docs, the left-hand side of a WHEN condition can only be a full field name (e.g. USERID) or a position spec (e.g. (3:5)).
The docs aren't very clear though on what is allowed - e.g. can LIKE be used as the operator?
USERID LIKE 'P%'
I strongly suspect it can't though.
I would load the entire file into a staging table that matches the file layout, then run a procedure that inserts the rows you want from there into the production table. That is a more common way to handle loads with criteria like this without having to edit source data.
If you can preprocess the source file, move the userid to the first field, or copy the first letter of the userid to its own field, and construct the WHEN like this so sqlldr looks at the first position (this will cause sqlldr to return non-zero though, as not all rows meet the WHEN clause criteria):
WHEN (1) = 'P'
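For reference, a rough sketch of the staging-table approach mentioned above, assuming the whole file has already been loaded (without a WHEN clause) into a staging table called Z_USERLIST_STG with the same three columns; the staging table name is made up for the example:
-- copy only the rows whose userid starts with 'P' into the production table
INSERT INTO z_userlist (name, userid, persnr)
SELECT name, userid, persnr
FROM z_userlist_stg
WHERE userid LIKE 'P%';
COMMIT;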

BizTalk Varying Length Flat File using Single Schema for Transform

I have a pipe delimited .txt Flat File that I'm using to do bulk insert to SQL. Everything works well for straight one to one. However, the Flat File now contains 2 new fields that can repeat an unknown number of times.
Is there a way to create a single flat file schema where I can have an unbounded child within the main unbounded child? I think the place I'm getting tripped up is how to make the ChildRoot listed below just a "group heading" like Root, where ChildRoot doesn't correspond to a location in the flat file. How do I insert something like that?
Schema:
-Roots
--Root (unbounded)
---ChildID
---ChildName
Roots gets a direct link to my SQL stored procedure to do a bulk insert on as many "Root" rows as come in.
Now I have:
Schema:
-Roots
--Root (unbounded)
---Child
---ChildName
---ChildRoot (unbounded)
----ChildRootID
----ChildRootName
EDIT:
I should also add that ChildRootID & ChildRootName can repeat an indefinite number of times until the row delimiter (carriage return) is found
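For illustration, a single input line for this layout might look like the following, where the first two fields are Child and ChildName and the remaining ID/name pairs repeat until the carriage return (all values are invented):
1|Widget|10|Red|11|Blue|12|Green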

Integer zero, "0", will be ignored when uploaded to SQL Server

I have a page that allows users to upload an Excel file and insert the data from the Excel file into SQL Server. Now I have a small issue: there is a column in the Excel file with values such as "001", "029", "236". When it's inserted into SQL Server, the leading zero is ignored, so the data becomes "1", "29", "236". The data type for the column in SQL is varchar(10). How do I solve this?
Excel seems to automatically convert cell values to numbers. Try prefixing the cell contents with a single quote in the Excel sheet prior to processing, e.g. '001. If you can't trust the users to do that, use a string formatting routine to left-pad the numbers with zeroes.
Something must be converting the data in the excel cell from a string to an integer. How are you performing the insert?
If a user enters 001 into Excel, it will be converted to the number 1.
If the user enters '001 into Excel, it will be saved in the cell as text.
If the cell is pre-formatted with the number format "#", then when the user enters 001 into the cell it will be entered as the text "001". The "#" number format tells Excel that the cell is a text cell and any entry (whether it looks like a number, date, time, fraction, etc...) should simply be placed in the cell as is - as a text cell.
Can you tell your users to pre-format this column with "#"? This is generally the most reliable way to handle this since the user does not have to remember to enter '001.
Maybe setting up the datatype "Text" for an Excel cell will help.
Excel is probably the culprit here. Try converting your file to CSV and see how it comes out. If the leading zeros are gone in the new CSV file, Excel is the problem.
Excel always does this, and it's a nuisance. There are three workarounds I know of:
BEFORE entering the data in any cell in Excel, format the cell as text (you can do a whole column if needed). This only works if you control the spreadsheets and users, which is basically never :-).
Assume you'll get a mix of numbers and/or text in the Excel data, and fix it in Excel before import: add a column to the spreadsheet and use the TEXT() function to convert the number to text, as in =TEXT(A2, "000"); fill down. Also assumes you can edit the worksheet.
Assume you have to fix the numbers upon insert in your code. Depending on how you are loading the data, that could happen in T-SQL or in your other code. In T-SQL this expression works to pad with zeros to a width of 3 characters: right('000' + cast(2 as varchar(3)), 3)
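As a hedged illustration of that last option, applied after the data has landed in a staging table (the table name dbo.ImportStaging and the column name Code are made up for the example):
-- pad values back out to a width of 3 characters
UPDATE dbo.ImportStaging
SET Code = RIGHT('000' + Code, 3)
WHERE LEN(Code) < 3;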
