how to sanitize data in sqlite3 column - sqlite

I have a sqlite3 table with a column by the name Title, which stores the names of the some movies.
Table name - table1
Column name - Title
Examples data: "Casablanca" (1983) {The Cashier and the Belly Dancer (#1.4)}
I have another sqlite3 table with a column that stores movie titles.
Table name - table2
Column name - Title
Examples data: casa blanca
Both these tables were created using different datasets, and as such although the movie name is the same (casa blanca vs "Casablanca" (1983) {The Cashier and the Belly Dancer (#1.4)}), both are stored with extra text.
What i would like to do is to SANITIZE the already stored data in both the columns. By sanitization, I would like to strip the cell content of:
1. spaces
2. spl chars like !, ', ", comma, etc..
3. convert all to lower case
I hope with that atleast some level of matching can be had between both the columns.
My question is, how do i perform these sanitizations on data that is already stored in sqlite tables. I do not have an option to sanitize before loading, as i only have access to the loaded database.
I am using sqlite 3.7.13, and i am using sqlite manager as the gui.
Thank You.

This task is too specialized to be done in SQL only.
You should write simple Perl or Python script which will scan your table, read data row by row, scrub it to meet your requirements and write it back.
This is example in Perl:
use DBI;
my $dbh = DBI->connect("dbi:mysql:database=my.db");
# replace rowid with your primary key, but it should work as is:
my $sth = $dbh->prepare(qq{
SELECT rowid,*
FROM table1
});
while (my $row = $sth->fetchrow_hashref()) {
my $rowid = $row->{rowid};
my $title = $row->{title};
# sanitize title:
$title = lc($title); # convert to lowercase
$title =~ s/,//g; # remove commas
# do more sanitization as you wish
# ...
# write it back to database:
$dbh->do(
qq{
UPDATE table1
SET title = ?
WHERE rowid = ?
}, undef,
$title,
$rowid,
);
}
$sth->finish();
$dbh->disconnect();

Related

How to implement NOT EXISTS in OPEN QUERY statement in PROGRESS 4GL - OpenEdge 10.2A

I want to create a browse such that it will show all the records from one table if the values of a field do NOT exist in another table.
It is possible to get the records using SQL as:
SELECT myField FROM pub.myTable WHERE
NOT EXISTS (SELECT myField FROM pub.myTable2 WHERE myTable2.myField=myTable.myField)
It is also possible using 4GL as:
FOR EACH myTable WHERE
NOT CAN-FIND(FIRST myTable2 WHERE myTable2.myField=myTable.myField)
The problem is when I put this query in a browse as:
OPEN QUERY myBrowse
FOR EACH myTable WHERE
NOT CAN-FIND(FIRST myTable2 WHERE myTable2.myField=myTable.myField)
it gives an error message
CAN-FIND is invalid within an OPEN QUERY. (3541)
The question is, is it possible to write such an OPEN QUERY statement?
I didn't come up with this, Steve Moore shared it on https://community-archive.progress.com/forums/00026/27143.html
define temp-table ttNoOrder
field field1 as char.
create ttNoOrder.
define query q1 for Customer, Order, ttNoOrder.
open query q1 for each Customer no-lock,
first Order of Customer outer-join no-lock,
first ttNoOrder where not available(Order).
get first q1.
repeat while not query-off-end("q1"):
display Customer.CustNum Customer.Name available(Order).
get next q1.
end.
Works even with dynamic queries:
DEFINE TEMP-TABLE ttNoOrder
FIELD field1 AS CHARACTER .
DEFINE VARIABLE hQuery AS HANDLE NO-UNDO.
CREATE ttNoOrder.
CREATE QUERY hQuery .
hQuery:SET-BUFFERS (BUFFER Customer:HANDLE,
BUFFER Order:HANDLE,
BUFFER ttNoOrder:HANDLE) .
hQuery:QUERY-PREPARE ("for each Customer no-lock, ~
first Order of Customer outer-join no-lock, ~
first ttNoOrder where not available(Order)") .
hQuery:QUERY-OPEN() .
hQuery:GET-FIRST () .
REPEAT WHILE NOT hQuery:QUERY-OFF-END:
DISPLAY Customer.CustNum FORMAT ">>>>>>>>>9" Customer.Name AVAILABLE(Order).
hQuery:GET-NEXT ().
END.

Combining tables in SQLITE while directly replacing values from another table

I have a sqlite db with 4 tables. I now want to retrieve the contents of one of them and replace the contents of one of its rows with the respective content of one of the other tables and then return the result.
E.g.
Table 1:
Row0, Row1, Row2, ..., Row20
'some numbers' 'some numbers' 'name string' .... 'some numbers'
Table 2:
Row0, Row1
'name string' 'name (cleaned and formatted)'
So every time I access a row in table 1 instead of returning the value for Table1.Row2 it should return my the corresponding value of Table2.Row1 if there is no such value then just use the existing one from Table1.Row2
Just using
select * from table1 inner join row2 on table1.row2 = table2.row0 doesn't work.
If this question has already been answered, I'm happy to head over to the post, yet I have no clue on how to search for my problem, which is why I'm writing it here.
Thanks
Liz

Strange SQLite behavior: Not returning results on simple queries

Ok, so I have a basic table called "ledger", it contains fields of various types, integers, varchar, etc.
In my program, I used to use a query with no "from" predicate to collect all of the rows, which of course works fine. But... I changed my code to allow selecting one row at a time using "where acctno = x" (where X is the account number I want to select at the time).
I thought this must be a bug in the client library for my programming language, so I tested it in the SQLite command-line client - and it still doesn't work!
I am relatively new to SQLite, but I have been using Oracle, MS SQL Server, etc. for years and never seen this type of issue before.
Other things I can tell you:
* Queries using other integer fields also don't work
* Queries on char fields work
* Querying it as a string (with the account number on quotes) still doesn't work. (I thought maybe the numbers were stored as a string inadvertently).
* Accessing rows by rowid works fine - which is why I can edit the database with GUI tools with no noticeable problem.
Examples:
Query with no WHERE (works fine):
1|0|0|JPY|8|Paid-In Capital|C|X|0|X|0|0||||0|0|0|
0|0|0|JPY|11|Root Account|P|X|0|X|0|0|SYSTEM|20121209|150000|0|0|0|
3|0|0|JPY|13|Mitsubishi Bank Futsuu|A|X|0|X|0|0|SYSTEM|20121209|150000|0|0|0|
4|0|0|JPY|14|Japan Post Bank|A|X|0|X|0|0|SYSTEM|20121209|150000|0|0|0|
...
Query with WHERE clause: (no results)
sqlite> select * from ledger where acctno=1;
sqlite>
putting quotes around the 1 above changes nothing.
Interestingly enough, "select * from ledger where acctno > 1" returns results! However since it returns ALL results, it's not terrible useful.
I'm sure someone will ask about the table structure, so here goes:
sqlite> .schema ledger
CREATE TABLE "LEDGER" (
"ACCTNO" integer(10,0) NOT NULL,
"drbal" integer(20,0) NOT NULL,
"crbal" integer(20,0) NOT NULL,
"CURRKEY" char(3,0) NOT NULL,
"TEXTKEY" integer(10,0),
"TEXT" VARCHAR(64,0),
"ACCTYPECD" CHAR(1,0) NOT NULL,
"ACCSTCD" CHAR(1,0),
"PACCTNO" number(10,0) NOT NULL,
"CATCD" number(10,0),
"TRANSNO" number(10,0) NOT NULL,
"extrefno" number(10,0),
"UPDATEUSER" VARCHAR(32,0),
"UPDATEDATE" text(8,0),
"UPDATETIME" TEXT(6,0),
"PAYEECD" number(10,0) NOT NULL,
"drbal2" number(10,0) NOT NULL,
"crbal2" number(10,0) NOT NULL,
"delind" boolean,
PRIMARY KEY("ACCTNO"),
CONSTRAINT "fk_curr" FOREIGN KEY ("CURRKEY") REFERENCES "CURRENCY" ("CUR
RKEY") ON DELETE RESTRICT ON UPDATE CASCADE
);
The strangest thing is that I have other similar tables where this works fine!
sqlite> select * from journalhdr where transno=13;
13|Test transaction ATM Withdrawel 20130213|20130223||20130223||
TransNo in that table is also integer (10,0) NOT NULL - this is what makes me thing it is something to do with the values.
Another clue is that the sort order seems to be based on ascii, not numeric:
sqlite> select * from ledger order by acctno;
0|0|0|JPY|11|Root Account|P|X|0|X|0|0|SYSTEM|20121209|150000|0|0|0|
1|0|0|JPY|8|Paid-In Capital|C|X|0|X|0|0||||0|0|0|
10|0|0|USD|20|Sallie Mae|L|X|0|X|0|0|SYSTEM|20121209|153900|0|0|0|
21|0|0|USD|21|Skrill|A|X|0|X|0|0|SYSTEM|20121209|154000|0|0|0|
22|0|0|USD|22|AES|L|X|0|X|0|0|SYSTEM|20121209|154200|0|0|0|
23|0|0|JPY|23|Marui|L|X|0|X|0|0|SYSTEM|20121209|154400|0|0|0|
24|0|0|JPY|24|Amex JP|L|X|0|X|0|0|SYSTEM|20121209|154500|0|0|0|
3|0|0|JPY|13|Mitsubishi Bank Futsuu|A|X|0|X|0|0|SYSTEM|20121209|150000|0|0|0|
Of course the sort order on journalhdr (where the select works properly) is numeric.
Solved! (sort-of)
The data can be fixed like this:
sqlite> update ledger set acctno = 23 where rowid = 13;
sqlite> select * from ledger where acctno = 25;
25|0|0|JPY|0|Test|L|X|0|X|0|0|SYSTEM|20130224|132500|0|0|0|
Still, if it was stored as strings, then that leave a few questions:
1. Why couldn't I select it as a string using the quotes?
2. How did it get stored as a string since it is a valid integer?
3. How would you go about detecting this problem normally besides noticing bizzarre symptoms?
Although the data would normally be entered by my program, some of it was created by hand using Navicat, so I assume the problem must lie there.
You are victim of SQLite dynamic typing.
Even though SQLite defines system of type affinity, which sets some rules on how input strings or numbers will be converted to actual internal values, but it does NOT prevent software that is using prepared statements to explicitly set any type (and data value) for the column (and this can be different per row!).
This can be shown by this simple example:
CREATE TABLE ledger (acctno INTEGER, name VARCHAR(16));
INSERT INTO ledger VALUES(1, 'John'); -- INTEGER '1'
INSERT INTO ledger VALUES(2 || X'00', 'Zack'); -- BLOB '2\0'
I have inserted second row not as INTEGER, but as binary string containing embedded zero byte. This reproduces your issue exactly, see this SQLFiddle, step by step. You can also execute these commands in sqlite3, you will get the same result.
Below is Perl script that also reproduces this issue
This script creates just 2 rows with acctno having values of integer 1 for first, and "2\0" for second row. "2\0" means string consisting of 2 bytes: first is digit 2, and second is 0 (zero) byte.
Of course, it is very difficult to visually tell "2\0" from just "2", but this is what script below demonstrates:
#!/usr/bin/perl -w
use strict;
use warnings;
use DBI qw(:sql_types);
my $dbh = DBI->connect("dbi:SQLite:test.db") or die DBI::errstr();
$dbh->do("DROP TABLE IF EXISTS ledger");
$dbh->do("CREATE TABLE ledger (acctno INTEGER, name VARCHAR(16))");
my $sth = $dbh->prepare(
"INSERT INTO ledger (acctno, name) VALUES (?, ?)");
$sth->bind_param(1, "1", SQL_INTEGER);
$sth->bind_param(2, "John");
$sth->execute();
$sth->bind_param(1, "2\0", SQL_BLOB);
$sth->bind_param(2, "Zack");
$sth->execute();
$sth = $dbh->prepare(
"SELECT count(*) FROM ledger WHERE acctno = ?");
$sth->bind_param(1, "1");
$sth->execute();
my ($num1) = $sth->fetchrow_array();
print "Number of rows matching id '1' is $num1\n";
$sth->bind_param(1, "2");
$sth->execute();
my ($num2) = $sth->fetchrow_array();
print "Number of rows matching id '2' is $num2\n";
$sth->bind_param(1, "2\0", SQL_BLOB);
$sth->execute();
my ($num3) = $sth->fetchrow_array();
print "Number of rows matching id '2<0>' is $num3\n";
Output of this script is:
Number of rows matching id '1' is 1
Number of rows matching id '2' is 0
Number of rows matching id '2<0>' is 1
If you were to look at resultant table using any SQLite tool (including sqlite3), it will print 2 for second row - they all get confused by trailing 0 inside a BLOB when it gets coerced to string or number.
Note that I had to use custom param binding to coerce type to BLOB and permit null bytes stored:
$sth->bind_param(1, "2\0", SQL_BLOB);
Long story short, it is either some of your client programs, or some of client tools like Navicat which screwed it up.

SQLite View with functions?

So let's say I have an FTS[34] table data, and a regular table info, that are bound together by a info.ID = data.rowid. I have a view master that represents this join, that I use to query overall information.
Here are the CREATE TABLE statements:
CREATE VIRTUAL TABLE data USING fts4(
Name,
Keywords,
Aliases
Description);
CREATE TABLE info (
ID INTEGER PRIMARY KEY,
-- A bunch of other columns...
);
CREATE VIEW master AS SELECT
info.ID AS ID
data.Name AS Name,
data.Keywords AS Keywords,
data.Aliases AS Aliases,
data.Description AS Description,
-- A bunch of other columns...
FROM info JOIN data ON info.ID = data.rowid;
Now what I want to do is create another view that instead of selecting the entire cell from the data table, selects the result of the SQLite snippet function as performed on each column of the data table:
CREATE VIEW search AS SELECT
master.*,
snippet(data '<b>', '</b>', '...', 0) AS sn_Name,
snippet(data '<b>', '</b>', '...', 1) AS sn_Keywords,
snippet(data '<b>', '</b>', '...', 2) AS sn_Aliases,
snippet(data '<b>', '</b>', '...', 3) AS sn_Description
FROM master JOIN data ON master.ID = data.rowid;
However, when I execute that statement from within Python I get the following:
sqlite3.OperationalError: near "'<b>'": syntax error
The comma between the data and '<b>' parameter values is missing.

SQLite: possible to update row or insert if it doesn't exist?

I'm sure I can check if a row exists by selecting it but I'm wondering if there's a slicker way that I'm just not aware of -- seems like a common enough task that there might be. This SQLite table looks something like this:
rowID QID ANID value
------ ------ ----- ------
0 axo 1 45
1 axo 2 12
If the combination of QID and ANID already exists, that value should be updated, if the combination of QID and ANID doesn't already exist then it should be inserted. While its simple enough to write:
SELECT * where QID = 'axo' and ANID = 3;
And check if the row exists then branch and either insert/update I can't help but look for a better way. Thanks in advance!
Beware: REPLACE doesn't really equal 'UPDATE OR INSERT'...REPLACE replaces the entire row. Therefore if you don't specify values for EVERY column, you'll replace the un-specified columns with NULL or the default values.
In a simple example such as above, it's likely fine, but if you get into the habit of using REPLACE as 'UPDATE OR INSERT' you'll nuke data when you forget to specify a value for every field...just a warning from experience.
The insert documentation details a REPLACE option
INSERT OR REPLACE INTO tabname (QID,ANID,value) VALUES ('axo',3,45)
The "INSERT OR REPLACE" can abbreviated to REPLACE.
Example in Perl
Revised the following example to utilize a composite primary key
use strict;
use DBI;
### Connect to the database via DBI
my $dbfile = "simple.db";
unlink $dbfile;
my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile");
### Create a table
$dbh->do("CREATE TABLE tblData (qid TEXT, anid INTEGER, value INTEGER, PRIMARY KEY(qid, anid))");
### Add some data
my $insert = $dbh->prepare("INSERT INTO tblData (qid,anid,value) VALUES (?,?,?)");
$insert->execute('axo', 1, 45);
$insert->execute('axo', 2, 12);
$insert->finish;
### Update data
my $insert_update = $dbh->prepare("REPLACE INTO tblData (qid,anid,value) VALUES (?,?,?)");
$insert_update->execute('axo', 2, 500);
$insert_update->execute('axo', 10, 500);
$insert_update->finish;
### Print out the data
my $select = $dbh->prepare("SELECT * FROM tblData ORDER BY 1,2");
$select->execute;
while (my #row = $select->fetchrow_array()) {
printf "Row: %s\n", join(" - ", #row);
}
$select->finish;
$dbh->disconnect;
exit 0;
Produces the following output, demonstrating the update of one row and the insert of another
Row: axo - 1 - 45
Row: axo - 2 - 500
Row: axo - 10 - 500
you can use the REPLACE command. Docs
In this particular situation it turns out that there's no easy solution - at least not that I've been able to find, despite the efforts of Mark to suggest one.
The solution I've ended up using might provide useful to someone else so I'm posting it here.
I've defined a multi-column primary key as QID + ANID and attempt to insert into the table. If that particular combination exists, it will produce an error code of 23000 which I can then use an an indicator that it should be an UPDATE instead. Here's the basic gist (in PHP):
try {
$DB->exec("INSERT INTO tblData (QID,ANID,value) VALUES ('xxx','x',1)");
}
catch(PDOException $e) {
if($e->getCode() == 23000) {
$DB->exec("UPDATE tblData SET value=value+1 WHERE ANID='x' AND QID='xxx'");
}
}
The actual code I've used is a bit more complex using prepare/placeholders and handling the error if the code isn't 23000, etc.

Resources