SQLite: Replacing a highly redundant column with an index into another table

I have a table t with around 500,000 rows. One of the columns (stringtext) contains a very long string, and I have now discovered that there are in fact only 80 distinct strings. I'd like to declutter table t by moving the strings into a separate table, s, and merely referencing them in t.
I have created a separate table of the long strings, including what is effectively an explicit row-index number using:
CREATE TEMPORARY TABLE stmp AS
SELECT DISTINCT
stringtext
FROM t;
CREATE TABLE s AS
SELECT _ROWID_ AS stringindex, stringtext
FROM stmp;
(It was creating this table that showed me there were only a few distinct strings).
How can I now replace stringtext in t with the corresponding stringindex from s?

I would try something like UPDATE t SET stringtext = (SELECT stringindex FROM s WHERE s.stringtext = t.stringtext), and I would recommend first creating an index on s(stringtext), as SQLite might not be smart enough to build a temporary index. Afterwards, a VACUUM would be in order to reclaim the freed space.
Untested.
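The suggestion above can be tried on a toy database before touching the real table. The following is only a sketch using Python's sqlite3 module. The four sample rows, the id column, and the untyped stringtext column are assumptions for illustration. As a variation on the question's approach, s is created here with an INTEGER PRIMARY KEY and a UNIQUE constraint, which gives the recommended index on s(stringtext) implicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-in for the question's table t (assumption: the stringtext column is
# declared without a type, so integers written to it later are stored as integers).
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, stringtext)")
originals = ["long string A", "long string B", "long string A", "long string C"]
cur.executemany("INSERT INTO t (stringtext) VALUES (?)", [(v,) for v in originals])

# Lookup table s: stringindex is assigned automatically, and the UNIQUE
# constraint creates an index on stringtext.
cur.execute("CREATE TABLE s (stringindex INTEGER PRIMARY KEY, stringtext TEXT UNIQUE)")
cur.execute("INSERT INTO s (stringtext) SELECT DISTINCT stringtext FROM t")

# Replace each long string in t by its index in s.
cur.execute("""
    UPDATE t
    SET stringtext = (SELECT stringindex FROM s WHERE s.stringtext = t.stringtext)
""")
conn.commit()
cur.execute("VACUUM")  # must run outside a transaction, hence the commit above

indices = [row[0] for row in cur.execute("SELECT stringtext FROM t ORDER BY id")]
recovered = [row[0] for row in cur.execute(
    "SELECT s.stringtext FROM t JOIN s ON s.stringindex = t.stringtext ORDER BY t.id")]
print(indices)
print(recovered)
```

Joining t back to s on stringindex recovers the original strings, which is a convenient sanity check before dropping any backup of the data.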

Related

How can I save row numbers to a table in SQLite?

I have added row numbers to a table (merged) thus:
SELECT ROW_NUMBER() OVER (ORDER BY Pclass) RowNum, *
FROM merged;
Which returns:
1|1|0|58|0|0|146.5208|0|20|0|1|1|0.53043592
2|1|0|31|1|0|113.275|0|23|0|1|1|0.671198682
3|1|0|38|0|0|227.525|0|29|0|1|1|0.888825796
4|1|0|36|0|2|71|0|23|1|0|1|0.49853335
However, when I then check merged, the row numbers are no longer present (note that this produces unordered results, but nevertheless shows the point I am making):
SELECT * FROM merged;
2|0|24|0|0|13|0|38|1|0|0|0.505845678
3|1|61|0|0|6.2375|0|25|1|0|0|0.128146005
2|0|17|0|0|12|0|21|0|1|1|0.465261004
2|1|18|0|0|11.5|0|26|1|0|0|0.458356337
I suspect that the way to achieve this is to update merged by adding a new column and then adding the row numbers to said column, but I don't know how to go about it.
As such, my question is this: how can I save row numbers to merged?
A SELECT statement won't change the merged table. Updating the existing table in place would be a bit complicated, and I don't think there is a trivial way to do it. The easier way is to create a new table, drop the old one, and rename the new one to the old name.
This code should work:
CREATE TABLE new_merged
AS SELECT ROW_NUMBER() OVER (ORDER BY Pclass) AS RowNum, * FROM merged;
DROP TABLE merged;
ALTER TABLE new_merged RENAME TO merged;
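The create, drop, and rename sequence can be checked end to end with Python's sqlite3 module. This is a sketch only: the two-column merged table and its four rows are made up for illustration, and it assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical cut-down version of merged with only two columns.
cur.execute("CREATE TABLE merged (Pclass INTEGER, Fare REAL)")
cur.executemany(
    "INSERT INTO merged VALUES (?, ?)",
    [(2, 13.0), (3, 6.2375), (1, 146.5208), (1, 113.275)],
)

# Materialize the row numbers in a new table, then swap it in for the old one.
cur.executescript("""
    CREATE TABLE new_merged AS
        SELECT ROW_NUMBER() OVER (ORDER BY Pclass) AS RowNum, * FROM merged;
    DROP TABLE merged;
    ALTER TABLE new_merged RENAME TO merged;
""")

saved = cur.execute("SELECT RowNum, Pclass FROM merged ORDER BY RowNum").fetchall()
print(saved)
```

Note that a table created with CREATE TABLE ... AS does not carry over constraints or indexes from the original, so those would have to be recreated separately.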

How to Query for Recent Rows in SQLITE3

I'm using SQLite3 and trying to query for recent rows. So I'm having SQLite3 insert a unix timestamp into each row with strftime('%s','now'). My Table looks like this:
CREATE TABLE test(id INTEGER PRIMARY KEY, time);
INSERT INTO test (time) VALUES (strftime('%s','now')); --Repeated
SELECT * FROM test;
1|1516816522
2|1516816634
3|1516816646 --etc lots of rows
Now I want to query for only recent entries, for example, I'm trying to get all rows with a time within the last hour. I'm trying the following SQL query:
SELECT * FROM test WHERE time > strftime('%s','now')-60*60;
However, that always returns all rows regardless of the value in the time column. I really don't know what's going on.
Also, if I put WHERE time > strftime('%s','now') it'll return nothing (which is expected) but if I put WHERE time > strftime('%s','now')-1 then it'll return everything. I don't know why.
Here's one more example:
sqlite> SELECT *, strftime('%s','now')-1 AS window FROM test WHERE time > window;
1|1516816522|1516817482
2|1516816634|1516817482
3|1516816646|1516817482
It seems that SQLite3 thinks the values in the middle column are greater than the values in the right column!?
This isn't at all what I expect. Can someone please tell me what's going on? Thanks!
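The purpose of strftime() is to format values, so it returns a string.
When you do arithmetic on its return value, the database converts it into a number, so the right-hand side of your comparison is numeric while the time column (declared without a type) still holds strings. SQLite does not convert the operands in that case; it treats any text value as greater than any numeric value, which is why every row matches.
You must ensure that both values in a comparison have the same data type.
The best way to do this is to store numbers in the table. (A CAST to a type name that SQLite does not recognize gets numeric affinity, so the placeholder type below works, although INTEGER is the conventional choice.)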
INSERT INTO test (time)
VALUES (CAST(strftime('%s','now') AS MAKE_THIS_A_NUMBER_PLEASE));
(Or just declare the column type as something with numeric affinity.)
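The effect is easy to reproduce with Python's sqlite3 module. In this sketch, fixed_test is a hypothetical second table whose time column is declared INTEGER, and the two-hour-old row is inserted only to have something that should fall outside the one-hour window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A text value compared with a number: the text always counts as greater.
text_vs_number = cur.execute("SELECT '5' > 1000000").fetchone()[0]

# Untyped column, as in the question: strftime() results are stored as text.
cur.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, time)")
# Hypothetical variant with numeric affinity: the same values are stored as integers.
cur.execute("CREATE TABLE fixed_test (id INTEGER PRIMARY KEY, time INTEGER)")
for table in ("test", "fixed_test"):
    cur.execute(f"INSERT INTO {table} (time) VALUES (strftime('%s','now','-2 hours'))")
    cur.execute(f"INSERT INTO {table} (time) VALUES (strftime('%s','now'))")

query = "SELECT id FROM {} WHERE time > strftime('%s','now') - 60*60 ORDER BY id"
broken = [row[0] for row in cur.execute(query.format("test"))]
fixed = [row[0] for row in cur.execute(query.format("fixed_test"))]
stored_types = [row[0] for row in cur.execute(
    "SELECT typeof(time) FROM test UNION ALL SELECT typeof(time) FROM fixed_test")]

print(text_vs_number, broken, fixed, stored_types)
```

With the untyped column both rows come back, including the two-hour-old one; with the INTEGER column only the recent row matches.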

Comparison operators behave differently after indexing

With this schema:
CREATE TABLE temperatures(
sometext TEXT,
lowtemp INT,
hightemp INT,
moretext TEXT);
When I run this search:
select * from temperatures where lowtemp < 20 and hightemp > 20;
I get the correct result which is always one record (due to the specifics of the data).
Now, when I index the table:
CREATE INDEX ltemps ON temperatures(lowtemp);
CREATE INDEX htemps ON temperatures(hightemp);
The exact same query above stops providing expected results -- now I get many records, including ones where the lowtemp and hightemp obviously don't meet the comparison test.
I'm running this on the same sqlite3 database, same table. The only difference is adding the above 2 index statements after table creation.
Can someone explain how indexing influences this behavior?

SQLite - Update with random unique value

I am trying to populate every row of a column with a random number ranging from 0 to the row count.
So far I have this
UPDATE table
SET column = ABS (RANDOM() % (SELECT COUNT(id) FROM table))
This does the job but produces duplicate values, which turned out to be bad. I added a Unique constraint but that just causes it to crash.
Is there a way to update a column with random unique values from certain range?
Thanks!
If you want to later read the records in a random order, you can just do the ordering at that time:
SELECT * FROM MyTable ORDER BY random()
(This will not work if you need the same order in multiple queries.)
Otherwise, you can use a temporary table to store the random mapping between the rowids of your table and the numbers 1..N.
(Those numbers are automatically generated by the rowids of the temporary table.)
CREATE TEMP TABLE MyOrder AS
SELECT rowid AS original_rowid
FROM MyTable
ORDER BY random();
UPDATE MyTable
SET MyColumn = (SELECT rowid
FROM MyOrder
WHERE original_rowid = MyTable.rowid) - 1;
DROP TABLE MyOrder;
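The temporary-table approach can be exercised with Python's sqlite3 module. The sketch below assumes a hypothetical MyTable with ten rows whose ids start at 10, just to show that the result does not depend on the original rowids being 1..N.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table; the ids deliberately do not start at 1.
cur.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, MyColumn INTEGER)")
cur.executemany("INSERT INTO MyTable (id) VALUES (?)", [(i,) for i in range(10, 20)])

# MyOrder receives the original rowids in random order; its own rowids
# (1..N, in insertion order) become the random unique numbers.
cur.executescript("""
    CREATE TEMP TABLE MyOrder AS
        SELECT rowid AS original_rowid
        FROM MyTable
        ORDER BY random();
    UPDATE MyTable
    SET MyColumn = (SELECT rowid
                    FROM MyOrder
                    WHERE original_rowid = MyTable.rowid) - 1;
    DROP TABLE MyOrder;
""")

values = [row[0] for row in cur.execute("SELECT MyColumn FROM MyTable ORDER BY id")]
print(values)
```

Every value from 0 to N-1 appears exactly once, so a unique constraint on the column would be satisfied by the final result.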
What you seem to be seeking is not simply a set of random numbers, but rather a random permutation of the numbers 1..N. This is harder to do. If you look in Knuth (The Art of Computer Programming), or in Bentley (Programming Pearls or More Programming Pearls), one suggested way is to create an array with the values 1..N, and then for each position, swap the current value with a randomly selected other value from the array. (I'd need to dig out the books to check whether it is any arbitrary position in the array, or only with a value following it in the array.) In your context, then you apply this permutation to the rows in the table under some ordering, so row 1 under the ordering gets the value in the array at position 1 (using 1-based indexing), etc.
In the 1st Edition of Programming Pearls, Column 11 Searching, Bentley says:
Knuth's Algorithm P in Section 3.4.2 shuffles the array X[1..N].
for I := 1 to N do
Swap(X[I], X[RandInt(I,N)])
where the RandInt(n,m) function returns a random integer in the range [n..m] (inclusive). That's nothing if not succinct.
The alternative is to have your code thrashing around when there is one value left to update, waiting until the random number generator picks the one value that hasn't been used yet. As a hit and miss process, that can take a while, especially if the number of rows in total is large.
Actually translating that into SQLite is a separate exercise. How big is your table? Is there a convenient unique key on it (other than the one you're randomizing)?
Given that you have a primary key, you can easily generate an array of structures such that each primary key is allocated a number in the range 1..N. You then use Algorithm P to permute the numbers. Then you can update the table from the primary keys with the appropriate randomized number. You might be able to do it all with a second (temporary) table in SQL, especially if SQLite supports UPDATE statements with a join between two tables. But it is probably nearly as simple to use the array to drive singleton updates. You'd probably not want a unique constraint on the random number column while this update is in progress.
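As a rough sketch of that plan in Python with the sqlite3 module: algorithm_p below follows the quoted pseudocode, and the resulting permutation drives one singleton UPDATE per primary key. The table name items, the column name rnd, and the sample keys are hypothetical.

```python
import random
import sqlite3

def algorithm_p(values):
    """Return a shuffled copy: swap each position with a random one at or after it."""
    x = list(values)
    n = len(x)
    for i in range(n):
        j = random.randint(i, n - 1)  # inclusive on both ends, like RandInt(I, N)
        x[i], x[j] = x[j], x[i]
    return x

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table with a primary key and a column to receive the random numbers.
cur.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, rnd INTEGER)")
cur.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in (3, 7, 8, 12, 20)])

# Pair each primary key with one number from a random permutation of 1..N.
keys = [row[0] for row in cur.execute("SELECT id FROM items ORDER BY id")]
permutation = algorithm_p(range(1, len(keys) + 1))
cur.executemany("UPDATE items SET rnd = ? WHERE id = ?", list(zip(permutation, keys)))
conn.commit()

assigned = [row[0] for row in cur.execute("SELECT rnd FROM items ORDER BY id")]
print(assigned)
```

In everyday Python code, random.shuffle does the same job as the hand-written loop; the loop is spelled out here only to mirror the pseudocode.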

How to create a view that returns a 2x2 (or NxN) matrix of results

So I know enough SQL just to be really dangerous (I don't normally work the back-end) but cannot get the following view to be created successfully ;) The result set I'm after is a data set that has rows assigned as a column alias from multiple tables (instead of a 1xN flat of all columns). There is a many-to-one relationship when looking at the main table, based on foreign keys associated to the row id of the appropriate related table.
Ideally I'd like a data set that looks like this in the return:
dataset.transaction_row[n]: col1, col2, col3, coln... (columns from the transaction table)
dataset.category_row[n]: col1, col2, col3, coln... (columns from the category table)
and so on...
I get the following error:
Query Error: near "AS": syntax error Unable to execute statement
From:
CREATE VIEW view_unreconciled_transactions
AS SELECT account_transaction.* AS transaction_row,
category.* AS category_row,
memorized.name_rule_replace OR account_transaction.name AS payee
FROM account_transaction
LEFT JOIN memorized ON account_transaction.memorized_key = memorized.id
LEFT JOIN category ON account_transaction.category_key = category.id
WHERE status != 2
ORDER BY account_transaction.dt_posted DESC
It seems easy enough since the result-column selector is repeatable which includes expressions (referencing sqlite's syntax diagrams). In reference to the error, I'm assuming it's complaining about the 2nd 'AS' where I'm trying to get table.* assigned as an alias. Any help in the right direction is appreciated. If I had to, I suppose I could explicitly state all columns but that feels like a kludge.
The AS modifier can only be applied to a single column, not to a collection such as the * you used. You will have to break them out into specific names, (which is best practice IMHO anyway)
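To illustrate what breaking the columns out might look like, here is a sketch using Python's sqlite3 module. Several things in it are assumptions rather than facts from the question: the column lists of the three tables (including category.name), the sample rows, the prefixed alias names, the guess that status belongs to account_transaction, and the use of COALESCE for the payee column, which is only one reading of what the OR expression was meant to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    -- Hypothetical schemas; only the columns used by the view are included.
    CREATE TABLE memorized (id INTEGER PRIMARY KEY, name_rule_replace TEXT);
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE account_transaction (
        id INTEGER PRIMARY KEY, name TEXT, dt_posted TEXT, status INTEGER,
        memorized_key INTEGER, category_key INTEGER);
    INSERT INTO memorized VALUES (1, 'Grocer');
    INSERT INTO category VALUES (1, 'Food');
    INSERT INTO account_transaction VALUES (1, 'GROCER #123', '2024-01-02', 0, 1, 1);
    INSERT INTO account_transaction VALUES (2, 'UNKNOWN', '2024-01-03', 0, NULL, NULL);
    INSERT INTO account_transaction VALUES (3, 'OLD', '2024-01-01', 2, NULL, NULL);

    -- Each column gets its own alias; a prefix marks the table it came from.
    CREATE VIEW view_unreconciled_transactions AS
    SELECT account_transaction.id        AS transaction_id,
           account_transaction.dt_posted AS transaction_dt_posted,
           category.id                   AS category_id,
           category.name                 AS category_name,
           COALESCE(memorized.name_rule_replace, account_transaction.name) AS payee
    FROM account_transaction
    LEFT JOIN memorized ON account_transaction.memorized_key = memorized.id
    LEFT JOIN category  ON account_transaction.category_key = category.id
    WHERE account_transaction.status != 2;
""")

cur.execute(
    "SELECT * FROM view_unreconciled_transactions ORDER BY transaction_dt_posted DESC")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)
print(rows)
```

The ordering is applied by the query that reads the view rather than inside the view, and client code can regroup the columns by their prefix if it wants one record per source table.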
It looks like you want to make a "pivot table". They can be tricky to make in a database. If you want a result where each row comes from a different source table, and the columns from each table are IDENTICAL, then you could try using a UNION statement to join the different results together as if they were one dataset.
NOTE that the columns all take their names from the first dataset in a UNION, and the datatypes all need to be the same.
