SQLite: How to handle negative numbers with the negative sign at the end

We've never experienced this before. We are importing tab-delimited TXT files that include numeric columns. Negative numbers have the sign after the number rather than in front (e.g. 550.00- rather than -550.00).
We are using SQLite Expert Professional. When reviewing the import results in the table, any number with the trailing negative sign was converted to a leading negative sign, but every one of these cells is highlighted in blue (we are not sure why SQLite Expert does this, but assume it has meaning). In addition, when querying and summing, these values are ignored, causing the resulting total to be higher than expected.
The field types are FLOAT and DECIMAL.
We have googled and cannot find anything about the location of the negative sign.
We would appreciate any assistance on how to handle this.

An UPDATE query could correct the values by replacing the negative sign with nothing and then subtracting the result from 0.
The UPDATE could be based upon :-
UPDATE vneg SET iv = (0 - replace(iv,'-','')) WHERE instr(iv,'-') > 1;
e.g. :-
DROP TABLE IF EXISTS vneg;
CREATE TABLE IF NOT EXISTS vneg (iv);
INSERT INTO vneg VALUES ('100.35'),('133.44-'),('25.453-');
SELECT * FROM vneg; -- First Result (before)
UPDATE vneg SET iv = (0 - replace(iv,'-','')) WHERE instr(iv,'-') > 1;
SELECT * FROM vneg; -- Second result (after)
Before the update, the stored values are '100.35', '133.44-' and '25.453-'; after the update they are 100.35, -133.44 and -25.453.
And then using SELECT sum(iv) AS summed FROM vneg; results in :-
-58.543
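If you would rather not alter the stored data, the same expression can be applied at query time instead; a minimal sketch against the example table above, converting the trailing-minus text to numbers on the fly:
SELECT CASE
           WHEN instr(iv, '-') > 1 THEN 0 - replace(iv, '-', '')  -- trailing minus: strip it and negate
           ELSE iv + 0                                            -- plain value: coerce text to number
       END AS iv_num
FROM vneg;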

Related

How to parse accounting number format in SQLite field?

I am somewhat of a newbie to SQLite (and KMyMoney). KMyMoney (an open source personal finance manager) allows one-click export of data into an SQLite database.
On browsing the SQLite database output, the dollar amounts are stored in a table called kmmSplits as several text fields in a strange format based on “value” and “valueFormatted”. The “value” field is apparently written as a division equation (in text format) which apparently yields the “valueFormatted” field (again in text format). The “valueFormatted” field holds the correct amount, but the problem is that parentheses are used to indicate a negative number instead of a simple minus in front of the value. This is apparently an accounting number format, but I don’t know how to parse it into a float value for running calculated SQL queries, etc. The positive values (without parentheses) are no problem to convert to FLOATs.
I’ve tried using the CAST to FLOAT function, but this does not do the division math, nor does it convert the parentheses into negative values.
The basic question is: how do I parse a text value containing parentheses in the “valueFormatted” field (accounting money format) into a common number format OR, alternatively, how do I convert the division equation in the “value” field into an actual calculation?
Use a CASE expression to check whether valueFormatted is a numeric value inside parentheses and, if it is, multiply the substring starting from the 2nd character by -1 (the trailing closing parenthesis is discarded by SQLite during this implicit type conversion):
SELECT *,
       CASE
           WHEN valueFormatted LIKE '(%)' THEN (-1) * SUBSTR(valueFormatted, 2)
           ELSE valueFormatted
       END AS value
FROM kmmSQLite;
Or, replace '(' with '-' and add 0 to convert the result to a number:
SELECT *,
REPLACE(valueFormatted, '(', '-') + 0 AS value
FROM kmmSQLite;
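If a stored numeric value is wanted rather than one computed per query, either expression can drive an UPDATE into a new column; a sketch, assuming a hypothetical REAL column named value_num is added to the table and that valueFormatted contains no thousands separators:
ALTER TABLE kmmSQLite ADD COLUMN value_num REAL;  -- hypothetical extra column
UPDATE kmmSQLite
SET value_num = CASE
                    WHEN valueFormatted LIKE '(%)' THEN (-1) * SUBSTR(valueFormatted, 2)  -- (1234.56) -> -1234.56
                    ELSE valueFormatted + 0                                               -- plain amount -> number
                END;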

Count with limit and offset in sqlite

I am trying to write a function in Python that uses SQLite, and while I managed to get it to work, there is a behavior in SQLite that I don't understand when using the COUNT command. When I run the following, SQLite counts as expected, i.e. returns an int.
SELECT COUNT (*) FROM Material WHERE level IN (?) LIMIT 10
However, when I add an offset to the end, as shown below, SQLite returns an empty list, in other words nothing.
SELECT COUNT (*) FROM Material WHERE level IN (?) LIMIT 10 OFFSET 82
While omitting the offset is an easy fix, I don't understand why SQLite returns nothing. Is this the expected behavior for the command I gave?
Thanks for reading.
When you execute that COUNT(*), it returns only a single row.
The LIMIT clause limits the number of rows returned. You are setting the limit to 10, which has no effect here (because only a single row is being returned).
The OFFSET clause skips the specified number of rows before returning anything, and that is what bites here.
In simple terms, your query translates to: count the number of rows, then return up to 10 result rows starting from the 83rd position. Since there is only a single result row, it always returns nothing.
Read about LIMIT and OFFSET
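If the intention was to count only the rows of a particular page, the LIMIT and OFFSET must be applied in a subquery before counting; a sketch under that assumption:
-- Counts at most 10 matching rows after skipping the first 82 (hypothetical paging intent).
SELECT COUNT(*)
FROM (SELECT 1 FROM Material WHERE level IN (?) LIMIT 10 OFFSET 82);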

Snowflake, Python/Jupyter analysis

I am new to Snowflake, and running a query to get a couple of days' data - this returns more than 200 million rows and takes a few days. I tried running the same query in Jupyter - and the kernel restarts/dies before the query ends. Even if it got into Jupyter - I doubt I could analyze the data in any reasonable timeline (but maybe using dask?).
I am not really sure where to start - I am trying to check the data for missing values, and my first instinct was to use Jupyter - but I am lost at the moment.
My next idea is to stay within Snowflake and check the columns there with case statements (e.g. sum(case when column_value = '' then 1 else 0 end) as number_missing_values).
Does anyone have any ideas/direction I could try - or know if I'm doing something wrong?
Thank you!
Not really the answer you are looking for, but:
sum(case when column_value = '' then 1 else 0 end) as number_missing_values
When you say missing values, this will only find values that are an empty string.
This can also be written in a simpler form as:
count_if(column_value = '') as number_missing_values
The database already knows how many rows are in a table, and it knows how many NULLs there are in each column. If you are loading data into a table, it might make more sense not to load empty strings and to use NULL instead; then, for no compute cost, you can run:
count(*) - count(column) as number_empty_values
Also of note: if you have two tables in Snowflake you can compare them via MINUS,
aka
select * from table_1
minus
select * from table_2
which is useful to find missing rows; you do have to do it in both directions.
Then you can HASH rows, or hash the whole table via HASH_AGG.
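A minimal sketch of that comparison, assuming both tables share the same column layout:
-- If the aggregate hashes differ, the table contents differ (HASH_AGG is order-independent).
SELECT (SELECT HASH_AGG(*) FROM table_1) AS hash_1,
       (SELECT HASH_AGG(*) FROM table_2) AS hash_2,
       (SELECT HASH_AGG(*) FROM table_1) = (SELECT HASH_AGG(*) FROM table_2) AS tables_match;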
But normally when looking for missing data, you have an external system, so the driver is 'what can that system handle' and finding common ground.
Also, in the past we were searching for bugs in our processing that caused duplicate data (where we needed/wanted no duplicates), so the above, and COUNT DISTINCT-like commands, come in useful.

How to Query for Recent Rows in SQLITE3

I'm using SQLite3 and trying to query for recent rows. So I'm having SQLite3 insert a unix timestamp into each row with strftime('%s','now'). My Table looks like this:
CREATE TABLE test(id INTEGER PRIMARY KEY, time);
INSERT INTO test (time) VALUES (strftime('%s','now')); --Repeated
SELECT * FROM test;
1|1516816522
2|1516816634
3|1516816646 --etc lots of rows
Now I want to query for only recent entries, for example, I'm trying to get all rows with a time within the last hour. I'm trying the following SQL query:
SELECT * FROM test WHERE time > strftime('%s','now')-60*60;
However, that always returns all rows regardless of the value in the time column. I really don't know what's going on.
Also, if I put WHERE time > strftime('%s','now') it'll return nothing (which is expected) but if I put WHERE time > strftime('%s','now')-1 then it'll return everything. I don't know why.
Here's one more example:
sqlite> SELECT *, strftime('%s','now')-1 AS window FROM test WHERE time > window;
1|1516816522|1516817482
2|1516816634|1516817482
3|1516816646|1516817482
It seems that SQLite3 thinks the values in the middle column are greater than the values in the right column!?
This isn't at all what I expect. Can someone please tell me what's going on? Thanks!
The purpose of strftime() is to format values, so it returns a string.
When you do computations with its return value, the database converts that value into a number. But the values stored in your time column are still strings, and in SQLite any number compares as less than any string, so the comparison does not do what you expect.
You must ensure that both values in a comparison have the same data type.
The best way to do this is to store numbers in the table:
INSERT INTO test (time)
VALUES (CAST(strftime('%s','now') AS MAKE_THIS_A_NUMBER_PLEASE));
(Or just declare the column type as something with numeric affinity.)
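With numeric timestamps stored (or with the existing text values cast on the fly), the comparison then behaves as intended; a sketch of both variants:
-- With numbers in the time column, the arithmetic comparison works directly:
SELECT * FROM test WHERE time > strftime('%s','now') - 60*60;
-- With the existing text values, cast the column so both sides are numeric:
SELECT * FROM test WHERE CAST(time AS INTEGER) > strftime('%s','now') - 60*60;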

SQLite - Update with random unique value

I am trying to populate every row in a column with random numbers ranging from 0 to the row count.
So far I have this:
UPDATE table
SET column = ABS (RANDOM() % (SELECT COUNT(id) FROM table))
This does the job but produces duplicate values, which turned out to be bad. I added a UNIQUE constraint, but that just causes it to crash.
Is there a way to update a column with random unique values from a certain range?
Thanks!
If you want to later read the records in a random order, you can just do the ordering at that time:
SELECT * FROM MyTable ORDER BY random()
(This will not work if you need the same order in multiple queries.)
Otherwise, you can use a temporary table to store the random mapping between the rowids of your table and the numbers 1..N.
(Those numbers are automatically generated by the rowids of the temporary table.)
CREATE TEMP TABLE MyOrder AS
SELECT rowid AS original_rowid
FROM MyTable
ORDER BY random();
UPDATE MyTable
SET MyColumn = (SELECT rowid
                FROM MyOrder
                WHERE original_rowid = MyTable.rowid) - 1;
DROP TABLE MyOrder;
What you seem to be seeking is not simply a set of random numbers, but rather a random permutation of the numbers 1..N. This is harder to do. If you look in Knuth (The Art of Computer Programming), or in Bentley (Programming Pearls or More Programming Pearls), one suggested way is to create an array with the values 1..N, and then for each position, swap the current value with a randomly selected other value from the array. (I'd need to dig out the books to check whether it is any arbitrary position in the array, or only a value following it in the array.) In your context, you then apply this permutation to the rows in the table under some ordering, so row 1 under the ordering gets the value in the array at position 1 (using 1-based indexing), etc.
In the 1st Edition of Programming Pearls, Column 11 Searching, Bentley says:
Knuth's Algorithm P in Section 3.4.2 shuffles the array X[1..N].
for I := 1 to N do
Swap(X[I], X[RandInt(I,N)])
where the RandInt(n,m) function returns a random integer in the range [n..m] (inclusive). That's nothing if not succinct.
The alternative is to have your code thrashing around when there is one value left to update, waiting until the random number generator picks the one value that hasn't been used yet. As a hit and miss process, that can take a while, especially if the number of rows in total is large.
Actually translating that into SQLite is a separate exercise. How big is your table? Is there a convenient unique key on it (other than the one you're randomizing)?
Given that you have a primary key, you can easily generate an array of structures such that each primary key is allocated a number in the range 1..N. You then use Algorithm P to permute the numbers. Then you can update the table from the primary keys with the appropriate randomized number. You might be able to do it all with a second (temporary) table in SQL, especially if SQLite supports UPDATE statements with a join between two tables. But it is probably nearly as simple to use the array to drive singleton updates. You'd probably not want a unique constraint on the random number column while this update is in progress.
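For what it's worth, a sketch of the temporary-table variant in SQLite, assuming SQLite 3.25+ (window functions) and 3.33+ (UPDATE ... FROM), and assuming the table has an INTEGER PRIMARY KEY named id and the target column is MyColumn (both names assumed here); the temporary table freezes one random permutation of 0..N-1 before it is applied:
-- Freeze one random permutation of 0..N-1, keyed by primary key.
CREATE TEMP TABLE shuffled AS
SELECT id AS target_id,
       ROW_NUMBER() OVER (ORDER BY random()) - 1 AS new_value
FROM MyTable;

-- Apply the permutation in a single joined update, then discard the mapping.
UPDATE MyTable
SET MyColumn = shuffled.new_value
FROM shuffled
WHERE shuffled.target_id = MyTable.id;

DROP TABLE shuffled;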
