Let us say I have this contrived/simplified sql:
SELECT
Column1/Column2
FROM SomeTable;
where Column2 could be 0, resulting in a divide-by-zero error and thus the whole query falling over. I could do something like this:
SELECT
Column1/Column2
FROM SomeTable
where Column2 <> 0;
or use COALESCE, a CASE statement, etc. The challenge I have is that the 'formula', e.g. Column1/Column2, can be dynamic (there is no danger of SQL injection).
Ideally, I would like to wrap the 'computed column's' calculation in some kind of try/catch and return -9999 for any failed calculation, and otherwise the result of the calculation. I hope this makes sense. If so, can this be achieved somehow?
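For the division case specifically, one way to approximate that try/catch behaviour is the COALESCE route mentioned above, via NULLIF (a minimal sketch in standard SQL; the -9999 sentinel is the one from the question):
SELECT
COALESCE(Column1 / NULLIF(Column2, 0), -9999) AS computed_column
FROM SomeTable;
NULLIF turns a zero divisor into NULL, the division then yields NULL, and COALESCE maps that to -9999. Note this only guards the divide-by-zero case, not arbitrary calculation failures.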
I am new to Snowflake, and running a query to get a couple of days' data - this returns more than 200 million rows and takes a few days. I tried running the same query in Jupyter - and the kernel restarts/dies before the query ends. Even if I got it into Jupyter, I doubt I could analyze the data in any reasonable timeframe (but maybe using dask?).
I am not really sure where to start - I am trying to check the data for missing values, and my first instinct was to use Jupyter - but I am lost at the moment.
My next idea is to stay within Snowflake and check the columns there with CASE statements (e.g. sum(case when column_value = '' then 1 else 0 end) as number_missing_values).
Does anyone have any ideas/direction I could try - or know if I'm doing something wrong?
Thank you!
Not really the answer you are looking for, but:
sum(case when column_value = '' then 1 else 0 end) as number_missing_values
When you say missing value, note this will only find values that are an empty string.
It can also be written in a simpler form as:
count_if(column_value = '') as number_missing_values
The database already knows how many rows are in a table, and it knows how many NULL values each column contains. If you are loading data into a table, it might make more sense to not load empty strings and use NULL instead; then, for no compute cost, you can run:
count(*) - count(column) as number_empty_values
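Putting those together, a sketch of a per-column check run entirely inside Snowflake (my_table and my_column are placeholders):
select
count(*) - count(my_column) as number_null_values,
count_if(my_column = '') as number_empty_strings
from my_table;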
Also of note, if you have two tables in Snowflake you can compare them via MINUS, aka:
select * from table_1
minus
select * from table_2
This is useful for finding missing rows; you do have to do it in both directions.
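That is, you also run the reverse:
select * from table_2
minus
select * from table_1;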
Then you can HASH rows, or hash the whole table via HASH_AGG
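For example, a quick whole-table comparison (a sketch, assuming the two tables share the same schema):
select hash_agg(*) from table_1;
select hash_agg(*) from table_2;
-- if the two results are equal, the table contents match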
But normally when looking for missing data, you have an external system, so the driver is 'what can that system handle' and finding common ground.
Also, in the past we were searching for bugs in our processing that caused duplicate data (where we needed/wanted no duplicates), and that is where the above, plus COUNT DISTINCT-like commands, come in useful.
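A sketch of that duplicate check (key_col is a placeholder for whatever should be unique):
select key_col, count(*) as copies
from my_table
group by key_col
having count(*) > 1;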
I'm using SQLite3 and trying to query for recent rows. So I'm having SQLite3 insert a Unix timestamp into each row with strftime('%s','now'). My table looks like this:
CREATE TABLE test(id INTEGER PRIMARY KEY, time);
INSERT INTO test (time) VALUES (strftime('%s','now')); --Repeated
SELECT * FROM test;
1|1516816522
2|1516816634
3|1516816646 --etc lots of rows
Now I want to query for only recent entries, for example, I'm trying to get all rows with a time within the last hour. I'm trying the following SQL query:
SELECT * FROM test WHERE time > strftime('%s','now')-60*60;
However, that always returns all rows regardless of the value in the time column. I really don't know what's going on.
Also, if I put WHERE time > strftime('%s','now'), it'll return nothing (which is expected), but if I put WHERE time > strftime('%s','now')-1, then it'll return everything. I don't know why.
Here's one more example:
sqlite> SELECT *, strftime('%s','now')-1 AS window FROM test WHERE time > window;
1|1516816522|1516817482
2|1516816634|1516817482
3|1516816646|1516817482
It seems that SQLite3 thinks the values in the middle column are greater than the values in the right column!?
This isn't at all what I expect. Can someone please tell me what's going on? Thanks!
The purpose of strftime() is to format values, so it returns a string. Because your time column was declared without a type, those strings are also stored as strings.
When you do arithmetic with strftime()'s return value, the database converts that result into a number, and under SQLite's cross-type comparison rules any text value sorts after any numeric value, so time > <some number> is always true regardless of the digits. (Comparing against the bare strftime('%s','now') happens to behave as expected only because both sides are then digit strings of the same length, so the text comparison matches the numeric one.)
You must ensure that both values in a comparison have the same data type.
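You can see the ordering directly (the left-hand literal is one of the question's stored values; any TEXT value sorts after any numeric value):
sqlite> SELECT '1516816522' > 99999999999;
1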
The best way to do this is to store numbers in the table:
INSERT INTO test (time)
VALUES (CAST(strftime('%s','now') AS INTEGER));
(Or just declare the column type as something with numeric affinity.)
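For example, a sketch of the same table with the column declared as INTEGER, so the text that strftime() returns is converted to a number on insert:
CREATE TABLE test(id INTEGER PRIMARY KEY, time INTEGER);
INSERT INTO test (time) VALUES (strftime('%s','now'));
SELECT * FROM test WHERE time > strftime('%s','now') - 60*60;
With numeric values stored in time, the last query now returns only rows from the past hour, as intended.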
I'd like to have a Calculated Column in a table that counts the instances of a concatenation.
I get the following error when inputting Abs(Count([concat])) as the column formula for the calculation: The expression Abs(Count([concat])) cannot be used in a calculated column.
Is there any other way to do it without doing a query? I'm pretty sure it can't be done but I figured I'd ask anyways since I didn't see any other posts about it.
No, and even if there was, you should create and use a query for this.
Besides, applying Abs on a count doesn't make much sense, as the count cannot be negative.
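If the goal is just the count per concatenation, a plain aggregate query does it; a sketch (YourTable is a placeholder for the actual table name):
SELECT [concat], Count(*) AS instance_count
FROM YourTable
GROUP BY [concat];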
So I know enough SQL to be really dangerous (I don't normally work on the back-end) but cannot get the following view to be created successfully ;) The result set I'm after is a data set that has rows assigned as a column alias from multiple tables (instead of a 1xN flat list of all columns). There is a many-to-one relationship when looking at the main table, based on foreign keys associated with the row id of the appropriate related table.
Ideally I'd like a data set that looks like this in the return:
dataset.transaction_row[n]: col1, col2, col3, coln... (columns from the transaction table)
dataset.category_row[n]: col1, co2, col3, coln... (columns from the category table)
and so on...
I get the following error:
Query Error: near "AS": syntax error Unable to execute statement
From:
CREATE VIEW view_unreconciled_transactions
AS SELECT account_transaction.* AS transaction_row,
category.* AS category_row,
memorized.name_rule_replace OR account_transaction.name AS payee
FROM account_transaction
LEFT JOIN memorized ON account_transaction.memorized_key = memorized.id
LEFT JOIN category ON account_transaction.category_key = category.id
WHERE status != 2
ORDER BY account_transaction.dt_posted DESC
It seems easy enough, since the result-column selector is repeatable and includes expressions (referencing SQLite's syntax diagrams). In reference to the error, I'm assuming it's complaining about the 2nd 'AS', where I'm trying to get table.* assigned as an alias. Any help in the right direction is appreciated. If I had to, I suppose I could explicitly state all columns, but that feels like a kludge.
The AS modifier can only be applied to a single column or expression, not to a collection such as the * you used. You will have to break them out into specific names (which is best practice IMHO anyway).
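A sketch of what that could look like (only a few columns shown; column names other than those appearing in the question are assumptions, and IFNULL replaces the OR, since OR in SQLite is a logical operator and likely not the intended payee fallback):
CREATE VIEW view_unreconciled_transactions AS
SELECT account_transaction.id AS transaction_id,
account_transaction.name AS transaction_name,
account_transaction.dt_posted AS transaction_dt_posted,
category.id AS category_id,
category.name AS category_name,
IFNULL(memorized.name_rule_replace, account_transaction.name) AS payee
FROM account_transaction
LEFT JOIN memorized ON account_transaction.memorized_key = memorized.id
LEFT JOIN category ON account_transaction.category_key = category.id
WHERE account_transaction.status != 2
ORDER BY account_transaction.dt_posted DESC;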
It looks like you want to make a "pivot table". They can be tricky to make in a database. I can say that if you have a data result where each row comes from a different table source, and the columns from each table are IDENTICAL, then you could try using a UNION statement to join the different results together as if they were just one dataset.
NOTE that the columns all take their naming cue from the first dataset in a UNION, and the datatypes all need to be the same.
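A generic sketch of that pattern (the table and column names are placeholders; each SELECT must produce the same number of columns with matching types):
SELECT col1, col2 FROM table_a
UNION
SELECT col1, col2 FROM table_b;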
I have two tables: one contains a list of items, called watch_list, with some important attributes, and the other is just a list of prices, called price_history. What I would like to do is group together 10 of the lowest prices into a single column with a group_concat operation, and then create a row with item attributes from watch_list along with the 10 lowest prices for each item in watch_list.
First I tried joins, but then I realized that the operations were happening in the wrong order, so there was no way I could get the desired result with a join operation. Then I tried the obvious thing and just queried price_history for every row in watch_list and glued everything together in the host environment, which worked but seemed very inefficient.
Now I have the following query, which looks like it should work but is not giving me the results that I want. I would like to know what is wrong with the following statement:
select w.asin,w.title,
(select group_concat(lowest_used_price) from price_history as p
where p.asin=w.asin limit 10)
as lowest_used
from watch_list as w
Basically I want the limit operation to happen before group_concat does anything but I can't think of a sql statement that will do that.
Figured it out. As somebody once said, "All problems in computer science can be solved by another level of indirection," and in this case an extra select subquery did the trick:
select w.asin, w.title,
    (select group_concat(lowest_used_price)
     from (select lowest_used_price from price_history as p
           where p.asin = w.asin
           order by lowest_used_price
           limit 10)) as lowest_used
from watch_list as w
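Note that the order by inside the inner subquery matters: without it, limit 10 returns an arbitrary ten prices rather than the ten lowest.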