I'm using SQLite3 and trying to query for recent rows. So I'm having SQLite3 insert a unix timestamp into each row with strftime('%s','now'). My table looks like this:
CREATE TABLE test(id INTEGER PRIMARY KEY, time);
INSERT INTO test (time) VALUES (strftime('%s','now')); --Repeated
SELECT * FROM test;
1|1516816522
2|1516816634
3|1516816646 --etc lots of rows
Now I want to query for only recent entries, for example, I'm trying to get all rows with a time within the last hour. I'm trying the following SQL query:
SELECT * FROM test WHERE time > strftime('%s','now')-60*60;
However, that always returns all rows regardless of the value in the time column. I really don't know what's going on.
Also, if I put WHERE time > strftime('%s','now') it'll return nothing (which is expected) but if I put WHERE time > strftime('%s','now')-1 then it'll return everything. I don't know why.
Here's one more example:
sqlite> SELECT *, strftime('%s','now')-1 AS window FROM test WHERE time > window;
1|1516816522|1516817482
2|1516816634|1516817482
3|1516816646|1516817482
It seems that SQLite3 thinks the values in the middle column are greater than the values in the right column!?
This isn't at all what I expect. Can someone please tell me what's going on? Thanks!
The purpose of strftime() is to format values, so it returns a string.
When you do arithmetic with its return value, the result is a number, but the values stored in your time column are strings (the column has no declared type, so the text produced by strftime() is stored as text). In SQLite, any number compares as less than any string, so every stored string is "greater than" the computed number and every row matches. That is also why WHERE time > strftime('%s','now') returns nothing (string compared with string) while WHERE time > strftime('%s','now')-1 returns everything (string compared with number).
You must ensure that both values in a comparison have the same data type.
The best way to do this is to store numbers in the table:
INSERT INTO test (time)
VALUES (CAST(strftime('%s','now') AS INTEGER));
(Or just declare the column type as something with numeric affinity.)
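For example, a minimal sketch of that second option, reusing the table from the question (time is declared INTEGER, so the text returned by strftime() is converted to a number on insert):
CREATE TABLE test(id INTEGER PRIMARY KEY, time INTEGER);
INSERT INTO test (time) VALUES (strftime('%s','now'));
-- both sides of the comparison are numbers now, so only recent rows are returned:
SELECT * FROM test WHERE time > strftime('%s','now') - 60*60;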
Related
I am trying to write a function in Python to use SQLite, and while I managed to get it to work, there is a behavior in SQLite that I don't understand when using the COUNT command. When I run the following, SQLite counts as expected, i.e. it returns an int.
SELECT COUNT (*) FROM Material WHERE level IN (?) LIMIT 10
However, when I add an offset to the end, as shown below, SQLite returns an empty list, in other words nothing.
SELECT COUNT (*) FROM Material WHERE level IN (?) LIMIT 10 OFFSET 82
While omitting the offset is an easy fix, I don't understand why SQLite returns nothing. Is this the expected behavior for the command I gave?
thanks for reading
When you execute that COUNT(*), it returns only a single row.
The LIMIT clause limits the number of rows returned. You are setting the limit to 10, which has no effect here, because only a single row is returned anyway.
OFFSET skips the specified number of rows before any are returned.
In simple terms, your query translates to: count the rows, then return up to 10 rows of that result, starting from the 83rd position. Since the count produces only a single row, skipping 82 rows always leaves nothing to return.
Read about LIMIT and OFFSET
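And in case the intent was to count only the rows inside that LIMIT/OFFSET window, the LIMIT and OFFSET have to be applied before the COUNT, e.g. in a subquery (a sketch reusing the query from the question):
SELECT COUNT(*)
FROM (SELECT 1 FROM Material WHERE level IN (?) LIMIT 10 OFFSET 82);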
I am new to Snowflake, and running a query to get a couple of days' data - this returns more than 200 million rows and takes a few days. I tried running the same query in Jupyter - and the kernel restarts/dies before the query ends. Even if it got into Jupyter - I doubt I could analyze the data in any reasonable timeline (but maybe using dask?).
I am not really sure where to start - I am trying to check the data for missing values, and my first instinct was to use Jupyter - but I am lost at the moment.
My next idea is to stay within Snowflake - and check the columns there with case statements (e.g. sum(case when column_value = '' then 1 else 0 end) as number_missing_values).
Does anyone have any ideas/direction I could try - or know if I'm doing something wrong?
Thank you!
Not really the answer you are looking for, but:
sum(case when column_value = '' then 1 else 0 end) as number_missing_values
When you say "missing value", this will only find values that are an empty string.
This can also be written in a simpler form as:
count_if(column_value = '') as number_missing_values
The database already knows how many rows are in a table, and it knows how many NULLs there are in each column. If you are loading data into a table, it might make more sense to not load empty strings and use NULL instead; then, for almost no compute cost, you can run:
count(*) - count(column) as number_empty_values
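Put together, a per-column check might look like this (just a sketch; my_table, col_a and col_b are placeholder names):
select count(*)                 as total_rows,
       count_if(col_a = '')     as col_a_empty,
       count(*) - count(col_a)  as col_a_null,
       count_if(col_b = '')     as col_b_empty,
       count(*) - count(col_b)  as col_b_null
from my_table;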
Also of note, if you have two tables in Snowflake you can compare them via MINUS,
aka
select * from table_1
minus
select * from table_2
This is useful to find missing rows; you do have to do it in both directions.
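Spelled out, the two directions look like this (a sketch with the same placeholder table names):
select * from table_1 minus select * from table_2;  -- rows in table_1 that are missing from table_2
select * from table_2 minus select * from table_1;  -- rows in table_2 that are missing from table_1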
Then you can HASH rows, or hash the whole table via HASH_AGG.
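For a quick whole-table comparison, something like this (a sketch) gives one fingerprint per table; if the two values differ, the tables differ somewhere:
select hash_agg(*) from table_1;
select hash_agg(*) from table_2;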
But normally when looking for missing data, you are comparing against an external system, so the driver is 'what can that system handle' and finding common ground.
Also, in the past we were searching for bugs in our processing that caused duplicate data (where we needed/wanted no duplicates), so that is where the above, and COUNT DISTINCT-like commands, come in useful.
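For that duplicate case, comparing COUNT(*) to COUNT(DISTINCT ...) on the key column(s) is a cheap first check (a sketch; id is a placeholder key column):
select count(*) as total_rows,
       count(distinct id) as distinct_keys
from table_1;
-- or list the offending keys directly:
select id, count(*) as copies
from table_1
group by id
having count(*) > 1;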
I'm trying to write a table to an Oracle database using the ROracle package. This works fine; however, all of the numeric values show the full floating-point decimal representation in the database. For instance, 7581.24 shows up as 7581.2399999999998.
Is there a way of specifying the number of digits to be stored after the decimal point when writing the table?
I found a workaround using Allan's solution here, but it would be better not to have to change the variable after writing it to the database.
Currently I write the table with code like this:
dbWriteTable(db_connection, "TABLE_NAME", table, overwrite = TRUE)
Thanks in advance.
It's not elegant, but it may be good practice to make the types and precisions explicit. I did it with something like:
# drop the table if it already exists
if (dbExistsTable(con, "TABLE_NAME")) dbRemoveTable(con, "TABLE_NAME")

# create the table with explicit column types and precision
create_table <- "create table TABLE_NAME(
  ID VARCHAR2(100),
  VALUE NUMBER(6,2)
)"
dbGetQuery(con, create_table)

# insert the data frame via bind parameters, then commit
ins_str <- "insert into TABLE_NAME values(:1, :2)"
dbGetQuery(con, ins_str, df)
dbCommit(con)
Essentially, it creates the table and specifies the type and scale for each column. Then it fills in the values with those from the dataframe (df) in R. You just have to be careful that everything matches up in terms of the columns. If you define a column in Oracle with two decimal places of scale (e.g. VALUE NUMBER(3,2)) and then push a value from R with more decimals, Oracle will round it to the declared scale (2 in this example). It will not truncate it. So df$value = 3.1415 in R would become VALUE = 3.14 in the Oracle table.
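To see that rounding behaviour without going through R, you can cast directly in Oracle SQL (a quick sketch):
SELECT CAST(3.146 AS NUMBER(6,2))              AS rounded_up,    -- 3.15 (rounded, not truncated)
       CAST(7581.2399999999998 AS NUMBER(6,2)) AS cleaned_value  -- 7581.24
FROM dual;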
With this schema:
CREATE TABLE temperatures(
sometext TEXT,
lowtemp INT,
hightemp INT,
moretext TEXT);
When I run this search:
select * from temperatures where lowtemp < 20 and hightemp > 20;
I get the correct result which is always one record (due to the specifics of the data).
Now, when I index the table:
CREATE INDEX ltemps ON temperatures(lowtemp);
CREATE INDEX htemps ON temperatures(hightemp);
The exact same query above stops providing expected results -- now I get many records, including ones where the lowtemp and hightemp obviously don't meet the comparison test.
I'm running this on the same sqlite3 database, same table. The only difference is adding the above 2 index statements after table creation.
Can someone explain how indexing influences this behavior?
I have a column in my sqlite table which is string and has the following format
2011-09-06 18:34:55.863414
You can see that it identifies date and time. I'd like to construct a query that will delete all records that are older than a certain date and time.
Is this possible?
Since your date is already stored in the best format for comparison (largest time unit to smallest), a plain string comparison works:
DELETE FROM myTable
WHERE myDateField < '2011-09-06 18:34:55.863414'
BTW -- dates are stored as strings in SQLite, AFAIK (which is why the format matters -- biggest values to smallest, so it works alphabetically too). If you want to treat them as dates, you can use the built-in date and time functions. Some good examples here: http://sqlite.org/lang_datefunc.html
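For example, to delete everything older than 30 days relative to now, using the column names from above (the 30-day interval is just an example):
DELETE FROM myTable
WHERE myDateField < datetime('now', '-30 days');
-- datetime() returns 'YYYY-MM-DD HH:MM:SS', which sorts consistently with the stored strings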
DELETE FROM tablename WHERE columnname < '2011-09-06 18:34:55.863414'
See:
http://www.sqlite.org/lang_datefunc.html