VALUES clause in SQLAlchemy without column names (to be compatible with SQLite)

I have seen the answers to "VALUES clause in SQLAlchemy", but none of them is satisfactory. Basically, SQLAlchemy forces you to give each column a name, building the query as
SELECT * FROM (VALUES (1, 2, 3)) AS sq (colname1, colname2, colname3);
instead of relying on the default names "column1, column2, ..." that you get when you omit (colname1, colname2, colname3). The problem is that specifying the column names this way is not compatible with SQLite. Do you know any way around that? I am thinking of using a raw text query. The problem with that is that my full query is
SELECT pairs.column1 AS element_id,
pairs.column2 as variant_id,
products_elements.name as element_name,
elements_variants.name as variant_name
FROM (
VALUES (1, 2),
(2, 2),
(3, 1)
) AS pairs
JOIN (products_elements, elements_variants) ON (
products_elements.id = pairs.column1
AND elements_variants.id = pairs.column2
);
and I don't know how to embed the values. Thanks

If you want a raw query, you can name the columns with a CTE:
WITH pairs(colname1, colname2) AS (VALUES (1, 2), (2, 2), (3, 1))
SELECT pairs.colname1 AS element_id,
pairs.colname2 AS variant_id,
products_elements.name AS element_name,
elements_variants.name AS variant_name
FROM pairs
JOIN products_elements ON products_elements.id = pairs.colname1
JOIN elements_variants ON elements_variants.id = pairs.colname2;
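To embed the values from Python, one option is to generate one "(?, ?)" placeholder group per pair and bind the numbers as parameters. The following is only a sketch under stated assumptions: it uses the standard-library sqlite3 module rather than SQLAlchemy, and the helper name fetch_pairs and the sample rows are made up for illustration.

```python
import sqlite3


def fetch_pairs(conn, pairs):
    """Join a non-empty list of (element_id, variant_id) pairs against both tables."""
    # One "(?, ?)" group per pair; only placeholders are interpolated, never data.
    placeholders = ", ".join("(?, ?)" for _ in pairs)
    sql = f"""
        WITH pairs(colname1, colname2) AS (VALUES {placeholders})
        SELECT pairs.colname1 AS element_id,
               pairs.colname2 AS variant_id,
               products_elements.name AS element_name,
               elements_variants.name AS variant_name
        FROM pairs
        JOIN products_elements ON products_elements.id = pairs.colname1
        JOIN elements_variants ON elements_variants.id = pairs.colname2
        ORDER BY element_id, variant_id
    """
    # Flatten [(1, 2), (2, 2)] into [1, 2, 2, 2] to match the placeholders.
    params = [value for pair in pairs for value in pair]
    return conn.execute(sql, params).fetchall()


# Illustrative in-memory data (hypothetical, not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products_elements (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE elements_variants (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO products_elements VALUES (1, 'door'), (2, 'window'), (3, 'roof');
    INSERT INTO elements_variants VALUES (1, 'wood'), (2, 'steel');
""")
rows = fetch_pairs(conn, [(1, 2), (2, 2), (3, 1)])
```

Because SQLite caps the number of bound parameters per statement, a very long list of pairs would probably need to be split into batches or loaded into a temporary table instead.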

Is the order of search for SQLite's "IN" operator guaranteed?

I'm performing an Sqlite3 query similar to
SELECT * FROM nodes WHERE name IN ('name1', 'name2', 'name3', ...) LIMIT 1
Am I guaranteed that it will search for name1 first, name2 second, etc? Such that by limiting my output to 1 I know that I found the first hit according to my ordering of items in the IN clause?
Update: with some testing it seems to always return the first hit in the index regardless of the IN order. It's using the order of the index on name. Is there some way to enforce the search order?
The order of the returned rows is not guaranteed to match the order of the items inside the parentheses after IN.
What you can do is add an ORDER BY to your statement that uses the function INSTR():
SELECT * FROM nodes
WHERE name IN ('name1', 'name2', 'name3')
ORDER BY INSTR(',name1,name2,name3,', ',' || name || ',')
LIMIT 1
This code passes the same list as the IN clause as a single string, with the items in the same order and separated by commas. It assumes that the items themselves do not contain commas.
This way the results are ordered by their position in the list, and LIMIT 1 returns the one closest to the start of the list.
Another way to achieve the same result is a CTE that returns the list along with an id defining the desired order, joined to the table:
WITH list(id, item) AS (
SELECT 1, 'name1' UNION ALL
SELECT 2, 'name2' UNION ALL
SELECT 3, 'name3'
)
SELECT n.*
FROM nodes n INNER JOIN list l
ON l.item = n.name
ORDER BY l.id
LIMIT 1
Or:
WITH list(id, item) AS (
SELECT * FROM (VALUES
(1, 'name1'), (2, 'name2'), (3, 'name3')
)
)
SELECT n.*
FROM nodes n INNER JOIN list l
ON l.item = n.name
ORDER BY l.id
LIMIT 1
This way you don't have to repeat the list twice.
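When the list comes from application code, the numbered CTE can be generated rather than typed by hand. This is a rough sketch using Python's sqlite3 module; the helper name first_match and the sample nodes table (including its payload column) are assumptions for illustration.

```python
import sqlite3


def first_match(conn, names):
    """Return the row whose name appears earliest in `names`, or None."""
    # One "(?, ?)" group per name: (position, name).
    placeholders = ", ".join("(?, ?)" for _ in names)
    sql = f"""
        WITH list(id, item) AS (VALUES {placeholders})
        SELECT n.*
        FROM nodes n INNER JOIN list l ON l.item = n.name
        ORDER BY l.id
        LIMIT 1
    """
    # enumerate() numbers the names 1..n, so the position becomes the sort key.
    params = [value for pair in enumerate(names, start=1) for value in pair]
    return conn.execute(sql, params).fetchone()


# Hypothetical sample table: 'name2' is deliberately missing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (name TEXT PRIMARY KEY, payload TEXT);
    INSERT INTO nodes VALUES ('name1', 'a'), ('name3', 'c');
""")
```

With this data, first_match(conn, ['name3', 'name1']) picks the 'name3' row, because the ordering follows the list and not the index on name.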

What is the Sqlite analogue for table literals as in Postgresql or Sybase?

What is the Sqlite analogue for table literals as in Postgresql or Sybase?
select * from (values (1, 'a'), (2, 'b')) as t(x,y);
Such table literals are supported since version 3.8.3, as part of the common table expression support.
To specify column names, you must use an actual common table expression:
WITH t(x, y) AS (
VALUES (1, 'a'), (2, 'b')
)
SELECT * FROM t;
It's the same in SQLite (tested with 3.14). However the alias specifying the column names is not supported.
So this works:
select *
from (values (1, 'a'), (2, 'b')) as t;
I don't know how to specify the alias for the columns though.
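Both forms can be checked quickly from Python. The snippet below is a small sketch with the standard sqlite3 module, assuming the bundled SQLite is at least 3.8.3; it runs the plain table literal and then the CTE form that names the columns x and y.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Plain table literal in a subquery, without a column alias list.
unnamed = conn.execute(
    "SELECT * FROM (VALUES (1, 'a'), (2, 'b')) AS t"
).fetchall()

# Same literal, with the column names supplied through a CTE.
cur = conn.execute(
    "WITH t(x, y) AS (VALUES (1, 'a'), (2, 'b')) SELECT x, y FROM t ORDER BY x"
)
named_columns = [d[0] for d in cur.description]  # names reported by the cursor
named = cur.fetchall()
```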

Undesired flattening occurring

I'm using BigQuery on exported GA data (see schema here)
Looking at the documentation, I see that when I select a field that is inside a record, it will automatically flatten that record and duplicate the surrounding columns.
So I tried to create a denormalized table that I could query with a more SQL-like mindset:
SELECT
CONCAT( date, " ", if (hits.hour < 10,
CONCAT("0", STRING(hits.hour)),
STRING(hits.hour)), ":", IF(hits.minute < 10, CONCAT("0", STRING(hits.minute)), STRING(hits.minute)) ) AS hits.date__STRING,
CONCAT(fullVisitorId, STRING(visitId)) AS session_id__STRING,
fullVisitorId AS google_identity__STRING,
MAX(IF(hits.customDimensions.index=7, hits.customDimensions.value,NULL)) WITHIN RECORD AS customer_id__LONG,
hits.hitNumber AS hit_number__INT,
hits.type AS hit_type__STRING,
hits.isInteraction AS hit_is_interaction__BOOLEAN,
hits.isEntrance AS hit_is_entrance__BOOLEAN,
hits.isExit AS hit_is_exit__BOOLEAN,
hits.promotion.promoId AS promotion_id__STRING,
hits.promotion.promoName AS promotion_name__STRING,
hits.promotion.promoCreative AS promotion_creative__STRING,
hits.promotion.promoPosition AS promotion_position__STRING,
hits.eventInfo.eventCategory AS event_category__STRING,
hits.eventInfo.eventAction AS event_action__STRING,
hits.eventInfo.eventLabel AS event_label__STRING,
hits.eventInfo.eventValue AS event_value__INT,
device.language AS device_language__STRING,
device.screenResolution AS device_resolution__STRING,
device.deviceCategory AS device_category__STRING,
device.operatingSystem AS device_os__STRING,
geoNetwork.country AS geo_country__STRING,
geoNetwork.region AS geo_region__STRING,
hits.page.searchKeyword AS hit_search_keyword__STRING,
hits.page.searchCategory AS hits_search_category__STRING,
hits.page.pageTitle AS hits_page_title__STRING,
hits.page.pagePath AS page_path__STRING,
hits.page.hostname AS page_hostname__STRING,
hits.eCommerceAction.action_type AS commerce_action_type__INT,
hits.eCommerceAction.step AS commerce_action_step__INT,
hits.eCommerceAction.option AS commerce_action_option__STRING,
hits.product.productSKU AS product_sku__STRING,
hits.product.v2ProductName AS product_name__STRING,
hits.product.productRevenue AS product_revenue__INT,
hits.product.productPrice AS product_price__INT,
hits.product.productQuantity AS product_quantity__INT,
hits.product.productRefundAmount AS hits.product.product_refund_amount__INT,
hits.product.v2ProductCategory AS product_category__STRING,
hits.transaction.transactionId AS transaction_id__STRING,
hits.transaction.transactionCoupon AS transaction_coupon__STRING,
hits.transaction.transactionRevenue AS transaction_revenue__INT,
hits.transaction.transactionTax AS transaction_tax__INT,
hits.transaction.transactionShipping AS transaction_shipping__INT,
hits.transaction.affiliation AS transaction_affiliation__STRING,
hits.appInfo.screenName AS app_current_name__STRING,
hits.appInfo.screenDepth AS app_screen_depth__INT,
hits.appInfo.landingScreenName AS app_landing_screen__STRING,
hits.appInfo.exitScreenName AS app_exit_screen__STRING,
hits.exceptionInfo.description AS exception_description__STRING,
hits.exceptionInfo.isFatal AS exception_is_fatal__BOOLEAN
FROM
[98513938.ga_sessions_20151112]
HAVING
customer_id__LONG IS NOT NULL
AND customer_id__LONG != 'NA'
AND customer_id__LONG != ''
I wrote the result of this table into another table denorm (flatten on, large data set on).
I get different results when I query denorm with the clause
WHERE session_id_STRING = "100001897901013346771447300813"
versus wrapping the above query as follows (which yields the desired results):
SELECT * FROM (_above query_) as foo where session_id_STRING = 100001897901013346771447300813
I'm sure this is by design, but could someone explain the difference between these two methods?
I believe you are saying that you did check the box "Flatten Results" when you created the output table? And I assume from your question that session_id_STRING is a repeated field?
If those are correct assumptions, then what you are seeing is exactly the behavior you referenced from the documentation above. You asked BigQuery to "flatten results" so it turned your repeated field into an un-repeated field and duplicated all the fields around it so that you have a flat (i.e., no repeated data) table.
If the desired behavior is the one you see when querying over the subquery, then you should uncheck that box when creating your table.
"Looking at the documentation, I see that when I select a field that is inside a record, it will automatically flatten that record and duplicate the surrounding columns."
This is not correct. BTW, can you please point to the documentation - it needs to be improved.
Selecting a field does not flatten that record. So if you have a table T with a single record {a = 1, b = (2, 2, 3)}, then do
SELECT * FROM T WHERE b = 2
You still get a single record {a = 1, b = (2, 2)}. SELECT COUNT(a) from this subquery would return 1.
But once you write results of this query with flatten=on, you get two records: {a = 1, b = 2}, {a = 1, b = 2}. SELECT COUNT(a) from the flattened table would return 2.
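The difference between the two results can be modelled in a few lines of plain Python. This is only an illustration of the semantics described above, not BigQuery code; the function names filter_repeated and flatten are made up, and the record is the {a = 1, b = (2, 2, 3)} example.

```python
def filter_repeated(record, value):
    """Model of WHERE b = value: drop non-matching repeated values, keep one record."""
    return {"a": record["a"], "b": [x for x in record["b"] if x == value]}


def flatten(record):
    """Model of flatten=on: one flat row per repeated value, with `a` duplicated."""
    return [{"a": record["a"], "b": x} for x in record["b"]]


t = {"a": 1, "b": [2, 2, 3]}

nested = filter_repeated(t, 2)   # still a single record
flat = flatten(nested)           # two rows after flattening

count_nested = len([nested])     # COUNT(a) over the unflattened result
count_flat = len(flat)           # COUNT(a) over the flattened table
```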

SQLite group-by behaviour

If a column in the SELECT clause is omitted from the GROUP BY clause, does SQLite group by the remaining columns (by default), and then return the value of the omitted column in the first row it evaluates?
For example, finding the TransactionId associated with the highest value per ProductId:
CREATE TABLE IF NOT EXISTS ProductTransaction
(
Id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
ProductId INTEGER NOT NULL,
TransactionType INTEGER NOT NULL,
Value INTEGER NOT NULL
);
INSERT INTO ProductTransaction (ProductId, TransactionType, Value)
VALUES (1, 7, 23), (1, 3, 12), (2, 4, 43), (1, 7, 5), (1, 10, 23),
(3, 3, 23), (3, 2, 31), (1, 1, 23), (2, 5, 50), (2, 6, 14), (1, 4, 23);
SELECT ProductId
, TransactionType
, MAX(Value)
FROM ProductTransaction
GROUP BY ProductId;
DELETE FROM ProductTransaction;
Running the previous statements gives me the TransactionType of 7 for ProductId 1 (Highest value 23).
However, if I add an index:
CREATE INDEX IF NOT EXISTS IDX_TransType ON ProductTransaction(ProductId ASC, TransactionType ASC);
It returns the TransactionType 1, presumably because it's now ordering the rows according to the index. Modifying the index supports this theory:
CREATE INDEX IF NOT EXISTS IDX_TransType ON ProductTransaction(ProductId ASC, TransactionType DESC);
It will now return TransactionType 10 for ProductId 1.
Is this behaviour by design, or is it just an unreliable side-effect?
EDIT: It seems that it's an unreliable side-effect. From the documentation:
Each expression in the result-set is then evaluated once for each
group of rows. If the expression is an aggregate expression, it is
evaluated across all rows in the group. Otherwise, it is evaluated
against a single arbitrarily chosen row from within the group. If
there is more than one non-aggregate expression in the result-set,
then all such expressions are evaluated for the same row.
https://www.sqlite.org/lang_select.html#resultset
Since SQLite 3.7.11, using MAX() or MIN() will force any non-aggregated columns to come from the same row that matches the MAX()/MIN().
However, when there are multiple rows with the same largest/smallest value, it is still unspecified from which of those rows the other columns' values come. (SQLite's behaviour is consistent in this regard, but can change in different versions or with different database schemas.)
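To make the tie case concrete, here is a sketch in Python's sqlite3 module using the question's data. The bare-column query is the one from the question. The second query is one possible deterministic variant, and its tie-break rule (lowest Id wins) is an assumption chosen for illustration, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductTransaction (
        Id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
        ProductId INTEGER NOT NULL,
        TransactionType INTEGER NOT NULL,
        Value INTEGER NOT NULL
    );
    INSERT INTO ProductTransaction (ProductId, TransactionType, Value)
    VALUES (1, 7, 23), (1, 3, 12), (2, 4, 43), (1, 7, 5), (1, 10, 23),
           (3, 3, 23), (3, 2, 31), (1, 1, 23), (2, 5, 50), (2, 6, 14), (1, 4, 23);
""")

# Bare column next to MAX(): taken from some row that holds the maximum.
bare = {
    product: (ttype, value)
    for product, ttype, value in conn.execute(
        "SELECT ProductId, TransactionType, MAX(Value) "
        "FROM ProductTransaction GROUP BY ProductId"
    )
}

# Deterministic variant: break ties explicitly (assumed rule: lowest Id wins).
deterministic = conn.execute("""
    SELECT ProductId, TransactionType, Value
    FROM ProductTransaction AS p
    WHERE Id = (
        SELECT Id FROM ProductTransaction
        WHERE ProductId = p.ProductId
        ORDER BY Value DESC, Id ASC
        LIMIT 1
    )
    ORDER BY ProductId
""").fetchall()
```

For ProductId 2 and 3 the maximum is unique, so both queries agree. For ProductId 1 four rows share the value 23, so only the second query pins down which TransactionType is returned.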

SQLite Insert and Replace with condition

I cannot figure out how to write this query in SQLite.
What I need:
1) Replace the record (matched by primary key) if a condition holds (a comparison of the new and old field values)
2) Insert the entry if no entry with that primary key exists in the database.
Importantly, it has to work very fast!
I cannot come up with an efficient query.
Edit.
MyInsertRequest - the desired expression.
Script:
CREATE TABLE testtable (a INT PRIMARY KEY, b INT, c INT)
INSERT INTO testtable VALUES (1, 2, 3)
select * from testtable
1|2|3
-- Adds an entry, because the primary key does not exist yet
++ MyInsertRequest VALUES (2, 2, 3) {if c>4 then replace}
select * from testtable
1|2|3
2|2|3
-- Adds
++ MyInsertRequest VALUES (3, 8, 3) {if c>4 then replace}
select * from testtable
1|2|3
2|2|3
3|8|3
-- Does nothing, because such a record (from primary key field 'a')
-- is in the database and none c>4
++ MyInsertRequest VALUES (1, 2, 3) {if c>4 then replace}
select * from testtable
1|2|3
2|2|3
3|8|3
-- Does nothing
++ MyInsertRequest VALUES (3, 34, 3) {if c>4 then replace}
select * from testtable
1|2|3
2|2|3
3|8|3
-- replace, because such a record (from primary key field 'a')
-- is in the database and c>2
++ MyInsertRequest VALUES (3, 34, 1) {if c>2 then replace}
select * from testtable
1|2|3
2|2|3
3|34|1
Isn't INSERT OR REPLACE what you need? For example:
INSERT OR REPLACE INTO table (cola, colb) values (valuea, valueb)
When a UNIQUE constraint violation occurs, the REPLACE algorithm
deletes pre-existing rows that are causing the constraint violation
prior to inserting or updating the current row and the command
continues executing normally.
You have to put the condition in a unique constraint on the table. It will automatically create an index to make the check efficient.
e.g.
-- here the condition is on columnA, columnB
CREATE TABLE sometable (columnPK INT PRIMARY KEY,
columnA INT,
columnB INT,
columnC INT,
CONSTRAINT constname UNIQUE (columnA, columnB)
);
INSERT INTO sometable VALUES (1, 1, 1, 0);
INSERT INTO sometable VALUES (2, 1, 2, 0);
select * from sometable
1|1|1|0
2|1|2|0
-- insert a line with a new PK, but with existing values for (columnA, columnB)
-- the line with PK 2 will be replaced
INSERT OR REPLACE INTO sometable VALUES (12, 1, 2, 6)
select * from sometable
1|1|1|0
12|1|2|6
Assuming your requirements are:
Insert a new row when a doesn't exist;
Replace the row when a exists and the existing c is greater than the new c;
Do nothing when a exists and the existing c is less than or equal to the new c;
INSERT OR REPLACE covers the first two requirements.
For the last requirement, the only way I know to make an INSERT a no-op is to supply an empty rowset.
A SQLite command like the following would do the job:
INSERT OR REPLACE INTO sometable SELECT newdata.* FROM
(SELECT 3 AS a, 2 AS b, 1 AS c) AS newdata
LEFT JOIN sometable ON newdata.a=sometable.a
WHERE newdata.c<sometable.c OR sometable.a IS NULL;
The new data (3,2,1 in this example) is LEFT JOINed with the current table data.
Then WHERE "de-selects" the row when the new c is not less than the existing c, and keeps it when the row is new, i.e. when sometable.a IS NULL.
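The LEFT JOIN approach can be replayed against the question's testtable from Python. This is a sketch with the standard sqlite3 module, assuming the rule stated above (replace only when the new c is smaller than the stored c); the helper name insert_or_replace_if_smaller is hypothetical.

```python
import sqlite3

# Same statement as above, with the new row bound as parameters.
CONDITIONAL_REPLACE = """
    INSERT OR REPLACE INTO testtable
    SELECT newdata.* FROM
        (SELECT ? AS a, ? AS b, ? AS c) AS newdata
        LEFT JOIN testtable ON newdata.a = testtable.a
    WHERE newdata.c < testtable.c OR testtable.a IS NULL
"""


def insert_or_replace_if_smaller(conn, a, b, c):
    """Insert (a, b, c), or replace the stored row only if the new c is smaller."""
    conn.execute(CONDITIONAL_REPLACE, (a, b, c))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (a INT PRIMARY KEY, b INT, c INT)")
conn.execute("INSERT INTO testtable VALUES (1, 2, 3)")

insert_or_replace_if_smaller(conn, 2, 2, 3)    # new key: inserted
insert_or_replace_if_smaller(conn, 3, 8, 3)    # new key: inserted
insert_or_replace_if_smaller(conn, 1, 2, 3)    # key exists, c not smaller: no-op
insert_or_replace_if_smaller(conn, 3, 34, 3)   # key exists, c not smaller: no-op
insert_or_replace_if_smaller(conn, 3, 34, 1)   # key exists, new c smaller: replaced

final_rows = conn.execute("SELECT * FROM testtable ORDER BY a").fetchall()
```

When the condition fails, the SELECT yields no rows, so nothing is written to the table at all.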
I tried the other answers because I was also looking for a solution to this problem.
This should work, however I am unsure about the performance implications. I believe that you may need the first column to be unique as a primary key else it will simply insert a new record each time.
INSERT OR REPLACE INTO sometable
SELECT columnA, columnB, columnC FROM (
SELECT columnA, columnB, columnC, 1 AS tmp FROM sometable
WHERE sometable.columnA = 1 AND
sometable.columnB > 9
UNION
SELECT 1 AS columnA, 1 As columnB, 404 as columnC, 0 AS tmp)
ORDER BY tmp DESC
LIMIT 1
In this case one dummy query is executed and UNIONed onto a second query, so the performance impact depends on how that second query is written and how the table is indexed. Ordering and limiting the results is another potential cost. However, I expect the second query to return only one record, so it should not be much of a performance hit.
You can also omit the ORDER BY tmp DESC LIMIT 1 and it works with my version of SQLite, but it may hurt performance since it can end up writing the record twice (the original value, then the new value if applicable).
The other problem is that you end up with a write to the table even if the condition says the row should not be updated.
