Sqlite UNION ALL Query with Different Columns - sqlite

Using UNION ALL with queries that have a different number of columns returns the following error: sqlite3.OperationalError: SELECTs to the left and right of UNION ALL do not have the same number of result columns.
I tried this answer, but I think it is now outdated and does not work. I tried to find something in the documentation but couldn't.
Neither UNION nor UNION ALL works.
This answer is a bit complex for me to understand.
What would be the workaround to achieve this? A column with NULL: how do I do that?
Update:
Also, I don't know the number or names of the tables, because my program lets the user create and manipulate data.
To find out the tables in the database, I use:
SELECT name FROM sqlite_master WHERE type='table';
and to find out the columns I use this:
[i[0] for i in cursor.description]

Simply put, each side of a union must return the same number of columns (most other databases also require matching types; SQLite only checks the column count).
You can make them identical by casting columns to the correct type, or setting missing columns to NULL.

I think you just need to compensate for the missing columns by adding 'empty' columns.
CREATE TABLE test_table1(k INTEGER, v INTEGER);
CREATE TABLE test_table2(k INTEGER);
INSERT INTO test_table1(k,v) VALUES(4, 5);
INSERT INTO test_table2(k) VALUES(4);
SELECT * FROM test_table1 UNION ALL SELECT *,'N/A' FROM test_table2;
4|5
4|N/A
Here I've added another pseudo-column with 'N/A', so the test_table2 has two columns before the UNION ALL happens.
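If a visible placeholder such as 'N/A' is not wanted, NULL can be used in the same position. Since the question says the table names and columns are not known in advance, the padded query could also be built at runtime. The following is only a sketch using Python's sqlite3 module: the helper name build_padded_union is hypothetical, and it assumes the table names come from sqlite_master and are therefore safe to quote into the statement.

```python
import sqlite3

def build_padded_union(conn, tables):
    """Build a UNION ALL over tables, padding narrower tables with NULL.

    Hypothetical helper for illustration; assumes trusted table names.
    """
    columns = {}
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns[table] = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    width = max(len(cols) for cols in columns.values())
    selects = []
    for table in tables:
        cols = [f'"{c}"' for c in columns[table]]
        cols += ["NULL"] * (width - len(cols))  # fill the missing columns
        selects.append(f'SELECT {", ".join(cols)} FROM "{table}"')
    return " UNION ALL ".join(selects)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_table1(k INTEGER, v INTEGER);
CREATE TABLE test_table2(k INTEGER);
INSERT INTO test_table1(k,v) VALUES(4, 5);
INSERT INTO test_table2(k) VALUES(4);
""")
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
query = build_padded_union(conn, tables)
rows = conn.execute(query).fetchall()
```

With the two test tables above, the row from test_table2 comes back with None (NULL) in the second position instead of 'N/A'.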

Related

SQLITE select unique rows

I have a table where rows appear to be "duplicates" but they are actually not (they have different date).
Suppose each record has a column A that is supposed to be unique. However, because a given value of A may appear again later with updated information (with column A unchanged), it is no longer unique even though it should be.
Therefore I want the table with latest information only. Currently this table contains 500k entries, however the "true" number of unique entries is less than half of it.
I have tried
SELECT *
FROM TABLE
WHERE A = A
AND Date = (SELECT MAX(Date) from TABLE)
ORDER BY DATE
However this only returns 2 results. How do I achieve that?
The subquery on the date is the correct idea, but you must include the column A in the subquery and relate it back to the main table. I prefer to use explicit joins rather than embedding the subquery in the WHERE clause. This is usually more efficient anyway.
SELECT TABLE.*
FROM TABLE INNER JOIN
(SELECT A, MAX(Date) AS MaxDate FROM TABLE GROUP BY A) AS latest
ON TABLE.A = latest.A AND TABLE.date = latest.MaxDate
ORDER BY A, date
Or even better, I prefer CTE (Common Table Expression) syntax, since it makes the individual queries easier to read:
WITH latest AS (
SELECT A, MAX(Date) AS MaxDate
FROM TABLE
GROUP BY A
)
SELECT TABLE.*
FROM TABLE INNER JOIN latest
ON TABLE.A = latest.A AND TABLE.date = latest.MaxDate
ORDER BY TABLE.A, TABLE.date
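To see the CTE version run end to end, here is a small sketch using Python's sqlite3 module. The table name records, the info column, and the sample rows are all made up for illustration, since the question does not show a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data: 'records' stands in for TABLE in the answer.
conn.executescript("""
CREATE TABLE records (A TEXT, Date TEXT, info TEXT);
INSERT INTO records VALUES
  ('x', '2020-01-01', 'old'),
  ('x', '2020-03-01', 'new'),
  ('y', '2020-02-01', 'only'),
  ('z', '2020-01-01', 'old'),
  ('z', '2020-04-01', 'new');
""")

# Same shape as the CTE query: find the latest date per A, then join back.
latest_rows = conn.execute("""
WITH latest AS (
  SELECT A, MAX(Date) AS MaxDate
  FROM records
  GROUP BY A
)
SELECT records.A, records.Date, records.info
FROM records INNER JOIN latest
  ON records.A = latest.A AND records.Date = latest.MaxDate
ORDER BY records.A, records.Date
""").fetchall()
```

Each value of A keeps only its row with the latest date, as long as no two rows share the same A and the same date.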
Comparison to other answer
The answer by MikeT relies on a non-standard feature of sqlite. That is okay in itself as long as you are aware that the solution is not compatible with other database engines/servers and SQL dialects.
The next possible gotcha depends on your actual data and table schema (neither of which you shared in the question details). If your data allows multiple rows with the same date for a single value of A, then the conditions in your question are not enough to definitively remove all duplicates. You would need to identify another column by which to resolve any remaining duplicates, but once again your question did not do that.
However, in testing, I found that my solution allows unresolved duplicates to remain in the results. MikeT's solution eliminates all duplicates, but it does so by arbitrarily excluding one of those duplicates. There are ways to fix either solution to definitively select which duplicate to keep, but I will not even attempt that unless you post actual data and the table schema so that my answer is not mere guessing. I'm glad that my answer was useful thus far, but you need to understand your data better (than revealed in the question) to determine which solution is actually best.
Bonus
Against my better judgement to just keep expanding on answers... since you should really research this separately... here's an example of how you would continue joining this with other queries...
WITH latest AS (
SELECT A, MAX(Date) AS MaxDate
FROM TABLE
GROUP BY A
),
firstResults AS (
SELECT TABLE.*
FROM TABLE INNER JOIN latest
ON TABLE.A = latest.A AND TABLE.date = latest.MaxDate
ORDER BY TABLE.A, TABLE.date
)
SELECT otherTable.*
FROM firstResults JOIN otherTable
ON firstResults.A = otherTable.A
WHERE somecondition = 'foobar'
Another approach if you're using a somewhat recent version of sqlite (3.25 or newer), using the row_number() window function to rank groups of the same a value by date and picking the first one:
WITH cte AS
(SELECT a, date, row_number() OVER (PARTITION BY a ORDER BY date DESC) AS rn
FROM yourtable)
SELECT a, date
FROM cte
WHERE rn = 1;
One important thing to note, since I noticed you mentioning that another answer was slow: this query needs an index on mytable(a, date DESC) for best results, and an index on mytable(a, date) will speed up the other answers given.
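A window function also makes it straightforward to break ties when two rows share the same a and date. The sketch below uses Python's sqlite3 module and assumes the bundled SQLite is 3.25 or newer; the id column used as the tie-breaker and the sample rows are hypothetical, not something the question provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: two rows for 'x' share the latest date.
conn.executescript("""
CREATE TABLE mytable (id INTEGER PRIMARY KEY, a TEXT, date TEXT);
INSERT INTO mytable (a, date) VALUES
  ('x', '2020-01-01'),
  ('x', '2020-03-01'),
  ('x', '2020-03-01'),
  ('y', '2020-02-01');
CREATE INDEX mytable_a_date ON mytable(a, date DESC);
""")

# Rank rows within each a by date, newest first; id DESC decides ties,
# so exactly one row per a gets rn = 1.
rows = conn.execute("""
WITH cte AS (
  SELECT id, a, date,
         row_number() OVER (PARTITION BY a ORDER BY date DESC, id DESC) AS rn
  FROM mytable
)
SELECT id, a, date FROM cte WHERE rn = 1 ORDER BY a
""").fetchall()
```

Which column should decide a tie depends on the real data; id DESC here simply means the most recently inserted row wins.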
I believe, if I understand what you have written, that you could use :-
SELECT a,max(date), other FROM mytable GROUP BY a ORDER BY date;
Note that the other column represents other columns (if present).
However, the other column will be an arbitrary value (from one of the grouped rows), which may well be the required value (in the example it is).
As per :-
Each expression in the result-set is then evaluated once for each
group of rows. If the expression is an aggregate expression, it is
evaluated across all rows in the group. Otherwise, it is evaluated
against a single arbitrarily chosen row from within the group. If
there is more than one non-aggregate expression in the result-set,
then all such expressions are evaluated for the same row.
SQL As Understood By SQLite - SELECT
More correctly, to eliminate an arbitrary value for the other column, you could use :-
SELECT
a /* will always be the same and isn't arbitrary */,
max(date) /* will be the maximum date */ AS date,
(SELECT other FROM mytable WHERE a = m.a AND date = m.date) AS other
FROM mytable AS m /* AS m allows the outer query to be distinguished from the inner query */
GROUP BY a /* this effectively removes duplicates on the a column */
ORDER BY date
;
The example below appears to produce the same result.
Example :-
Using the following to populate the table with some generated testing data :-
CREATE TABLE IF NOT EXISTS mytable (a TEXT, date TEXT, other);
WITH cte(count,a,date,other) AS
(
SELECT 1,1,date('now','+'||(abs(random()) % 30)||' days'),'other1'
UNION ALL SELECT count+1,abs(random()) % 20,date('now','+'||(abs(random()) % 30)||' days'), 'other'||(count+1) FROM cte LIMIT 100
)
INSERT INTO mytable (a,date,other) SELECT a,date,other FROM cte
;
SELECT * FROM mytable ORDER BY DATE DESC;
In this case the result was shown as a screenshot (not reproduced here), with the rows that need to be extracted highlighted.
Then after the above has been run the following is run
SELECT * FROM mytable WHERE a = a AND date = (SELECT MAX(date) FROM mytable);
SELECT * FROM mytable WHERE /*a = a AND*/ date = (SELECT MAX(date) FROM mytable);
/* Will only select 1 row per unique value of a BUT other will be an arbitrary value, not necessarily the latest */
SELECT a,max(date), other FROM mytable GROUP BY a /* group by effectively displays unique values */;
SELECT
a /* will always be the same and isn't arbitrary */,
max(date) /* will be the maximum date */ AS date,
(SELECT other FROM mytable WHERE a = m.a AND date = m.date) AS other
FROM mytable AS m
GROUP BY a
;
The first two results show that a = a does nothing as it will always be true.
The third query produces (unordered) a result that was shown as a screenshot (not reproduced here); the ticks in it were assigned by checking the value of other against the previous result.
In this case this shorter query works OK even though the values of other are arbitrary values (they aren't really, as it depends upon how the query planner plans the query).
The fourth, the more correct, produces the same results.
Results 2 (your original query) and 3 (the original without a = a) were likewise shown as screenshots that are not reproduced here.

SQLITE: Replacing a highly redundant column with an index to another table

I have a table t with around 500,000 rows. One of the columns (stringtext) contains a very long string, and I have now discovered that there are in fact only 80 distinct strings. I'd like to declutter table t by moving the strings into a separate table, s, and merely referencing them in t.
I have created a separate table of the long strings, including what is effectively an explicit row-index number using:
CREATE TEMPORARY TABLE stmp AS
SELECT DISTINCT
stringtext
FROM t;
CREATE TABLE s AS
SELECT _ROWID_ AS stringindex, stringtext
FROM stmp;
(It was creating this table that showed me there were only a few distinct strings).
How can I now replace stringtext in t with the corresponding stringindex from s?
I would think about something like:
UPDATE t SET stringtext = (SELECT stringindex FROM s WHERE s.stringtext = t.stringtext);
I would recommend first creating an index on s(stringtext), as SQLite might not be smart enough to build a temporary index. After that, a VACUUM would be in order.
Untested.
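The approach can be tried on a toy table first. The following is a sketch using Python's sqlite3 module; the three sample rows, the id column, and the index name s_stringtext are made up for illustration. One thing it makes visible: if stringtext is declared as TEXT, the stored index is converted to text by column affinity, so the join back casts it to an integer to be explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical miniature of table t.
CREATE TABLE t (id INTEGER PRIMARY KEY, stringtext TEXT);
INSERT INTO t (stringtext) VALUES
  ('long string A'), ('long string B'), ('long string A');

-- Lookup table built as in the question.
CREATE TEMPORARY TABLE stmp AS SELECT DISTINCT stringtext FROM t;
CREATE TABLE s AS SELECT _ROWID_ AS stringindex, stringtext FROM stmp;

-- Index first, so the correlated subquery below is a lookup, not a scan.
CREATE INDEX s_stringtext ON s(stringtext);

UPDATE t SET stringtext =
  (SELECT stringindex FROM s WHERE s.stringtext = t.stringtext);
""")

distinct_strings = conn.execute("SELECT COUNT(*) FROM s").fetchone()[0]

# Join back through the lookup table to recover the original strings.
restored = conn.execute("""
SELECT t.id, s.stringtext
FROM t JOIN s ON CAST(t.stringtext AS INTEGER) = s.stringindex
ORDER BY t.id
""").fetchall()
```

On real data it is probably worth running this against a copy of the database first, since the UPDATE overwrites the original strings in t.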

Hive: COUNT features requires GROUP BY when using HAVING, work around?

I'm curious if there is a workaround for excluding a field in the 'group by' statement in Hive?
select g.country, count(*) as road_count
from geography g
join g_street gs on (g.id=gs.id)
group by g.iso_country_code, g.virtual
having (g.virtual='f' or g.virtual is null)
;
I do not want the 'g.virtual' in the group by statement because my result should be grouped by country only. Hive requires the 'g.virtual' in the group by statement.
Thanks in advance!
I am not sure what you are trying to achieve with the query, since I see fields in the select that don't appear in the group by statement. The only suggestion I can give is that if you plan to put a restriction on the geography table, you can apply a where clause to it before joining it with g_street, and then group by the required fields.
Here is an example :
select g.iso_country_code, count(*) as road_count
from (
select * from geography
where virtual='f' or virtual is null
) g
join g_street gs on (g.id=gs.id)
group by g.iso_country_code
;
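The idea of filtering before grouping can be checked locally. The sketch below uses Python's sqlite3 module purely as a stand-in for Hive (an assumption for illustration; Hive itself is not involved and its behaviour may differ in details), with made-up rows for both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: geography 3 is virtual ('t') and should not be counted.
conn.executescript("""
CREATE TABLE geography (id INTEGER, iso_country_code TEXT, virtual TEXT);
CREATE TABLE g_street (id INTEGER, street TEXT);
INSERT INTO geography VALUES
  (1, 'US', 'f'), (2, 'US', NULL), (3, 'US', 't'), (4, 'DE', 'f');
INSERT INTO g_street VALUES
  (1, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e');
""")

# Restrict geography first, then join and group by country only.
counts = conn.execute("""
SELECT g.iso_country_code, COUNT(*) AS road_count
FROM (SELECT * FROM geography WHERE virtual = 'f' OR virtual IS NULL) g
JOIN g_street gs ON g.id = gs.id
GROUP BY g.iso_country_code
ORDER BY g.iso_country_code
""").fetchall()
```

Because the restriction on virtual is applied to rows before they are grouped, the column no longer has to appear in the group by list.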

How to create a view that returns a 2x2 (or NxN) matrix of results

So I know enough SQL just to be really dangerous (I don't normally work the back-end) but cannot get the following view to be created successfully ;) The result set I'm after is a data set that has rows assigned as a column alias from multiple tables (instead of a 1xN flat of all columns). There is a many-to-one relationship when looking at the main table, based on foreign keys associated to the row id of the appropriate related table.
Ideally I'd like a data set that looks like this in the return:
dataset.transaction_row[n]: col1, col2, col3, coln... (columns from the transaction table)
dataset.category_row[n]: col1, col2, col3, coln... (columns from the category table)
and so on...
I get the following error:
Query Error: near "AS": syntax error Unable to execute statement
From:
CREATE VIEW view_unreconciled_transactions
AS SELECT account_transaction.* AS transaction_row,
category.* AS category_row,
memorized.name_rule_replace OR account_transaction.name AS payee
FROM account_transaction
LEFT JOIN memorized ON account_transaction.memorized_key = memorized.id
LEFT JOIN category ON account_transaction.category_key = category.id
WHERE status != 2
ORDER BY account_transaction.dt_posted DESC
It seems easy enough since the result-column selector is repeatable which includes expressions (referencing sqlite's syntax diagrams). In reference to the error, I'm assuming it's complaining about the 2nd 'AS' where I'm trying to get table.* assigned as an alias. Any help in the right direction is appreciated. If I had to, I suppose I could explicitly state all columns but that feels like a kludge.
The AS modifier can only be applied to a single column, not to a collection such as the * you used. You will have to break them out into specific names (which is best practice IMHO anyway).
It looks like you want to make a "pivot table". They can be tricky to make in a database. If you want a result where each row comes from a different source table, and the columns from each table are IDENTICAL, then you could try using a UNION statement to combine the different results as if they were one dataset.
NOTE that the columns all take their names from the first dataset in a UNION, and the data types all need to be the same.
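If writing every column out by hand feels like a kludge, the alias list could be generated instead. The following is a sketch using Python's sqlite3 module: prefixed_columns is a hypothetical helper, the two tables are cut-down versions of the ones in the question (the label column and the sample rows are made up), and the view name view_transactions is invented for the example.

```python
import sqlite3

def prefixed_columns(conn, table, prefix):
    """Return 'table.col AS prefix_col' for every column of a trusted table."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    names = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    return ", ".join(f'"{table}"."{n}" AS "{prefix}_{n}"' for n in names)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE account_transaction (
  id INTEGER PRIMARY KEY, name TEXT, category_key INTEGER);
INSERT INTO category VALUES (1, 'Food');
INSERT INTO account_transaction VALUES (10, 'Grocer', 1);
""")

select_list = (
    prefixed_columns(conn, "account_transaction", "transaction")
    + ", "
    + prefixed_columns(conn, "category", "category")
)
conn.execute(f"""
CREATE VIEW view_transactions AS
SELECT {select_list}
FROM account_transaction
LEFT JOIN category ON account_transaction.category_key = category.id
""")

cursor = conn.execute("SELECT * FROM view_transactions")
column_names = [d[0] for d in cursor.description]
row = cursor.fetchone()
```

Each column keeps a prefix that says which table it came from, which is about as close as a flat result set gets to the grouped transaction_row / category_row layout described in the question.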

sqlite subqueries with group_concat as columns in select statements

I have two tables: one, called watch_list, contains a list of items with some important attributes, and the other, called price_history, is just a list of prices. What I would like to do is group together 10 of the lowest prices into a single column with a group_concat operation and then create a row with item attributes from watch_list along with the 10 lowest prices for each item in watch_list. First I tried joins, but then I realized that the operations were happening in the wrong order, so there was no way I could get the desired result with a join operation. Then I tried the obvious thing and just queried price_history for every row in watch_list and glued everything together in the host environment, which worked but seemed very inefficient. Now I have the following query, which looks like it should work but is not giving me the results that I want. I would like to know what is wrong with the following statement:
select w.asin,w.title,
(select group_concat(lowest_used_price) from price_history as p
where p.asin=w.asin limit 10)
as lowest_used
from watch_list as w
Basically I want the limit operation to happen before group_concat does anything but I can't think of a sql statement that will do that.
Figured it out, as somebody once said "All problems in computer science can be solved by another level of indirection." and in this case an extra select subquery did the trick:
select w.asin,w.title,
(select group_concat(lowest_used_price)
from (select lowest_used_price from price_history as p
where p.asin=w.asin
order by lowest_used_price limit 10)) as lowest_used
from watch_list as w
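Here is a small sketch of the nested-subquery approach using Python's sqlite3 module, with made-up rows and a limit of 2 instead of 10 to keep the data short. It adds an ORDER BY inside the inner select on the assumption that "lowest" means the smallest values; without one, LIMIT keeps whichever rows happen to come first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for the two tables in the question.
conn.executescript("""
CREATE TABLE watch_list (asin TEXT, title TEXT);
CREATE TABLE price_history (asin TEXT, lowest_used_price INTEGER);
INSERT INTO watch_list VALUES ('A1', 'First'), ('B2', 'Second');
INSERT INTO price_history VALUES
  ('A1', 30), ('A1', 10), ('A1', 20), ('B2', 5);
""")

# The inner select picks the rows (ORDER BY + LIMIT) before group_concat
# sees them; the outer scalar subquery then concatenates what is left.
rows = conn.execute("""
SELECT w.asin, w.title,
  (SELECT group_concat(lowest_used_price)
   FROM (SELECT lowest_used_price FROM price_history AS p
         WHERE p.asin = w.asin
         ORDER BY lowest_used_price LIMIT 2)) AS lowest_used
FROM watch_list AS w
ORDER BY w.asin
""").fetchall()
```

For item A1 the concatenated string contains 10 and 20 but not 30. SQLite does not promise the order of the values inside the group_concat result, so anything that depends on it should sort after splitting.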
