Remove all the rows except one with the EXCEPT SQLite command


From a table character that has a name column, I want to query the shortest and the longest name, together with their lengths. When more than one name ties for shortest or longest, I want the one that comes first alphabetically.
With this query, I get all the shortest and longest names (A):
SELECT
name, LENGTH(name) AS LEN
FROM
character
WHERE
length(name) = (SELECT MAX(LENGTH(name)) FROM character)
OR length(name) = (SELECT MIN(LENGTH(name)) FROM character)
With this one, I get all the shortest names except the first one in alphabetical order (B):
SELECT
name, LENGTH(name) AS LEN
FROM
character
WHERE
length(name) = (SELECT MIN(LENGTH(name)) FROM character)
ORDER BY
name DESC
LIMIT 10 OFFSET 2;
When I try to remove B from A with
A EXCEPT B
I expect to keep the first shortest name, but it does not appear.

I would use ROW_NUMBER here:
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY LENGTH(name), name) rn_min,
           ROW_NUMBER() OVER (ORDER BY LENGTH(name) DESC, name) rn_max
    FROM character
)
SELECT name, LENGTH(name) AS LEN
FROM cte
WHERE 1 IN (rn_min, rn_max)
ORDER BY LENGTH(name);
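As a quick check, the ROW_NUMBER query can be run from Python's standard sqlite3 module. This is a minimal sketch: the sample names and the helper name shortest_and_longest are made up for illustration, and it assumes the bundled SQLite is 3.25 or newer, since window functions were added in that release.

```python
import sqlite3

def shortest_and_longest(names):
    """Return [(name, length)] for the alphabetically first shortest and longest names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE character (name TEXT)")
    conn.executemany("INSERT INTO character (name) VALUES (?)", [(n,) for n in names])
    rows = conn.execute(
        """
        WITH cte AS (
            SELECT name,
                   -- rank 1 = shortest name, ties broken alphabetically
                   ROW_NUMBER() OVER (ORDER BY LENGTH(name), name) AS rn_min,
                   -- rank 1 = longest name, ties broken alphabetically
                   ROW_NUMBER() OVER (ORDER BY LENGTH(name) DESC, name) AS rn_max
            FROM character
        )
        SELECT name, LENGTH(name) AS len
        FROM cte
        WHERE 1 IN (rn_min, rn_max)
        ORDER BY LENGTH(name)
        """
    ).fetchall()
    conn.close()
    return rows

# Hypothetical sample data: two names tie for shortest, two for longest.
sample = ["Bo", "Al", "Zed", "Gandalf", "Aragorn"]
result = shortest_and_longest(sample)
print(result)  # [('Al', 2), ('Aragorn', 7)]
```

Each tie is resolved by the second sort key (name), so exactly one row per extreme survives the filter.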

When you set OFFSET 2 in your B query, you don't get:
all the shortest names except the first 1 alphabetically ordered
Instead you get:
all the shortest names except the first 2 in the sort order,
because this is what OFFSET 2 does: it skips the first 2 rows. Since B sorts with ORDER BY name DESC, the skipped rows are the alphabetically last names, not the first ones, and LIMIT 10 caps B at 10 rows.
Another problem with your code is the ORDER BY clause in your B query.
If you have this:
SELECT name,LENGTH(name) AS LEN FROM character
WHERE length(name) = (select max( LENGTH(name)) from character )
or length(name) = (select min( LENGTH(name)) from character)
EXCEPT
SELECT name,LENGTH(name) AS LEN FROM character
WHERE length(name) = (select min( LENGTH(name)) from character)
ORDER BY name desc LIMIT 10 OFFSET 2;
you may think that the ORDER BY clause (and LIMIT and OFFSET) applies only to your B query, but this is not how it is interpreted.
In a compound query, ORDER BY (and LIMIT and OFFSET) is applied to the result of the whole query, that is, to the rows of A EXCEPT B.
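To get the results that you want with code similar to yours, wrap your B query in a subquery so that ORDER BY, LIMIT and OFFSET apply to B only. Sort ascending and skip 1 row, so that B holds every shortest name except the alphabetically first one; in SQLite a negative LIMIT means no upper bound:
SELECT name, LENGTH(name) AS LEN FROM character
WHERE length(name) = (SELECT MAX(LENGTH(name)) FROM character)
   OR length(name) = (SELECT MIN(LENGTH(name)) FROM character)
EXCEPT
SELECT * FROM (
    SELECT name, LENGTH(name) AS LEN FROM character
    WHERE length(name) = (SELECT MIN(LENGTH(name)) FROM character)
    ORDER BY name LIMIT -1 OFFSET 1
)
Note that this still returns every longest name when several names tie for longest.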
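The difference between the unwrapped and the wrapped form can be checked with Python's sqlite3 module. This is a sketch on made-up sample names; the variable names A, B, unwrapped and wrapped are only for illustration. It sorts B ascending and uses LIMIT -1 (no upper bound in SQLite) with OFFSET 1, so that B holds every shortest name except the alphabetically first one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE character (name TEXT)")
conn.executemany(
    "INSERT INTO character (name) VALUES (?)",
    [("Al",), ("Bo",), ("Cy",), ("Gandalf",), ("Aragorn",)],  # hypothetical data
)

# A: all shortest and all longest names.
A = """
SELECT name, LENGTH(name) AS len FROM character
WHERE LENGTH(name) = (SELECT MAX(LENGTH(name)) FROM character)
   OR LENGTH(name) = (SELECT MIN(LENGTH(name)) FROM character)
"""
# B (before ordering and limiting): all shortest names.
B = """
SELECT name, LENGTH(name) AS len FROM character
WHERE LENGTH(name) = (SELECT MIN(LENGTH(name)) FROM character)
"""

# Unwrapped: ORDER BY / LIMIT / OFFSET act on the result of (A EXCEPT B),
# so every shortest name is removed first and one longest name is then skipped.
unwrapped = conn.execute(
    A + " EXCEPT " + B + " ORDER BY name LIMIT -1 OFFSET 1"
).fetchall()

# Wrapped: ORDER BY / LIMIT / OFFSET act on B only, inside the subquery.
wrapped = conn.execute(
    A
    + " EXCEPT SELECT * FROM ("
    + B
    + " ORDER BY name LIMIT -1 OFFSET 1) ORDER BY len, name"
).fetchall()

print(unwrapped)  # [('Gandalf', 7)]
print(wrapped)    # [('Al', 2), ('Aragorn', 7), ('Gandalf', 7)]
```

In the unwrapped run no shortest name survives, which matches the behaviour described in the question.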

Related

Select specified row from multiple rows returned by select

I have a select statement which returns multiple rows. I need to select a particular one of these rows. I would like to write something like
SELECT * FROM ( SELECT * FROM table WHERE x IS y ) WHERE row_in_selected_rows IS n;
Is this possible?
Note that I cannot use the rowid from the original table, because I have the index of the required row among the rows returned by the first select statement, not its index in the original table.

You can use the ROW_NUMBER() window function:
select col1, col2, ....
from (
SELECT *, row_number() over() rn
FROM table
WHERE x IS y
) t
where t.rn = 4
But without an ORDER BY inside OVER(), the order of the rows returned from the query is not defined, so the 4th row could be any row.
The correct way would be:
select col1, col2, ....
from (
SELECT *, row_number() over(order by somecolumn) rn
FROM table
WHERE x IS y
) t
where t.rn = 4
You could also use OFFSET with LIMIT:
SELECT * FROM ( SELECT * FROM table WHERE x IS y )
LIMIT 1 OFFSET 3;
which skips the first 3 rows and returns only the 4th, but again LIMIT and OFFSET should be used with ORDER BY, like:
SELECT * FROM ( SELECT * FROM table WHERE x IS y )
ORDER BY somecolumn
LIMIT 1 OFFSET 3;
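Both approaches can be compared side by side from Python. This is a sketch under assumptions: the table items, its columns and the two helper functions are hypothetical stand-ins for the question's table, filter and ordering column, and the window-function variant needs SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, kind TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO items (id, kind, label) VALUES (?, ?, ?)",
    [(1, "a", "one"), (2, "b", "two"), (3, "a", "three"),
     (4, "a", "four"), (5, "b", "five"), (6, "a", "six")],
)

def nth_row_window(conn, n):
    """n-th (1-based) filtered row, numbered with ROW_NUMBER() ordered by id."""
    return conn.execute(
        """
        SELECT id, label
        FROM (
            SELECT id, label, ROW_NUMBER() OVER (ORDER BY id) AS rn
            FROM items
            WHERE kind = 'a'
        ) t
        WHERE t.rn = ?
        """,
        (n,),
    ).fetchone()

def nth_row_offset(conn, n):
    """Same row via ORDER BY + LIMIT 1 OFFSET n-1."""
    return conn.execute(
        "SELECT id, label FROM items WHERE kind = 'a' "
        "ORDER BY id LIMIT 1 OFFSET ?",
        (n - 1,),
    ).fetchone()

print(nth_row_window(conn, 4))  # (6, 'six')
print(nth_row_offset(conn, 4))  # (6, 'six')
```

Both helpers return None when fewer than n rows match the filter.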

Is the order of search for SQLite's "IN" operator guaranteed?

I'm performing an Sqlite3 query similar to
SELECT * FROM nodes WHERE name IN ('name1', 'name2', 'name3', ...) LIMIT 1
Am I guaranteed that it will search for name1 first, name2 second, etc? Such that by limiting my output to 1 I know that I found the first hit according to my ordering of items in the IN clause?
Update: with some testing it seems to always return the first hit in the index regardless of the IN order. It's using the order of the index on name. Is there some way to enforce the search order?

The order of the returned rows is not guaranteed to match the order of the items inside the parentheses after IN.
What you can do is use ORDER BY in your statement with the use of the function INSTR():
SELECT * FROM nodes
WHERE name IN ('name1', 'name2', 'name3')
ORDER BY INSTR(',name1,name2,name3,', ',' || name || ',')
LIMIT 1
This code repeats the list from the IN clause as a single comma-separated string with the items in the same order; it assumes that no item contains a comma.
This way the results are ordered by their position in the list, and LIMIT 1 returns the match that is closest to the start of the list.
Another way to achieve the same results is by using a CTE which returns the list along with an Id which serves as the desired ordering of the results, which will be joined to the table:
WITH list(id, item) AS (
SELECT 1, 'name1' UNION ALL
SELECT 2, 'name2' UNION ALL
SELECT 3, 'name3'
)
SELECT n.*
FROM nodes n INNER JOIN list l
ON l.item = n.name
ORDER BY l.id
LIMIT 1
Or:
WITH list(id, item) AS (
SELECT * FROM (VALUES
(1, 'name1'), (2, 'name2'), (3, 'name3')
)
)
SELECT n.*
FROM nodes n INNER JOIN list l
ON l.item = n.name
ORDER BY l.id
LIMIT 1
This way you don't have to repeat the list twice.
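When the list comes from application code, the CTE can be built with bound parameters so that the priority follows the order of a Python list. This is a sketch: the helper name first_match and the sample rows are assumptions for illustration, and the VALUES form of the CTE is the one shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT)")
conn.execute("CREATE INDEX nodes_name ON nodes (name)")
conn.executemany("INSERT INTO nodes (name) VALUES (?)", [("name2",), ("name3",)])

def first_match(conn, wanted):
    """Return the first name in `wanted` (in list order) that exists in nodes."""
    if not wanted:
        return None
    # One "(?, ?)" pair per item: (priority, item).
    placeholders = ", ".join("(?, ?)" for _ in wanted)
    params = [value for pair in enumerate(wanted, start=1) for value in pair]
    sql = f"""
        WITH list(id, item) AS (VALUES {placeholders})
        SELECT n.name
        FROM nodes n INNER JOIN list l ON l.item = n.name
        ORDER BY l.id
        LIMIT 1
    """
    row = conn.execute(sql, params).fetchone()
    return row[0] if row else None

# 'name1' is missing, so the next item in list order wins.
print(first_match(conn, ["name1", "name3", "name2"]))  # name3
```

Only the placeholders are formatted into the SQL text; the values themselves are passed as parameters.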

Cannot replace a string with several random strings taken from another table in sqlite

I'm trying to replace a placeholder string inside a selection of 10 random records with a random string (a name) taken from another table, using only SQLite statements.
I wrote a subquery and pass its result to replace() to substitute the placeholder. I thought that each evaluation of the subquery would load a random name from the names table, but that is not the case: each placeholder is replaced with the same string.
select id, (replace (snippet, "%NAME%", (select
name from names
where gender = "male"
) )
) as snippet
from imagedata
where timestamp is not NULL
order by random()
limit 10
I was expecting each row of the SELECT to get a different random replacement, because the subquery would be invoked once per row. For snippets such as:
hello i'm %NAME% and this is my house
This is the car of %NAME%, let me know what you think
each row instead gets the same replacement:
hello i'm david and this is my house
This is the car of david, let me know what you think
and so on...
I'm not sure whether it can be done inside SQLite or whether I have to do it in PHP with two separate database queries.
Thanks in advance!

It seems that random() in the subquery is evaluated only once.
Try this instead. It gives each of the 10 selected rows a random number between 1 and the number of male names, numbers the male names from 1 in alphabetical order, and joins the two on that number:
select
    i.id,
    replace(i.snippet, '%NAME%', n.name) snippet
from (
    select
        id,
        snippet,
        abs(random()) % (select count(*) from names where gender = 'male') + 1 num
    from imagedata
    where timestamp is not NULL
    order by random() limit 10
) i inner join (
    select
        n.name,
        (select count(*) from names where name < n.name and gender = 'male') + 1 num
    from names n
    where gender = 'male'
) n on n.num = i.num
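The join can be tried on throwaway data from Python. This is a sketch with made-up rows; the table layouts are guessed from the question, and it assumes the male names are distinct, because the numbering counts the names that sort before each one and duplicates would share a number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE names (name TEXT, gender TEXT);
    CREATE TABLE imagedata (id INTEGER PRIMARY KEY, snippet TEXT, timestamp INTEGER);
    INSERT INTO names VALUES
        ('david', 'male'), ('peter', 'male'), ('john', 'male'), ('mary', 'female');
    """
)
conn.executemany(
    "INSERT INTO imagedata (snippet, timestamp) VALUES (?, ?)",
    [(f"hello i'm %NAME% and this is house {i}", 1000 + i) for i in range(12)],
)

QUERY = """
select i.id, replace(i.snippet, '%NAME%', n.name) snippet
from (
    -- 10 random rows, each tagged with a random number in 1..count(male names)
    select id, snippet,
           abs(random()) % (select count(*) from names where gender = 'male') + 1 num
    from imagedata
    where timestamp is not NULL
    order by random() limit 10
) i inner join (
    -- male names numbered 1..N in alphabetical order
    select n.name,
           (select count(*) from names where name < n.name and gender = 'male') + 1 num
    from names n
    where gender = 'male'
) n on n.num = i.num
"""

rows = conn.execute(QUERY).fetchall()
male_names = {"david", "peter", "john"}
for row_id, snippet in rows:
    print(row_id, snippet)
```

The output changes on every run, since both the selected rows and the assigned names are random.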

How can I accumulate values from rows on a per closest date basis to a list of dates as a parameter according the parameter dates?

There is a table that has a timestamp and a cnt column, e.g.
timestamp cnt
------------------
1547015021 14
1547024080 2
This table can be created using :-
DROP TABLE IF EXISTS roundit_base;
CREATE TABLE IF NOT EXISTS roundit_base (timestamp INTEGER, cnt INTEGER);
INSERT INTO roundit_base VALUES (1547015021,14),(1547024080,2);
The result should be the sum of the cnt column of rows that are the closest timestamp to a list of supplied timestamps, e.g. the supplied data could be
1546905600 - 0
1546992000 - 0
1547078400 - 0
...
The result should be along the lines of
1546905600 - 0
1546992000 - 14
1547078400 - 2
That is two columns:-
the timestamp from the list of supplied timestamps, that the respective rows from the database are closest to and
the sum of the cnt column of those rows, per supplied timestamp

The results differ from the expected results, in that the calculation used places both 1547015021 and 1547024080 as being closest to the supplied timestamp of 1546992000. Even so, the following could be the basis of an SQLite based solution :-
WITH
    -- The supplied list of timestamps
    v (cv,dflt) AS (
        VALUES (1546905600,0),(1546992000,0),(1547078400,0)
    ),
    -- Join the two sets calculating the difference
    cte1 AS (
        SELECT *, abs(cv - timestamp) AS diff FROM roundit_base INNER JOIN v
    ),
    -- Find the closest (smallest difference) for each timestamp
    cte2 AS (
        SELECT *, min(diff) FROM cte1 GROUP BY timestamp
    )
-- For each comparative value sum the counts allocated/assigned (timestamps) to that
SELECT cv,
    CASE
        WHEN
            (SELECT sum(cnt) FROM cte2 WHERE cv = v.cv) IS NOT NULL
        THEN
            (SELECT sum(cnt) FROM cte2 WHERE cv = v.cv)
        ELSE 0
    END AS cnt
FROM v;
With the sample data, the above results in 0 for 1546905600, 16 for 1546992000 (14 + 2, as both rows are closest to it) and 0 for 1547078400.
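To check the allocation on the sample data, the same idea can be run from Python with the supplied timestamps passed as parameters. This is a sketch with a few liberties that are not in the query above: the helper name sum_by_closest is hypothetical, the unused dflt column is dropped, COALESCE stands in for the CASE expression, and an ORDER BY is added so that the row order is defined.

```python
import sqlite3

def sum_by_closest(rows, targets):
    """Sum cnt per supplied timestamp, assigning each row to its closest target."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE roundit_base (timestamp INTEGER, cnt INTEGER)")
    conn.executemany("INSERT INTO roundit_base VALUES (?, ?)", rows)
    values = ", ".join("(?)" for _ in targets)  # one bound value per target
    sql = f"""
        WITH
        v(cv) AS (VALUES {values}),
        -- every (row, target) pair with its absolute distance
        cte1 AS (
            SELECT timestamp, cnt, cv, abs(cv - timestamp) AS diff
            FROM roundit_base CROSS JOIN v
        ),
        -- per row, keep the pair with the smallest distance; SQLite takes the
        -- bare columns (cnt, cv) from the row that holds the minimum
        cte2 AS (
            SELECT timestamp, cnt, cv, min(diff) AS min_diff
            FROM cte1
            GROUP BY timestamp
        )
        SELECT v.cv,
               COALESCE((SELECT sum(cnt) FROM cte2 WHERE cte2.cv = v.cv), 0) AS cnt
        FROM v
        ORDER BY v.cv
    """
    result = conn.execute(sql, list(targets)).fetchall()
    conn.close()
    return result

totals = sum_by_closest(
    [(1547015021, 14), (1547024080, 2)],
    [1546905600, 1546992000, 1547078400],
)
print(totals)  # [(1546905600, 0), (1546992000, 16), (1547078400, 0)]
```

Both sample rows are nearer to 1546992000 than to 1547078400, which is why that target collects 14 + 2.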

sort semicolon separated values per row in a column

I want to sort the semicolon-separated values in each row of a column, e.g.
Input:
abc;pqr;def;mno
xyz;pqr;abc
abc
xyz;jkl
Output:
abc;def;mno;pqr
abc;pqr;xyz
abc
jkl;xyz
Can anyone help?

Perhaps something like this (Oracle syntax). Breaking it down:
First we need to break up the strings into their component tokens, and then reassemble them, using LISTAGG(), while ordering them alphabetically.
There are many ways to break up a symbol-separated string. Here I demonstrate the use of a hierarchical query. It requires that the input strings be uniquely distinguished from each other. Since the exact same semicolon-separated string may appear more than once, and since there is no info from the OP about any other unique column in the table, I create a unique identifier (using ROW_NUMBER()) in the most deeply nested subquery. Then I run the hierarchical query to break up the inputs and then reassemble them in the outermost SELECT.
with
test_data as (
select 'abc;pqr;def;mno' as str from dual union all
select 'xyz;pqr;abc' from dual union all
select 'abc' from dual union all
select 'xyz;jkl' from dual
)
-- End of test data (not part of the solution!)
-- SQL query begins BELOW THIS LINE.
select str,
listagg(token, ';') within group (order by token) as sorted_str
from (
select rn, str,
regexp_substr(str, '([^;]*)(;|$)', 1, level, null, 1) as token
from (
select str, row_number() over (order by null) as rn
from test_data
)
connect by level <= length(str) - length(replace(str, ';')) + 1
and prior rn = rn
and prior sys_guid() is not null
)
group by rn, str
;
STR              SORTED_STR
---------------  ---------------
abc;pqr;def;mno  abc;def;mno;pqr
xyz;pqr;abc      abc;pqr;xyz
abc              abc
xyz;jkl          jkl;xyz

4 rows selected.
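For comparison, the same split, sort and reassemble steps can be sketched outside the database. This is plain Python rather than Oracle SQL, and the function name sort_tokens is made up; it could be useful when the rows are post-processed in application code anyway.

```python
def sort_tokens(value, sep=";"):
    """Split on the separator, sort the tokens alphabetically, and join them again."""
    return sep.join(sorted(value.split(sep)))

rows = ["abc;pqr;def;mno", "xyz;pqr;abc", "abc", "xyz;jkl"]
sorted_rows = [sort_tokens(row) for row in rows]

for original, reordered in zip(rows, sorted_rows):
    print(original, "->", reordered)
```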
