SELECT DISTINCT and GROUP BY returning duplicates - sqlite

I'm trying to select a distinct HourlyRate from a table, and then group the resulting HourlyRate by a FECode (basically a person). One person may have 2 or 3 rates over time, but the results that are returning involve the same HourlyRate being repeated for the same FECode.
SELECT DISTINCT Cost/Hours As HourlyRate, Date, FECode
FROM Table1
WHERE HourlyRate != ''
GROUP BY HourlyRate, FECode
ORDER BY FECode
The result looks as follows:
HourlyRate, Date, FECode
215.00, 2017-04-06, AAA
215.00, 2017-04-27, AAA
225.00, 2017-06-16, AAA
The data in Table1 is as follows:
Date, FECode, Cost, Hours
2017-04-06, AAA, 236.5, 1.1
2017-04-27, AAA, 43, 0.2
2017-06-16, AAA, 247.5, 1.1
Clearly, in this example, the second result of 215.00 should not be returning, but it is. How do I stop this from happening?

The result is correct, because DISTINCT only removes rows that match on the full set of selected columns. Cost/Hours is a floating-point division: the results look like the same rounded number, but they are not exactly equal, so they are not treated as duplicates. Try the following, and do not forget to remove the Date column:
SELECT DISTINCT cast(Cost/Hours as text) As HourlyRate, FECode
FROM Table1
WHERE HourlyRate != ''
ORDER BY FECode

These two values are not equal:
SELECT 236.5/1.1 = 43/0.2;
0
There actually is a difference:
SELECT 236.5/1.1 - 43/0.2;
-2.8421709430404e-14
See Is floating point math broken?
You have to round the result.
(And using the column Date with this GROUP BY does not make sense.)
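The comparison above can be reproduced from Python's standard sqlite3 module. This is a minimal sketch using an in-memory database; the variable names are only illustrative. It compares the two quotients once as raw floating-point values and once after rounding to two decimals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw quotients are compared exactly, so the tiny floating-point
# difference between them makes the comparison false (0).
raw_equal = conn.execute("SELECT 236.5/1.1 = 43/0.2").fetchone()[0]

# Rounding both sides to two decimals before comparing removes
# that difference, so the comparison is true (1).
rounded_equal = conn.execute(
    "SELECT ROUND(236.5/1.1, 2) = ROUND(43/0.2, 2)"
).fetchone()[0]

print(raw_equal, rounded_equal)
```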

The following query returns the expected result:-
SELECT ROUND(Cost/Hours, 2) As HourlyRate, Date, FECode FROM Table1 WHERE HourlyRate!= '' GROUP BY FECode, HourlyRate ORDER BY FECode ASC
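To try the rounding approach against the sample data, here is a small sketch using Python's sqlite3 module and an in-memory table. The column types are an assumption (the question does not show the schema), and this variant drops the Date column and groups on the rounded expression directly rather than on the alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema; the question only shows the column names.
conn.execute("CREATE TABLE Table1 (Date TEXT, FECode TEXT, Cost REAL, Hours REAL)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [
        ("2017-04-06", "AAA", 236.5, 1.1),
        ("2017-04-27", "AAA", 43, 0.2),
        ("2017-06-16", "AAA", 247.5, 1.1),
    ],
)

# Group on the rounded rate so that near-equal quotients fall into one group.
rates = conn.execute(
    """
    SELECT ROUND(Cost/Hours, 2) AS HourlyRate, FECode
    FROM Table1
    GROUP BY FECode, ROUND(Cost/Hours, 2)
    ORDER BY FECode, HourlyRate
    """
).fetchall()

for rate, fecode in rates:
    print(rate, fecode)
```

With the three sample rows, this should leave one row per distinct rate for AAA instead of repeating 215.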


Transpose rows to columns in SQLite

I have data like this:
I am trying to transform it to this (using SQLite). In the desired result, within each id, each start should be on the same row as the chronologically closest end. If an id has a start but no end (like id=4), then the corresponding end will be empty (as shown below).
I have tried this
select
id,
max( case when start_end = "start" then date end) as start,
max(case when start_end = "end" then date end ) as end
from df
group by id
But the result is this, which is wrong because id=5 has only one row, when it should have two:
id start end
1 2 1994-05-01 1996-11-04
2 4 1979-07-18 <NA>
3 5 2010-10-01 2012-10-06
Any help is much appreciated
CREATE TABLE mytable(
id INTEGER NOT NULL PRIMARY KEY
,start_end VARCHAR(5) NOT NULL
,date DATE NOT NULL
);
INSERT INTO mytable(id,start_end,date) VALUES (2,'start','1994-05-01');
INSERT INTO mytable(id,start_end,date) VALUES (2,'end','1996-11-04');
INSERT INTO mytable(id,start_end,date) VALUES (4,'start','1979-07-18');
INSERT INTO mytable(id,start_end,date) VALUES (5,'start','2005-02-01');
INSERT INTO mytable(id,start_end,date) VALUES (5,'end','2009-09-17');
INSERT INTO mytable(id,start_end,date) VALUES (5,'start','2010-10-01');
INSERT INTO mytable(id,start_end,date) VALUES (5,'end','2012-10-06');
select
s.id as id,
s.date as 'start',
min(e.date) as 'end' -- earliest end date from "same id&start"
from
-- only start dates
(select id, date
from intable
where start_end='start'
) as s
left join -- keep the start-only lines
-- only end dates
(select id, date
from intable
where start_end='end'
) as e
on s.id = e.id
and s.date < e.date -- not too early
group by s.id, s.date -- "same id&start"
order by s.id, s.date; -- ensure sequence
Left join (to keep the start-only line for id "4") two on-the-fly tables, start dates and end dates.
Take the minimal end date that is later than the start date (same id), using min() and group by.
Order by id, then start date.
I tested this on a test table which is similar to your dump, but has no "NOT NULL" and no "PRIMARY KEY". I guess for this test table that is irrelevant; otherwise explain the effect, please.
Note:
Internally, three date pairs are found for id 5 (those where end > start), but only the pair with the lowest end date (min(e.date)) is kept for each of the two combinations of id and start (group by s.id, s.date). The pair where end > start but end is not the minimum is therefore not returned. That leaves two rows with start/end pairs, as desired.
Output (with .headers on):
id|start|end
2|1994-05-01|1996-11-04
4|1979-07-18|
5|2005-02-01|2009-09-17
5|2010-10-01|2012-10-06
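The query can be checked end to end with Python's sqlite3 module. This sketch makes two assumptions: the table is called mytable (as in the question's dump, rather than intable), and it is created without the PRIMARY KEY constraint, since the sample data repeats ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: same columns as the dump, but no PRIMARY KEY on id,
# because ids 2 and 5 occur more than once in the sample data.
conn.execute("CREATE TABLE mytable (id INTEGER, start_end TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, start_end, date) VALUES (?, ?, ?)",
    [
        (2, "start", "1994-05-01"),
        (2, "end", "1996-11-04"),
        (4, "start", "1979-07-18"),
        (5, "start", "2005-02-01"),
        (5, "end", "2009-09-17"),
        (5, "start", "2010-10-01"),
        (5, "end", "2012-10-06"),
    ],
)

# For every start row, pick the earliest end of the same id that is later
# than the start; the LEFT JOIN keeps starts that have no matching end.
pairs = conn.execute(
    """
    SELECT s.id, s.date AS start, MIN(e.date) AS "end"
    FROM (SELECT id, date FROM mytable WHERE start_end = 'start') AS s
    LEFT JOIN (SELECT id, date FROM mytable WHERE start_end = 'end') AS e
           ON s.id = e.id AND s.date < e.date
    GROUP BY s.id, s.date
    ORDER BY s.id, s.date
    """
).fetchall()

for row in pairs:
    print(row)
```

The comparison s.date < e.date works on text here only because the dates are stored in ISO format (YYYY-MM-DD), which sorts chronologically.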
UPDATE: Incorporated helpful comments by @MatBailie.
Thank you! This is exactly what I needed to do, only with a few changes:
SELECT
s.value AS 'url',
"AVGDATE" AS 'fieldname',
sum(e.value)/count(*) AS 'value'
FROM
(SELECT url, value
FROM quicktag
WHERE fieldname='NAME'
) AS s
LEFT JOIN
(SELECT url, substr(value,1,4) AS value
FROM quicktag
WHERE fieldname='DATE'
) AS e
ON s.url = e.url
WHERE e.value != ""
GROUP BY s.value;
I had a table like this:
url fieldname value
---------- ---------- ----------
1000052801 NAME Thomas
1000052801 DATE 2007
1000131579 NAME Morten
1000131579 DATE 2005
1000131929 NAME Tanja
1000131929 DATE 2014
1000158449 NAME Knud
1000158449 DATE 2007
1000158450 NAME Thomas
1000158450 DATE 2003
I needed to correlate NAME and DATE in columns based on url as a key, and generate a field with average DATE grouped by multiple NAME fields.
So my result looks like this:
url fieldname value
---------- ---------- ----------
Thomas AVGDATE 2005
Morten AVGDATE 2005
Tanja AVGDATE 2014
Knud AVGDATE 2007
Unfortunately, I do not have enough posts to make my vote count yet.

sort semicolon separated values per row in a column

I want to sort semicolon separated values per row in a column. Eg.
Input:
abc;pqr;def;mno
xyz;pqr;abc
abc
xyz;jkl
Output:
abc;def;mno;pqr
abc;pqr;xyz
abc
jkl;xyz
Can anyone help?
Perhaps something like this. Breaking it down:
First we need to break up the strings into their component tokens, and then reassemble them, using LISTAGG(), while ordering them alphabetically.
There are many ways to break up a symbol-separated string. Here I demonstrate the use of a hierarchical query. It requires that the input strings be uniquely distinguished from each other. Since the exact same semicolon-separated string may appear more than once, and since there is no info from the OP about any other unique column in the table, I create a unique identifier (using ROW_NUMBER()) in the most deeply nested subquery. Then I run the hierarchical query to break up the inputs and then reassemble them in the outermost SELECT.
with
test_data as (
select 'abc;pqr;def;mno' as str from dual union all
select 'xyz;pqr;abc' from dual union all
select 'abc' from dual union all
select 'xyz;jkl' from dual
)
-- End of test data (not part of the solution!)
-- SQL query begins BELOW THIS LINE.
select str,
listagg(token, ';') within group (order by token) as sorted_str
from (
select rn, str,
regexp_substr(str, '([^;]*)(;|$)', 1, level, null, 1) as token
from (
select str, row_number() over (order by null) as rn
from test_data
)
connect by level <= length(str) - length(replace(str, ';')) + 1
and prior rn = rn
and prior sys_guid() is not null
)
group by rn, str
;
STR SORTED_STR
--------------- ---------------
abc;pqr;def;mno abc;def;mno;pqr
xyz;pqr;abc abc;pqr;xyz
abc abc
xyz;jkl jkl;xyz
4 rows selected.
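For readers who want to see the split, sort, and reassemble logic in isolation, here is the same idea outside the database, as a plain Python sketch. It is not a replacement for the Oracle query above; the function name sort_tokens is hypothetical.

```python
def sort_tokens(value: str, sep: str = ";") -> str:
    """Split a separator-delimited string, sort the tokens, and rejoin them."""
    return sep.join(sorted(value.split(sep)))


inputs = [
    "abc;pqr;def;mno",
    "xyz;pqr;abc",
    "abc",
    "xyz;jkl",
]

# Each row is handled independently, mirroring the per-row grouping in the query.
sorted_rows = [sort_tokens(s) for s in inputs]

for original, result in zip(inputs, sorted_rows):
    print(original, "->", result)
```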

Teradata - Cannot nest aggregate operations

I'd like to compute PROD_AMT as follows: when ACCT_NBR, PROD_NBR and PROD_AMT are all the same, I need the PROD_AMT only once, which is 100 (as with DISTINCT); when ACCT_NBR is the same but PROD_NBR differs, the PROD_AMT I need is the sum, which is 90 (30+60).
SELECT ACCT_NBR
,COUNT(DISTINCT CASE WHEN PROD_NBR = 1 THEN SUM(DISTINCT PROD_AMT)
WHEN PROD_NBR > 1 THEN SUM(PROD_AMT)
END) AS AMT
FROM TABLE
ACCT_NBR PROD_NBR PROD_AMT
3007 001 30
3007 002 60
1000 003 100
1000 003 100
There are probably a few ways to solve this. Using a subquery to determine which records should be summed vs which ones should be distinct, you could use:
SELECT
acct_nbr,
CASE WHEN sumflag = 'X' THEN SUM(prod_amt) ELSE MAX(prod_amt) END as amt
FROM
(
SELECT
acct_nbr,
prod_nbr,
prod_amt,
CASE WHEN COUNT(*) OVER (PARTITION BY Acct_nbr, prod_nbr, prod_amt) = 1 THEN 'X' ELSE NULL END AS sumflag
FROM
table
)t1
GROUP BY acct_nbr, sumflag
I'm just using MAX() here since it doesn't matter... all the values that will be aggregated with max() we know are duplicates, so it's a wash.
You could get similar results with a UNION query where one query would do the summing in the event that the records are distinct, and the other would just return distinct prod_amt's where the records are duplicates.
While the above example is nice if you truly have different aggregation needs depending on complex logic, for your question there's a simpler way of doing the same thing that doesn't use window functions:
SELECT
acct_nbr,
sum(prod_amt) AS amt
FROM
(
SELECT DISTINCT
acct_nbr,
prod_amt
FROM
table
)t1
GROUP BY 1
If you need to adapt this to a complex statement, you could put your complex statement in as a subquery in place of table above, like:
SELECT
acct_nbr,
sum(prod_amt) AS amt
FROM
(
SELECT DISTINCT
acct_nbr,
prod_amt
FROM
(
YOUR REALLY COMPLEX QUERY GOES IN HERE
)t2
)t1
GROUP BY 1
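The simpler DISTINCT-then-SUM approach can be tried on the sample rows without a Teradata system. The sketch below uses Python's sqlite3 module purely as a stand-in engine; the table name accounts and the column types are assumptions, since the question uses the placeholder name table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table name and types; the sample rows are from the question.
conn.execute("CREATE TABLE accounts (acct_nbr INTEGER, prod_nbr TEXT, prod_amt INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        (3007, "001", 30),
        (3007, "002", 60),
        (1000, "003", 100),
        (1000, "003", 100),
    ],
)

# Deduplicate (acct_nbr, prod_amt) first, then sum what is left per account.
totals = dict(
    conn.execute(
        """
        SELECT acct_nbr, SUM(prod_amt) AS amt
        FROM (SELECT DISTINCT acct_nbr, prod_amt FROM accounts) AS t1
        GROUP BY acct_nbr
        """
    ).fetchall()
)

print(totals)
```

Note that this sketch, like the query it mirrors, deduplicates on acct_nbr and prod_amt only, so two different products with the same amount in one account would also be collapsed into one.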

How to SELECT values not found in a table?

I would like to determine particular IDs that are not present in a table.
For example, I have the IDs 1, 2 and 3 and want to know if they exist in the table.
Essentially this would boil down to:
SELECT id FROM (
SELECT 1 AS id
UNION
SELECT 2 AS id
UNION
SELECT 3 AS id
)
WHERE
NOT EXISTS (SELECT * FROM table WHERE table.id = id)
Suppose table had the IDs 1 and 4, then this would yield 2 and 3.
Are there more elegant / concise / faster ways to get those IDs in SQLite ?
The compound SELECT operator EXCEPT allows you to do something similar to NOT EXISTS:
SELECT 1 AS id UNION ALL
SELECT 2 UNION ALL
SELECT 3
EXCEPT
SELECT id FROM MyTable
Beginning with SQLite 3.8.3, you can use VALUES everywhere you could use SELECT, but this is just a different syntax:
VALUES (1),
(2),
(3)
EXCEPT
SELECT id FROM MyTable
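A quick way to check the EXCEPT form is to run it from Python's sqlite3 module. This is a minimal sketch that assumes MyTable has a single integer id column and contains the IDs 1 and 4, as in the question's example; it needs SQLite 3.8.3 or later for the VALUES syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed minimal schema with the IDs from the question's example.
conn.execute("CREATE TABLE MyTable (id INTEGER)")
conn.executemany("INSERT INTO MyTable (id) VALUES (?)", [(1,), (4,)])

# Candidate IDs minus the IDs that exist in the table.
rows = conn.execute(
    """
    VALUES (1), (2), (3)
    EXCEPT
    SELECT id FROM MyTable
    """
).fetchall()

# Sort in Python instead of relying on the order the compound query returns.
missing = sorted(r[0] for r in rows)
print(missing)
```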

SQLITE query, if last row matches criteria, check row preceding it matches different criteria

I'm finding it hard to get my head around this problem, and I couldn't find any answers to this specific problem anywhere:
Say I have a table like this, I'm just using fruit as an example:
Fruit | Date | Value
=================================
Apple | 1 | other_random_value
Apple | 2 | some_value_1
Apple | 3 | some_value_2
Pear | 1 | other_random_value
Pear | 2 | unexpected_value_1
Pear | 3 | some_value_2
Everything will be ordered by Fruit, then Date.
Basically, if the last row (for each fruit) is some_value_2, but the one preceding it is not some_value_1, I want to match just those fruits (i.e. in this case, Pear).
So, I always expect some_value_2 to come after a row with a certain value for that particular fruit, and if it doesn't, I want to flag errors against those particular fruits. It would also be nice to match cases where nothing precedes some_value_2 as well, though if this is too complicated I could match it separately and just check that some_value_2 is not the first row, which I don't imagine would be a difficult query.
EDIT: Also, being able to match any consecutive rows where the preceding value is unexpected would be nice, though I mainly care about the last 2 rows. So if being able to match all consecutive rows results in a simpler and better performing query, then I might go with that. I'm going to be doing an INSERT at the same time (into an alert table), so if I could flag it as an ERROR if it's the last two rows and a WARNING if it's not, that would be really nifty. Though I wouldn't know where to start with writing a query that does that. Also having a query that performs well is a must, as I will be using this across a large dataset.
EDIT:
This is what I used in the end, it's quite slow, but if I index Date, it's not so bad:
SELECT c.Id AS CId, c.Fruit AS CFruit,
c.Date AS CDate, c.Value AS CValue,
(SELECT Id
FROM fruits
WHERE Fruit = c.Fruit
AND Date >= c.Date
AND Id > c.Id
ORDER BY Date, Id) AS NId, n.Fruit AS NFruit,
n.Date AS NDate, n.Value AS NValue
FROM fruits AS c
JOIN fruits AS n ON n.Id = NId
ORDER BY c.Date, c.Id
I might try Joachim's method again at some point, as I realised I'm getting a lot of results I don't really care much about. Or I might even try incorporating the two somehow and delegate to INFO/ERROR as appropriate...
Solved: I used the same SELECT statement that I used to get NId, and used SELECT COUNT(*) instead of SELECT Id. This told me the number of results after the current one. Then I just used a CASE operator to turn it into a boolean field called Latest :). So I effectively combined Nicolas' and Joachim's methods. Performance still seems OK, probably because SQLite caches the results.
SQLite is (as far as I know) a bit low on efficient operators for this, so this is the best I can come up with for now :)
SELECT Fruit FROM fruits
WHERE ( SELECT COUNT(*) FROM fruits f
WHERE f.fruit=fruits.fruit
AND f.date > fruits.date ) = 1
AND fruits.value <> 'some_value_1'
INTERSECT
SELECT Fruit FROM fruits
WHERE ( SELECT COUNT(*) FROM fruits f
WHERE f.fruit=fruits.fruit
AND f.date > fruits.date ) = 0
AND fruits.value = 'some_value_2'
An SQLfiddle to test with.
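In case the fiddle is unavailable, the INTERSECT query can also be run locally with Python's sqlite3 module. The sketch below assumes a table named fruits with an integer date column, filled with the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema; the rows are the sample data from the question.
conn.execute("CREATE TABLE fruits (fruit TEXT, date INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO fruits VALUES (?, ?, ?)",
    [
        ("Apple", 1, "other_random_value"),
        ("Apple", 2, "some_value_1"),
        ("Apple", 3, "some_value_2"),
        ("Pear", 1, "other_random_value"),
        ("Pear", 2, "unexpected_value_1"),
        ("Pear", 3, "some_value_2"),
    ],
)

# First branch: the second-to-last row (exactly one later row) is not some_value_1.
# Second branch: the last row (no later rows) is some_value_2.
flagged = [
    r[0]
    for r in conn.execute(
        """
        SELECT fruit FROM fruits
        WHERE (SELECT COUNT(*) FROM fruits f
               WHERE f.fruit = fruits.fruit AND f.date > fruits.date) = 1
          AND fruits.value <> 'some_value_1'
        INTERSECT
        SELECT fruit FROM fruits
        WHERE (SELECT COUNT(*) FROM fruits f
               WHERE f.fruit = fruits.fruit AND f.date > fruits.date) = 0
          AND fruits.value = 'some_value_2'
        """
    )
]

print(flagged)
```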
I named the table fruits. This query gets you the preceding date for a 'key' (fruit + date):
select fruit, date, value currvalue,
(select max(date) precedingDate
from fruits p
where p.fruit = c.fruit
and p.date < c.date) precedingdate
from fruits c ;
From there we can get the precedent value for each key
select f1.*, precedingdate, f2.value precedingvalue
from
fruits f1 join
(select fruit, date, value,
(select max(date) precedingDate
from fruits p
where p.fruit = c.fruit
and p.date < c.date) precedingdate
from fruits c) f2
on f1.fruit = f2.fruit and f1.date = precedingdate ;
For all the rows that have a previous row, you get both the current and preceding date and the current and preceding value.
Edit : we add an id used to choose when there are several identical previous date (see comment below)
I will be using intermediate views for the sake of clarity but you could write one big query.
As before, what's the previous date :
create view VFruitsWithPreviousDate
as select fruit, date, value, id,
(select max(date)
from fruits p
where p.fruit = c.fruit
and p.date < c.date) previousdate
from fruits c ;
What's the previous id :
create view VFruitsWithPreviousId
as select fruit, date, value,
(select max(id)
from fruits f
where v.fruit = f.fruit AND
v.previousdate = f.date) previousID
from VFruitsWithPreviousDate v ;
A query for all consecutive rows :
select f.*, v.value
from fruits f
join VFruitsWithPreviousId v on f.id = v.previousid ;
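You can then add the condition WHERE v.value = 'some_value_2' AND f.value != 'some_value_1'. In the join above, v is the current row and f is the row preceding it, because f.id is matched against v.previousid.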
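Putting the two views and the final filter together, a runnable sketch with Python's sqlite3 module could look like this. The schema (an autoincrementing id and an integer date) is an assumption, and the aliases prev and cur are used instead of f and v to make explicit which side of the join is the preceding row and which is the current one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed schema; rows are the sample data from the question.
    CREATE TABLE fruits (id INTEGER PRIMARY KEY, fruit TEXT, date INTEGER, value TEXT);
    INSERT INTO fruits (fruit, date, value) VALUES
        ('Apple', 1, 'other_random_value'),
        ('Apple', 2, 'some_value_1'),
        ('Apple', 3, 'some_value_2'),
        ('Pear',  1, 'other_random_value'),
        ('Pear',  2, 'unexpected_value_1'),
        ('Pear',  3, 'some_value_2');

    -- Latest earlier date of the same fruit, per row.
    CREATE VIEW VFruitsWithPreviousDate AS
        SELECT fruit, date, value, id,
               (SELECT MAX(date) FROM fruits p
                WHERE p.fruit = c.fruit AND p.date < c.date) AS previousdate
        FROM fruits c;

    -- Highest id among the rows on that earlier date.
    CREATE VIEW VFruitsWithPreviousId AS
        SELECT fruit, date, value,
               (SELECT MAX(id) FROM fruits f
                WHERE v.fruit = f.fruit AND v.previousdate = f.date) AS previousid
        FROM VFruitsWithPreviousDate v;
    """
)

# cur is the current row, prev is the row that precedes it.
flagged = conn.execute(
    """
    SELECT cur.fruit, prev.value, cur.value
    FROM fruits prev
    JOIN VFruitsWithPreviousId cur ON prev.id = cur.previousid
    WHERE cur.value = 'some_value_2' AND prev.value != 'some_value_1'
    """
).fetchall()

print(flagged)
```

Rows without a predecessor have a NULL previousid and drop out of the inner join, so a some_value_2 that is the first row of its fruit would need a separate check.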
