I need a column that holds the previous distinct value of Name: it should stay the same until Name changes, and after a change it should pick up the value that was just replaced.
Example
Input
-------
ID Name Stdt EndDt
1 A 20/01/2019 20/02/2019
1 B 20/02/2019 20/03/2019
1 C 20/03/2019 15/05/2019
1 C 15/05/2019 16/05/2019
1 C 16/05/2019 19/06/2019
1 C 19/06/2019 15/07/2019
1 A 15/07/2019 NULL
Output
----------
ID Name Stdt EndDt Previous Name
1 A 20/01/2019 20/02/2019 NULL
1 B 20/02/2019 20/03/2019 A
1 C 20/03/2019 15/05/2019 B
1 C 15/05/2019 16/05/2019 B
1 C 16/05/2019 19/06/2019 B
1 C 19/06/2019 15/07/2019 B
1 A 15/07/2019 NULL C
I tried PRECEDING windows and self joins, but those only work when the number of repeats is known in advance. The name can stay constant for any number of rows, so I need something more dynamic.
You need nested Window functions:
SELECT ...
-- assign the previous value to all following rows
Last_Value(CASE WHEN prev_name <> NAME THEN prev_name END IGNORE NULLS)
Over (PARTITION BY id
ORDER BY stdt, enddt) as previous_name
FROM
(
SELECT ...
-- flag the changed row
Lag(NAME) Over (PARTITION BY id ORDER BY stdt, enddt) AS prev_name
-- pre-TD 16.10
-- MIN(NAME)
-- Over (PARTITION BY id ORDER BY stdt, enddt
-- ROWS BETWEEN 1 Preceding AND 1 Preceding) AS prev_name
FROM mytable
) AS dt
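For readers without a Teradata system at hand, the same result can be tried in SQLite from Python. This is only a sketch under stated assumptions: SQLite's window functions, as far as I know, do not accept IGNORE NULLS, so it swaps in a different technique (a running count of changes that labels each run of equal names, followed by a MAX over each run). The dates are stored in ISO format so that ORDER BY sorts them correctly, and SQLite 3.25 or later is assumed for window function support.

```python
import sqlite3

# Sample data from the question, with dates rewritten as ISO strings (assumption).
rows = [
    (1, "A", "2019-01-20", "2019-02-20"),
    (1, "B", "2019-02-20", "2019-03-20"),
    (1, "C", "2019-03-20", "2019-05-15"),
    (1, "C", "2019-05-15", "2019-05-16"),
    (1, "C", "2019-05-16", "2019-06-19"),
    (1, "C", "2019-06-19", "2019-07-15"),
    (1, "A", "2019-07-15", None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT, stdt TEXT, enddt TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

query = """
WITH flagged AS (
    -- previous row's name within each id
    SELECT id, name, stdt, enddt,
           LAG(name) OVER (PARTITION BY id ORDER BY stdt) AS prev_name
    FROM mytable
),
grouped AS (
    -- running count of changes: all rows of one run of equal names share a grp
    SELECT *,
           SUM(CASE WHEN prev_name <> name THEN 1 ELSE 0 END)
               OVER (PARTITION BY id ORDER BY stdt) AS grp
    FROM flagged
)
SELECT id, name, stdt, enddt,
       -- only the first row of a run has a non-NULL value here; MAX spreads it
       MAX(CASE WHEN prev_name <> name THEN prev_name END)
           OVER (PARTITION BY id, grp) AS previous_name
FROM grouped
ORDER BY id, stdt
"""

result = conn.execute(query).fetchall()
previous_names = [row[4] for row in result]
print(previous_names)
```

With the sample rows this prints [None, 'A', 'B', 'B', 'B', 'B', 'C'], which matches the Previous Name column in the question.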
Thanks a lot for the solution, it worked like a charm.
P.S.: Sorry, I was away on vacation and the requirement changed a bit, so I hadn't checked the solution until now.
Thanks again for blessing us with your knowledge all the time :)
Regards
Anindya
I'm trying to check how many times two teams have played against each other, split by how often each side was at home and away.
In the table below, Team 1 played against Team 2 three times (twice away and once at home), and Team 3 played against Team 4 twice (once away and once at home).
How can I do this using SQLite?
id_home  id_away  date
1        2        2/12
2        1        3/12
3        4        4/12
4        3        5/12
2        1        6/12
You can group by sorted pairings of the id_home and id_away teams:
with matches as (
    select distinct t1.id_home h,
           (select t2.id_away from teams t2 where t2.id_home = t1.id_home) a
    from teams t1
),
m1 as (
    select case when h < a then h else a end a1,
           case when a < h then h else a end a2
    from matches
)
select m1.a1 team1, m1.a2 team2,
       (select sum(m1.a1 = t3.id_home) from teams t3) home,
       (select sum(m1.a1 = t3.id_away) from teams t3) away
from m1
group by m1.a1, m1.a2;
Output:
team1 | team2 | home | away
1 2 1 2
3 4 1 1
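As a cross-check, the same grouping by sorted pairings can be sketched with SQLite's two-argument MIN() and MAX() scalar functions, run here from Python. This is an alternative formulation, not the query above: the column names team1_home and team1_away are made up for illustration, and the home/away counts are taken only from games between the two teams of each pairing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teams (id_home INTEGER, id_away INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO teams VALUES (?, ?, ?)",
    [(1, 2, "2/12"), (2, 1, "3/12"), (3, 4, "4/12"), (4, 3, "5/12"), (2, 1, "6/12")],
)

query = """
SELECT MIN(id_home, id_away)  AS team1,       -- lower id of the pairing
       MAX(id_home, id_away)  AS team2,       -- higher id of the pairing
       SUM(id_home < id_away) AS team1_home,  -- games where team1 was at home
       SUM(id_home > id_away) AS team1_away   -- games where team1 was away
FROM teams
GROUP BY MIN(id_home, id_away), MAX(id_home, id_away)
ORDER BY team1, team2
"""

pairings = conn.execute(query).fetchall()
print(pairings)
```

For the sample table this prints [(1, 2, 1, 2), (3, 4, 1, 1)], the same figures as the output above. The total number of meetings per pairing is team1_home + team1_away.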
I have a table of the form
select rowid,* from t;
rowid f
----- -----
1 aaa
2 bbb
3 ccc
4 ddd
5 eee
6 fff
7 ggg
8 aaa
9 bbb
10 ccc
11 ddd
12 eee
13 fff
14 ggg
I'd like to select n rows before and m rows after each row that matches a condition. For instance, for rows that match f='ccc' with n=m=1 I'd like to get
2 bbb
3 ccc
4 ddd
9 bbb
10 ccc
11 ddd
The rowid is sequential in my setup, so I guess we can play with it. I tried things along the lines of
select rowid,f from t where rowid between
(select rowid-1 from t where f='ccc') and
(select rowid+1 from t where f='ccc');
rowid f
----- -----
2 bbb
3 ccc
4 ddd
But the result is obviously wrong: I only get the first occurrence of the 'ccc' match. I guess I need a join or maybe a recursive CTE, but I am afraid that is beyond my knowledge so far :) Thanks in advance.
A scalar subquery can return only a single value.
You could do two self joins, but it would be simpler to use set operations:
SELECT * FROM t
WHERE rowid IN (SELECT rowid - 1 FROM t WHERE f = 'ccc'
UNION ALL
SELECT rowid FROM t WHERE f = 'ccc'
UNION ALL
SELECT rowid + 1 FROM t WHERE f = 'ccc');
Larger values of n and m require more subqueries.
If there are too many, you can use a join:
SELECT *
FROM t
WHERE rowid IN (SELECT t.rowid
FROM t
JOIN (SELECT rowid - ? AS n,
rowid + ? AS m
FROM t
WHERE f = 'ccc'
) AS ranges
ON t.rowid BETWEEN ranges.n AND ranges.m);
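To try the join version with concrete values of n and m, here is a small Python sketch using the standard sqlite3 module. The helper name rows_around and the in-memory table are assumptions made for illustration; the query itself is the one above with the match value also turned into a parameter. Like the original, it relies on rowid having no gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (f TEXT)")
values = ["aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg"]
# Inserting into a fresh table gives sequential rowids 1..14.
conn.executemany("INSERT INTO t (f) VALUES (?)", [(v,) for v in values * 2])


def rows_around(conn, value, n, m):
    """Return (rowid, f) for each match of `value` plus n rows before and m after."""
    query = """
    SELECT rowid, f
    FROM t
    WHERE rowid IN (SELECT t.rowid
                    FROM t
                    JOIN (SELECT rowid - ? AS n,
                                 rowid + ? AS m
                          FROM t
                          WHERE f = ?
                         ) AS ranges
                      ON t.rowid BETWEEN ranges.n AND ranges.m)
    ORDER BY rowid
    """
    return conn.execute(query, (n, m, value)).fetchall()


print(rows_around(conn, "ccc", 1, 1))
print(rows_around(conn, "ccc", 1, 2))
```

The first call returns rowids 2, 3, 4, 9, 10, 11 and the second returns 2, 3, 4, 5, 9, 10, 11, 12.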
I came up with a solution that I don't think is optimal, but I am not able to simplify it by removing the temporary (intermediate) table.
select rowid,f from t;
rowid f
----- -----
1 aaa
2 bbb
3 ccc
4 ddd
5 eee
6 fff
7 ggg
8 aaa
9 bbb
10 ccc
11 ddd
12 eee
13 fff
14 ggg
create table u as
select t2.rowid x,t1.rowid+2 y from t t1 -- +2 ==> 2 rows after 'ccc'
join t t2 on t1.rowid=t2.rowid+1 -- +1 ==> 1 row before 'ccc'
where t1.f='ccc';
select * from u;
x y
----- -----
2 5
9 12
select t.rowid,t.f from t inner
join u on t.rowid>=u.x and t.rowid<=u.y;
rowid f
----- -----
2 bbb 1 before
3 ccc <== match
4 ddd 2 after
5 eee
9 bbb 1 before
10 ccc <== match
11 ddd 2 after
12 eee
I think I am set with what I need, but optimisations are welcome :)
I might be overlooking something, but the provided solutions that suggest adding/subtracting from values of the rowid column could be improved upon. They will face issues should rowid ever be missing a value (Which I'm aware was stated to never be the case in the top post, but in general is an assumption that's often not true).
By using sqlite's row_number() you can have a solution that circumvents that problem and also can be used to fetch the entries "around" your row matches based on any arbitrary order you want, not just based on rowid.
Together with Common Table Expressions this can even be made somewhat readable, though should you have a larger amount of row-matches this will still be a slow query.
What you'll conceptually be doing is:
1) Do a pre-select on your table (here cte_t) to get all possible values that could be a valid hit, and attach a row number to each entry.
2) Do a select on that pre-select to fetch the specific rows that you actually want and get only their row numbers (here targetRows).
3) "Join" the two by pretty much just multiplying the two tables generated in 1) and 2).
4) Now you can easily select all entries whose row number is in a specific range around a target's row number using ABS:
WITH
cte_t AS (
SELECT *, row_number() OVER (ORDER BY t.rowid) AS rownum
FROM t
-- If you can make this cte smaller by removing all entries that can't possibly be the solution with the appropriate WHERE clause, you can make the entire query substantially faster
),
targetRows AS(
SELECT rownum AS targetRowNum
FROM cte_t
WHERE f = 'ccc' -- This should be the WHERE condition that defines the entries that match your query exactly and for which you want to get the entries around them
)
SELECT cte_t.rowid, cte_t.f
FROM cte_t, targetRows -- This is basically multiplying both tables with one another, this part will be horribly slow if targetRows gets larger
WHERE ABS(cte_t.rownum - targetRows.targetRowNum) <= 1; --Get all entries in targetRows as well as those whose rownum is 1 larger or 1 lower than the rownum of a targetRow
This will return
rowid f
2 bbb
3 ccc
4 ddd
9 bbb
10 ccc
11 ddd
Here a good resource about this.
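To see the benefit over rowid arithmetic, the following Python sketch deletes one row so that rowid has a gap and then fetches the neighbours by row number. It is a minimal illustration under assumptions: the rowid is aliased explicitly as rid inside the CTE (whether a CTE exposes the base table's implicit rowid may depend on the SQLite version), the range is written with BETWEEN so that n and m can differ, and SQLite 3.25 or later is assumed for row_number().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (f TEXT)")
values = ["aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg"]
conn.executemany("INSERT INTO t (f) VALUES (?)", [(v,) for v in values * 2])

# Create a gap in rowid: the first 'ddd' (rowid 4) disappears.
conn.execute("DELETE FROM t WHERE rowid = 4")

query = """
WITH cte_t AS (
    SELECT rowid AS rid, f,
           row_number() OVER (ORDER BY rowid) AS rownum
    FROM t
),
targetRows AS (
    SELECT rownum AS targetRowNum
    FROM cte_t
    WHERE f = 'ccc'
)
SELECT DISTINCT cte_t.rid, cte_t.f   -- DISTINCT guards against overlapping ranges
FROM cte_t
JOIN targetRows
  ON cte_t.rownum BETWEEN targetRows.targetRowNum - ?
                      AND targetRows.targetRowNum + ?
ORDER BY cte_t.rid
"""

n, m = 1, 1  # rows before / rows after each match
around = conn.execute(query, (n, m)).fetchall()
print(around)
```

Because rowid 4 is gone, the row after the first 'ccc' is now rowid 5 ('eee'), so the result is [(2, 'bbb'), (3, 'ccc'), (5, 'eee'), (9, 'bbb'), (10, 'ccc'), (11, 'ddd')]. A version based on rowid + 1 would have found nothing after the first match.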
I got a table like this
a b c
-- -- --
1 1 10
2 1 0
3 1 0
4 4 20
5 4 0
6 4 0
The b column 'points' to 'a', a bit like if a is the parent.
c was computed. Now I need to propagate the parent c value to their children.
The result would be
a b c
-- -- --
1 1 10
2 1 10
3 1 10
4 4 20
5 4 20
6 4 20
I can't make an UPDATE/SELECT combo that works
So far I have a SELECT that produces the c column I'd like to get
select t1.c from t t1 join t t2 on t1.a=t2.b;
c
----------
10
10
10
20
20
20
But I don't know how to write that back into c
Thanx in advance
Cheers, phi
You have to look up the value with a correlated subquery:
UPDATE t
SET c = (SELECT c
FROM t AS parent
WHERE parent.a = t.b)
WHERE c = 0;
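The statement can be checked against the sample data with a few lines of Python and an in-memory SQLite database. This is just a sketch for verification; the table definition (three INTEGER columns) is an assumption, since the question does not show the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 0), (3, 1, 0), (4, 4, 20), (5, 4, 0), (6, 4, 0)],
)

# For each child row (c = 0), look up the c value of the row whose a equals its b.
conn.execute(
    """
    UPDATE t
    SET c = (SELECT c
             FROM t AS parent
             WHERE parent.a = t.b)
    WHERE c = 0
    """
)
conn.commit()

values = [row[0] for row in conn.execute("SELECT c FROM t ORDER BY a")]
print(values)
```

After the update the c column reads 10, 10, 10, 20, 20, 20. Note that the WHERE c = 0 filter assumes a computed parent value is never 0 itself.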
I finally found a way to copy my initial 'temp' SELECT JOIN back into table 't'. Something like this
create temp table u as select t1.c from t t1 join t t2 on t1.a=t2.b;
update t set c=(select * from u where rowid=t.rowid);
I'd like to know how the two solutions compare performance-wise: yours, a single UPDATE with a correlated SELECT, and mine, which takes two statements (a CREATE TABLE ... AS SELECT and an UPDATE with a correlated subquery). Mine seems heavier and less elegant, but I still wonder about the performance.
On the algorithm side, yours takes care to update only the child rows and leaves the parent rows alone; mine also copies each parent's value onto itself, which is a no-op but still costs some cycles :)
Cheers, Phi
Hopefully this has a straightforward answer: how can I determine if a variable has more than one observation that equals the maximum? For example, if the following is my data:
ID Doctor COUNT
576434 Tim 1
576434 Lynn 1
576434 Moran 1
576434 Wade 2
576434 Ashwin 2
Looking at the variable "COUNT", we can see that two observations equal the maximum value 2. I would then want to flag this "ID" as having a tie. The data is already in order by ID then COUNT, so my thought was to just check whether the values in the last two rows are equal, but I'm hoping for a better way. Thanks!
Ok, take a look at this strange solution...
First, your data (with a few more lines to test it out)
data have;
format
ID 8.
Doctor $10.
COUNT 8.;
/* list input: the data lines below are space-delimited */
input ID Doctor $ COUNT;
DATALINES;
576434 Tim 1
576434 Lynn 1
576434 Moran 1
576434 Wade 2
576434 Ashwin 2
111111 AAAAAA 1
111111 BBBBBB 2
111111 CCCCCC 3
111111 DDDDDD 3
111111 EEEEEE 3
222222 ZZZZZZ 1
222222 WWWWWW 2
;
RUN;
proc sort data=have;
by ID;
run;
Create a helper dataset that keeps, for each ID, the largest COUNT value that occurs more than once:
proc sql;
CREATE TABLE AUX AS
SELECT
ID
,MAX(COUNT) AS AUXCOUNT
FROM (
SELECT
ID
,COUNT
,COUNT(*) AS COUNTOBS
FROM HAVE
GROUP BY 1,2
HAVING CALCULATED COUNTOBS > 1
)
GROUP BY 1
ORDER BY 1;
quit;
Merge with the first data set to flag the data when there's a tie:
data want;
merge
have (in=a)
aux (in=b);
by ID;
if count = auxcount then TIEFLAG = "TIE";
else TIEFLAG = "";
drop auxcount;
run;
RESULT
ID DOCTOR COUNT TIEFLAG
111111 AAAAAA 1
111111 BBBBBB 2
111111 CCCCCC 3 TIE
111111 DDDDDD 3 TIE
111111 EEEEEE 3 TIE
222222 ZZZZZZ 1
222222 WWWWWW 2
576434 Tim 1
576434 Lynn 1
576434 Moran 1
576434 Wade 2 TIE
576434 Ashwin 2 TIE
Let's assume your dataset has additional rows for example's sake:
data have;
input ID Doctor$ Count;
datalines;
576434 Tim 1
576434 Lynn 1
576434 Moran 1
576434 Wade 2
576434 Ashwin 2
576435 Barry 8
576435 Jim 10
576435 Bart 10
576391 Tom 1
576391 Bill 2
run;
Step 1: Sort your dataset by ID, descending Count
proc sort data=have;
by ID descending count;
run;
We now have the original dataset in an order that we can work with. Next, we'll remove duplicate max entries.
Step 2: Remove duplicates for ID descending Count
proc sort data=have
out=_temp_
dupout=dupes
nodupkey;
by ID descending count;
run;
We don't care about the output dataset from proc sort, but we do care about the dupout dataset.
Dupes
ID Doctor Count
576434 Ashwin 2 <---- Duplicate max
576434 Lynn 1
576434 Moran 1
576435 Bart 10 <---- Duplicate max
Step 3: Pick out the duplicates
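Notice that each ID group in the dupout dataset starts with its largest duplicated value. Since the data is sorted by ID, then by count in descending order, taking the first entry of each ID group in the dupout dataset gives us the duplicated maximum for that ID. Note that this relies on the duplicated value actually being the maximum: an ID whose maximum is unique but whose lower counts repeat (for example 3, 1, 1) would also show up in dupes and would need an extra check against the maximum.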
data dupe_max;
set dupes;
by ID descending count;
if(first.ID);
keep ID;
run;
Step 4: Merge these IDs back with the original sorted dataset
data want;
merge dupe_max(in=dupmax)
have(in=all);
by ID;
Duplicate_Max = (all = dupmax);
run;
Two things are going on here:
The in= option allows us to create a Boolean variable that tells us which dataset the observation is coming from. In other words, if the ID exists in dupe_max, the variable dupmax = 1. If the ID exists in have, the variable all = 1.
variable = (logic here) is a shorthand way of creating Boolean 1/0 variables. You can get the same result by doing:
if(all = dupmax) then Duplicate_Max = 1; else Duplicate_Max = 0;
Behind the scenes, this is what's happening:
                                      ID from have  ID from dupe_max
                                      vvv           vvvvvv
ID      Doctor  Count  Duplicate_Max  all           dupmax            Match?
576391  Bill    2      0              1             0                 No
576391  Tom     1      0              1             0                 No
576434  Wade    2      1              1             1                 Yes
576434  Ashwin  2      1              1             1                 Yes
576434  Tim     1      1              1             1                 Yes
576434  Lynn    1      1              1             1                 Yes
576434  Moran   1      1              1             1                 Yes
576435  Jim     10     1              1             1                 Yes
576435  Bart    10     1              1             1                 Yes
576435  Barry   8      1              1             1                 Yes
It's simpler to do it in SQL. You don't even need to sort. Assuming your source data set is "have" as either of the two other posters provided:
PROC SQL ;
CREATE TABLE dupemax AS
SELECT a.id, a.count, a.numobs
FROM (SELECT id, count, COUNT(*) AS numobs
FROM have
GROUP BY id, count
) a
%* gives the number of rows with each id/count combination ;
INNER JOIN
(SELECT id, MAX(count) AS count
FROM have
GROUP BY id
) b
%* finds the id/MAX(count) value ;
ON a.id EQ b.id AND a.count EQ b.count
%* so only getting the id and the MAX(count), not all "count" values ;
WHERE a.numobs GE 2
%* and of the id/MAX(count), only where there were multiple rows of the MAX(count) ;
;
QUIT ;
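The logic of this query (join each row to its ID's maximum, then keep IDs where that maximum occurs at least twice) can be checked outside SAS. Below is a hedged sketch in Python with SQLite, not a translation of the PROC SQL step: the column is named cnt instead of count, the data reuses the rows posted in the previous answer, and ID 999999 is made up to show that a repeated non-maximum value does not trigger the flag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (id INTEGER, doctor TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO have VALUES (?, ?, ?)",
    [
        (576434, "Tim", 1), (576434, "Lynn", 1), (576434, "Moran", 1),
        (576434, "Wade", 2), (576434, "Ashwin", 2),
        (576435, "Barry", 8), (576435, "Jim", 10), (576435, "Bart", 10),
        (576391, "Tom", 1), (576391, "Bill", 2),
        # Hypothetical ID: the value 1 repeats, but the maximum (2) is unique.
        (999999, "X", 1), (999999, "Y", 1), (999999, "Z", 2),
    ],
)

query = """
SELECT h.id
FROM have AS h
JOIN (SELECT id, MAX(cnt) AS max_cnt   -- maximum per id
      FROM have
      GROUP BY id) AS m
  ON h.id = m.id AND h.cnt = m.max_cnt -- keep only rows at the maximum
GROUP BY h.id
HAVING COUNT(*) >= 2                   -- the maximum occurs more than once
ORDER BY h.id
"""

tied_ids = [row[0] for row in conn.execute(query)]
print(tied_ids)
```

This prints [576434, 576435]. IDs 576391 and 999999 are not flagged because their maximum occurs only once.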
Hi I have the following table
buildingcode flatname flatdescription date
01 A 1 name 1 12-2012
01 B 2 name 1 12-2012
02 A 0 name 2 12-2012
01 A 1 name 1 11-2012
I want to display it as follows
B 2 name 1
A 1 name 1
And to explain what I want to do:
display only buildingcode 01,
display once by flatname,
sort by flatname desc
What is the sqlite command to do this?
I tried this, but the order is wrong
'SELECT DISTINCT flatdescription, flatname, buildingcode FROM bill WHERE buildingcode = ? '
Please advise
SELECT DISTINCT flatname, flatdescription
FROM bill
WHERE buildingcode = '01'
ORDER BY flatname DESC;