Compare rows in a different column in Teradata

I need to compare a value in one column with the previous row's value in another column. For example, I have this table:
id | create_date | end_date
1 | 2016-12-31 | 2017-01-25
2 | 2017-01-26 | 2017-05-21
3 | 2017-05-22 | 2017-08-26
4 | 2017-09-01 | 2017-09-02
I need to compare create_date for id = 2 with end_date for id = 1,
compare create_date for id = 3 with end_date for id = 2, and so on.
Result: show the ids where create_date (id = n) <> end_date (id = n-1) + interval '1' day.
Should I use the lag() function? How can I compare the values, and which function should I use?
Thank you

Teradata didn't have lag/lead before release 16.10, but you can still get the same functionality with a windowed max:
select
id,
create_date,
end_date,
max(end_date) over (order by id rows between 1 preceding and 1 preceding) as prev_end_date
...
qualify
create_date <> prev_end_date + INTERVAL '1' day;
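A Teradata instance isn't needed to try the logic: the same gap check can be sketched in SQLite (3.25+), which does ship lag(). The table name t and the data below simply mirror the question's sample:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, create_date TEXT, end_date TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "2016-12-31", "2017-01-25"),
    (2, "2017-01-26", "2017-05-21"),
    (3, "2017-05-22", "2017-08-26"),
    (4, "2017-09-01", "2017-09-02"),
])

# Rows whose create_date is NOT the day after the previous row's end_date.
rows = con.execute("""
    SELECT id FROM (
        SELECT id, create_date,
               lag(end_date) OVER (ORDER BY id) AS prev_end_date
        FROM t
    )
    WHERE prev_end_date IS NOT NULL
      AND create_date <> date(prev_end_date, '+1 day')
""").fetchall()
print(rows)  # [(4,)] -- 2017-09-01 is not 2017-08-26 + 1 day
```

Only id 4 breaks the chain, which matches the rule stated in the question.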

Related

SQLite: create equal date ranges and query data based on them?

I have data in a table with this schema:
Id INTEGER,
date DATETIME,
value REAL
id is the primary key, and I have an index on the date column to speed up querying values within a specific date range.
What should I do if I need N equal date ranges between specific start and end dates, and want to query aggregated data for each range?
For example:
Start date: 2015-01-01
End date: 2019-12-31
N: 5
In this case equal date intervals should be:
2015-01-01 ~ 2015-12-31
2016-01-01 ~ 2016-12-31
2017-01-01 ~ 2017-12-31
2018-01-01 ~ 2018-12-31
2019-01-01 ~ 2019-12-31
And the query should aggregate all values (AVG) in between those intervals, so I would like to have 5 total rows after the execution.
Maybe something with a CTE?
There are two ways to do it.
Both use recursive CTEs but return slightly different results.
The first uses NTILE():
with
dates as (select '2015-01-01' mindate, '2019-12-31' maxdate),
alldates as (
  select mindate date from dates
  union all
  select date(a.date, '+1 day')
  from alldates a cross join dates d
  where a.date < d.maxdate
),
groups as (
  select *, ntile(5) over (order by date) grp
  from alldates
),
cte as (
  select min(date) date1, max(date) date2
  from groups
  group by grp
)
select * from cte;
Results:
| date1 | date2 |
| ---------- | ---------- |
| 2015-01-01 | 2016-01-01 |
| 2016-01-02 | 2016-12-31 |
| 2017-01-01 | 2017-12-31 |
| 2018-01-01 | 2018-12-31 |
| 2019-01-01 | 2019-12-31 |
And the second builds the groups with date arithmetic:
with
dates as (select '2015-01-01' mindate, '2019-12-31' maxdate),
cte1 as (
  select mindate date from dates
  union all
  select date(
    c.date,
    ((strftime('%s', d.maxdate) - strftime('%s', d.mindate)) / 5) || ' second'
  )
  from cte1 c inner join dates d
  on c.date < d.maxdate
),
cte2 as (
  select date date1, lead(date) over (order by date) date2
  from cte1
),
cte as (
  select date1,
    case
      when date2 = (select maxdate from dates) then date2
      else date(date2, '-1 day')
    end date2
  from cte2
  where date2 is not null
)
select * from cte;
Results:
| date1 | date2 |
| ---------- | ---------- |
| 2015-01-01 | 2015-12-31 |
| 2016-01-01 | 2016-12-30 |
| 2016-12-31 | 2017-12-30 |
| 2017-12-31 | 2018-12-30 |
| 2018-12-31 | 2019-12-31 |
In both cases you can get the averages by joining the table to the cte:
select c.date1, c.date2, avg(t.value) avg_value
from cte c inner join tablename t
on t.date between c.date1 and c.date2
group by c.date1, c.date2
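The full NTILE pipeline plus the averaging join can be sanity-checked end to end with Python's built-in sqlite3 module. The tablename data below is made up (one row per quarter, with value equal to the year, so each bucket's average is easy to eyeball):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, date TEXT, value REAL)")
# Hypothetical sample data: one row per quarter, value = year.
con.executemany(
    "INSERT INTO tablename (date, value) VALUES (?, ?)",
    [(f"{y}-{m:02d}-15", float(y)) for y in range(2015, 2020) for m in (1, 4, 7, 10)],
)

rows = con.execute("""
    WITH dates AS (SELECT '2015-01-01' mindate, '2019-12-31' maxdate),
    alldates AS (
        SELECT mindate AS date FROM dates
        UNION ALL
        SELECT date(a.date, '+1 day')
        FROM alldates a CROSS JOIN dates d
        WHERE a.date < d.maxdate
    ),
    groups AS (
        SELECT *, ntile(5) OVER (ORDER BY date) grp FROM alldates
    ),
    cte AS (
        SELECT min(date) date1, max(date) date2 FROM groups GROUP BY grp
    )
    SELECT c.date1, c.date2, avg(t.value) avg_value
    FROM cte c JOIN tablename t ON t.date BETWEEN c.date1 AND c.date2
    GROUP BY c.date1, c.date2
    ORDER BY c.date1
""").fetchall()
for r in rows:
    print(r)  # 5 buckets; first is ('2015-01-01', '2016-01-01', 2015.0)
```

Note the first bucket ends on 2016-01-01 rather than 2015-12-31: NTILE hands the leftover day (2016 is a leap year, so there are 1826 days, not a multiple of 5) to the first group, exactly as in the results table above.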

How to get count of multiple distinct columns with one column as date

I have a table with columns id, date, name:
id | date | name
1 | 2019-08-01 00:00:00 | abc
1 | 2019-08-01 00:00:00 | def
2 | 2019-08-01 00:00:00 | pqr
1 | 2019-08-31 00:00:00 | def
I want to get the count of ids for a given month.
The expected count for month 8 is 3.
SELECT strftime('%Y/%m/%d', date) as vdate,count(DISTINCT vdate,id) AS totalcount FROM cardtable WHERE date BETWEEN date('" + $rootScope.mydate + "', 'start of month') AND date('" + $rootScope.mydate + "','start of month','+1 month','-1 day') group by vdate
Basically, I want to count rows where the (id, date) pair is distinct. For example, if there are 2 entries on 2019-08-01 with the same id, the count should be 1. If there are 3 entries on 2019-08-01, two with id 1 and one with id 2, the count should be 2. And if there are 2 entries with id 1 on different dates, say one on 2019-08-01 and one on 2019-08-31, then the count for month 8 should be 2. How can I modify the above query?
Use a subquery which returns the distinct values that you want to count:
SELECT COUNT(*) AS totalcount
FROM (
SELECT DISTINCT strftime('%Y/%m/%d', date), id
FROM cardtable
WHERE date(date) BETWEEN
date('" + $rootScope.mydate + "', 'start of month')
AND
date('" + $rootScope.mydate + "','start of month','+1 month','-1 day')
)
Results:
| totalcount |
| ---------- |
| 3 |
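A quick way to verify the subquery approach is to run it against the sample rows from the question via sqlite3; the literal '2019-08-15' below stands in for the $rootScope.mydate binding:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cardtable (id INTEGER, date TEXT, name TEXT)")
con.executemany("INSERT INTO cardtable VALUES (?, ?, ?)", [
    (1, "2019-08-01 00:00:00", "abc"),
    (1, "2019-08-01 00:00:00", "def"),
    (2, "2019-08-01 00:00:00", "pqr"),
    (1, "2019-08-31 00:00:00", "def"),
])

# Count distinct (day, id) pairs within the month of the given date.
(total,) = con.execute("""
    SELECT COUNT(*) FROM (
        SELECT DISTINCT strftime('%Y/%m/%d', date), id
        FROM cardtable
        WHERE date(date) BETWEEN date(?, 'start of month')
                             AND date(?, 'start of month', '+1 month', '-1 day')
    )
""", ("2019-08-15", "2019-08-15")).fetchone()
print(total)  # 3
```

The two rows on 2019-08-01 with id 1 collapse to one pair, leaving (08/01, 1), (08/01, 2), and (08/31, 1).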

Why is data.table join not working with dates?

A data.table join is not selecting the maximum date, although it does select the maximum value. See the following example:
table1 <- fread(
"individual_id | date
1 | 2018-01-06
2 | 2018-01-06",
sep ="|"
)
table1$date = as.IDate(table1$date)
table2 <- fread(
"individual_id | date_second | company_id | value
1 | 2018-01-02 | 62 | 1
1 | 2018-01-04 | 62 | 1.5
1 | 2018-01-05 | 63 | 1
2 | 2018-01-01 | 71 | 2
2 | 2018-01-02 | 74 | 1
2 | 2018-01-05 | 74 | 4",
sep = "|"
)
table2$date_second = as.IDate(table2$date_second)
The following join should select the maximum value by company_id and then return the maximum of those values for each individual.
The join to select max value:
table2[table1, on=.(individual_id, date_second<=date),
#for each row of table1,
by=.EACHI,
# get the maximum value by company_id and the max of all of these
max(.SD[,max(value), by=.(company_id)]$V1)]
output:
individual_id date_second V1
1: 1 2018-01-06 1.5
2: 2 2018-01-06 4.0
same join, selecting max date:
table2[table1, on=.(individual_id, date_second<=date),
#for each row of table1,
by=.EACHI,
# get the maximum date by company_id and the max of all of these
max(.SD[,max(date_second), by=.(company_id)]$V1)]
output:
individual_id date_second V1
1: 1 2018-01-06 2018-01-02
2: 2 2018-01-06 2018-01-01
Why is it not returning the max date like it did the max value?
I guess you are looking for an update join:
table1[table2
, on = .(individual_id, date >= date_second)
, by = .EACHI
, second_date := max(i.date_second)][]
which gives:
> table1
individual_id date second_date
1: 1 2018-01-06 2018-01-05
2: 2 2018-01-06 2018-01-05
OK, it turns out you cannot select based on one of the join criteria (in a non-equi join, data.table replaces the join column with the value from the other table), so I have to create a copy column date_second_copy and select based on that, e.g.:
table2$date_second_copy = table2$date_second
table2[table1, on=.(individual_id, date_second<=date),
#for each row of table1,
by=.EACHI,
# get the maximum date by company_id and the max of all of these
max(.SD[,max(date_second_copy), by=.(company_id)]$V1)]

Date between based on two column in oracle

How can I generate all the dates between two date columns, for each row?
A row generator technique should be used, such as:
SQL> alter session set nls_date_format = 'dd.mm.yyyy';
Session altered.
SQL> with test (sno, start_date, end_date) as
2 (select 1, date '2018-01-01', date '2018-01-05' from dual union
3 select 2, date '2018-01-03', date '2018-01-05' from dual
4 )
5 select sno, start_date + column_value - 1 datum
6 from test,
7 table(cast(multiset(select level from dual
8 connect by level <= end_date - start_date + 1)
9 as sys.odcinumberlist))
10 order by sno, datum;
SNO DATUM
---------- ----------
1 01.01.2018
1 02.01.2018
1 03.01.2018
1 04.01.2018
1 05.01.2018
2 03.01.2018
2 04.01.2018
2 05.01.2018
8 rows selected.
SQL>
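The row-generator idea is not Oracle-specific; as a rough cross-check of the expected output, here is the same expansion done in plain Python (the expand helper is hypothetical, not part of any Oracle API):

```python
from datetime import date, timedelta

def expand(rows):
    """For each (sno, start_date, end_date) row, yield one (sno, day) per date."""
    for sno, start, end in rows:
        d = start
        while d <= end:
            yield sno, d
            d += timedelta(days=1)

test_rows = [(1, date(2018, 1, 1), date(2018, 1, 5)),
             (2, date(2018, 1, 3), date(2018, 1, 5))]
out = list(expand(test_rows))
print(len(out))  # 8 rows, matching the "8 rows selected." above
```

Five days for sno 1 plus three days for sno 2 gives the same eight rows as the Oracle query.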

PL/SQl, oracle 9i, deleting duplicate rows using sql

We have a scenario where we need to delete all duplicate rows from a table based on a timestamp. The table structure looks like this:
Item Ref1 Ref2 Timestamp
1 A test1 2/3/2012 10:00:00
1 A test2 2/3/2012 11:00:00
1 A test1 2/3/2012 12:00:00
2 A prod1 2/3/2012 10:00:00
2 B prod2 2/3/2012 11:00:00
2 A prod2 2/3/2012 12:00:00
So we need to delete the duplicate rows from this table based on Item and Ref1: here we should keep only one row for Item 1 / Ref1 A, the one with the latest Timestamp. Likewise for Item 2, we should keep only one row for Ref1 A, with the latest Timestamp.
Any pointers would be great.
Assuming that your desired end result is a table with these 3 rows
Item Ref1 Ref2 Timestamp
1 A test1 2/3/2012 12:00:00
2 B prod2 2/3/2012 11:00:00
2 A prod2 2/3/2012 12:00:00
Something like
DELETE FROM table_name a
WHERE EXISTS( SELECT 1
FROM table_name b
WHERE a.item = b.item
AND a.ref1 = b.ref1
AND a.timestamp < b.timestamp );
should work assuming that there are no two rows with the same Item and Ref1 that both have the same Timestamp. If there can be multiple rows with the same Item and Ref1 that both have the latest Timestamp and assuming that you don't care which one you keep
DELETE FROM table_name a
WHERE EXISTS( SELECT 1
FROM table_name b
WHERE a.item = b.item
AND a.ref1 = b.ref1
AND a.timestamp <= b.timestamp
AND a.rowid < b.rowid);
You can query your records grouping by Item and Ref1, and then delete the rows where Item and Ref1 match and Timestamp is less than the max.
select Item
, Ref1
, max(Timestamp) tm
from table
group by Item, Ref1
With the results...
delete from table where Item = ? and Ref1 = ? and Timestamp < ?
I don't have an Oracle 9 install at hand, so I cannot test this, but I believe that this may work:
Create a view which adds row "indices" to your records:
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY Item, Ref1 ORDER BY Timestamp DESC) ix FROM table_name t
Then delete the records from the view where ix is greater than 1.
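The ROW_NUMBER() technique is easy to try out in SQLite (3.25+), which also supports window functions and exposes a rowid to delete by; the table and data below mirror the question's sample, so treat this as a sketch of the technique rather than Oracle-specific code:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (item INTEGER, ref1 TEXT, ref2 TEXT, ts TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "A", "test1", "2012-03-02 10:00:00"),
    (1, "A", "test2", "2012-03-02 11:00:00"),
    (1, "A", "test1", "2012-03-02 12:00:00"),
    (2, "A", "prod1", "2012-03-02 10:00:00"),
    (2, "B", "prod2", "2012-03-02 11:00:00"),
    (2, "A", "prod2", "2012-03-02 12:00:00"),
])

# Delete every row that is not the newest within its (item, ref1) group.
con.execute("""
    DELETE FROM t WHERE rowid IN (
        SELECT rowid FROM (
            SELECT rowid,
                   ROW_NUMBER() OVER (PARTITION BY item, ref1
                                      ORDER BY ts DESC) AS ix
            FROM t
        ) WHERE ix > 1
    )
""")
rows = con.execute("SELECT item, ref1, ref2 FROM t ORDER BY item, ref1").fetchall()
print(rows)  # the three latest rows survive
```

The three surviving rows match the desired end result shown in the first answer.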
