I have data in the following format
id | first_name | last_name | birth_date
abc | Jared | Pollard | 1970-01-01
def | Jared | Pollard | 1972-02-02
ghi | Jared | Pollard | 1980-01-01
klm | Jared | Pollard | 2015-01-01
and I would like a query that groups the data according to the following rule:
If first_name and last_name are equal and the birth_dates are within 5 years of each other, then the records belong to the same group.
So the above data contains three groups: group1=(abc, def), group2=(ghi) and group3=(klm).
Currently I have the following query, which incorrectly creates only 2 groups: group1=(abc, def) and group2=(ghi, klm).
SELECT
g.id,
FIRST_VALUE(g.id) OVER (PARTITION BY lower(trim(g.last_name)), lower(trim(g.first_name)),
CASE WHEN g.birth_date between g.fv_birth_date - interval '5 year' AND g.fv_birth_date + interval '5 year' THEN 1 ELSE 0 END
ORDER BY g.last_used_dt DESC NULLS LAST) AS cluster_id
FROM (
SELECT id, last_used_dt, last_name, first_name, birth_date,
FIRST_VALUE(birth_date)
OVER (PARTITION BY
lower(trim(last_name)),
lower(trim(first_name))
ORDER BY last_used_dt DESC NULLS LAST) AS fv_birth_date
FROM guest
) g;
I understand this is because of the CASE expression within the PARTITION BY clause, but I am unable to come up with any other query.
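One possible direction is a gaps-and-islands query: order each name's records by birth_date, flag a row as the start of a new group when it is more than 5 years after the previous row, and take a running sum of those flags. The sketch below is only an illustration under assumptions: it runs in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later) rather than in the original database, it drops last_used_dt, and it reads "within 5 years of each other" as "within 5 years of the previous record", so a chain such as 1970, 1974, 1978 would end up in one group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE guest (id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT, birth_date TEXT);
INSERT INTO guest VALUES
  ('abc', 'Jared', 'Pollard', '1970-01-01'),
  ('def', 'Jared', 'Pollard', '1972-02-02'),
  ('ghi', 'Jared', 'Pollard', '1980-01-01'),
  ('klm', 'Jared', 'Pollard', '2015-01-01');
""")

query = """
WITH flagged AS (
  SELECT id, first_name, last_name, birth_date,
         -- 1 when the row starts a new group: no previous row, or a gap of more than 5 years
         CASE
           WHEN birth_date <= date(
                  LAG(birth_date) OVER (
                    PARTITION BY lower(trim(last_name)), lower(trim(first_name))
                    ORDER BY birth_date, id),
                  '+5 years')
           THEN 0 ELSE 1
         END AS starts_group
  FROM guest
)
SELECT id,
       -- running count of group starts = group number within the name
       SUM(starts_group) OVER (
         PARTITION BY lower(trim(last_name)), lower(trim(first_name))
         ORDER BY birth_date, id) AS group_no
FROM flagged
ORDER BY birth_date, id
"""
groups = conn.execute(query).fetchall()
print(groups)  # [('abc', 1), ('def', 1), ('ghi', 2), ('klm', 3)]
```

In a database with interval arithmetic, the date(..., '+5 years') call would presumably become something like prev_birth_date + interval '5 year'.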
Related
I have a table having an ID column, this column is a primary key and unique as well. In addition, the table has a modified date column.
I have the same table in 2 databases and I am looking to merge both into one database. The merging scenario in a table is as follows:
Insert the record if the ID is not present;
If the ID exists, only update if the modified date is greater than that of the existing row.
For example, having:
Table 1:
id | name | createdAt | modifiedAt
---|------|------------|-----------
1 | john | 2019-01-01 | 2019-05-01
2 | jane | 2019-01-01 | 2019-04-03
Table 2:
id | name | createdAt | modifiedAt
---|------|------------|-----------
1 | john | 2019-01-01 | 2019-04-30
2 | JANE | 2019-01-01 | 2019-04-04
3 | doe | 2019-01-01 | 2019-05-01
The resulting table would be:
id | name | createdAt | modifiedAt
---|------|------------|-----------
1 | john | 2019-01-01 | 2019-05-01
2 | JANE | 2019-01-01 | 2019-04-04
3 | doe | 2019-01-01 | 2019-05-01
I've read about INSERT OR REPLACE, but I couldn't figure out how the date condition can be applied. I also know that I could loop through each pair of matching rows and compare the dates manually, but that would be slow. Is there an efficient way to accomplish this in SQLite?
I'm using sqlite3 on Node.js.
The UPSERT notation added in SQLite 3.24 makes this easy:
INSERT INTO table1(id, name, createdAt, modifiedAt)
SELECT id, name, createdAt, modifiedAt FROM table2 WHERE true
ON CONFLICT(id) DO UPDATE
SET (name, createdAt, modifiedAt) = (excluded.name, excluded.createdAt, excluded.modifiedAt)
WHERE excluded.modifiedAt > modifiedAt;
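To check the statement against the sample data from the question, here is a small sketch using Python's sqlite3 module with an in-memory database. The Python wrapper is only for illustration (the question uses Node.js), and it assumes the bundled SQLite is 3.24 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, createdAt TEXT, modifiedAt TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT, createdAt TEXT, modifiedAt TEXT);
INSERT INTO table1 VALUES
  (1, 'john', '2019-01-01', '2019-05-01'),
  (2, 'jane', '2019-01-01', '2019-04-03');
INSERT INTO table2 VALUES
  (1, 'john', '2019-01-01', '2019-04-30'),
  (2, 'JANE', '2019-01-01', '2019-04-04'),
  (3, 'doe',  '2019-01-01', '2019-05-01');
""")

# Insert new ids; on an id conflict, overwrite only when table2's row is newer.
conn.execute("""
INSERT INTO table1(id, name, createdAt, modifiedAt)
SELECT id, name, createdAt, modifiedAt FROM table2 WHERE true
ON CONFLICT(id) DO UPDATE
SET (name, createdAt, modifiedAt) = (excluded.name, excluded.createdAt, excluded.modifiedAt)
WHERE excluded.modifiedAt > modifiedAt
""")

merged = conn.execute(
    "SELECT id, name, createdAt, modifiedAt FROM table1 ORDER BY id"
).fetchall()
for row in merged:
    print(row)
```

Row 1 keeps table1's newer modifiedAt, row 2 takes table2's newer 'JANE' version, and row 3 is inserted. The string comparison on modifiedAt works here because the dates are stored in ISO yyyy-mm-dd form.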
First create the table Table3:
CREATE TABLE Table3 (
id INTEGER,
name TEXT,
createdat TEXT,
modifiedat TEXT,
PRIMARY KEY(id)
);
and then insert the rows like this:
insert into table3 (id, name, createdat, modifiedat)
select id, name, createdat, modifiedat from (
select * from table1 t1
where not exists (
select 1 from table2 t2
where t2.id = t1.id and t2.modifiedat >= t1.modifiedat
)
union all
select * from table2 t2
where not exists (
select 1 from table1 t1
where t1.id = t2.id and t1.modifiedat > t2.modifiedat
)
)
This uses UNION ALL over the 2 tables and keeps only the needed rows with NOT EXISTS, which is an efficient way to check the condition you want.
I have >= instead of > in the WHERE clause for Table1 in case the 2 tables have a row with the same id and the same modifiedat values.
In this case the row from Table2 will be inserted.
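The tie-breaking behaviour can be seen with a single id that has the same modifiedat in both tables. This is a minimal sketch via Python's sqlite3 module; the row contents ('from_table1', 'from_table2') are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, createdat TEXT, modifiedat TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT, createdat TEXT, modifiedat TEXT);
CREATE TABLE table3 (id INTEGER, name TEXT, createdat TEXT, modifiedat TEXT, PRIMARY KEY(id));
-- Hypothetical rows: same id and same modifiedat in both tables.
INSERT INTO table1 VALUES (1, 'from_table1', '2019-01-01', '2019-05-01');
INSERT INTO table2 VALUES (1, 'from_table2', '2019-01-01', '2019-05-01');
""")

conn.execute("""
INSERT INTO table3 (id, name, createdat, modifiedat)
SELECT id, name, createdat, modifiedat FROM (
  SELECT * FROM table1 t1
  WHERE NOT EXISTS (
    SELECT 1 FROM table2 t2
    WHERE t2.id = t1.id AND t2.modifiedat >= t1.modifiedat  -- >= drops table1's row on a tie
  )
  UNION ALL
  SELECT * FROM table2 t2
  WHERE NOT EXISTS (
    SELECT 1 FROM table1 t1
    WHERE t1.id = t2.id AND t1.modifiedat > t2.modifiedat   -- > keeps table2's row on a tie
  )
)
""")

tie_result = conn.execute("SELECT id, name FROM table3").fetchall()
print(tie_result)  # [(1, 'from_table2')]
```

With > in both subqueries, both rows would survive the filter and the insert would fail on the primary key.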
If you want to merge the 2 tables in Table1 you can use REPLACE:
replace into table1 (id, name, createdat, modifiedat)
select id, name, createdat, modifiedat
from table2 t2
where
not exists (
select 1 from table1 t1
where (t1.id = t2.id and t1.modifiedat > t2.modifiedat)
)
I have a dataset like the one shown below (except for Ser_No; this is the field I want to create).
+--------+------------+--------+
| CaseID | Order_Date | Ser_No |
+--------+------------+--------+
| 44 | 22-01-2018 | 1 |
+--------+------------+--------+
| 44 | 24-02-2018 | 3 |
+--------+------------+--------+
| 44 | 12-02-2018 | 2 |
+--------+------------+--------+
| 100 | 24-01-2018 | 1 |
+--------+------------+--------+
| 100 | 26-01-2018 | 2 |
+--------+------------+--------+
| 100 | 27-01-2018 | 3 |
+--------+------------+--------+
How can I assign a serial number within each CaseID based on my dates? The first date in a specific CaseID gets number 1, the second date in this CaseID gets number 2, and so on.
I'm working with T-SQL, by the way.
I've tried a few things:
CASE
WHEN COUNT(CaseID) > 1
THEN ORDER BY (Order_Date)
AND Ser_no +1
END
Thanks in advance.
First of all, although I don't understand what you did, it gives you what you wanted. The serial number is assigned by date order. The problem I can see is that the result shows you the rows in the wrong order (1, 3, 2 instead of 1, 2, 3).
To sort that order you can try this:
SELECT *, ROW_NUMBER() OVER (PARTITION BY caseid ORDER BY caseid, order_date) AS ser_no
FROM [Table]
Thanks for your reply,
Sorry for the misunderstanding: the ser_no is not yet in my table. That is the field I want to calculate.
I worked it out myself this morning, and it looks almost the same as your query:
RANK() OVER (PARTITION BY CaseID ORDER BY CaseID, Order_Date ASC)
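The two functions give the same numbers only while the dates within a CaseID are distinct. The sketch below shows the difference when a date repeats. It is an illustration, not T-SQL: it runs in SQLite through Python's sqlite3 module (3.25 or later for window functions), the table name orders and the duplicated 2018-02-12 row are made up, and the dates are stored as ISO strings so they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (CaseID INTEGER, Order_Date TEXT);
-- The second 2018-02-12 row is a hypothetical duplicate added to force a tie.
INSERT INTO orders VALUES
  (44, '2018-01-22'),
  (44, '2018-02-24'),
  (44, '2018-02-12'),
  (44, '2018-02-12');
""")

numbered = conn.execute("""
SELECT CaseID, Order_Date,
       ROW_NUMBER() OVER (PARTITION BY CaseID ORDER BY Order_Date) AS row_no,
       RANK()       OVER (PARTITION BY CaseID ORDER BY Order_Date) AS rank_no
FROM orders
ORDER BY CaseID, Order_Date, row_no
""").fetchall()

for row in numbered:
    print(row)
# (44, '2018-01-22', 1, 1)
# (44, '2018-02-12', 2, 2)
# (44, '2018-02-12', 3, 2)
# (44, '2018-02-24', 4, 4)
```

ROW_NUMBER always produces 1, 2, 3, ... while RANK repeats a number for ties and then skips ahead, so which one fits depends on whether two orders on the same date should share a serial number.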
This may be a variant of the knapsack problem.
I need to traverse a data table, group it by a column, and choose the rows with the best (earliest) time.
Then I repeat the previous step until the limit given by the CAPACITY column is reached.
This is the demo scenario:
create table if not exists data( vid num, size num, epid num, sid num, capacity num, dt );
delete from data;
insert into data(vid,size,epid,sid,capacity,dt)
values
(0,20,1,1,50,1100), -- 2nd choice
(0,20,1,1,50,1000), -- 1st choice
(0,20,1,1,50,1200), -- last choice excluded because out of capacity
(1,20,2,2,50,1100), -- 2nd choice
(1,20,2,2,50,1000), -- 1st choice
(1,20,2,2,50,1200); -- last choice excluded because out of capacity
This is the non recursive solution:
with best0 as (
select a.rowid as tid,a.vid,a.sid,a.size,a.dt,a.capacity-a.size as remains,0 as level
from data a
group by a.sid
having min(a.dt)
),
best1 as (
select a.tid,a.vid,a.sid,a.size,a.dt,a.remains, a.level
from (
select
a.rowid as tid,a.sid,a.vid,a.size,a.capacity,a.dt,b.remains-a.size as remains,
b.level+1 as level
from data a
join best0 b on b.sid=a.sid -- and b.level=a.level-1
where not a.rowid in (select tid from best0)
and b.remains-a.size>0
) a group by a.sid having min(a.dt)
),
best2 as (
select a.tid,a.vid,a.sid,a.size,a.dt,a.remains, a.level
from (
select
a.rowid as tid,a.sid,a.vid,a.size,a.capacity,a.dt,b.remains-a.size as remains,
b.level+1 as level
from data a
join best1 b on b.sid=a.sid -- and b.level=a.level-1
where not a.rowid in (select tid from best0 union all select tid from best1)
and b.remains-a.size>0
) a group by a.sid having min(a.dt)
)
select * from best0
union all
select * from best1
union all
select * from best2
And this is the result:
tid | vid | sid | size | Dtime | capacity | group_level
--- | --- | --- | ---- | ----- | -------- | -----------
2 | 0 | 1 | 20 | 1000 | 30 | 0
5 | 1 | 2 | 20 | 1000 | 30 | 0
1 | 0 | 1 | 20 | 1100 | 10 | 1
4 | 1 | 2 | 20 | 1100 | 10 | 1
This is the recursive version, which gives the error "recursive reference in a subquery: best":
with recursive best(tid,vid,sid,size,dt,remains,level)
as (
select a.rowid as tid,a.vid,a.sid,a.size,a.dt,a.capacity-a.size as remains,0 as level
from data a
group by a.sid
having min(a.dt)
union all
select a.tid,a.vid,a.sid,a.size,a.dt,a.remains, a.level
from (
select
a.rowid as tid,a.sid,a.vid,a.size,a.dt,b.remains-a.size as remains,
b.level+1 as level
from data a
join best b on b.sid=a.sid -- and b.level=a.level-1
where not a.rowid in (select tid from best) and b.remains-a.size>0
) a group by a.sid having min(a.dt)
)
select * from best
I tried different solutions, even one using a loop counter, but every one gives the same error.
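For what it is worth, the demo scenario can be reproduced without recursion by using a running total per sid. This is only a sketch under a simplifying assumption: rows are taken strictly in time order and selection stops at the first row that no longer fits, which matches the demo data but is not a general knapsack solution (the step-by-step version above could skip a row that is too large and still take a later, smaller one). It runs through Python's sqlite3 module and needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table if not exists data( vid num, size num, epid num, sid num, capacity num, dt );
insert into data(vid,size,epid,sid,capacity,dt)
values
  (0,20,1,1,50,1100),
  (0,20,1,1,50,1000),
  (0,20,1,1,50,1200),
  (1,20,2,2,50,1100),
  (1,20,2,2,50,1000),
  (1,20,2,2,50,1200);
""")

picked = conn.execute("""
SELECT tid, vid, sid, size, dt, remains, level
FROM (
  SELECT rowid AS tid, vid, sid, size, dt,
         -- capacity left after taking this row and all earlier rows of the same sid
         capacity - SUM(size) OVER (PARTITION BY sid ORDER BY dt, rowid) AS remains,
         -- 0 for the earliest row of each sid, 1 for the next, and so on
         ROW_NUMBER() OVER (PARTITION BY sid ORDER BY dt, rowid) - 1 AS level
  FROM data
)
WHERE remains > 0          -- same strict test as b.remains - a.size > 0
ORDER BY level, sid
""").fetchall()

for row in picked:
    print(row)
```

On the demo data this returns the same four rows as the non-recursive query: tids 2 and 5 at level 0 with 30 remaining, then tids 1 and 4 at level 1 with 10 remaining.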
I have a project that calculates work hours from the attendance logs that I import from an attendance machine. I use an SQLite database and VB.NET.
First I'll show the table that I use:
CREATE TABLE [CheckLogs] (
[IDCheckLog] INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
[IDEmployee] TEXT NOT NULL,
[Dates] TEXT NOT NULL,
[In] TEXT,
[Out] TEXT,
[OverTime] NUMERIC DEFAULT 0);
CREATE TABLE integers (i INTEGER NOT NULL PRIMARY KEY);
INSERT INTO integers (i) VALUES
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
Table CheckLogs holds the data that I import from the attendance machine. The OverTime column is calculated in my program. Table integers is used to create the date list; I got it from here.
I want to generate a view that shows an employee's attendance between 2 dates, displaying the CheckLogs data if the employee was present and null if absent. When an employee is absent, the CheckLogs table has no row for that employee on that day.
This is the view that I want (a report for employee 10001 between 2014-10-01 and 2014-10-05):
Dates | IDEmployee | In | Out
---------------------------------------
2014-10-01 | 10001 | 07:00 | 16:00
2014-10-02 | 10001 | 07:01 | 15:58
2014-10-03 | 10001 | null | null
2014-10-04 | 10001 | 07:08 | 15:48
2014-10-05 | 10001 | null | null
And this is the query that I have now:
SELECT X.[Dates], C.[IDEmployee], C.[In], C.[Out]
FROM
(select date('2014-10-01', '+' || (H.i*100 + T.i*10 + U.i) || ' day') as Dates
from integers as H
cross
join integers as T
cross
join integers as U
where date('2014-10-01', '+' || (H.i*100 + T.i*10 + U.i) || ' day') <= '2014-10-05') AS X
, CheckLogs AS C USING (Dates)
WHERE C.[IDEmployee]='10001'
From this query I have this result:
Dates | IDEmployee | In | Out
---------------------------------------
2014-10-01 | 10001 | 07:00 | 16:00
2014-10-02 | 10001 | 07:01 | 15:58
2014-10-04 | 10001 | 07:08 | 15:48
To get NULL values for rows without a match, you need an outer join.
And you have to take care not to filter out those rows with a WHERE clause that would not match NULL values; to get dates that do not match a condition, you have to put that condition into the join's ON clause:
SELECT ...
FROM ( ... ) AS X
LEFT JOIN CheckLogs AS C ON C.Dates = X.Dates AND
C.IDEmployee = '10001'
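Filled in with the date generator from the question, the whole thing could look like the sketch below, run here through Python's sqlite3 module. Some parts are assumptions for the demonstration: the sample rows, the extra row for a hypothetical employee 10002, the use of the same start date in both date() calls, and the literal '10001' in the select list (C.IDEmployee would be NULL on the absent days).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [CheckLogs] (
  [IDCheckLog] INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
  [IDEmployee] TEXT NOT NULL,
  [Dates] TEXT NOT NULL,
  [In] TEXT,
  [Out] TEXT,
  [OverTime] NUMERIC DEFAULT 0);
CREATE TABLE integers (i INTEGER NOT NULL PRIMARY KEY);
INSERT INTO integers (i) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
INSERT INTO CheckLogs (IDEmployee, Dates, [In], [Out]) VALUES
  ('10001', '2014-10-01', '07:00', '16:00'),
  ('10001', '2014-10-02', '07:01', '15:58'),
  ('10002', '2014-10-03', '08:00', '17:00'),  -- hypothetical other employee
  ('10001', '2014-10-04', '07:08', '15:48');
""")

report = conn.execute("""
SELECT X.Dates, '10001' AS IDEmployee, C.[In], C.[Out]
FROM (
  SELECT date('2014-10-01', '+' || (H.i*100 + T.i*10 + U.i) || ' day') AS Dates
  FROM integers AS H
  CROSS JOIN integers AS T
  CROSS JOIN integers AS U
  WHERE date('2014-10-01', '+' || (H.i*100 + T.i*10 + U.i) || ' day') <= '2014-10-05'
) AS X
-- The employee filter sits in ON, so dates without a match are kept with NULLs.
LEFT JOIN CheckLogs AS C ON C.Dates = X.Dates AND C.IDEmployee = '10001'
ORDER BY X.Dates
""").fetchall()

for row in report:
    print(row)
```

The rows for 2014-10-03 and 2014-10-05 come back with None (NULL) in the In and Out columns, and the 10002 entry on 2014-10-03 does not leak into the report.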
I have three tables, [Surgery_By], [Surgery] and [Doctor]. I'm using ASP.NET with SQL Server.
The table [Surgery_By] contains the following columns:
1-ID (PK)
2-Surgery ID (FK)
3-Doctor ID (FK)
How can I display doctors ordered by the number of surgeries they performed?
Try it this way
SELECT d.id, d.fullname, COUNT(s.id) total_surgeries
FROM doctor d LEFT JOIN surgery_by s
ON d.id = s.doctor_id
GROUP BY d.id, d.fullname
ORDER BY total_surgeries DESC
Sample output:
| ID | FULLNAME | TOTAL_SURGERIES |
|----|------------|-----------------|
| 1 | John Doe | 3 |
| 2 | Jane Doe | 1 |
| 3 | Mark Smith | 0 |
Here is SQLFiddle demo
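The same query can be tried locally. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server; the column names (fullname, surgery_id, doctor_id) and the sample rows are assumptions chosen to match the sample output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doctor (id INTEGER PRIMARY KEY, fullname TEXT);
CREATE TABLE surgery_by (id INTEGER PRIMARY KEY, surgery_id INTEGER, doctor_id INTEGER);
INSERT INTO doctor VALUES (1, 'John Doe'), (2, 'Jane Doe'), (3, 'Mark Smith');
INSERT INTO surgery_by (surgery_id, doctor_id) VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

ranking = conn.execute("""
SELECT d.id, d.fullname, COUNT(s.id) AS total_surgeries
FROM doctor d LEFT JOIN surgery_by s
  ON d.id = s.doctor_id
GROUP BY d.id, d.fullname
ORDER BY total_surgeries DESC
""").fetchall()

for row in ranking:
    print(row)
```

The LEFT JOIN together with COUNT(s.id) is what keeps Mark Smith in the list with 0; an inner join with COUNT(*) would leave out doctors who have no rows in surgery_by.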
This is a stab in the dark.
Select Doctor.ID As DoctorID
,Count(*) As Count
From Doctor
Join Surgery_By
On Doctor.ID = Surgery_By.DoctorID
Group By Doctor.ID
Order By Count(*)
I am not sure if you want the table Surgery incorporated (but if you do, the join will be pretty straightforward; just be sure to add the selected columns to the Group By clause).
From ASP.NET, you may select this data from a SQL Command.