Generating attendance list for hours without a matching row - sqlite

I have a project that calculates work hours from the attendance logs that I import from an attendance machine. I use an SQLite database and VB .NET.
First I'll show the table that I use:
CREATE TABLE [CheckLogs] (
[IDCheckLog] INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
[IDEmployee] TEXT NOT NULL,
[Dates] TEXT NOT NULL,
[In] TEXT,
[Out] TEXT,
[OverTime] NUMERIC DEFAULT 0);
CREATE TABLE integers (i INTEGER NOT NULL PRIMARY KEY);
INSERT INTO integers (i) VALUES
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
Table CheckLogs holds the data that I import from the attendance machine. The OverTime column is calculated in my program. Table integers is used to generate the date list; I got it from here.
I want to generate a view that shows an employee's attendance between two dates, displaying the CheckLogs data if the employee was present and NULL if absent. When an employee is absent, CheckLogs has no row for that employee on that day.
This is the view that I want (a report for employee 10001 between 2014-10-01 and 2014-10-05):
Dates | IDEmployee | In | Out
---------------------------------------
2014-10-01 | 10001 | 07:00 | 16:00
2014-10-02 | 10001 | 07:01 | 15:58
2014-10-03 | 10001 | null | null
2014-10-04 | 10001 | 07:08 | 15:48
2014-10-05 | 10001 | null | null
And this is the query that I have now:
SELECT X.[Dates], C.[IDEmployee], C.[In], C.[Out]
FROM
(select date('2014-10-01', '+' || (H.i*100 + T.i*10 + U.i) || ' day') as Dates
from integers as H
cross
join integers as T
cross
join integers as U
where date('2014-10-01', '+' || (H.i*100 + T.i*10 + U.i) || ' day') <= '2014-10-05') AS X
, CheckLogs AS C USING (Dates)
WHERE C.[IDEmployee]='10001'
From this query I have this result:
Dates | IDEmployee | In | Out
---------------------------------------
2014-10-01 | 10001 | 07:00 | 16:00
2014-10-02 | 10001 | 07:01 | 15:58
2014-10-04 | 10001 | 07:08 | 15:48

To get NULL values for rows without a match, you need an outer join.
And you have to take care not to filter out those rows with a WHERE clause that would not match NULL values; to get dates that do not match a condition, you have to put that condition into the join's ON clause:
SELECT ...
FROM ( ... ) AS X
LEFT JOIN CheckLogs AS C ON C.Dates = X.Dates AND
C.IDEmployee = '10001'
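As a rough end-to-end sketch of the outer join described above, the following runs the full query against an in-memory SQLite database from Python. The sample rows (including the one for a second employee, 10002) are made up for illustration, and selecting the employee ID from a parameter rather than from C is an assumption made here so that absent days still show the ID, as in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CheckLogs (
    IDCheckLog INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    IDEmployee TEXT NOT NULL,
    Dates TEXT NOT NULL,
    [In] TEXT,
    [Out] TEXT,
    OverTime NUMERIC DEFAULT 0);
CREATE TABLE integers (i INTEGER NOT NULL PRIMARY KEY);
INSERT INTO integers (i) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
-- Illustrative sample data, not taken from a real attendance machine.
INSERT INTO CheckLogs (IDEmployee, Dates, [In], [Out]) VALUES
    ('10001', '2014-10-01', '07:00', '16:00'),
    ('10001', '2014-10-02', '07:01', '15:58'),
    ('10001', '2014-10-04', '07:08', '15:48'),
    ('10002', '2014-10-03', '07:05', '16:02');
""")

# The employee filter lives in the ON clause, so dates without a
# matching row survive the join with NULLs in the C columns.
query = """
SELECT X.Dates, :emp AS IDEmployee, C.[In], C.[Out]
FROM (SELECT date(:start, '+' || (H.i*100 + T.i*10 + U.i) || ' day') AS Dates
      FROM integers AS H
      CROSS JOIN integers AS T
      CROSS JOIN integers AS U
      WHERE date(:start, '+' || (H.i*100 + T.i*10 + U.i) || ' day') <= :end) AS X
LEFT JOIN CheckLogs AS C
       ON C.Dates = X.Dates AND C.IDEmployee = :emp
ORDER BY X.Dates
"""
params = {"emp": "10001", "start": "2014-10-01", "end": "2014-10-05"}
rows = conn.execute(query, params).fetchall()

for row in rows:
    print(row)
```

Moving `C.IDEmployee = :emp` back into a WHERE clause would drop the two absent days again, because NULL never satisfies the comparison.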

Related

global or local indexes on column with duplicate value in Oracle 19C?

I have the table below on Oracle 19c (I am an Oracle newbie). 4 million rows are inserted into the table daily, and it currently has 40 columns and 240 million rows.
I usually query the table with a filter on the user_id and MyTimestamp columns, and it takes 10 minutes to return the answer.
Example:
select * from table where user_id=123581 and MyTimestamp between 1657640396 and 1657777396
Note: Duplicate values are stored in the user_id and MyTimestamp columns.
I want to partition monthly on MyTimestamp and index on user_id, but which kind of index, global or local, is suitable, and how do I create it?
----------------------------------------------------------------------------------------------------
| id | MyTimestamp | Name | user_id ...
----------------------------------------------------------------------------------------------------
| 0 | 1657640396 | John | 123581 ...
| 1 | 1657638832 | Tom | 168525 ...
| 2 | 1657640265 | Tom | 168525 ...
| 3 | 1657640292 | John | 123581 ...
| 4 | 1657640005 | Jack | 896545 ...
--------------------------------------------------------------------------------------------------
If the majority of your queries contain the partition key, then it is better to create LOCAL indexes:
CREATE INDEX index_name ON table_name (MyTimestamp, user_id) LOCAL;
Each local index partition is smaller than a single global index and thus faster to search, and you don't have to rebuild the index when you drop an outdated partition.

Enforce uniqueness within a date range or based on the value of another column

I have a table with a large amount of data; moving forward, I would like to enforce uniqueness for a given column in this table. However, the table contains a large number of rows where that column is non-unique. I am not able to delete or alter these rows.
Is it possible to enforce uniqueness over a given date range, or since a specific date, or based on the value of another column (or something else like that) in MariaDB?
You can create a UNIQUE index on multiple columns, where one column is nullable. For the purposes of the UNIQUE index, MariaDB treats every NULL as distinct from every other value, including other NULLs, so rows whose other indexed columns are identical do not conflict as long as the nullable column is NULL. See the MariaDB documentation, Getting Started with Indexes - Unique Index:
The fact that a UNIQUE constraint can be NULL is often overlooked. In SQL any NULL is never equal to anything, not even to another NULL. Consequently, a UNIQUE constraint will not prevent one from storing duplicate rows if they contain null values:
CREATE TABLE t1 (a INT NOT NULL, b INT, UNIQUE (a,b));
INSERT INTO t1 values (3,NULL), (3, NULL);
SELECT * FROM t1;
+---+------+
| a | b |
+---+------+
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
| 3 | NULL |
| 3 | NULL |
+---+------+
You can create such a UNIQUE index on the date column you already have and a new column which indicates if the date value should be unique or not:
CREATE TABLE Foobar(
id INT AUTO_INCREMENT PRIMARY KEY NOT NULL,
createdAt DATE NOT NULL,
dateUniqueMarker BIT NULL DEFAULT 0,
UNIQUE KEY uq_createdAt(createdAt, dateUniqueMarker)
);
INSERT INTO Foobar(createdAt) VALUES ('2021-11-04'),('2021-11-05'),('2021-11-06');
SELECT * FROM Foobar;
+----+------------+------------------------------------+
| id | createdAt | dateUniqueMarker |
+----+------------+------------------------------------+
| 1 | 2021-11-04 | 0x00 |
| 2 | 2021-11-05 | 0x00 |
| 3 | 2021-11-06 | 0x00 |
+----+------------+------------------------------------+
INSERT INTO Foobar(createdAt) VALUES ('2021-11-05');
ERROR 1062 (23000): Duplicate entry '2021-11-05-\x00' for key 'Foobar.uq_createdAt'
UPDATE Foobar SET dateUniqueMarker = NULL WHERE createdAt = '2021-11-05';
INSERT INTO Foobar(createdAt, dateUniqueMarker) VALUES ('2021-11-05', NULL);
SELECT * FROM Foobar;
+----+------------+------------------------------------+
| id | createdAt | dateUniqueMarker |
+----+------------+------------------------------------+
| 1 | 2021-11-04 | 0x00 |
| 2 | 2021-11-05 | NULL |
| 5 | 2021-11-05 | NULL |
| 3 | 2021-11-06 | 0x00 |
+----+------------+------------------------------------+
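The same idea can be tried out quickly from Python. Note the swap: this sketch uses SQLite (via the stdlib `sqlite3` module) rather than MariaDB, purely because it is runnable without a server; SQLite also treats NULLs as distinct in a UNIQUE constraint, but the column types differ (INTEGER stands in for BIT, TEXT for DATE), so treat it as an analogy rather than a MariaDB test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Foobar (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    createdAt TEXT NOT NULL,
    dateUniqueMarker INTEGER DEFAULT 0,   -- 0 = enforce uniqueness, NULL = exempt
    UNIQUE (createdAt, dateUniqueMarker))
""")
conn.execute("INSERT INTO Foobar (createdAt) VALUES ('2021-11-04'), ('2021-11-05')")

# A second row for the same date with the default marker (0) is rejected.
try:
    conn.execute("INSERT INTO Foobar (createdAt) VALUES ('2021-11-05')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Rows whose marker is NULL are exempt: duplicates of the date are accepted.
conn.execute("UPDATE Foobar SET dateUniqueMarker = NULL WHERE createdAt = '2021-11-05'")
conn.execute("INSERT INTO Foobar (createdAt, dateUniqueMarker) VALUES ('2021-11-05', NULL)")

exempt_count = conn.execute(
    "SELECT COUNT(*) FROM Foobar WHERE createdAt = '2021-11-05'"
).fetchone()[0]
print(duplicate_rejected, exempt_count)
```

For the original problem, the existing non-unique rows would get a NULL marker once, and every new row would keep the default of 0.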
Without any example data or an illustration of the scenario, it's hard to know. If you can update your question with that information, please do.
"Is it possible to enforce uniqueness over a given date range, or since a specific date, or based on the value of another column (or something else like that) in MariaDB?"
If by "enforce" you mean creating a new column and populating it with a unique identifier, then yes, it is possible. If what you really mean is generating a unique value based on another column, that's also possible. The question is, how unique do you want it to be?
Is it like this unique?
column1    | column2 | column3 | unique_val
--------------------------------------------
2021-02-02 | ABC     | DEF     | 1
2021-02-02 | CBD     | FEA     | 1
2021-02-03 | BED     | GER     | 2
2021-02-04 | ART     | TOY     | 3
2021-02-04 | ZSE     | KSL     | 3
That is, rows with the same date (column1) share the same unique value regardless of the column2 and column3 data.
Or like this?
column1    | column2 | column3 | unique_val
--------------------------------------------
2021-02-02 | ABC     | DEF     | 1
2021-02-02 | CBD     | FEA     | 2
2021-02-03 | BED     | GER     | 3
2021-02-04 | ART     | TOY     | 4
2021-02-04 | ZSE     | KSL     | 5
That is, taking all (or certain) columns into account to determine the unique value.
Both of the scenarios above can be achieved in a query without altering the table to add and populate a new column, though of course the latter is also possible.
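One possible way to produce both numberings in a query is with window functions: DENSE_RANK over the date for the first scenario and ROW_NUMBER over all columns for the second. The sketch below is an assumption about what "achieved in query" could look like, not the answerer's own code; it runs on SQLite (3.25 or later) from Python so it can be executed as-is, and the table name `t` and the aliases `per_date` and `per_row` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 TEXT, column2 TEXT, column3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2021-02-02", "ABC", "DEF"),
        ("2021-02-02", "CBD", "FEA"),
        ("2021-02-03", "BED", "GER"),
        ("2021-02-04", "ART", "TOY"),
        ("2021-02-04", "ZSE", "KSL"),
    ],
)

rows = conn.execute("""
SELECT column1, column2, column3,
       -- same value for every row sharing a date (first scenario)
       DENSE_RANK() OVER (ORDER BY column1) AS per_date,
       -- a different value for every row (second scenario)
       ROW_NUMBER() OVER (ORDER BY column1, column2, column3) AS per_row
FROM t
ORDER BY column1, column2, column3
""").fetchall()

for row in rows:
    print(row)
```

Window functions are only available in MariaDB from version 10.2 onwards, so on older servers a different approach would be needed.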

How can I set multiple aliases for a single derived table in MariaDB 5.5?

Consider a database with three tables:
goods (Id is the primary key)
+----+-------+-----+
| Id | Name | SKU |
+----+-------+-----+
| 1 | Nails | 123 |
| 2 | Nuts | 456 |
| 3 | Bolts | 789 |
+----+-------+-----+
invoiceheader (Id is the primary key)
+----+--------------+-----------+---------+
| Id | Date | Warehouse | BuyerId |
+----+--------------+-----------+---------+
| 1 | '2021-10-15' | 1 | 223 |
| 2 | '2021-09-18' | 1 | 356 |
| 3 | '2021-07-13' | 2 | 1 |
+----+--------------+-----------+---------+
invoiceitems (Id is the primary key)
+----+----------+--------+-----+-------+
| Id | HeaderId | GoodId | Qty | Price |
+----+----------+--------+-----+-------+
| 1 | 1 | 1 | 15 | 1.1 |
| 2 | 1 | 3 | 7 | 1.5 |
| 3 | 2 | 1 | 12 | 1.5 |
| 4 | 3 | 3 | 3 | 1.3 |
+----+----------+--------+-----+-------+
What I'm trying to do is to get the MAX(invoiceheader.Date) for every invoiceitems.GoodId. Or, in everyday terms, to find out, preferably in a single query, when was the last time any of the goods were sold, from a specific warehouse.
To do that, I'm using a derived table and the solution proposed here. To make that work, I think I need a way of giving multiple (well, two) aliases to a derived table.
My query looks like this at the moment:
SELECT tmp.* /* placing the second alias here, before or after tmp.* doesn't work */
FROM ( /* placing the second alias, tmpClone, here also doesn't work */
SELECT
invoiceheader.Id,
invoiceheader.Date,
invoiceitems.HeaderId,
invoiceitems.Id,
invoiceitems.GoodId
FROM invoiceheader
LEFT JOIN invoiceitems
ON invoiceheader.Id = invoiceitems.HeaderId
WHERE invoiceheader.Warehouse = 3
AND invoiceheader.Date > '0000-00-00 00:00:00'
AND invoiceheader.Date IS NOT NULL
AND invoiceheader.Date > ''
AND invoiceitems.GoodId > 0
ORDER BY
invoiceitems.GoodId ASC,
invoiceheader.Date DESC
) tmp, tmpClone /* this doesn't work with or without a comma */
INNER JOIN (
SELECT
invoiceheader.Id,
MAX(invoiceheader.Date) AS maxDate
FROM tmpClone
WHERE invoiceheader.Warehouse = 3
GROUP BY invoiceitems.GoodId
) headerGroup
ON tmp.Id = headerGroup.Id
AND tmp.Date = headerGroup.maxDate
AND tmp.HeaderId = headerGroup.Id
Is it possible to set multiple aliases for a single derived table? If it is, how should I do it?
I'm using 5.5.52-MariaDB.
You can use an inner (correlated) SELECT, or a LEFT JOIN, to achieve this. For example, with inner SELECTs (table1, table2 and table3 are placeholder names):
SELECT t1.b,
       (SELECT t2.b FROM table2 AS t2 WHERE t1.x = t2.x) AS Y
FROM table1 AS t1
WHERE t1.y = (SELECT t3.y FROM table3 AS t3 WHERE t1.a = t3.a)
While this doesn't answer my original question, it does solve the problem from which the question arose, and I'll leave it here in case anyone ever comes across a similar issue.
The following query does what I'd intended to do - find the newest sale date for the goods from the specific warehouse.
SELECT
invoiceheader.Id,
invoiceheader.Date,
invoiceitems.HeaderId,
invoiceitems.Id,
invoiceitems.GoodId
FROM invoiceheader
INNER JOIN invoiceitems
ON invoiceheader.Id = invoiceitems.HeaderId
INNER JOIN (
SELECT
MAX(invoiceheader.Date) AS maxDate,
invoiceitems.GoodId
FROM invoiceheader
INNER JOIN invoiceitems
ON invoiceheader.Id = invoiceitems.HeaderId
WHERE invoiceheader.Warehouse = 3
AND invoiceheader.Date > '0000-00-00 00:00:00'
AND invoiceheader.Date IS NOT NULL
AND invoiceheader.Date > ''
GROUP BY invoiceitems.GoodId
) tmpDate
ON invoiceheader.Date = tmpDate.maxDate
AND invoiceitems.GoodId = tmpDate.GoodId
WHERE invoiceheader.Warehouse = 3
AND invoiceitems.GoodId > 0
ORDER BY
invoiceitems.GoodId ASC,
invoiceheader.Date DESC
The trick was to join by taking into consideration two things - MAX(invoiceheader.Date) and invoiceitems.GoodId - since one GoodId can only appear once inside a specific invoiceheader / invoiceitems JOINing (strict limit imposed on the part of the code which inserts into invoiceitems).
Whether this is the optimal solution (ignoring the redundant conditions in the query), and whether it would scale well, remains to be seen. It has been tested on tables with ~5000 entries for invoiceheader, ~60000 entries for invoiceitems, and ~4000 entries for goods. Execution time was < 1 sec.
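To see the join on (maxDate, GoodId) at work, here is a trimmed-down sketch run against the sample tables from the question. It uses SQLite from Python instead of MariaDB 5.5 so that it is runnable as-is, drops the redundant date conditions, and filters on warehouse 1 because the sample data contains no warehouse 3; the table aliases and the `ItemId` column alias are introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoiceheader (Id INTEGER PRIMARY KEY, Date TEXT, Warehouse INTEGER, BuyerId INTEGER);
CREATE TABLE invoiceitems (Id INTEGER PRIMARY KEY, HeaderId INTEGER, GoodId INTEGER, Qty INTEGER, Price REAL);
INSERT INTO invoiceheader VALUES
    (1, '2021-10-15', 1, 223),
    (2, '2021-09-18', 1, 356),
    (3, '2021-07-13', 2, 1);
INSERT INTO invoiceitems VALUES
    (1, 1, 1, 15, 1.1),
    (2, 1, 3, 7, 1.5),
    (3, 2, 1, 12, 1.5),
    (4, 3, 3, 3, 1.3);
""")

# The derived table finds the newest sale date per good in the warehouse;
# joining back on both the date and the GoodId picks the matching invoice rows.
query = """
SELECT h.Id, h.Date, i.Id AS ItemId, i.GoodId
FROM invoiceheader AS h
INNER JOIN invoiceitems AS i
        ON h.Id = i.HeaderId
INNER JOIN (
    SELECT MAX(h2.Date) AS maxDate, i2.GoodId
    FROM invoiceheader AS h2
    INNER JOIN invoiceitems AS i2
            ON h2.Id = i2.HeaderId
    WHERE h2.Warehouse = :wh
    GROUP BY i2.GoodId
) AS tmpDate
        ON h.Date = tmpDate.maxDate
       AND i.GoodId = tmpDate.GoodId
WHERE h.Warehouse = :wh
ORDER BY i.GoodId
"""
rows = conn.execute(query, {"wh": 1}).fetchall()

for row in rows:
    print(row)
```

Good 1 was sold from warehouse 1 on both 2021-09-18 and 2021-10-15, and only the newer invoice survives the join.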

Make partitions based on difference in date in Postgres window function

I have data in the following format
id | first_name | last_name | birth_date
abc | Jared | Pollard | 1970-01-01
def | Jared | Pollard | 1972-02-02
ghi | Jared | Pollard | 1980-01-01
klm | Jared | Pollard | 2015-01-01
and I would like a query which groups data based on the following rule
If first_name and last_name are equal and the birth_dates are within 5 years of each other, then the records belong to the same group
So the above data contains three groups group1=(abc, def), group2=(ghi) and group3=(klm)
Currently I have the following query which incorrectly creates only 2 groups, group1=(abc, def) and group2=(ghi, klm)
SELECT
g.id,
FIRST_VALUE(g.id) OVER (PARTITION BY lower(trim(g.last_name)), lower(trim(g.first_name)),
CASE WHEN g.birth_date between g.fv_birth_date - interval '5 year' AND g.fv_birth_date + interval '5 year' THEN 1 ELSE 0 END
ORDER BY g.last_used_dt DESC NULLS LAST) AS cluster_id
FROM (
SELECT id, last_used_dt, last_name, first_name, birth_date,
FIRST_VALUE(birth_date)
OVER (PARTITION BY
lower(trim(last_name)),
lower(trim(first_name))
ORDER BY last_used_dt DESC NULLS LAST) AS fv_birth_date
FROM guest
) g;
I understand this is because of the CASE expression within the PARTITION BY clause, but I am unable to come up with any other query.

Add value from row to date in SELECT query

Assume I have the following SQLite table "foobar":
id | start | duration
---+------------+---------
1 | 2016-05-12 | 2
2 | 2016-01-01 | 5
My goal is to get the sum of the start-date and the duration (durations are in years).
So my desired result is the following:
id | end
---+-----------
1 | 2018-05-12
2 | 2021-01-01
Is this possible with a single query?
I know it is possible to add static values as follows
SELECT date(start, '+2 years') FROM foobar;
but I could not find a way to replace the static 2 with the dynamic value of duration.
SELECT date(start, '+' || duration || ' years')
FROM foobar;
SQLFiddle demo
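A small sketch of the same concatenation, run from Python against an in-memory SQLite database with the two rows from the question. The alias `end_date` is used here instead of `end` only because END is an SQLite keyword; single quotes are used for the string literals, since double quotes are meant for identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, start TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO foobar VALUES (?, ?, ?)",
    [(1, "2016-05-12", 2), (2, "2016-01-01", 5)],
)

# '+' || duration || ' years' builds a modifier such as '+2 years' per row.
rows = conn.execute("""
SELECT id, date(start, '+' || duration || ' years') AS end_date
FROM foobar
ORDER BY id
""").fetchall()

print(rows)  # [(1, '2018-05-12'), (2, '2021-01-01')]
```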

Resources