SQLite: mark duplicates with TRUE or FALSE

I need to create a new column in my dataset (duplicate_name) that contains TRUE if there is more than one record for someone, or FALSE otherwise. I found this code (I am working with SQLite):
SELECT *,
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY
first_name, last_name) > 1 THEN 'TRUE'
ELSE 'FALSE'
END AS duplicate_name
FROM users;
But when I run it, it gives me something like this (only the second record is marked as TRUE):
Carlo Thomas male FALSE
Carlo Thomas male TRUE
Don Scallion male FALSE
Tania Lopes female FALSE
What I need is a table like this (both records with the same name are marked as TRUE):
Carlo Thomas male TRUE
Carlo Thomas male TRUE
Don Scallion male FALSE
Tania Lopes female FALSE
Can someone help me, please.
Thanks

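ROW_NUMBER() numbers the rows inside each partition starting at 1, so the first record of every name always fails the > 1 test. Instead of ROW_NUMBER(), use the COUNT(*) window function, which returns the size of the partition on every row: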
SELECT *,
CASE WHEN COUNT(*) OVER (PARTITION BY first_name, last_name) > 1 THEN 'TRUE' ELSE 'FALSE' END AS duplicate_name
FROM users;
or simpler with a 1 for true and 0 for false:
SELECT *,
COUNT(*) OVER (PARTITION BY first_name, last_name) > 1 AS duplicate_name
FROM users;
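For anyone who wants to try this quickly, here is a minimal sketch that runs the COUNT(*) version through Python's built-in sqlite3 module. It assumes an in-memory database, a users table with first_name, last_name and gender columns (the gender column name is a guess from the sample output), and an SQLite build of 3.25 or later, since older builds have no window functions. The function name mark_duplicates is made up for the example.

```python
import sqlite3

def mark_duplicates(rows):
    """Return (first_name, last_name, gender, duplicate_name) tuples."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: only first_name and last_name appear in the question's SQL.
    conn.execute("CREATE TABLE users (first_name TEXT, last_name TEXT, gender TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT first_name, last_name, gender,
               CASE WHEN COUNT(*) OVER (PARTITION BY first_name, last_name) > 1
                    THEN 'TRUE' ELSE 'FALSE' END AS duplicate_name
        FROM users
        ORDER BY first_name, last_name
        """
    ).fetchall()
    conn.close()
    return result

# Sample rows taken from the question.
sample = [
    ("Carlo", "Thomas", "male"),
    ("Carlo", "Thomas", "male"),
    ("Don", "Scallion", "male"),
    ("Tania", "Lopes", "female"),
]

for row in mark_duplicates(sample):
    print(row)
```

Both Carlo Thomas rows share one partition of size 2, so both come back as 'TRUE'.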

Related

SELECT DISTINCT and GROUP BY returning duplicates

I'm trying to select a distinct HourlyRate from a table, and then group the resulting HourlyRate by a FECode (basically a person). One person may have 2 or 3 rates over time, but the results repeat the same HourlyRate for the same FECode.
SELECT DISTINCT Cost/Hours As HourlyRate, Date, FECode
FROM Table1
WHERE HourlyRate != ''
GROUP BY HourlyRate, FECode
ORDER BY FECode
The result looks as follows:
HourlyRate, Date, FECode
215.00, 2017-04-06, AAA
215.00, 2017-04-27, AAA
225.00, 2017-06-16, AAA
The data from Table1 is as follows:-
Date, FECode, Cost, Hours
2017-04-06, AAA, 236.5, 1.1
2017-04-27, AAA, 43, 0.2
2017-06-16, AAA, 247.5, 1.1
Clearly, in this example, the second result of 215.00 should not be returning, but it is. How do I stop this from happening?
The result is correct, because DISTINCT only removes rows that match on the full set of selected columns, and here the Date values differ. In addition, Cost/Hours is a floating-point division: the two results both display as 215.00 but are not exactly the same number, so they are not treated as equal. Try this, and do not forget to remove the Date column:
SELECT DISTINCT cast(Cost/Hours as text) As HourlyRate, FECode
FROM Table1
WHERE HourlyRate != ''
ORDER BY FECode
These two values are not equal:
SELECT 236.5/1.1 = 43/0.2;
0
There actually is a difference:
SELECT 236.5/1.1 - 43/0.2;
-2.8421709430404e-14
See Is floating point math broken?
You have to round the result.
(And using the column Date with this GROUP BY does not make sense.)
The following query returns the expected result:
SELECT ROUND(Cost/Hours, 2) As HourlyRate, Date, FECode
FROM Table1
WHERE HourlyRate != ''
GROUP BY FECode, HourlyRate
ORDER BY FECode ASC
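To see the floating-point effect and the rounding fix side by side, here is a small sketch using Python's sqlite3 module with the three rows from the question. The column types (TEXT and REAL) are an assumption, and the Date column is left out of the SELECT so that DISTINCT only compares the rate and the FECode.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column types; the question only shows the values.
conn.execute("CREATE TABLE Table1 (Date TEXT, FECode TEXT, Cost REAL, Hours REAL)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [
        ("2017-04-06", "AAA", 236.5, 1.1),
        ("2017-04-27", "AAA", 43, 0.2),
        ("2017-06-16", "AAA", 247.5, 1.1),
    ],
)

# The two divisions look identical when printed with two decimals
# but do not compare as equal.
raw_equal = conn.execute("SELECT 236.5/1.1 = 43/0.2").fetchone()[0]

# Rounding before DISTINCT collapses the two near-identical rates.
rates = conn.execute(
    """
    SELECT DISTINCT ROUND(Cost/Hours, 2) AS HourlyRate, FECode
    FROM Table1
    ORDER BY FECode, HourlyRate
    """
).fetchall()

print("raw values equal:", raw_equal)
print("distinct rounded rates:", rates)
conn.close()
```

Rounding to two decimals is a choice that fits currency-like rates; a different precision may suit other data.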

delete all rows if a field contains a value from another record

I've got a table like this:
Name Code
-------------------
John 1235
John 1235/11
John 1236/12
Mary 2500
Mary 2500/8
Mary 3600
Mary 3600/9
I want to delete all the rows where the value of code is contained in another row.
In the example I want to delete these records:
Name Code
-------------------
John 1235
Mary 2500
Mary 3600
Here is one method:
delete from t
where exists (select 1
from t t2
where t2.Code like t.Code || '/%'
);
If you don't want to actually delete the records, but just want a query to not return them:
select *
from t t1
where not exists (select 1
from t t2
where t2.Code like t1.Code || '/%'
);
These assume that (as in the example) "is contained" really means "is the part of another code before the '/'".
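As a quick check of the LIKE-based delete, here is a sketch that runs it with Python's sqlite3 module on the sample rows from the question. The table name t and the TEXT type for Code are assumptions. Like the queries above, it does not compare Name, only Code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Code is stored as TEXT (assumed) so that values like '1235/11' fit.
conn.execute("CREATE TABLE t (Name TEXT, Code TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("John", "1235"),
        ("John", "1235/11"),
        ("John", "1236/12"),
        ("Mary", "2500"),
        ("Mary", "2500/8"),
        ("Mary", "3600"),
        ("Mary", "3600/9"),
    ],
)

# Delete every row whose Code is the prefix (before '/') of another row's Code.
conn.execute(
    """
    DELETE FROM t
    WHERE EXISTS (SELECT 1
                  FROM t AS t2
                  WHERE t2.Code LIKE t.Code || '/%')
    """
)

remaining = conn.execute("SELECT Name, Code FROM t ORDER BY Name, Code").fetchall()
for row in remaining:
    print(row)
conn.close()
```

The row John 1236/12 survives because no plain 1236 row exists to be deleted in its favour.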
Delete t From tablename t
Where Exists
(Select * From tablename t2
Where t2.Code <> t.Code
And charIndex(t.Code, t2.Code) != 0)

Trouble with Sqlite subquery

My CustomTags table may have a series of "temporary" records where Tag_ID is 0, and Tag_Number will have some five digit value.
Periodically, I want to clean up my Sqlite table to remove these temporary values.
For example, I might have:
Tag_ID Tag_Number
0 12345
0 67890
0 45678
1 12345
2 67890
In this case, I want to remove the first two records because they are duplicated with actual Tag_ID 1 and 2. But I don't want to remove the third record yet because it hasn't been duplicated yet.
I have tried a number of different types of subqueries, but I just can't get it working. This is the last thing I tried, but my database client complains of an unknown syntax error. (I have tried with and without AS as an alias)
DELETE FROM CustomTags t1
WHERE t1.Tag_ID = 0
AND (SELECT COUNT(*) FROM CustomTags t2 WHERE t1.Tag_Number = t2.Tag_Number) > 1
Can anyone offer some insight? Thank you
There are many options, but the simplest is probably to use EXISTS:
DELETE FROM CustomTags
WHERE Tag_ID = 0
AND EXISTS(
SELECT 1 FROM CustomTags c
WHERE c.Tag_ID <> 0 AND c.Tag_Number = CustomTags.Tag_Number
)
...or IN...
DELETE FROM CustomTags
WHERE Tag_ID = 0
AND Tag_Number IN (
SELECT Tag_Number FROM CustomTags WHERE Tag_ID <> 0
)
With your dataset like so:
sqlite> select * from test;
tag_id      tag_number
----------  ----------
1           12345
1           67890
0           12345
0           67890
0           45678
You can run:
delete from test
where rowid not in (
select a.rowid
from test a
inner join (select tag_number, max(tag_id) as mt from test group by tag_number) b
on a.tag_number = b.tag_number
and a.tag_id = b.mt
);
Result:
sqlite> select * from test;
tag_id      tag_number
----------  ----------
1           12345
1           67890
0           45678
Please do test this out with a few more test cases than you have to be entirely sure that's what you want. I'd recommend creating a copy of your database before you run this on a large dataset.
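In the spirit of testing on a copy first, here is a minimal sketch that tries the EXISTS variant from the earlier answer on an in-memory database using Python's sqlite3 module. The data is the sample from the question; the INTEGER column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column types; the question only shows the values.
conn.execute("CREATE TABLE CustomTags (Tag_ID INTEGER, Tag_Number INTEGER)")
conn.executemany(
    "INSERT INTO CustomTags VALUES (?, ?)",
    [(0, 12345), (0, 67890), (0, 45678), (1, 12345), (2, 67890)],
)

# Remove temporary rows (Tag_ID = 0) that already have a real counterpart.
cursor = conn.execute(
    """
    DELETE FROM CustomTags
    WHERE Tag_ID = 0
      AND EXISTS (SELECT 1
                  FROM CustomTags c
                  WHERE c.Tag_ID <> 0
                    AND c.Tag_Number = CustomTags.Tag_Number)
    """
)
deleted = cursor.rowcount

remaining = conn.execute(
    "SELECT Tag_ID, Tag_Number FROM CustomTags ORDER BY Tag_ID, Tag_Number"
).fetchall()
print("deleted rows:", deleted)
print("remaining:", remaining)
conn.close()
```

The temporary row for 45678 is kept because no row with a non-zero Tag_ID shares its Tag_Number yet.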

Sum in Access Producing Total Sum and Placing This on Each Row

I am trying to create two columns: IntlAir and DomesticAir. I have a Boolean column in my data called International. IntlAir should return Penalty + SellingFare when International is TRUE, and DomesticAir should return that sum when International is FALSE.
I would like to show this amount for each DK by Month.
My code is:
SELECT data.PostingMonth, data.DK_Number
, (SELECT sum(data.Penalty + data.SellingFare)
FROM data
WHERE data.International = TRUE) AS IntlAir
, (SELECT sum(data.Penalty + data.SellingFare)
FROM data
WHERE data.International = FALSE) AS DomesticAir
FROM data
GROUP BY data.PostingMonth, data.DK_Number
ORDER BY data.PostingMonth;
However, the output is giving me the total sum across all dks and across all months, and putting this value into every row.
Can someone tell me what I am doing wrong?
Perhaps this is all you need:
SELECT
PostingMonth,
DK_Number,
SUM((Penalty + SellingFare) * IIf(International, 1, 0)) AS IntlAir,
SUM((Penalty + SellingFare) * IIf(International, 0, 1)) AS DomAir
FROM [data]
GROUP BY PostingMonth, DK_Number
For the test data...
PostingMonth  DK_Number  International  Penalty  SellingFare
------------  ---------  -------------  -------  -----------
1             1          False          $10.00   $100.00
1             1          True           $20.00   $200.00
2             1          False          $30.00   $300.00
1             2          False          $40.00   $400.00
1             2          False          $50.00   $500.00
1             2          True           $60.00   $600.00
...the above query returns
PostingMonth  DK_Number  IntlAir  DomAir
------------  ---------  -------  -------
1             1          $220.00  $110.00
1             2          $660.00  $990.00
2             1          $0.00    $330.00
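The same conditional-sum idea can be tried outside Access. The sketch below is an SQLite analogue run through Python's sqlite3 module, not the Access query itself: it swaps IIf for a CASE expression, stores International as 0/1, and uses whole-number amounts, all of which are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# International is stored as 0/1 here (assumed); amounts are whole numbers.
conn.execute(
    """
    CREATE TABLE data (
        PostingMonth INTEGER, DK_Number INTEGER,
        International INTEGER, Penalty INTEGER, SellingFare INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 0, 10, 100),
        (1, 1, 1, 20, 200),
        (2, 1, 0, 30, 300),
        (1, 2, 0, 40, 400),
        (1, 2, 0, 50, 500),
        (1, 2, 1, 60, 600),
    ],
)

# Each SUM only adds the rows of its own type within the current group.
totals = conn.execute(
    """
    SELECT PostingMonth, DK_Number,
           SUM(CASE WHEN International = 1 THEN Penalty + SellingFare ELSE 0 END) AS IntlAir,
           SUM(CASE WHEN International = 0 THEN Penalty + SellingFare ELSE 0 END) AS DomAir
    FROM data
    GROUP BY PostingMonth, DK_Number
    ORDER BY PostingMonth, DK_Number
    """
).fetchall()

for row in totals:
    print(row)
conn.close()
```

Because the condition sits inside the aggregate, the totals respect the GROUP BY, unlike the uncorrelated subqueries in the question, which sum over the whole table.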
There are a few ways to do that, though the one you chose isn't one of them.
I'm never sure how much SQL Access supports, but you could create a query like this and call it queryAirTotal or some such:
SELECT PostingMonth, DK_Number, International, sum(Penalty + SellingFare) as AirTotal
FROM data GROUP BY PostingMonth,DK_Number,International
That will give you totals by month, DK and type. Then you can do:
Select t1.PostingMonth,t1.DK_Number,t1.AirTotal as IntlAir, t2.AirTotal as DomesticAir
From queryAirTotal t1
Left Join queryAirTotal t2
On t1.PostingMonth = t2.PostingMonth and t1.DK_Number = t2.DK_Number
Where t1.International = TRUE and t2.International = FALSE
That will, however, miss Month/DK combinations where there was only domestic air and no international air. A full outer join would fix that, but I believe Access does not support it.
You can get round that with a Union:
Select t1.PostingMonth,t1.DK_Number,t1.AirTotal as IntlAir, t2.AirTotal as DomesticAir
From queryAirTotal t1
Left Join queryAirTotal t2
On t1.PostingMonth = t2.PostingMonth and t1.DK_Number = t2.DK_Number
Where t1.International = TRUE and t2.International = FALSE
Union
Select t2.PostingMonth,t2.DK_Number,Null as IntlAir, t2.AirTotal as DomesticAir
From queryAirTotal t2
Where t2.International = FALSE
and Not Exists (Select 1 From queryAirTotal t1
Where t1.PostingMonth = t2.PostingMonth and t1.DK_Number = t2.DK_Number and t1.International = TRUE)

SQLite Compare two columns

I am creating a database for my Psych class and I am scoring a personality profile. I need to compare two test items and, if they match a condition, then copy into a separate table.
Example (pseudocode is between \\), SQLite3:
INSERT INTO Scale
SELECT * FROM Questions
WHERE \\if Question 1 IS 'TRUE' AND Question 3 IS 'FALSE' THEN Copy this Question
and its response into the Scale table\\;
I have about 100 other questions that work like this. Sample format goes like this:
IF FirstQuestion IS value AND SecondQuestion IS value THEN
Copy both questions into the Scale TABLE.
---------- EDITED AFTER FIRST RESPONSE! EDITS FOLLOW-------------
Here is my TestItems table:
ItemID | ItemQuestion         | ItemResponse
-------|----------------------|-------------
1      | Is the sky blue?     | TRUE
2      | Are you a person?    | TRUE
3      | 2 Plus 2 Equals Five | FALSE
What I want to do: If Question 1 is TRUE AND Question 3 is FALSE, then insert BOTH questions into the table 'Scale' (which is setup like TestItems). I tried this:
INSERT INTO Scale
SELECT * FROM TestItems
WHERE ((ItemID=1) AND (ItemResponse='TRUE'))
AND ((ItemID=3) AND (ItemResponse='FALSE'));
HOWEVER: The above INSERT copies neither.
The Resulting 'Scale' table should look like this:
ItemID | ItemQuestion         | ItemResponse
-------|----------------------|-------------
1      | Is the sky blue?     | TRUE
3      | 2 Plus 2 Equals Five | FALSE
There is nothing wrong with your query. You are almost there:
INSERT INTO Scale
SELECT * FROM Questions
WHERE `Question 1` = 1 AND `Question 3` = 0;
Here 1 and 0 are the values (in your first case, true and false). First, make sure the fields Question 1 and Question 3 actually exist in your Questions table. Second, the column count and the data types of the Scale table should match those of the Questions table. Otherwise you will have to list the fields you want explicitly in your SELECT query.
Edit: To respond to your edit, I am not seeing an elegant solution. You could do this:
INSERT INTO Scale
SELECT * FROM (
SELECT * FROM TestItems WHERE ItemID = 1 AND ItemResponse = 'TRUE'
UNION
SELECT * FROM TestItems WHERE ItemID = 3 AND ItemResponse = 'FALSE'
) AS matches
WHERE (SELECT COUNT(*) FROM (
SELECT ItemID FROM TestItems WHERE ItemID = 1 AND ItemResponse = 'TRUE'
UNION
SELECT ItemID FROM TestItems WHERE ItemID = 3 AND ItemResponse = 'FALSE'
) AS t) >= 2;
Your insert did not work because ItemID can't be both 1 and 3 at the same time, so no single row can satisfy the WHERE clause. My solution selects the records to be inserted into the Scale table, but verifies that both records exist by checking the count. Additionally, you could (should) do as below, since this can be marginally more efficient (the above SQL was meant to show the logic clearly):
INSERT INTO Scale
SELECT * FROM TestItems
WHERE (ItemID = 1 AND ItemResponse = 'TRUE'
OR ItemID = 3 AND ItemResponse = 'FALSE')
AND (
SELECT COUNT(*)
FROM TestItems
WHERE ItemID = 1 AND ItemResponse = 'TRUE'
OR ItemID = 3 AND ItemResponse = 'FALSE'
) >= 2;
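To check the "insert both or neither" behaviour, here is a small sketch using Python's sqlite3 module. It assumes TestItems and Scale share the three columns shown in the question; the function name copy_if_both_match is made up for the example. It uses the count-check form: select the two candidate rows, but only if both conditions are met somewhere in the table.

```python
import sqlite3

def copy_if_both_match(items):
    """Copy items 1 and 3 into Scale only if 1 is TRUE and 3 is FALSE."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema, mirroring the TestItems layout in the question.
    conn.execute("CREATE TABLE TestItems (ItemID INTEGER, ItemQuestion TEXT, ItemResponse TEXT)")
    conn.execute("CREATE TABLE Scale (ItemID INTEGER, ItemQuestion TEXT, ItemResponse TEXT)")
    conn.executemany("INSERT INTO TestItems VALUES (?, ?, ?)", items)
    conn.execute(
        """
        INSERT INTO Scale
        SELECT * FROM TestItems
        WHERE ((ItemID = 1 AND ItemResponse = 'TRUE')
            OR (ItemID = 3 AND ItemResponse = 'FALSE'))
          AND (SELECT COUNT(*)
               FROM TestItems
               WHERE (ItemID = 1 AND ItemResponse = 'TRUE')
                  OR (ItemID = 3 AND ItemResponse = 'FALSE')) >= 2
        """
    )
    copied = conn.execute("SELECT * FROM Scale ORDER BY ItemID").fetchall()
    conn.close()
    return copied

# Rows from the question: both conditions hold, so both rows are copied.
matching = [
    (1, "Is the sky blue?", "TRUE"),
    (2, "Are you a person?", "TRUE"),
    (3, "2 Plus 2 Equals Five", "FALSE"),
]
# Variant where item 3 is answered TRUE: nothing should be copied.
not_matching = [
    (1, "Is the sky blue?", "TRUE"),
    (2, "Are you a person?", "TRUE"),
    (3, "2 Plus 2 Equals Five", "TRUE"),
]

print(copy_if_both_match(matching))
print(copy_if_both_match(not_matching))
```

The count check relies on ItemID being unique in TestItems; with duplicate IDs a count of 2 could come from the same item twice.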

Resources