SQLite - Create dummy variable vector/string from multiple columns

I have some data that looks like this:
UserID Category
------ --------
1 a
1 b
2 c
3 b
3 a
3 c
I'd like to binary-encode this grouped by UserID: three different values exist in Category, so a binary encoding would be something like:
UserID encoding
------ --------
1 "1, 1, 0"
2 "0, 0, 1"
3 "1, 1, 1"
i.e., all three values are present for UserID = 3, so the corresponding vector is "1, 1, 1".
Is there a way to do this without writing a bunch of CASE WHEN expressions? There may be dozens of possible values in Category.

Cross join the distinct users with the distinct categories, then left join the result to the table.
Then use the GROUP_CONCAT() window function, whose window definition supports an ORDER BY clause, to collect the 0s and 1s:
WITH
  users AS (SELECT DISTINCT UserID FROM tablename),
  categories AS (
    SELECT DISTINCT Category, DENSE_RANK() OVER (ORDER BY Category) rn
    FROM tablename
  ),
  cte AS (
    SELECT u.UserID, c.rn,
           '"' || GROUP_CONCAT(t.UserID IS NOT NULL)
                    OVER (PARTITION BY u.UserID ORDER BY c.rn) || '"' encoding
    FROM users u CROSS JOIN categories c
    LEFT JOIN tablename t
      ON t.UserID = u.UserID AND t.Category = c.Category
  )
SELECT DISTINCT UserID,
       FIRST_VALUE(encoding) OVER (PARTITION BY UserID ORDER BY rn DESC) encoding
FROM cte
ORDER BY UserID
This will work for any number of categories.
Results:
UserID encoding
------ --------
1 "1,1,0"
2 "0,0,1"
3 "1,1,1"

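The cross join plus left join idea can be tried out with a minimal sketch using Python's built-in sqlite3 module. This is not the answer's exact query: to avoid depending on window function support or on the element order of an aggregate, it lets SQL produce one ordered 0/1 row per (user, category) pair and joins the bits in Python. The table name tablename and the sample rows come from the question; the variable names are illustrative assumptions.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (UserID INTEGER, Category TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (3, "b"), (3, "a"), (3, "c")],
)

# One row per (user, category) pair; `present` is 1 when the pair exists.
# The DISTINCT subquery for t guards against duplicate (UserID, Category) rows.
rows = conn.execute(
    """
    SELECT u.UserID, c.Category, t.UserID IS NOT NULL AS present
    FROM (SELECT DISTINCT UserID FROM tablename) u
    CROSS JOIN (SELECT DISTINCT Category FROM tablename) c
    LEFT JOIN (SELECT DISTINCT UserID, Category FROM tablename) t
      ON t.UserID = u.UserID AND t.Category = c.Category
    ORDER BY u.UserID, c.Category
    """
).fetchall()

# Rows arrive sorted by user, so groupby() yields one group per user.
encodings = {
    user_id: ", ".join(str(present) for _, _, present in group)
    for user_id, group in groupby(rows, key=lambda r: r[0])
}
print(encodings)  # {1: '1, 1, 0', 2: '0, 0, 1', 3: '1, 1, 1'}
```

Because the ordering is done by the outer ORDER BY, the bit positions always follow the sorted category list.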
First create an encoding table to explicitly establish the order of the categories in the bitmap:
create table e (Category text, Encoding int);
insert into e values ('a', 1), ('b', 2), ('c', 4);
Then generate a list of users u, cross joined with the encoding table e, to get a fully populated (UserId, Category, Encoding) table, and left join that with the user-supplied data t. The right-hand side t then determines whether a bit is set:
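select
  u.UserId,
  '"' ||
  group_concat(case when t.UserId is null then 0 else 1 end, ', ')
  || '"' as encoding
from
  (select distinct UserID from t) u
  join e
  left natural join t
group by 1
-- this sorts the grouped rows; SQLite does not guarantee the order of the
-- elements inside group_concat() unless the aggregate is given its own ordering
order by e.Encoding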
and it gives the expected result:
1|"1, 1, 0"
2|"0, 0, 1"
3|"1, 1, 1"
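Since the Encoding values are powers of two, a related variant (not the answer's query) is to sum them into an integer bitmask per user and expand the mask into a string afterwards. The following is a minimal sketch using Python's sqlite3 module; the tables t and e are the ones from the answer, while the helper names are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t (UserID INTEGER, Category TEXT);
    INSERT INTO t VALUES (1,'a'),(1,'b'),(2,'c'),(3,'b'),(3,'a'),(3,'c');
    CREATE TABLE e (Category TEXT, Encoding INTEGER);
    INSERT INTO e VALUES ('a', 1), ('b', 2), ('c', 4);
    """
)

# Bit values in the order fixed by the encoding table.
bit_values = [row[0] for row in conn.execute("SELECT Encoding FROM e ORDER BY Encoding")]

# Sum the bit values of the distinct categories each user has.
masks = conn.execute(
    """
    SELECT d.UserID, SUM(e.Encoding) AS mask
    FROM (SELECT DISTINCT UserID, Category FROM t) d
    JOIN e ON e.Category = d.Category
    GROUP BY d.UserID
    ORDER BY d.UserID
    """
).fetchall()

# Expand each mask into the "1, 1, 0" style string from the question.
bitmap_by_user = {
    user_id: ", ".join("1" if mask & bit else "0" for bit in bit_values)
    for user_id, mask in masks
}
print(masks)           # [(1, 3), (2, 4), (3, 7)]
print(bitmap_by_user)  # {1: '1, 1, 0', 2: '0, 0, 1', 3: '1, 1, 1'}
```

The mask form does not depend on the order in which rows are aggregated, but it only works while each category has its own distinct power of two.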

Related

Recursing through a SQLite table to find a matching subset of records

I have a table with 3 text columns (A, B and C) imported from a flat file comprising many thousands of lines. None of these columns has a UNIQUE constraint and there is no primary key combination. As a result, several records may share the same values, and some records are identical across all columns. In many records, columns A, B and C should be the same, but due to data quality issues column C has many variants where columns A and B are the same. Where columns A and B match, the value in column C may be a substring of the value of column C in another record with the same A and B.
To illustrate, a subset arrived at by using GROUP BY gives:
(image not available)
I now need to narrow that subset down further to find all records where the value in column C is contained (per INSTR) in the column C value of another record in the same group, i.e. I'd like to return:
(image not available)
because "Buckingham" and "Lindsey" are both contained in the records that have "Lindsey Buckingham" in column C.
With EXISTS and INSTR():
select t.* from tablename t
where exists (
select 1 from tablename
where a = t.a and b = t.b and c <> t.c and instr(c, t.c) > 0
)
or with LIKE:
select t.* from tablename t
where exists (
select 1 from tablename
where a = t.a and b = t.b and c <> t.c and c like '%' || t.c || '%'
)
or with a self join:
select distinct t.*
from tablename t inner join tablename tt
on tt.a = t.a and tt.b = t.b and tt.c <> t.c and tt.c like '%' || t.c || '%'
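The EXISTS variant can be checked with a small sketch using Python's sqlite3 module. The original sample data was only available as images, so the rows below are hypothetical; only the "Lindsey Buckingham" values are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (a TEXT, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        # Hypothetical values for columns a and b.
        ("Album X", "Track 1", "Lindsey Buckingham"),
        ("Album X", "Track 1", "Buckingham"),
        ("Album X", "Track 1", "Lindsey"),
        ("Album X", "Track 2", "Lindsey"),  # different (a, b) group, no longer value
    ],
)

# Keep rows whose c is contained in a different c within the same (a, b) group.
matches = conn.execute(
    """
    SELECT t.a, t.b, t.c
    FROM tablename t
    WHERE EXISTS (
        SELECT 1 FROM tablename o
        WHERE o.a = t.a AND o.b = t.b AND o.c <> t.c AND instr(o.c, t.c) > 0
    )
    ORDER BY t.c
    """
).fetchall()
print(matches)
# [('Album X', 'Track 1', 'Buckingham'), ('Album X', 'Track 1', 'Lindsey')]
```

One difference between the variants worth knowing: instr() compares case-sensitively, while SQLite's LIKE is case-insensitive for ASCII characters by default.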

SELECT SUM of each row and the next row

I have a table with three columns: id, numbers1 and numbers2. I need to add numbers1 of each row to numbers2 of the next row: the first row with the second, the second with the third, the third with the fourth, and so on:
CREATE TABLE tb1 (id INTEGER PRIMARY KEY AUTOINCREMENT,numbers1,numbers2);
INSERT INTO tb1 (numbers1,numbers2) values(1,10);
INSERT INTO tb1 (numbers1,numbers2) values(2,20);
INSERT INTO tb1 (numbers1,numbers2) values(3,30);
INSERT INTO tb1 (numbers1,numbers2) values(4,40);
INSERT INTO tb1 (numbers1,numbers2) values(5,50);
I want to get:
21
32
43
54
Using the technique for computing a row index per record described in "How to use ROW_NUMBER in sqlite", I was able to produce the required result with the following query:
SELECT
  a.numbers1 + coalesce(b.b_numbers2, 0)
FROM (
  SELECT
    numbers1,
    -- row index: number of rows with an id up to and including this one
    (SELECT count(*) FROM tb1 AS b WHERE a.id >= b.id) AS cnt
  FROM tb1 AS a
) AS a
LEFT JOIN (
  SELECT
    numbers2 AS b_numbers2,
    (SELECT count(*) FROM tb1 AS b WHERE a.id >= b.id) AS cnt
  FROM tb1 AS a
) AS b
ON b.cnt = a.cnt + 1
Explanation:
The query joins two copies of the table on their row index, pairing each record with the next one, and then adds numbers1 of the current record to numbers2 of the next record. I do not know how you want to deal with the last row, as it has no next row, so I assume it adds nothing and the result is just the value of numbers1.
For one row with a specific ID x, you can get values from the next row by searching for ID values larger than x, and taking the first such row:
SELECT ...
FROM tb1
WHERE id > x
ORDER BY id
LIMIT 1;
You can then use this as a correlated subquery to get that value for each row:
SELECT numbers1 + (SELECT T2.numbers2
FROM tb1 AS T2
WHERE T2.id > T1.id
ORDER BY T2.id
LIMIT 1) AS sum
FROM tb1 AS T1
WHERE sum IS NOT NULL; -- this omits the last row, where the subquery returns NULL
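The correlated subquery can be run against the question's sample data with a minimal sketch using Python's sqlite3 module. The table and column names are those from the question; the alias pair_sum is an illustrative rename of sum to avoid confusion with the aggregate function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tb1 (id INTEGER PRIMARY KEY AUTOINCREMENT, numbers1, numbers2)"
)
conn.executemany(
    "INSERT INTO tb1 (numbers1, numbers2) VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)],
)

# For each row, look up numbers2 of the row with the next larger id.
# SQLite allows the result alias to be referenced in WHERE.
query = """
    SELECT T1.numbers1 + (SELECT T2.numbers2
                          FROM tb1 AS T2
                          WHERE T2.id > T1.id
                          ORDER BY T2.id
                          LIMIT 1) AS pair_sum
    FROM tb1 AS T1
    WHERE pair_sum IS NOT NULL   -- drops the last row, which has no successor
    ORDER BY T1.id
"""
sums = [row[0] for row in conn.execute(query)]
print(sums)  # [21, 32, 43, 54]
```

Looking up the next id rather than id + 1 keeps the query correct when ids have gaps, for example after deletions.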

Simple Split function in SQL Server 2012 with explanation pls

I have two tables Procedures and ProcedureTypes.
Procedures has a column Type which is a varchar with the values (1, 2), (3, 4), (4, 5) etc...
ProcedureType has a primary key 'ID' 1 to 9.
ID Description
1 Drug
2 Other-Drug
etc...
ID is an integer value and Type is varchar value.
Now I need to join these two tables to show these values:
ID in the Procedures table
ProcedureType in the Procedures table
Description in the ProcedureTypes table, with the values separated by a "-".
For example, if the value in Type is (1,2), the result of the join should show the description as (Drug-Other Drug).
I have used this query, but to no avail:
SELECT * FROM dbo.[Split]((select RequestType from GPsProcedures), ',')
Can anyone tell me how to do it, and why the above query is not working?
with Procedures as (
select 1 as ID, '1,2,3' as Typ
),
ProcedureTypes as (
select 1 as TypeID, 'Drug' as Name
union select 2 , 'Other-Drug'
union select 3 , 'Test 3'
)
/*Get one extra column of type xml*/
,Procedures_xml as (
select id,CONVERT(xml,' <root> <s>' + REPLACE(Typ,',','</s> <s>') + '</s> </root> ') as Typ_xml
from Procedures
)
/*Convert the field string to multiple rows then join to procedure types*/
, Procdure_With_Type as (
select ID,T.c.value('.','varchar(20)') as TypeID,
ProcedureTypes.Name
from Procedures_xml
CROSS APPLY Typ_xml.nodes('/root/s') T(c)
INNER JOIN ProcedureTypes ON T.c.value('.','varchar(20)') = ProcedureTypes.TypeID
)
/*Finally, group the procedures type names by procedure id*/
select id,
STUFF((
SELECT ', ' + [Name]
FROM Procdure_With_Type inn
WHERE (Procdure_With_Type.ID = inn.ID)
FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
,1,2,'') AS NameValues
from Procdure_With_Type
group by ID
You can't have a select statement as a parameter for a function, so instead of this:
SELECT * FROM dbo.[Split]((select RequestType from GPsProcedures), ',')
Use this:
select S.*
from GPsProcedures P
cross apply dbo.[Split](P.RequestType, ',') S
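For readers who want the logic without the T-SQL machinery, the transformation both answers implement (split the comma-separated Type string, look up each ID, join the descriptions) can be sketched in plain Python. This is only an illustration of the data flow, not a replacement for the queries; the procedures rows are hypothetical, and the two descriptions are the ones listed in the question.

```python
# ID -> Description, as listed in the question's ProcedureTypes table.
procedure_types = {1: "Drug", 2: "Other-Drug"}

# Hypothetical (ID, Type) rows standing in for the Procedures table.
procedures = [(1, "1,2"), (2, "2")]


def describe(type_list: str, separator: str = "-") -> str:
    """Split a comma-separated ID string and join the matching descriptions."""
    ids = [int(part) for part in type_list.split(",") if part.strip()]
    return separator.join(procedure_types[i] for i in ids)


described = [(proc_id, types, describe(types)) for proc_id, types in procedures]
print(described)
# [(1, '1,2', 'Drug-Other-Drug'), (2, '2', 'Other-Drug')]
```

The split step corresponds to the XML nodes() call or the Split function, the dictionary lookup to the join with the type table, and the final join() to the STUFF ... FOR XML PATH concatenation.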

Casting comma separated to integer for IN clause

I have three tables: estimate, location and department. I am joining location and estimate to get the desired results with this query:
SELECT e.id, e.department_ids FROM estimate e JOIN location l ON e.location_id = l.id WHERE e.user_id = '1' and e.delete_flag = 0 and l.active_flag = 1
This query works fine for that requirement.
Now I want the relevant department names as well, so I am using this query:
SELECT e.id, e.department_ids, (SELECT group_concat(department, ', ') FROM department WHERE id IN (e.department_ids)) as departmentName FROM estimate e JOIN location l ON e.location_id = l.id WHERE e.user_id = '1' and e.delete_flag = 0 and l.active_flag = 1
This gives me department names only for rows with a single department id.
However, if I hardcode e.department_ids as 2, 5, I get the desired result:
SELECT e.id, e.department_ids, (SELECT group_concat(department, ', ') FROM department WHERE id IN (2, 5)) as departmentName FROM estimate e JOIN location l ON e.location_id = l.id WHERE e.user_id = '1' and e.delete_flag = 0 and l.active_flag = 1
I tried cast(e.department_ids as integer), but this also yields a single department_id per row. Is there a function that can cast the whole string in e.department_ids (i.e. "4, 2") so that I can pass it to the IN clause?
I found a solution for this in Oracle, but I could not find its equivalent for SQLite.
I got the desired result using GROUP BY clause.
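The GROUP BY solution is not spelled out, so the following is only one possible sketch of it, using Python's sqlite3 module. IN (e.department_ids) compares id against the single string '2, 5', which is why it only matches single-id rows. The sketch instead joins department on a delimiter-wrapped LIKE match and groups by estimate. The table contents are hypothetical, and the estimate table is reduced to the two columns that matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (id INTEGER PRIMARY KEY, department TEXT);
    INSERT INTO department VALUES (2, 'Sales'), (4, 'Support'), (5, 'Billing');
    CREATE TABLE estimate (id INTEGER PRIMARY KEY, department_ids TEXT);
    INSERT INTO estimate VALUES (1, '2, 5'), (2, '4');
    """
)

# Strip spaces and wrap the list in commas so ',2,5,' can be searched for
# ',2,' without also matching ids such as 12 or 25.
rows = conn.execute(
    """
    SELECT e.id, group_concat(d.department, ', ') AS departmentName
    FROM estimate e
    JOIN department d
      ON ',' || replace(e.department_ids, ' ', '') || ',' LIKE '%,' || d.id || ',%'
    GROUP BY e.id
    ORDER BY e.id
    """
).fetchall()
department_names = dict(rows)
print(department_names)
```

The order of names inside each concatenated string is not guaranteed. Storing one row per (estimate, department) pair in a separate table would avoid the string matching altogether.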

Consolidating values from multiple tables

I have an application whose data is spread across 2 tables.
There is a main table Main, which has columns Id, Name, Type.
There is also a SubMain table that has columns MainId (FK), StartDate, EndDate, City,
and this is a one-to-many relation (each Main row can have multiple entries in SubMain).
Now I want to display the columns Main.Id, City (comma separated, from the various SubMain rows for that main item), the minimum StartDate and the maximum EndDate (both from SubMain for that main item).
I thought of using a function, but that will slow things down since there will be 100k records. Is there some other way of doing this? By the way, the application is in ASP.NET. Can we use a SQL query or some kind of LINQ?
This is off the top of my head, but first I would suggest you create a user-defined function in SQL that accepts @mainid and builds the comma-separated city list as follows:
DECLARE @listStr VARCHAR(MAX)
SELECT @listStr = COALESCE(@listStr+',' , '') + city
FROM submain
WHERE mainid = @mainid
... and then return @listStr, which will now be a comma-separated list of cities. Let's say you call your function MainIDCityStringGet().
Then for your final result you can simply execute the following
select cts.mainid,
cts.cities,
sts.minstartdate,
sts.maxenddate
from ( select distinct mainid,
dbo.MainIDCityStringGet(mainid) as 'cities'
from submain) as cts
join
( select mainid,
min(startdate) as 'minstartdate',
max(enddate) as 'maxenddate'
from submain
group by mainid ) as sts on sts.mainid = cts.mainid
where startdate <is what you want it to be>
and enddate <is what you want it to be>
Depending on how exactly you would like to filter by startdate and enddate you may need to put the where filter within each subquery and in the second subquery in the join you may then need to use the HAVING grouped filter. You did not clearly state the nature of your filter.
I hope that helps.
This would of course go in a stored procedure, and it may need some debugging.
An alternative to creating a stored procedure is performing the complex operations on the client side. (untested):
var result = (from main in context.Main
join sub in context.SubMain on main.Id equals sub.MainId into subs
let StartDate = subs.Min(s => s.StartDate)
let EndDate = subs.Max(s => s.EndDate)
let Cities = subs.Select(s => s.City).Distinct()
select new { main.Id, main.Name, main.Type, StartDate, EndDate, Cities })
.ToList()
.Select(x => new
{
x.Id,
x.Name,
x.Type,
x.StartDate,
x.EndDate,
Cities = string.Join(", ", x.Cities.ToArray())
})
.ToList();
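The aggregation that both approaches aim for (minimum start date, maximum end date and a distinct city list per main row) can be sketched in a single grouped query. The sketch below uses Python's sqlite3 module with SQLite's group_concat() standing in for the SQL Server function, so it illustrates the shape of the result rather than the SQL Server syntax. The sample rows are hypothetical, and dates are stored as ISO text so that MIN and MAX compare correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Main (Id INTEGER PRIMARY KEY, Name TEXT, Type TEXT);
    CREATE TABLE SubMain (MainId INTEGER, StartDate TEXT, EndDate TEXT, City TEXT);
    INSERT INTO Main VALUES (1, 'first', 'x');
    INSERT INTO SubMain VALUES
        (1, '2020-01-10', '2020-02-01', 'Oslo'),
        (1, '2020-03-05', '2020-04-20', 'Bergen'),
        (1, '2019-12-01', '2020-01-05', 'Oslo');
    """
)

# One output row per Main row; DISTINCT removes the repeated city.
# With DISTINCT, SQLite's group_concat() takes a single argument and
# uses the default "," separator.
row = conn.execute(
    """
    SELECT m.Id, MIN(s.StartDate), MAX(s.EndDate), group_concat(DISTINCT s.City)
    FROM Main m
    JOIN SubMain s ON s.MainId = m.Id
    GROUP BY m.Id
    """
).fetchone()
main_id, first_start, last_end, cities = row
print(main_id, first_start, last_end, sorted(cities.split(",")))
# 1 2019-12-01 2020-04-20 ['Bergen', 'Oslo']
```

Doing all three aggregates in one GROUP BY avoids a per-row function call, which was the performance concern raised in the question.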
I am unsure how well this is supported in other implementations of SQL, but if you have SQL Server this works like a charm for this type of scenario.
As a disclaimer I would like to add that I am not the originator of this technique. But I immediately thought of this question when I came across it.
Example:
For a table
Item ID Item Value Item Text
----------- ----------------- ---------------
1 2 A
1 2 B
1 6 C
2 2 D
2 4 A
3 7 B
3 1 D
Suppose you want the following output, with the strings concatenated and the values summed:
Item ID Item Value Item Text
----------- ----------------- ---------------
1 10 A, B, C
2 6 D, A
3 8 B, D
The following avoids a multi-statement looping solution:
if object_id('Items') is not null
drop table Items
go
create table Items
( ItemId int identity(1,1),
ItemNo int not null,
ItemValue int not null,
ItemDesc nvarchar(500) )
insert Items
( ItemNo,
ItemValue,
ItemDesc )
values ( 1, 2, 'A'),
( 1, 2, 'B'),
( 1, 6, 'C'),
( 2, 2, 'D'),
( 2, 4, 'A'),
( 3, 7, 'B'),
( 3, 1, 'D')
select it1.ItemNo,
sum(it1.ItemValue) as ItemValues,
stuff((select ', ' + it2.ItemDesc --// Stuff is just used to remove the first 2 characters, instead of a substring.
from Items it2 with (nolock)
where it1.ItemNo = it2.ItemNo
for xml path(''), type).value('.','varchar(max)'), 1, 2, '') as ItemDescs --// Does the actual concatenation..
from Items it1 with (nolock)
group by it1.ItemNo
So all you need is a subquery in your select that retrieves the set of values you want to concatenate, combined with the FOR XML PATH command in that subquery. It does not matter where the values to concatenate come from; you just need to retrieve them in the subquery.
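For comparison, the same sum-and-concatenate result can be reproduced outside SQL Server. The following minimal sketch uses Python's sqlite3 module, where group_concat() plays the role of the STUFF ... FOR XML PATH subquery. The table layout and rows are taken from the example above; the SQLite column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Items (
        ItemId INTEGER PRIMARY KEY,
        ItemNo INTEGER NOT NULL,
        ItemValue INTEGER NOT NULL,
        ItemDesc TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO Items (ItemNo, ItemValue, ItemDesc) VALUES (?, ?, ?)",
    [(1, 2, "A"), (1, 2, "B"), (1, 6, "C"), (2, 2, "D"),
     (2, 4, "A"), (3, 7, "B"), (3, 1, "D")],
)

# Sum the values and concatenate the descriptions per ItemNo.
rows = conn.execute(
    """
    SELECT ItemNo, SUM(ItemValue), group_concat(ItemDesc, ', ')
    FROM Items
    GROUP BY ItemNo
    ORDER BY ItemNo
    """
).fetchall()

totals = {item_no: total for item_no, total, _ in rows}
descs = {item_no: sorted(text.split(", ")) for item_no, _, text in rows}
print(totals)  # {1: 10, 2: 6, 3: 8}
print(descs)   # {1: ['A', 'B', 'C'], 2: ['A', 'D'], 3: ['B', 'D']}
```

The descriptions are sorted in Python before printing because the order of elements inside the concatenated string is not guaranteed.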
