I wish to group data in a table and select the most up-to-date (last) non-NULL value for each column. I was hoping to use window functions to partition the data; however, a window function yields one result per row in the window.
Consider the following example: a table contains the columns personid, firstname and lastname. Updates to persons are stored by inserting a row with the same personid and non-NULL values for the updated fields.
To retrieve the current state, one has to select the last non-NULL value for each column, grouped by personid.
Here is my attempt:
Data:
CREATE TABLE IF NOT EXISTS "test" (
"id" INTEGER,
"firstname" TEXT,
"lastname" TEXT,
"personid" INTEGER,
PRIMARY KEY("id")
);
INSERT INTO "test" ("id","firstname","lastname","personid") VALUES (1,'Mark','Twain',1);
INSERT INTO "test" ("id","firstname","lastname","personid") VALUES (2,'Tom','hacksaw',2);
INSERT INTO "test" ("id","firstname","lastname","personid") VALUES (3,'Maximus',NULL,1);
Query:
SELECT * FROM
(SELECT
personid,
FIRST_VALUE(firstname) OVER (PARTITION BY personid ORDER BY firstname IS NULL, id DESC) AS firstname,
FIRST_VALUE(lastname) OVER (PARTITION BY personid ORDER BY lastname IS NULL, id DESC) AS lastname
FROM
test)
GROUP BY
personid
Note that I have to filter my result by pumping it through a GROUP BY. This makes me suspect that the window function FIRST_VALUE is computed once per row rather than once per group, which feels inefficient. Is there a way to window data and get one result per group? (Since each column needs a different ordering to filter out its NULL values, I do not see an efficient way to do this with GROUP BY clauses alone.)
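One possible way to get exactly one row per personid is to skip the window functions and use one correlated subquery per column. The following is only a sketch under that assumption, not a claim about what performs best; it uses Python's sqlite3 module as a harness around the table from the question, and the variable names (conn, latest) are made up for illustration.

```python
import sqlite3

# In-memory database with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (
    id INTEGER PRIMARY KEY,
    firstname TEXT,
    lastname TEXT,
    personid INTEGER
);
INSERT INTO test (id, firstname, lastname, personid) VALUES (1, 'Mark', 'Twain', 1);
INSERT INTO test (id, firstname, lastname, personid) VALUES (2, 'Tom', 'hacksaw', 2);
INSERT INTO test (id, firstname, lastname, personid) VALUES (3, 'Maximus', NULL, 1);
""")

# One output row per distinct personid; each column is looked up
# separately as the newest (highest id) non-NULL value for that person.
latest = conn.execute("""
SELECT p.personid,
       (SELECT t.firstname FROM test AS t
         WHERE t.personid = p.personid AND t.firstname IS NOT NULL
         ORDER BY t.id DESC LIMIT 1) AS firstname,
       (SELECT t.lastname FROM test AS t
         WHERE t.personid = p.personid AND t.lastname IS NOT NULL
         ORDER BY t.id DESC LIMIT 1) AS lastname
FROM (SELECT DISTINCT personid FROM test) AS p
ORDER BY p.personid
""").fetchall()

print(latest)
```

With an index on (personid, id) each subquery can probably be answered without scanning the whole table, but that is an assumption worth checking with EXPLAIN QUERY PLAN on real data.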
Related
Consider the following schema and table:
CREATE TABLE IF NOT EXISTS `names` (
`id` INTEGER,
`name` TEXT,
PRIMARY KEY(`id`)
);
INSERT INTO `names` VALUES (1,'zulu');
INSERT INTO `names` VALUES (2,'bene');
INSERT INTO `names` VALUES (3,'flip');
INSERT INTO `names` VALUES (4,'rossB');
INSERT INTO `names` VALUES (5,'albert');
INSERT INTO `names` VALUES (6,'zuse');
INSERT INTO `names` VALUES (7,'rossA');
INSERT INTO `names` VALUES (8,'juss');
I access this table with the following query:
SELECT *
FROM names
ORDER BY name
LIMIT 10
OFFSET 4;
OFFSET 4 is used because it is the position (in the ordered list) of the first occurrence of an 'R%' name. This returns:
1="7" "rossA"
2="4" "rossB"
3="1" "zulu"
4="6" "zuse"
My question is: is there an SQL statement which can return the OFFSET value (in the R case above it's 4) given a starting first letter? (I don't really want to resort to stepping through the results, counting rows, until the first 'R%' is reached!)
I've tried the following without success:
SELECT MIN(ROWID)
FROM
(
SELECT *
FROM names
ORDER BY name
)
WHERE name LIKE 'R%'
It always returns a single row of NULL data.
As background, this table is a phone book list and I want to return a subset of results (from the main table) to the caller, starting at an initial-letter offset.
Just count the rows before the string of interest:
select count(*) from names where name < 'r';
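To see how that count plugs into the original paging query, here is a small sketch using Python's sqlite3 module with the names table from the question. The variable names are illustrative only. Note that the plain `<` comparison is case-sensitive by default, so the lowercase 'r' matches the lowercase sample data.

```python
import sqlite3

# In-memory copy of the names table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [(1, "zulu"), (2, "bene"), (3, "flip"), (4, "rossB"),
     (5, "albert"), (6, "zuse"), (7, "rossA"), (8, "juss")],
)

# Number of rows that sort before the first 'r' name = the OFFSET to use.
offset = conn.execute(
    "SELECT count(*) FROM names WHERE name < ?", ("r",)
).fetchone()[0]

# Feed the computed offset back into the paging query.
page = conn.execute(
    "SELECT id, name FROM names ORDER BY name LIMIT 10 OFFSET ?", (offset,)
).fetchall()

print(offset)
print(page)
```

If the caller only needs the rows and not the offset itself, filtering with `WHERE name >= 'r' ORDER BY name LIMIT 10` would give the same page without the extra count.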
The following shows a number of options. Basically, your issue is that the sub-query doesn't return the rowid, hence NULL as the minimum. However, there is no need to use the rowid directly, as the id column is an alias of the rowid, so that could be used instead:-
SELECT name, id, MIN(rowid), min(id) -- shows how rowid and id are the same
FROM
(
SELECT rowid, * -- returns rowid from the subquery so min(rowid) now works
FROM names
ORDER BY name
)
WHERE name LIKE 'R%' ORDER BY id ASC LIMIT 1 -- Will effectively do the same (no need for the sub-query)
Extra columns added for demonstration.
As such your query could be :-
SELECT min(rowid) FROM names where name LIKE 'R%';
Or :-
SELECT min(id) FROM names where name LIKE 'R%';
You could also use :-
SELECT id FROM names WHERE name LIKE 'R%' ORDER BY id ASC LIMIT 1;
Or :-
SELECT rowid FROM names WHERE name LIKE 'R%' ORDER BY id ASC LIMIT 1;
So I've been looking at this for the past week and learning. I'm used to SQL Server not SQLite. I understand RowId now, and that if I have an "id" column of my own (for convenience) it will actually use RowId. I've done running totals in SQL Server using ROW_NUMBER, but that doesn't seem to be an option with SQLite. The most useful post was...
How do I calculate a running SUM on a SQLite query?
My issue is that it works only as long as I keep adding data at the "bottom" of the table. I say "bottom" in quotes because my display of the data is always sorted on some other column, such as a month. In other words, if I insert a new record for a missing month, it will get a higher "id" (aka RowId). The running totals for all subsequent months now need to reflect this new data. This means I cannot order by "id".
With SQL Server, ROW_NUMBER took care of my sequencing because in the select where I use a.id > running.id, I would have used a.rownum > running.rownum
Here's my table
CREATE TABLE `Test` (
`id` INTEGER,
`month` INTEGER,
`year` INTEGER,
`value` INTEGER,
PRIMARY KEY(`id`)
);
Here's my query
WITH RECURSIVE running (id, month, year, value, rt) AS
(
SELECT id, month, year, value, value
FROM Test AS row1
WHERE row1.id = (SELECT a.id FROM Test AS a ORDER BY a.id LIMIT 1)
UNION ALL
SELECT rowN.id, rowN.month, rowN.year, rowN.value, (rowN.value + running.rt)
FROM Test AS rowN
INNER JOIN running ON rowN.id = (
SELECT a.id FROM Test AS a WHERE a.id > running.id ORDER BY a.id LIMIT 1
)
)
SELECT * FROM running
I can order my CTE by year, month, id, similar to what is suggested in the original example I linked above. However, unless I'm mistaken, that solution relies on the records in the table already being ordered by year, month, id. If I'm right, then inserting an earlier "month" will break it, because the new "id" will have the largest value of all the RowIds.
Appreciate if someone can set me straight.
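For what it's worth, one approach that does not depend on insertion order is to compute each row's running total with a correlated subquery that compares (year, month, id) directly. The following is a sketch only, using Python's sqlite3 module; the sample rows are made up for illustration, with the February record deliberately inserted last so it gets the highest id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Test (
    id INTEGER PRIMARY KEY,
    month INTEGER,
    year INTEGER,
    value INTEGER
)
""")
# Hypothetical data: February arrives late and so has the largest id.
conn.executemany(
    "INSERT INTO Test (id, month, year, value) VALUES (?, ?, ?, ?)",
    [(1, 1, 2020, 10), (2, 3, 2020, 30), (3, 2, 2020, 20)],
)

# For each row, sum every row that sorts at or before it by (year, month, id).
running = conn.execute("""
SELECT a.year, a.month, a.value,
       (SELECT SUM(b.value) FROM Test AS b
         WHERE b.year < a.year
            OR (b.year = a.year AND b.month < a.month)
            OR (b.year = a.year AND b.month = a.month AND b.id <= a.id)) AS rt
FROM Test AS a
ORDER BY a.year, a.month, a.id
""").fetchall()

print(running)
```

This runs the subquery once per row, so it may be slow on large tables. SQLite 3.25 and later also support window functions, where `SUM(value) OVER (ORDER BY year, month, id)` expresses the same running total more directly.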
I have this table
CREATE TABLE "INGREDIENTS" (
"id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL ,
"material" VARCHAR,
"type" VARCHAR,
"company" VARCHAR
)
and I want to add a row
INSERT INTO "INGREDIENTS" VALUES('material1','type1','company1');
and I get an error, ... has 4 columns but 3 values supplied
However, I want the row to get the id value +1 from the previous row.
You need to specify which columns you are inserting into:
INSERT INTO INGREDIENTS (material, type, company)
VALUES ('material1', 'type1', 'company1');
You should actually always specify the columns. If you don't and your table changes then your queries will start to break.
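As a quick check that the id is filled in automatically once the column list is given, here is a minimal sketch using Python's sqlite3 module with the table from the question. The second row's values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE "INGREDIENTS" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    "material" VARCHAR,
    "type" VARCHAR,
    "company" VARCHAR
)
""")

# Listing the columns leaves "id" out, so SQLite assigns it.
insert_sql = "INSERT INTO INGREDIENTS (material, type, company) VALUES (?, ?, ?)"
first = conn.execute(insert_sql, ("material1", "type1", "company1"))
first_id = first.lastrowid
second = conn.execute(insert_sql, ("material2", "type2", "company2"))
second_id = second.lastrowid

rows = conn.execute("SELECT id, material FROM INGREDIENTS ORDER BY id").fetchall()
print(first_id, second_id, rows)
```

Passing NULL for the id in a full-width VALUES list should have the same effect, but naming the columns keeps the statement valid if the table later gains columns.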
SQLite doesn't have a row number function. My database, however, could have several thousand records. I need to sort a table based upon a date (the date field is actually an INTEGER) and then return a specific range of rows. So if I wanted all the rows from 600 to 800, I would need to somehow create a row number and limit the results to fall within my desired range. I cannot use RowID or any auto-incremented ID field because the data is inserted with random dates. The closest I can get is this:
CREATE TABLE Test (ID INTEGER, Name TEXT, DateRecorded INTEGER);
Insert Into Test (ID, Name, DateRecorded) Values (5,'fox', 400);
Insert Into Test (ID, Name, DateRecorded) Values (1,'rabbit', 100);
Insert Into Test (ID, Name, DateRecorded) Values (10,'ant', 800);
Insert Into Test (ID, Name, DateRecorded) Values (8,'deer', 300);
Insert Into Test (ID, Name, DateRecorded) Values (6,'bear', 200);
SELECT ID,
Name,
DateRecorded,
(SELECT COUNT(*)
FROM Test AS t2
WHERE t2.DateRecorded > t1.DateRecorded) AS RowNum
FROM Test AS t1
where RowNum > 2
ORDER BY DateRecorded Desc;
This works, except it's really ugly. The SELECT COUNT(*) subquery is carried out for every row encountered, so with several thousand rows the performance will be very poor.
This is what the LIMIT/OFFSET clauses are for:
SELECT *
FROM Test
ORDER BY DateRecorded DESC
LIMIT 200 OFFSET 600
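To make the behaviour concrete on the five sample rows from the question, here is a small sketch using Python's sqlite3 module. LIMIT 2 OFFSET 2 is just a scaled-down stand-in for LIMIT 200 OFFSET 600: skip the first two rows of the sorted result, then return the next two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (ID INTEGER, Name TEXT, DateRecorded INTEGER)")
conn.executemany(
    "INSERT INTO Test (ID, Name, DateRecorded) VALUES (?, ?, ?)",
    [(5, "fox", 400), (1, "rabbit", 100), (10, "ant", 800),
     (8, "deer", 300), (6, "bear", 200)],
)

# Sorted newest first: ant, fox, deer, bear, rabbit.
# OFFSET 2 skips ant and fox; LIMIT 2 then returns deer and bear.
page = conn.execute("""
SELECT ID, Name, DateRecorded
FROM Test
ORDER BY DateRecorded DESC
LIMIT 2 OFFSET 2
""").fetchall()

print(page)
```

OFFSET still has to walk past the skipped rows, so for very deep pages an index on DateRecorded is likely to help.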
I am trying to run some analysis on sales data using SQLite.
At the moment, my table has several columns including a unique transaction ID, product name, quantity of that product and value of that product. For each transaction, there can be several records, because each distinct type of product in the basket has its own entry.
I would like to add two new columns to the table. The first one would be a total for each transaction ID which summed up the total quantity of all products in that basket.
I realize that there would be duplication in the table, as the repeated transaction IDs would all have the total. The second one would be similar but in value terms.
I unfortunately cannot do this by creating a new table with the values I want calculated in Excel, and then joining it to the original table, because there are too many records for Excel.
Is there a way to get SQL to do the equivalent of a sumif in Excel?
I was thinking something along the lines of:
select sum(qty) where uniqID = ...
But I am stumped by how to express that it needs to sum all quantities where the uniqID is the same as the one in that record.
You wouldn't create a column like that in SQL. You would simply query for the total on the fly. If you really wanted a table-like object, you could create a view that held 2 columns; uniqID and the sum for that ID.
Let's set up some dummy data in a table; column a is your uniqID, b is the values you're summing.
create table tab1 (a int, b int);
insert into tab1 values (1,1);
insert into tab1 values (1,2);
insert into tab1 values (2,10);
insert into tab1 values (2,20);
Now you can do simple queries for individual uniqIDs like this:
select sum(b) from tab1 where a = 2;
30
Or sum for all uniqIDs (the 'group by' clause might be all you're groping for:) :
select a, sum(b) from tab1 group by a;
1|3
2|30
Which could be wrapped as a view:
create view totals as select a, sum(b) from tab1 group by a;
select * from totals;
1|3
2|30
The view will update on the fly:
insert into tab1 values (2,30);
select * from totals;
1|3
2|60
In further queries, for analysis, you can use 'totals' just like you would a table.
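If you do want the total repeated next to every detail row, as a per-row SUMIF would give you in Excel, one option is to join the view back to the table. The sketch below uses Python's sqlite3 module with the dummy data above; naming the summed column `total` in the view is an addition made here so the join can refer to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab1 (a INT, b INT);
INSERT INTO tab1 VALUES (1, 1);
INSERT INTO tab1 VALUES (1, 2);
INSERT INTO tab1 VALUES (2, 10);
INSERT INTO tab1 VALUES (2, 20);
-- Same view as above, but with the sum given an explicit name.
CREATE VIEW totals AS SELECT a, SUM(b) AS total FROM tab1 GROUP BY a;
""")

# Each detail row is paired with the total for its own uniqID (column a).
with_totals = conn.execute("""
SELECT t.a, t.b, totals.total
FROM tab1 AS t
JOIN totals ON totals.a = t.a
ORDER BY t.a, t.b
""").fetchall()

print(with_totals)
```

A second aggregate such as `SUM(value)` could be added to the same view to cover the value total as well.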