SQLite Ranking Time Stamps

I am new to SQL and am having trouble with a (fairly simple) query to rank time stamps.
I have one table with survey data from 2014. I am trying to determine the 'learning curve' for good customer satisfaction performance. I want to order and rank each survey at an agent level based on the time stamp of the survey. This would let me see what the average performance is when an agent has 5, 10, 20, etc. total surveys.
I imagine it should be something like (table name is tablerank):
select T1.*,
(select count(*)
from tablerank as T2
where T2.call_date > T1.call_date
) as SurveyRank
from tablerank as T1
where p1.Agent_ID = T2.Agent_ID;
For each agent, it would list each survey in order and tag a 1 for the earliest survey, a 2 for the second earliest, etc. Then I could pivot the data in Excel and see the learning curve based on survey count rather than tenure or time (since surveys are fairly rare; sometimes you only get 1 or 2 in a month).

A correlated subquery must have the correlation in the subquery itself; any table names/aliases from the subquery (such as T2) are not visible in the outer query.
For ranking, you want to count earlier surveys, and you want to include the current survey so that the first one gets the rank number 1, so you need to use <= instead of >:
SELECT *,
(SELECT COUNT(*)
FROM tablerank AS T2
WHERE T2.Agent_ID = T1.Agent_ID
AND T2.call_date <= T1.call_date
) AS SurveyRank
FROM tablerank AS T1
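If your SQLite version is 3.25 or newer, a window function gives the same ranking more directly; a minimal sketch against the same tablerank columns (note that ROW_NUMBER breaks ties arbitrarily, while the correlated count gives surveys with the same call_date the same rank):
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Agent_ID ORDER BY call_date) AS SurveyRank
FROM tablerank;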

Related

How to run ROWS UNBOUNDED PRECEDING on specific rows only

I have an SQL query that calculates the running total of 2 columns. Every week, new data is added to DUMMY_TABLE, and every time I run the running total for that table, it calculates the running total over all preceding rows, which I don't really need. I just need the running total for the data that has been newly inserted; it is a waste of resources to re-total all previous rows. I would like to know if there is any way to return the running total only for the newly inserted week.
I tried to use WHERE, but it filters the data. The reason I am looking for this is that if I get new data every week and the table is 10K records in size, the running total will be re-calculated over all 10K records plus the newly inserted data.
SELECT
RID,
FYFW,
VOL,
FAILED_VOL,
SUM(VOL) OVER (PARTITION BY RID, SUBSTR(TRIM(FYFW), 1, 4) ORDER BY RID,FYFW ROWS UNBOUNDED PRECEDING) AS YTD_VOL,
SUM(FAILED_VOL) OVER (PARTITION BY RID, SUBSTR(TRIM(FYFW), 1, 4) ORDER BY RID,FYFW ROWS UNBOUNDED PRECEDING) AS YTD_FAILED_VOL
FROM DUMMY_TABLE
GROUP BY 1,2,3,4
ORDER BY 1,2;
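If the goal is only to see the totals for the week that was just loaded, one option (a sketch; '202401' stands in for whatever FYFW value identifies the newly inserted week) is to keep the window over the full table and filter in an outer query. The prior rows are still read to build the totals, they are just not returned:
SELECT *
FROM (
SELECT RID, FYFW, VOL, FAILED_VOL,
SUM(VOL) OVER (PARTITION BY RID, SUBSTR(TRIM(FYFW), 1, 4) ORDER BY RID, FYFW ROWS UNBOUNDED PRECEDING) AS YTD_VOL,
SUM(FAILED_VOL) OVER (PARTITION BY RID, SUBSTR(TRIM(FYFW), 1, 4) ORDER BY RID, FYFW ROWS UNBOUNDED PRECEDING) AS YTD_FAILED_VOL
FROM DUMMY_TABLE
) t
WHERE FYFW = '202401';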

Count observations arranged in multiple columns

I have a database with species IDs in the rows (very large) and the places where they occur in the columns (several sites). I need a summary of how many species there are per site. My observations are categorical in some cases (present) or numerical (number of individuals), because they come from different database sources. There are also several NAs throughout the database.
In R, I have been using functions that count observations for one site at a time only.
I would appreciate any help on how to count the observations from the different columns at the same time.
You could do just:
SELECT COUNT(*)
FROM tables
WHERE conditions
And in the conditions specify the conditions on the different columns:
WHERE t.COLUMN1 = 'THIS' AND t.COLUMN2 = 'THAT'
or with a SUM CASE (probably the best idea in general):
SELECT grfield,
SUM(CASE when a=1 then 1 else 0 end) as tcount1,
SUM(CASE when a=2 then 1 else 0 end) as tcount2
FROM T1
GROUP by grfield;
Or in a more complex way you could do a subquery inside the count:
SELECT COUNT(*) FROM
(
SELECT DISTINCT D
FROM T1
INNER JOIN T2
ON T1.A = T2.B
) AS subquery;
You could also do several counts in subqueries; the possibilities are endless.
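Applied to the species/site layout above, the SUM/COUNT CASE pattern counts every site column in one pass. A minimal sketch, where species_obs, site1, and site2 are hypothetical table/column names (if NAs are stored as a literal string rather than NULL, the condition would also need to exclude that value):
SELECT
COUNT(CASE WHEN site1 IS NOT NULL THEN 1 END) AS site1_species,
COUNT(CASE WHEN site2 IS NOT NULL THEN 1 END) AS site2_species
FROM species_obs;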

will converting from dateFrom/dateTo to period data type improve performance?

I have a really slow query and I'm trying to speed it up.
I have a target date range (dateFrom/dateTo) defined in a table with only one row, which I need to use as a limit against a table with millions of rows. Is there a best practice for this?
I started with one table with one row with dateFrom and dateTo fields. I can limit the rows in the large table by CROSS JOINing it with the small table and using the WHERE clause, like:
select
count(*)
from
tblOneRow o, tblBig b
where
o.dateFrom < b.dateTo and
o.dateTo >= b.dateFrom
or I can inner join the tables on the date range, like:
select
count(*)
from
tblOneRow o inner join
tblBig b on
o.dateFrom < b.dateTo and
o.dateTo >= b.dateFrom
but I thought if I changed my single-row table to use one field with a PERIOD data type instead of two fields with DATE data types, it could improve the performance. Is this a reasonable assumption? The explain isn't showing a time difference if I change it to:
select
count(*)
from
tblOneRow o inner join
tblBig b on
begin(o.date) < b.dateTo and
end(o.date) >= b.dateFrom
or if I convert the small table's date range to a PERIOD data type and join ON P_INTERSECT, like:
select
count(*)
from
tblOneRow o inner join
tblBig b on
o.date p_intersect period(b.dateFrom, b.dateTo + 1) is not null
To help the parsing engine with this join, would I need to define the fields on the large table with a PERIOD data type instead of two dates? I can't do that, as I don't own that table, but if that's the case, I'll give up on improving performance with this method.
Thanks for your help.
I don't expect any difference between the first three Selects; the Explain should be the same, a product join (the optimizer should expect exactly one row, but as it's duplicated to all AMPs, the estimated size should be the number of AMPs in your system). The last Select should be worse, because you apply a calculation (OVERLAPS would be more appropriate, but probably not better).
One way to improve this single-row cross join would be a view (select date '...' as dateFrom, date '...' as dateTo) instead of the single-row table. This should resolve the dates and result in hard-coded dateFrom/dateTo values instead of a product join.
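A minimal sketch of such a view; the view name and the two dates are placeholders only:
REPLACE VIEW vDateRange AS
SELECT DATE '2023-01-01' AS dateFrom,
DATE '2023-12-31' AS dateTo;
The big table would then be joined against vDateRange instead of tblOneRow.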
Similarly, when you switch to scalar subqueries:
select
count(*)
from
tblBig b
where
(select min(dateFrom) from tblOneRow) < b.dateTo
and
(select min(dateTo) from tblOneRow) >= b.dateFrom

Split-apply-combine in SQLite

Is there an SQLite equivalent of by or the split-apply-combine strategy?
Specifically, I have a table with columns firm,flag. firm is an integer that takes on a few hundred values (a firm id), flag is an integer taking on the values {0,1}. There are hundreds of entries per firm. I would like to compute the mean of flag for each firm, then store that in the same table (not efficient, I know, as each value will be repeated multiple times).
You could use a subquery:
UPDATE MyTable
SET FlagAverage = (SELECT AVG(flag)
FROM MyTable AS T2
WHERE T2.firm = MyTable.firm)
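This assumes the FlagAverage column already exists; if not, it has to be added first, for example (the column name is taken from the update above, the type is just illustrative):
ALTER TABLE MyTable ADD COLUMN FlagAverage REAL;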

Moving average filter in postgresql

I have a query that computes the moving average over the last 7 days in a table. My table has two columns: date_of_data, which is of date type and forms a date series with a one-day interval, and val, which is a float.
with B as
(SELECT date_of_data, val
FROM mytable
group by date_of_data
order by date_of_data)
select
date_of_data, val, avg(val) over (order by date_of_data rows 7 preceding) as mean7
from B
order by date_of_data;
I want to compute a moving filter over 7 days, meaning that for every row the moving window should contain the 3 preceding days, the row itself, and the 3 following days. I cannot find a command to take the succeeding rows into account. Can anybody help me with this?
Try this:
select date_of_data,
val,
avg(val) over(order by date_of_data ROWS BETWEEN 3 preceding AND 3 following) as mean7
from mytable
order by date_of_data;
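ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING makes the frame the current row plus the three rows on either side, i.e. seven rows in total, which matches a 7-day window as long as the date series really has exactly one row per day.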
