Calculate moving average of a dataset - U-SQL

A sample representation of my dataset is below. I want to calculate the moving 7-day average of total employees for a given set of dept, subgroup and team at a given date.
Is there some way I can pass this rowset to a C# method that can calculate the moving average? Is there another, more efficient way to do this?

Based on your data, I did the following queries:
@data =
    SELECT * FROM
        (VALUES
            (new DateTime(2019,01,01), "D1", "S1", "T1", 20),
            (new DateTime(2019,01,02), "D1", "S1", "T1", 33),
            (new DateTime(2019,01,03), "D1", "S1", "T1", 78),
            (new DateTime(2019,01,05), "D1", "S2", "T2", 77)
        ) AS T(date, Dept, Subgroup, Team, Total_Employees);
@moving_Average_Last_7_Days =
    SELECT DISTINCT
        current.date,
        current.Dept,
        current.Subgroup,
        current.Team,
        current.Total_Employees,
        AVG(lasts.Total_Employees) OVER(PARTITION BY lasts.Dept, lasts.Subgroup, lasts.Team, current.date) AS Moving_Average_Last_7_Days
    FROM @data AS current
         CROSS JOIN
         @data AS lasts
    // keep only pairs from the same dept/subgroup/team within the trailing 7-day window
    WHERE current.Dept == lasts.Dept
      AND current.Subgroup == lasts.Subgroup
      AND current.Team == lasts.Team
      AND (current.date - lasts.date).Days >= 0
      AND (current.date - lasts.date).Days <= 6;
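If you want to eyeball the result, you could finish the script with an OUTPUT statement; a minimal sketch (the output path is just an example):
OUTPUT @moving_Average_Last_7_Days
TO "/output/moving_average.csv"
USING Outputters.Csv(outputHeader: true);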
Please tell me if this is what you want to achieve!

Combining COUNTA() and AVERAGEX() in DAX / Power BI

I have a simple data set with training sessions for some athletes. Let's say I want to visualize how many training sessions are done, as an average over the number of athletes, either in total or split by the clubs that exist. I hope the data set is somewhat self-describing.
To normalize the number of activities by the number of athletes, I use two measures:
TotalSessions = COUNTA(Tab_Sessions[Session key])
AvgAthlete = AVERAGEX(VALUES(Tab_Sessions[Athlete]),[TotalSessions])
I use AvgAthlete as the value in both visuals shown below. If I filter on the clubs, the values are as expected, but with no filter applied I get some strange values.
What I guess happens is that since Athlete B doesn't do any strength, Athlete B is not included in the norming factor for strength. Is there a DAX function that can solve this?
If I didn't have the training sessions as a hierarchy (Type-Intensity), it would be pretty straightforward to do some kind of workaround with a calculated column, but that won't work with hierarchical categories. The expected results, calculated in Excel, are shown below:
Data set as csv:
Session key;Club;Athlete;Type;Intensity
001;Fast runners;A;Cardio;High
002;Fast runners;A;Strength;Low
003;Fast runners;B;Cardio;Low
004;Fast runners;B;Cardio;High
005;Fast runners;B;Cardio;High
006;Brutal boxers;C;Cardio;High
007;Brutal boxers;C;Strength;High
If you specifically want to aggregate this across whatever choice you have made in your Club selection, then you can write a simple measure that does that:
AvgAthlete =
VAR _athletes =
    CALCULATE (
        DISTINCTCOUNT ( Tab_Sessions[Athlete] ),
        ALLEXCEPT ( Tab_Sessions, Tab_Sessions[Club] )
    )
RETURN
    DIVIDE ( [TotalSessions], _athletes )
Here we use a distinct count of values in the Athlete column, with all filters removed apart from those on the Club column. This is, as far as I interpret your question, the denominator you are after.
Divide the total number of sessions by this number of athletes. Here is the result:
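As a quick sanity check against the sample CSV (my arithmetic, not part of the original answer): Fast runners have 5 sessions and 2 athletes (5 / 2 = 2.5), Brutal boxers have 2 sessions and 1 athlete (2 / 1 = 2), and with no club filter the Strength rows divide 2 sessions by all 3 athletes (2 / 3 ≈ 0.67) rather than by only the 2 athletes who logged strength sessions.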

Oracle 11g PLSQL - Splitting A Record Out Into Constituent Records Over Time - Row Generating

I have a dataset (a view) that has a numeric field "WR_EST_MHs". If that field exceeds a certain number of man hours (120 or 60, depending on 2 other fields' values), I need to split it out into constituent records and spread those hours over future weeks.
The OH_UG_Key and 1kMCM_Flag fields determine the threshold for splitting. For example, if the OH_UG = 1 AND 1kMCM_Flag = 'N' and the WR_EST_MHs > 120, then spread the WR_EST_MHs value over as many records as is necessary, in 120 MH increments, changing only the WRSchedDate and WRSchedDate_Key fields (advancing each by one week).
Each OH_UG / 1kMCM_Flag / WR_EST_MHs scenario is as follows:
This is an example of what I need to do:
I thought that something like this might work, but I haven't worked with levels before:
with cte as
  (select * from "STJOF"."vfactScheduledWAWork")
select WR_Key, WP_Key, WRSchedDate, DistSA_Key_Hash, CrewHQ_Key_Hash, Priority_Key_Hash, JobType_Key_Hash, WRStatus_Key_Hash, PerfBy_Key, OHUG_Key, 1kMCM_Flag, WR_EST_MHs
from cte cross join table(cast(multiset(select level from dual
                                        connect by level >= WR_EST_MHs / 120
                                       ) as sys.odcinumberlist))
order by WR_Key;
I also thought this could be done with a "tally table" which I have a little experience with. I really don't know where to begin on this one.
So I would say that a "Tally Table" will work if it is applied correctly. (Or, in this case, a tally view.)
First, break the logic for the hour breakout into a function, so we don't have CASE WHEN expressions everywhere, like so:
CREATE OR REPLACE FUNCTION get_hour_breakout(in_ohug_key IN NUMBER, in_1kmcm_flag IN VARCHAR2, in_tot_hours IN NUMBER)
RETURN NUMBER
IS
  hours NUMBER;
BEGIN
  hours :=
    CASE
      WHEN in_ohug_key = 2 AND in_1kmcm_flag = 'N' AND in_tot_hours > 60 THEN 60
      WHEN in_ohug_key = 2 AND in_1kmcm_flag = 'Y' AND in_tot_hours > 60 AND in_tot_hours <= 120 THEN 60
      WHEN in_ohug_key = 2 AND in_1kmcm_flag = 'Y' AND in_tot_hours > 120 THEN 120
      ELSE 120
    END;
  RETURN hours;
END get_hour_breakout;
This way, if the hour breakout logic changes, it can be tweaked in one place.
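As a quick sanity check, you could call it directly (example values mine, not from the original):
SELECT get_hour_breakout(2, 'Y', 150) FROM dual; -- flag 'Y' and 150 > 120, so returns 120
SELECT get_hour_breakout(1, 'N', 500) FROM dual; -- no WHEN branch matches, so falls through to the default 120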
Second, join to a dynamic "tally" view like so:
select vwrk.wr_key,
       vwrk.wp_key,
       vwrk.wrscheddate + idxkey.nnn * 7 wrscheddate,
       to_char(vwrk.wrscheddate + idxkey.nnn * 7, 'yyyymmdd') wrscheddate_key,
       vwrk.ohug_key,
       vwrk.kmcm_flag,
       case when (vwrk.wr_est_mhs - idxkey.nnn * get_hour_breakout(vwrk.ohug_key, vwrk.kmcm_flag, vwrk.wr_est_mhs)) >= get_hour_breakout(vwrk.ohug_key, vwrk.kmcm_flag, vwrk.wr_est_mhs)
            then get_hour_breakout(vwrk.ohug_key, vwrk.kmcm_flag, vwrk.wr_est_mhs)
            else vwrk.wr_est_mhs - idxkey.nnn * get_hour_breakout(vwrk.ohug_key, vwrk.kmcm_flag, vwrk.wr_est_mhs)
       end wr_est_mhs
from yourView vwrk
     inner join (select rownum - 1 nnn
                 from (select 1 just_a_column
                       from dual
                       connect by level <= 52
                      )
                ) idxkey
     on vwrk.wr_est_mhs / get_hour_breakout(vwrk.ohug_key, vwrk.kmcm_flag, vwrk.wr_est_mhs) > idxkey.nnn
By using connect by level we, in effect, generate a batch of zero-indexed rows; then, by joining to it where the total hours divided by the breakout exceed the generated index, we get one row per increment for each group.
For example, if the function returns 120 and the hours are 100 you get a single row, so it stays 1 to 1. If the function returns 120 and the hours are 500, however, you get 5 rows because 500/120=4.1666666…, which in the join gives rows 4,3,2,1,0. Then the rest is simple math to determine the number of hours per breakout.
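To see that arithmetic in isolation, here is a standalone sketch where the literals 500 and 120 stand in for the real column value and function result:
select rownum - 1 as nnn,
       case when 500 - (rownum - 1) * 120 >= 120 then 120
            else 500 - (rownum - 1) * 120
       end as hours_this_week
from dual
connect by level <= ceil(500 / 120);
-- five rows: 120, 120, 120, 120 and a final 20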
This could also be improved by moving the function call into the lower view so it is only evaluated once per row. And the inline tally view could be made into its own view, depending on how much maintainability you need to build into it.

SQLite order results by smallest difference

In many ways this question follows on from my previous one. I have a table that is pretty much identical
CREATE TABLE IF NOT EXISTS test
(
id INTEGER PRIMARY KEY,
a INTEGER NOT NULL,
b INTEGER NOT NULL,
c INTEGER NOT NULL,
d INTEGER NOT NULL,
weather INTEGER NOT NULL);
in which I would typically have entries such as
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30100306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30140306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,5,5,10100306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,5,5,5,11100306);
INSERT INTO test (a,b,c,d,weather) VALUES(5,5,5,5,21101306);
Typically this table would have multiple rows with some or all of the b, c and d values being identical but with different a and weather values. As per the answer to my other question I can certainly issue
WITH cte AS (SELECT *, DENSE_RANK() OVER (ORDER BY (b=2) + (c=3) + (d=4) DESC) rn FROM test where a = 1) SELECT * FROM cte WHERE rn < 3;
No issues thus far. However, I have one further requirement, which arises as a result of the weather column. Although this value is an integer, it is in fact a composite where each digit represents a "banded" weather condition. Take for example weather = 20100306. Here 2 represents the wind direction divided up into 45 degree bands on the compass, 0 represents a wind speed range, 1 indicates precipitation as snow, etc. What I need to do now while obtaining my ordered results is to allow for weather differences. Take for example the first two rows:
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30100306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30140306);
Though otherwise similar, they represent rather different weather conditions: the fourth digit is 4 as opposed to 0, indicating a higher precipitation intensity band. The WITH cte... above would rank the first two rows at the top, which is fine. But what if I would rather have the row that differs the least from an incoming "weather condition" of 30130306? I would clearly like to have the second row appearing at the top. Once again, I could live with the "raw" result returned by WITH cte... and then drill down to the right row based on my current "weather condition" in Java. However, once again I find myself thinking that there is perhaps a rather neat way of doing this in SQL that is outwith my skill set. I'd be most obliged to anyone who might be able to tell me how/whether this can be done using just SQL.
You can sort the results 1st by DENSE_RANK() and 2nd by the absolute difference of weather and the incoming "weather condition":
WITH cte AS (
SELECT *,
DENSE_RANK() OVER (ORDER BY (b=2) + (c=3) + (d=4) DESC) rn
FROM test
WHERE a = 1
)
SELECT a,b,c,d,weather
FROM cte
WHERE rn < 3
ORDER BY rn, ABS(weather - ?);
Replace ? with the value of that incoming "weather condition".
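For the two sample rows above and an incoming condition of 30130306, the sort keys work out to ABS(30100306 - 30130306) = 30000 and ABS(30140306 - 30130306) = 10000, so the second row sorts first, as desired. Bear in mind that plain integer subtraction weights the high-order digits (here the wind direction) most heavily; if every digit's band difference should count equally, you would have to compare the digits individually instead.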

Automatically count DB records by interval and write result periodically to aggregation table

I have a SQLite3 DB with the following 3-column layout:
typ (1=gas or 0=electrical power) | time (seconds since epoch) | value (float)
In there, I document events from a gas meter which fires every 10 liters of consumed gas. This is (when the gas heating is active) once every ~20 seconds. The value written together with the timestamp is 0 (zero).
I want to automatically fill an aggregation table with the count of all records within an interval of 10 minutes.
I had success with this query to get the counts within the intervals:
select time/600*600+600 _time, count(*) _count
from data
where typ = 1 and value = 0
group by _time
order by _time
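Since time is an integer, time/600*600 truncates each timestamp down to a 10-minute boundary, and the +600 labels the bucket with its end time. For example, time = 1000123 gives 1000123/600*600 = 999600, so that row is counted in the bucket labelled 1000200.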
But how would I achieve the following:
run this query regularly every 10 minutes (or at every INSERT with a TRIGGER?) at xx:10 / xx:20 / xx:30 / ...
write the resulting count of only the last 10 minutes to an aggregation table, together with the interval end time.
I of course could do this with a program (e.g. PHP) but I'd prefer a DB-only solution if possible.
Thanks for any help.
This trigger will run for every inserted row, and tries to insert a corresponding row in an aggregate table if one does not already exist. Then it increments the counter value in the aggregate table for the timespan of the newly inserted row.
create trigger data_count_agg after insert on data  -- the trigger name is arbitrary
when new.typ = 1 and new.value = 0                  -- count only the gas-meter events, as in the query above
begin
    -- make sure a row exists for this 10-minute interval
    insert or ignore into aggregateData(startTime, counter) values ((new.time / 600) * 600, 0);
    -- then count the new event in its interval
    -- (buckets are keyed by interval start here; add 600 if you prefer the end time, as in the query above)
    update aggregateData set counter = counter + 1 where startTime = (new.time / 600) * 600;
end;
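For the insert or ignore to do its job, aggregateData needs a uniqueness constraint on startTime. A minimal sketch of such a table (its layout is my assumption; only the name appears in the trigger):
create table if not exists aggregateData (
    startTime integer primary key, -- interval start, seconds since epoch, on a 600 s grid
    counter integer not null default 0
);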
I think that I found an easier solution which in the end creates the same result:
Just turn my aggregate query into a view:
CREATE VIEW _aggregate AS
select time/600*600+600 _time, count(*) _count
from data
where typ = 1 and value = 0
group by _time
order by _time;
This gives me exactly my desired result if I do a:
select * from _aggregate
It's good enough to have the aggregated values at runtime and not to store them. Or do you see a substantial difference from your solution?

How to UPDATE multiple columns using a correlated subquery in SQLite?

I want to update multiple columns in a table using a correlated subquery. Updating a single column is straightforward:
UPDATE route
SET temperature = (SELECT amb_temp.temperature
                   FROM amb_temp
                   WHERE amb_temp.location = route.location);
However, I'd like to update several columns of the route table. As the subquery is much more complex in reality (JOIN with a nested subquery using SpatiaLite functions), I want to avoid repeating it like this:
UPDATE route
SET
temperature = (SELECT amb_temp.temperature
               FROM amb_temp
               WHERE amb_temp.location = route.location),
error = (SELECT amb_temp.error
         FROM amb_temp
         WHERE amb_temp.location = route.location),
Ideally, SQLite would let me do something like this:
UPDATE route
SET (temperature, error) = (SELECT amb_temp.temperature, amb_temp.error
                            FROM amb_temp
                            WHERE amb_temp.location = route.location)
Alas, that is not possible (at least not before SQLite 3.15.0, which added row-value support). Can this be solved in another way?
Here's what I've been considering so far:
use INSERT OR REPLACE as proposed in this answer. It seems it's not possible to refer to the route table in the subquery.
prepend the UPDATE query with a WITH clause, but I don't think that is useful in this case.
For completeness' sake, here's the actual SQL query I'm working on:
UPDATE route SET (temperature, time_distance) = ( -- (C)
SELECT -- (B)
temperature.Temp,
MIN(ABS(julianday(temperature.Date_HrMn)
- julianday(route.date_time))) AS datetime_dist
FROM temperature
JOIN (
SELECT -- (A)
*, Distance(stations.geometry,route.geometry) AS distance
FROM stations
WHERE EXISTS (
SELECT 1
FROM temperature
WHERE stations.USAF = temperature.USAF
AND stations.WBAN_ID = temperature.NCDC
LIMIT 1
)
GROUP BY stations.geometry
ORDER BY distance
LIMIT 1
) tmp
ON tmp.USAF = temperature.USAF
AND tmp.WBAN_ID = temperature.NCDC
)
High-level description of this query:
using geometry (= longitude & latitude) and date_time from the route table,
(A) find the weather station (stations table, uniquely identified by the USAF and NCDC/WBAN_ID columns)
closest to the given longitude/latitude (geometry)
for which temperatures are present in the temperature table
(B) find the temperature table row
for the weather station found above
closest in time to the given timestamp
(C) store the temperature and "time_distance" in the route table
