I am attempting to find the top n records when grouped by multiple attributes. I believe it is related to this problem, but I am having difficulty adapting the solution described to my situation.
To simplify, I have a table with columns (did is short for device_id):
id int
did int
dateVal dateTime
For each day, I am trying to find the top n device_ids with the most rows.
For example (ignoring id and the time part of dateTime),
did dateVal
1 2017-01-01
1 2017-01-01
1 2017-01-01
2 2017-01-01
3 2017-01-01
3 2017-01-01
1 2017-01-02
1 2017-01-02
2 2017-01-02
2 2017-01-02
2 2017-01-02
3 2017-01-02
Finding the top 2 would yield...
1, 2017-01-01
3, 2017-01-01
2, 2017-01-02
1, 2017-01-02
My current naive approach is only giving me the top 2 across all dates.
--Using SQLite
select date(dateVal) || did
from data
group by date(dateVal), did
order by count(*) desc
limit 2
I'm using the concatenation operator so that I can later extract the rows.
I am using SQLite, but any general SQL explanation would be appreciated.
Similarly to this question, define a CTE that computes all device counts for your desired groups, then use it in a WHERE ... IN subquery, limited to the top 2 devices for that date:
WITH device_counts AS (
    -- one row per device per calendar day
    SELECT did, date(dateVal) AS dateval, COUNT(*) AS device_count
    FROM data
    GROUP BY did, date(dateVal)
)
SELECT did, dateval
FROM device_counts AS DC_outer
WHERE did IN (
    -- the two busiest devices on the outer row's day
    SELECT did
    FROM device_counts AS DC_inner
    WHERE DC_inner.dateval = DC_outer.dateval
    ORDER BY DC_inner.device_count DESC
    LIMIT 2
)
ORDER BY dateval, did
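To check the idea against the sample data from the question, here is a minimal sketch that runs the same kind of query through Python's built-in sqlite3 module. The time-of-day values, the `day` alias, and the tie-break on `did` are assumptions added for illustration; they are not part of the original question or answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, did INT, dateVal TEXT)")

# Sample rows from the question; the time part is made up for illustration.
rows = [
    (1, "2017-01-01 08:00:00"), (1, "2017-01-01 09:00:00"), (1, "2017-01-01 10:00:00"),
    (2, "2017-01-01 08:30:00"),
    (3, "2017-01-01 11:00:00"), (3, "2017-01-01 12:00:00"),
    (1, "2017-01-02 08:00:00"), (1, "2017-01-02 09:00:00"),
    (2, "2017-01-02 08:00:00"), (2, "2017-01-02 09:00:00"), (2, "2017-01-02 10:00:00"),
    (3, "2017-01-02 11:00:00"),
]
conn.executemany("INSERT INTO data (did, dateVal) VALUES (?, ?)", rows)

query = """
WITH device_counts AS (
    SELECT did, date(dateVal) AS day, COUNT(*) AS device_count
    FROM data
    GROUP BY did, date(dateVal)
)
SELECT day, did, device_count
FROM device_counts AS dc_outer
WHERE did IN (
    SELECT did
    FROM device_counts AS dc_inner
    WHERE dc_inner.day = dc_outer.day
    -- assumed tie-break: lower did wins when counts are equal
    ORDER BY dc_inner.device_count DESC, dc_inner.did
    LIMIT 2
)
ORDER BY day, device_count DESC
"""
top_two = conn.execute(query).fetchall()
for day, did, device_count in top_two:
    print(day, did, device_count)
```

Changing `LIMIT 2` changes n. Without an explicit tie-break, which of several equally busy devices is kept is not defined.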
I tested the query using SQL Server:
-- ranks (did, dateVal) pairs by row count across all dates
select top 2 did, dateVal
from (select did, dateVal, count(*) as c
from test
group by did, dateVal) as t
order by t.c desc
I am learning SQLite and wondering whether there is a simple way to add sequential numbering to the output of a query. Below is an example of what I am trying to achieve.
For instance, I have the following query:
SELECT
splTicker AS 'Ticker',
count(splTicker) AS '# of Splits'
FROM Splits
GROUP BY splTicker
ORDER BY count(splTicker) DESC, splTicker ASC;
The output of this query is as follows:
bash-3.2$ sqlite3 myShares < Queries/Split.sql
Ticker # of Splits
---------- -----------
AI.PA 7
ASML.AS 3
BN.PA 3
ALTR.LS 2
BOKA.AS 2
DG.PA 2
...
SON.LS 1
SU.PA 1
SW.PA 1
TEC.PA 1
UMI.BR 1
VIV.PA 1
VPK.AS 1
I am trying to add a sequential number to the rows to obtain the following output:
# Ticker # of Splits
-- ---------- -----------
1 AI.PA 7
2 ASML.AS 3
3 BN.PA 3
4 ALTR.LS 2
5 BOKA.AS 2
6 DG.PA 2
...
Currently, I use a workaround and add the row numbers post-query in Perl. I would like to know whether I could do this directly in SQLite. The idea seems simple, but I have not found a solution yet. Any help would be appreciated.
Best regards,
GAM
Try this:
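WITH counts AS (
    SELECT splTicker, count(*) AS n
    FROM Splits
    GROUP BY splTicker
)
SELECT
    -- number of grouped rows that sort at or before this one
    (SELECT COUNT(*)
     FROM counts AS c2
     WHERE c2.n > c1.n
        OR (c2.n = c1.n AND c2.splTicker <= c1.splTicker)) AS '#',
    c1.splTicker AS 'Ticker',
    c1.n AS '# of Splits'
FROM counts AS c1
ORDER BY c1.n DESC, c1.splTicker ASC;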
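For the numbers to come out as 1, 2, 3, ... in display order, the counting subquery has to use the same ordering as the final ORDER BY (split count descending, then ticker ascending) and has to count grouped rows rather than raw rows. The sketch below tries this on a small made-up Splits table through Python's sqlite3 module; the sample rows and the `counts` CTE name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Splits (splTicker TEXT)")

# Made-up sample: AI.PA split 3 times, ASML.AS and BN.PA twice, DG.PA once.
tickers = ["AI.PA"] * 3 + ["BN.PA"] * 2 + ["ASML.AS"] * 2 + ["DG.PA"]
conn.executemany("INSERT INTO Splits (splTicker) VALUES (?)", [(t,) for t in tickers])

query = """
WITH counts AS (
    SELECT splTicker, COUNT(*) AS n
    FROM Splits
    GROUP BY splTicker
)
SELECT
    (SELECT COUNT(*)
     FROM counts AS c2
     WHERE c2.n > c1.n
        OR (c2.n = c1.n AND c2.splTicker <= c1.splTicker)) AS row_num,
    c1.splTicker,
    c1.n
FROM counts AS c1
ORDER BY c1.n DESC, c1.splTicker ASC
"""
numbered = conn.execute(query).fetchall()
for row_num, ticker, n in numbered:
    print(row_num, ticker, n)
```

On SQLite 3.25 or later, the window function ROW_NUMBER() OVER (ORDER BY ...) is an alternative that avoids the correlated subquery.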
I use SQLite to log every access to my server by every user. Every time a user uses a function, I append a record to the database.
The database looks like:
usr_id fun_id
3 1 // user_3 used function_1
2 13 // user_2 used function_13
3 11 // user_3 used function_11
2 1 // user_2 used function_1
7 2 // ...
usr_id stands for a user, fun_id stands for functions like login / send_text / logout...
I want to know each function's usage, that is, who used it and how many times, so that I can plot it with gnuplot. In short, I need this for plotting:
fun_id usr_id used_count
1 2 1 // user_2 used function_1 once
1 3 1 // user_3 used function_1 once
2 7 1 // user_7 used function_2 once
13 2 3 // user_2 used function_13 three times
How can I generate this with a SQL query?
Just use count(*) along with a grouping:
select fun_id, usr_id, count(*) as used_count
from tablename
group by fun_id, usr_id
order by fun_id, usr_id;
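As a quick check, the sketch below runs this grouping through Python's sqlite3 module. The table name `access_log` is hypothetical, and the two extra (2, 13) rows are an assumption added so that user 2's count for function 13 comes out as three, as in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (usr_id INT, fun_id INT)")

# The five rows shown in the question...
rows = [(3, 1), (2, 13), (3, 11), (2, 1), (7, 2)]
# ...plus two assumed repeats so one pair has a count above 1.
rows += [(2, 13), (2, 13)]
conn.executemany("INSERT INTO access_log (usr_id, fun_id) VALUES (?, ?)", rows)

usage = conn.execute(
    """
    SELECT fun_id, usr_id, COUNT(*) AS used_count
    FROM access_log
    GROUP BY fun_id, usr_id
    ORDER BY fun_id, usr_id
    """
).fetchall()

# Print whitespace-separated columns, a layout gnuplot can read directly.
for fun_id, usr_id, used_count in usage:
    print(fun_id, usr_id, used_count)
```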
Using SQL Server 2008 R2, we are looking for a way to select the hours of an employee's shift that fall during the night, which in this case means between 22:00 and 06:00 the next day.
Our problem is how to get the hours when the shift crosses midnight, or how to get the overlap when a shift runs from 05:30 to 22:30 and overlaps the night at both the beginning and the end of the shift.
Here is an example. This is the data available in the database and the result we are looking for:
startDateTime | endDateTime | nightHours
--------------------------+---------------------------+----------------
2012-07-04 05:00:00.000 2012-07-04 23:00:00.000 2
2012-07-04 18:00:00.000 2012-07-05 05:00:00.000 7
Does anyone have an example or a few good pointers that we can use?
This may be overly complex, but it does work. We use a number of CTEs to construct useful intermediate representations:
declare @Times table (
ID int not null,
StartTime datetime not null,
EndTime datetime not null
)
insert into @Times (ID,StartTime,EndTime)
select 1,'2012-07-04T05:00:00.000','2012-07-04T23:00:00.000' union all
select 2,'2012-07-04T18:00:00.000','2012-07-05T05:00:00.000'
;With Start as (
select MIN(DATEADD(day,DATEDIFF(day,0,StartTime),0)) as StartDay from @Times
), Ends as (
select MAX(EndTime) EndTime from @Times
), Nights as (
select DATEADD(hour,-2,StartDay) as NightStart,DATEADD(hour,6,StartDay) as NightEnd from Start
union all
select DATEADD(DAY,1,NightStart),DATEADD(DAY,1,NightEnd) from Nights n
inner join Ends e on n.NightStart < e.EndTime
), Overlaps as (
select
t.ID,
CASE WHEN n.NightStart > t.StartTime THEN n.NightStart ELSE t.StartTime END as StartPeriod,
CASE WHEN n.NightEnd < t.EndTime THEN n.NightEnd ELSE t.EndTime END as EndPeriod
from
@Times t
inner join
Nights n
on
t.EndTime > n.NightStart and
t.StartTime < n.NightEnd
), Totals as (
select ID,SUM(DATEDIFF(hour,StartPeriod,EndPeriod)) as TotalHours
from Overlaps
group by ID
)
select
*
from
@Times t
inner join
Totals tot
on
t.ID = tot.ID
Result:
ID StartTime EndTime ID TotalHours
----------- ----------------------- ----------------------- ----------- -----------
1 2012-07-04 05:00:00.000 2012-07-04 23:00:00.000 1 2
2 2012-07-04 18:00:00.000 2012-07-05 05:00:00.000 2 7
You'll note that I had to add an ID column in order to get my correlation to work.
The Start CTE finds the earliest applicable midnight. The Ends CTE finds the last time for which we need to find overlapping nights. Then, the recursive Nights CTE computes every night between those two points in time. We then join this back to the original table (in Overlaps) to find those periods in each night which apply. Finally, in Totals, we compute how many hours each overlapping period contributed.
This should work for multi-day events. You might want to change the Totals CTE to use minutes, or apply some other rounding functions, if you need to count partial hours.
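The same interval logic (generate each 22:00 to 06:00 night window, clip it against the shift, and add up the clipped lengths) can be sketched outside the database. The Python below is only an illustration of that idea, not a translation of the query; the function name `night_hours` and its parameters are hypothetical. Unlike DATEDIFF(hour, ...), it returns fractional hours.

```python
from datetime import datetime, timedelta

def night_hours(start, end, night_start_hour=22, night_end_hour=6):
    """Hours of the shift [start, end) that fall inside nightly windows."""
    total = timedelta()
    # The first candidate night begins on the day before the shift starts.
    day = datetime(start.year, start.month, start.day) - timedelta(days=1)
    while day < end:
        window_start = day + timedelta(hours=night_start_hour)
        window_end = day + timedelta(days=1, hours=night_end_hour)
        # Clip the night window against the shift.
        overlap_start = max(start, window_start)
        overlap_end = min(end, window_end)
        if overlap_end > overlap_start:
            total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total.total_seconds() / 3600

# The two shifts from the question.
first = night_hours(datetime(2012, 7, 4, 5, 0), datetime(2012, 7, 4, 23, 0))
second = night_hours(datetime(2012, 7, 4, 18, 0), datetime(2012, 7, 5, 5, 0))
print(first, second)
```

For the 05:30 to 22:30 shift mentioned in the question, this gives half an hour at each end, 1.0 in total.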
I think the best way would be a function that takes the start time and end time of the shift. Inside the function, handle two cases: one where the shift starts and ends on the same day, and another where it starts on one day and finishes on the next.
For the case where it starts and finishes on the same day, do:
@TotalOvernightHours = 0
@AMDifference = Datediff(hh, @ShiftStart, earlier of (@ShiftEnd, @6amOnThatDay))
if @AMDifference > 0 then @TotalOvernightHours = @TotalOvernightHours + @AMDifference
@PMDifference = Datediff(hh, later of (@ShiftStart, @10pmOnThatDay), @ShiftEnd)
if @PMDifference > 0 then @TotalOvernightHours = @TotalOvernightHours + @PMDifference
For the case where start and finish are on different days, pretend it is two shifts: the first starts at @ShiftStart and finishes at midnight; the second starts at midnight and finishes at @ShiftEnd. Apply the logic above to each of them.
If you have shifts that are longer than 24 hours, break them up into smaller sub-shifts, with midnight as the divider. So if you have a shift starting on 1 Jun 19:00 and finishing on 3 Jun 5:00, you would end up with three sub-shifts:
1 Jun 19:00 - 1 Jun 24:00
2 Jun 00:00 - 2 Jun 24:00
3 Jun 00:00 - 3 Jun 5:00
For every sub-shift, calculate the overnight hours.
I would probably write one function that calculates overnight hours for a single 24-hour period and another function that breaks the whole shift into 24-hour chunks and feeds them into the first function.
P.S. This is not SQL, only pseudo-code.
P.P.S. This would work only if you have the ability to create functions. It would give you clean, easy-to-read code.
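A rough Python sketch of this two-function design follows. The names `split_at_midnight` and `overnight_hours_same_day` are hypothetical, the year 2012 is assumed for the 1 Jun to 3 Jun example, and both differences are clipped to the sub-shift so that a sub-shift lying entirely before 06:00 or after 22:00 is not overcounted.

```python
from datetime import datetime, timedelta

def split_at_midnight(start, end):
    """Break [start, end) into sub-shifts that never cross midnight."""
    pieces = []
    cursor = start
    while cursor < end:
        next_midnight = datetime(cursor.year, cursor.month, cursor.day) + timedelta(days=1)
        piece_end = min(end, next_midnight)
        pieces.append((cursor, piece_end))
        cursor = piece_end
    return pieces

def overnight_hours_same_day(start, end):
    """Overnight hours for a sub-shift contained in one calendar day."""
    midnight = datetime(start.year, start.month, start.day)
    six_am = midnight + timedelta(hours=6)
    ten_pm = midnight + timedelta(hours=22)
    # Part before 06:00, clipped to the sub-shift.
    early = max(timedelta(0), min(end, six_am) - start)
    # Part after 22:00, clipped to the sub-shift.
    late = max(timedelta(0), end - max(start, ten_pm))
    return (early + late).total_seconds() / 3600

def overnight_hours(start, end):
    return sum(overnight_hours_same_day(s, e) for s, e in split_at_midnight(start, end))

# The long shift from the example: 1 Jun 19:00 to 3 Jun 05:00 (year assumed).
pieces = split_at_midnight(datetime(2012, 6, 1, 19, 0), datetime(2012, 6, 3, 5, 0))
total = overnight_hours(datetime(2012, 6, 1, 19, 0), datetime(2012, 6, 3, 5, 0))
print(len(pieces), total)
```

The three sub-shifts contribute 2, 8 and 5 overnight hours respectively.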
I have a table with the following layout.
Email Blast Table
EmailBlastId | FrequencyId | UserId
---------------------------------
1 | 5 | 1
2 | 2 | 1
3 | 4 | 1
Frequency Table
Id | Frequency
------------
1 | Daily
2 | Weekly
3 | Monthly
4 | Quarterly
5 | Bi-weekly
I need to come up with a grid display on my asp.net page as follows.
Email blasts per month.
UserId | Jan | Feb | Mar | Apr |..... Dec | Cumulative
-----------------------------------------------------
1 7 6 6 7 6 #xx
The only way I can think of doing this is as below, for each month have a case statement.
select UserId,
SUM(
CASE WHEN FrequencyId = 1 THEN 31
WHEN FrequencyId = 2 THEN 4
WHEN FrequencyId = 3 THEN 1
WHEN FrequencyId = 4 THEN 1
WHEN FrequencyId = 5 THEN 2 END) AS Jan,
SUM(
CASE WHEN FrequencyId = 1 THEN 28 -- 29 in a leap year
WHEN FrequencyId = 2 THEN 4
WHEN FrequencyId = 3 THEN 1
WHEN FrequencyId = 4 THEN 0
WHEN FrequencyId = 5 THEN 2 END) AS Feb
-- ... and so on for the remaining months
FROM EmailBlast
Group BY UserId
Is there a better way of achieving the same result?
Is this for any given year? I'm going to assume you want the schedule for the current year. If you want a future year you can always change the DECLARE @now to specify any future date.
"Once in 2 weeks" (usually known as "bi-weekly") doesn't fit well into monthly buckets (except for February in a non-leap year). Should that possibly be changed to "Twice a month"?
Also, why not store the coefficient in the Frequency table, adding a column called "PerMonth"? Then you only have to deal with the Daily and Quarterly cases (and is it an arbitrary choice that this will happen only in January, April, and so on?).
Assuming that some of this is flexible, here is what I would suggest, assuming this very minor change to the table schema:
USE tempdb;
GO
CREATE TABLE dbo.Frequency
(
Id INT PRIMARY KEY,
Frequency VARCHAR(32),
PerMonth TINYINT
);
CREATE TABLE dbo.EmailBlast
(
Id INT,
FrequencyId INT,
UserId INT
);
And this sample data:
INSERT dbo.Frequency(Id, Frequency, PerMonth)
SELECT 1, 'Daily', NULL
UNION ALL SELECT 2, 'Weekly', 4
UNION ALL SELECT 3, 'Monthly', 1
UNION ALL SELECT 4, 'Quarterly', NULL
UNION ALL SELECT 5, 'Twice a month', 2;
INSERT dbo.EmailBlast(Id, FrequencyId, UserId)
SELECT 1, 5, 1
UNION ALL SELECT 2, 2, 1
UNION ALL SELECT 3, 4, 1;
We can accomplish this using a very complex query (but we don't have to hard-code those month numbers):
DECLARE @now DATE = CURRENT_TIMESTAMP;
DECLARE @Jan1 DATE = DATEADD(MONTH, 1-MONTH(@now), DATEADD(DAY, 1-DAY(@now), @now));
WITH n(m) AS
(
SELECT TOP 12 m = number
FROM master.dbo.spt_values
WHERE number > 0 GROUP BY number
),
months(MNum, MName, StartDate, NumDays) AS
( SELECT m, mn = CONVERT(CHAR(3), DATENAME(MONTH, DATEADD(MONTH, m-1, @Jan1))),
DATEADD(MONTH, m-1, @Jan1),
DATEDIFF(DAY, DATEADD(MONTH, m-1, @Jan1), DATEADD(MONTH, m, @Jan1))
FROM n
),
grp AS
(
SELECT UserId, MName, c = SUM (
CASE x.Id WHEN 1 THEN NumDays
WHEN 4 THEN CASE WHEN MNum % 3 = 1 THEN 1 ELSE 0 END
ELSE x.PerMonth END )
FROM months CROSS JOIN (SELECT e.UserId, f.*
FROM EmailBlast AS e
INNER JOIN Frequency AS f
ON e.FrequencyId = f.Id) AS x
GROUP BY UserId, MName
),
cumulative(UserId, total) AS
(
SELECT UserId, SUM(c)
FROM grp GROUP BY UserID
),
pivoted AS
(
SELECT * FROM (SELECT UserId, c, MName FROM grp) AS grp
PIVOT(MAX(c) FOR MName IN (
[Jan],[Feb],[Mar],[Apr],[May],[Jun],[Jul],[Aug],[Sep],[Oct],[Nov],[Dec])
) AS pvt
)
SELECT p.*, c.total
FROM pivoted AS p
LEFT OUTER JOIN cumulative AS c
ON p.UserId = c.UserId;
Results:
UserId Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec total
1 7 6 6 7 6 6 7 6 6 7 6 6 76
Clean up:
DROP TABLE dbo.EmailBlast, dbo.Frequency;
GO
In fact the schema change I suggested doesn't really buy you much, it just saves you two additional CASE branches inside the grp CTE. Peanuts, overall.
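The per-month rule inside the grp CTE (days in the month for Daily, one blast in January, April, July and October for Quarterly, and the stored PerMonth value otherwise) is easy to sanity-check outside SQL. Here is a small Python sketch of that rule; the function names are hypothetical, and the frequencies are keyed by name rather than by Id purely for readability.

```python
import calendar

# Fixed per-month coefficients, mirroring the PerMonth column.
PER_MONTH = {"Weekly": 4, "Monthly": 1, "Twice a month": 2}

def blasts_in_month(frequency, year, month):
    if frequency == "Daily":
        # Number of days in the month, leap years included.
        return calendar.monthrange(year, month)[1]
    if frequency == "Quarterly":
        # Same rule as MNum % 3 = 1: Jan, Apr, Jul, Oct.
        return 1 if month % 3 == 1 else 0
    return PER_MONTH[frequency]

def monthly_totals(frequencies, year):
    return [
        sum(blasts_in_month(f, year, month) for f in frequencies)
        for month in range(1, 13)
    ]

# User 1 from the sample data: twice a month, weekly and quarterly.
totals = monthly_totals(["Twice a month", "Weekly", "Quarterly"], 2012)
print(totals, sum(totals))
```

For this user the result does not depend on the year, because none of the three frequencies is Daily.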
I think you're going to end up with a lot more complicated logic. Sure Jan has 31 days.. but Feb doesn't... and Feb changes depending on the year. Next, are email blasts sent even on weekends and holidays or are certain days skipped for various reasons... If that's the case then the number of business days for a given month changes each year.
Next the number of full weeks in a given month also changes year by year. What happens to those extra 4 half weeks? Do they go on the current or next month? What method are you using to determine that? For an example of how complicated this gets read: http://en.wikipedia.org/wiki/ISO_week_date Specifically the part where it talks about the first week, which actually has 9 different definitions.
I'm usually not one to say this, but you might be better off writing this with regular code instead of a sql query. Just issue a 'select * from emailblast where userid = xxx' and transform it using a variety of code methods.
Depends on what you're looking for. Suggestion 1 would be to track your actual email blasts (with a date :-).
Without actual dates, whatever you come up with for one month will be the same for every month.
Anyway, if you're going to generalize, then I'd suggest using something other than ints, such as floats or decimals. Since your output based on the tables listed in your post can only ever approximate what actually happens (e.g., January actually has about 4-1/2 weeks, not 4), you'll have a compounding error over any range of months, getting worse the further out you extrapolate. If you output an entire 12 months, for example, your extrapolation will under-estimate by over 4 weeks.
If you use floats or decimals, then you'll be able to come much closer to what actually happens. For starters, find a common unit of measure (I'd suggest using a "day"). E.g., 1 month = 365/12 days; 1 quarter = 365/4 days; 1 two-week period = 14 days; etc.
If you do that, then your user who had 1 per quarter actually had 1 per 91.25 days; 1 per week turns into 1 per 7 days; 1 per two weeks turns into 1 per 14 days.
**EDIT** -- Incidentally, you could store the per-day value in your reference table, so you didn't have to calculate it each time. For example:
Frequency Table
Id | Frequency | Value
-------------------------------
1 | Daily | 1.0
2 | Weekly | .14286
3 | Monthly | .03288
4 | Quarterly | .01096
5 | Once in 2 weeks | .07143
Now do the math: (1/91.25 + 1/7 + 1/14) needs a common denominator (like maybe 91.25 * 14), so it becomes (14/1277.5 + 182.5/1277.5 + 91.25/1277.5).
That adds up to 287.75/1277.5, or .225 emails per day.
Since there are 365/12 days per month, multiply .225 * (365/12) to get 6.85 emails per month.
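The arithmetic above can be reproduced with exact fractions. This is a small sketch, assuming a 365-day year as in the text; the variable names are illustrative only.

```python
from fractions import Fraction

DAYS_PER_YEAR = 365

# Per-day rates for the three frequencies the sample user has.
per_day = {
    "Quarterly": Fraction(4, DAYS_PER_YEAR),  # 1 per 91.25 days
    "Weekly": Fraction(1, 7),
    "Once in 2 weeks": Fraction(1, 14),
}

emails_per_day = sum(per_day.values())
emails_per_month = emails_per_day * Fraction(DAYS_PER_YEAR, 12)

print(round(float(emails_per_day), 3))    # 0.225
print(round(float(emails_per_month), 2))  # 6.85
```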
Your output would then look something like this:
Email blasts per month.
UserId | Jan | Feb | Mar | Apr |..... Dec | Cumulative
-----------------------------------------------------
1 6.85 6.85 6.85 6.85 6.85 #xx
The math may seem a little tedious, but once you work it out in your code, you'll never have to do it again. Your results will be more accurate (I rounded to 2 decimal places, but you could go further out if you wanted to). And if your company is using this data to determine budgets / potential income for the upcoming year, that might be worth it.
Also worth mentioning is that after YOU get done extrapolating (and the error bounds that entails), your consumers of this output will do THEIR OWN extrapolating, not on the raw data, but on your output. So it's kind of a double-whammy of error bounds. The more accurate you can be early on, the more reliable these numbers will be at each subsequent level.
You might want to consider adding a 3rd table called something like Schedule.
You could structure it like this:
MONTH_NAME
DAILY_COUNT
WEEKLY_COUNT
MONTHLY_COUNT
QUARTERLY_COUNT
BIWEEKLY_COUNT
The record for JAN would be
JAN
31
4
1
1
2
Or you could structure it like this:
MONTH_NAME
FREQUENCY_ID
EMAIL_COUNT
and have multiple records for each month:
JAN 1 31
JAN 2 4
JAN 3 1
JAN 4 1
JAN 5 2
I'll let you figure out whether the logic to retrieve this is better than your CASE structure.
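For what it's worth, with the second layout the retrieval reduces to a join and a SUM. Below is a minimal sketch using Python's sqlite3 module, assuming a table named SCHEDULE with the three columns above. The JAN rows come from this answer, and the FEB rows are taken from the CASE expression in the question (non-leap year).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EmailBlast (EmailBlastId INT, FrequencyId INT, UserId INT);
    CREATE TABLE SCHEDULE (MONTH_NAME TEXT, FREQUENCY_ID INT, EMAIL_COUNT INT);

    INSERT INTO EmailBlast VALUES (1, 5, 1), (2, 2, 1), (3, 4, 1);

    INSERT INTO SCHEDULE VALUES
        ('JAN', 1, 31), ('JAN', 2, 4), ('JAN', 3, 1), ('JAN', 4, 1), ('JAN', 5, 2),
        ('FEB', 1, 28), ('FEB', 2, 4), ('FEB', 3, 1), ('FEB', 4, 0), ('FEB', 5, 2);
    """
)

# One row per user and month: sum the scheduled counts of that user's blasts.
rows = conn.execute(
    """
    SELECT e.UserId, s.MONTH_NAME, SUM(s.EMAIL_COUNT) AS blasts
    FROM EmailBlast AS e
    JOIN SCHEDULE AS s ON s.FREQUENCY_ID = e.FrequencyId
    GROUP BY e.UserId, s.MONTH_NAME
    """
).fetchall()

per_month = {(user_id, month): blasts for user_id, month, blasts in rows}
print(per_month)
```

Turning the months into columns would still need a pivot step, either in SQL or in the page code.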