I would like to count parts of the dataset.
This is my dataset:
YEAR ㅣ FIRMCode ㅣ FIRMName
2000 ㅣ 10 ㅣ 1
2001 ㅣ 11 ㅣ 1
.
.
2020 ㅣ 17 ㅣ 1
2000 ㅣ 11 ㅣ 2
.
.
2020 ㅣ 16 ㅣ 2
I want to count the number of types of firm codes each year, regardless of the firm name. The firm codes are from 10 to 20. So my output would look like:
YEAR ㅣ FIRMCode(10) ㅣ FIRMCode(11) ... ㅣ FIRMCode(20)
2000 ㅣ #firms with code10 ㅣ #firms with code11
.
.
2020 ㅣ #firms with code10 ㅣ #firms(11)
Thank you so much in advance.
One way is to use SQL to compute count (distinct FIRMName) within a group and then Proc REPORT or Proc TABULATE to present the distinct counts as columns.
Your data does not appear to model a scenario in which a firm can have multiple codes within a year; if that is the case, DISTINCT is not strictly needed.
Example:
Compute the number of firms with a code within a year, keeping the categorical (long) data structure, and present the counts in a wide layout. The example also shows a transformation of the two-level category into a wide data set (generally not recommended).
data have;
call streaminit (20210425);
do firmname = 'a', 'b', 'c', 'd';
do year = 2000 to 2010;
code = 10 + rand('integer', 0, 5);
output;
end;
end;
run;
proc sql;
create table counts as
select year, code, count(distinct firmname) as firm_ucount
from have
group by year, code
order by year, code;
quit;
proc tabulate data=counts;
class year code;
var firm_ucount;
table year, code * firm_ucount='' * mean='' * f=4.;
run;
proc report data=counts;
columns year firm_ucount,code;
define year / group;
define code / '' across;
define firm_ucount / '#firms with code';
run;
proc transpose data=counts out=want prefix=n_code_;
by year;
id code;
var firm_ucount;
run;
TABULATE, count class in column-dimension
REPORT, count variable ACROSS
TRANSPOSE, code as part of column name
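For readers without SAS, the same count-then-pivot idea can be sketched in Python. This is only an illustration under assumptions: the rows are made-up sample data, and the names `rows`, `distinct_firm_counts`, and `to_wide` are hypothetical, not part of the answer above.

```python
from collections import defaultdict

# Hypothetical sample rows: (year, firm code, firm name).
rows = [
    (2000, 10, "a"), (2000, 10, "b"), (2000, 11, "c"),
    (2001, 10, "a"), (2001, 11, "b"), (2001, 11, "c"),
    (2001, 11, "c"),  # duplicate row, must not be counted twice
]

def distinct_firm_counts(rows):
    """Count distinct firm names per (year, code), like COUNT(DISTINCT firmname)."""
    firms = defaultdict(set)
    for year, code, firm in rows:
        firms[(year, code)].add(firm)
    return {key: len(names) for key, names in firms.items()}

def to_wide(counts, codes):
    """Pivot the long counts into one dict per year with one entry per code."""
    wide = {}
    for year in sorted({year for year, _ in counts}):
        wide[year] = {f"n_code_{code}": counts.get((year, code), 0) for code in codes}
    return wide

counts = distinct_firm_counts(rows)
wide = to_wide(counts, codes=range(10, 13))
for year, cols in wide.items():
    print(year, cols)
```

Note that this sketch fills in 0 for a code with no firms in a year, which is a choice made here; the TRANSPOSE step above leaves such cells missing.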
First, I'm sure there is a cleaner way to do this, but it's the only way I've been able to make the code combine the DX's into one column. Originally they were in separate columns as 0/1 flags and I needed them in one column. I tried the PIVOT function, but was not able to figure it out.
The issue is that I need the paid amounts to be duplicated for each DX a member has. That sounds counterintuitive, but it's what I need for this report.
For example, if member A has COPD, ASTHMA, and DIABETES and the member's paid claims were 40,000, I need the paid amount for that member to reflect 120,000, and so forth.
The code:
SELECT
DX_FLAG
,Sum( AMT_PAID) AS PHARM_PAID_AMT
,Count(DISTINCT(MEMBER_AMISYS_NBR)) AS MEMBER_COUNT
FROM
(SELECT
st.MEMBER_AMISYS_NBR
,ph.PHARMACY_CLAIM_CK
,ph.AMT_PAID
,FILL.DATE_DATE AS Fill_Date
,Coalesce(CASE WHEN DX_ASTHMA = 'ASTHMA' THEN 'Asthma' END,
CASE WHEN DX_COPD = 'COPD' THEN 'COPD' END,
CASE WHEN DX_DIABETES = 'DIABETES' THEN 'DIABETES' END,
CASE WHEN DX_HEART_FAILURE = 'HEART FAILURE' THEN 'HEART_FAILURE' END,
CASE WHEN DX_HYPERTENSION = 'HYPERTENSION' THEN 'HYPERTENSION' END)
AS DX_FLAG
FROM
STATE_OVERALL_MBRS st
JOIN FT_PHARMACY_CLAIM ph ON st.MEMBER_CURR_CK = ph.PRESCRIBER_MEMBER_CURR_CK AND ph.DELETED_IND = 'N'
JOIN DIM_DATE FILL ON ph.FILL_DATE_DIM_CK = FILL.DATE_DIM_CK
WHERE FILL.DATE_DATE BETWEEN '2021-10-01' AND '2022-09-30'
AND ph.PLAN_DIM_CK =10
AND ph.REVERSAL_IND = 'N'
AND ph.AMT_PAID > 0
) rx
My output looks like this:
DX_FLAG         PHARM_PAID_AMT   MEMBER_COUNT
DIABETES        70,000,000       14,144
COPD            38,266,409       6,641
HEART_FAILURE   10,908,000       2,544
ASTHMA          125,000,000      30,000
HYPERTENSION    52,900           22,325
I have tried adding/removing the Distinct from each select statement and the only one that made a difference was removing distinct from this line, in which case I ended up with far too many member counts (even taking into account the duplicate DX counts).
,Count(DISTINCT(MEMBER_AMISYS_NBR)) AS MEMBER_COUNT
The State_Overall_Mbrs table with DX_Flag looks like this, and I needed all the diagnoses to be in one column (with duplicate rows for members depending on how many diagnoses they have):
Member ID Asthma COPD Hypertension Diabetes CHF
55555555 0 1 1 1 0
66666666 1 0 0 1 0
77777777 0 0 1 0 0
Normalize the members table, then join and aggregate; something like this:
SELECT
DX_FLAG
,Sum(AMT_PAID) AS PHARM_PAID_AMT
,Count(DISTINCT(MEMBER_AMISYS_NBR)) AS MEMBER_COUNT
FROM
(SELECT * FROM STATE_OVERALL_MBRS
UNPIVOT (has_dx /* New column to hold the 0 or 1 value */
FOR DX_FLAG IN (Asthma,COPD,Hypertension,Diabetes,CHF)
/* Original column names become the values in new column DX_FLAG */
) nmlz
WHERE has_dx = 1 /* Only unpivot rows with a 1 in original column */
) st
JOIN FT_PHARMACY_CLAIM ph ON st.MEMBER_CURR_CK = ph.PRESCRIBER_MEMBER_CURR_CK AND ph.DELETED_IND = 'N'
JOIN DIM_DATE FILL ON ph.FILL_DATE_DIM_CK = FILL.DATE_DIM_CK
WHERE FILL.DATE_DATE BETWEEN '2021-10-01' AND '2022-09-30'
AND ph.PLAN_DIM_CK =10
AND ph.REVERSAL_IND = 'N'
AND ph.AMT_PAID > 0
GROUP BY DX_FLAG;
Another option to normalize the members table would be to have a subquery for each DX and UNION those, along these lines:
... FROM
(SELECT MEMBER_CURR_CK, MEMBER_AMISYS_NBR, 'Asthma' (VARCHAR(16)) AS DX_FLAG
FROM STATE_OVERALL_MBRS
WHERE Asthma = 1
UNION ALL
SELECT MEMBER_CURR_CK, MEMBER_AMISYS_NBR, 'COPD' (VARCHAR(16)) AS DX_FLAG
FROM STATE_OVERALL_MBRS
WHERE COPD = 1
UNION ALL
...
) st
JOIN ...
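To see why normalizing first gives the duplicated amounts the question asks for, here is a small Python sketch of the same unpivot, join, and aggregate steps. It is an illustration under assumptions: the member flags are copied from the sample table in the question, while the claim amounts and the names `members`, `claims`, `unpivot`, and `summarize` are hypothetical.

```python
from collections import defaultdict

# Member flags copied from the sample table in the question.
DX_COLUMNS = ["Asthma", "COPD", "Hypertension", "Diabetes", "CHF"]
members = {
    "55555555": [0, 1, 1, 1, 0],
    "66666666": [1, 0, 0, 1, 0],
    "77777777": [0, 0, 1, 0, 0],
}

# Hypothetical paid claims as (member id, amount paid); not from the question.
claims = [("55555555", 100.0), ("55555555", 50.0),
          ("66666666", 30.0), ("77777777", 20.0)]

def unpivot(members):
    """Yield one (member id, diagnosis) row per flag set to 1."""
    for member_id, flags in members.items():
        for dx, flag in zip(DX_COLUMNS, flags):
            if flag == 1:
                yield member_id, dx

def summarize(members, claims):
    """Join unpivoted members to claims and aggregate per diagnosis."""
    dx_by_member = defaultdict(list)
    for member_id, dx in unpivot(members):
        dx_by_member[member_id].append(dx)
    paid = defaultdict(float)
    member_ids = defaultdict(set)
    for member_id, amount in claims:
        for dx in dx_by_member[member_id]:
            paid[dx] += amount           # each claim counts once per diagnosis
            member_ids[dx].add(member_id)
    return {dx: (paid[dx], len(member_ids[dx])) for dx in paid}

summary = summarize(members, claims)
for dx, (amount, n_members) in sorted(summary.items()):
    print(dx, amount, n_members)
```

The per-diagnosis totals add up to more than the total of the claims, which is the intended duplication; a COALESCE over the flags would instead assign each member to only the first matching diagnosis.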
"How can i create in sqlite a Table with 365 Rows and how can i insert the dates?"
For example:
Table MyTable
id month date
1 jan 2021-01-01
2 jan 2021-01-02
3 jan 2021-01-03
.
.
.
365 dec 2021-12-31
How can I create this automatically?
Thanks
You can do this with a recursive query:
create table days as
with recursive qq as (
select 1 id, 'jan' month, '2021-01-01' date_col union all
select id + 1, substr('janfebmaraprmayjunjulaugsepoctnovdec', 1 + 3 *
strftime('%m', date(date_col, '+1 day')), -3), date(date_col, '+1 day')
from qq
where id <= 364
)
select * from qq;
The monstrous expression
substr('janfebmaraprmayjunjulaugsepoctnovdec', 1 + 3 * strftime('%m', date(date_col, '+1 day')), -3)
cuts the three-letter month name out of the string based on the month number of the date. It is needed because SQLite does not seem to have a built-in way to get the month name from a date.
There were suggestions on how to get month names from a date, but they did not work for me. Try those if you want.
Here's an example on dbfiddle
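If you prefer to try it locally, the query can be run from Python's built-in sqlite3 module. This is a minimal sketch, assuming an in-memory database and a final `select * from qq` to close the `create table ... as` statement; the variable names `conn` and `rows` are just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table days as
    with recursive qq as (
        select 1 id, 'jan' month, '2021-01-01' date_col
        union all
        select id + 1,
               substr('janfebmaraprmayjunjulaugsepoctnovdec',
                      1 + 3 * strftime('%m', date(date_col, '+1 day')), -3),
               date(date_col, '+1 day')
        from qq
        where id <= 364
    )
    select * from qq
    """
)

# Read the generated calendar back in id order.
rows = conn.execute("select id, month, date_col from days order by id").fetchall()
print(len(rows), rows[0], rows[-1])
```

The negative length in `substr` makes SQLite return the three characters just before the given position, which is why the offset is `1 + 3 * month` rather than `3 * month - 2`.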
I have a source table that looks like this
I start by counting the IDs with Pd on the first date. Then I go to the second date and, if an ID is Pd, add it to the count. Then I go to the third date and check whether the Pd IDs from the previous date have changed; if they have, they are counted in the new group. Please see the desired output. Could you please help?
Thank you
In a single-pass solution you will need to track each id's prior inv. When this tracking is in place you will
decrement an inv's count based on the id's prior inv
increment an inv's count based on the id's current inv
in the tracker, replace the id's prior inv with the current inv
The number of ids is dynamic and not known a priori, and the lookup of an id's prior inv is keyed on id. The best DATA step feature for dynamic lookup is the HASH object.
Also, because the counts output is a pivot based on inv values, you will need to either
have a series of if/then or select/when statements to increment/decrement the inv counts, or
output the data as date, inv, count and use Proc TRANSPOSE
Data
data have;
format id 4. date yymmdd10. inv $2.;
input id date yymmdd10. event $ e_seq inv ; datalines;
100 2018-01-01 In 1 Pd
101 2018-01-01 In 1 Pd
102 2018-02-04 In 1 Pd
100 2018-02-07 N 2 NG
101 2018-02-14 P 2 G
101 2018-02-18 A 3 Pd
100 2018-03-15 A 3 Pd
102 2018-05-01 P 2 G
103 2018-06-03 In 1 Pd
run;
Sample code
Nested DOW loops are used to test for end of input data and ensure one row output for each date (the group)
data want(keep=date G NG Pd);
if 0 then set have; * prep pdv for hash;
* ids is the 'tracker';
declare hash ids();
ids.defineKey('id');
ids.defineData('id', 'lastinv');
ids.defineDone();
lastinv = inv; * prep lastinv in pdv;
do until (end);
do until (last.date);
set have end=end;
where inv in ('Pd' 'G' 'NG');
by date;
if ids.find() = 0 then do; * decrement count based on ids prior inv;
select (lastinv);
when ('G') G + -1;
when ('NG') NG + -1;
when ('Pd') Pd + -1;
otherwise ;
end;
end;
* update ids prior inv;
lastinv = inv;
ids.replace();
* increment count based on ids current inv;
select (lastinv);
when ('G') G + 1;
when ('NG') NG + 1;
when ('Pd') Pd + 1;
otherwise ;
end;
end;
OUTPUT; * <------------ output one row of counts per date;
end;
run;
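The tracker logic can also be sketched outside SAS. The following Python version is an illustration only: a plain dict stands in for the hash object, the rows are the sample data from the DATA step above, and the names `have`, `running_counts`, and `want` are reused loosely for readability.

```python
from itertools import groupby

# Sample rows from the DATA step above: (id, date, inv), already sorted by date.
have = [
    (100, "2018-01-01", "Pd"), (101, "2018-01-01", "Pd"),
    (102, "2018-02-04", "Pd"), (100, "2018-02-07", "NG"),
    (101, "2018-02-14", "G"), (101, "2018-02-18", "Pd"),
    (100, "2018-03-15", "Pd"), (102, "2018-05-01", "G"),
    (103, "2018-06-03", "Pd"),
]

def running_counts(rows):
    """Return one (date, G, NG, Pd) tuple per date, tracking each id's latest inv."""
    counts = {"G": 0, "NG": 0, "Pd": 0}
    last_inv = {}                                  # the tracker: id -> prior inv
    kept = [r for r in rows if r[2] in counts]     # mirrors the WHERE filter
    out = []
    for date, group in groupby(kept, key=lambda r: r[1]):
        for id_, _, inv in group:
            prior = last_inv.get(id_)
            if prior is not None:
                counts[prior] -= 1                 # decrement the prior inv's count
            counts[inv] += 1                       # increment the current inv's count
            last_inv[id_] = inv                    # replace prior inv with current inv
        out.append((date, counts["G"], counts["NG"], counts["Pd"]))
    return out

want = running_counts(have)
for row in want:
    print(row)
```

As in the DATA step, the input must be sorted by date, because `groupby` only groups adjacent rows.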
In my SQL database I have a column that contains a fiscal year value. Fiscal Year start from Oct 1st to Sept 30th of the following year. For example the current fiscal year is 2011-2012 or "2011". I need a calculation to pull dates from my database.
I have a column in my table that contains dates (i.e. 05/04/2012), I need a calculation that will pull the dates for the selected fiscal year? So when I need to see the data for the date 02/14/2003, then I would need the fiscal year of 2002.
This all ties into my ASP.NET page, where a user selects which fiscal year they want to view and a gridview appears with the information requested. So, if I choose fiscal year 2010, the data pulled into the gridview should be all records from 10/01/2010 to 09/30/2011. Thanks for the help, I have yet to try anything because I am not sure where to begin (sql newbie).
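You can find the fiscal year by subtracting nine months, so that dates from 1 October onward keep their calendar year and earlier dates fall back to the previous one:
year(dateadd(month,-9,'2011-09-30'))
This returns 2010, since 2011-09-30 is the last day of fiscal year 2010. So to select all entries in fiscal year 2011, try:
select *
from YourTable as yt
where year(dateadd(month,-9,yt.TransactionDt)) = 2011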
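As a cross-check of the date arithmetic, here is a minimal Python sketch. It assumes the question's convention that a fiscal year runs from 1 October to 30 September and is named after the year in which it starts; the function name `fiscal_year` is hypothetical.

```python
from datetime import date

def fiscal_year(d: date) -> int:
    """Label a date with the year in which its fiscal year starts (1 October)."""
    return d.year if d.month >= 10 else d.year - 1

# Dates taken from the question and the answer above.
for d in (date(2003, 2, 14), date(2011, 9, 30), date(2011, 10, 1), date(2012, 5, 4)):
    print(d.isoformat(), fiscal_year(d))
```

Under this convention 02/14/2003 falls in fiscal year 2002, as the question expects.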
I assume you are using SqlParameters to send data to the SQL Server. Then:
int fiscalYear = 2002;
SqlParameter fyStart = new SqlParameter("@FiscalYearStart",
SqlDbType.DateTime);
fyStart.Value = new SqlDateTime(fiscalYear, 10, 01);
SqlParameter fyEnd = new SqlParameter("@FiscalYearEnd",
SqlDbType.DateTime);
fyEnd.Value = new SqlDateTime(fiscalYear+1, 10, 01);
Then you can pass these two parameters to, for example, a stored procedure, and query the table with
WHERE date BETWEEN @FiscalYearStart AND @FiscalYearEnd
N.B. FiscalYearEnd should be 01-Oct of YEAR+1: it is represented as YYYY-10-01T00:00:00, so the range includes the whole of 30 Sept. Because BETWEEN is inclusive, a row stamped exactly at midnight on 1 Oct would also match.
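The boundary handling is easy to get wrong, so here is a small Python sketch of the same range expressed as a half-open interval (start inclusive, end exclusive). The function names `fiscal_year_bounds` and `in_fiscal_year` are hypothetical and only illustrate the idea.

```python
from datetime import datetime

def fiscal_year_bounds(fiscal_year: int):
    """Return (start, end): start is inclusive, end is exclusive."""
    return datetime(fiscal_year, 10, 1), datetime(fiscal_year + 1, 10, 1)

def in_fiscal_year(ts: datetime, fiscal_year: int) -> bool:
    """True when ts falls on or after 1 Oct and strictly before the next 1 Oct."""
    start, end = fiscal_year_bounds(fiscal_year)
    return start <= ts < end

print(in_fiscal_year(datetime(2011, 9, 30, 23, 59, 59), 2010))
print(in_fiscal_year(datetime(2011, 10, 1), 2010))
```

Comparing with `>=` and `<` keeps all of 30 Sept in the year without also picking up midnight on 1 Oct of the following fiscal year.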
You could query for it:
SELECT *
FROM YourTable
WHERE YourTableDate >= CAST(@FiscalYear AS CHAR(4)) + '-10-01'
AND YourTableDate < CAST(@FiscalYear + 1 AS CHAR(4)) + '-10-01';
or, if you need the flexibility, you could alternatively have a table of date ranges which join to fiscal years. This gives you the ability to have multiple fiscal year definitions for, say, multiple tenants or companies, or allows the definition to change without changing your queries. You could then join to this table as needed to filter your results.
CompanyID FiscalYear StartDate EndDate
1 2010 2010-10-01 2011-09-30
1 2011 2011-10-01 2012-09-30
2 2010 2010-01-01 2010-12-31
2 2011 2011-01-01 2011-12-31
SELECT *
FROM YourTable t
INNER JOIN FiscalYear y
ON y.FiscalYear = t.FiscalYear
WHERE t.YourTableDate >= y.StartDate AND t.YourTableDate < DATEADD(d, 1, y.EndDate);
I have a table with the following layout.
Email Blast Table
EmailBlastId | FrequencyId | UserId
---------------------------------
1 | 5 | 1
2 | 2 | 1
3 | 4 | 1
Frequency Table
Id | Frequency
------------
1 | Daily
2 | Weekly
3 | Monthly
4 | Quarterly
5 | Bi-weekly
I need to come up with a grid display on my asp.net page as follows.
Email blasts per month.
UserId | Jan | Feb | Mar | Apr |..... Dec | Cumulative
-----------------------------------------------------
1 7 6 6 7 6 #xx
The only way I can think of doing this is as below, for each month have a case statement.
select SUM(
CASE WHEN FrequencyId = 1 THEN 31
WHEN FrequencyId = 2 THEN 4
WHEN FrequencyId = 3 THEN 1
WHEN FrequencyId = 4 THEN 1
WHEN FrequencyId = 5 THEN 2 END) AS Jan,
SUM(
CASE WHEN FrequencyId = 1 THEN 28 /* 29 in a leap year */
WHEN FrequencyId = 2 THEN 4
WHEN FrequencyId = 3 THEN 1
WHEN FrequencyId = 4 THEN 0
WHEN FrequencyId = 5 THEN 2 END) AS Feb, etc etc
FROM EmailBlast
Group BY UserId
Any other better way of achieving the same?
Is this for any given year? I'm going to assume you want the schedule for the current year. If you want a future year you can always change the DECLARE @now to specify any future date.
"Once in 2 weeks" (usually known as "bi-weekly") doesn't fit well into monthly buckets (except for February in a non-leap year). Should that possibly be changed to "Twice a month"?
Also, why not store the coefficient in the Frequency table, adding a column called "PerMonth"? Then you only have to deal with the Daily and Quarterly cases (and is it an arbitrary choice that this will happen only in January, April, and so on?).
Assuming that some of this is flexible, here is what I would suggest, assuming this very minor change to the table schema:
USE tempdb;
GO
CREATE TABLE dbo.Frequency
(
Id INT PRIMARY KEY,
Frequency VARCHAR(32),
PerMonth TINYINT
);
CREATE TABLE dbo.EmailBlast
(
Id INT,
FrequencyId INT,
UserId INT
);
And this sample data:
INSERT dbo.Frequency(Id, Frequency, PerMonth)
SELECT 1, 'Daily', NULL
UNION ALL SELECT 2, 'Weekly', 4
UNION ALL SELECT 3, 'Monthly', 1
UNION ALL SELECT 4, 'Quarterly', NULL
UNION ALL SELECT 5, 'Twice a month', 2;
INSERT dbo.EmailBlast(Id, FrequencyId, UserId)
SELECT 1, 5, 1
UNION ALL SELECT 2, 2, 1
UNION ALL SELECT 3, 4, 1;
We can accomplish this using a very complex query (but we don't have to hard-code those month numbers):
DECLARE @now DATE = CURRENT_TIMESTAMP;
DECLARE @Jan1 DATE = DATEADD(MONTH, 1-MONTH(@now), DATEADD(DAY, 1-DAY(@now), @now));
WITH n(m) AS
(
SELECT TOP 12 m = number
FROM master.dbo.spt_values
WHERE number > 0 GROUP BY number
ORDER BY number
),
months(MNum, MName, StartDate, NumDays) AS
( SELECT m, mn = CONVERT(CHAR(3), DATENAME(MONTH, DATEADD(MONTH, m-1, @Jan1))),
DATEADD(MONTH, m-1, @Jan1),
DATEDIFF(DAY, DATEADD(MONTH, m-1, @Jan1), DATEADD(MONTH, m, @Jan1))
FROM n
),
grp AS
(
SELECT UserId, MName, c = SUM (
CASE x.Id WHEN 1 THEN NumDays
WHEN 4 THEN CASE WHEN MNum % 3 = 1 THEN 1 ELSE 0 END
ELSE x.PerMonth END )
FROM months CROSS JOIN (SELECT e.UserId, f.*
FROM EmailBlast AS e
INNER JOIN Frequency AS f
ON e.FrequencyId = f.Id) AS x
GROUP BY UserId, MName
),
cumulative(UserId, total) AS
(
SELECT UserId, SUM(c)
FROM grp GROUP BY UserID
),
pivoted AS
(
SELECT * FROM (SELECT UserId, c, MName FROM grp) AS grp
PIVOT(MAX(c) FOR MName IN (
[Jan],[Feb],[Mar],[Apr],[May],[Jun],[Jul],[Aug],[Sep],[Oct],[Nov],[Dec])
) AS pvt
)
SELECT p.*, c.total
FROM pivoted AS p
LEFT OUTER JOIN cumulative AS c
ON p.UserId = c.UserId;
Results:
UserId Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec total
1 7 6 6 7 6 6 7 6 6 7 6 6 76
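The per-month arithmetic behind these results can be cross-checked with a short Python sketch. It is an illustration under assumptions: the year 2021 is picked arbitrarily (the query uses the current year), and the names `PER_MONTH`, `blasts_in_month`, and `monthly_totals` are hypothetical.

```python
import calendar

# Frequency ids as in the sample data: 1 Daily, 2 Weekly, 3 Monthly,
# 4 Quarterly, 5 Twice a month. Only the fixed per-month values live here.
PER_MONTH = {2: 4, 3: 1, 5: 2}

def blasts_in_month(frequency_id: int, year: int, month: int) -> int:
    if frequency_id == 1:                      # Daily: one per calendar day
        return calendar.monthrange(year, month)[1]
    if frequency_id == 4:                      # Quarterly: Jan, Apr, Jul, Oct
        return 1 if month % 3 == 1 else 0
    return PER_MONTH[frequency_id]

def monthly_totals(frequency_ids, year):
    """Sum the blasts of all subscriptions for each of the 12 months."""
    return [sum(blasts_in_month(f, year, m) for f in frequency_ids)
            for m in range(1, 13)]

# User 1 from the sample data has frequencies 5, 2 and 4.
user1 = monthly_totals([5, 2, 4], year=2021)
print(user1, sum(user1))
```

For user 1 the year does not matter, since none of the subscriptions is daily; a daily subscription would pick up 28 or 29 for February depending on the year.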
Clean up:
DROP TABLE dbo.EmailBlast, dbo.Frequency;
GO
In fact the schema change I suggested doesn't really buy you much, it just saves you two additional CASE branches inside the grp CTE. Peanuts, overall.
I think you're going to end up with a lot more complicated logic. Sure Jan has 31 days.. but Feb doesn't... and Feb changes depending on the year. Next, are email blasts sent even on weekends and holidays or are certain days skipped for various reasons... If that's the case then the number of business days for a given month changes each year.
Next the number of full weeks in a given month also changes year by year. What happens to those extra 4 half weeks? Do they go on the current or next month? What method are you using to determine that? For an example of how complicated this gets read: http://en.wikipedia.org/wiki/ISO_week_date Specifically the part where it talks about the first week, which actually has 9 different definitions.
I'm usually not one to say this, but you might be better off writing this with regular code instead of a sql query. Just issue a 'select * from emailblast where userid = xxx' and transform it using a variety of code methods.
Depends on what you're looking for. Suggestion 1 would be to track your actual email blasts (with a date :-).
Without actual dates, whatever you come-up with for one month will be the same for every month.
Anyway, If you're going to generalize, then I'd suggest using something other than ints -- like maybe floats or decimals. Since your output based on the tables listed in your post can only ever approximate what actually happens (e.g., January actually has 4-1/2 weeks, not 4), you'll have a compounding error-bounds over any range of months -- getting worse, the further out you extrapolate. If you output an entire 12 months, for example, your extrapolation will under-estimate by over 4 weeks.
If you use floats or decimals, then you'll be able to come much closer to what actually happens. For starters: find a common unit of measure (I'd suggest using a "day"). E.g., 1 month = 365/12 days; 1 quarter = 365/4 days; 1 two-week period = 14 days; etc.
If you do that, then your user who had 1 per quarter actually had 1 per 91.25 days; 1 per week turns into 1 per 7 days; 1 per two weeks turns into 1 per 14 days.
**EDIT** -- Incidentally, you could store the per-day value in your reference table, so you didn't have to calculate it each time. For example:
Frequency Table
Id | Frequency | Value
-------------------------------
1 | Daily | 1.0
2 | Weekly | .14286
3 | Monthly | .03288
4 | Quarterly | .01096
5 | Once in 2 weeks | .07143
Now do the math: (1/91.25 + 1/7 + 1/14) needs a common denominator (like maybe 91.25 * 14), so it becomes (14/1277.5 + 182.5/1277.5 + 91.25/1277.5).
That adds up to 287.75/1277.5, or .225 emails per day.
Since there are 365/12 days per month, multiply .225 * (365/12) to get 6.85 emails per month.
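The same arithmetic can be stepped out in a few lines of Python. This is just a sketch of the calculation above; the dictionary name `interval_days` and the variable names are made up for the example.

```python
# Days between emails for the three subscriptions of the example user.
interval_days = {
    "Quarterly": 365 / 4,      # 91.25 days
    "Weekly": 7,
    "Once in 2 weeks": 14,
}

# One email per interval, so the daily rate is the sum of the reciprocals.
per_day = sum(1 / days for days in interval_days.values())

# An average month has 365/12 days.
per_month = per_day * 365 / 12

print(round(per_day, 3), round(per_month, 2))
```

Keeping the unrounded `per_day` value until the last step avoids compounding the rounding error across months.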
Your output would then look something like this:
Email blasts per month.
UserId | Jan | Feb | Mar | Apr |..... Dec | Cumulative
-----------------------------------------------------
1 6.85 6.85 6.85 6.85 6.85 #xx
The math may seem a little tedious, but once you step it out on your code, you'll never have to do it again. Your results will be more accurate (I rounded to 2 decimal places, but you could go further out if you wanted to). And if your company is using this data to determine budgets / potential income for the upcoming year, that might be worth it.
Also worth mentioning is that after YOU get done extrapolating (and the error bounds that entails), your consumers of this output will do THEIR OWN extrapolating, not on the raw data, but on your output. So it's kind of a double-whammy of error bounds. The more accurate you can be early-on, the more reliable these numbers will be at each subsequent levels.
You might want to consider adding a 3rd table called something like Schedule.
You could structure it like this:
MONTH_NAME
DAILY_COUNT
WEEKLY_COUNT
MONTHLY_COUNT
QUARTERLY_COUNT
BIWEEKLY_COUNT
The record for JAN would be
JAN
31
4
1
1
2
Or you could structure it like this:
MONTH_NAME
FREQUENCY_ID
EMAIL_COUNT
and have multiple records for each month:
JAN 1 31
JAN 2 4
JAN 3 1
JAN 4 1
JAN 5 2
I'll let you figure out whether the logic to retrieve this is better than your CASE structure.
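To give a feel for the retrieval logic with the second structure, here is a rough Python sketch. It assumes only the JAN rows listed above and the three email blasts from the question; the names `schedule`, `email_blasts`, and `blasts_for` are hypothetical.

```python
# (MONTH_NAME, FREQUENCY_ID) -> EMAIL_COUNT, using the JAN records listed above.
schedule = {
    ("JAN", 1): 31,
    ("JAN", 2): 4,
    ("JAN", 3): 1,
    ("JAN", 4): 1,
    ("JAN", 5): 2,
}

# (EmailBlastId, FrequencyId, UserId) rows from the question's sample table.
email_blasts = [(1, 5, 1), (2, 2, 1), (3, 4, 1)]

def blasts_for(user_id: int, month_name: str) -> int:
    """Sum the scheduled email counts for one user in one month."""
    return sum(
        schedule[(month_name, frequency_id)]
        for _, frequency_id, blast_user in email_blasts
        if blast_user == user_id
    )

print(blasts_for(1, "JAN"))
```

In SQL this corresponds to joining the email blast table to the schedule table on the frequency id and summing the count per user and month, so no CASE branches are needed.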