SAS Counting Occurrences based on multiple layers within set time period - count

I am trying to count occurrences where the same person was billed for an item, four or more times, by the same place within 30 days of each instance. For example, input would look something like:
person service place date
A x shop1 01/01/15
A x shop1 01/15/15
A x shop1 01/20/15
B y shop2 03/20/15
B y shop2 04/01/15
C z shop1 05/05/15
And output would look something like:
person service place date count
A x shop1 01/01/15 3
A x shop1 01/15/15 3
A x shop1 01/20/15 3
B y shop2 03/20/15 2
B y shop2 04/01/15 2
C z shop1 05/05/15 1
I have tried stuff similar to:
data work.want;
do _n_ =1 by 1 until (last.PLACE);
set work.rawdata;
by PERSON PLACE;
if first.PLACE then count=0;
count+1;
end;
frequency= count;
do _n_ = 1 by 1 until (last.PLACE);
set work.rawdata;
by PERSON PLACE;
output;
end;
run;
This gives a count based on person and place but does not factor in time. Any help or suggestions would be greatly appreciated! Thank you

This can be done easily with proc sql...
Your data:
data have;
input person $ service $ place $;
datalines;
A x shop1
A x shop1
A x shop1
B y shop2
B y shop2
C z shop1
;
run;
Then we count the occurrences of "place" for each person/service group (columns 1 and 2), and join the counts back to the original table.
proc sql;
create table want as
select a.*, b._count
from have as a
inner join
(
select person, service, count(place) as _count
from have
group by 1,2
) as b
on a.person = b.person
and a.service = b.service
;
quit;
Is there a date field? We need it in order to group the data by month (or 30 days), for example.

proc sql;
create table summary as
select person, service, place, count(*) as count
from rawdata
group by person, service, place
having count(*)>=4;
quit;
Note: This doesn't check to see if the events occurred within 30 days of each other. I didn't know the type of data you had for this in your dataset.
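The 30-day condition that the answers above leave out can be sketched with a self-join. This is only an illustration in Python with SQLite rather than SAS, and it rests on two assumptions: the dates are stored as ISO strings (so that julianday() can parse them), and "within 30 days of each instance" means counting every row for the same person, service and place whose date is at most 30 days before or after the current row. The column name n_within_30 is made up for the example.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (assumption)
# so that SQLite's julianday() can compute day differences.
rows_in = [
    ("A", "x", "shop1", "2015-01-01"),
    ("A", "x", "shop1", "2015-01-15"),
    ("A", "x", "shop1", "2015-01-20"),
    ("B", "y", "shop2", "2015-03-20"),
    ("B", "y", "shop2", "2015-04-01"),
    ("C", "z", "shop1", "2015-05-05"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rawdata (person TEXT, service TEXT, place TEXT, date TEXT)"
)
conn.executemany("INSERT INTO rawdata VALUES (?, ?, ?, ?)", rows_in)

# For every row a, count the rows b with the same person/service/place
# whose date lies within 30 days of a's date (a itself is included).
query = """
SELECT a.person, a.service, a.place, a.date, COUNT(*) AS n_within_30
FROM rawdata AS a
JOIN rawdata AS b
  ON  a.person  = b.person
  AND a.service = b.service
  AND a.place   = b.place
  AND ABS(julianday(a.date) - julianday(b.date)) <= 30
GROUP BY a.rowid
ORDER BY a.person, a.date
"""
counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
```

Adding HAVING COUNT(*) >= 4 after the GROUP BY would keep only the rows that meet the "four or more times" condition; with the sample data above no row does.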

Related

How do I create a table that displays Current Inventory based on the location of said Inventory using Pl/SQL?

Here is what I have:
product location quantity moved dttm
apple shop1 30 null '08/10/22'
orange shop1 20 null '08/15/22'
pear shop1 40 null '08/20/22'
apple shop2 10 shop1 '08/22/22'
orange shop3 15 shop1 '08/22/22'
Where Location is the current location of product, with that quantity, and moved is the previous location of the inventory (which is sometimes null if it is being added to the system), and dttm the date that change occurred.
I'm looking for a way to show the current inventory based on the changes made to the data set. The view should look something like the below:
Location Product Quantity
shop1 apple 20
shop1 orange 5
shop1 pear 40
shop2 apple 10
shop3 orange 15
What is the best practice for making a view this way? I have yet to come up with a working query that gives accurate numbers. I have the side that adds inventory to a location working (using an outer apply statement). I'm getting hung up on how to get my moved column to subtract inventory from products at a given location.
This answer seems to be close to what I want, but with the added complexity of location also being a factor in the totals for the items.
What am I missing? Or does my dataset need to be remade to accomplish what I want?
Thanks for any and all help
Here's one option: create a view (or a subquery, or use a CTE - as I did) as union of locations (either as the original locations, or moved ones), and the rest is then easy - a simple aggregation.
Sample data:
SQL> with test (product, location, quantity, moved) as
       (select 'apple' , 'shop1', 30, null from dual union all
        select 'orange' , 'shop1', 20, null from dual union all
        select 'pear' , 'shop1', 40, null from dual union all
        select 'apple' , 'shop2', 10, 'shop1' from dual union all
        select 'orange' , 'shop3', 15, 'shop1' from dual
       ),
Query begins here:
     temp as
       (select location, product, quantity from test
        union all
        select moved , product, -quantity from test where moved is not null
       )
     select location, product, sum(quantity) quantity
     from temp
     group by location, product
     order by location, product
     /
LOCAT PRODUC QUANTITY
----- ------ ----------
shop1 apple 20
shop1 orange 5
shop1 pear 40
shop2 apple 10
shop3 orange 15
SQL>
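The same union-then-aggregate idea can be tried outside Oracle. The following is a minimal sketch using Python's built-in sqlite3 module, so the SQL dialect differs slightly from the session above; the CTE name movements is a stand-in chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (product TEXT, location TEXT, quantity INTEGER, moved TEXT)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [
        ("apple", "shop1", 30, None),
        ("orange", "shop1", 20, None),
        ("pear", "shop1", 40, None),
        ("apple", "shop2", 10, "shop1"),
        ("orange", "shop3", 15, "shop1"),
    ],
)

# Each row adds its quantity at the current location; a row that records a
# move also subtracts the same quantity from the previous location.
query = """
WITH movements AS (
    SELECT location, product, quantity FROM test
    UNION ALL
    SELECT moved, product, -quantity FROM test WHERE moved IS NOT NULL
)
SELECT location, product, SUM(quantity) AS quantity
FROM movements
GROUP BY location, product
ORDER BY location, product
"""
inventory = conn.execute(query).fetchall()
for row in inventory:
    print(row)
```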

Count ID based on start date

I have a source table that looks like this
I start by counting the IDs that are Pd on the first date. Then I go to the 2nd date and, if an ID is Pd, add it to the count. Then I go to the 3rd date and check whether the Pd IDs from the previous date have changed; if they have changed, I count them in their new group. Please see the desired output. Could you please help?
Thank you
In a single-pass solution you need to track each id's prior inv. With this tracking in place you will:
- decrement an inv's count based on the id's prior inv
- increment an inv's count based on the id's current inv
- replace the id's prior inv in the tracker with the current inv
The number of ids is dynamic and not known a priori, and the lookup of an id's prior inv is keyed on id. The best DATA step feature for dynamic lookup is the hash object.
Also, because the counts output is a pivot on the inv values, you will need to either:
- use a series of if/then or select/when statements to increment/decrement the inv counts, or
- output the data as date, inv, count and pivot it with PROC TRANSPOSE
Data
data have;
format id 4. date yymmdd10. inv $2.;
input id date yymmdd10. event $ e_seq inv ; datalines;
100 2018-01-01 In 1 Pd
101 2018-01-01 In 1 Pd
102 2018-02-04 In 1 Pd
100 2018-02-07 N 2 NG
101 2018-02-14 P 2 G
101 2018-02-18 A 3 Pd
100 2018-03-15 A 3 Pd
102 2018-05-01 P 2 G
103 2018-06-03 In 1 Pd
run;
Sample code
Nested DOW loops are used to test for end of input data and ensure one row output for each date (the group)
data want(keep=date G NG Pd);
if 0 then set have; * prep pdv for hash;
* ids is the 'tracker';
declare hash ids();
ids.defineKey('id');
ids.defineData('id', 'lastinv');
ids.defineDone();
lastinv = inv; * prep lastinv in pdv;
do until (end);
do until (last.date);
set have end=end;
where inv in ('Pd' 'G' 'NG');
by date;
if ids.find() = 0 then do; * decrement count based on ids prior inv;
select (lastinv);
when ('G') G + -1;
when ('NG') NG + -1;
when ('Pd') Pd + -1;
otherwise ;
end;
end;
* update ids prior inv;
lastinv = inv;
ids.replace();
* increment count based on ids current inv (lastinv now holds it);
select (lastinv);
when ('G') G + 1;
when ('NG') NG + 1;
when ('Pd') Pd + 1;
otherwise ;
end;
end;
OUTPUT; * <------------ output one row of counts per date;
end;
run;
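For readers who want to follow the tracker logic without SAS, here is a rough Python sketch of the same single-pass idea. A plain dict stands in for the hash object, and the names last_inv, counts and want are chosen for this example only; the rows are the id, date and inv columns of the sample data above.

```python
from itertools import groupby

# (id, date, inv) taken from the sample data; event and e_seq are not needed.
have = [
    (100, "2018-01-01", "Pd"),
    (101, "2018-01-01", "Pd"),
    (102, "2018-02-04", "Pd"),
    (100, "2018-02-07", "NG"),
    (101, "2018-02-14", "G"),
    (101, "2018-02-18", "Pd"),
    (100, "2018-03-15", "Pd"),
    (102, "2018-05-01", "G"),
    (103, "2018-06-03", "Pd"),
]

last_inv = {}                        # the tracker: id -> most recent inv
counts = {"G": 0, "NG": 0, "Pd": 0}  # running count of ids per inv
want = []                            # one (date, G, NG, Pd) row per date

rows_by_date = sorted(have, key=lambda r: r[1])
for date, group in groupby(rows_by_date, key=lambda r: r[1]):
    for id_, _, inv in group:
        prior = last_inv.get(id_)
        if prior is not None:
            counts[prior] -= 1       # the id leaves its prior group
        last_inv[id_] = inv          # the tracker now holds the current inv
        counts[inv] += 1             # the id joins its current group
    want.append((date, counts["G"], counts["NG"], counts["Pd"]))

for row in want:
    print(row)
```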

Sqlite how to select case for multiple values in a column

So, I have this table:
id|otherid|key|value
--------------------
1 1 ak av
2 1 bk bv
3 2 ak av
3 2 ak av2
The thing to note is that otherids repeat, and an otherid can have the same key multiple times with different values. What I want to retrieve is the value for the key or, if there are multiple values for the same key, some marker string.
So, I'd like to receive for otherids
otherid|key|value
-----------------
1 ak av
1 bk bv
2 ak SEQUENCE
Where 'SEQUENCE' string allows me to know that there are multiple values for the single key for otherid. What query would accomplish this?
To get one output row for multiple input rows, use grouping.
The count of rows in the group is available with COUNT(*); you can handle the cases with a CASE expression:
SELECT otherid,
key,
CASE COUNT(*)
WHEN 1 THEN MIN(value)
ELSE 'SEQUENCE'
END AS value
FROM MyTable
GROUP BY otherid,
key;
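An alternative for a single otherid (myid stands for the id being looked up), which also reports keys whose values are all NULL:
SELECT otherid,
       key,
       CASE
           WHEN COUNT(value) = 1 THEN MIN(value)
           WHEN COUNT(value) = 0 THEN '*nil*'
           ELSE '*sequence*'
       END AS value
FROM datasingle
WHERE otherid = myid
GROUP BY otherid, key;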
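To check the grouping approach against the question's sample rows, one could run it through Python's built-in sqlite3 module. This is a small sketch; the table name MyTable follows the first answer, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE MyTable (id INTEGER, otherid INTEGER, "key" TEXT, value TEXT)'
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (1, 1, "ak", "av"),
        (2, 1, "bk", "bv"),
        (3, 2, "ak", "av"),
        (3, 2, "ak", "av2"),
    ],
)

# One output row per (otherid, key); a group with a single row shows its
# value, and any larger group is replaced by the marker string.
query = """
SELECT otherid,
       "key",
       CASE COUNT(*)
           WHEN 1 THEN MIN(value)
           ELSE 'SEQUENCE'
       END AS value
FROM MyTable
GROUP BY otherid, "key"
ORDER BY otherid, "key"
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```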

Sum all field with the other field < itself in sqlite

Sorry, I couldn't think of a good title for my problem.
I have a table a(f1 integer, date Long) in which date is increasing, with the following data:
f1 date
1 1
2 2
3 3
...
I need a running sum of f1 ordered by date: for record 1 (f1 = 1, date = 1) the sum is 1, for record 2 the sum is 1+2, for record 3 the sum is 1+2+3, and so on.
How can I do that?
This requires a correlated subquery:
SELECT date,
(SELECT SUM(f1)
FROM a AS a2
WHERE a2.date <= a.date
) AS f1_sum
FROM a
ORDER BY date;
But it's inefficient. Consider just scanning the table, sorted by the date, and summing f1 as you're reading it.
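The scan-and-accumulate alternative might look like the following in application code. This is a minimal sketch in Python with the built-in sqlite3 module; the variable names are chosen for the example, and it assumes the table is small enough to fetch in one go.

```python
import sqlite3
from itertools import accumulate

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (f1 INTEGER, date INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)])

# Read the table once, sorted by date, and keep a running total of f1
# instead of re-summing all earlier rows for every row.
rows = conn.execute("SELECT date, f1 FROM a ORDER BY date").fetchall()
dates = [date for date, _ in rows]
running = list(zip(dates, accumulate(f1 for _, f1 in rows)))

for date, f1_sum in running:
    print(date, f1_sum)
```

Each row is visited once here, whereas the correlated subquery re-reads all earlier rows for every output row.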

combining IDs under GROUP BY?

I have an sqlite table "log" that looks like this:
ID p_id viewer
----------------------
1 1 100
2 1 200
3 1 300
4 3 550
5 3 230
6 5 420
7 2 320
8 2 203
9 9 10
10 9 55
And I want to get the average viewers from each p_id. That'd be
SELECT avg(viewer) FROM log GROUP BY p_id
But I want to treat p_id 1 and 5 as the same, so I'd get the average viewers of p_id 1 and 5 combined. How do I do that? Note that the table is much larger, and I need to treat two p_id's as the same multiple times. Can I still do a "group by" to achieve this, or are there other ways?
I tried this and it works with MySQL at least.
SELECT avg(viewer) FROM log GROUP BY CASE p_id WHEN 5 THEN 1 ELSE p_id END;
Edit: When using an alias-table named 'aliases' with the fields 'alias_from_id' and 'aliased_as':
SELECT
CASE ISNULL((SELECT aliased_as FROM aliases WHERE (alias_from_id = log.p_id)))
WHEN 1 THEN log.p_id ELSE aliased_as END AS the_id,
AVG(viewer) AS avg_viewer
FROM log
LEFT JOIN aliases ON (alias_from_id = p_id)
GROUP BY
CASE ISNULL((SELECT aliased_as FROM aliases WHERE (alias_from_id = log.p_id)))
WHEN 1 THEN log.p_id ELSE aliased_as END
I've tested this with MySQL and it works like a charm. It might be possible to simplify this SQL query a bit, but this is the best I can do at the moment :)
Edit2: Changed ISNULL to the corresponding SQLite IFNULL function
SELECT
IFNULL((SELECT aliased_as FROM aliases WHERE (alias_from_id = log.p_id)), log.p_id) AS the_id,
AVG(viewer) AS avg_viewer
FROM log
LEFT JOIN aliases ON (alias_from_id = p_id)
GROUP BY
IFNULL((SELECT aliased_as FROM aliases WHERE (alias_from_id = log.p_id)), log.p_id)
SQLite supports the AS keyword for column aliases, so the query should work as written.
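The alias-table idea can be tried on the question's data with Python's built-in sqlite3 module. This sketch assumes an aliases table with the columns alias_from_id and aliased_as as described in the answer, holding a single row that maps p_id 5 to 1. It uses the joined column directly inside IFNULL instead of a subquery, which is a simplification made for this example and not the answer's exact query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (ID INTEGER, p_id INTEGER, viewer INTEGER)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    [
        (1, 1, 100), (2, 1, 200), (3, 1, 300), (4, 3, 550), (5, 3, 230),
        (6, 5, 420), (7, 2, 320), (8, 2, 203), (9, 9, 10), (10, 9, 55),
    ],
)

# Hypothetical alias table: p_id 5 should be treated as p_id 1.
conn.execute("CREATE TABLE aliases (alias_from_id INTEGER, aliased_as INTEGER)")
conn.execute("INSERT INTO aliases VALUES (5, 1)")

# IFNULL falls back to the row's own p_id when no alias exists for it.
query = """
SELECT IFNULL(aliases.aliased_as, log.p_id) AS the_id,
       AVG(log.viewer) AS avg_viewer
FROM log
LEFT JOIN aliases ON aliases.alias_from_id = log.p_id
GROUP BY the_id
ORDER BY the_id
"""
averages = conn.execute(query).fetchall()
for row in averages:
    print(row)
```

Further pairs can be merged by adding rows to aliases, so the query itself does not need to change when more p_ids have to be treated as the same.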
