Recursively add to a data table in SAS

I am new to SAS. I need to run a number of iterations to populate my dataset, called MYRS.
Each iteration needs to JOIN TABLE1 with (TABLE2 + MYRS), MINUS the records that are already in the MYRS table.
Then I need to update the MYRS table with the additional matches. The GOAL is to track a chain of emails.
MYRS is essentially a copy of TABLE1 that contains only the matching records. It is kind of tricky. (The schema is simplified.) TABLE1 can have duplicates.
For example:
TABLE1:
ID | EMAIL1 | EMAIL2 | EMAIL3 | EMAIL4|
1 | A | s | d | F
2 | g | F | j | L
3 | z | x | L | v
4 | z | x | L | v
2 | g | F | j | L
TABLE2:
EMAIL
A
MYRS (starts as empty dataset)
EMAIL1 | EMAIL2 | EMAIL3 | EMAIL4
Logic: Record 1 in TABLE1 has an email (A) that matches an email in TABLE2, so this record needs to show up. The other records don't match anything in TABLE2. But because Record 1 and Record 2 share the same ALTERNATIVE email F, Record 2 also needs to be shown. And because Record 2 and Record 3 share the same alternative email L, Record 3 also needs to be shown. And so forth...
proc sql;
SELECT TABLE1.id,
TABLE1.email1,
TABLE1.email2,
TABLE1.email3,
TABLE1.email4
FROM TABLE1
INNER JOIN (
SELECT EMAIL
FROM TABLE2
UNION
SELECT EMAIL1 AS EMAIL
FROM MYRS
UNION
SELECT EMAIL2 AS EMAIL
FROM MYRS
UNION
SELECT EMAIL3 AS EMAIL
FROM MYRS
UNION
SELECT EMAIL4 AS EMAIL
FROM MYRS
)
ON EMAIL=EMAIL1 OR EMAIL=EMAIL2 OR EMAIL=EMAIL3 OR EMAIL=EMAIL4
WHERE TABLE1.id NOT IN (
SELECT DISTINCT ID
FROM MYRS
);
quit;
How can I create the following logic:
1. Wrap this into some sort of function
2. Before SQL execution, count the number of records in MYRS and SAVE the count
3. Execute the SQL and update MYRS
4. Count the number of records in MYRS
5. If the MYRS count did not change, stop execution
6. Else, go to #3
I am very new to SAS (3 days to be exact) and trying to put everything together. (I would use the logic above if I was to do that in Java)
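The count-then-compare loop described above can be sketched outside SAS. The following is a minimal Python illustration, not SAS code, using the sample TABLE1 and TABLE2 from the question; the function name `chain_ids` and the `max_iter` safety limit are assumptions made for this sketch.

```python
# Rows of TABLE1 from the question: (id, [email1..email4]); duplicates are allowed.
TABLE1 = [
    (1, ["A", "s", "d", "F"]),
    (2, ["g", "F", "j", "L"]),
    (3, ["z", "x", "L", "v"]),
    (4, ["z", "x", "L", "v"]),
    (2, ["g", "F", "j", "L"]),
]
TABLE2 = {"A"}  # seed emails


def chain_ids(rows, seed_emails, max_iter=100):
    """Repeat the join until the number of matched IDs stops growing."""
    known_emails = set(seed_emails)   # TABLE2 plus every email seen in MYRS
    matched_ids = set()               # IDs already stored in MYRS
    for _ in range(max_iter):         # max_iter is a hypothetical safety limit
        count_before = len(matched_ids)
        # "Execute SQL": rows not yet matched that share an email with the known set
        new_rows = [
            (row_id, emails)
            for row_id, emails in rows
            if row_id not in matched_ids and known_emails & set(emails)
        ]
        # "Update MYRS": store the new IDs and remember their emails
        for row_id, emails in new_rows:
            matched_ids.add(row_id)
            known_emails.update(emails)
        if len(matched_ids) == count_before:
            break                     # count did not change, so stop
    return matched_ids


print(sorted(chain_ids(TABLE1, TABLE2)))
```

With the sample data, record 1 is found first through email A, record 2 through F, and records 3 and 4 through L, after which one more pass finds nothing new and the loop stops.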

Here is a macro approach. It mostly follows your logic, but it transforms your data first, and the input/output is a list of IDs (you can easily get to and from emails with this).
This code will probably introduce quite a few SAS features that you are unfamiliar with, but the comments and explanations below should help. If any of it is still unclear, take a look at the links or add a comment.
It expects input data:
inData: Your TABLE1 with ID and EMAIL* variables
matched: An initial list of known wanted IDs
It returns:
matched: An updated list of wanted IDs
/* Wrap the processing in a macro so that we can use a %do loop */
%macro looper(maxIter = 5);
/* Put all the emails in one column to make comparison simpler */
proc transpose data = inData out = trans (rename = (col1 = email));
by ID;
var email:;
run;
/* Initialise the counts for the %while condition */
%let _nMatched = 0;
%let nMatched = 1;
%let i = 0;
/* Loop until no new IDs are added (or maximum number of iterations) */
%do %while(&_nMatched. < &nMatched. and &i < &maxIter.);
%let _nMatched = &nMatched.;
%let i = %eval(&i. + 1);
%put NOTE: Loop &i.: &nMatched. matched.;
/* Move matches to a temporary table */
proc datasets library = work nolist nowarn;
delete _matched;
change matched = _matched;
quit;
/* Get new matched IDs */
proc sql noprint;
create table matched as
select distinct c.ID
from _matched as a
left join trans as b
on a.ID = b.ID
left join trans as c
on b.email = c.email;
/* Get new count */
select count(*) into :nMatched from matched;
quit;
%end;
%mend looper;
%looper(maxIter = 10);
The interesting bits are:
proc transpose: Converts the input into a deep table so that all the email addresses are in one variable, this makes writing the email comparison logic simpler (less repetition needed) and puts the data in a format that will make it easier for you to clean the email addresses if necessary (think upcase(), strip(), etc.).
%macro %mend: The statements used to define a macro. This is necessary as you cannot use macro logic or loops in open code. I've also added an argument so you can see how that works.
%let and select into :: Two ways to create macro variables. Macro variables are referenced with the prefix & and are used to insert text into the SAS program before it is executed.
%do %while() %end: One of the ways to perform a loop within a macro. The code within will be run repeatedly until the condition evaluates to false.
proc datasets: A procedure for performing admin tasks on datasets and libraries. Used here to delete and rename temporary tables.
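To make the data flow of the macro easier to follow, here is a rough Python sketch of the same idea: reshape the emails into one column, then repeatedly expand the set of matched IDs through shared emails. It is an illustration only, not a translation of SAS semantics; the names `transpose`, `expand_once` and `looper` and the sample rows are assumptions for this sketch.

```python
def transpose(rows):
    """Long format: one (id, email) pair per email, like the proc transpose step."""
    return [(row_id, email) for row_id, emails in rows for email in emails]


def expand_once(matched, trans):
    """One pass of the self-join: matched IDs -> their emails -> all IDs with those emails."""
    emails = {email for row_id, email in trans if row_id in matched}
    return {row_id for row_id, email in trans if email in emails}


def looper(rows, matched, max_iter=5):
    """Loop until no new IDs are added or max_iter passes have run."""
    trans = transpose(rows)
    n_prev, n_now, i = 0, len(matched), 0
    while n_prev < n_now and i < max_iter:
        n_prev = n_now
        i += 1
        matched = expand_once(matched, trans)
        n_now = len(matched)
    return matched


# Sample rows taken from the question's TABLE1 (duplicates removed for brevity).
rows = [
    (1, ["A", "s", "d", "F"]),
    (2, ["g", "F", "j", "L"]),
    (3, ["z", "x", "L", "v"]),
    (4, ["z", "x", "L", "v"]),
]
print(sorted(looper(rows, {1}, max_iter=10)))
```

Starting from ID 1, the first pass adds ID 2 (shared email F), the second adds IDs 3 and 4 (shared email L), and the third pass adds nothing, which ends the loop.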

Related

Converting PL/SQL select rows in custom columns

I am stuck with this problem, I'm doing select like this:
select * from mytable;
and getting following results
-------------
|NAME |VALUE|
-------------
|nam1 |val1 |
-------------
|nam2 |val2 |
-------------
|nam3 |val3 |
-------------
Result is always formatted like this, NAME->VALUE.
Also, there are constraints on the table so that each distinct NAME can appear only once in the result. The values could be numbers, varchars, or nulls, and I don't want to do aggregation on these values.
Now I would like to convert this result to this:
--------------------
|NAM1 |NAM2 | NAM3 |
--------------------
|val1 |val2 | val3 |
--------------------
I tried achieving this result with pivot() function but without much success.
Thank you for your time, best regards :)
EDIT
This is the working example, with hardcoded column values, which is what I want to avoid.
select * from (select name, value from mytable)
pivot (min(value) for name in (
'nam1' as nam1,
'nam2' as nam2,
'nam3' as nam3
));

Using the DECODE function together with an aggregate (the names still have to be listed explicitly):
select max(decode(name, 'nam1', value)) as nam1,
max(decode(name, 'nam2', value)) as nam2,
max(decode(name, 'nam3', value)) as nam3
from mytable;
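Both the PIVOT and the DECODE queries list the names by hand. One way around that is to read the distinct names first and generate the column list at run time. The sketch below shows the idea in Python, using an in-memory SQLite database as a stand-in (it is not Oracle, and the helper name `pivot_name_value` is hypothetical); in Oracle the same approach would need dynamic SQL.

```python
import sqlite3


def pivot_name_value(conn, table="mytable"):
    """Build one output column per distinct NAME and return (header, row)."""
    names = [r[0] for r in conn.execute(
        f"SELECT DISTINCT name FROM {table} ORDER BY name")]
    if not names:
        return [], ()
    # One conditional aggregate per name; names are bound as parameters and
    # the aliases are quoted so unusual names do not break the statement.
    cols = ", ".join(
        'MAX(CASE WHEN name = ? THEN value END) AS "' + n.replace('"', '""') + '"'
        for n in names
    )
    cur = conn.execute(f"SELECT {cols} FROM {table}", names)
    header = [d[0] for d in cur.description]
    return header, cur.fetchone()


# Sample data matching the question's NAME/VALUE layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT UNIQUE, value)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("nam1", "val1"), ("nam2", "val2"), ("nam3", "val3")],
)
header, row = pivot_name_value(conn)
print(header)
print(row)
```

Because each NAME is unique, the MAX only picks the single value for that name; it is there to collapse the rows into one, not to aggregate several values.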

Return multiple COLUMN_JSON results as JSON array

I am storing data in standard tables in a MariaDB, but would like to return records from related tables as a JSON string.
What I intend to do is have a function where I can pass in exerciseId and the function returns a JSON string of all related exerciseMuscle records, meaning each exercise record returned by a stored proc can also include nested data from child tables.
I have been able to create JSON records using COLUMN_JSON and COLUMN_CREATE, but I can only get this to return a set of individual records rather than an array of JSON values, as I need. The SQL I'm using is:
select
e.id,
CONVERT(COLUMN_JSON(COLUMN_CREATE(
'role', em.muscleRoleName,
'muscle', em.muscleName
)) USING utf8) as musclesJson
from
exercise e
inner join exerciseMuscle em
on e.id = em.exerciseId
where
e.id = 96;
This returns:
| id | musclesJson
| 96 | {"role":"main","muscle":"biceps"}
| 96 | {"role":"secondary","muscle":"shoulders"}
When what I want is:
| id | musclesJson
| 96 | [{"role":"main","muscle":"biceps"},{"role":"secondary","muscle":"shoulders"}]
Is it possible to return multiple results in one row without having to iterate through the results and build it manually? If I add a group by to the SQL then the JSON only includes the first record.

Turns out it was GROUP_CONCAT that I needed, and specifying a comma as the delimiter. So changing my SQL to:
select
e.id,
CONVERT(
GROUP_CONCAT(
COLUMN_JSON(
COLUMN_CREATE(
'role', em.muscleRoleName,
'muscle', em.muscleName
)
)
SEPARATOR ','
) USING utf8) as musclesJson
from
exercise e
inner join exerciseMuscle em
on e.id = em.exerciseId
where
e.id = 96;
Returns:
| id | musclesJson
| 96 | {"role":"main","muscle":"biceps"},{"role":"secondary","muscle":"shoulders"}
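Note that the GROUP_CONCAT output is a comma-separated run of objects, not yet a JSON array. A small Python check, using the output string shown above, illustrates that wrapping it in square brackets is enough to make it parse as an array. On the SQL side the same wrapping could presumably be done with CONCAT('[', ..., ']') around the GROUP_CONCAT call; that variant is an assumption and is not part of the answer.

```python
import json

# Output of the GROUP_CONCAT query for exercise 96, as shown above.
fragment = (
    '{"role":"main","muscle":"biceps"},'
    '{"role":"secondary","muscle":"shoulders"}'
)

# Wrapping the comma-separated objects in brackets yields a JSON array.
array_text = "[" + fragment + "]"
muscles = json.loads(array_text)

for muscle in muscles:
    print(muscle["role"], muscle["muscle"])
```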

I have two tables, student and trainer. I want to get values from both tables based on a condition

I have student and trainer tables:
student table:
student_id (primary key)
name
email
trainer table:
trainer_id
student_id
amount
The output has to be:
sid name email amount
22 ram r#g 200
34 sam r#f
I want to get (student_id, name, email) from the student table and (amount) from the trainer table. Important: the amount should show a value only when both trainer_id and student_id match (for example sid = 46, tid = 78, amount = 500). Otherwise the amount should be empty, but (student_id, name, email) should still be displayed.
In the trainer table, student_id and trainer_id both have to match, and the amount depends on that. I mean, if we send the query "select amount from trainer where student_id = 20 and trainer_id = 36...", the amount column should match that sid and tid.

If you do it this way, it will not show a row for a student who has no matching trainer record:
select st.student_id,
st.name,
st.email,
tt.amount
from student_table st, trainer_table tt
where st.student_id = tt.student_id
The NVL function lets you substitute a value when a null value is encountered,
so if you do it this way, it will show every student and show 0 instead of null:
select st.student_id,
st.name,
st.email,
nvl((select tt.amount
from trainer_table tt
where st.student_id = tt.student_id), 0) amount
from student_table st
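The difference between dropping and keeping unmatched students can be tried out quickly. The sketch below uses Python's sqlite3 module with an in-memory database as a stand-in for the real tables. It uses a LEFT JOIN, a swapped-in alternative to the NVL subquery, and puts the trainer_id condition in the ON clause so that students without a match are kept with an empty amount. The sample rows, the trainer_id value 78 and the function name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_table (student_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE trainer_table (trainer_id INTEGER, student_id INTEGER, amount INTEGER);
-- hypothetical sample rows based on the expected output in the question
INSERT INTO student_table VALUES (22, 'ram', 'r#g'), (34, 'sam', 'r#f');
INSERT INTO trainer_table VALUES (78, 22, 200);
""")


def students_with_amount(conn, trainer_id):
    """Every student, with the amount only where student and trainer both match."""
    return conn.execute(
        """
        SELECT st.student_id, st.name, st.email, tt.amount
        FROM student_table st
        LEFT JOIN trainer_table tt
               ON tt.student_id = st.student_id
              AND tt.trainer_id = ?
        ORDER BY st.student_id
        """,
        (trainer_id,),
    ).fetchall()


rows = students_with_amount(conn, 78)
for row in rows:
    print(row)
```

Student 34 has no trainer row, so the amount comes back as NULL (None in Python) instead of the whole row disappearing.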

This can be accomplished with PL/SQL. Sorry it is so extensive, but I hope it allows you to see the power of PL/SQL if you need to manipulate data based on conditionals.
DECLARE
TYPE result_record IS RECORD
( sid NUMBER
, name VARCHAR2(60)
, email VARCHAR2(60)
, amount NUMBER);
CURSOR c IS select st.student_id sid,
st.name name,
st.email email,
tt.amount amount
from student_table st, trainer_table tt
where st.student_id = tt.student_id;
TYPE results_table IS TABLE OF result_record INDEX BY BINARY_INTEGER;
c_rec c%ROWTYPE;
temp_rec result_record;
results results_table;
lv_index BINARY_INTEGER := 0;
BEGIN
OPEN c;
LOOP
FETCH c INTO c_rec;
EXIT WHEN c%NOTFOUND; -- stop once the cursor is exhausted
temp_rec.sid := c_rec.sid;
temp_rec.name := c_rec.name;
temp_rec.email := c_rec.email;
temp_rec.amount := c_rec.amount;
lv_index := lv_index + 1;
results(lv_index) := temp_rec;
END LOOP;
CLOSE c;
-- Now we can access and modify our table from inside PL/SQL
-- Use PL/SQL logic to make the table output pretty with '$' and conditionals
FOR i IN 1 .. results.COUNT LOOP
dbms_output.put_line(results(i).sid||' $'||results(i).amount); -- example for how to access
-- your code here
END LOOP;
END;
/
As always, I hope this gives you some ideas.
-V

confused on how to properly use %rowtype for many tables

I'm attempting to create a derived table of country data from several other tables. Those tables look something like this:
Countries
ID | Name
Country_demographics
ID | date | Population | urban_pop | birth_rate
country_financials
ID | date | GDP | GDP_per_capita
Now, I'm trying to make a new table with
New_Table
ID | Name | date | population | urban_pop | birth_rate | gdp | gdp_per_capita
I have a stored procedure that currently looks something like this:
CREATE OR REPLACE PROCEDURE SP_COUNTRY (
chunkSize IN INT
) AS
--create tables to hold IDs and stats
TYPE idTable IS TABLE OF COUNTRIES.ID%TYPE;
TYPE dateTable IS TABLE OF COUNTRY_DEMOGRAPHICS.EVALUATION_DATE%TYPE;
TYPE totPopTable IS TABLE OF COUNTRY_DEMOGRAPHICS.POPULATION_TOTAL_COUNT%TYPE;
TYPE urbanPopTable IS TABLE OF COUNTRY_DEMOGRAPHICS.POPULATION_URBAN_COUNT%TYPE;
--constructors
ids idTable;
dates dateTable;
totpop totPopTable;
urbanpop urbanPopTable;
--cursors
CURSOR countryCur IS
SELECT c.ID,cd.EVALUATION_DATE,cd.POPULATION_TOTAL_COUNT,cd.POPULATION_URBAN_COUNT
FROM COUNTRIES c,COUNTRY_DEMOGRAPHICS cd
WHERE c.id=cd.COUNTRY_ID
ORDER BY ID,EVALUATION_DATE;
BEGIN
dbms_output.enable(999999);
--open cursor
OPEN countryCur;
LOOP
--fetch and bulk collect
FETCH countryCur BULK COLLECT INTO ids,dates,totpop,urbanpop
LIMIT chunkSize;
--loop over collections
FOR j in ids.FIRST..ids.LAST
LOOP
--populate record
country.COUNTRY_ID := ids(j);
country.EVALUATION_DATE := dates(j);
country.POPULATION_TOTAL_COUNT := totpop(j);
country.POPULATION_URBAN_COUNT := urbanpop(j);
--update/insert table with record (much confusion here on how to update/insert and check if already exists in derived table..)
UPDATE NEW_TABLE SET ROW = country WHERE COUNTRY_ID = ids(j);
dbms_output.put_line('id: ' || country.COUNTRY_ID || ' date: ' || country.EVALUATION_DATE);
dbms_output.put_line(' pop: ' || country.POPULATION_TOTAL_COUNT || ' urban: ' || country.POPULATION_URBAN_COUNT);
END LOOP;
END LOOP;
--close cursor
CLOSE countryCur;
END;
As you can see, I'm using a different table type for each piece of data. I then plan on making a loop and just inserting/updating in my new_table. I think there must be a better way to do this with %rowtype, or maybe by creating a record and inserting the record? I'm not sure.

Unless I'm missing something by simplifying this, and assuming the rows pair up on equal cd.date and cf.date, this should work:
INSERT INTO NEW_TABLE (ID, Name, date, population, urban_pop, birth_rate, gdp, gdp_per_capita)
select c.id, c.name, cd.date,
cd.population, cd.urban_pop, cd.birth_rate,
cf.gdp, cf.gdp_per_capita
from Countries c, country_demographics cd, country_financials cf
where c.id = cd.id
and cd.id = cf.id
and cd.date = cf.date;
Edit: Use the MERGE statement to update or insert depending on whether the row already exists (assuming here that ID and date together identify a row in NEW_TABLE):
MERGE INTO NEW_TABLE nt
USING ( select c.id, c.name, cd.date,
cd.population, cd.urban_pop, cd.birth_rate,
cf.gdp, cf.gdp_per_capita
from Countries c, country_demographics cd, country_financials cf
where c.id = cd.id
and cd.id = cf.id
and cd.date = cf.date ) a
ON (nt.id = a.id and nt.date = a.date)
WHEN MATCHED THEN
UPDATE SET nt.Name = a.Name,
nt.population = a.population,
nt.urban_pop = a.urban_pop,
nt.birth_rate = a.birth_rate,
nt.gdp = a.gdp,
nt.gdp_per_capita = a.gdp_per_capita
WHEN NOT MATCHED THEN
INSERT (ID, Name, date, population, urban_pop, birth_rate, gdp, gdp_per_capita)
VALUES (a.id, a.Name, a.date, a.population, a.urban_pop, a.birth_rate, a.gdp, a.gdp_per_capita);
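For readers unfamiliar with MERGE, its update-or-insert behaviour can be pictured with a small in-memory sketch. The Python below is only an analogy, not Oracle code: the target table is a dict keyed on a hypothetical (id, date) pair, and the column names and values are invented for illustration.

```python
def merge_rows(target, source_rows, key_cols=("id", "date")):
    """Update rows whose key already exists in target, insert the rest."""
    inserted = updated = 0
    for row in source_rows:
        key = tuple(row[col] for col in key_cols)
        if key in target:
            target[key].update(row)   # WHEN MATCHED THEN UPDATE
            updated += 1
        else:
            target[key] = dict(row)   # WHEN NOT MATCHED THEN INSERT
            inserted += 1
    return inserted, updated


# Invented sample data: one existing row, one changed row, one new row.
new_table = {
    (1, "2020-01-01"): {"id": 1, "date": "2020-01-01", "population": 100, "gdp": 5},
}
source = [
    {"id": 1, "date": "2020-01-01", "population": 110, "gdp": 6},
    {"id": 2, "date": "2020-01-01", "population": 50, "gdp": 2},
]
counts = merge_rows(new_table, source)
print(counts)
```

The point of MERGE is that the database does this matching in a single set-based statement, so no cursor loop or per-column collection type is needed.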

trigger insert different row

The sqlite3 trigger that I want to create might or might not be possible with sql. Five tables are involved:
Members: mId | name | accId
Groups: gId | name | accId
GroupMembers: gId | mId
Accounts: accId | balance
Orders: oId | accId | ammount
When someone deletes a group, I want to make an order for each of the group members with the average of the group balance. So it should do something like this:
CREATE NEW TRIGGER triggername
BEFORE DELETE ON Groups
WHEN ((SELECT balance FROM Accounts WHERE accId=OLD.accId) = 0)
FOR EACH ROW IN
(SELECT accId
FROM GroupMembers JOIN Members ON GroupMembers.mId = Members.mId
WHERE GroupMembers.gId = OLD.gId)
BEGIN
INSERT INTO Orders(accId,ammount) VALUES(accId,
(SELECT balance FROM Accounts WHERE accId = OLD.accId)
/
(SELECT SUM(mId) FROM GroupMembers WHERE gId = OLD.gId)
);
END
The question is: is it possible to create a FOR EACH ROW in any other table than the table at which the trigger applies? Is it possible to put the WHEN statement before the FOR EACH statement?

As documented, there is no such thing as a FOR EACH ROW IN ... clause.
However, the INSERT statement can use a SELECT statement as the source of the data to be inserted.
Write a SELECT statement that returns one row for each group member:
CREATE TRIGGER triggername
AFTER DELETE ON Groups
FOR EACH ROW
BEGIN
INSERT INTO Orders(accId, ammount)
SELECT accId,
(SELECT balance
FROM Accounts
WHERE accId = OLD.accId) /
(SELECT COUNT(*)
FROM GroupMembers
WHERE gId = OLD.gId)
FROM Members
WHERE mId IN (SELECT mId
FROM GroupMembers
WHERE gId = OLD.gId);
END;
(I dropped the balance = 0 filter.)
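A trigger of this shape can be tried out from Python's sqlite3 module against an in-memory database. The sketch below is a test harness under stated assumptions: the column types, the sample rows and the trigger name `split_group_balance` are made up, the Orders column is spelled `ammount` as in the question's schema, and the table name is quoted as "Groups" to avoid any clash with the GROUPS keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounts (accId INTEGER PRIMARY KEY, balance INTEGER);
CREATE TABLE Members (mId INTEGER PRIMARY KEY, name TEXT, accId INTEGER);
CREATE TABLE "Groups" (gId INTEGER PRIMARY KEY, name TEXT, accId INTEGER);
CREATE TABLE GroupMembers (gId INTEGER, mId INTEGER);
CREATE TABLE Orders (oId INTEGER PRIMARY KEY, accId INTEGER, ammount INTEGER);

-- One order per group member, each for an equal share of the group balance.
CREATE TRIGGER split_group_balance
AFTER DELETE ON "Groups"
FOR EACH ROW
BEGIN
    INSERT INTO Orders(accId, ammount)
    SELECT accId,
           (SELECT balance FROM Accounts WHERE accId = OLD.accId) /
           (SELECT COUNT(*) FROM GroupMembers WHERE gId = OLD.gId)
    FROM Members
    WHERE mId IN (SELECT mId FROM GroupMembers WHERE gId = OLD.gId);
END;

-- Hypothetical sample data: one group with balance 90 and three members.
INSERT INTO Accounts VALUES (1, 90), (10, 0), (11, 0), (12, 0);
INSERT INTO Members VALUES (1, 'a', 10), (2, 'b', 11), (3, 'c', 12);
INSERT INTO "Groups" VALUES (1, 'g', 1);
INSERT INTO GroupMembers VALUES (1, 1), (1, 2), (1, 3);
""")

# Deleting the group fires the trigger once, which inserts three orders.
conn.execute('DELETE FROM "Groups" WHERE gId = 1')
orders = conn.execute(
    "SELECT accId, ammount FROM Orders ORDER BY accId").fetchall()
print(orders)
```

Since both operands are integers here, SQLite performs integer division, so a balance that does not divide evenly would be truncated.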