Lua: recursive function builts wrong table - pil4 - recursion

while working on the exercise 2.2 of "programming in Lua 4" I do have to create a function to built all permutations of the numbers 1-8. I decided to use Heaps algorithm und made the following script. I´m testing with numbers 1-3.
In the function I store the permutations as tables {1,2,3} {2,1,3} and so on into local "a" and add them to global "perm". But something runs wrong and at the end of the recursions I get the same permutation on all slots. I can´t figure it out. Please help.
function generateperm (k,a)
if k == 1 then
perm[#perm + 1] = a -- adds recent permutation to table
io.write(table.unpack(a)) -- debug print. it shows last added one
io.write("\n") -- so I can see the algorithm works fine
else
for i=1,k do
generateperm(k-1,a)
if k % 2 == 0 then -- builts a permutation
a[i],a[k] = a[k],a[i]
else
a[1],a[k] = a[k],a[1]
end
end
end
end
--
perm = {}
generateperm(3,{1,2,3}) -- start
--
for k,v in ipairs (perm) do -- prints all stored permutations
for k,v in ipairs(perm[k]) do -- but it´s 6 times {1,2,3}
io.write(v)
end
io.write("\n")
end
debug print:
123
213
312
132
231
321
123
123
123
123
123
123

Related

Will a nest loop help in parsing results

I am trying to pull information from two different dictionaries. (excuse me because I am literally hacking to understand.)
I have a for loop that gives me the vmname. I have another for loop that gives me the other information like 'replicationid'.
I could be doing a very huge assumption here but hey ill start there. what I want to do it to integrate for loop 1 and for loop 2. as so the results are like this, is it even possible?
initial output of for loop1 which I can get:
vma
vmb
vmc
initial output of for loop2 which I can get:
replication job 1
replication job 2
replication job 3
desired results is:
vma
replication job 1
vmb
replication job 2
vmc
replication job 3
def get_replication_job_status():
sms = boto3.client('sms')
resp = sms.get_replication_jobs()
#print(resp)
things = [(cl['replicationJobId'], cl['serverId']) for cl in
resp['replicationJobList']]
thangs = [cl['vmServer'] for cl in resp['replicationJobList']]
for i in thangs:
print()
print("this is vm " + (i['vmName']))
print("this is the vm location " + (i['vmPath']))
print("this is the vm address, " +(str(i['vmServerAddress'])))
for j in things:
print("The Replication ID is : " +(str(j[0])))
again I want:
vma
replication job 1
vmb
replication job 2
vmc
replication job 3
im am getting:
vma
replication job 1
replication job 2
replication job 3
vmb
replication job 1
replication job 2
replication job 3
..
..
..
If you are sure that your lists both have the same length, then what you need is python built-in zip function:
for thing, thang in zip(things, thangs):
print()
print(thing)
print(thang)
But if one of the lists is longer then another then zip will crop both lists to have the same length as the shortest, for example:
>>> for i, j in zip(range(3), range(5)):
... print(i, j)
...
(0, 0)
(1, 1)
(2, 2)
UPD:
You can also unpack your tuples right in for loop definition, so each item (they are 2-tuples) in things list gets saved to two variables:
for (replicationJobId, serverId), thang in zip(things, thangs):
print()
print(replicationJobId)
print(serverId)
print(thang)
UPD 2:
Why do you split resp into to two lists?
def get_replication_job_status():
sms = boto3.client('sms')
resp = sms.get_replication_jobs()
#print(resp)
for replication_job in resp['replicationJobList']:
vm_server = replication_job['vmServer']
print()
print("this is vm:", vm_server['vmName'])
print("this is the vm location:", vm_server['vmPath'])
print("this is the vm address:", vm_server['vmServerAddress'])
print("The Replication ID is :", replication_job['replicationJobId'])

Counting observations using multiple BY groups SAS

I am examining prescription patterns within a large EHR dataset. The data is structured so that we are given several key bits of information, such as patient_num, encounter_num, ordering_date, medication, age_event (age at event) etc. Example below:
Patient_num enc_num ordering_date medication age_event
1111 888888 07NOV2008 Wellbutrin 48
1111 876578 11MAY2011 Bupropion 50
2222 999999 08DEC2009 Amitriptyline 32
2222 999999 08DEC2009 Escitalopram 32
3333 656463 12APR2007 Imipramine 44
3333 643211 21DEC2008 Zoloft 45
3333 543213 02FEB2009 Fluoxetine 45
Currently I have the dataset sorted by patient_id then by ordering_date so that I can see what each individual was prescribed during their encounters in a longitudinal fashion. For now, I am most concerned with the prescription(s) that were made during their first visit. I wrote some code to count the number of prescriptions and had originally restricted later analyses to RX = 1, but as we can see, that doesn't work for people with multiple scripts on the same encounter (Patient 2222).
data pt_meds_;
set pt_meds;
by patient_num;
if first.patient_num then RX = 1;
else RX + 1;
run;
Patient_num enc_num ordering_date medication age_event RX
1111 888888 07NOV2008 Wellbutrin 48 1
1111 876578 11MAY2011 Bupropion 50 2
2222 999999 08DEC2009 Amitriptyline 32 1
2222 999999 08DEC2009 Escitalopram 32 2
3333 656463 12APR2007 Imipramine 44 1
3333 643211 21DEC2008 Zoloft 45 2
3333 543213 02FEB2009 Fluoxetine 45 3
I think it would be more appropriate to recode the encounter numbers into a new variable so that they reflect a style similar to the RX variable. Where each encounter is listed 1-n, and the number will repeat if multiple scripts are made in the same encounter. Such as below:
Patient_num enc_num ordering_date medication age_event RX Enc_
1111 888888 07NOV2008 Wellbutrin 48 1 1
1111 876578 11MAY2011 Bupropion 50 2 2
2222 999999 08DEC2009 Amitriptyline 32 1 1
2222 999999 08DEC2009 Escitalopram 32 2 1
3333 656463 12APR2007 Imipramine 44 1 1
3333 643211 21DEC2008 Zoloft 45 2 2
3333 543213 02FEB2009 Fluoxetine 45 3 3
From what I have seen, this could be possible with a variant of the above code using 2 BY groups (patient_num & enc_num), but I can't seem to get it. I think the first. / last. codes require sorting, but if I am to sort by enc_num, they won't be in chronological order because the encounter numbers are generated by the system and depend on all other encounters going in at that time.
I tried to do the following code (using ordering_date instead because its already sorted properly) but everything under Enc_ is printed as a 1. I'm sure my logic is all wrong. Any thoughts?
data pt_meds_test;
set pt_meds_;
by patient_num ordering_date;
if first.patient_num;
if first.ordering_date then enc_ = 1;
else enc_ + 1;
run;
First
.First/.Last flags doesn't require sorting if data is properly ordered or you use NOTSORTED in your BY statement. If your variable in BY statement is not properly ordered then BY statment will throw error and stop executing when encounter deviations. Like this:
data class;
set sashelp.class;
by age;
first = first.age;
last = last.age;
run;
ERROR: BY variables are not properly sorted on data set SASHELP.CLASS.
Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 FIRST.Age=1 LAST.Age=1 first=. last=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2 observations read from the data set SASHELP.CLASS.
Try this code to see how exacly .first/.last flags works:
data pt_meds_test;
set pt_meds_;
by patient_num ordering_date;
fp = first.patient_num;
lp = last.patient_num;
fo = first.ordering_date;
lo = last.ordering_date;
run;
Second
Those condidions works differently than you think:
if expression;
If expression is true then continue with next instructions after if.
Otherwise return to begining of data step (no implicit output). This also implies your observation is not retained in the output.
In most cases if without then is equivalent to where. However
whereworks faster but it is limited to variables that comes from data set you are reading
if can be used with any type of expression including calculated fields
More info:: IF
Statement, Subsetting
Third
I think lag() function can be your answear.
data pt_meds_test;
set pt_meds_;
by patient_num;
retain enc_;
prev_patient_num = lag(patient_num);
prev_ordering_date = lag(ordering_date);
if first.patient_num then enc_ = 1;
else if patient_num = prev_patient_num and ordering_date ne prev_ordering_date then enc_ + 1;
end;
run;
With lag() function you can look what was the value of vairalbe on the previos observation and compare it with current one later.
But be carefull. lag() doesn't look for variable value from previous observation. It takes vale of variable and stores it in a FIFO queue with size of 1. On next call it retrives stored value from queue and put new value there.
More info: LAG Function
I'm not sure if this hurts the rest of your analysis, but what about just
proc freq data=pt_meds noprint;
tables patient_num ordering_date / out=pt_meds_freq;
run;
data pt_meds_freq2;
set pt_meds_freq;
by patient_num ordering_date;
if first.patient_num;
run;

Hash Table + Binary Search

I'm using an Hash Table to store some values. Here are the details:
There will be roughly 1M items to store (not known before, so no perfect-hash possible).
Table is 10M large.
Hash function is MurMurHash3.
I did some tests and storing 1M values I get 350,000 collisions and 30 elements at the most-colliding hash table's slot.
Are these result good?
Would it make sense to implement Binary Search for lists that get created at colliding hash-table's slots?
What' your advice to improve performances?
EDIT: Here is my code
var
HashList: array [0..10000000 - 1] of Integer;
for I := 0 to High(HashList) do
HashList[I] := 0;
for I := 1 to 1000000 do
begin
Y := MurmurHash3(UIntToStr(I));
Y := Y mod Length(HashList);
Inc(HashList[Y]);
if HashList[Y] > 1 then
Inc(TotalCollisionsCount);
if HashList[Y] > MostCollidingSlotItemCount then
MostCollidingSlotItemCount := HashList[Y];
end;
Writeln('Total: ' + IntToStr(TotalCollisionsCount) + ' Max: ' + IntToStr(MostCollidingSlotItemCount));
Here is the result I get:
Total: 48169 Max: 5
Am I missing something?
This is what you get when you put 1M items randomly into 10M cells
calendar_size=10000000 nperson = 1000000
E/cell| Ncell | frac | Nelem | frac |h/cell| hops | Cumhops
----+---------+--------+----------+--------+------+--------+--------
0: 9048262 (0.904826) 0 (0.000000) 0 0 0
1: 905064 (0.090506) 905064 (0.905064) 1 905064 905064
2: 45136 (0.004514) 90272 (0.090272) 3 135408 1040472
3: 1488 (0.000149) 4464 (0.004464) 6 8928 1049400
4: 50 (0.000005) 200 (0.000200) 10 500 1049900
----+---------+--------+----------+--------+------+--------+--------
5: 10000000 1000000 1.049900 1049900
The left column is the number of items in a cell. The second: the number of cells having this itemcount.
WRT the binary search: it is obvious that for small tables like this (maximum chain length=4, but most chains are of length=1), linear search outperforms binary search. The takeover-point is probably somewhere between 10 and 100.

Stata counting substring

My table looks like this:
ID AQ_ATC amountATC
. "A05" 1
123 "A05AA02" 2525
234 "A05AA02" 2525
991 "A05AD39" 190
. "C10" 1
441 "C10AA11" 4330
229 "C10AA22" 3100
. "C05AA" 1
441 "C05AA03" 130
The count for the full 8-character AQ_ATC codes is already correct.
The shorter codes are unique in the table and are substrings of the complete 8-character codes (they represent the first x characters).
What I am looking for is the count of the appearances of the shorter codes throughout the entire table.
For example in this case the resulting table would be
ID AQ_ATC amountATC
. "A05" 2715 <-- 2525 + 190
123 "A05AA02" 2525
234 "A05AA02" 2525
991 "A05AD39" 190
. "C10" 7430 <-- 4330 + 3100
441 "C10AA11" 4330
229 "C10AA22" 3100
. "C05AA" 130 <-- 130
441 "C05AA03" 130
The partial codes do not overlap, by what I mean that if there is "C05" there wont be another partial code "C05A1".
I created the amountATC column using
bysort ATC: egen amountATC = total(AQ_ATC==AQ_ATC)
I attempted recycling the code that I had received yesterday but failed in doing so.
My attempt looks like this:
levelsof AQ_ATC, local(ATCvals)
quietly foreach y in AQ_ATC {
local i = 0
quietly foreach x of local ATCvals {
if strpos(`y', `"`x'"') == 1{
local i = `i'+1
replace amountATC = `i'
}
}
}
My idea was to use a counter "i" and increase it by 1 everytime the an AQ_ATC starts with another AQ_ATC code. Then I write "i" into amountATC and after I iterated over the entire table for my AQ_ATC, I will have an "i"-value that will be equal to the amount of occurences of the substring. Then I reset "i" to 0 and continue with the next AQ_ATC.
At least thats how I intended for it to work, what it did in the end is set all amountATC-values to 1.
I also attempted looking into different egen-functions such as noccur and moss, but my connection keeps timing out when I attempt to install the packages.
It seems as if you come from another language and you insist in using loops when not strictly necessary. Stata does many things without explicit loops, precisely because commands already apply to all observations.
One way is:
clear
set more off
input ///
ID str15 AQ_ATC amountATC
. "A05" 1
123 "A05AA02" 2525
234 "A05AA02" 2525
991 "A05AD39" 190
. "C10" 1
441 "C10AA11" 4330
229 "C10AA22" 3100
. "C05AA" 1
441 "C05AA03" 130
end
*----- what you want -----
sort AQ_ATC ID
gen grou = sum(missing(ID))
bysort grou AQ_ATC: gen tosum = amountATC if _n == 1 & !missing(ID)
by grou: egen s = total(tosum)
replace amountATC = s if missing(ID)
list, sepby(grou)
Edit
With your edit the same principles apply. Below code that adjusts to your change and slightly changes the code (one line less):
*----- what you want -----
sort AQ_ATC
gen grou = sum(missing(ID))
bysort grou: gen s = sum(amountATC) if AQ_ATC != AQ_ATC[_n+1] & !missing(ID)
by grou: replace amountATC = s[_N] if missing(ID)
More efficient should be:
<snip>
bysort grou: gen s = sum(amountATC) if AQ_ATC != AQ_ATC[_n+1]
by grou: replace amountATC = s[_N] - 1 if missing(ID)
Some comments:
sort is a very handy command. If you sort the data by AQ_ATC they are arranged in such a way that the short (sub)strings are placed before corresponding long strings.
The by: prefix is fundamental and very helpful, and I noticed you can use it after defining appropriate groups. I created the groups taking advantage of the fact that all short (sub)strings have a missing(ID).
Then (by the groups just defined) you only want to add up one value (observation) per amountATC. That's what the condition if AQ_ATC != AQ_ATC[_n+1] does.
Finally, replace back into your original variable. I would usually generate a copy and work with that, so my original variable doesn't suffer.
An excellent read for the by: prefix is Speaking Stata: How to move step by: step, by Nick Cox.
Edit2
Yet another slightly different way:
*----- what you want -----
sort AQ_ATC
gen grou = sum(missing(ID))
egen t = tag(grou AQ_ATC)
bysort grou: gen s = sum(amountATC * t)
by grou: replace amountATC = s[_N] - 1 if missing(ID)

SAS - hash tables and has_next

I'm looking for an elegant solution to the below issue that will help avoid code duplication. You can see that this line:
put auction_id= potential_buyer= ;* THIS GETS REPEATED;
Gets repeated in this code:
data results;
attrib potential_buyer length=$1;
set auction;
if _n_ eq 1 then do;
declare hash ht1(dataset:'buyers', multidata: 'y');
ht1.definekey('auction_id');
ht1.definedata('potential_buyer');
ht1.definedone();
call missing (potential_buyer);
end;
**
** LOOP THROUGH EACH POTENTIAL BUYER AND PROCESS THEM
*;
if ht1.find() eq 0 then do;
put auction_id= potential_buyer= ;* THIS GETS REPEATED;
ht1.has_next(result: ht1_has_more);
do while(ht1_has_more);
rc = ht1.find_next();
put auction_id= potential_buyer= ;* THIS GETS REPEATED;
ht1.has_next(result: ht1_has_more);
end;
end;
run;
I've simplified the above example to a single line as the real code block is quite long and complex. I'd like to avoid using a %macro snippet or a %include if possible as I'd like to keep the logic "within" the data step.
Here's some sample data:
data auction;
input auction_id;
datalines;
111
222
333
;
run;
data buyers;
input auction_id potential_buyer $;
datalines;
111 a
111 c
222 a
222 b
222 c
333 d
;
run;
I figured it out. Turned out to be pretty simple in the end just had a little trouble wrapping my brain around it:
data results;
attrib potential_buyer length=$1;
set auction;
if _n_ eq 1 then do;
declare hash ht1(dataset:'buyers', multidata: 'y');
ht1.definekey('auction_id');
ht1.definedata('potential_buyer');
ht1.definedone();
call missing (potential_buyer);
end;
**
** LOOP THROUGH EACH POTENTIAL BUYER AND PROCESS THEM
*;
if ht1.find() eq 0 then do;
keep_processing = 1;
do while(keep_processing);
put auction_id= potential_buyer= ;* THIS GETS DOESNT GET REPEATED ANYMORE =);
ht1.has_next(result: keep_processing);
rc = ht1.find_next();
end;
end;
run;
You can solve it this way....but Rob's answer is better.
data results;
%Macro NoDuplicate;
Put auction_id= potential_buyer= ; * No Longer Duplicated;
%Mend noduplicate;
attrib potential_buyer length=$1;
set auction;
if _n_ eq 1 then do;
declare hash ht1(dataset:'buyers', multidata: 'y');
ht1.definekey('auction_id');
ht1.definedata('potential_buyer');
ht1.definedone();
call missing (potential_buyer);
end;
**
** LOOP THROUGH EACH POTENTIAL BUYER AND PROCESS THEM
*;
if ht1.find() eq 0 then do;
%NoDuplicate
ht1.has_next(result: ht1_has_more);
do while(ht1_has_more);
rc = ht1.find_next();
%NoDuplicate
ht1.has_next(result: ht1_has_more);
end;
end;
run;

Resources