SAS: PROC SGPLOT BY GROUP auto file names - plot

I am plotting some data using the BY statement. I am able to use the #byval option to put the BY-group value in the title of each plot automatically, but I also want to save each plot individually and name it after #byval instead of SGPLOT01, SGPLOT02, ...
For example, let's say I have:
data xyz;
input type $ x y1 y2 @@;
cards;
A 1 5 7
A 2 7 9
A 3 8 10
B 1 5 7
B 2 7 9
B 3 8 10
;;
RUN;
PROC SGPLOT DATA=xyz;
by type;
series x=x y=y1/markers;
series x=x y=y2/markers;
title "#byval";
RUN;
In this example, two plots are created, one for each type (A and B), but SAS automatically names them SGPLOT1.pdf and SGPLOT2.pdf. I would rather name them A.pdf and B.pdf and save them to the directory "C:/SGPLOTS/".
Thanks for your help.

One option is to use ODS with a macro that prints each TYPE separately, as in the following example.
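data xyz;
input type $ x y1 y2 @@;
cards;
A 1 5 7
A 2 7 9
A 3 8 10
B 1 5 7
B 2 7 9
B 3 8 10
;
RUN;
/* close the LISTING destination while the PDF files are written */
ods listing close;
/* write one PDF per TYPE value, named after that value */
%macro plot_it(type=);
goptions reset
device = sasprtc
target = sasprtc
;
/* in &type..pdf the first period ends the macro variable name, the second starts the extension */
ods pdf file="C:/SGPLOTS/&type..pdf" notoc;
PROC SGPLOT DATA=xyz;
by type;
where type = "&type";
series x=x y=y1/markers;
series x=x y=y2/markers;
title "#byval(type)";
RUN;
ods pdf close;
%mend plot_it;
%plot_it(type=A);
%plot_it(type=B);
/* reopen the LISTING destination */
ods listing;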

You want to add the variable name within parentheses, after #BYVAL. In this example you want to put #byval(type) in your title.
I have put your example within what SAS calls an "HTML sandwich": two lines on top and two lines on the bottom. In addition, I added the helpbrowser option, which tells SAS to use its own capabilities to display the HTML output.
option helpbrowser=sas;
/**** top of html sandwich *****/
ods html ;
ods graphics on;
/*******************************/
data xyz;
input type $ x y1 y2 @@;
cards;
A 1 5 7
A 2 7 9
A 3 8 10
B 1 5 7
B 2 7 9
B 3 8 10
;;
RUN;
PROC SGPLOT DATA=xyz;
by type;
series x=x y=y1/markers;
series x=x y=y2/markers;
title "Here is the type: #byval(type)";
RUN;
/**** bottom of html sandwich *****/
ods graphics off;
ods html close;
/**********************************/

Related

Create graph having nodes with no labels and no separations between its edges with Graphviz

I want to generate this kind of graph with Graphviz:
I tried the following code:
graph{
node [shape=none label=""]
1 [pos="0,0!"]
2 [pos="1.2145,0.694!"]
3 [pos="1.2145,2.082!"]
4 [pos="0.0,2.776!"]
5 [pos="-1.2145,2.082!"]
6 [pos="-1.2145,0.694!"]
1 -- 2
2 -- 3
3 -- 4
4 -- 5
5 -- 6
6 -- 1
}
But I get the following output
Is it possible to make nodes without labels, and edges without any separation between them?
Thanks for your answers.
digraph D {
graph [nodesep=.02]
node [shape=hexagon]
A B C
}
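This gives a row of hexagon-shaped nodes packed close together (nodesep sets the minimum space, in inches, between adjacent nodes in the same rank).
And this, where orientation=30 rotates each hexagon by 30 degrees: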
digraph D {
graph [nodesep=.02]
node [shape=hexagon orientation=30]
A B C
}
Gives this:

Checking for equal values in two different data frames, row by row

I have two different data frames; one is 5.5 MB and the other is 25 GB. I want to check whether these two data frames have the same values in two particular columns for each row.
For example:
x 0 0 a
x 1 2 b
y 1 2 c
z 3 4 d
and
x 0 0 w
x 1 2 m
y 5 6 p
z 8 9 q
I want to check whether the 2nd and 3rd columns are equal for each row; if they are, I return the 4th column of both data frames. Then I should get:
a w
b m
c m
Both data frames are sorted by the values of the 2nd and 3rd columns. I tried R, but the 2nd file (25 GB) is too big. How can I produce this new file in a "faster" way (even a few hours would be fine)?
With GNU awk for arrays of arrays:
$ cat tst.awk
NR==FNR { a[$2,$3][$4]; next }
($2,$3) in a {
for (val in a[$2,$3]) {
print val, $4
}
}
$ awk -f tst.awk small_file large_file
a w
b m
c m
and with any awk (a bit less efficiently):
$ cat tst.awk
NR==FNR { a[$2,$3] = a[$2,$3] FS $4; next }
($2,$3) in a {
split(a[$2,$3],vals)
for (i in vals) {
print vals[i], $4
}
}
$ awk -f tst.awk small_file large_file
a w
b m
c m
When reading small_file (NR==FNR is only true for the first file read; look up those variables in the awk man page or google), the script above creates an associative array a[] that maps an index built from the 2nd and 3rd fields to the list of values of the 4th field for that 2nd/3rd field combination. Then, when reading large_file, it looks up that array for the current 2nd/3rd field combination and loops through all of the values stored for that combination in the first phase, printing each stored value (the $4 from small_file) followed by the current $4.
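For readers more at home outside awk, here is a minimal Python sketch of the same two-phase idea: index the small file in memory, then stream the large file once. It is an illustration, not a replacement for the awk scripts; the function name join_on_cols_2_3 is hypothetical, and it assumes whitespace-separated lines with at least four fields.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def join_on_cols_2_3(small_lines: Iterable[str], large_lines: Iterable[str]) -> Iterator[str]:
    """Pair field 4 of both inputs wherever fields 2 and 3 match (same idea as tst.awk)."""
    # Phase 1 (the NR==FNR block in awk): key (field2, field3) -> list of field-4 values.
    index = defaultdict(list)
    for line in small_lines:
        f = line.split()
        if len(f) >= 4:
            index[(f[1], f[2])].append(f[3])
    # Phase 2: stream the large input once; emit every stored value with the current field 4.
    for line in large_lines:
        f = line.split()
        if len(f) >= 4:
            for val in index.get((f[1], f[2]), ()):
                yield f"{val} {f[3]}"


small = ["x 0 0 a", "x 1 2 b", "y 1 2 c", "z 3 4 d"]
large = ["x 0 0 w", "x 1 2 m", "y 5 6 p", "z 8 9 q"]
for out in join_on_cols_2_3(small, large):
    print(out)
```

With real files you would pass open file objects instead of lists, e.g. `with open("small_file") as s, open("large_file") as l:`, so only the small file is held in memory.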
You said your small file is 5.5 MB and the large file is 25 GB. Since 1 MB is 1,048,576 bytes, i.e. about that many characters (see https://www.computerhope.com/issues/chspace.htm), and each of your lines is about 8 characters long including the newline, your small file is about 720 thousand lines long and your large one about 3.4 billion lines long. The small file's array easily fits in memory and the large file is read only once, sequentially, so I expect the run time on an average powered computer to be dominated by reading the 25 GB from disk: minutes rather than hours.
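The line-count estimate is simple arithmetic. A quick sketch, assuming binary units (1 MB = 1,048,576 bytes, 1 GB = 1,073,741,824 bytes) and 8 bytes per line as in the sample rows; real lines may well be longer, which would lower both counts:

```python
BYTES_PER_MB = 1024 ** 2   # binary megabyte
BYTES_PER_GB = 1024 ** 3   # binary gigabyte
LINE_LEN = 8               # assumption: "x 0 0 a" is 7 characters plus a newline


def estimated_lines(size_bytes: int, line_len: int = LINE_LEN) -> int:
    """Rough line count: file size divided by the assumed line length."""
    return size_bytes // line_len


small = estimated_lines(int(5.5 * BYTES_PER_MB))
large = estimated_lines(25 * BYTES_PER_GB)
print(f"small file: ~{small:,} lines")   # ~720,896 lines
print(f"large file: ~{large:,} lines")   # ~3,355,443,200 lines
```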
An alternative to the solution of Ed Morton, but with an identical idea:
$ cat tst.awk
NR==FNR { a[$2,$3] = a[$2,$3] $4 ORS; next }
($2,$3) in a {
s=a[$2,$3]; gsub(ORS,OFS $4 ORS,s)
printf "%s",s;
}
$ awk -f tst.awk small_file large_file
a w
b m
c m

Counting observations using multiple BY groups SAS

I am examining prescription patterns within a large EHR dataset. The data is structured so that we are given several key bits of information, such as patient_num, encounter_num, ordering_date, medication, age_event (age at event) etc. Example below:
Patient_num enc_num ordering_date medication age_event
1111 888888 07NOV2008 Wellbutrin 48
1111 876578 11MAY2011 Bupropion 50
2222 999999 08DEC2009 Amitriptyline 32
2222 999999 08DEC2009 Escitalopram 32
3333 656463 12APR2007 Imipramine 44
3333 643211 21DEC2008 Zoloft 45
3333 543213 02FEB2009 Fluoxetine 45
Currently I have the dataset sorted by patient_id then by ordering_date so that I can see what each individual was prescribed during their encounters in a longitudinal fashion. For now, I am most concerned with the prescription(s) that were made during their first visit. I wrote some code to count the number of prescriptions and had originally restricted later analyses to RX = 1, but as we can see, that doesn't work for people with multiple scripts on the same encounter (Patient 2222).
data pt_meds_;
set pt_meds;
by patient_num;
if first.patient_num then RX = 1;
else RX + 1;
run;
Patient_num enc_num ordering_date medication age_event RX
1111 888888 07NOV2008 Wellbutrin 48 1
1111 876578 11MAY2011 Bupropion 50 2
2222 999999 08DEC2009 Amitriptyline 32 1
2222 999999 08DEC2009 Escitalopram 32 2
3333 656463 12APR2007 Imipramine 44 1
3333 643211 21DEC2008 Zoloft 45 2
3333 543213 02FEB2009 Fluoxetine 45 3
I think it would be more appropriate to recode the encounter numbers into a new variable in a style similar to the RX variable: each encounter is numbered 1 to n, and the number repeats if multiple scripts are written in the same encounter, as below:
Patient_num enc_num ordering_date medication age_event RX Enc_
1111 888888 07NOV2008 Wellbutrin 48 1 1
1111 876578 11MAY2011 Bupropion 50 2 2
2222 999999 08DEC2009 Amitriptyline 32 1 1
2222 999999 08DEC2009 Escitalopram 32 2 1
3333 656463 12APR2007 Imipramine 44 1 1
3333 643211 21DEC2008 Zoloft 45 2 2
3333 543213 02FEB2009 Fluoxetine 45 3 3
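The rule behind the desired Enc_ column does not need any sorting by enc_num: within a patient, start at 1 and add one whenever enc_num differs from the previous row's. A minimal Python sketch of that rule (Python is used only for illustration; number_encounters is a hypothetical name), assuming the rows are already ordered by patient_num and ordering_date:

```python
def number_encounters(rows):
    """Return Enc_ for each row: 1..n within a patient, repeated while enc_num stays the same.

    rows: (patient_num, enc_num) pairs, already ordered by patient_num and ordering_date.
    """
    result = []
    prev_patient = prev_enc = None
    enc = 0
    for patient, enc_num in rows:
        if patient != prev_patient:
            enc = 1              # first row of a new patient
        elif enc_num != prev_enc:
            enc += 1             # same patient, new encounter
        result.append(enc)       # same encounter: the value repeats
        prev_patient, prev_enc = patient, enc_num
    return result


rows = [(1111, 888888), (1111, 876578), (2222, 999999), (2222, 999999),
        (3333, 656463), (3333, 643211), (3333, 543213)]
print(number_encounters(rows))  # [1, 2, 1, 1, 1, 2, 3]
```

Because it only compares consecutive rows, this is the same idea as a DATA step with BY patient_num enc_num NOTSORTED, incrementing the counter on first.enc_num.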
From what I have seen, this might be possible with a variant of the above code using two BY variables (patient_num and enc_num), but I can't get it to work. I think the first./last. flags require sorting, but if I sort by enc_num, the records won't be in chronological order, because the encounter numbers are generated by the system and depend on all the other encounters being entered at that time.
I tried the following code (using ordering_date instead, because it's already sorted properly), but everything under Enc_ comes out as 1. I'm sure my logic is wrong. Any thoughts?
data pt_meds_test;
set pt_meds_;
by patient_num ordering_date;
if first.patient_num;
if first.ordering_date then enc_ = 1;
else enc_ + 1;
run;
First
FIRST./LAST. flags don't require sorting if the data are already properly ordered or if you use NOTSORTED in your BY statement. If the values of a BY variable are not properly ordered (and NOTSORTED is not used), the step throws an error and stops when it encounters the first out-of-order observation, like this:
data class;
set sashelp.class;
by age;
first = first.age;
last = last.age;
run;
ERROR: BY variables are not properly sorted on data set SASHELP.CLASS.
Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 FIRST.Age=1 LAST.Age=1 first=. last=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2 observations read from the data set SASHELP.CLASS.
Try this code to see exactly how the FIRST./LAST. flags work:
data pt_meds_test;
set pt_meds_;
by patient_num ordering_date;
fp = first.patient_num;
lp = last.patient_num;
fo = first.ordering_date;
lo = last.ordering_date;
run;
Second
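Those conditions work differently than you think:
if expression;
If the expression is true, execution continues with the statements after the IF.
Otherwise, SAS returns to the beginning of the DATA step without any implicit output, so the current observation is not written to the output data set. In your code, if first.patient_num; therefore keeps only the first row of each patient, and for that row first.ordering_date is also 1, so enc_ is always set to 1.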
In most cases, IF without THEN is equivalent to WHERE. However:
WHERE works faster, but it is limited to variables that come from the data set you are reading;
IF can be used with any type of expression, including calculated variables.
More info: IF Statement, Subsetting
Third
I think the LAG() function can be your answer.
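data pt_meds_test;
set pt_meds_;
by patient_num;
retain enc_;
/* call LAG on every observation so each queue always holds the previous row's value */
prev_patient_num = lag(patient_num);
prev_ordering_date = lag(ordering_date);
if first.patient_num then enc_ = 1;
/* same patient, new ordering date: next encounter (sum statement) */
else if patient_num = prev_patient_num and ordering_date ne prev_ordering_date then enc_ + 1;
drop prev_patient_num prev_ordering_date;
run;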
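With the LAG() function you can get the value a variable had on the previous observation and compare it with the current one.
But be careful: LAG() doesn't look up the variable's value on the previous observation. Each LAG call keeps its own FIFO queue of length 1: it returns the value stored by the previous execution of that call and stores the current value in its place. If LAG runs only conditionally (for example inside an IF-THEN), it returns the value from the last time it actually ran, which may not be the previous observation.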
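The queue behaviour is easiest to see in a tiny model. This is a Python illustration of the semantics just described, not SAS itself; the class Lag is hypothetical, and None stands in for the missing value LAG returns on its first execution:

```python
class Lag:
    """Model of one SAS LAG() call site: a FIFO queue of length 1.

    Each call returns the value stored by the previous call and stores the new value.
    """

    def __init__(self):
        self._stored = None  # stands in for the missing value returned the first time

    def __call__(self, value):
        previous, self._stored = self._stored, value
        return previous


values = [10, 20, 30, 40]

lag_every_row = Lag()
every_row = [lag_every_row(v) for v in values]

lag_some_rows = Lag()
some_rows = [lag_some_rows(v) if v != 30 else "skipped" for v in values]

print(every_row)  # [None, 10, 20, 30]
print(some_rows)  # [None, 10, 'skipped', 20]  <- row 4 gets 20, not 30
```

This is why the code above calls lag() on every observation before the IF-THEN/ELSE, instead of inside it.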
More info: LAG Function
I'm not sure if this hurts the rest of your analysis, but what about just:
proc freq data=pt_meds noprint;
/* crossed table: one row per patient and ordering date, COUNT = number of scripts */
tables patient_num*ordering_date / out=pt_meds_freq;
run;
data pt_meds_freq2;
set pt_meds_freq;
by patient_num ordering_date;
if first.patient_num;
run;
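The idea of this answer is: count scripts per patient and date, then keep each patient's earliest date. A Python model of that logic (illustration only; first_visit_counts is a hypothetical name), assuming ordering_date holds real dates so that sorting is chronological, as it is for numeric SAS date values:

```python
from collections import Counter
from datetime import date


def first_visit_counts(rows):
    """rows: (patient_num, ordering_date) pairs.

    Returns {patient_num: (first ordering_date, number of scripts on that date)}.
    """
    counts = Counter(rows)  # one entry per (patient, date) with its count, like the OUT= data set
    first = {}
    for (patient, day), n in sorted(counts.items()):  # ordered by patient, then date
        first.setdefault(patient, (day, n))           # keep only the first date per patient
    return first


rows = [(1111, date(2008, 11, 7)), (1111, date(2011, 5, 11)),
        (2222, date(2009, 12, 8)), (2222, date(2009, 12, 8)),
        (3333, date(2007, 4, 12)), (3333, date(2008, 12, 21)),
        (3333, date(2009, 2, 2))]
for patient, (day, n) in first_visit_counts(rows).items():
    print(patient, day.isoformat(), n)
```

Patient 2222 comes out with a count of 2 for the first visit, which is exactly the case the RX = 1 restriction missed.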

Gnuplot - plotting label with different lines in hypertext

I have the following text file:
0 0 net0 aaaa bbbb cccc
1 1 net1 zzz
2 2 net2 xxx
3 3 net3 yyy
4 5 net4 ttt 0 0
5 5 net5
I need to plot all points described by the first two columns as x,y coordinates and anchor the information in the following columns (say 3:6) at each point. This information has to be shown separated by newlines; e.g. the point at (0,0) should show (on mouse-over):
net0
aaaa
bbbb
cccc
The script I'm using is the following, but it only works with three columns:
set terminal canvas enhanced mousing
set termoption enhanced
set label at 0,0 "Origin"
set title 'mouse over points'
plot 'test.txt' using 1:2:3 with labels hypertext point pt 7 ps var lc rgb "black"
It seems that the "using" datafile modifier only works with three entries.
Any help?
It is unfortunate that your text is not enclosed in double quotes. Nevertheless, you can solve this with gnuplot alone, without any external tool.
You didn't specify whether your data columns are separated by TAB or by space.
In the following, I assume that they are separated by a single space (the code needs to be adapted if not).
Procedure:
Read each data line as a whole by setting set datafile separator "\n";
extract the two numbers with word();
take the rest of the line as the label text;
replace the spaces by '\n'.
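The same steps can be prototyped outside gnuplot to check which label each point should get. A minimal Python sketch (illustration only; point_and_label is a hypothetical name), under the same single-space-separator assumption:

```python
def point_and_label(line):
    """Split one data line into x, y and a multi-line hypertext label.

    Assumes single-space-separated fields, as the gnuplot answer does.
    """
    words = line.rstrip("\n").split(" ")
    x, y = float(words[0]), float(words[1])  # like GetNumber(1) and GetNumber(2)
    label = "\n".join(words[2:])             # rest of the line, one word per line
    return x, y, label


data = """0 0 net0 aaaa bbbb cccc
1 1 net1 zzz
4 5 net4 ttt 0 0
5 5 net5"""
for line in data.splitlines():
    print(point_and_label(line))
```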
Code:
### Hypertext with columns
reset session
$Data <<EOD
0 0 net0 aaaa bbbb cccc
1 1 net1 zzz
2 2 net2 xxx
3 3 net3 yyy
4 5 net4 ttt 0 0
5 5 net5
EOD
# replace function
# replaces string s1 by string s2 in string s
Replace(s,s1,s2) = (RP_s="", RP_n=1, (sum[RP_i=1:strlen(s)] \
((s[RP_n:RP_n+strlen(s1)-1] eq s1 ? (RP_s=RP_s.s2, RP_n=RP_n+strlen(s1)) : \
(RP_s=RP_s.s[RP_n:RP_n], RP_n=RP_n+1)), 0)), RP_s)
set datafile separator "\n"
GetNumber(n) = real(word(strcol(1),n))
GetText(s) = (s[strstrt(s," ")+1:])[strstrt(s[strstrt(s," ")+1:]," ")+1:]
TextToColumn(s) = Replace(GetText(s),' ','\n')
plot $Data u (GetNumber(1)):(GetNumber(2)):(TextToColumn(strcol(1))) w labels hypertext \
point pt 7 ps 3 lc rgb "red" notitle
### end of code
Result:

AutoIt3: how to select the n-th matched control?

When I use AutoIt's WinGetText to get a window's text and the window contains multiple controls (here, the window of class SciCalc), WinGetText concatenates the text of all the controls. How can I get the text of the n-th control (say, the 3rd one, 'MR')?
e.g.
Local $output = WinGetText("[CLASS:SciCalc]", "")
Printing $output gives:
666666.
MC
MR
MS
M+
7
4
1
0
8
5
2
+/-
9
6
3
.
/
*
-
+
=
Backspace
CE
C
1/x
sqt
%
Something like this:
ControlGetText("[CLASS:SciCalc]","","[CLASS:Button; INSTANCE:3]")
Use the AutoIt Window Info tool to find the Advanced Mode details (such as CLASS and INSTANCE) of the control you want.
