proc gplot Spaghetti Plot on more than 255 subjects - graph

Hi, I have a question about making a spaghetti plot. I don't want each subject to have a different symbol or color; I just need each one drawn as a black segmented line. I have done this successfully with fewer subjects by creating the same SYMBOL statement for every subject and using GPLOT, but with more than 255 subjects SAS complains that I can't have more than 255 symbols. Is there a way to do this?
data _null_;
set ptdata&trtn. end=eof;
retain patcount 0;
by usubjid;
if first.usubjid then patcount+1;
if last.usubjid then lastgfr='Y';
call symputx(cats('sym', patcount),
catx(' ', cats('symbol', patcount), 'c=black v=Dot i=join line=1 width=1;'));
if eof then call symputx('total', patcount);
run;
%macro symbol;
%do j=1 %to &total;
&&sym&j
%end;
%mend symbol;
%symbol
proc gplot data = ptdata&trtn. ;
plot change_since_bl*FUPTIME=usubjid /haxis=axis3 vaxis=axis4 href=0 nolegend;
format change_since_bl 8.;
run ;

I would use PROC SGPLOT instead. It is not limited to 255 like GPLOT, and it is easier to use.
Try this:
data test;
do person=1 to 256;
value = 100;
do time=0 to 10;
value = value + rannor(1);
output;
end;
end;
run;
proc sgplot data=test noautolegend;
series x=time y=value / group=person lineattrs=(color=black pattern=dash) ;
run;
I think this is what you are looking for.


SAS: Can you save the input table of a SAS generated bar-line chart?

I am generating a bar-line chart in SAS from a dataset that looks like this:
id date default var1 log_var1 square_var1 ... cubic_var1
1 1 1 5 -3.3 0.9 1.2
1 2 0 15 -9.9 2.7 3.6
2 1 1 10 -6.6 1.8 2.4
...
Note that the transformations are not simply log(var1) but the fitted transformation from the regression, so
log_var1 = alpha + beta*log(var1)
Now I use the following code, generated by the SAS task for bar-line chart:
SYMBOL1
INTERPOL=JOIN
HEIGHT=10pt
VALUE=SQUARE
LINE=1
WIDTH=2
CI=WHITE
CV = _STYLE_
;
SYMBOL2
INTERPOL=JOIN
HEIGHT=10pt
VALUE=SQUARE
LINE=1
WIDTH=2
CV = _STYLE_
;
SYMBOL3
INTERPOL=JOIN
HEIGHT=10pt
VALUE=SQUARE
LINE=1
WIDTH=2
CV = _STYLE_
;
SYMBOL4
INTERPOL=JOIN
HEIGHT=10pt
VALUE=SQUARE
LINE=1
WIDTH=2
CV = _STYLE_
;
SYMBOL5
INTERPOL=JOIN
HEIGHT=10pt
VALUE=SQUARE
LINE=1
WIDTH=2
CV = _STYLE_
;
SYMBOL6
INTERPOL=JOIN
HEIGHT=10pt
VALUE=SQUARE
LINE=1
WIDTH=2
CI=WHITE
CV = _STYLE_
;
Legend2
FRAME
;
Legend1
FRAME
;
Axis1
STYLE=1
WIDTH=1
MINOR=NONE
;
Axis2
STYLE=1
WIDTH=1
;
Axis3
STYLE=1
WIDTH=1
MINOR=NONE
;
TITLE;
TITLE1 "Bar-Line Chart";
FOOTNOTE;
FOOTNOTE1 "Generated by the SAS System (&_SASSERVERNAME, &SYSSCPL) on %TRIM(%QSYSFUNC(DATE(), NLDATE20.)) at %TRIM(%SYSFUNC(TIME(), TIMEAMPM12.))";
PROC GBARLINE DATA=WORK.SORTTempTableSorted
;
BAR var1
/
FRAME LEVELS=25
COUTLINE=BLACK
RAXIS=AXIS1
MAXIS=AXIS2
LEGEND=LEGEND2
;
PLOT / SUMVAR=default
TYPE=MEAN
AXIS=AXIS3
LEGEND=LEGEND1
;
PLOT / SUMVAR=lin_var1
TYPE=MEAN
AXIS=AXIS3
;
PLOT / SUMVAR=sigmoid_var1
TYPE=MEAN
AXIS=AXIS3
;
PLOT / SUMVAR=square_var1
TYPE=MEAN
AXIS=AXIS3
;
PLOT / SUMVAR=cubic_var1
TYPE=MEAN
AXIS=AXIS3
;
PLOT / SUMVAR=log_var1
TYPE=MEAN
AXIS=AXIS3
;
/* -------------------------------------------------------------------
End of task code
------------------------------------------------------------------- */
RUN; QUIT;
%_eg_conditional_dropds(WORK.SORTTempTableSorted);
TITLE; FOOTNOTE;
GOPTIONS RESET = SYMBOL;
My question is:
Can I somehow store or save the input used to create this histogram?
I.e. a table that contains the mean values of default, var1, square_var1 and cubic_var1 for the 25 equally spaced bins?
The reason is that the inputs are all on different scales, so I'd like to standardise them and then plot the graphs.
Note: I could take the time to code up the binning myself, but I'm hoping for a lazy programmer's trick!
There is no option in the GBARLINE procedure for outputting the plotting parameters it computes. Your default graphics options probably create a PNG image that an HTML page uses to present the chart.
Change the graphics device to SVG and ODS will create HTML source that contains the drawing instructions for the image you see. The instructions will be in the <g> tag. So, if you are truly motivated to be lazy and not hand-code the midpoints and axis values, you can write code to parse the HTML and scrape the computed midpoints and axis ticks from within the <g> tag.
ods html5 file="c:\temp\gbarline.html";
goptions reset=all;
goptions device=svg;
… gbarline …
ods html5 close;
… parse the ODS created c:\temp\gbarline.html …
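If you do end up coding the binning yourself, the table you describe is just a per-bin count and mean. Below is a minimal Python sketch of that computation. It assumes simple equal-width bins between the minimum and maximum of the chart variable, which may not match the midpoints GBARLINE actually picks for LEVELS=25; the function name bin_means and the list-of-dicts input are hypothetical.

```python
from collections import defaultdict


def bin_means(rows, bin_var, value_vars, n_bins=25):
    """Group rows into n_bins equal-width bins of bin_var (an assumption, not
    GBARLINE's own midpoint algorithm) and return, for each non-empty bin,
    its midpoint, row count and the mean of every variable in value_vars."""
    xs = [r[bin_var] for r in rows]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all values are equal
    counts = defaultdict(int)
    sums = defaultdict(float)
    for r in rows:
        # Bin index; the maximum value is placed in the last bin.
        k = min(int((r[bin_var] - lo) / width), n_bins - 1)
        counts[k] += 1
        for v in value_vars:
            sums[k, v] += r[v]
    table = []
    for k in sorted(counts):
        row = {"midpoint": lo + (k + 0.5) * width, "count": counts[k]}
        for v in value_vars:
            row["mean_" + v] = sums[k, v] / counts[k]
        table.append(row)
    return table
```

Each row of the result could then be standardised or written out as the plotting input table.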

Cannot modify specific variable values over a specific dimension in netcdf

I have a netcdf file containing 4-D variables:
variables:
double maxvegetfrac(time_counter, veget, lat, lon) ;
maxvegetfrac:_FillValue = 1.00000002004088e+20 ;
maxvegetfrac:history = "From Topo.115MaCTRL_WAM_360_180" ;
maxvegetfrac:long_name = "Vegetation types" ;
maxvegetfrac:missing_value = 1.e+20f ;
maxvegetfrac:name = "maxvegetfrac" ;
maxvegetfrac:units = "-" ;
double mask_veget(time_counter, veget, lat, lon) ;
mask_veget:missing_value = -1.e+34 ;
mask_veget:_FillValue = -1.e+34 ;
mask_veget:long_name = "IF MYVEG4 EQ 10 AND I GE 610 AND J GT 286 THEN 16 ELSE MYVEG4" ;
mask_veget:history = "From desert_115Ma_3" ;
I'd like to use the variable "mask_veget" as a mask to alter values of the variable "maxvegetfrac" over specific regions, and over chosen values of its "veget" dimension.
To do so I am using ncap2. For example, if I want to set maxvegetfrac values over the 5th rank of the veget dimension to 500 where mask_veget equals 6, I do:
> ncap2 -s "where (mask_veget(:,:,:,:)== 6) maxvegetfrac(:,5,:,:) = 500" test.nc
My problem is that in the resulting test.nc file, maxvegetfrac has been modified at the first rank of "veget" dimension, not the 5th one. And I get the same result if I run the script over the entire veget dimension:
ncap2 -s "where (mask_veget(:,:,:,:)== 6) maxvegetfrac(:,:,:,:) = 500" test.nc
So I am making a mistake somewhere, but where?
Any help appreciated!
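A couple of things you may not be aware of:
you shouldn't hyperslab a variable in the body of a where statement; it makes no sense at the moment.
It is OK to hyperslab in the where condition itself, provided it's a single index,
as a dimension with a single value collapses.
Also note that ncap2 hyperslab indices are 0-based, so index 5 selects the sixth element of veget.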
Try this:
/*** hyper.nco *****/
maxvegetfrac5=maxvegetfrac(:,5,:,:);
where( mask_veget(:,5,:,:)== 6 )
maxvegetfrac5=500.0;
/* put the hyperslab back in */
maxvegetfrac(:,5,:,:)=maxvegetfrac5;
/* script end *****/
Now run the script with the command:
ncap2 -v -O -S hyper.nco test.nc out.nc
...Henry
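To make the extract / mask / write-back logic of the script explicit, here is a plain-Python sketch using nested lists in (time_counter, veget, lat, lon) order. It is only an illustration of the idea, not NCO code, and the function name set_where_mask is hypothetical. Like ncap2, it uses 0-based indices, and it tests the mask on the same veget slice that it modifies.

```python
import copy


def set_where_mask(data, mask, veget_index, mask_value, new_value):
    """Return a copy of the 4-D nested list `data`, indexed (time_counter, veget, lat, lon),
    with data[t][veget_index][i][j] set to new_value wherever
    mask[t][veget_index][i][j] == mask_value. Other veget slices are left untouched."""
    out = copy.deepcopy(data)
    for t in range(len(out)):
        # Extract a single-index hyperslab; the veget dimension collapses to (lat, lon).
        slab = [row[:] for row in out[t][veget_index]]
        mask_slab = mask[t][veget_index]  # test the mask on the same veget slice
        for i, row in enumerate(slab):
            for j in range(len(row)):
                if mask_slab[i][j] == mask_value:
                    row[j] = new_value
        # Put the modified hyperslab back, as the ncap2 script does.
        out[t][veget_index] = slab
    return out
```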

awk count and sum based on slab:

I would like to extract all the lines from the first file (gzipped, i.e. Input.csv.gz) whose 4th field falls within a range given by the first field (StartRange) and second field (EndRange) of the second file (Slab.csv), and then produce, per slab, the count of rows and the sums of the 4th and 5th fields of the first file.
Input.csv.gz (GunZip)
Desc,Date,Zone,Duration,Calls
AB,01-06-2014,XYZ,450,3
AB,01-06-2014,XYZ,642,3
AB,01-06-2014,XYZ,0,0
AB,01-06-2014,XYZ,205,3
AB,01-06-2014,XYZ,98,1
AB,01-06-2014,XYZ,455,1
AB,01-06-2014,XYZ,120,1
AB,01-06-2014,XYZ,0,0
AB,01-06-2014,XYZ,193,1
AB,01-06-2014,XYZ,0,0
AB,01-06-2014,XYZ,161,2
Slab.csv
StartRange,EndRange
0,0
1,10
11,100
101,200
201,300
301,400
401,500
501,10000
Expected Output:
StartRange,EndRange,Count,Sum-4,Sum-5
0,0,3,0,0
1,10,NotFound,NotFound,NotFound
11,100,1,98,1
101,200,3,474,4
201,300,1,205,3
301,400,NotFound,NotFound,NotFound
401,500,2,905,4
501,10000,1,642,3
I am using the two commands below to get the above output, except for the "NotFound" cases.
awk -F, 'NR==FNR{s[NR]=$1;e[NR]=$2;c[NR]=$0;n++;next} {for(i=1;i<=n;i++) if($4>=s[i]&&$4<=e[i]) {print $0,","c[i];break}}' Slab.csv <(gzip -dc Input.csv.gz) >Op_step1.csv
cat Op_step1.csv | awk -F, '{key=$6","$7;++a[key];b[key]=b[key]+$4;c[key]=c[key]+$5} END{for(i in a)print i","a[i]","b[i]","c[i]}' >Op_step2.csv
Op_step2.csv
101,200,3,474,4
501,10000,1,642,3
0,0,3,0,0
401,500,2,905,4
11,100,1,98,1
201,300,1,205,3
Any suggestions for making this a one-liner that achieves the expected output? I don't have Perl or Python access.
Here is another option using Perl, which takes advantage of multi-dimensional arrays and hashes.
perl -F, -lane'
BEGIN {
$x = pop;
## Create array of arrays from start and end ranges
## @range = ( [0,0] , [1,10] ... )
(undef, @range)= map { chomp; [split /,/] } <>;
@ARGV = $x;
}
## Skip the first line
next if $. ==1;
## Create hash of hash
## %line = ( "0 0" => { "count" => counts , "sum4" => sum_of_col4 , "sum5" => sum_of_col5 } )
for (@range) {
if ($F[3] >= $_->[0] && $F[3] <= $_->[1]) {
$line{"@$_"}{"count"}++;
$line{"@$_"}{"sum4"} +=$F[3];
$line{"@$_"}{"sum5"} +=$F[4];
}
}
}{
print "StartRange,EndRange,Count,Sum-4,Sum-5";
print join ",", @$_,
$line{"@$_"}{"count"} //"NotFound",
$line{"@$_"}{"sum4"} //"NotFound",
$line{"@$_"}{"sum5"} //"NotFound"
for @range
' slab input
StartRange,EndRange,Count,Sum-4,Sum-5
0,0,3,0,0
1,10,NotFound,NotFound,NotFound
11,100,1,98,1
101,200,3,474,4
201,300,1,205,3
301,400,NotFound,NotFound,NotFound
401,500,2,905,4
501,10000,1,642,3
Here is one way using awk and sort:
awk '
BEGIN {
FS = OFS = SUBSEP = ",";
print "StartRange,EndRange,Count,Sum-4,Sum-5"
}
FNR == 1 { next }
NR == FNR {
ranges[$1,$2]++;
next
}
{
for (range in ranges) {
split(range, tmp, SUBSEP);
if ($4 >= tmp[1] && $4 <= tmp[2]) {
count[range]++;
sum4[range]+=$4;
sum5[range]+=$5;
next
}
}
}
END {
for(range in ranges)
print range, ((range in count) ? count[range] : "NotFound"), ((range in count) ? sum4[range] : "NotFound"), ((range in count) ? sum5[range] : "NotFound") | "sort -t, -nk1,2"
}' slab input
StartRange,EndRange,Count,Sum-4,Sum-5
0,0,3,0,0
1,10,NotFound,NotFound,NotFound
11,100,1,98,1
101,200,3,474,4
201,300,1,205,3
301,400,NotFound,NotFound,NotFound
401,500,2,905,4
501,10000,1,642,3
Set the input and output field separators and SUBSEP to a comma, and print the header line.
Skip the first (header) line of each file.
Load the ranges from the slab file into an array called ranges.
For every range in the ranges array, split the key to get the start and end of the range. If the 4th column falls in the range, increment the count array and add the values to the sum4 and sum5 arrays.
In the END block, iterate through the ranges and print them, using NotFound for any range that no row fell into.
Pipe the output to sort to get the ranges in numeric order.
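Since the question rules out Python, the following is only a reference for cross-checking results, not a replacement for the awk one-liner: a minimal Python sketch of the same slab logic. It uses a membership test rather than a truthiness test to choose between a real result and NotFound, which is what matters for the 0,0 slab whose sums are 0. The function name slab_summary is hypothetical.

```python
import csv


def slab_summary(slab_lines, input_lines):
    """Return output lines StartRange,EndRange,Count,Sum-4,Sum-5 for each slab,
    with NotFound for slabs that no input row falls into."""
    slab_rows = list(csv.reader(slab_lines))[1:]  # skip the header line
    ranges = [(int(s), int(e)) for s, e in slab_rows]
    stats = {}
    for rec in list(csv.reader(input_lines))[1:]:  # skip the header line
        duration, calls = int(rec[3]), int(rec[4])
        for r in ranges:
            if r[0] <= duration <= r[1]:
                n, s4, s5 = stats.get(r, (0, 0, 0))
                stats[r] = (n + 1, s4 + duration, s5 + calls)
                break  # first matching slab only, like the awk `next`
    out = ["StartRange,EndRange,Count,Sum-4,Sum-5"]
    for r in sorted(ranges):
        if r in stats:  # membership, not truthiness: a sum of 0 is still a result
            n, s4, s5 = stats[r]
            out.append(f"{r[0]},{r[1]},{n},{s4},{s5}")
        else:
            out.append(f"{r[0]},{r[1]},NotFound,NotFound,NotFound")
    return out
```

To read the compressed file directly, pass gzip.open("Input.csv.gz", "rt", newline="") as input_lines.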

Using Macros to generate graph parameters SAS

I am attempting to automate a graph process using SAS macros. Since this will be used for several different subsets, the axes of the graph must be adjusted accordingly. I have tried a few different ways and feel that I'm going the wrong way down the rabbit hole.
Here is my dataset.
data want;
input A B C D;
cards;
100 5 6 1
200 5 5 2
150 5.5 5.5 3
457 4.2 6.2 4
500 3.7 7.0 5
525 3.5 7.2 6
;
run;
What I want is a graph with the following axis specs:
x-axis from min(D) to max(D) by some reasonable increment
left axis from min(A) to max(A)
right axis from min(B,C) to max(B,C)
Here is my latest attempt:
proc sql;
select roundz((max(A)+100), 100),
roundz(min(A), 100),
(&maxA.-&minA.)/10,
roundz(max(B, C)+1, 1),
roundz(min(B, C), 1),
(&maxBC.-&minBC.)/10,
roundz(max(D), 1),
roundz(min(D), 1),
(&maxD.-&minD.+1)/3
into :maxA, :minA, :Ainc,
:maxBC, :minBC, :BCinc,
:maxD, :minD, :Dinc
from want;
run;
goptions reset=all ftext=SWISS htext=2.5 ;
axis1 order=(&minA to &maxA by &Ainc) minor=none label=(angle=90 'A label' ) offset=(1) ;
axis2 order=(&minBC to &maxBC by &BCinc) minor=(number=1) label=(angle=90 'BC Label') offset=(1);
axis3 order=(&minD to &maxD by &Dinc) minor=(number=2) label=('D') offset=(1) ;
symbol1 color=black i=join value=circle height=2 width=2 ;
symbol2 color=black i=join value=square height=2 width=2 ;
symbol3 color=black i=join value=triangle height=2 width=2 ;
legend1 label=none mode=reserve position=(top center outside) value=('Label here' ) shape=symbol(5,1) ;
legend2 label=none mode=reserve position=(top center outside) value=('label 1' 'label 2') shape=symbol(3,1) ;
proc gplot data=want;
plot A*D=1 /overlay legend=legend1 vaxis=axis1 haxis=axis3 ;
plot2 B*D=2 &var_C*D=3 /overlay legend=legend2 vaxis=axis2 ;
run ;
Any help would be greatly appreciated. Even if that means a completely different way of doing it (though I'd also be interested to see where I am going wrong here).
Thanks, Pyll
What you're doing is sort of writing a macro without writing a macro. Write the macro and this becomes easier. Also, if the increments will always be one tenth of the range, compute them with %LET statements (if they might vary, leave them as parameters instead).
%macro graph_me(minA=,maxA=, minBC=,maxBC=, minD=, maxD=);
%let incA  = %sysevalf((&maxA.-&minA.)/10);
%let incBC = %sysevalf((&maxBC.-&minBC.)/10);
%let incD  = %sysevalf((&maxD.-&minD.)/10);
goptions reset=all ftext=SWISS htext=2.5 ;
axis1 order=(&minA to &maxA by &incA) minor=none label=(angle=90 'A label' ) offset=(1) ;
axis2 order=(&minBC to &maxBC by &incBC) minor=(number=1) label=(angle=90 'BC Label') offset=(1);
axis3 order=(&minD to &maxD by &incD) minor=(number=2) label=('D') offset=(1) ;
symbol1 color=black i=join value=circle height=2 width=2 ;
symbol2 color=black i=join value=square height=2 width=2 ;
symbol3 color=black i=join value=triangle height=2 width=2 ;
legend1 label=none mode=reserve position=(top center outside) value=('Label here' ) shape=symbol(5,1) ;
legend2 label=none mode=reserve position=(top center outside) value=('label 1' 'label 2') shape=symbol(3,1) ;
/* ...followed by the PROC GPLOT step from the question... */
%mend graph_me;
Now write your SQL query so that it builds the macro call itself from those parameters.
proc sql NOPRINT;
select
cats('%graph_me(minA=',roundz(min(A), 100),
',maxA=', roundz((max(A)+100), 100),
... etc. ...
into :mcall
from want;
quit;
/* resolving the macro variable below runs the generated macro call */
&mcall
This gives you the advantage that you may be able to generate multiple calls if you, for example, want to do this grouped by some variable (having one graph per variable value).
Two things in the SQL:
you cannot use the macro variables you are creating within the same query (refer to the earlier columns with CALCULATED instead), and you need just one value: max(B,C) returns one value per observation in the dataset, so you need another max() around it (likewise for min).
I cannot check the SAS/GRAPH part as I do not have it, but:
proc sql NOPRINT;
select roundz((max(A)+100), 100) as maxA,
roundz(min(A), 100) as minA,
((calculated maxA)-(calculated minA))/10,
roundz(max(max(B, C))+1, 1) as maxBC,
roundz(min(min(B, C)), 1) as minBC,
((calculated maxBC)-(calculated minBC))/10,
roundz(max(D), 1) as maxD,
roundz(min(D), 1) as minD,
((calculated maxD)-(calculated minD)+1)/3
into :maxA, :minA, :Ainc,
:maxBC, :minBC, :BCinc,
:maxD, :minD, :Dinc
from want;
quit;
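As a cross-check of what the corrected query should put into the macro variables, here is a Python sketch of the same formulas applied to the sample data. The roundz helper assumes that SAS ROUNDZ rounds to the nearest multiple with halves going away from zero; axis_params is a hypothetical name.

```python
import math


def roundz(x, unit):
    """Round x to the nearest multiple of unit, halves away from zero.
    Assumption: this mirrors SAS ROUNDZ for the values used here."""
    q = abs(x) / unit
    return math.copysign(math.floor(q + 0.5) * unit, x)


def axis_params(A, B, C, D):
    """Same formulas as the answer's SELECT, one value per macro variable."""
    max_a = roundz(max(A) + 100, 100)
    min_a = roundz(min(A), 100)
    # Row-wise max(B, C) first, then the column-wise max: the nested max() in SQL.
    max_bc = roundz(max(max(b, c) for b, c in zip(B, C)) + 1, 1)
    min_bc = roundz(min(min(b, c) for b, c in zip(B, C)), 1)
    max_d = roundz(max(D), 1)
    min_d = roundz(min(D), 1)
    return {
        "maxA": max_a, "minA": min_a, "Ainc": (max_a - min_a) / 10,
        "maxBC": max_bc, "minBC": min_bc, "BCinc": (max_bc - min_bc) / 10,
        "maxD": max_d, "minD": min_d, "Dinc": (max_d - min_d + 1) / 3,
    }


if __name__ == "__main__":
    print(axis_params([100, 200, 150, 457, 500, 525],
                      [5, 5, 5.5, 4.2, 3.7, 3.5],
                      [6, 5, 5.5, 6.2, 7.0, 7.2],
                      [1, 2, 3, 4, 5, 6]))
```

With this sample data the sketch gives minBC = 4, which is above the smallest B value (3.5), so rounding the minimum to the nearest unit can leave points outside the axis range; rounding minima down (for example with FLOOR) and maxima up (CEIL) is one way to avoid that.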

SAS - hash tables and has_next

I'm looking for an elegant solution to the below issue that will help avoid code duplication. You can see that this line:
put auction_id= potential_buyer= ;* THIS GETS REPEATED;
gets repeated in this code:
data results;
attrib potential_buyer length=$1;
set auction;
if _n_ eq 1 then do;
declare hash ht1(dataset:'buyers', multidata: 'y');
ht1.definekey('auction_id');
ht1.definedata('potential_buyer');
ht1.definedone();
call missing (potential_buyer);
end;
**
** LOOP THROUGH EACH POTENTIAL BUYER AND PROCESS THEM
*;
if ht1.find() eq 0 then do;
put auction_id= potential_buyer= ;* THIS GETS REPEATED;
ht1.has_next(result: ht1_has_more);
do while(ht1_has_more);
rc = ht1.find_next();
put auction_id= potential_buyer= ;* THIS GETS REPEATED;
ht1.has_next(result: ht1_has_more);
end;
end;
run;
I've simplified the above example to a single line as the real code block is quite long and complex. I'd like to avoid using a %macro snippet or a %include if possible as I'd like to keep the logic "within" the data step.
Here's some sample data:
data auction;
input auction_id;
datalines;
111
222
333
;
run;
data buyers;
input auction_id potential_buyer $;
datalines;
111 a
111 c
222 a
222 b
222 c
333 d
;
run;
I figured it out. It turned out to be pretty simple in the end; I just had a little trouble wrapping my brain around it:
data results;
attrib potential_buyer length=$1;
set auction;
if _n_ eq 1 then do;
declare hash ht1(dataset:'buyers', multidata: 'y');
ht1.definekey('auction_id');
ht1.definedata('potential_buyer');
ht1.definedone();
call missing (potential_buyer);
end;
**
** LOOP THROUGH EACH POTENTIAL BUYER AND PROCESS THEM
*;
if ht1.find() eq 0 then do;
keep_processing = 1;
do while(keep_processing);
put auction_id= potential_buyer= ;* THIS NO LONGER GETS REPEATED;
ht1.has_next(result: keep_processing);
rc = ht1.find_next();
end;
end;
run;
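The trick in the loop above is that the has_next test moves to the end of the body, giving a do-while shape in which the body appears only once. Here is a toy Python model of that control flow. MultiHash is a hypothetical stand-in, not the SAS hash API; it only imitates find, has_next and find_next for a multidata hash.

```python
class MultiHash:
    """Hypothetical toy stand-in for a SAS hash declared with multidata:'y'.
    find() positions on the first item for a key; has_next() and find_next()
    walk the remaining items with the same key."""

    def __init__(self, pairs):
        self._items = {}
        for key, value in pairs:
            self._items.setdefault(key, []).append(value)
        self._key, self._pos = None, -1

    def find(self, key):
        if key not in self._items:
            return None
        self._key, self._pos = key, 0
        return self._items[key][0]

    def has_next(self):
        return self._pos + 1 < len(self._items[self._key])

    def find_next(self):
        if not self.has_next():
            return None  # like a non-zero rc: nothing changes
        self._pos += 1
        return self._items[self._key][self._pos]


def process_all(auction_ids, buyer_pairs):
    ht = MultiHash(buyer_pairs)
    log = []
    for auction_id in auction_ids:
        buyer = ht.find(auction_id)
        if buyer is None:
            continue
        keep_processing = True
        while keep_processing:
            # The processing body is written only once.
            log.append(f"auction_id={auction_id} potential_buyer={buyer}")
            keep_processing = ht.has_next()
            nxt = ht.find_next()  # fails harmlessly after the last item
            if nxt is not None:
                buyer = nxt
    return log
```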
You can also solve it this way, with a macro, but Rob's answer is better.
data results;
%Macro NoDuplicate;
Put auction_id= potential_buyer= ; * No Longer Duplicated;
%Mend noduplicate;
attrib potential_buyer length=$1;
set auction;
if _n_ eq 1 then do;
declare hash ht1(dataset:'buyers', multidata: 'y');
ht1.definekey('auction_id');
ht1.definedata('potential_buyer');
ht1.definedone();
call missing (potential_buyer);
end;
**
** LOOP THROUGH EACH POTENTIAL BUYER AND PROCESS THEM
*;
if ht1.find() eq 0 then do;
%NoDuplicate
ht1.has_next(result: ht1_has_more);
do while(ht1_has_more);
rc = ht1.find_next();
%NoDuplicate
ht1.has_next(result: ht1_has_more);
end;
end;
run;
