SAS: plot number still surviving by # days - graph

Given data with a number-of-days-to-event, and an outcome, like so:
data pretend ;
do subject = 1 to 1000 ;
fup_time = round(uniform(83386)*500, 1) ;
select(round(uniform(778523)*5, 1)) ;
when(1) outcome = 'cholys' ;
when(2) outcome = 'death' ;
when(3) outcome = 'tx end' ;
when(4) outcome = 'vascul' ;
otherwise outcome = 'reop' ;
end ;
output ;
end ;
label
fup_time = "The day on which ::outcome:: occurred"
outcome = "What the subject's last observed event was"
;
run ;
What's the easiest way to generate curves for each outcome that show what proportion of the sample was still being observed on day X, broken out by outcome?
I tried:
proc lifetest data = pretend plots = all notable ;
strata outcome ;
time fup_time*censored(1) ;
run ;
Where 'censored' is set to 1 whenever outcome is 'tx end' or 'death'. I quite like the product-limit survival curves that produces, except that the lines for death & tx end are completely flat, at y = 1.0.
I'm not actually looking to do any inference here at all--just want the pretty pictures. Is there an easy way?
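One observation that may help: within the 'death' and 'tx end' strata every record is censored, so no event ever occurs there and the product-limit estimate stays at 1.0. If no inference is needed, the curve being asked for is simply, for each outcome, the fraction of that group whose fup_time exceeds day X. A minimal Python sketch of that calculation follows; the function name and the tiny sample data are illustrative assumptions, not part of the original post.

```python
def fraction_still_observed(records, days):
    """records: iterable of (fup_time, outcome) pairs.
    Returns {outcome: [fraction with fup_time > day, for each day in days]}."""
    by_outcome = {}
    for fup_time, outcome in records:
        by_outcome.setdefault(outcome, []).append(fup_time)

    curves = {}
    for outcome, times in by_outcome.items():
        n = len(times)
        # A subject is "still being observed" on a day if its event came later.
        curves[outcome] = [sum(1 for t in times if t > day) / n for day in days]
    return curves


# Hypothetical toy data: (fup_time, outcome)
records = [(10, "death"), (30, "death"), (20, "reop"), (40, "reop")]
curves = fraction_still_observed(records, days=[0, 15, 35])
print(curves)
```

Each list in the result can be plotted as a step curve against the chosen days, one line per outcome.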


How to find the filter coefficients for a DVBS2 shaping SRRC?

In the DVBS2 standard the SRRC filter is defined as
How can I find the filter's time-domain coefficients for implementation? The inverse Fourier transform of this is not clear to me.
For a DVBS2 signal you can use an RRC matched filter before timing recovery. For the matched filter, you can use this expression:
For example, for n_ISI = 32 and a roll-off factor of 0.25, with any number of samples per symbol, you can use this MATLAB code:
SPS = 4;          % samples per symbol (for example)
n_ISI = 32;       % filter span in symbols
rolloff = 0.25;   % roll-off factor
n = linspace(-n_ISI/2, n_ISI/2, n_ISI*SPS+1);   % time axis in symbol periods
rrcFilt = zeros(size(n));
for iter = 1:length(n)
    if abs(n(iter)) < 1e-9
        % limit value at t = 0
        rrcFilt(iter) = 1 - rolloff + 4*rolloff/pi;
    elseif abs(abs(n(iter)) - 1/(4*rolloff)) < 1e-9
        % limit value at t = +/- 1/(4*rolloff); the tolerance avoids an exact floating-point comparison
        rrcFilt(iter) = rolloff/sqrt(2)*((1+2/pi)*sin(pi/4/rolloff)+(1-2/pi)*cos(pi/4/rolloff));
    else
        rrcFilt(iter) = (4*rolloff/pi)/(1-(4*rolloff*n(iter)).^2) * (cos((1+rolloff)*pi*n(iter)) + sin((1-rolloff)*pi*n(iter))/(4*rolloff*n(iter)));
    end
end
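The same tap calculation can be sketched in plain Python. This version indexes the taps by integer sample number, so the special cases at t = 0 and |t| = 1/(4*rolloff) are detected without comparing a linspace grid for equality. The function name rrc_taps and its parameter names are assumptions made for this sketch, and it assumes span * sps is even.

```python
import math


def rrc_taps(sps, span, rolloff):
    """Root-raised-cosine taps; time is measured in symbol periods.
    sps: samples per symbol, span: filter length in symbols (span * sps even)."""
    half = span * sps // 2
    taps = []
    for k in range(-half, half + 1):
        t = k / sps
        if k == 0:
            # Limit value at t = 0
            h = 1 - rolloff + 4 * rolloff / math.pi
        elif abs(abs(4 * rolloff * t) - 1) < 1e-12:
            # Limit value at |t| = 1 / (4 * rolloff), where the general formula is 0/0
            h = rolloff / math.sqrt(2) * (
                (1 + 2 / math.pi) * math.sin(math.pi / (4 * rolloff))
                + (1 - 2 / math.pi) * math.cos(math.pi / (4 * rolloff))
            )
        else:
            num = math.sin(math.pi * t * (1 - rolloff)) \
                + 4 * rolloff * t * math.cos(math.pi * t * (1 + rolloff))
            den = math.pi * t * (1 - (4 * rolloff * t) ** 2)
            h = num / den
        taps.append(h)
    return taps


taps = rrc_taps(sps=4, span=32, rolloff=0.25)
print(len(taps), taps[len(taps) // 2])
```

The taps are not normalized here; in practice they are often scaled to unit energy before use.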
But if you want to use SRRC, there are two ways:
1. If you filter in the frequency domain, you can use its frequency representation directly, i.e. the expression that you've noted.
2. For time-domain filtering, you should define the FIR filter by its time-domain sequence. The time representation of such SRRC pulses takes the following form:

Scilab - Finding the average of randomly generated numbers with different indices

Say I have a function that generates random integers in a given range, say 1 to 10.
Then I define a for loop with 5 iterations.
The loop calls the function above and checks the value of the integer it returns. Depending on that value, the variable "cost" is set.
function integer = UniformInt(a, b)
integer = min( floor( rand()*(b-a+1) ) + a , b); // +1 so that b itself can be drawn
endfunction
for i=1:5
x(i) = UniformInt(1,10);
if (x(i)<4) then
cost(i) = 15;
elseif (4<=x(i) && x(i)<=8) then
cost(i) = 27;
else
cost(i) = 35;
end
end
Now, when I run that and find x, say the values generated are:
5.
9.
5.
2.
2.
And so, the different cost values will be:
27.
35.
27.
15.
15.
This is all good so far. Now, what I want to do with these results is:
Check how many times each value of x appeared. I can do this via Scilab's tabul function:
9. 1.
5. 2.
2. 2.
Now, what I really want to code is:
x=9 only showed up once, so, the average of cost is 35/1 = 35.
x=5 showed up twice, so, the average of cost is (27+27)/2 = 27.
x=2 showed up twice, so, the average of cost is (15+15)/2 = 15.
How would I do that?
For posterity, here is code for which the answer provided by user Stéphane Mottelet is actually useful (because my code above is trivial):
function integer = UniformInt(a, b)
integer = min( floor( rand()*(b-a+1) ) + a , b); // +1 so that b itself can be drawn
endfunction
for i=1:5
x(i) = UniformInt(1,10);
if (x(i)<4) then
cost(i) = 15*rand();
elseif (4<=x(i) && x(i)<=8) then
cost(i) = 27*rand();
else
cost(i) = 35*rand();
end
end
Now the cost values are multiplied by a random number, so if, say, x=10 showed up 2 times, the average cost for x=10 will not simply be (35+35)/2.
I would do it like this, but the answer is quite trivial since, for now, the value of cost is the same for a given x value (I suppose your real application draws random values of cost):
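t = tabul(x);   // column 1: distinct x values, column 2: their counts
for k = 1:size(t,1)
    // average the cost entries whose x equals the k-th distinct value
    avg(k) = mean(cost(x == t(k,1)));
end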
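For comparison, the same group-wise average can be sketched in Python by accumulating a running sum and count per distinct x. The function name average_cost_by_x is an assumption for this sketch; the sample values are the ones from the question.

```python
from collections import defaultdict


def average_cost_by_x(x, cost):
    """Return {distinct x value: mean of the cost entries that share it}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for xi, ci in zip(x, cost):
        sums[xi] += ci
        counts[xi] += 1
    return {xi: sums[xi] / counts[xi] for xi in sums}


x = [5, 9, 5, 2, 2]
cost = [27, 35, 27, 15, 15]
averages = average_cost_by_x(x, cost)
print(averages)
```

The same function works unchanged when the cost values are random, which is the case where the averaging is not trivial.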

Getting a zero value for a variable that clearly should not be zero

I have written the following code to calculate temperature distribution along a fin. For some reason, it keeps calculating certain values as zero, even when they aren't!
for t = 0:300:3000
for i = 2:L2
if i ==2
A(i)= 0.75*rhocp*DX/DT + 3*k/dx;
B(i)= k/dx;
C(i)= 2*k/dx;
D(i)= 0.75*rhocp*(DX/DT)*T(i);
elseif i == L2
A(i)= 0.75*rhocp*DX/DT + 3*k/dx;
B(i)= 2*k/dx;
C(i)= k/dx;
D(i)= 0.75*rhocp*(DX/DT)*T(i);
else
A(i)= 0.75*rhocp*DX/DT + 2*k/dx;
B(i)= k/dx;
C(i)= k/dx;
D(i)= rhocp*(DX/DT)*T(i);
end
P(1) = 0;
Q(1) = T(1);
for i = 2:L2
DENO = A(i) - C(i)*P(i-1);
NUM = D(i)+ C(i)*Q(i-1);
P(i) = B(i)/DENO;
Q(i) = NUM/DENO;
end
for i =L2:-1:2
if i == L2
T(i) = Q(i);
else
T(i) = P(i)*T(i+1) + Q(i);
end
end
T(L1) = ((2*k*T(L2) - h*DX*Tinf)/(2*k - h*DX));
end
disp (T);
DENO and NUM are calculated as zero in the first iteration, even though evaluating the same expressions by hand gives non-zero values! This leads to a "Division by zero" error.
A(2)-C(2)*P(1)
ans =
3750.
Analytically it has got a value though.
Please give the parameter values so one can run your script...
DT was missing, so I set it to 1.
Finally I found the problem: you forgot an end to close the loop
for i = 2:L2
The end has to be put just before P(1) = 0.
Moreover, A, B and C in this loop do not depend on t, so they could be computed once before the loop for t = 0:300:3000; only D, which uses the current T, has to be updated at every time step.
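Once the coefficient loop is closed properly, each time step consists of three separate stages: build the coefficients, run the forward sweep for P and Q, then back-substitute for T. A minimal Python sketch of the last two stages follows, using the same convention as the script, A(i)*T(i) = B(i)*T(i+1) + C(i)*T(i-1) + D(i). The function name, the argument layout and the assumption that B is zero at the last node are choices made for this sketch, not taken from the original code.

```python
def solve_tridiagonal(a, b, c, d, t_boundary):
    """Solve a[i]*T[i] = b[i]*T[i+1] + c[i]*T[i-1] + d[i] for the interior nodes.
    t_boundary is the known value at the node before the first unknown;
    b for the last node is assumed to be 0 (no node beyond it)."""
    n = len(a)
    p = [0.0] * (n + 1)
    q = [0.0] * (n + 1)
    q[0] = t_boundary  # P(1) = 0, Q(1) = T(1) in the original script

    # Forward sweep: this loop must be separate from the coefficient loop.
    for i in range(1, n + 1):
        deno = a[i - 1] - c[i - 1] * p[i - 1]
        p[i] = b[i - 1] / deno
        q[i] = (d[i - 1] + c[i - 1] * q[i - 1]) / deno

    # Back substitution, starting from the last unknown.
    t = [0.0] * (n + 1)
    t[0] = t_boundary
    t[n] = q[n]
    for i in range(n - 1, 0, -1):
        t[i] = p[i] * t[i + 1] + q[i]
    return t[1:]


# Hypothetical 3-node example with boundary value 4
T = solve_tridiagonal(a=[2, 2, 2], b=[1, 1, 0], c=[1, 1, 1], d=[0, 0, 0], t_boundary=4.0)
print(T)
```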

plot average of n'th rows in gnuplot

I have some data that I want to plot with gnuplot, but there are many y values for the same x value. Here is a sample:
0 0.650765 0.122225 0.013325
0 0.522575 0.001447 0.010718
0 0.576791 0.004277 0.104052
0 0.512327 0.002268 0.005430
0 0.530401 0.000000 0.036541
0 0.518333 0.001128 0.017270
20 0.512864 0.001111 0.005433
20 0.510357 0.005312 0.000000
20 0.526809 0.001089 0.033523
20 0.527076 0.000000 0.034215
20 0.507166 0.001131 0.000000
20 0.513868 0.001306 0.004344
40 0.531742 0.003295 0.0365
In this example, I have 6 values for each x value. How can I draw the average and the confidence bar (interval)?
Thanks for the help.
To do this, you will need some kind of external processing. One possibility is to use gawk to calculate the required quantities and then feed this auxiliary output to Gnuplot for plotting. For example:
set terminal png enhanced
set output 'test.png'
fName = 'data.dat'
plotCmd(col_num)=sprintf('< gawk -f analyze.awk -v col_num=%d %s', col_num, fName)
set format y '%0.2f'
set xr [-5:45]
plot \
plotCmd(2) u 1:2:3:4 w yerrorbars pt 3 lc rgb 'dark-red' t 'column 2'
This assumes that the script analyze.awk resides in the directory from which Gnuplot is launched (otherwise, it would be necessary to modify the path in the -f option of gawk). The script analyze.awk itself reads:
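# Extra parameters after "count" are only used as local variables.
function analyze(x, data, count,    i, v, n, mean, delta, val_min, val_max){
    n = 0; mean = 0;
    for(i = 1; i <= count; i++){
        v = data[i] + 0;    # force a numeric value
        n += 1;
        delta = v - mean;
        mean += delta/n;    # online update of the mean
        val_min = (n == 1 || v < val_min) ? v : val_min;
        val_max = (n == 1 || v > val_max) ? v : val_max;
    }
    if(n > 0){
        print x, mean, val_min, val_max;
    }
}
{
    curr = $1;
    yval = $(col_num);
    if(NR==1 || prev != curr){
        # a new x value starts: report the previous group and reset
        analyze(prev, data, count);
        delete data;
        count = 0;
        prev = curr;
    }
    # store by position so that repeated y values are not collapsed into one
    data[++count] = yval;
}
END{
    analyze(curr, data, count);
}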
It implements the online (running) algorithm for the mean and, for each distinct value of x, prints this mean together with the min/max values.
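The script reports the min/max range rather than a statistical confidence interval. If an actual interval is wanted, the same online update can be extended to track the variance as well (Welford's method). The Python sketch below is only an illustration: the function names are hypothetical, and the half-width uses the normal-approximation factor 1.96, which is an assumption and is rough for groups as small as six values.

```python
import math


def summarize(values):
    """Online mean and variance; returns (mean, half-width of an approximate 95% interval)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for v in values:
        n += 1
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)
    if n < 2:
        return mean, 0.0  # a single record has no spread to report
    std_err = math.sqrt(m2 / (n - 1)) / math.sqrt(n)
    return mean, 1.96 * std_err  # 1.96: normal approximation (assumption)


def summarize_by_x(rows):
    """rows: iterable of (x, y) pairs; returns {x: (mean, half_width)}."""
    groups = {}
    for x, y in rows:
        groups.setdefault(x, []).append(y)
    return {x: summarize(ys) for x, ys in groups.items()}


summary = summarize_by_x([(0, 1.0), (0, 2.0), (0, 3.0), (20, 5.0)])
print(summary)
```

Written out as lines of x, mean and half-width, this output could be plotted with a three-column yerrorbars specification (u 1:2:3) instead of the four-column min/max form.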
In the Gnuplot script, the column of interest is passed to the plotCmd function, which prepares the command to be executed and whose output is then plotted with u 1:2:3:4 w yerrorbars. With this syntax, the error-bar limits (here the min/max values, used as a stand-in for a confidence interval) are read from the 3rd and 4th columns, while the value itself (the mean) is in the second column.
In total, the two scripts above produce the picture below. The error bar on the last point is not visible since the example data in your question contain only one record for x=40, thus the min/max values coincide with the mean.
If you only need the average of the three y columns within each row, you can plot it directly:
plot "myfile.dat" using ($1):(($2 + $3 + $4)/3)
If you want the average of only the second and fourth columns, for example, you can write (($2+$4)/2), and so on.

Stagger labels on an Influence plot in SAS?

I'm generating some plots (for a class) for a colorblind professor. The JOURNAL2 style in SAS uses grey scale. However, the plots put all of the labels right on top of each other. Is there an option to scatter them around the point or use call-out lines so that they are easier to read?
Here's the code I'm using
ODS HTML STYLE = JOURNAL2;
PROC LOGISTIC DATA = fludata PLOTS(UNPACK ONLY LABEL) = (LEVERAGE DFBETAS DPC INFLUENCE PHAT);
CLASS gender(PARAM = ref REF = 'Female')
newincome(PARAM = ref REF = '03 - High ');
MODEL flu(EVENT = 'Yes') = gender newincome / CTABLE PPROB = .49 TO .5 BY .001;
OUTPUT OUT = predict P = pred;
RUN;
Here's an example of an illegible plot:
Any thoughts about a better way to do this?
Don's suggestion of contacting SAS Support is probably apropos, but in the meanwhile here's an example of rolling your own.
ODS HTML STYLE = journal;
data us_data;
set sashelp.us_data;
length density $8 seat_change $15;
if density_2010 < 50 then density="1 Low";
else if density_2010 < 400 then density="2 Med";
else density="3 High";
if seat_change_2010 > 0 then seat_change='Positive';
else seat_change="Nonpositive";
keep density seat_change region;
run;
PROC LOGISTIC DATA = us_data PLOTS(UNPACK ONLY LABEL) = (LEVERAGE DFBETAS DPC INFLUENCE PHAT);
CLASS REGION(PARAM = ref REF = 'Northeast')
density(PARAM = ref REF = '3 High');
MODEL seat_change(EVENT = 'Positive') = REGION density / CTABLE PPROB = .49 TO .5 BY .001;
OUTPUT OUT = predict P = pred difchisq=difchisq c=cidisp;
RUN;
proc sgplot data=predict;
scatter x=pred y=difchisq /group=region groupdisplay=cluster datalabel;
run;
Obviously you'd have to run each one separately this way, although the programming isn't all that hard.
