Graph in SAS coming up with strange values using gchart function

I am trying to create a graph from a table I've made. I want to graph the values for month with the numbers in the Scheduled column. Unfortunately, it is displaying the months as values like .75, 2.25 and 4.75 instead of the actual month numbers, and I don't know why.
I have tried changing the type of graph, the sumvar, and the axes and their values, but none of this has helped. It worked at one point but then simply stopped, and I cannot figure out why.
month scheduled Arr_num
1 SKED 7573
1 UNSK 1882
2 SKED 6635
2 UNSK 1642
3 SKED 817
3 UNSK 208
4 SKED 9494
4 UNSK 2376
5 SKED 1900
5 UNSK 551
6 SKED 9864
6 UNSK 3319
7 SKED 9770
7 UNSK 4145
pattern1 value=solid color=CXc01933;
pattern2 value=solid color=CX003366;
axis1 label=(angle=90 'Amount of Wheelchair Requests');
axis2 label=('Month') order=(0 to 12 by 1);
proc gchart data=Overall_Arr;
vbar month / type=sum SUMVAR=Arr_num subgroup=scheduled raxis=axis1 maxis=axis2
autoref clipref ;
run;
This is the table, and this is the code that makes the graph. I am expecting a graph with two differently colored bars, one for the scheduled number and one for the unscheduled number. Before I put the order on the second axis, it would output a graph but with strange numbers for the month, like .75 or 4.25, instead of 1, 2, 3, etc. Now it outputs no bars at all, I assume because it is trying to use those weird numbers while I've restricted the axis to whole numbers for the month. Any help would be appreciated.

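Alright, I actually think I figured it out: the problem was that month is also a command, so changing my variable's name allowed it to be treated as a variable instead of a command. (A more common cause of fractional values such as .75 or 2.25 is that GCHART treats a numeric chart variable as continuous and picks its own midpoints; adding the DISCRETE option to the VBAR statement requests one bar per distinct month value.)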

Related

Calendar (again) manipulations in R

I have code like this:
today<-as.Date(Sys.Date())
spec<-as.Date(today-c(1:1000))
df<-data.frame(spec)
stage.dates<-as.Date(c('2015-05-31','2015-06-07','2015-07-01','2015-08-23','2015-09-15','2015-10-15','2015-11-03'))
stage.vals<-c(1:8)
stagedf<-data.frame(stage.dates,stage.vals)
df['IsMonthInStage']<-ifelse(format(df$spec,'%m')==(format(stagedf$stage.dates,'%m')),stagedf$stage.vals,0)
This is producing the incorrect output, i.e.
df.spec, df.IsMonthInStage
2013-05-01, 0
2013-05-02, 1
2013-05-03, 0
....
2013-05-10, 1
It seems to be looping around: stage.dates is 8 long, and the 'TRUE' match repeats every 8th row. How do I fix this so that it flags 1 for the whole month whenever that month appears in stagedf?
Or for bonus reputation - how do I set it up so that between different stage.dates, it will populate 1, 2, 3, etc of the most recent stage?
For example:
31st of May to 7th of June would be populated 1, 7th of June to 1st of July would be populated 2, etc, 3rd of November to 30th of May would be populated 8?
Thanks
Edit:
I appreciate the latter is functionally different to the former question. I am ultimately trying to arrive at both (for different reasons), so all answers appreciated
See if this works.
Cut and split your data based on stage.dates, treating them as your buckets. By the way, you don't need stage.vals here.
Cut And Split
library(plyr) # provides mutate, ldply and arrange used below
data<-split(df, cut(df$spec, stagedf$stage.dates, include.lowest=TRUE))
This should give you a list of data frames split according to stage.dates.
Now mutate your data with the list index; this is what your stage.vals were going to be.
Mutate
data<-lapply(seq_along(data), function(index) {mutate(data[[index]],
IsMonthInStage=index)})
Now join the data frames in the list using ldply
Join
data=ldply(data)
This will, however, give out-of-order dates, which you can sort with
Sort
data<-arrange(data,spec)
Final Output
data[1:10,]
spec IsMonthInStage
1 2015-05-31 1
2 2015-06-01 1
3 2015-06-02 1
4 2015-06-03 1
5 2015-06-04 1
6 2015-06-05 1
7 2015-06-06 1
8 2015-06-07 2
9 2015-06-08 2
10 2015-06-09 2
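For readers who want to check the bucketing logic outside R, here is a minimal sketch in Python rather than the R code above. It assigns each date the 1-based index of the most recent stage date on or before it. The stage dates are the seven listed in the question; the function name stage_for is made up for this illustration. Unlike cut, which returns NA outside the breaks, this sketch gives 0 to dates before the first stage and keeps the last index for dates after the last stage.

```python
from bisect import bisect_right
from datetime import date, timedelta

# Stage start dates copied from the question (seven are listed there).
STAGE_DATES = [
    date(2015, 5, 31), date(2015, 6, 7), date(2015, 7, 1), date(2015, 8, 23),
    date(2015, 9, 15), date(2015, 10, 15), date(2015, 11, 3),
]

def stage_for(day, stage_dates=STAGE_DATES):
    """Return the 1-based index of the most recent stage date on or
    before `day`, or 0 if `day` falls before the first stage date.

    `stage_dates` must be sorted in ascending order.
    """
    # bisect_right counts how many stage dates are <= day.
    return bisect_right(stage_dates, day)

# Ten consecutive days starting the day before the first stage.
days = [date(2015, 5, 30) + timedelta(days=n) for n in range(10)]
stages = [stage_for(d) for d in days]

for d, s in zip(days, stages):
    print(d.isoformat(), s)
```

As in the R output above, 2015-06-07 is the first day that gets stage 2.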

SAS proc SGPANEL controlling line or marker color based on a data value

I'm making a graph using proc SGPANEL in SAS. It is animal data, so it's paneled by animal. In each animal's graph there are 3 lines representing different blood test values. I would like to know if I can control the color such that if the value goes out of normal limits (identified by a separate flag variable), then the data point would be red, but if the value is within normal limits the data point would be black.
I've done similar plots for just a single blood test and in that case I've presented a reference line for normal limits. The problem with this case is that each blood test has different normal limits so I can't use that strategy.
My existing code (which doesn't incorporate the color linking with the flag variable, it just presents the data) is as follows:
proc sgpanel data=all;
panelby animal / spacing=5 novarname columns=5;
series x=dy y=value/ group=parameter markers;
colaxis label='Day';
rowaxis label='Value';
run;
One way to accomplish this is to overlay your markers with new markers that only exist when the flag is set. Here's an example. Basically, I add a new variable, value_outrange, which only has a value when you want a red marker; then I ask for a scatter plot of it with a red marker color.
You could also draw all of the markers with two overlaid scatter plots, one for value_outrange and one for value_inrange, which avoids having two markers at the flagged locations. All in all, though, it doesn't look bad, so I think just the one is fine.
data all;
input animal $ dy value flag parameter;
if flag=1 then value_outrange=value;
else call missing(value_outrange);
datalines;
bear 1 5 0 1
bear 2 6 0 1
bear 3 7 0 1
bear 4 8 0 1
bear 5 13 1 1
bear 6 10 0 1
dog 1 8 0 2
dog 2 9 0 2
dog 3 9 0 2
dog 4 11 1 3
dog 5 10 0 3
dog 6 11 0 3
;;;;
run;
proc sgpanel data=all;
panelby animal / spacing=5 novarname columns=5;
series x=dy y=value/ group=parameter markers;
scatter x=dy y=value_outrange/group=parameter markerattrs=(color=red);
colaxis label='Day';
rowaxis label='Value';
run;

easy way to subset data into bins

I have a data frame as seen below with over 1000 rows. I would like to subset the data into bins by 1m intervals (0-1m, 1-2m, etc.). Is there an easy way to do this without finding the minimum depth and using the subset command multiple times to place the data into the appropriate bins?
Temp..ºC. Depth..m. Light time date
1 17.31 -14.8 255 09:08 2012-06-19
2 16.83 -21.5 255 09:13 2012-06-19
3 17.15 -20.2 255 09:17 2012-06-19
4 17.31 -18.8 255 09:22 2012-06-19
5 17.78 -13.4 255 09:27 2012-06-19
6 17.78 -5.4 255 09:32 2012-06-19
Assuming that the name of your data frame is df, do the following:
split(df, findInterval(df$Depth..m., floor(min(df$Depth..m.)):0))
You will then get a list where each element is a data frame containing the rows that have Depth..m. within a particular 1 m interval.
Notice however that empty bins will be removed. If you want to keep them you can use cut instead of findInterval. The reason is that findInterval returns an integer vector, making it impossible for split to know what the set of valid bins is. It only knows the values it has seen and discards the rest. cut on the other hand returns a factor, which has all valid bins defined as levels.
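The difference between dropping and keeping empty bins can be illustrated with a small sketch in Python (not R; the function name bin_by_metre and its keep_empty flag are invented for this example). Each depth goes into the 1 m bin that starts at its floor. Creating every bin between the deepest value and 0 in advance plays the role that factor levels play for cut.

```python
import math

# Depth values (metres) from the six sample rows in the question.
depths = [-14.8, -21.5, -20.2, -18.8, -13.4, -5.4]

def bin_by_metre(values, keep_empty=True):
    """Group values into 1 m bins keyed by each bin's lower edge.

    A value v is placed in the bin [floor(v), floor(v) + 1).
    With keep_empty=True, every bin from the deepest one up to 0 is
    created up front, so bins without data appear as empty lists.
    """
    lowest = math.floor(min(values))
    bins = {edge: [] for edge in range(lowest, 0)} if keep_empty else {}
    for v in values:
        bins.setdefault(math.floor(v), []).append(v)
    return bins

all_bins = bin_by_metre(depths)                      # empty bins kept
seen_bins = bin_by_metre(depths, keep_empty=False)   # only bins with data

print(len(all_bins), len(seen_bins))
```

With the six sample rows, the first call keeps all 22 bins between -22 m and 0 m, while the second keeps only the 6 bins that actually contain a measurement.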

gnuplot to plot multiple data sets from a file and group all the bars

I want to plot columns 3 and 4 as bars for each data set in the file. Data sets are separated by multiple newlines and are referred to using index, as shown in the script below. I can draw this data with "linespoints", which gives the graph in my first picture. But I want to plot the data with "boxes", as in the second picture.
The x-axis will have column 3 (1, 2, 3) and the y-axis will have column 4. For each value of x (1, 2, 3) there should be 2 bars, one from index 0 and the second from index 1.
Data file looks like:
2-100
2 100 1 3.10 249
2 100 2 3.41 250
2 100 4 3.70 249
3-100
3 100 1 3.10 252
3 100 2 3.48 252
3 100 4 3.72 254
"2-100" and "3-100" will be used as titles (first row of each block, first column). The first 4 lines are read as index 0 in the script and the second 4 lines as index 1.
Script I used:
plot \
"$1" index 0 using 3:4 with boxes fs solid title columnhead(1),\
"$1" index 1 using 3:4 with boxes fs solid title columnhead(1)
I've reformatted your datafile a little bit (at least, if I understood your original question correctly). It now looks like this, with two blank lines separating the blocks so that index can tell them apart:
2-100
2 100 1 3.10 249
2 100 2 3.41 250
2 100 4 3.70 249


3-100
3 100 1 3.10 252
3 100 2 3.48 252
3 100 4 3.72 254
You should be able to format your datafile like this using sed:
sed -e '/^$/ d' -e '/[0-9]-100/{x;p;p;x}' datafile.dat
# first -e: remove all blank lines; second -e: reinsert two blank lines before each block header
(This assumes that the column heads always start with a digit (0-9) followed by "-100". Your regular expression might need to be a little more elaborate if your datafile is more complicated.)
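If sed is not available, the same reformatting can be sketched in Python. This is only an illustration of what the two sed expressions do, under the same assumption that block headers contain a digit followed by "-100"; the function name reblock is made up here.

```python
import re

# Same pattern as the sed command: a digit followed by "-100".
HEADER = re.compile(r"[0-9]-100")

def reblock(lines):
    """Drop empty lines, then put two empty lines before each header.

    `lines` are strings without trailing newlines. Two blank lines are
    what gnuplot's `index` uses to separate data sets.
    """
    out = []
    for line in lines:
        if line == "":
            continue              # like sed's '/^$/ d'
        if HEADER.search(line):
            out.extend(["", ""])  # like sed's '{x;p;p;x}'
        out.append(line)
    return out

raw = [
    "2-100",
    "2 100 1 3.10 249",
    "",
    "3-100",
    "3 100 1 3.10 252",
]
result = reblock(raw)
print("\n".join(result))
```

Like the sed command, this also emits two blank lines before the very first header.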
This can be plotted using:
set yrange [0:*]
set style fill solid
plot for [i=0:1] 'test2.dat' index i u ($3+i*0.25):4:(0.25) w boxes title columnhead(1)
Of course, you can break up the for loop to assign special properties to each plot or whatever...
If you want special labels, you can do this
set xtics scale 0,0 format ""
set xtics ("This is at 1" 1, "this is at 2" 2, "this is at 3" 3)
before your plot command.
Here's what I get using the above with the png (libgd) terminal:

Organizing data for gnuplot barcharts

I'm trying to organize the data in the file.dat, such that I could then use gnuplot for bar chart creation. Namely, the current data looks like:
Nodes Rows PS
30 0 0.16545666
30 5 0.13318791
30 10 0.13621247
30 993 0.17842487
31 0 0.26545666
31 5 0.23318791
31 10 0.23621247
31 992 0.27842487
I would like to create bar charts that have Nodes (30 and 31) on the x axis and PS values on the y axis. The data in Rows should be laid out side by side around the base, which is Nodes. For instance, the chart would display a bar showing PS for Nodes 30, Rows 0; to its immediate right should be the bar showing PS for Nodes 30, Rows 5, and so on. Then, after Nodes 30 is finished, there should be a gap before 31 (or 5 gaps if the next node is 35), followed by a similar group of bars.
How may I achieve this with gnuplot? In case I should reorganize the data, please consider including the sequence of code I should invoke for a particular organization.
Thanks.
If you reorganize your data to
30 0 0.16545666 5 0.13318791 10 0.13621247 993 0.17842487
31 0 0.26545666 5 0.23318791 10 0.23621247 992 0.27842487
which is Nodes row_1 ps_1 row_2 ps_2 ..., I think you can plot your data with
set key off
set style data histogram
set style histogram cluster gap 1
set style fill solid border -1
set boxwidth 0.9
# columns 3, 5, 7 and 9 hold the PS values
plot for [col=3:9:2] "Data.csv" u col:xticlabels(1)
which gives you this plot:
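The question also asks how to get from the original layout to the reorganized one. A minimal sketch in Python, assuming the input is whitespace-separated with the header line "Nodes Rows PS" as shown in the question (the function name long_to_wide is invented for this example):

```python
def long_to_wide(lines):
    """Turn 'Nodes Rows PS' records into one line per node:
    Nodes row_1 ps_1 row_2 ps_2 ...

    Nodes keep the order in which they first appear in the input.
    """
    grouped = {}
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines and the header line.
        if len(fields) != 3 or fields[0] == "Nodes":
            continue
        node, row, ps = fields
        grouped.setdefault(node, []).extend([row, ps])
    return [" ".join([node] + rest) for node, rest in grouped.items()]

sample = """Nodes Rows PS
30 0 0.16545666
30 5 0.13318791
30 10 0.13621247
30 993 0.17842487
31 0 0.26545666
31 5 0.23318791
31 10 0.23621247
31 992 0.27842487
""".splitlines()

wide = long_to_wide(sample)
print("\n".join(wide))
```

The values are kept as strings, so the numbers are written out exactly as they appear in the input. To use this on a real file, pass the lines of that file instead of sample and write the result to the file named in the plot command.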
