I have a dataset of about 100 observations in (x,y) format, where every y is an integer. I need proc sgplot to make a graphic of them. My y values range from 1 to 150. I would like to force the graphic to show every corresponding y value on the y-axis instead of automatically reducing the ticks to a small number in order to show them clearly. For example, if the first five values of my y are (1,3,4,6,7,....), I want the y ticks to show exactly (1,3,4,6,7,....) instead of (1,5,...).
I tried
yaxis value=(1 to 150 by 1) valueshint display=all;
It does not work, perhaps because I have too many observations. I know the result may be overwhelming, but I just want to see it. Thanks.
You don't say if you're using SAS/GRAPH or ODS GRAPHICS (SGPLOT etc.), so I'll answer the latter which is what I know; the answer should be useful for both in concept.
You likely cannot get SAS to plot so much on the axis unless the axis is very large itself. This means you have two options.
Raise the size of the graphic produced a lot in terms of pixels (and then shrink that to a usable size via image physical size, or using an external tool). Not necessarily usable in all cases, but it produces a very high resolution plot (which is very big size-wise). This page explains how to do that for ODS graphics (use image_dpi as a high number, and width and height in inches as a normal number), and this page explains for SAS/GRAPH. You may need to make your font small to make it work (if you're adding numbers, which I assume you are), or you may need to make an initially large plot first and then go into paint/photoshop/gimp/etc. and make it smaller.
Use annotate to create the axis marks. This is fairly easy if you know how to use annotate, as you're just writing to the location of the axis (y) and the item (x), and then a bit below that for the text. This will make it very easy to make a total garbage plot, but it will likely work ultimately.
These likely work in both SAS/GRAPH and ODS GRAPHICS, and I can't test either as you don't post any code or simulated data to test with, but I think both approaches have some merit (as does the approach of "don't do this", but you've thought that through).
I am trying to build a plot for a numeric variable rider_count vs a categorical variable weekdays("Mon", "Tue"....), and this plot is required to be a faceting plot with 55 categories,
I tried to use
ggplot(aes(x=wday, y=rider_count_sum)) +
geom_bar(stat = "identity") +
facet_wrap(~counter_edited, scales="free")
However, the output is badly distorted because the scales do not fit.
Are there any ways to make it scale normally?
The issue here is your faceting. It produces a grid of 8 x 7 cells. The plot displays on my monitor at about 18cm x 11cm in size. That means each cell is approximately 2.25cm x 1.5cm. Is a cell of that size large enough to provide meaningful information in the form of a plot? I would say "no".
So, you have two options: increase the size of the graphic or reduce the size of the grid.
Is increasing the size of the plot an option? Well, how big would each cell have to be to be meaningful? I don't know: you'd have to experiment: it would depend on the viewing distance and the level of information you'd want to convey. As a thought experiment, let's say you need each cell to be 8cm x 8cm to be interpretable. That means the graphic would need to be at least 64cm x 56cm. That would require an A1/ANSI D sheet of paper. That's heading to paper size. Unless you're talking posters, that's not reasonable. Even as a poster, a reader would have to stand so close that they wouldn't get the message of the whole graphic. On a digital display, you'd again be talking about a wall mounted unit. Standing close enough to look at a cell, pixel resolution would be an issue. Scrolling on a smaller unit would destroy the whole purpose of using a faceted display.
Pagination would also destroy the benefit of faceting: you wouldn't be able to see all the data at the same time.
So, whilst increasing the size of your plot might be technically possible, I don't think it would be practically useful.
What about reducing the number of cells? That to me would be the way to go. Simplify your presentation to allow your message to come across. For example, summarise weekdays vs weekends in one graphic, differences between weekdays in another. That reduces one dimension from 7 to either 2 or 5. I don't know how you construct counter_edited, so I don't know what the columns of your facet represent, but could you perhaps reduce the number of categories to 3 or 4? Combined with my weekday/weekend suggestion, that would give you grids of between 4x5 and 2x3. Much more manageable (though even 4x5 may be too complex).
In short: even if making your current graphic look better is technically possible, I doubt it will ever be practically useful. I suggest adopting a different approach. The question I would ask is deeper than the simple technical one of improving your graphic: what is your underlying purpose? Once you know that, adapt your presentation to best address your objective.
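As a rough illustration of the weekday/weekend suggestion, here is a minimal base R sketch. The data frame rides and its values are made up for the example (only the column names come from the question), and averaging per day rather than summing is my own assumption, chosen so that five weekdays and two weekend days stay comparable.

```r
# Hypothetical data: one row per counter and weekday.
# Column names follow the question; the values are invented.
rides <- data.frame(
  counter_edited  = rep(c("A", "B"), each = 7),
  wday            = rep(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"), times = 2),
  rider_count_sum = c(10, 12, 11, 13, 14, 30, 28,
                      5, 6, 5, 7, 6, 20, 18)
)

# Collapse seven weekday levels into two.
rides$day_type <- ifelse(rides$wday %in% c("Sat", "Sun"), "weekend", "weekday")

# Average riders per day within each counter and day type.
by_type <- aggregate(rider_count_sum ~ counter_edited + day_type,
                     data = rides, FUN = mean)
print(by_type)
```

The resulting by_type has one row per counter and day type, so a faceted plot built on it needs only two bars per cell instead of seven.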
I utilized proc gchart in SAS and the following code to generate the graph displayed here.
proc gchart data=combined;
vbar distrct / discrete type=sum sumvar=PERCENT
subgroup= population coutline=gray width=6;
run;
However, as you can see, the individual variable bars are stacked extremely close together and are difficult to comprehend. I have 110 variable bars representing densities of ethnic groups.
My question is
is there a way to make this graph look less cluttered (I tried reducing the width but it does not seem to work)?
Should I be using a different procedure than the g chart procedure?
2 is easier to answer; proc gchart is mostly replaced by proc sgplot nowadays. It's still maintained, but I don't think much new work is being done in gchart or the other sas/graph procedures.
As for how to make it better; there are some answers, definitely, for how to improve it, but ultimately trying to show 110 bars each split by four ethnicities means you're showing 440 data points on one graph. That's going to be a tough lift no matter what.
The first thing I'd consider is switching to horizontal. Horizontal may allow you to have a larger graph, allowing for more spacing, and often times readers have an easier time reading horizontal charts when combining that with stacked bar charts. Scrolling is also easier up-down for most people (a mouse wheel), so if it's okay that they not see it on one screen this may be better. It also allows the bar titles to be presented in the usual left-to-right manner.
Second, consider if your bars can be grouped together. Do you have regions or such that allow you to group bars together, with a bit more spacing between the region? Or more importantly, are there bars that you'd like the readers to be comparing visually to each other? Right now it looks like it's sorted alphabetically, but that is probably not the right way to sort it if there's any sort of relationship between the bars. For example, does the area have sub-areas that are ethnically related? Maybe group those together; or by just geographies (here is the north-east section, here's the east, here's the south-west, etc.) Any time you can group like-things together it makes it easier for the reader to understand what they're looking at and draw sensible conclusions.
You could also sort them by a particular racial makeup - say, in descending order of "color" which seems the dominant population group - which is often an effective way to present data that's this cluttered, as a reader can both see the trend and can find, say, their neighborhood and see where it falls in the order just by looking.
Best overall though might be to group the districts up and then display that, so you have many fewer bars. If there's a sensible way to do that, that'll get your idea across more effectively.
There should be an easy way to deal with this, but I don't know. I'm plotting multiple figures with the par(mfrow=c(5,5)) subplot function of R (i.e. 25 figures). After plotting 10 figures say for example I've done something wrong with the 11th plot, now if I want to plot it again using plot function it takes the space for 12th subplot which means the whole subplot structure changes. I know that par(new=TRUE) would let me re-plotting on the top of the 11th figure, but what if the revised plot is so different that overlapping doesn't work? The idea is to erase the 11th figure and then plot it all over again. How about changing the 1st plot after plotting all 25 figures??
It is possible to use the screen family of functions, though I confess to not being an aficionado of them. Unfortunately for your setup, it can only be used instead of par(mfrow=c(5,5)) or layout(...), not together with them.
Having said that, it is entirely possible to redraw over a screen. For instance:
split.screen(c(5,5))
for (scr in 1:25) {
  screen(scr)
  par(mar=rep(0,4)+0.1)
  plot(0)
}
screen(7)
par(bg='white') # necessary for some display types
erase.screen()
plot(2)
(This is certainly not a beautiful example, but it is functional.)
Notice the explicit setting of the background color (bg) to white; with some displays where transparency is assumed, not doing this will appear to have no effect (that is, erase.screen() will do nothing).
Having said that, there are many modern and near-modern graphing functions/libraries/packages that do things that this package does not support. I have not tested this with image-capturing mechanisms (such as sandwiching things in png(file="...") and dev.off()). Caveat emptor!
I want to plot some data with barplot. Rather, I want to make a bar graph and barplot seemed the logical choice. I am plotting just fine but I was wondering if there is a way to intelligently scale the y axis to round up from the highest count.
For example I set the yaxis in this case to be 30, because I knew that Strand.22 had 27 counts in it: barplot(unlist(d), ylim=c(0,30), xlab="Forward Reverse", ylab="Counts")
In the future, I want this script to run on its own, so it would be optimal for the Y-axis to choose its own ylim. Short of pulling the information out of my 'd' variable I can't think of a good way to do this. Is there an easy way to do this with barplot? Would some other plotter work better? I have seen things about ggplots but it seemed super complex and I wasn't sure that it would do anything better.
EDIT: If I do not choose a ylim it picks automatically and this is what it decided was best.
I disagree with its choice.
If you don't specify ylim, R will come up with something based on the data. (Sounds like you don't like its choice, which is fair.)
If you specify something based on the data like:
barplot(unlist(d), ylim=c(0,1.1*max(unlist(d))))
R will draw you a plot that reflects the maximum value of data. That example just takes the maximum of your values and multiplies that by 1.1 (this could be any number) to give it a little extra height. R does something similar to this when you make a scatterplot but it handles barplots slightly differently.
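If you would rather land on a round number (30 for a maximum of 27, as in the question) than on 1.1 times the maximum, one option is to round the maximum up to the next multiple of a step. This is only a sketch: the contents of d are invented stand-ins for the real counts, and the step of 10 is an arbitrary choice.

```r
# Hypothetical stand-in for the 'd' object from the question.
d <- list(Strand.1 = 12, Strand.22 = 27)
counts <- unlist(d)

# Round the largest count up to the next multiple of 'step'.
step  <- 10
upper <- ceiling(max(counts) / step) * step

barplot(counts, ylim = c(0, upper),
        xlab = "Forward Reverse", ylab = "Counts")
```

Base R's pretty() is another way to get tidy axis limits without picking a step by hand, for example max(pretty(c(0, counts))).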
I have computed values for several categories for three networks. I'd like to create a bar plot in R to show the differences between these parameters for the networks. So far I plotted this with the barplot R function with the categories on the x-axis, their values on the y-axis and to each category three bars (one for each network).
But now I have one value which is much higher than all the others. Therefore the differences for the rest cannot be seen since they're represented only by a thin line because of that one large bar which almost fills the whole plot.
My idea was now to plot the values on the y-axis on an irregular scale, meaning for example, that one half represents the values from 0 to 300, and the other half from 300 to 3000. Is there any way to do this? Or a good alternative approach to handle this problem? I also thought of plotting the logarithm but unfortunately I have also negative values.
I would suggest that an irregular scale isn't a good plan - I think it confuses viewers of the chart. Instead, you could use the layout() function to plot three separate barplots in a horizontal layout. Thus, each category could have its own plot, with its own scale.
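A minimal sketch of the layout() idea, assuming the values sit in a matrix with one row per network and one column per category. The matrix vals, its names and its numbers are all made up for illustration.

```r
# Hypothetical values: rows are networks, columns are categories.
vals <- matrix(c(120, 150, 90,
                 2800, 2500, 3000,
                 -40, 25, 10),
               nrow = 3,
               dimnames = list(c("net1", "net2", "net3"),
                               c("catA", "catB", "catC")))

# One row of three panels; each barplot picks its own y scale.
layout(matrix(1:3, nrow = 1))
for (category in colnames(vals)) {
  barplot(vals[, category], main = category)
}
layout(1)  # back to a single panel
```

Because every panel is drawn by its own barplot() call, the large category no longer flattens the other two, and negative values are handled without any transformation.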
If, however, you still have a single bar at 3000, while everything else is at 300, that won't really help. In that case, you could manually set your y-axis limits with ylim=c(min,max). To keep the bar from stretching off the screen, you can just use simple logic to define anything > 300 as 300, or something similar. Then, put a text point there stating the actual value (using text, maybe with arrow).
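The capping idea could look roughly like this. The values, the cap of 300 and the headroom factor are assumptions for the example; pmin() does the "anything > 300 becomes 300" step, and the midpoints returned by barplot() tell text() where to write the true value.

```r
# Hypothetical values for one category across three networks.
values <- c(net1 = 120, net2 = 3000, net3 = 250)
cap    <- 300

# Clip every bar at the cap so one outlier does not dominate.
shown <- pmin(values, cap)

# barplot() returns the x midpoints of the bars; leave headroom for labels.
mids <- barplot(shown, ylim = c(min(0, shown), cap * 1.15))

# Write the real value above each clipped bar.
clipped <- values > cap
text(mids[clipped], cap, labels = values[clipped], pos = 3)
```

It is worth saying in the caption or axis label that bars are truncated at the cap, since the bar height alone no longer shows the real magnitude.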
With those ideas out there, I would suggest that a graph where one value is 10x the other values might not really be worth presenting, or if it is, the main takeaway from it isn't going to be "how do values 2 and 3 compare to each other", it's going to be "holy moley look how much bigger 1 is than 2 and 3". So, it might not be a big deal if one bar is giant and two are small, as long as you aren't doing all 9 on a single plot (which would screw up other, relevant comparisons). So, if you split them using layout(), then it wouldn't be as big of a deal.