Faceting in ggplot() in R

I am trying to build a plot of a numeric variable, rider_count, against a categorical variable, weekday ("Mon", "Tue", ...). The plot needs to be faceted by a variable with 55 categories.
I tried:
library(ggplot2)
# `rides` is a placeholder for the data frame, which the original snippet omitted
ggplot(rides, aes(x = wday, y = rider_count_sum)) +
  geom_bar(stat = "identity") +
  facet_wrap(~counter_edited, scales = "free")
However, the output is badly distorted because the plot doesn't fit the available space.
Are there any ways to make it scale normally?

The issue you have here is your faceting. It produces a grid of 8 x 7 cells. The plot displays on my monitor at about 18cm x 11cm, which means each cell is approximately 2.25cm x 1.5cm. Is a cell of that size large enough to provide meaningful information in the form of a plot? I would say "no".
So, you have two options: increase the size of the graphic or reduce the size of the grid.
Is increasing the size of the plot an option? Well, how big would each cell have to be to be meaningful? I don't know; you'd have to experiment. It would depend on the viewing distance and the level of information you want to convey. As a thought experiment, let's say you need each cell to be 8cm x 8cm to be interpretable. That means the graphic would need to be at least 64cm x 56cm, which would require an A1/ANSI D sheet of paper. Unless you're talking posters, that's not reasonable. Even as a poster, a reader would have to stand so close that they wouldn't get the message of the whole graphic. On a digital display, you'd again be talking about a wall-mounted unit, and standing close enough to look at a single cell, pixel resolution would become an issue. Scrolling on a smaller unit would destroy the whole purpose of using a faceted display.
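The cell-size arithmetic above can be reproduced in a few lines of R. This is a rough sketch: it assumes facet_wrap()'s default layout uses ceiling(sqrt(n)) columns and enough rows for the remaining panels, which matches the 8 x 7 grid mentioned for 55 panels, and it takes the 18cm x 11cm display size and the hypothetical 8cm cell size from the text.

```r
# Approximation of facet_wrap()'s default grid for n panels (an assumption):
# ceiling(sqrt(n)) columns, then as many rows as needed
n_panels <- 55
n_col <- ceiling(sqrt(n_panels))    # 8
n_row <- ceiling(n_panels / n_col)  # 7

# Size of each cell on an 18cm x 11cm display
cell_w <- 18 / n_col                # 2.25 cm
cell_h <- 11 / n_row                # about 1.57 cm

# Canvas needed if every cell had to be 8cm x 8cm
need_w <- 8 * n_col                 # 64 cm
need_h <- 8 * n_row                 # 56 cm

cat(sprintf("grid %d x %d, cell %.2f x %.2f cm, canvas needed %d x %d cm\n",
            n_col, n_row, cell_w, cell_h, need_w, need_h))
```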
Pagination would also destroy the benefit of faceting: you wouldn't be able to see all the data at the same time.
So, whilst increasing the size of your plot might be technically possible, I don't think it would be practically useful.
What about reducing the number of cells? That, to me, would be the way to go. Simplify your presentation to allow your message to come across. For example, summarise weekdays vs weekends in one graphic, and differences between weekdays in another. That reduces one dimension from 7 to either 2 or 5. I don't know how you construct counter_edited, so I don't know what the columns of your facet represent, but could you perhaps reduce the number of categories to 3 or 4? Combined with my weekday/weekend suggestion, that would give you grids of between 4x5 and 2x3. Much more manageable (though even 4x5 may be too complex).
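A minimal sketch of the weekday/weekend idea, using simulated data because the real data isn't shown. The data frame rides, the column counter_id and the four-way counter_group split are hypothetical placeholders (the text doesn't say how counter_edited is built); only wday and rider_count_sum come from the question. Averaging per counter per day keeps 5 weekdays and 2 weekend days comparable.

```r
library(ggplot2)

# Hypothetical stand-in data: 55 counters x 7 days (replace with your own)
set.seed(1)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
rides <- expand.grid(counter_id = 1:55, wday = days, stringsAsFactors = FALSE)
rides$rider_count_sum <- rpois(nrow(rides), lambda = 200)

# Collapse the 7 days into 2 levels: weekday vs weekend
rides$day_type <- ifelse(rides$wday %in% c("Sat", "Sun"), "Weekend", "Weekday")

# Hypothetical grouping of the 55 counters into 4 groups; in practice use
# something meaningful (area, counter type, ...) rather than the id
rides$counter_group <- paste("Group", (rides$counter_id - 1) %% 4 + 1)

# Mean over all counter-days in each group, so weekdays and weekends compare fairly
summary_df <- aggregate(rider_count_sum ~ counter_group + day_type,
                        data = rides, FUN = mean)

p <- ggplot(summary_df, aes(x = day_type, y = rider_count_sum)) +
  geom_col() +
  facet_wrap(~counter_group, nrow = 1) +  # 4 panels instead of 55
  labs(x = NULL, y = "Mean riders per counter per day")
print(p)
```

A second plot with the five weekdays on the x-axis, filtered to rows where day_type is "Weekday", would cover the "differences between weekdays" graphic in the same way.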
In short: even if making your current graphic look better is technically possible, I doubt it will ever be practically useful. I suggest adopting a different approach. The question I would ask is deeper than the simple technical one of improving your graphic: what is your underlying purpose? Once you know that, adapt your presentation to best address your objective.

Related

How to decide between font size, margins and png() parameters to achieve good definition and consistent visualisation?

This is a question that has had me banging my head against a wall for a while now. Much of R coding produces consistent results when used for analysis, in the sense that there is sometimes more than one way to achieve something, but your output is still shareable and consistent: a data frame, a data table and so on.
However, I'm struggling to understand how to achieve a standardised process when generating plots. Font size, margin size, height, width and resolution all influence each other.
You change the resolution and suddenly the font size changes drastically when saving with png(). You go back and change the dimensions, and there you are with an extremely small font or a pixelated chart looking at you.
So, because I still trust the ggplot and png() process and believe that it must be me who is messing up or missing steps in my workflow, my question is:
What is the sweet spot between all those factors that makes plotting with R easy, consistent and high-quality?
I understand that some of these factors cannot be standardised since it depends on the amount of information and how complex a chart is. But how do others ensure consistent font size against changes in resolution, height, width and plot margins?
I've come across some useful resources, such as:
https://blog.revolutionanalytics.com/2009/01/10-tips-for-making-your-r-graphics-look-their-best.html
https://support.rstudio.com/hc/en-us/articles/200488548-Problem-with-Plots-or-Graphics-Device
But none really speaks to how you standardise a visualization process in R. Still great tips, though.
Any advice or ideas are honestly appreciated. Thank you.
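One common way to keep text consistent is to fix the physical size of the image in inches and vary only the resolution, since ggplot2 text sizes are specified in points (1/72 inch). A minimal sketch using the built-in mtcars data as a stand-in; the 6 x 4 inch size, base_size = 11 and the two resolutions are arbitrary choices, not recommendations.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_grey(base_size = 11)  # text size is given in points

files <- character(0)
for (dpi in c(100, 300)) {
  out <- file.path(tempdir(), sprintf("plot_%ddpi.png", dpi))
  # Same 6 x 4 inch canvas each time; only the pixel density changes
  png(out, width = 6, height = 4, units = "in", res = dpi)
  print(p)
  dev.off()
  files <- c(files, out)
}
```

The 100 dpi file comes out at 600 x 400 pixels and the 300 dpi file at 1800 x 1200, with the text occupying the same share of the image in both. Changing width and height instead is what makes the text look larger or smaller relative to the plot.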

heatmap.2 color legend custom bins

Hi there stackoverflow community!
I am a graduate student looking for advice on an aesthetics problem I am encountering in R.
The data I am working with is in the form of a VERY large matrix (49x51).
My problem is that my data ranges from very small to very large, with the bulk of my data falling within the "very large" end of the spectrum, so unless I convert my data to log10, the heatmap is rather boring and almost entirely the same color.
The spectrum of my data is totally within the range I am expecting, but I am hoping to display it in a more aesthetic way.
Proposed solution: I think I need to bin my data in a non-uniform way. If you look at the attached image, you will see that their heatmap looks nice and the color key shows the heat spectrum in non-uniform bins. I would like to do something like that, but I am not sure how to declare my own cutoffs for each bin.
For example, bin 1 (0-1), bin 2 (2-50), bin 3 (51-5000). As you can see, my bins would not be fixed in equal increments.
I have been using heatmap.2 for this. Thanks so much in advance!
(Attached image: heatmap with a color legend in non-uniform bins.)
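heatmap.2() in the gplots package accepts a breaks vector together with col, which is one way to declare your own cutoffs. A minimal sketch, assuming gplots is installed: the matrix is simulated, the colours are arbitrary, and the edges 0, 1, 50, 5000 are the example bins from the question made contiguous, so that values such as 1.5 or 50.5 still fall into a bin.

```r
library(gplots)  # provides heatmap.2()

# Hypothetical 49 x 51 matrix whose values span several orders of magnitude
set.seed(42)
vals <- c(runif(500, 0, 1), runif(500, 2, 50), runif(1499, 51, 5000))
m <- matrix(sample(vals), nrow = 49, ncol = 51)

# Declare the bin edges yourself: 0-1, 1-50, 50-5000
bins <- c(0, 1, 50, 5000)
# heatmap.2() needs exactly one colour per bin, i.e. length(bins) - 1 colours
bin_cols <- c("#fee8c8", "#fdbb84", "#e34a33")

hm <- heatmap.2(m,
                breaks = bins,
                col = bin_cols,
                trace = "none",         # no trace lines over the cells
                density.info = "none")  # no histogram in the colour key
```

Adding more edges (and one more colour per extra edge) gives finer bins wherever the data are dense.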
Hey #Punintended and #S Rivero,
I think I have reached the point where my heatmap will only improve marginally. Both of you contributed greatly to this success, so thanks! First, to condense the matrix values as much as possible, I normalized by column. I was then able to assign gradients. This turned out much better than I had hoped. As you can see, most of my data is clustered at very low values (check out the density in the key); this is okay, though, because I am interested in the higher values. I had to use custom color gradients to account for colorblind attendees who might look at my poster. Anyway, if you have comments or recommendations, they will be much appreciated :). Again, thanks a bunch!

how to de-clutter graph created using proc gchart?

I utilized proc gchart in SAS and the following code to generate the graph displayed here.
proc gchart data=combined;
vbar distrct / discrete type=sum sumvar=PERCENT
subgroup= population coutline=gray width=6;
run;
However, as you can see, the individual bars are packed extremely close together, which makes the chart difficult to read. I have 110 bars representing densities of ethnic groups.
My questions are:
1. Is there a way to make this graph look less cluttered? (I tried reducing the width, but it does not seem to work.)
2. Should I be using a different procedure than PROC GCHART?
Question 2 is easier to answer: PROC GCHART has mostly been replaced by PROC SGPLOT nowadays. It's still maintained, but I don't think much new work is being done in GCHART or the other SAS/GRAPH procedures.
As for how to make it better: there are definitely some ways to improve it, but ultimately, trying to show 110 bars each split by four ethnicities means you're showing 440 data points on one graph. That's going to be a tough lift no matter what.
The first thing I'd consider is switching to horizontal bars. A horizontal layout may allow you to have a larger graph with more spacing, and readers often find horizontal charts easier to read, particularly stacked bar charts. Scrolling up and down is also easier for most people (a mouse wheel), so if it's okay that they can't see it all on one screen, this may be better. It also allows the bar labels to be presented in the usual left-to-right manner.
Second, consider if your bars can be grouped together. Do you have regions or such that allow you to group bars together, with a bit more spacing between the region? Or more importantly, are there bars that you'd like the readers to be comparing visually to each other? Right now it looks like it's sorted alphabetically, but that is probably not the right way to sort it if there's any sort of relationship between the bars. For example, does the area have sub-areas that are ethnically related? Maybe group those together; or by just geographies (here is the north-east section, here's the east, here's the south-west, etc.) Any time you can group like-things together it makes it easier for the reader to understand what they're looking at and draw sensible conclusions.
You could also sort them by a particular racial makeup - say, in descending order of "color" which seems the dominant population group - which is often an effective way to present data that's this cluttered, as a reader can both see the trend and can find, say, their neighborhood and see where it falls in the order just by looking.
Best overall, though, might be to group the districts and then display those groups, so you have many fewer bars. If there's a sensible way to do that, it will get your idea across more effectively.

Plot every tick on axis [SAS]

I have a dataset of about 100 observations, all in (x, y) format, where every y is an integer. I need PROC SGPLOT to plot them. The y values range from 1 to 150. I want to force the graph to show every corresponding y value as a tick on the y-axis, instead of SAS automatically reducing the ticks to a small number so that they display clearly. For example, if the first five values of y are (1, 3, 4, 6, 7, ...), I want the y ticks to show exactly (1, 3, 4, 6, 7, ...) instead of (1, 5, ...).
I tried
yaxis values=(1 to 150 by 1) valueshint display=all;
It does not work, perhaps because I have too many observations. I know the result may be overwhelming, but I just want to see it. Thanks.
You don't say if you're using SAS/GRAPH or ODS GRAPHICS (SGPLOT etc.), so I'll answer the latter which is what I know; the answer should be useful for both in concept.
You likely cannot get SAS to plot so much on the axis unless the axis is very large itself. This means you have two options.
Greatly increase the size of the graphic in pixels (and then shrink it to a usable size via the image's physical size, or using an external tool). This isn't necessarily usable in all cases, but it produces a very high-resolution plot (which is very large file-wise). This page explains how to do that for ODS graphics (use image_dpi as a high number, and width and height in inches as a normal number), and this page explains it for SAS/GRAPH. You may need to make your font small to make it work (if you're adding numbers, which I assume you are), or you may need to make a large plot first and then go into Paint/Photoshop/GIMP/etc. and make it smaller.
Use annotate to create the axis marks. This is fairly easy if you know how to use annotate, as you're just writing to the location of the axis (y) and the item (x), and then a bit below that for the text. This will make it very easy to make a total garbage plot, but it will likely work ultimately.
These likely work in both SAS/GRAPH and ODS GRAPHICS, and I can't test either as you don't post any code or simulated data to test with, but I think both approaches have some merit (as does the approach of "don't do this", but you've thought that through).

Irregular scaling of axis in R

I have computed values for several categories for three networks. I'd like to create a bar plot in R to show the differences between these parameters for the networks. So far I plotted this with the barplot R function with the categories on the x-axis, their values on the y-axis and to each category three bars (one for each network).
But now I have one value which is much higher than all the others. Therefore the differences for the rest cannot be seen since they're represented only by a thin line because of that one large bar which almost fills the whole plot.
My idea was to plot the values on the y-axis on an irregular scale, meaning, for example, that one half represents the values from 0 to 300 and the other half the values from 300 to 3000. Is there any way to do this? Or is there a good alternative approach to this problem? I also thought of plotting the logarithm, but unfortunately I also have negative values.
I would suggest that an irregular scale isn't a good plan; I think it confuses viewers of the chart. Instead, you could use the layout() function to plot three separate barplots in a horizontal layout. That way, each category could have its own plot, with its own scale.
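A minimal sketch of the layout() approach with base graphics. The values, category names and network names are invented for illustration (the question doesn't include its data); each barplot() call chooses its own y-axis range, including negative values.

```r
# Hypothetical values: 3 categories x 3 networks (replace with your own)
vals <- matrix(
  c( 120,  150,   90,
    2800, 2600, 3000,
     -40,   35,   60),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("Net1", "Net2", "Net3"))
)

layout(matrix(1:3, nrow = 1))  # three panels side by side
for (category in rownames(vals)) {
  # each barplot() call picks its own y-axis range
  barplot(vals[category, ], main = paste("Category", category),
          col = c("grey30", "grey60", "grey90"))
}
layout(1)  # back to a single panel
```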
If, however, you still have a single bar at 3000 while everything else is at 300, that won't really help. In that case, you could manually set your y-axis limits with ylim=c(min, max). To keep the bar from stretching off the screen, you can use simple logic to define anything > 300 as 300, or something similar. Then add a text label there stating the actual value (using text(), perhaps with an arrow from arrows()).
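A minimal sketch of the capping idea in base R. The three values and the cap of 300 are hypothetical; pmin() clips the tall bar, and text() with pos = 3 writes the real value just above the clipped bar.

```r
# Hypothetical values for one category: one network dwarfs the others
vals <- c(Net1 = 3000, Net2 = 260, Net3 = 210)
cap <- 300

shown <- pmin(vals, cap)  # anything above the cap is drawn at the cap
mids <- barplot(shown, ylim = c(min(0, shown), cap * 1.15), col = "grey70")

# Label each capped bar with its real value, just above the bar
capped <- vals > cap
text(mids[capped], cap, labels = paste("actual:", vals[capped]),
     pos = 3, cex = 0.8)
```

Leaving some headroom in ylim (here 15% above the cap) keeps the label inside the plotting region.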
With those ideas out there, I would suggest that a graph where one value is 10x the other values might not really be worth presenting, or if it is, the main takeaway from it isn't going to be "how do values 2 and 3 compare to each other", it's going to be "holy moley look how much bigger 1 is than 2 and 3". So, it might not be a big deal if one bar is giant and two are small, as long as you aren't doing all 9 on a single plot (which would screw up other, relevant comparisons). So, if you split them using layout(), then it wouldn't be as big of a deal.
