Auto-Scaling Scatter Plot Axis Based on Input Data - math

I am writing a module that creates a scatter plot from a two-dimensional array of numbers provided by the user (x and y values). Each axis should be scaled to encompass all the input numbers, while also rounding to an aesthetically pleasing value. For example, if the maximum value entered is 4.56, I would like the maximum axis value to be rounded to 5. If the maximum value is 850, I'd like the axis to be rounded to 1000.
This initially seems like a simple task: take the max value and round up. What makes it difficult is that my module could be dealing with input values as small as 0.00000001 or as large as many billions.
Can anyone suggest a workflow for making this happen? I don't need the code itself, just the process required. The only way I have come up with is an extremely cumbersome iterative approach that still handles unusual values rather poorly.
Any advice would be most appreciated!
Thanks,
Greg
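One possible workflow, sketched in Python only to show the logic: split the maximum into a power of ten and a leading fraction between 1 and 10, then round the fraction up to the next "nice" step. The function name nice_ceiling and the step set (1, 2, 5, 10) are assumptions for illustration, not something specified in the question, and the sketch only handles a positive maximum.

```python
import math

def nice_ceiling(value, steps=(1.0, 2.0, 5.0, 10.0)):
    """Round a positive value up to step * 10**exponent for the first step that fits.

    The step set is an assumption; (1, 2, 5, 10) is one common choice.
    """
    if value <= 0:
        raise ValueError("this sketch only handles positive maxima")
    # Order of magnitude of the value, e.g. 2 for 850 and 0 for 4.56.
    exponent = math.floor(math.log10(value))
    # Leading fraction, nominally in [1, 10).
    fraction = value / 10 ** exponent
    for step in steps:
        # Small tolerance so floating-point noise does not push an
        # already-nice value (such as 1e-8) up to the next step.
        if fraction <= step * (1 + 1e-12):
            return step * 10 ** exponent
    # Only reachable if rounding error leaves the fraction above 10.
    return 10.0 * 10 ** exponent

print(nice_ceiling(4.56))   # axis maximum for the first example in the question
print(nice_ceiling(850))    # axis maximum for the second example
```

Because the scale comes from log10 of the value itself, the same code path covers very small and very large inputs without iteration. A minimum axis value could be handled in a similar way by rounding down instead of up.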

Related

Faceting in ggplot() in R

I am trying to build a plot of a numeric variable rider_count against a categorical variable weekdays ("Mon", "Tue", ...). The plot needs to be faceted into 55 categories.
I tried to use
ggplot(aes(x=wday, y=rider_count_sum)) +
geom_bar(stat = "identity") +
facet_wrap(~counter_edited, scales="free")
However, the output is badly distorted because the scales do not fit.
Is there any way to make it scale normally?
The issue here is your faceting. It produces a grid of 8 x 7 cells. The plot displays on my monitor at about 18cm x 11cm in size. That means each cell is approximately 2.25cm x 1.5cm. Is a cell of that size large enough to provide meaningful information in the form of a plot? I would say "no".
So, you have two options: increase the size of the graphic or reduce the size of the grid.
Is increasing the size of the plot an option? Well, how big would each cell have to be to be meaningful? I don't know; you'd have to experiment, and it would depend on the viewing distance and the level of information you'd want to convey. As a thought experiment, let's say you need each cell to be 8cm x 8cm to be interpretable. That means the graphic would need to be at least 64cm x 56cm. That would require an A1/ANSI D sheet of paper. That's heading towards poster size. Unless you're talking posters, that's not reasonable. Even as a poster, a reader would have to stand so close that they wouldn't get the message of the whole graphic. On a digital display, you'd again be talking about a wall-mounted unit. Standing close enough to look at a cell, pixel resolution would be an issue. Scrolling on a smaller unit would destroy the whole purpose of using a faceted display.
Pagination would also destroy the benefit of faceting: you wouldn't be able to see all the data at the same time.
So, whilst increasing the size of your plot might be technically possible, I don't think it would be practically useful.
What about reducing the number of cells? That to me would be the way to go. Simplify your presentation to allow your message to come across. For example, summarise weekdays vs weekends in one graphic, and differences between weekdays in another. That reduces one dimension from 7 to either 2 or 5. I don't know how you construct counter_edited, so I don't know what the columns of your facet represent, but could you perhaps reduce the number of categories to 3 or 4? Combined with my weekday/weekend suggestion, that would give you grids of between 4x5 and 2x3. Much more manageable (though even 4x5 may be too complex).
In short: even if making your current graphic look better is technically possible, I doubt it will ever be practically useful. I suggest adopting a different approach. The question I would ask is deeper than the simple technical one of improving your graphic: what is your underlying purpose? Once you know that, adapt your presentation to best address your objective.

heatmap.2 color legend custom bins

Hi there stackoverflow community!
I am a graduate student looking for some advice on an aesthetics problem in R that I am encountering.
The data I am working with is in the form of a VERY large matrix (49x51).
My problem is that my data ranges from very small to very large, with the bulk of my data falling within the "very large" end of the spectrum, so unless I convert my data to log10, the heatmap is rather boring and almost entirely the same color.
The spectrum of my data is totally within the range I am expecting, but I am hoping to display it in a more aesthetic way.
Proposed solution: I think I need to bin my data in a non-uniform way. If you look at the attached image, you will see that their heatmap looks nice and the color key shows the heat spectrum in a non-fixed bin format. I would like to do something like that, however, I am not sure how to declare cutoffs for each bin. I would ideally like to declare the cutoffs.
For example, bin 1 (0-1), bin 2 (2-50), bin 3 (51-5000). As you can see, my bins would not be fixed in equal increments.
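As a language-agnostic sketch of the binning step (written in Python purely for illustration), explicit cutoffs can be stored as a sorted list of upper bounds and each value looked up with a binary search. The function name bin_index is hypothetical, and the sketch assumes that values falling between the stated bins (for example 1.5) go into the next bin up. In heatmap.2 itself, the breaks argument takes a vector of cutoffs of this kind.

```python
from bisect import bisect_left

def bin_index(value, upper_bounds, lower=0.0):
    """Return the 0-based index of the first bin whose upper bound is >= value.

    upper_bounds must be sorted ascending; bins need not be equally wide.
    """
    if value < lower or value > upper_bounds[-1]:
        raise ValueError(f"{value} is outside the binned range")
    return bisect_left(upper_bounds, value)

# Upper edges for the example bins (0-1), (2-50), (51-5000).
cutoffs = [1, 50, 5000]
for v in (0.5, 1, 2, 50, 51, 4999):
    print(v, "-> bin", bin_index(v, cutoffs) + 1)
```

Each bin index can then be mapped to one colour, so that the colour key reflects the unequal bin widths.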
I have been using heatmap.2 for this. Thanks so much in advance!
heatmap with color legend in non-uniform bins:
Hey @Punintended and @S Rivero,
I think I have reached the point that my heatmap will only improve marginally. Both of you contributed deeply to this success, so thanks! First, to condense the matrix values as much as possible, I normalized by column. I was then able to assign gradients. This turned out much better than I had hoped. As you can see, most of my data is clustered (check out the density in the key) at very low values, this is okay though, for I am interested in the higher values. I had to use custom color gradients to account for possible instances of colorblind attendees that might look at my poster. Anyways, if you guys have comments or recommendations, they will be much appreciated :). Again, thanks a bunch!

plot every tick on axis[SAS]

I have a dataset of about 100 observations, all in (x,y) format, where every y is an integer. I need proc sgplot to plot them. My y values range from 1 to 150. I would like to force the graph to show every corresponding y value on the y-axis instead of automatically reducing the ticks to a small number in order to show them clearly. For example, if the first five values of y are (1,3,4,6,7,....), I want the y ticks to show exactly (1,3,4,6,7,....) instead of (1,5,...).
I tried
yaxis value=(1 to 150 by 1) valueshint display=all;
It does not work, maybe because I have too many observations. I know the result may be overwhelming, but I just want to see it. Thanks.
You don't say if you're using SAS/GRAPH or ODS GRAPHICS (SGPLOT etc.), so I'll answer the latter which is what I know; the answer should be useful for both in concept.
You likely cannot get SAS to plot so much on the axis unless the axis is very large itself. This means you have two options.
Greatly increase the size of the graphic in terms of pixels (and then shrink that to a usable size via the image's physical size, or using an external tool). This is not necessarily usable in all cases, but it produces a very high resolution plot (which is also a very large file). This page explains how to do that for ODS graphics (use image_dpi as a high number, and width and height in inches as a normal number), and this page explains it for SAS/GRAPH. You may need to make your font small to make it work (if you're adding numbers, which I assume you are), or you may need to make an initially large plot first and then go into paint/photoshop/gimp/etc. and make it smaller.
Use annotate to create the axis marks. This is fairly easy if you know how to use annotate, as you're just writing to the location of the axis (y) and the item (x), and then a bit below that for the text. This will make it very easy to make a total garbage plot, but it will likely work ultimately.
These likely work in both SAS/GRAPH and ODS GRAPHICS, and I can't test either as you don't post any code or simulated data to test with, but I think both approaches have some merit (as does the approach of "don't do this", but you've thought that through).

Intelligent Y Axis Scaling BarPlot R

I want to plot some data with barplot. Rather, I want to make a bar graph and barplot seemed the logical choice. I am plotting just fine but I was wondering if there is a way to intelligently scale the y axis to round up from the highest count.
For example I set the yaxis in this case to be 30, because I knew that Strand.22 had 27 counts in it: barplot(unlist(d), ylim=c(0,30), xlab="Forward Reverse", ylab="Counts")
In the future, I want this script to run on its own, so it would be optimal for the Y-axis to choose its own ylim. Short of pulling the information out of my 'd' variable, I can't think of a good way to do this. Is there an easy way to do this with barplot? Would some other plotter work better? I have seen things about ggplots but it seemed super complex and I wasn't sure that it would do anything better.
EDIT: If I do not choose a ylim it picks automatically and this is what it decided was best.
I disagree with its choice.
If you don't specify ylim, R will come up with something based on the data. (Sounds like you don't like its choice, which is fair.)
If you specify something based on the data like:
barplot(unlist(d), ylim=c(0,1.1*max(unlist(d))))
R will draw you a plot that reflects the maximum value of data. That example just takes the maximum of your values and multiplies that by 1.1 (this could be any number) to give it a little extra height. R does something similar to this when you make a scatterplot but it handles barplots slightly differently.

Irregular scaling of axis in R

I have computed values for several categories for three networks. I'd like to create a bar plot in R to show the differences between these parameters for the networks. So far I plotted this with the barplot R function with the categories on the x-axis, their values on the y-axis and to each category three bars (one for each network).
But now I have one value which is much higher than all the others. Therefore the differences for the rest cannot be seen since they're represented only by a thin line because of that one large bar which almost fills the whole plot.
My idea was to plot the values on the y-axis on an irregular scale, meaning, for example, that one half represents the values from 0 to 300 and the other half those from 300 to 3000. Is there any way to do this? Or a good alternative approach to handle this problem? I also thought of plotting the logarithm, but unfortunately I also have negative values.
I would suggest that an irregular scale isn't a good plan; I think it confuses viewers of the chart. Instead, you could use the layout() function to plot three separate barplots in a horizontal layout. Thus, each category could have its own plot, with its own scale.
If, however, you still have a single bar at 3000, while everything else is at 300, that won't really help. In that case, you could manually set your y-axis limits with ylim=c(min,max). To keep the bar from stretching off the screen, you can just use simple logic to define anything > 300 as 300, or something similar. Then, put a text point there stating the actual value (using text, maybe with arrow).
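The "define anything > 300 as 300" step could look roughly like the following sketch. It is written in Python only to show the logic; the function name clip_for_display and the sample values are made up for illustration. The clipped values are what get drawn, and the recorded originals supply the text labels.

```python
def clip_for_display(values, cap):
    """Cap values at `cap` for plotting and remember the originals that were cut.

    Returns (clipped_values, labels), where labels maps the position of each
    capped bar to its true value so it can be written next to the bar.
    """
    clipped = [min(v, cap) for v in values]
    labels = {i: v for i, v in enumerate(values) if v > cap}
    return clipped, labels

# Hypothetical data: one bar far larger than the rest, plus a negative value.
clipped, labels = clip_for_display([120, 3000, -40, 300], cap=300)
print(clipped)  # heights to draw
print(labels)   # positions and true values to annotate
```

Negative values pass through unchanged here, since only the upper end is capped.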
With those ideas out there, I would suggest that a graph where one value is 10x the other values might not really be worth presenting, or if it is, the main takeaway from it isn't going to be "how do values 2 and 3 compare to each other", it's going to be "holy moley look how much bigger 1 is than 2 and 3". So, it might not be a big deal if one bar is giant and two are small, as long as you aren't doing all 9 on a single plot (which would screw up other, relevant comparisons). So, if you split them using layout(), then it wouldn't be as big of a deal.
