Broken axis in Google charts - graph

Is there any way to create a break in my vertical scale on the Google charts api?
I have a couple of dozen data points all about 600-2000 on the y-axis except for one value which is almost 300,000; this makes all the smaller data points nearly unreadable. I need to represent all this data and a logarithmic scale is not an option.

Simple answer: no, it is not possible.
Axis breaks are generally frowned upon in the visualization community, so most charting software does not support them.
If you want to be tricky, you can write a function that finds outliers and moves them to a second series in your data. Plot that series on the second axis and give it a different color. This says, "This figure is different and does not fit," which draws added attention to it while still letting the rest of the data be read on a single scale.
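A minimal sketch of the second-series idea, assuming each data point is an object with `label` and `value` fields and that "outlier" means anything above Q3 + 1.5 × IQR. The helper names `quantile` and `splitOutliers` and the sample data are made up for illustration; `series`, `targetAxisIndex` and `vAxes` are standard Google Charts options.

```javascript
// Linear-interpolated quantile of an already sorted numeric array.
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Hypothetical helper: builds rows of [label, normalValue, outlierValue],
// leaving null in whichever series a point does not belong to.
function splitOutliers(points) {
  const sorted = points.map(p => p.value).sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const upperFence = q3 + 1.5 * (q3 - q1); // assumed outlier rule
  const rows = points.map(p =>
    p.value > upperFence ? [p.label, null, p.value] : [p.label, p.value, null]
  );
  return [['Label', 'Value', 'Outlier'], ...rows];
}

// Made-up sample data shaped like the question's description.
const points = [
  { label: 'A', value: 600 },
  { label: 'B', value: 1200 },
  { label: 'C', value: 2000 },
  { label: 'D', value: 900 },
  { label: 'E', value: 300000 }
];
const table = splitOutliers(points);

// Put the outlier series on the second vertical axis, in its own color.
const options = {
  series: {
    0: { targetAxisIndex: 0 },
    1: { targetAxisIndex: 1, color: 'red' }
  },
  vAxes: { 0: { title: 'Typical values' }, 1: { title: 'Outlier' } }
};

// In the browser, once the charts library has loaded:
// chart.draw(google.visualization.arrayToDataTable(table), options);
```

The 1.5 × IQR fence is only one possible rule; a fixed threshold would work just as well if you already know where the ordinary values stop.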
Personally I would just cut off the graph at an arbitrary value, set the value of that point to the maximum value, and add a tooltip saying, "Outlier: 300,000" or whatever it is. This will allow people to see the other numbers, but show that this number itself is an outlier without coloring it differently or removing it from the single series.
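The cut-off approach could look roughly like this. It assumes the same `{label, value}` point objects and a cap you pick yourself; `clampWithTooltip` and the cap of 2500 are hypothetical, while the `tooltip` column role and `vAxis.viewWindow.max` are standard Google Charts features.

```javascript
// Hypothetical helper: caps each value at `cap` and attaches a tooltip
// that reports the real value for any point that was clipped.
function clampWithTooltip(points, cap) {
  const header = ['Label', 'Value', { type: 'string', role: 'tooltip' }];
  const rows = points.map(p => {
    const clipped = p.value > cap;
    const shown = p.value.toLocaleString('en-US');
    const tip = clipped ? `Outlier: ${shown}` : `${p.label}: ${shown}`;
    return [p.label, clipped ? cap : p.value, tip];
  });
  return [header, ...rows];
}

// Made-up sample data; the cap is an arbitrary choice above the normal range.
const cap = 2500;
const clamped = clampWithTooltip(
  [
    { label: 'A', value: 600 },
    { label: 'B', value: 2000 },
    { label: 'C', value: 300000 }
  ],
  cap
);

// Stop the vertical axis at the cap so the clipped point reaches the top edge.
const clampOptions = { vAxis: { viewWindow: { max: cap } } };

// In the browser, once the charts library has loaded:
// chart.draw(google.visualization.arrayToDataTable(clamped), clampOptions);
```

Everything stays in one series, so the outlier keeps the same color as the other points and only the tooltip reveals its true value.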
Either way is doable.

You need to use a log scale. It is set through the scaleType option, which is available on both vAxis and hAxis. The supported values are:
log: Conventional logarithm scale
mirrorLog: Logarithm scale that allows 0 values
var options = {
  vAxis: {
    scaleType: 'mirrorLog'  // or 'log' if the data contains no zero values
  }
};
var data = {}; // replace with your DataTable
chart.draw(data, options);

Related

Is there a way to create a geom_path heatmap in ggplot?

For example, this is a heatmap from a website using GPS data:
I have gotten some degree of success with adding a weight parameter to each vertex and calculating the number of events that have vertices near those, but that takes a long time, especially with a large amount of data. It also appears a bit spotty when the distance between vertices is a bit wonky, which causes random splotches of different colors throughout the heatmap. It looks kind of cool, but it makes the data a bit harder to read.
When you zoom out, it looks a bit more continuous due to the paths overlapping more.
In R, the closest I can get to this involves using an alpha channel, but that only gives me a monochromatic heatmap, which is not always desirable, especially when you want the lesser-traveled paths to remain visible. In theory I could draw two lines to solve the visibility part (first opaque, second semi-transparent), but I would like to be able to have different hue values.
Ideally I would like this to work with ggplot, but if it cannot, I would accept other methods, provided they are reasonably quick computationally.
Edit: The data format is a data frame with sequential (latitude, longitude) coordinate pairs, along with some associated data that can be used for filter & grouping (such as activity type and event ID).
Here is a sample of the data for the region displayed in the above images (~1.5 MB):
https://www.dropbox.com/s/13p2jtz4760m26d/sample_coordinate_data.csv?dl=0
I would try something like
ggplot(data, aes(longitude, latitude)) + geom_count(aes(alpha = after_stat(prop)))
but you need to show some data to check how it works.

Plot every tick on axis [SAS]

I have a dataset of about 100 observations, all in (x,y) format, and every y is an integer. I need proc sgplot to plot them. My y values range from 1 to 150. I would like to force the plot to show every y value that occurs as a tick on the y-axis, instead of automatically reducing the ticks to a small number so that they display clearly. For example, if the first five values of my y are (1,3,4,6,7,....), I want the y ticks to show exactly (1,3,4,6,7,....) instead of (1,5,...).
I tried
yaxis value=(1 to 150 by 1) valueshint display=all;
It does not work, maybe because I have too many observations. I know the result may be overwhelming, but I just want to see it. Thanks.
You don't say if you're using SAS/GRAPH or ODS GRAPHICS (SGPLOT etc.), so I'll answer the latter which is what I know; the answer should be useful for both in concept.
You likely cannot get SAS to plot so much on the axis unless the axis is very large itself. This means you have two options.
Greatly increase the size of the graphic in pixels (and then shrink it to a usable size via the image's physical size or an external tool). This is not necessarily usable in all cases, but it produces a very high resolution plot (which is also a very large file). This page explains how to do that for ODS graphics (set image_dpi to a high number, and width and height in inches to normal values), and this page explains it for SAS/GRAPH. You may need to make your font small for this to work (if you're adding numbers, which I assume you are), or you may need to make an initially large plot and then shrink it in paint/photoshop/gimp/etc.
Use annotate to create the axis marks. This is fairly easy if you know how to use annotate, as you're just writing to the location of the axis (y) and the item (x), and then a bit below that for the text. This will make it very easy to make a total garbage plot, but it will likely work ultimately.
These likely work in both SAS/GRAPH and ODS GRAPHICS, and I can't test either as you don't post any code or simulated data to test with, but I think both approaches have some merit (as does the approach of "don't do this", but you've thought that through).

Tableau map shapes overlapped

I am trying to render some geographic data on a map in Tableau. However, some data points are located at the same position, so their shape images overlap. Clicking on a shape only selects the top one.
How can we distinguish the overlapped data points in Tableau? I know that we can manually exclude the top data to see another, but is there any other way, for example, make a drop down list in the right click menu to select the overlapped data points?
Thank you!
There are a couple of ways to deal with this issue.
Some choices you can try are:
Add some transparency to the marks by editing the color shelf properties. That way at least you get a visual indication when there are multiple marks stacked on top of each other. This approach can be considered a poor man's heat map if you have many points in different areas as the denser/darker sections will have more marks. (But that just affects the appearance and doesn't help you select and view details for marks that are covered by others)
Add some small pseudo-random jitter to each coordinate using calculated fields. This will be easier when Tableau supports a rand() function, but in the meantime you can get creative with other fields and the math functions to add a little jitter. The goal is to shift locations just enough that they don't stack exactly, but not enough to matter in precision. How much that is depends on the scale.
Make a grid-style heat map where the color indicates the number of data points in each grid cell. To do this, you'll need to create calculated fields that bin together nearby latitudes and longitudes. For example, round each latitude and longitude to a certain number of decimal places, or use the hex bin functions in Tableau. Those calculated fields will need to have a geographic role and be treated as continuous dimensions.
Define your visualization to display one mark for each unique location, and then use color or size to indicate the number of data points at that location, as opposed to a mark for each individual data point.

Irregular scaling of axis in R

I have computed values for several categories for three networks. I'd like to create a bar plot in R to show the differences between these parameters for the networks. So far I plotted this with the barplot R function with the categories on the x-axis, their values on the y-axis and to each category three bars (one for each network).
But now I have one value which is much higher than all the others. Therefore the differences for the rest cannot be seen since they're represented only by a thin line because of that one large bar which almost fills the whole plot.
My idea now was to plot the values on the y-axis on an irregular scale, meaning, for example, that one half represents the values from 0 to 300 and the other half those from 300 to 3000. Is there any way to do this? Or is there a good alternative approach to this problem? I also thought of plotting the logarithm, but unfortunately I also have negative values.
I would suggest that an irregular scale isn't a good plan, as I think it confuses viewers of the chart. Instead, you could use the layout() function to plot three separate barplots in a horizontal layout. Thus, each category could have its own plot, with its own scale.
If, however, you still have a single bar at 3000, while everything else is at 300, that won't really help. In that case, you could manually set your y-axis limits with ylim=c(min,max). To keep the bar from stretching off the screen, you can just use simple logic to define anything > 300 as 300, or something similar. Then, put a text point there stating the actual value (using text, maybe with arrow).
With those ideas out there, I would suggest that a graph where one value is 10x the other values might not really be worth presenting, or if it is, the main takeaway from it isn't going to be "how do values 2 and 3 compare to each other", it's going to be "holy moley look how much bigger 1 is than 2 and 3". So, it might not be a big deal if one bar is giant and two are small, as long as you aren't doing all 9 on a single plot (which would screw up other, relevant comparisons). So, if you split them using layout(), then it wouldn't be as big of a deal.

How to avoid overplotting (for points) using base-graph?

I am finishing the graphs for a paper and decided (after a discussion on stats.stackoverflow), in order to convey as much information as possible, to create the following graph, which presents the means in the foreground and the raw data in the background:
However, one problem remains, and that is overplotting. For example, the marked point looks like it reflects one data point, but in fact 5 data points with the same value exist at that place.
Therefore, I would like to know if there is a way to deal with overplotting in base graph using points as the function.
It would be ideal if e.g., the respective points get darker, or thicker or,...
Doing it manually is not an option (too many graphs and points like this). Furthermore, I do not want to learn ggplot2 just to deal with this single problem (one reason is that I tend to like dual axes, which are not supported in ggplot2).
Update: I wrote a function which automatically creates the above graphs and avoids overplotting by adding vertical or horizontal jitter (or both): check it out!
This function is now available as raw.means.plot and raw.means.plot2 in the plotrix package (on CRAN).
The standard approach is to add some noise to the data before plotting. R has a function, jitter(), which does exactly that. You could use it to add the necessary noise to the coordinates in your plot, e.g.:
X <- rep(1:10,10)
Z <- as.factor(sample(letters[1:10],100,replace=TRUE))
plot(jitter(as.numeric(Z),factor=0.2),X,xaxt="n")
axis(1,at=1:10,labels=levels(Z))
Besides jittering, another good approach is alpha blending, which you can obtain (on the graphics devices supporting it) as the fourth color parameter. I provided an example for 'overplotting' of two histograms in this SO question.
One additional idea for the general problem of showing the number of points is a rug plot (rug function). It places small tick marks along the margin that can show how many points contribute (still use jittering or alpha blending for ties). This lets the actual points show their true rather than jittered values, while the rug indicates which parts of the plot have more values.
For the example plot direct jittering or alpha blending is probably best, but in some other cases the rug plot can be useful.
You may also use sunflowerplot, although it would be hard to implement here. I would use alpha blending, as Dirk suggested.
