Align bars in ciplot - plot

I'm working with the ciplot graphing module for Stata and am encountering a problem with the alignment of bars when I use the by() option. Here's a trivial example demonstrating the issue:
webuse citytemp, clear
ciplot heatdd cooldd, by(region) horizontal recast(conn)
So, the graph shows means and confidence intervals for two variables across categories of the region variable. The bars for the different variables do not align horizontally, though. For each region, the point and bar for heatdd is one line above, and the point and bar for cooldd is one line below, the category label. I would like these to be on the same line, but I can't figure out how to achieve it.
I'm open to solutions that do not involve ciplot, but I have found it to be useful for the specific task I'm working on.

This is my program (in Stata terms, downloadable via ssc install ciplot) so I can speak confidently. (On Statalist, it's expected that you explain the exact provenance of user-written programs; that would be good practice here too.)
It's not a bug; it's a feature (supposedly).
The offsets are entirely deliberate, to avoid messes when two or more intervals would just overlap and occlude each other, which is entirely likely when groups or comparable variables have similar values, which in turn is common when you do this. Even in your example, intervals for heating and cooling degree-days for the South would overlap otherwise, so the graph makes the point for me.
I can see that it's not what you want, but there is no option in ciplot to remove the offset. I can see a case for one, but my advice now is to use statsby to get a reduced dataset containing the confidence interval information; the graphics are then typically a couple of command lines, and you get to choose what you want. This approach is documented in a paper easily accessible from the Stata Journal.
You are always welcome to clone the program and modify the code using a different program name, with notional mention of the original.

Issues with combining different (continuous and ordinal) plot types into one plot

I am preparing a figure for a paper presenting data for 2 different experiments in one plot. For that reason I don't need a legend for every plot, so I try to combine them with ggdraw from cowplot.
My code
should generate a reproducible example
and gives this output:
It seems like the two figures get the same slot (A) and the legend gets slot (B). Typically, I would use facet_wrap to plot them together (which should also guarantee that the scaling and legend are consistent across the two plots), but that will probably not work in this case, as I am trying to add an additional figure type in C and D.
The problem is that this figure type is ordinal so I have used a somewhat “hacky” approach to plot it, giving me this figure looking essentially as I want it to:
I so far have not been able to extract to another element that ggdraw can use.
Ideally the final plot should roughly look like this (of course with different labels):
How would you go about plotting these different types together?
Thank you for taking the time to read my question; I hope that you can help me. I know it is quite a mouthful, but I was not sure how I could meaningfully reduce it to smaller chunks.

how to de-clutter graph created using proc gchart?

I utilized proc gchart in SAS and the following code to generate the graph displayed here.
proc gchart data=combined;
vbar distrct / discrete type=sum sumvar=PERCENT
subgroup= population coutline=gray width=6;
run;
However, as you can see, the individual bars are packed extremely close together and the graph is difficult to read. I have 110 bars representing densities of ethnic groups.
My questions are:
1. Is there a way to make this graph look less cluttered? (I tried reducing the width, but it does not seem to work.)
2. Should I be using a different procedure than proc gchart?
The second question is easier to answer: proc gchart has mostly been replaced by proc sgplot nowadays. gchart is still maintained, but I don't think much new work is being done on it or on the other SAS/GRAPH procedures.
As for how to make it better: there are definitely some ways to improve it, but ultimately, trying to show 110 bars each split by four ethnicities means you're showing 440 data points on one graph. That's going to be a tough lift no matter what.
The first thing I'd consider is switching to horizontal bars. A horizontal layout may allow you to have a larger graph, with more spacing, and readers often have an easier time reading horizontal charts when the bars are stacked. Scrolling up and down is also easier for most people (a mouse wheel), so if it's okay that they not see it all on one screen, this may be better. It also allows the bar titles to be presented in the usual left-to-right manner.
Second, consider if your bars can be grouped together. Do you have regions or such that allow you to group bars together, with a bit more spacing between the region? Or more importantly, are there bars that you'd like the readers to be comparing visually to each other? Right now it looks like it's sorted alphabetically, but that is probably not the right way to sort it if there's any sort of relationship between the bars. For example, does the area have sub-areas that are ethnically related? Maybe group those together; or by just geographies (here is the north-east section, here's the east, here's the south-west, etc.) Any time you can group like-things together it makes it easier for the reader to understand what they're looking at and draw sensible conclusions.
You could also sort them by a particular racial makeup - say, in descending order of "color" which seems the dominant population group - which is often an effective way to present data that's this cluttered, as a reader can both see the trend and can find, say, their neighborhood and see where it falls in the order just by looking.
Best overall, though, might be to group the districts up and then display the groups, so you have many fewer bars. If there's a sensible way to do that, it'll get your idea across more effectively.
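The grouping and sorting suggestions above can be sketched outside SAS to show the data preparation involved. This is a minimal Python sketch, not SAS code: the district names, the district-to-region mapping, the population groups "A" and "B", and the counts are all hypothetical placeholders, and it assumes raw counts are available (percentages from different districts cannot simply be added together).

```python
from collections import defaultdict


def shares_by_region(rows, region_of):
    """Collapse district-level counts into percentage shares per region.

    rows: iterable of (district, population_group, count)
    region_of: dict mapping district -> region (hypothetical grouping)
    """
    totals = defaultdict(lambda: defaultdict(float))
    for district, group, count in rows:
        totals[region_of[district]][group] += count
    shares = {}
    for region, by_group in totals.items():
        region_total = sum(by_group.values())
        shares[region] = {g: 100.0 * c / region_total for g, c in by_group.items()}
    return shares


def order_regions(shares, group):
    """Regions in descending order of one group's share, for sorting the bars."""
    return sorted(shares, key=lambda r: shares[r].get(group, 0.0), reverse=True)


# Made-up example data: three districts collapsed into two regions.
rows = [("D1", "A", 30), ("D1", "B", 70),
        ("D2", "A", 50), ("D2", "B", 50),
        ("D3", "A", 90), ("D3", "B", 10)]
region_of = {"D1": "North", "D2": "North", "D3": "South"}

shares = shares_by_region(rows, region_of)
bar_order = order_regions(shares, "A")
print(bar_order)
```

The resulting table has one row per region and group, which is the shape a stacked bar chart needs, and the ordering can be fed to whatever charting procedure is used.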

visualization - size of circle proportionate to the value of the item

I'm getting familiar with Graphviz and wonder if it's doable to generate a diagram/graph like the one below (not sure what you call it). If not, does anyone know of a good open source framework that does it (preferably C++, Java or Python)?
According to Many Eyes, this is a bubble chart. They say:
It is especially useful for data sets with dozens to hundreds of values, or with values that differ by several orders of magnitude.
...
To see the exact value of a circle on the chart, move your mouse over it. If you are charting more than one dimension, use the menu to choose which dimension to show. If your data set has multiple numeric columns, you can choose which column to base the circle sizes on by using the menu at the bottom of the chart.
Thus, any presentation with a lot of bubbles in it (especially with many small bubbles) would have to be dynamic to respond to the mouse.
My usual practice with bubble charts is to show three or four variables (x, y and another variable through the size of the bubble, and perhaps another variable with the color or shading of the bubble). With animation, you can show development over time too - see GapMinder. FlowingData provides a good example with a tutorial on how to make static bubble charts in R.
In the example shown in the question, though, the bubbles appear to be placed so that similar companies are somewhat close together. Even then, the exact design criteria are unclear to me. For example, I'd have expected Volkswagen to be closer to General Motors than Pfizer is (if some measure of company similarity is used to place the bubbles), but that isn't so in this diagram.
You could use Graphviz to produce a static version of a bubble chart, but there would be quite a lot of work involved to do so. You would have to preprocess the data to calculate a similarity matrix, obtain edge weights from that matrix, assign colours and sizes to each bubble and then have the preprocessing script write the Graphviz file with all edges hidden and run the Graphviz file through neato to draw it.
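The preprocessing script described above might look roughly like the following Python sketch. It is only an illustration under stated assumptions: the company values and similarity scores are invented, the function name `build_dot` is hypothetical, and the mapping from similarity to edge length (`len = 1 / similarity`) is one arbitrary choice among many. Bubble diameters are scaled by the square root of the value so that circle area is proportional to the value.

```python
import math


def build_dot(values, similarity, max_diameter=2.0):
    """Return Graphviz source for a static bubble chart laid out by neato.

    values: dict name -> positive number (drives bubble area)
    similarity: dict (name_a, name_b) -> score in (0, 1]; higher means closer
    """
    biggest = max(values.values())
    lines = [
        "graph bubbles {",
        "  overlap=false;",
        "  node [shape=circle, fixedsize=true, style=filled, fillcolor=lightblue];",
    ]
    for name, value in values.items():
        # Area proportional to value, so diameter scales with sqrt(value).
        d = max_diameter * math.sqrt(value / biggest)
        lines.append(f'  "{name}" [width={d:.2f}, height={d:.2f}];')
    for (a, b), score in similarity.items():
        # Hidden edges: similar pairs get a shorter preferred length.
        lines.append(f'  "{a}" -- "{b}" [style=invis, len={1.0 / score:.2f}];')
    lines.append("}")
    return "\n".join(lines)


# Invented numbers, for illustration only.
companies = {"General Motors": 200.0, "Volkswagen": 150.0, "Pfizer": 50.0}
similarity = {
    ("General Motors", "Volkswagen"): 0.9,
    ("General Motors", "Pfizer"): 0.2,
    ("Volkswagen", "Pfizer"): 0.2,
}
dot_source = build_dot(companies, similarity)
print(dot_source)
```

Saving the printed text as, say, bubbles.dot and running `neato -Tpng bubbles.dot -o bubbles.png` should then draw the bubbles without any visible edges.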

scaling Venn diagram in R

I am trying to plot the Venn Diagram of intersection of 3 sets with the following function:
library(VennDiagram)
draw.triple.venn(10,5,4,2,3,1,1,ind=TRUE,scaled=TRUE)
In the Quartz window I get 3 identical circles (all of the same size). Where did the scaling go? After several hours of trying, I am wondering if it is a bug or if maybe the previous settings of my plotting area are not allowing it (I closed and reopened the Quartz window several times). The output value is:
(polygon[GRID.polygon.1498], polygon[GRID.polygon.1499], polygon[GRID.polygon.1500], polygon[GRID.polygon.1501], polygon[GRID.polygon.1502], polygon[GRID.polygon.1503], text[GRID.text.1504], text[GRID.text.1505], text[GRID.text.1506], text[GRID.text.1507], text[GRID.text.1508], text[GRID.text.1509], text[GRID.text.1510], text[GRID.text.1511], text[GRID.text.1512])
Any help or tip would be appreciated. All the examples I see online are depicted already scaled.
According to the manual, scaling only happens for some configurations. One example would be
draw.triple.venn(1,2,3,0,0,0,0)
On the other hand, looking at the source code of that function, there appears to be no relevant use of that parameter at all. And indeed, passing scale=FALSE to the above command still results in scaled circles.
So it looks like with the current source code, you have no control over scaling, one way or the other.
The scale parameter is being ignored in many cases.
Here is another example that is also not plotted to scale
venn.plot <- draw.triple.venn(1883,598,2151,218,221,611,95, c("AL", "RL", "R"),scale=TRUE)
I used this .jar instead:
http://www.cs.kent.ac.uk/people/staff/pjr/EulerVennCircles/EulerVennApplet.html

How to avoid overplotting (for points) using base-graph?

I am finishing the graphs for a paper and decided (after a discussion on stats.stackoverflow), in order to convey as much information as possible, to create the following graph, which presents the means in the foreground and the raw data in the background:
However, one problem remains, and that is overplotting. For example, the marked point looks like it reflects one data point, but in fact 5 data points exist with the same value at that place.
Therefore, I would like to know if there is a way to deal with overplotting in base graph using points as the function.
It would be ideal if e.g., the respective points get darker, or thicker or,...
Doing it manually is not an option (too many graphs and points like this). Furthermore, I don't want to learn ggplot2 just to deal with this single problem (one reason is that I tend to like dual axes, which are not supported in ggplot2).
Update: I wrote a function which automatically creates the above graphs and avoids overplotting by adding vertical or horizontal jitter (or both): check it out!
This function is now available as raw.means.plot and raw.means.plot2 in the plotrix package (on CRAN).
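The standard approach is to add some noise to the data before plotting. R has a function jitter() which does exactly that. You could use it to add the necessary noise to the coordinates in your plot, e.g.:
# 100 points: each value 1..10 repeated ten times, with random groups a..j
X <- rep(1:10,10)
Z <- as.factor(sample(letters[1:10],100,replace=TRUE))
# jitter the group positions slightly so tied points no longer coincide
plot(jitter(as.numeric(Z),factor=0.2),X,xaxt="n")
# relabel the x axis with the group letters
axis(1,at=1:10,labels=levels(Z))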
Besides jittering, another good approach is alpha blending, which you can obtain (on the graphics devices supporting it) as the fourth color parameter. I provided an example for 'overplotting' of two histograms in this SO question.
One additional idea for the general problem of showing the number of points is a rug plot (the rug function), which places small tick marks along the margin that can show how many points contribute (still use jittering or alpha blending for ties). This allows the actual points to show their true rather than jittered values, while the rug indicates which parts of the plot have more values.
For the example plot direct jittering or alpha blending is probably best, but in some other cases the rug plot can be useful.
You may also use sunflowerplot, though it would be hard to implement here. I would use alpha blending, as Dirk suggested.
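The question also asks for tied points to become "darker, or thicker". One way to get there, whatever the plotting system, is to count how many observations share each coordinate and draw one point per distinct location, sized by that count. The following is a language-neutral sketch written in Python rather than R; the function name `tie_sizes` and the square-root scaling are assumptions made for illustration, not something proposed in the answers above.

```python
from collections import Counter
import math


def tie_sizes(xs, ys, base=1.0):
    """Collapse coincident points into (x, y, count, size) tuples.

    size grows with the square root of the count, so the plotted
    area of each symbol is roughly proportional to the number of ties.
    """
    counts = Counter(zip(xs, ys))
    return [(x, y, n, base * math.sqrt(n)) for (x, y), n in counts.items()]


# Five observations share the location (1, 3); one sits alone at (2, 4).
xs = [1, 1, 1, 1, 1, 2]
ys = [3, 3, 3, 3, 3, 4]
points = tie_sizes(xs, ys)
print(points)
```

In base R the same counts could be passed to points() through its cex argument, so the five-fold point in the question would be drawn visibly larger than a single observation.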