Bokeh change properties of individual layers in a nested categorical plot - bokeh

I have a nested categorical plot with vbars. In my plot, the smallest level label overlap one another depending on the data. I would like to turn that level to visible = False to remove the labels in a python callback. Any ideas how to access the p.xrange or p.xaxis objects to accomplish this?
This is a good example of a nested categorical where I would for example want the year to be non-visible.
https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html#nested-categories

It is not possible to remove labels (except by writing a custom extension ticker) but as of Bokeh 0.12.16, it is possible to rotate all levels of categorical ticks, e.g.
p.xaxis.major_label_orientation = "vertical"
p.xaxis.subgroup_label_orientation = "normal"
p.xaxis.group_label_orientation = 0.8

Related

ggplot draw multiple plots by levels of a variable

I have a sample dataset
d=data.frame(n=rep(c(1,1,1,1,1,1,2,2,2,3),2),group=rep(c("A","B"),each=20),stringsAsFactors = F)
And I want to draw two separate histograms based on group variable.
I tried this method suggested by #jenesaisquoi in a separate post here
Generating Multiple Plots in ggplot by Factor
ggplot(data=d)+geom_histogram(aes(x=n,y=..count../sum(..count..)),binwidth = 1)+facet_wrap(~group)
It did the trick but if you look closely, the proportions are wrong. It didn't calculate the proportion for each group but rather a grand proportion. I want the proportion to be 0.6 for number 1 for each group, not 0.3.
Then I tried dplyr package, and it didn't even create two graphs. It ignored the group_by command. Except the proportion is right this time.
d%>%group_by(group)%>%ggplot(data=.)+geom_histogram(aes(x=n,y=..count../sum(..count..)),binwidth = 1)
Finally I tried factoring with color
ggplot(data=d)+geom_histogram(aes(x=n,y=..count../sum(..count..),color=group),binwidth = 1)
But the result is far from ideal. I was going to accept one output but with the bins side by side, not on top of each other.
In conclusion, I want to draw two separate histograms with correct proportions calculated within each group. If there is no easy way to do this, I can live with one graph but having the bins side by side, and with correct proportions for each group. In this example, number 1 should have 0.6 as its proportion.
By changing ..count../sum(..count..) to ..density.., it gives you the desired proportion
ggplot(data=d)+geom_histogram(aes(x=n,y=..density..),binwidth = 1)+facet_wrap(~group)
You actually have the separation of charts by variable correct! Especially with ggplot, you sometimes need to consider the scales of the graph separately from the shape. Facet_wrap applies a new layer to your data, regardless of scale. It will behave the same, no matter what your axes are. You could also try adding scale_y_log10() as a layer, and you'll notice that the overall shape and style of your graph is the same, you've just changed the axes.
What you actually need is a fix to your scales. Understandable - frequency plots can be confusing. ..count../sum(..count..)) treats each bin as an independent unit, regardless of its value. See a good explanation of this here: Show % instead of counts in charts of categorical variables
What you want is ..density.., which is basically the count divided by the total count. The difference is subtle in principle, but the important bit is that the value on the x-axis matters. For an extreme case of this, see here: Normalizing y-axis in histograms in R ggplot to proportion, where tiny x-axis values produced huge densities.
Your original code will still work, just substituting the aesthetics I described above.
ggplot(data=d)+geom_histogram(aes(x=n,y=..density..,)binwidth = 1)+facet_wrap(~group)
If you're still confused about density, so are lots of people. Hadley Wickham wrote a long piece about it, you can find that here: http://vita.had.co.nz/papers/density-estimation.pdf

Initializing and customizing an autoplot r-project object

My head is getting sore from me banging it so much.
I have a time-series that I've converted into an xts object w/ 7 variables. Now I'm trying to plot 4 of them, all price indices, on the same graph. I used autoplot (from the ggfortify package) to initialize the graph, and this is where the trouble begins.
Autoplot doesn't seem to work unless I give it at least one variable to plot. That's fine, but the two customizations I want for the variable -- its color and line type -- seem to have no effect.
But once I create the plot this way, I have little trouble adding the other 3 variables by adding geom_lines. Here's sort of what the code looks like:
p <- autoplot(foo.xts,xlab="Year",
ylab="Price Index",
columns="Variable1",linetype=4) # the linetype accomplishes nothing
p <- p + geom_line(aes(y="Variable2", color="green", linetype="solid"
# etc. for the other 2 variables
p # The 3 added variables do get the selected colors & line types.
But how can I customize the line for the first variable?
Then there's another problem in that I can't get a legend to appear. Here's how I'm trying to do that:
p <- p + scale_color_discrete(
name="Price Indices",
breaks=c("Variable1", "Variable2", "Variable3", "Variable4"),
labels=c("Index 1", "Index 2", "Index 3", "Index 4"))
This seems to accomplish nothing.
One thing I'd add is that in my various experiments trying to get the legend to work, I've sometimes gotten two sets of keys: one for colors and one for line types. This is obviously not what I'm after.
If someone could help me with this, I'd be forever in your debt!
I spent yesterday away from the computer, and when I returned in the evening fixed the problems. Here's how:
Stopped using autoplot. It's a classic case of hand-holding that throws you over the cliff. In other words, it automatically formats the plot in ways that are difficult (impossible?) to customize. Instead, ggplot makes the initial plot.
Since I'm making a series of plots, moved all the shared features to a separate, preamble section. This section creates a base plot, sets the x-axis variable (the date of the observation), labels the x-axis, and formats its tick marks. It also sets up standardized colors, line styles, and shapes to be used by all the "production" plots.
To set up the standardized elements, it uses scale_color_manual, etc. Each one has to be identical in all respects except those that are unique to its specific aesthetic attribute. E.g., scale_color_manual uses values like "red" whereas scale_linetype_manual uses values like "solid." Each manual setting includes the following elements: legend.title*, values, labels, and guide = guide_legend()*. (Items marked with * must be identical, otherwise you'll get different legends for each one.) For each plot, the actual legend title is first stored in a variable, legend.title, and then used in all the manual scale setting. This way the manual settings can be moved to the common section, but each plot has is own unique title for its legend.
3A. Actually, I was wrong about this. I was thinking LaTeX, where most things are evaluated where they appear at execution time. So a scale_color_manual statement at the start could change later on just by changing the value of legend.title. But in R, things are evaluated sequentially, and changing legend.title after the scale_color_manual statement is executed will have no effect. I worked around this by defining several variables in the preamble (e.g., one with the colors I'm using) and then using these variables in the various source_x_manual statements. This way, the only thing that change is the legend title.
Then each production plot starts by copying the base plot, labeling the y-axis, and then adds the geometric objects that it needs.
This approach has several advantages. 1) It modularizes the plotting so that problems are easier to isolate and solve, and most solved problems in the preamble section are solved for all plots. 2) It standardizes the plots, ensuring that their common features are formatted identically. 3) It reduces each production plot to a few statements; since this is the unique part for each plot, creating a new style of plot becomes relatively easy. 4) The value added by autoplot becomes minimal because this approach, separating shared elements in a preamble, compensates by isolating reusable code and the preamble, once debugged, allows much more fine-grain customization.
If you have any questions, please feel free to ask.

In Stata, how do I modify axes of dot chart?

I'm trying to create a dot chart in Stata, splitting it into two categories
Running a chunk of code:
sysuse nlsw88, clear
drop if race == 3
graph dot (mean) wage, over(occ) by(race)
Creates such output:
So far so good but I'd like to remove labels of Y axis from the right graph to give the data some more space.
The only way I've been able to do that was to manually edit graph and hide the axis label object:
Is there a way to do it programmatically? I do know I could use one more over() but in some graphs of mine that is already taken.
I believe the solution is buried in help bystyle and help by_option. However, I can't get it to work with your example (I'm on Stata 12). But the description is clear. For example:
A bystyle determines the overall look of the combined graphs,
including
whether the individual graphs have their own axes and labels or if instead the axes and labels are shared across graphs arrayed in the
same row and/or in the same column;
...
There are options that let you control each of the above attributes --
see [G-3] by_option --
And also
iyaxes and ixaxes (and noiyaxes and noixaxes) specify whether the y axes and x axes are
to be displayed with each graph. The default
with most styles and
schemes is to place y axes on the leftmost graph of each row and to place x axes on
the bottommost graph of each column. The y and
x axes include the
default ticks and labels but exclude the axes titles.
If for some reason that doesn't work out, something like
sysuse nlsw88, clear
drop if race == 3
graph dot (mean) wage, over(occ) by(race)
gr_edit .plotregion1.grpaxis[2].draw_view.setstyle, style(no)
does (but I don't really like the approach). You can mess with at least the axis number [#] to do a bit of customization. I guess recording changes in the graphical editor and then recycling the corresponding code, may be one way out of difficult situations.

How to make legend in R?

I have data that falls into one of many categories. Here is a very simplified version of what I'm doing to my data. I want to make a scatterplot where different colors represent different categories. However, there are many different categories so instead of manually choosing the colors, I'm letting R choose for me by setting col=data$category within the plot function. However, I can't figure out how to generate a legend -- every parameter that I put inside the legend function never actually generates anything. Can someone help?
data <- data.frame(rnorm(50),sample(1:10,50,replace=TRUE))
colnames(data) <- c("data", "category")
plot(data$data, col=data$category)
legend("topright", data$category)
Try something like this,
legend("topright", legend=unique(data$category), pch=1, col=unique(data$category))

Toggling amongst plot spaces

Scenario: Five Graphs for Five Periods {3M, 6M, 1Y, 2Y & 3Y}, each with their own (1-2) scatter plots; sharing the same y-range (values).
Each period has different x-ranges and labeling policies.
For example, one could have either a fix or location policy; another none.
The X-Range appears to be immutable/plot-space. So I'm thinking of creating parallel plot spaces with their particular xRanges & labeling policies.
I studied the relationship of a plot space with the x.axis(s) & plot(s):
Graph <=== {NSMutableArray *plotSpaces}
x.axis/plot-space.
plot/plot-space
So I believe I can:
1) Create a plotspace.
2) Assign the plotspace to a particular plot, x-axis & xRange.
3) add or remove the plot to/from the graph.
4) Redraw the graph.
So when the user selects a period/plotspace, All I need to do is: replace any existing plots with the period plot(s) which will cause the graph to plot the plots & display the respective x-axis (Y-axis is common)?
[myGraph removePlot:(CPTPlot *)oldPlot];
[myGraph addPlot:(CPTPlot *)plot toPlotSpace:(CPTPlotSpace *)space];
...I'm a little lost here.
?
Axes are also assigned to a plot space. You would need to swap out the axes, too. You'll take a relatively large performance hit by adding and removing plots and axes all the time.
As you observed, the plot ranges are immutable. That just means that you can't change an existing range, not that you can't set a new one. Either create a new CPTPlotRange object or make a mutable copy of an existing one.
Whenever you want to change the plot scale, you need to do the following things. These can all change in place, without removing and replacing major pieces of the graph.
Change the plot ranges as described above.
Update the labeling policy and related properties of the axis.
Call -reloadData on the plot to load the new data.

Resources