Automated labeling of logarithmic plots - graph

I would like to automate the graph axis values for a series of plots in Stata 13.
In particular, I would like to show axis labels like 10^-1, 0, 10^1, 10^2 etc. by feeding the axis options macros.
The solution in the following blog post gives a good starting point:
Labeling logarithmic axes in Stata graphs
I could also also produce nicer labels by using "10{sup:`x'}".
However, I cannot find the solution to the following additional elements:
the range of the axis labels would run from 10^-10 up to 10^10. Moreover, my baseline is ln, so log values are 2.3, 4.6, etc. In particular, the line below which only takes integers as input:
label define expon `vallabel', replace
I would like to force the range of the axis values by graph (e.g. a particular axis runs from 10^-2 to 10^5). I understand that range() only extends axes, but does not allow to trim them.
Any ideas on either or both of the above?
This is a very straightforward output in R or Python, even standard without many additional arguments, but unfortunately not so in Stata.

Q1. You should check out niceloglabels from SSC as announced in this thread. niceloglabels together with other tricks and devices in this territory will be discussed in a column expected to appear in Stata Journal 18(1) in the first quarter of 2018.
Value labels are limited to association with integers but that does not bite here. All you need focus on is the text which is to be displayed as axis labels at designated points on either axis; such points can be specified using any numeric value within the axis range.
Your specific problem appears to be that one of your variables is a natural logarithm but you wish to label axes in terms of powers of 10. Conversion to a variable containing logarithms to base 10 is surely easy, but another program mylabels (SSC) can help here. This is a self-contained example.
* ssc inst mylabels
sysuse auto, clear
set scheme s1color
gen lnprice = ln(price)
mylabels 4000 8000 16000, myscale(ln(#)) local(yla)
gen lnweight = ln(weight)
mylabels 2 3 4, myscale(ln(1000*#)) suffix(" x 10{sup:3}") local(xla)
scatter lnprice lnweight, yla(`yla') xla(`xla') ms(Oh) ytitle(Price (USD)) xtitle(Weight (lb))
I have used different styles for the two axes just to show what is possible. On other grounds it is usually preferable to be consistent about style.
Broadly speaking, the use of niceloglabels is often simpler as it boils down to specifying xscale(log) or yscale(log) with the labels you want to see. niceloglabels also looks at a variable range or a specified minimum and maximum to suggest what labels might be used.
Q2. range() is an option with twoway function that allows extension of the x axis range. For most graph commands, the pertinent options are xscale() or yscale(), which again extend axis ranges if suitably specified. None of these options will omit data as a side-effect or reduce axis ranges as compared with what axis label options imply. If you want to omit data you need to use if or in or clone your variables and replace values you don't want to show with missing values in the clone.
FWIW, superscripts look much better to me than ^ for powers.

I have finally found a clunky but working solution.
The trick is to first generate 2 locals: one to evaluate the axis value, another to denote the axis label. Then combine both into a new local.
Somehow I need to do this separately for positive and negative values.
I'm sure this can be improved...
// define macros
forvalues i = 0(1)10 {
local a`i' = `i'*2.3
local b`i' `" "10{sup:`i'}" "'
local l`i' `a`i'' `"`b`i''"'
}
forvalues i = 1(1)10 {
local am`i' = `i'*-2.3
local bm`i' `" "10{sup:-`i'}" "'
local lm`i' `am`i'' `"`bm`i''"'
}
// graph
hist lnx, ///
xl(`lm4' `lm3' `lm2' `lm1' `l0' `l1' `l2' `l3' `l4')

Related

Initializing and customizing an autoplot r-project object

My head is getting sore from me banging it so much.
I have a time-series that I've converted into an xts object w/ 7 variables. Now I'm trying to plot 4 of them, all price indices, on the same graph. I used autoplot (from the ggfortify package) to initialize the graph, and this is where the trouble begins.
Autoplot doesn't seem to work unless I give it at least one variable to plot. That's fine, but the two customizations I want for the variable -- its color and line type -- seem to have no effect.
But once I create the plot this way, I have little trouble adding the other 3 variables by adding geom_lines. Here's sort of what the code looks like:
p <- autoplot(foo.xts,xlab="Year",
ylab="Price Index",
columns="Variable1",linetype=4) # the linetype accomplishes nothing
p <- p + geom_line(aes(y="Variable2", color="green", linetype="solid"
# etc. for the other 2 variables
p # The 3 added variables do get the selected colors & line types.
But how can I customize the line for the first variable?
Then there's another problem in that I can't get a legend to appear. Here's how I'm trying to do that:
p <- p + scale_color_discrete(
name="Price Indices",
breaks=c("Variable1", "Variable2", "Variable3", "Variable4"),
labels=c("Index 1", "Index 2", "Index 3", "Index 4"))
This seems to accomplish nothing.
One thing I'd add is that in my various experiments trying to get the legend to work, I've sometimes gotten two sets of keys: one for colors and one for line types. This is obviously not what I'm after.
If someone could help me with this, I'd be forever in your debt!
I spent yesterday away from the computer, and when I returned in the evening fixed the problems. Here's how:
Stopped using autoplot. It's a classic case of hand-holding that throws you over the cliff. In other words, it automatically formats the plot in ways that are difficult (impossible?) to customize. Instead, ggplot makes the initial plot.
Since I'm making a series of plots, moved all the shared features to a separate, preamble section. This section creates a base plot, sets the x-axis variable (the date of the observation), labels the x-axis, and formats its tick marks. It also sets up standardized colors, line styles, and shapes to be used by all the "production" plots.
To set up the standardized elements, it uses scale_color_manual, etc. Each one has to be identical in all respects except those that are unique to its specific aesthetic attribute. E.g., scale_color_manual uses values like "red" whereas scale_linetype_manual uses values like "solid." Each manual setting includes the following elements: legend.title*, values, labels, and guide = guide_legend()*. (Items marked with * must be identical, otherwise you'll get different legends for each one.) For each plot, the actual legend title is first stored in a variable, legend.title, and then used in all the manual scale setting. This way the manual settings can be moved to the common section, but each plot has is own unique title for its legend.
3A. Actually, I was wrong about this. I was thinking LaTeX, where most things are evaluated where they appear at execution time. So a scale_color_manual statement at the start could change later on just by changing the value of legend.title. But in R, things are evaluated sequentially, and changing legend.title after the scale_color_manual statement is executed will have no effect. I worked around this by defining several variables in the preamble (e.g., one with the colors I'm using) and then using these variables in the various source_x_manual statements. This way, the only thing that change is the legend title.
Then each production plot starts by copying the base plot, labeling the y-axis, and then adds the geometric objects that it needs.
This approach has several advantages. 1) It modularizes the plotting so that problems are easier to isolate and solve, and most solved problems in the preamble section are solved for all plots. 2) It standardizes the plots, ensuring that their common features are formatted identically. 3) It reduces each production plot to a few statements; since this is the unique part for each plot, creating a new style of plot becomes relatively easy. 4) The value added by autoplot becomes minimal because this approach, separating shared elements in a preamble, compensates by isolating reusable code and the preamble, once debugged, allows much more fine-grain customization.
If you have any questions, please feel free to ask.

In Stata, how do I modify axes of dot chart?

I'm trying to create a dot chart in Stata, splitting it into two categories
Running a chunk of code:
sysuse nlsw88, clear
drop if race == 3
graph dot (mean) wage, over(occ) by(race)
Creates such output:
So far so good but I'd like to remove labels of Y axis from the right graph to give the data some more space.
The only way I've been able to do that was to manually edit graph and hide the axis label object:
Is there a way to do it programmatically? I do know I could use one more over() but in some graphs of mine that is already taken.
I believe the solution is buried in help bystyle and help by_option. However, I can't get it to work with your example (I'm on Stata 12). But the description is clear. For example:
A bystyle determines the overall look of the combined graphs,
including
whether the individual graphs have their own axes and labels or if instead the axes and labels are shared across graphs arrayed in the
same row and/or in the same column;
...
There are options that let you control each of the above attributes --
see [G-3] by_option --
And also
iyaxes and ixaxes (and noiyaxes and noixaxes) specify whether the y axes and x axes are
to be displayed with each graph. The default
with most styles and
schemes is to place y axes on the leftmost graph of each row and to place x axes on
the bottommost graph of each column. The y and
x axes include the
default ticks and labels but exclude the axes titles.
If for some reason that doesn't work out, something like
sysuse nlsw88, clear
drop if race == 3
graph dot (mean) wage, over(occ) by(race)
gr_edit .plotregion1.grpaxis[2].draw_view.setstyle, style(no)
does (but I don't really like the approach). You can mess with at least the axis number [#] to do a bit of customization. I guess recording changes in the graphical editor and then recycling the corresponding code, may be one way out of difficult situations.

Intelligent Y Axis Scaling BarPlot R

I want to plot some data with barplot. Rather, I want to make a bar graph and barplot seemed the logical choice. I am plotting just fine but I was wondering if there is a way to intelligently scale the y axis to round up from the highest count.
For example I set the yaxis in this case to be 30, because I knew that Strand.22 had 27 counts in it: barplot(unlist(d), ylim=c(0,30), xlab="Forward Reverse", ylab="Counts")
In the future, I want this script to run on its own, so it would be optimal for the the Y-axis to choose it's own ylim. Short of pulling the information out of my 'd' variable I can't think of a good way to do this. Is there an easy way to do this with barplot? Would some other plotter work better? I have seen things about ggplots but it seemed super complex and I wasn't sure that it would do anything better.
EDIT: If I do not choose a ylim it picks automatically and this is what it decided was best.
I disagree with it's choice.
If you don't specify ylim, R will come up with something based on the data. (Sounds like you don't like it's choice, which is fair.)
If you specify something based on the data like:
barplot(unlist(d), ylim=c(0,1.1*max(unlist(d)))
R will draw you a plot that reflects the maximum value of data. That example just takes the maximum of your values and multiplies that by 1.1 (this could be any number) to give it a little extra height. R does something similar to this when you make a scatterplot but it handles barplots slightly differently.

Gnuplot: How to remove vectors below a certain magnitude from vector field?

I have a 2D CFD code that gives me the x and y flow velocity at every point on a grid. I am currently visualizing the data using a vector field in gnuplot. My goal is to see how far the plume from an eruption extends, so it would be much cleaner if I could prevent vectors from showing up at all in the field if they fall below a certain magnitude. Does anyone have an idea how to go about this? My current gnuplot script is below. I can also modify the input file as necessary.
reset
set nokey
set term png
set xrange [0:5.1]
set yrange [0:10.1]
do for [i=0:10] {
set title 'Eruption simulation: Timestep '.i
set output 'path/FlowVel'.sprintf('%04.0f',i).'.png'
plot 'path/Flow'.sprintf('%04.0f',i).'.dat' using 1:2:3:4 with vec
}
I guess you want a kind of filtering, which gnuplot doesn't really have, but can be achieved with the following trick (taken from "help using examples" in gnuplot):
One trick is to use the ternary `?:` operator to filter data:
plot 'file' using 1:($3>10 ? $2 : 1/0)
which plots the datum in column two against that in column one provided
the datum in column three exceeds ten. `1/0` is undefined; `gnuplot`
quietly ignores undefined points, so unsuitable points are suppressed.
Or you can use the pre-defined variable NaN to achieve the same result.
So I guess you would want something like this in your case
plot "data.dat" u 1:2:($3**2+$4**2>mag_sq?$3:NaN):($3**2+$4**2>mag_sq?$4:NaN) w vector
where mag_sq is the square of your desired magnitude.

equivalent to MatLab "bar" function in R?

Is there an function in R that does the same job as Matlab's "bar" function?
R does have a "barplot" function in the library graphics, however, it is not the same.
The Matlab bar(X,Y) (verbatim excerpt from MATLAB documentation) "draws a bar for each element in Y at locations specified in X, where X is a vector defining the x-axis intervals for the vertical bars." (emphasis mine)
However, the R barplot function does not allow one to specify locations.
Perhaps there is a method in ggplot2 that supports this? I am only able to find standard bar charts in ggplot2.
No, barplot is not the same as bar, but you should read the whole help. You can do many things to position the bars. The first is simply their order in Y. You could insert spaces if you wish (additional 0s). If you have X and Y then sort Y on X (Y[order(X)]) and plot it. If you need to change positions use the "space" and "width" arguments. It's not as straightforward as specifying X values I suppose but it's definitely more useful in most situations. Generally what you want to adjust is widths of bars and spaces between bars. Their position on the X-axis should be arbitrary. If the position on the X-axis is really meaningful then you should be using line plots, not bar graphs.
In R:
barplot(rbind(1:10, 2:11), beside=T, names.arg=1:10)
In MATLAB:
>> bar(1:10, [(1:10)' (2:11)'])
Read up on par . Then observe, for example:
x<-c(1,2,4,5,6)
y<-c(3,4,3,4,2)
plot(x,y,type='h',lwd=6)
Edit: yes, I know this doesn't (yet) plot multiple data sets, but I would hope you can see simple ways to make that happen, with spacings, colors, etc. specified to your exact liking :-)
Sounds vaguely like the R stepfun. On the other hand one would need to know what "draws a bar" means before saying it is not the same as barplot(..., horiz=TRUE) One would, of course, need to examine some more detailed evidence such as data and plots before arriving at a conclusion, however. #John Colby should be congratulated for adding some specificity to the discussion. The axis function is probably what Quant Guy needs education regarding.

Resources