What is an inch? Setting the length for arrows in R

Somewhat inexplicably, the length parameter in arrows is specified in inches (from ?arrows):
length: length of the edges of the arrow head (in inches).
The R source even makes explicit note in a comment that this measurement is in inches, as if to acknowledge how peculiar the choice is.
That means the relative size of the arrows depends on dev.size(). What's not clear is how to translate inches into axis units (which are far more useful in the first place). Here's a simplified example:
h = c(1, 2, 3)
xs = barplot(h, space = 0, ylim = c(0, 4))
arrows(xs, h - .5, xs, h + .5,
length = .5*mean(diff(xs)))
How this displays will depend on the device. E.g. here is the output on this device:
png('test.png', width = 5, height = 5, units = 'in', res = 100)  # width/height in inches
And here it is on another:
png('test.png', width = 8, height = 8, units = 'in', res = 100)
It's a bit of an optical illusion to tell by sight, but the arrow heads are indeed the same physical width in the two plots. How can I control this so that both plots (which convey the same data) display identically? More specifically, how can I make sure that the arrows are exactly .5 plot units in width?

I spent far too much time down this rabbit hole, but here goes. I'll document a bit of the journey first, to show others who happen upon this which nooks and crannies are worth searching when you have to pull yourself up by your bootstraps.
I started looking in the source of arrows, but to no avail, since it quickly dives into internal code. So I searched the R source for "C_arrows" to find what's happening; luckily, it's not too esoteric, as far as R internal code goes. Poking around it seems the workhorse is actually GArrow, but this was a dead end, as it seems the length parameter isn't really transformed there (IIUC this means the conversion to inches is done for the other coordinates and length is untouched). But I happened to notice some GConvert calls that looked closer to what I want and hoped to find some user-facing function that appeals to these directly.
This led me to go back to R and simply run through the gamut of functions in the graphics and grDevices packages, looking for anything that could be useful:
ls(envir = as.environment('package:grDevices'))
ls(envir = as.environment('package:graphics'))
Finally I found three functions in graphics: xinch, yinch, and xyinch (all documented at ?xinch). They do the opposite of my goal here -- namely, they take inches and convert them into user (plot) coordinates (in the x, y, and x&y directions, respectively). Luckily, these functions are all very simple; e.g., the workhorse of xinch is the conversion factor:
diff(par("usr")[1:2])/par("pin")[1L]
Examining ?par (for the 1,000,000th time), pin and usr are indeed exactly the graphical parameters we need (pin is new to me; usr comes up here and there):
pin: The current plot dimensions, (width, height), in inches.
usr: A vector of the form c(x1, x2, y1, y2) giving the extremes of the user coordinates of the plotting region.
Hence, we can convert from plot units to inches by inverting this function:
xinch_inv = function(dev_unit) {
dev_unit * par("pin")[1L]/diff(par("usr")[1:2])
}
h = c(1, 2, 3)
xs = barplot(h, space = 0, ylim = c(0, 4))
arrows(xs, h - .5, xs, h + .5,
# just convert plot units to inches
length = xinch_inv(.5*mean(diff(xs))))
Resulting in (5x5):
And (8x8):
One further note, it appears length is the length of each side of the arrow head -- using length = xinch_inv(.5), code = 3, angle = 90 results in segments as wide as the bars (i.e., 1).
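To see that concretely, here's a quick sketch reusing the xinch_inv() helper from above; with code = 3 and angle = 90, each head edge of .5 plot units sticks out half a bar width on each side, so the caps end up exactly as wide as the bars:
h = c(1, 2, 3)
xs = barplot(h, space = 0, ylim = c(0, 4))
arrows(xs, h - .5, xs, h + .5,
       code = 3, angle = 90,
       # .5 plot units per head edge -> flat caps of total width 1
       length = xinch_inv(.5))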
On the off chance you're interested, I've packaged these in my package as xdev2in, etc.; GitHub only for now.

Related

How to create exponential graph

How can I make an x-axis that doubles with every increment? I want equal distances between 0, 128, 256, 512, 1024 and 2048.
I'm trying to plot points from a benchmark where I measured time and doubled the memory size every increment.
You can cheat: plot against a linear axis (say 1 up to however many points you have), then change the labels when you're done. Use the 'xtick' property to set which horizontal tick values remain on your graph and the 'xticklabel' property to change the labels to your desired values.
labels = [0 128 256 512 1024 2048]; % Provide your labels here
x = 1 : numel(labels);
y = rand(1, numel(x)); % Insert your data here
plot(x, y, 'b.'); % Plot your data
set(gca, 'xtick', x); % Change the x-axis so only the right amount of ticks remain
set(gca, 'xticklabel', labels) % Change the labels to the desired ones
I get the following graph. Note that the data I'm plotting is completely random, since I don't have your data; I just want to demonstrate what the changed plot looks like:
For more properties that you can change on your graph, see the Axes Properties page on the Octave docs.
With apologies to Rayryeng, since I'm essentially proposing the same method at heart, I felt his answer was missing important info, such as how to convert the axis itself to equally spaced intervals in the first place without messing with the data. So here's a complete solution for example data X vs Y, producing the equivalent of semilogx for base 2.
Y = 1 : 10;
X = 2 .^ Y;
XTicks = log2(X);
XTickLabels = {};
for XTick = XTicks
XTickLabels{end+1} = sprintf('2^{%d}', XTick);
end
plot (log2 (X), Y);
set(gca, 'xtick', XTicks, 'xticklabel', XTickLabels);
Note that if you plan to 'superimpose' another plot on top of this, you'll have to take into account that the actual values in the X axis are essentially "1, 2, 3, ... 10", so either "log-ify" the new plot's X-axis values too, before superimposing via hold on, or plot onto another, independent set of axes entirely and place them in the same position.
Note: I have assumed that you're after a base-2 logarithmic x-axis. If you actually want the 0-128 interval to be the same as the 128-256 interval, then modify as per Rayryeng's answer; or even better, use a more appropriate graph, like a bar graph (i.e. with the powers of two used purely as descriptive labels for each column).

How do I make planes in RGL thicker?

I am trying to 3D print some data to make a nice visual illustration for a binary classification example.
Here is my 3D plot:
require(rgl)
#Get example data from mtcars and normalize to range 0:1
fun_norm <- function(k){(k-min(k))/(max(k)-min(k))}
x_norm <- fun_norm(mtcars$drat)
y_norm <- fun_norm(mtcars$mpg)
z_norm <- fun_norm(mtcars$qsec)
#Plot nice big spheres with rgl that I hope will look good after 3D printing
plot3d(x_norm, y_norm, z_norm, type="s", radius = 0.02, aspect = T)
#The sticks are meant to suspend the spheres in the air
plot3d(x_norm, y_norm, z_norm, type="h", lwd = 5, aspect = T, add = T)
#Nice thick gridline that will also be printed
grid3d(c("x","y","z"), lwd = 5)
Next, I wanted to add a z=0 plane, inspired by this blog post describing the r2stl function written by Ian Walker. It is supposed to be the foundation of the printed structure that holds everything together.
planes3d(a=0, b=0, c=1, d=0)
However, it has no volume; it is a thin slab with height 0. I want it to form a solid base for the printed structure, which is meant to keep everything together (check out the aforementioned blog for more details; his examples are great). How do I increase the thickness of my z=0 plane to achieve the same effect?
Here is the final step to exporting as STL:
writeSTL("test.stl")
One can view the final product really nicely using the open source Meshlab as recommended by Ian in the blog.
Additional remark: I noticed that the thin plane is also separate from the grid lines that I added on the -z face of the cube, and is floating. This might also cause a problem when printing. How can I merge the grids with the z=0 plane? (I will be sending the STL file to a friend who will print it for me; I want to make things as easy for him as possible.)
You can't make a plane thicker. You can make a solid shape (extrude3d() is the function to use). It won't adapt itself to the bounding box the way a plane does, so you would need to draw it last.
For example,
example(plot3d)
bbox <- par3d("bbox")
slab <- translate3d(extrude3d(bbox[c(1,2,2,1)], bbox[c(3,3,4,4)], 0.5),
0,0, bbox[5])
shade3d(slab, col = "gray")
produces this output:
This still isn't printable (the points have no support), but it should get you started.
In the matlib package, there's a function regvec3d() that draws a vector space representation of a 2-predictor multiple regression model. The plot method for the result of the function has an argument show.base that draws the base x1-x2 plane when show.base > 0, and draws it thicker (by adding a second, slightly offset copy) when show.base > 1.
It is a simple hack that just draws a second version of the plane at a small offset. Maybe this will be enough for your application.
if (show.base > 0) planes3d(0, 0, 1, 0, color=col.plane, alpha=0.2)
if (show.base > 1) planes3d(0, 0, 1, -.01, color=col.plane, alpha=0.1)
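Applied to the z = 0 base in the question, that hack might look something like the sketch below (not from matlib; the 0.02 offset and the colour are arbitrary choices). Note that this is purely a visual trick: two coincident planes still have no volume, so for an actual printable slab the extrude3d() approach above is the way to go.
library(rgl)  # already loaded above via require(rgl)
# draw the base plane twice, the second copy offset slightly in z,
# so the pair reads as a thin slab rather than a zero-thickness plane
planes3d(a = 0, b = 0, c = 1, d = 0, col = "lightblue", alpha = 0.2)
planes3d(a = 0, b = 0, c = 1, d = -0.02, col = "lightblue", alpha = 0.1)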

Mutation step/symbol size when plotting haplotype networks with pegas

I've been trying to figure out how to make the little circles that represent mutation steps on a haplotype network bigger. For whatever reason, none of the normal approaches I can think of seem to work. It seems like no matter what I do, the symbols remain tiny. What am I missing?
Here's a base bit of sample code:
library(pegas)  # haplotype() / haploNet(); also loads ape, which provides the woodmouse data
data(woodmouse)
h <- haplotype(woodmouse)
net <- haploNet(h)
pal <- rainbow(nrow(h))  # define a colour palette for bg (any palette works)
plot(net, size = attr(net, "freq")*3, bg = pal, labels = FALSE, fast = FALSE, legend = FALSE,
     show.mutation = TRUE, threshold = 0)
# using scale.ratio = 1 (the default), the mutations are visible
plot(net, size = attr(net, "freq")*3, bg = pal, labels = FALSE, fast = FALSE, legend = FALSE,
     show.mutation = TRUE, threshold = 0, scale.ratio = 3)
# but using scale.ratio = 3, they get tiny / disappear
You can see the mutations here, but if I set scale.ratio to something bigger (a requirement with my own data), they essentially disappear.
I've tried passing a larger cex to plot (doesn't work) as well as setting cex globally with par (makes the whole plot smaller for some reason).
It seems like the circles are scaled with the lines, but I don't know how to control that. Is it even possible? Am I missing something really obvious?
cex controls the font size and will not help with graphics size. From the help page for the haploNet and plot.haploNet functions:
?haploNet
size: a numeric vector giving the diameter of the circles representing the haplotypes: this is in the same unit than the links and eventually recycled.
scale.ratio: the ratio of the scale of the links representing the number of steps on the scale of the circles representing the haplotypes. It may be needed to give a value greater than one to avoid overlapping circles.
This means that the size of the links (circles representing mutations between haplotypes) is relative to the size of the haplotypes. To enlarge the link size relative to the haplotypes, you need to find a suitable combination of the two arguments.
set.seed(123)
net <- haploNet(haplotype(woodmouse[sample(15, size = 50, replace = TRUE), ]))
par(mfrow=c(1,2))
plot(net, size=attr(net,"freq"), labels=F, fast=F, legend=F,
show.mutation=T, threshold=0, scale.ratio=1)
plot(net, size=attr(net,"freq")*.2, labels=F, fast=F, legend=F,
show.mutation=T, threshold=0, scale.ratio=.2)
Enlarging circle size with haploNet runs the risk that circles will overlap and the visible number of mutations will be incorrect. Use discretion with the visualization and, in case of problems, consider calculating the haplotype network in the TCS software, where unsampled mutations are displayed as vertical bars, or in Network from Fluxus, which uses proportional link lengths.

ggplot2 - The unit of size

A quick question that I can't find an answer to on the web (or in Wickham's book):
What is the unit of the size argument in ggplot2? For example, geom_text(size = 10) -- 10 in what units?
The same question applies to the default unit in ggsave(height = 10, width = 10).
The answer is: the unit is points, the unit of fontsize in the grid package. In ?unit, we find the following definition:
"points" Points. There are 72.27 points per inch.
(but note the closely related "bigpts" Big Points. 72 bp = 1 in.)
Internally ggplot2 will multiply the font size by a magic number ggplot2:::.pt, defined as 1/0.352777778.
Here is a demonstration: I create a letter using grid and ggplot2 with the same size:
library(grid)
library(ggplot2)
ggplot(data=data.frame(x=1,y=1,label=c('A'))) +
geom_text(aes(x,y,label=label),size=100)
## I divide by the magic number to get the same size.
grid.text('A',gp=gpar(fontsize=100/0.352777778,col='red'))
Addendum (thanks to @baptiste):
The "magic number"(defined in aaa-constants.r as .pt <- 1 / 0.352777778) is really just the conversion factor between "points" and "mm", that is 1/72 * 25.4 = 0.352777778. Unfortunately, grid makes the subtle distinction between "pts" and "bigpts", which explains why convertUnit(unit(1, "pt"), "mm", valueOnly=TRUE) gives the slightly different value of 0.3514598.
The 'ggplot2' package, like 'lattice' before it, is built on the grid package. You can get the available units at:
?grid::unit
?grid::convertX
?grid::convertY
grid::convertX(grid::unit(72.27, "points"), "inches")
(I use the formalism pkg::func because in most cases grid is loaded as a namespace but not attached when either lattice or ggplot2 is loaded.)
I earlier posted a comment, which I later deleted, saying that size was in points; I deleted it after seeing that text with size=10 was roughly 10 mm tall. The "magic" number mentioned by agstudy is in fact within 1% of:
as.numeric(grid::convertX(grid::unit(1, "points"), "mm"))
#[1] 0.3514598
0.352777778/.Last.value
#[1] 1.00375
From ?aes_linetype_size_shape
# Size examples
# Should be specified with a numerical value (in millimetres),
# or from a variable source
height and width in ggsave relate to par("din"); from ?par:
din: R.O.; the device dimensions, (width, height), in inches. See also dev.size, which is updated immediately when an on-screen device window is re-sized.
So I guess size in aes is in millimetres and ggsave height and width are in inches.
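To make those defaults explicit in code, here is a small sketch (the file name and sizes are arbitrary; ggsave() also accepts a units argument if you prefer to spell it out):
library(ggplot2)
p <- ggplot(data.frame(x = 1, y = 1), aes(x, y)) +
  geom_point(size = 5) +                 # size in millimetres
  geom_text(label = "A", size = 10)      # also millimetres (multiplied by .pt internally)
ggsave("sizes.png", p, width = 10, height = 10, units = "in")  # width/height in inches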

Remove redundant points for line plot

I am trying to plot a large number of points using some library. The points are ordered by time and their values can be considered unpredictable.
My problem at the moment is that the sheer number of points makes the library take too long to render. Many of the points are redundant (that is, they lie on the same line as defined by a function y = ax + b). Is there a way to detect and remove redundant points in order to speed up rendering?
Thank you for your time.
The following is a variation on the Ramer-Douglas-Peucker algorithm for 1.5d graphs:
Compute the line equation between first and last point
Check all other points to find what is the most distant from the line
If the worst point is below the tolerance you want then output a single segment
Otherwise call recursively considering two sub-arrays, using the worst point as splitter
In Python this could be:
def simplify(pts, eps):
    if len(pts) < 3:
        return pts
    x0, y0 = pts[0]
    x1, y1 = pts[-1]
    m = float(y1 - y0) / float(x1 - x0)
    q = y0 - m*x0
    worst_err = -1
    worst_index = -1
    for i in range(1, len(pts) - 1):
        x, y = pts[i]
        err = abs(m*x + q - y)
        if err > worst_err:
            worst_err = err
            worst_index = i
    if worst_err < eps:
        return [(x0, y0), (x1, y1)]
    else:
        first = simplify(pts[:worst_index+1], eps)
        second = simplify(pts[worst_index:], eps)
        return first + second[1:]

print(simplify([(0,0), (10,10), (20,20), (30,30), (50,0)], 0.1))
The output is [(0, 0), (30, 30), (50, 0)].
About Python syntax for arrays (lists) that may be non-obvious:
x[a:b] is the part of array from index a up to index b (excluded)
x[n:] is the array made using elements of x from index n to the end
x[:n] is the array made using first n elements of x
a+b when a and b are arrays means concatenation
x[-1] is the last element of an array
An example of the results of running this implementation on a graph with 100,000 points with increasing values of eps can be seen here.
I came across this question after I had this very idea: skip redundant points on plots. I believe I came up with a far better and simpler solution, and I'm happy to share it as my first proposed solution on SO. I've coded it and it works well for me. It also takes the screen scale into account: there may be 100 points' worth of values between two plotted points, but if the user has the chart sized small, they won't see them.
So, while iterating through your data/plot loop, before you draw/add the next data point, look one value ahead and calculate the change in screen scale (or in value, but I think screen scale is better for the reason above). Then do the same for the value after that (getting these values is just a matter of peeking ahead in your array/collection/list, i.e. adding 1 or 2 to the current loop index). If the two changes are the same (or differ only very slightly, per your own preference), you can skip this one point by simply adding 'continue' in the loop, since the point lies exactly on the slope between the point before and after it.
Using this method, I reduced a chart from 963 points to 427, for example, with absolutely zero visual change.
You might need to read this a couple of times for it to sink in, but it's far simpler than the other solution here, much lighter weight, and has zero visual effect on your plot.
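For what it's worth, here is a rough sketch of that idea in R (the idea itself is language-agnostic; the helper name and tolerance are made up, and it checks collinearity in data units rather than screen units):
# drop point i when it lies (within tol) on the straight segment joining its neighbours
drop_collinear <- function(x, y, tol = 1e-8) {
  n <- length(x)
  keep <- rep(TRUE, n)
  if (n < 3) return(keep)
  for (i in 2:(n - 1)) {
    # y value point i would have if (i-1) and (i+1) were joined by a straight line
    y_hat <- y[i - 1] + (y[i + 1] - y[i - 1]) * (x[i] - x[i - 1]) / (x[i + 1] - x[i - 1])
    if (abs(y[i] - y_hat) < tol) keep[i] <- FALSE
  }
  keep
}
x <- c(0, 10, 20, 30, 50)
y <- c(0, 10, 20, 30, 0)
drop_collinear(x, y)  # the two interior points on the straight run are dropped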
I would probably apply a "least squares" algorithm to obtain a line of best fit. You can then go through your points and downfilter consecutive points that lie close to the line. You only need to plot the outliers, and the points that take the curve back to the line of best fit.
Edit: You may not need to employ "least squares"; if your input is expected to hover around "y=ax+b" as you say, then that's already your line of best fit and you can just use that. :)
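A sketch of this approach too (hypothetical helper; it keeps the points that stray from the fitted line, plus their immediate neighbours so the plotted line can return to the fit):
downfilter_lm <- function(x, y, tol) {
  fit <- lm(y ~ x)                      # least-squares line of best fit
  out <- abs(residuals(fit)) > tol      # points that stray from the line
  # keep outliers, their immediate neighbours, and the two end points
  keep <- out | c(out[-1], FALSE) | c(FALSE, out[-length(out)])
  keep[c(1, length(keep))] <- TRUE
  keep
}
x <- 0:100
y <- 0.5 * x; y[40:45] <- y[40:45] + 5  # mostly on a line, with one bump
sum(downfilter_lm(x, y, tol = 1))       # only the bump region and the end points survive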
