How are trellis axis limits calculated? - r

Say I want to create an ordinary xyplot without explicitly specifying axis limits, then how are axis limits calculated?
The following line of code produces a simple scatter plot. However, axis limits do not exactly range from 1 to 10, but are slightly expanded to the left and right and top and bottom sides (roughly by 0.5).
library(lattice)
xyplot(1:10 ~ 1:10, cex = 1.5, pch = 20, col = "black",
xlab = "x", ylab = "y")
Is there any way to determine the factor by which the axes were expanded on each side, e.g. using trellis.par.get? I already tried the following after executing the above-mentioned xyplot command:
library(grid)
downViewport(trellis.vpname(name = "figure"))
current.panel.limits()
$xlim
[1] 0 1
$ylim
[1] 0 1
Unfortunately, the panel limits are returned as normalized parent coordinates, which makes it impossible to obtain the "real" limits. Any suggestions would be highly appreciated!
Update:
Using base-R plot, the data range (and consequently the axis limits) is by default extended by 4% on each side, see ?par. But this factor doesn't seem to apply to 'trellis' objects. So what I am looking for is an analogue to the 'xaxs' (and 'yaxs') argument implemented in par.

Axis limits for xyplot are calculated in the extend.limits function. This function isn't exported from the lattice package, so to see it, type lattice:::extend.limits. For a numeric variable, the function is passed the range of the corresponding data (c(1, 10) in this example). The final limits are calculated according to the following equation:
lim + prop * d * c(-1, 1)
lim are the limits of the data, in this case c(1, 10)
prop is lattice.getOption("axis.padding")$numeric, which by default is 0.07
d is diff(as.numeric(lim)), in this case 9
The result in this case is c(0.37, 10.63)
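The arithmetic above can be reproduced by hand. The following is a minimal sketch that hard-codes the default padding of 0.07 so that it runs without lattice loaded; padded_limits is a hypothetical helper name used for illustration, not a lattice function.

```r
# Hypothetical helper mirroring the formula above; not part of lattice.
padded_limits <- function(x, prop = 0.07) {
  lim <- range(x, finite = TRUE)   # limits of the data
  d   <- diff(as.numeric(lim))     # width of the data range
  lim + prop * d * c(-1, 1)        # pad both ends by prop * d
}

padded_limits(1:10)
# 0.37 10.63
```

With lattice loaded, prop could be taken from lattice.getOption("axis.padding")$numeric instead of the hard-coded value.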
In case you're interested, the call stack from xyplot to extend.limits is
xyplot
xyplot.formula
limits.and.aspect
limitsFromLimitList
extend.limits


double x-axis in R native plot, one for double frequency and the second for Dates [duplicate]

I am using the native R plot function to generate graphics and am looking to add a double x-axis on the same plot. One x-axis holds doubles and the other holds Date objects. I am using the following commands, but they don't seem to work.
First x-axis:
axis.Date(1,at=seq(min(x$Date),na.rm=TRUE,max(x$Date),na.rm=TRUE,by="2 years"),format ="%Y-%m-%d",col.axis="white", cex=1)
Second x-axis:
axis(1,at=seq(min(f), max(f), by = 0.1), col.axis="white", cex=1)
The parameters for the R native plot:
x11()
par(mfrow=c(1,1),oma = c(0, 0, 2, 0) )
Result is only Dates on x-axis.
Up front: dual axes can easily be mis-used by mis-representing the data and/or ranges. It's easy for eyes to misconstrue correlation or relationships based on imperfect axis decisions. For scatter plots (such as below), I'm not a fan and tend to avoid them ... but I do use them under very controlled circumstances, as they can provide visual correlation of relative trends.
When I must do it, I'm a fan of using color as a way to more strongly tie points (or lines) with particular axes, though of course this does not work as well with color-impaired readers.
Given that preface ...
I believe the easiest way to handle multiple axes in base-R is to use par(new=TRUE) between plots. Here's an example:
par(mar = c(4,4,4,4) + 0.1)
plot(disp ~ mpg, data = mtcars, las = 1)
par(new = TRUE)
dat <- data.frame(dat = Sys.Date() + 0:5, y = 1:6)
plot(y ~ dat, data = dat, ann = FALSE, yaxt = "n", xaxt = "n", pch = 16, col = "red")
axis.Date(3, dat$dat[1], col = "red", line = 1)
axis(4, col = "red", line = 1, las = 1)
Other differentiating techniques include shapes or line-types (if lines) specific to each side, and adding those as clear markers on the secondary axes.
The use of par(new=TRUE) simply tells the next plot command not to reset/clear the canvas before drawing. This means that the subsequent plotting functions have no knowledge of what is already on the canvas. From ?par:
'new' logical, defaulting to 'FALSE'. If set to 'TRUE', the next
high-level plotting command (actually 'plot.new') should _not
clean_ the frame before drawing _as if it were on a *_new_*
device_. It is an error (ignored with a warning) to try to
use 'new = TRUE' on a device that does not currently contain
a high-level plot.
It doesn't work well with all plotting mechanisms (certainly nothing grid or ggplot2), and anything that might be sensitive to margins or oma or other parameters should be tested carefully with various ranges of data.
I intentionally used line=1 to "bump out" the top/right axes, which is another way to set them apart. Frankly, I often do that for the bottom/left (primary) axes as well, since it can be aesthetically preferable ... but it's optional and not required for this technique.
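Applied to the situation in the question (a numeric axis at the bottom and a Date axis on top), the same technique might look like the sketch below. The data (f, y, dates) are made up for illustration, and pdf(NULL) is used only so the code runs without opening a window.

```r
pdf(NULL)                                  # null device; use x11() or omit when working interactively
f     <- seq(0, 1, by = 0.05)              # made-up numeric x values
y     <- sin(2 * pi * f)
dates <- seq(as.Date("2000-01-01"), by = "year", length.out = length(f))

par(mar = c(4, 4, 4, 2) + 0.1)             # leave room for an axis on top
plot(f, y, type = "l", xlab = "f", ylab = "y")   # first plot draws the numeric axis on side 1

par(new = TRUE)                            # keep the existing plot on the canvas
plot(dates, y, type = "n", axes = FALSE, ann = FALSE)  # only sets up the date coordinate system
top_ticks <- seq(min(dates), max(dates), by = "4 years")
axis.Date(3, at = top_ticks, format = "%Y")      # date axis on side 3 (top)
invisible(dev.off())
```

Putting the second axis on side 3 rather than side 1 keeps the two sets of tick labels from overprinting each other.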

Why is `ann=FALSE` not working in the boxplot call in R?

Trying to produce both a stripchart and a boxplot of the same (transformed) data but (because the boxplot is shifted down a tad) I don't want the axis labels twice:
set.seed(3121975)
bee = list(x1=rnbinom(50, mu = 4, size = .1),
x2=rnbinom(30,mu=6,size=.1),
x3=rnbinom(40,mu=2,size=.1))
f = function(x) asinh(sqrt(4*x+1.5))
stripchart(lapply(bee,f),method="stack",offset=.13,ylim=c(.8,3.9))
boxplot(lapply(bee,f),horizontal=TRUE,boxwex=.05,at=(1:3)-.1,add=TRUE,ann=FALSE)
Other things that don't work include: (i) leaving ann to take its default value of !add, (ii) specifying labels for ylab.
I presume I have missed something obvious but I am not seeing what it might be.
Just add yaxt = 'n' into boxplot() to suppress plotting of the y-axis. The argument ann controls axis titles and overall titles, not the axis itself.
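As a small sketch of that suggestion, with made-up data standing in for the transformed counts (pdf(NULL) is only there so the code runs without a display):

```r
pdf(NULL)                                  # null device so the sketch runs headless
set.seed(1)
groups <- list(a = rexp(30), b = rexp(30), c = rexp(30))   # made-up data

stripchart(groups, method = "stack", offset = 0.13, ylim = c(0.8, 3.9))
# yaxt = "n" stops boxplot() from redrawing the group labels on the y-axis
bp <- boxplot(groups, horizontal = TRUE, boxwex = 0.05, at = (1:3) - 0.1,
              add = TRUE, yaxt = "n")
invisible(dev.off())
```

If the x-axis should not be redrawn either, passing axes = FALSE to boxplot() instead is probably the simpler option.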

When creating a 3D histogram in R, using hist3D from the plot3D package, the bins don't line up with axis ticks

As my title suggests, I am trying to create a 3D histogram using the plot3D package. The following is a minimal working (or rather not working) example of the problem I'm having:
library(plot3D)
x = runif(10000)/2
y=runif(10000)
cuts = c(0, 0.2, 0.4, 0.6, 0.8, 1)
x_cut = cut(x, cuts)
y_cut = cut(y, cuts)
xy_table = table(x_cut, y_cut)
hist3D(z=xy_table, ticktype = "detailed")
This produces the following image:
As you can observe in the image, the bins of the histogram extend outside of [0,1]x[0,1]. Is there any way I can force the bins to line up exactly with the ticks on the axis? That is, I would like the graph to correctly represent that all data points have x and y values between 0 and 1. Looking at the plot now, one could be led to believe that the bin containing the origin, for example, might also contain the data point (-0.1, 0). This cannot happen in the data I am trying to display and I need the axis to convey that.
I've spent all day fiddling with the various axis parameters and whatnot but cannot get it to work. For example if I try to plot things using the command
hist3D(z=xy_table, ticktype = "detailed", xlim=c(0,1), ylim=c(0,1))
Then I get something even worse:
I feel like I must be missing something obvious but I'm just not seeing what it is. If anyone has an answer please do share. And thank you for taking the time to read my question.
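A 0 to 1 range is the default behaviour of hist3D if you don't define x and y. The x and y values are treated as bin centers, so the outermost bars are centered on 0 and 1 and extend beyond them.
You get the expected result if you define the x and y arguments using the midpoints of the bins (0.1, 0.3, 0.5, 0.7, 0.9):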
hist3D(x = seq(0.1,0.9,0.2),y=seq(0.1,0.9,0.2),z=xy_table, ticktype = "detailed")
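Rather than typing the midpoints by hand, they can be derived from the cut points. The sketch below assumes the same cuts vector as in the question and regenerates the table from random data. The hist3D call is guarded so the block still runs when plot3D is not installed.

```r
cuts <- c(0, 0.2, 0.4, 0.6, 0.8, 1)
mids <- head(cuts, -1) + diff(cuts) / 2    # left edges plus half the bin width

set.seed(1)
xy_table <- table(cut(runif(10000) / 2, cuts), cut(runif(10000), cuts))

if (requireNamespace("plot3D", quietly = TRUE)) {
  pdf(NULL)                                # null device so the sketch runs headless
  plot3D::hist3D(x = mids, y = mids, z = xy_table, ticktype = "detailed")
  invisible(dev.off())
}
```

This also works for unequal bin widths, where a regular seq() of midpoints would not.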

Axis breaks in ggplot histogram in R [duplicate]

I have data that is mostly centered in a small range (1-10) but there is a significant number of points (say, 10%) which are in (10-1000). I would like to plot a histogram for this data that will focus on (1-10) but will also show the (10-1000) data. Something like a log scale for the histogram.
Yes, I know this means not all bins are of equal size.
A simple hist(x) gives
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000)) gives
neither of which is what I want.
update
following the answers here I now produce something that is almost exactly what I want (I went with a continuous plot instead of bar-histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)
The only problem is that I'd like the scale to match the actual bars plotted. There are two options for doing that: one is simply to use the actual margins of the plotted bars (how?) and accept "ugly" x-axis labels like 1.1754, 1.2985, etc. The other, which I prefer, is to control the bin margins used so that they match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to the workaround above.
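If the bin edges should coincide with the labelled ticks, as the question asks, one option in base graphics is to pass the log-scale break points to hist() explicitly and label the same positions. This is only a sketch with made-up data; the choice of whole powers of ten as bin edges is an assumption made for illustration.

```r
pdf(NULL)                                  # null device so the sketch runs headless
set.seed(42)
x <- rlnorm(200, sdlog = 1.5)              # made-up skewed data

# Bin edges at whole powers of ten, chosen to cover the data
exps <- seq(floor(log10(min(x))), ceiling(log10(max(x))))
h <- hist(log10(x), breaks = exps, axes = FALSE, xlab = "x", main = "")
axis(2)
axis(1, at = exps, labels = 10^exps)       # label each bin edge on the original scale
invisible(dev.off())
```

Any other increasing vector of edges that covers the data, for example log10() of hand-picked values, could be used in place of exps.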
Using ggplot2 seems like the easiest option. If you want more control over your axes and your breaks, you can do something like the following:
EDIT : new code provided
x <- c(rexp(1000, 0.5) + 0.5, rexp(100, 0.5) * 100)
breaks <- c(0, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 10000)
major <- c(0.1, 1, 10, 100, 1000, 10000)
H <- hist(log10(x), plot = FALSE)
plot(H$mids, H$counts, type = "n",
     xaxt = "n",
     xlab = "X", ylab = "Counts",
     main = "Histogram of X",
     bg = "lightgrey"
)
abline(v = log10(breaks), col = "lightgrey", lty = 2)
abline(v = log10(major), col = "lightgrey")
abline(h = pretty(H$counts), col = "lightgrey")
plot(H, add = TRUE, freq = TRUE, col = "blue")
# Position of ticks
at <- log10(breaks)
# Creation of the x-axis
axis(1, at = at, labels = 10^at)
This is as close as I can get to the ggplot2. Putting the background grey is not that straightforward, but doable if you define a rectangle with the size of your plot screen and put the background as grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
A dynamic graph would also help with this plot. Use the manipulate package from RStudio to make a histogram with a dynamically selected range:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you will be able to use sliders to see the particular distribution in a dynamically selected range like this:

Base Plot, correctly defining axis [duplicate]

How can I change the spacing of tick marks on the axis of a plot?
What parameters should I use with base plot or with rgl?
There are at least two ways for achieving this in base graph (my examples are for the x-axis, but work the same for the y-axis):
Use par(xaxp = c(x1, x2, n)) or plot(..., xaxp = c(x1, x2, n)) to define the position (x1 & x2) of the extreme tick marks and the number of intervals between the tick marks (n). Accordingly, n+1 is the number of tick marks drawn. (This works only if you use no logarithmic scale, for the behavior with logarithmic scales see ?par.)
You can suppress the drawing of the axis altogether and add the tick marks later with axis().
To suppress the drawing of the axis use plot(... , xaxt = "n").
Then call axis() with side, at, and labels: axis(side = 1, at = v1, labels = v2). Here side refers to the side of the axis (1 = x-axis, 2 = y-axis), v1 is a vector containing the positions of the ticks (e.g., c(1, 3, 5) if your axis ranges from 0 to 6 and you want three marks), and v2 is a vector containing the labels for the specified tick marks (it must be of the same length as v1, e.g., c("group a", "group b", "group c")). See ?axis and my updated answer to a post on stats.stackexchange for an example of this method.
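Both approaches can be sketched as follows. The values of x1, x2 and n are arbitrary choices for illustration, and pdf(NULL) is only there so the code runs without a display.

```r
pdf(NULL)                                  # null device so the sketch runs headless

# Approach 1: xaxp = c(x1, x2, n) asks for n intervals between x1 and x2
x1 <- 0; x2 <- 10; n <- 5
plot(1:10, 1:10, xlim = c(x1, x2), xaxp = c(x1, x2, n))
tick_pos <- seq(x1, x2, length.out = n + 1)   # the n + 1 tick positions this implies

# Approach 2: suppress the x-axis, then draw it by hand
plot(1:6, 1:6, xlim = c(0, 6), xaxt = "n")
axis(side = 1, at = c(1, 3, 5), labels = c("group a", "group b", "group c"))
invisible(dev.off())
```

The second approach is the more flexible of the two, since tick positions need not be evenly spaced and labels can be arbitrary text.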
With base graphics, the easiest way is to stop the plotting functions from drawing axes and then draw them yourself.
plot(1:10, 1:10, axes = FALSE)
axis(side = 1, at = c(1,5,10))
axis(side = 2, at = c(1,3,7,10))
box()
I have a data set with Time on the x-axis and Intensity on the y-axis. I first suppress all the default axes, keeping only the axis titles, with:
plot(Time, Intensity, axes = FALSE)
Then I rebuild the plot's elements with:
box() # draw a box around the plot region
axis(labels=NA,side=1,tck=-0.015,at=c(seq(from=0,to=1000,by=100))) # labels = NA suppresses the tick labels, so only the tick marks are drawn; tck sets the tick length
axis(labels=NA,side=2,tck=-0.015)
axis(lwd=0,side=1,line=-0.4,at=c(seq(from=0,to=1000,by=100))) # lwd = 0 suppresses the axis line and tick marks (already drawn above), so only the labels are added
axis(lwd=0,line=-0.4,side=2,las=1) # las = 1 makes the tick labels horizontal instead of vertical
So, at = c(...) specifies the collection of positions to put the tick marks. Here I'd like to put the marks at 0, 100, 200,..., 1000. seq(from =...,to =...,by =...) gives me the choice of limits and the increments.
And if you don't want R to add decimals or zeros, you can stop it from drawing the x-axis, the y-axis, or both using xaxt and yaxt. Then you can add your own ticks and labels:
plot(x, y, xaxt="n") # suppress the x-axis
plot(x, y, yaxt="n") # or suppress the y-axis
axis(side = 1, at=c(1, 5, 10), labels=c("First", "Second", "Third")) # side = 1 for the x-axis, side = 2 for the y-axis
I just discovered the Hmisc package:
Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables.
library(Hmisc)
plot(...)
minor.tick(nx=10, ny=10) # divide each interval between major ticks into 10 parts with unlabeled minor ticks
