I have created a stripchart in R using the code below:
oldFaithful <- read.table("http://www.isi-stats.com/isi/data/prelim/OldFaithful1.txt", header = TRUE)
par(bty = "n") #Turns off plot border
stripchart(oldFaithful, #Name of the data frame we want to graph
method = "stack", #Stack the dots (no overlap)
pch = 20, #Use dots instead of squares (plot character)
at = 0, #Aligns dots along axis
xlim = c(40,100)) #Extends axis to include all data
The plot contains a large amount of extra space or whitespace at the top of the graph, as shown below.
Is there a way to eliminate the extra space at the top?
Short Answer
Add the argument offset=1, as in
stripchart(oldFaithful, offset=1, ...)
Long Answer
You really have to dig into the code of stripchart to figure this one out!
When you set a ylim by calling stripchart(oldFaithful, ylim=c(p,q)) or when you let stripchart use its defaults, it does in fact set the ylim when it creates the empty plotting area.
However, it then has to plot the points on that empty plotting area. When it does so, the y-values for the points at one x-value are specified as (1:n) * offset * csize. Here's the catch, csize is based on ylim[2], so the smaller you make the upper ylim, the smaller is csize, effectively leaving the space at the top of the chart no matter the value of ylim[2].
As a quick aside, notice that you can "mess with" ylim[1]. Try this:
stripchart(oldFaithful, ylim=c(2,10), pch=20, method="stack")
OK, back to solving your problem. There is a second reason that there is space at the top of the plot, and that second reason is offset. By default, offset=1/3 which (like csize) is "shrinking" down the height of the y-values of the points being plotted. You can negate this behavior be setting offset closer or equal to one, as in offset=0.9.
Related
I am trying to arrange 3 plots together. All 3 plots have the same y axis scale, but the third plot has a longer x axis than the other two. I would like to arrange the first two plots side by side in the first row and then place the third plot on the second row aligned to the right. Ideally I would like the third plot's x values to align with plot 2 for the full extent of plot 2 and then continue on below plot one. I have seen some other postings about using the layout function to reach this general configuration (Arrange plots in a layout which cannot be achieved by 'par(mfrow ='), but I haven't found anything on fine tuning the plots so that the scales match. Below is a crappy picture that should be able to get the general idea across.
I thought you could do this by using par("plt"), which returns the coordinates of the plot region as a fraction of the total figure region, to programmatically calculate how much horizontal space to allocate to the bottom plot. But even when using this method, manual adjustments are necessary. Here's what I've got for now.
First, set the plot margins to be a bit thinner than the default. Also, las=1 rotates the y-axis labels to be horizontal, and xaxs="i" (default is "r") sets automatic x-axis padding to zero. Instead, we'll set the amount of padding we want when we create the plots.
par(mar=c(3,3,0.5,0.5), las=1, xaxs="i")
Some fake data:
dat1=data.frame(x=seq(-5000,-2500,length=100), y=seq(-0.2,0.6,length=100))
dat2=data.frame(x=seq(-6000,-2500,length=100), y=seq(-0.2,0.6,length=100))
Create a layout matrix:
# Coordinates of plot region as a fraction of the total figure region
# Order c(x1, x2, y1, y2)
pdim = par("plt")
# Constant padding value for left and right ends of x-axis
pad = 0.04*diff(range(dat1$x))
# If total width of the two top plots is 2 units, then the width of the
# bottom right plot is:
p3w = diff(pdim[1:2]) * (diff(range(dat2$x)) + 2*pad)/(diff(range(dat1$x)) + 2*pad) +
2*(1-pdim[2]) + pdim[1]
# Create a layout matrix with 200 "slots"
n=200
# Adjustable parameter for fine tuning to get top and bottom plot lined up
nudge=2
# Number of slots needed for the bottom right plot
l = round(p3w/2 * n) - nudge
# Create layout matrix
layout(matrix(c(rep(1:2, each=0.5*n), rep(4:3,c(n - l, l))), nrow=2, byrow=TRUE))
Now create the graphs: The two calls to abline are just to show us whether the graphs' x-axes line up. If not, we'll change the nudge parameter and run the code again. Once we've got the layout we want, we can run all the code one final time without the calls to abline.
# Plot first two graphs
with(dat1, plot(x,y, xlim=range(dat1$x) + c(-pad,pad)))
with(dat1, plot(x,y, xlim=range(dat1$x) + c(-pad,pad)))
abline(v=-5000, xpd=TRUE, col="red")
# Lower right plot
plot(dat2, xaxt="n", xlim=range(dat2$x) + c(-pad,pad))
abline(v=-5000, xpd=TRUE, col="blue")
axis(1, at=seq(-6000,-2500,500))
Here's what we get with nudge=2. Note the plots are lined up, but this is also affected by the pixel size of the saved plot (for png files), and I adjusted the size to get the upper and lower plots exactly lined up.
I would have thought that casting all the quantities in ratios that are relative to the plot area (by using par("plt")) would have both ensured that the upper and lower plots lined up and that they would stay lined up regardless of the number of pixels in the final image. But I must be missing something about how base graphics work or perhaps I've messed up a calculation (or both). In any case, I hope this helps you get the plot layout you wanted.
Here is shown how to label histogram bars with data values or percents using labels = TRUE. Is it also possible to rotate those labels? My goal is to rotate them to 90 degrees because now the labels over bars overrides each other and it is unreadable.
PS: please note that my goal is not to rotate y-axis labels as it is shown e.g. here
Using mtcars, here's one brute-force solution (though it isn't very brutish):
h <- hist(mtcars$mpg)
maxh <- max(h$counts)
strh <- strheight('W')
strw <- strwidth(max(h$counts))
hist(mtcars$mpg, ylim=c(0, maxh + strh + strw))
text(h$mids, strh + h$counts, labels=h$counts, adj=c(0, 0.5), srt=90)
The srt=90 is the key here, rotating 90 degrees counter-clockwise (anti-clockwise?).
maxh, strh, and strw are used (1) to determine how much to extend the y-axis so that the text is not clipped to the visible figure, and (2) to provide a small pad between the bar and the start of the rotated text. (The first reason could be mitigated by xpd=TRUE instead, but it might impinge on the main title, and will be a factor if you set the top margin to 0.)
Note: if using density instead of frequency, you should use h$density instead of h$counts.
Edit: changed adj, I always forget the x/y axes on it stay relative to the text regardless of rotation.
Edit #2: changing the first call to hist so the string height/width are calculate-able. Unfortunately, plotting twice is required in order to know the actual height/width.
I'm trying to figure out a way to calculate the height of a legend for a plot prior to setting the margins of the plot. I intend to place the legend below the plot below the x-axis labels and title.
As it is part of a function which plots a range of things the legend can grow and shrink in size to cater for 2 items, up to 15 or more, so I need to figure out how I can do this dynamically rather that hard-coding. So, in the end I need to dynamically set the margin and some other bits and pieces.
The key challenge is to figure out the height of the legend to feed into par(mar) prior to drawing the plot, but after dissecting the base codes for legend however, it seems impossible to get a solid estimate of the height value unless the plot is actually drawn (chicken and egg anyone?)
Here's what I've tried already:
get a height using the legend$rect$h output from the base legend function (which seems to give a height value which is incorrect unless the plot is actually drawn)
calculate the number of rows in the legend (easy) and multiply this by the line height (in order to do this, seems you'd need to translate into inches (the base legend code uses yinch and I've also tried grconvertY but neither of those work unless a plot has been drawn).
Another challenge is to work out the correct y value for placement of the legend - I figure that once I've solved the first challenge, the second will be easy.
EDIT:
After a day of sweating over how this is (not) working. I have a couple of insights and a couple of questions. For the sake of clarity, this is what my function essentially does:
step 1) set the margins
step 2) create the barplot on the left axis
step 3) re-set the usr coordinates - this is necessary to ensure alignment of the right axis otherwise it plots against the x-axis scale. Not good when they are markedly different.
step 4) create the right axis
step 5) create a series of line charts on the right axis
step 6) do some labelling of the two axes and the x-axis
step 7) add in the legend
Here are the questions
Q1) What units are things reported in? I'm interested in margin lines and coordinates (user-coordinates), inches is self explanatory. - I can do some conversions using grconvertY() but I'm not sure what I'm looking at and what I should be converting to - the documentation isn't so great.
Q2) I need to set the margin in step 1 so that there is enough room at the bottom of the chart for the legend. I think I'm getting that right, however I need to set the legend after the right axis and line charts are set, which means that the user coordinates (and the pixel value of an inch, has changed. Because of Q1 above I'm not sure how to translate one system to the other. Any ideas in this regard would be appreciated.
After another day of sweating over this here's what solved it mostly for me.
I pulled apart the code for the core legend function and compiled this:
#calculate legend buffer
cin <- par("cin")
Cex <- par("cex")
yc <- Cex * cin[2L] #cin(inches) * maginfication
yextra <- 0
ymax <- yc * max(1, strheight("Example", units = "inches", cex = Cex)/yc)
ychar <- yextra + ymax #coordinates
legendHeight <- (legendLines * ychar) + yc # in
Which is essentially mimicking the way the core function calculates legend height but returns the height in inches rather than in user coordinates. legendLines is the number of lines in the legend.
After that, it's a doddle to work out how to place the legend, and to set the lower margin correctly. I'm using:
#calculate inches per margin line
inchesPerMarLine<-par("mai")[1]/par("mar")[1]
To calculate the number of inches per margin line, and the following to set the buffers (for the axis labels and title, and the bottom of the chart), and the margin of the plot.
#set buffers
bottomBuffer = 1
buffer=2
#calculate legend buffer
legBuffer <- legendHeight/inchesPerMarLine
#start the new plot
plot.new()
# set margin
bottomMargin <- buffer + legBuffer + bottomBuffer
par(mar=c(bottomMargin,8,3,5))
The plot is made
barplot(data, width=1, col=barCol, names.arg=names, ylab="", las=1 ,axes=F, ylim=c(0,maxL), axis.lty=1)
And then the legend is placed. I've used a different method to extract the legend width which does have some challenges when there is a legend with 1 point, however, it works ok for now. Putting the legend into a variable allows you to access the width of the box like l$rect$w. trace=TRUE and plot=FALSE stop the legend being written to the plot just yet.
ycoord <- -1*(yinch(inchesPerMarLine*buffer)*1.8)
l<-legend(x=par("usr")[1], y=ycoord, inset=c(0,-0.25), legendText, fill=legendColour, horiz=FALSE, bty = "n", ncol=3, trace=TRUE,plot=FALSE)
lx <- mean(par("usr")[1:2]-(l$rect$w/2))
legend(x=lx, y=ycoord, legendText, fill=legendColour, horiz=FALSE, bty = "n", ncol=3)
For completeness, this is how I calculate the number of lines in the legend. Note - the number of columns in the legend is 3. labelSeries is the list of legend labels.
legendLines <- ceiling(nrow(labelSeries)/3)
I'm looking to plot a set of sparklines in R with just a 0 and 1 state that looks like this:
Does anyone know how I might create something like that ideally with no extra libraries?
I don't know of any simple way to do this, so I'm going to build up this plot from scratch. This would probably be a lot easier to design in illustrator or something like that, but here's one way to do it in R (if you don't want to read the whole step-by-step, I provide my solution wrapped in a reusable function at the bottom of the post).
Step 1: Sparklines
You can use the pch argument of the points function to define the plotting symbol. ASCII symbols are supported, which means you can use the "pipe" symbol for vertical lines. The ASCII code for this symbol is 124, so to use it for our plotting symbol we could do something like:
plot(df, pch=124)
Step 2: labels and numbers
We can put text on the plot by using the text command:
text(x,y,char_vect)
Step 3: Alignment
This is basically just going to take a lot of trial and error to get right, but it'll help if we use values relative to our data.
Here's the sample data I'm working with:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
I'm going to start out by plotting an empty box to use as our canvas. To make my life a little easier, I'm going to set the coordinates of the box relative to values meaningful to my data. The Y positions of the 4 data series will be the same across all plotting elements, so I'm going to store that for convenience.
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
With this box in place, I can start adding elements. For each element, the X values will all be the same, so we can use rep to set that vector, and seq to set the Y vector relative to Y range of our plot (1:n). I'm going to shift the positions by percentages of the X and Y ranges to align my values, and modified the size of the text using the cex parameter. Ultimately, I found that this works out:
ypos = rev(seq(1+.1*n,n*.9, length.out=n))
text(rep(1,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=2) # Increase the size of the text
I reversed the sequence of Y values because I built my sequence in ascending order, and the values on the Y axis in my plot increase from bottom to top. Reversing the Y values then makes it so the series in my dataframe will print from top to bottom.
I then repeated this process for the second label, shifting the X values over but keeping the Y values the same.
text(rep(.37*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=2)
Finally, we shift X over one last time and use points to build the sparklines with the pipe symbol as described earlier. I'm going to do something sort of weird here: I'm actually going to tell points to plot at as many positions as I have data points, but I'm going to use ifelse to determine whether or not to actually plot a pipe symbol or not. This way everything will be properly spaced. When I don't want to plot a line, I'll use a 'space' as my plotting symbol (ascii code 32). I will repeat this procedure looping through all columns in my dataframe
for(i in 1:n){
points(seq(.5*m,m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=2,
col='gray')
}
So, piecing it all together and wrapping it in a function, we have:
df = data.frame(replicate(4, rbinom(50, 1, .7)))
colnames(df) = c('steps','atewell','code','listenedtoshell')
BinarySparklines = function(df,
L_adj=1,
mid_L_adj=0.37,
mid_R_adj=0.5,
R_adj=1,
bottom_adj=0.1,
top_adj=0.9,
spark_col='gray',
cex1=2,
cex2=2,
cex3=2
){
# 'adJ' parameters are scalar multipliers in [-1,1]. For most purposes, use [0,1].
# The exception is L_adj which is any value in the domain of the plot.
# L_adj < mid_L_adj < mid_R_adj < R_adj
# and
# bottom_adj < top_adj
n=ncol(df)
m=nrow(df)
plot(1:m,
seq(1,n, length.out=m),
# The following arguments suppress plotting values and axis elements
type='n',
xaxt='n',
yaxt='n',
ann=F)
ypos = rev(seq(1+.1*n,n*top_adj, length.out=n))
text(rep(L_adj,n),
ypos,
colnames(df), # These are our labels
pos=4, # This positions the text to the right of the coordinate
cex=cex1) # Increase the size of the text
text(rep(mid_L_adj*m,n), # Shifted towards the middle of the plot
ypos,
colSums(df), # new label
pos=4,
cex=cex2)
for(i in 1:n){
points(seq(mid_R_adj*m, R_adj*m, length.out=m),
rep(ypos[i],m),
pch=ifelse(df[,i], 124, 32), # This determines whether to plot or not
cex=cex3,
col=spark_col)
}
}
BinarySparklines(df)
Which gives us the following result:
Try playing with the alignment parameters and see what happens. For instance, to shrink the side margins, you could try decreasing the L_adj parameter and increasing the R_adj parameter like so:
BinarySparklines(df, L_adj=-1, R_adj=1.02)
It took a bit of trial and error to get the alignment right for the result I provided (which is what I used to inform the default values for BinarySparklines), but I hope I've given you some intuition about how I achieved it and how moving things using percentages of the plotting range made my life easier. In any event, I hope this serves as both a proof of concept and a template for your code. I'm sorry I don't have an easier solution for you, but I think this basically gets the job done.
I did my prototyping in Rstudio so I didn't have to specify the dimensions of my plot, but for posterity I had 832 x 456 with the aspect ratio maintained.
I have a plot that has data that runs into the area I'd like to use for a legend. Is there a way to have the plot automatically put in something like a header space above the highest data points to fit the legend into?
I can get it to work if I manually enter the ylim() arguments to expand the size and then give the exact coordinates of where I want the legend located, but I'd prefer to have a more flexible means of doing this as it's a front end for a data base query and the data levels could have very different levels.
Edit 2017:
use ggplot and theme(legend.position = ""):
library(ggplot2)
library(reshape2)
set.seed(121)
a=sample(1:100,5)
b=sample(1:100,5)
c=sample(1:100,5)
df = data.frame(number = 1:5,a,b,c)
df_long <- melt(df,id.vars = "number")
ggplot(data=df_long,aes(x = number,y=value, colour=variable)) +geom_line() +
theme(legend.position="bottom")
Original answer 2012:
Put the legend on the bottom:
set.seed(121)
a=sample(1:100,5)
b=sample(1:100,5)
c=sample(1:100,5)
dev.off()
layout(rbind(1,2), heights=c(7,1)) # put legend on bottom 1/8th of the chart
plot(a,type='l',ylim=c(min(c(a,b,c)),max(c(a,b,c))))
lines(b,lty=2)
lines(c,lty=3,col='blue')
# setup for no margins on the legend
par(mar=c(0, 0, 0, 0))
# c(bottom, left, top, right)
plot.new()
legend('center','groups',c("A","B","C"), lty = c(1,2,3),
col=c('black','black','blue'),ncol=3,bty ="n")
You have to add the size of the legend box to the ylim range
#Plot an empty graph and legend to get the size of the legend
x <-1:10
y <-11:20
plot(x,y,type="n", xaxt="n", yaxt="n")
my.legend.size <-legend("topright",c("Series1","Series2","Series3"),plot = FALSE)
#custom ylim. Add the height of legend to upper bound of the range
my.range <- range(y)
my.range[2] <- 1.04*(my.range[2]+my.legend.size$rect$h)
#draw the plot with custom ylim
plot(x,y,ylim=my.range, type="l")
my.legend.size <-legend("topright",c("Series1","Series2","Series3"))
Building on #P-Lapointe solution, but making it extremely easy, you could use the maximum values from your data using max() and then you re-use those maximum values to set the legend xy coordinates. To make sure you don't get beyond the borders, you set up ylim slightly over the maximum values.
a=c(rnorm(1000))
b=c(rnorm(1000))
par(mfrow=c(1,2))
plot(a,ylim=c(0,max(a)+1))
legend(x=max(a)+0.5,legend="a",pch=1)
plot(a,b,ylim=c(0,max(b)+1),pch=2)
legend(x=max(b)-1.5,y=max(b)+1,legend="b",pch=2)
?legend will tell you:
Arguments
x, y
the x and y co-ordinates to be used to position the legend. They can be specified by keyword or in any way which is accepted by xy.coords: See ‘Details’.
Details:
Arguments x, y, legend are interpreted in a non-standard way to allow the coordinates to be specified via one or two arguments. If legend is missing and y is not numeric, it is assumed that the second argument is intended to be legend and that the first argument specifies the coordinates.
The coordinates can be specified in any way which is accepted by xy.coords. If this gives the coordinates of one point, it is used as the top-left coordinate of the rectangle containing the legend. If it gives the coordinates of two points, these specify opposite corners of the rectangle (either pair of corners, in any order).
The location may also be specified by setting x to a single keyword from the list bottomright, bottom, bottomleft, left, topleft, top, topright, right and center. This places the legend on the inside of the plot frame at the given location. Partial argument matching is used. The optional inset argument specifies how far the legend is inset from the plot margins. If a single value is given, it is used for both margins; if two values are given, the first is used for x- distance, the second for y-distance.