Plot a histogram of means and label individual mean/PDF - r

So I have this plot showing the average scores of a group of people. I would like to know how to plot or label X in the same picture (see the picture; I added the X with Paint), where X represents the mean of one student compared to the others.
My code
CairoPDF(paste('output/picture/', student, '_hist.pdf', sep=''), family='sans')
hist(means.students.all, xlab="Means", main="Average Ratings")
dev.off()

Here are two ways of adding a label. First, some random data and the regular histogram:
set.seed(0)
means <- rnorm(1000, 4.5, 0.2)
hist(means)
One way to add what you want is to plot a single point where you want it, using points():
points(x=means[1], y=0, pch="X", cex=1.5)
Use y for the vertical position, pch for the type or character to plot, and cex to control its size.
Another option, which gives you more possibilities, is using text():
text(x=means[2], y=0, label="StudentX", cex=1.5, srt=90, adj=c(0,0.5))
This way you can plot a full string (like the student's name), rotate it 90 degrees using srt to fit the plot better, and align the text properly using adj, here with left horizontal alignment and centered vertical alignment (both relative to the unrotated text). Together, the two calls put an X and the rotated label StudentX on the baseline of the histogram.
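Putting this together for the original question, here is a minimal sketch that marks one student's mean on the histogram. The data are made up, student.mean is a hypothetical stand-in for the student of interest, and the vertical line drawn with abline() is an addition not mentioned in the answer above.

```r
set.seed(0)
means.students.all <- rnorm(1000, 4.5, 0.2)   # made-up class means
student.mean <- means.students.all[1]          # hypothetical student of interest

h <- hist(means.students.all, xlab = "Means", main = "Average Ratings")
abline(v = student.mean, col = "red", lwd = 2) # vertical line at the student's mean
text(x = student.mean, y = max(h$counts), labels = "X",
     pos = 4, col = "red")                     # label to the right of the line, near the top
```

Keeping the return value of hist() in h gives access to the bar heights, which is handy for choosing a y position that stays inside the plot.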

Related

R: Matching x-axis scales on upper and lower plot using layout with base graphics

I am trying to arrange 3 plots together. All 3 plots have the same y axis scale, but the third plot has a longer x axis than the other two. I would like to arrange the first two plots side by side in the first row and then place the third plot on the second row aligned to the right. Ideally I would like the third plot's x values to align with plot 2 for the full extent of plot 2 and then continue on below plot one. I have seen some other postings about using the layout function to reach this general configuration (Arrange plots in a layout which cannot be achieved by 'par(mfrow ='), but I haven't found anything on fine tuning the plots so that the scales match. Below is a crappy picture that should be able to get the general idea across.
I thought you could do this by using par("plt"), which returns the coordinates of the plot region as a fraction of the total figure region, to programmatically calculate how much horizontal space to allocate to the bottom plot. But even when using this method, manual adjustments are necessary. Here's what I've got for now.
First, set the plot margins to be a bit thinner than the default. Also, las=1 rotates the y-axis labels to be horizontal, and xaxs="i" (default is "r") sets automatic x-axis padding to zero. Instead, we'll set the amount of padding we want when we create the plots.
par(mar=c(3,3,0.5,0.5), las=1, xaxs="i")
Some fake data:
dat1=data.frame(x=seq(-5000,-2500,length=100), y=seq(-0.2,0.6,length=100))
dat2=data.frame(x=seq(-6000,-2500,length=100), y=seq(-0.2,0.6,length=100))
Create a layout matrix:
# Coordinates of plot region as a fraction of the total figure region
# Order c(x1, x2, y1, y2)
pdim = par("plt")
# Constant padding value for left and right ends of x-axis
pad = 0.04*diff(range(dat1$x))
# If total width of the two top plots is 2 units, then the width of the
# bottom right plot is:
p3w = diff(pdim[1:2]) * (diff(range(dat2$x)) + 2*pad)/(diff(range(dat1$x)) + 2*pad) +
2*(1-pdim[2]) + pdim[1]
# Create a layout matrix with 200 "slots"
n=200
# Adjustable parameter for fine tuning to get top and bottom plot lined up
nudge=2
# Number of slots needed for the bottom right plot
l = round(p3w/2 * n) - nudge
# Create layout matrix
layout(matrix(c(rep(1:2, each=0.5*n), rep(4:3,c(n - l, l))), nrow=2, byrow=TRUE))
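To see what that layout matrix looks like, here is a small check with a made-up value for l (130 is purely illustrative, not the value computed above). Each row has n slots: the top row is split evenly between plots 1 and 2, while the bottom row gives the rightmost l slots to plot 3 and leaves the rest to region 4, a filler that is never drawn.

```r
n <- 200   # slots per row, as above
l <- 130   # hypothetical number of slots for the bottom right plot

m <- matrix(c(rep(1:2, each = 0.5 * n),   # top row: plot 1, then plot 2
              rep(4:3, c(n - l, l))),     # bottom row: filler 4, then plot 3
            nrow = 2, byrow = TRUE)

dim(m)          # number of rows and slots per row
table(m[2, ])   # how many slots regions 3 and 4 each receive
```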
Now create the graphs. The two calls to abline are just there to show us whether the graphs' x-axes line up. If not, we'll change the nudge parameter and run the code again. Once we've got the layout we want, we can run all the code one final time without the calls to abline.
# Plot first two graphs
with(dat1, plot(x,y, xlim=range(dat1$x) + c(-pad,pad)))
with(dat1, plot(x,y, xlim=range(dat1$x) + c(-pad,pad)))
abline(v=-5000, xpd=TRUE, col="red")
# Lower right plot
plot(dat2, xaxt="n", xlim=range(dat2$x) + c(-pad,pad))
abline(v=-5000, xpd=TRUE, col="blue")
axis(1, at=seq(-6000,-2500,500))
Here's what we get with nudge=2. Note the plots are lined up, but this is also affected by the pixel size of the saved plot (for png files), and I adjusted the size to get the upper and lower plots exactly lined up.
I would have thought that casting all the quantities in ratios that are relative to the plot area (by using par("plt")) would have both ensured that the upper and lower plots lined up and that they would stay lined up regardless of the number of pixels in the final image. But I must be missing something about how base graphics work or perhaps I've messed up a calculation (or both). In any case, I hope this helps you get the plot layout you wanted.

Rotate labels for histogram bars - shown via: labels = TRUE

This answer shows how to label histogram bars with data values or percentages using labels = TRUE. Is it also possible to rotate those labels? My goal is to rotate them by 90 degrees, because at the moment the labels over the bars overlap each other and are unreadable.
PS: please note that my goal is not to rotate the y-axis labels, as shown e.g. here
Using mtcars, here's one brute-force solution (though it isn't very brutish):
h <- hist(mtcars$mpg)
maxh <- max(h$counts)
strh <- strheight('W')
strw <- strwidth(max(h$counts))
hist(mtcars$mpg, ylim=c(0, maxh + strh + strw))
text(h$mids, strh + h$counts, labels=h$counts, adj=c(0, 0.5), srt=90)
The srt=90 is the key here, rotating 90 degrees counter-clockwise (anti-clockwise?).
maxh, strh, and strw are used (1) to determine how much to extend the y-axis so that the text is not clipped to the visible figure, and (2) to provide a small pad between the bar and the start of the rotated text. (The first reason could be mitigated by xpd=TRUE instead, but it might impinge on the main title, and will be a factor if you set the top margin to 0.)
Note: if using density instead of frequency, you should use h$density instead of h$counts.
Edit: changed adj, I always forget the x/y axes on it stay relative to the text regardless of rotation.
Edit #2: changing the first call to hist so the string height/width are calculate-able. Unfortunately, plotting twice is required in order to know the actual height/width.
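For the density case mentioned in the note, a minimal sketch might look like the following. It is a variation rather than the exact method above: the bins are computed with plot = FALSE, and the headroom and padding are plain fractions of the tallest bar (1.35 and 0.02 are arbitrary choices) instead of values measured with strheight() and strwidth().

```r
h <- hist(mtcars$mpg, plot = FALSE)          # compute the bins without drawing
pad <- 0.02 * max(h$density)                 # small gap between bar and label (arbitrary)

hist(mtcars$mpg, freq = FALSE,
     ylim = c(0, 1.35 * max(h$density)))     # extra headroom for the rotated labels (arbitrary)
text(h$mids, h$density + pad, labels = round(h$density, 3),
     adj = c(0, 0.5), srt = 90)              # rotated labels starting just above each bar
```

If the labels are still clipped, increasing the headroom factor is the simplest adjustment.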

Set margins to cater for large legend

I'm trying to figure out a way to calculate the height of a legend for a plot prior to setting the margins of the plot. I intend to place the legend below the plot, underneath the x-axis labels and title.
As it is part of a function which plots a range of things, the legend can grow and shrink in size to cater for 2 items, up to 15 or more, so I need to figure out how to do this dynamically rather than hard-coding it. So, in the end, I need to dynamically set the margin and some other bits and pieces.
The key challenge is to figure out the height of the legend to feed into par(mar) prior to drawing the plot. After dissecting the base code for legend, however, it seems impossible to get a solid estimate of the height unless the plot is actually drawn (chicken and egg, anyone?).
Here's what I've tried already:
- Get a height using the legend$rect$h output from the base legend function (which seems to give an incorrect height unless the plot is actually drawn).
- Calculate the number of rows in the legend (easy) and multiply this by the line height. To do this, it seems you'd need to translate into inches: the base legend code uses yinch, and I've also tried grconvertY, but neither of those works unless a plot has been drawn.
Another challenge is to work out the correct y value for placement of the legend - I figure that once I've solved the first challenge, the second will be easy.
EDIT:
After a day of sweating over how this is (not) working. I have a couple of insights and a couple of questions. For the sake of clarity, this is what my function essentially does:
step 1) set the margins
step 2) create the barplot on the left axis
step 3) re-set the usr coordinates - this is necessary to ensure alignment of the right axis otherwise it plots against the x-axis scale. Not good when they are markedly different.
step 4) create the right axis
step 5) create a series of line charts on the right axis
step 6) do some labelling of the two axes and the x-axis
step 7) add in the legend
Here are the questions
Q1) What units are things reported in? I'm interested in margin lines and coordinates (user coordinates); inches are self-explanatory. I can do some conversions using grconvertY(), but I'm not sure what I'm looking at and what I should be converting to. The documentation isn't so great.
Q2) I need to set the margin in step 1 so that there is enough room at the bottom of the chart for the legend. I think I'm getting that right; however, I need to add the legend after the right axis and line charts are set, which means that the user coordinates (and the pixel value of an inch) have changed. Because of Q1 above, I'm not sure how to translate one system to the other. Any ideas in this regard would be appreciated.
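On Q1, a minimal sketch of how the unit systems relate, assuming only base graphics: par("mar") is in lines of text, par("mai") is the same margins in inches, and par("usr") holds the plot-region limits in user (data) coordinates. The variable names oneInchUser and viaConvert are illustrative only.

```r
plot(1:10)    # any plot, so that the coordinate systems are set up

par("mar")    # margins in lines of text: bottom, left, top, right
par("mai")    # the same margins in inches
par("usr")    # plot-region limits c(x1, x2, y1, y2) in user coordinates

# Height of one inch expressed in user units on the y-axis
oneInchUser <- yinch(1)

# The same quantity via grconvertY, as the difference of two converted positions
viaConvert <- grconvertY(1, from = "inches", to = "user") -
  grconvertY(0, from = "inches", to = "user")
```

Because the user coordinates change whenever par(usr=) is reset, any such conversion has to be redone after step 3.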
After another day of sweating over this, here's what mostly solved it for me.
I pulled apart the code for the core legend function and compiled this:
#calculate legend buffer
cin <- par("cin")
Cex <- par("cex")
yc <- Cex * cin[2L] # character height in inches (cin) * magnification (cex)
yextra <- 0
ymax <- yc * max(1, strheight("Example", units = "inches", cex = Cex)/yc)
ychar <- yextra + ymax # height of one legend line, in inches
legendHeight <- (legendLines * ychar) + yc # total legend height, in inches
Which is essentially mimicking the way the core function calculates legend height but returns the height in inches rather than in user coordinates. legendLines is the number of lines in the legend.
After that, it's a doddle to work out how to place the legend, and to set the lower margin correctly. I'm using:
#calculate inches per margin line
inchesPerMarLine<-par("mai")[1]/par("mar")[1]
That gives the number of inches per margin line. The following then sets the buffers (for the axis labels and title, and for the bottom of the chart) and the margin of the plot.
#set buffers
bottomBuffer = 1
buffer=2
#calculate legend buffer
legBuffer <- legendHeight/inchesPerMarLine
#start the new plot
plot.new()
# set margin
bottomMargin <- buffer + legBuffer + bottomBuffer
par(mar=c(bottomMargin,8,3,5))
Then the plot is made:
barplot(data, width=1, col=barCol, names.arg=names, ylab="", las=1, axes=FALSE, ylim=c(0,maxL), axis.lty=1)
And then the legend is placed. I've used a different method to extract the legend width, which does have some challenges when there is a legend with 1 point; however, it works OK for now. Assigning the result of legend() to a variable lets you access the width of the box as l$rect$w. plot=FALSE stops the legend being drawn on the plot just yet (trace=TRUE prints the internal calculations as they happen).
ycoord <- -1*(yinch(inchesPerMarLine*buffer)*1.8)
l<-legend(x=par("usr")[1], y=ycoord, inset=c(0,-0.25), legendText, fill=legendColour, horiz=FALSE, bty = "n", ncol=3, trace=TRUE,plot=FALSE)
lx <- mean(par("usr")[1:2]-(l$rect$w/2))
legend(x=lx, y=ycoord, legendText, fill=legendColour, horiz=FALSE, bty = "n", ncol=3)
For completeness, this is how I calculate the number of lines in the legend. Note - the number of columns in the legend is 3. labelSeries is the list of legend labels.
legendLines <- ceiling(nrow(labelSeries)/3)
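The pieces above can be condensed into one runnable sketch. It is a simplification, not the exact function described: the legend labels are made up, the line height skips the strheight() comparison, and the 3 lines reserved for the axis labels are an assumption.

```r
legendText <- paste("Series", 1:7)            # made-up legend labels
nCols <- 3
legendLines <- ceiling(length(legendText) / nCols)

# Line height in inches at the current cex (simplified from the code above)
yc <- par("cex") * par("cin")[2]
legendHeight <- legendLines * yc + yc         # inches, with one extra line of padding

# mar is in lines and mai in inches, so the ratio is inches per margin line
inchesPerMarLine <- par("mai")[1] / par("mar")[1]
legBuffer <- legendHeight / inchesPerMarLine  # legend height in margin lines

axisLines <- 3                                # room for axis labels (assumption)
op <- par(mar = c(axisLines + legBuffer, 4, 2, 2))
barplot(1:7, col = rainbow(7), names.arg = 1:7)

# Top edge of the legend sits just below the axis labels, centred horizontally
legend(x = mean(par("usr")[1:2]),
       y = par("usr")[3] - yinch(axisLines * inchesPerMarLine),
       legend = legendText, fill = rainbow(7), ncol = nCols,
       bty = "n", xjust = 0.5, yjust = 1, xpd = TRUE)
par(op)                                       # restore the previous margins
```

Computing everything in inches first and converting to user coordinates only at the last step avoids depending on the plot's y-scale before it exists.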

asp is producing unnecessary whitespace within the axes of my R plot. How can I reformat the graph?

I'm trying to create a scatter plot + linear regression line in R 3.0.3. I originally tried to create it with the following simple call to plot:
plot(hops$average.temperature, hops$percent.alpha.acids)
This created this first plot:
As you can see, the scales of the Y and X axes differ. I tried fixing this using the asp parameter, as follows:
plot(hops$average.temperature, hops$percent.alpha.acids, asp=1, xaxp=c(13,18,5))
This produced this second plot:
Unfortunately, setting asp to 1 appears to have compressed the X axis while using the same amount of space, leaving large areas of unused whitespace on either side of the data. I tried using xlim to constrain the size of the X-axis, but asp seemed to overrule it as it didn't have any effect on the plot.
plot(hops$average.temperature, hops$percent.alpha.acids, xlim=c(13,18), asp=1, xaxp=c(13,18,5))
Any suggestions as to how I could get the axes to be on the same scale without creating large amounts of whitespace?
Thanks!
One solution would be to use par parameter pty and set it to "s". See ?par:
pty
A character specifying the type of plot region to be used; "s"
generates a square plotting region and "m" generates the maximal
plotting region.
It forces the plot to be square (thus counteracting the side effect of asp).
hops <- data.frame(a=runif(100,13,18),b=runif(100,2,6))
par(pty="s")
plot(hops$a,hops$b,asp=1)
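A slightly fuller sketch of the same idea, using the column names from the question with made-up values. It also adds the regression line the question asks about and restores the previous pty setting afterwards; both are additions to the answer above.

```r
set.seed(1)
# Made-up stand-in for the hops data in the question
hops <- data.frame(average.temperature = runif(100, 13, 18),
                   percent.alpha.acids = runif(100, 2, 6))

op <- par(pty = "s")                         # square plotting region; op keeps the old value
plot(hops$average.temperature, hops$percent.alpha.acids, asp = 1)
fit <- lm(percent.alpha.acids ~ average.temperature, data = hops)
abline(fit)                                  # add the fitted regression line
par(op)                                      # restore the previous plot-region type
```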
I agree with plannapus that the issue is with your plotting area. You can also fix this within the device size itself by ensuring that you plot to a square region. The example below opens a plotting device with square dimension; then the margins are also set to maintain these proportions:
Example:
n <- 20
x <- runif(n, 13, 18)
y <- runif(n, 2, 6)
png("plot.png", width=5, height=5, units="in", res=200)
par(mar=c(5,5,1,1))
plot(x, y, asp=1)
dev.off()

How to display all x labels in R barplot?

This is a basic question, but I am unable to find an answer. I am generating about 9 barplots within one panel, and each barplot has about 12 bars. I am providing all 12 labels in my input, but R is labelling only alternate bars. This is obviously due to some default setting in R which needs to be changed, but I am unable to find it.
You may be able to get all of the labels to appear if you use las=2 inside the barplot() call, which rotates the label text 90 degrees. This argument and the others mentioned below are described in ?par, which documents the graphical parameters for plotting devices. Otherwise, you will need to use xaxt="n" (to suppress ticks and labels) and then put the labels in with a separate call to axis(1, at=<some numerical vector>, labels=<some character vector>).
# midpts <- barplot( ... ) # assign result to named object
axis(1, at = midpts, labels=names(DD), cex.axis=0.7) # shrinks axis labels
Another method is to first collect the midpoints and then use text(), with xpd=TRUE to allow text to appear outside the plot area and srt set to the desired rotation angle in degrees:
text(x=midpts, y=-2, names(DD), cex=0.8, srt=45, xpd=TRUE)
The y-value needs to be chosen using the coordinates in the plotted area.
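As a self-contained sketch of the text() approach, with made-up bar heights and labels: the y position is taken from par("usr") instead of being hard-coded, and the 3% offset below the axis and the bottom margin of 7 lines are arbitrary choices.

```r
heights <- c(3, 7, 12, 18, 15, 11, 9, 6, 4, 2, 1, 1)        # made-up data
names(heights) <- paste("category", seq_along(heights), sep = "_")

op <- par(mar = c(7, 4, 2, 1))                  # larger bottom margin for the labels
midpts <- barplot(heights, axisnames = FALSE)   # draw bars without labels, keep midpoints

# Place the labels slightly below the bottom of the plot region
y0 <- par("usr")[3] - 0.03 * diff(par("usr")[3:4])
text(x = midpts, y = y0, labels = names(heights),
     srt = 45, adj = 1, xpd = TRUE, cex = 0.8)  # rotated, right-aligned, allowed outside plot
par(op)                                         # restore the previous margins
```

Right-aligning with adj = 1 makes each rotated label end under its own bar.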
Copying a useful comment: For future readers who don't know what these arguments do: las=2 rotates the labels counterclockwise by 90 degrees. Furthermore, if you need to reduce the font, you can use cex.names=.5 to shrink the size down.
To get rotated labels on a base R barplot, you could (like I do here) adapt one of the examples given in the vignette of the gridBase package:
library(grid)
library(gridBase)
## Make some data with names long enough that barplot won't print them all
DD <- table(rpois(100, lambda=5))
names(DD) <- paste("long", names(DD), sep="_")
## Plot, but suppress the labels
midpts <- barplot(DD, col=rainbow(20), names.arg="")
## Use grid to add the labels
vps <- baseViewports()
pushViewport(vps$inner, vps$figure, vps$plot)
grid.text(names(DD),
          x = unit(midpts, "native"), y = unit(-1, "lines"),
          just = "right", rot = 50)
popViewport(3)
R won't label every bar if the labels are too big.
I would suggest trying to rotate the labels vertically by passing in the las=2 argument to your plotting function.
If the labels are still too large, you can try shrinking the font by using the cex.names=.5 argument.
Sample Data for plot
sample_curve <- c(2.31,2.34,2.37,2.52,2.69,2.81,2.83,2.85,2.94, 3.03, 3.21, 3.33) # create a sample curve
names(sample_curve)<-c("1 MO","2 MO","3 MO","6 MO","1 YR","2 YR","3 YR","5 YR","7 YR","10 YR","20 YR","30 YR") # label the curve
Example of plot with labels too big
barplot(sample_curve) # labels too big for the plot
Example of plot with labels rotated and small
barplot(sample_curve, las=2, cex.names=.5) # labels are rotated and smaller, so they fit
Before calling barplot(), you can simply increase the margins with par(). The plot has four margins, given in the order bottom, left, top, right via mar = c(v1, v2, v3, v4).
For example:
par(mar=c(10,4,4,4))
barplot(height=c(1,5,8,19,7),
names.arg=c("very long label 1","very long label 2",
"very long label 3","very long label 4",
"very long label 5"), las=2 )

Resources