questions about axes and mtext - r

I am swimming backwards in my R knowledge. Please help!
ExampleData:
Site, Aluminum_Dissolved, Federal_Guideline
M1, 0.1, 0.4
M1, 0.2, 0.4
M1, 0.5, 0.4
M2, 0.6, 0.4
M2, 0.4, 0.4
M2, 0.3, 0.4
I have a simple function:
boxplot(ExampleData$Aluminum_Dissolved ~ ExampleData$Site, col="purple",
par(cex.axis=2, las=2), mar=c(7,4,4,2)+0.1)
X and Y axis Labels:
Once I increase the values on the axis so much, my xlab and ylab are obscured by axis text.
I have tried using:
`mgp=c(3,1,0)`
and altering the values, but that seems to get messed up by the margin increase
`mar=c(7,4,4,2)+0.1`
I tried scrapping the xlab and ylab altogether and using mtext, but I can't get that to give me labels outside my axis text that are parallel to the y-axis. I have tried:
`mtext("Dissolved Aluminum", side=2, adj=0, las)` etc....
45 degree text on x-axis:
And, finally, I have tried reconstructing my x- and y-axes, to no avail, and I can't seem to rotate my x-axis labels 45 degrees using the srt parameter. I have tried:
boxplot(ExampleData$Aluminum_Dissolved ~ ExampleData$Site, col="purple",
xaxt='n', yaxt='n', axis(2, cex.axis=2, xlab="Dissolved Aluminum"),
axis(1, cex.axis=2, srt=45))
and this doesn't work. What am I missing? Is there a simple way to do this?

A quick tutorial:
Plotting in base R graphics is generally thought of as a "pen on paper" model. This means that each function you call draws "on top" of what you've created up to that point. Graphical parameters can either be set beforehand via a call to par, or passed directly to the plotting function (with some caveats). So for example, I would have done this as:
par(cex.axis=2, las=2,mar=c(7,4,4,2)+0.1)
boxplot(Aluminum_Dissolved ~ Site,data = dat,
col="purple",ylab = "Dissolved Aluminum",xlab = "Site")
If you wanted custom axes, you would have done something like:
par(cex.axis=2, las=2,mar=c(7,4,4,2)+0.1)
boxplot(Aluminum_Dissolved ~ Site,data = dat,
col="purple",ylab = "Dissolved Aluminum",xlab = "Site",axes = FALSE)
axis(...)
Subsequent calls (on separate lines) to functions like points or lines would add points or lines to the graph, respectively.
The caveat with par is that some parameters can only be set by calling par directly, not by passing them as named arguments to plotting functions. There is a list of those (which includes mar) located at ?par.
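As a minimal sketch of how these pieces fit together for the question's three problems (not part of the original answer): margins are set through par, the axis titles are placed with mtext using its line argument, and the 45 degree labels are drawn with text, because axis does not rotate labels via srt. The data frame is typed in from the question; the margin sizes, the line values, and the 5% offset below the axis are assumptions you would tune by eye.

```r
# Example data copied from the question
dat <- data.frame(
  Site = rep(c("M1", "M2"), each = 3),
  Aluminum_Dissolved = c(0.1, 0.2, 0.5, 0.6, 0.4, 0.3)
)

# mar can only be set through par(); keep the old values to restore later
op <- par(mar = c(7, 7, 4, 2) + 0.1, cex.axis = 2)

# Draw the boxes without axes or axis titles
bp <- boxplot(Aluminum_Dissolved ~ Site, data = dat, col = "purple",
              xaxt = "n", yaxt = "n", xlab = "", ylab = "")

axis(2, las = 2)                                    # horizontal y tick labels
axis(1, at = seq_along(bp$names), labels = FALSE)   # x ticks only, no labels

# Rotated x labels: drawn with text() in the bottom margin (xpd = TRUE)
text(x = seq_along(bp$names),
     y = par("usr")[3] - 0.05 * diff(par("usr")[3:4]),
     labels = bp$names, srt = 45, adj = 1, xpd = TRUE, cex = 2)

# Axis titles placed outside the enlarged tick labels; 'line' sets the distance
mtext("Dissolved Aluminum", side = 2, line = 5, cex = 1.8)
mtext("Site", side = 1, line = 5, cex = 1.8)

par(op)  # restore the previous settings
```

Because las is not set globally here, mtext on side 2 runs parallel to the y-axis, which is what the question asked for.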

@joran was right. I think I just messed up the order of the function calls. I got the axis labels working despite the larger text size using this code:
boxplot(ExampleData$Aluminum_Dissolved ~ ExampleData$Site, col="purple", par(cex.axis=2, cex.lab=1.8), ylab="Dissolved Aluminum")
The only problem with this is that the label is very close to text, but it is alright.

Related

Is it possible to create this graph on R?

I'm really new to R and I'm looking to create a graph similar to the one attached. I have tried to create a density plot using both ggplot and the base program.
I have used code ggplot(data, aes(x = Freq)) + geom_density() but the output is incorrect. I'm getting a spike at each number point rather than an overall curve. Every row is one data point of between 1 to 7 and the frequency distributions for one trait is as follows:
1: 500, 2: 550, 3: 700, 4: 1000, 5: 900, 6: 835, 7: 550
As such I have 5035 rows as one row equates to one score.
Any help is much appreciated.
Here is what I wish the plot would look like. (Note I'll add other traits at a later stage, I just wish to add one line at the moment).
there are a few things going on here, first is generating summary statistics of the data. you just need to call mean and sd in the appropriate way to get mean and standard deviation from your data. you've not shown your data so it would be difficult to suggest much here.
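A rough sketch of that first step, assuming the data really are one integer score per row with the frequencies listed in the question (the names freq, scores, mu and sigma are made up for illustration). With only seven distinct values, a kernel density estimate tends to put a bump at each integer, which is probably why geom_density shows spikes; a normal curve built from the mean and standard deviation avoids that.

```r
# Frequencies of each score (1 to 7), as listed in the question
freq <- c(500, 550, 700, 1000, 900, 835, 550)

# Expand to one score per row, mirroring the 5035-row data set
scores <- rep(1:7, times = freq)

# Summary statistics used to parameterise the curve
mu <- mean(scores)
sigma <- sd(scores)

# Normal density with those parameters over the 1 to 7 scale
curve(dnorm(x, mean = mu, sd = sigma), from = 1, to = 7,
      xlab = "Score", ylab = "Probability density")
```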
as far as plotting these summary statistics goes, you can replicate the plot from the original paper easily, but it's pretty bad and I'd suggest you not do that. stronger lines imply more importance, everything needs to be labelled twice, and the y-axis is mislabelled; all of that, on top of drawing nice smooth parametric curves, gives a false impression of confidence. I've only scanned the paper, but that sort of data is crying out for a multi-level model of some sort.
I prefer "base" graphics, ggplot is great for exploratory graphics but if you have hard constraints on what a plot should look like it tends to get in the way. We start with the summary statistics:
df <- read.csv(text="
title, mu, sigma,label, label_x,label_pos
Extraversion, 4.0, 1.08,Extra, 3.85,3
Agreeableness, 5.0, 0.77,Agree, 5.0, 3
Conscientiousness, 4.7, 0.97,Cons, 3.4, 2
Emotional stability,5.3, 0.84,Emot stab,5.9, 4
Intellect, 3.7, 0.86,Intellect,3.7, 3
")
I've just pulled numbers out of the paper here, you'd have to calculate them. the mu column is the mean of the variable, and sigma is the standard deviation. label_x and label_pos are used to draw labels so need to be manually chosen (or the plot can be annotated afterwards in something like Inkscape). label_x is the x-axis position, and label_pos stands for where it is in relation to the x-y point (see ?text for info about the pos parameter)
next we calculate a couple of things:
lwds <- 1 + seq(3, 1, len=5) ^ 2
label_y <- dnorm(df$label_x, df$mu, df$sigma)
i.e. line widths and label y positions, and we can start to make the plot:
# start by setting up plot nicely and setting plot limits
par(bty='l', mar=c(3, 3, 0.5, 0.5), mgp=c(1.8, 0.4, 0), tck=-0.02)
plot.new(); plot.window(c(1, 7), c(0, 0.56), yaxs='i')
# loop over data drawing curves
for (i in 1:nrow(df)) {
curve(dnorm(x, df$mu[[i]], df$sigma[[i]]), add=T, n=151, lwd=lwds[[i]])
}
# draw labels
text(df$label_x, label_y, df$label, pos=df$label_pos)
# draw axes
axis(1, lwd=0, lwd.ticks=1)
axis(2, lwd=0, lwd.ticks=1)
box(lwd=1)
# finally, title and legend
title(xlab='Level of state', ylab='Probability density')
legend('topleft', legend=df$title, lwd=lwds, bty='n', cex=0.85)
this gives us something like:
I've also gone with more modern capitalisation, and started the y-axis at zero as these are probability densities, which can't be negative.
My preferences would be for something closer to this:
the thin lines cover 2 standard deviations (i.e. 95% intervals) around the mean, thick lines 1 SDs (68%), and the point is the mean. it's much easier to discriminate each measure and compare across them, and it doesn't artificially make "extraversion" more prominent. the code for this is similar:
par(bty='l', mar=c(3, 8, 0.5, 0.5), mgp=c(1.8, 0.4, 0), tck=-0.02)
plot.new(); plot.window(c(1, 7), c(5.3, 0.7))
# draw quantiles
for (i in 1:nrow(df)) {
lines(df$mu[[i]] + df$sigma[[i]] * c(-1, 1), rep(i,2), lwd=3)
lines(df$mu[[i]] + df$sigma[[i]] * c(-2, 2), rep(i,2), lwd=1)
}
# and means
points(df$mu, 1:5, pch=20)
axis(1, lwd=0, lwd.ticks=1)
axis(2, at=1:5, labels=df$title, lwd=0, lwd.ticks=1, las=1)
box()
title(xlab='Level of state')

How to make the placement of asterisks in a barplot easier?

I would like to place asterisks in my grouped barplot (R base) to indicate where the paired comparisons differ significantly. I know how to place these stars using the points command. However, from the posts that I have read so far, it seems that one needs to find the right coordinates manually (e.g., group I: x=0.635, y=26, see the code below). This would take quite some time if one needs to find that out for all significant pairs.
So my question is: Is there an easier way to find the coordinates that correspond with the mid and just next to paired bars? I would prefer to do this in base plotting system at the moment but ggplot answers are also welcome. Thank you very much in advance!
Data example
set.seed(123)
dat<-matrix(runif(32, min = 0.5, max = 1), nrow=2, ncol=16)
colnames(dat)<-c(LETTERS[1:16])
par(mar=c(2,4,2,2))
mp<-barplot(dat, col=c("blue","red"), beside=TRUE, horiz=TRUE, xpd=FALSE, axes=FALSE, axisnames=TRUE, cex.names=0.8, las=2, xlim=c(0.5,1.0), main="Data Example")
axis(1, at=seq(0.5,1.0, by=0.1))
axis(2, at=mp, labels=FALSE, tick=FALSE)
points(x=0.635, y=26, pch="*", cex=2) #sign position at I
Let's say you have a vector telling you which pairs are significant. For example:
sign <- rep(TRUE, 16) ; sign[c(5, 7, 13:14)] <- FALSE
you already know the y coordinates of the letters:
colMeans(mp)
so you can define the y coordinates of the asterisks:
ord_sign <- colMeans(mp)[sign]
For the x coordinates, you can place them for example 0.01 point to the right from the max value:
abs_sign <- apply(dat, 2, max)[sign] + 0.01
Then you can draw all your asterisks at once:
points(x=abs_sign, y=ord_sign, pch="*", cex=2)
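The steps above can be wrapped into a small helper. This is only a sketch: add_stars and its offset argument are hypothetical names, and it assumes a horizontal, beside=TRUE barplot like the one in the question, where barplot returns a matrix of bar midpoints with one column per group.

```r
# Hypothetical helper: place a star next to each significant pair of bars
add_stars <- function(mp, heights, significant, offset = 0.01, ...) {
  stopifnot(ncol(mp) == length(significant))
  y <- colMeans(mp)[significant]                       # middle of each pair
  x <- apply(heights, 2, max)[significant] + offset    # just past the longer bar
  points(x = x, y = y, pch = "*", ...)
  invisible(data.frame(x = x, y = y))
}

set.seed(123)
dat <- matrix(runif(32, min = 0.5, max = 1), nrow = 2, ncol = 16,
              dimnames = list(NULL, LETTERS[1:16]))
mp <- barplot(dat, col = c("blue", "red"), beside = TRUE, horiz = TRUE,
              xpd = FALSE, las = 2, xlim = c(0.5, 1.05))

# Which pairs differ significantly (made-up flags, as in the answer)
is_sig <- rep(TRUE, 16)
is_sig[c(5, 7, 13:14)] <- FALSE

stars <- add_stars(mp, dat, is_sig, cex = 2)
```

The vector is called is_sig here rather than sign only to avoid masking the base function sign; the returned data frame holds the coordinates that were used, in case they need manual adjustment.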

R levelplot remove outer border (adjust plot border)

I'm creating a correlation heatmap in R with levelplot (lattice).
I'd like borders between the boxes, but not along the outside since it interferes with the plot border.
How can I remove the outer borders from the boxes?
Here is my code:
levelplot(matrix, border="black",
colorkey=list(height=.25, space="right", at=seq(-1, 1, .25), cuts=7),
scales=list(y=(list(cex=1)), tck = c(1,0), x=list(cex=1, rot=90)),
main="Leaf Correlations", xlab="", ylab="",
col.regions=scalebluered)
and here is what it looks like.. I don't like the double lines on the edges..
EDIT: here is a reproducible example:
data(mtcars)
cars.matrix <- as.matrix(mtcars[c(2:8)])
cars.corr <- cor(cars.matrix)
levelplot(cars.corr, border="black", colorkey=list(height=.25, space="right",
at=seq(-1, 1, .25), cuts=7),
scales=list(y=(list(cex=1)), tck = c(1,0), x=list(cex=1, rot=90)),
xlab="", ylab="")
OK, the fix for this is simple if a bit obscure.
Just use lattice.options() to reset the value of axis.padding used for factors, changing it from its default value of 0.6 (a little padding) to 0.5 (no padding), and you should be fine:
lattice.options(axis.padding=list(factor=0.5))
## An example to show that this works
data(mtcars)
cars.matrix <- as.matrix(mtcars[c(2:8)])
cars.corr <- cor(cars.matrix)
levelplot(cars.corr, border="black", colorkey=list(height=.25, space="right",
at=seq(-1, 1, .25), cuts=7),
scales=list(y=(list(cex=1)), tck = c(1,0), x=list(cex=1, rot=90)),
xlab="", ylab="")
For possibly-useful-future-reference, I figured this out by taking a quick look at the code used by prepanel.default.levelplot(). (The various prepanel.*** functions are responsible, among other things, for determining the coordinates and minimal area that should be allocated to each panel so that the objects to be plotted into it will all fit nicely.)
head(prepanel.default.levelplot, 4)
1 function (x, y, subscripts, ...)
2 {
3 pad <- lattice.getOption("axis.padding")$numeric
4 if (length(subscripts) > 0) {
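If changing the option globally is undesirable, the high-level lattice functions also accept a lattice.options argument that is applied only to that one plot. The following is a sketch under that assumption, reusing the mtcars example from the question; the variable before exists only to show that the global setting is left alone.

```r
library(lattice)

cars.corr <- cor(as.matrix(mtcars[2:8]))

# Remember the global padding so we can confirm it is not modified
before <- lattice.getOption("axis.padding")$factor

# lattice.options given here is used for this plot object only
p <- levelplot(cars.corr, border = "black",
               scales = list(x = list(rot = 90), tck = c(1, 0)),
               xlab = "", ylab = "",
               lattice.options = list(axis.padding = list(factor = 0.5)))
print(p)
```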
A bit of digging shows that a lot of the par commands may not make it to Lattice package graphics. For example, par(bty = 'n') won't work in this levelplot example.
Unlike base R graphs, lattice graphs are not affected by many of the options set in the par( ) function. To view the options that can be changed, look at help(xyplot). It is frequently easiest to set these options within the high level plotting functions ... you can write functions that modify the rendering of panels.
Try passing the axis color directly into the graphic, à la the method suggested by Yangchen Lin in "R lattice 3d plot: ticks disappear when changing panel border thickness":
axis.line = list(col='transparent')
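In lattice, a setting like this is normally supplied through the par.settings argument of the plotting call. A minimal sketch using the mtcars example from the question follows; note that making axis.line transparent hides the panel border and may also hide the tick marks, so treat it as one option rather than a drop-in fix.

```r
library(lattice)

cars.corr <- cor(as.matrix(mtcars[2:8]))

# par.settings changes graphical settings for this plot only
p <- levelplot(cars.corr, border = "black",
               scales = list(x = list(rot = 90), tck = c(1, 0)),
               xlab = "", ylab = "",
               par.settings = list(axis.line = list(col = "transparent")))
print(p)
```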

filled.contour() in R: nonlinear key range

I am using filled.contour() to plot data stored in a matrix. The data is generated by a (highly) non-linear function, hence its distribution is not uniform at all and the range is very large.
Consequently, I have to use the option "levels" to fine tune the plot. However, filled.contour() does not use these custom levels to make an appropriate color key for the heat map, which I find quite surprising.
Here is a simple example of what I mean:
x = c(20:200/100)
y = c(20:200/100)
z = as.matrix(exp(x^2)) %*% exp(y^2)
filled.contour(x=x,y=y,z=z,color.palette=colorRampPalette(c('green','yellow','red')),levels=c(1:60/3,30,50,150,250,1000,3000))
As you can see, the color key produced with the code above is pretty much useless. I would like to use some sort of projection (perhaps sin(x) or tanh(x)?), so that the upper range is not over-represented in the key (in a linear way).
At this point, I would like to:
1) know if there is something very simple/obvious I am missing, e.g.: an option to make this "key range adapting" automagically;
2) seek suggestions/help on how to do it myself, should the answer to 1) be negative.
Thanks a lot!
PS: I apologize for my English, which is far from perfect. Please let me know if you need me to clarify anything.
I feel your frustration. I never found a way to do this with filled contour, so have usually reverted to using image and then adding my own scale as a separate plot. I wrote the function image.scale to help out with this (link). Below is an example of how you can supply a log-transform to your scale in order to stretch out the small values - then label the scale with the non-log-transformed values as labels:
Example:
source("image.scale.R") # http://menugget.blogspot.de/2011/08/adding-scale-to-image-plot.html
x = c(20:200/100)
y = c(20:200/100)
z = as.matrix(exp(x^2)) %*% exp(y^2)
pal <- colorRampPalette(c('green','yellow','red'))
breaks <- c(1:60/3,30,50,150,250,1000,3000)
ncolors <- length(breaks)-1
labs <- c(0.5, 1, 3,30,50,150,250,1000,3000)
#x11(width=6, height=6)
layout(matrix(1:2, nrow=1, ncol=2), widths=c(5,1), heights=c(6))
layout.show(2)
par(mar=c(5,5,1,1))
image(x=x,y=y,z=log(z), col=pal(ncolors), breaks=log(breaks))
box()
par(mar=c(5,0,1,4))
image.scale(log(z), col=pal(ncolors), breaks=log(breaks), horiz=FALSE, xlab="", ylab="", xaxt="n", yaxt="n")
axis(4, at=log(labs), labels=labs)
box()
Result:
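An alternative that stays within filled.contour and needs no external script, offered as a sketch rather than as the approach of the answer above: plot log(z) with log-transformed levels, so the key is spaced on the log scale, and use the key.axes argument to label the key with the untransformed values. The values in labs are the same hand-picked tick positions as in the example above.

```r
x <- 20:200 / 100
y <- 20:200 / 100
z <- outer(exp(x^2), exp(y^2))   # same matrix as in the question

breaks <- c(1:60 / 3, 30, 50, 150, 250, 1000, 3000)
labs   <- c(0.5, 1, 3, 30, 50, 150, 250, 1000, 3000)
pal    <- colorRampPalette(c("green", "yellow", "red"))

# Contour the log of the data; label the key in original units
filled.contour(x = x, y = y, z = log(z),
               levels = log(breaks),
               col = pal(length(breaks) - 1),
               key.axes = axis(4, at = log(labs), labels = labs))
```

The log is just one choice of transform; any monotone function could be used, provided it is applied to z, to the levels, and to the tick positions alike.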

Histogram with Logarithmic Scale and custom breaks

I'm trying to generate a histogram in R with a logarithmic scale for y. Currently I do:
hist(mydata$V3, breaks=c(0,1,2,3,4,5,25))
This gives me a histogram, but the density between 0 to 1 is so great (about a million values difference) that you can barely make out any of the other bars.
Then I've tried doing:
mydata_hist <- hist(mydata$V3, breaks=c(0,1,2,3,4,5,25), plot=FALSE)
plot(mydata_hist$counts, log="xy", pch=20, col="blue")
It gives me sorta what I want, but the bottom shows me the values 1-6 rather than 0, 1, 2, 3, 4, 5, 25. It's also showing the data as points rather than bars. barplot works but then I don't get any bottom axis.
A histogram is a poor-man's density estimate. Note that in your call to hist() using default arguments, you get frequencies not probabilities -- add ,prob=TRUE to the call if you want probabilities.
As for the log axis problem, don't use 'x' if you do not want the x-axis transformed:
plot(mydata_hist$count, log="y", type='h', lwd=10, lend=2)
gets you bars on a log-y scale -- the look-and-feel is still a little different but can probably be tweaked.
Lastly, you can also do hist(log(x), ...) to get a histogram of the log of your data.
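To also get the bin edges on the bottom axis instead of 1 to 6, the default x-axis can be suppressed and redrawn with one label per bin. This is a sketch on simulated data: v3 is a made-up stand-in for mydata$V3, and the "0-1" style labels are just one way to name the bins.

```r
set.seed(42)
# Simulated stand-in for mydata$V3: almost everything falls between 0 and 1
v3 <- c(runif(1e5, 0, 1), runif(400, 1, 5), runif(40, 5, 25))

breaks <- c(0, 1, 2, 3, 4, 5, 25)
h <- hist(v3, breaks = breaks, plot = FALSE)

# One label per bin, built from consecutive break points
bin_labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# Thick vertical lines as bars on a log y-axis, without the default x-axis
plot(seq_along(h$counts), h$counts, log = "y", type = "h", lwd = 10, lend = 2,
     xaxt = "n", xlim = c(0.5, length(h$counts) + 0.5),
     xlab = "V3", ylab = "Count (log scale)")
axis(1, at = seq_along(h$counts), labels = bin_labels)
```

A bin with a count of zero cannot be shown on a log axis, so this only works when every bin is non-empty.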
Another option would be to use the package ggplot2.
ggplot(mydata, aes(x = V3)) + geom_histogram() + scale_x_log10()
It's not entirely clear from your question whether you want a logged x-axis or a logged y-axis. A logged y-axis is not a good idea when using bars because they are anchored at zero, which becomes negative infinity when logged. You can work around this problem by using a frequency polygon or density plot.
Dirk's answer is a great one. If you want an appearance like what hist produces, you can also try this:
buckets <- c(0,1,2,3,4,5,25)
mydata_hist <- hist(mydata$V3, breaks=buckets, plot=FALSE)
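# names.arg needs one label per bar, and there are length(buckets) - 1 bars
bp <- barplot(mydata_hist$counts, log="y", col="white", names.arg=paste(head(buckets, -1), tail(buckets, -1), sep="-"))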
text(bp, mydata_hist$counts, labels=mydata_hist$counts, pos=1)
The last line is optional, it adds value labels just under the top of each bar. This can be useful for log scale graphs, but can also be omitted.
I also pass main, xlab, and ylab parameters to provide a plot title, x-axis label, and y-axis label.
Run the hist() function without making a graph, log-transform the counts, and then draw the figure.
hist.data = hist(my.data, plot=F)
hist.data$counts = log(hist.data$counts, 2)
plot(hist.data)
It should look just like the regular histogram, but the y-axis will be log2 Frequency.
I've put together a function that behaves identically to hist in the default case, but accepts the log argument. It uses several tricks from other posters, but adds a few of its own. hist(x) and myhist(x) look identical.
The original problem would be solved with:
myhist(mydata$V3, breaks=c(0,1,2,3,4,5,25), log="xy")
The function:
myhist <- function(x, ..., breaks="Sturges",
main = paste("Histogram of", xname),
xlab = xname,
ylab = "Frequency") {
xname = paste(deparse(substitute(x), 500), collapse="\n")
h = hist(x, breaks=breaks, plot=FALSE)
plot(h$breaks, c(NA,h$counts), type='S', main=main,
xlab=xlab, ylab=ylab, axes=FALSE, ...)
axis(1)
axis(2)
lines(h$breaks, c(h$counts,NA), type='s')
lines(h$breaks, c(NA,h$counts), type='h')
lines(h$breaks, c(h$counts,NA), type='h')
lines(h$breaks, rep(0,length(h$breaks)), type='S')
invisible(h)
}
Exercise for the reader: Unfortunately, not everything that works with hist works with myhist as it stands. That should be fixable with a bit more effort, though.
Here's a pretty ggplot2 solution:
library(ggplot2)
library(scales) # makes pretty labels on the x-axis
breaks=c(0,1,2,3,4,5,25)
ggplot(mydata,aes(x = V3)) +
geom_histogram(breaks = log10(breaks)) +
scale_x_log10(
breaks = breaks,
labels = scales::trans_format("log10", scales::math_format(10^.x))
)
Note that to set the breaks in geom_histogram, they had to be transformed to work with scale_x_log10

Resources