While plotting histogarm, scatterplots and other plots with axes scaled to logarithmic scale in R, how is it possible to use labels such as 10^-1 10^0 10^1 10^2 10^3 and so on instead of the axes showing just -1, 0, 1, 2, 3 etc. What parameters should be added to the commands such as hist(), plot() etc?
Apart from the solution of ggplot2 (see gsk3's comment), I would like to add that this happens automatically in plot() as well when using the correct arguments, eg :
x <- 1:10
y <- exp(1:10)
plot(x,y,log="y")
You can use the parameter log="x" for the X axis, or log="xy" for both.
If you want to format the numbers, or you have the data in log format, you can do a workaround using axis(). Some interesting functions :
axTicks(x) gives you the location of the ticks on the X-axis (x=1) or Y-axis (x=2)
bquote() converts expressions to language, but can replace a variable with its value. More information on bquote() in the question Latex and variables in plot label in R? .
as.expression() makes the language object coming from bquote() an expression. This allows axis() to do the formatting as explained in ?plotmath. It can't do so with language objects.
An example for nice formatting :
x <- y <- 1:10
plot(x,y,yaxt="n")
aty <- axTicks(2)
labels <- sapply(aty,function(i)
as.expression(bquote(10^ .(i)))
)
axis(2,at=aty,labels=labels)
Which gives
Here is a different way to draw this type of axis:
plot(NA, xlim=c(0,10), ylim=c(1, 10^4), xlab="x", ylab="y", log="y", yaxt="n")
at.y <- outer(1:9, 10^(0:4))
lab.y <- ifelse(log10(at.y) %% 1 == 0, at.y, NA)
axis(2, at=at.y, labels=lab.y, las=1)
EDIT: This is also solved in latticeExtra with scale.components
In ggplot2 you just can add a
... +
scale_x_log10() +
scale_y_log10(limits = c(1e-4,1), breaks=c(1e-4,1e-3,1e-2,0.1,1)) + ...
to scale your axis, Label them and add custom breaks.
Related
I would like to represent two-dimensional data as bars, placed over the x-axis values, but barplot() does not allow to control x-axis placement, and plot() does not draw bars:
x <- c(1, 2, 3, 5)
y <- 1:4
plot(x, y, type = "h")
barplot(y)
Click for an image illustrating the plot() and barplot() examples.
I understand that I can plot a histogram –
hist(rep(x, y), breaks = seq(min(x) - 0.5, max(x) + 0.5, 1))
Click for an image illustrating the hist() example.
– but the recreation of the original (non-frequency) data and the calculation of the breaks is not always as straightforward as in this example, so:
Is there a way to force plot() to draw bars?
Or is there a way to force barplot() to place the bars at specific values on the x-axis?
Basically, what I would like is something like:
barplot(y, at = x)
I would prefer to use base R and avoid ggplot.
While I agree with #Dave2e that a barplot may not be the best way to represent your data, you can get something like what you are describing by starting with a blank plot and drawing the relevant rectangles. I am using your y values (1:4) and the x values that you mentioned in your comment. I am not sure what you want on the x-axis, but I show labels for the x-values that you give. In order to look like a barplot, I suppress the tick marks on the x-axis.
plot(NULL, xlim=c(0,11), ylim=c(0,4.5), bty="n",
xaxt="n", xaxs="i", yaxs="i", xlab="", ylab="")
rect(x-0.5, 0, x+0.5, y, col="gray")
axis(side=1, at=x, col.ticks=NA)
I'm trying to create a stripplot in which the name of my parameters on the x axis are greek letters. It is quite logical then that I want my plot to be label with greek letters and not "theta", "rho" and "tau" subscripts. My strategy is to hide the original labels and plot the greek letters on top. The following example shows what I have tried so far:
library(lattice)
data <- data.frame(Parameters=c("theta","rho","tau"),val1=c(1,2,4),val2=c(2,3,4))
png("plot.png")
stripplot(val1 + val2 ~ Parameters, data = data, pch=c(1,2), cex=2,
scales=list(cex=c(0,1.5)),
xlab=c(expression(rho),expression(tau),expression(theta)),
ylab=NULL,
xaxt='n',
)
dev.off()
But this piece of code weirdly eliminates the labels in the y axis too.
Try 1:
I also followed the advice in How to hide x-axis in lattice R but the results preserves the names in the x-axis.
Try 2:
How can I eliminate ONLY the labels on the x-axis without altering the y axis?
Have a look at the help page for ?stripplot, in particular the
scales argument.
Generally a list determining how the x- and y-axes (tick marks and labels) are drawn.
So use this argument to pass the size and labels argument for the x-axis.
stripplot(val1 + val2 ~ Parameters, data = dat, pch=c(1,2), cex=2,
scales=list(x=list(cex=1.5,
labels=c(expression(rho),expression(tau),expression(theta))
)))
This is clunky but would help if you had many x-axis elements.
labels=as.expression(sapply(levels(dat$Parameters), function(x) as.name(x)))
or
labels=parse(text=as.character(sort(dat$Parameters)))
library(lattice)
data <- data.frame(Parameters=c("theta","rho","tau"),val1=c(1,2,4),val2=c(2,3,4))
png("plot.png")
stripplot(val1 + val2 ~ Parameters, data = data, pch=c(1,2), cex=2,
scales=list(cex=c(0,1.5)),
xlab=c(expression(rho),expression(tau),expression(theta)),
xaxt='n',
)
dev.off()
I regularly do all kinds of scatter plots in R using the plot command.
Sometimes both, sometimes only one of the plot axes is labelled in scientific notation. I do not understand when R makes the decision to switch to scientific notation. Surprisingly, it often prints numbers which no sane human would write in scientific notation when labelling a plot, for example it labels 5 as 5e+00. Let's say you have a log-axis going up to 1000, scientific notation is unjustified with such "small" numbers.
I would like to suppress that behaviour, I always want R to display integer values. Is this possible?
I tried options(scipen=10) but then it starts writing 5.0 instead of 5, while on the other axis 5 is still 5 etc. How can I have pure integer values in my R plots?
I am using R 2.12.1 on Windows 7.
Use options(scipen=5) or some other high enough number. The scipen option determines how likely R is to switch to scientific notation, the higher the value the less likely it is to switch. Set the option before making your plot, if it still has scientific notation, set it to a higher number.
You can use format or formatC to, ahem, format your axis labels.
For whole numbers, try
x <- 10 ^ (1:10)
format(x, scientific = FALSE)
formatC(x, digits = 0, format = "f")
If the numbers are convertable to actual integers (i.e., not too big), you can also use
formatC(x, format = "d")
How you get the labels onto your axis depends upon the plotting system that you are using.
Try this. I purposely broke out various parts so you can move things around.
library(sfsmisc)
#Generate the data
x <- 1:100000
y <- 1:100000
#Setup the plot area
par(pty="m", plt=c(0.1, 1, 0.1, 1), omd=c(0.1,0.9,0.1,0.9))
#Plot a blank graph without completing the x or y axis
plot(x, y, type = "n", xaxt = "n", yaxt="n", xlab="", ylab="", log = "x", col="blue")
mtext(side=3, text="Test Plot", line=1.2, cex=1.5)
#Complete the x axis
eaxis(1, padj=-0.5, cex.axis=0.8)
mtext(side=1, text="x", line=2.5)
#Complete the y axis and add the grid
aty <- seq(par("yaxp")[1], par("yaxp")[2], (par("yaxp")[2] - par("yaxp")[1])/par("yaxp")[3])
axis(2, at=aty, labels=format(aty, scientific=FALSE), hadj=0.9, cex.axis=0.8, las=2)
mtext(side=2, text="y", line=4.5)
grid()
#Add the line last so it will be on top of the grid
lines(x, y, col="blue")
You can use the axis() command for that, eg :
x <- 1:100000
y <- 1:100000
marks <- c(0,20000,40000,60000,80000,100000)
plot(x,y,log="x",yaxt="n",type="l")
axis(2,at=marks,labels=marks)
gives :
EDIT : if you want to have all of them in the same format, you can use the solution of #Richie to get them :
x <- 1:100000
y <- 1:100000
format(y,scientific=FALSE)
plot(x,y,log="x",yaxt="n",type="l")
axis(2,at=marks,labels=format(marks,scientific=FALSE))
You could try lattice:
require(lattice)
x <- 1:100000
y <- 1:100000
xyplot(y~x, scales=list(x = list(log = 10)), type="l")
The R graphics package has the function axTicks that returns the tick locations of the ticks that the axis and plot functions would set automatically. The other answers given to this question define the tick locations manually which might not be convenient in some situations.
myTicks = axTicks(1)
axis(1, at = myTicks, labels = formatC(myTicks, format = 'd'))
A minimal example would be
plot(10^(0:10), 0:10, log = 'x', xaxt = 'n')
myTicks = axTicks(1)
axis(1, at = myTicks, labels = formatC(myTicks, format = 'd'))
There is also an log parameter in the axTicks function but in this situation it does not need to be set to get the proper logarithmic axis tick location.
Normally setting axis limit # max of your variable is enough
a <- c(0:1000000)
b <- c(0:1000000)
plot(a, b, ylim = c(0, max(b)))
I regularly do all kinds of scatter plots in R using the plot command.
Sometimes both, sometimes only one of the plot axes is labelled in scientific notation. I do not understand when R makes the decision to switch to scientific notation. Surprisingly, it often prints numbers which no sane human would write in scientific notation when labelling a plot, for example it labels 5 as 5e+00. Let's say you have a log-axis going up to 1000, scientific notation is unjustified with such "small" numbers.
I would like to suppress that behaviour, I always want R to display integer values. Is this possible?
I tried options(scipen=10) but then it starts writing 5.0 instead of 5, while on the other axis 5 is still 5 etc. How can I have pure integer values in my R plots?
I am using R 2.12.1 on Windows 7.
Use options(scipen=5) or some other high enough number. The scipen option determines how likely R is to switch to scientific notation, the higher the value the less likely it is to switch. Set the option before making your plot, if it still has scientific notation, set it to a higher number.
You can use format or formatC to, ahem, format your axis labels.
For whole numbers, try
x <- 10 ^ (1:10)
format(x, scientific = FALSE)
formatC(x, digits = 0, format = "f")
If the numbers are convertable to actual integers (i.e., not too big), you can also use
formatC(x, format = "d")
How you get the labels onto your axis depends upon the plotting system that you are using.
Try this. I purposely broke out various parts so you can move things around.
library(sfsmisc)
#Generate the data
x <- 1:100000
y <- 1:100000
#Setup the plot area
par(pty="m", plt=c(0.1, 1, 0.1, 1), omd=c(0.1,0.9,0.1,0.9))
#Plot a blank graph without completing the x or y axis
plot(x, y, type = "n", xaxt = "n", yaxt="n", xlab="", ylab="", log = "x", col="blue")
mtext(side=3, text="Test Plot", line=1.2, cex=1.5)
#Complete the x axis
eaxis(1, padj=-0.5, cex.axis=0.8)
mtext(side=1, text="x", line=2.5)
#Complete the y axis and add the grid
aty <- seq(par("yaxp")[1], par("yaxp")[2], (par("yaxp")[2] - par("yaxp")[1])/par("yaxp")[3])
axis(2, at=aty, labels=format(aty, scientific=FALSE), hadj=0.9, cex.axis=0.8, las=2)
mtext(side=2, text="y", line=4.5)
grid()
#Add the line last so it will be on top of the grid
lines(x, y, col="blue")
You can use the axis() command for that, eg :
x <- 1:100000
y <- 1:100000
marks <- c(0,20000,40000,60000,80000,100000)
plot(x,y,log="x",yaxt="n",type="l")
axis(2,at=marks,labels=marks)
gives :
EDIT : if you want to have all of them in the same format, you can use the solution of #Richie to get them :
x <- 1:100000
y <- 1:100000
format(y,scientific=FALSE)
plot(x,y,log="x",yaxt="n",type="l")
axis(2,at=marks,labels=format(marks,scientific=FALSE))
You could try lattice:
require(lattice)
x <- 1:100000
y <- 1:100000
xyplot(y~x, scales=list(x = list(log = 10)), type="l")
The R graphics package has the function axTicks that returns the tick locations of the ticks that the axis and plot functions would set automatically. The other answers given to this question define the tick locations manually which might not be convenient in some situations.
myTicks = axTicks(1)
axis(1, at = myTicks, labels = formatC(myTicks, format = 'd'))
A minimal example would be
plot(10^(0:10), 0:10, log = 'x', xaxt = 'n')
myTicks = axTicks(1)
axis(1, at = myTicks, labels = formatC(myTicks, format = 'd'))
There is also an log parameter in the axTicks function but in this situation it does not need to be set to get the proper logarithmic axis tick location.
Normally setting axis limit # max of your variable is enough
a <- c(0:1000000)
b <- c(0:1000000)
plot(a, b, ylim = c(0, max(b)))
I'm trying to generate a histogram in R with a logarithmic scale for y. Currently I do:
hist(mydata$V3, breaks=c(0,1,2,3,4,5,25))
This gives me a histogram, but the density between 0 to 1 is so great (about a million values difference) that you can barely make out any of the other bars.
Then I've tried doing:
mydata_hist <- hist(mydata$V3, breaks=c(0,1,2,3,4,5,25), plot=FALSE)
plot(rpd_hist$counts, log="xy", pch=20, col="blue")
It gives me sorta what I want, but the bottom shows me the values 1-6 rather than 0, 1, 2, 3, 4, 5, 25. It's also showing the data as points rather than bars. barplot works but then I don't get any bottom axis.
A histogram is a poor-man's density estimate. Note that in your call to hist() using default arguments, you get frequencies not probabilities -- add ,prob=TRUE to the call if you want probabilities.
As for the log axis problem, don't use 'x' if you do not want the x-axis transformed:
plot(mydata_hist$count, log="y", type='h', lwd=10, lend=2)
gets you bars on a log-y scale -- the look-and-feel is still a little different but can probably be tweaked.
Lastly, you can also do hist(log(x), ...) to get a histogram of the log of your data.
Another option would be to use the package ggplot2.
ggplot(mydata, aes(x = V3)) + geom_histogram() + scale_x_log10()
It's not entirely clear from your question whether you want a logged x-axis or a logged y-axis. A logged y-axis is not a good idea when using bars because they are anchored at zero, which becomes negative infinity when logged. You can work around this problem by using a frequency polygon or density plot.
Dirk's answer is a great one. If you want an appearance like what hist produces, you can also try this:
buckets <- c(0,1,2,3,4,5,25)
mydata_hist <- hist(mydata$V3, breaks=buckets, plot=FALSE)
bp <- barplot(mydata_hist$count, log="y", col="white", names.arg=buckets)
text(bp, mydata_hist$counts, labels=mydata_hist$counts, pos=1)
The last line is optional, it adds value labels just under the top of each bar. This can be useful for log scale graphs, but can also be omitted.
I also pass main, xlab, and ylab parameters to provide a plot title, x-axis label, and y-axis label.
Run the hist() function without making a graph, log-transform the counts, and then draw the figure.
hist.data = hist(my.data, plot=F)
hist.data$counts = log(hist.data$counts, 2)
plot(hist.data)
It should look just like the regular histogram, but the y-axis will be log2 Frequency.
I've put together a function that behaves identically to hist in the default case, but accepts the log argument. It uses several tricks from other posters, but adds a few of its own. hist(x) and myhist(x) look identical.
The original problem would be solved with:
myhist(mydata$V3, breaks=c(0,1,2,3,4,5,25), log="xy")
The function:
myhist <- function(x, ..., breaks="Sturges",
main = paste("Histogram of", xname),
xlab = xname,
ylab = "Frequency") {
xname = paste(deparse(substitute(x), 500), collapse="\n")
h = hist(x, breaks=breaks, plot=FALSE)
plot(h$breaks, c(NA,h$counts), type='S', main=main,
xlab=xlab, ylab=ylab, axes=FALSE, ...)
axis(1)
axis(2)
lines(h$breaks, c(h$counts,NA), type='s')
lines(h$breaks, c(NA,h$counts), type='h')
lines(h$breaks, c(h$counts,NA), type='h')
lines(h$breaks, rep(0,length(h$breaks)), type='S')
invisible(h)
}
Exercise for the reader: Unfortunately, not everything that works with hist works with myhist as it stands. That should be fixable with a bit more effort, though.
Here's a pretty ggplot2 solution:
library(ggplot2)
library(scales) # makes pretty labels on the x-axis
breaks=c(0,1,2,3,4,5,25)
ggplot(mydata,aes(x = V3)) +
geom_histogram(breaks = log10(breaks)) +
scale_x_log10(
breaks = breaks,
labels = scales::trans_format("log10", scales::math_format(10^.x))
)
Note that to set the breaks in geom_histogram, they had to be transformed to work with scale_x_log10