Related
I have data that is mostly centered in a small range (1-10) but there is a significant number of points (say, 10%) which are in (10-1000). I would like to plot a histogram for this data that will focus on (1-10) but will also show the (10-1000) data. Something like a log-scale for th histogram.
Yes, i know this means not all bins are of equal size
A simple hist(x) gives
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000))) gives
none of which is what I want.
update
following the answers here I now produce something that is almost exactly what I want (I went with a continuous plot instead of bar-histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)![alt text][3]
the only problem is that I'd like to match between the scale and the actual bars plotted. There two options for doing that : the one is simply use the actual margins of the plotted bars (how?) then get "ugly" x-axis labels like 1.1754,1.2985 etc. The other, which I prefer, is to control the actual bins margins used so they will match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to workaround above.
Using ggplot2 seems like the most easy option. If you want more control over your axes and your breaks, you can do something like the following :
EDIT : new code provided
x <- c(rexp(1000,0.5)+0.5,rexp(100,0.5)*100)
breaks<- c(0,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000)
major <- c(0.1,1,10,100,1000,10000)
H <- hist(log10(x),plot=F)
plot(H$mids,H$counts,type="n",
xaxt="n",
xlab="X",ylab="Counts",
main="Histogram of X",
bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=T,freq=T,col="blue")
#Position of ticks
at <- log10(breaks)
#Creation X axis
axis(1,at=at,labels=10^at)
This is as close as I can get to the ggplot2. Putting the background grey is not that straightforward, but doable if you define a rectangle with the size of your plot screen and put the background as grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
A dynamic graph would also help in this plot. Use the manipulate package from Rstudio to do a dynamic ranged histogram:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you will be able to use sliders to see the particular distribution in a dynamically selected range like this:
I´m working with R scatterplot3D and I need to use expression() in labels because I have to use some Greek letters;
my question is: is there a way to pull the y.lab name down or write it along the axis (in a diagonal position)? I went to help and packages description but nothing seems to work;
thanks in advance for any help
Maria
library(scatterplot3d)
par(mfrow=c(1,1))
A <- c(3,2,3,3,2)
B <- c(2,4,5,3,4)
D <- c(4,3,4,2,3)
scatterplot3d(A,D,B, xlab=expression(paste(x[a],"-",x[b])),
ylab=expression(x[a]),
zlab=expression(sigma^2))
You can't use any of the classic ways due to the way the scatterplot3d() function constructs the plot. It's basically plotted on top of a classic plot pane, which means the axis labels are bound to the classic positions. The z-label is printed at the real left Y-axis, and the y label is printed at the real right Y-axis.
You can use text() to get around this:
use par("usr") to get the limits of the X and Y coordinates
calculate the position you want the label on (at 90% of the horizontal position and 8% of the vertical position for example.)
use text() to place it (and possibly the parameter srt to turn the label)
This makes it a bit more generic, so you don't have try different values for every new plot you make.
Example :
scatterplot3d(A,D,B, xlab=expression(paste(x[a],"-",x[b])),
ylab="",
zlab=expression(sigma^2))
dims <- par("usr")
x <- dims[1]+ 0.9*diff(dims[1:2])
y <- dims[3]+ 0.08*diff(dims[3:4])
text(x,y,expression(x[a]),srt=45)
Gives
scatterplot3d(A,D,B, xlab=expression(paste(x[a],"-",x[b])),
ylab="",
zlab=expression(sigma^2))
mtext( expression(x[a]), side=4,las=2,padj=18, line=-4)
One does need to use fairly extreme parameter values to get the expression in the right place in that transformed spatial projection.
I have created my bar plot in R and am moving on from using two-bar plots to three-bars.
I want to label the bars '1','2' and '3'along the x-axis
I understand how to use the axis function, and also that the 'at' function requires a numeric value or vector, it's just thatI don't know exactly what value it is that it requires!
Thanks very much in advance
Great news! You can use whatever values you like!
mat <- matrix(c(14,9,7))
barplot(mat[ ,1])
axis(1, at = c(.5,1.5,3.5), labels = 1:3, tick = FALSE, las = 2)
I want to plot a barplot of some data with some x-axis labels but so far I just keep running into the same problem, as the axis scaling is completely off limits and therefore my labels are wrongly positioned below the bars.
The most simple example I can think of:
x = c(1:81)
barplot(x)
axis(side=1,at=c(0,20,40,60,80),labels=c(20,40,60,80,100))
As you can see, the x-axis does not stretch along the whole plot but stops somewhere in between. It seems to me as if the problem is quite simple, but I somehow I am not able to fix it and I could not find any solution so far :(
Any help is greatly appreciated.
The problem is that barplot is really designed for plotting categorical, not numeric data, and as such it pretty much does its own thing in terms of setting up the horizontal axis scale. The main way to get around this is to recover the actual x-positions of the bar midpoints by saving the results of barplot to a variable, but as you can see below I haven't come up with an elegant way of doing what you want in base graphics. Maybe someone else can do better.
x = c(1:81)
b <- barplot(x)
## axis(side=1,at=c(0,20,40,60,80),labels=c(20,40,60,80,100))
head(b)
You can see here that the actual midpoint locations are 0.7, 1.9, 3.1, ... -- not 1, 2, 3 ...
This is pretty quick, if you don't want to extend the axis from 0 to 100:
b <- barplot(x)
axis(side=1,at=b[c(20,40,60,80)],labels=seq(20,80,by=20))
This is my best shot at doing it in base graphics:
b <- barplot(x,xlim=c(0,120))
bdiff <- diff(b)[1]
axis(side=1,at=c(b[1]-bdiff,b[c(20,40,60,80)],b[81]+19*bdiff),
labels=seq(0,100,by=20))
You can try this, but the bars aren't as pretty:
plot(x,type="h",lwd=4,col="gray",xlim=c(0,100))
Or in ggplot:
library(ggplot2)
d <- data.frame(x=1:81)
ggplot(d,aes(x=x,y=x))+geom_bar(stat="identity",fill="lightblue",
colour="gray")+xlim(c(0,100))
Most statistical graphics nerds will tell you that graphing quantitative (x,y) data is better done with points or lines rather than bars (non-data-ink, Tufte, blah blah blah :-) )
Not sure exactly what you wnat, but If it is to have the labels running from one end to the other evenly places (but not necessarily accurately), then:
x = c(1:81)
bp <- barplot(x)
axis(side=1,at=bp[1+c(0,20,40,60,80)],labels=c(20,40,60,80,100))
The puzzle for me was why you wanted to label "20" at 0. But this is one way to do it.
I run into the same annoying property of batplots - the x coordinates go wild. I would add one another way to show the problem, and that is adding more lines to the plot.
x = c(1:81)
barplot(x)
axis(side=1,at=c(0,20,40,60,80),labels=c(20,40,60,80,100))
lines(c(81,81), c(0, 100)) # this should cross the last bar, but it does not
The best I came with was to define a new barplot function that will take also the parameter "at" for plotting positions of the bars.
barplot_xscaled <- function(bar_heights, at = NA, width = 0.5, col = 'grey'){
if ( is.na(at) ){
at <- c(1:length(bar_heights))
}
plot(bar_heights, type="n", xlab="", ylab="",
ylim=c(0, max(bar_heights)), xlim=range(at), bty = 'n')
for ( i in 1:length(bar_heights)){
rect(at[i] - width, 0, at[i] + width, bar_heights[i], col = col)
}
}
barplot_xscaled(x)
lines(c(81, 81), c(0, 100))
The lines command crosses the last bar - the x scale works just as naively expected, but you could also now define whatever positions of the bars you would like (you could play more with the function a bit to have the same properties as other R plotting functions).
I have data that is mostly centered in a small range (1-10) but there is a significant number of points (say, 10%) which are in (10-1000). I would like to plot a histogram for this data that will focus on (1-10) but will also show the (10-1000) data. Something like a log-scale for th histogram.
Yes, i know this means not all bins are of equal size
A simple hist(x) gives
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000))) gives
none of which is what I want.
update
following the answers here I now produce something that is almost exactly what I want (I went with a continuous plot instead of bar-histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)![alt text][3]
the only problem is that I'd like to match between the scale and the actual bars plotted. There two options for doing that : the one is simply use the actual margins of the plotted bars (how?) then get "ugly" x-axis labels like 1.1754,1.2985 etc. The other, which I prefer, is to control the actual bins margins used so they will match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to workaround above.
Using ggplot2 seems like the most easy option. If you want more control over your axes and your breaks, you can do something like the following :
EDIT : new code provided
x <- c(rexp(1000,0.5)+0.5,rexp(100,0.5)*100)
breaks<- c(0,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000)
major <- c(0.1,1,10,100,1000,10000)
H <- hist(log10(x),plot=F)
plot(H$mids,H$counts,type="n",
xaxt="n",
xlab="X",ylab="Counts",
main="Histogram of X",
bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=T,freq=T,col="blue")
#Position of ticks
at <- log10(breaks)
#Creation X axis
axis(1,at=at,labels=10^at)
This is as close as I can get to the ggplot2. Putting the background grey is not that straightforward, but doable if you define a rectangle with the size of your plot screen and put the background as grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
A dynamic graph would also help in this plot. Use the manipulate package from Rstudio to do a dynamic ranged histogram:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you will be able to use sliders to see the particular distribution in a dynamically selected range like this: