I need to plot a vector of numbers. Let's say these numbers range from 0 to 1000. I need to make a histogram where the x axis goes from 100 to 500, and I want to specify the number of bins to be 10. How do I do this?
I know how to use xlim and breaks separately, but I don't know how to get a given number of bins inside a custom range.
This is a very good question, actually! It has bothered me for a long time, and your question finally pushed me to solve it :-)
In this case we cannot simply call hist(x, xlim = c(100, 500), breaks = 10), because breaks applies to the whole range of x and is unrelated to xlim (in other words, xlim is used only for plotting, not for computing the histogram and setting the actual breaks). This is a clear flaw of the hist function, and the documentation offers no simple remedy.
I think the easiest way out is to "xlim" the values before they go to the hist function:
x <- runif(1000, 0, 1000) # example data
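hist(x[x > 100 & x < 500], breaks = seq(100, 500, length.out = 11))
A single number passed to breaks is only a suggested number of cells (hist passes it to pretty()), so for exactly 10 bins give the 11 break points explicitly.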
For more info see ?hist
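As a minimal sketch of the subsetting approach above, the snippet below computes the histogram on the restricted values with explicit break points and checks the bin count before drawing. The seed, the names inside and brks, and the plot title are assumptions made for illustration only.

```r
set.seed(42)                      # assumed seed, only for reproducibility
x <- runif(1000, 0, 1000)         # example data, as in the answer above

# Keep only the values inside the range of interest
inside <- x[x >= 100 & x <= 500]

# 11 equally spaced break points give exactly 10 bins of width 40
brks <- seq(100, 500, length.out = 11)

# Compute without plotting to inspect the result
h <- hist(inside, breaks = brks, plot = FALSE)
length(h$counts)                  # number of bins actually used

# Draw the histogram over the same range
hist(inside, breaks = brks, xlim = c(100, 500),
     main = "10 bins between 100 and 500", xlab = "x")
```

Because the break points are passed as a vector, hist() uses them as given instead of treating the bin count as a suggestion.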
I'd like to plot a dataset that consists of two vectors of length 100. The mean difference of the vectors being high and the variance of each of them being considerably smaller, it is quite difficult to plot both vectors and still be able to see the variation within each vector.
What I'd like is to be able to manually set the breaks so that we can see both the difference between the vectors and the variation within them.
Consider this data set
a=rnorm(100,sd=0.005)+1
b=rnorm(100,sd=0.005)+10
vec = c(a,b)
Neither plot(vec) nor plot(vec,log="y") gives satisfying results, as it is not possible to distinguish the variation within the vector (see picture).
I'd like the breaks on the y-axis to be (min(a), max(a), 5, min(b), max(b)) (and get equal distance between them). How could one achieve that?
Depending on exactly what you are trying to do, a simple transformation of the data in each part of the vector might be enough:
vec2 <- c( (a - min(a))/ (max(a)-min(a)) , 3 + (b - min(b))/ (max(b)-min(b)) )
plot(vec2, axes=F)
box()
axis(1)
axis(2, at=c(0,1,2,3,4), labels = round(c(min(a), max(a), 5, min(b), max(b)),2))
Alternative approaches might be a custom transformation in ggplot, a secondary axis in ggplot, breaking the graph into facets, or using ggbreak.
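One of the alternatives mentioned above, splitting the graph into facets, can be approximated in base graphics with two stacked panels. This is only a sketch under assumed settings (the seed and the margin values are arbitrary), not the only way to do it.

```r
set.seed(1)                       # assumed seed, only for reproducibility
a <- rnorm(100, sd = 0.005) + 1
b <- rnorm(100, sd = 0.005) + 10

# Two stacked panels, each with its own y-axis range
op <- par(mfrow = c(2, 1), mar = c(2, 4, 1, 1))
plot(b, ylab = "b")               # upper panel: values around 10
plot(a, ylab = "a")               # lower panel: values around 1
par(op)                           # restore the previous graphics settings
```

Each panel picks its own y limits, so the variation within a and within b stays visible, at the cost of no longer sharing a single axis.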
I have really skewed data and I want to set my histogram's last bin to cover everything from a threshold value to infinity, so that the histogram does not look so skewed. I know we can use xlim or coord_cartesian to zoom in, but I want to keep all the data.
library(ggplot2)
x=data.frame(x=100*rbeta(10000,2,50)) # name the column so aes(x) can find it
ggplot(data=x,aes(x))+geom_histogram(bins=20)+scale_x_continuous(breaks =seq(1,100,by=5))
The accepted answer will get a little ugly if the aggregated bin gets too big. You can map the values (mapvalues is from the plyr package):
x <- mapvalues(x,
from = c(aggBinLow:aggBinHigh),
to = c(rep.int(aggBinLow,aggBinHigh-aggBinLow+1)))
and add a scale with distinct values:
g +
scale_x_continuous(breaks=min:aggBinLow,labels=c(sprintf("%s",min:aggBinLow-1),">aggBinLow-1"))
Use geom_histogram(breaks=c(...)) to set customised bins, where c(...) is the vector of values you want. For example c(seq(from=1,to=11,by=1),100000)
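A minimal sketch of the custom-breaks idea, assuming ggplot2 is installed: values above a threshold are capped so they all land in one last bin of ordinary width, and that bin is labelled as open-ended. The threshold of 10, the column name x_capped, and the label format are hypothetical choices for illustration.

```r
library(ggplot2)

set.seed(1)                                   # assumed seed
df <- data.frame(x = 100 * rbeta(10000, 2, 50))

threshold <- 10                               # hypothetical cutoff for the last bin

# Cap large values so everything above the threshold falls into (10, 11]
df$x_capped <- pmin(df$x, threshold + 0.5)

# Unit-width bins from 0 up to one bin past the threshold
brks <- seq(0, threshold + 1, by = 1)

# One tick per bin centre; the last one is labelled as open-ended
tick_at  <- 0:threshold + 0.5
tick_lab <- c(paste0(0:(threshold - 1), "-", 1:threshold),
              paste0(">", threshold))

p <- ggplot(df, aes(x_capped)) +
  geom_histogram(breaks = brks) +
  scale_x_continuous(breaks = tick_at, labels = tick_lab)
print(p)
```

All 10000 observations stay in the plot; only the position of the large values on the x axis is changed, which the ">10" label makes explicit.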
I am creating a plot in R and I don't like the x axis values being plotted by R.
For example:
x <- seq(10,200,10)
y <- runif(x)
plot(x,y)
This plots a graph with the following values on the X axis:
50, 100, 150, 200
However, I want to plot the 20 values 10,20, 30 ... 200 stored in variable x, as the X axis values. I have scoured through countless blogs and the terse manual - after hours of searching, the closest I've come to finding anything useful is the following (summarized) instructions:
call plot() or par(), specifying argument xaxt='n'
call axis() e.g. axis(side = 1, at = seq(0, 10, by = 0.1), labels = FALSE, tcl = -0.2)
I tried it and the resulting plot had no x axis values at all. Is it possible that someone out there knows how to do this? I can't believe that no one has ever tried to do this before.
You'll find the answer to your question in the help page for ?axis.
Here is one of the help page examples, modified with your data:
Option 1: use xaxp to define the axis labels
plot(x,y, xaxt="n")
axis(1, xaxp=c(10, 200, 19), las=2)
Option 2: Use at and seq() to define the labels:
plot(x,y, xaxt="n")
axis(1, at = seq(10, 200, by = 10), las=2)
Both of these options yield the same graphic.
PS. Since you have a large number of labels, you'll have to use additional arguments to get the text to fit in the plot. I use las to rotate the labels.
Take a closer look at the ?axis documentation. If you look at the description of the labels argument, you'll see that it is:
"a logical value specifying whether (numerical) annotations are
to be made at the tickmarks,"
So, just change it to TRUE, and you'll get your tick labels.
x <- seq(10,200,10)
y <- runif(x)
plot(x,y,xaxt='n')
axis(side = 1, at = x, labels = TRUE)
# Since TRUE is the default for labels, you can just use axis(side=1,at=x)
Be careful that if you don't stretch your window width, then R might not be able to write all your labels in. Play with the window width and you'll see what I mean.
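If resizing the window is not an option, rotating and shrinking the labels usually leaves room for all of them, since axis() omits labels that would overlap. A small sketch, where the cex.axis value of 0.7 and the seed are arbitrary assumptions:

```r
x <- seq(10, 200, 10)
set.seed(1)                       # assumed seed, only for reproducibility
y <- runif(length(x))

plot(x, y, xaxt = "n")            # suppress the default x axis

# las = 2 draws labels perpendicular to the axis,
# cex.axis < 1 shrinks them so all 20 fit
axis(side = 1, at = x, las = 2, cex.axis = 0.7)
```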
It's too bad that you had such trouble finding documentation! What were your search terms? Try typing r axis into Google, and the first link you will get is the Quick-R page. Scroll down to "Axes", and you'll get a very nice little guide on how to do it. You should probably check there first for any plotting questions; it will be faster than waiting for a SO reply.
Hope this code helps you :)
plot(x,y,xaxt = 'n')
axis(side=1,at=c(1,20,30,50),labels=c("1975","1980","1985","1990"))
When plotting time series with ts.plot, xaxt="n" cannot be passed directly; it has to go inside the gpars list:
require(graphics)
ts.plot(ldeaths, mdeaths, xlab="year", ylab="deaths", lty=c(1:2), gpars=list(xaxt="n"))
axis(1, at = seq(1974, 1980, by = 2))
I am trying to set the number of bins in hist() in R to 10, as follows:
> hist(x, breaks=10)
But the number of bins is not exactly 10. I tried several other numbers of bins, and the same thing happens.
?hist says breaks can specify
a single number giving the number of cells for the histogram.
So I wonder what I can do now? Thanks!
You can always create custom breakpoints
x = rnorm(500)
brks = seq(floor(min(x)), ceiling(max(x)), by = 0.1) # breaks must span the whole range of x
hist(x, breaks = brks)
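To see why a single number does not give exactly that many bins: hist() hands it to pretty(), which picks "round" break points and treats the count only as a target. The sketch below compares the two and then forces exactly 10 bins over the full range of the data; the seed and the names h_suggested, h_exact, and exact_brks are assumptions for illustration.

```r
set.seed(1)                       # assumed seed, only for reproducibility
x <- rnorm(500)

# A single number is only a suggestion: the break points come from pretty()
h_suggested <- hist(x, breaks = 10, plot = FALSE)
pretty_brks <- pretty(range(x), n = 10, min.n = 1)
length(h_suggested$counts)        # may differ from 10

# Exactly 10 bins: supply 11 break points spanning the data
exact_brks <- seq(min(x), max(x), length.out = 11)
h_exact <- hist(x, breaks = exact_brks, plot = FALSE)
length(h_exact$counts)            # number of bins with explicit breaks

plot(h_exact, main = "Exactly 10 bins", xlab = "x")
```

The trade-off is that explicit break points are rarely round numbers, which is what pretty() is trying to avoid.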
Tim wrote in comments:
The problem with that is I specified brks = seq(min(x),max(x),length.out=500), but hist(x, breaks = brks) complained that some entries of x wouldn't be included in the histogram
I had the same problem. I suspect this happens because a value on the border of the range is not counted. I have two solutions, but neither satisfies me 100%.
Solution 1.
When making the sequence, set minimum a little bit lower and maximum a little bit higher.
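eps = 1e-5 * diff(range(x)) # small margin; multiplying by .99999 moves a negative minimum the wrong way
brks = seq(min(x) - eps, max(x) + eps, length.out=500)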
Solution 2. Instead of hist(), use a combination of cut() and barplot(). The plot looks almost the same as the one from hist(), but you don't get back a "histogram" object the way you do with hist().
barplot(table(cut(x, 10)), space=0)
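A short sketch of the cut() route, showing that it always yields the requested number of bins and keeps every observation. The seed and the names bins and counts are assumptions; the label settings are only there to keep the interval names readable.

```r
set.seed(1)                       # assumed seed, only for reproducibility
x <- rnorm(500)

# cut() with a single number makes that many equal-width intervals
# and slightly extends the range so the extreme values are included
bins   <- cut(x, breaks = 10)
counts <- table(bins)

length(counts)                    # number of intervals
sum(counts)                       # every observation is counted

# space = 0 removes the gaps so the bars look like a histogram
barplot(counts, space = 0, las = 2, cex.names = 0.6)
```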