R: getting break values from a histogram

When you make a histogram and set the breaks argument, R computes the actual break points internally. I want to obtain the lower and upper limits of each bin generated by the histogram, so that if I made the following histogram
hist(df$foo, breaks = 5)
I want a list or data.frame that has the value ranges of the breaks:
list(c("1_lower"="<num>","1_upper"="<num2>","2_lower"="<num3>","2_upper"="<num4>"))
I hope this is possible. Any help is greatly appreciated.

According to the documentation (?hist), if you assign h <- hist(...), then h$breaks gives you the break points.
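Here is a minimal sketch of how h$breaks can be turned into the table of ranges the question asks for. df$foo is replaced by hypothetical random data. When breaks is a single number, ?hist says it is only a suggestion and the breaks are rounded to "pretty" values, so breaks = 5 will not necessarily give exactly 5 bins. The sketch therefore reads the bin count from the result instead of assuming it. The names bins and ranges are illustrative only.

```r
# Hypothetical example data standing in for df$foo
set.seed(1)
df <- data.frame(foo = rnorm(100))

# plot = FALSE computes the histogram object without drawing it
h <- hist(df$foo, breaks = 5, plot = FALSE)

# k bins are delimited by k + 1 break points
k <- length(h$counts)
bins <- data.frame(
  bin   = seq_len(k),
  lower = h$breaks[-(k + 1)],  # every break except the last
  upper = h$breaks[-1],        # every break except the first
  count = h$counts
)
print(bins)

# Named list in the "1_lower", "1_upper", "2_lower", ... layout from the question
ranges <- as.list(c(rbind(bins$lower, bins$upper)))
names(ranges) <- paste0(rep(bins$bin, each = 2), c("_lower", "_upper"))
str(ranges)
```

The data.frame is usually easier to work with than the flat named list. The list is included only to match the layout shown in the question.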

Related

Why does the histogram look different with different breaks arguments in R?

I want to plot the distribution of my dataset as a histogram in R. I tried different breaks arguments (a fixed number of 200, Freedman-Diaconis, and Scott) to get the best representation. I am considering a log scale later, but first I want to see the raw distribution without any scaling. However, the results look different. Why is that? The dataset I use can be downloaded from here or here. The code I'm running is:
hist(as.matrix(deviation_all_genes_all_spots), xlim = c(-(1*10^(4)), 10^(4.5)), breaks = 200)
[Figure: resulting histogram with breaks = 200]
hist(as.matrix(deviation_all_genes_all_spots), xlim = c(-(1*10^(4)), 10^(4.5)), breaks = "Scott")
[Figure: resulting histogram with breaks = "Scott"]
hist(as.matrix(deviation_all_genes_all_spots), xlim = c(-(1*10^(4)), 10^(4.5)), breaks="Freedman-Diaconis")
[Figure: resulting histogram with breaks = "Freedman-Diaconis"]
Please help. Thank you very much.
Histograms are very sensitive to the choice of cell break points. Even for the same (!) number of cells, the histogram can look considerably different after just a small shift of the cell borders. It is therefore generally preferable to use kernel density estimators instead of histograms, because they do not depend on an arbitrary placement of cell borders:
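# n = number of equally spaced points at which the density is estimated (512 is the default);
# increase it if you have a wide range of values
d <- density(as.matrix(deviation_all_genes_all_spots), n=512)
plot(d$x, d$y, type = "l")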
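To illustrate the claim that a small shift of the cell borders alone can change the histogram, here is a minimal sketch on hypothetical normal data, because the original dataset is not reproduced here. Both histograms use the same data, the same cell width and the same number of cells. Only the borders are shifted by half a cell.

```r
# Hypothetical sample data; the real dataset is not reproduced here
set.seed(123)
x <- rnorm(300)

w  <- 0.5                                                   # common cell width
b1 <- seq(floor(min(x)) - w, ceiling(max(x)) + w, by = w)   # borders at multiples of 0.5
b2 <- b1 + w / 2                                            # same width, borders shifted by w/2

h1 <- hist(x, breaks = b1, plot = FALSE)
h2 <- hist(x, breaks = b2, plot = FALSE)

# Same data and same number of cells, drawn side by side for comparison
op <- par(mfrow = c(1, 2))
plot(h1, main = "borders at multiples of 0.5")
plot(h2, main = "borders shifted by 0.25")
par(op)
```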
In your second and third calls of hist, you ask R to choose the number of cells and the cell borders automatically. For your data, this evidently results in more cells than your first call with breaks=200 (note that a single number for breaks is only a suggestion, because hist rounds the breaks to "pretty" values). You can query the cells from the return value of hist, e.g.
h <- hist(as.matrix(deviation_all_genes_all_spots))
cat(sprintf("number of cells = %i\n", length(h$mids)))
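The same query can be run for every breaks rule at once, which makes the difference between the calls explicit. This is a sketch on hypothetical normal data standing in for deviation_all_genes_all_spots. How many cells each rule yields depends on the sample size and spread, so the numbers for the real data will differ. "FD" is the documented short form of "Freedman-Diaconis".

```r
set.seed(42)
x <- rnorm(10000)  # hypothetical stand-in for deviation_all_genes_all_spots

rules <- list(fixed200 = 200, Sturges = "Sturges", Scott = "Scott", FD = "FD")

# Number of cells hist() actually produces for each rule (nothing is plotted)
n_cells <- sapply(rules, function(b) length(hist(x, breaks = b, plot = FALSE)$mids))
print(n_cells)
```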

R histogram with numbers under bars

I had some problems plotting a histogram that shows the frequency of every value together with the value itself. For example, suppose I use the following code:
x <- sample(1:10, 1000, replace = TRUE)
hist(x, labels = TRUE)
The result is a plot with labels over the bars, but the frequencies of 1 and 2 are merged into a single bar.
Apart from splitting this bar into two separate bars for 1 and 2, I also need to put the values under each bar.
For example, with the code above the number 10 appears under the tick at the right edge of its bar, but I need each value plotted directly under its bar.
Is there any way to do both in a single histogram with hist function?
Thanks in advance!
Calling hist silently returns information you can use to modify the plot. You can pull out the midpoints and the heights and use them to put the labels where you want. The pos argument of text specifies where the label goes relative to the point; pos = 1 places it below (thanks @rawr).
x <- sample(1:10, 1000, replace = TRUE)
## Histogram
info <- hist(x, breaks = 0:10)
with(info, text(mids, counts, labels=counts, pos=1))
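The question also asks for the values under the bars. With the default breaks, the first cell is most likely closed on both sides (include.lowest = TRUE), which is why 1 and 2 end up in the same bar. With breaks = 0:10 each integer gets its own bar, but the bar for value k is centred at k - 0.5, so the axis ticks do not line up with the values. Here is a minimal sketch that uses half-integer breaks so each bar is centred on its value. It suppresses the default axes and draws the value labels at the midpoints.

```r
set.seed(1)
x <- sample(1:10, 1000, replace = TRUE)

# Half-integer breaks give every integer value its own bar, centred on it
info <- hist(x, breaks = seq(0.5, 10.5, by = 1), axes = FALSE,
             main = "One bar per value", xlab = "value")
axis(2)

# Value labels centred under each bar instead of at the bar edges
axis(1, at = info$mids, labels = info$mids)

# Counts just below the top of each bar (pos = 1 means "below the point")
text(info$mids, info$counts, labels = info$counts, pos = 1)
```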

R: histogram plot, number of bins = number of classes

I am surprised that there seems to be no question about this problem. At least I haven't found any with an accurate answer.
Suppose the simple case of rolling two dice and adding the pips shown. Possible results range from 2 to 12. Now I want to plot the histogram for this event, i.e. one bin per possible result. That would make 11 bins (2, 3, 4, ..., 12).
# Example dataset: how often did we get "2","3", "4"(1x2, 3x3, 2x4, 4x5, 8x6, 14x7, ...)
Dice <- c(2,rep(3,3),rep(4,2),rep(5,4),rep(6,8),rep(7,14),rep(8,9),rep(9,5),rep(10,4),rep(11,1),rep(12,2))
hist(Dice,breaks=seq(2,12)) # custom breaks return 10 bins (11 breaks)
hist(Dice,breaks=11) # same for automatic breaks (and for breaks=12 or 13...)
What I need is a histogram plot with 11 bins - that is one bin per possible result. How can I trick R into doing this?
Thank you!
hist(Dice,breaks=seq(1.5,12.5))
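The reason this works is that k bins need k + 1 break points. seq(2,12) gives 11 breaks and therefore only 10 bins, with the integer results sitting on the bin edges. Shifting the breaks by 0.5 gives 12 breaks, so there are 11 bins, each centred on one possible sum. The following sketch checks this without drawing anything.

```r
Dice <- c(2, rep(3, 3), rep(4, 2), rep(5, 4), rep(6, 8), rep(7, 14),
          rep(8, 9), rep(9, 5), rep(10, 4), rep(11, 1), rep(12, 2))

# 12 break points (1.5, 2.5, ..., 12.5) delimit 11 bins, one centred on each sum
h <- hist(Dice, breaks = seq(1.5, 12.5), plot = FALSE)
length(h$breaks)  # 12
length(h$counts)  # 11
h$mids            # 2, 3, ..., 12
```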
This is not a histogram per se, but you could try this:
barplot(table(Dice))
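One caveat with table(): it only lists values that actually occur, so a sum that was never rolled would silently disappear from the bar plot. This is a sketch that fixes the factor levels to 2:12 so every possible result keeps its bar. It uses space = 0 so the bars touch, as in a histogram.

```r
Dice <- c(2, rep(3, 3), rep(4, 2), rep(5, 4), rep(6, 8), rep(7, 14),
          rep(8, 9), rep(9, 5), rep(10, 4), rep(11, 1), rep(12, 2))

# Fixed levels keep a (possibly zero-height) bar for every possible sum
counts <- table(factor(Dice, levels = 2:12))
barplot(counts, space = 0, xlab = "sum of two dice", ylab = "frequency")
```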

breaks in histogram

In Excel, when we plot a histogram we can define bins, and values greater than the largest bin are shown as "More" in the histogram. Can we do something similar in R (using the base plotting system)?
As Roman said, you can use the cut function:
r <- cut(x, breaks = c(0, 50, Inf), labels = c("lev1", "lev2"))
will partition x into two levels. Because hist only accepts numeric data and r is a factor, draw the binned counts with barplot(table(r)) instead.
Yes, but this means you have to pre-calculate the bins yourself. You can use the cut function to define breaks in your data. The result is a factor whose level names indicate where each split was made. You can then merge factor levels and plot the result.
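Putting both answers together, here is a minimal sketch of an Excel-like "More" bin in base graphics, using hypothetical exponential data. The bin edges and labels are made up for illustration. An Inf upper break collects everything above the last edge. right = TRUE makes each bin closed on its upper edge, which I believe matches Excel's convention of counting a value in the first bin it does not exceed.

```r
# Hypothetical data; replace with your own vector
set.seed(1)
x <- rexp(500, rate = 1 / 30)

# Bin edges as you would enter them in Excel; everything above the last
# edge goes into an open-ended "More" bin via the Inf break
edges  <- c(0, 25, 50, 75, 100)
labels <- c("0-25", "25-50", "50-75", "75-100", "More")
bins   <- cut(x, breaks = c(edges, Inf), labels = labels,
              right = TRUE, include.lowest = TRUE)

counts <- table(bins)
print(counts)

# space = 0 makes the bars touch, so the plot looks like a histogram
barplot(counts, space = 0, xlab = "bin", ylab = "frequency")
```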

Odd axis label behaviour after setting xlim in pyramid.plot [plotrix]

I'm trying to make an "opposing stacked bar chart" and have found pyramid.plot from the plotrix package seems to do the job. (I appreciate ggplot2 will be the go-to solution for some of you, but I'm hoping to stick with base graphics on this one.)
Unfortunately, it does something odd with the x axis when I try to set the limits to non-integer values. If I let it define the limits automatically, they are integers, and in my case that leaves too much white space. But defining them as xlim=c(1.5,1.5) produces the odd result below.
If I understand the documentation correctly, there is no way to pass additional graphical parameters, e.g. to suppress the axis and add it later, let alone define the tick positions. Is there a way to make it more flexible?
Here is a minimal working example used to produce the plot below.
library(plotrix)
set.seed(42)
pyramid.plot(cbind(runif(7,0,1),
rep(0,7),
rep(0,7)),
cbind(rep(0,7),
runif(7,0,1),
runif(7,0,1)),
top.labels=NULL,
gap=0,
labels=rep("",7),
xlim=c(1.5,1.5))
Just in case it is of interest to anyone else, I'm not doing a population pyramid, but rather attempting a stacked bar chart with some of the values negative. The code above includes a 'trick' I use to make it possible to have a different number of sets of bars on each side, namely adding empty columns to the matrix, hopefully someone will find that useful - so sorry the working example is not as minimal as it could have been!
Setting the x axis labels using laxlab and raxlab creates a continuous axis:
pyramid.plot(cbind(runif(7,0,1),
rep(0,7),
rep(0,7)),
cbind(rep(0,7),
runif(7,0,1),
runif(7,0,1)),
top.labels=NULL,
gap=0,
labels=rep("",7),
xlim=c(1.5,1.5),
laxlab = seq(from = 0, to = 1.5, by = 0.5),
raxlab=seq(from = 0, to = 1.5, by = 0.5))
