Problems with relative frequency in R

I want to plot a relative frequency histogram in R, of this data:
0.1575850
0.1378830 0.1462112 0.1303224 0.3538677 0.2497142 0.2359662 0.1647894 0.1861195
0.3957871 0.2135463 0.1584121 0.1690736 0.4232640 0.2058885 0.1615527 0.3250968
0.1529143 0.5984977 0.2334365 0.2141899 0.1495538
I want to use seq(0,1,0.2) for the "breaks" argument and set freq=FALSE to plot the density rather than the counts. Based on the help page for hist, I would expect the total area of the relative frequency histogram (or the sum of $density) to equal one, but instead I get this (cc holds the value returned by hist):
cc$density
[1] 2.5000000 2.0454545 0.4545455 0.0000000 0.0000000
Any suggestions as to what could be happening? I tried the histogram function from {lattice}, and the histogram looks fine, but I couldn't change the size of the label and axis text using the regular arguments (cex.lab and cex.axis).
Thanks for your help and time.

Hint: sum(cc$density) == 5, and 5 * 0.2 == 1. (You can stop reading here, or...)
To calculate the total area of the histogram, you have to multiply the height of each bar (which is what cc$density gives you) by the width of each bar, which is 0.2 in your case, and then add up the results.
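The arithmetic above can be checked directly. This is a minimal sketch using the 22 values from the question; it assumes cc is the object returned by hist, since the question does not show how cc was created.

```r
# Data from the question (22 values)
x <- c(0.1575850,
       0.1378830, 0.1462112, 0.1303224, 0.3538677, 0.2497142, 0.2359662, 0.1647894, 0.1861195,
       0.3957871, 0.2135463, 0.1584121, 0.1690736, 0.4232640, 0.2058885, 0.1615527, 0.3250968,
       0.1529143, 0.5984977, 0.2334365, 0.2141899, 0.1495538)

# Assumption: cc is the (invisibly returned) result of hist()
cc <- hist(x, breaks = seq(0, 1, 0.2), freq = FALSE)

bin_widths <- diff(cc$breaks)                # every bin is 0.2 wide
total_area <- sum(cc$density * bin_widths)   # height * width, summed over bars

# Relative frequencies (heights that sum to 1) are counts / n
rel_freq <- cc$counts / sum(cc$counts)
```

If bar heights that sum to one are what is wanted, rel_freq can be drawn with, for example, barplot(rel_freq, names.arg = cc$mids).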

Related

Strange x-axis in histogram for discrete variable in R

I have some discrete data that I'm trying to plot as a histogram in R. I'm using the built-in hist() function, which works fine most of the time for the data I have. However, for a discrete variable the result looks somewhat strange (unfortunately I cannot add the picture). My interpretation is: "since the bins for 0 and 1 children must fit between 0 and 1, that determines the width of all bins and thus the 'from 1.5 to 2' result". How can I center the numbers on the x-axis underneath each bin instead?
Thanks in advance!

You might want to consider drawing the axis in a second step:
Suppress the default x-axis with xaxt="n":
hist(cars$speed, xaxt="n")
Draw x-axis and adjust label position with hadj=:
axis(1, at=seq(0,25,5), hadj=2.5, labels = c("",seq(5,25,5)))
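An alternative sketch, not taken from the answer above: for integer-valued data, placing the breaks at half-integers gives one bar per value, centered on that value, so the tick labels can sit directly under each bar. The vector children and the name brks are made up for illustration.

```r
# Hypothetical discrete data (number of children per household)
children <- c(0, 0, 1, 1, 1, 2, 2, 3, 0, 1, 2, 4, 1, 0, 2)

# Breaks at -0.5, 0.5, 1.5, ... so each integer sits in the middle of its own bin
brks <- seq(min(children) - 0.5, max(children) + 0.5, by = 1)

# Draw the histogram without the default x-axis
h <- hist(children, breaks = brks, xaxt = "n", main = "Children per household")

# Put one tick label under the center of each bar
axis(1, at = h$mids, labels = h$mids)
```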

Log Log Plot - How to make sense of the axis

I am having a small issue making sense of a log-log plot. So if I create an x-y plot by the following:
xx <- exp(1:10)
yy <- exp(1:10)
plot(xx,yy)
The highest value is 22026.47. When I then plot it on as a log-log plot (this is purely a basic example), as per below
plot(xx,yy, log="yx")
the highest co-ordinate is over 5000. Can someone point me in the right direction to interpret this? For example how can I get the value by which 22026.47 is transformed.

I am not entirely sure what you are asking in regards to "the value by which 22026.47 is transformed". You can simply take the log of whatever value to get it, if that is what you are asking. Unsurprisingly:
log(22026.47)
#[1] 10
Anyway, perhaps some confusion stems from the fact that the log="xy" argument to plot draws your data on a log scale but with tick marks and labels on the original scale. You say the highest coordinate is over 5000, but 22026.47 is over 5000, so that fits well. The two are just close on a log scale; just as close as 2.72 and 7.39, corresponding to xx[1:2].
Compare your log-log plot with the result of
plot(log(xx), log(yy))
Here you are plotting the actual log-values of your data, and that is also reflected in your x-axis and y-axis labels.
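To make the point about spacing concrete, here is a small sketch. The custom tick labels drawn with axis() are an illustration added here, not part of the original answer.

```r
xx <- exp(1:10)
yy <- exp(1:10)

# On a log axis, position depends on log(value): these are evenly spaced
log_pos <- log(xx)
spacing <- diff(log_pos)   # constant step of 1 between neighbouring points

# Log-log plot with ticks placed at the data values themselves,
# labelled on the original scale
plot(xx, yy, log = "xy", xaxt = "n", yaxt = "n")
axis(1, at = xx, labels = signif(xx, 3))
axis(2, at = yy, labels = signif(yy, 3))
```

The points appear equally spaced even though the labels under them grow by a factor of e at each step.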

R histogram with numbers under bars

I had some problems while trying to plot a histogram to show the frequency of every value while plotting the value as well. For example, suppose I use the following code:
x <- sample(1:10,1000,replace=T)
hist(x,label=TRUE)
The result is a plot with labels over the bars, but with the frequencies of 1 and 2 merged into a single bar.
Apart from separating this bar into two bars for 1 and 2, I also need to put the values under each bar.
For example, with the code above the number 10 appears under the tick at the right edge of its bar, whereas I need the values to be plotted right under the bars.
Is there any way to do both in a single histogram with hist function?
Thanks in advance!

Calling hist silently returns information you can use to modify the plot. You can pull out the midpoints and the heights and use that information to put the labels where you want them. You can use the pos argument in text to specify where the label should be in relation to the point (thanks @rawr)
x <- sample(1:10,1000,replace=T)
## Histogram
info <- hist(x, breaks = 0:10)
with(info, text(mids, counts, labels=counts, pos=1))
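Building on the same idea, the values themselves can be placed under the bars by suppressing the default axis and drawing ticks at the bin midpoints. This is a sketch that assumes x holds the integers 1 to 10, as in the question; set.seed is added here only to make the example reproducible.

```r
set.seed(1)
x <- sample(1:10, 1000, replace = TRUE)

# One bin per value: (0,1], (1,2], ..., (9,10]; suppress the default x-axis
info <- hist(x, breaks = 0:10, xaxt = "n")

# Counts just below the top of each bar
text(info$mids, info$counts, labels = info$counts, pos = 1)

# Values 1..10 centered under their bars (bin (k-1, k] holds the value k)
axis(1, at = info$mids, labels = 1:10)
```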

Simple histogram plot wrong?

test <- rep(5,20)
hist(test,freq=FALSE,breaks=5)
The vector contains 20 times the value 5. When I plot this with freq=FALSE and breaks=5 I expect to see 1 bar at x=5 with height = 1.0, because the value 5 makes up 100% of the data.
Why do I instead see 1 bar that ranges from x=0 to x=5 and has height = 0.2?

hist plots an estimate of the probability density when freq=FALSE or prob=TRUE, so the total area of the bars in the histogram sums to 1. Since the horizontal range of the single bar that is plotted is (0,5), it follows that the height must be 0.2 (5*0.2=1)
If you really want the histogram you were expecting (i.e. heights correspond to fraction of counts, areas don't necessarily sum to 1), you can do this:
h <- hist(test,plot=FALSE)
h$counts <- h$counts/length(test)
plot(h)
Another possibility is to force the bar widths to be equal to 1.0, e.g.
hist(test,freq=FALSE,breaks=0:10)
Or maybe you want
plot(table(test)/length(test))
or
plot(table(test)/length(test),lwd=10,lend="butt")
?
See also: How do you use hist to plot relative frequencies in R?
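A short check of the reasoning above, as a sketch: with unit-width bins the density of a bar equals its relative frequency, and in every case the bar areas sum to 1. The names bar_area and rel_freq are introduced here for illustration.

```r
test <- rep(5, 20)

# Unit-width bins: the single occupied bin is (4,5]
h <- hist(test, breaks = 0:10, freq = FALSE)

# Area of each bar = height (density) * width
bar_area <- h$density * diff(h$breaks)

# Relative frequency of each distinct value, independent of any binning
rel_freq <- table(test) / length(test)
```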

Plotting histograms with R; y axis keeps changing to frequency from proportion/probability

I am trying to overlay two histograms on the same plot, but the option probability=TRUE (relative frequencies) in hist() has no effect with the code below. This is a problem because the two samples have very different sizes (length(cl1)=9 and length(cl2)=339) and, with this script, I cannot visualize the differences between the two histograms because each shows frequencies. How can I overlay two histograms with the same bin width, showing relative frequencies?
c1<-hist(dataList[["cl1"]],xlim=range(minx,maxx),breaks=seq(minx,maxx,pasx),col=rgb(1,0,0,1/4),main=paste(paramlab,"Group",groupnum,"cl1",sep=" "),xlab="",probability=TRUE)
c2<-hist(dataList[["cl2"]],xlim=range(minx,maxx),breaks=seq(minx,maxx,pasx),col=rgb(0,0,1,1/4),main=paste(paramlab,"Group",groupnum,"cl2",sep=" "),xlab="",probability=TRUE)
plot(c1, col=rgb(1,0,0,1/4), xlim=c(minx,maxx), main=paste(paramlab,"Group",groupnum,sep=" "),xlab="")# first histogram
plot(c2, col=rgb(0,0,1,1/4), xlim=c(minx,maxx), add=T)
cl1Col <- rgb(1,0,0,1/4)
cl2Col <- rgb(0,0,1,1/4)
legend('topright',c('Cl1','Cl2'),
fill = c(cl1Col , cl2Col ), bty = 'n',
border = NA)
Thanks in advance for your help!

When you call plot on an object of class histogram (like c1), it dispatches to the S3 method for that class, namely plot.histogram. You can see the code for this function by typing graphics:::plot.histogram, and its help page under ?plot.histogram. The help file for that function states:
freq logical; if TRUE, the histogram graphic is to present a
representation of frequencies, i.e, x$counts; if FALSE, relative
frequencies (probabilities), i.e., x$density, are plotted. The default
is true for equidistant breaks and false otherwise.
So, when plot renders a histogram it doesn't use the previously specified probability or freq arguments; it tries to figure it out for itself. The reason becomes clear if you dig around inside c1: it contains all of the data necessary for the plot, but does not specify how it should be rendered.
So, the solution is to reiterate the argument freq=FALSE when you run the plot functions. Notably, freq=FALSE works whereas probability=TRUE does not because plot.histogram does not have a probability option. So, your plot code will be:
plot(c1, col=rgb(1,0,0,1/4), xlim=c(minx,maxx), main=paste(paramlab,"Group",groupnum,sep=" "),xlab="",freq=FALSE)# first histogram
plot(c2, col=rgb(0,0,1,1/4), xlim=c(minx,maxx), add=T, freq=FALSE)
This all seems like an oversight/idiosyncratic decision (or lack thereof) on the part of the R devs. To their credit it is appropriately documented and is not "unexpected behavior" (although I certainly didn't expect it). I wonder where such oddness should be reported, if it should be reported at all.
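A self-contained sketch of the fix, with made-up samples standing in for dataList[["cl1"]] and dataList[["cl2"]]. The names small, large, brks and ymax, and the data themselves, are hypothetical.

```r
set.seed(42)
small <- rnorm(9,   mean = 0)   # stands in for cl1 (9 values)
large <- rnorm(339, mean = 1)   # stands in for cl2 (339 values)

# Common breaks covering both samples, so the bin widths match
brks <- seq(floor(min(small, large)), ceiling(max(small, large)), by = 0.5)

# Compute the histogram objects without drawing them
h1 <- hist(small, breaks = brks, plot = FALSE)
h2 <- hist(large, breaks = brks, plot = FALSE)

# freq = FALSE must be passed to plot(); with equidistant breaks it
# would otherwise draw counts
ymax <- max(h1$density, h2$density)
plot(h1, freq = FALSE, col = rgb(1, 0, 0, 1/4), ylim = c(0, ymax),
     main = "cl1 vs cl2", xlab = "")
plot(h2, freq = FALSE, col = rgb(0, 0, 1, 1/4), add = TRUE)
legend("topright", c("cl1", "cl2"),
       fill = c(rgb(1, 0, 0, 1/4), rgb(0, 0, 1, 1/4)), bty = "n")
```

Setting ylim from both density vectors keeps the taller histogram from being clipped when it is added second.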
