Strange x-axis in histogram for discrete variable in R - r

I have some discrete data that I'm trying to plot in a histogram in R. I'm using the built in hist() function, which works fine most of the times for the data I have. However, when it comes to a discrete variable it looks somewhat strange (unfortunately I cannot add the picture). I interpret it as "since the bin for 0 and 1 children must fit between 0 and 1 it determines the width of all bins and thus the "from 1.5 to 2" result". How can I put the numbers on the x-axis centered underneath each bin instead?
Thanks in advance!

You might want to consider drawing the axis in a second step:
Prevent x-axis with xaxt="n":
hist(cars$speed, xaxt="n")
Draw x-axis and adjust label position with hadj=:
axis(1, at=seq(0,25,5), hadj=2.5, labels = c("",seq(5,25,5)))

Related

How to add frequency & percentage on the same histogram in R?

Consider the following data set.
x <- c(2,2,2,4,4,4,4,5,5,7,7,8,8,9,10,10,1,1,0,2,3,3,5,6)
hist(x, nclass=10)
I want to have a histogram where the x-axis indicates the intervals & the y-axis on the left represents the frequency. In addition to this, I need another y-axis on the right side of the histogram representing the percentage of the intervals on the same plot. Even though the following graph is for two variables, more or less it looks like what I need (taken from Histogram of two variables in R).
Thanks in advance!
You can add a vertical axis to the right side with axis(4,at = at), where at are points at which tick-marks are to be drawn. If you want the density values of your histogram as tick-marks, call axis(4, at = hist(x,nclass=10)$density).

R: setting label spacing and position on a bar plot

I want to present percentages over a 24h period in 15 min intervals as a bar plot.
When I use barplot(), the labels for those timepoints are more or less randomly chosen by R (depending on how I format the window. I know it's not random, but it's not what I want either). I would rather have them evenly spaced at 1 h intervals (that is every 4th bar).
I have searched extensively on this and know i can add labels later with axis() but I have not found a way to set which bars are labeled and which are left blank.
So here is an example. Sorry for the long lines:
x<-sample(1:100,96)
Labels<-c("09","09:15","09:30","09:45","10:00","10:15","10:30","10:45","11","11:15","11:30","11:45","12","12:15","12:30","12:45","13","13:15","13:30","13:45","14","14:15","14:30","14:45","15","15:15","15:30","15:45","16","16:15","16:30","16:45","17","17:15","17:30","17:45","18","18:15","18:30","18:45","19","19:15","19:30","19:45","20","20:15","20:30","20:45","21","21:15","21:30","21:45","22","22:15","22:30","22:45","23","23:15","23:30","23:45","00","00:15","00:30","00:45","01","01:15","01:30","01:45","02","02:15","02:30","02:45","03","03:15","03:30","03:45","04","04:15","04:30","04:45","05","05:15","05:30","05:45","06","06:15","06:30","06:45","07","07:15","07:30","07:45","08","08:15","08:30","08:45")
names(x)<-Labels
barplot(x)
I do not think you can force R to show every label if it does not have enough space. But at least if you want to add the labels every 1h, the following code should work :
x<-sample(1:100,96)
Labels<-c("09","09:15","09:30","09:45","10","10:15","10:30","10:45","11","11:15","11:30","11:45","12","12:15","12:30","12:45","13","13:15","13:30","13:45","14","14:15","14:30","14:45","15","15:15","15:30","15:45","16","16:15","16:30","16:45","17","17:15","17:30","17:45","18","18:15","18:30","18:45","19","19:15","19:30","19:45","20","20:15","20:30","20:45","21","21:15","21:30","21:45","22","22:15","22:30","22:45","23","23:15","23:30","23:45","00","00:15","00:30","00:45","01","01:15","01:30","01:45","02","02:15","02:30","02:45","03","03:15","03:30","03:45","04","04:15","04:30","04:45","05","05:15","05:30","05:45","06","06:15","06:30","06:45","07","07:15","07:30","07:45","08","08:15","08:30","08:45")
b=barplot(x,axes = F)
axis(2)
axis(1,at=c(b[seq(1,length(Labels),4)],b[length(b)]+diff(b)[1]),labels = c(Labels[seq(1,length(Labels),4)],"09"))

R histogram with numbers under bars

I had some problems while trying to plot a histogram to show the frequency of every value while plotting the value as well. For example, suppose I use the following code:
x <- sample(1:10,1000,replace=T)
hist(x,label=TRUE)
The result is a plot with labels over the bar, but merging the frequencies of 1 and 2 in a single bar.
Apart from separate this bar in two others for 1 and 2, I also need to put the values under each bar.
For example, with the code above I would have the number 10 under the tick at the right margin of its bar, and I needed to plot the values right under the bars.
Is there any way to do both in a single histogram with hist function?
Thanks in advance!
Calling hist silently returns information you can use to modify the plot. You can pull out the midpoints and the heights and use that information to put the labels where you want them. You can use the pos argument in text to specify where the label should be in relation to the point (thanks #rawr)
x <- sample(1:10,1000,replace=T)
## Histogram
info <- hist(x, breaks = 0:10)
with(info, text(mids, counts, labels=counts, pos=1))

Y-axis values does not display correctly on my gap.boxplot in R

This is my script and the graph produced. I have made a gap between 7-29.8. But How can I display the y-axis values at 7 and 30? The axis only shows 1-6, instead of 0-7 , 30 as intended.
gap.boxplot(Km, gap=list(top=c(7,30), bottom=c(NA,NA), axis(side=2, at=c(0,29.8), labels= F)),
ylim=c(0,30), axis.labels=T, ylab="Km (mM)", plot=T, axe=T,
col=c("red","blue","black"))
abline(h=seq(6.99,7.157,.001), col="white")
axis.break(2, 7.1,style="slash")
You can call the axis function again to plot the marks at c(0:7,30)
axis(2,c(0:7,30)
But because you have a gap between 7 and 30, anything beyond 7 will have to be shifted in the plot. In general, a mark at position y will have to be moved downwards to y-gap.width, or y-(30-7) in your case.
So you can do this to plot your marks:
axis(2, labels=c(0:7,30), at=c(0:7,30-(30-7)))
It's hard to replicate the plot without example data. But I think this worth a try and should work.
axis(2,labels=c(0:7), at=c(0:7)) # build first gap marker '7'
then separately add the second gap marker
axis(2, labels=c(30), at=7*(1+0.01)) # the interval (0.01) could be different, test to find the best one to fit your plot

Axis-labeling in R histogram and density plots; multiple overlays of density plots

I have two related problems.
Problem 1: I'm currently using the code below to generate a histogram overlayed with a density plot:
hist(x,prob=T,col="gray")
axis(side=1, at=seq(0,100, 20), labels=seq(0,100,20))
lines(density(x))
I've pasted the data (i.e. x above) here.
I have two issues with the code as it stands:
the last tick and label (100) of the x-axis does not appear on the histogram/plot. How can I put these on?
I'd like the y-axis to be of count or frequency rather than density, but I'd like to retain the density plot as an overlay on the histogram. How can I do this?
Problem 2: using a similar solution to problem 1, I now want to overlay three density plots (not histograms), again with frequency on the y-axis instead of density. The three data sets are at:
http://pastebin.com/z5X7yTLS
http://pastebin.com/Qg8mHg6D
http://pastebin.com/aqfC42fL
Here's your first 2 questions:
myhist <- hist(x,prob=FALSE,col="gray",xlim=c(0,100))
dens <- density(x)
axis(side=1, at=seq(0,100, 20), labels=seq(0,100,20))
lines(dens$x,dens$y*(1/sum(myhist$density))*length(x))
The histogram has a bin width of 5, which is also equal to 1/sum(myhist$density), whereas the density(x)$x are in small jumps, around .2 in your case (512 even steps). sum(density(x)$y) is some strange number definitely not 1, but that is because it goes in small steps, when divided by the x interval it is approximately 1: sum(density(x)$y)/(1/diff(density(x)$x)[1]) . You don't need to do this later because it's already matched up with its own odd x values. Scale 1) for the bin width of hist() and 2) for the frequency of x length(x), as DWin says. The last axis tick became visible after setting the xlim argument.
To do your problem 2, set up a plot with the correct dimensions (xlim and ylim), with type = "n", then draw 3 lines for the densities, scaled using something similar to the density line above. Think however about whether you want those semi continuous lines to reflect the heights of imaginary bars with bin width 5... You see how that might make the density lines exaggerate the counts at any particular point?
Although this is an aged thread, if anyone catches this. I would only think it is a 'good idea' to forego translating the y density to count scales based on what the user is attempting to do.
There are perfectly good reasons for using frequency as the y value. One idea in particular that comes to mind is that using counts for the y scale value can give an analyst a good idea about where to begin the 'data hunt' for stratifying heterogenous data, if a mixed distribution model cannot soundly or intuitively be applied.
In practice, overlaying a density estimate over the observed histogram can be very useful in data quality checks. For example, in the above, if I were looking at the above graphic as a single source of data with the assumption that it describes "1 thing" and I wish to model this as "1 thing", I have an issue. That is, I have heterogeneous data which may require some level of stratification. The density overlay then becomes a simple visual tool for detecting heterogeneity (apart from using log transformations to smooth between-interval variation), and a direction (locations of the mixed distributions) for stratifying the data.

Resources