R histogram with numbers under bars - r

I had some problems while trying to plot a histogram to show the frequency of every value while plotting the value as well. For example, suppose I use the following code:
x <- sample(1:10,1000,replace=T)
hist(x,label=TRUE)
The result is a plot with labels over the bar, but merging the frequencies of 1 and 2 in a single bar.
Apart from splitting this bar into two separate bars for 1 and 2, I also need to put the values under each bar.
For example, with the code above the number 10 appears under the tick at the right edge of its bar, whereas I need the values plotted right under the bars.
Is there any way to do both in a single histogram with hist function?
Thanks in advance!

Calling hist invisibly returns information you can use to modify the plot. You can pull out the midpoints and the heights and use that information to put the labels where you want them. You can use the pos argument in text to specify where the label should be in relation to the point (thanks @rawr).
x <- sample(1:10,1000,replace=T)
## Histogram
info <- hist(x, breaks = 0:10)
with(info, text(mids, counts, labels=counts, pos=1))
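If the values themselves should also appear directly under their bars, one possible sketch is to centre each bin on an integer and draw the x-axis at the bar midpoints. The half-integer breaks and the `set.seed` call are assumptions added here for illustration; they are not part of the answer above.

```r
set.seed(42)  # assumption: fixed seed so the example is reproducible
x <- sample(1:10, 1000, replace = TRUE)

# Breaks at 0.5, 1.5, ..., 10.5 give every integer its own bar,
# with the bar midpoint sitting exactly on the integer.
# xaxt = "n" suppresses the default x-axis so we can draw our own.
info <- hist(x, breaks = seq(0.5, 10.5, by = 1), xaxt = "n", labels = TRUE)

# Put each value under the centre of its bar
axis(1, at = info$mids, labels = info$mids)
```

Here `labels = TRUE` keeps the counts above the bars, while the custom axis places 1 to 10 under the bars rather than under the bin edges.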

Related

How to add frequency & percentage on the same histogram in R?

Consider the following data set.
x <- c(2,2,2,4,4,4,4,5,5,7,7,8,8,9,10,10,1,1,0,2,3,3,5,6)
hist(x, nclass=10)
I want to have a histogram where the x-axis indicates the intervals & the y-axis on the left represents the frequency. In addition to this, I need another y-axis on the right side of the histogram representing the percentage of the intervals on the same plot. Even though the following graph is for two variables, more or less it looks like what I need (taken from Histogram of two variables in R).
Thanks in advance!
You can add a vertical axis to the right side with axis(4,at = at), where at are points at which tick-marks are to be drawn. If you want the density values of your histogram as tick-marks, call axis(4, at = hist(x,nclass=10)$density).
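As a rough sketch of how the right-hand axis could show percentages instead of densities, the tick positions can be computed in percent and then converted back to the count scale of the left axis. This assumes "percentage" means each interval's share of all observations; the wider right margin and the axis label text are choices made here for illustration.

```r
x <- c(2,2,2,4,4,4,4,5,5,7,7,8,8,9,10,10,1,1,0,2,3,3,5,6)

# Widen the right margin so the second axis and its label fit
op <- par(mar = c(5, 4, 4, 5) + 0.1)

h <- hist(x, nclass = 10, ylab = "Frequency")

# Choose round percentage values, then map them onto the count scale
pct <- pretty(c(0, 100 * max(h$counts) / sum(h$counts)))
axis(4, at = pct * sum(h$counts) / 100, labels = paste0(pct, "%"))
mtext("Percent", side = 4, line = 3)

par(op)  # restore the previous margins
```

Because both axes describe the same bars, only the tick labels differ; no second plot needs to be overlaid.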

Add data labels to spineplot in R

iFacColName <- "hireMonth"
iTargetColName <- "attrition"
iFacVector <- as.factor(c(1,1,1,1,10,1,1,1,12,9,9,1,10,12,1,9,5))
iTargetVector <- as.factor(c(1,1,0,1,1,0,0,1,1,0,1,0,1,1,1,1,1))
sp <- spineplot(iFacVector,iTargetVector,xlab=iFacColName,ylab=iTargetColName,main=paste0(iFacColName," vs. ",iTargetColName," Spineplot"))
spLabelPass <- sp[,2]/(sp[,1]+sp[,2])
spLabelFail <- 1-spLabelPass
text(seq_len(nrow(sp)),rep(.95,length(spLabelPass)),labels=as.character(spLabelPass),cex=.8)
For some reason, the text() function only plots one label far to the right of the graph. I have used this format to apply data labels to other types of graphs, so I am confused.
EDIT: added more code to make example work
You're not placing your labels inside the plotting region. It only extends to around 1.3 on the x axis. Try plotting something like
text(
cumsum(prop.table(table(iFacVector))),
rep(.95, length(spLabelPass)),
labels = as.character(round(spLabelPass, 1)),
cex = .8
)
and the labels will land inside the plotting region.
These are obviously not the right positions for the labels, but you should be able to figure that out by yourself. (You're going to have to subtract half of the frequency for each bar from the cumulative frequency and account for the fact that the bars are padded with some amount of whitespace.)
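A minimal sketch of that calculation follows. It assumes the gap between bars is the default 2% that `?spineplot` documents for the `off` argument, and that each bar's width equals its level's share of the observations; the names `off`, `widths`, `left` and `mids` are introduced here for illustration only. If the labels look shifted, the gap value is the first thing to check.

```r
iFacVector <- as.factor(c(1,1,1,1,10,1,1,1,12,9,9,1,10,12,1,9,5))
iTargetVector <- as.factor(c(1,1,0,1,1,0,0,1,1,0,1,0,1,1,1,1,1))

# spineplot invisibly returns the underlying contingency table
sp <- spineplot(iFacVector, iTargetVector)

off <- 0.02                          # assumed default gap between bars (2%)
widths <- prop.table(rowSums(sp))    # bar widths as shares of all observations

# Left edge = sum of the widths of all earlier bars plus one gap per earlier bar
left <- cumsum(c(0, head(widths, -1))) + off * (seq_along(widths) - 1)
mids <- left + widths / 2            # horizontal centre of each bar

spLabelPass <- sp[, 2] / rowSums(sp)
text(mids, 0.95, labels = round(spLabelPass, 2), cex = 0.8)
```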

How to superimpose a histogram on each panel

I would like to superimpose, on each lattice histogram panel, an additional histogram (which will be the same in each panel). I want the overlaid histogram to have solid borders but empty fill (col), to allow comparison with the underlying histograms.
That is, the end result will be a series of panels, each with a different colored histogram, and each with the same extra outline histogram on top of the colored histogram.
Here's something that I tried, but it just produces empty panels:
foo.df <- data.frame(x=rnorm(40), categ=c(rep("A", 20), rep("B", 20)))
bar.df <- data.frame(x=rnorm(20))
histogram(~ x | categ, data=foo.df,
panel=function(...){histogram(...);
histogram(~ x, data=bar.df, col=NULL)})
(My guess is that I need to use panel.superpose, but this function is somewhat confusing. Sarkar's book doesn't explain how to use it, and the R help page has no examples. I'm finding it difficult to make sense of the panel.superpose help page without already having a basic understanding. There are a very small number of examples that I've found on the web, but I have been unable to figure out what aspects of those examples apply to my case. This answer is surely relevant, but I don't understand its use of panel.groups, and the example overlays three different groups from a single dataframe, whereas I want to repeatedly overlay the same data on multiple panels that also have different data .)
I continued working on this problem, and came up with an answer. I had been on the right track but got several crucial details wrong. Comments in the code below spell out important points.
library(lattice)
# Main data, which will be displayed as solid histograms, different in each panel:
foo.df <- data.frame(y=rnorm(40), cat=c(rep("A", 20), rep("B", 20)))
# Comparison data: This will be displayed as an outline histogram in each panel:
bar.df <- data.frame(y=rnorm(30)-2)
# Define some vectors that we'll use in the histogram call.
# These have to be adjusted for the data by trial and error.
# Usually, panel.histogram will figure out reasonable default values for these.
# However, the two calls to panel.histogram below may figure out different values,
# producing pairs of histograms that aren't comparable.
bks <- seq(-5,3,0.5) # breaks that define the bar bins
yl <- c(0,50) # height of plot
# The key is to coordinate breaks in the two panel.histogram calls below.
# The first one inherits the breaks from the top-level call through '...' .
# Using "..." in the second call generates an error, so I specify parameters explicitly.
# It's not necessary to specify type="percent" at the top level, since that's the default,
# but it is necessary to specify it in the second panel.histogram call.
histogram(~ y | cat, data=foo.df, ylim=yl, breaks=bks, type="percent", border="cyan",
panel=function(...){panel.histogram(...)
panel.histogram(x=bar.df$y, col="transparent",
type="percent", breaks=bks)})
# col="transparent" is what makes the second set of bars into outlines.
# In the first set of bars, I set the border color to be the same as the value of col
# (cyan by default) rather than using border="transparent" because otherwise a filled
# bar with the same number of points as an outline bar will be slightly smaller.
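As a possible alternative to picking the breaks by trial and error, the shared breaks could be derived from the combined range of both data sets. This is only a sketch: the 0.5 bin width is carried over from the code above, the `set.seed` call is an assumption for reproducibility, and `ylim` is left at lattice's default, so it may still need adjusting if the outline bars are clipped.

```r
library(lattice)

set.seed(1)  # assumption: fixed seed for a reproducible example
foo.df <- data.frame(y = rnorm(40), cat = rep(c("A", "B"), each = 20))
bar.df <- data.frame(y = rnorm(30) - 2)

# One set of breaks covering both data sets, so both layers use identical bins
rng <- range(c(foo.df$y, bar.df$y))
bks <- seq(floor(rng[1]), ceiling(rng[2]), by = 0.5)

p <- histogram(~ y | cat, data = foo.df, breaks = bks, type = "percent",
               panel = function(...) {
                 panel.histogram(...)                      # filled bars, per panel
                 panel.histogram(x = bar.df$y, breaks = bks,
                                 type = "percent",
                                 col = "transparent")      # outline, same in each panel
               })
print(p)  # lattice objects are drawn when printed
```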

How to send parameter to Geom.histogram when using Geom.subplot_grid in Gadfly?

I am trying to plot several histograms for the same data set, but with different numbers of bins. I am using Gadfly.
Suppose x is just an array of real values, plotting each histogram works:
plot(x=x, Geom.histogram(bincount=10))
plot(x=x, Geom.histogram(bincount=20))
But I'm trying to put all the histograms together. I've added the number of bins as another dimension to my data set:
x2 = vcat(hcat(10*ones(length(x)), x), hcat(20*ones(length(x)), x))
df = DataFrame(Bins=x2[:,1], X=x2[:,2])
Is there any way to send the number of bins (the value from the first column) to Geom.histogram when using Geom.subplot_grid? Something like this:
plot(df, x="X", ygroup="Bins", Geom.subplot_grid(Geom.histogram(?)))
I think you would be better off not using subplot_grid at that point, and instead just combining the plots with vstack or hstack. From the docs:
Plots can also be stacked horizontally with ``hstack`` or vertically with
``vstack``. This allows more customization in regards to tick marks, axis
labeling, and other plot details than is available with ``subplot_grid``.

Change axis labels with matplot in R

I'm trying to change the x axis in a matplot, but this command doesn't work:
TimePoints=1997:2011
matplot(t(DataMatrix),type='l',col="black",lwd=1,xlab="Anni",ylab="Rifiuti",main="Produzione rifiuti")
axis(side=1,at=TimePoints,labels=TimePoints)
With plot I used this without problems. How can I fix it?
Here you can find the objects: https://dl.dropboxusercontent.com/u/47720440/SOF.RData
I usually do this as follows:
Omit the axes altogether.
Add the axes with desired options one by one.
In R:
# Add argument axes=F to omit the axes
matplot(t(DataMatrix),type='l',col="black",lwd=1,xlab="Anni",ylab="Rifiuti",main="Produzione rifiuti",axes=F)
# Add Y-axis as is
axis(2)
# Add X-axis
# Note that your X-axis range is not in years but in the "column numbers",
# i.e. the X-axis range runs from 1 to 15 (the number of columns in your matrix)
# Possibly that's why your original code example did not work as expected?
axis(side=1,at=1:ncol(DataMatrix),labels=TimePoints)
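Another option, sketched here with made-up data because the linked .RData file is not reproduced, is to pass the years to matplot as the x values so that the axis is already on the year scale. The `DataMatrix` below is a hypothetical stand-in (3 series by 15 years), and `lty = 1` is a choice made here so all series are drawn as solid lines.

```r
set.seed(1)  # assumption: fixed seed for a reproducible example
TimePoints <- 1997:2011

# Hypothetical stand-in for DataMatrix: one row per series, one column per year
DataMatrix <- matrix(runif(3 * length(TimePoints), 300, 600), nrow = 3)

# Supplying TimePoints as x puts the x-axis on the year scale;
# xaxt = "n" suppresses the default axis so every year can be labelled
matplot(TimePoints, t(DataMatrix), type = "l", col = "black", lwd = 1, lty = 1,
        xlab = "Anni", ylab = "Rifiuti", main = "Produzione rifiuti", xaxt = "n")
axis(side = 1, at = TimePoints, labels = TimePoints)
```

With this form the original `axis(side=1, at=TimePoints, ...)` call works unchanged, since the tick positions and the plotted x values now share the same units.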
