Adding multiple categorical labels to a bar chart in R

Say I have some data on some experiment I conducted on Earth and on Wayne's World. There are control and treatment means:
means1<-c(1,2)
means2<-c(1.5,2.5)
data<-cbind(means1,means2)
rownames(data)=c('ctrl','treatment')
colnames(data)=c('Earth','Waynes World')
I would like to plot this data, so I do.
barplot(data,beside=T)
This generates paired control and treatment bars, separated by planet. Each pair of bars has an x axis label specifying what planet they are from. What I would like is a second set of x-axis labels underneath each bar that specifies ctrl or treatment. Bonus if you tilt this second set of labels, they don't overlap the first labels, and everything looks pretty.

I think something like this describes what you're after
bp<-barplot(data,beside=T, xaxt="n")
mtext(text=rownames(data)[row(bp)], at=bp, line=1, side=1)
mtext(text=colnames(data), at=colMeans(bp), line=2.2, side=1)
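For the "bonus" part of the question (tilted labels that do not collide with the planet names), one possible sketch is below. It is not part of the answer above: it swaps mtext() for text() with srt = 45, suppresses the default names with barplot's axisnames = FALSE, and the margin size, the 3% offset and the line = 4 placement are guesses that may need tuning for other data.

```r
# Rebuild the example data from the question
means1 <- c(1, 2)
means2 <- c(1.5, 2.5)
data <- cbind(means1, means2)
rownames(data) <- c("ctrl", "treatment")
colnames(data) <- c("Earth", "Waynes World")

op <- par(mar = c(6, 4, 2, 1))                 # assumed: extra bottom margin for two label rows
bp <- barplot(data, beside = TRUE, axisnames = FALSE)

# One label per bar, in the same (column-major) order as the bar midpoints in bp
bar_labels <- rownames(data)[row(bp)]

usr <- par("usr")                              # plot region: c(x1, x2, y1, y2)
y_off <- usr[3] - 0.03 * diff(usr[3:4])        # assumed offset: just below the x axis

# Tilted per-bar labels; xpd = TRUE allows drawing in the margin
text(x = bp, y = y_off, labels = bar_labels, srt = 45, adj = 1, xpd = TRUE)

# Planet names centred under each pair of bars, below the tilted labels
mtext(colnames(data), side = 1, at = colMeans(bp), line = 4)
par(op)
```

Because bp is a 2 x 2 matrix of bar midpoints when beside = TRUE, colMeans(bp) gives the centre of each pair.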

Related

How to add frequency & percentage on the same histogram in R?

Consider the following data set.
x <- c(2,2,2,4,4,4,4,5,5,7,7,8,8,9,10,10,1,1,0,2,3,3,5,6)
hist(x, nclass=10)
I want to have a histogram where the x-axis indicates the intervals & the y-axis on the left represents the frequency. In addition to this, I need another y-axis on the right side of the histogram representing the percentage of the intervals on the same plot. Even though the following graph is for two variables, more or less it looks like what I need (taken from Histogram of two variables in R).
Thanks in advance!
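You can add a vertical axis to the right side with axis(4, at = at), where at gives the points at which tick marks are to be drawn. If you want the density values of your histogram as tick marks, call axis(4, at = hist(x,nclass=10)$density). Note that at is interpreted in the coordinates of the existing left-hand axis, so on a frequency histogram the density or percentage values have to be converted to counts for the tick positions and supplied through the labels argument.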
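A minimal sketch of a right-hand percentage axis on a frequency histogram, assuming "percentage" means the share of observations falling in each interval. The tick positions are computed in counts (the units of the left axis) and only the labels show percentages; the margin values and the axis title are illustrative choices.

```r
x <- c(2,2,2,4,4,4,4,5,5,7,7,8,8,9,10,10,1,1,0,2,3,3,5,6)

op <- par(mar = c(5, 4, 4, 4) + 0.1)      # assumed: extra room on the right for the second axis
h <- hist(x, nclass = 10)                 # left axis shows frequency (counts)
n <- sum(h$counts)                        # total number of observations

# Round percentage values covering the observed bar heights
pct <- pretty(100 * h$counts / n)

# Place ticks at the equivalent counts, label them as percentages
axis(4, at = pct / 100 * n, labels = paste0(pct, "%"))
mtext("Percentage", side = 4, line = 3)
par(op)
```

Ticks that fall outside the plotted range are simply not drawn.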

Adding text to a boxplot that does not have a numbered axis

I am interested in labeling my boxplot with the letter A in the top left corner, but because I have a categorical X axis comparing seasons (summer vs winter), I am unable to give coordinates for my added text. How do you add text to a boxplot with a categorical axis?
This is what I've tried, which doesn't work:
boxplot(LogTHg~Season, data = HgSIS, xlab= "Season", ylab= "LogTHg", text ("topleft", "A"))
Three points:
As Ben writes, you can add a legend with x="topleft". The inset parameter allows you to separate it from the top and left boundaries.
If you call boxplot() with a formula object that has a factor-like right hand side, then R will put the first boxplot over the horizontal coordinate 1, the second over 2 and so forth. That still doesn't tell you the exact coordinates of the top left corner of the plotting region, but it does let you do a couple of things, like putting labels above each separate boxplot.
Relatedly, you can use the xlim parameter for boxplot() to control the horizontal spacing. For instance, if you use xlim=c(0,3), then you know that you can put something at horizontal coordinate 0. And the same with ylim.
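The three points can be sketched as follows. The HgSIS data frame here is simulated, since the real data is not shown in the question, and the "n = 20" labels, the ylim values and the inset are only illustrative.

```r
# Hypothetical stand-in for the asker's HgSIS data
set.seed(1)
HgSIS <- data.frame(
  Season = factor(rep(c("summer", "winter"), each = 20)),
  LogTHg = rnorm(40)
)

# Fix ylim so there is known free space at the top of the plot
boxplot(LogTHg ~ Season, data = HgSIS, xlab = "Season", ylab = "LogTHg",
        ylim = c(-3, 4))

# Point 1: a borderless legend places the letter in the top left corner
legend("topleft", legend = "A", bty = "n", inset = 0.02)

# Point 2: the boxes sit at x = 1 and x = 2, so text() can label each one
text(x = 1:2, y = 3.5, labels = c("n = 20", "n = 20"))

# The exact corner coordinates are available from par("usr") if needed
usr <- par("usr")
```

par("usr") returns the plot region as c(x1, x2, y1, y2), so usr[1] and usr[4] give the top left corner in user coordinates.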

Making barplot axis more readable: how to place ticks at some positions on the X axis so they correspond to specific bars for a large vector

Say I have a fairly long vector which I want to present as a barplot:
myvec<-runif(2000,0,1)
barplot(myvec, col="grey", border=NA, names.arg = seq(1:2000))
I would like the x axis to look pretty:
to have, say, 4-5 labels shown on the axis, but with ticks that show which specific bar each label corresponds to.
Worst case, I can live with whatever labels are picked automatically, but I want to see which bars they correspond to.
Thanks
You can try this:
library(ggplot2)
myvec<-data.frame(v1=runif(2000,0,1))
# seq_along(v1) is the bar index; dplyr::row_number(v1) would rank the values and reorder the bars
ggplot(myvec,aes(x=seq_along(v1),y=v1))+geom_bar(stat='identity')
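A base-graphics alternative, sketched under the assumption that five evenly spaced labels are wanted: barplot() returns the midpoint of every bar, so those midpoints can be used as tick positions. The idx vector is an arbitrary choice of bars to label.

```r
set.seed(42)
myvec <- runif(2000, 0, 1)

# barplot() invisibly returns the x midpoint of each bar
bp <- barplot(myvec, col = "grey", border = NA)

# Bars to label (assumed choice); ticks land exactly on these bars
idx <- c(1, 500, 1000, 1500, 2000)
axis(1, at = bp[idx], labels = idx)
```

The midpoints are not 1, 2, 3, ... because of the default spacing between bars, which is why axis(1, at = idx) alone would put the ticks in the wrong place.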

How to represent datapoints that are out of scale in R

I am trying to plot a set of data in R
x <- c(1,4,5,3,2,25)
My y scale is fixed at 20, so the last data point is effectively not visible on the plot when I execute the following code:
plot(x, ylim=c(0,20), type='l')
I want to show the range of the outlying data point in a smaller box above the plot, with an independent y scale, representing only this last data point.
Is there a package for this, or some other way to approach the problem?
You may try axis.break (plotrix package) http://rss.acs.unt.edu/Rdoc/library/plotrix/html/axis.break.html, with which you can define the axis to break, the style, size and color of the break marker.
The potential disadvantage of this approach is that a broken axis can distort the perception of the trend. Good luck!
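If a separate box above the plot is preferred over a broken axis, a rough base-graphics sketch with layout() could look like this. It is an alternative to axis.break, not what the answer above proposes, and the threshold of 20, the panel heights and the margins are all assumptions.

```r
x <- c(1, 4, 5, 3, 2, 25)
out <- x > 20                              # assumed rule: anything above the fixed y limit

op <- par(no.readonly = TRUE)
layout(matrix(1:2, ncol = 1), heights = c(1, 3))   # small panel on top, main panel below

# Top panel: only the out-of-range point(s), on their own y scale
par(mar = c(0.5, 4, 2, 1))
plot(which(out), x[out], xlim = c(1, length(x)),
     ylim = range(x[out]) + c(-1, 1),
     xaxt = "n", xlab = "", ylab = "outlier", pch = 19)

# Bottom panel: the original plot with the fixed y scale
par(mar = c(4, 4, 0.5, 1))
plot(x, ylim = c(0, 20), type = "l")

layout(1)
par(op)
```

Both panels share the same x limits so the outlier lines up with its position in the series.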

Axis-labeling in R histogram and density plots; multiple overlays of density plots

I have two related problems.
Problem 1: I'm currently using the code below to generate a histogram overlaid with a density plot:
hist(x,prob=T,col="gray")
axis(side=1, at=seq(0,100, 20), labels=seq(0,100,20))
lines(density(x))
I've pasted the data (i.e. x above) here.
I have two issues with the code as it stands:
the last tick and label (100) of the x-axis does not appear on the histogram/plot. How can I put these on?
I'd like the y-axis to be of count or frequency rather than density, but I'd like to retain the density plot as an overlay on the histogram. How can I do this?
Problem 2: using a similar solution to problem 1, I now want to overlay three density plots (not histograms), again with frequency on the y-axis instead of density. The three data sets are at:
http://pastebin.com/z5X7yTLS
http://pastebin.com/Qg8mHg6D
http://pastebin.com/aqfC42fL
Here's an answer to your first two questions:
myhist <- hist(x,prob=FALSE,col="gray",xlim=c(0,100))
dens <- density(x)
axis(side=1, at=seq(0,100, 20), labels=seq(0,100,20))
lines(dens$x,dens$y*(1/sum(myhist$density))*length(x))
The histogram has a bin width of 5, which is also equal to 1/sum(myhist$density), whereas density(x)$x advances in small, even steps (512 of them), around .2 in your case. sum(density(x)$y) is some strange number, definitely not 1, but only because of those small steps: multiplied by the x step it is approximately 1, i.e. sum(density(x)$y)/(1/diff(density(x)$x)[1]). You don't need to apply that correction when drawing the line, because the y values are already matched up with their own odd x values. So scale 1) by the bin width of hist() and 2) by the number of observations, length(x), as DWin says. The last axis tick became visible after setting the xlim argument.
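The scaling can be checked on made-up data. The sketch below uses a simulated x (the asker's data is not reproduced here) and fixes the breaks at width 5 so that the bin width is known; it confirms that 1/sum(myhist$density) equals the bin width and that the density integrates to roughly 1.

```r
# Hypothetical data on a 0-100 scale standing in for the asker's x
set.seed(1)
x <- c(rnorm(150, 30, 8), rnorm(100, 70, 10))
x <- x[x >= 0 & x <= 100]

myhist <- hist(x, breaks = seq(0, 100, 5), col = "gray", xlim = c(0, 100))
dens <- density(x)

binwidth <- diff(myhist$breaks)[1]         # 5, same as 1 / sum(myhist$density)
dx <- diff(dens$x)[1]                      # step between density evaluation points

# Density -> expected count per bin: multiply by bin width and sample size
lines(dens$x, dens$y * binwidth * length(x))

area <- sum(dens$y) * dx                   # approximately 1
```

With unequal bin widths a single scaling factor would no longer apply, so this only works for equally spaced breaks.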
For your problem 2, set up a plot with the correct dimensions (xlim and ylim) and type = "n", then draw 3 lines for the densities, each scaled in the same way as the density line above. Think, however, about whether you want those semi-continuous lines to reflect the heights of imaginary bars with bin width 5. Do you see how that might make the density lines exaggerate the counts at any particular point?
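A sketch of that recipe with three simulated data sets (the pastebin data is not reproduced here). The bin width of 5 is carried over from problem 1 as an assumption, and the colours, names and axis labels are arbitrary.

```r
# Hypothetical stand-ins for the three data sets
set.seed(2)
sets <- list(a = rnorm(200, 30, 8), b = rnorm(120, 50, 10), c = rnorm(80, 70, 6))
binwidth <- 5                               # assumed imaginary bin width

# Scale each density to "count per bin of width 5"
scaled <- lapply(sets, function(v) {
  d <- density(v)
  list(x = d$x, y = d$y * binwidth * length(v))
})

# Plot limits that cover all three curves
xlim <- range(unlist(lapply(scaled, `[[`, "x")))
ylim <- c(0, max(unlist(lapply(scaled, `[[`, "y"))))

# Empty plot with the right dimensions, then one line per data set
plot(NA, xlim = xlim, ylim = ylim, type = "n",
     xlab = "x", ylab = "Approximate count per bin of width 5")
cols <- c("black", "red", "blue")
for (i in seq_along(scaled)) lines(scaled[[i]]$x, scaled[[i]]$y, col = cols[i])
legend("topright", legend = names(sets), col = cols, lty = 1, bty = "n")
```

The y axis only has meaning relative to the chosen bin width, which is the caveat raised in the paragraph above.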
Although this is an old thread, in case anyone comes across it: whether it is a 'good idea' to forgo translating the y density to a count scale depends on what the user is attempting to do.
There are perfectly good reasons for using frequency as the y value. One that comes to mind in particular is that counts on the y scale can give an analyst a good idea of where to begin the 'data hunt' for stratifying heterogeneous data, if a mixed distribution model cannot soundly or intuitively be applied.
In practice, overlaying a density estimate over the observed histogram can be very useful in data quality checks. For example, in the above, if I were looking at the above graphic as a single source of data with the assumption that it describes "1 thing" and I wish to model this as "1 thing", I have an issue. That is, I have heterogeneous data which may require some level of stratification. The density overlay then becomes a simple visual tool for detecting heterogeneity (apart from using log transformations to smooth between-interval variation), and a direction (locations of the mixed distributions) for stratifying the data.
