I want to draw a barplot with custom x-axis labels, but I keep running into the same problem: the axis scaling is off, so my labels end up in the wrong positions below the bars.
The simplest example I can think of:
x = c(1:81)
barplot(x)
axis(side=1,at=c(0,20,40,60,80),labels=c(20,40,60,80,100))
As you can see, the x-axis does not stretch along the whole plot but stops somewhere in between. The problem seems quite simple, but somehow I am not able to fix it and I could not find any solution so far :(
Any help is greatly appreciated.
The problem is that barplot is really designed for plotting categorical, not numeric data, and as such it pretty much does its own thing in terms of setting up the horizontal axis scale. The main way to get around this is to recover the actual x-positions of the bar midpoints by saving the results of barplot to a variable, but as you can see below I haven't come up with an elegant way of doing what you want in base graphics. Maybe someone else can do better.
x = c(1:81)
b <- barplot(x)
## axis(side=1,at=c(0,20,40,60,80),labels=c(20,40,60,80,100))
head(b)
You can see here that the actual midpoint locations are 0.7, 1.9, 3.1, ... -- not 1, 2, 3 ...
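Those midpoints follow from barplot's defaults (width = 1, space = 0.2). A minimal sketch, assuming those defaults are left unchanged, that checks the pattern without drawing anything:

```r
# Bar midpoints under barplot()'s defaults (width = 1, space = 0.2).
x <- 1:81
b <- barplot(x, plot = FALSE)   # plot = FALSE: return the midpoints, draw nothing

# Each bar occupies space + width = 1.2 units, and its midpoint sits
# half a bar width (0.5) before the bar's right edge.
expected <- 1.2 * seq_along(x) - 0.5

head(as.vector(b))              # as.vector() drops the one-column matrix shape
all.equal(as.vector(b), expected)
```

So bar i is centred at 1.2 * i - 0.5, which is why ticks placed at 20, 40, ... drift away from the bars they are meant to label.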
This is pretty quick, if you don't want to extend the axis from 0 to 100:
b <- barplot(x)
axis(side=1,at=b[c(20,40,60,80)],labels=seq(20,80,by=20))
This is my best shot at doing it in base graphics:
b <- barplot(x,xlim=c(0,120))
bdiff <- diff(b)[1]
axis(side=1,at=c(b[1]-bdiff,b[c(20,40,60,80)],b[81]+19*bdiff),
labels=seq(0,100,by=20))
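The same arithmetic can be wrapped in a small helper. This is only a sketch: bar_pos is a hypothetical name, not part of base R, and it assumes equally spaced bars (i.e. the default constant width and space).

```r
x <- 1:81
b <- as.vector(barplot(x, plot = FALSE))   # midpoints only, nothing drawn yet

# Hypothetical helper: map a bar index (possibly outside 1..length(mids))
# to the barplot x coordinate by extrapolating the constant spacing.
bar_pos <- function(i, mids) {
  step <- mids[2] - mids[1]
  mids[1] + (i - 1) * step
}

ticks <- seq(0, 100, by = 20)
at <- bar_pos(ticks, b)

barplot(x, xlim = range(at))
axis(side = 1, at = at, labels = ticks)
```

For indices that correspond to real bars, bar_pos(i, b) is just b[i]; the helper only matters for ticks such as 0 and 100 that lie beyond the data.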
You can try this, but the bars aren't as pretty:
plot(x,type="h",lwd=4,col="gray",xlim=c(0,100))
Or in ggplot:
library(ggplot2)
d <- data.frame(x=1:81)
ggplot(d,aes(x=x,y=x))+geom_bar(stat="identity",fill="lightblue",
colour="gray")+xlim(c(0,100))
Most statistical graphics nerds will tell you that graphing quantitative (x,y) data is better done with points or lines rather than bars (non-data-ink, Tufte, blah blah blah :-) )
Not sure exactly what you want, but if it is to have the labels running from one end to the other, evenly placed (but not necessarily accurately), then:
x = c(1:81)
bp <- barplot(x)
axis(side=1,at=bp[1+c(0,20,40,60,80)],labels=c(20,40,60,80,100))
The puzzle for me was why you wanted to label "20" at 0. But this is one way to do it.
I ran into the same annoying property of barplots: the x coordinates go wild. Another way to show the problem is to add more lines to the plot.
x = c(1:81)
barplot(x)
axis(side=1,at=c(0,20,40,60,80),labels=c(20,40,60,80,100))
lines(c(81,81), c(0, 100)) # this should cross the last bar, but it does not
The best I came up with was to define a new barplot function that also takes a parameter "at" for the plotting positions of the bars.
barplot_xscaled <- function(bar_heights, at = NULL, width = 0.5, col = 'grey'){
  # default positions are 1, 2, ..., n; is.null() avoids testing a whole vector in if()
  if ( is.null(at) ){
    at <- seq_along(bar_heights)
  }
  # 'width' is the half-width of each bar; pad xlim so the end bars are not clipped
  plot(at, bar_heights, type="n", xlab="", ylab="",
       ylim=c(0, max(bar_heights)), xlim=range(at) + c(-width, width), bty = 'n')
  # rect() is vectorised, so all bars can be drawn in one call
  rect(at - width, 0, at + width, bar_heights, col = col)
}
barplot_xscaled(x)
lines(c(81, 81), c(0, 100))
The lines command now crosses the last bar: the x scale works just as one would naively expect, and you can also place the bars at whatever positions you like (the function could be extended a bit to behave more like other R plotting functions).
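If writing a custom function feels like too much, a different technique (not the function above) is to remove the gaps between bars. A rough sketch, assuming the default width of 1: with space = 0, bar i spans i - 1 to i, so its midpoint is i - 0.5 and data positions only need a constant shift.

```r
x <- 1:81

# With space = 0 and the default width = 1, bar i spans [i - 1, i].
b <- barplot(x, space = 0, plot = FALSE)
all.equal(as.vector(b), seq_along(x) - 0.5)

barplot(x, space = 0)
ticks <- c(20, 40, 60, 80)
axis(side = 1, at = ticks - 0.5, labels = ticks)
lines(c(81, 81) - 0.5, c(0, 100))   # passes through the middle of the last bar
```

The trade-off is that the bars touch each other, which may or may not be acceptable for the plot in question.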
Related
As my title suggests, I am trying to create a 3D histogram using the plot3D package. The following is a minimal working (or rather not working) example of the problem I'm having:
library(plot3D)
x = runif(10000)/2
y=runif(10000)
cuts = c(0, 0.2, 0.4, 0.6, 0.8, 1)
x_cut = cut(x, cuts)
y_cut = cut(y, cuts)
xy_table = table(x_cut, y_cut)
hist3D(z=xy_table, ticktype = "detailed")
This produces the following image:
As you can observe in the image, the bins of the histogram extend outside of [0,1]x[0,1]. Is there any way I can force the bins to line up exactly with the ticks on the axis? That is, I would like the graph to correctly represent that all data points have x and y values between 0 and 1. Looking at the plot now, one could be led to believe that the bin containing the origin, for example, might also contain the data point (-0.1, 0). This cannot happen in the data I am trying to display and I need the axis to convey that.
I've spent all day fiddling with the various axis parameters and whatnot but cannot get it to work. For example if I try to plot things using the command
hist3D(z=xy_table, ticktype = "detailed", xlim=c(0,1), ylim=c(0,1))
Then I get something even worse:
I feel like I must be missing something obvious but I'm just not seeing what it is. If anyone has an answer please do share. And thank you for taking the time to read my question.
A 0 to 1 range is the default behaviour of hist3D if you don't define the x and y ranges.
You get the expected result if you define the x and y arguments using the middle of the bins (0.1, 0.3, 0.5, 0.7, 0.9):
hist3D(x = seq(0.1,0.9,0.2),y=seq(0.1,0.9,0.2),z=xy_table, ticktype = "detailed")
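Rather than typing the midpoints by hand, they can be derived from the cut points. A small sketch, assuming the same cuts vector as in the question; the seed is arbitrary and the plotting call is skipped if plot3D is not installed:

```r
cuts <- c(0, 0.2, 0.4, 0.6, 0.8, 1)
mids <- head(cuts, -1) + diff(cuts) / 2   # left edge of each bin plus half its width

set.seed(1)                               # arbitrary seed, for reproducibility only
x <- runif(10000) / 2
y <- runif(10000)
xy_table <- table(cut(x, cuts), cut(y, cuts))

# Same call as in the answer above, but with the computed midpoints
if (requireNamespace("plot3D", quietly = TRUE)) {
  plot3D::hist3D(x = mids, y = mids, z = xy_table, ticktype = "detailed")
}
```

This also keeps working if the bins are later changed to unequal widths, since each midpoint is computed from its own bin edges.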
I have data that is mostly centered in a small range (1-10), but a significant number of points (say, 10%) are in (10-1000). I would like to plot a histogram for this data that focuses on (1-10) but also shows the (10-1000) data. Something like a log scale for the histogram.
Yes, I know this means not all bins are of equal size.
A simple hist(x) gives
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000)) gives
none of which is what I want.
update
following the answers here I now produce something that is almost exactly what I want (I went with a continuous plot instead of bar-histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)
The only problem is that I'd like the scale to match the actual bars plotted. There are two options for doing that: one is simply to use the actual margins of the plotted bars (how?) and accept "ugly" x-axis labels like 1.1754, 1.2985 etc. The other, which I prefer, is to control the actual bin margins used so that they match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to the workaround above.
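If the bin edges themselves should line up with the axis labels, the breaks can be fixed on the log scale and reused for the ticks. A minimal sketch in base graphics, assuming half-decade bins are acceptable; the seed and the bin width are arbitrary choices:

```r
set.seed(42)                                  # arbitrary seed
x <- rlnorm(100, sdlog = 3)                   # same kind of data as dfr$x above

lx <- log10(x)
# Break points every half decade, wide enough to cover all the data
log_breaks <- seq(floor(min(lx)), ceiling(max(lx)), by = 0.5)

h <- hist(lx, breaks = log_breaks, plot = FALSE)
plot(h, axes = FALSE, xlab = "x", main = "Histogram of x (log scale)")
axis(side = 2)
# Ticks sit exactly on the bin edges; labels are back-transformed to the data scale
axis(side = 1, at = log_breaks, labels = signif(10^log_breaks, 2))
```

Because the same log_breaks vector drives both hist() and axis(), every tick coincides with a bar boundary, at the cost of labels such as 3.2 for the half-decade edges.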
Using ggplot2 seems like the easiest option. If you want more control over your axes and your breaks, you can do something like the following:
EDIT : new code provided
x <- c(rexp(1000,0.5)+0.5,rexp(100,0.5)*100)
breaks<- c(0,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000)
major <- c(0.1,1,10,100,1000,10000)
H <- hist(log10(x),plot=F)
plot(H$mids,H$counts,type="n",
xaxt="n",
xlab="X",ylab="Counts",
main="Histogram of X",
bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=T,freq=T,col="blue")
#Position of ticks
at <- log10(breaks)
#Creation X axis
axis(1,at=at,labels=10^at)
This is as close as I can get to the ggplot2. Putting the background grey is not that straightforward, but doable if you define a rectangle with the size of your plot screen and put the background as grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
A dynamic graph would also help in this plot. Use the manipulate package from Rstudio to do a dynamic ranged histogram:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you will be able to use sliders to see the particular distribution in a dynamically selected range like this:
How can I plot a degree graph like that?
The picture is only indicative, the result may not be identical to the image.
The important thing is that on the X axis there are labels of the nodes and on the Y axis the degree of each node.
Then the degree can be represented as a histogram (figure), with points, etc., it is not important.
This is what I tried to do and did not come close to what I want:
d = degree(net, mode="all")
hist(d)
or
t = table(degree(net))
plot(t, xlim=c(1,77), ylim=c(0, 40), xlab="Degree", ylab="Frequency")
I think it is a trivial thing but it's the first time I use R.
Thank you
This is what I have now:
I would like a graph that was more readable (I have 77 bars). That is, with more space between the bars and between the labels.
My aim is to show that one node (Valjean) has a higher value than the others; I don't know if I am using the right kind of graphic.
You can just use a barplot and specify the row names. For example,
net = rgraph(10)
rownames(net) = colnames(net) = LETTERS[1:10]
d = degree(net)
barplot(d, names.arg = rownames(net))
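With many bars, readability usually improves if the bars are sorted and the labels are rotated. A rough sketch using a small named vector; the degree values below are hypothetical and only stand in for the output of degree(net):

```r
# Hypothetical degree values, for illustration only (not real network output)
d <- c(Valjean = 36, Gavroche = 22, Marius = 19, Javert = 17, Cosette = 11, Myriel = 10)

# Largest degree first, so the dominant node stands out at the left
d_sorted <- sort(d, decreasing = TRUE)

op <- par(mar = c(7, 4, 2, 1))      # extra bottom margin for the rotated labels
barplot(d_sorted, las = 2,          # las = 2: labels perpendicular to the axis
        cex.names = 0.8,            # slightly smaller label text
        space = 0.5,                # wider gaps between bars
        ylab = "Degree")
par(op)                             # restore the previous margins
```

For 77 nodes, a smaller cex.names or horiz = TRUE may be needed; which works best depends on the device size.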
I'm trying to make an "opposing stacked bar chart" and have found pyramid.plot from the plotrix package seems to do the job. (I appreciate ggplot2 will be the go-to solution for some of you, but I'm hoping to stick with base graphics on this one.)
Unfortunately it seems to do an odd thing with the x axis, when I try to set the limits to non integer values. If I let it define the limits automatically, they are integers and in my case that just leaves too much white space. But defining them as xlim=c(1.5,1.5) produces the odd result below.
If I understand correctly from the documentation, there is no way to pass on additional graphical parameters to e.g. suppress the axis and add it on later, or let alone define the tick points etc. Is there a way to make it more flexible?
Here is a minimal working example used to produce the plot below.
require(plotrix)
set.seed(42)
pyramid.plot(cbind(runif(7,0,1),
rep(0,7),
rep(0,7)),
cbind(rep(0,7),
runif(7,0,1),
runif(7,0,1)),
top.labels=NULL,
gap=0,
labels=rep("",7),
xlim=c(1.5,1.5))
Just in case it is of interest to anyone else, I'm not doing a population pyramid, but rather attempting a stacked bar chart with some of the values negative. The code above includes a 'trick' I use to make it possible to have a different number of sets of bars on each side, namely adding empty columns to the matrix, hopefully someone will find that useful - so sorry the working example is not as minimal as it could have been!
Setting the x axis labels using laxlab and raxlab creates a continuous axis:
pyramid.plot(cbind(runif(7,0,1),
rep(0,7),
rep(0,7)),
cbind(rep(0,7),
runif(7,0,1),
runif(7,0,1)),
top.labels=NULL,
gap=0,
labels=rep("",7),
xlim=c(1.5,1.5),
laxlab = seq(from = 0, to = 1.5, by = 0.5),
raxlab=seq(from = 0, to = 1.5, by = 0.5))
I have generated the following histogram in R:
I generated it using this hist() call:
hist(x[,1], xlab='t* (Transition Statistic)',
ylab='Proportion of Resamples (n = 10,000)',
main='Distribution of Resamples', col='lightblue',
prob=TRUE, ylim=c(0.00,0.05),xlim=c(1725,max(x[,1])+10))
Plus the following abline():
abline(v=1728,col=4,lty=1,lwd=2)
That vertical line indicates the actual location of a test statistic, which I am comparing to the results of permutation samples.
My question is this: as you can see, the x scale does not extend back to the vertical line. I would really like it to do so, because I think it looks odd otherwise. How can I make this happen?
I have already tried the xaxs="i" parameter, which has no effect. I have also tried making my own axis with axis() but this requires making both axes again from scratch, and the results don't look that great to me. So, I suspect there must be an easier way to do this. Is there? And, if not, can anyone suggest what axis() command might work well, assuming I want everything to look basically the same, but with the longer x scale?
The usual R plot draws a frame around the plot. To add this, do:
box()
after the plot.
If that isn't what you want, you need to suppress axis plotting and then add your own later.
hist(...., axes = FALSE) ## .... is where your other args go
axis(side = 2)
axis(side = 1, at = seq(1730, 1830, by = 20))
That won't go quite to the vertical line but may be close enough. If you want a tick at the vertical line, choose different tick marks, e.g.
axis(side = 1, at = seq(1725, 1835, by = 20))
Since R is using gaps of 20 for the x-axis here, you can get the extension you want by using 1720 rather than 1725 for the lower limit, i.e. with xlim=c(1720,max(x[,1])+10).
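The lower limit can also be computed instead of hard-coded. A sketch under stated assumptions: resamples is simulated stand-in data (the real x[,1] is not available here), and the tick spacing of 20 is taken from the plot described in the question; R may choose a different spacing for other data.

```r
set.seed(1)                                       # arbitrary seed
resamples <- rnorm(10000, mean = 1790, sd = 15)   # hypothetical stand-in for x[, 1]
stat <- 1728                                      # observed statistic from the question
step <- 20                                        # assumed tick spacing

# Round the statistic down to the nearest multiple of the tick spacing
lower <- floor(stat / step) * step

hist(resamples, prob = TRUE, col = "lightblue",
     xlim = c(lower, max(resamples) + 10))
abline(v = stat, col = 4, lty = 1, lwd = 2)
```

Here lower works out to 1720, matching the value suggested above, so the axis has a tick position at or before the vertical line as long as R keeps the 20-unit ticks.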