Set categorical axis labels with scales "free" ggplot2 - r

I am trying to set the labels on a categorical axis within a faceted plot using the ggplot2 package (1.0.1) in R (3.1.1) with scales="free". If I plot without manually setting the axis tick labels they appear correctly (first plot), but when I try to set the labels (second plot) only the first n labels are used on both facets (not in sequence as with the original labels).
Here is a reproducible code snippet exemplifying the problem:
foo <- data.frame(yVal=factor(letters[1:8]), xVal=factor(rep(1:4,2)), fillVal=rnorm(8), facetVar=rep(1:2,each=4))
## axis labels are correct
p <- ggplot(foo) + geom_tile(aes(x=xVal, y=yVal, fill=fillVal)) + facet_grid(facetVar ~ ., scales='free')
print(p)
## axis labels are not set correctly
p <- p + scale_y_discrete(labels=c('a','a','b','b','c','d','d','d'))
print(p)
I note that I cannot set the labels correctly within the data.frame, as they are not unique. I am also aware that I can accomplish this with grid.arrange, but that requires "manually" aligning the plots if the labels differ in length, etc. Additionally, I would like to have the facet labels included in the plot, which is not an available option with the grid.arrange solution. I haven't tried viewports yet. Maybe that is the solution, but I was hoping for more of a faceted look to this plot, and viewports seem closer to grid.arrange.
It seems to me as though this is a bug, but I am open to an explanation as to how this might be a "feature". I also hope that there might be a simple solution to this problem that I have not thought of yet!

The easiest method would be to create another column in your data set with the right conversion. This would also be easier to audit and manipulate. If you insist on changing manually:
You cannot simply set the labels directly, as it recycles (I think) the label vector for each facet. Instead, you need to set up a conversion using corresponding breaks and labels:
p <- p + scale_y_discrete(labels = c('1','2','3','4','5','6','7','8'), breaks=c('a','b','c','d','e','f','g','h'))
print(p)
Any y axis value of a will now be replaced with 1, b with 2 and so on. You can play around with the label values to see what I mean. Just make sure that every factor value you have is also represented in the breaks argument.
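Applied to the data from the question, that mapping might look like the minimal sketch below. The `lookup` vector is a name introduced here for illustration, the repeated labels are the ones the asker wanted, and `set.seed` is only there to make the random fill values reproducible.

```r
library(ggplot2)

set.seed(1)  # only so the random fill values are reproducible
foo <- data.frame(yVal = factor(letters[1:8]),
                  xVal = factor(rep(1:4, 2)),
                  fillVal = rnorm(8),
                  facetVar = rep(1:2, each = 4))

# hypothetical lookup: names are the factor levels, values are the labels to show
lookup <- c(a = "a", b = "a", c = "b", d = "b",
            e = "c", f = "d", g = "d", h = "d")

p <- ggplot(foo) +
  geom_tile(aes(x = xVal, y = yVal, fill = fillVal)) +
  facet_grid(facetVar ~ ., scales = "free") +
  # pair every level (break) with its label explicitly, so nothing is recycled
  scale_y_discrete(breaks = names(lookup), labels = unname(lookup))
print(p)
```

Because each break is tied to its own label, the labels should no longer depend on which levels happen to appear in a given facet.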

I think I may actually have a solution to this. My problem was that my labels were not correct because as someone above has said - it seems like the label vector is recycled through. This line of code gave me incorrect labels.
ggplot(dat, aes(x, y))+geom_col()+facet_grid(d ~ t, switch = "y", scales = "free_x")+ylab(NULL)+ylim(0,10)+geom_text(aes(label = x))
However when the geom_text was moved prior to the facet_grid, the below code gave me correct labels.
ggplot(dat, aes(x, y))+geom_col()+geom_text(aes(label = x))+facet_grid(d ~ t, switch = "y", scales = "free_x")+ylab(NULL)+ylim(0,10)
There's a good chance I may have misunderstood the problem above, but I certainly solved my problem so hopefully this is helpful to someone!

Related

How to create colorbars in ggplot similar to those created by Lattice

I want colourbars created with ggplot to be similar to what spplot function (from lattice package) creates. Something like the attached image with each finite number of colours being assigned to rectangular blocks, instead of creating a continuous spectrum of colours. I need to be able to define the outline colour of the colourbar and also the format of the ticks.
I put this simple example together. How can I change this into something similar to this attached image? For example, I want the legend to start from -3 and end at 3 with 10 blocks of colours. I already tried 'nbin' in the function 'guides'. But I need the labels to be put at the 'edges' of the colour blocks instead of at the middle of them (i.e. centre of the bins).
PS: Sometimes ggplot also places labels beyond the ends of the colourbar!
library(ggplot2)
dat <- data.frame(x = rnorm(100), y = rnorm(100), col=rnorm(100))
ggplot(dat, aes(x,y,color=col)) +
geom_point() +
scale_color_gradient2(limits=c(-3,3), midpoint=0) +
guides(color=guide_colourbar(nbin=10, raster=FALSE))
I think what you ask for is not possible using the latest (public) version of ggplot2.
Ugly method, do at your own discretion
However, if you install the development version (this led to some version conflicts with other packages on my machine and I guess some things are not fully working yet) using
devtools::install_github("tidyverse/ggplot2")
library(ggplot2)
You will get some more options to modify guides, such as ticks.colour, frame.colour or frame.linewidth, which let you customize the colorbar according to your requirements:
set.seed(6)
dat <- data.frame(x = rnorm(100), y = rnorm(100), z=rnorm(100))
ggplot(dat, aes(x,y,color=z)) + geom_point() +
scale_color_gradientn(colours=c("blue","gray80","red"), limits=c(-3,3),
breaks=c(-3/9*8,-3/9*4,0,3/9*4,3/9*8), labels=c(-2.4,-1.2,0,1.2,2.4), na.value = "green",
guide=guide_colorbar(nbin=10, raster=F, barwidth=20, frame.colour=c("black"),
frame.linewidth=1, ticks.colour="black", direction="horizontal")) +
theme(legend.position = "bottom")
Use colours = c() to specify a vector of colors
Use breaks together with labels to manually assign labels at the correct positions along the colorbar. EDIT: We can easily compute the required position along the colorbar by dividing 3 (the length of one half along the colorbar) by 9 (there are 9 half-boxes from the middle of the bar to the centre of the first box) and multiplying that by the number of half-boxes where we want the label to appear.
Values outside of limits will be colored according to na.value
You could additionally specify name = "Your Variable Name" to replace the z next to the colorbar
I see no way to put -3 / 3 at the very ends of the color bar, other than manually placing a text element at the correct position in the plot (which I would strongly advise against).
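The break arithmetic described above can be checked with a few lines of base R. This is only a sketch: it assumes, as the half-box reasoning implies, that the 10 colour boxes are centred on 10 evenly spaced values running from -3 to 3. The names `centres`, `half_box`, `edges` and `label_at` are introduced here for illustration.

```r
nbin   <- 10
limits <- c(-3, 3)

# assumed box centres: nbin evenly spaced values spanning the limits
centres <- seq(limits[1], limits[2], length.out = nbin)

# half the distance between neighbouring centres, i.e. 3 / 9
half_box <- diff(centres)[1] / 2

# boundaries between neighbouring boxes
edges <- head(centres, -1) + half_box

# the positions used as `breaks` in the answer: 8, 4, 0, 4, 8 half-boxes from the middle
label_at <- c(-8, -4, 0, 4, 8) * half_box

print(edges)
print(label_at)
```

Under that assumption every value in `label_at` coincides with one of the `edges`, which is why the tick marks fall between colour blocks rather than in the middle of them.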

Axis breaks in ggplot histogram in R [duplicate]

I have data that is mostly concentrated in a small range (1-10), but a significant number of points (say, 10%) lie in (10-1000). I would like to plot a histogram for this data that focuses on (1-10) but also shows the (10-1000) data. Something like a log scale for the histogram.
Yes, I know this means not all bins are of equal size.
A simple hist(x) gives one plot (image not shown),
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000)) gives another (image not shown),
neither of which is what I want.
update
following the answers here I now produce something that is almost exactly what I want (I went with a continuous plot instead of bar-histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)
The only problem is that I'd like the scale to match the bars actually plotted. There are two options for doing that: one is to simply use the actual margins of the plotted bars (how?) and accept "ugly" x-axis labels like 1.1754, 1.2985, etc. The other, which I prefer, is to control the bin margins actually used so that they match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
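If the bin margins should coincide with the axis breaks, as the question asks, one possible approach (a sketch, not something taken from the answer) is to pass the same positive values to both the histogram and the scale. As far as I know, the binning happens after the log transformation, so the bin edges have to be given as log10 values. The data and the `edges` vector below are made up for illustration.

```r
library(ggplot2)

set.seed(1)
# made-up long-tailed data: most values in 1-10, about 10% in 10-1000
dfr <- data.frame(x = c(runif(900, 1, 10), runif(100, 10, 1000)))

# hypothetical bin edges; all must be > 0 because log10(0) is -Inf
edges <- c(1, 1.5, 2, 3, 5, 7.5, 10, 20, 50, 100, 200, 500, 1000)

p <- ggplot(dfr, aes(x)) +
  # bin boundaries on the transformed (log10) scale
  geom_histogram(breaks = log10(edges), colour = "darkblue", fill = "blue") +
  # axis ticks at the same values, labelled in the original units
  scale_x_log10(breaks = edges, labels = edges)
print(p)
```

Note that the bars show raw counts for bins of unequal width, so the bar heights are not directly comparable as densities.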
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to the workaround above.
Using ggplot2 seems like the easiest option. If you want more control over your axes and your breaks, you can do something like the following:
EDIT : new code provided
x <- c(rexp(1000,0.5)+0.5,rexp(100,0.5)*100)
breaks<- c(0,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000)
major <- c(0.1,1,10,100,1000,10000)
H <- hist(log10(x),plot=FALSE)
plot(H$mids,H$counts,type="n",
xaxt="n",
xlab="X",ylab="Counts",
main="Histogram of X",
bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=TRUE,freq=TRUE,col="blue")
#Position of ticks
at <- log10(breaks)
#Creation X axis
axis(1,at=at,labels=10^at)
This is as close as I can get to the ggplot2 look. Making the background grey is not that straightforward, but it is doable if you draw a rectangle the size of your plot region and fill it with grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
A dynamic graph would also help with this plot. Use the manipulate package from RStudio to make a histogram with a dynamic range:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
You will then be able to use the sliders to inspect the distribution over a dynamically selected range.

How to plot matrix with background color varying according to entry?

I wanted to ask for general ideas on how to draw this kind of plot in R, one that can compare, for example, the overlaps of the different methods listed along the horizontal and vertical sides of the plot. Any sample code would be appreciated.
Many thanks
A ggplot2-example:
# data generation
df <- matrix(runif(25), nrow = 5)
# bring data to long format
require(reshape2)
dfm <- melt(df)
# plot
require(ggplot2)
ggplot(dfm, aes(x = Var1, y = Var2)) +
geom_tile(aes(fill = value)) +
geom_text(aes(label = round(value, 2)))
The corrplot package and corrplot function in that package will create plots similar to what you show above, that may do what you want or give you a starting point.
If you want more control, you could plot the colors using the image function and then use the text function to add the numbers. You can either make the margins large enough to hold the labels (see the axis function for the common way to add text labels in the margin), or leave enough space internally (maybe use rasterImage instead of image) and use text to do the labelling. Look at the xpd argument to par if you want to add the lines, and at the grconvertX and grconvertY functions to help with the coordinates of the line segments.
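A minimal base-graphics sketch of the image-plus-text idea, using made-up data and the margin-label variant via axis. The matrix `m`, the method names "M1" to "M5" and the helper `grid_pos` are all invented for illustration.

```r
set.seed(42)
# made-up 5 x 5 matrix of overlaps between hypothetical methods M1..M5
m <- matrix(runif(25), nrow = 5,
            dimnames = list(paste0("M", 1:5), paste0("M", 1:5)))
nr <- nrow(m)
nc <- ncol(m)

# image() draws m[i, j] at x = i, y = j, so rows run along the x axis
image(x = seq_len(nr), y = seq_len(nc), z = m,
      axes = FALSE, xlab = "", ylab = "", col = heat.colors(12))

# method names in the margins
axis(1, at = seq_len(nr), labels = rownames(m))
axis(2, at = seq_len(nc), labels = colnames(m), las = 1)

# cell centres, ordered like as.vector(m) (row index varies fastest)
grid_pos <- expand.grid(x = seq_len(nr), y = seq_len(nc))
text(grid_pos$x, grid_pos$y, labels = round(as.vector(m), 2))
```

Keeping `grid_pos` in the same order as `as.vector(m)` is what keeps each number on top of its own coloured cell.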

Setting breakpoints for data with scale_fill_brewer() function in ggplot2

I am creating a map (choropleth) as described on the ggplot2 wiki. Everything works like a charm, except that I am running into an issue mapping a continuous value to the polygon fill color via the scale_fill_brewer() function.
This question describes the problem I'm having. As in the answer, my workaround has been to pre-cut my data into bins using the gtools quantcut() function:
UPDATE: This first example is actually the right way to do this
require(gtools) # needed for quantcut()
...
fill_factor <- quantcut(fill_continuous, q=seq(0,1,by=0.25))
ggplot(mydata) +
aes(long,lat,group=group,fill=fill_factor) +
geom_polygon() +
scale_fill_brewer(name="mybins", palette="PuOr")
This works, however, I feel like I should be able to skip the step of pre-cutting my data and do something like this with the breaks option:
ggplot(mydata) +
aes(long,lat,group=group,fill=fill_continuous) +
geom_polygon() +
scale_fill_brewer(names="mybins", palette="PuOr", breaks=quantile(fill_continuous))
But this doesn't work. Instead I get an error something like:
Continuous variable (composite score) supplied to discrete scale_brewer.
Have I misunderstood the purpose of the "breaks" option? Or is breaks broken?
A major issue with pre-cutting continuous data is that there are three pieces of information used at different points in the code:
The Brewer palette -- determines the maximum number of colors available
The number of break points (or the bin width) -- has to be specified with the data
The actual data to be plotted -- influences the choice of the Brewer palette (sequential/diverging)
A true vicious circle. This can be broken by providing a function that accepts the data and the palette, automatically derives the number of break points and returns an object that can be added to the ggplot object. Something along the following lines:
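fill_brewer <- function(fill, palette) {
  require(RColorBrewer)
  require(gtools) # needed for quantcut()
  # maximum number of colours the chosen Brewer palette provides
  n <- brewer.pal.info$maxcolors[palette == rownames(brewer.pal.info)]
  # unevaluated call quantcut(<fill>, q = ...), evaluated later within the plot data
  discrete.fill <- call("quantcut", match.call()$fill, q=seq(0, 1, length.out=n))
  # return the aesthetic and the scale together so both can be added with `+`
  list(
    do.call(aes, list(fill=discrete.fill)),
    scale_fill_brewer(palette=palette)
  )
}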
Use it like this:
ggplot(mydata) + aes(long,lat,group=group) + geom_polygon() +
fill_brewer(fill=fill_continuous, palette="PuOr")
As Hadley explains, the breaks option moves the ticks, but does not make the data discrete. Therefore pre-cutting the data, as in the first example in the question, is the right way to use the scale_fill_brewer command.
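For a self-contained illustration of the pre-cutting step, here is a sketch with made-up point data instead of map polygons. It swaps gtools' quantcut() for base R's cut() with quantile() so that no extra package is needed. The column names `score` and `score_bin` are invented for the example.

```r
library(ggplot2)

set.seed(1)
# made-up data standing in for the map: a continuous score per point
dat <- data.frame(x = runif(200), y = runif(200), score = rnorm(200))

# pre-cut the continuous variable into quartile bins (a factor)
dat$score_bin <- cut(dat$score,
                     breaks = quantile(dat$score, probs = seq(0, 1, by = 0.25)),
                     include.lowest = TRUE)

# the Brewer scale now receives a discrete variable, as it expects
p <- ggplot(dat, aes(x, y, colour = score_bin)) +
  geom_point() +
  scale_colour_brewer(name = "mybins", palette = "PuOr")
print(p)
```

The same factor could be mapped to fill with scale_fill_brewer() for polygons, as in the question's first example.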
