Plot a raster using hexes in ggplot2 - r

I have a GIS raster data frame that I would like to plot using hexagonal tiles. The standard method using geom_tile() is straightforward:
ggplot(raster_df, aes(x, y, fill=blabla)) + geom_tile()
However, I would really like my raster points to be displayed as hexagons instead of rectangles for aesthetic reasons. The end result should look something like the plot in this blog post: http://www.statsblogs.com/2014/09/02/how-to-create-a-hexagonal-bin-plot-in-sas/
I tried using geom_hex() instead of geom_tile(), but because geom_hex() seems to be designed with binning in mind, I can't figure out how to hack it to display my data instead. I would like every raster point to correspond to one hex, i.e. no binning at all!
Thank you for any suggestions.
Edit: as requested, here is a sample data raster (it's actually very close to what I am using, as I want to plot a world map and colour each raster point according to a custom statistic):
library(maptools)
library(raster)
library(ggplot2)
data('wrld_simpl')
raster_df <- as.data.frame(rasterToPoints(rasterize(wrld_simpl, raster(res=5))))
raster_df$blabla <- rnorm(nrow(raster_df))

It looks as though this might be due to the implementation of geom_hex in ggplot2. I've used this package for a few years, and my first guess is to try:
ggplot(raster_df, aes(x, y, fill=blabla)) + geom_hex(stat="identity")
But this throws an error:
Error in ggplot2:::hexGrob(x = raster_df$x, y = raster_df$y, fill = raster_df$blabla) :
could not find function "hexcoords"
So I looked for the function hexcoords, which appears in the hexbin package. I explicitly load that package and try again:
library(hexbin)
ggplot(raster_df, aes(x, y, fill=blabla)) + geom_hex(stat="identity")
And that works. The result isn't particularly beautiful, so it might be better to use the hexbin package a little more directly.
It doesn't seem that geom_hex() was designed to directly plot the data to hexagons without the interim step of stat = "binhex", which is different from many of the other geom_ functions.
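One way to sidestep the binning entirely is to draw the hexagons yourself with geom_polygon(). The following is only a minimal sketch and not a feature of geom_hex(): hex_vertices() is a hypothetical helper written for this example, the small grid is a stand-in for raster_df, and the radius of 2.9 is an assumed value picked by eye for a 5-degree grid. Because the raster sits on a rectangular grid, the hexagons cannot tessellate perfectly; the sketch just shows how each raster point can get its own hexagon.

```r
library(ggplot2)

# Hypothetical helper: return the six corners of a pointy-top hexagon
# of the given radius around each (x, y) point in df.
hex_vertices <- function(df, radius) {
  angles <- seq(30, 330, by = 60) * pi / 180   # corner angles in radians
  n <- nrow(df)
  data.frame(
    id     = rep(seq_len(n), each = 6),        # one polygon per raster point
    x      = rep(df$x, each = 6) + radius * cos(rep(angles, times = n)),
    y      = rep(df$y, each = 6) + radius * sin(rep(angles, times = n)),
    blabla = rep(df$blabla, each = 6)          # carry the fill value along
  )
}

# Small stand-in for raster_df from the question (a regular 5-degree grid).
raster_df <- expand.grid(x = seq(-20, 20, by = 5), y = seq(-10, 10, by = 5))
raster_df$blabla <- rnorm(nrow(raster_df))

hex_df <- hex_vertices(raster_df, radius = 2.9)

# Each group of six rows is drawn as one filled hexagon.
p <- ggplot(hex_df, aes(x, y, group = id, fill = blabla)) +
  geom_polygon() +
  coord_equal()
```

Printing p draws the plot. Changing radius trades gaps between hexagons against overlap.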

Related

Trouble producing a polygon on top of a scatterplot using ggplot

Currently, I am trying to transfer my graphical knowledge from the plot function in R to the ggplot function. I have begun constructing scatterplots and corresponding legends for a given data set, but I want to incorporate the function geom_polygon into my plots using ggplot.
Specifically, I want to capture a triangular region from the origin of a scatterplot. For reproducibility, say I have the following data set:
rawdata<-data.frame(matrix(c(1,1,1,
2,1,-1,
3,-1,-1,
4,-1,1,
4,-2,2),5,3,byrow=TRUE))
names(rawdata)<-c("Town","x.coordinate","y.coordinate")
rawdata[,1]<-as.factor(rawdata[,1])
To construct a scatterplot along with a legend, I have been told to do the following:
p1<-ggplot(data=rawdata,aes(x=x.coordinate,y=y.coordinate,colour=Town,shape=Town)) +
theme_bw() + geom_point()
The result is the following:
Click here.
What I want to do now is produce a polygon. To do so, I have constructed the following data frame to use in the geom_polygon function:
geom_polygon(data=polygondata,aes(x = xa, y = ya),colour="darkslategray2",
fill = "darkslategray2",alpha=0.25)
However, when I combine this with p1, I get the following error:
Error in eval(expr, envir, enclos) : object 'Town' not found
From some messing around, I have noticed that when I omit the shape argument from the ggplot function, I can easily produce the desired output which is shown here. However, I wish to keep the shape for aesthetics.
I also get a similar problem when I try to produce arrows which connect points on the scatterplot using ggplot. However, I will address this problem after, as the root problem may be here.
Add the following to polygondata:
polygondata$Town = NA
Even though you're not using that variable in geom_polygon, the layer inherits the aesthetics set in the main call to ggplot, so ggplot expects that column to be there.
Alternatively, I think you could avoid the error if you move the aesthetic mapping in the initial plot to geom_point rather than the main ggplot call, like this:
p1 <- ggplot(data=rawdata) +
theme_bw() +
geom_point(aes(x=x.coordinate, y=y.coordinate, colour=Town, shape=Town))
In that case, you wouldn't need to add a Town column to polygondata.
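A third option, sketched below, is to tell the polygon layer not to inherit the plot-level aesthetics by setting inherit.aes = FALSE. The polygondata coordinates here are made up for illustration, since the question does not show the real ones.

```r
library(ggplot2)

rawdata <- data.frame(
  Town         = factor(c(1, 2, 3, 4, 4)),
  x.coordinate = c(1, 1, -1, -1, -2),
  y.coordinate = c(1, -1, -1, 1, 2)
)

# Hypothetical triangle starting at the origin; the question does not show
# the real polygondata, so these coordinates are assumptions.
polygondata <- data.frame(xa = c(0, 2, 2), ya = c(0, 0, 2))

p1 <- ggplot(rawdata, aes(x = x.coordinate, y = y.coordinate,
                          colour = Town, shape = Town)) +
  theme_bw() +
  geom_point()

# inherit.aes = FALSE stops this layer from looking for Town in polygondata.
p2 <- p1 +
  geom_polygon(data = polygondata, aes(x = xa, y = ya),
               colour = "darkslategray2", fill = "darkslategray2",
               alpha = 0.25, inherit.aes = FALSE)
```

This keeps colour and shape in the main ggplot call, so any later point-based layers still pick them up.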

Trouble producing discrete legend using ggplot for a scatterplot

I am fairly new to the ggplot function in R. Currently, I am struggling to produce a legend for a given data set that I have constructed by hand. For simplicity, suppose this was my data set:
rawdata<-data.frame(matrix(c(1,1,1,
2,1,-1,
3,-1,-1,
4,-1,1
4,-2,2),5,3,byrow=TRUE))
names(rawdata)<-c("Town","x-coordinate","y-coordinate")
rawdata[,1]<-as.factor(rawdata[,1])
Now, using ggplot, I am trying to figure out how to produce a legend on a scatterplot. So far I have done the following:
p1<-ggplot(data=rawdata,aes(x=x.coordinate,y=y.coordinate,fill=rawdata[,1])) +
geom_point(data=rawdata,aes(x=x.coordinate,y=y.coordinate))
I produce the following using the above code,
As you can see, the coordinates have been plotted and the legend has been constructed, but they are only colored black.
I learned that to color coordinates, I would have needed to use the argument colour=rawdata[,1] in the geom_point function to color in points. However, when I try this, I get the following error code:
Error: Aesthetics must be either length 1 or the same as the data (4): colour
I understand that this has something to do with the length of the vector, but as of right now, I have absolutely no idea how to tackle this small problem.
With the default point shape, geom_point() takes a colour, not a fill (fill only affects shapes 21 to 25). And, having passed the data into ggplot(data = ..), there's no need to pass it into geom_point() again.
I've also fixed an error in the creation of your df in your example.
rawdata<-data.frame(matrix(c(1,1,1,2,1,-1,3,-1,-1,4,-1,1,4,-2,2),5,3,byrow=TRUE))
names(rawdata)<-c("Town","x.coordinate","y.coordinate")
rawdata[,1]<-as.factor(rawdata[,1])
library(ggplot2)
ggplot(data=rawdata,aes(x=x.coordinate,y=y.coordinate,colour=Town)) +
geom_point()

Clip the contour with polygon using ggplot and R

I want to create a contour and then clip the contour by the polygon and only show the contour within the polygon.
Shapefile data can be found here
Csv file can be found here
The code I used is as follows:
library("ggplot2")
library("rgdal")
library("gpclib")
library("maptools")
require(sp)
age2100 <- read.csv("temp.csv",header=TRUE, sep=",")
shape.dir <- "C:/Users/jdbaba/Documents/R working folder/shape" # use your directory name here
lon.shape <- readOGR(shape.dir, layer = "Export_Output_4")
str(lon.shape)
lon.df <- fortify(lon.shape, region = "Id")
p <- ggplot(lon.df, aes(x = long, y = lat, group = group)) +
geom_polygon(colour = "black", fill = "grey80", size = 1) +
theme()
p <- p + geom_point(data=age2100,aes(x=age2100$x,y=age2100$y,group="z"),size=0.1)
p <- p + geom_density2d(colour="red")
p
Here, I have created the map, points and the contour. I don't know whether the code I am using created the contour for variable z or not. If it is not correct, can anyone suggest a fix?
The sample output that I got is as follows:
Now, I want to clip the contour within the polygon and hide the part of contour that is outside the polygon.
I want to know how to add the labels to the contour and control the contour interval.
Please let me know if my question is not clear.
Thanks
Jdbaba
I can't reproduce your map exactly. The code you provided gives me a map with two sets of contours - one that looks like yours and one that overlaps it in the southern part of the region. I suspect this is an artefact of your group setting. Also, I can see there is an island in the southern part of what I assume is the lake.
I like to clean up and partition my ggplot stuff into bits, since I often find something in an early part of a ggplot call confuses something in a later part. Here's how I would map the region, draw points, and then add a density contour:
map <- function(){
geom_polygon(data=lon.df,aes(x=long,y=lat,group=piece),colour="black",fill="grey80",size=1)
}
points <- function(){
geom_point(data=age2100,aes(x=x,y=y),size=0.1)
}
density <- function(){
geom_density2d(data=age2100,aes(x=x,y=y),colour="red")
}
ggplot()+map() +points() +density()
Which gives this:
Now that's much different to what your contour looks like, and I don't know why. Maybe your group parameter is grouping all the points with the same z?
Anyway, it seems you don't want a density plot, you want a map of your Z values over your area. This is going to need kriging or some other interpolation technique. Forget about ggplot for a while, concentrate on the numbers.
For starters, plot the points coloured by the z value. You should see this:
which at least will give you a good idea of what the correct contour will look like.
Anyway, this is getting into a full-on tutorial..
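The "points coloured by the z value" step could look roughly like the sketch below. The data is simulated as a stand-in for temp.csv; only the column names x, y and z follow the question, and the colour scale is an arbitrary choice.

```r
library(ggplot2)

# Stand-in for temp.csv: the columns x, y and z follow the question, but
# the values are simulated because the original file is not available here.
set.seed(1)
age2100 <- data.frame(x = runif(200), y = runif(200))
age2100$z <- age2100$x + age2100$y + rnorm(200, sd = 0.1)

# Colour each point by its z value instead of estimating point density.
p <- ggplot(age2100, aes(x = x, y = y, colour = z)) +
  geom_point(size = 2) +
  scale_colour_gradient(low = "blue", high = "red")
```

Unlike geom_density2d(), which only reflects where the points are, this plot actually uses z.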

Plotting content from multiple data frames into a single ggplot2 surface

I am a total R beginner here, with corresponding level of sophistication of this question.
I am using the ROCR package in R to generate plotting data for ROC curves. I then use ggplot2 to draw the plot. Something like this:
library(ggplot2)
library(ROCR)
inputFile <- read.csv("path/to/file", header=FALSE, sep=" ", colClasses=c('numeric','numeric'), col.names=c('score','label'))
predictions <- prediction(inputFile$score, inputFile$label)
auc <- performance(predictions, measure="auc")@y.values[[1]]
rocData <- performance(predictions, "tpr","fpr")
rocDataFrame <- data.frame(x=rocData@x.values[[1]],y=rocData@y.values[[1]])
rocr.plot <- ggplot(data=rocDataFrame, aes(x=x, y=y)) + geom_path(size=1)
rocr.plot <- rocr.plot + geom_text(aes(x=1, y= 0, hjust=1, vjust=0, label=paste(sep = "", "AUC = ",round(auc,4))),colour="black",size=4)
This works well for drawing a single ROC curve. However, what I would like to do is read in a whole directory worth of input files - one file per classifier test results - and make a ggplot2 multifaceted plot of all the ROC curves, while still printing the AUC score into each plot.
I would like to understand what is the "proper" R-style approach to accomplishing this. I am sure I can hack something together by having one loop go through all files in the directory and create a separate data frame for each, and then having another loop to create multiple plots, and somehow getting ggplot2 to output all these plots onto the same surface. However, that does not let me use ggplot2's built-in faceting, which I believe is the right approach. I am not sure how to get my data into proper shape for faceting use, though. Should I be merging all my data frames into a single one, and giving each merged chunk a name (e.g. filename) and faceting on that? If so, is there a library or recommended practice for making this happen?
Your suggestions are appreciated. I am still wrapping my head around the best practices in R, so I'd rather get expert advice instead of just hacking things up to make code that looks more like ordinary declarative programming languages that I am used to.
EDIT: The thing I am least clear on is whether, when using ggplot2's built-in faceting capabilities, I'd still be able to output a custom string (AUC score) into each plot it will generate.
Here is an example of how to generate a plot as you described. I use the built-in dataset quakes:
The code does the following:
Load the ggplot2 and plyr packages
Add a facet variable to quakes - in this case I summarise by depth of earthquake
Use ddply to summarise the mean magnitude for each depth
Use ggplot with geom_text to label the mean magnitude
The code:
library(plyr)
library(ggplot2)
quakes$level <- cut(quakes$depth, 5,
labels=c("Very Shallow", "Shallow", "Medium", "Deep", "Very Deep"))
quakes.summary <- ddply(quakes, .(level), summarise, mag=round(mean(mag), 1))
ggplot(quakes, aes(x=long, y=lat)) +
geom_point(aes(colour=mag)) +
geom_text(aes(label=mag), data=quakes.summary, x=185, y=-35) +
facet_grid(~level) +
coord_map()
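The same pattern should carry over to the ROC case: stack the per-classifier curves into one data frame with an identifying column, build a small summary data frame with one AUC label per classifier, and facet on the identifier. The sketch below rests on several assumptions: the scores are simulated in place of the input files, make_results() is a hypothetical helper, and the classifier names clf_a and clf_b are placeholders.

```r
library(ggplot2)
library(ROCR)

# Simulated scores and labels stand in for the per-classifier input files;
# in practice each list element would come from read.csv() on one file.
set.seed(42)
make_results <- function(shift) {
  label <- rep(c(0, 1), each = 100)
  data.frame(score = rnorm(200, mean = label * shift), label = label)
}
results <- list(clf_a = make_results(0.5), clf_b = make_results(1.5))

# One ROC data frame per classifier, tagged with the classifier name.
roc_list <- lapply(names(results), function(name) {
  pred <- prediction(results[[name]]$score, results[[name]]$label)
  roc  <- performance(pred, "tpr", "fpr")
  data.frame(classifier = name,
             x = roc@x.values[[1]],
             y = roc@y.values[[1]])
})
roc_df <- do.call(rbind, roc_list)

# One row per classifier: the AUC label and where to print it in the facet.
auc_df <- data.frame(
  classifier = names(results),
  x = 1,
  y = 0,
  label = sapply(results, function(d) {
    auc <- performance(prediction(d$score, d$label), "auc")@y.values[[1]]
    paste0("AUC = ", round(auc, 4))
  })
)

p <- ggplot(roc_df, aes(x = x, y = y)) +
  geom_path() +
  geom_text(data = auc_df, aes(label = label), hjust = 1, vjust = 0) +
  facet_wrap(~classifier)
```

Because auc_df carries the same classifier column as roc_df, each label lands in its own facet.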

Setting breakpoints for data with scale_fill_brewer() function in ggplot2

I am creating a map (choropleth) as described on the ggplot2 wiki. Everything works like a charm, except that I am running into an issue mapping a continuous value to the polygon fill color via the scale_fill_brewer() function.
This question describes the problem I'm having. As in the answer, my workaround has been to pre-cut my data into bins using the gtools quantcut() function:
UPDATE: This first example is actually the right way to do this
require(gtools) # needed for quantcut()
...
fill_factor <- quantcut(fill_continuous, q=seq(0,1,by=0.25))
ggplot(mydata) +
aes(long,lat,group=group,fill=fill_factor) +
geom_polygon() +
scale_fill_brewer(name="mybins", palette="PuOr")
This works, however, I feel like I should be able to skip the step of pre-cutting my data and do something like this with the breaks option:
ggplot(mydata) +
aes(long,lat,group=group,fill=fill_continuous) +
geom_polygon() +
scale_fill_brewer(name="mybins", palette="PuOr", breaks=quantile(fill_continuous))
But this doesn't work. Instead I get an error something like:
Continuous variable (composite score) supplied to discrete scale_brewer.
Have I misunderstood the purpose of the "breaks" option? Or is breaks broken?
A major issue with pre-cutting continuous data is that there are three pieces of information used at different points in the code:
The Brewer palette -- determines the maximum number of colors available
The number of break points (or the bin width) -- has to be specified with the data
The actual data to be plotted -- influences the choice of the Brewer palette (sequential/diverging)
A true vicious circle. This can be broken by providing a function that accepts the data and the palette, automatically derives the number of break points and returns an object that can be added to the ggplot object. Something along the following lines:
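fill_brewer <- function(fill, palette) {
  require(RColorBrewer)  # provides brewer.pal.info
  require(gtools)        # provides quantcut()
  # Maximum number of colours the chosen palette offers
  n <- brewer.pal.info$maxcolors[palette == rownames(brewer.pal.info)]
  # Build an unevaluated quantcut() call on the fill expression
  discrete.fill <- call("quantcut", match.call()$fill, q=seq(0, 1, length.out=n))
  list(
    do.call(aes, list(fill=discrete.fill)),
    scale_fill_brewer(palette=palette)
  )
}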
Use it like this:
ggplot(mydata) + aes(long,lat,group=group) + geom_polygon() +
fill_brewer(fill=fill_continuous, palette="PuOr")
As Hadley explains, the breaks option moves the ticks, but does not make the data discrete. Therefore pre-cutting the data as per the first example in the question is the right way to use the scale_fill_brewer command.
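As a rough illustration of that pre-cutting step without the gtools dependency, base R's cut() combined with quantile() gives quartile bins that a discrete Brewer scale accepts. The data below is simulated, and mydata with its x and y columns is only a placeholder for the real map data.

```r
library(ggplot2)

# Simulated stand-in for the continuous fill variable.
set.seed(1)
mydata <- data.frame(x = 1:40, y = 1, fill_continuous = rnorm(40))

# Cut the continuous values into quartile bins; include.lowest = TRUE
# keeps the minimum value inside the first bin.
mydata$fill_factor <- cut(mydata$fill_continuous,
                          breaks = quantile(mydata$fill_continuous,
                                            probs = seq(0, 1, by = 0.25)),
                          include.lowest = TRUE)

# The resulting factor can be mapped to a discrete Brewer scale.
p <- ggplot(mydata, aes(x, y, fill = fill_factor)) +
  geom_tile() +
  scale_fill_brewer(name = "mybins", palette = "PuOr")
```

With tied values the quantile breaks may not be unique, in which case cut() would need deduplicated breaks.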
