Create a heatmap with regions in R

I have the following kind of data: on a rectangular piece of land (120x50 yards), there are 6 (also rectangular) smaller areas, each with a different kind of plant. The idea is to study the attractiveness of the various kinds of plant to birds. Each time a bird sits down somewhere on the land, I have the exact coordinates of where the bird sits down.
I don't care exactly where the bird sits down, only which of the six areas it is in. To show the relative preference of birds for the various plants, I want to make a heatmap in which the most frequented areas are the darkest.
So, I need to convert the coordinates into a code for the area the bird visits, and then create a heatmap that shows the differential preference for each land area.
(The research is a bit more involved than this, but this is the general idea.)
How would I do this in R? Is there an R function that takes a vector of coordinates and turns it into such a heatmap? If not, do you have some hints on how to do this?

Not meant to be the answer you are looking for, but might give you some inspiration.
# Simulate some data
birdieLandingSimulator <- data.frame(t(sapply(1:100, function(x) c(runif(1, -10,10), runif(1, -10,10)))))
# Assign some coordinates, which ended up not really being used much at all, except for the point colors
assignCoord <- function(x)
{
# Assign the four coordinates clockwise: 1, 2, 3, 4
ifelse(all(x>0), 1, ifelse(!sum(x>0), 3, ifelse(x[1]>0, 2, 4)))
}
birdieLandingSimulator <- cbind(birdieLandingSimulator, Q = apply(birdieLandingSimulator, 1, assignCoord))
# Plot
library(ggplot2)
ggplot(birdieLandingSimulator, aes(x = X1, y = X2)) +
  # after_stat() replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
  stat_density2d(geom = "tile", aes(fill = 1/after_stat(density)), contour = FALSE) +
  geom_point(aes(color = factor(Q))) + theme_classic() +
  theme(axis.title = element_blank(),
        axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank()) +
  # guide = "none" replaces the deprecated guide = FALSE
  scale_color_discrete(guide = "none", h = c(180, 270)) +
  scale_fill_continuous(name = "Birdie Landing Location")

Use ggplot2. Take a look at the examples for geom_bin2d. It's pretty simple to get 2d bins. Notice that you pass in binwidth for both x and y:
library(ggplot2)
df <- data.frame(x = c(1,2,4,6,3,2,4,2,1,7,4,4), y = c(2,1,4,2,4,4,1,4,2,3,1,1))
# alpha is a fixed value here, so it goes outside aes()
ggplot(df, aes(x = x, y = y)) + geom_bin2d(binwidth = c(2, 2), alpha = 0.5)

If you don't want to use ggplot, you can use the cut function to separate your data into bins.
# Test data.
x <- sample(1:120, 100, replace=TRUE)
y <- sample(1:50, 100, replace=TRUE)
# Separate the data into bins.
x <- cut(x, c(0, 40, 80, 120))
y <- cut(y, c(0, 25, 50))
# Now plot it, suppressing reordering.
heatmap(table(y, x), Colv=NA, Rowv=NA)
Alternatively, to actually plot the regions in their true geographic location, you could draw the boxes yourself with rect. You would have to count the number of points in each region.
# Test data.
x <- sample(1:120, 100, replace=TRUE)
y <- sample(1:50, 100, replace=TRUE)
regions <- data.frame(xleft=c(0, 40, 40, 80, 0, 80),
                      ybottom=c(0, 0, 15, 15, 30, 40),
                      xright=c(40, 120, 80, 120, 80, 120),
                      ytop=c(30, 15, 30, 40, 50, 50))
# Color gradient.
col <- colorRampPalette(c("white", "red"))(30)
# Make the plot.
plot(NULL, xlim=c(0, 120), ylim=c(0, 50), xlab="x", ylab="y")
# invisible() suppresses the list of NULLs that apply() would otherwise print.
invisible(apply(regions, 1, function (r) {
  count <- sum(x >= r["xleft"] & x < r["xright"] & y >= r["ybottom"] & y < r["ytop"])
  # Scale the count onto the 30-step palette; col[count] has no valid entry
  # for a count of 0 or for counts above 30.
  idx <- max(1, ceiling(length(col) * count / length(x)))
  rect(r["xleft"], r["ybottom"], r["xright"], r["ytop"], col=col[idx])
  text((r["xright"]+r["xleft"])/2, (r["ytop"]+r["ybottom"])/2, count)
}))
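If you also want the region as a variable in your data (the "convert the coordinates to a code" part of the question), something along these lines might work. This is only a sketch: the region names A to F and the helper assign_region() are made up for illustration, and the bounds are the same test layout as above, not your real plots.

```r
# Hypothetical region table: one row per plot, bounds in yards.
regions <- data.frame(
  name    = c("A", "B", "C", "D", "E", "F"),
  xleft   = c(0, 40, 40, 80, 0, 80),
  ybottom = c(0, 0, 15, 15, 30, 40),
  xright  = c(40, 120, 80, 120, 80, 120),
  ytop    = c(30, 15, 30, 40, 50, 50),
  stringsAsFactors = FALSE
)

# Return the region name for each (x, y) point, or NA if it falls outside
# every region. Left/bottom edges are inclusive, right/top edges exclusive.
assign_region <- function(x, y, regions) {
  out <- rep(NA_character_, length(x))
  for (i in seq_len(nrow(regions))) {
    inside <- x >= regions$xleft[i] & x < regions$xright[i] &
              y >= regions$ybottom[i] & y < regions$ytop[i]
    out[inside] <- regions$name[i]
  }
  out
}

# Simulated landings.
set.seed(1)
x <- runif(100, 0, 120)
y <- runif(100, 0, 50)

landing <- assign_region(x, y, regions)
# Tabulate with fixed levels so regions with no landings still show a 0.
counts <- table(factor(landing, levels = regions$name))
print(counts)
```

The counts table can then drive whatever colour scale you like, and landing can be kept as a column for further analysis.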


How do I change hexbin plot scales?

I currently have this: (image not shown)
Instead of the scale jumping from 1 to 718, I would like it to go from 1 to 2, 3, 5, 10, 20, 40, 80, 160, 320, 640, 1280, 2560, 5120, 10240, 15935.
Here is the code I used to plot it:
library(hexbin)
hex <- hexbin(trial$pickup_longitude, trial$pickup_latitude, xbins=600)
plot(hex, colramp = colorRampPalette(LinOCS(12)))
Here's a ggplot method, where you can specify whatever breaks you want.
library(ggplot2)
library(RColorBrewer)
library(data.table)  # needed for data.table() below
##
# made up sample
#
set.seed(42)
X <- rgamma(10000, shape=1000, scale=1)
Y <- rgamma(10000, shape=10, scale=100)
dt <- data.table(X, Y)
##
# define breaks and labels for the legend
#
brks <- c(0, 1, 2, 5, 10, 20, 50, 100, Inf)
n.br <- length(brks)
labs <- c(paste('<', brks[2:(n.br-1)]), paste('>', brks[n.br-1]))
##
#
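# after_stat(count) replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
ggplot(dt, aes(X, Y)) +
  geom_hex(aes(fill = cut(after_stat(count), breaks = brks)), color = 'grey80') +
  scale_fill_manual(name = 'Count', values = rev(brewer.pal(8, 'Spectral')), labels = labs)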
You cannot control the boundaries of the scale as closely as you want, but you can adjust it somewhat. First we need a reproducible example:
library(hexbin)
set.seed(42)
X <- rnorm(10000, 10, 3)
Y <- rnorm(10000, 10, 3)
XY.hex <- hexbin(X, Y)
To change the scale we need to specify a function to use on the counts and an inverse function to reverse the transformation. Now, three different scalings:
plot(XY.hex) # Linear, default
plot(XY.hex, trans=sqrt, inv=function(x) x^2) # Square root
plot(XY.hex, trans=log, inv=function(x) exp(x)) # Log
The top plot is the original scaling. The bottom left is the square root transform and the bottom right is the log transform. There are probably too many levels to read these plots clearly. Adding the argument colorcut=6 to the plot command would reduce the number of levels to 5.
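To see why a trans/inv pair changes the legend the way it does, here is a rough base R illustration. It does not call hexbin and is not meant to reproduce its internal algorithm; it only shows that cut points spaced evenly on the log scale map back to geometrically spaced counts, which is close to the 1, 2, 5, 10, ... style of scale asked for. The value max_count is taken from the question and n_levels is an arbitrary choice.

```r
max_count <- 15935   # largest count mentioned in the question
n_levels  <- 6       # arbitrary number of colour levels for this sketch

trans <- log
inv   <- function(x) exp(x)

# Evenly spaced cut points on the transformed scale ...
cuts_t <- seq(trans(1), trans(max_count), length.out = n_levels + 1)
# ... become geometrically spaced cut points on the count scale.
cuts <- inv(cuts_t)
print(round(cuts))

# The ratio between consecutive cut points is constant.
print(cuts[-1] / cuts[-length(cuts)])
```

Swapping in sqrt and function(x) x^2 gives a milder stretch of the low counts.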

Quick, sleek and simple way of adding minor ticks in ggplot2? [duplicate]

This question has been raised a number of times on StackOverflow over the years (see here and here). However, I have yet to come across a way of easily adding unlabelled minor ticks to my ggplot axes that I'm satisfied with.
Let's generate some dummy data to play around with:
df <- data.frame(x = rnorm(1000, mean = 25, sd = 5),
y = rnorm(1000, mean = 23, sd = 3))
There are two methods I've come across for adding unlabelled minor ticks.
Method 1 - Manually construct axis label vectors
Concatenate the values that you would like to appear at major ticks with empty spaces defined using "". If you would like to add just one unlabelled minor tick in-between major tick values, you can construct the vector of axis labels like so:
axis_values <- c(0, "", 10, "", 20, "", 30, "", 40, "", 50)
Or if you'd like n unlabelled minor ticks:
# Where n = 2 and for an axis range [0, 50]
axis_values <- c(0, rep("", 2), 15, rep("", 2), 30, rep("", 2), 45, "")
The user can then supply this vector to the 'labels' argument in the ggplot2::scale_x_continuous or ggplot2::scale_y_continuous functions as long as the length of the vector of labels matches the length of the vector supplied to the 'breaks' argument in the same functions.
ggplot(df, aes(x = x, y = y)) +
geom_point() +
scale_x_continuous(breaks = seq(0, 50, 5), labels = axis_values, limits = c(0, 50)) +
scale_y_continuous(breaks = seq(0, 50, 5), labels = axis_values, limits = c(0, 50))
Method 2 - Define your own function for generating axis label vectors
This post describes a function to which the user can supply a vector of values to appear at major ticks, along with the number of unlabelled minor ticks desired:
insert_minor <- function(major_labs, n_minor) {
labs <- c( sapply( major_labs, function(x) c(x, rep("", n_minor) ) ) )
labs[1:(length(labs)-n_minor)]
}
# Generate plot
ggplot(df, aes(x = x, y = y)) +
geom_point() +
scale_x_continuous(breaks = seq(0, 50, 5), labels = insert_minor(major_labs = seq(0, 50, 10),
n_minor = 1), limits = c(0, 50)) +
scale_y_continuous(breaks = seq(0, 50, 5), labels = insert_minor(major_labs = seq(0, 50, 10),
n_minor = 1), limits = c(0, 50))
Method 2 is the best way of generating unlabelled minor ticks I've seen yet. However drawbacks are:
Not dummy-proof - Users need to make sure that the value given to the 'n_minor' argument is compatible with the data supplied to the 'breaks' and 'major_labs' arguments. Call me lazy, but I don't want to think about this when I'm trying to produce plots quickly.
Function management required - When you want to use this function in another script, you have to retrieve it from the last script you used it in, or alternatively perhaps you can package it up in a library to call in future scripts.
In my eyes, the ideal solution is for the ggplot2 developers to add an argument to scale_x_continuous or scale_y_continuous ggplot2 functions that takes a user-defined value for the number of unlabelled minor ticks the user would like to add to their plot axes, which then takes the vector supplied to the 'breaks' argument and determines 'major_labs' in the background out of the user's sight.
Has anyone else found any other way of computing unlabelled minor ticks in ggplot2?
A quick, simple, and kinda sleek solution would be to define this one-liner labelling function that only shows breaks that occur at your chosen multiples:
label_at <- function(n) function(x) ifelse(x %% n == 0, x, "")
So you could do:
ggplot(df, aes(x = x, y = y)) +
geom_point() +
scale_x_continuous(breaks = seq(0, 50, 5), labels = label_at(10),
limits = c(0, 50)) +
scale_y_continuous(breaks = seq(0, 50, 5), labels = label_at(5),
limits = c(0, 50))
Which you can easily take to extremes:
ggplot(df, aes(x = x, y = y)) +
geom_point() +
scale_x_continuous(breaks = 1:50, labels = label_at(10), limits = c(0, 50)) +
scale_y_continuous(breaks = 1:50, labels = label_at(10), limits = c(0, 50))
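To check what the labelling function hands back, you can call it directly on a vector of breaks. The second helper, label_every(), is a hypothetical variant (not part of ggplot2) that labels by position instead of by value, which may be safer for non-integer breaks where %% can be thrown off by floating point error.

```r
label_at <- function(n) function(x) ifelse(x %% n == 0, x, "")

# Labels for breaks 0, 5, ..., 50, keeping only multiples of 10.
labs <- label_at(10)(seq(0, 50, 5))
print(labs)
# "0" "" "10" "" "20" "" "30" "" "40" "" "50"

# Hypothetical variant: label every k-th break, counting from the first one.
label_every <- function(k) function(x) ifelse((seq_along(x) - 1) %% k == 0, x, "")
labs_pos <- label_every(5)(seq(0, 1, 0.1))
print(labs_pos)
```

Either function can be passed to the labels argument exactly as in the plots above.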

How to add to ggplot2 plot inside of for loop

I'm trying to plot multiple circles of different sizes on a plot using ggplot2's geom_point inside of a for loop. Every time I run it though, it plots all the circles, but all in the location of the last circle instead of in their respective locations as given by the data frame. Below is an example of the code I am running. I'm wondering how I would fix this or if there's a better way to get at what I'm trying to do here.
data <- data.frame("x" = c(0, 500, 1000, 1500, 2000),
"y" = c(1500, 500, 2000, 0, 1000),
"size" = c(3, 5, 1.5, 4.2, 2.6)
)
g <- ggplot(data = data, aes(x = x, y = y)) + xlim(0,2000) + ylim(0,2000)
for(i in 1:5) {
g <- g + geom_point(aes(x=data$x[i],y=data$y[i]), size = data$size[i], pch = 1)
}
print(g)
It's pretty rare to need a for-loop for a plot -- ggplot2 will take the whole dataframe and process it all without you needing to manage each row.
ggplot(data = data, aes(x = x, y = y, size = size)) +
geom_point(pch = 1)
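As for why the loop misbehaves: aes() stores its expressions unevaluated, so data$x[i] is only looked up when the plot is drawn, by which time i is 5 for every layer. If you really do need a loop, one way around it (a sketch, not the only option) is to hand each layer its own one-row data frame and let it inherit x and y from ggplot():

```r
library(ggplot2)

data <- data.frame(x = c(0, 500, 1000, 1500, 2000),
                   y = c(1500, 500, 2000, 0, 1000),
                   size = c(3, 5, 1.5, 4.2, 2.6))

g <- ggplot(data, aes(x = x, y = y)) + xlim(0, 2000) + ylim(0, 2000)
for (i in seq_len(nrow(data))) {
  # Row i becomes the layer's data; size is a plain argument, so it is
  # evaluated immediately rather than when the plot is drawn.
  g <- g + geom_point(data = data[i, ], size = data$size[i], shape = 1)
}
print(g)
```

The single-call version above is still the simpler choice when every row can share one layer.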

Determine "optimal" x coordinates for nodes when plotting dendritic network with pre-determined y coordinates

I am trying to plot a dendritic network (a river network) in R, using the ggnet2 function, and I want the y axis on the plot to be meaningful: specifically, I want it to represent the basin area. I am looking for a method, then, that will calculate x coordinates so that the network displays nicely, with no crossing lines. How might I do this? See the code and example images below. In the first image, the y coordinates are specified to equal the area, which is what I want, but the x coordinates are not optimized, so the graph looks ugly. In the second image, Fruchterman-Reingold placement looks great but the y coordinates are obviously arbitrary. I'm not wedded to using ggnet2, but I do want the network links to be angular (e.g. not like a clustering dendrogram with vertical links between nodes). Thanks!
library(GGally)
library(network)
library(sna)
graphmatrix <- matrix(c(0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0), nrow=33)
sitenumbers <- c(26,27,30,3,31,6,4,18,32,5,11,21,29,16,8,7,14,25,13,19,23,9,15,12,17,20,28,10,1,2,24,22,33)
areas <- c(4.2,4.2,4.5,2.2,4.8,2.5,2.4,3.6,5.3,2.5,3.1,3.7,4.4,3.4,2.8,2.7,3.3,4.2,3.3,3.6,3.9,3.1,3.4,3.1,3.5,3.7,4.3,3.1,1,1.6,4.1,3.7,5.7)
wsnet <- network(graphmatrix, directed=TRUE)
layout <- gplot.layout.fruchtermanreingold(wsnet, NULL)
wsnet %v% 'sitenumber' = sitenumbers
wsnet %v% 'area' = areas
wsnet %v% 'randomnumber'= sample(sitenumbers)
ggnet2(wsnet, label='sitenumber')
# You can specify y coordinates, but then you also need to specify x coordinates, so there is a lot of line crossing. I want to "optimize" the x coordinates.
ggnet2(wsnet, label='sitenumber', mode=c('randomnumber','area'))
Here would be a ggraph solution to the problem. We'll start out by laying out a dendrogram and then tell ggraph to use the area as y positions.
library(tidygraph)
library(ggraph)
gr <- as_tbl_graph(wsnet)
lay <- create_layout(gr, "dendrogram")
lay$y <- lay$area
ggraph(lay) +
geom_edge_link() +
geom_node_point(size = 10, shape = 21, fill = "white") +
geom_node_text(aes(label = sitenumber))
Now obviously this is not perfect with intersecting lines and such, but it's a good starting point. You could tweak some positions manually:
lay$x[lay$sitenumber %in% c(12, 10, 17, 20, 28)] <- lay$x[lay$sitenumber %in% c(12, 10, 17, 20, 28)] + 1
lay$x[lay$sitenumber %in% c(1, 2)] <- lay$x[lay$sitenumber %in% c(1, 2)] - 2
lay$x[lay$sitenumber == 27] <- lay$x[lay$sitenumber == 27] + 2
lay$x[lay$sitenumber == 26] <- lay$x[lay$sitenumber == 26] + 3
ggraph(lay) +
geom_edge_link() +
geom_node_point(size = 10, shape = 21, fill = "white") +
geom_node_text(aes(label = sitenumber))
Adjust flavours to taste.

draw several ablines at once with specific color scheme

I have a data frame with slopes and intercepts coming from a series of simple linear regressions. In plotting the ablines I want to use a color coding that is specific for all possible combinations of class and category.
Say the data frame looks as follows:
(intercept <- rnorm(n = 40, mean = 1, sd = 0.25))
(slope <- rnorm(n = 40, mean = 2, sd = 1))
(clss <- c(rep("a", 20), rep("b", 20)))
(ctg <- c(rep("mm", 10), rep("nn", 10), rep("mm", 10), rep("nn", 10)))
df <- data.frame(intercept, slope, clss, ctg)
I managed to plot all ablines using:
plot(1, type="n", axes=FALSE, xlab="", ylab="", xlim=c(0, 10), ylim=c(0, 10))
mapply(abline, df$intercept, df$slope)
I want to plot these lines all in say green when clss=="a" and ctg=="mm" and use different colors for the other clss * ctg combinations.
Probably something like this would work:
by(df, paste(df$clss, df$ctg), mapply(abline, ... ))
But I could not figure out how.
Using ggplot:
library(ggplot2)
gg <- df
gg$color <- paste(gg$clss,".",gg$ctg,sep="")
ggplot(gg) +
geom_point(aes(x=-10,y=-10,color=color)) + # need this to display a legend...
geom_abline(aes(slope=slope, intercept=intercept, color=color)) +
xlim(0,10) + ylim(0,10) + labs(x="X",y="Y")
Produces this: (plot not shown)
It turns out that in your case there are only 4 unique clss and ctg combinations, so I just picked some random colours and modified your mapply call:
# get colour for each combination
x <- sample(colours(), length(unique(paste0(df$clss, df$ctg))))
# how many of each combination are there
q <- aggregate(df$intercept, by=list(paste0(df$clss, df$ctg)), length)
# make a colour vector
mycols <- rep(x, q[,2])
mapply(function(x,y,z) { abline(x, y, col=z) },
df$intercept, df$slope,
as.list(mycols) )
#You could obviously pick the colours yourself or choose a gradient
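If you would rather choose the colours yourself, a lookup by combination name avoids relying on the rows being sorted by group. This is a sketch under assumptions: the palette below is just an illustrative choice, and the data frame is rebuilt here only so the example runs on its own.

```r
set.seed(1)
df <- data.frame(
  intercept = rnorm(40, mean = 1, sd = 0.25),
  slope     = rnorm(40, mean = 2, sd = 1),
  clss      = rep(c("a", "b"), each = 20),
  ctg       = rep(rep(c("mm", "nn"), each = 10), times = 2)
)

# One label per clss/ctg combination, e.g. "a.mm".
combo <- interaction(df$clss, df$ctg, drop = TRUE)

# Illustrative palette (an assumption); the names must match levels(combo).
palette_by_combo <- c(a.mm = "green", b.mm = "blue",
                      a.nn = "orange", b.nn = "purple")

# Look up each row's colour by its combination label.
line_cols <- unname(palette_by_combo[as.character(combo)])

plot(1, type = "n", xlab = "", ylab = "", xlim = c(0, 10), ylim = c(0, 10))
invisible(mapply(function(a, b, col) abline(a = a, b = b, col = col),
                 df$intercept, df$slope, line_cols))
legend("topleft", legend = names(palette_by_combo),
       col = palette_by_combo, lty = 1)
```

Because the colour is looked up per row, this keeps working if the data frame is reordered or the groups have unequal sizes.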
