How do I change hexbin plot scales? - r

How do I change the scales of a hexbin plot?
I currently have this:
Instead of the scale jumping from 1 to 718, I would like it to go from 1 to 2, 3, 5, 10, 20, 40, 80, 160, 320, 640, 1280, 2560, 5120, 10240, 15935.
Here is the code I used to plot it:
library(hexbin)
hex <- hexbin(trial$pickup_longitude, trial$pickup_latitude, xbins=600)
plot(hex, colramp = colorRampPalette(LinOCS(12)))

Here's a ggplot method, where you can specify whatever breaks you want.
library(ggplot2)
library(RColorBrewer)
##
# made up sample
#
set.seed(42)
X <- rgamma(10000, shape=1000, scale=1)
Y <- rgamma(10000, shape=10, scale=100)
# a plain data frame is enough here, so data.table does not need to be loaded
dt <- data.frame(X, Y)
##
# define breaks and labels for the legend
#
brks <- c(0, 1, 2, 5, 10, 20, 50, 100, Inf)
n.br <- length(brks)
labs <- c(paste('<', brks[2:(n.br-1)]), paste('>', brks[n.br-1]))
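##
# plot, filling each hexagon by its binned count
# (after_stat() replaces the deprecated ..count.. notation in current ggplot2)
#
ggplot(dt, aes(X, Y)) +
  geom_hex(aes(fill=cut(after_stat(count), breaks=brks)), color='grey80') +
  scale_fill_manual(name='Count', values = rev(brewer.pal(8, 'Spectral')), labels=labs)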
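The legend classes in the answer above come from cut(), which can be checked on its own without any plotting. A minimal sketch, assuming a made-up vector of hexagon counts (the `counts` values are hypothetical and only there for illustration):

```r
# Same breaks as in the answer above.
brks <- c(0, 1, 2, 5, 10, 20, 50, 100, Inf)

# Hypothetical hexagon counts, made up for illustration.
counts <- c(1, 2, 3, 7, 50, 51, 250)

# cut() assigns each count to one of length(brks) - 1 = 8 classes.
# Intervals are closed on the right by default, so 50 lands in the
# 20-to-50 class and 51 in the next one.
bins <- cut(counts, breaks = brks)

# Integer class index of every count, and the number of counts per class.
class_index <- as.integer(bins)
class_sizes <- table(bins)
print(data.frame(counts, bins, class_index))
```

Because the number of classes is length(brks) - 1, the palette passed to scale_fill_manual needs that many colours (8 here).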

You cannot control the boundaries of the scale as closely as you want, but you can adjust it somewhat. First we need a reproducible example:
library(hexbin)
set.seed(42)
X <- rnorm(10000, 10, 3)
Y <- rnorm(10000, 10, 3)
XY.hex <- hexbin(X, Y)
To change the scale we need to specify a function to use on the counts and an inverse function to reverse the transformation. Now, three different scalings:
plot(XY.hex) # Linear, default
plot(XY.hex, trans=sqrt, inv=function(x) x^2) # Square root
plot(XY.hex, trans=log, inv=function(x) exp(x)) # Log
The top plot is the original scaling. The bottom left is the square root transform and the bottom right is the log transform. There are probably too many levels to read these plots clearly. Adding the argument colorcut=6 to the plot command would reduce the number of levels to 5.
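The trans and inv arguments have to be inverses of each other. The sketch below checks that for the log/exp pair and shows why a log transform gets close to the doubling-style scale asked for in the question: equally spaced cut points on the transformed scale map back to geometrically spaced counts. The maximum count of 718 is taken from the question, the number of cut points is an arbitrary choice, and the code illustrates the idea rather than reproducing how hexbin computes its legend internally.

```r
# Transformation pair as used in the log example above.
trans <- log
inv   <- function(x) exp(x)

# inv must undo trans for every count that can occur.
counts <- c(1, 2, 5, 10, 20, 40, 80, 160, 320, 640)
round_trip_ok <- isTRUE(all.equal(inv(trans(counts)), counts))

# Equally spaced cut points on the transformed scale ...
max_count <- 718
cuts_transformed <- seq(trans(1), trans(max_count), length.out = 6)

# ... correspond to counts that grow by a constant factor.
cuts_counts <- inv(cuts_transformed)
ratios <- cuts_counts[-1] / cuts_counts[-length(cuts_counts)]

print(round(cuts_counts))
print(ratios)
```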

Related

Change the number of breaks using facet_grid in ggplot2

I have data such as:
y<-rep(c(1, 2, 3), times=5)
group<-rep(c("a", "b", "c", "d", "e"), each=3)
x<-c(2, 3, 4, 5, 7, 10, 10, 15, 19, 8, 10, 14, 25, 28, 33)
a<-data.frame (x, y, group)
and when I use facet_grid() with the scales="free_x" option I get 5 panels with different numbers of breaks. Is it possible for all 5 panels to have the same number of breaks, for example 3?
ggplot(a, aes(x, y))+geom_point()+ facet_grid(~group, scales="free_x")
I know that if I remove the scales="free_x" option I get the same scale for all 5 panels, but then the plot looks ugly. Can you help me?
You can define your own favorite breaks function. In the example below, I show equally spaced breaks. Note that the x in the function has a range that is already expanded by the expand argument to scale_x_continuous. In this case, I scaled it back (for the multiplicative expand argument).
# load required packages
require(ggplot2)
require(grid)
# define the breaks function;
# s is the scaling factor (cf. multiplicative expand)
equal_breaks <- function(n = 3, s = 0.05, ...){
  function(x){
    # undo the multiplicative expansion of the range
    d <- s * diff(range(x)) / (1+2*s)
    seq(min(x)+d, max(x)-d, length.out=n)
  }
}
# plotting command
p <- ggplot(a, aes(x, y)) +
  geom_point() +
  facet_grid(~group, scales="free_x") +
  # use 3 breaks,
  # use same s as first expand argument,
  # second expand argument should be 0
  scale_x_continuous(breaks=equal_breaks(n=3, s=0.05),
                     expand = c(0.05, 0)) +
  # set the panel spacing such that the
  # axis text does not overlap
  # (panel.spacing replaces the deprecated panel.margin in newer ggplot2)
  theme(axis.text.x = element_text(angle=45),
        panel.spacing = unit(1, 'lines'))
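To see what equal_breaks returns, it can be called directly on an already expanded range. A small check, assuming data limits of 2 and 4 (the x values of group "a") and the multiplicative expansion of 0.05 used above. The explicit arithmetic for the expanded range is an illustration of what the expand argument does, not ggplot2's internal code.

```r
# Breaks function from the answer above.
equal_breaks <- function(n = 3, s = 0.05, ...){
  function(x){
    d <- s * diff(range(x)) / (1 + 2*s)
    seq(min(x) + d, max(x) - d, length.out = n)
  }
}

# Data limits of one panel and the multiplicative expansion factor.
data_limits <- c(2, 4)
s <- 0.05

# The scale widens the limits by s times the data range on each side.
expanded <- data_limits + c(-1, 1) * s * diff(data_limits)

# The breaks function receives the expanded limits and scales them back,
# so the outer breaks sit on the original data limits.
breaks <- equal_breaks(n = 3, s = s)(expanded)
print(breaks)
```

If the s passed to equal_breaks differs from the first expand value, the outer breaks no longer coincide with the data limits.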

Plotting a histogram with custom breaks

I have a vector like:
K <- rnorm(10000, mean=100)
I want to create a histogram of K with custom breaks (and labels) like <20, 20-50, 50-75, 75-99, =100, >400, etc.
Any ideas?
With base plotting, you might be better off cutting the vector first, and then using barplot or the plot method for tables.
For example:
K <- rnorm(10000, mean=100, sd = 100)
K.cut <- cut(K, c(-Inf, 20, 50, 75, 100, 400, Inf))
plot(table(K.cut), xaxt='n', ylab='K')
axis(1, at=1:6, labels=c('< 20', '20-50', '50-75', '75-100', '100-400', '> 400'))
box(bty='L')
xax <- barplot(table(K.cut), xaxt='n')
axis(1, at=xax, labels=c('< 20', '20-50', '50-75', '75-100', '100-400', '> 400'))
box(bty='L')
Note that by default, cut includes the upper (but not the lower) bound in each bin, so for example the 20-50 bin includes any 50s, but the 20s will be included in the lower adjacent bin.
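The boundary behaviour described above is easy to verify on values that sit exactly on a break. A minimal sketch using the same breaks; right = FALSE is cut's documented switch for closing the intervals on the left instead:

```r
# Same breaks as in the example above.
brks <- c(-Inf, 20, 50, 75, 100, 400, Inf)

# Values that lie exactly on a boundary.
edge_values <- c(20, 50)

# Default (right = TRUE): each bin includes its upper bound,
# so 20 goes to bin 1 and 50 to bin 2.
right_closed <- as.integer(cut(edge_values, brks))

# right = FALSE: each bin includes its lower bound instead,
# so 20 goes to bin 2 and 50 to bin 3.
left_closed <- as.integer(cut(edge_values, brks, right = FALSE))

print(rbind(right_closed, left_closed))
```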
Try the ggplot2 version:
library(ggplot2)
ggplot() + geom_histogram(aes(K))
Many options are available for tweaking.

R - draw new layer behind current plot

Just curious: when plotting in R, one can easily change the order of the plotting calls to change the order of the "layers" on the plot, e.g.
plot(x, type = "n")
lines(y)
points(x)
to get x drawn over y. Is there a way to do this ad hoc, e.g.
plot(x)
lines(y, behind = TRUE) # fictional option behind
While there isn't explicitly a behind option or layers in plot, an easy way to overlay two plots might be using the add = TRUE option in plot. Here is an example with artificial data:
# Load sp package for creating artificial data
library(sp)
# Create sample town points
towns <- data.frame(lon = sample(100), lat = sample(100))
towns <- SpatialPoints(towns)
# Create sample polygon grid
grd <- GridTopology(c(1,1), c(10,10), c(10,10))
polys <- as.SpatialPolygons.GridTopology(grd)
# Plot polygons
plot(polys)
# Add towns (in red colour)
plot(towns, add = TRUE, col = 'red')
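As another example, you can plot lines on different layers with ggplot2 after reshaping the data with melt (assumed here to come from the reshape2 package):
library(ggplot2)
library(reshape2)
a <- c(3, 6, 16, 17, 11, 21)
b <- c(0.3, 2.3, 9, 9, 5, 12)
c <- c(3, 7, 9, 7, 6, 10)
dat <- data.frame(a=a, b=b, c=c)
dat <- melt(dat)
Add an explicit 'x' variable to our data frame:
dat$x <- rep(1:6, times=3)
Then just plot the graph:
# 'colours' was not defined in the original answer; these values are placeholders
colours <- c("red", "blue", "darkgreen")
ggplot(dat, aes(x=x, y=value)) +
  geom_line(aes(colour=variable)) +
  scale_colour_manual(values=colours) +
  # opts() is defunct in current ggplot2; the title is set through labs()
  labs(x="time[h]", y="a", colour="", title="bla")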
Finally, there is explicit support for layers in other packages, such as in PBSmapping for maps.
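For plain base graphics, the idiom from the question (set up the axes with type = "n", then draw in the desired order) can be wrapped in a small helper. This is only a sketch: `plot_layers` is a hypothetical name, and since base graphics has no notion of layers, the helper simply fixes the draw order rather than pushing anything behind an existing plot.

```r
# Hypothetical helper: draw `background` first, then `foreground` on top.
plot_layers <- function(x, background, foreground, ...) {
  plot(x, type = "n", ...)  # set up the axes for x without drawing it
  background()              # drawn first, so it ends up underneath
  foreground()              # drawn last, so it ends up on top
  invisible(NULL)
}

# Made-up example data.
x <- c(3, 6, 16, 17, 11, 21)
y <- c(3, 7, 9, 7, 6, 10)

# Lines for y go underneath, points for x on top.
plot_layers(x,
            background = function() lines(y),
            foreground = function() points(x))
```

Swapping the two function arguments reverses the stacking without touching the rest of the call.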

create a heatmap with regions in R

I have the following kind of data: on a rectangular piece of land (120x50 yards), there are 6 (also rectangular) smaller areas, each with a different kind of plant. The idea is to study the attractiveness of the various kinds of plant to birds. Each time a bird sits down somewhere on the land, I have the exact coordinates of where the bird sits down.
I don't care exactly where the bird sits down, but only care which of the six areas it is. To show the relative preference of birds for the various plants, I want to make a heatmap that makes the areas that are frequented most the darkest.
So, I need to convert the coordinates to code which area the bird visits, and then create a heatmap that shows the differential preference for each land area.
(the research is a bit more involved than this, but this is the general idea.)
How would I do this in R? Is there an R function that takes a vector of coordinates and turns it into such a heatmap? If not, do you have some hints on how to do this?
Not meant to be the answer you are looking for, but might give you some inspiration.
# Simulate some data
birdieLandingSimulator <- data.frame(t(sapply(1:100, function(x) c(runif(1, -10,10), runif(1, -10,10)))))
# Assign some coordinates, which ended up not really being used much at all, except for the point colors
assignCoord <- function(x)
{
# Assign the four coordinates clockwise: 1, 2, 3, 4
ifelse(all(x>0), 1, ifelse(!sum(x>0), 3, ifelse(x[1]>0, 2, 4)))
}
birdieLandingSimulator <- cbind(birdieLandingSimulator, Q = apply(birdieLandingSimulator, 1, assignCoord))
# Plot
require(ggplot2)
ggplot(birdieLandingSimulator, aes(x = X1, y = X2)) +
  # after_stat() replaces the deprecated ..density.. notation in current ggplot2
  stat_density2d(geom="tile", aes(fill = 1/after_stat(density)), contour = FALSE) +
  geom_point(aes(color = factor(Q))) + theme_classic() +
  theme(axis.title = element_blank(),
        axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank()) +
  # guide = "none" replaces the deprecated guide = FALSE
  scale_color_discrete(guide = "none", h=c(180, 270)) +
  scale_fill_continuous(name = "Birdie Landing Location")
Use ggplot2. Take a look at the examples for geom_bin2d. It's pretty simple to get 2d bins. Notice that you pass in binwidth for both x and y:
library(ggplot2)
df <- data.frame(x=c(1,2,4,6,3,2,4,2,1,7,4,4), y=c(2,1,4,2,4,4,1,4,2,3,1,1))
# alpha is a constant here, so it is set outside aes()
ggplot(df, aes(x=x, y=y)) + geom_bin2d(binwidth=c(2,2), alpha=0.5)
If you don't want to use ggplot, you can use the cut function to separate your data into bins.
# Test data.
x <- sample(1:120, 100, replace=TRUE)
y <- sample(1:50, 100, replace=TRUE)
# Separate the data into bins.
x <- cut(x, c(0, 40, 80, 120))
y <- cut(y, c(0, 25, 50))
# Now plot it, suppressing reordering.
heatmap(table(y, x), Colv=NA, Rowv=NA)
Alternatively, to actually plot the regions in their true geographic location, you could draw the boxes yourself with rect. You would have to count the number of points in each region.
# Test data.
x <- sample(1:120, 100, replace=TRUE)
y <- sample(1:50, 100, replace=TRUE)
regions <- data.frame(xleft=c(0, 40, 40, 80, 0, 80),
                      ybottom=c(0, 0, 15, 15, 30, 40),
                      xright=c(40, 120, 80, 120, 80, 120),
                      ytop=c(30, 15, 30, 40, 50, 50))
# Color gradient.
col <- colorRampPalette(c("white", "red"))(30)
# Make the plot.
plot(NULL, xlim=c(0, 120), ylim=c(0, 50), xlab="x", ylab="y")
invisible(apply(regions, 1, function (r) {
  count <- sum(x >= r["xleft"] & x < r["xright"] & y >= r["ybottom"] & y < r["ytop"])
  # Clamp the index to 1..30 so a count of 0 or above 30 still gets a color.
  rect(r["xleft"], r["ybottom"], r["xright"], r["ytop"], col=col[min(max(count, 1), length(col))])
  text((r["xright"]+r["xleft"])/2, (r["ytop"]+r["ybottom"])/2, count)
}))
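The first half of the question, turning each landing coordinate into the area it falls in, can be separated from the plotting. A minimal sketch that reuses the region rectangles from the example above; `region_of` is a hypothetical helper name, and points that fall in no region (or on the outer right or top edge, because the upper bounds are exclusive) come back as NA.

```r
# Region rectangles, copied from the example above.
regions <- data.frame(xleft   = c(0, 40, 40, 80, 0, 80),
                      ybottom = c(0, 0, 15, 15, 30, 40),
                      xright  = c(40, 120, 80, 120, 80, 120),
                      ytop    = c(30, 15, 30, 40, 50, 50))

# Hypothetical helper: index of the region containing each (x, y) point.
region_of <- function(x, y, regions) {
  vapply(seq_along(x), function(i) {
    hit <- which(x[i] >= regions$xleft   & x[i] < regions$xright &
                 y[i] >= regions$ybottom & y[i] < regions$ytop)
    if (length(hit) == 1) hit else NA_integer_
  }, integer(1))
}

# Made-up landing coordinates, one per region plus one on the outer corner.
bx <- c(10, 50, 50, 100, 10, 100, 120)
by <- c(10,  5, 20,  20, 40,  45,  50)

idx <- region_of(bx, by, regions)

# Visits per region; tabulate() ignores the NA entries.
visits <- tabulate(idx, nbins = nrow(regions))
print(idx)
print(visits)
```

The resulting visit counts are what the rect-based plot needs as input for its fill colours.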

Plotting multiple curves same graph and same scale

This is a follow-up of this question.
I wanted to plot multiple curves on the same graph but so that my new curves respect the same y-axis scale generated by the first curve.
Notice the following example:
y1 <- c(100, 200, 300, 400, 500)
y2 <- c(1, 2, 3, 4, 5)
x <- c(1, 2, 3, 4, 5)
# first plot
plot(x, y1)
# second plot
par(new = TRUE)
plot(x, y2, axes = FALSE, xlab = "", ylab = "")
That actually plots both sets of values on the same coordinates of the graph (because I'm hiding the new y-axis that would be created with the second plot).
My question then is how to maintain the same y-axis scale when plotting the second graph.
(The typical method would be to use plot just once to set up the limits, possibly to include the range of all series combined, and then to use points and lines to add the separate series.) To use plot multiple times with par(new=TRUE), you need to make sure that your first plot has a ylim wide enough to accommodate all the series (and in another situation, you may also need to use the same strategy for xlim):
# first plot
plot(x, y1, ylim=range(c(y1,y2)))
# second plot EDIT: needs to have same ylim
par(new = TRUE)
plot(x, y2, ylim=range(c(y1,y2)), axes = FALSE, xlab = "", ylab = "")
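The shared ylim generalises to any number of series by taking the range over all of them at once. A short sketch under that assumption, using the data from the question; collecting the series in a list named `series` is a choice made here for illustration.

```r
y1 <- c(100, 200, 300, 400, 500)
y2 <- c(1, 2, 3, 4, 5)
x  <- c(1, 2, 3, 4, 5)

# Collect every series that should share the y-axis.
series <- list(y1 = y1, y2 = y2)

# One range over all values gives limits that fit every series.
shared_ylim <- range(unlist(series, use.names = FALSE), na.rm = TRUE)

# Set up the plot once with the shared limits, then add the rest.
plot(x, series[[1]], ylim = shared_ylim, ylab = "y")
for (s in series[-1]) points(x, s, pch = 19)
```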
The following does the same more compactly. By default matplot uses the column numbers as plotting symbols; the second call draws the usual R points instead:
matplot(x, cbind(y1,y2))
matplot(x, cbind(y1,y2), pch=1)
points or lines come in handy if y2 is generated later, or if the new data does not have the same x but should still go into the same coordinate system.
As your ys share the same x, you can also use matplot:
matplot (x, cbind (y1, y2), pch = 19)
(without the pch argument, matplot will plot the column numbers of the y matrix instead of dots).
You aren't being very clear about what you want here, since I think @DWin's answer is technically correct, given your example code. I think what you really want is this:
y1 <- c(100, 200, 300, 400, 500)
y2 <- c(1, 2, 3, 4, 5)
x <- c(1, 2, 3, 4, 5)
# first plot
plot(x, y1,ylim = range(c(y1,y2)))
# Add points
points(x, y2)
DWin's solution was operating under the implicit assumption (based on your example code) that you wanted to plot the second set of points overlaid on the original scale. That's why his image looks like the points are plotted at 1, 101, etc. Calling plot a second time isn't what you want; you want to add to the existing plot using points. So the above code on my machine produces this:
But DWin's main point about using ylim is correct.
My solution is to use ggplot2. It takes care of these types of things automatically. The biggest thing is to arrange the data appropriately.
y1 <- c(100, 200, 300, 400, 500)
y2 <- c(1, 2, 3, 4, 5)
x <- c(1, 2, 3, 4, 5)
df <- data.frame(x=rep(x,2), y=c(y1, y2), class=c(rep("y1", 5), rep("y2", 5)))
Then use ggplot2 to plot it
library(ggplot2)
ggplot(df, aes(x=x, y=y, color=class)) + geom_point()
This is saying plot the data in df, and separate the points by class.
The plot generated is
I'm not sure what you want, but I'll use lattice.
library(lattice)
x = rep(x,2)
y = c(y1,y2)
fac.data = as.factor(rep(1:2,each=5))
df = data.frame(x=x,y=y,z=fac.data)
# this creates a data frame with a factor variable, z, that tells me which series each row belongs to (y1 or y2)
Then, just plot
xyplot(y ~x|z, df)
# or maybe
xyplot(x ~y|z, df)
