Extrapolating spatial point patterns using R

I was wondering if there's a built-in function to extrapolate a point pattern outside the 'parent' window in R. For instance, let's generate a spatial point pattern 'X':
require(spatstat)
X <- ppp(runif(200), runif(200),
         c(0, 1), c(0, 1), unitname = c("metre", "metre"))
Let's resample the data:
a <- quadratresample(X, nx = 25, ny = 5, replace = FALSE, nsamples = 1)
But the new points are generated within the same area/spatial window:
> a
planar point pattern: 200 points
window: rectangle = [0, 1] x [0, 1] metres
My question is: how would I resample the 200 points within a new window bigger than the original (1 by 1 m) window? In other words, how would I extrapolate the small set of 200 spatial points to a larger extent while keeping the same point density: say, a total of 5,000 data points in a 5 by 5 m extent?

This is pretty easy because spatstat gives us all the right tools. You currently have a 1x1 grid. You want a 5x5 grid which is built out of 25 1x1 grids. We can sample the points for these grids with the nsamples argument:
a <- quadratresample(X, nx = 25, ny = 5, replace = FALSE, nsamples = 25)
Now we have a list of 25 ppps. As you point out, all of these will be in the same 1x1 window. To turn these into a grid, we shift them appropriately, from 0 to 4 units in x shift and 0 to 4 units in y shift:
for (i in seq_along(a)) {
  a[[i]] <- shift(a[[i]], vec = c((i - 1) %% 5, (i - 1) %/% 5))
}
To combine them, use superimpose:
b <- superimpose(a)
This gives a single ppp object in a 5x5 window with 200 * 25 = 5000 points, which preserves the 200 points per square metre of the original.
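To sanity-check the result (a quick sketch continuing from the code above; intensity() is spatstat's point-density summary):
b$n          # 5000 points
intensity(b) # 200 points per square metre, matching the original X
plot(b)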

Related

How to generate an elliptical cylinder, populate it with randomly distributed points, and measure instances of overlap between those points in R

I would like to determine the probability that a randomly distributed object of Type A occupies or touches (overlaps) the same space as any randomly distributed object of Type B when populated inside an elliptical cylinder. I would then like to loop this simulation many times to generate a more reliable probability value.
I am able to draw the elliptical cylinder using the shape package:
library(shape)
emptyplot(c(-5, 5), c(-15, 15), main = "filled elliptic cylinder")
filledcylinder(rx = 9, ry = 5, len = 2, angle = 0, col = "white",
               lcol = "black", lcolint = "grey")
I do not know how to add points (i.e. objects A and B) to this graph. However, I suspect graphical expression is not the way to go with this task (though I find visualising helpful). I suspect a better approach will be to create a function to describe the elliptical cylinder, similar to the cone in the following example, and run the simulation without graphical output:
# Create a function to describe a cone
cone <- function(x, y) {
  sqrt(x^2 + y^2)
}
# prepare variables.
x <- y <- seq(-1, 1, length = 30)
z <- outer(x, y, cone)
# plot as a 3D surface for visual reference (even though I actually want a volume)
persp(x, y, z,
      main = "Perspective Plot of a Cone",
      zlab = "Height",
      theta = 30, phi = 15,
      col = "orange", shade = 0.4)
Sadly I do not know how to do this for my elliptical cylinder. I am aware of the parameters for describing an elliptical cylinder from the following source:
https://mathworld.wolfram.com/EllipticCylinder.html
Unfortunately, I do not understand much of it. I hope the dimensions given in my filledcylinder call can act as a guide. Ultimately the dimension values do not matter; what matters is the code structure into which values can be entered.
As for the objects:
Let there be 50 Type A objects and 50 Type B objects of size x=0.4, y=0.4, z=0.4 (same units as in my graphical elliptical cylinder example).
All objects are to be distributed at random within the volume of the elliptical cylinder, with the exception that objects of Type A cannot overlap with another object of Type A, and objects of Type B cannot overlap with other objects of Type B. Type A objects may overlap with Type B objects.
I would like to output the number of Type A objects that overlap with any Type B object in the given volume, this number as a percentage of total Type A objects, and as a percentage of total all objects for each run of the simulation.
I do not know how to even start to do this.
If you can help, I'm afraid statistics, geometry and non-basic R expressions will need to be explained as if to a (not particularly bright) child.
Thank you very very much for your time!
An implementation with heavily commented code for explanations. This assumes the A- and B-type objects must be entirely within the elliptical cylinder.
library(data.table)
rObj <- function(rx, ry, h, n, dims, eps = 2) {
  # Function to create a random sample (by rejection) of non-overlapping
  # rectangular prism objects inside an elliptical cylinder whose ellipse is
  # centered at x = 0, y = 0 and whose height ranges from -dims[3]/2 to h -
  # dims[3]/2. The objects have dimensions (x, y, z) = dims, and all edges are
  # parallel or orthogonal to each of the x, y, or z axes.
  # INPUTS:
  #   rx: length of the ellipse
  #   ry: width of the ellipse
  #   h: height of the elliptical cylinder
  #   n: number of non-overlapping objects to return
  #   dims: dimensions of the rectangular prism objects (vector of length 3)
  #   eps: oversampling factor
  # OUTPUT: a data.table with 3 columns and n rows. Each row gives the
  # coordinates of the centroid of a sampled object
  dt <- data.table()
  while (nrow(dt) < n) {
    # increase oversampling if it is not the first pass
    if (nrow(dt)) eps <- eps*2
    rho <- sqrt(runif(eps*n))
    phi <- runif(eps*n, 0, 2*pi)
    dt <- data.table(
      # sample object centroids
      # see https://stackoverflow.com/questions/5529148/algorithm-calculate-pseudo-random-point-inside-an-ellipse
      # First, uniformly sample on an ellipse centered on x = 0, y = 0,
      # with xlength = rx - dims[1] and ylength = ry - dims[2]
      # (any object with a centroid outside of this ellipse will stick out of
      # the elliptical cylinder, although some with a centroid within the
      # smaller ellipse will still stick out of the elliptical cylinder).
      x = (rx - dims[1])/2*rho*cos(phi),
      y = (ry - dims[2])/2*rho*sin(phi),
      # uniformly sample centroid heights
      z = runif(eps*n, 0, h - dims[3])
    )[
      # remove objects that stick out of bounds
      # The ellipse satisfies (x/(rx/2))^2 + (y/(ry/2))^2 = 1, which is the
      # same as (x/rx)^2 + (y/ry)^2 = 0.25. Taking advantage of symmetry, add
      # half of the x and y dimensions of the objects to the absolute value of
      # x and y (the object corner furthest from the foci of the ellipse) and
      # check if the result satisfies the standard equation.
      ((abs(x) + dims[1]/2)/rx)^2 + ((abs(y) + dims[2]/2)/ry)^2 < 0.25
    ][
      # remove objects that overlap a previously placed object
      # Since each rectangular prism object is oriented with the x, y, z axes,
      # two objects overlap if they are closer than their lengths in each
      # dimension.
      tabulate(
        sequence((.N - 1L):1, 2:.N)[ # row numbers (always keep the first row)
          (dist(x) < dims[1]) & (dist(y) < dims[2]) & (dist(z) < dims[3])
        ],
        .N
      ) == 0L
    ]
  }
  dt[1:n] # keep the first n objects
}
# function to get pairwise distances between two vectors
dist2 <- function(x, y) abs(outer(x, y, "-"))
fsim <- function(rx, ry, h, nA, nB, dimA, dimB, nreps, eps = 2) {
  # function to simulate placement of A and B rectangular prism objects inside
  # an elliptical cylinder and count the number of A-type objects that
  # intersect at least one B-type object. All object edges are parallel or
  # orthogonal to each of the x, y, or z axes.
  # INPUTS:
  #   rx: length of the ellipse
  #   ry: width of the ellipse
  #   h: height of the elliptical cylinder
  #   nA: number of non-overlapping A-type objects to return
  #   nB: number of non-overlapping B-type objects to return
  #   dimA, dimB: dimensions of the rectangular prism objects (each a vector
  #               of length 3)
  #   nreps: the number of replications to simulate
  #   eps: oversampling factor when randomly sampling non-overlapping objects
  #        by rejection
  # OUTPUT: vector of length "nreps" giving the number of A-type objects that
  # intersect at least one B-type object for each replication
  dims <- rowMeans(cbind(dimA, dimB)) # average dimensions of the A and B objects
  out <- integer(nreps) # initialize the output vector
  # repeat the simulation "nreps" times
  for (i in 1:nreps) {
    # get the coordinates of the A- and B-type objects' centroids
    A <- rObj(rx, ry, h, nA, dimA, eps)
    B <- rObj(rx, ry, h, nB, dimB, eps)
    # count the number of A-type objects that intersect at least one B-type
    # object
    out[i] <- sum(rowSums(
      (dist2(A$x, B$x) < dims[1])*
      (dist2(A$y, B$y) < dims[2])*
      (dist2(A$z, B$z) < dims[3])
    ) != 0L)
  }
  out
}
Time 10K simulation replications:
system.time(overlaps <- fsim(9, 5, 2, 50L, 50L, rep(0.4, 3), rep(0.4, 3), 1e4L))
#>    user  system elapsed
#>   27.19    0.25   27.67
mean(overlaps)
#> [1] 18.7408
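The question also asks for these counts as percentages. With 50 A-type objects and 100 objects in total, as specified above, they follow directly:
100 * mean(overlaps) / 50  # 37.48: mean percentage of A-type objects
100 * mean(overlaps) / 100 # 18.74: mean percentage of all objects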
One approach to get an approximate answer to this problem is to discretize things. Set up a volume as a 3 dimensional array of zeros, then randomly generate the parameters of your shapes one at a time.
For each generated shape, find all the elements of the array that would be inside the shape. If any locations would be outside the cylinder or overlap a shape of the same type, try again. Once you have a legal shape, mark those array entries (e.g. 1 for type A, 2 for type B). Do all type A first, then all type B, and keep count of the times when shape B occupies a space that was previously marked for shape A.
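A minimal sketch of that discretized approach, assuming the same cylinder (full axes 9 and 5, height 2) and 0.4-unit cubes as above; the resolution res, the helper place(), and the bookkeeping are illustrative choices, not code from either answer:
set.seed(1)
res  <- 0.1              # voxel edge length (finer = more accurate, slower)
a    <- 4.5; b <- 2.5    # ellipse semi-axes
h    <- 2                # cylinder height
dims <- c(0.4, 0.4, 0.4) # object dimensions
# voxel centre coordinates, with the ellipse centred at (0, 0)
xc <- seq(-a + res/2, a - res/2, by = res)
yc <- seq(-b + res/2, b - res/2, by = res)
zc <- seq(res/2, h - res/2, by = res)
# place n non-overlapping objects of one type; occ marks voxels used so far
place <- function(n) {
  occ <- array(FALSE, dim = c(length(xc), length(yc), length(zc)))
  blocks <- vector("list", n)
  k <- 1L
  while (k <= n) {
    # random centroid, kept clear of the top and bottom faces
    cx <- runif(1, -a, a); cy <- runif(1, -b, b)
    cz <- runif(1, dims[3]/2, h - dims[3]/2)
    ix <- which(abs(xc - cx) < dims[1]/2)
    iy <- which(abs(yc - cy) < dims[2]/2)
    iz <- which(abs(zc - cz) < dims[3]/2)
    # try again if any voxel centre falls outside the ellipse or is occupied
    inside <- all(outer((xc[ix]/a)^2, (yc[iy]/b)^2, "+") < 1)
    if (!inside || any(occ[ix, iy, iz])) next
    occ[ix, iy, iz] <- TRUE
    blocks[[k]] <- list(ix = ix, iy = iy, iz = iz)
    k <- k + 1L
  }
  list(occ = occ, blocks = blocks)
}
A <- place(50L) # all type A first
B <- place(50L) # then all type B (B may overlap A, but not other B)
# count B objects whose voxels coincide with any A voxel
sum(vapply(B$blocks, function(bl) any(A$occ[bl$ix, bl$iy, bl$iz]), logical(1)))
Because overlap is judged on a voxel grid, the count is only approximate; shrinking res trades speed for accuracy.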

Simulate matrices of data using R as bands of satellite imagery from scratch

I am trying to
1) Simulate a matrix of data using R (in effect, an image of numbers where each cell in the matrix holds a number on the 0-255 scale of 8-bit data)
2) Map the simulated data using mapping tools
3) Classify the image into 8-10 classes
The idea is to use a simple function to generate an image with 3 bands of Red, Green and Blue imagery, simulating multispectral satellite imagery: a composite of 3 different matrices.
Then classify the composite by colour into 8 or 10 classes.
Any help would be much appreciated.
Based on your comments, here is an approach to sample as a gradient.
imagerows <- 100
imagecols <- 100
cuts <- 8
(imagecols * imagerows) %% cuts == 0 # must be TRUE
colorgroups <- as.integer(cut(0:255,cuts))
colors <- c("red","green","blue")
result <- lapply(colors, function(y) {
  unlist(
    lapply(seq(1, cuts), function(x) {
      sample((0:255)[colorgroups == x],
             size = (imagerows*imagecols)/cuts,
             replace = TRUE)
    })
  )
})
result is now a list of length 3, each element of which is a vector of 100 * 100 random samples between 0 and 255, drawn in cuts increasing groups. (Each vector becomes a 100x100 matrix in the next step.)
We can then control the direction of the gradient using the byrow argument of matrix() and by using rev() on the data.
red.matrix <- matrix(result[[1]], nrow = imagerows, ncol = imagecols, byrow = TRUE)
green.matrix <- matrix(result[[2]], nrow = imagerows, ncol = imagecols, byrow = FALSE)
blue.matrix <- matrix(rev(result[[3]]), nrow = imagerows, ncol = imagecols, byrow = FALSE)
Then we put the colors together with rgb() which outputs a vector. We can coerce it back into a matrix by assigning dimensions. Then just plot with grid.raster().
library(grid)
rgb.matrix <- rgb(red.matrix,green.matrix,blue.matrix,maxColorValue = 255)
dim(rgb.matrix) <- c(imagerows,imagecols)
grid.newpage()
grid.raster(rgb.matrix,interpolate = FALSE)
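The question also asks for the composite to be classified into 8-10 classes. One simple option (a sketch, not part of the answer above) is to treat each pixel's (R, G, B) triple as a row and cluster the rows with base R's kmeans:
k <- 8 # number of classes (use 8-10 as desired)
pixels <- cbind(as.vector(red.matrix),
                as.vector(green.matrix),
                as.vector(blue.matrix))
classes <- kmeans(pixels, centers = k)$cluster
class.matrix <- matrix(classes, nrow = imagerows, ncol = imagecols)
grid.newpage()
grid.raster(matrix(rainbow(k)[class.matrix], nrow = imagerows),
            interpolate = FALSE)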

R: Sample a matrix for cells close to a specified position

I'm trying to find sites to collect snails by using a semi-random selection method. I have set a 10 km² grid around the region I want to collect snails from, which is broken into 10,000 10 m² cells. I want to randomly sample this grid in R to select 200 field sites.
Randomly sampling a matrix in R is easy enough:
dat <- matrix(1:10000, nrow = 100)
sample(dat, size = 200)
However, I want to bias the sampling to pick cells closer to a single position (representing sites closer to the research station). It's easier to explain this with an image:
The yellow cell with a cross represents the position I want to sample around. The grey shading is the probability of picking a cell in the sample function, with darker cells being more likely to be sampled.
I know I can specify sampling probabilities using the prob argument in sample, but I don't know how to create a 2D probability matrix. Any help would be appreciated, I don't want to do this by hand.
I'm going to do this for a 9 x 6 grid (54 cells), just so it's easier to see what's going on, and sample only 5 of these 54 cells. You can modify this to a 100 x 100 grid where you sample 200 from 10,000 cells.
# Number of rows and columns of the grid (modify these as required)
nx <- 9 # rows
ny <- 6 # columns
# Create coordinate matrix
x <- rep(1:nx, each = ny); x
y <- rep(1:ny, nx); y
xy <- cbind(x, y); xy
# Where is the station? (edit: not snails nest)
Station <- rbind(c(x=3, y=2)) # Change as required
# Determine distance from each grid location to the station
library(SpatialTools)
D <- dist2(xy, Station)
From the help page of dist2
dist2 takes the matrices of coordinates coords1 and coords2 and
returns the inter-Euclidean distances between coordinates.
We can visualize this using the image function.
XY <- matrix(D, nrow = nx, byrow = TRUE)
image(XY) # axes are scaled to 0-1
# Create a scaling function - scales x to lie in [0-1)
scale_prop <- function(x, m = 0)
  (x - min(x)) / (m + max(x) - min(x))
# Add the coordinates to the grid
text(x=scale_prop(xy[,1]), y=scale_prop(xy[,2]), labels=paste(xy[,1],xy[,2],sep=","))
Lighter tones indicate grids closer to the station at (3,2).
# Sampling probabilities decrease with distance from the station. Distances
# are scaled to lie in [0, 1); m = 1 keeps the scaled maximum below 1, so
# even the farthest grid cell keeps a nonzero probability.
prob <- 1 - scale_prop(D, m = 1); range(prob)
# Sample from the grid using given probabilities
sam <- sample(1:nrow(xy), size = 5, prob=prob) # Change size as required.
xy[sam, ] # These are your 5 sampled grid locations
     x y
[1,] 4 4
[2,] 7 1
[3,] 3 2
[4,] 5 1
[5,] 5 3
To confirm the sample probabilities are correct, you can simulate many samples and see which coordinates were sampled the most.
snail.sam <- function(nsamples) {
  sam <- sample(1:nrow(xy), size = nsamples, prob = prob)
  apply(xy[sam, ], 1, function(x) paste(x[1], x[2], sep = ","))
}
SAMPLES <- replicate(10000, snail.sam(5))
tab <- table(SAMPLES)
cols <- colorRampPalette(c("lightblue", "darkblue"))(max(tab))
barplot(table(SAMPLES), horiz = TRUE, las = 1, cex.names = 0.5,
        col = cols[tab])
If using a 100 x 100 grid and the station is located at coordinates (60,70), then the image would look like this, with the sampled grids shown as black dots:
There is a tendency for the points to be located close to the station, although sampling variability may make this difficult to see. If you want to give even more weight to grids near the station, you can rescale the probabilities; I think this is reasonable to do to save travel costs, but the weights need to be incorporated into the analysis when estimating the number of snails in the whole region. Here I've cubed the probabilities just so you can see what happens.
sam <- sample(1:nrow(xy), size = 200, prob=prob^3)
The tendency for the points to be located near the station is now more obvious.
There may be a better way than this, but a quick way is to randomly sample both the x and y axes using a probability distribution (I used the normal, bell-shaped distribution, but you can use any). The trick is to make the mean of the distribution the position of the research station. You can change the bias towards the research station by changing the standard deviation of the distribution.
Then use the randomly selected positions as your x and y coordinates to select the positions.
dat <- matrix(1:10000, nrow = 100)
#randomly selected a position for the research station
rs <- c(80,30)
# you can change the sd to change the bias
x <- round(rnorm(400,mean = rs[1], sd = 10))
y <- round(rnorm(400, mean = rs[2], sd = 10))
position <- rep(NA, 200)
j <- 1
i <- 1
# Some sampled positions fall outside the area of interest, so I oversampled
# and then kept only the first 200 that were inside it.
while (j <= 200) {
  if (x[i] >= 1 & x[i] <= 100 & y[i] >= 1 & y[i] <= 100) {
    position[j] <- dat[x[i], y[i]]
    j <- j + 1
  }
  i <- i + 1
}
Plot the results:
plot(x, y, pch = 19)
points(x = 80, y = 30, col = "red", pch = 19) # position of the station
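Alternatively, to build the 2D probability matrix the original question asks about, you can evaluate a distance-based kernel at every cell and pass it straight to sample()'s prob argument. A sketch using a Gaussian kernel, reusing the illustrative station position (80, 30) and sd = 10 from above:
dat <- matrix(1:10000, nrow = 100)
rs <- c(80, 30)                                         # station cell (row, column)
d2 <- outer((1:100 - rs[1])^2, (1:100 - rs[2])^2, "+")  # squared distance of every cell from rs
prob <- exp(-d2 / (2 * 10^2))         # unnormalised Gaussian weights; sample() rescales them
sites <- sample(length(dat), size = 200, prob = prob)   # linear cell indices
arrayInd(sites, dim(dat))             # (row, col) coordinates of the 200 selected cells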

spatial distribution of points, R

What would be an easy way to generate 3 different spatial distributions of points (N = 20 points) using R? For example: 1) random, 2) uniform, and 3) clustered, on the same space (a 50 x 50 grid).
1) Here's one way to get a very even spacing of 5 points in a 25 by 25 grid numbered from 1 in each direction. Put points at (3,18), (8,3), (13,13), (18,23), (23,8); you should be able to generalize from there.
2) as you suggest, you could use runif ... but I'd have assumed from your question you actually wanted points on the lattice (i.e. integers), in which case you might use sample.
Are you sure you want continuous rather than discrete random variables?
3) This one is "underdetermined" - depending on how you want to define things there's a bunch of ways you might do it. e.g. if it's on a grid, you could sample points in such a way that points close to (but not exactly on) already sampled points had a much higher probability than ones further away; a similar setup works for continuous variables. Or you could generate more points than you need and eliminate the loneliest ones. Or you could start with random uniform points and them make them gravitate toward their neighbors. Or you could generate a few cluster-centers (4-10, say), and then scatter points about those centers. Or you could do any of a hundred other things.
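As a minimal sketch of the last idea (cluster centres with points scattered about them; the counts, spread, and seed are arbitrary choices):
set.seed(42)
ncenters <- 4
centers <- cbind(runif(ncenters, 0, 50), runif(ncenters, 0, 50))
parent <- sample(ncenters, 20, replace = TRUE) # assign each point to a centre
pts <- centers[parent, ] + matrix(rnorm(2 * 20, sd = 2), ncol = 2) # scatter
plot(pts, xlim = c(0, 50), ylim = c(0, 50), pch = 19, xlab = "x", ylab = "y")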
A bit late, but the answers above do not really address the problem. Here is what you are looking for:
library(sp)
# make a grid of size 50*50
x1 <- seq(1, 50) - 0.5
x2 <- x1
grid <- expand.grid(x1, x2)
names(grid) <- c("x1", "x2")
# make the grid a spatial object
coordinates(grid) <- ~x1+x2
gridded(grid) <- TRUE
First: random sampling
# random sampling
random.pt <- spsample(x = grid, n= 20, type = 'random')
Second: regular sampling
# regular sampling
regular.pt <- spsample(x = grid, n= 20, type = 'regular')
Third: clustered at a distance of 2 from a random location (can go outside the area)
# random sampling of one location
ori <- data.frame(spsample(x = grid, n= 1, type = 'random'))
# select randomly 20 distances between 0 and 2
n.point <- 20
h <- runif(n.point, min = 0, max = 2)
# empty dataframe
dxy <- data.frame(matrix(nrow=n.point, ncol=2))
# take a random angle for each point and convert the (distance, angle) pairs
# into x/y offsets from the randomly selected origin
angle <- runif(n = n.point, min = 0, max = 2*pi)
dxy[, 1] <- h * sin(angle)
dxy[, 2] <- h * cos(angle)
cluster <- data.frame(x=rep(NA, 20), y=rep(NA, 20))
cluster$x <- ori$coords.x1 + dxy$X1
cluster$y <- ori$coords.x2 + dxy$X2
# make a spatial object and plot
coordinates(cluster)<- ~ x+y
plot(grid)
plot(cluster, add=T, col='green')
plot(random.pt, add=T, col= 'red')
plot(regular.pt, add=T, col= 'blue')

Graph to compare two matrices in R

I have two matrices (of approximately 300 x 100) and I would like to plot a graph to see the parts of the first one that are higher than those of the second.
I can do, for instance:
# Calculate the matrices and put them into m1 and m2
# Note that the values are between -1 and 1
par(mfrow=c(1,3))
image(m1, zlim=c(-1,1))
image(m2, zlim=c(-1,1))
image(m1-m2, zlim=c(0,1))
This will plot only the desired regions in the 3rd plot but I would like to do something a bit different, like putting a line around those areas over the first plot in order to highlight them directly there.
Any idea how I can do that?
Thank you
nico
How about:
par(mfrow = c(1, 3))
image(m1, zlim = c(-1, 1))
contour(m1 - m2, add = TRUE)
image(m2, zlim = c(-1, 1))
contour(m1 - m2, add = TRUE)
image(m1 - m2, zlim = c(0, 1))
contour(m1 - m2, add = TRUE)
This adds a contour map around the regions. Sort of puts rings around the areas of the 3rd plot (might want to fiddle with the (n)levels of the contours to get fewer 'circles').
Another way of doing your third image might be:
image(m1>m2)
this produces a matrix of TRUE/FALSE values which gets imaged as 0/1, so you have a two-colour image. Still not sure about your 'putting a line around' thing though...
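If you specifically want a single outline around the m1 > m2 regions drawn over the first plot, one option (a small sketch, not from either answer above) is to draw only the zero-level contour of the difference, since m1 - m2 = 0 traces exactly the boundary of those regions:
image(m1, zlim = c(-1, 1))
contour(m1 - m2, levels = 0, add = TRUE, drawlabels = FALSE, lwd = 2)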
Here's some code I wrote to do something similar. I wanted to highlight contiguous regions above a 0.95 threshold by drawing a box round them, so I took all the grid squares above 0.95, ran a clustering on them, and then did a bit of fiddling with the clustering output to get the rectangle coordinates of the regions:
computeHotspots <- function(xyz, thresh, minsize = 1, margin = 1) {
  ### given a list(x,y,z), return a data frame where each row
  ### is a (xmin,xmax,ymin,ymax) of the bounding box of a contiguous area
  ### over the given threshold.
  ### or approximately. let's use the clustering tools in R...
  overs <- which(xyz$z > thresh, arr.ind = TRUE)
  if (length(overs) == 0) {
    ## found no hotspots
    return(NULL)
  }
  if (length(overs) == 2) {
    ## found one hotspot
    xRange <- cbind(xyz$x[overs[, 1]], xyz$x[overs[, 1]])
    yRange <- cbind(xyz$y[overs[, 2]], xyz$y[overs[, 2]])
  } else {
    oTree <- hclust(dist(overs), method = "single")
    oCut <- cutree(oTree, h = 10)
    oXYc <- data.frame(x = xyz$x[overs[, 1]], y = xyz$y[overs[, 2]], oCut)
    xRange <- do.call("rbind", tapply(oXYc[, 1], oCut, range))
    yRange <- do.call("rbind", tapply(oXYc[, 2], oCut, range))
  }
  ### add user-margins
  xRange[, 1] <- xRange[, 1] - margin
  xRange[, 2] <- xRange[, 2] + margin
  yRange[, 1] <- yRange[, 1] - margin
  yRange[, 2] <- yRange[, 2] + margin
  ## put it all together, widening any box narrower than minsize
  xr <- apply(xRange, 1, diff)
  xm <- apply(xRange, 1, mean)
  xRange[xr < minsize, 1] <- xm[xr < minsize] - (minsize/2)
  xRange[xr < minsize, 2] <- xm[xr < minsize] + (minsize/2)
  yr <- apply(yRange, 1, diff)
  ym <- apply(yRange, 1, mean)
  yRange[yr < minsize, 1] <- ym[yr < minsize] - (minsize/2)
  yRange[yr < minsize, 2] <- ym[yr < minsize] + (minsize/2)
  cbind(xRange, yRange)
}
Test code:
x=1:23
y=7:34
m1=list(x=x,y=y,z=outer(x,y,function(x,y){sin(x/3)*cos(y/3)}))
image(m1)
hs = computeHotspots(m1,0.95)
That should give you a matrix of rectangle coordinates:
> hs
  [,1] [,2] [,3] [,4]
1   13   15    8   11
2    3    6   17   20
3   22   24   18   20
4   13   16   27   30
Now you can draw them over the image with rect:
image(m1)
rect(hs[,1],hs[,3],hs[,2],hs[,4])
and to show they are where they should be:
image(list(x=m1$x,y=m1$y,z=m1$z>0.95))
rect(hs[,1],hs[,3],hs[,2],hs[,4])
You could of course adapt this to draw circles, but more complex shapes would be tricky. It works best when the regions of interest are fairly compact.
Barry
