Point pattern analysis: clarkevans.test and edge correction - r

First, I'm really sorry because I'm new to R, so I apologise if I missed a previous answer to this question. I also apologise that I cannot attach pictures to my text (and for my English mistakes, since I'm not bilingual).
I'm working with Q-GIS and R. I have a forest (layer = parcels) with georeferenced specific trees (layer = coordinates of specific trees). I want to know if those specific trees are aggregated. Therefore I import my Q-GIS layer of specific trees into R (package rgdal, function readOGR).
Then I calculate the Clark-Evans index (package spatstat, function clarkevans.test) using the following line. The Clark-Evans index is the ratio of the observed mean nearest-neighbour distance to the adjusted theoretical mean. (R = 0: complete aggregation. R = 1: complete randomness. R = 2.14914: uniform pattern.)
clarkevans.test(PPP, correction="Donnelly", alternative="two.sided")
Where PPP is my layer of specific trees in ppp format:
PPP <- ppp(x = coordinates(spec_trees)[,1], y = coordinates(spec_trees)[,2], xrange = range(coordinates(spec_trees)[,1]), yrange = range(coordinates(spec_trees)[,2]))
And the Donnelly correction is an edge correction for a rectangular window. I'm not sure whether I have to use it or not; without any correction I get almost the same result.
To the Clark-Evans test, R responds:
R = 0.48929, p-value = 0.002
alternative hypothesis: two-sided
Which means my points are significantly different from a spatially random distribution (p-value < 0.05) AND my points are aggregated (R < 1).
BUT I think R might overestimate the aggregation, and I need the true value. My forest (layer = parcels) is not a square or an oval. The parcels are discontinuous (there are lakes, roads, houses)! Trees cannot be everywhere in the square: where there are no parcels there can be no trees, so of course there are no specific trees there. But R does not know that, so it just looks for aggregation of the specific trees in an empty square.
So my question is the following: can I analyse the spatial pattern of points IN a limited area which has a complex shape?
I hope I am clear, but do not hesitate to ask me questions.

Since this question has been viewed many times, here is a detailed answer.
The analysis of a point pattern should always take account of the spatial region where points could have occurred. The spatstat package is designed to support this.
Point patterns are represented by objects of class ppp. The spatial region ("window") can be any complicated shape, or shapes, represented by an object of class owin.
You can create a point pattern in an irregularly-shaped window by
A <- ppp(x, y, window=W)
where x and y are vectors of coordinates, and W is a spatial region object of class owin created by the function owin. The object W can be a polygon, or several disconnected polygons, or a binary pixel mask, etc.
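For the forest example in the question, a minimal sketch (not a definitive recipe) might look like the following; it assumes the parcels layer has been read with readOGR into an sp polygon object, that the maptools package is installed (it supplies an as.owin method for sp polygons), and that the file and layer names are made up:
library(rgdal)
library(maptools)   # provides as.owin() for sp polygon objects
library(spatstat)
parcels    <- readOGR("forest", "parcels")        # the irregular study region (hypothetical names)
spec_trees <- readOGR("forest", "specific_trees") # the tree locations (hypothetical names)
W   <- as.owin(parcels)   # polygonal window: may have holes and several disconnected pieces
PPP <- ppp(x = coordinates(spec_trees)[, 1],
           y = coordinates(spec_trees)[, 2],
           window = W)
# Donnelly's correction assumes a rectangular window; for a polygonal window,
# check ?clarkevans.test for the corrections that remain valid (e.g. "cdf")
clarkevans.test(PPP, correction = "cdf", alternative = "two.sided")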
By the way, we strongly discourage the use of syntax like
PPP <- ppp(x, y, xrange = range(x), yrange = range(y))
because even when the spatial region is known to be a rectangle, it is unwise to estimate the limits of the rectangle by using the ranges of the coordinates.
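If the window really is a rectangle, it is safer to pass the known survey limits explicitly (the limits below are purely illustrative):
W   <- owin(xrange = c(0, 1000), yrange = c(0, 500))  # the true rectangle, not range(x) and range(y)
PPP <- ppp(x, y, window = W)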
For more information, see the spatstat book.

Related

Is there a way to specify the number of segment crossings when creating a random pattern of line segments?

I have been using the spatstat package to determine whether a point pattern is clustered, random or regular by comparing it to the relative frequency distribution of nearest-neighbour distances generated under complete spatial randomness (CSR). The code is as follows (Y is a ppp object of x,y coordinates):
Y<-rpoint(60, 2, fmax=NULL, win=unit.square(),giveup=1000, verbose=FALSE,
nsim=1, drop=TRUE)
envelope(Y, Gest, nsim = 999, nrank = 25, global=FALSE, fix.n=TRUE)
However, I just realized that the random points have to be distributed at the intersections of a polygon network within the window; they cannot be compared against complete spatial randomness, since they are restricted to crossings of segments of a polygon network. Is there any way to simulate spatial randomness on a network pattern?
I managed to create random line segments and add points at each crossing:
l<-rpoisline(4, win=owin())
i<-selfcrossing.psp(l)
par(mfrow=c(1,2))
plot(l)
plot(i)
However, I am not sure how to integrate this with the envelope() function.
I also need the number of crossings to be constant. So is there any way to specify the number of crossings in the rpoisline() function?
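One possible direction (a sketch, untested, not an official spatstat recipe): envelope() has a simulate argument that accepts an expression, which is re-evaluated to generate each simulated pattern. Keeping the number of crossings fixed is awkward because rpoisline() produces a Poisson number of lines, so the helper below (a made-up function, rcrossings) uses crude rejection sampling and can be slow:
# hypothetical helper: regenerate random lines until the pattern of
# self-crossings has exactly n points (crude rejection sampling)
rcrossings <- function(n, lambda = 4, win = owin()) {
  repeat {
    p <- selfcrossing.psp(rpoisline(lambda, win = win))
    if (npoints(p) == n) return(p)
  }
}
n_obs <- npoints(i)   # i = the observed crossing pattern created above
envelope(i, Gest, nsim = 99, simulate = expression(rcrossings(n_obs)))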

R: Is it possible to plot a grid from x, y spatial coordinates?

I've been working with a spatial model which contains 21,000 grid cells of unequal size (i by j, where i is [1:175] and j is [1:120]). I have the latitude and longitude values in two separate arrays (lat_array, lon_array) of i and j dimensions.
Plotting the coordinates:
> plot(lon_array, lat_array, main='Grid Coordinates')
Result:
My question: is it possible to plot these spatial coordinates as a grid rather than as points? Does anyone know of a package or function that might be able to do this? I haven't been able to find anything online of this nature.
Thanks.
First of all, it is always a bit dangerous to plot inherently spherical coordinates (lat, long) directly in the plane. Usually you should project them in some way, but I will leave it to you to explore the sp package and the function spTransform or something like that.
I guess in principle you could simply use the deldir package to calculate the Dirichlet tessellation of your points, which would give you a nice grid. However, you need a bounding region for this to avoid large cells radiating out from the border of your region. I personally use spatstat to call deldir, so I can't give you the direct commands in deldir, but in spatstat I would do something like:
library(spatstat)
plot(lon_array, lat_array, main='Grid Coordinates')
W <- clickpoly(add = TRUE) # Now click the region that contains your grid
i_na <- is.na(lon_array) | is.na(lat_array) # Index of NAs
X <- ppp(lon_array[!i_na], lat_array[!i_na], window = W)
grid <- dirichlet(X)
plot(grid)
I have not tested this yet and I will update this answer once I get the chance to try it with some artificial data. A major problem is the size of your dataset, which may make the Dirichlet tessellation very slow to calculate. I have only tried to call dirichlet on datasets of up to 3000 points...
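For completeness, the direct deldir commands would be something along these lines (also untested, using the same lon_array/lat_array objects as above):
library(deldir)
ok <- !(is.na(lon_array) | is.na(lat_array))   # drop missing coordinates
dd <- deldir(lon_array[ok], lat_array[ok])     # Delaunay triangulation + Dirichlet tessellation
plot(tile.list(dd))                            # draw the Dirichlet (Voronoi) tiles
# note: deldir clips to a bounding rectangle (argument rw), not to an arbitrary polygon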

options to allow heavily-weighted points on a map to overwhelm other points with low weights

what are some good kriging/interpolation ideas/options that will allow heavily-weighted points to bleed over lightly-weighted points on a plotted R map?
the state of connecticut has eight counties. i found the centroid and want to plot poverty rates of each of these eight counties. three of the counties are very populated (about 1 million people) and the other five counties are sparsely populated (about 100,000 people). since the three densely-populated counties have more than 90% of the total state population, i would like those three densely-populated counties to completely "overwhelm" the map and impact other points across the county borders.
the Krig function in the R fields package has a lot of parameters and also covariance functions that can be called, but i'm not sure where to start?
here is reproducible code to quickly produce a hard-bordered map and then three differently-weighted maps. hopefully i can just make changes to this code, but perhaps it requires something more complex like the geoRglm package? two of the three weighted maps look almost identical, despite one being weighted 10x as heavily as the other..
https://raw.githubusercontent.com/davidbrae/swmap/master/20141001%20how%20to%20modify%20the%20Krig%20function%20so%20a%20huge%20weight%20overwhelms%20nearby%20points.R
thanks!!
edit: here's a picture example of the behavior i want-
Disclaimer - I am not an expert on kriging. Kriging is complex and takes a good understanding of the underlying data, the method and the purpose to achieve the correct result. You may wish to try to get input from #whuber [on the GIS Stack Exchange, or contact him through his website (http://www.quantdec.com/quals/quals.htm)] or another expert you know.
That said, if you just want to achieve the visual effect you requested and are not using this for some sort of statistical analysis, I think there are some relatively simple solutions.
EDIT:
As you commented, though the suggestions below to use the theta and smoothness arguments do even out the prediction surface, they apply equally to all measurements and thus do not extend the "sphere of influence" of more densely populated counties relative to less densely populated ones. After further consideration, I think there are two ways to achieve this: by altering the covariance function to depend on population density, or by using weights, as you have. Your weighting approach, as I wrote below, alters the error term of the kriging function. That is, it inversely scales the nugget variance.
As you can see in the semivariogram image, the nugget is essentially the y-intercept, or the error between measurements at the same location. Weights affect the nugget variance (sigma2) as sigma2/weight. Thus, greater weights mean less error at small-scale distances. This does not, however, change the shape of the semivariance function or have much effect on the range or sill.
I think that the best solution would be to have your covariance function depend on population. However, I'm not sure how to accomplish that and I don't see any arguments to Krig to do so. I tried playing with defining my own covariance function as in the Krig example, but only got errors.
Sorry I couldn't help more!
Another great resource to help understand kriging is: http://www.epa.gov/airtrends/specialstudies/dsisurfaces.pdf
As I said in my comment, the sill and nugget values, as well as the range of the semivariogram, are things you can alter to affect the smoothing. By specifying weights in the call to Krig, you are altering the variance of the measurement errors. That is, in normal use, weights are expected to be proportional to the accuracy of the measurement value, so that higher weights essentially represent more accurate measurements. This isn't actually true of your data, but it may be giving you the effect you desire.
To alter the way your data are interpolated, you can adjust two (of the many) parameters in the simple Krig call you are using: theta and smoothness. theta adjusts the semivariance range, meaning that measured points farther away contribute more to the estimates as you increase theta. Your data range is
range <- data.frame(lon=range(ct.data$lon),lat=range(ct.data$lat))
range[2,]-range[1,]
lon lat
2 1.383717 0.6300484
so, your measurement points vary by ~1.4 degrees lon and ~0.6 degrees lat. Thus, you can play with specifying your theta value in that range to see how that affects your result. In general, a larger theta leads to more smoothing since you are drawing from more values for each prediction.
Krig.output.wt <- Krig( cbind(ct.data$lon,ct.data$lat) , ct.data$county.poverty.rate ,
weights=c( size , 1 , 1 , 1 , 1 , size , size , 1 ),Covariance="Matern", theta=.8)
r <- interpolate(ras, Krig.output.wt)
r <- mask(r, ct.map)
plot(r, col=colRamp(100) ,axes=FALSE,legend=FALSE)
title(main="Theta = 0.8", outer = FALSE)
points(cbind(ct.data$lon,ct.data$lat))
text(ct.data$lon, ct.data$lat-0.05, ct.data$NAME, cex=0.5)
Gives:
Krig.output.wt <- Krig( cbind(ct.data$lon,ct.data$lat) , ct.data$county.poverty.rate ,
weights=c( size , 1 , 1 , 1 , 1 , size , size , 1 ),Covariance="Matern", theta=1.6)
r <- interpolate(ras, Krig.output.wt)
r <- mask(r, ct.map)
plot(r, col=colRamp(100) ,axes=FALSE,legend=FALSE)
title(main="Theta = 1.6", outer = FALSE)
points(cbind(ct.data$lon,ct.data$lat))
text(ct.data$lon, ct.data$lat-0.05, ct.data$NAME, cex=0.5)
Gives:
Adding the smoothness argument will change the order of the function used to smooth your predictions. The default is 0.5, leading to a second-order polynomial.
Krig.output.wt <- Krig( cbind(ct.data$lon,ct.data$lat) , ct.data$county.poverty.rate ,
weights=c( size , 1 , 1 , 1 , 1 , size , size , 1 ),
Covariance="Matern", smoothness = 0.6)
r <- interpolate(ras, Krig.output.wt)
r <- mask(r, ct.map)
plot(r, col=colRamp(100) ,axes=FALSE,legend=FALSE)
title(main="Theta unspecified; Smoothness = 0.6", outer = FALSE)
points(cbind(ct.data$lon,ct.data$lat))
text(ct.data$lon, ct.data$lat-0.05, ct.data$NAME, cex=0.5)
Gives:
This should give you a start and some options, but you should look at the manual for fields. It is pretty well-written and explains the arguments well.
Also, if this is in any way quantitative, I would highly recommend talking to someone with significant spatial statistics know-how!
Kriging is not what you want. (It is a statistical method for accurate--not distorted!--interpolation of data. It requires preliminary analysis of the data--of which you do not have anywhere near enough for this purpose--and cannot accomplish the desired map distortion.)
The example and the references to "bleed over" suggest considering an anamorph or area cartogram. This is a map which will expand and shrink the areas of the county polygons so that they reflect their relative population while retaining their shapes. The link (to the SE GIS site) explains and illustrates this idea. Although its answers are less than satisfying, a search of that site will reveal some effective solutions.
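If you want to try that in R, a minimal sketch with the cartogram package (the layer and column names here, ct_counties and pop, are made up; any sf polygon layer with a population attribute should do):
library(sf)
library(cartogram)
# ct_counties: sf polygons of the eight counties with a numeric 'pop' column (hypothetical)
ct_cart <- cartogram_cont(ct_counties, weight = "pop", itermax = 10)
plot(st_geometry(ct_cart))   # county areas now scale with population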
Lots of interesting comments and leads above.
I took a look at the Harvard dialect survey to get a sense of what you are trying to do first. I must say, really cool maps. And before I start in on what I came up with... I've looked at your work on survey analysis before and have learned quite a few tricks. Thanks.
So my first take, pretty quickly, was that if you want to do spatial smoothing by way of kernel density estimation then you need to be thinking in terms of point process models. I'm sure there are other ways, but that's where I went.
So what I do below is grab a very generic US map and convert it into something I can use as a sampling window. Then I create a random sample of points within that region; just pretend those are your centroids. After that I attach random values to those points and plot it up.
I just wanted to test this conceptually, which is why I didn't go through the extra steps to grab CBSAs, and sorry for not projecting, but I think these are the fundamentals. Oh, and the smoothing in the dialect study is being done over the whole country, I think. That is, the author is not stratifying his smoothing procedure within polygons... so I just added states at the end.
code:
library(sp)
library(spatstat)
library(RColorBrewer)
library(maps)
library(maptools)
# grab us map from R maps package
usMap <- map("usa")
usIds <- usMap$names
# convert to spatial polygons so this can be used as a window below
usMapPoly <- map2SpatialPolygons(usMap,IDs=usIds)
# just select us with no islands
usMapPoly <- usMapPoly[names(usMapPoly)=="main",]
# create a random sample of points on which to smooth over within the map
pts <- spsample(usMapPoly, n=250, type='random')
# just for a quick check of the map and sampling locations
plot(usMapPoly)
points(pts)
# create values associated with points; be sure to play around with
# these after you get the map, it's fun
vals <-rnorm(250,100,25)
valWeights <- vals/sum(vals)
ptsCords <- data.frame(pts@coords)
# create window for the point pattern object (ppp) created below
usWindow <- as.owin(usMapPoly)
# create spatial point pattern object
usPPP <- ppp(ptsCords$x,ptsCords$y,marks=vals,window=usWindow)
# create colour ramp
col <- colorRampPalette(brewer.pal(9,"Reds"))(20)
# the plots; here is where the Gaussian kernel density estimation magic happens
# if you want a continuous legend on one of the sides get rid of ribbon=FALSE
# and be sure to play around with sigma
plot(Smooth(usPPP,sigma=3,weights=valWeights),col=col,main=NA,ribbon=FALSE)
map("state",add=TRUE,fill=FALSE)
example no weights:
example with my trivial weights
There is obviously a lot of work between this and your goal of making this type of map reproducible at various levels of spatial aggregation and sample data, but good luck, it seems like a cool project.
P.S. Initially I did not use any weighting, but I suppose you could provide weights directly to the Smooth function. Two example maps above.

dimensions of kde object from ks package, R

I am using the ks package in R to estimate 2D space utilization using distance and depth information. What I would like to do is use the 95% contour output to get the maximum vertical and horizontal distances. So essentially, I want to be able to get the dimensions or measurements of the resulting 95% contour.
Here is a piece of code as an example:
require(ks)
dist<-c(1650,1300,3713,3718)
depth<-c(22,19.5,20.5,8.60)
dd<-data.frame(cbind(dist,depth))
## auto bandwidth selection
H.pi2<-Hpi(dd,binned=TRUE)*1
ddhat<-kde(dd,H=H.pi2)
plot(ddhat,cont=c(95),lwd=1.5,display="filled.contour2",col=c(NA,"palegreen"),
xlab="",ylab="",las=1,ann=F,bty="l",xaxs="i",yaxs="i",
xlim=c(0,max(dd[,1]+dd[,1]*0.4)),ylim=c(60,-3))
Any information about how to do this will be very helpful. Thanks in advance,
To create a 95% contour polygon from your 'kde' object:
library(raster)
im.kde <- image2Grid (list(x = ddhat$eval.points[[1]], y = ddhat$eval.points[[2]], z = ddhat$estimate))
kr <- raster(im.kde)
It is likely that one will want to resample this raster to a higher resolution before constructing polygons, by including the following two lines before creating the polygon object:
new.rast <- raster(extent(im.kde),res = c(50,50))
kr <- resample(kr, new.rast)
bin.kr <- kr
bin.kr[bin.kr < contourLevels(ddhat, prob = 0.05)] <- NA
bin.kr[bin.kr > 0]<-1
k.poly<-rasterToPolygons(bin.kr,dissolve=T)
Note that the results are similar, but not identical, to Hawthorne Beyer's GME function 'kde'. He does use the kde function from ks, but must do something slightly different for the output polygon.
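To get the horizontal and vertical dimensions the question ultimately asks for, one option (a sketch, not tested) is simply the bounding box of that polygon:
bb <- bbox(k.poly)   # 2 x 2 matrix: rows are the coordinates, columns are min/max
diff(bb[1, ])        # maximum horizontal extent of the 95% contour
diff(bb[2, ])        # maximum vertical extent of the 95% contour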
At the moment I'm going for the "any information" prize rather than attempting a final answer. The ks:::plot.kde function dispatches to ks:::plotkde.2d in this case. It works its magic through side effects and I cannot get these functions to return values that can be inspected in code. You would need to hack the plotkde.2d function to return the values used to plot the contour lines. You can visualize what is in ddhat$estimate with:
persp(ddhat$estimate)
It appears that contourLevels examines the estimate matrix and finds the density value above which the specified % of the total density resides.
> contourLevels(ddhat, 0.95)
95%
1.891981e-05
And then it draws the contour based on which values exceed that level. (I just haven't found the code that does that yet.)
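One way to recover those contour coordinates without hacking plotkde.2d is base R's contourLines(), evaluated at the same level (a sketch; check ?contourLevels for the prob/cont conventions):
lev <- contourLevels(ddhat, prob = 0.05)   # level enclosing roughly 95% of the density
cl  <- contourLines(ddhat$eval.points[[1]], ddhat$eval.points[[2]],
                    ddhat$estimate, levels = lev)
# cl is a list of contour pieces; the overall extent of the 95% region:
diff(range(unlist(lapply(cl, `[[`, "x"))))   # horizontal dimension
diff(range(unlist(lapply(cl, `[[`, "y"))))   # vertical dimension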

What techniques exist in R to visualize a "distance matrix"?

I wish to present a distance matrix in an article I am writing, and I am looking for good visualization for it.
So far I have come across balloon plots (I used them here, but I don't think they will work in this case), heatmaps (here is a nice example, but they don't allow presenting the numbers in the table - correct me if I am wrong; maybe half the table in colors and half with numbers would be cool) and lastly correlation ellipse plots (here is some code and an example - it is cool to use a shape, but I am not sure how to use it here).
There are also various clustering methods, but they will aggregate the data (which is not what I want), while what I want is to present all of the data.
Example data:
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dist(nba[1:20, -1])
I am open for ideas.
You could also use force-directed graph drawing algorithms to visualize a distance matrix, e.g.
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dist_m <- as.matrix(dist(nba[1:20, -1]))
dist_mi <- 1/dist_m # one over, as qgraph takes similarity matrices as input
library(qgraph)
jpeg('example_forcedraw.jpg', width=1000, height=1000, unit='px')
qgraph(dist_mi, layout='spring', vsize=3)
dev.off()
Tal, this is a quick way to overlay text on a heatmap. Note that this relies on image rather than heatmap, as the latter offsets the plot, making it more difficult to put text in the correct position.
To be honest, I think this graph shows too much information, making it a bit difficult to read... you may want to write only specific values.
Also, the other, quicker option is to save your graph as a PDF, import it into Inkscape (or similar software) and manually add the text where needed.
Hope this helps.
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dst <- dist(nba[1:20, -1])
dst <- data.matrix(dst)
dim <- ncol(dst)
image(1:dim, 1:dim, dst, axes = FALSE, xlab="", ylab="")
axis(1, 1:dim, nba[1:20,1], cex.axis = 0.5, las=3)
axis(2, 1:dim, nba[1:20,1], cex.axis = 0.5, las=1)
text(expand.grid(1:dim, 1:dim), sprintf("%0.1f", dst), cex=0.6)
A Voronoi Diagram (a plot of a Voronoi Decomposition) is one way to visually represent a Distance Matrix (DM).
They are also simple to create and plot using R--you can do both in a single line of R code.
If you're not familiar with this aspect of computational geometry, the relationship between the two (VD & DM) is straightforward, though a brief summary might be helpful.
A distance matrix--i.e., a 2D matrix showing the distance between a point and every other point--is an intermediate output during kNN computation (i.e., k-nearest neighbor, a machine learning algorithm which predicts the value of a given data point based on the weighted average value of its 'k' closest neighbors, distance-wise, where 'k' is some integer, usually between 3 and 5).
kNN is conceptually very simple--each data point in your training set is in essence a 'position' in some n-dimensional space, so the next step is to calculate the distance between each point and every other point using some distance metric (e.g., Euclidean, Manhattan, etc.). While the training step--i.e., constructing the distance matrix--is straightforward, using it to predict the value of new data points is practically encumbered by the data retrieval--finding the closest 3 or 4 points from among several thousand or several million scattered in n-dimensional space.
Two data structures are commonly used to address that problem: kd-trees and Voronoi decompositions (aka "Dirichlet tessellations").
A Voronoi decomposition (VD) is uniquely determined by a distance matrix--i.e., there's a 1:1 mapping; so indeed it is a visual representation of the distance matrix, although again, that's not its purpose--its primary purpose is the efficient storage of the data used for kNN-based prediction.
Beyond that, whether it's a good idea to represent a distance matrix this way probably depends most of all on your audience. To most, the relationship between a VD and the antecedent distance matrix will not be intuitive. But that doesn't make it incorrect--if someone without any statistics training wanted to know if two populations had similar probability distributions and you showed them a Q-Q plot, they would probably think you haven't engaged their question. So for those who know what they are looking at, a VD is a compact, complete, and accurate representation of a DM.
So how do you make one?
A Voronoi decomp is constructed by selecting (usually at random) a subset of points from within the training set (this number varies by circumstances, but if we had 1,000,000 points, then 100 is a reasonable number for this subset). These 100 data points are the Voronoi centers ("VC").
The basic idea behind a Voronoi decomp is that rather than having to sift through the 1,000,000 data points to find the nearest neighbors, you only have to look at these 100; then once you find the closest VC, your search for the actual nearest neighbors is restricted to just the points within that Voronoi cell. Next, for each data point in the training set, calculate the VC it is closest to. Finally, for each VC and its associated points, calculate the convex hull--conceptually, just the outer boundary formed by that VC's assigned points that are farthest from the VC. This convex hull around the Voronoi center forms a "Voronoi cell." A complete VD is the result of applying those three steps to each VC in your training set. This will give you a perfect tessellation of the surface (see the diagram below).
To calculate a VD in R, use the tripack package. The key function is 'voronoi.mosaic', to which you just pass the x and y coordinates separately--the raw data, not the DM--then you can just pass the result to 'plot'.
library(tripack)
plot(voronoi.mosaic(runif(100), runif(100), duplicate="remove"))
You may want to consider looking at a 2-D projection of your matrix (multidimensional scaling). Here is a link to how to do it in R.
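For example, a minimal base-R sketch with cmdscale (classical MDS) on the example data:
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
xy  <- cmdscale(dist(nba[1:20, -1]))   # 2-D configuration approximating the distances
plot(xy, type = "n", xlab = "", ylab = "")
text(xy, labels = nba[1:20, 1], cex = 0.7)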
Otherwise, I think you are on the right track with heatmaps. You can add in your numbers without too much difficulty. For example, building off of Learn R:
library(ggplot2)
library(plyr)
library(arm)
library(reshape2)
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
nba$Name <- with(nba, reorder(Name, PTS))
nba.m <- melt(nba)
nba.m <- ddply(nba.m, .(variable), transform, rescale = rescale(value))
(p <- ggplot(nba.m, aes(variable, Name)) +
   geom_tile(aes(fill = rescale), colour = "white") +
   scale_fill_gradient(low = "white", high = "steelblue") +
   geom_text(aes(label = round(rescale, 1))))
A dendrogram based on a hierarchical cluster analysis can be useful:
http://www.statmethods.net/advstats/cluster.html
A 2-D or 3-D multidimensional scaling analysis in R:
http://www.statmethods.net/advstats/mds.html
If you want to go into 3+ dimensions, you might want to explore ggobi / rggobi:
http://www.ggobi.org/rggobi/
In the book "Numerical Ecology" by Borcard et al. 2011 they used a function called *coldiss.r *
you can find it here: http://ichthyology.usm.edu/courses/multivariate/coldiss.R
it color codes the distances and even orders the records by dissimilarity.
another good package would be the seriation package.
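For instance, a quick sketch with seriation's dissplot(), which reorders and images a dissimilarity matrix:
library(seriation)
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dissplot(dist(nba[1:20, -1]))   # reordered image of the distance matrix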
Reference:
Borcard, D., Gillet, F. & Legendre, P. (2011) Numerical Ecology with R. Springer.
A solution using Multidimensional Scaling
data = read.csv("http://datasets.flowingdata.com/ppg2008.csv", sep = ",")
# squared Euclidean distance matrix built from the cross-product trick
dst = tcrossprod(as.matrix(data[,-1]))
dst = matrix(rep(diag(dst), 50L), ncol = 50L, byrow = TRUE) +
      matrix(rep(diag(dst), 50L), ncol = 50L, byrow = FALSE) - 2*dst
library(MASS)
mds = isoMDS(dst)
#remove {type = "n"} to see dots
plot(mds$points, type = "n", pch = 20, cex = 3, col = adjustcolor("black", alpha = 0.3), xlab = "X", ylab = "Y")
text(mds$points, labels = rownames(data), cex = 0.75)
