I have a dataset of spatial locations. I want to do a point pattern analysis on it using the spatstat package in R, and I would like the analysis window to be a polygon that fits the data rather than a rectangle. The code I have is
original_data = read.csv("/home/hudamoh/PhD_Project_Moh_Huda/Dataset_files/my_coordinates.csv")
plot(original_data$row, original_data$col)
which results in a plot that looks like this
Setting up the data as a point pattern:
point_pattern_data = ppp(original_data$row, original_data$col, c(0, 77), c(0, 116))
plot(point_pattern_data)
summary(point_pattern_data)
resulting in a plot that looks like this
The observed data has considerably wide white spaces, which I want to remove for a better analysis area. Therefore, I want to make the point pattern window a polygon instead of a rectangle. The vertices of the polygon are the (x, y) pairs below, chosen to avoid white space as much as possible.
x = c(3,1,1,0.5,0.5,1,2,2.5,5.5, 16,21,28,26,72,74,76,75,74,63,58,52,47,40)
y = c(116,106,82.5,64,40,35,25,17.5,5,5,5,10,8,116,100,50,30,24,17,10,15,15,8)
I found these vertices manually by inspecting the plot below (with grid lines added):
plot(original_data$row,original_data$col)
grid(nx = 40, ny = 25,
lty = 2, # Grid line type
col = "gray", # Grid line color
lwd = 2) # Grid line width
So I want to make the polygonal window for the point pattern. The code is
my_data_poly = owin(poly = list(x = c(3,1,1,0.5,0.5,1,2,2.5,5.5, 16,21,28,26,72,74,76,75,74,63,58,52,47,40), y = c(116,106,82.5,64,40,35,25,17.5,5,5,5,10,8,116,100,50,30,24,17,10,15,15,8)))
plot(my_data_poly)
but it results in an error. The error is
I tried to fix it by swapping the x and y vectors:
my_data_poly = owin(poly = list(x = c(116,106,82.5,64,40,35,25,17.5,5,5,5,10,8,116,100,50,30,24,17,10,15,15,8), y = c(3,1,1,0.5,0.5,1,2,2.5,5.5, 16,21,28,26,72,74,76,75,74,63,58,52,47,40)))
plot(my_data_poly)
It results in a plot
However, this is not what I want. How do I get the observed area as a polygon for point pattern analysis?
This should be a reasonable solution to the problem.
require(sp)
poly = Polygon(
  cbind(original_data$col,
        original_data$row)
)
This will create a polygon from your points. You can use this document to understand the sp package better
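If it helps, here is a minimal sketch of how such a Polygon is typically wrapped for plotting with sp (an illustration only, assuming original_data as in the question and that the coordinates trace the boundary in order):
library(sp)
poly <- Polygon(cbind(original_data$col, original_data$row))
polys <- Polygons(list(poly), ID = "window")     # wrap the single Polygon
sp_poly <- SpatialPolygons(list(polys))          # promote to a plottable object
plot(sp_poly)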
We don’t have access to the point data you read in from file, but if you just want to fix the polygonal window that is not a problem.
You need to traverse the vertices of your polygon sequentially and anti-clockwise.
The code connects the first point you give to the next etc. Your vertices are:
library(spatstat)
x = c(3,1,1,0.5,0.5,1,2,2.5,5.5, 16,21,28,26,72,74,76,75,74,63,58,52,47,40)
y = c(116,106,82.5,64,40,35,25,17.5,5,5,5,10,8,116,100,50,30,24,17,10,15,15,8)
vert <- ppp(x, y, window = owin(c(0,80),c(0,120)))
plot.ppp(vert, main = "", show.window = FALSE, chars = NA)
text(vert)
Point number 13 is towards the bottom left and 14 in the top right, which gives the funny crossing in the polygon.
Moving the order around seems to help:
xnew <- c(x[1:11], x[13:12], x[23:14])
ynew <- c(y[1:11], y[13:12], y[23:14])
p <- owin(poly = cbind(xnew, ynew))
plot(p, main = "")
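Once the polygonal window is in place, the point pattern itself can be rebuilt with that window instead of the rectangle (a sketch, assuming original_data as read in the question; points falling outside the polygon are rejected with a warning):
point_pattern_data <- ppp(original_data$row, original_data$col, window = p)
plot(point_pattern_data)
summary(point_pattern_data)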
It is unclear from your provided plot of the data whether you really should apply point pattern analysis.
The main assumption underlying point process modelling as implemented in spatstat
is that the locations of events (points) are random and the process that
generated the random locations is of interest.
Your points seem to be on a grid and maybe you need another tool for your analysis.
Of course spatstat has a lot of functionality for simply handling and summarising data like this so you may still find useful tools in there.
I am trying to convey the concentration of lines in 2D space by showing the number of crossings through each pixel in a grid. I am picturing something similar to a density plot, but with more intuitive units. I was drawn to the spatstat package and its line segment class (psp) as it allows you to define line segments by their end points and incorporate the entire line in calculations. However, I'm struggling to find the right combination of functions to tally these counts and would appreciate any suggestions.
As shown in the example below with 50 lines, the density function produces values in (0, 140), the pixellate function tallies the total length through each pixel and takes values in (0, 0.04), and as.mask produces a binary indicator of whether a line went through each pixel. I'm hoping to see something where the scale takes integer values, say 0..10.
require(spatstat)
set.seed(1234)
numLines = 50
# define line segments
L = psp(runif(numLines),runif(numLines),runif(numLines),runif(numLines), window=owin())
# image with 2-dimensional kernel density estimate
D = density.psp(L, sigma=0.03)
# image with total length of lines through each pixel
P = pixellate.psp(L)
# binary mask giving whether a line went through a pixel
B = as.mask.psp(L)
par(mfrow=c(2,2), mar=c(2,2,2,2))
plot(L, main="L")
plot(D, main="density.psp(L)")
plot(P, main="pixellate.psp(L)")
plot(B, main="as.mask.psp(L)")
The pixellate.psp function allows you to optionally specify weights to use in the calculation. I considered trying to manipulate this to normalize the pixels to take a count of one for each crossing, but the weight is applied uniquely to each line (and not specific to the line/pixel pair). I also considered calculating a binary mask for each line and adding the results, but it seems like there should be an easier way. I know that you can sample points along a line, and then do a count of the points by pixel. However, I am concerned about getting the sampling right so that there is one and only one point per line crossing of a pixel.
Is there a straightforward way to do this in R? Otherwise, would this be an appropriate suggestion for a future package enhancement? Is this more easily accomplished in another language such as Python or MATLAB?
The example above and my testing has been with spatstat 1.40-0, R 3.1.2, on x86_64-w64-mingw32.
You are absolutely right that this is something to put in as a future enhancement. It will be done in one of the next versions of spatstat. It will probably be an option in pixellate.psp to count the number of crossing lines rather than measure the total length.
For now you have to do something a bit convoluted, e.g.:
require(spatstat)
set.seed(1234)
numLines = 50
# define line segments
L <- psp(runif(numLines),runif(numLines),runif(numLines),runif(numLines), window=owin())
# split into individual lines and use as.mask.psp on each
masklist <- lapply(1:nsegments(L), function(i) as.mask.psp(L[i]))
# convert to 0-1 image for easy addition
imlist <- lapply(masklist, as.im.owin, na.replace = 0)
rslt <- Reduce("+", imlist)
# plot
plot(rslt, main = "")
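A quick sanity check that the result really is a count per pixel (a sketch; the pixel values of the summed image should be small non-negative integers):
table(as.matrix(rslt))   # tabulate pixel values: number of segments crossing each pixel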
I apologize in advance if my code looks very amateurish.
I'm trying to assign quadrants to 4 measurement stations approximately located on the edges of a town.
I have the coordinates of these 4 stations:
a <- c(13.2975,52.6556)
b <- c(14.0083,52.5583)
c <- c(13.3722,52.3997)
d <- c(12.7417,52.6917)
Now my idea was to create lines connecting the north-south and east-west stations:
line.1 <- matrix(c(d[1],b[1],d[2],b[2]),ncol=2)
line.2 <- matrix(c(a[1],c[1],a[2],c[2]),ncol=2)
Plotting all the stations and the connecting lines looks all right, however it is not very helpful for analyzing things on a computer.
So I calculated the Euclidean vectors for the two lines:
vec.1 <- as.vector(c((b[1]-d[1]),(b[2]-d[2])))
vec.2 <- as.vector(c((c[1]-a[1]),(c[2]-a[2])))
which allowed me to calculate the angle between the two lines in degrees:
alpha <- acos((vec.1 %*% vec.2) / (sqrt(vec.1[1]^2 + vec.1[2]^2) *
              sqrt(vec.2[1]^2 + vec.2[2]^2))) * 180/pi
The angle I get for alpha is 67.7146°. This looks fairly good. From this angle I can easily calculate the other 3 angles of the intersection, however I need values relative to the grid so I can assign values from 0°-360° for the wind directions.
Now my next planned step was to find the point where the two lines intersect, add a horizontal and vertical abline through that point and then calculate the angle relative to the grid. However I can't find a proper example that does that and I don't think I have a nice linear equation system I could solve.
Is my code way off? Or maybe anyone knows of a package which could help me? It feels like my whole approach is a bit wrong.
Okay, I managed to calculate the intersection point using line equations. Here is how.
The basic equation for two points is like this:
y - y_1 = ((y_2 - y_1)/(x_2 - x_1)) * (x - x_1)
If you write one for each of the two lines, you can substitute in the slope fractions:
k.1 <- ((c[2]-a[2])/(c[1]-a[1]))
k.2 <- ((b[2]-d[2])/(b[1]-d[1]))
Rearranging the two equations you get a final form for y:
y <- (((-k.1/k.2)*d[2]+k.1*d[1]-k.1*c[1]+d[2])/(1-k.1/k.2))
This one you can now use to calculate the x-value:
x <- ((y-d[2])+d[1]*k.2)/k.2
In my case I get
y = 52.62319
x = 13.3922
I'm starting to really enjoy this program!
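For reference, the intersection can also be obtained by writing each line as -k*x + y = intercept and solving the resulting 2x2 linear system with solve() (a sketch, assuming a, b, c, d as defined above):
k.1 <- (c[2] - a[2]) / (c[1] - a[1])     # slope of the a-c line
k.2 <- (b[2] - d[2]) / (b[1] - d[1])     # slope of the d-b line
A <- rbind(c(-k.1, 1),
           c(-k.2, 1))
rhs <- c(a[2] - k.1 * a[1],
         d[2] - k.2 * d[1])
solve(A, rhs)    # returns c(x, y) of the intersection point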
Wikipedia has a good article on finding the intersection between two line segments with an explicit formula. However, you don't need to know the point of intersection to calculate the angle to the grid (or the axes of the coordinate system). Just compute the angles from your vec.1 and vec.2 to the basis vectors:
e1 <- c(1, 0)
e2 <- c(0, 1)
as you have done.
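A minimal sketch of that idea (assuming vec.1 and vec.2 from the question): atan2 gives the signed angle to the x-axis e1, which can be converted to degrees and wrapped into 0-360.
angle_to_grid <- function(v) (atan2(v[2], v[1]) * 180 / pi) %% 360   # angle to e1, in degrees
angle_to_grid(vec.1)   # orientation of the d-b line relative to the grid
angle_to_grid(vec.2)   # orientation of the a-c line relative to the grid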
I'm trying to do the same thing asked in this question, Cartogram + choropleth map in R, but starting from a SpatialPolygonsDataFrame and hoping to end up with the same type of object.
I could save the object as a shapefile, use scapetoad, reopen it and convert back, but I'd rather have it all within R so that the procedure is fully reproducible, and so that I can code dozens of variations automatically.
I've forked the Rcartogram code on github and added my efforts so far here.
Essentially what this demo does is create a SpatialGrid over the map, look up the population density at each point of the grid and convert this to a density matrix in the format required for cartogram() to work on. So far so good.
But, how to interpolate the original map points based on the output of cartogram()?
There are two problems here. The first is to get the map and grid into the same units to allow interpolation. The second is to access every point of every polygon, interpolate it, and keep them all in right order.
The grid is in grid units and the map is in projected units (long-lat in the case of the example). Either the grid must be projected into long-lat, or the map into grid units. My thought is to make a fake CRS and use it along with the spTransform() function from the rgdal package, since this handles every point in the object with minimal fuss.
Accessing every point is difficult because they are several layers down into the SpPDF object: object>polygons>Polygons>lines>coords I think. Any ideas how to access these while keeping the structure of the overall map intact?
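For what it's worth, direct slot access would look roughly like this (an untested sketch; spdf stands for the SpatialPolygonsDataFrame and transform_coords() is a hypothetical placeholder for the interpolation against the cartogram() output):
for (i in seq_along(spdf@polygons)) {
  for (j in seq_along(spdf@polygons[[i]]@Polygons)) {
    crds <- spdf@polygons[[i]]@Polygons[[j]]@coords                    # matrix of ring vertices
    spdf@polygons[[i]]@Polygons[[j]]@coords <- transform_coords(crds)  # hypothetical helper
  }
}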
This problem can be solved with the getcartr package, available on Chris Brunsdon's GitHub, as beautifully explicated in this blog post.
The quick.carto function does exactly what you want -- takes a SpatialPolygonsDataFrame as input and has a SpatialPolygonsDataFrame as output.
Reproducing the essence of the example in the blog post here in case the link goes dead, with my own style mixed in & typos fixed:
(Shapefile; World Bank population data)
library(getcartr)
library(maptools)
library(data.table)
world <- readShapePoly("TM_WORLD_BORDERS-0.3.shp")
#I use data.table, see blog post if you want a base approach;
# data.table wonks may be struck by the following step as seeming odd;
# see here: http://stackoverflow.com/questions/32380338
# and here: https://github.com/Rdatatable/data.table/issues/1310
# for some background on what's going on.
world@data <- setDT(world@data)
world.pop <- fread("sp.pop.totl_Indicator_en_csv_v2.csv",
select = c("Country Code", "2013"),
col.names = c("ISO3", "pop"))
world@data[world.pop, Population := as.numeric(i.pop), on = "ISO3"]
#calling quick.carto has internal calls to the
# necessary functions from Rcartogram
world.carto <- quick.carto(world, world$Population, blur = 0)
#plotting with a color scale
x <- world@data[!is.na(Population), log10(Population)]
ramp <- colorRampPalette(c("navy", "deepskyblue"))(21L)
xseq <- seq(from = min(x), to = max(x), length.out = 21L)
#annoying to deal with NAs...
cols <- ramp[sapply(x, function(y)
if (length(z <- which.min(abs(xseq - y)))) z else NA)]
plot(world.carto, col = cols,
main = paste0("Cartogram of the World's",
" Population by Country (2013)"))
I wish to present a distance matrix in an article I am writing, and I am looking for good visualization for it.
So far I have come across balloon plots (I used one here, but I don't think it will work in this case), heatmaps (here is a nice example, but they don't allow presenting the numbers in the table, correct me if I am wrong; maybe half the table in colors and half with numbers would be cool), and lastly correlation ellipse plots (here is some code and an example, which is cool in using a shape, but I am not sure how to apply it here).
There are also various clustering methods, but they aggregate the data (which is not what I want); I want to present all of the data.
Example data:
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dist(nba[1:20, -1])
I am open to ideas.
You could also use force-directed graph drawing algorithms to visualize a distance matrix, e.g.
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dist_m <- as.matrix(dist(nba[1:20, -1]))
dist_mi <- 1/dist_m # one over, as qgraph takes similarity matrices as input
library(qgraph)
jpeg('example_forcedraw.jpg', width=1000, height=1000, unit='px')
qgraph(dist_mi, layout='spring', vsize=3)
dev.off()
Tal, this is a quick way to overlay text on a heatmap. Note that this relies on image rather than heatmap, as the latter offsets the plot, making it more difficult to put text in the correct position.
To be honest, I think this graph shows too much information, making it a bit difficult to read... you may want to write only specific values.
Also, another quick option is to save your graph as a PDF, import it into Inkscape (or similar software), and manually add the text where needed.
Hope this helps
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dst <- dist(nba[1:20, -1])
dst <- data.matrix(dst)
dim <- ncol(dst)
image(1:dim, 1:dim, dst, axes = FALSE, xlab="", ylab="")
axis(1, 1:dim, nba[1:20,1], cex.axis = 0.5, las=3)
axis(2, 1:dim, nba[1:20,1], cex.axis = 0.5, las=1)
text(expand.grid(1:dim, 1:dim), sprintf("%0.1f", dst), cex=0.6)
A Voronoi Diagram (a plot of a Voronoi Decomposition) is one way to visually represent a Distance Matrix (DM).
They are also simple to create and plot using R--you can do both in a single line of R code.
If you're not familiar with this aspect of computational geometry, the relationship between the two (VD & DM) is straightforward, though a brief summary might be helpful.
Distance matrices--i.e., 2D matrices showing the distance between each point and every other point--are an intermediate output during kNN computation (i.e., k-nearest neighbors, a machine learning algorithm which predicts the value of a given data point based on the weighted average value of its 'k' closest neighbors, distance-wise, where 'k' is some integer, usually between 3 and 5).
kNN is conceptually very simple--each data point in your training set is in essence a 'position' in some n-dimensional space, so the next step is to calculate the distance between each point and every other point using some distance metric (e.g., Euclidean, Manhattan, etc.). While the training step--i.e., constructing the distance matrix--is straightforward, using it to predict the value of new data points is practically encumbered by the data retrieval--finding the closest 3 or 4 points from among several thousand or several million scattered in n-dimensional space.
Two data structures are commonly used to address that problem: kd-trees and Voronoi decompositions (aka "Dirichlet tessellations").
A Voronoi decomposition (VD) is uniquely determined by a distance matrix--i.e., there's a 1:1 map; so indeed it is a visual representation of the distance matrix, although again, that's not their purpose--their primary purpose is the efficient storage of the data used for kNN-based prediction.
Beyond that, whether it's a good idea to represent a distance matrix this way probably depends most of all on your audience. To most, the relationship between a VD and the antecedent distance matrix will not be intuitive. But that doesn't make it incorrect--if someone without any statistics training wanted to know if two populations had similar probability distributions and you showed them a Q-Q plot, they would probably think you haven't engaged their question. So for those who know what they are looking at, a VD is a compact, complete, and accurate representation of a DM.
So how do you make one?
A Voronoi decomp is constructed by selecting (usually at random) a subset of points from within the training set (this number varies by circumstances, but if we had 1,000,000 points, then 100 is a reasonable number for this subset). These 100 data points are the Voronoi centers ("VC").
The basic idea behind a Voronoi decomp is that rather than having to sift through the 1,000,000 data points to find the nearest neighbors, you only have to look at these 100; once you find the closest VC, your search for the actual nearest neighbors is restricted to just the points within that Voronoi cell. Next, for each data point in the training set, calculate the VC it is closest to. Finally, for each VC and its associated points, calculate the convex hull--conceptually, just the outer boundary formed by that VC's assigned points that are farthest from the VC. This convex hull around the Voronoi center forms a "Voronoi cell." A complete VD is the result of applying those three steps to each VC in your training set. This will give you a perfect tessellation of the surface (see the diagram below).
To calculate a VD in R, use the tripack package. The key function is 'voronoi.mosaic', to which you just pass the x and y coordinates separately--the raw data, not the DM--then you can pass the result to 'plot'.
library(tripack)
plot(voronoi.mosaic(runif(100), runif(100), duplicate="remove"))
You may want to consider looking at a 2-D projection of your matrix (multidimensional scaling). Here is a link to how to do it in R.
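A minimal sketch of that idea using classical MDS from base R (cmdscale), assuming the same example data as in the question:
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
d <- dist(nba[1:20, -1])
fit <- cmdscale(d, k = 2)                       # project the distance matrix to 2 dimensions
plot(fit, type = "n", xlab = "Dim 1", ylab = "Dim 2")
text(fit, labels = nba[1:20, 1], cex = 0.7)     # label points with player names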
Otherwise, I think you are on the right track with heatmaps. You can add in your numbers without too much difficulty. For example, building off of Learn R:
library(ggplot2)
library(plyr)
library(arm)
library(reshape2)
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
nba$Name <- with(nba, reorder(Name, PTS))
nba.m <- melt(nba)
nba.m <- ddply(nba.m, .(variable), transform,
rescale = rescale(value))
(p <- ggplot(nba.m, aes(variable, Name)) + geom_tile(aes(fill = rescale),
colour = "white") + scale_fill_gradient(low = "white",
high = "steelblue")+geom_text(aes(label=round(rescale,1))))
A dendrogram based on a hierarchical cluster analysis can be useful (a quick sketch follows the links below):
http://www.statmethods.net/advstats/cluster.html
A 2-D or 3-D multidimensional scaling analysis in R:
http://www.statmethods.net/advstats/mds.html
If you want to go into 3+ dimensions, you might want to explore ggobi / rggobi:
http://www.ggobi.org/rggobi/
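On the dendrogram suggestion, a quick sketch with base R's hclust on the example data (assuming the nba object from the question):
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
d <- dist(nba[1:20, -1])
hc <- hclust(d)                                # hierarchical clustering on the distance matrix
plot(hc, labels = nba[1:20, 1], main = "")     # dendrogram labelled with player names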
In the book "Numerical Ecology" by Borcard et al. (2011) they use a function called coldiss.r.
You can find it here: http://ichthyology.usm.edu/courses/multivariate/coldiss.R
It color-codes the distances and even orders the records by dissimilarity.
Another good option is the seriation package.
Reference:
Borcard, D., Gillet, F. & Legendre, P. (2011) Numerical Ecology with R. Springer.
A solution using Multidimensional Scaling
data = read.csv("http://datasets.flowingdata.com/ppg2008.csv", sep = ",")
# squared Euclidean distances from the Gram matrix:
# d_ij^2 = ||x_i||^2 + ||x_j||^2 - 2 * x_i . x_j
dst = tcrossprod(as.matrix(data[,-1]))
dst = matrix(rep(diag(dst), 50L), ncol = 50L, byrow = TRUE) +
      matrix(rep(diag(dst), 50L), ncol = 50L, byrow = FALSE) - 2*dst
library(MASS)
mds = isoMDS(dst)
#remove {type = "n"} to see dots
plot(mds$points, type = "n", pch = 20, cex = 3, col = adjustcolor("black", alpha = 0.3), xlab = "X", ylab = "Y")
text(mds$points, labels = rownames(data), cex = 0.75)