1-dimensional Point Process in R (spatstat)

I have asked another question, which was closed as Too Broad. Now, I will try to specify.
Again, I would like to simulate a 1-dimensional point process in R. So far, I've only been working on 2-dimensional simulations and would need a bit of help.
My goal is a simulation like the one in the picture, but I only need the real line with the random points on it.
I use spatstat and have already found out that I can generate random points on a 1-dimensional line with:
rpoisppOnLines(lambda, L, lmax = NULL, ..., nsim=1, drop=TRUE)
Now, I would like to produce the real line, preferably with matching labeling.
Does anyone have an idea?

Here is some crude code for getting samples from a point process.
library(spatstat)
lambda <- 5
# a horizontal line segment from (0,0) to (3,0)
L <- psp(0, 0, 3, 0, owin(c(0, 3), c(-1, 1)))
pp <- rpoisppOnLines(lambda, L, lmax = NULL, nsim = 1, drop = TRUE)
# plot the points and draw the line y = 0 through them
plot(pp$x, pp$y, pch = 4, lwd = 2, cex = 2)
abline(0, 0)
You could make your plot fancy with ggplot2, for example like the sketch below.
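A minimal ggplot2 sketch, assuming the pp object from the code above; the y axis is suppressed since only the positions on the line matter:
library(ggplot2)
ggplot(data.frame(x = pp$x), aes(x = x, y = 0)) +
  geom_hline(yintercept = 0) +        # the real line
  geom_point(shape = 4, size = 3) +   # the random points
  scale_y_continuous(NULL, breaks = NULL)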

You could use a simple linear network to represent the one dimensional line
segment you want to simulate on. This also makes it possible to fit models
(lppm), estimate the intensity non-parametrically (density.lpp), estimate
the K-function (linearK), and a bunch of other things:
library(spatstat)
x_start <- 0
x_end <- 3
endpoints <- ppp(x=c(x_start, x_end), y=c(0,0), window = owin(c(x_start, x_end), c(-.1,.1)))
L <- linnet(endpoints, edges = matrix(c(1,2),ncol = 2))
X <- rpoislpp(lambda = 5, L = L)
However, this tool is designed for points on a complicated network and not
just the real line, so the plotting method is not really adapted to this
setting, and might not produce exactly what you want (too much white space):
plot(X, pch = 4, lwd = 2, main = "")
axis(1)
You can extract the coordinates of the point pattern using coords and then
use the plotting method from the other answer:
co <- coords(X)
co$x
#> [1] 1.3306861 2.5550691 1.7776248 2.9486675 1.8571362 2.5020587 1.4843001
#> [8] 0.4371669 0.8478670
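A sketch of that plot (using x_start and x_end from above to set the plot limits):
plot(co$x, rep(0, length(co$x)), pch = 4, lwd = 2, cex = 2,
     xlim = c(x_start, x_end), yaxt = "n", xlab = "x", ylab = "")
abline(h = 0)  # draw the real line through the points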

Related

How to get the best polygon point pattern data in spatstat analysis in R

I have a dataset of spatial locations. I want to do a point pattern analysis using the spatstat package in R with this data. I want the best polygonal area for the analysis instead of a rectangular area. The code I have is
original_data = read.csv("/home/hudamoh/PhD_Project_Moh_Huda/Dataset_files/my_coordinates.csv")
plot(original_data$row, original_data$col)
which results in a plot that looks like this
Setting up the point pattern data:
point_pattern_data = ppp(original_data$row, original_data$col, c(0, 77), c(0, 116))
plot(point_pattern_data)
summary(point_pattern_data)
resulting in a plot that looks like this
The observed data contains considerably wide white spaces, which I want to remove for a better analysis area. Therefore, I want the window of the point pattern to be a polygon instead of a rectangle. The vertices of the polygon are the (x, y) pairs below, chosen to avoid white space as much as possible.
x = c(3,1,1,0.5,0.5,1,2,2.5,5.5, 16,21,28,26,72,74,76,75,74,63,58,52,47,40)
y = c(116,106,82.5,64,40,35,25,17.5,5,5,5,10,8,116,100,50,30,24,17,10,15,15,8)
I found these vertices manually by inspecting the plot below (with grid lines):
plot(original_data$row, original_data$col)
grid(nx = 40, ny = 25,
     lty = 2,      # grid line type
     col = "gray", # grid line color
     lwd = 2)      # grid line width
So I want to make the window of the point pattern a polygon. The code is
my_data_poly = owin(poly = list(x = c(3,1,1,0.5,0.5,1,2,2.5,5.5, 16,21,28,26,72,74,76,75,74,63,58,52,47,40), y = c(116,106,82.5,64,40,35,25,17.5,5,5,5,10,8,116,100,50,30,24,17,10,15,15,8)))
plot(my_data_poly)
but it results in an error (about the polygon's vertices being traversed in the wrong direction). I tried to fix it by swapping the x and y coordinates:
my_data_poly = owin(poly = list(x = c(116,106,82.5,64,40,35,25,17.5,5,5,5,10,8,116,100,50,30,24,17,10,15,15,8), y = c(3,1,1,0.5,0.5,1,2,2.5,5.5, 16,21,28,26,72,74,76,75,74,63,58,52,47,40)))
plot(my_data_poly)
This produces a plot, but it is not what I want. How do I get the observed area as a polygon in point pattern data analysis?
This should be a reasonable solution to the problem.
library(sp)
poly <- Polygon(cbind(original_data$col,
                      original_data$row))
This will create a polygon from your points. You can consult the sp package documentation to understand the package better.
We don’t have access to the point data you read in from file, but if you just want to fix the polygonal window that is not a problem.
You need to traverse the vertices of your polygon sequentially and anti-clockwise.
The code connects the first point you give to the next, and so on. Your vertices are:
library(spatstat)
x = c(3,1,1,0.5,0.5,1,2,2.5,5.5, 16,21,28,26,72,74,76,75,74,63,58,52,47,40)
y = c(116,106,82.5,64,40,35,25,17.5,5,5,5,10,8,116,100,50,30,24,17,10,15,15,8)
vert <- ppp(x, y, window = owin(c(0,80),c(0,120)))
plot.ppp(vert, main = "", show.window = FALSE, chars = NA)
text(vert)
Point number 13 is towards the bottom left and point 14 is in the top right, which produces the odd crossing in the polygon.
Moving the order around seems to help:
xnew <- c(x[1:11], x[13:12], x[23:14])
ynew <- c(y[1:11], y[13:12], y[23:14])
p <- owin(poly = cbind(xnew, ynew))
plot(p, main = "")
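You can then use this corrected window when constructing the point pattern. A sketch, assuming original_data with columns row and col as in the question:
# points falling outside the polygonal window are rejected with a warning
point_pattern_data <- ppp(original_data$row, original_data$col, window = p)
plot(point_pattern_data)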
From your plot of the data, it is not clear that point pattern analysis is really appropriate.
The main assumption underlying point process modelling as implemented in spatstat
is that the locations of events (points) are random and the process that
generated the random locations is of interest.
Your points seem to be on a grid and maybe you need another tool for your analysis.
Of course, spatstat has a lot of functionality for simply handling and summarising data like this, so you may still find useful tools in there.

How to circle observed (not latent) variables in dagitty plot

How would I put a circle around certain variables in the following plot?
library(dagitty)
g = dagitty('dag{
A [pos="-1,0.5"]
W [pos="0.893,-0.422"]
X [adjusted,pos="0,-0.5"]
Y [pos="1,0.5"]
A -> Y
X -> A
X -> W
X -> Y
}')
png("mp.png", width = 500, height = 500,res=300)
plot(g)
dev.off()
In the web-based tool you can mark a variable as e.g. latent or adjusted, and that changes the color of its node. This is not quite what I am looking for, although getting those colors in the plot from R would be sufficient. However, I don't really like the way the variable label sits next to the circle in the web-based version: I really want to circle observed variables and leave unobserved ones uncircled.
I wrote a function which takes the points you want to circle as input, extracts their positions from the DAG, and draws circles around them.
library(dagitty)
g = dagitty('dag{
A [pos="-1,0.5"]
W [pos="0.893,-0.422"]
X [adjusted,pos="0,-0.5"]
Y [pos="1,0.5"]
A -> Y
X -> A
X -> W
X -> Y
}')
circle_points <- function(points_to_circle, g) {
  # a few regexes to extract the variables and their positions from "g";
  # this can surely be made nicer and more robust, but it works for now
  fsplit <- strsplit(g[1], "\\]")[[1]]
  fsplit <- fsplit[-length(fsplit)]
  fsplit <- substr(fsplit, 1, nchar(fsplit) - 1)
  fsplit[1] <- substr(fsplit[1], 6, nchar(fsplit[1]))
  vars <- sapply(regmatches(fsplit,
                            regexec("\\\n(.*?)\\s*\\[", fsplit)), "[", 2)
  pos <- sub(".*pos=\\\"", "", fsplit)
  # build a data frame with the extracted information
  res_df <- data.frame(vars = vars,
                       posx = sapply(strsplit(pos, ","), "[", 1),
                       posy = sapply(strsplit(pos, ","), "[", 2))
  df_to_circle <- res_df[res_df$vars %in% points_to_circle, ]
  # the y-position seems to be inverted and has to be multiplied by -1
  points(as.numeric(df_to_circle$posx),
         as.numeric(df_to_circle$posy) * -1,
         cex = 4)
}
plot(g)
circle_points(c("A", "Y"), g)
This results in:
You can of course play with the cex parameter, add colors, etc. The positioning of the circles seems to be slightly off-center, so you may want to nudge the x and y positions in circle_points by a small margin.
I did not find this functionality in dagitty, but the bnlearn package can add circles or other node shapes easily. However, I just noticed you only want to add circles to observed traits rather than latent variables (better to mention this in your title), so my code might not be exactly what you are looking for. I still attach it here for your reference. Alternatively, you can distinguish observed and latent traits by color, which can easily be done with bnlearn (https://www.bnlearn.com/examples/graphviz-plot/).
library(bnlearn)
tree <- model2network("[X][W|X][A|X][Y|A:X]")
graphviz.plot(tree, main = "DAG structure", shape = "circle",
              layout = "circo")
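For the color-based alternative, graphviz.plot accepts a highlight argument. A sketch, assuming W, A and Y are the observed variables you want to emphasize:
# fill the observed nodes; the latent node X keeps the default style
graphviz.plot(tree, main = "DAG structure", shape = "circle",
              highlight = list(nodes = c("W", "A", "Y"),
                               col = "black", fill = "lightblue"))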

Computing the Tukey median

I am trying to compute the data depth of two variables with the following function:
library(depth)
x <- data.frame(data$`math score`, data$`reading score`)
depth(1000, x, method = "Tukey", approx = FALSE, eps = 1e-8, ndir = 1000)
The first argument of depth is u, which stands for the numerical vector whose depth is to be calculated; its dimension has to be the same as that of the observations.
I have 1000 observations; however, I get the following error message:
Error in depth(1000, x, method = "Tukey", approx = FALSE, eps = 1e-08, :
Dimension mismatch between the data and the point u.
Does someone know how to solve this issue?
Thank you in advance!
If you look at the documentation for the function depth, it says:
u    Numerical vector whose depth is to be calculated. Dimension has to be the same as that of the observations.
So u has to be a point in multidimensional space represented by a vector with n components, whereas x has to be a matrix or data frame of m by n components, (m rows for m points). You are comparing u to all the other multidimensional points in the set x to find the minimum number of points that could share a half-space with u.
Let's create a very simple example in two-dimensional space:
library(depth)
set.seed(100)
x <- data.frame(x = c(rnorm(10, -5, 2), rnorm(10, 5, 2)), y = rnorm(20, 0, 2))
plot(x)
The depth function calculates the depth of a particular point relative to the data. So let's use the origin:
u <- data.frame(x = 0, y = 0)
points(u, col = "red", pch = 16)
Naively we might think that the origin here has a depth of 10/20 points (i.e. the most obvious way to partition this dataset is a vertical line through the origin with 10 points on each side), but instead we find:
depth(u, x)
#> [1] 0.35
This indicates that there is a half-space including the origin that contains only 0.35 of the points, i.e. 7 points out of 20:
depth(u, x) * nrow(x)
#> [1] 7
And we can see that visually like this:
abline(0, -0.07)
points(x[x$y < (-0.07 * x$x),], col = "blue", pch = 16)
Here we have coloured those 7 points blue.
So it's not clear what result you expect from the depth function, but you will need to give it a value of c(math_score, reading_score), where math_score and reading_score are the test values for which you want to know the depth.
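For example, a sketch using made-up score values, assuming x is the two-column data frame of scores from your question:
u <- c(70, 75)  # hypothetical c(math_score, reading_score)
depth(u, x, method = "Tukey")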

Density based clustering that allows user to specify number of clusters

I have data that consists of roughly 100,000 points on a 2-d graph. Each point has X and Y coordinates. I'm looking for an algorithm that will cluster these points based on density but I want to specify the number of clusters.
I originally tried K-Means since this would allow me to specify the number of clusters. However, my data naturally "clumps" into ridges. K-Means would inevitably bisect some of these ridges. DBSCAN seems like a better fit simply due to the shape of my data, but with DBSCAN I can't specify the number of clusters I'd like.
Essentially what I'm trying to find is an algorithm that will optimally cluster the graph into N groups based on density, where N is supplied by me. At this point I don't care where it's implemented (R, Python, FORTRAN...).
Any direction you can provide would be much appreciated.
In an area of high density, the points tend to be close together, so clustering on the (Euclidean) distance may give similar results (though not always).
For example, with these three normals in 2 dimensions:
x1 <- mnormt::rmnorm(200, c(10,10), matrix(c(20,0,0,.1), 2, 2))
x2 <- mnormt::rmnorm(100, c(10,20), matrix(c(20,0,0,.1), 2, 2))
x3 <- mnormt::rmnorm(300, c(23, 15), matrix(c(.1,0,0,35), 2, 2))
xx <- rbind(x1, x2, x3)
plot(xx, col=rep(c("grey10","pink2", "green4"), times=c(200,100,300)))
We can apply different clustering algorithms:
# hierarchical (note: dist() requires the spelling "euclidean")
clustering <- hclust(dist(xx, method = "euclidean"),
                     method = "ward.D")
h.cl <- cutree(clustering, k = 3)
# K-means and dbscan
k.cl <- kmeans(xx, centers = 3L)
d.cl <- dbscan::dbscan(xx, eps = 1)
And we see that on this particular example, hierarchical clustering and DBSCAN produce similar results, whereas K-means cuts one of the clusters in a wrong way.
opar <- par(mfrow=c(3,1), mar = c(1,1,1,1))
plot(xx, col = k.cl$cluster, main="K-means")
plot(xx, col = d.cl$cluster, main="DBSCAN")
plot(xx, col = h.cl, main="Hierarchical")
par(opar)
Of course, there is no guarantee this will work on your particular data.
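If you do want DBSCAN specifically while still controlling the number of clusters, one pragmatic sketch is to scan eps values until the desired count appears; n_target and the eps grid here are assumptions to adapt to your data:
n_target <- 3
for (eps in seq(0.2, 3, by = 0.1)) {
  d.cl <- dbscan::dbscan(xx, eps = eps)
  n_found <- length(setdiff(unique(d.cl$cluster), 0))  # cluster 0 is noise
  if (n_found == n_target) break
}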

Drawing a smooth implicit surface with misc3d

The misc3d package provides a great implementation of the marching cubes algorithm, making it possible to plot implicit surfaces.
For example, let's plot a Dupin cyclide:
a <- 0.94; mu <- 0.56; c <- 0.34 # cyclide parameters
f <- function(x, y, z, a, c, mu) { # implicit equation f(x,y,z) = 0
  b <- sqrt(a^2 - c^2)
  (x^2 + y^2 + z^2 - mu^2 + b^2)^2 - 4*(a*x - c*mu)^2 - 4*b^2*y^2
}
# define the "voxel"
nx <- 50; ny <- 50; nz <- 25
x <- seq(-c-mu-a, abs(mu-c)+a, length=nx)
y <- seq(-mu-a, mu+a, length=ny)
z <- seq(-mu-c, mu+c, length=nz)
g <- expand.grid(x=x, y=y, z=z)
voxel <- array(with(g, f(x,y,z,a,c,mu)), c(nx,ny,nz))
# plot the surface
library(misc3d)
surf <- computeContour3d(voxel, level=0, x=x, y=y, z=z)
drawScene.rgl(makeTriangles(surf))
Nice, except that the surface is not smooth.
The documentation of drawScene.rgl says: "Object-specific rendering features such as smoothing and material are controlled by setting in the objects." I don't know what that means. How can I get a smooth surface?
I have a solution, but not a straightforward one: it consists of building a mesh3d object from the output of computeContour3d and including the surface normals in this mesh3d object.
The surface normals of an implicit surface defined by f(x,y,z)=0 are simply given by the gradient of f. It is not hard to derive the gradient for this example.
gradient <- function(xyz, a, c, mu) {
  x <- xyz[1]; y <- xyz[2]; z <- xyz[3]
  b <- sqrt(a^2 - c^2)
  c(
    2*(2*x)*(x^2 + y^2 + z^2 - mu^2 + b^2) - 8*a*(a*x - c*mu),
    2*(2*y)*(x^2 + y^2 + z^2 - mu^2 + b^2) - 8*b^2*y,
    2*(2*z)*(x^2 + y^2 + z^2 - mu^2 + b^2)
  )
}
Then the normals are computed as follows:
normals <- apply(surf, 1, function(xyz) {
  gradient(xyz, a, c, mu)
})
Now we are ready to make the mesh3d object:
mesh <- list(vb = rbind(t(surf), 1),
             it = matrix(1:nrow(surf), nrow = 3),
             primitivetype = "triangle",
             normals = rbind(-normals, 1))
class(mesh) <- c("mesh3d", "shape3d")
And finally to plot it with rgl:
library(rgl)
shade3d(mesh, color="red")
Nice, the surface is smooth now.
But is there a more straightforward way to get a smooth surface, without building a mesh3d object? What do they mean in the documentation: "Object-specific rendering features such as smoothing and material are controlled by setting in the objects."?
I don't know what the documentation is suggesting. However, you can do it via a mesh object slightly more easily than you did (though the results aren't quite as nice), using the addNormals() function to calculate the normals automatically rather than by formula.
Here are the steps:
Compute the surface as you did.
Create the mesh without normals. This is basically what you did, but using tmesh3d():
mesh <- tmesh3d(t(surf), matrix(1:nrow(surf), nrow=3), homogeneous = FALSE)
Calculate which vertices are duplicates of which others:
verts <- apply(mesh$vb, 2, function(column) paste(column, collapse = " "))
firstcopy <- match(verts, verts)
Rewrite the indices to use the first copy. This is necessary, since the misc3d functions give a collection of disconnected triangles; we need to work out which are connected.
it <- as.numeric(mesh$it)
it <- firstcopy[it]
dim(it) <- dim(mesh$it)
mesh$it <- it
At this point, there are a lot of unused vertices in the mesh; if memory was a problem you might want to add a step to remove them. I'm going to skip that.
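If you did want that cleanup step, a sketch might look like this (keep only the vertex columns that are actually referenced, then remap the indices):
used <- sort(unique(as.numeric(mesh$it)))
mesh$vb <- mesh$vb[, used]
mesh$it <- matrix(match(as.numeric(mesh$it), used), nrow = 3)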
Add the normals
mesh <- addNormals(mesh)
Here are the before and after shots. Left is without normals, right is with them.
It's not quite as smooth as your solution using computed normals, but it's not always easy to find those.
There's an option smooth in the makeTriangles function:
drawScene.rgl(makeTriangles(surf, smooth=TRUE))
I think the result is equivalent to @user2554330's solution, but this is more straightforward.
EDIT
The result is much better with the rmarchingcubes package:
library(rmarchingcubes)
contour_shape <- contour3d(
  griddata = voxel, level = 0,
  x = x, y = y, z = z
)
library(rgl)
tmesh <- tmesh3d(
  vertices = t(contour_shape[["vertices"]]),
  indices = t(contour_shape[["triangles"]]),
  normals = contour_shape[["normals"]],
  homogeneous = FALSE
)
open3d(windowRect = c(50, 50, 562, 562))
view3d(zoom=0.8)
shade3d(tmesh, color = "darkred")
