I use density.lpp for kernel density estimation. I want to pick a specific segment of the network and plot the estimate along that chosen segment. As an example, I have a road that is a combination of two segments. Each segment has a different length, so I don't know how many pieces each of them is divided into.
Here are the locations of the vertices and the road segment IDs:
https://www.dropbox.com/s/fmuul0b6lus279c/R.csv?dl=0
Here is the code I used to create a SpatialLinesDataFrame, generate random points on the network, and get the density estimate.
Is there a way to know how many pieces each segment is divided into? Or, if I want to plot location vs. estimate for a chosen segment, how can I do that? Using dimyx=100 created 199 estimation points, but I don't know how many of them belong to Swid=1 or Swid=2.
One approach I tried was using gDistance. It works fine in this problem because these segments connect in one direction; however, when there is a four-way connection, some of the lambda values attach to segments they do not belong to. I provided a picture and circled two points: when I used gDistance, those points were connected to other segments. Any ideas?
library(sp)
library(maptools)   # for as.linnet.SpatialLines
library(spatstat)

R=read.csv("R.csv",header=T,sep=",")
R2.1=dplyr::select(R, X01,Y01,Swid)
coordinates(R2.1) = c("X01", "Y01")
proj4string(R2.1)=CRS("+proj=utm +zone=17 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0")
plot(R2.1,main="nodes on the road")
##
LineXX <- lapply(split(R2.1, R2.1$Swid), function(x) Lines(list(Line(coordinates(x))), x$Swid[1L]))
##
linesXY <- SpatialLines(LineXX)
data <- data.frame(Swid = unique(R2.1$Swid))
rownames(data) <- data$Swid
lxy <- SpatialLinesDataFrame(linesXY, data)
proj4string(lxy)=proj4string(R2.1)  # the original referenced an object not defined in this snippet
W.1=as.linnet.SpatialLines(lxy)
Rand1=runiflpp(250, W.1)
Rand1XY=coords(Rand1)[,1:2]
W2=owin(xrange=c(142751.98, 214311.26), yrange=c(3353111, 3399329))
Trpp=ppp(x=Rand1XY$x, y=Rand1XY$y, window=W2) ### planar point object
L.orig=lpp(Trpp,W.1) # discrete
plot(L.orig,main="Original with accidents")
S1=bw.scott(L.orig)[1] # in case you want to change the bandwidth
Try274=density(L.orig,S1,distance="path",continuous=TRUE,dimyx=100)
L=as.linnet(L.orig)
length(Try274[!is.na(Try274$v)])
#> [1] 199
This is a question about the spatstat package.
The result of density.lpp is an object of class linim. For any such object, you can use as.data.frame to extract the data. This yields a data frame with one row for each sample point on the network. For each sample point, the data are xc, yc (coordinates of nearest pixel centre), x, y (exact coordinates of the sample point on the network), seg (identifier of segment), tp (relative position along segment) and values (the density value). If you split the data frame by the seg column, you will get the data for individual segments of the network.
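For example, a minimal sketch (assuming the Try274 object computed in the question's code):

df <- as.data.frame(Try274)   # one row per sample point on the network
by_seg <- split(df, df$seg)   # data for each network segment separately
# density profile along one chosen segment: relative position vs estimate
seg1 <- by_seg[[1]]
seg1 <- seg1[order(seg1$tp), ]
plot(seg1$tp, seg1$values, type = "l",
     xlab = "relative position along segment", ylab = "density")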
However, it seems that you may want information about the internal workings of density.lpp. In order to achieve adequate accuracy during the computation phase, density.lpp subdivides each network segment into many short segments (using a complex set of rules). This information is lost when the final results are discretised into a linim object and returned. The attribute "dx" reports the length of the short segments that were used in the computation phase, but that's all.
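For instance, to inspect that attribute (again assuming Try274 from above):

attr(Try274, "dx")  # length of the short segments used in the computation phase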
If you email me directly I can show you how to extract the internal information.
I am trying to perform DBSCAN clustering on the data https://www.kaggle.com/arjunbhasin2013/ccdata. I have cleaned the data and applied the algorithm.
data1 <- read.csv('C:\\Users\\write\\Documents\\R\\data\\Project\\Clustering\\CC GENERAL.csv')
head(data1)
data1 <- data1[,2:18]
dim(data1)
colnames(data1)
head(data1,2)
#to check if data has empty col or rows
library(purrr)
is_empty(data1)
#to check if data has duplicates
library(dplyr)
any(duplicated(data1))
#to check if data has NA values
any(is.na(data1))
data1 <- na.omit(data1)
any(is.na(data1))
dim(data1)
The algorithm was applied as follows.
#DBSCAN
data1 <- scale(data1)
library(fpc)
library(dbscan)
set.seed(500)
#to find optimal eps
kNNdistplot(data1, k = 34)
abline(h = 4, lty = 3)
The figure shows the 'knee' used to identify the 'eps' value. Since there are 17 attributes to be considered for clustering, I have taken k = 17*2 = 34.
db <- dbscan(data1,eps = 4,minPts = 34)
db
The result I obtained is "The clustering contains 1 cluster(s) and 147 noise points."
No matter what values I choose for eps and minPts, the result is the same.
Can anyone tell where I have gone wrong?
Thanks in advance.
You have two options:
Increase the radius around each core point (given by the eps parameter).
Decrease the minimum number of points (minPts) required to define a core point.
I would start by decreasing the minPts parameter, since I think it is very high; if DBSCAN does not find that many points within the radius, it will not group more points into a cluster.
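For example, a quick sketch scanning a few minPts values at the same eps (these particular values are arbitrary starting points, not recommendations):

# compare cluster counts for progressively smaller minPts; label 0 is noise
for (mp in c(34, 17, 8, 4)) {
  db <- dbscan::dbscan(data1, eps = 4, minPts = mp)
  cat("minPts =", mp, "->", max(db$cluster), "cluster(s)\n")
}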
A typical problem with using DBSCAN (and clustering in general) is that real data typically does not fall into nice clusters, but forms one connected point cloud. In this case, DBSCAN will always find only a single cluster. You can check this with several methods. The most direct method would be to use a pairs plot (a scatterplot matrix):
plot(as.data.frame(data1))
Since you have many variables, the scatterplot panels are very small, but you can see that the points are very close together in almost all panels. DBSCAN will connect all points in these dense areas into a single cluster. k-means will just partition the dense area.
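For contrast, a small sketch (the choice of 4 centers is arbitrary) showing that k-means partitions the cloud no matter what:

# k-means forces every point into some cluster, even in one connected cloud
set.seed(500)
km <- kmeans(data1, centers = 4, nstart = 10)
table(km$cluster)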
Another option is to check for clusterability with methods like VAT or iVAT (https://link.springer.com/chapter/10.1007/978-3-642-13657-3_5).
library("seriation")
## calculate distances for a small sample
d <- dist(data1[sample(seq(nrow(data1)), size = 1000), ])
iVAT(d)
You will see that the plot shows no block structure around the diagonal indicating that clustering will not find much.
To improve clustering, you need to work on the data. You can remove irrelevant variables; you may have very skewed variables that should be transformed first. You could also try non-linear embedding before clustering.
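As an illustration, a sketch of such preprocessing (it assumes the raw, unscaled data and simply log-transforms every non-negative column; which columns actually deserve transformation depends on the data, and the file path should match the one in the question):

# hypothetical preprocessing: tame skewed non-negative columns before scaling
raw <- na.omit(read.csv('CC GENERAL.csv')[, 2:18])
nonneg <- sapply(raw, function(col) all(col >= 0))
raw[nonneg] <- lapply(raw[nonneg], log1p)
data2 <- scale(raw)
dbscan::kNNdistplot(data2, k = 34)  # look for a clearer knee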
I want to assess the degree of spatial proximity of each point to other equivalent points by looking at the number of others within 400m (5 minute walk).
I have some points on a map.
I can draw a simple 400 m buffer around them.
I want to determine which buffers overlap and then count the number of overlaps.
This number of overlaps should relate back to the original point so I can see which point has the highest number of overlaps and therefore if I were to walk 400 m from that point I could determine how many other points I could get to.
I've asked this question on GIS Stack Exchange, but I'm not sure it will get answered for ArcGIS, and I think I'd prefer to do the work in R.
This is what I'm aiming for:
https://www.newham.gov.uk/Documents/Environment%20and%20planning/EB01.%20Evidence%20Base%20-%20Cumulative%20Impact%20V2.pdf
To simplify, here's some code:
# load packages
library(easypackages)
needed<-c("sf","raster","dplyr","spData","rgdal",
"tmap","leaflet","mapview","tmaptools","wesanderson","DataExplorer","readxl",
"sp" ,"rgisws","viridis","ggthemes","scales","tidyverse","lubridate","phecharts","stringr")
easypackages::libraries(needed)
## read in csv data; first column is assumed to be Easting and second Northing
polls<-st_as_sf(read.csv(url("https://www.caerphilly.gov.uk/CaerphillyDocs/FOI/Datasets_polling_stations_csv.aspx")),
coords = c("Easting","Northing"),crs = 27700)
polls_buffer_400<-st_buffer(polls,400)
polls_intersection<-st_intersection(x=polls_buffer_400,y=polls_buffer_400)
plot(polls_intersection$geometry)
That should show the overlapping buffers around the polling stations.
What I'd like to do is count the number of overlaps which is done here:
polls_intersection_grouped<-polls_intersection%>%group_by(Ballot.Box.Polling.Station)%>%count()
And this is the bit I'm not sure about: to get to the output I want (which will show "hotspots" of polling stations in this case), how do I colour things? How can I:
assess the degree of spatial proximity of each point to other equivalent points by looking at the number of others within 400m (5 minute walk)?
It's probably terribly bad form, but here's my original GIS question:
https://gis.stackexchange.com/questions/328577/buffer-analysis-of-points-counting-intersects-of-resulting-polygons
Edit:
This gives the intersections different colours, which is great:
plot(polls_intersection$geometry,col = sf.colors(categorical = TRUE, alpha = .5))
summary(lengths(st_intersects(polls_intersection)))
What am I colouring here? I mean it looks nice but I really don't know what I'm doing.
How can I: assess the degree of spatial proximity of each point to other equivalent points by looking at the number of others within 400m (5 minute walk)?
Here is how to add a column to your initial sfc of pollings stations that tells you how many polling stations are within 400m of each feature in that sfc.
Note that the minimum value is 1 because a polling station is always within 400m of itself.
# n_neighbors shows how many polling stations are within 400m
polls %>%
mutate(n_neighbors = lengths(st_is_within_distance(polls, dist = 400)))
Similarly, for your sfc collection of intersecting polygons, you could add a column that counts the number of buffer polygons that contain each intersection polygon:
polls_intersection %>%
mutate(n_overlaps = lengths(st_within(geometry, polls_buffer_400)))
And this is the bit I'm not sure about, to get to the output I want (which will show "Hotspots" of polling stations in this case) how do I colour things?
If you want to plot these things I highly recommend using ggplot2. It makes it very clear how you associate an attribute like colour with a specific variable.
For example, here is an example mapping the alpha (transparency) of each polygon to a scaled version of the n_overlaps column:
library(ggplot2)
polls_intersection %>%
mutate(n_overlaps = lengths(st_covered_by(geometry, polls_buffer_400))) %>%
ggplot() +
geom_sf(aes(alpha = 0.2*n_overlaps), fill = "red")
Lastly, there should be a better way to generate your intersecting polygons that already counts overlaps. This is built into the st_intersection function for finding intersections of sfc objects with themselves.
However, your data in particular generates an error when you try to do this:
st_intersection(polls_buffer_400)
#> Error in CPL_nary_intersection(x) :
#>   Evaluation error: TopologyException: side location conflict at 315321.69159061194 199694.6971799387.
I don't know what a "side location conflict" is. Maybe @edzer could help with that. However, most subsets of your data do not contain that conflict. For example:
# this version adds an n.overlaps column automatically:
st_intersection(polls_buffer_400[1:10,]) %>%
ggplot() + geom_sf(aes(alpha = 0.2*n.overlaps), fill = "red")
I'm trying to use distanceFromPoints function in raster package as:
distanceFromPoints(object,xy,...)
where object is a raster and xy is a matrix of x and y coordinates.
Now, if my raster has, for example, 1000 cells and xy represents one point, I get 1000 values representing the distances between xy and each raster cell. My problem is when xy has multiple coordinates, e.g. 10 points: the function description indicates that xy can be multiple points, but when I run the function with multiple xy points I still get only 1000 values, while I'm expecting 1000 values for each coordinate in xy. How does this work?
Thanks!
Using distanceFromPoints on multiple points gives a single value for each raster cell: the distance from that cell to the nearest of the points.
To create raster layers giving the distance to each point separately, you can use apply.
A reproducible example:
library(raster)

r = raster(matrix(nrow = 10, ncol = 10))
p = data.frame(x=runif(5), y=runif(5))
dp = apply(p, 1, function(p) distanceFromPoints(r,p))
This gives a list of raster layers, each having the distance to one point
# for example, 1st raster in the list has the distance to the 1st point
plot(dp[[1]])
points(p[1,])
For convenience, you can convert this list into a raster stack
st = stack(dp)
plot(st)
A final word of caution:
It should be noted that the raster objects created in this way do not really contain any more information than the list of points from which they are generated. As such, they are a computationally and memory-expensive way to store that information. I can't easily think of a situation in which this would be a sensible way to solve a specific question. Therefore, it may be worth thinking again about the reasons you want these raster layers, and asking whether there may be a more efficient way to solve your overall problem; one alternative is sketched below.
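For instance, a sketch (assuming you only need a few of these distances at a time, and reusing r and p from above) that computes the same quantities on demand with pointDistance instead of materialising one raster per point:

# distances from every cell to a single point, computed only when needed
cell_xy <- xyFromCell(r, 1:ncell(r))                  # coordinates of all cells
d1 <- pointDistance(cell_xy, p[1, ], lonlat = FALSE)  # distances to point 1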
I have a time series dataset with spatial data (x, y coordinates). Each point is static in location, but its value varies over time, i.e. each point has its own unique function. I want to assign these functions as a mark, so I can plot the point pattern with each individual time series as a plotting symbol.
This is an exploratory step to eventually perform some spatial functional data analysis.
As an example, I want something like Figure 2 published in this article:
Delicado, P., R. Giraldo, C. Comas, and J. Mateu. 2010. Spatial Functional Data: Some Recent Contributions. Environmetrics 21:224-239.
(I'm having trouble posting an image of the figure.)
1) Working in R with ggplot2, I can plot a line of change in quant of each id over time:
(Fake example dataset, where x and y are Cartesian coordinates, id is an individual observation, and quant is the value of id in each year):
library(ggplot2)

x<-c(1,1,1,2,2,2,3,3,3)
y<-c(1,1,1,2,2,2,3,3,3)
year<-c(1,2,3,1,2,3,1,2,3)
id<-c("a","a","a","b","b","b","c","c","c")
quant<-c(5,2,4,2,4,2,4,4,6)
allData<-data.frame(x,y,year,id,quant)
ggplot(allData,aes(x=year,y=quant, group=id))+geom_line()
2) Or I can plot the geographic point pattern of id:
ggplot(allData,aes(x=x,y=y,color=id))+geom_point()
I want to plot the graph from (2), but use the line plots from (1) as the point symbols (marks). Any suggestions?
I have generated a connectivity matrix representing a network of geographical points connected by ocean currents. Each point releases particles that are received by the others. The number of particles released and received by each point is summarized in this square matrix: an element Aij of the matrix corresponds to the amount of particles emitted by the i-th point and received by the j-th.
My purpose is to plot this as a network in which each point constitutes a vertex and the connection between two points constitutes an edge. I would like the edges to be colored according to the amount of particles exchanged, and each edge has to be marked by an arrow.
I can plot the points according to their geographic coordinates, and I can plot the edges the way I want. My only concern now is how to add a legend relating the color of the edges to the amount of particles they represent.
Can anyone help me with that? Here is my code so far:
library(ggplot2)
library(plyr)
library(sp)
library(statnet)
connectivityMatrix <- as.matrix(read.table(file='settlementMatrix001920.dat'))
coordinates <- as.matrix(read.table(file='NoTakeReefs_center_LonLat.dat'))
net <- as.network(connectivityMatrix, matrix.type = "adjacency", directed = TRUE)
minX<-min(coordinates[,1])#-0.5
maxX<-max(coordinates[,1])#+0.5
minY<-min(coordinates[,2])#-0.5
maxY<-max(coordinates[,2])#+0.5
p<-plot(net, coord=coordinates,xlim=c(minX,maxX),ylim=c(minY,maxY),edge.col=connectivityMatrix,object.scale=0.01)
Without having your real data, here is a simple example:
matrixValues<-matrix(c(0,1,2,3,
0,0,0,0,
0,0,0,0,
0,0,0,0),ncol=4)
net<-as.network(matrixValues)
plot(net,edge.col=matrixValues)
# plot legend using non-zero values from matrix
legend(1,1,fill = unique(as.vector(matrixValues[matrixValues>0])),
legend=unique(as.vector(matrixValues[matrixValues>0])))
You may have to adjust the first two coordinate values in legend to draw it where you need on the plot. You could also construct your network slightly differently so that the values are loaded in from the matrix (see the ignore.eval argument to as.network()), in which case you would use edge.col='myValueName' in the plot command and get.edge.attribute(net,'myValueName') to feed the values into legend, as in the sketch below.
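A minimal sketch of that variant (reusing the matrixValues example from above; "particles" is just an illustrative attribute name):

# load the edge values from the matrix as an edge attribute named "particles"
net2 <- as.network(matrixValues, ignore.eval = FALSE, names.eval = "particles")
plot(net2, edge.col = "particles")
vals <- get.edge.attribute(net2, "particles")
legend(1, 1, fill = unique(vals), legend = unique(vals))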