I am working with a data frame that has latitude and longitude coordinates. I want to cluster those coordinates in R based on how close they are to each other and then plot the result on a map.
I am able to plot the points on a map with the leaflet package, which gives me a nice map layout. I just don't know how to cluster those points into, say, 3 clusters. Is k-means clustering appropriate for this kind of problem? Or do I have to compute some distance metric first and then apply a clustering algorithm? I am a bit confused by the online literature on clustering geographic data.
Here is what I am doing in R:
library(leaflet)
map <- leaflet() %>%
addTiles() %>% # Add default OpenStreetMap map tiles
addMarkers(lng=df_final$order_long, lat=df_final$order_lat)
map
Please help.
Can you help clarify your question: Do you already know the lon/lat coordinates about which you would like to cluster your data or are you trying to determine the ideal centroids based on your data?
If you know the coordinates of each centroid, then you could just run your data through the kmeans algorithm with a max iteration of one. The following would do that:
set.seed(1)
centroids <- data.frame(lat=1:3, lon=4:6) # Input the coordinates for your centroids here
locations <- data.frame(lat=runif(50,1,3), lon=runif(50,4,6))
kmeans(locations, centroids, iter.max=1) # Set your initial centroids and then iter once
If you do not know the coordinates and want kmeans to find them for you, then just increase iter.max or leave it as default (10).
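As a minimal sketch of that second case, the following lets kmeans choose three centroids itself. The simulated locations, the choice of nstart = 25 and the cluster column name are assumptions for illustration, not part of the original question.

```r
set.seed(1)
# Simulated locations, as in the example above (illustrative data only)
locations <- data.frame(lat = runif(50, 1, 3), lon = runif(50, 4, 6))

# Ask for 3 clusters; nstart = 25 repeats the random initialisation and keeps the best fit
fit <- kmeans(locations, centers = 3, nstart = 25)

fit$centers                       # estimated centroids (one row per cluster)
locations$cluster <- fit$cluster  # cluster label for every point
```

The cluster column could then be used to colour the markers on the map.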
This question has been asked several times here; please use the search.
k-means is a bad choice for such data:
how do you find k?
k-means uses squared-Euclidean but you want a spherical geo-distance
k-means is sensitive to outliers
If you google a little bit, you will find examples why this does not work well. Instead, have a look at OPTICS for example.
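One way to address the distance objection, sketched here with hierarchical clustering rather than OPTICS, is to build a great-circle distance matrix and cluster on that. The haversine_km helper, the simulated points and the average linkage are all assumptions made for illustration; cutting the tree at k = 3 just mirrors the three clusters mentioned in the question.

```r
# Great-circle (haversine) distance in km between lon/lat points given in degrees.
# haversine_km is a hypothetical helper written for this sketch.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

set.seed(1)
# Made-up points scattered around three far-apart centres
pts <- data.frame(
  lon = c(rnorm(20, -122, 0.1), rnorm(20, -74, 0.1), rnorm(20, 2, 0.1)),
  lat = c(rnorm(20, 37, 0.1), rnorm(20, 40, 0.1), rnorm(20, 48, 0.1))
)

# Full pairwise distance matrix in km
n <- nrow(pts)
d <- outer(seq_len(n), seq_len(n), function(i, j) {
  haversine_km(pts$lon[i], pts$lat[i], pts$lon[j], pts$lat[j])
})

# Hierarchical clustering on the geographic distances, cut into 3 groups
hc <- hclust(as.dist(d), method = "average")
pts$cluster <- cutree(hc, k = 3)
```

Because the algorithm only sees the distance matrix, any geographic distance can be swapped in without changing the clustering step.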
I am trying to split a concave polygon into convex polygons using R.
I am trying to figure out how to do this for one polygon, with the hope of applying it to a large number of polygons in an automated way.
The only approach I could think of so far was to use triangulation to break the shape into several smaller shapes and then group those into a minimal number of convex polygons.
library(sp)
library(rgdal)
library(sf)
files <- list.files("~/Cluster polygons 2020",pattern=".shp", full.names=TRUE)
cluster=readOGR(files[1])
spatstat::is.convex(maptools::as.owin.SpatialPolygons(cluster[1,])) #CHECK IF CONVEX
[1] FALSE
plygn=sfdct::ct_triangulate(sf::st_as_sf(cluster[1,]),D=TRUE)
plygn=st_collection_extract(plygn, "POLYGON")
plygn=as_Spatial(plygn)
length(plygn) #HOW MANY TRIANGLES GENERATED?
[1] 58
This is as far as I have gotten. Is there a clever or principled way to group the triangles into the smallest number of groups and then merge them, so that the final product is a set of convex polygons? Or is there an entirely better approach to this problem?
I appreciate the help. Here is a link to the shapefile
I'm still somewhat new to R and the sf package...
I have two sets of multipolygon data that I am trying to analyze. The first set of polygons (fires) contains hundreds of wildfire perimeters. The second set (towns) contains hundreds of urban area boundaries.
For each fire, I would like to calculate the distance to the closest town (fire polygon edge to closest town polygon edge), and add that as a field to each fire.
So far I have mostly been using the sf package for spatial data. In my searches, I can only find minimum distance methods for polygons to points, points to points, lines to points, etc. but cannot seem to find polygon to polygon examples. Any help to send me in the right direction would be much appreciated! Thank you.
@TimSalabim Thank you for sending me in the right direction. I was able to accomplish what I was after. Maybe not the most elegant solution, but it worked.
library(sf)
library(dplyr) # for slice() and %>%
# create an index of the nearest feature
index <- st_nearest_feature(x = poly1, y = poly2)
# slice based on the index
poly2 <- poly2 %>% slice(index)
# calculate distance between polygons
poly_dist <- st_distance(x = poly1, y= poly2, by_element = TRUE)
# add the distance calculations to the fire polygons
poly1$distance <- poly_dist
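A self-contained sketch of the same idea on toy data is shown below. The square helper, the fires and towns objects, their column names and the use of EPSG:3857 are all made up for illustration. It indexes the towns with the nearest-feature vector instead of overwriting them, so the original towns layer stays intact.

```r
library(sf)

# Hypothetical helper: a unit square with its lower-left corner at (x0, y0)
square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Toy layers in a projected CRS (units are metres); all values are invented
fires <- st_sf(id = 1:2, geometry = st_sfc(square(0, 0), square(10, 0), crs = 3857))
towns <- st_sf(name = c("a", "b"), geometry = st_sfc(square(3, 0), square(10, 5), crs = 3857))

# Index of the nearest town for each fire
nearest <- st_nearest_feature(fires, towns)

# Edge-to-edge distance from each fire to its own nearest town
fires$nearest_town <- towns$name[nearest]
fires$distance <- st_distance(fires, towns[nearest, ], by_element = TRUE)
```

With by_element = TRUE the result is one distance per fire rather than a full fires-by-towns matrix.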
I have two data sets of spatial points (in .csv format): data1 with 220 points and data2 with 80 points, each with latitude and longitude. For data2 I have one covariate indicating the genetic origin of each point. The points in the two data sets are not exactly the same.
I would like to assign a genetic origin to the points in data1. It seems that I need to define a square (or some other shape) around each point in data2 in order to associate a genetic origin with each point in data1.
I am using R and I think packages such as raster or sp may be useful.
Thanks for your help.
Best,
Marie.
You need to make up your mind about how you want to assign "genetic origin". One approach that you seem to be hinting at is to give each point the origin of its nearest neighbor.
When asking a question you should always include some example data.
library(raster)
d1 <- data.frame(lon=c(1,5,55,31), lat=c(3,7,20,22))
d2 <- data.frame(lon=c(4,2,8,65,5,4), lat=c(50,-90,20,32,10,10), origin=LETTERS[1:6], stringsAsFactors=FALSE)
Here is how you can assign origin based on the nearest known origin
# make sure your data are (x,y) or (longitude,latitude), not the reverse
pd <- pointDistance(d1, d2[,1:2], lonlat=TRUE)
nd <- apply(pd, 1, which.min)
d1$origin <- d2$origin[nd]
So basically I have some spatial data for which I've built a distance-based weights matrix with dnearneigh in R, and I am wondering if I can generate an interactive plot of the link distribution in which the distance threshold of the weights matrix can be changed.
The data set:
library(maptools) # readShapePoly()
library(spdep)    # dnearneigh()
CA.poly <- readShapePoly('CaliforniaCounty.shp')
This is a shapefile of California counties, and using it I can generate a weights matrix based on the distances between counties.
coords<-coordinates(CA.poly)
W_dist<-dnearneigh(coords,0,1.5,longlat = FALSE)
And after generating the matrix I can plot the link distribution by using:
plot(W_dist,coordinates(CA.poly))
This will show a network of counties where two counties are connected if the distance between their centroids is less than 1.5 km.
All the code is in an Rmd file, and I am wondering whether there is a way to output an interactive plot in HTML where the user can change the distance parameter (from 1.5 km to 1 km, for example) and the graph updates.
I looked up methods like using shiny and plotly but I don't think they suit my goal. Any suggestions?
I have a shapefile with 50+ different polygonal shapes (representing 50+ different regions) and 10,000+ data points that are supposed to be present in one of the regions. The thing is, the 10,000+ points are already coded with a region they are supposed to be in, and I want to figure out how far they are from this coded region in geo-spatial distance.
My current approach (code below), which involves converting the shapefile's sp polygons to spatstat owin objects and using distfun, gets me distances in lat/long Euclidean space. But I would like to get geo-spatial distances (eventually to convert to km). Where should I go next?
#basically cribbed from http://cran.r-project.org/web/packages/spatstat/vignettes/shapefiles.pdf (page 9)
shp <- readShapeSpatial("myShapeFile.shp", proj4string=CRS("+proj=longlat +datum=WGS84"))
regions <- lapply(slot(shp, "polygons"), function(x) SpatialPolygons(list(x)))
windows <- lapply(regions, as.owin)
# need to convert this to geo distance
distance_from_region <- function(regionData, regionName) {
  w <- windows[[regionName]]
  # distfun(w) returns a function of (x, y), i.e. (longitude, latitude)
  regionData$dists <- distfun(w)(regionData$long, regionData$lat)
  regionData
}
I'd project the data to a Euclidean (or near-Euclidean) coordinate system; this is feasible unless your data span a large chunk of the globe. Use spTransform (the generic is in sp, with the methods supplied by rgdal) and convert to a UTM zone near your data.
You also might do better with package rgeos and the gDistance function:
gDistance by default returns the cartesian minimum distance
between the two geometries in the units of the current projection.
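A rough sketch of the project-then-measure idea, written with sf rather than rgeos, might look like the following. The region, the two points and the choice of UTM zone 10N (EPSG:32610) are assumptions for illustration; pick the zone that covers your own data.

```r
library(sf)

# Hypothetical region: a small lon/lat rectangle (WGS84); coordinates are made up
region <- st_sfc(st_polygon(list(rbind(
  c(-122.5, 37.7), c(-122.3, 37.7), c(-122.3, 37.9), c(-122.5, 37.9), c(-122.5, 37.7)
))), crs = 4326)

# Two made-up points: one east of the region, one inside it
pts <- st_sfc(st_point(c(-122.0, 37.8)), st_point(c(-122.4, 37.8)), crs = 4326)

# Project both layers to UTM zone 10N so that coordinates are in metres
region_utm <- st_transform(region, 32610)
pts_utm <- st_transform(pts, 32610)

# Planar minimum distance from each point to the region, converted to km
dist_km <- as.numeric(st_distance(pts_utm, region_utm)) / 1000
```

A point that falls inside its region gets a distance of zero, which makes it easy to flag only the points that lie outside the region they were coded with.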
If your data is over a large chunk of globe then... tricky... 42...
Barry