Anonymize spatial data


I'm putting together some training materials for a workshop on R for spatial data analysis in archaeology, and need to anonymize the true site locations (confidential information) while keeping the spatial integrity of the features within the site's bounding box.
My initial thought is to subtract the minimum X & Y values of the bounding box (all are in UTM coords) from all of the geometries - i.e., make the whole thing an arbitrary relative grid.
I have ESRI point and polygon shapefiles imported to R with sf, and some rasters loaded with terra but not sure how to go about doing the global spatial adjustment for either.
I also use QGIS, so some solution through there to pre-process the geometries before loading to R would work as well - but again, don't know how to do that.

Here is how you can do that.
Original data:
library(terra)
r <- rast(system.file("ex/elev.tif", package="terra"))
v <- vect(system.file("ex/lux.shp", package="terra"))
p <- spatSample(v, 10)
plot(r)
lines(v)
points(p)
Shift and rotate
# move to different location
rs <- shift(r, -10, 10)
vs <- shift(v, -10, 10)
ps <- shift(p, -10, 10)
# rotate
rst <- t(rs)
e <- ext(rst)
# centre of the transposed raster's extent
x0 = e[1] + (e[2]-e[1])/2
y0 = e[3] + (e[4]-e[3])/2
vst <- t(vs) |> spin(180, x0, y0)
pst <- t(ps) |> spin(180, x0, y0)
plot(rst)
lines(vst)
points(pst)
Of course if you know the true location of some places on the map, e.g. if you know this is Luxembourg, it is not too difficult to reverse engineer and get the original data.
You could also add some jitter to the points:
xy <- crds(pst)
xy <- xy + runif(length(xy), -0.05, 0.05)
pstj <- vect(xy)
Set a fake coordinate reference system to all the datasets, like
crs(pstj) <- "+proj=utm +zone=23"
And save the data like this
writeVector(pstj, "sites.shp")
writeRaster(rst, "grid.tif")
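The idea from the question (subtract the bounding box minimum so everything sits on an arbitrary relative grid) can be sketched with the same shift function. This is only a sketch: it uses the terra example data as stand-ins for the confidential layers, and the names x0, y0, r0 and v0 are made up for illustration.

```r
library(terra)

# Example data shipped with terra (stand-ins for the confidential layers)
r <- rast(system.file("ex/elev.tif", package = "terra"))
v <- vect(system.file("ex/lux.shp", package = "terra"))

# Lower-left corner of the bounding box shared by all layers
x0 <- min(xmin(r), xmin(v))
y0 <- min(ymin(r), ymin(v))

# Move every layer by the same offset so that corner becomes (0, 0)
r0 <- shift(r, dx = -x0, dy = -y0)
v0 <- shift(v, dx = -x0, dy = -y0)
```

Because every layer gets the same offset, distances and relative positions are unchanged. A pure translation is easy to undo for anyone who recognises a single feature, so the caveat above about reverse engineering applies here too.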

Related

Generate random points from a raster with a pre-established distance between the points in R

I want to generate random points over a raster, but I need these points to have a distance between them, like 10000 meters. I've seen that is possible to establish distance between points using the package 'spatstat', but I didn't understand how to use this package to generate points based on a raster.
This is what I did to generate the points without the distance criteria:
#Number of points
n.points = 63
#Generate random points from a raster
sampling = raster::sampleRandom(myraster,size=63,na.rm=TRUE,
cells=FALSE,xy=TRUE,sp=FALSE,asRaster=FALSE)
#Select just the coordinates and transform in a data frame
xy = as.data.frame(sampling[,c(1,2)])
#Spatialize these points
spdf = sp::SpatialPointsDataFrame(coords = xy,
data = as.data.frame(xy),
proj4string =
CRS("+proj=longlat +datum=WGS84 +no_defs"))

Here's some code that tries to select points according to your rule that the closest point to any point is exactly 10000m from it. This code ignores any issues of map projection and curvature of the Earth; that should be fine in a relatively small area, but not over a very large one.
r <- 10000 # distance between points
n.points <- 63
x <- matrix(NA, nrow = n.points, ncol = 2)
# Sample one point randomly in the region. I'll assume the region is
# +/- 100000 in each coordinate; if your region is some other shape,
# change this code:
x[1,] <- runif(2, -100000, 100000)
for (i in 2:n.points) {
# Count how many tries to find the next point
tries <- 0
repeat {
# Pick an existing point
j <- sample(1:(i-1), 1)
# Pick a direction from it
theta <- runif(1, 0, 2*pi)
# Find the point at distance r in that direction
y <- x[j,] + r*c(cos(theta), sin(theta))
# Is the point in the region?
if (any(y < -100000) || any(y > 100000))
next
# Calculate the distances to all other points
dists <- apply(x[1:(i-1), , drop=FALSE],
1,
function(row) sqrt(sum((row - y)^2)))
# If this point is far enough from existing points, keep it
if (all(dists >= r))
break
# If not, try again, but not forever...
tries <- tries + 1
if (tries > 100000)
stop("failed")
}
x[i,] <- y
}
plot(x)
Created on 2022-10-10 with reprex v2.0.2
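If the requirement is only that points are at least 10000 m apart (rather than exactly 10000 m from their nearest neighbour), plain rejection sampling may be enough. The sketch below is a different, simpler technique than the code above. It assumes the same square region of +/- 100000 in each coordinate and planar coordinates in metres; the names min_dist, lim, pts and cand are illustrative.

```r
set.seed(1)
min_dist <- 10000   # minimum allowed distance between points
n_points <- 63
lim <- 100000       # assumed region: +/- lim in each coordinate

pts <- matrix(NA_real_, nrow = n_points, ncol = 2)
k <- 0              # number of points accepted so far
tries <- 0
while (k < n_points) {
  tries <- tries + 1
  if (tries > 100000) stop("failed to place all points")
  # Propose a uniformly random candidate in the region
  cand <- runif(2, -lim, lim)
  if (k > 0) {
    # Distances from the candidate to all accepted points
    d <- sqrt(rowSums((pts[seq_len(k), , drop = FALSE] - rep(cand, each = k))^2))
    # Reject the candidate if it is too close to any of them
    if (any(d < min_dist)) next
  }
  k <- k + 1
  pts[k, ] <- cand
}
plot(pts)
```

For a raster, the candidate could instead be drawn from the coordinates of non-NA cells, with the same distance check applied before accepting it.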

Repeat for loop for all rows of a spatial points data frame

I want to calculate the shortestPath distance (using the gdistance package) between a set of geographic coordinates, using a transition layer of the ocean to prevent 'movement' across land.
Here is how I created the transition layer:
library(raster); library(gdistance); library(maptools); library(rgdal); library(sp)
mapcrs <- "+proj=longlat +datum=WGS84 +no_defs"
data(wrld_simpl)
world <- wrld_simpl
worldshp <- spTransform(world, mapcrs)
ras <- raster(nrow=300,ncol=300)
crs(ras) <- crs(worldshp)
extent(ras) <- extent(worldshp)
landmask <- rasterize(worldshp, ras)
landras <- is.na(landmask)
tr <- transition(landras, transitionFunction = mean, directions = 8, symm = FALSE)
tr = geoCorrection(tr, scl=FALSE)
I then want to calculate the shortestPath distance between every coordinate in my dataset i.e. location 1 to location n, location 2 to location n etc.
Let's produce some hypothetical geographic coordinates and convert to spatial points
x <- rnorm(10, mean = -40, sd=5)
y <- rnorm(10, mean = 20, sd=5)
xy <- cbind(x,y); colnames(xy) <- c("lon","lat")
xy <- SpatialPoints(xy); projection(xy) <- projection(mapcrs)
Using the shortestPath function in gdistance, I can calculate the distance from the first coordinate (i.e. xy[1,]) to all other xy coordinates, like so.
dist <- shortestPath(tr, origin = xy[1,], goal = xy, output="SpatialLines")
I then tried to apply a for loop to sequentially calculate distance from location 1 to all other locations, and then calculating distance from location 2 to all other locations etc., which I wrote as follows:
for(i in seq_along(xy)){
AtoB <- shortestPath(tr, origin = xy[i,], goal=xy, output="SpatialLines")
i <- i+1
}
This, however, still only calculates the distances relative to the first xy spatial point and does not 'loop' for all subsequent rows. I don't know what I'm doing wrong. It's probably super-easy, but I'm struggling. Any help would be appreciated.
Thanks in advance,
Tony
---- UPDATE ----
We have come up with a bit of a work around (thanks Charley Clubley) but it still won't produce outputs for every spatial line. This will generate a matrix of distances.
The work around is as follows:
Using xy as a matrix, not spatial points
distances <- matrix(ncol=nrow(xy), nrow=nrow(xy))
xy_b <- xy ## Coords needs to be as a matrix (not spatial points)
## This generates an error indicating there are no more rows to delete once complete, but the computation works
for (i in 1:nrow(xy_b)) {
AtoB <-shortestPath(tr, xy_b, xy, output="SpatialLines")
length <- SpatialLinesLengths(AtoB)
distances[i, ] <- length
xy_b <- xy_b[-1,]
}
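In the first loop above, AtoB is overwritten on every iteration, so only one result survives, and the i <- i+1 line has no effect because for() sets i itself. A minimal sketch of one way around this is to store each result in a list. If only the distance matrix is needed, gdistance::costDistance returns all pairwise least-cost distances in one call. The grid, the CRS and the three locations below are made up so the example is self-contained; substitute your own transition layer and coordinates.

```r
library(raster)
library(gdistance)

# Hypothetical all-water grid in a projected CRS (metres); swap in your own
# ocean raster and transition layer
ras <- raster(nrows = 20, ncols = 20, xmn = 0, xmx = 20000, ymn = 0, ymx = 20000,
              crs = "+proj=utm +zone=31 +datum=WGS84")
values(ras) <- 1
tr <- transition(ras, transitionFunction = mean, directions = 8)
tr <- geoCorrection(tr, type = "c", scl = FALSE)

# Three made-up locations, kept as a plain coordinate matrix
xy <- cbind(x = c(2500, 10500, 17500), y = c(3500, 12500, 5500))

# All pairwise least-cost distances in one call, converted to a full matrix
d <- as.matrix(costDistance(tr, xy))

# One set of paths per origin, stored in a list instead of being overwritten
paths <- vector("list", nrow(xy))
for (i in seq_len(nrow(xy))) {
  paths[[i]] <- shortestPath(tr, origin = xy[i, ],
                             goal = xy[-i, , drop = FALSE],
                             output = "SpatialLines")
}
```

Each element of paths holds the lines from location i to every other location; the origin is left out of the goals so no zero-length path is requested.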

Streamlining binary rasterization in R

I have a few very small country-level polygon and point shapefiles that I would like to rasterize in R. The final product should be one global binary raster (indicating whether grid cell center is covered by a polygon / point lies within cell or not). My approach is to loop over the shapefiles and do the following for each shapefile:
# load shapefile
shp = sf::read_sf(shapefile_path)
# create a global raster template with resolution 0.0083
ext = extent(-180.0042, 180.0042, -65.00417, 75.00417)
gridsize = 0.008333333
r = raster(ext, res = gridsize)
# rasterize polygon or point shapefile to raster
rr = rasterize(shp, r, background = 0) #all grid cells that are not covered get 0
# convert to binary raster
values(rr)[values(rr)>0] = 1
Here, rr is the raster file where the polygons / points in shp are coded as 1 and all other grid cells are coded as 0. Afterwards, I take the sum over all rr to arrive at one global binary raster file including all polygons / points.
The final two steps are incredibly slow. In addition, I get RAM problems when I try to replace all the positive values in rr with 1, as the cell count is very large due to the fine resolution. I was wondering whether it is possible to come up with a smarter solution for what I'd like to achieve.
I have already found the fasterize package that has a speedy implementation of rasterize which works fine. I think it would be of great help if someone has a solution where rasterize directly returns a binary raster.

This is how you can do this better with raster. Note the value=1 argument, and also that I changed your specification of the extent, as what you do is probably not correct.
library(raster)
v <- shapefile(shapefile_path)
ext <- extent(-180, 180, -65, 75)
r <- raster(ext, res = 1/120)
rr <- rasterize(v, r, value=1, background = 0)
There is no need for your last step, but you could have done
rr <- clamp(rr, 0, 1)
# or
rr <- rr > 0
# or
rr <- reclassify(rr, cbind(1, Inf, 1))
raster::calc is not very efficient for simple arithmetic like this
It should be much faster to rasterize all vector data in one step, rather than in a loop, especially with large rasters like this (for which the program may need to write a temp file for each iteration).
To illustrate this solution with example data
library(raster)
cds1 <- rbind(c(-180,-20), c(-140,55), c(10, 0), c(-140,-60))
cds2 <- rbind(c(-10,0), c(140,60), c(160,0), c(140,-55))
cds3 <- rbind(c(-125,0), c(0,60), c(40,5), c(15,-45))
v <- spLines(cds1, cds2, cds3)
r <- raster(ncols=90, nrows=45)
r <- rasterize(v, r, field=1)
To speed things up, you can use terra (the replacement for raster)
library(terra)
f <- system.file("ex/lux.shp", package="terra")
v <- as.lines(vect(f))
r <- rast(v, ncol=75, nrow=100)
x <- rasterize(v, r, field=1)
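Since the question mixes polygon and point shapefiles, here is a sketch of how the terra version might produce a single 0/1 raster from both. It uses the Luxembourg example polygons and two made-up points; the names polys, pts, rp, rq and binary are illustrative, and the small template raster stands in for the global grid.

```r
library(terra)

# Example polygons shipped with terra, plus two made-up points
polys <- vect(system.file("ex/lux.shp", package = "terra"))
pts <- vect(cbind(c(5.8, 6.4), c(49.5, 50.1)), crs = crs(polys))

# Template raster covering the polygons
r <- rast(polys, ncol = 75, nrow = 100)

# Burn each layer as 1 with a background of 0, then combine cell by cell
rp <- rasterize(polys, r, field = 1, background = 0)
rq <- rasterize(pts, r, field = 1, background = 0)
binary <- max(rp, rq)
```

Taking the cell-wise maximum keeps the result binary, whereas summing the layers would give 2 wherever a point falls inside a polygon.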

Something that seems to work computationally and significantly improves computation time is to
Create one large shapefile shp instead of working with individual rasterized shapefiles.
Use the fasterize package to rasterize the merged shapefile.
Use raster::calc to avoid memory problems.
ext = extent(-180.0042, 180.0042, -65.00417, 75.00417)
gridsize = 0.008333333
r = raster(ext, res=gridsize)
rr = fasterize(shp, r, background = 0) #all not covered cells get 0, others get sum
# convert to binary raster
fun = function(x) {x[x>0] <- 1; return(x) }
r2 = raster::calc(rr, fun)

Line density function in R equivalent to Line density tool in ArcMap (arcpy)

I need to calculate the magnitude-per-unit area of polylines that fall within a radius around each cell. Essentially I need to calculate a km/km2 road density within a 500m pixel search radius. ArcMap has a quick and easy tool that handles this, but I need a pure R solution.
Here is a link on how line density works: http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-analyst-toolbox/how-line-density-works.htm
And this is how to use it in a python (arcpy) script: http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-analyst-toolbox/line-density.htm
I currently execute a backwards approach using raster::focal function, calculating a density of burned in road features. I then convert the km2/km2 output to km/km2.
#Import libraries
library(raster)
library(rgdal)
library(gdalUtils)
#Read-in an already created raster mask (cells are all set to 0)
mask <- raster("x://path to raster mask...")
#Make a copy of the mask to burn features in, keeping the original untouched
roads_mask <- file.copy(mask, "x://output path ...//roads.tif")
#Read-in road features (shapefile format)
roads_sldf <- readOGR("x://path to shapefile" , "roads")
#Rasterize spatial lines data frame ie. burn road features into mask
#Where road features get a value of 1, mask extent gets a value of 0
roads_raster <- gdalUtils::gdal_rasterize(src_datasource = roads_sldf,
dst_filename = "x://output path ...//roads.tif", b = 1,
burn = 1, l = "roads", output_Raster = TRUE)
#Run a 1km circular radius density function (be mindful of edge effects)
weight <- raster::focalWeight(roads_raster,1000,type = "circle")
rdDensity_1km <- raster::focal(roads_raster, weight, fun=sum, filename = '',
na.rm=TRUE, pad=TRUE, NAonly=FALSE, overwrite=TRUE)
#Convert km2/km2 road density to km/km2
#Set up the moving window
weight <- raster::focalWeight(roads_raster,1000,type = "circle")
#Count how many records in each column of the moving window are > 0
columnCount <- apply(weight,2,function(x) sum(x > 0))
#Get the sum of the column count
number_of_cells <- sum(columnCount)
#multiply km2/km2 density by number of cells in the moving window
step1 <- roads_raster * number_of_cells
#Rescale step1 output with respect to cell size(30m) and radius of a circle
final_rdDensity <- (step1*0.03)/3.14159265
#Write out final km/km2 road density raster
writeRaster(final_rdDensity,"X://path to output...", datatype = 'FLT4S', overwrite = TRUE)
After some more research I think I may be able to use a kernel function, however I don't want to apply the smoothing algorithm... As well the output is an 'im' object which I would need to write to as a 'tif'
#Import libraries
library(spatstat)
library(rgdal)
#Read-in road features (shapefile format)
roads_sldf <- readOGR("x://path to shapefile" , "roads")
#Convert roads spatial lines data frame to psp object
psp_roads <- as.psp(roads_sldf)
#Apply kernel density, however this is where I am unsure of the arguments
road_density <- spatstat::density.psp(psp_roads, sigma = 0.01, eps = 500)
Cheers.

See this question https://gis.stackexchange.com/questions/138861/calculating-road-density-in-r-using-kernel-density
Tried to mark as a duplicate but doesn't work because the other Q is on gis stack exchange
Short answer is use spatstat.geom::pixellate()
I also needed spatstat.geom::as.psp(sf::st_geometry(x)) to convert an sf lines object to the correct format and maptools::as.im.RasterLayer(r) to convert a raster. I was able to convert the result to RasterLayer with raster::raster(pix_res)

Perhaps you can use terra::rasterizeGeom which is available in the development version that you can install with install.packages('terra', repos='https://rspatial.r-universe.dev')
Example data
library(terra)
f <- system.file("ex/lux.shp", package="terra")
v <- vect(f) |> as.lines()
r <- rast(v, res=.1)
Solution
x <- rasterizeGeom(v, r, fun="length", "km")
And then use focal sum, but you would not have a perfect circle.
What you could do instead, if your dataset is not too large, is create a circle for each grid cell and use intersect. Something like this:
p <- xyFromCell(r, 1:ncell(r)) |> vect(crs="+proj=longlat")
p$id <- 1:ncell(r)
b <- buffer(p, 10000)
values(v) <- NULL
i <- intersect(v, b)
x <- aggregate(perim(i), list(id=i$id), sum)
r[x$id] <- x[,2]

Calculating the distance between points in R

I looked through the questions that have been asked about coordinates but couldn't find anything that helps with my problem.
I have a dataset that contains ID, Speed, Time, and a list of Latitude & Longitude values (the dataset can be found at the link below).
https://drive.google.com/file/d/1MJUvM5WEhua7Rt0lufCyugBdGSKaHMGZ/view?usp=sharing
I want to measure the distance between each point of Latitude & Longitude.
For example;
Latitude has: x1 ,x2 ,x3 ,...x1000
Longitude has: y1 ,y2 ,y3 ,..., y1000
I want to measure the distance between (x1,y1) to all the points , and (x2,y2) to all the points, and so on.
The reason I'm doing this is to know which points are close to each other and to assign an index to each location based on the distance.
If (x1, y1) is close to (x4,y4), then (x1, y1) will get the index A, for example, and (x4,y4) will get labeled B, so that the points are sorted in order based on distance.
I tried the gDistance function but got the error message: "package ‘gDistance’ is not available (for R version 3.4.3)"
and if I change the version to 3.3, library(rgeos) won't work!
Any suggestions?
Here's what I tried:
#requiring necessary packages:
library(sp) # vector data
library(rgeos) # geometry ops
#Read the data and transform them to spatial objects
d <- read.csv("ReadyData.csv")
sp.ReadyData <- d
coordinates(sp.ReadyData) <- ~Longitude + Latitude
d <- gDistance(sp.ReadyData, byid= TRUE)
Here's an update on my solution. I created a spatial object and a spatial data frame as follows:
#Create spatial object:
lonlat <- cbind(spatial$Longitude, spatial$Latitude)
#Create a SpatialPoints object:
library(sp)
pts <- SpatialPoints(lonlat)
crdref <- CRS('+proj=longlat +datum=WGS84')
pts <- SpatialPoints(lonlat, proj4string=crdref)
# make spatial data frame
ptsdf <- SpatialPointsDataFrame(pts, data=spatial)
Now I'm trying to measure the distance for longitude/latitude coordinates. I tried the dist method, but it doesn't seem to work for me, so I tried the pointDistance method:
gdis <- pointDistance(pts, lonlat=TRUE)
It's still not clear to me how this function measures the distance. I need to figure out the distances so I can locate the point in the middle and assign numbers to each point based on its distance from the middle point.

You can use raster::pointDistance or geosphere::distm, among other functions.
Part of your example data (please avoid files in your questions):
d <- read.table(sep=",", text='
"OBU ID","Time Received","Speed","Latitude","Longitude"
"1",20,1479171686325,0,38.929596,-77.2478813
"2",20,1479171686341,0,38.929596,-77.2478813
"3",20,1479171698485,1.5,38.9295887,-77.2478945
"4",20,1479171704373,1,38.9295048,-77.247922
"5",20,1479171710373,0,38.9294865,-77.2479055
"6",20,1479171710373,0,38.9294865,-77.2479055
"7",20,1479171710373,0,38.9294865,-77.2479055
"8",20,1479171716373,2,38.9294773,-77.2478712
"9",20,1479171716374,2,38.9294773,-77.2478712
"10",20,1479171722373,1.32,38.9294773,-77.2477417')
Solution:
library(raster)
m <- pointDistance(d[, c("Longitude", "Latitude")], lonlat=TRUE)
To get the nearest point to each point, you can do
mm <- as.matrix(as.dist(m))
diag(mm) <- NA
i <- apply(mm, 1, which.min)
The point pairs
p <- cbind(1:nrow(mm), i)
To get the distances, you can do:
mm[p]
Or do this:
apply(mm, 1, min, na.rm=TRUE)
Note that rgeos::gDistance is for planar data, not for longitude/latitude data.
Here is a similar question/answer with some illustration.
Your data set is too large to make a single distance matrix. You can process your data in chunks to deal with that. Here I am showing that with a rather small chunk size of 4 rows. Make this number much bigger to speed up processing time.
library(geosphere)
chunk <- 4 # rows
start <- seq(1, nrow(d), chunk)
# each chunk ends one row before the next one starts
end <- c(start[-1] - 1, nrow(d))
x <- d[, c("Longitude", "Latitude")]
r <- list()
for (i in 1:length(start)) {
y <- x[start[i]:end[i], , drop=FALSE]
m <- distm(y, x)
# set the distance of each point to itself to NA
m[cbind(1:nrow(m), start[i]:end[i])] <- NA
r[[i]] <- apply(m, 1, which.min)
}
r <- unlist(r)
r
# [1] 2 1 1 5 6 5 5 9 8 8
So for your data:
d <- read.csv("ReadyData.csv")
chunk <- 100 # rows
# etc
This will take a long time.
An alternative approach:
library(spdep)
x <- as.matrix(d[, c("Longitude", "Latitude")])
k <- as.vector(knearneigh(x, k=1, longlat=TRUE)$nn)
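To make it clearer what these functions compute for longitude/latitude data, here is a minimal base-R sketch of the haversine (great-circle) distance and a nearest-neighbour lookup built on it. It is an illustration, not the implementation used by raster or geosphere. The helper name haversine and the sphere radius of 6378137 m are assumptions, and the five points are taken from the example data above.

```r
# Great-circle distance in metres between points given in decimal degrees,
# assuming a spherical Earth with radius r
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Five distinct locations from the example data
lon <- c(-77.2478813, -77.2478945, -77.247922, -77.2479055, -77.2478712)
lat <- c(38.929596, 38.9295887, 38.9295048, 38.9294865, 38.9294773)

# Full distance matrix: entry [i, j] is the distance from point i to point j
idx <- seq_along(lon)
m <- outer(idx, idx, function(i, j) haversine(lon[i], lat[i], lon[j], lat[j]))

# Ignore each point's zero distance to itself, then find the nearest other point
diag(m) <- NA
nearest <- apply(m, 1, which.min)
nearest_dist <- apply(m, 1, min, na.rm = TRUE)
```

This builds the whole matrix at once, so for a large data set the chunked approach above is still needed.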

Assuming you have p1 as spatialpoints of x and p2 as spatialpoints of y, to get the index of the nearest other point:
ReadyData$cloDist <- apply(gDistance(p1, p2, byid=TRUE), 1, which.min)
If you have the same coordinate in the list you will get an index of the point itself, since the closest place to itself is itself. An easy trick to avoid that is to use the second nearest distance as reference with a quick function:
f_which.min <- function(vec, idx) sort(vec, index.return = TRUE)$ix[idx]
ReadyData$cloDist2 <- apply(gDistance(p1, p2, byid=TRUE), 1, f_which.min,
idx = 2)
