Computing HR using Kernel Density. Grid issues?

My dataset includes animal locations and an id column. I am trying to compute home range (HR) using a kernel density estimator. Because my dataset was huge, I tried splitting it into two.
> library(sp)
> library(adehabitatHR)
> head(temp)
   id        x       y
92 10 480147.6 3112738
93 10 480081.6 3112663
94 10 479992.6 3112667
95 10 479972.4 3112759
96 10 479931.7 3112758
97 10 479970.7 3112730
Each dataset has 99586 observations covering 190 unique IDs, so I am unable to provide a reproducible dataset.
The kernelUD function runs without problems, but when I try to extract the 95% HR it gives me an error.
> kernel_temp<- kernelUD(temp)
> kernel_95 <- getverticeshr(kernel_temp, percent = 95)
Error in getverticeshr.estUD(x[[i]], percent, ida = names(x)[i], unin, :
The grid is too small to allow the estimation of home-range.
You should rerun kernelUD with a larger extent parameter
I searched for this problem and found a suggested solution: pass the grid explicitly. But when I build the grid coordinates for the points, I get another error.
> x <- seq(min(temp$x),max(temp$x),by=1.)
> y <- seq(min(temp$y),max(temp$y),by=1.)
> xy <- expand.grid(x=x,y=y)
> gc()
> coordinates(xy) <- ~x+y
Error: cannot allocate vector of size 6.7 Gb
I am on a Windows system with 32 GB RAM; watching my processes I can see there is RAM remaining, but R is unable to allocate it.
Moving ahead, I passed an arbitrary grid value just to see if it would work, but I still get the same error.
> kernel_temp<- kernelUD(temp, grid = 1000)
> kernel_95 <- getverticeshr(kernel_temp, percent = 95)
Error in getverticeshr.estUD(x[[i]], percent, ida = names(x)[i], unin, :
The grid is too small to allow the estimation of home-range.
You should rerun kernelUD with a larger extent parameter
When I expand the xy grid, the number of grid points is huge. I wanted to know if there is an easier way of computing the HR, or of passing the grid without it being so large?
Any help is greatly appreciated. :)
EDIT:
I tried extent = 2 and I am having the same problem.
> kernel_temp<- kernelUD(temp, extent = 2)
> kernel_95 <- getverticeshr(kernel_temp, percent = 95)
Error in getverticeshr.estUD(x[[i]], percent, ida = names(x)[i], unin, :
The grid is too small to allow the estimation of home-range.
You should rerun kernelUD with a larger extent parameter

After a few more consultations with friends and colleagues, I found the answer.
When you have numerous locations, the best way to calculate the HR with KDE is to play around with the grid size and the extent: lower the grid and increase the extent.
In this case, I was able to calculate the HR with:
kernelUD(locs_year,grid = 500, h="href", extent = 5)
I tried several combinations (e.g. grid = 1000) without success; grid = 500 with extent = 5 was the sweet spot!
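For anyone landing here later, this is a minimal sketch of the whole workflow (temp is the data.frame shown in the question; building locs from it and the h = "href" choice are only illustrative):
library(sp)
library(adehabitatHR)
# temp has columns id, x, y as in the question
locs <- temp
coordinates(locs) <- ~ x + y      # promote to SpatialPointsDataFrame; id stays as the data column
kernel_temp <- kernelUD(locs, h = "href", grid = 500, extent = 5)
kernel_95 <- getverticeshr(kernel_temp, percent = 95)   # 95% home-range polygons, one per id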
Thank you for your help! And who knows, someday this answer might be useful to someone. :)

Related

DBSCAN Clustering returning single cluster with noise points

I am trying to perform DBSCAN clustering on the data https://www.kaggle.com/arjunbhasin2013/ccdata. I have cleaned the data and applied the algorithm.
data1 <- read.csv('C:\\Users\\write\\Documents\\R\\data\\Project\\Clustering\\CC GENERAL.csv')
head(data1)
data1 <- data1[,2:18]
dim(data1)
colnames(data1)
head(data1,2)
#to check if data has empty col or rows
library(purrr)
is_empty(data1)
#to check if data has duplicates
library(dplyr)
any(duplicated(data1))
#to check if data has NA values
any(is.na(data1))
data1 <- na.omit(data1)
any(is.na(data1))
dim(data1)
The algorithm was applied as follows.
#DBSCAN
data1 <- scale(data1)
library(fpc)
library(dbscan)
set.seed(500)
#to find optimal eps
kNNdistplot(data1, k = 34)
abline(h = 4, lty = 3)
The figure shows the 'knee' used to identify the eps value. Since there are 17 attributes to be considered for clustering, I have taken k = 17 * 2 = 34.
db <- dbscan(data1,eps = 4,minPts = 34)
db
The result I obtained is "The clustering contains 1 cluster(s) and 147 noise points."
No matter what values I try for eps and minPts, the result is the same.
Can anyone tell where I have gone wrong?
Thanks in advance.
You have two options:
Increase the radius of your center points (given by the epsilon parameter)
Decrease the minimum number of points (minPts) to define a center point.
I would start by decreasing the minPts parameter, since I think it is very high: if DBSCAN cannot find that many points within the radius of a point, it will not grow a cluster around it.
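A quick sketch of what trying those two knobs looks like (the specific values are only placeholders):
library(dbscan)
# smaller minPts with the original eps, and the original minPts with a larger eps
db_small_minpts <- dbscan::dbscan(data1, eps = 4, minPts = 10)
db_larger_eps <- dbscan::dbscan(data1, eps = 6, minPts = 34)
db_small_minpts
db_larger_eps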
A typical problem with using DBSCAN (and clustering in general) is that real data typically does not fall into nice clusters, but forms one connected point cloud. In this case, DBSCAN will always find only a single cluster. You can check this with several methods. The most direct method would be to use a pairs plot (a scatterplot matrix):
plot(as.data.frame(data1))
Since you have many variables, the scatterplot panels are very small, but you can see that the points are very close together in almost all panels. DBSCAN will connect all points in these dense areas into a single cluster. k-means will just partition the dense area.
Another option is to check for clusterability with methods like VAT or iVAT (https://link.springer.com/chapter/10.1007/978-3-642-13657-3_5).
library("seriation")
## calculate distances for a small sample
d <- dist(data1[sample(seq(nrow(data1)), size = 1000), ])
iVAT(d)
You will see that the plot shows no block structure around the diagonal, indicating that clustering will not find much.
To improve clustering, you need to work on the data. You can remove irrelevant variables; you may have very skewed variables that should be transformed first; and you could also try a non-linear embedding before clustering.
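As an illustration of the "transform skewed variables first" suggestion, here is a rough sketch (data_raw is a stand-in name for the cleaned data.frame from the question before the scale() step; the skewness cut-off and eps are arbitrary):
library(dbscan)
# flag strongly right-skewed columns of the unscaled data and log-transform them
skew <- apply(data_raw, 2, function(v) mean((v - mean(v))^3) / sd(v)^3)
skewed <- names(which(skew > 2))                 # arbitrary cut-off
data2 <- data_raw
data2[skewed] <- log1p(data2[skewed])
data2 <- scale(data2)
kNNdistplot(data2, k = 34)                       # look for a clearer knee
db2 <- dbscan::dbscan(data2, eps = 2, minPts = 34)   # eps here is a placeholder
db2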

R - spatstat: Calculate density for a new point

Is it possible to use spatstat to estimate the intensity function for a given ppp object and then evaluate it at a new point? For example, can I evaluate D at new_point:
# packages
library(spatstat)
# define a random point within Window(swedishpines)
new_point <- ppp(x = 45, y = 45, window = Window(swedishpines))
# estimate density
(D <- density(swedishpines))
#> real-valued pixel image
#> 128 x 128 pixel array (ny, nx)
#> enclosing rectangle: [0, 96] x [0, 100] units (one unit = 0.1 metres)
Created on 2021-03-30 by the reprex package (v1.0.0)
I was thinking that maybe I could superimpose() the two ppp objects (i.e. swedishpines and new_point) and then run density() with at = "points" and weights = c(rep(1, npoints(swedishpines)), 0), but I'm not sure whether that is the suggested approach (or whether the appended point is ignored during the estimation process).
I know that it may sound like a trivial question, but I read some docs and didn't find an answer or a solution.
There are two ways to do this.
The first is simply to take the pixel image of intensity, and extract the pixel values at the desired locations using [:
D <- density(swedishpines)
v <- D[new_points]
See the help for density.ppp and [.im.
The other way is to use densityfun:
f <- densityfun(swedishpines)
v <- f(new_points)
See the help for densityfun.ppp.
The first route is more efficient; the second is more accurate.
Technical issue: if some of the new_points lie outside the window of swedishpines, then the value at those points is (mathematically) undefined. Both of the methods described above will simply ignore such points, and the resulting vector v will be shorter than the number of new points. If you need to handle this contingency, the easiest way is to use D[new_points, drop=FALSE], which returns NA values for such locations.
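Putting the two methods together with the point from the question (a sketch; new_point is the single location defined above, and the same calls work for many points):
library(spatstat)
new_point <- ppp(x = 45, y = 45, window = Window(swedishpines))
# method 1: read the value off the pixel image of intensity
D <- density(swedishpines)
D[new_point]                  # drop = FALSE would return NA for points outside the window
# method 2: evaluate the smoothed intensity exactly at the location
f <- densityfun(swedishpines)
f(45, 45)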

Normalizing an R stars object by grid area?

first post :)
I've been transitioning my R code from sp to sf/stars, and one thing I'm still trying to grasp is accounting for the cell area in my grids.
Here's an example code to explain what I mean.
library(stars)
library(tidyverse)
# Reading in an example tif file, from stars() vignette
tif = system.file("tif/L7_ETMs.tif", package = "stars")
x = read_stars(tif)
x
# Get areas for each grid of the x object. Returns stars object with "area" in units of [m^2]
x_area <- st_area(x)
x_area
I tried loosely adopting code from this vignette (https://github.com/r-spatial/stars/blob/master/vignettes/stars5.Rmd) to divide each value in x by its grid cell area, and it's not working as expected (perhaps because my objects are stars and not sf?).
x$test1 = x$L7_ETMs.tif / x_area # Some computationally intensive calculation seems to happen, but doesn't produce the results I expect?
x$test1 = x$L7_ETMs.tif / x_area$area # Throws error, "non-conformable arrays"
What does seem to work is the following.
x %>%
  mutate(test1 = L7_ETMs.tif / units::set_units(as.numeric(x_area$area), m^2))
Here are the concerns I have with this code.
I worry that as I turn x_area$area (a matrix of areas on the lat/lon grid) into a numeric vector, I may mess up the matching between each grid cell and its area. I did some rough testing to see whether the areas match up the way I expect them to (see the spot check sketch below), but I can't escape the worry that this could lead to errors that are difficult to catch.
It just doesn't seem clean that I start with x_area in the correct units, only to remove and then set the units again during the computation.
Can someone suggest a cleaner implementation for what I'm trying to do, i.e. multiplying or dividing grids by their cell area while maintaining units throughout? Or convince me that the code I have is fine?
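For what it's worth, this is the kind of rough spot check I mean (cell indices picked arbitrarily; if the cell/area matching is preserved, the two numbers agree):
xn <- x %>%
  mutate(test1 = L7_ETMs.tif / units::set_units(as.numeric(x_area$area), m^2))
i <- 10; j <- 20                               # arbitrary cell
xn$test1[i, j, 1]                              # normalised value, band 1
x$L7_ETMs.tif[i, j, 1] / x_area$area[i, j]     # raw value divided by that cell's area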
Thanks!
I do not know how to improve the stars code, but you can compare the results you get with this:
tif <- system.file("tif/L7_ETMs.tif", package = "stars")
library(terra)
r <- rast(tif)
a <- cellSize(r, sum=FALSE)
x <- r / a
With planar data you could do the following when it is safe to assume there is no distortion (generally that is not the case, but it can be):
y <- r / prod(res(r))
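As a quick sanity check (a sketch that reuses the objects above), you can confirm that the two results share a geometry and see how much the two normalisations actually differ:
compareGeom(x, y)                   # TRUE if extent, resolution and CRS match
summary(values(x[[1]] - y[[1]]))    # spread of the difference, first band only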

Smoothing directional (angular) data in R

I'm trying to deal with some motion analysis software tracking errors after the data is exported. For some frames the direction is rotated by 180 degrees from the "true" direction.
I would like to smooth the data set so that when the direction changes by ~180 in a single frame, it is transformed to reflect the actual angle.
Is anyone aware of a way to solve this using any of the circular statistics packages in R, such as CircStats? Alternatively, I could imagine a script that checks whether the frame-to-frame variation is near 180 degrees, subtracts 180 if this is true, and then moves to the next frame. Does this sound like a reasonable approach, and would it be easy to implement in R?
I'm afraid I don't have the rep to upload a figure describing the problem (it's very easy to see), but here is an example dataset.
Thanks for the help. I've been a longtime user of Stack Overflow, but this is the first time I have failed to find my answer without needing to ask.
David
edit - attached image
It was an interesting problem to solve! It needs to be iterative since whenever a value is changed, it can solve a problem but create another... Let me know if it does the trick.
threshold <- 90    # a frame-to-frame jump larger than this (degrees) is treated as a tracking flip
correction <- 180  # size of the correction to apply

dat <- read.table("angle_data.txt", header = TRUE)
dat <- ts(dat)

repeat {
  diffs <- dat - stats::lag(dat, k = 1)         # frame-to-frame differences
  probl <- which(abs(diffs[, 2]) > threshold)   # frames where the angle jumps by ~180
  if (length(probl) == 0)
    break
  obs.1 <- dat[probl[1], 2]                     # angle before the jump
  obs.2 <- dat[probl[1] + 1, 2]                 # flipped angle
  dat[probl[1] + 1, 2] <- obs.2 + sign(obs.1 - obs.2) * correction
}
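A tiny usage sketch with made-up numbers (angle_data.txt is not reproduced here, so the data frame below is hypothetical; frames 4 to 6 are flipped by 180 degrees):
dat <- data.frame(frame = 1:8,
                  angle = c(10, 12, 11, 191, 193, 190, 13, 12))
dat <- ts(dat)
threshold <- 90
repeat {
  diffs <- dat - stats::lag(dat, k = 1)
  probl <- which(abs(diffs[, 2]) > threshold)
  if (length(probl) == 0) break
  obs.1 <- dat[probl[1], 2]
  obs.2 <- dat[probl[1] + 1, 2]
  dat[probl[1] + 1, 2] <- obs.2 + sign(obs.1 - obs.2) * 180
}
dat[, 2]   # 10 12 11 11 13 10 13 12 -- the flipped frames are pulled back in line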

dimensions of kde object from ks package, R

I am using the ks package from R to estimate 2d space utilization using distance and depth information. What I would like to do is to use the 95% contour output to get the maximum vertical and horizontal distance. So essentially, I want to be able to get the dimensions or measurements of the resulting 95% contour.
Here is a piece of code as an example:
require(ks)
dist<-c(1650,1300,3713,3718)
depth<-c(22,19.5,20.5,8.60)
dd<-data.frame(cbind(dist,depth))
## auto bandwidth selection
H.pi2<-Hpi(dd,binned=TRUE)*1
ddhat<-kde(dd,H=H.pi2)
plot(ddhat,cont=c(95),lwd=1.5,display="filled.contour2",col=c(NA,"palegreen"),
xlab="",ylab="",las=1,ann=F,bty="l",xaxs="i",yaxs="i",
xlim=c(0,max(dd[,1]+dd[,1]*0.4)),ylim=c(60,-3))
Any information about how to do this will be very helpful. Thanks in advance,
To create a 95% contour polygon from your 'kde' object:
library(raster)
im.kde <- image2Grid (list(x = ddhat$eval.points[[1]], y = ddhat$eval.points[[2]], z = ddhat$estimate))
kr <- raster(im.kde)
It is likely that one will want to resample this raster to a higher resolution before constructing polygons, and include the following two lines, before creation of the polygon object:
new.rast <- raster(extent(im.kde),res = c(50,50))
kr <- resample(kr, new.rast)
bin.kr <- kr
bin.kr[bin.kr < contourLevels(ddhat, prob = 0.05)] <- NA
bin.kr[bin.kr > 0]<-1
k.poly<-rasterToPolygons(bin.kr,dissolve=T)
Note that the results are similar, but not identical, to Hawthorne Beier's GME function 'kde'. He does use the kde function from ks, but must do something slightly different for the output polygon.
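If the goal is just the maximum horizontal and vertical distance, a small follow-up sketch using the polygon's bounding box (assuming k.poly from above):
bb <- bbox(k.poly)   # 2 x 2 matrix: row 1 = x (min, max), row 2 = y (min, max)
diff(bb[1, ])        # maximum horizontal distance across the 95% contour
diff(bb[2, ])        # maximum vertical distance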
At the moment I'm going for the "any information" prize rather than attempting a final answer. The ks:::plot.kde function dispatches to ks:::plotkde.2d in this case. It works its magic through side effects, and I cannot get these functions to return values that can be inspected in code. You would need to hack plotkde.2d to return the values used to plot the contour lines. You can visualize what is in ddhat$estimate with:
persp(ddhat$estimate)
It appears that contourLevels examines the estimate matrix and finds the density value above which the specified percentage of the total density resides.
> contourLevels(ddhat, 0.95)
95%
1.891981e-05
And then it draws the contour based on which values exceed that level. (I just haven't found the code that does that yet.)
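To get back to the original question (the horizontal and vertical span of the 95% region), one rough sketch that avoids hacking the plot method is to pull the contour out with base contourLines():
lev <- contourLevels(ddhat, cont = 95)        # density level enclosing ~95% of the mass
cl <- contourLines(x = ddhat$eval.points[[1]],
                   y = ddhat$eval.points[[2]],
                   z = ddhat$estimate, levels = lev)
xr <- range(unlist(lapply(cl, `[[`, "x")))    # horizontal extent of the contour(s)
yr <- range(unlist(lapply(cl, `[[`, "y")))    # vertical extent
diff(xr)                                      # maximum horizontal distance
diff(yr)                                      # maximum vertical distance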
