Subset 3D matrix using polygon coordinates - r

I'm working on some bioacoustic analysis and got stuck on an issue that I believe can be worked out mathematically. I'll use a sound sample from the seewave package:
library(seewave)
library(tuneR)
data(tico)
By storing a spectrogram (i.e. a graphic representation of the sound wave tico) in an R object, we can now deal with the wave file computationally.
s <- spectro(tico, plot=F)
class(s)
# [1] "list"
length(s)
# [1] 3
The object s consists of two numeric vectors, x = s$time and y = s$freq, representing the X and Y axes respectively, and a matrix z = s$amp of amplitude values whose dimensions match the lengths of x and y. z is effectively a 3D surface that can be plotted using persp3D (plot3D), plot_ly (plotly) or plot3d (rgl). Alternatively, the wave file can be plotted in 3D with seewave itself if one wishes to visualize it as an interactive rgl plot:
spectro3D(tico)
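For instance, a minimal sketch of the persp3D option mentioned above (assuming the plot3D package is installed; the viewing angles are arbitrary):
library(plot3D)
# transpose so that rows correspond to s$time and columns to s$freq
persp3D(x = s$time, y = s$freq, z = t(s$amp), theta = 30, phi = 30)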
That being said, the analysis I'm conducting aims to calculate contours of relative amplitude:
con <- contourLines(x=s$time, y=s$freq, z=t(s$amp), levels=seq(-25, -25, 1))
Select the longest contour:
n.con <- numeric(length(con))
for(i in 1:length(con)) n.con[i] <- length(con[[i]]$x)
n.max <- which.max(n.con)
con.max <- con[[n.max]]
And then plot the selected contour against the spectrogram of tico:
spectro(tico, grid=F, osc=F, scale=F)
polygon(x=con.max$x, y=con.max$y, lwd=2)
Now comes the tricky part. I must find a way to "subset" the matrix of amplitude values s$amp using the coordinates of the longest contour, con.max. What I aim to achieve is a new matrix containing only the amplitude values inside the polygon; the remaining parts of the spectrogram should then appear as blank spaces.
One approach I thought could work would be to loop over the matrix and replace every value outside the polygon with a given amplitude value (e.g. -25 dB). I once used a similar approach to remove the values below -30 dB and it worked perfectly:
for(i in 1:length(s$amp)) {
  if(s$amp[i] == -Inf | s$amp[i] <= -30) s$amp[i] <- -30
}
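For the polygon case, something along these lines is what I have in mind (a rough, untested sketch; it assumes the sp package for the point-in-polygon test, which is not part of seewave):
library(sp)
# s$amp has one row per frequency bin and one column per time step,
# so build the grid with frequency varying fastest (column-major order)
grid <- expand.grid(freq = s$freq, time = s$time)
inside <- point.in.polygon(grid$time, grid$freq, con.max$x, con.max$y)
mask <- matrix(inside > 0, nrow = nrow(s$amp), ncol = ncol(s$amp))
amp.sub <- s$amp
amp.sub[!mask] <- NA   # or e.g. -30 dB, if the outside should plot as background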
Another thought would be to create a new matrix with the same dimensions as s$amp, subset s$amp using the coordinates of the contour, and then place the subset into the new matrix. Roughly:
mt <- matrix(-30, nrow=nrow(s$amp), ncol = ncol(s$amp))
sb <- s$amp[con.max$y, con.max$x]
new.mt <- c(mt, sb)
s$amp <- new.mt
I'd appreciate any help.

Related

How to visualize a distance matrix on a map by the thickness or color of the lines connecting adjacent localities in R?

Suppose I have two datasets: (1) a data frame with the coordinates of localities, each with an ID; and (2) a linguistic distance matrix reflecting the linguistic distance between these localities.
# My data are similar to this structure
# dataframe
id <- c("A","B","C","D","E")
x_coor <- c(0.5,1,1,1.5,2)
y_coor <- c(5.5,3,7,6.5,5)
my.data <- data.frame(id = id, x_coor = x_coor, y_coor = y_coor)
# linguistic distance matrix
             A            B            C            D
B   308.298557
C   592.555483   284.256926
D   141.421356   449.719913   733.976839
E   591.141269   282.842712     1.414214   732.562625
Now I want to visualize the linguistic distance between every two sites on a map through the thickness or color of the lines connecting adjacent localities in R.
Just like this (example image omitted).
My idea is to generate the Delaunay triangulation with the deldir or tripack package in R.
# generate delaunay triangulation
library(deldir)
de=deldir(my.data$x_coor,my.data$y_coor)
plot.deldir(de,wlines="triang",col='blue',wpoints = "real",cex = 0.1)
text(my.data$x_coor,my.data$y_coor,my.data$id)
This is the resulting plot (image omitted).
My question is: how can I reflect the linguistic distance in the thickness or color of the triangle edges? Is there a better method?
Thank you very much!
What you want to do in respect of the line widths can be done "fairly
easily" by the deldir package. You simply call plot.deldir() with the
appropriate value of "lw" (line width).
At the bottom of this answer is a demonstration script "demo.txt" which shows how to do this in the case of your example. In particular this script shows
how to obtain the appropriate value of lw from the "linguistic distance
matrix". I had to make some adjustments in the way this matrix was
presented. I.e. I had to convert it into a proper matrix.
I have rescaled the distances to lie between 0 and 10 to obtain the
corresponding values of the line widths. You might wish to rescale in a different manner.
In respect of colours, there are two issues:
(1) It is not at all clear how you would like to map the "linguistic
distances" to colours.
(2) Unfortunately the code for plot.deldir() is written in a very
kludgy way, whence the "col" argument to segments() cannot be
appropriately passed on in the same manner that the "lw" argument can.
(I wrote the plot.deldir() code a long while ago, when I knew far less about
R programming than I know now! :-))
I will adjust this code and submit a new version of deldir to CRAN
fairly soon.
#
# Demo script
#
# Present the linguistic distances in a useable way.
vldm <- c(308.298557,592.555483,284.256926,141.421356,449.719913,
          733.976839,591.141269,282.842712,1.414214,732.562625)
ldm  <- matrix(0,nrow=5,ncol=5)
# The values in vldm are listed row by row for the lower triangle of the
# distance matrix, which is the order in which R fills the *upper*
# triangle (column-major), so fill that and then symmetrise.
ldm[row(ldm) < col(ldm)] <- vldm
ldm <- ldm + t(ldm)
rownames(ldm) <- LETTERS[1:5]
colnames(ldm) <- LETTERS[1:5]
# Set up the example data. It makes life much simpler if
# you denote the "x" and "y" coordinates by "x" and "y"!!!
id <- c("A","B","C","D","E")
x_coor <- c(0.5,1,1,1.5,2)
y_coor <- c(5.5,3,7,6.5,5)
# Eschew nomenclature like "my.data". Such nomenclature
# is Micro$oft-ese and is an abomination!!!
demoDat <- data.frame(id = id, x = x_coor, y = y_coor)
# Form the triangulation/tessellation.
library(deldir)
dxy <- deldir(demoDat)
# Plot the triangulation with line widths proportional
# to "linguistic distances". Note that plot.deldir() is
# a *method* for plot, so you do not have to (and shouldn't)
# type the ".deldir" in the plotting command.
plot(dxy, col=0) # This, and plotting with "add=TRUE" below, is
                 # a kludge to dodge around spurious warnings.
ind <- as.matrix(dxy$delsgs[, c("ind1","ind2")])
lwv <- ldm[ind]
lwv <- 10*lwv/max(lwv)   # rescale distances to line widths in (0, 10]
plot(dxy, wlines="triang", col='grey', wpoints="none",
     lw=lwv, add=TRUE)
with(demoDat, text(x, y, id, col="red", cex=1.5))
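A possible addition, not part of the original answer: to also map the distances to colours, one workaround is to bypass plot.deldir() and draw the Delaunay edges yourself with segments(), reusing the dxy$delsgs endpoints and the lwv values computed above (the colour ramp below is just an example):
pal  <- colorRampPalette(c("grey70", "red"))(100)
colv <- pal[cut(lwv, breaks = 100, labels = FALSE)]
plot(demoDat$x, demoDat$y, type = "n", xlab = "x", ylab = "y")
with(dxy$delsgs, segments(x1, y1, x2, y2, lwd = lwv, col = colv))
with(demoDat, text(x, y, id, col = "blue", cex = 1.2))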

How to get a point set (x, y) in a desired area in R

The figure is a plot of the (x, y) pairs from an Excel file, 8760 pairs in total. I want to remove the noisy data pairs in the red-circled area and output a new Excel file with the remaining pairs. How could I do this in R?
Using @G5W's example:
Make up data:
set.seed(2017)
x = runif(8760, 0,16)
y = c(abs(rnorm(8000, 0, 1)), runif(760,0,8))
XY = data.frame(x,y)
Fit a quantile regression to the 90th percentile:
library(quantreg)
library(splines)
qq <- rq(y~ns(x,20),tau=0.9,data=XY)
Compute and draw the predicted curve:
xvec <- seq(0,16,length.out=101)
pp <- predict(qq,newdata=data.frame(x=xvec))
plot(y~x,data=XY)
lines(xvec,pp,col=2,lwd=2)
Keep only points below the predicted line:
XY2 <- subset(XY,y<predict(qq,newdata=data.frame(x)))
plot(y~x,data=XY2)
lines(xvec,pp,col=2,lwd=2)
You can make the line less wiggly by lowering the number of knots, e.g. y~ns(x,10)
Both R and EXCEL read and write .csv files, so you can use those to transfer the data back and forth.
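A minimal sketch of that CSV round trip (the file names here are placeholders, not from the question):
XY <- read.csv("points.csv")      # sheet exported from Excel; columns x and y
# ... filter as above to obtain XY2 ...
write.csv(XY2, "points_filtered.csv", row.names = FALSE)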
You do not provide any data so I made some junk data to produce a similar problem.
DATA
set.seed(2017)
x = runif(8760, 0,16)
y = c(abs(rnorm(8000, 0, 1)), runif(760,0,8))
XY = data.frame(x,y)
One way to identify noise points is by looking at the distance to the nearest neighbors. In dense areas, nearest neighbors will be closer. In non-dense areas, they will be further apart. The package dbscan provides a nice function to get the distance to the k nearest neighbors. For this problem, I used k=6, but you may need to tune for your data. Looking at the distribution of distances to the 6th nearest neighbor we see that most points have 6 neighbors within a distance of 0.2
library(dbscan)
XY6 = kNNdist(XY, 6)
plot(density(XY6[,6]))
So I will assume that points whose 6th nearest neighbor is further than 0.2 away are noise points. Just changing the color to see which points are affected, we get:
TYPE = rep(1,8760)
TYPE[XY6[,6] > 0.2] = 2
plot(XY, col=TYPE)
Of course, if you wish to restrict to the non-noise points, you can use
NonNoise = XY[XY6[,6] <= 0.2,]

Find correct 2D translation of a subset of coordinates

I have a problem I wish to solve in R with example data below. I know this must have been solved many times but I have not been able to find a solution that works for me in R.
The core of what I want to do is to find how to translate a set of 2D coordinates to best fit into an other, larger, set of 2D coordinates. Imagine for example having a Polaroid photo of a small piece of the starry sky with you out at night, and you want to hold it up in a position so they match the stars' current positions.
Here is how to generate data similar to my real problem:
# create reference points (the "starry sky")
set.seed(99)
ref_coords = data.frame(x = runif(50,0,100), y = runif(50,0,100))
# generate points: take a subset of the coordinates to serve as the
# points we are looking for ("the Polaroid")
my_coords_final = ref_coords[c(5,12,15,24,31,34,48,49),]
# add a little bit of variation as compared to reference points
# (data should be very similar, but have a little bit of noise)
set.seed(100)
my_coords_final$x = my_coords_final$x+rnorm(8,0,.1)
set.seed(101)
my_coords_final$y = my_coords_final$y+rnorm(8,0,.1)
# create "start values" by, e.g., translating the points we are
# looking for to start at (0,0)
my_coords_start =apply(my_coords_final,2,function(x) x-min(x))
# Plot of example data, goal is to find the dotted vector that
# corresponds to the translation needed
plot(ref_coords, cex = 1.2) # "Starry sky"
points(my_coords_start,pch=20, col = "red") # start position of "Polaroid"
points(my_coords_final,pch=20, col = "blue") # corrected position of "Polaroid"
segments(my_coords_start[1,1],my_coords_start[1,2],
my_coords_final[1,1],my_coords_final[1,2],lty="dotted")
Plotting the data as above should yield the figure described in the next paragraph (image omitted here).
The result I want is basically what the dotted line in the plot above represents, i.e. a delta in x and y that I could apply to the start coordinates to move them to their correct position in the reference grid.
Details about the real data
There should be close to no rotational or scaling difference between my points and the reference points.
My real data is around 1000 reference points and up to a few hundred points to search (could use less if more efficient)
I expect to have to search about 10 to 20 sets of reference points to find my match, as many of the reference sets will not contain my points.
Thank you for your time, I'd really appreciate any input!
EDIT: To clarify, the right plot represents the reference data. The left plot represents the points that I want to translate across the reference data in order to find the position where they best match the reference. That position, in this case, is represented by the blue dots in the previous figure.
Finally, any working strategy must not use the data in my_coords_final, but rather reproduce that set of coordinates starting from my_coords_start using ref_coords.
So, the previous approach I posted (see edit history), using optim() to minimize the sum of distances between points, will only work in the limited circumstance where the point distribution used as reference data sits in the middle of the point field. The solution that satisfies the question, and still seems workable for a few thousand points, is a brute-force delta-and-comparison algorithm: it computes the shift implied by aligning a single point of the search data with each point of the reference data in turn, and then determines how many of the remaining points fall within a minimum distance threshold (which is needed to account for the noise in the data):
## A brute-force approach where min_dist can be used to
## ameliorate some random noise:
min_dist <- 5
win_thresh <- 0
win_thresh_old <- 0
for(i in 1:nrow(ref_coords)) {
  x2 <- my_coords_start[,1]
  y2 <- my_coords_start[,2]
  x1 <- ref_coords[,1] + (x2[1] - ref_coords[i,1])
  y1 <- ref_coords[,2] + (y2[1] - ref_coords[i,2])
  ## Calculate all pairwise distances between reference and field data:
  dists <- dist( cbind( c(x1, x2), c(y1, y2) ), "euclidean")
  ## Only take distances for the sampled data:
  dists <- as.matrix(dists)[-1*1:length(x1),]
  ## Calculate the number of distances within the minimum
  ## distance threshold minus the diagonal portion:
  win_thresh <- sum(rowSums(dists < min_dist) > 1)
  ## If we have more "matches" than our best then calculate a new
  ## dx and dy:
  if (win_thresh > win_thresh_old) {
    win_thresh_old <- win_thresh
    dx <- (x2[1] - ref_coords[i,1])
    dy <- (y2[1] - ref_coords[i,2])
  }
}
## Plot estimated correction (your delta x and delta y) calculated
## from the brute force calculation of shifts:
points(
x=ref_coords[,1] + dx,
y=ref_coords[,2] + dy,
cex=1.5, col = "red"
)
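As a small follow-up, not part of the original answer: given how dx and dy are defined inside the loop, the corrected positions of the searched points should simply be the start coordinates shifted by that delta, e.g.
## Apply the estimated shift to the start coordinates:
my_coords_est <- my_coords_start
my_coords_est[,1] <- my_coords_est[,1] - dx
my_coords_est[,2] <- my_coords_est[,2] - dy
points(my_coords_est, pch = 1, cex = 1.5, col = "blue")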
I'd be very interested to see whether anyone can solve this in a more efficient manner for the number of points in the test data, possibly using a statistical or optimization algorithm.

How to estimate the area of 95% contour of a kde object from ks R package

I'm trying to estimate the area of the 95% contour of a kde object from the ks package in R.
If I use the example data set from the ks package, I would create the kernel object as follow:
library(ks)
data(unicef)
H.scv <- Hscv(x=unicef)
fhat <- kde(x=unicef, H=H.scv)
I can easily plot the 25, 50, 75% contour using the plot function:
plot(fhat)
But I want to estimate the area within the contour.
I saw a similar question here, but the answer proposed does not solve the problem.
In my real application, my dataset is a time series of coordinates of an animal and I want to measure the animal's home range size using a bivariate normal kernel. I'm using the ks package because it allows estimating the bandwidth of a kernel distribution with methods such as plug-in and smoothed cross-validation.
Any help would be really appreciated!
Here are two ways to do it. They are both fairly complex conceptually, but actually very simple in code.
fhat <- kde(x=unicef, H=H.scv,compute.cont=TRUE)
contour.95 <- with(fhat,contourLines(x=eval.points[[1]],y=eval.points[[2]],
z=estimate,levels=cont["95%"])[[1]])
library(pracma)
with(contour.95,polyarea(x,y))
# [1] -113.677
library(sp)
library(rgeos)
poly <- with(contour.95,data.frame(x,y))
poly <- rbind(poly,poly[1,]) # polygon needs to be closed...
spPoly <- SpatialPolygons(list(Polygons(list(Polygon(poly)),ID=1)))
gArea(spPoly)
# [1] 113.677
Explanation
First, the kde(...) function returns a kde object, which is a list with 9 elements. You can read about this in the documentation, or you can type str(fhat) at the command line, or, if you're using RStudio (highly recommended), you can see this by expanding the fhat object in the Environment tab.
One of the elements is $eval.points, the points at which the kernel density estimates are evaluated. The default is to evaluate at 151 equally spaced points. $eval.points is itself a list of, in your case 2 vectors. So, fhat$eval.points[[1]] represents the points along "Under-5" and fhat$eval.points[[2]] represents the points along "Ave life exp".
Another element is $estimate, which has the z-values for the kernel density, evaluated at every combination of x and y. So $estimate is a 151 X 151 matrix.
If you call kde(...) with compute.cont=TRUE, you get an additional element in the result: $cont, which contains the z-value in $estimate corresponding to every percentile from 1% to 99%.
So, you need to extract the x- and y-values corresponding to the 95% contour, and use that to calculate the area. You would do that as follows:
fhat <- kde(x=unicef, H=H.scv,compute.cont=TRUE)
contour.95 <- with(fhat,contourLines(x=eval.points[[1]],y=eval.points[[2]],
z=estimate,levels=cont["95%"])[[1]])
Now, contour.95 has the x- and y-values corresponding to the 95% contour of fhat. There are (at least) two ways to get the area. One uses the pracma package and calculates
it directly.
library(pracma)
with(contour.95,polyarea(x,y))
# [1] -113.677
The reason for the negative value has to do with the ordering of x and y: polyarea(...) is interpreting the polygon as a "hole", so it has negative area.
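If you just want the magnitude, taking the absolute value (or reversing the vertex order before calling polyarea()) gives the positive area:
with(contour.95, abs(polyarea(x,y)))
# [1] 113.677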
An alternative uses the area calculation routines in rgeos (a GIS package). Unfortunately, this requires you to first turn your coordinates into a "SpatialPolygon" object, which is a bit of a bear. Nevertheless, it is also straightforward.
library(sp)
library(rgeos)
poly <- with(contour.95,data.frame(x,y))
poly <- rbind(poly,poly[1,]) # polygon needs to be closed...
spPoly <- SpatialPolygons(list(Polygons(list(Polygon(poly)),ID=1)))
gArea(spPoly)
# [1] 113.677
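A side note, not part of the original answer: rgeos has since been archived on CRAN, so if it is unavailable the same area can be obtained with the sf package. A minimal sketch, reusing the closed poly data frame built above:
library(sf)
sfPoly <- st_polygon(list(as.matrix(poly)))   # 'poly' is the closed polygon above
st_area(sfPoly)                               # should match the value above (~113.7)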
Another method would be to use the contourSizes() function within the ks package. I've also been interested in using this package to compare both 2D and 3D space use in ecology, but I wasn't sure how to extract the 2D density estimates. I tested this method by estimating the area of an "animal" which was limited to the area of a circle with a known radius. Below is the code:
set.seed(123)
require(GEOmap)   # need this library for the inpoly function
require(ks)
# Create a data frame centered at coordinates 0,0
data = data.frame(x=0,y=0)
# Create a vector of radians from 0 to 2*pi for making a circle to
# test the area
circle = seq(0,2*pi,length=100)
# Select a radius for your circle
radius = 10
# Create a buffer for when you simulate points (this will be more clear below)
buffer = radius+2
# Simulate x and y coordinates from uniform distribution and combine
# values into a dataframe
createPointsX = runif(1000,min = data$x-buffer, max = data$x+buffer)
createPointsY = runif(1000,min = data$y-buffer, max = data$y+buffer)
data1 = data.frame(x=createPointsX,y=createPointsY)
# Plot the raw data
plot(data1$x,data1$y)
# Calculate the coordinates used to create a circle with center 0,0 and
# with radius specified above
coords = as.data.frame(t(rbind(data$x+sin(circle)*radius,
data$y+cos(circle)*radius)))
names(coords) = c("x","y")
# Add circle to plot with red line
lines(coords$x,coords$y,col=2,lwd=2)
# Use the inpoly function to calculate whether points lie within
# the circle or not.
inp = inpoly(data1$x, data1$y, coords)
data1 = data1[inp == 1,]
# Finally add points that lie within the circle as blue filled dots
points(data1$x,data1$y,pch=19,col="blue")
# Area of the circle (known area)
pi * radius^2
#[1] 314.1593
# Sub in your own data here to calculate 95% homerange or 50% core area usage
H.pi = Hpi(data1,binned=T)
fhat = kde(data1,H=H.pi)
ct1 = contourSizes(fhat, cont = 95, approx=TRUE)
# Compare the known area of the circle to the 95% contour size
ct1
# 5%
# 291.466
I've also tried creating 2 unconnected circles and testing the contourSizes() function, and it seems to work really well on disjoint distributions.

Visualization of high-dimensional points in R

I have a centroid, e.g. A, and 100 other points. All of these points are high-dimensional, e.g. 1000 dimensions. Is there a way to visualize these points in a two-dimensional space in terms of their distance from A?
A common (though simple) way to visualize high-dimensional points in low dimensional space is to use some form of multi-dimensional scaling:
dat <- matrix(runif(1000*99),99,1000)
#Combine with "special" point
dat <- rbind(rep(0.1,1000),dat)
out <- cmdscale(dist(dat),k = 2)
#Plot everything, highlighting our "special" point
plot(out)
points(out[1,1],out[1,2],col = "red")
You can also check out isoMDS or sammon in the MASS package for other implementations in R.
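A hedged one-liner for those alternatives (both work from the same distance matrix as cmdscale above):
library(MASS)
out2 <- sammon(dist(dat), k = 2)$points   # or isoMDS(dist(dat), k = 2)$points
plot(out2)
points(out2[1,1], out2[1,2], col = "red")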
The distance (by which I assume you mean the norm of the difference vector) is only 1 value, so you can calculate these norms and show them on a 1D plot, but for 2D you'll need a second parameter.
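A minimal sketch of that idea, reusing the dat matrix from the answer above (row 1 plays the role of the centroid A):
A <- dat[1, ]                             # the "special" point / centroid
d <- apply(dat[-1, ], 1, function(p) sqrt(sum((p - A)^2)))
stripchart(d, method = "jitter", pch = 1, xlab = "Euclidean distance to A")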
