XYZ data to 2D plot in R

I need to make a 2D plot of distance travelled versus my value at that point ("intensity").
My data is formatted as:
lon lat intensity
1. -85.01478 37.99030 -68.3167
2. -85.00752 37.97601 -68.0247
3. -85.00027 37.96172 -67.9565
4. -84.99302 37.94743 -67.8917
and it continues for 282 rows like this. I was looking at a few packages that calculate distance between longitude (lon) and latitude (lat) points (such as geosphere), but I couldn't understand how to get my data into the format that it wanted. I know the total distance travelled in degrees should be 4.01538, evenly spaced out between the 282 points, but I don't know how I could make a column in R with this in mind.
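If the spacing really is even, the simplest way to build such a column is a linear sequence; a minimal sketch, taking the stated total of 4.01538 degrees at face value:

dfrm$dist <- seq(0, 4.01538, length.out = nrow(dfrm))  # nrow(dfrm) == 282

The answers below instead compute the distances from the coordinates, which also works if the spacing is not perfectly even.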

# Cumulative naive (Euclidean, in degrees) distance between successive rows
dfrm$dist <- cumsum(c(0, with(dfrm, sqrt((lat[-1] - lat[-nrow(dfrm)])^2 +
                                         (lon[-1] - lon[-nrow(dfrm)])^2))))
with(dfrm, plot(dist, intensity, type="b"))
Or choose a more "geographic" distance measure with the lagged column values; a sketch of that variant follows below. But given the small increments, I doubt the error from using a naive distance measure can be that much.
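For completeness, a hedged sketch of that "geographic" variant using distHaversine from the geosphere package on the lagged columns (distances come back in metres rather than degrees):

library(geosphere)
n <- nrow(dfrm)
# great-circle distance between each row and the next, then accumulate
seg <- distHaversine(cbind(dfrm$lon[-n], dfrm$lat[-n]),
                     cbind(dfrm$lon[-1], dfrm$lat[-1]))
dfrm$dist_m <- cumsum(c(0, seg))
with(dfrm, plot(dist_m, intensity, type="b"))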

From here I found some packages to calculate distance between coordinates. Assuming your data is called dtf and using the RSEIS package:
dtf <- data.frame(rbind(c(-85.01478, 37.99030, -68.3167),
                        c(-85.00752, 37.97601, -68.0247),
                        c(-85.00027, 37.96172, -67.9565),
                        c(-84.99302, 37.94743, -67.8917)))
names(dtf) <- c('lon', 'lat', 'int')
library(RSEIS)
# great-circle distance (in degrees) and intensity change between point i and i+1
travelint <- function(i, data){
  ddeg <- GreatDist(data$lon[i], data$lat[i], data$lon[i+1], data$lat[i+1])$ddeg
  dint <- data$int[i+1] - data$int[i]
  list(ddeg, dint)
}
out <- sapply(1:(nrow(dtf)-1), travelint, data=dtf)
out <- data.frame(matrix(as.numeric(out), ncol=2, byrow=TRUE))
out$X1 <- cumsum(out$X1)
This will take your data, calculate the distance traveled between points and the intensity change between them. After that it can be plotted like this:
library(ggplot2)
ggplot(out, aes(X1, X2)) + geom_line() +
  labs(x="Distance (Degrees)", y="Intensity Change")
If instead you want increasing intensity, you can use cumsum again to get the cumulative change in intensity and then add it to the first intensity:
out2 <- rbind(c(0,0), out)
out2$X2 <- cumsum(out2$X2) + dtf$int[1]
ggplot(out2, aes(X1, X2)) + geom_line() +
  labs(x="Distance (Degrees)", y="Intensity")

As mentioned by DWin, you can use a naive measure or a geographic distance measure. Here I am using the gdist function from the Imap package, which calculates great-circle distance.
library(Imap)
library(lattice)
# Dummy data
longlat <- read.table(text="lon lat intensity
1. -85.01478 37.99030 -68.3167
2. -85.00752 37.97601 -68.0247
3. -85.00027 37.96172 -67.9565
4. -84.99302 37.94743 -67.8917", header=TRUE)
# great-circle distance (in metres) and intensity change between consecutive points
ll <- lapply(seq(nrow(longlat)-1), function(x){
  start <- longlat[x,]
  end <- longlat[x+1,]
  cbind(distance = gdist(start$lon, start$lat, end$lon, end$lat, units = "m"),
        intensity = end$intensity - start$intensity)
})
dd <- as.data.frame(do.call(rbind, ll))
xyplot(intensity ~ distance, dd, type = c('p','l'), pch = 20, cex = 2)
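Note that, unlike the answers above, this plots the per-segment distance and intensity change rather than cumulative values; if you want cumulative distance travelled, add dd$distance <- cumsum(dd$distance) before plotting.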

Related

R terra calculate area moment of inertia OR how to get (weighted) raster-cell distance from patch-centroid

I'm trying to calculate a measure akin to the moment of inertia using a raster layer, and I am struggling to figure out how to get the distance of each cell to its patch's centroid and then extract both that distance and the cell's value.
I want to calculate the moment of inertia (get the squared distance of each cell to its patch's centroid, multiply by the value of the cell, sum these values by patch, and then divide by the sum of all values per patch). I provide a simplified set-up below. The code creates a simple raster layer, patches clusters of cells, and gets their centroids. I know that the function to use next is probably terra::distance (maybe in combination with terra::zonal?!) -- how do I calculate the distance by patch?
#lonlat
library(terra)
r <- rast(ncols=36, nrows=18, crs="+proj=longlat +datum=WGS84")
r[498:500] <- 1
r[3:6] <- 1
r[111:116] <- 8
r[388:342] <- 1
r[345:349] <- 3
r_patched <- patches(r, directions = 8, allowGaps = F)
testvector <- terra::as.polygons(r_patched, trunc=T, dissolve = T)
p_centr <- geom(centroids(testvector), df=T)
##next steps
#1. get distance of each cell from patch's centroid
#r <- distance(r)
#2. multiply cell value by squared distance to centroid
I think you need to loop over the patches. Something like this:
p_centr <- centroids(testvector)
v <- rep(NA, length(p_centr))
for (i in 1:length(p_centr)) {
  # isolate patch i and trim the raster to its extent
  x <- ifel(r_patched == p_centr$patches[i], i, NA)
  x <- trim(x)
  # distance of each patch cell to the patch centroid
  d <- distance(x, p_centr[i,])
  d <- mask(d, x)
  # square distance and multiply with cell values
  d <- d^2 * crop(r, d)
  v[i] <- global(d, "sum", na.rm=TRUE)[[1]]
}
v / sum(v)
# [1] 1.213209e-05 1.324495e-02 9.864759e-01 2.669833e-04

Best way to cluster long/lat hotspot points in one city in R?

I am new to R and (unsupervised) machine learning. I'm trying to find out the best cluster solution for my data in R.
What is my data about?
I have a dataset with +/- 800 long / lat WGS84 coordinates in one city.
Longitude is in the range 6.90 to 6.95,
latitude is in the range 52.29 to 52.33.
What do I want?
I want to find "hotspots" based on their density. As example: minimum 5 long/lat points in a range of 50 meter. This is a point plot example:
Why do I want this?
For example: let's assume that every single point is a car accident. By clustering the points I hope to see which areas need attention (a minimum of x points within a range of x meters needs attention).
What have I found?
The following clustering algorithms seem possible for my case:
DBscan (https://cran.r-project.org/web/packages/dbscan/dbscan.pdf)
HDBscan(https://cran.r-project.org/web/packages/dbscan/vignettes/hdbscan.html)
OPTICS (https://www.rdocumentation.org/packages/dbscan/versions/0.9-8/topics/optics)
City Clustering Algorithm (https://cran.r-project.org/web/packages/osc/vignettes/paper.pdf)
My questions
What is the best solution or algorithm for my case in R?
Is it true that I have to convert my long/lat to a distance / Haversine matrix first?
I found something interesting at: https://gis.stackexchange.com/questions/64392/finding-clusters-of-points-based-distance-rule-using-r
I changed that code a bit, treating the outlier clusters as the places where a lot happens:
library(sp)         # SpatialPointsDataFrame
library(geosphere)  # distm
library(dplyr)
library(ggplot2)
# 1. Make SpatialPointsDataFrame #
xy <- SpatialPointsDataFrame(
  matrix(c(x,y), ncol=2), data.frame(ID=seq_along(x)),
  proj4string=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84"))
# 2. Use the distm function to generate a geodesic distance matrix #
mdist <- distm(xy)
# 3. Use hierarchical clustering with the complete method #
hc <- hclust(as.dist(mdist), method="complete")
# 4. Show dendrogram #
plot(hc, labels=input$street, xlab="", sub="", cex=0.7)
# 5. Set distance: in my case 300 meters #
d=300
# 6. Define clusters based on a tree "height" cutoff "d" and add them to the SpDataFrame #
xy$clust <- cutree(hc, h=d)
# 7. Add clusters to dataset #
input$cluster <- xy@data[["clust"]]
# 8. Plot clusters #
plot(input$long, input$lat, col=input$cluster, pch=20)
text(input$long, input$lat, labels=input$cluster)
# 9. Count n in cluster #
selection2 <- input %>% count(cluster)
# 10. Make a boxplot #
boxplot(selection2$n)
# 11. Get first outlier #
outlier <- boxplot.stats(selection2$n)$out
outlier <- sort(outlier)
outlier <- as.numeric(outlier[1])
# 12. Filter clusters with counts at or above the outlier threshold #
selectie3 <- as.vector(selection2 %>% filter(n >= outlier) %>% select(cluster))
# 13. Make a new DF with all outlier clusters #
heatclusters <- input %>% filter(cluster %in% selectie3$cluster)
# 14. Plot outlier clusters #
plot(heatclusters$long, heatclusters$lat, col=heatclusters$cluster)
# 15. Plot on a density map (googlemap is a ggmap object created beforehand) #
googlemap + geom_point(aes(x=long, y=lat), data=heatclusters, color="red", size=0.1, shape=".") +
  stat_density2d(data=heatclusters,
                 aes(x=long, y=lat, fill=..level..), alpha=.2, size=0.1,
                 bins=10, geom="polygon") + scale_fill_gradient(low="green", high="red")
I don't know if this is a good solution, but it seems to work. Maybe someone has another suggestion?
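One alternative worth trying, since DBSCAN is on your list: run it on the great-circle distance matrix from step 2, so that eps is given directly in meters. A minimal sketch, assuming mdist and input from above and the "minimum 5 points within 50 meters" rule:

library(dbscan)
db <- dbscan(as.dist(mdist), eps = 50, minPts = 5)  # 5 points within 50 m
input$dbcluster <- db$cluster                       # cluster 0 means noise
plot(input$long, input$lat, col = input$dbcluster + 1, pch = 20)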

Calculating Angles from Spatial Points in R

I am looking at some dispersal data and would like to get the distance between points and also the angle between those points. So far, I have only been able to achieve the first part. Using the teal data from the adehabitatLT package, I have done this:
require("adehabitatLT")
require("sp")
data("teal")
teal <- teal[1:10 ,]
capsd <- SpatialPointsDataFrame(coords = SpatialPoints(coords =
teal[, c("x","y")], proj4string = CRS("+proj=longlat +datum=WGS84
+ellps=WGS84 +towgs84=0,0,0")), data=teal)
capdistance <- as.data.frame(pointDistance(capsd))
The capdistance is a 10x10 dataframe displaying the distances between the first 10 points of the teal dataset.
Does anyone know how I would calculate the angle between these points to create a similar matrix to the capdistance data.frame? I have searched, but so far I have not found anything that would calculate the angle between two set locations. Any help would be greatly appreciated.
EDIT
So I have been looking around and it would seem that the bearing function from the geosphere package would be useful for this, but I am still (at least) a step away from working this all the way through:
require("geosphere")
capbearing1 <- bearing(capsd[1:10 ,], capsd[1 ,])
capbearing2 <- bearing(capsd[1:10 ,], capsd[2 ,])
I could repeat this ten times to get ten vectors, each giving the bearings from all ten points (itself and the nine others) to one of the points; however, I would really like a single call that returns all ten at once as a matrix (a sketch follows below); again, any help is very appreciated.
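For what it's worth, a minimal sketch that wraps the calls above into the full 10 x 10 matrix (column i holds the bearings from every point to point i):

capbearing <- sapply(1:10, function(i) bearing(capsd[1:10, ], capsd[i, ]))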
cygps gave some good code if you are using UTM coordinates in a single zone and have a limited number of data points, so try that out if those conditions fit your data.
foo <- function(df) {
  # all pairwise combinations of the coordinates (Cartesian product via merge)
  x1 <- x2 <- df$x
  y1 <- y2 <- df$y
  Xpair <- merge(x1, x2)
  names(Xpair) <- c("x1", "x2")
  Ypair <- merge(y1, y2)
  names(Ypair) <- c("y1", "y2")
  # pairwise distances and displacement components (trailing NA as in as.ltraj)
  dist <- c(sqrt((Xpair$x1 - Xpair$x2)^2 + (Ypair$y1 - Ypair$y2)^2), NA)
  dx <- c(Xpair$x1 - Xpair$x2, NA)
  dy <- c(Ypair$y1 - Ypair$y2, NA)
  # absolute angle, undefined for (near-)coincident points
  abs.angle <- ifelse(dist < 1e-07, NA, atan2(dy, dx))
  list(dist, abs.angle)
}
I adapted this function from adehabitatLT::as.ltraj. It produces your distances and absolute angles between all pairs of points (assuming you wanted those, not time-ordered distances and angles); a usage sketch follows below.
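A hedged usage sketch with the teal points from the question (foo appends a trailing NA in the style of as.ltraj, which is dropped here before reshaping the 100 pairwise values into 10 x 10 matrices):

res <- foo(teal[1:10, c("x", "y")])
dist.mat  <- matrix(head(res[[1]], -1), nrow = 10)
angle.mat <- matrix(head(res[[2]], -1), nrow = 10)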

how to get point set (x,y) in a desired area in R

The figure shows a plot of the x,y pairs from an Excel file, 8760 pairs in total. I want to remove the noisy data pairs in the red circled area and output a new Excel file with the remaining pairs. How can I do this in R?
Using @G5W's example:
Make up data:
set.seed(2017)
x = runif(8760, 0,16)
y = c(abs(rnorm(8000, 0, 1)), runif(760,0,8))
XY = data.frame(x,y)
Fit a quantile regression to the 90th percentile:
library(quantreg)
library(splines)
qq <- rq(y~ns(x,20),tau=0.9,data=XY)
Compute and draw the predicted curve:
xvec <- seq(0,16,length.out=101)
pp <- predict(qq,newdata=data.frame(x=xvec))
plot(y~x,data=XY)
lines(xvec,pp,col=2,lwd=2)
Keep only points below the predicted line:
XY2 <- subset(XY,y<predict(qq,newdata=data.frame(x)))
plot(y~x,data=XY2)
lines(xvec,pp,col=2,lwd=2)
You can make the line less wiggly by lowering the number of knots, e.g. y~ns(x,10)
Both R and EXCEL read and write .csv files, so you can use those to transfer the data back and forth.
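A minimal sketch of that round trip (the file names are just placeholders): export the sheet from Excel as CSV, then

XY <- read.csv("points.csv")                      # read the exported x,y pairs
write.csv(XY2, "cleaned.csv", row.names = FALSE)  # write the kept pairs back out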
You do not provide any data so I made some junk data to produce a similar problem.
DATA
set.seed(2017)
x = runif(8760, 0,16)
y = c(abs(rnorm(8000, 0, 1)), runif(760,0,8))
XY = data.frame(x,y)
One way to identify noise points is by looking at the distance to the nearest neighbors. In dense areas, nearest neighbors will be closer; in non-dense areas, they will be further apart. The dbscan package provides a nice function to get the distance to the k nearest neighbors. For this problem I used k=6, but you may need to tune this for your data. Looking at the distribution of distances to the 6th nearest neighbor, we see that most points have 6 neighbors within a distance of 0.2:
library(dbscan)
XY6 = kNNdist(XY, 6, all=TRUE)  # distances to the 1st..6th nearest neighbors
plot(density(XY6[,6]))
So I will assume that points whose 6th nearest neighbor is further away than that are noise points. Changing just the color to see which points are affected, we get:
TYPE = rep(1,8760)
TYPE[XY6[,6] > 0.2] = 2
plot(XY, col=TYPE)
Of course, if you wish to restrict to the non-noise points, you can use
NonNoise = XY[XY6[,6] <= 0.2,]

Fast Fourier Transform in R. What am I doing wrong?

I am a non-expert in Fourier analysis and don't quite get what R's function fft() does. Even after reading a lot about it, I couldn't figure it out.
I built an example.
require(ggplot2)
freq <- 200 #sample frequency in Hz
duration <- 3 # length of signal in seconds
#arbitrary sine wave
x <- seq(-4*pi,4*pi, length.out = freq*duration)
y <- sin(0.25*x) + sin(0.5*x) + sin(x)
which looks like:
fourier <- fft(y)
# frequency "amounts" and the associated frequency bins
amo <- Mod(fourier)
freqvec <- 1:length(amo)
I ASSUME that fft expects a vector recorded over a timespan of 1 second, so I divide by the timespan
freqvec <- freqvec/duration
#and put this into a data.frame
df <- data.frame(freq = freqvec, ammount = amo)
Now I PRESUMABLY can/have to omit the second half of the data.frame, since the frequency "amounts" are only meaningful up to half of the sampling rate, due to Nyquist.
df <- df[(1:as.integer(0.5*freq*duration)),]
For plotting I discretize a bit
df.disc <- data.frame(freq = 1:100)
cum.amo <- numeric(100)
for (i in 1:100){
cum.amo[i] <- sum(df$ammount[c(3*i-2,3*i-1,3*i)])
}
df.disc$ammount <- cum.amo
The plotting code for the first 20 frequencies:
df.disc$freq <- as.factor(df.disc$freq)
ggplot(df.disc[1:20,], aes(x=freq, y=ammount)) + geom_bar(stat = "identity")
The result:
Is this really a correct spectrum of the above function? Are my two assumptions correct? Where is my mistake? If there is none, what does this plot tell me?
EDIT:
Here is a picture without discretization:
THANKS to all of you,
Micha.
Okay, okay. Given the rather basic nature of my mistake, the solution is quite trivial.
I wrote freq = 200 and duration = 3, but the real duration runs from -4*pi to 4*pi, hence 8*pi, resulting in a "real" sample frequency of 1/((8*pi)/600) = 23.87324, which does not equal 200.
Replacing the respective lines in the example code with
freq <- 200 #sample frequency in Hz
duration <- 6 # length of signal in seconds
x <- seq(0,duration, length.out = freq*duration)
y <- sin(4*pi*x) + sin(6*pi*x) + sin(8*pi*x)
(with a more illustrative function) yields the correct frequencies as demonstrated by the following plot (restricted to the important part of the frequency domain):
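As a hedged check of the corrected example: bin k of fft(y) corresponds to (k-1)/duration Hz, so with a frequency axis starting at 0 the one-sided amplitude spectrum should peak at 2, 3 and 4 Hz (sin(4*pi*x) completes two cycles per second):

amo2 <- Mod(fft(y))
freq2 <- (seq_along(amo2) - 1) / duration  # frequencies in Hz, DC bin at 0
plot(freq2[1:40], amo2[1:40], type = "h",
     xlab = "Frequency (Hz)", ylab = "Amplitude")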
