I am plotting a raster like this one:
library(terra)
set.seed(1)
rr <- rast(matrix(rnorm(400, 1.5, 1), nrow=20, ncol=20))
plot(rr)
What I would ideally like is to add points to the plot where the raster values are above a threshold, the points I would like are seen here:
plot(clamp(rr, lower=3))
In my mind, something like this is possible (but in reality, this didn't work)
plot(rr)
points(clamp(rr, lower=3))
or maybe
plot(rr)
points(rr[rr>3])
obviously my examples don't work... anyone got an idea?
thank you,
m
You can do
library(terra)
set.seed(1)
rr <- rast(matrix(rnorm(400, 1.5, 1), nrow=20, ncol=20))
x <- clamp(rr, lower=3, values=FALSE)
plot(rr)
points(as.points(x))
lines(x)
## or polygons
# polys(as.polygons(x), dens=10)
So while lines(x) works, points(x) does not (this has now been fixed in the development version).
For this to work, the cells you do not want points for must be NA. Hence the values=FALSE in clamp.
To show different values (or ranges) for each point you can use plot(..., add=TRUE)
p <- as.points(x)
plot(rr, plg=list(shrink=.3))
plot(p, "lyr.1", add=T, legend=T)
Related
I need to create something like a spider chart in R without using any libraries. That’s my code for now. It creates a figure with points number equal to the length of vector ‘a’. However, I’d like each point to be at the distance from the coordinates center equal to a respective number in a vector, for example one point at a distance 1, another at 2, so on. Is it possible to do so?
a <- 1:6
angle <- seq(0, 2*pi, (2*pi)/length(a))
x <- cos(angle)
y <- sin(angle)
plot(x, y,
type = "l")
See ?stars:
a <- 1:6
stars(matrix(a, nrow=1), scale=FALSE)
For future reference, using R's built-in help search would have found this with ??spider
I have a bunch of points in 2D space and have calculated a convex hull for them. I would now like to "tighten" the hull so that it no longer necessarily encompasses all points. In the typical nails-in-board-with-rubber-band analogy, what I'd like to achieve is to be able to tune the elasticity of the rubber band and allow nails to bend at pressure above some limit. That's just an analogy, there is no real physics here. This would kind-of be related to the reduction in hull area if a given point was removed, but not quite because there could be two points that are very close to each-other. This is not necessarily related to outlier detection, because you could imagine a pattern where a large fractions of the nails would bend if they are on a narrow line (imagine a hammer shape for example). All of this has to be reasonably fast for thousands of points. Any hints where I should look in terms of algorithms? An implementation in R would be perfect, but not needed.
EDIT AFTER COMMENT: The three points I've labelled are those with largest potential for reducing the hull area if they are excluded. In the plot there is no other set of three points that would result in a larger area reduction. A naive implementation of what I'm looking for would maybe be to randomly sample some fraction of the points, calculate the hull area, remove each point on the hull iteratively, recalculate the area, repeat many times and remove points that tend to lead to high area reduction. Maybe this could be implemented in some random forest variant? It's not quite right though, because I would like the points to be removed one by one so that you get the following result. If you looked at all points in one go it would possibly be best to trim from the edges of the "hammer head".
Suppose I have a set of points like this:
set.seed(69)
x <- runif(20)
y <- runif(20)
plot(x, y)
Then it is easy to find the subset points that sit on the convex hull by doing:
ss <- chull(x, y)
This means we can plot the convex hull by doing:
lines(x[c(ss, ss[1])], y[c(ss, ss[1])], col = "red")
Now we can randomly remove one of the points that sits on the convex hull (i.e. "bend a nail") by doing:
bend <- ss[sample(ss, 1)]
x <- x[-bend]
y <- y[-bend]
And we can then repeat the process of finding the convex hull of this new set of points:
ss <- chull(x, y)
lines(x[c(ss, ss[1])], y[c(ss, ss[1])], col = "blue", lty = 2)
To get the point which will, on removal, cause the greatest reduction in area, one option would be the following function:
library(sp)
shrink <- function(coords)
{
ss <- chull(coords[, 1], coords[, 2])
outlier <- ss[which.min(sapply(seq_along(ss),
function(i) Polygon(coords[ss[-i], ], hole = FALSE)#area))]
coords[-outlier, ]
}
So you could do something like:
coords <- cbind(x, y)
new_coords <- shrink(coords)
new_chull <- new_coords[chull(new_coords[, 1], new_coords[, 2]),]
new_chull <- rbind(new_chull, new_chull[1,])
plot(x, y)
lines(new_chull[,1], new_chull[, 2], col = "red")
Of course, you could do this in a loop so that new_coords is fed back into shrink multiple times.
Calculate a robust center and variance using mcd.cov in MASS and the mahalanobis distance of each point from it (using mahalanobis in psych). We then show a quantile plot of the mahalanobis distances using PlotMD from modi and also show the associated outliers in red in the second plot. (There are other functions in modi that may be of interest as well.)
library(MASS)
library(modi)
library(psych)
set.seed(69)
x <- runif(20)
y <- runif(20)
m <- cbind(x, y)
mcd <- cov.mcd(m)
md <- mahalanobis(m, mcd$center, mcd$cov)
stats <- PlotMD(md, 2, alpha = 0.90)
giving:
(continued after screenshot)
and we show the convex hull using lines and the outliers in red:
plot(m)
ix <- chull(m)
lines(m[c(ix, ix[1]), ])
wx <- which(md > stats$halpha)
points(m[wx, ], col = "red", pch = 20)
Thank you both! I've tried various methods for outlier detection, but it's not quite what I was looking for. They have worked badly due to weird shapes of my clusters. I know I talked about convex hull area, but I think filtering on segment lengths yields better results and is closer to what I really wanted. Then it would look something like this:
shrink <- function(xy, max_length = 30){
to_keep <- 1:(dim(xy)[1])
centroid <- c(mean(xy[,1]), mean(xy[,2]))
while (TRUE){
ss <- chull(xy[,1], xy[,2])
ss <- c(ss, ss[1])
lengths <- sapply(1:(length(ss)-1), function(i) sum((xy[ss[i+1],] - xy[ss[i],])^2))
# This gets the point with the longest convex hull segment. chull returns points
# in clockwise order, so the point to remove is either this one or the one
# after it. Remove the one furthest from the centroid.
max_point <- which.max(lengths)
if (lengths[max_point] < max_length) return(to_keep)
if (sum((xy[ss[max_point],] - centroid)^2) > sum((xy[ss[max_point + 1],] - centroid)^2)){
xy <- xy[-ss[max_point],]
to_keep <- to_keep[-ss[max_point]]
}else{
xy <- xy[-ss[max_point + 1],]
to_keep <- to_keep[-ss[max_point + 1]]
}
}
}
It's not optimal because it factors in the distance to the centroid, which I would have liked to avoid, and there is a max_length parameter that should be calculated from the data instead of being hard-coded.
No filter:
It looks like this because there are 500 000 cells in here, and there are many that end up "wrong" when projecting from ~20 000 dimensions to 2.
Filter:
Note that it filters out points at tips of some clusters. This is less-than-optimal but ok. The overlap between some clusters is true and should be there.
I have a 2-dimensional data set.
I use the R's smooth.spline function to smooth my points graph following an example in this article:
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.smooth.spline.html
So that I get the spline graph similar to the green line on this picture
I'd like to know the X values, where the first derivative of the smoothing spline equals zero (to determine exact minimum or maximum).
My problem is that my initial dataset (or a dataset that I could auto-generate) to feed into the predict() function does not contain such exact X values that correspond to the smoothing spline extrema.
How can I find such X values?
Here is the picture of the first derivative of the green spline line above
But exact X coordinate of extremums are still not exact.
My approximate R script to generate the pictures looks like the following
sp1 <- smooth.spline(df)
pred.prime <- predict(sp1, deriv=1)
pred.second <- predict(sp1, deriv=2)
d1 <- data.frame(pred.prime)
d2 <- data.frame(pred.second)
dfMinimums <- d1[abs(d1$y) < 1e-4, c('x','y')]
I think that there are two problems here.
You are using the original x-values and they are spaced too far apart AND
Because of the wide spacing of the x's, your threshold for where you consider the derivative "close enough" to zero is too high.
Here is basically your code but with many more x values and requiring smaller derivatives. Since you do not provide any data, I made a coarse approximation to it that should suffice for illustration.
## Coarse approximation of your data
x = runif(300, 0,45000)
y = sin(x/5000) + sin(x/950)/4 + rnorm(300, 0,0.05)
df = data.frame(x,y)
sp1 <- smooth.spline(df)
Spline code
Sx = seq(0,45000,10)
pred.spline <- predict(sp1, Sx)
d0 <- data.frame(pred.spline)
pred.prime <- predict(sp1, Sx, deriv=1)
d1 <- data.frame(pred.prime)
Mins = which(abs(d1$y) < mean(abs(d1$y))/150)
plot(df, pch=20, col="navy")
lines(sp1, col="darkgreen")
points(d0[Mins,], pch=20, col="red")
The extrema look pretty good.
plot(d1, type="l")
points(d1[Mins,], pch=20, col="red")
The points identified look like zeros of the derivative.
You can use my R package SplinesUtils: https://github.com/ZheyuanLi/SplinesUtils, which can be installed by
devtools::install_github("ZheyuanLi/SplinesUtils")
The function to be used are SmoothSplinesAsPiecePoly and solve. I will just use the example under the documentation.
library(SplinesUtils)
## a toy dataset
set.seed(0)
x <- 1:100 + runif(100, -0.1, 0.1)
y <- poly(x, 9) %*% rnorm(9)
y <- y + rnorm(length(y), 0, 0.2 * sd(y))
## fit a smoothing spline
sm <- smooth.spline(x, y)
## coerce "smooth.spline" object to "PiecePoly" object
oo <- SmoothSplineAsPiecePoly(sm)
## plot the spline
plot(oo)
## find all stationary / saddle points
xs <- solve(oo, deriv = 1)
#[1] 3.791103 15.957159 21.918534 23.034192 25.958486 39.799999 58.627431
#[8] 74.583000 87.049227 96.544430
## predict the "PiecePoly" at stationary / saddle points
ys <- predict(oo, xs)
#[1] -0.92224176 0.38751847 0.09951236 0.10764884 0.05960727 0.52068566
#[7] -0.51029209 0.15989592 -0.36464409 0.63471723
points(xs, ys, pch = 19)
One caveat in the #G5W implementation that I found is that it sometimes returns multiple records close around extrema instead of a single one. On the diagram they cannot be seen, since they all fall into one point effectively.
The following snippet from here filters out single extrema points with the minimum value of the first derivative:
library(tidyverse)
df2 <- df %>%
group_by(round(y, 4)) %>%
filter(abs(d1) == min(abs(d1))) %>%
ungroup() %>%
select(-5)
I have a grid of rectangles, whose coordinates are stored in the variable say, 'gridPoints' as shown below:
gridData.Grid=GridTopology(c(min(data$LATITUDE),min(data$LONGITUDE)),c(0.005,0.005),c(32,32));
gridPoints = as.data.frame(coordinates(gridData.Grid))[1:1000,];
names(gridPoints) = c("LATITUDE","LONGITUDE");
plot(gridPoints,col=4);
points(data,col=2);
When plotted, these are the black points in the image,
Now, I have another data set of points called say , 'data', which when plotted are the blue points above.
I would want a count of how many blue points fall within each rectangle in the grid. Each rectangle can be represented by the center of the rectangle, along with the corresponding count of blue points within it in the output. Also, if the blue point lies on any of the sides of the rectangle, it can be considered as lying within the rectangle while making the count. The plot has the blue and black points looking like circles, but they are just standard points/coordinates and hence, much smaller than the circles. In a special case, the rectangle can also be a square.
Try this,
x <- seq(0,10,by=2)
y <- seq(0, 30, by=10)
grid <- expand.grid(x, y)
N <- 100
points <- cbind(runif(N, 0, 10), runif(N, 0, 30))
plot(grid, t="n", xaxs="i", yaxs="i")
points(points, col="blue", pch="+")
abline(v=x, h=y)
binxy <- data.frame(x=findInterval(points[,1], x),
y=findInterval(points[,2], y))
(results <- table(binxy))
d <- as.data.frame.table(results)
xx <- x[-length(x)] + 0.5*diff(x)
d$x <- xx[d$x]
yy <- y[-length(y)] + 0.5*diff(y)
d$y <- yy[d$y]
with(d, text(x, y, label=Freq))
A more general approach (may be overkill for this case, but if you generalize to arbitrary polygons it will still work) is to use the over function in the sp package. This will find which polygon each point is contained in (then you can count them up).
You will need to do some conversions up front (to spatial objects) but this method will work with more complicated polygons than rectangles.
If all the rectangles are exactly the same size, then you could use k nearest neighbor techniques using the centers of the rectangles, see the knn and knn1 functions in the class package.
I feel like this can not be too hard. I know hclust() and cutree() but how do I obtain the coordinates of the centroids where no points distance from it is higher than a given radius? I know that points within range of the centroid may be already belong to a centroid not within range. I am fine with that.
set.seed(1)
data <- matrix(runif(100),ncol=2)
plot(data)
dclust <- hclust(dist(data),method="centroid")
cutree(dclust,h=0.1)
cutree(...,h=0.1) will already fail as the height of dclust is not ordered.
Using your data and running kmeans with 25 groups produces the following results. Is this what you are getting at?
Example <- kmeans(data, 25)
plot(data, type="n")
text(Example$centers, unlist(dimnames(Example$centers)), col="red")
text(data, as.character(Example$cluster), cex=.75)
cdist <- sqrt((data[,1] - Example$centers[Example$cluster, 1])^2 +
(data[, 2] - Example$centers[Example$cluster, 2])^2)
names(cdist) <- 1:50
cdist
The last three lines compute and display the distance of each point to the centroid to which it has been assigned.