Selecting overlapping points on a plot - r

I have two matrices, built as follows:
X1=cbind(V1,V2,ID)
X2=cbind(V1,V2,ID)
X3=rbind(X1,X2)
ID takes only the values "red" and "blue".
When I plot X1 and X2, I get the following plot:
I want to select the data points that are within 1 unit (Euclidean distance) of each other, i.e. keep only the red points that overlap or nearly overlap a blue point, and vice versa.
Red overlapping red and blue overlapping blue are not of interest to me.
Thanks a lot for your assistance.

You definitely need to provide a reproducible example for this one to get the best answer; however, I think the script below will serve the purpose:
library(spatstat)
# setting seeds
set.seed(222)
# two different point patterns
X <- runifpoint(15)
Y <- runifpoint(20)
plot(X, pch=19, main="")
plot(Y, col="red", pch=19, add=TRUE)
# nncross can return both the index ("which") and the distance ("dist") of the nearest neighbour
#N.which <- nncross(X, Y, what="which")
#N.dist <- nncross(X, Y, what="dist")
# keep the points of X whose nearest neighbour in Y is closer than 0.1 (you may change 0.1)
out <- subset(X, nncross(X, Y, what="dist") < 0.1)
plot(out, col="blue", pch=19, add=TRUE)
In the above plot, black points represent X and red points represent Y. Blue points are the points of X that lie within 0.1 units of their nearest neighbour in Y. This threshold can be changed as needed. For more details, see the spatstat documentation for nncross, which computes nearest-neighbour distances between two different point patterns.
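The question asks for matches in both directions (red near blue and blue near red). A minimal base R sketch of that, without spatstat, is shown here. The matrices red and blue are made-up stand-ins for the numeric columns of the asker's X1 and X2, and the threshold of 1 is taken from the question.

```r
set.seed(1)
# Hypothetical stand-ins for the coordinate columns (V1, V2) of X1 and X2
red  <- cbind(V1 = runif(30, 0, 10), V2 = runif(30, 0, 10))
blue <- cbind(V1 = runif(30, 0, 10), V2 = runif(30, 0, 10))

# Cross-distance matrix: d[i, j] is the Euclidean distance from red i to blue j
d <- sqrt(outer(red[, 1], blue[, 1], "-")^2 + outer(red[, 2], blue[, 2], "-")^2)

threshold <- 1
red_near  <- rowSums(d <= threshold) > 0  # red points with a blue point in range
blue_near <- colSums(d <= threshold) > 0  # blue points with a red point in range

# Plot everything, then circle the selected points of both colours
plot(rbind(red, blue), type = "n")
points(red,  col = "red",  pch = 19)
points(blue, col = "blue", pch = 19)
points(red[red_near, , drop = FALSE],   pch = 1, cex = 2)
points(blue[blue_near, , drop = FALSE], pch = 1, cex = 2)
```

The full distance matrix is fine for a few thousand points; for much larger data sets a nearest-neighbour routine such as nncross is likely the better choice.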

Related

Shade area under a curve [duplicate]

This question already has answers here:
Shading a kernel density plot between two points.
(5 answers)
Closed 6 years ago.
I'm trying to shade an area under a curve in R. I can't quite get it right and I'm not sure why. The curve is defined by
# Define the Mean and Stdev
mean=1152
sd=84
# Create x and y to be plotted
# x is a sequence of numbers shifted to the mean with the width of sd.
# The sequence x includes enough values to show +/-3.5 standard deviations in the data set.
# y is a normal distribution for x
x <- seq(-3.5,3.5,length=100)*sd + mean
y <- dnorm(x,mean,sd)
The plot is
# Plot x vs. y as a line graph
plot(x, y, type="l")
The code I'm using to try to color under the curve where x >= 1250 is
polygon(c( x[x>=1250], max(x) ), c(y[x==max(x)], y[x>=1250] ), col="red")
but here's the result I'm getting
How can I correctly color the portion under the curve where x >= 1250?
You need to follow the x,y points of the curve with the polygon, then return along the x-axis (from the maximum x value to the point at x=1250, y=0) to complete the shape. The final vertical edge is drawn automatically, because polygon closes the shape by returning to its start point.
polygon(c(x[x>=1250], max(x), 1250), c(y[x>=1250], 0, 0), col="red")
If, rather than dropping the shading all the way down to the x-axis, you prefer to have it end at the level of the curve, then you can use the following instead. In the example given, however, the curve drops almost to the x-axis, so it's hard to see the difference visually.
polygon(c(x[x>=1250], 1250), c(y[x>=1250], y[x==max(x)]), col="red")
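One detail worth noting: the x grid is unlikely to contain 1250 exactly, so the left edge of the shaded polygon is slightly slanted. The sketch below, which is only one possible variant, prepends the exact cutoff to the shaded x values so that edge becomes vertical. The names mu, sigma and cutoff are introduced here for illustration (they replace mean and sd from the question to avoid masking the base functions).

```r
mu <- 1152; sigma <- 84; cutoff <- 1250

x <- seq(-3.5, 3.5, length.out = 100) * sigma + mu
y <- dnorm(x, mu, sigma)
plot(x, y, type = "l")

# x values of the shaded region, starting exactly at the cutoff
xs <- c(cutoff, x[x >= cutoff])
ys <- dnorm(xs, mu, sigma)

# Follow the curve, then return along the x-axis to close the shape
polygon(c(xs, max(xs), cutoff), c(ys, 0, 0), col = "red")

# The shaded region corresponds to P(X >= cutoff)
upper_tail <- pnorm(cutoff, mu, sigma, lower.tail = FALSE)
```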

Statistical distribution plot with shaded rejection areas

I found this R code online:
stdDev <- 0.75;
x <- seq(-5,5,by=0.01)
y <- dnorm(x,sd=stdDev)
right <- qnorm(0.95,sd=stdDev)
plot(x,y,type="l",xaxt="n",ylab="p",
xlab=expression(paste('Assumed Distribution of ',bar(x))),
axes=FALSE,ylim=c(0,max(y)*1.05),xlim=c(min(x),max(x)),
frame.plot=FALSE)
axis(1,at=c(-5,right,0,5),
pos = c(0,0),
labels=c(expression(' '),expression(bar(x)[cr]),expression(mu[0]),expression('')))
axis(2)
xReject <- seq(right,5,by=0.01)
yReject <- dnorm(xReject,sd=stdDev)
polygon(c(xReject,xReject[length(xReject)],xReject[1]),
c(yReject,0, 0), col='red')
It is doing what I need, which is plotting the normal distribution, and shading a right rejection area according to some number (0.95). What I want to ask is:
How can I change this code to shade a two sided rejection area?
How do I change it for a left side one sided area?
And assume that I want a chi square or F distribution instead, is it enough to just change the dnorm & qnorm commands accordingly?
Another question: In this plot, the plot itself is higher than the y-axis. How do I fix it that the axis matches the height of the plot?
Thank you!
For the two-sided case, you can start with a polygon covering the whole area under the curve and remove the part that is not rejected:
## Calculate the 5th percentile
left <- qnorm(0.05,sd=stdDev)
## x and y for the whole area
xReject <- c(seq(-5,5,by=0.01))
yReject <- dnorm(xReject,sd=stdDev)
## set y = 0 for the area that is not rejected
yReject[xReject > left & xReject < right] <- 0
## Plot the red areas
polygon(c(xReject,xReject[length(xReject)],xReject[1]),
c(yReject,0, 0), col='red')
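Note that cutting at the 5th and 95th percentiles shades 10% of the distribution in total. If the intent is a two-sided test at a total level alpha, the cutoffs would instead be the alpha/2 and 1 - alpha/2 quantiles. The following self-contained sketch assumes alpha = 0.05 and draws the two tails as separate polygons; shade_tail is a hypothetical helper name, not part of the original code.

```r
stdDev <- 0.75
alpha  <- 0.05  # assumed total significance level

x <- seq(-5, 5, by = 0.01)
y <- dnorm(x, sd = stdDev)
left  <- qnorm(alpha / 2, sd = stdDev)
right <- qnorm(1 - alpha / 2, sd = stdDev)

plot(x, y, type = "l", ylab = "p")

# Shade the area under the density between 'from' and 'to'
shade_tail <- function(from, to) {
  xs <- seq(from, to, by = 0.01)
  polygon(c(xs, xs[length(xs)], xs[1]),
          c(dnorm(xs, sd = stdDev), 0, 0), col = "red")
}
shade_tail(-5, left)   # left rejection area
shade_tail(right, 5)   # right rejection area
```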
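For a left-sided rejection area, start from the same full-range xReject and yReject, but set everything to the right of the cutoff to zero before drawing the polygon: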
yReject[xReject > left] <- 0
Almost. For the chi-squared distribution, for example, you have to supply df (degrees of freedom) instead of sd, and xlim has to be changed as well. Apart from that, the code would be the same.
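As a rough illustration of that substitution, here is the right-sided version for a chi-squared distribution. The choice of df = 4 and the x range of 0 to 20 are assumptions made for this sketch only.

```r
df <- 4  # assumed degrees of freedom

# The chi-squared density lives on [0, Inf), so the x range changes
x <- seq(0, 20, by = 0.01)
y <- dchisq(x, df = df)
right <- qchisq(0.95, df = df)

plot(x, y, type = "l", ylab = "p", xlim = c(0, 20))

# Same polygon construction as for the normal case
xReject <- seq(right, 20, by = 0.01)
yReject <- dchisq(xReject, df = df)
polygon(c(xReject, xReject[length(xReject)], xReject[1]),
        c(yReject, 0, 0), col = "red")
```

For an F distribution the same pattern should apply with df and qf, which take two degrees-of-freedom arguments (df1 and df2).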
The line axis(2) draws the y-axis. You can give some extra arguments to have it the way you want. You can try for example:
s <- seq(0,0.55,0.05)
axis(2, at = s, labels = s)
Hope it helps,
alex
Take the polygon calls which shade the right-side rejection area and repeat those lines, substituting the coordinates of the left-side area.
I think this will do it:
left <- qnorm(0.05,sd=stdDev)
xLeject <- seq(left,-5,by=-0.01)
yLeject <- dnorm(xLeject,sd=stdDev)
polygon(c(xLeject,xLeject[length(xLeject)],xLeject[1]),
c(yLeject,0, 0), col='red')
As to graph extent, see plot(..., ylim=c(lower, upper)).

R: Counting points on a grid of rectangles:

I have a grid of rectangles whose coordinates are stored in a variable, say 'gridPoints', as shown below:
gridData.Grid=GridTopology(c(min(data$LATITUDE),min(data$LONGITUDE)),c(0.005,0.005),c(32,32));
gridPoints = as.data.frame(coordinates(gridData.Grid))[1:1000,];
names(gridPoints) = c("LATITUDE","LONGITUDE");
plot(gridPoints,col=4);
points(data,col=2);
When plotted, these are the black points in the image.
Now, I have another data set of points called, say, 'data', which are the blue points in the plot above.
I want a count of how many blue points fall within each rectangle of the grid. In the output, each rectangle can be represented by its center, along with the corresponding count of blue points within it. If a blue point lies on a side of a rectangle, it can be counted as lying within that rectangle. The blue and black points look like circles in the plot, but they are just standard points/coordinates and hence much smaller than the circles. As a special case, a rectangle can also be a square.
Try this,
x <- seq(0,10,by=2)
y <- seq(0, 30, by=10)
grid <- expand.grid(x, y)
N <- 100
points <- cbind(runif(N, 0, 10), runif(N, 0, 30))
plot(grid, t="n", xaxs="i", yaxs="i")
points(points, col="blue", pch="+")
abline(v=x, h=y)
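# bin each point; rightmost.closed=TRUE keeps points on the upper boundary in the last bin,
# and fixing the factor levels keeps empty bins in the table
binxy <- data.frame(x=factor(findInterval(points[,1], x, rightmost.closed=TRUE), levels=seq_len(length(x)-1)),
y=factor(findInterval(points[,2], y, rightmost.closed=TRUE), levels=seq_len(length(y)-1)))
(results <- table(binxy))
d <- as.data.frame.table(results)
# replace the bin indices with the coordinates of the bin centers
xx <- x[-length(x)] + 0.5*diff(x)
d$x <- xx[as.integer(d$x)]
yy <- y[-length(y)] + 0.5*diff(y)
d$y <- yy[as.integer(d$y)]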
with(d, text(x, y, label=Freq))
A more general approach (may be overkill for this case, but if you generalize to arbitrary polygons it will still work) is to use the over function in the sp package. This will find which polygon each point is contained in (then you can count them up).
You will need to do some conversions up front (to spatial objects) but this method will work with more complicated polygons than rectangles.
If all the rectangles are exactly the same size, then you could use k nearest neighbor techniques using the centers of the rectangles, see the knn and knn1 functions in the class package.
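The nearest-neighbour idea can be sketched as follows, assuming the class package is installed (it is a recommended package and usually ships with R). For a grid of congruent rectangles, the rectangle containing a point is the one whose center is nearest, so knn1 can do the assignment. The grid and the points here are made up for illustration.

```r
library(class)

# Hypothetical grid of congruent 2 x 10 rectangles, described by their centers
centers <- as.matrix(expand.grid(x = seq(1, 9, by = 2), y = seq(5, 25, by = 10)))

set.seed(42)
pts <- cbind(runif(100, 0, 10), runif(100, 0, 30))

# Assign every point to its nearest rectangle center
cell <- knn1(train = centers, test = pts, cl = factor(seq_len(nrow(centers))))

# One row per rectangle: center coordinates plus the number of points inside
counts <- cbind(as.data.frame(centers), count = as.vector(table(cell)))
```

A point that lies exactly on a shared edge is equidistant from two centers, so it is assigned to one of the neighbouring rectangles rather than counted in both.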

Defining color of 3D points plot based on distance in R

Suppose I generate some three-dimensional Gaussian samples and I plot these with plot3D. I want to color the points depending on their distance to the center of the cloud. By this I mean that I want to give them a color between white (= far away from the center) and somecolor (very close to the center).
I am aware of functions like colorRamp and colorRampPalette but I'm not sure how to use these in this specific situation. Any help would be appreciated!
Edit: This is what I have so far:
library(rgl)
library(mnormt) # provides rmnorm
#generate two 3D point clouds
cloud1 <- rmnorm(100,mean=c(1,1,1),varcov=diag(.25,3))
cloud2 <- rmnorm(75, mean=c(3,3,3),varcov=diag(.5,3))
plot3d(cloud1,box=FALSE)
points3d(cloud2,col="red")
The resulting plot:
But now I want points that are farther away from the center to be less black/red.
You can try something like this:
cloud1 <- rmnorm(100,mean=c(1,1,1),varcov=diag(.25,3))
# squared Euclidean distance to the centroid (a Mahalanobis distance might be more appropriate)
aux <- colSums((t(cloud1)-colMeans(cloud1))^2)
col1 <- colorRampPalette(c("red", "white"))
# option 1: quantile-based bins
cols1 <- col1(11)[findInterval(aux, quantile(aux, seq(0,1,0.1)), rightmost.closed=TRUE)]
# option 2: equal-width bins (this overwrites cols1 from option 1)
cols1 <- col1(11)[findInterval(aux, seq(min(aux), max(aux), length.out=10))]
plot3d(cloud1, box=FALSE, col=cols1)
HTH
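For the Mahalanobis variant mentioned in the comment, a minimal sketch using the mahalanobis function from base R's stats package could look like this. The cloud is generated with rnorm as a stand-in for cloud1 (independent coordinates with variance 0.25), and the palette size of 100 is an arbitrary choice.

```r
set.seed(1)
# Stand-in for cloud1: 100 points around (1, 1, 1) with sd 0.5 per coordinate
cloud <- matrix(rnorm(300, mean = 1, sd = 0.5), ncol = 3)

# Squared Mahalanobis distance of every point to the sample centroid
d2 <- mahalanobis(cloud, colMeans(cloud), cov(cloud))

# Map distances onto a red-to-white palette: near = red, far = white
pal  <- colorRampPalette(c("red", "white"))(100)
cols <- pal[cut(d2, breaks = 100, labels = FALSE)]

# With rgl loaded, the result can be drawn as:
# plot3d(cloud, col = cols, box = FALSE)
```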

Combine points within given radius in R to a centroid

I feel like this cannot be too hard. I know hclust() and cutree(), but how do I obtain the coordinates of the centroids such that no point's distance from its centroid is greater than a given radius? I know that points within range of one centroid may already belong to another centroid that is not within range. I am fine with that.
set.seed(1)
data <- matrix(runif(100),ncol=2)
plot(data)
dclust <- hclust(dist(data),method="centroid")
cutree(dclust,h=0.1)
cutree(...,h=0.1) already fails, because the merge heights of dclust are not in increasing order.
Using your data and running kmeans with 25 groups produces the following results. Is this what you are getting at?
Example <- kmeans(data, 25)
plot(data, type="n")
text(Example$centers, unlist(dimnames(Example$centers)), col="red")
text(data, as.character(Example$cluster), cex=.75)
cdist <- sqrt((data[,1] - Example$centers[Example$cluster, 1])^2 +
(data[, 2] - Example$centers[Example$cluster, 2])^2)
names(cdist) <- 1:50
cdist
The last three statements compute and display the distance of each point to the centroid to which it has been assigned.
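Note that kmeans does not enforce a radius. One alternative, swapped in here as a sketch rather than taken from the answer above, is complete-linkage clustering: cutting a complete-linkage tree at height h yields groups in which all pairwise distances are at most h, and so every point is at most h away from the mean of its group. The radius of 0.1 comes from the question's cutree call.

```r
set.seed(1)
data <- matrix(runif(100), ncol = 2)
radius <- 0.1

# Complete linkage has monotone merge heights, so cutree(h = ...) works
hc <- hclust(dist(data), method = "complete")
groups <- cutree(hc, h = radius)

# Centroid of each group: per-group column sums divided by group sizes
centroids <- rowsum(data, groups) / as.vector(table(groups))

# Distance of every point to the centroid of its own group
cdist <- sqrt(rowSums((data - centroids[groups, , drop = FALSE])^2))

plot(data)
points(centroids, col = "red", pch = 4)
```

This tends to produce more groups than strictly necessary, since the pairwise-distance condition is stricter than the distance-to-centroid condition.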
