Finding and labelling candidates/ouliers outside a curve in R plot - r

I am stuck in simple problem. I have a scatter plot.
I am plotted confidence lines around it using my a custom formula. Now, i just want only the names outside the cutoff lines to be displayed nothing inside. But, I can't figure out how to subset my data on the based of the line co-ordinates.
The line is plotted using the lines function which is a vector of 128 x and y values. Now, how do I subset my data (x,y points) based on these 2 values. I can apply a static limit of a single number of sub-setting data like 1,2 or 3 but how to use a vector to subset data, got me stuck.
For an reproducible example, consider :
df=data.frame(x=seq(2,16,by=2),y=seq(2,16,by=2),lab=paste("label",seq(2,16,by=2),sep=''))
plot(df[,1],df[,2])
# adding lines
lines(seq(1,15),seq(15,1),lwd=1, lty=2)
# adding labels
text(df[,1],df[,2],labels=df[,3],pos=3,col="red",cex=0.75)
Now, I need just the labels, which are outside or intersecting the line.
What I was trying to subset my dataframe with the values used for the lines, but I cant make it right.
Now, static sub-setting can be done for single values like
df[which(df[,1]>8 & df[,2]>8),] but how to do it for whole list.
I also tried sapply, to cycle over all the values of x and y used for lines on the df iteratively, but most values become +ve for a limit but false for other values.
Thanks

I will speak about your initial volcano-type-graph problem and not the made up one because they are totally different.
So I really thought this a lot and I believe I reached a solid conclusion. There are two options:
1. You know the equations of the lines, which would be really easy to work with.
2. You do not know the equation of the lines which means we need to work with an approximation.
Some geometry:
The function shows the equation of a line. For a given pair of coordinates (x, y), if y > the right hand side of the equation when you pass x in, then the point is above the line else below the line. The same concept stands if you have a curve (as in your case).
If you have the equations then it is easy to do the above in my code below and you are set. If not you need to make an approximation to the curve. To do that you will need the following code:
df=data.frame(x=seq(2,16,by=2),y=seq(2,16,by=2),lab=paste("label",seq(2,16,by=2),sep=''))
make_vector <- function(df) {
lab <- vector()
for (i in 1:nrow(df)) {
this_row <- df[i,] #this will contain the three elements per row
if ( (this_row[1] < max(line1x) & this_row[2] > max(line1y) & this_row[2] < a + b*this_row[1])
|
(this_row[1] > min(line2x) & this_row[2] > max(line2y) & this_row[2] > a + b*this_row[1]) ) {
lab[i] <- this_row[3]
} else {
lab[i] <- <NA>
}
}
return(lab)
}
#this_row[1] = your x
#this_row[2] = your y
#this_row[3] = your label
df$labels <- make_vector(df)
plot(df[,1],df[,2])
# adding lines
lines(seq(1,15),seq(15,1),lwd=1, lty=2)
# adding labels
text(df[,1],df[,2],labels=df[,4],pos=3,col="red",cex=0.75)
The important bit is the function. Imagine that you have df as you created it with x,y and labs. You also will have a vector with the x,y coordinates for line1 and x,y coordinates for line2.
Let's see the condition of line1 only (the same exists for line 2 which is implemented on the code above):
this_row[1] < max(line1x) & this_row[2] > max(line1y) & this_row[2] < a + b*this_row[1]
#translates to:
#this_row[1] < max(line1x) = your x needs to be less than the max x (vertical line in graph below
#this_row[2] > max(line1y) = your y needs to be greater than the max y (horizontal line in graph below
#this_row[2] < a + b*this_row[1] = your y needs to be less than the right hand side of the equation (to have a point above i.e. left of the line)
#check below what the line is
This will make something like the below graph (this is a bit horrible and also magnified but it is just a reference. Visualize it approximating your lines):
The above code would pick all the points in the area above the triangle and within the y=1 and x=1 lines.
Finally the equation:
Having 2 points' coordinates you can figure out a line's equation solving a system of two equations and 2 parameters a and b. (y = a +bx by replacing y,x for each point)
The 2 points to pick are the two points closest to the tangent of the first line (line1). Chose those arbitrarily according to your data. The closest to the tangent the better. Just plot the spots and eyeball.
Having done all the above you have your points with your labels (approximately at least).
And that is the only thing you can do!
Long talk but hope it helps.
P.S. I haven't tested the code because I have no data.

Related

R - draw line between two specific values in plot in R

I need to draw a line between two specific values from a plot in R. That's what I want. If it is possible to draw a line between those two consecutive values which the difference between values is higher than 3. Else, draw it knowing the values from the dataset. Also, I would like to add a number under or above the line. Thanks.
Here the link where you can find the image "ImageR.png"
https://www.dropbox.com/sh/blnr3jvius8f3eh/AACOhqyzZGiDHAOPmyE__873a?dl=0
Something like this should do it. You may have to play with pos and offset in text to get it to look good on your data.
x <- rnorm(20, sd=3)
d <- diff(x)
plot(x)
for (i in which(d>3)) {
lines(c(i,i+1), x[i:(i+1)])
text(i+.5, mean(x[i:(i+1)]), round(d[i],1), pos=2)
}

Find correct 2D translation of a subset of coordinates

I have a problem I wish to solve in R with example data below. I know this must have been solved many times but I have not been able to find a solution that works for me in R.
The core of what I want to do is to find how to translate a set of 2D coordinates to best fit into an other, larger, set of 2D coordinates. Imagine for example having a Polaroid photo of a small piece of the starry sky with you out at night, and you want to hold it up in a position so they match the stars' current positions.
Here is how to generate data similar to my real problem:
# create reference points (the "starry sky")
set.seed(99)
ref_coords = data.frame(x = runif(50,0,100), y = runif(50,0,100))
# generate points take subset of coordinates to serve as points we
# are looking for ("the Polaroid")
my_coords_final = ref_coords[c(5,12,15,24,31,34,48,49),]
# add a little bit of variation as compared to reference points
# (data should very similar, but have a little bit of noise)
set.seed(100)
my_coords_final$x = my_coords_final$x+rnorm(8,0,.1)
set.seed(101)
my_coords_final$y = my_coords_final$y+rnorm(8,0,.1)
# create "start values" by, e.g., translating the points we are
# looking for to start at (0,0)
my_coords_start =apply(my_coords_final,2,function(x) x-min(x))
# Plot of example data, goal is to find the dotted vector that
# corresponds to the translation needed
plot(ref_coords, cex = 1.2) # "Starry sky"
points(my_coords_start,pch=20, col = "red") # start position of "Polaroid"
points(my_coords_final,pch=20, col = "blue") # corrected position of "Polaroid"
segments(my_coords_start[1,1],my_coords_start[1,2],
my_coords_final[1,1],my_coords_final[1,2],lty="dotted")
Plotting the data as above should yield:
The result I want is basically what the dotted line in the plot above represents, i.e. a delta in x and y that I could apply to the start coordinates to move them to their correct position in the reference grid.
Details about the real data
There should be close to no rotational or scaling difference between my points and the reference points.
My real data is around 1000 reference points and up to a few hundred points to search (could use less if more efficient)
I expect to have to search about 10 to 20 sets of reference points to find my match, as many of the reference sets will not contain my points.
Thank you for your time, I'd really appreciate any input!
EDIT: To clarify, the right plot represent the reference data. The left plot represents the points that I want to translate across the reference data in order to find a position where they best match the reference. That position, in this case, is represented by the blue dots in the previous figure.
Finally, any working strategy must not use the data in my_coords_final, but rather reproduce that set of coordinates starting from my_coords_start using ref_coords.
So, the previous approach I posted (see edit history) using optim() to minimize the sum of distances between points will only work in the limited circumstance where the point distribution used as reference data is in the middle of the point field. The solution that satisfies the question and seems to still be workable for a few thousand points, would be a brute-force delta and comparison algorithm that calculates the differences between each point in the field against a single point of the reference data and then determines how many of the rest of the reference data are within a minimum threshold (which is needed to account for the noise in the data):
## A brute-force approach where min_dist can be used to
## ameliorate some random noise:
min_dist <- 5
win_thresh <- 0
win_thresh_old <- 0
for(i in 1:nrow(ref_coords)) {
x2 <- my_coords_start[,1]
y2 <- my_coords_start[,2]
x1 <- ref_coords[,1] + (x2[1] - ref_coords[i,1])
y1 <- ref_coords[,2] + (y2[1] - ref_coords[i,2])
## Calculate all pairwise distances between reference and field data:
dists <- dist( cbind( c(x1, x2), c(y1, y2) ), "euclidean")
## Only take distances for the sampled data:
dists <- as.matrix(dists)[-1*1:length(x1),]
## Calculate the number of distances within the minimum
## distance threshold minus the diagonal portion:
win_thresh <- sum(rowSums(dists < min_dist) > 1)
## If we have more "matches" than our best then calculate a new
## dx and dy:
if (win_thresh > win_thresh_old) {
win_thresh_old <- win_thresh
dx <- (x2[1] - ref_coords[i,1])
dy <- (y2[1] - ref_coords[i,2])
}
}
## Plot estimated correction (your delta x and delta y) calculated
## from the brute force calculation of shifts:
points(
x=ref_coords[,1] + dx,
y=ref_coords[,2] + dy,
cex=1.5, col = "red"
)
I'm very interested to know if there's anyone that solves this in a more efficient manner for the number of points in the test data, possibly using a statistical or optimization algorithm.

How to count line segment occurrences by pixel in R?

I am trying to convey the concentration of lines in 2D space by showing the number of crossings through each pixel in a grid. I am picturing something similar to a density plot, but with more intuitive units. I was drawn to the spatstat package and its line segment class (psp) as it allows you to define line segments by their end points and incorporate the entire line in calculations. However, I'm struggling to find the right combination of functions to tally these counts and would appreciate any suggestions.
As shown in the example below with 50 lines, the density function produces values in (0,140), the pixellate function tallies the total length through each pixel and takes values in (0, 0.04), and as.mask produces a binary indictor of whether a line went through each pixel. I'm hoping to see something where the scale takes integer values, say 0..10.
require(spatstat)
set.seed(1234)
numLines = 50
# define line segments
L = psp(runif(numLines),runif(numLines),runif(numLines),runif(numLines), window=owin())
# image with 2-dimensional kernel density estimate
D = density.psp(L, sigma=0.03)
# image with total length of lines through each pixel
P = pixellate.psp(L)
# binary mask giving whether a line went through a pixel
B = as.mask.psp(L)
par(mfrow=c(2,2), mar=c(2,2,2,2))
plot(L, main="L")
plot(D, main="density.psp(L)")
plot(P, main="pixellate.psp(L)")
plot(B, main="as.mask.psp(L)")
The pixellate.psp function allows you to optionally specify weights to use in the calculation. I considered trying to manipulate this to normalize the pixels to take a count of one for each crossing, but the weight is applied uniquely to each line (and not specific to the line/pixel pair). I also considered calculating a binary mask for each line and adding the results, but it seems like there should be an easier way. I know that you can sample points along a line, and then do a count of the points by pixel. However, I am concerned about getting the sampling right so that there is one and only one point per line crossing of a pixel.
Is there is a straight-forward way to do this in R? Otherwise would this be an appropriate suggestion for a future package enhancement? Is this more easily accomplished in another language such as python or matlab?
The example above and my testing has been with spatstat 1.40-0, R 3.1.2, on x86_64-w64-mingw32.
You are absolutely right that this is something to put in as a future enhancement. It will be done in one of the next versions of spatstat. It will probably be an option in pixellate.psp to count the number of crossing lines rather than measure the total length.
For now you have to do something a bit convoluted as e.g:
require(spatstat)
set.seed(1234)
numLines = 50
# define line segments
L <- psp(runif(numLines),runif(numLines),runif(numLines),runif(numLines), window=owin())
# split into individual lines and use as.mask.psp on each
masklist <- lapply(1:nsegments(L), function(i) as.mask.psp(L[i]))
# convert to 0-1 image for easy addition
imlist <- lapply(masklist, as.im.owin, na.replace = 0)
rslt <- Reduce("+", imlist)
# plot
plot(rslt, main = "")

R Find the point of intersection of two lines connecting 4 coordinates

I apologize in advance if my code looks very amateurish.
I'm trying to assign quadrants to 4 measurement stations approximately located on the edges of a town.
I have the coordinates of these 4 stations:
a <- c(13.2975,52.6556)
b <- c(14.0083,52.5583)
c <- c(13.3722,52.3997)
d <- c(12.7417,52.6917)
Now my idea was to create lines connecting the north-south and east-west stations:
line.1 <- matrix(c(d[1],b[1],d[2],b[2]),ncol=2)
line.2 <- matrix(c(a[1],c[1],a[2],c[2]),ncol=2)
Plotting all the stations the connecting lines looks allright, however not very helpful for analyzing it on a computer.
So I calculated the eucledian vectors for the two lines:
vec.1 <- as.vector(c((b[1]-d[1]),(b[2]-d[2])))
vec.2 <- as.vector(c((c[1]-a[1]),(c[2]-a[2])))
which allowed me to calculate the angle between the two lines in degrees:
alpha <- acos((vec.1%*%vec.2) / (sqrt(vec.1[1]^2+vec.1[2]^2)*
sqrt(vec.2[1]^2+vec.2[2]^2)))) * 180/pi
The angle I get for alpha is 67.7146°. This looks fairly good. From this angle I can easily calculate the other 3 angles of the intersection, however I need values relative to the grid so I can assign values from 0°-360° for the wind directions.
Now my next planned step was to find the point where the two lines intersect, add a horizontal and vertical abline through that point and then calculate the angle relative to the grid. However I can't find a proper example that does that and I don't think I have a nice linear equation system I could solve.
Is my code way off? Or maybe anyone knows of a package which could help me? It feels like my whole approach is a bit wrong.
Okay I managed to calculate the intersection point, using line equations. Here is how.
The basic equation for two points is like this:
y - y_1 = (y_2-y_1/x_2-x_1) * (x-x_1)
If you make one for each of the two lines, you can just substitute the fractions.
k.1 <- ((c[2]-a[2])/(c[1]-a[1]))
k.2 <- ((b[2]-d[2])/(b[1]-d[1]))
Reshaping the two functions you get a final form for y:
y <- (((-k.1/k.2)*d[2]+k.1*d[1]-k.1*c[1]+d[2])/(1-k.1/k.2))
This one you can now use to calculate the x-value:
x <- ((y-d[2])+d[1]*k.2)/k.2
In my case I get
y = 52.62319
x = 13.3922
I'm starting to really enjoy this program!
Wikipedia has a good article on finding the intersection between two line segments with an explicit formula. However, you don't need to know the point of intersection to calculate the angle to the grid (or axes of coordinate system.) Just compute the angles from your vec.1 and vec.2 to the basis vectors:
e1 <- c(1, 0)
e2 <- c(0, 1)
as you have done.

How to generate a list of segments from a list of random self-intercepting lines (psp in R)?

I'm using X=rpoisline(4) to generate lines and plot them with plot(X).
With X$ends I have their coordinates and their intersection points with selfcrossing.psp(X) (In R with spatstat : library(spatstat)).
I need to get a list of segments and their coordinates and be able to manipulate them (change their orientation, position, intersection...). Those segments have to be defined by the intersection of a line with an other line and with the window.
So, am I missing a simple way to convert a psp of few intersecting lines in a psp of non intersecting segments (I hope it's clear) ?
If you have a non-simple way, I'm interested to !
Thanks for your time !
edit :
Here are the lines I have :
And here are the kind of random stuff I think I can produce if I manage to handle each segments (one by one). So I need to get a list of segments from my list of random lines.
Ok, several coffeebreaks later, here's some buggy code that does what you want. The cleanup I'll leave to you.
ranpoly <- function(numsegs=10,plotit=TRUE) {
require(spatstat)
# temp fix: put the first seg into segset. Later make it a constrained random.
segset<-psp(c(0,1,1,0,.25),c(0,0,1,1,0),c(1,1,0,0,1),c(0,1,1,0,.75),owin(c(0,1),c(0,1)) ) #frame the frame
for (jj in 1: numsegs) {
# randomly select a segment to start from, a point on the seg, the slope,and direction
# later... watch for slopes that immediately exit the frame
endx <-sample(c(-0.2,1.2),1) #force 'x1' outside the frame
# watch that sample() gotcha
if(segset$n<=5) sampset <- c(5,5) else sampset<-5:segset$n
startseg<-sample(sampset,1) #don't select a frame segment
# this is slope of segment to be constructed
slope <- tan(runif(1)*2*pi-pi) # range +/- Inf
# get length of selected segment
seglen<-lengths.psp(segset)[startseg]
startcut <- runif(1)
# grab the coords of starting point (similar triangles)
startx<- segset$ends$x0[startseg] + (segset$ends$x1[startseg]-segset$ends$x0[startseg])*startcut #seglen
starty<- segset$ends$y0[startseg] + (segset$ends$y1[startseg]-segset$ends$y0[startseg])*startcut #seglen
# make a psp object with that startpoint and slope; will adjust it after finding intersections
endy <- starty + slope*(endx-startx)
newpsp<-psp(startx,starty,endx,endy,segset$window,check=FALSE)
# don't calc crossing for current element of segset
hits <- crossing.psp(segset[-startseg],newpsp)
segdist <- dist(cbind(c(startx,hits$x),c(starty,hits$y)))
# dig back to get the crosspoint desired -- have to get matrixlike object out of class "dist" object
# And, as.matrix puts a zero in location 1,1 kill that row.
cutx <- hits$x[ which.min( as.matrix(segdist)[-1,1] )]
cuty <- hits$y[which.min(as.matrix(segdist)[-1,1] )]
segset <- superimpose(segset,psp(startx,starty,cutx,cuty,segset$window))
} #end jj loop
if(plotit) plot(segset,col=rainbow(numsegs))
return(invisible(segset))
}
The spatstat function selfcut.psp is designed for exactly this purpose.
Y <- selfcut.psp(X)
For further information about manipulating line segment patterns, see section 4.4 in the spatstat book.

Resources