R: group XY point clusters, convert to density & save

Goal: change a set of point clusters into a density distribution.
Specifics: the point clusters are well separated, and I'm interested in the density value of each sampling site (by count). I've been converting the counts by hand, and an algorithm to allocate points into densities would be invaluable.
I'm not sure how to go about doing this and am very open to creative input!
Here's what the entire dataset looks like:
> head(markers)
x y
1 -494.5768 300.6698
2 -494.4280 300.7582
3 -494.5812 300.8424
4 -494.4000 300.9146
5 -494.8554 300.9102
6 -494.8038 300.9974
https://www.dropbox.com/s/ewcggnp3p29vhjh/datapoints.csv
I'd like to get an output in this format
x y density
1 6 1 0.0
2 7 1 17.6
3 8 1 11.2
4 12 1 14.4
5 13 1 0.0
6 14 1 8.0
7 14 2 0.0
etc
except that the x y values would be on the scale of the real data, like -494.5768, rather than small integers.
I think it'd have to do something along the lines of the steps below (a rough code sketch follows the list):
calculate distances between all point combinations
group the rows that have distances under a set threshold
subset/split clusters with plyr
find the average XY coordinates of the cluster
assign length(cluster) to the XY point.
recombine all the rows
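A rough base-R sketch of those steps (my own assumption of how it could look, not a tested solution): since the clusters are well separated, single-linkage hierarchical clustering cut at a distance threshold should recover them, and the point count per cluster becomes the density. Here markers is the data frame from datapoints.csv and threshold is an assumed separation value to tune.

threshold <- 0.5                                   # assumed value; tune to the data
hc  <- hclust(dist(markers[, c("x", "y")]), method = "single")
grp <- cutree(hc, h = threshold)                   # cluster id for every point

# one row per cluster: mean x/y of the cluster plus the point count as density
dens <- aggregate(markers[, c("x", "y")], by = list(cluster = grp), FUN = mean)
dens$density <- as.vector(table(grp))
head(dens)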

Related

Determining whether a set of points falls inside a polygon: point.in.polygon in R not working

Given a bunch of 2D points and a polygon, I want to evaluate which points are on the boundary of the polygon, and which are strictly inside/outside of the polygon.
The 2D points are:
> grp2
x2 y2
1 -5.233762 1.6213203
2 -1.107843 -7.9349705
3 4.918313 8.9073019
4 7.109651 -3.9571781
5 7.304966 -4.3280168
6 6.080564 -3.5817545
7 8.382685 0.4638735
8 6.812215 6.1610483
9 -4.773094 -3.4260797
10 -3.269638 1.1299852
and the vertices of the polygon are:
> dfC
px py
1 7.304966 -4.3280167
2 8.382685 0.4638735
3 6.812215 6.1610483
4 5.854366 7.5499780
5 2.385478 7.0895268
6 -5.233762 1.6213203
7 -4.773094 -3.4260797
8 -1.107843 -7.9349705
A plot of the points and the polygon (not reproduced here) makes the situation clear.
Clearly, there are 3 points inside the polygon, 1 point outside and 6 points on the edge (as is also evident from the data points).
Now I am using point.in.polygon to estimate this. According to the documentation of package sp, this should return 'integer array; values are: 0: point is strictly exterior to pol; 1: point is strictly interior to pol; 2: point lies on the relative interior of an edge of pol; 3: point is a vertex of pol.'
But my code is not being able to detect the points which are vertices of the polygon:
> point.in.polygon(grp2$x2,grp2$y2,dfC$px,dfC$py)
[1] 0 0 0 1 0 1 0 0 0 1
How can I resolve this problem?
The points are not exactly equal. For example, grp2$x2[1] == -5.23376158438623 while dfC$px[6] == -5.23376157160271. As the comments suggest, you will have more luck if you round the values:
grp3 <- round(grp2, 3)
dfC3 <- round(dfC, 3)
point.in.polygon(grp3$x2,grp3$y2,dfC3$px,dfC3$py)
# [1] 3 3 0 1 3 1 3 3 3 1
Now
grp3[1, ]
# x2 y2
# 1 -5.234 1.621
dfC3[6, ]
# px py
# 6 -5.234 1.621
Changing the number of decimals to 4 or 5 gives the same results as 3. For floating-point numbers to compare equal with ==, they must match exactly, here over all 14 decimal places.
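For illustration (all.equal() is my addition here, not part of the original answer), the two values quoted above show the behaviour directly:

a <- -5.23376158438623       # grp2$x2[1]
b <- -5.23376157160271       # dfC$px[6]
a == b                       # FALSE: the doubles differ in the low-order digits
round(a, 3) == round(b, 3)   # TRUE once both are rounded to 3 decimals
isTRUE(all.equal(a, b))      # TRUE: all.equal() compares with a numeric tolerance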

Is there any way to convert RGB/CIElab values into an image in R?

I have a data frame that contains pixel coordinates and the RGB and CIElab values for each pixel.
These values were extracted from an image. After changing some RGB/CIElab values in the data frame, I would like to turn the data back into an image.
I include a sample with variables r, g, b, x, and y: r, g, and b contain the RGB values of each pixel, and x and y indicate the pixel's coordinates.
So basically, I would like to create a picture with three colour channels (RGB) from this data frame, but I have no idea how to implement the process. Extracting RGB values from an image is easy; reversing the process is what I'm stuck on.
r g b x y
1 0.91373 0.72157 0.45098 1 1
2 0.86275 0.59216 0.21961 2 1
3 0.84314 0.56471 0.18039 3 1
4 0.83922 0.56078 0.17647 4 1
5 0.84314 0.56471 0.18039 5 1
6 0.84706 0.56863 0.18431 6 1
7 0.85098 0.57255 0.18824 7 1
8 0.85490 0.57647 0.19216 8 1
9 0.85490 0.57647 0.19216 9 1
10 0.85098 0.57255 0.18824 10 1
Update:
I tried the as.cimg function (from the imager package):
my_cimg <- as.cimg(unlist(rgb_image[1:3]), x=length(unique(rgb_image$x)), y=length(unique(rgb_image$y)),cc = 3)
And it works!!!
Thanks!
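For completeness, a sketch of the full round trip with the imager package (which provides as.cimg); it assumes the rows are ordered with x varying fastest, as in the sample, and the output filename is just an example:

library(imager)

w <- length(unique(rgb_image$x))
h <- length(unique(rgb_image$y))

# unlist() concatenates the r, g and b columns, matching cimg's channel-last layout
my_cimg <- as.cimg(unlist(rgb_image[c("r", "g", "b")]), x = w, y = h, cc = 3)

plot(my_cimg)                               # display the reconstructed image
imager::save.image(my_cimg, "output.png")   # write it back to disk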

Segmenting a data frame by row based on previous rows' values

I have a data frame in R that contains 2 columns named x and y (co-ordinates). The data frame represents a journey, with each row giving the position at the next point in time.
x y seconds
1 0.0 0.0 0
2 -5.8 -8.5 1
3 -11.6 -18.2 2
4 -16.9 -30.1 3
5 -22.8 -40.8 4
6 -29.0 -51.6 5
I need to break the journey up into segments where each segment starts once the distance from the start of the previous segment crosses a certain threshold (e.g. 200).
I have recently switched from using SAS to R, and this is the first time I've come across anything I can do easily in SAS but can't even think of the way to approach the problem in R.
I've posted the SAS code I would use below to do the same job. It creates a new column called segment.
%let cutoff=200;
data segments;
  set journey;
  retain segment distance x_start y_start;
  if _n_=1 then do;
    x_start=x;
    y_start=y;
    segment=1;
    distance=0;
  end;
  distance + sqrt((x-x_start)**2+(y-y_start)**2);
  if distance>&cutoff then do;
    x_start=x;
    y_start=y;
    segment+1;
    distance=0;
  end;
  keep x y seconds segment;
run;
Edit: Example output
If the cutoff were 200 then an example of required output would look something like...
x y seconds segment
1 0.0 0.0 0 1
2 40.0 30.0 1 1
3 80.0 60.0 2 1
4 120.0 90.0 3 1
5 160.0 120.0 4 2
6 120.0 150.0 5 2
7 80.0 180.0 6 2
8 40.0 210.0 7 2
9 0.0 240.0 8 3
If your data set is dd, something like
cutoff <- 200
origin <- dd[1, c("x","y")]        # start of the current segment
cur.seg <- 1
dd$segment <- NA
for (i in 1:nrow(dd)) {
  dist <- sqrt(sum((dd[i, c("x","y")] - origin)^2))
  if (dist > cutoff) {             # crossed the threshold: start a new segment here
    cur.seg <- cur.seg + 1
    origin <- dd[i, c("x","y")]
  }
  dd$segment[i] <- cur.seg
}
should work. There are some refinements: it might be more efficient to compute the distances from the current origin to all rows at once and then use which(dist > cutoff)[1] to jump straight to the first row beyond the cutoff, and it would be interesting to try a completely vectorized solution, but this should be OK. How big is your data set?
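As a sketch of that refinement (my own rough code, not the original answer's): compute the distances from the current origin to all rows in one go and jump straight to the first row beyond the cutoff.

cutoff <- 200
dd$segment <- NA
seg <- 1
start <- 1
while (start <= nrow(dd)) {
  d <- sqrt((dd$x - dd$x[start])^2 + (dd$y - dd$y[start])^2)
  nxt <- which(d > cutoff & seq_len(nrow(dd)) > start)[1]  # first row past the cutoff
  end <- if (is.na(nxt)) nrow(dd) else nxt - 1
  dd$segment[start:end] <- seg          # rows up to the crossing stay in this segment
  seg <- seg + 1
  start <- end + 1                      # the crossing row starts the next segment
}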

Filter between threshold

I am working with a large dataset and I am trying to first identify clusters of values that meet specific threshold values. My aim then is to only keep clusters of a minimum length. Below is some example data and my progress thus far:
Test = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B")
Sequence = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
Value = c(3,2,3,4,3,4,4,5,5,2,2,4,5,6,4,4,6,2,3,2)
Data <- data.frame(Test, Sequence, Value)
Using package evd, I have identified clusters of values >3
C1 <- clusters(Data$Value, u = 3, r = 1, cmax = F, plot = T)
Which produces
C1
$cluster1
4
4
$cluster2
6 7 8 9
4 4 5 5
$cluster3
12 13 14 15 16 17
4 5 6 4 4 6
My problem is twofold:
1) I don't know how to relate this back to the original dataframe (for example to Test A & B)
2) How can I only keep clusters with a minimum size of 3 (thus excluding Cluster 1)
I have looked into various filtering options etc. however they do not cluster data according to a desired threshold, with no options for the minimum size of the cluster either.
Any help is much appreciated.
Q1 (relating the clusters back to the original data frame): have a look at Carl Witthoft's answer to "detect intervals of the consequent integer sequences". He wrote seqle(), a variant of rle() that looks for runs of consecutive integers rather than repetitions. A small base-R sketch along similar lines is included after the cluster output below.
Q2: only keep clusters of certain length:
C1[sapply(C1, length) >= 3]
yields the 2 clusters that are long enough:
$cluster2
6 7 8 9
4 4 5 5
$cluster3
12 13 14 15 16 17
4 5 6 4 4 6
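For Q1, a minimal base-R sketch (my own, along the lines of the seqle()/rle() idea above) that labels the runs directly in the original data frame and keeps only runs of at least 3 values above the threshold; it assumes the rows are ordered by Test, as in the example:

min_len <- 3
Data$cluster <- unlist(lapply(split(Data$Value > 3, Data$Test), function(z) {
  r  <- rle(z)
  ok <- r$values & r$lengths >= min_len   # runs above the threshold and long enough
  rep(cumsum(ok) * ok, r$lengths)         # run id within each Test, 0 otherwise
}))
result <- Data[Data$cluster > 0, ]        # rows belonging to the qualifying clusters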

approx() without duplicates?

I am using approx() to interpolate values.
x <- 1:20
y <- c(3,8,2,6,8,2,4,7,9,9,1,3,1,9,6,2,8,7,6,2)
df <- cbind.data.frame(x,y)
> df
x y
1 1 3
2 2 8
3 3 2
4 4 6
5 5 8
6 6 2
7 7 4
8 8 7
9 9 9
10 10 9
11 11 1
12 12 3
13 13 1
14 14 9
15 15 6
16 16 2
17 17 8
18 18 7
19 19 6
20 20 2
interpolated <- approx(x=df$x, y=df$y, method="linear", n=5)
gets me this:
interpolated
$x
[1] 1.00 5.75 10.50 15.25 20.00
$y
[1] 3.0 3.5 5.0 5.0 2.0
Now the first and last values are duplicates of my real data. Is there any way to prevent this, or is it something I don't understand properly about approx()?
You may want to specify xout to avoid this. For instance, if you want to always exclude the first and the last points, here's how you can do that:
specify_xout <- function(x, n) {
  seq(from = min(x), to = max(x), length.out = n + 2)[-c(1, n + 2)]
}
plot(df$x, df$y)
points(approx(df$x, df$y, xout=specify_xout(df$x, 5)), pch = "*", col = "red")
It does not prevent an existing point from being reproduced exactly somewhere in the middle (which is exactly what happens in the plot produced by the code above).
approx will fit through all your original data points if you give it a chance (change n=5 to xout=df$x to see this). Interpolation generates y values for unobserved values of x, but it should agree with the observed y wherever the x value has already been observed.
The method="linear" setup is going to 'draw' linear segments joining up your original coordinates exactly (and so will give the y values you input to it for integer x). You only observe 'new' y values because your n=5 means that for points other than the beginning and end the x is not an integer (and therefore not one of your input values), and so gets interpolated.
If you want observed values not to be exactly reproduced, then maybe add some noise via rnorm()?
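A quick check of that point (values taken from the example above): asking approx() for the observed x values reproduces the original y exactly, while non-integer x values give genuinely interpolated y.

approx(df$x, df$y, xout = df$x)$y                  # identical to df$y
approx(df$x, df$y, xout = c(1.5, 5.75, 10.5))$y    # interpolated values only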
