Grouping Set of Points to a Pre Defined Point - r

I'm looking to create a model that classifies a set of points that are near a pre-defined point.
For example, let's say I have points:
X    Y
1    1
1    2
1    3
2    1
2    3
3    1
3    2
3    3
6    6
8    7
8    5
9    3
10   7
My goal is to identify which points are closest to the predefined point (2,2) and ideally output which points those are.
I tried using KNN, but I could not figure out how to get the KNN model to focus on the points near (2,2). Any guidance on how I might accomplish this would be awesome. :)
Plot of Points
df <- data.frame( x = c(1,1,1,2,2,2,3,3,3,6,8,8,9,10), y = c(1,2,3,1,2,3,1,2,3,6,7,5,3,7))
df
goal_point <- c(x=2,y=2)
goal_point

You might approach this by calculating distance from goal as a feature.
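# Euclidean distance from each point to the goal point
df$dist <- sqrt((df$x - goal_point[["x"]])^2 +
                (df$y - goal_point[["y"]])^2)
set.seed(1) # kmeans starts from random centers, so fix the seed
df$clust <- kmeans(df[, c("x", "y", "dist")], 2)$cluster
library(ggplot2)
ggplot(df, aes(x, y, color = factor(clust))) + # factor() gives discrete colors
  geom_point()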
In this case kmeans is using x, y, and distance from goal. You could also use just distance from goal by using df$clust = kmeans(df[,3], 2)$cluster, which would lead here to the same clustering.
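If the goal is simply to list the points nearest to (2,2), clustering may not be needed at all. The following is a minimal sketch that filters on the distance column directly; the cutoff `radius` and the name `near` are assumptions made for illustration, not part of the original answer.

```r
df <- data.frame(x = c(1,1,1,2,2,2,3,3,3,6,8,8,9,10),
                 y = c(1,2,3,1,2,3,1,2,3,6,7,5,3,7))
goal_point <- c(x = 2, y = 2)

# Euclidean distance from every point to the goal point
df$dist <- sqrt((df$x - goal_point[["x"]])^2 +
                (df$y - goal_point[["y"]])^2)

radius <- 1.5                      # hypothetical cutoff; tune to your data
near <- df[df$dist <= radius, ]    # keep only points within the cutoff
near <- near[order(near$dist), ]   # closest points first
print(near)
```

To take a fixed number of neighbours instead of a radius, `head(df[order(df$dist), ], k)` would return the `k` closest rows.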

Related

How to interpolate a single point where line crosses a baseline between two points [duplicate]

I am new to R, and I am trying to find an automated way to determine the x-coordinate at which the line between two points crosses a baseline (in this case 75; see the dotted line in the image linked below). Once the x value is found, I would like to add it to the vector of x values, and add the corresponding y value (which is always the baseline value) to the vector of y values. In other words, a function should check every pair of consecutive input points for a line segment that crosses the baseline and, where one exists, add the crossing coordinates to the output x and y vectors. Any help would be most appreciated, especially with automating this across all x,y coordinates.
https://i.stack.imgur.com/UPehz.jpg
baseline = 75
X <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(75,53,37,25,95,35,50,75,75,75)
Edit: added creation of combined data frame with original data + crossing points.
Adapted from another answer related to two intersecting series with uniform X spacing.
baseline = 75
X <- c(1,2,3,4,5,6,7,8,9,10)
Y1 <- rep(baseline, 10)
Y2 <- c(75,53,37,25,95,35,50,75,75,75)
# Find points where Y1 is above Y2.
above <- Y1 > Y2
# The lines cross wherever `above` flips between TRUE and FALSE
intersect.points <- which(diff(above) != 0)
# Find the slopes for each line segment.
Y2.slopes <- (Y2[intersect.points+1] - Y2[intersect.points]) /
  (X[intersect.points+1] - X[intersect.points])
Y1.slopes <- rep(0, length(Y2.slopes))
# Find the intersection for each segment (indexing X keeps this valid when X is not 1, 2, 3, ...)
X.points <- X[intersect.points] + ((Y2[intersect.points] - Y1[intersect.points]) / (Y1.slopes - Y2.slopes))
Y.points <- Y1[intersect.points] + (Y1.slopes * (X.points - X[intersect.points]))
# Plot the data first so the y-axis covers its full range, then the baseline.
plot(X, Y2, type='l', col='red')
lines(X, Y1, lty=2)
points(X.points, Y.points, col='blue')
library(dplyr)
combined <- bind_rows( # combine rows from...
tibble(X, Y2), # table of original, plus
tibble(X = X.points,
Y2 = Y.points)) %>% # table of interpolations
distinct() %>% # and drop any repeated rows
arrange(X) # and sort by X
> combined
# A tibble: 12 x 2
       X    Y2
   <dbl> <dbl>
 1  1       75
 2  2       53
 3  3       37
 4  4       25
 5  4.71    75
 6  5       95
 7  5.33    75
 8  6       35
 9  7       50
10  8       75
11  9       75
12 10       75
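The same linear interpolation can be wrapped in a small base R function with no dplyr dependency. This is a sketch under assumptions: the name `add_crossings` is hypothetical, and it only inserts a point where the series strictly changes sides of the baseline, so points already lying on the baseline are left as they are.

```r
add_crossings <- function(x, y, baseline) {
  d <- y - baseline
  # Segments i -> i+1 whose endpoints lie strictly on opposite sides of the baseline
  i <- which(d[-length(d)] * d[-1] < 0)
  # Linear interpolation of x at y == baseline within each crossing segment
  x_cross <- x[i] + (baseline - y[i]) * (x[i + 1] - x[i]) / (y[i + 1] - y[i])
  out <- data.frame(x = c(x, x_cross),
                    y = c(y, rep(baseline, length(x_cross))))
  out <- out[order(out$x), ]       # sort by x so the new points slot into place
  rownames(out) <- NULL
  out
}

X <- c(1,2,3,4,5,6,7,8,9,10)
Y <- c(75,53,37,25,95,35,50,75,75,75)
res <- add_crossings(X, Y, baseline = 75)
print(res)
```

Because it uses the actual `x` values rather than their positions, the function should also work when the x spacing is not uniform.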

Color the individuals of a R PCoA plot by groups

This should be a simple question, but I haven't found exactly how to do it so far.
I have a matrix as follows:
sample var1 var2 var3 etc.
1 5 7 3 1
2 0 1 6 8
3 7 6 8 9
4 5 3 2 4
I performed a PCoA using Vegan and plotted the results. Now my problem is that I want to color the samples according to a pre-defined group:
group sample
1 1
1 2
2 3
2 4
How can I import the groups and then plot the points colored according to the group they belong to? It looks simple but I have been scratching my head over this.
Thanks!
Seb
You said you used vegan PCoA, which I assume means the wcmdscale function. By default, vegan::wcmdscale returns only a scores matrix, just like the standard stats::cmdscale, but if you add certain arguments (such as eig = TRUE) you get a full wcmdscale result object with dedicated plot and points methods, and you can do:
plot(<pcoa-result>, type="n") # no reproducible example: edit as needed
points(<pcoa-result>, col = group) # no reproducible example: group must be visible
If you have a modern vegan (2.5.x) the following also works:
library(magrittr)
plot(<full-pcoa-result>, type = "n") %>% points("sites", col = group)
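Since the question has no reproducible example, here is a self-contained sketch of the same idea using only base R. It swaps in stats::cmdscale with Euclidean distances in place of vegan::wcmdscale, and the object names (`mat`, `groups`, `grp`, `cols`) and the colors are assumptions for illustration. The key step is matching the group table to the rows of the data before indexing a color vector.

```r
# Sample-by-variable matrix from the question (the "etc." column is named var4 here)
mat <- matrix(c(5, 7, 3, 1,
                0, 1, 6, 8,
                7, 6, 8, 9,
                5, 3, 2, 4),
              nrow = 4, byrow = TRUE,
              dimnames = list(1:4, paste0("var", 1:4)))

# Group table, as it might be read with read.table() or read.csv()
groups <- data.frame(group = c(1, 1, 2, 2), sample = c(1, 2, 3, 4))

# Align groups with the matrix rows, in case the two are ordered differently
grp <- factor(groups$group[match(rownames(mat), groups$sample)])

# Classical PCoA on Euclidean distances (assumption: use your own dissimilarity)
pcoa <- cmdscale(dist(mat), k = 2)

# One color per group level, indexed by each sample's group
cols <- c("steelblue", "tomato")[as.integer(grp)]
plot(pcoa, col = cols, pch = 19, xlab = "PCoA 1", ylab = "PCoA 2")
legend("topright", legend = levels(grp), col = c("steelblue", "tomato"), pch = 19)
```

With a vegan result object, the same `cols` vector could be passed as `col` to the points method shown above.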

Create heatmap in R using stat_density2d

I have several (x,y) coordinates, and each one is associated with a binary value (either 1 or 0). I want to create a heatmap showing what the probability is at each point that a given point in that location will have a 1 associated with it.
Sample data:
data = read.table(header=TRUE,
text="x y value
7 3 0
4 5 0
3 7 1
3 6 0
4 5 1
5 6 0")
And so on. I can create a plot showing where the points are concentrated using the following:
ggplot(data, aes(x=x,y=y)) + stat_density2d(aes(fill=..level..), geom="polygon")
But when I try to set fill = value, I get the following error:
Error in unit(tic_pos.c, "mm") : 'x' and 'units' must have length > 0
How do I do this?
Edit: I should add that I can easily accomplish this using stat_summary2d or even geom_tile, but it looks much more boxy and less smooth, which I want it to be.
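Note that stat_density2d estimates where points are concentrated, not the probability that a point has value 1. One possible way to get a smooth probability surface is a kernel-weighted average of `value` (a Nadaraya-Watson smoother), sketched below in base R. This is a swapped-in technique, not something stat_density2d does; the function name `smooth_prob`, the bandwidth `h`, and the grid size are all assumptions to tune.

```r
data <- read.table(header = TRUE, text = "x y value
7 3 0
4 5 0
3 7 1
3 6 0
4 5 1
5 6 0")

# Kernel-weighted mean of `value` at each grid node (hypothetical helper)
smooth_prob <- function(x, y, value, gx, gy, h = 1) {
  grid <- expand.grid(x = gx, y = gy)
  # Gaussian weight of every data point (columns) at every grid node (rows)
  w <- outer(grid$x, x, function(a, b) dnorm(a - b, sd = h)) *
       outer(grid$y, y, function(a, b) dnorm(a - b, sd = h))
  grid$p <- as.vector(w %*% value) / rowSums(w)  # weighted share of 1s
  grid
}

gx <- seq(min(data$x), max(data$x), length.out = 50)
gy <- seq(min(data$y), max(data$y), length.out = 50)
grid <- smooth_prob(data$x, data$y, data$value, gx, gy, h = 1)

# Base graphics heatmap; rows of the matrix follow gx, columns follow gy
image(gx, gy, matrix(grid$p, nrow = length(gx)), xlab = "x", ylab = "y")
```

The `grid` data frame has columns `x`, `y`, and `p`, so it could also be drawn in ggplot2 with `geom_raster(aes(fill = p), interpolate = TRUE)`. A larger `h` gives a smoother but less local surface.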

Plot time series with confidence intervals in R

Here is a plot of several different time series that I made in R:
I made these using a simple loop:
for(i in 1:ngroups){
x[paste0("Group_",i)] = apply(x[,group == i],1,mean)
}
plot(x$Group_1,type="l",ylim=c(0,300))
for(i in 2:ngroups){
lines(x[paste0("Group_",i)],col=i)
}
I also could have made this plot using matplot. Now, as you can see, each group is the mean of several other columns. What I would like to do is plot the series as in the plot above, but additionally show the range of the underlying data contributing to that mean. For example, the purple line would be bounded by a region shaded light purple. At any given time index, the purple region will extend from the lowest value in the purple group to the highest value (or, say, the 5 to 95 percentiles). Is there an elegant/clever way to do this?
Here is an answer using the graphics package (graphics that come with R). I also try to explain how it is that the polygon (which is used to generate the CI) is created. This can be repurposed to solve your problem, for which I do not have the exact data.
# Values for noise and CI size
s.e. <- 0.25 # standard error of noise
interval <- s.e.*qnorm(0.975) # standard error * 97.5% quantile
# Values for Fake Data
x <- 1:10 # x values
y <- (x-1)*0.5 + rnorm(length(x), mean=0, sd=s.e.) # generate y values
# Main Plot
ylim <- c(min(y)-interval, max(y)+interval) # account for CI when determining ylim
plot(x, y, type="l", lwd=2, ylim=ylim) # plot x and y
# Determine the x values that will go into CI
CI.x.top <- x # x values going forward
CI.x.bot <- rev(x) # x values backwards
CI.x <- c(CI.x.top, CI.x.bot) # polygons are drawn clockwise
# Determine the Y values for CI
CI.y.top <- y+interval # top of CI
CI.y.bot <- rev(y)-interval # bottom of CI, but rev Y!
CI.y <- c(CI.y.top,CI.y.bot) # forward, then backward
# Add a polygon for the CI
CI.col <- adjustcolor("blue",alpha.f=0.25) # Pick a pretty CI color
polygon(CI.x, CI.y, col=CI.col, border=NA) # draw the polygon
# Point out path of polygon
arrows(CI.x.top[1], CI.y.top[1]+0.1, CI.x.top[3], CI.y.top[3]+0.1)
arrows(CI.x.top[5], CI.y.top[5]+0.1, CI.x.top[7], CI.y.top[7]+0.1)
arrows(CI.x.bot[1], CI.y.bot[1]-0.1, CI.x.bot[3], CI.y.bot[3]-0.1)
arrows(CI.x.bot[6], CI.y.bot[6]-0.1, CI.x.bot[8], CI.y.bot[8]-0.1)
# Add legend to explain what the arrows are
legend("topleft", legend="Arrows indicate path\nfor drawing polygon", xjust=0.5, bty="n")
And here is the final result (plot not shown).
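To adapt the polygon technique to the question, the band can run from the row-wise minimum to the row-wise maximum of a group's columns instead of a fixed interval. The sketch below uses simulated data, since the original data is not available; `vals`, `time_idx`, and the color are assumptions.

```r
set.seed(1)
n_time <- 20
# Hypothetical group: 5 series (columns) observed at 20 time points (rows)
vals <- matrix(rnorm(n_time * 5, mean = 100, sd = 10), nrow = n_time)

center <- rowMeans(vals)         # the line drawn for the group
lower  <- apply(vals, 1, min)    # or: apply(vals, 1, quantile, probs = 0.05)
upper  <- apply(vals, 1, max)    # or: apply(vals, 1, quantile, probs = 0.95)
time_idx <- seq_len(n_time)

# Empty plot sized to the band, then the band, then the mean line on top
plot(time_idx, center, type = "n", ylim = range(lower, upper),
     xlab = "Time index", ylab = "Value")
polygon(c(time_idx, rev(time_idx)), c(upper, rev(lower)),
        col = adjustcolor("purple", alpha.f = 0.25), border = NA)
lines(time_idx, center, col = "purple", lwd = 2)
```

For several groups, the `polygon` and `lines` calls can be repeated inside the existing loop, one color per group.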
I have made a df using some random data.
Here's the df
df
x y
1 1 3.1667912
2 1 3.5301539
3 1 3.8497014
4 1 4.4494311
5 1 3.8306889
6 1 4.7681518
7 1 2.8516945
8 1 1.8350802
9 1 5.8163498
10 1 4.8589443
11 2 0.3419090
12 2 2.7940851
13 2 1.9688636
14 2 1.3475315
15 2 0.9316124
16 2 1.3208475
17 2 3.0367743
18 2 3.2340156
19 2 1.8188969
20 2 2.5050162
Plot it using stat_summary with mean_cl_normal and geom = "smooth":
library(ggplot2) # mean_cl_normal and mean_cl_boot also need the Hmisc package installed
ggplot(df,aes(x=x,y=y))+geom_point() +
stat_summary(fun.data=mean_cl_normal, geom="smooth", colour="red")
As someone commented, mean_cl_boot may be a better choice, so I tried it as well.
ggplot(df,aes(x=x,y=y))+geom_point() +
stat_summary(fun.data=mean_cl_boot, geom="smooth", colour="red")
The two results are indeed slightly different. You could also adjust the conf.int argument (passed through fun.args) depending on your needs.
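For reference, the band drawn by mean_cl_normal is, as far as I understand it, a t-based confidence interval for the mean at each x. The sketch below computes that interval by hand in base R from the df shown above; the helper name `ci_normal` is hypothetical, and the result should be read as an approximation of what the summary function returns rather than its exact implementation.

```r
df <- data.frame(
  x = rep(1:2, each = 10),
  y = c(3.1667912, 3.5301539, 3.8497014, 4.4494311, 3.8306889,
        4.7681518, 2.8516945, 1.8350802, 5.8163498, 4.8589443,
        0.3419090, 2.7940851, 1.9688636, 1.3475315, 0.9316124,
        1.3208475, 3.0367743, 3.2340156, 1.8188969, 2.5050162)
)

# Mean with a t-based confidence interval (hypothetical helper)
ci_normal <- function(v, conf.int = 0.95) {
  n <- length(v)
  half <- qt((1 + conf.int) / 2, df = n - 1) * sd(v) / sqrt(n)
  c(y = mean(v), ymin = mean(v) - half, ymax = mean(v) + half)
}

# One row per x value, columns y / ymin / ymax
summary_tab <- t(sapply(split(df$y, df$x), ci_normal))
print(summary_tab)
```

Lowering `conf.int` (for example to 0.9) narrows the interval.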

Creating 3D scatter plot in R?

I have my data in the variable data:
data = read.csv("datafile.csv")
datafile.csv is of the form:
x1,y1,z1
x2,y2,z2
.....
xn,yn,zn
How do I create a 3D scatter plot? (the scale etc. should be automatically taken care of).
Let's simulate a data example.
# create data observations for x, y and z
x <- c(10, 9, 3, 4, 5)
y <- c(8, 4, 7, 8, 9)
z <- c(15, 10, 11, 9, 9)
# join vectors x, y and z directly into a data.frame as suggested by @thelatemail.
data <- data.frame(x, y, z)
The object data is supposed to simulate the data you have. See it below
data
x y z
1 10 8 15
2 9 4 10
3 3 7 11
4 4 8 9
5 5 9 9
The answer:
library(scatterplot3d)
scatterplot3d(data$x,data$y,data$z)
See ?scatterplot3d to explore other arguments inside this function.
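One detail to watch when reading the real file: the question's CSV appears to have no header row, and read.csv treats the first line as column names by default. The sketch below reads such a file with header = FALSE; the temporary file stands in for datafile.csv, and the column names and `dat` are assumptions. The plot only runs if the scatterplot3d package is installed.

```r
# Stand-in for datafile.csv: three comma-separated values per line, no header
csv_path <- tempfile(fileext = ".csv")
writeLines(c("10,8,15", "9,4,10", "3,7,11", "4,8,9", "5,9,9"), csv_path)

# header = FALSE keeps the first line as data; col.names supplies the names
dat <- read.csv(csv_path, header = FALSE, col.names = c("x", "y", "z"))
print(dat)

# Plot only when the package is available
if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  scatterplot3d::scatterplot3d(dat$x, dat$y, dat$z,
                               xlab = "x", ylab = "y", zlab = "z", pch = 19)
}
```

Axis limits are chosen automatically from the data, so no manual scaling should be needed.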
