Assignability diagrams - plotting probability of belonging to three groups - r

I am trying to draw an "assignability plot", in which the probability of belonging to each of three groups is shown on a triangle and each observation is a dot. The center of the triangle represents observations with equal probability (33%) of belonging to each of the three groups (Group1_prob, Group2_prob, Group3_prob), and other areas of the triangle represent varying degrees of group membership. The dots are coloured according to Another_category.
I have generated a random dataset with variables I have mentioned above to clarify:
s <- seq(1, 100, 1)
a1 <- matrix(rbeta(100*3,2,2), nc=3)
a1 <- sweep(a1, 1, rowSums(a1), FUN="/")
category2 <- sample(c(1,2,3), 100, replace = TRUE)
df = data.frame(s, a1, category2)
colnames(df) <- c('observation', 'Group1_prob', 'Group2_prob', 'Group3_prob', 'Another_category')
The data frame will look something like the below:
observation Group1_prob Group2_prob Group3_prob Another_category
1 0.20692290 0.5259100 0.2671671 1
2 0.32271247 0.4352754 0.2420121 3
3 0.26894997 0.2367609 0.4942891 2
4 0.51197553 0.2400177 0.2480067 3
5 0.29448485 0.3002781 0.4052370 2
6 0.39686890 0.1370191 0.4661120 2
7 0.33881746 0.2946256 0.3665570 3
8 0.36083040 0.3123024 0.3268672 1
9 0.05739799 0.1207381 0.8218639 3
Would this be at all possible with ggplot2 in R?
A Nature Communications paper by Young et al. shows this plot perfectly.
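One way this could be approached with plain ggplot2 is to convert the three probabilities to Cartesian coordinates inside an equilateral triangle and then draw ordinary points. The following is only a sketch under that assumption; the helper name to_ternary_xy, the corner placement, and the seed are illustrative choices, not something taken from the paper.

```r
# Rebuild data shaped like the question's (seed added for reproducibility)
set.seed(1)
p <- matrix(rbeta(100 * 3, 2, 2), ncol = 3)
p <- p / rowSums(p)  # each row now sums to 1
df <- data.frame(observation = 1:100,
                 Group1_prob = p[, 1], Group2_prob = p[, 2], Group3_prob = p[, 3],
                 Another_category = sample(1:3, 100, replace = TRUE))

# Hypothetical helper: map probabilities to triangle coordinates.
# Assumed corners: Group1 at (0, 0), Group2 at (1, 0), Group3 at (0.5, sqrt(3)/2).
# p1 is implied by p2 and p3 because the three probabilities sum to 1.
to_ternary_xy <- function(p1, p2, p3) {
  data.frame(x = p2 + 0.5 * p3, y = sqrt(3) / 2 * p3)
}

xy <- to_ternary_xy(df$Group1_prob, df$Group2_prob, df$Group3_prob)
df$x <- xy$x
df$y <- xy$y

# Triangle outline and corner labels
corners <- data.frame(x = c(0, 1, 0.5), y = c(0, 0, sqrt(3) / 2),
                      label = c("Group1", "Group2", "Group3"))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  plt <- ggplot(df, aes(x, y)) +
    geom_polygon(data = corners, fill = NA, colour = "black") +
    geom_point(aes(colour = factor(Another_category))) +
    geom_text(data = corners, aes(label = label), vjust = c(1.5, 1.5, -0.5)) +
    coord_fixed() +  # keep the triangle equilateral
    theme_void() +
    labs(colour = "Another_category")
  print(plt)
}
```

With this mapping an observation with probabilities (1/3, 1/3, 1/3) lands at the centroid of the triangle, and an observation that belongs entirely to one group lands on that group's corner.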

Grouping Set of Points to a Pre Defined Point

I'm looking to create a model that classifies a set of points that are near a pre-defined point.
For example, let's say I have points:
X Y
1 1
1 2
1 3
2 1
2 3
3 1
3 2
3 3
6 6
8 7
8 5
9 3
10 7
My goal is to identify which points are closest to the predefined point (2,2), and ideally to output those points.
I tried using KNN, but I could not figure out how to get the model to pick out the points near (2,2). Any guidance on how I might accomplish this would be awesome. :)
Plot of Points
df <- data.frame( x = c(1,1,1,2,2,2,3,3,3,6,8,8,9,10), y = c(1,2,3,1,2,3,1,2,3,6,7,5,3,7))
df
goal_point <- c(x=2,y=2)
goal_point
You might approach this by calculating distance from goal as a feature.
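# Euclidean distance of every point from the goal point
df$dist = sqrt((df$x - goal_point["x"])^2 +
               (df$y - goal_point["y"])^2)
# k-means with 2 clusters on x, y, and dist; the starting centers are random, so cluster labels can differ between runs
df$clust = kmeans(df, 2)$cluster
library(ggplot2)
# treat the cluster label as a category so ggplot uses a discrete color scale
ggplot(df, aes(x, y, color = factor(clust))) +
  geom_point()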
In this case kmeans is using x, y, and distance from goal. You could also use just distance from goal by using df$clust = kmeans(df[,3], 2)$cluster, which would lead here to the same clustering.
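If the aim is only to list the points nearest the goal, the distance feature can also be used directly, without any clustering. This is a minimal sketch; the radius and k values are arbitrary assumptions chosen for illustration.

```r
df <- data.frame(x = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 6, 8, 8, 9, 10),
                 y = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 6, 7, 5, 3, 7))
goal_point <- c(x = 2, y = 2)

# Euclidean distance from each point to the goal point
df$dist <- sqrt((df$x - goal_point[["x"]])^2 + (df$y - goal_point[["y"]])^2)

# Option 1: every point within a chosen radius (1.5 is an assumed cutoff)
radius <- 1.5
near <- df[df$dist <= radius, ]

# Option 2: the k closest points (k = 5 is an assumed value)
k <- 5
nearest <- head(df[order(df$dist), ], k)

print(near)
print(nearest)
```

The radius version returns however many points fall inside the cutoff, while the k version always returns exactly k rows, so the choice depends on whether "near" means a distance or a count.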

How to interpolate a single point where line crosses a baseline between two points [duplicate]

I am new to R, and I am trying to find an automated way to determine the x-coordinate at which the line between two points crosses a baseline (in this case 75; see the dotted line in the image linked below). Once the x value is found, I would like to add it to the vector of x values and add the corresponding y value (which is always the baseline value) to the vector of y values. In other words, a function should check every pair of consecutive input points, see whether the straight segment between them crosses the baseline, and if so add the coordinates of the crossing to the output x and y vectors. Any help would be most appreciated, especially with automating this across all x,y coordinates.
https://i.stack.imgur.com/UPehz.jpg
baseline = 75
X <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(75,53,37,25,95,35,50,75,75,75)
Edit: added creation of combined data frame with original data + crossing points.
Adapted from another answer related to two intersecting series with uniform X spacing.
baseline = 75
X <- c(1,2,3,4,5,6,7,8,9,10)
Y1 <- rep(baseline, 10)
Y2 <- c(75,53,37,25,95,35,50,75,75,75)
# Find points where Y1 (the baseline) is above Y2.
above <- Y1>Y2
# Points always intersect when above=TRUE, then FALSE or reverse
intersect.points<-which(diff(above)!=0)
# Find the slopes for each line segment.
Y2.slopes <- (Y2[intersect.points+1]-Y2[intersect.points]) /
(X[intersect.points+1]-X[intersect.points])
Y1.slopes <- rep(0,length(Y2.slopes))
# Find the intersection for each segment (using X[...] rather than the bare index, so non-unit X spacing also works)
X.points <- X[intersect.points] + ((Y2[intersect.points] - Y1[intersect.points]) / (Y1.slopes-Y2.slopes))
Y.points <- Y1[intersect.points] + (Y1.slopes*(X.points-X[intersect.points]))
# Plot, with a y range wide enough to show both series.
plot(X,Y1,type='l',ylim=range(Y1,Y2))
lines(X,Y2,type='l',col='red')
points(X.points,Y.points,col='blue')
library(dplyr)
combined <- bind_rows(        # combine rows from...
  tibble(X, Y2),              # table of original, plus
  tibble(X = X.points,
         Y2 = Y.points)) %>%  # table of interpolations
  distinct() %>%              # and drop any repeated rows
  arrange(X)                  # and sort by X
> combined
# A tibble: 12 x 2
X Y2
<dbl> <dbl>
1 1 75
2 2 53
3 3 37
4 4 25
5 4.71 75
6 5 95
7 5.33 75
8 6 35
9 7 50
10 8 75
11 9 75
12 10 75
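The same idea can be wrapped in a small base R function that works for any X spacing. This is a sketch only: the name add_crossings is made up for illustration, and it treats a point lying exactly on the baseline as an existing point rather than as a new crossing.

```r
# Hypothetical helper: insert the points where the piecewise-linear curve
# through (x, y) crosses a horizontal baseline.
add_crossings <- function(x, y, baseline) {
  d <- y - baseline
  # segments whose endpoints lie strictly on opposite sides of the baseline
  i <- which(d[-length(d)] * d[-1] < 0)
  # linear interpolation along each crossing segment
  x_cross <- x[i] + (baseline - y[i]) * (x[i + 1] - x[i]) / (y[i + 1] - y[i])
  out <- data.frame(x = c(x, x_cross),
                    y = c(y, rep(baseline, length(x_cross))))
  out <- out[order(out$x), ]
  rownames(out) <- NULL
  out
}

X <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(75, 53, 37, 25, 95, 35, 50, 75, 75, 75)
res <- add_crossings(X, y, baseline = 75)
print(res)
```

For the data in the question this adds two rows, at x = 4 + 5/7 and x = 5 + 1/3, which agrees with the combined tibble above.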

Divide a non-rectangular plot into subplots with the spatstat package in R

I have data on numbered sub-plots and the species found in each of them (three or more species per subplot). Every species record has X and Y coordinates.
> df
subplot species X Y
1 1 Apiaceae 268675 4487472
2 1 Ceyperaceae 268672 4487470
3 1 Vitaceae 268669 4487469
4 2 Ceyperaceae 268665 4487466
5 2 Apiaceae 268662 4487453
6 2 Magnoliaceae 268664 4487453
7 3 Magnoliaceae 268664 4487453
8 3 Apiaceae 268664 4487456
9 3 Vitaceae 268664 4487458
With these data, I have created a ppp object holding the points of each subplot within the window of the overall (big) plot.
library(spatstat)
grp <- factor(df$subplot)
win <- ripras(df$X, df$Y)
p.patt <- ppp(df$X, df$Y, window = win, marks = grp)
Now I want to divide the plot into 3 x 3 equal sub-plots, because there are 9 subplots. The overall plot is not rectangular; it looks like a rhombus when I plot it.
I could use the quadrats() function as below, but it divides my plot into unequal subplots. Some are quadrilaterals, others are triangles, and so on, which I don't want. I want all the subplots to be equal-sized quadrilaterals (divided by lines parallel to the sides of the plot). Can anyone guide me on this?
divide <- quadrats(p.patt, 3, 3)
plot(divide)
Thank you!
Could you break up the plot canvas into 3x3, then run each plot?
> par(mfrow=c(3,3))
> # run code for plot 1
> # run code for plot 2
...
> # run code for plot 9
To return back to one plot on the canvas type
> par(mfrow=c(1,1))
This is a question about the spatstat package.
You can use the function quantess to divide the window into tiles of equal area. If you want the tile boundaries to be vertical lines, and you want 7 tiles, use
B <- quantess(Window(p.patt), "x", 7)
where p.patt is your point pattern.
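If the plot is close to a parallelogram, cells bounded by lines parallel to its sides can also be computed by hand in base R, outside spatstat. The sketch below assumes the corner coordinates are known; the corners A, B, D and the function name cell_of are hypothetical, and points outside the parallelogram are returned as NA.

```r
# Hypothetical corners: A is one corner, B and D are its two neighbours,
# so the fourth corner is C = B + D - A.
A <- c(0, 0)
B <- c(30, 10)
D <- c(10, 30)

# Return the cell number (1..n^2, counted row by row) for each point
cell_of <- function(x, y, A, B, D, n = 3) {
  M <- cbind(B - A, D - A)  # columns are the two side vectors
  # Solve A + u * (B - A) + v * (D - A) = (x, y) for u and v
  uv <- solve(M, rbind(x - A[1], y - A[2]))
  u <- uv[1, ]
  v <- uv[2, ]
  col <- pmin(floor(u * n) + 1, n)
  row <- pmin(floor(v * n) + 1, n)
  cell <- (row - 1) * n + col
  cell[u < 0 | u > 1 | v < 0 | v > 1] <- NA  # outside the parallelogram
  cell
}

cells <- cell_of(x = c(1, 20, 39, 29), y = c(1, 20, 39, 10), A, B, D)
print(cells)
```

Because every cell spans the same range of u and v, all n x n cells have the same shape and area. The cell numbers could then be attached to the point pattern as marks.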

Color the individuals of a R PCoA plot by groups

Should be a simple question, but I haven't found exactly how to do it so far.
I have a matrix as follows:
sample var1 var2 var3 etc.
1 5 7 3 1
2 0 1 6 8
3 7 6 8 9
4 5 3 2 4
I performed a PCoA using Vegan and plotted the results. Now my problem is that I want to color the samples according to a pre-defined group:
group sample
1 1
1 2
2 3
2 4
How can I import the groups and then plot the points colored according to the group they belong to? It looks simple, but I have been scratching my head over this.
Thanks!
Seb
You said you used vegan PCoA, which I assume means the wcmdscale function. By default vegan::wcmdscale returns only a scores matrix, just like the standard stats::cmdscale, but if you add certain arguments (such as eig = TRUE) you get a full wcmdscale result object with dedicated plot and points methods, and you can do:
plot(<pcoa-result>, type="n") # no reproducible example: edit like needed
points(<pcoa-result>, col = group) # no reproducible example: group must be visible
If you have a modern vegan (2.5.x) the following also works:
library(magrittr)
plot(<full-pcoa-result>, type = "n") %>% points("sites", col = group)
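The remaining step, lining the group table up with the samples, might look like the following. This sketch uses stats::cmdscale on Euclidean distances as a stand-in, because the question does not show the actual vegan call or dissimilarity; the object names mat, groups, and grp are illustrative.

```r
# Data matrix from the question, with sample ids as row names
mat <- matrix(c(5, 7, 3, 1,
                0, 1, 6, 8,
                7, 6, 8, 9,
                5, 3, 2, 4),
              nrow = 4, byrow = TRUE,
              dimnames = list(1:4, paste0("var", 1:4)))

# Group table from the question
groups <- data.frame(group = c(1, 1, 2, 2), sample = 1:4)

# Classical PCoA on Euclidean distances (an assumption, see above)
pcoa <- cmdscale(dist(mat), k = 2)

# Align groups to the row order of the ordination by sample id
grp <- factor(groups$group[match(rownames(pcoa), groups$sample)])

# A factor passed to col is used as an integer index into the palette
plot(pcoa, col = grp, pch = 19, xlab = "PCoA 1", ylab = "PCoA 2")
legend("topright", legend = levels(grp), col = seq_along(levels(grp)), pch = 19)
```

Matching by sample id rather than relying on row order keeps the colors correct even if the group table is sorted differently from the data matrix.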

Is there a way to color-code the scatter plots in pairs() depending on the number of points in each individual plot

I have a data frame(mappedUn) of the structure:
C1 C2 C3 C4 C5 C6
1 1 1 3 1 1
3 3 3 16 3 3
10 NA 10 NA 6 6
11 NA 11 NA 10 11
NA NA NA NA 11 NA
NA NA NA NA 12 NA
Note: I have trimmed the entries in the example above to fit here, and I have renamed the columns to keep it simple.
I was wondering if there is a way to color-code scatter plots in R. I am using the pairs method to draw the scatter plots; the call I run is:
pairs(mappedUn[1:6])
Here is what I get:
Notice that some panels have two points, some have three, and so on. Is there a way to give each panel a different background color based on how many points it contains, for instance 4 points red, 3 yellow, 2 green, etc.?
My ultimate goal is to visually distinguish the plots with a high number of common points.
The key here is to customize the panel argument of pairs(). Try the following to see whether it meets your requirement.
n.notNA <- function(x){
  # return the number of non-NA values in x
  return(length(x) - sum(is.na(x)))
}
myscatterplot <- function(x, y){
  # ll stores the extent of the current plotting region: c(x1, x2, y1, y2)
  ll <- par("usr")
  # bg is the background color of the current panel (an integer index into the palette), which depends on the number of points. When x and y have different numbers of non-NA values, use the smaller count.
  bg <- min(n.notNA(x), n.notNA(y))
  # draw a filled rectangle covering the plotting region, with extent ll and color bg
  rect(ll[1], ll[3], ll[2], ll[4], col = bg)
  # draw the points on top of the rectangle
  points(x, y)
}
# "panel = myscatterplot" draws each off-diagonal panel with myscatterplot(), applied to the matching pair of variables
pairs(mappedUn[1:6], panel = myscatterplot)
A related question: R: How to colorize the diagonal panels in a pairs() plot?
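To get the specific colors asked for in the question (4 points red, 3 yellow, 2 green), the count can be looked up in a named vector instead of being used as a palette index. This is a sketch along the same lines; the names count_colours, panel_colour, and coloured_panel are made up, and white is an assumed fallback for any other count.

```r
# Example data shaped like the question's mappedUn
mappedUn <- data.frame(C1 = c(1, 3, 10, 11, NA, NA),
                       C2 = c(1, 3, NA, NA, NA, NA),
                       C3 = c(1, 3, 10, 11, NA, NA),
                       C4 = c(3, 16, NA, NA, NA, NA),
                       C5 = c(1, 3, 6, 10, 11, 12),
                       C6 = c(1, 3, 6, 11, NA, NA))

# Lookup table: number of plotted points -> background color
count_colours <- c("2" = "green", "3" = "yellow", "4" = "red")

# Background color for one panel, based on the rows where both x and y are present
panel_colour <- function(x, y) {
  n <- sum(!is.na(x) & !is.na(y))
  col <- count_colours[as.character(n)]
  if (is.na(col)) "white" else unname(col)
}

# Panel function: filled background first, then the points on top
coloured_panel <- function(x, y, ...) {
  usr <- par("usr")
  rect(usr[1], usr[3], usr[2], usr[4], col = panel_colour(x, y), border = NA)
  points(x, y, ...)
}

pairs(mappedUn, panel = coloured_panel)
```

Counting rows where both values are present gives the number of points actually drawn in the panel, which can be smaller than either column's non-NA count when the NAs fall in different rows.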
