Drawing a 3D scatterplot in R

I would like to visualize my data with scatterplot3d.
On my X and Y axes, I would like the same labels. Something like this:
x<-c("A","B","C","D")
y<-c("A","B","C","D")
On the Z axis, I would like to show the comparison between the labels in X and Y:
A with A
A with B
A with C
A with D
B with B
B with C
B with D
C with C
C with D
D with D
#altogether 10 values in Z
z<-c(0.25, 0.7, 0.35, 1.14, 0.85, 0.36, 0.69, 0.73, 0.023, 0.85)
Now I want to draw all of this information on a scatterplot3d. How can I implement this with scatterplot3d?

If you want to plot points, you need to match triplets of (x,y,z) values. You can create x and y values matching the positions in z with
xx <- factor(rep(x, 4:1), levels=x)
yy <- factor(unlist(sapply(1:4, function(i) y[i:4])), levels=y)
Then you can draw the plot with
library(scatterplot3d)
scatterplot3d(xx,yy,z,
x.ticklabs=c("",x,""), y.ticklabs=c("",y,""),
type="h", lwd=2,
xlim=c(0,5), ylim=c(0,5))
to get a plot with one vertical line (type="h") per label pair.
But honestly this doesn't seem like a particularly effective visualization.
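Before plotting, it can help to confirm that the generated labels line up with the values in z. The following is a minimal sketch that rebuilds xx and yy as in the answer above (using lapply instead of sapply, which should give the same result here) and tabulates the triplets; the data frame name triplets is only illustrative.

```r
# Data from the question
x <- c("A", "B", "C", "D")
y <- c("A", "B", "C", "D")
z <- c(0.25, 0.7, 0.35, 1.14, 0.85, 0.36, 0.69, 0.73, 0.023, 0.85)

# Repeat each x label 4, 3, 2, 1 times: A A A A B B B C C D
xx <- factor(rep(x, 4:1), levels = x)
# For each starting position i, take y[i..4]: A B C D B C D C D D
yy <- factor(unlist(lapply(1:4, function(i) y[i:4])), levels = y)

# One row per (x, y, z) triplet; "triplets" is an illustrative name
triplets <- data.frame(x = xx, y = yy, z = z)
print(triplets)
```

Each row should then read as one comparison from the question, for example the second row pairs A with B and carries the value 0.7.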

Related

scatter plot of two groups with 2 vectors each

I have two groups, each one with 2 vectors that represent "attributes".
Group1
2-element Vector{Any}:
[0.557, -0.13, 0.34, 0.62]
[0.62, -1.20, -0.79, 0.48]
Group2
2-element Vector{Any}:
[-1.20, -0.58, 1.07, -0.89]
[1.31, -1.58, -1.27, -0.16]
I want to make a scatter plot that shows Group 1 Attribute 1, Group 1 Attribute 2, Group 2 Attribute 1, and so on. Each category with a different color.
How can I do that?
You can do e.g.:
julia> using Plots
julia> scatter(vec1..., legend=nothing, color="red")
julia> scatter!(vec2..., color="blue")
This works if vec1 and vec2 are variables storing vectors of vectors of per-group information (as in the output you have shown).

Identify all cells above or below a diagonal in an asymmetric matrix in R

I have looked around for solutions to this problem and the closest I found was: Get upper triangular matrix from nonsymmetric matrix, but this did not work for me.
I have a matrix in R with 2 columns and over 3000 rows.
head(diag_calc)
X Y
[1,] 0.4991733 0.05358506
[2,] 1.1758962 0.70707194
[3,] 0.2197383 -0.00148791
[4,] 0.6389240 0.24411083
[5,] 0.8708275 0.16959840
[6,] 0.9784328 0.10341456
When I plot them against each other they look like this:
I want to identify all the rows containing points on either extreme of the diagonals. I have tried labeling points by being in the 3rd quartile of X and 1st quartile of Y and colored them orange. I did the reverse and colored them purple. However, this metric does not capture the true biological variability in my system and it appears that identifying cells at the extremes off a diagonal (which starts at the inflection point of the quartile that I labeled) would provide a better result.
I have tried using diag, upper.tri, and lower.tri from base R, but these do not work, I think due to the asymmetrical nature of my matrix. diag does work to calculate the inflection point where each diagonal line passes through, though. As such:
diag_calc <- Ad_SF7_fc_scored_NK %>%
select(one_of("X", "Y")) %>%
as.matrix(.)
diag(diag_calc) -> diag_test
diag_test
[1] 0.4991733 0.7070719
I can get the other inflection point by swapping the X and Y variables when generating my matrix.
Does anyone have a solution or advice on potential approaches to use?
Thanks!
Here is a way to proceed making an assumption about how you defined your diagonal lines. First create reproducible data and get the quantiles:
set.seed(42)
X <- rnorm(500, 1.5, .5)
Y <- rnorm(500)
Xq <- quantile(X)
Yq <- quantile(Y)
df <- data.frame(X, Y)
Now plot the data and identify a line that passes through the lower left quantile intersection and the upper right quantile intersection. Then use the slope to identify parallel lines that pass through the upper left and lower right intersections:
plot(X~Y, df, pch=20)
abline(v=Yq[2:4], lty=3)
abline(h=Xq[2:4], lty=3)
diag <- lm(Xq[c(2, 4)]~Yq[c(2, 4)])
points(Yq[c(2, 4)], Xq[c(2, 4)], cex=2, col="red", lwd=2)
abline(diag)
b <- coef(diag)[2]
a1 <- Xq[4] - b * Yq[2]
a2 <- Xq[2] - b * Yq[4]
abline(a1, b)
abline(a2, b)
Now identify the points above and below these two lines:
res1 <- X - (a1 + b * Y)
res2 <- (a2 + b * Y) - X
clr <- c("black", "purple", "darkorange")
idx <- ifelse(res1 > 0, 3, ifelse(res2 > 0, 2, 1))
plot(X~Y, pch=20, col=clr[idx])
abline(a1, b, col="red")
abline(a2, b, col="red")
Finally add the identification of the outliers to the data:
position <- c("inside", "below", "above")
df$outlier <- position[idx]
head(df, 10)
# X Y outlier
# 1 2.185479 1.029140719 inside
# 2 1.217651 0.914774868 below
# 3 1.681564 -0.002456267 inside
# 4 1.816431 0.136009552 inside
# 5 1.702134 -0.720153545 inside
# 6 1.446938 -0.198124330 inside
# 7 2.255761 -1.029208806 above
# 8 1.452670 -0.966955896 inside
# 9 2.509212 -1.220813089 above
# 10 1.468643 0.836207704 inside
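The steps above can also be collected into a single function. This is only a sketch under the same assumption about how the diagonal lines are defined (parallel lines through the quartile intersections); the function name classify_offdiag is hypothetical, and the slope is computed directly from the two quartile points rather than with lm, which should be equivalent for a line through two points.

```r
# Hypothetical helper: label each point relative to the two parallel lines
classify_offdiag <- function(X, Y) {
  Xq <- quantile(X, c(0.25, 0.75))  # first and third quartiles of X
  Yq <- quantile(Y, c(0.25, 0.75))  # first and third quartiles of Y
  # Slope of the line through (Yq1, Xq1) and (Yq3, Xq3), with Y on the horizontal axis
  b <- unname((Xq[2] - Xq[1]) / (Yq[2] - Yq[1]))
  a1 <- unname(Xq[2] - b * Yq[1])   # intercept of the upper parallel line
  a2 <- unname(Xq[1] - b * Yq[2])   # intercept of the lower parallel line
  ifelse(X > a1 + b * Y, "above",
         ifelse(X < a2 + b * Y, "below", "inside"))
}

set.seed(42)
X <- rnorm(500, 1.5, 0.5)
Y <- rnorm(500)
outlier <- classify_offdiag(X, Y)
print(table(outlier))
```

With real data, X and Y would be the two columns of the matrix from the question, and the resulting labels could be attached to it as an extra column.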

How to "jitter" a vector of numbers?

The concept of jittering in graphical plotting is intended to make sure points do not overlap. I want to do something similar with a vector.
Imagine I have a vector like this:
v <- c(0.5, 0.5, 0.55, 0.60, 0.71, 0.71, 0.8)
As you can see, it is a vector that is ordered by increasing numbers, with the caveat that some of the numbers are exactly the same. How can I "jitter" them through adding a very small value, so that they can be ordered strictly in increasing order? I would like to achieve something like this:
0.5, 0.50001, 0.55, 0.60, 0.71, 0.71001, 0.8
How can I achieve this in R?
If the solution allows me to adjust the size of the "added value" it's a bonus!
Jitter and then sort:
sort(jitter(v))
The function rle gets you the run length of repeated elements in a vector. Using this information, you can then create a sequence of the repeats, multiply this by your verySmallNumber and add it to v.
# New vector to illustrate a triplet
v <- c(0.5, 0.5, 0.55, 0.60, 0.71, 0.71, 0.71, 0.8)
# Define the amount you wish to add
verySmallNumber <- 0.00001
# Get the rle
rv <- rle(v)
# Create the sequence, multiply and subtract the verySmallNumber, then add
sequence(rv$lengths) * verySmallNumber - verySmallNumber + v
# [1] 0.50000 0.50001 0.55000 0.60000 0.71000 0.71001 0.71002 0.80000
Of course, eventually, a very long sequence of repeats might lead to a value equal to the next real value. Adding a check to see what the longest repeated value is would possibly solve that.
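One way to add the check mentioned above is to cap the step so that even the longest run of repeats cannot reach the next distinct value. This is a sketch that assumes v is already sorted in increasing order; the names max_step, step and jittered are illustrative.

```r
v <- c(0.5, 0.5, 0.55, 0.60, 0.71, 0.71, 0.71, 0.8)
verySmallNumber <- 0.00001

rv <- rle(v)
# Gaps between consecutive distinct values (positive because v is sorted)
gaps <- diff(rv$values)
# Largest safe step: smallest gap divided by the longest run of repeats
max_step <- if (length(gaps) > 0) min(gaps) / max(rv$lengths) else Inf
step <- min(verySmallNumber, max_step)

# Offsets 0, 1, 2, ... within each run, scaled by the step
jittered <- v + (sequence(rv$lengths) - 1) * step

# TRUE if the result is strictly increasing
print(!is.unsorted(jittered, strictly = TRUE))
```

If the requested verySmallNumber is too large for the data, the step is silently reduced; raising an error instead would be an equally reasonable choice.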

Plot polygon in R

I want to plot a polygon from a sample of points (in practice, the polygon is a convex hull) whose coordinates are
x <- c(0.66, 0.26, 0.90, 0.06, 0.94, 0.37)
y <- c(0.99, 0.20, 0.38, 0.77, 0.71, 0.17)
When I apply the polygon function I get the following plot:
plot(x,y,type="n")
polygon(x,y)
text(x,y,1:length(x))
But it is not what I expect... What I want is the following plot:
I obtained this last plot by doing:
good.order <- c(1,5,3,6,2,4)
plot(x,y,type="n")
polygon(x[good.order], y[good.order])
text(x,y,1:length(x))
My question
Basically, my question is: how do I obtain the vector of indices (called good.order in the code above) that gives the polygon I want?
Assuming a convex polygon, just take a central point and compute the angle, then order in increasing angle.
> pts = cbind(x,y)
> polygon(pts[order(atan2(x-mean(x),y-mean(y))),])
Note that any cycle of your good.order will work, mine gives:
> order(atan2(x-mean(x),y-mean(y)))
[1] 6 2 4 1 5 3
probably because I've mixed up x and y in atan2, so it's thinking about it rotated by 90 degrees, not that it matters here.
Here is one possibility. The idea is to use the angle around the center for ordering:
x <- c(0.66, 0.26, 0.90, 0.06, 0.94, 0.37)
y <- c(0.99, 0.20, 0.38, 0.77, 0.71, 0.17)
xnew <- x[order(Arg(scale(x) + scale(y) * 1i))]
ynew <- y[order(Arg(scale(x) + scale(y) * 1i))]
plot(xnew, ynew, type = "n")
polygon(xnew ,ynew)
text(x, y, 1:length(x))
Just use the geometry package with the function convhulln.
Here is the example they provide (see ?convhulln):
library(geometry) # provides convhulln
library(rgl) # needed for plotting the triangles
ps <- matrix(rnorm(3000), ncol=3) # generate random points
ps <- sqrt(3)*ps/drop(sqrt((ps^2) %*% rep(1, 3))) # project them onto a sphere
ts.surf <- t(convhulln(ps)) # see the qhull documentation for the options
rgl.triangles(ps[ts.surf,1],ps[ts.surf,2],ps[ts.surf,3],col="blue",alpha=.2)
For plotting you need the rgl package.
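For the two-dimensional points in the question, base R's chull (from grDevices) may be a simpler option than convhulln: it returns the indices of the points on the convex hull in clockwise order, which can be used directly as the ordering for polygon. A minimal sketch, with hull as an illustrative variable name:

```r
x <- c(0.66, 0.26, 0.90, 0.06, 0.94, 0.37)
y <- c(0.99, 0.20, 0.38, 0.77, 0.71, 0.17)

# Indices of the hull vertices, in clockwise order
hull <- chull(x, y)

plot(x, y, type = "n")
polygon(x[hull], y[hull])
text(x, y, seq_along(x))
```

Unlike the angle-based orderings, this drops any points that lie strictly inside the hull, so it only reproduces good.order (up to a cyclic shift) when every point is a hull vertex, as is the case for this sample.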

interactively work with xy point plot clusters - group manipulation in r

I have a large number of pairs of X and Y variables along with a cluster membership column. Cluster membership (group) may not always be right (clustering algorithms are imperfect), so I want to interactively visualize the clusters and change the cluster membership of the points I identify.
I tried rggobi and the following is the point I was able to get to (I do not mean that I need to use rggobi / ggobi, if better options are available you are welcome to suggest).
# data
set.seed (1234)
c1 <- rnorm (40, 0.1, 0.02); c2 <- rnorm (40, 0.3, 0.01)
c3 <- rnorm (40, 0.5, 0.01); c4 <- rnorm (40, 0.7, 0.01)
c5 <- rnorm (40, 0.9, 0.03)
Yv <- 0.3 + rnorm (200, 0.05, 0.05)
myd <- data.frame (Xv = round (c(c1, c2, c3, c4, c5), 2), Yv = round (Yv, 2),
cltr = factor (rep(1:5, each = 40)))
require(rggobi)
g <- ggobi(myd)
display(g[1], vars=list(X="Xv", Y="Yv"))
You can see five clusters, colored differently by the cltr variable. I manually identified the points that are outliers and I want to set their value to NA in the cltr variable. Is there an easy way to remove such memberships and write the result to a file?
You could try identify to get the indices of the points manually:
## use base::plot
plot(myd$Xv, myd$Yv, col=myd$cltr)
exclude <- identify(myd$Xv, myd$Yv) ## left click on the points you want to exclude (right click to stop/finish)
myd$cltr[exclude] <- NA
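To cover the "write to file" part, the modified data frame can then be saved with write.csv. Because identify is interactive, the sketch below uses a small made-up data frame and a hard-coded exclude vector as a stand-in for the clicked points; the file name is also just an example.

```r
# Small stand-in data set with the same column names as in the question
set.seed(1234)
myd <- data.frame(Xv = round(runif(10), 2),
                  Yv = round(runif(10), 2),
                  cltr = factor(rep(1:5, each = 2)))

# Hypothetical indices; in practice these come from identify()
exclude <- c(3, 7)
myd$cltr[exclude] <- NA

# Write the cleaned data to a CSV file (written to a temp directory here)
out_file <- file.path(tempdir(), "myd_cleaned.csv")
write.csv(myd, out_file, row.names = FALSE)

# Read it back to confirm that the NA values were kept
check <- read.csv(out_file)
print(sum(is.na(check$cltr)))
```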
