I have two groups, each one with 2 vectors that represent "attributes".
Group1
2-element Vector{Any}:
[0.557, -0.13, 0.34, 0.62]
[0.62, -1.20, -0.79, 0.48]
Group2
2-element Vector{Any}:
[-1.20, -0.58, 1.07, -0.89]
[1.31, -1.58, -1.27, -0.16]
I want to make a scatter plot that shows Group 1 Attribute 1, Group 1 Attribute 2, Group 2 Attribute 1, and so on. Each category with a different color.
How can I do that?
You can do e.g.:
julia> using Plots
julia> scatter(vec1..., legend=nothing, color="red")
julia> scatter!(vec2..., color="blue")
where vec1 and vec2 are variables each storing a group's vector of attribute vectors (as in the output you have shown).
I am using the baysout function for outlier detection from the 'dprep' package in R. The returned value is supposed to be a 2-column matrix according to the R documentation. The first column contains the indexes of the top num.out observations (num.out is the user-defined number of outliers to return) and the second, the outlyingness measure for each index.
The problem is that I want to access the index number separately, but I am not able to do this. The function is actually returning a num.out x 1 matrix as opposed to a num.out x 2 matrix. The index value and the outlyingness measure are there, but I cannot access them separately. Please see the sample code below:
# Install and load the dprep library
install.packages("dprep")
library(dprep)
# Create 5x3 matrix for input to baysout function
A = matrix(c(0.8, 0.4, 1.2, 0.4, 1.2, 1.1, 0.3,
0.1, 1.9, 1.1, 0.9, 1.4, 0.3, 1.5, 0.5), nrow=5, ncol=3)
# Run the baysout function on matrix A and store result in outliers
outliers <- baysout(A, blocks = 3, nclass=0, k = 3, num.out = 3)
# print out result
print(outliers)
# attempt to access the index
print(outliers[1,1])
Output is as follows:
# print out result
> print(outliers)
      [,1]
4 3.625798
3 2.901654
2 2.850419
# attempt to access the index
> print(outliers[1,1])
       4
3.625798
This is not the real data I am using which is much larger and I would like to gain access to the index. In the example above I would like to be able to access the number 4 on its own. It is coupled with the 3.625798 and I am not able to access each figure separately. Would anyone have any advice on how I could do this?
Solution by ekstroem:
Use:
index <- as.numeric(rownames(outliers))
The documentation may not be entirely correct. In any case the index is stored in the row names.
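As a sketch, using a stand-in one-column matrix shaped like the output above (the real baysout return value is not reproduced here), the indices come out of the row names like this:

```r
# Stand-in for the baysout result: one column of outlyingness scores,
# with the observation indices stored in the row names
outliers <- matrix(c(3.625798, 2.901654, 2.850419), ncol = 1,
                   dimnames = list(c("4", "3", "2"), NULL))

index <- as.numeric(rownames(outliers))  # the indices: 4 3 2
score <- outliers[, 1]                   # the outlyingness measures
index[1]                                 # 4, accessed on its own
```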
The concept of jittering in graphical plotting is intended to make sure points do not overlap. I want to do something similar with a vector.
Imagine I have a vector like this:
v <- c(0.5, 0.5, 0.55, 0.60, 0.71, 0.71, 0.8)
As you can see, it is a vector that is ordered by increasing numbers, with the caveat that some of the numbers are exactly the same. How can I "jitter" them through adding a very small value, so that they can be ordered strictly in increasing order? I would like to achieve something like this:
0.5, 0.50001, 0.55, 0.60, 0.71, 0.71001, 0.8
How can I achieve this in R?
If the solution allows me to adjust the size of the "added value" it's a bonus!
Jitter and then sort:
sort(jitter(v))
The function rle gets you the run length of repeated elements in a vector. Using this information, you can then create a sequence of the repeats, multiply this by your verySmallNumber and add it to v.
# New vector to illustrate a triplet
v <- c(0.5, 0.5, 0.55, 0.60, 0.71, 0.71, 0.71, 0.8)
# Define the amount you wish to add
verySmallNumber <- 0.00001
# Get the rle
rv <- rle(v)
# Create the sequence, multiply and subtract the verySmallNumber, then add
sequence(rv$lengths) * verySmallNumber - verySmallNumber + v
# [1] 0.50000 0.50001 0.55000 0.60000 0.71000 0.71001 0.71002 0.80000
Of course, eventually, a very long sequence of repeats might lead to a value equal to the next real value. Adding a check to see what the longest repeated value is would possibly solve that.
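One way to sketch such a check (a hypothetical guard, not part of the answer above): verify that the largest total offset added within any run of repeats stays below the smallest gap between distinct values, so the jitter can never reorder the vector:

```r
v <- c(0.5, 0.5, 0.55, 0.60, 0.71, 0.71, 0.71, 0.8)
verySmallNumber <- 0.00001
rv <- rle(v)

# Largest total offset added within any run of repeats
maxBump <- (max(rv$lengths) - 1) * verySmallNumber
# Smallest gap between distinct consecutive values
minGap <- min(diff(unique(v)))

stopifnot(maxBump < minGap)  # the jittered vector stays strictly ordered
```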
I want to plot a polygon from a sample of points (in practice, the polygon is a convex hull) whose coordinates are
x <- c(0.66, 0.26, 0.90, 0.06, 0.94, 0.37)
y <- c(0.99, 0.20, 0.38, 0.77, 0.71, 0.17)
When I apply the polygon function I get the following plot:
plot(x,y,type="n")
polygon(x,y)
text(x,y,1:length(x))
But it is not what I expect... What I want is the following plot:
I obtained this last plot by doing:
good.order <- c(1,5,3,6,2,4)
plot(x,y,type="n")
polygon(x[good.order], y[good.order])
text(x,y,1:length(x))
My question
Basically, my question is: how can I obtain the vector of indices (called good.order in the code above) that produces the polygon I want?
Assuming a convex polygon, just take a central point and compute the angle, then order in increasing angle.
> pts = cbind(x,y)
> polygon(pts[order(atan2(x-mean(x),y-mean(y))),])
Note that any cycle of your good.order will work, mine gives:
> order(atan2(x-mean(x),y-mean(y)))
[1] 6 2 4 1 5 3
probably because I've mixed up x and y in atan2, so it's thinking about it rotated by 90 degrees; not that it matters here.
Here is one possibility. The idea is to use the angle around the center for ordering:
x <- c(0.66, 0.26, 0.90, 0.06, 0.94, 0.37)
y <- c(0.99, 0.20, 0.38, 0.77, 0.71, 0.17)
xnew <- x[order(Arg(scale(x) + scale(y) * 1i))]
ynew <- y[order(Arg(scale(x) + scale(y) * 1i))]
plot(xnew, ynew, type = "n")
polygon(xnew ,ynew)
text(x, y, 1:length(x))
Just use the geometry package with the function convhulln
Here the example they provide (see ?convhulln)
ps <- matrix(rnorm(3000), ncol=3) # generate points on a sphere
ps <- sqrt(3)*ps/drop(sqrt((ps^2) %*% rep(1, 3)))
ts.surf <- t(convhulln(ps)) # see the qhull documentations for the options
rgl.triangles(ps[ts.surf,1],ps[ts.surf,2],ps[ts.surf,3],col="blue",alpha=.2)
For plotting you need the rgl package.
Result: (the rgl rendering shows the triangulated convex hull on the sphere)
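For the 2-D polygon in the original question, base R's chull() already returns the hull vertex indices in order around the hull, so no angle computation or extra package is needed. A minimal sketch with the question's data:

```r
x <- c(0.66, 0.26, 0.90, 0.06, 0.94, 0.37)
y <- c(0.99, 0.20, 0.38, 0.77, 0.71, 0.17)

# chull() returns the indices of the hull vertices, already cyclically ordered
good.order <- chull(x, y)

plot(x, y, type = "n")
polygon(x[good.order], y[good.order])
text(x, y, 1:length(x))
```

Here all six points lie on the hull, so chull() returns a permutation of 1:6 that is equivalent (up to starting point and direction) to the good.order given in the question.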
I would like to visualize my data in a scatterplot3d
On my X and Y axes, I would like the same labels. Something like this:
x<-c("A","B","C","D")
y<-c("A","B","C","D")
On the Z axis, I would like to show the comparison between labels in X and Y:
A with A
A with B
A with C
A with D
B with B
B with C
B with D
C with C
C with D
D with D
#altogether 10 values in Z
z<-c(0.25, 0.7, 0.35, 1.14, 0.85, 0.36, 0.69, 0.73, 0.023, 0.85)
Now I want to draw all of this information with scatterplot3d. How can I implement this concept in scatterplot3d?
If you want to plot points, you need to match triplets of (x,y,z) values. You can create x and y values matching the positions in z with
xx <- factor(rep(x, 4:1), levels=x)
yy <- factor(unlist(sapply(1:4, function(i) y[i:4])), levels=y)
Then you can draw the plot with
library(scatterplot3d)
scatterplot3d(xx,yy,z,
x.ticklabs=c("",x,""), y.ticklabs=c("",y,""),
type="h", lwd=2,
xlim=c(0,5), ylim=c(0,5))
to get
But honestly this doesn't seem like a particularly effective visualization.
I have a set of 2 curves (each with a few hundred to a couple of thousand data points) that I want to compare and get some similarity "score". Actually, I have >100 of those sets to compare... I am familiar with R (or at least Bioconductor) and would like to use it.
I tried the ccf() function but I'm not too happy about it.
For example, if I compare c1 to the following curves:
c1 <- c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5)
c1b <- c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5) # perfect match! ideally score of 1
c1c <- c(1, 0.2, 0.1, 0.1, 0.5, 0.9, 0.5) # total opposite, ideally score of -1? (what would 0 be though?)
c2 <- c(0, 0.9, 0.9, 0.9, 0, 0.3, 0.3, 0.9) #pretty good, score of ???
Note that the vectors don't have the same size, so some normalization is needed... Any idea?
If you look at those 2 lines, they are fairly similar, and I think that as a first step, measuring the area under the 2 curves and subtracting would do. I looked at the post "Shaded area under 2 curves in R", but that is not quite what I need.
A second issue (optional) is that for lines that have the same profile but different amplitude, I would like to score those as very similar even though the area under them would be big:
c1 <- c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5)
c4 <- c(0, 0.6, 0.7, 0.7, 0.3, 0.1, 0.3) # very good, score of ??
I hope that a biologist's attempt to formulate a problem for programmers is OK...
I'd be happy to provide some real life examples if needed.
Thanks in advance!
They don't form curves in the usual meaning of paired x-y values unless they are of equal length. The first three are of equal length, and after packaging them in a matrix, the rcorr function in the Hmisc package returns:
> rcorr(as.matrix(dfrm))[[1]]
     c1 c1b c1c
c1    1   1  -1
c1b   1   1  -1
c1c  -1  -1   1   # as desired if you scaled them to 0-1
The correlation of the c1 and c4 vectors:
> cor( c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5),
c(0, 0.6, 0.7, 0.7, 0.3, 0.1, 0.3) )
[1] 0.9874975
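For vectors of different lengths (such as c1 versus the 8-point c2), one possible approach, assuming the points of each curve are evenly spaced along x, is to interpolate both curves onto a common grid first and then correlate:

```r
c1 <- c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5)
c2 <- c(0, 0.9, 0.9, 0.9, 0, 0.3, 0.3, 0.9)

# Linear interpolation of both curves onto the same 100-point grid
grid <- seq(0, 1, length.out = 100)
c1i <- approx(seq(0, 1, length.out = length(c1)), c1, xout = grid)$y
c2i <- approx(seq(0, 1, length.out = length(c2)), c2, xout = grid)$y

cor(c1i, c2i)                       # Pearson: sensitive to shape, not amplitude scaling
cor(c1i, c2i, method = "spearman")  # rank-based: even less sensitive to amplitude
```

This is only a sketch; if the curves have real x coordinates, those should be used instead of the evenly spaced stand-in.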
I do not have a very good answer, but I did face a similar question in the past, probably on more than one occasion. My approach is to ask myself what makes my curves similar when I subjectively evaluate them (the scientific term here is "eye-balling" :). Is it the area under the curve? Do I count linear translation, rotation, or scaling (zoom) of my curves as contributing to dissimilarity? If not, I take out all the factors I do not care about through a suitable normalization (e.g. scale the curves to cover the same ranges in x and y).
I am confident that there is a rigorous mathematical theory for this topic; I would search for the words "affinity" and "affine". That said, my primitive/naive methods usually sufficed for the work I was doing.
You may want to ask this question on some math forum.
If the proteins you compare are reasonably close orthologs, you should be able to obtain alignments, either for each pair you want to score or a multiple alignment for the entire bunch. Depending on the application, I think the latter will be more rigorous. I would then extract the folding score of only those amino acids that are aligned, so that all profiles have the same length, and calculate correlation measures or squared normalized dot products of the profiles as a similarity measure. The squared normalized dot product or the Spearman rank correlation will be less sensitive to amplitude differences, which you seem to want. That will make sure you are comparing elements which are reasonably paired (to the extent the alignment is reasonable), and will let you answer questions like: "Are corresponding residues in the compared proteins generally folded to a similar extent?"