image comparison in R

I am looking for the best way to compare 2 or more images.
The images I have are now in matrix format, so basically I am comparing matrices.
They aren't square (but this isn't a problem).
This is an example of what I have with only two matrices:
# Original data
M1 <- cbind(c(0,0,20,40,50,35), c(0,0,5,20,90,80), c(0,0,10,25,85,0), c(58,70,20,50,0,5))
# Data to be compared with M1
M2 <- cbind(c(0,5,25,25,60,15), c(0,30,15,10,116,67), c(0,2,9,20,90,1), c(69,50,22,30,0,2))
I can check the differences and the correlation, but I also want to be able to say, for example, whether:
high values in M2 occur in the same positions as in M1
high values in M2 occur close to the positions in M1
high values in M2 occur far away from the positions in M1
Same thing for low values.
By high values I mean maximum values: for example, if the max value of M1 is at position (x,y), then the max value of M2 should be of similar magnitude and lie at the same position (x,y), or close to it.
I can extract the positions of the maximum values and how much those positions vary, but I am looking for existing methods on which to base my comparisons.
What type of calculations can I use to do such type of analysis?
I can use both image processing packages as well as matrices algorithms.

Sounds like a job better handled with ImageJ or SAO DS9 (http://hea-www.harvard.edu/RD/ds9/).
IIRC those apps have built-in tools for spot and blob-finding, which may save you a lot of time and pain.
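If you want to stay in R, one minimal starting point for the same/close/far question is to compare the coordinates of the maxima directly. A sketch in base R (the helper max_pos is my own, and ties are broken by taking the first maximum):
max_pos <- function(m) which(m == max(m), arr.ind = TRUE)[1, ]

p1 <- max_pos(M1)                  # (row, col) of the maximum of M1
p2 <- max_pos(M2)                  # (row, col) of the maximum of M2
sqrt(sum((p1 - p2)^2))             # Euclidean distance between the two positions
cor(as.vector(M1), as.vector(M2))  # overall similarity, for reference
The same idea applies to low values by replacing max with min.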

Related

Forcing a discrete time series to be monotonically decreasing

I have a series of evaluations. Each evaluation can take discrete values from 0 to 4. The series should decrease over time. However, since values are entered manually, errors can happen.
Therefore, I would like to modify my series so that it is monotonically decreasing. Moreover, I would like to minimize the number of evaluations modified. Finally, if two or more series satisfy these criteria, I would choose the one with the higher overall sum of values.
E.g.
Recorded evaluation
4332422111
Ideal evaluation
4332222111
Recorded evaluation
4332322111
Ideal evaluation
4333322111
(in this case, 4332222111 would have satisfied the criteria too, but I chose the one with the higher sum)
I tried a brute-force approach: generating all possible combinations, selecting those that are monotonically decreasing, and finally comparing each of them with the recorded series.
However, a series can be up to 20 evaluations long, and there would be too many combinations.
x1 <- c(4,3,3,2,4,2,2,1,1,1)
x2 <- c(4,3,3,2,3,2,2,1,1,1)
You could almost certainly break this algorithm, but here's a first try: replace locations where the value increases with NA, then fill them in from the previous location.
dfun <- function(x) {
  # replace values that increase relative to the previous element with NA
  r <- replace(x, which(c(0, diff(x)) > 0), NA)
  # carry the last (lower) observation forward to fill the gaps
  zoo::na.locf(r)
}
dfun(x1)
dfun(x2)
This gives the "less-ideal" answer in the second case.
For the record, I also tried
dfun2 <- function(x) {
  # fit a monotone (isotonic) regression to the negated series
  s <- as.stepfun(isoreg(-x))
  -s(seq_along(x))
}
but this doesn't handle the first example as desired.
You could also try to do this with discrete programming (about which I know almost nothing), or with a slightly more sophisticated form of brute force: use a stochastic algorithm that strongly penalizes non-monotonicity and weakly penalizes the distance from the initial sequence, e.g. optim(..., method="SANN") with a candidate function that adds or subtracts 1 from a random element.
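For illustration, here is a rough sketch of that simulated-annealing idea; the penalty weights, iteration count, and seed below are arbitrary assumptions rather than tuned values:
cost <- function(p, x0) {
  # strongly penalize any increase, weakly penalize edits and a lower sum
  1000 * sum(pmax(diff(p), 0)) + sum(p != x0) - 0.001 * sum(p)
}
neighbour <- function(p, x0) {
  # candidate move: add or subtract 1 from a random element, kept in 0..4
  i <- sample(seq_along(p), 1)
  p[i] <- min(4, max(0, p[i] + sample(c(-1, 1), 1)))
  p
}
set.seed(1)
res <- optim(x1, cost, neighbour, x0 = x1, method = "SANN",
             control = list(maxit = 20000))
res$par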

Inner Products in Principal Component Analysis in R

For this, I am using the banknote data in R given by data(banknote), which shows measurements of 200 Swiss banknotes. My data matrix is called X, and I have performed PCA by pca.banknote<-prcomp(X).
I am trying to show that the inner product between each observation X[i,] and Principal Component Loading 3 given by pca.banknote$rot[,3] is the same as the 3rd PC scores given by pca.banknote$x[,3].
I have attempted:
all.equal(as.matrix(X) %*% pca.banknote$rotation[,3], as.matrix(pca.banknote$x[,3]), check.attributes=FALSE)
but this simply gives a mean difference of 1, i.e. they are not equal.
Do I need to change the format of one of these to a vector/data frame etc for this to work? Or any ideas at all as to where the issue is?
Any feedback would be much appreciated. Thanks.
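One likely explanation (an assumption on my part, since no data is shown): prcomp() centers the data by default, so the scores are inner products with the centered observations, not the raw ones. A minimal check:
# center X with the column means stored by prcomp() before projecting
Xc <- scale(as.matrix(X), center = pca.banknote$center, scale = FALSE)
all.equal(Xc %*% pca.banknote$rotation[, 3],
          as.matrix(pca.banknote$x[, 3]),
          check.attributes = FALSE)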

adehabitat compana() doesn't work or returns lambda=NaN

I'm trying to do the compositional analysis of habitat use with the compana() function in the adehabitatHS package (actually I use the older adehabitat package, because I can't install adehabitatHS).
compana() needs two matrices: one of habitat use and one of available habitat.
When I try to run the function it never finishes, so I have to abort the RStudio session.
I read that one problem could be 0-values in some habitat types for some animals in the 'available' matrix while other animals have positive values for the same habitat. As others have done, I replaced the 0-values with small values (0.001) and ran compana(); it completed, but the lambda values it returned were NaN.
The problem is similar to the one found here
adehabitatHS compana test returns lambda = NaN?
They said they resolved it by using counts (integers) instead of proportions for the 'used' habitat matrix.
I tried this approach too, but nothing changed (it freezes when there are 0-values in the available matrix, or returns NaN for lambda if I replace the 0-values with small values).
I checked all the matrices and they look fine, so I'm going crazy.
I have 6 animals and 21 habitat types.
Can anyone help me resolve this big problem?
PARTIALLY SOLVED: I asked some researchers, and they told me that the number of habitats shouldn't be higher than the number of animals.
So I merged some habitats in order to have six animals and six habitats, and now the function works when I replace the 0-values in the 'available' matrix with small values (e.g. 0.001).
Unfortunately this is not what I wanted, because I needed to find values (rankings, log-ratios, etc.) for each habitat type (originally there were 21).
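For what it's worth, here is a sketch of that workaround, assuming 'used' and 'avail' are the 6 x 21 matrices of proportions; the grouping of the 21 columns into six classes below is purely illustrative:
library(adehabitatHS)

avail[avail == 0] <- 0.001                        # replace zero availabilities
groups <- split(1:21, rep(1:6, length.out = 21))  # hypothetical merge into 6 classes
merge_cols <- function(m) sapply(groups, function(j) rowSums(m[, j, drop = FALSE]))
used6  <- merge_cols(used)
avail6 <- merge_cols(avail)
res <- compana(used6, avail6, test = "randomisation")
res                                               # lambda plus the habitat ranking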

R: how to divide a vector of values into a fixed number of groups, based on smallest distance?

I think I have a rather simple problem, but I can't figure out the best approach. I have a vector with 30 different values. Now I need to divide the vector into 10 groups in such a way that the mean within-group variance is as small as possible. The size of the groups is not important; it can be anything between 1 and 21.
Example. Let's say I have vector of six values, that I have to split into three groups:
Myvector <- c(0.88,0.79,0.78,0.62,0.60,0.58)
Obviously the solution would be:
Group1 <-c(0.88)
Group2 <-c(0.79,0.78)
Group3 <-c(0.62,0.60,0.58)
Is there a function that gives the same outcome as the example and that I can use for my vector with 30 values?
Many thanks in advance.
It sounds like you want to do k-means clustering. Something like this would work:
kmeans(Myvector, 3, algorithm = "Lloyd")
Note that I changed the default algorithm to match your desired output. If you read the ?kmeans help page you will see that there are several algorithms for computing the clusters, because it is not a trivial computational problem. They do not necessarily guarantee optimality.
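A quick check on the example data (the seed is arbitrary; k-means depends on its random start):
Myvector <- c(0.88, 0.79, 0.78, 0.62, 0.60, 0.58)
set.seed(42)
fit <- kmeans(Myvector, centers = 3, algorithm = "Lloyd")
split(Myvector, fit$cluster)   # recovers the three groups from the cluster labels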

Clustering big data

I have a list like this:
A B score
B C score
A C score
......
where the first two columns contain the variable names and the third column contains the score between the two. The total number of variables is 250,000 (A, B, C, ...), and the score is a float in [0,1]. The file is approximately 50 GB. The pairs (A,B) with a score of 1 have been removed, as more than half of the entries were 1.
I wanted to perform hierarchical clustering on the data.
Should I convert the linear form to a matrix with 250,000 rows and 250,000 columns? Or should I partition the data and do the clustering?
I'm clueless with this. Please help!
Thanks.
Your input data already is the matrix, just written out one entry per line.
However, hierarchical clustering usually scales as O(n^3). That won't work at your data set's size. Plus, implementations usually need more than one copy of the matrix; you may then need about 1 TB of RAM, since 2 * 8 * 250000 * 250000 bytes is a lot.
Some special cases can run in O(n^2): SLINK does. If your data is nicely sorted, it should be possible to run single-link clustering in a single pass over your file, but you will have to implement this yourself. Don't even think of using R or something fancy.
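To make the single-pass idea concrete, here is an algorithmic sketch (in R purely for illustration, given the caveat above, and assuming the scores are distances sorted in increasing order; the names find_root and single_link are my own). Single-link clustering is equivalent to building a minimum spanning tree, so a Kruskal-style scan with union-find recovers the merge sequence:
find_root <- function(parent, i) {
  # follow parent pointers up to the root of i's cluster
  while (parent[i] != i) i <- parent[i]
  i
}

single_link <- function(edges) {  # edges: data.frame(a, b, d), sorted by d
  ids <- unique(c(edges$a, edges$b))
  parent <- seq_along(ids)
  merges <- list()
  for (k in seq_len(nrow(edges))) {
    ra <- find_root(parent, match(edges$a[k], ids))
    rb <- find_root(parent, match(edges$b[k], ids))
    if (ra != rb) {               # edge joins two different clusters: merge
      parent[rb] <- ra
      merges[[length(merges) + 1]] <- edges[k, ]
    }
  }
  do.call(rbind, merges)          # merge order with heights (the dendrogram)
}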
