R: select region on matrix

Assume X is a 5x5 matrix (which represents pixel luminosity from a picture):
I would like to select the elements that fall within a given simple geometric shape (e.g. a square, circle, or oval) superimposed on this matrix. In this example I would like to select the elements of the matrix that fall (even slightly) within the oval.
Ultimately the positions of those elements would be returned as a logical vector elementInOvale, on which I could perform a simple operation such as X[elementInOvale] <- 0.
I have the feeling that this is a common problem which has already been solved; I just don't know how to formulate it, where to find information about it, or how to do it in R.
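One minimal sketch, assuming the shape is an axis-aligned ellipse with a known center and semi-axes (the values below are made up for illustration): treat each cell (i, j) as a point and test it against the standard ellipse equation.

# Sketch: flag the cells whose (row, column) position lies inside an ellipse.
# center, a and b are assumed example values, not from the original question.
X <- matrix(runif(25), nrow = 5)    # stand-in for the 5x5 luminosity matrix
center <- c(3, 3)                   # (row, column) of the ellipse center
a <- 2                              # semi-axis along the rows
b <- 1                              # semi-axis along the columns

rows <- row(X)
cols <- col(X)
elementInOvale <- ((rows - center[1])^2 / a^2 + (cols - center[2])^2 / b^2) <= 1
X[elementInOvale] <- 0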

Related

How to use pointDistance with a very large vector

I've got a big problem.
I've got a large raster (rows=180, columns=480, number of cells=86400)
At first I binarized it (so that there are only 1's and 0's) and then I labelled the clusters. (Cells that are 1 and connected to each other got the same label.)
Now I need to calculate all the distances between the cells that are NOT 0.
There are quite a lot, and that's my big problem.
I did this to get the coordinates of the cells I'm interested in (the positions, i.e. cell numbers, of the cells that are not 0):
library(raster)                 # getValues(), xyFromCell(), pointDistance()
V <- getValues(label)           # cluster label of every cell
Vu <- 1:max(V)                  # all non-zero label values
pos <- which(V %in% Vu)         # cell numbers of the labelled (non-zero) cells
XY <- xyFromCell(label, pos)    # their x/y coordinates
This works very well. So XY is a matrix containing all the coordinates (of the cells that are not 0). But now I'm struggling: I need to calculate the distances between ALL of these coordinates, and then put each one of them into one of 43 bins of distances. It's kind of like this (just an example):
0 < x <= 0.2: bin 1
0.2 < x <= 0.4: bin 2
When I use this:
pD <- pointDistance(XY, lonlat = FALSE)
R says it cannot allocate a vector of this size; it's getting too large.
Then I thought I could do this (create an empty data frame df or something like that and let the function pointDistance run over every single value of XY):
df <- rep(0, 43)                         # running counts for the 43 bins
for (i in 1:nrow(XY)) {
  # distances from point i to every point in XY
  pD <- pointDistance(XY, XY[i, ], lonlat = FALSE)
  pDbin <- table(cut(pD, breaks = seq(0, 8.6, by = 0.2), labels = 1:43))
  df <- df + as.vector(pDbin)            # accumulate the counts per bin
}
It is working when I try this with e.g. the first 50 values of XY.
But when I use it on the whole XY matrix, it takes too much time. (Sometimes this XY matrix contains 10,000 xy-coordinates.)
Does anyone have an idea how to do it faster?
I don't know if this will work fast or not, but I recommend you try the following.
Let's say you have a data frame with a value of 0 or 1 in each cell. To find the coordinates, all you have to do is:
cord_matrix <- which(dataframe == 1, arr.ind = TRUE)
Now, you get the coordinate matrix with row index and column index.
To compute the Euclidean distances, use the dist() function. It looks like this:
dist_vector <- dist(cord_matrix)
It returns the lower triangle of the distance matrix (as a "dist" object), which can be transformed into a vector or a full symmetric matrix. Now all you have to do is compute the bins according to your requirements.
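For the binning step, a minimal sketch (assuming the 43 bins of width 0.2 from the question) could look like this:

# Count how many pairwise distances fall into each of the 43 bins.
bin_edges <- seq(0, 8.6, by = 0.2)
bin_counts <- table(cut(as.vector(dist_vector), breaks = bin_edges, labels = 1:43))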
Let me know if this fits within your memory constraints.

scale along a vector in new coordinate system

I want to use V1 to scale something in the second coordinate system, but I don't know how to calculate V1'. The first coordinate system is mapped to the second by the transform T.
What should I do?
Thanks.
If I understand your question properly,
V1' = V1 * T^-1
where T^-1 is the inverse of the transform matrix T.
Regarding multiplication order: there are both "left multiply" and "right multiply" conventions, which use row and column vectors respectively, with the transformation matrices transposed relative to each other. For example, DirectX and OpenGL use different conventions. I don't know which one cocos uses.
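As a minimal numeric sketch in R, using the column-vector convention and a made-up 2x2 transform (the original post gives no concrete values):

# Map a vector back through the inverse of a transform.
Tr <- matrix(c(2, 0, 0, 2), nrow = 2)   # assumed example transform: uniform scale by 2
V1 <- c(1, 3)                           # vector expressed in the first coordinate system
V1prime <- solve(Tr) %*% V1             # solve(Tr) is the matrix inverse of Tr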

In R: sort the maximum dissimilarity between rows in a matrix

I have a matrix with 100 rows and 10 columns. I want to compare the dissimilarity between rows, sort them, and then select the 10 most dissimilar rows. Which method can I use?
set.seed(123)
mat <- matrix(runif(100 * 10), nrow = 100, ncol = 10)
My initial method is to calculate the similarity between pairs of rows (e.g. the Tanimoto coefficient or others: http://en.wikipedia.org/wiki/Jaccard_index), take dissimilarity = 1 - similarity, and then compare the dissimilarity values. Finally I would sort all dissimilarity values and select the 10 largest. But the result would be a 100 x 100 matrix, so a more efficient method may be needed when there is a large number of rows. This is just my thought, and it may not be right, so I need help.
[update]
After looking through the literature, I found the following definition of the maximum dissimilarity method:
Maximum dissimilarity method: it begins by randomly choosing a data record as the first cluster center. The record maximally distant from the first point is selected as the next cluster center. The record maximally distant from both current points is selected after that. The process repeats itself until there is a sufficient number of cluster centers.
Here in my question, the sufficient number should be 10.
Thanks.
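For reference, a minimal sketch of that procedure, assuming Euclidean distance and a random starting row (both are choices not fixed by the definition above):

set.seed(123)
mat <- matrix(runif(100 * 10), nrow = 100, ncol = 10)

D <- as.matrix(dist(mat))            # all pairwise Euclidean distances
selected <- sample(nrow(mat), 1)     # first center, chosen at random
while (length(selected) < 10) {
  # distance from each row to its nearest already-selected center
  min_d <- apply(D[, selected, drop = FALSE], 1, min)
  min_d[selected] <- -Inf            # never re-select a chosen row
  selected <- c(selected, which.max(min_d))
}
most.dissimilar <- mat[selected, ]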
First of all, the Jaccard index is not right for you. From the Wikipedia page:
The Jaccard coefficient measures similarity between finite sample sets...
Your matrix has samples of floats, so you have a different problem (note that the Index in question is defined in terms of intersections; that should be a red flag right there :-).
So, you have to decide what you mean by dissimilarity. One natural interpretation would be to say row A is more dissimilar from the data set than row B if it has a greater Euclidean distance to the center of mass of the data set. You can think of the center of mass of the data set as the vector you get by taking the mean of each of the columns and putting them together (apply(mat, 2, mean)).
With this, you can take the distance of each row to that central vector, and then get an ordering on those distances. From that you can work back to the rows you desire from the original matrix.
All together:
center <- apply(mat, 2, mean)
# not quite the distances, actually, but their squares; that works fine for us,
# since the ordering will still be the same
dists <- apply(mat, 1, function(row) sum((row - center) ** 2))
# this gives us the row indices in order of least to greatest dissimilarity
dist.order <- order(dists)
# now we just grab the 10 most dissimilar of those
most.dissimilar.ids <- dist.order[91:100]
# and use them to get the corresponding rows of the matrix
most.dissimilar <- mat[most.dissimilar.ids, ]
If I were actually writing this, I probably would have compressed the last three lines into most.dissimilar <- mat[order(dists)[91:100], ], but hopefully having it broken up like this makes it a little easier to see what's going on.
Of course, if distance from the center of mass doesn't make sense as the best way of thinking of "dissimilarity" in your context, then you'll have to amend with something that does.

How to select multiple cells in a matrix and perform an operation on corresponding cells in another matrix of the same size?

I am trying to write an R script to do pollution routing in world rivers, and need some help on selecting matrix cell coordinates and applying these to other matrices of the same dimension.
My data: I have several matrices corresponding to hydrological parameters of world rivers on a half degree grid (360 rows, 720 columns). These matrices represent flow accumulation (how many cells flow into this cell), flow direction (which of the 8 surrounding cells does the load of certain cell flow to) and pollutant load.
My idea: compute pollutant load in each grid cell from the start to the end of a river. I can base this on flow accumulation (low to high). However, each river basin can have multiple cells with the same flow accumulation value.
The problem: I need to select all matrix cells for each value of flow accumulation (low to high), find their coordinates (row, column), and transfer the corresponding pollutant load to the correct adjacent cell using the flow direction matrix. I have tried various ways, but I cannot get the selection of the correct cell coordinates, and the application of these to another matrix, to work.
I will give an example of what I have tried, using two for loops on one single river basin. In this example, a flow direction value of 1 means that the pollutant load needs to be transferred to the adjacent cell to the right (row is the same, column +1):
BasinFlowAccumulation <- FlowAccumulation[Basin]
BasinFlowAccumulationMaximum <- max(BasinFlowAccumulation)
BasinFlowDirection <- FlowDirection[Basin]
BasinPollutant <- Pollutant[Basin]
for (b in 0:BasinFlowAccumulationMaximum) {
  # (row, column) coordinates of all cells with the current accumulation value
  cells.index <- which(BasinFlowAccumulation == b, arr.ind = TRUE)
  for (j in seq_len(nrow(cells.index))) {
    Row <- cells.index[j, 1]
    Column <- cells.index[j, 2]
    # flow direction 1: pass this cell's load to the adjacent cell on the right
    if (BasinFlowDirection[Row, Column] == 1) {
      BasinPollutant[Row, Column + 1] <- BasinPollutant[Row, Column + 1] + BasinPollutant[Row, Column]
    }
  }
}
Any advice would be greatly appreciated!

Make a matrix full-ranked?

How can I turn a regular matrix into a full-rank matrix in R? Is there an existing function for that?
I have a matrix that may have linearly dependent columns and I need to
pass it to a function that requires its argument to be a matrix with
full rank. Since linearly dependent columns are not of interest
anyway, I am looking for a function that removes such columns until
the matrix is full rank. There may be several solutions of course, but
any one of them should be fine.
Right now I am just constructing the matrix column by column and only adding a column if the resulting matrix is still full rank, but it feels like there should be a better way to do this.
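A minimal sketch of that column-by-column approach, using qr()$rank as the full-rank test (the function name make_full_rank is made up for illustration):

make_full_rank <- function(A) {
  B <- A[, 1, drop = FALSE]               # start from the first column
  for (j in seq_len(ncol(A))[-1]) {
    candidate <- cbind(B, A[, j])
    # keep the column only if it does not introduce a linear dependency
    if (qr(candidate)$rank == ncol(candidate)) B <- candidate
  }
  B
}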
Another approach is to minimize |y - Ax|^2 + c |x|^2, by tacking an identity matrix onto A and zeros onto y. The parameter c (a.k.a. λ) trades off fitting y - Ax against keeping |x| small. Then run a second fit with the r largest components of x, where r = rank(A) (or any number you please).
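A minimal sketch of that augmentation in R (A, y and lam are made-up example values; lam plays the role of c / λ above):

set.seed(1)
A <- matrix(rnorm(20), nrow = 5)        # example 5 x 4 matrix
y <- rnorm(5)
lam <- 0.1                              # regularization weight (lambda)

# append sqrt(lam) * I to A and zeros to y, then solve ordinary least squares;
# this minimizes |y - Ax|^2 + lam |x|^2
A_aug <- rbind(A, sqrt(lam) * diag(ncol(A)))
y_aug <- c(y, rep(0, ncol(A)))
x <- qr.solve(A_aug, y_aug)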
