transforming a matrix in R by - r

I want to make a new matrix B from an existing matrix A, where B has the same number of rows and columns as A and every position holds a ranking of the corresponding entry of A.
In particular, for the value x at location [i,j] in A, I want to find how many values in A are greater than x (that is, sum(A>x), which I can compute for a single given x, but not for every x), and then divide by the total number of observations*variables in the matrix A.
I think the apply function should be able to create matrix B as I wish, but I'm having trouble finding a way to apply "sum" at each position (i.e., sum(A>x)/# of positions in A).
I think I could use apply(A, c(1,2), FUN(X...)), but I do not know what function to use.
Thanks for any suggestions.

Short version: matrix((length(M) - rank(M))/length(M), nrow=nrow(M), ncol=ncol(M))
Long version:
length(M) will give you the number of elements in the matrix.
length(M) - rank(M) will give the number of elements greater than each element (exactly so when M has no tied values; with ties, the default rank() averages the tied ranks).
So you want (length(M) - rank(M)) / length(M) but formatted into a matrix like M, so
matrix((length(M) - rank(M))/length(M), nrow=nrow(M), ncol=ncol(M))
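As a quick sanity check, here is a minimal sketch comparing the rank-based approach with the element-wise sum(A > x) the question describes. The example matrix is made up, and ties.method = "max" is my own choice (not part of the answer above) so that tied values are counted exactly.

```r
# Made-up example matrix containing a tie (two 1s)
A <- matrix(c(3, 1, 4, 1, 5, 9), nrow = 2)
n <- length(A)

# rank(..., ties.method = "max") is the number of elements <= x,
# so n minus that rank is the number of elements strictly greater than x
B <- matrix((n - rank(A, ties.method = "max")) / n,
            nrow = nrow(A), ncol = ncol(A))

# Direct (slower) version of what the question asks for
B_check <- apply(A, c(1, 2), function(x) sum(A > x) / n)

all.equal(B, B_check)
```

With the default ties.method = "average", the two results would differ at the tied entries.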

Find n closest non-NA values to position t in vector

This is probably a simple question for those experienced in R, but it is something that I (a novice) am struggling with...
I have two examples of vectors that are common to the problem I am trying to solve, A and B:
A <- c(1,3,NA,3,NA,4,NA,1,7,NA,2,NA,9,9,10)
B <- c(1,3,NA,NA,NA,NA,NA,NA,NA,NA,2,NA,9)
#and three scalars
R <- 4
t <- 5
N <- 3
There is a fourth scalar, n, where 0<=n<=N. In general, N <= R.
I want to find the n closest non-NA values to t such that they fall within a radius R centered on t. I.e., the search radius R comprises R+1 values. For example A, the search radius sequence is (3,NA,3,NA,4,NA,1), where t=NA is the middle value in the search radius sequence.
The expected answer can be one of two results for A:
answerA1 <- c(3,4,1)
OR
answerA2 <- c(3,4,3)
The expected answer for B:
answerB <- c(1,3)
How would I accomplish this task in the most time- and space-efficient manner? One liners, loops, etc. are welcome. If I have to choose a preference, it is for speed!
Thanks in advance!
Note:
For this case, I understand that the third closest non-NA value may involve choosing a preference for the third value to fall on either the right or left of t (as shown by the two possible answers above). I do not have a preference for whether this value falls to the left or the right of t, but if there is a way to leave it to random chance (whether the third value falls to the right or the left), that would be ideal (but, again, it is not a requirement).
A relatively short solution is:
orderedA <- A[order(abs(seq_len(length(A)) - t))][seq_len(R*2)]
n_obj <- min(sum(!is.na(orderedA)), N, length(na.omit(orderedA)))
res <- na.omit(orderedA)[seq_len(n_obj)]
res
#[1] 3 4 3
Breaking this down a little more, the steps are:
1. Order A by the absolute distance from the position of interest, t.
Code is: A[order(abs(seq_len(length(A)) - t))]
2. Subset to the first R*2 elements (so this will get the elements on either side of t within R).
Code is: [seq_len(R*2)]
3. Get the first min(N, # of non-NA, len of non-NA) elements.
Code is: min(sum(!is.na(orderedA)), N, length(na.omit(orderedA)))
4. Drop NA.
Code is: na.omit()
5. Take the first elements determined in step 3 (whichever is smaller).
Code is: [seq_len(n_obj)]
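The question also asks whether equidistant neighbours could be chosen at random. A minimal sketch of one way to do that follows; the helper name nearest_non_na and its arguments are hypothetical (not from the answers here), and it assumes the window is every position within `radius` of `pos`.

```r
# Hypothetical helper: up to n non-NA values closest to index `pos`,
# restricted to positions within `radius`, ties in distance broken at random.
nearest_non_na <- function(x, pos, radius, n) {
  idx  <- seq_along(x)
  dist <- abs(idx - pos)
  # candidate positions: inside the window and not NA
  keep <- idx[dist <= radius & !is.na(x)]
  # sort by distance, then by a random key so equidistant positions are shuffled
  keep <- keep[order(dist[keep], runif(length(keep)))]
  x[head(keep, n)]
}

A <- c(1, 3, NA, 3, NA, 4, NA, 1, 7, NA, 2, NA, 9, 9, 10)
B <- c(1, 3, NA, NA, NA, NA, NA, NA, NA, NA, 2, NA, 9)

nearest_non_na(A, pos = 5, radius = 3, n = 3)  # third value is 3 or 1, at random
nearest_non_na(B, pos = 5, radius = 4, n = 3)  # only two non-NA values in range
```

Because head() never returns more elements than exist, fewer than n values come back when the window holds fewer non-NA entries.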
Something like this?
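thingfinder <- function(A, R, t, n) {
  # values from position t outward to the left, and outward to the right
  left <- A[t:(t-R-1)]
  right <- A[t:(t+R+1)]
  # interleave the two sides so nearer positions come first (left before right)
  leftrightmat <- cbind(left, right)
  raw_ans <- as.vector(t(leftrightmat))
  # drop NAs and keep the first n values
  ans <- raw_ans[!is.na(raw_ans)]
  return(ans[1:n])
}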
thingfinder(A=c(1,3,NA,3,NA,4,NA,1,7,NA,2,NA,9,9,10), R=3, t=5, n=3)
## [1] 3 4 3
This would give priority to the left side, of course.
In case it is helpful to others, @Mike H. also provided me with a solution to return the index positions associated with the desired vector elements res:
A <- setNames(A, seq_len(length(A)))
orderedA <- A[order(abs(seq_len(length(A)) - t))][seq_len(R*2)]
n_obj <- min(sum(!is.na(orderedA)), N, length(na.omit(orderedA)))
res <- na.omit(orderedA)[seq_len(n_obj)]
positions <- as.numeric(names(res))

Averaging random elements of a vector in R

I have a list of, say, 4 items. I need to cut that list down to 2 by taking any two elements of the list and finding their average.
This is the algorithm I came up with, but I do not know how to write it in R.
choose an x_i
choose an x_j not equal to x_i
find the average of x_i and x_j
choose a new x_(i+1) and x_(j+1) as long as they are not equal to x_i or x_j
for example:
x <- c(2,4,6,8)
y <- c((2+4)/2,(6+8)/2) or c((2+6)/2,(2+8)/2) or anything similar to that.
For the sake of closing this question as answered, we can use the following syntax to do what we need to do: replicate(2, mean(sample(x, 2)))
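Note that each call to sample(x, 2) inside replicate() draws independently, so the same element can end up in both pairs. If the pairs must not share elements, as the algorithm in the question suggests, one possible sketch (assuming x has an even length) is to shuffle once and average adjacent pairs:

```r
x <- c(2, 4, 6, 8)

# Shuffle all elements once, then fill a 2-row matrix column by column:
# each column is one random pair, and no element is used twice.
pairs <- matrix(sample(x), nrow = 2)
y <- colMeans(pairs)
y
```

Since every element is used exactly once, sum(y) always equals sum(x) / 2 here.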

Convert a one column matrix to n x c matrix

I have a (nxc+n+c) by 1 matrix, and I want to drop the last n+c rows and convert the rest into an nxc matrix. Below is what I've tried, but it returns a matrix in which every element within a row is the same. I'm not sure why this is. Could someone help me out please?
tmp=x[1:n*c,]
Membership <- matrix(tmp, nrow=n, ncol=c)
You have a vector x of length n*c + n + c, and in your extraction you put a comma in the index.
You should do tmp=x[1:(n*c)].
Notice the importance of the parentheses: if you do tmp=x[1:n*c], R takes the range from 1 to n, multiplies it by c (giving a new set of indices), and then extracts based on those indices.
For example, you want to avoid:
(1:100)[1:5*5]
[1] 5 10 15 20 25
You can also avoid the index arithmetic altogether:
matrix(head(x, n*c), ncol=c)
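To make the precedence issue concrete, here is a small sketch with made-up sizes. I use k in place of the question's c purely to avoid shadowing base::c; that renaming is my own. It also shows why the original attempt produced identical values within each row: only n elements were extracted, and matrix() recycled them across the columns.

```r
n <- 3
k <- 2                                         # stands in for the question's `c`
x <- matrix(seq_len(n * k + n + k), ncol = 1)  # an (n*k + n + k) by 1 matrix

1:n * k     # 2 4 6      : the colon binds tighter than *
1:(n * k)   # 1 2 3 4 5 6: the intended index range

# Original attempt: 3 values recycled into a 3 x 2 matrix
wrong <- matrix(x[1:n * k, ], nrow = n, ncol = k)

# Corrected extraction
Membership <- matrix(x[1:(n * k), ], nrow = n, ncol = k)
```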

Matlab or R: replace elements in matrix by values from another matrix in order

I have a problem to solve in either Matlab or R (preferably in R).
Imagine I have a vector A with 10 elements.
I have also a vector B with 30 elements, of which 10 have value 'x'.
Now, I want to replace all the 'x' in B by the corresponding values taken from A, in the order that is established in A. Once a value in A is taken, the next one is ready to be used when the next 'x' in B is found.
Note that the sizes of A and B are different, it's the number of 'x' cells that coincides with the size of A.
I have tried different ways to do it. Any suggestion on how to program this?
As long as the number of x entries in B matches the length of A, this will do what you want:
B[B=='x'] <- A
(It should be clear that this is the R solution.)
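A small illustration with made-up data, assuming B is a character vector with no NA entries:

```r
A <- c("1", "2", "3")
B <- c("a", "x", "b", "x", "x")

# The replacement only lines up if the counts match
stopifnot(sum(B == "x") == length(A))

# Logical indexing fills the "x" positions from left to right with A's values
B[B == "x"] <- A
B
# [1] "a" "1" "b" "2" "3"
```

If B could contain NA, indexing with which(B == "x") instead avoids NA subscripts in the assignment.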
MATLAB Solution
In MATLAB it's quite simple, use logical indexing:
B(B == 'x') = A;

A more generalized expand.grid function?

expand.grid(a,b,c) produces all the combinations of the values in a,b, and c in a matrix - essentially filling the volume of a three-dimensional cube. What I want is a way of getting slices or lines out of that cube (or higher dimensional structure) centred on the cube.
Suppose a, b and c are all odd-length vectors (so they have a centre), and in this case let's say they are of length 5. My hypothetical slice.grid function:
slice.grid(a,b,c,dimension=1)
returns a matrix of the coordinates of points along the three central lines. Almost equivalent to:
rbind(expand.grid(a[3],b,c[3]),
expand.grid(a,b[3],c[3]),
expand.grid(a[3],b[3],c))
almost, because it has the centre point repeated three times. Furthermore:
slice.grid(a,b,c,dimension=2)
should return a matrix equivalent to:
rbind(expand.grid(a,b,c[3]), expand.grid(a,b[3],c), expand.grid(a[3],b,c))
which is the three intersecting axis-aligned planes (with repeated points in the matrix at the intersections).
And then:
slice.grid(a,b,c,dimension=3)
is the same as expand.grid(a,b,c).
This isn't so bad with three parameters, but ideally I'd like to do this with N parameters passed to the function, e.g. slice.grid(a,b,c,d,e,f,dimension=4), although it's unlikely I'd ever want dimension greater than 3.
It could be done by doing expand.grid and then extracting those points that are required, but I'm not sure how to build that criterion. And I always have the feeling that this function exists tucked in some package somewhere...
[Edit] Right, I think I have the criterion figured out now. It's to do with how many times the central value appears in each row. If it's less than or equal to your dimension+1...
But generating the full matrix gets big quickly. It'll do for now.
Assuming a, b and c each have length 3 (and if there are 4 variables then they each have length 4, and so on), try this. It works by using 1:3 in place of each of a, b and c and then counting how many 3's are in each row. If there are four variables then it uses 1:4 and counts how many 4's are in each row, etc. It uses this count as the index to select the appropriate rows from expand.grid(a, b, c):
slice.expand <- function(..., dimension = 1) {
  L <- lapply(list(...), seq_along)
  n <- length(L)
  ix <- rowSums(do.call(expand.grid, L) == n) >= (n-dimension)
  expand.grid(...)[ix, ]
}
# test
a <- b <- c <- LETTERS[1:3]
slice.expand(a, b, c, dimension = 1)
slice.expand(a, b, c, dimension = 2)
slice.expand(a, b, c, dimension = 3)
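As I read it, the function above pivots on the index equal to the number of vectors, which is the last element when each vector has that length. A possible variant that pivots on the middle index of each vector instead, as the question describes, could look like the sketch below. The name slice_grid_centre is hypothetical, and it assumes every vector has odd length.

```r
# Hypothetical variant: keep grid rows where at least (n - dimension)
# coordinates sit at the centre index of their vector.
slice_grid_centre <- function(..., dimension = 1) {
  args <- list(...)
  idx <- lapply(args, seq_along)
  # centre index of each vector (assumes odd lengths)
  centre <- vapply(args, function(v) (length(v) + 1) / 2, numeric(1))
  grid_idx <- as.matrix(do.call(expand.grid, idx))
  # per row, count how many coordinates equal their vector's centre index
  n_centre <- rowSums(sweep(grid_idx, 2, centre, "=="))
  expand.grid(...)[n_centre >= length(args) - dimension, ]
}

a <- b <- cc <- 1:5
lines3  <- slice_grid_centre(a, b, cc, dimension = 1)  # three central lines
planes3 <- slice_grid_centre(a, b, cc, dimension = 2)  # three central planes
cube    <- slice_grid_centre(a, b, cc, dimension = 3)  # the full grid
```

Unlike the rbind of expand.grid calls in the question, the shared centre point and the plane intersections appear only once in the result, because rows are filtered from a single full grid. It still builds the full grid first, so it shares the memory concern the question raises.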
