A more generalized expand.grid function?

expand.grid(a,b,c) produces all the combinations of the values in a, b, and c in a data frame - essentially filling the volume of a three-dimensional cube. What I want is a way of getting slices or lines out of that cube (or higher-dimensional structure), centred on the cube's midpoint.
So, suppose a, b, c are all odd-length vectors (so they have a centre); in this case let's say they are of length 5. My hypothetical slice.grid function:
slice.grid(a,b,c,dimension=1)
returns a matrix of the coordinates of points along the three central lines. Almost equivalent to:
rbind(expand.grid(a[3],b,c[3]),
      expand.grid(a,b[3],c[3]),
      expand.grid(a[3],b[3],c))
almost, because it has the centre point repeated three times. Furthermore:
slice.grid(a,b,c,dimension=2)
should return a matrix equivalent to:
rbind(expand.grid(a,b,c[3]), expand.grid(a,b[3],c), expand.grid(a[3],b,c))
which is the three intersecting axis-aligned planes (with repeated points in the matrix at the intersections).
And then:
slice.grid(a,b,c,dimension=3)
is the same as expand.grid(a,b,c).
This isn't so bad with three parameters, but ideally I'd like to do this with N parameters passed to the function: slice.grid(a,b,c,d,e,f,dimension=4) - it's unlikely I'd ever want dimension greater than 3, though.
It could be done by doing expand.grid and then extracting those points that are required, but I'm not sure how to build that criterion. And I always have the feeling that this function exists tucked in some package somewhere...
[Edit] Right, I think I have the criterion figured out now - it's to do with how many times the central value appears in each row: a row belongs in the result if the number of its coordinates that sit away from the centre is at most dimension (equivalently, the central value appears at least N - dimension times)...
But generating the full matrix gets big quickly. It'll do for now.
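That criterion can be checked directly on the length-5 example: build the index grid, count how many coordinates of each row sit at the centre index, and keep rows with at most `dimension` off-centre coordinates. A sketch, with a, b, c as hypothetical length-5 vectors:

```r
a <- b <- c <- 1:5
vars <- list(a, b, c)
dimension <- 1
idx <- do.call(expand.grid, lapply(vars, seq_along))  # grid of index positions
mid <- sapply(vars, function(v) (length(v) + 1) / 2)  # centre index of each vector (3 here)
central <- rowSums(mapply(`==`, idx, mid))            # central coordinates per row
keep <- central >= length(vars) - dimension           # at most `dimension` off-centre
res <- do.call(expand.grid, vars)[keep, ]
nrow(res)  # 13: three central lines of 5 points sharing one centre (3*5 - 2)
```

For dimension = 2 the same mask with `length(vars) - 2` keeps the 61 rows lying on at least one central plane (125 - 4^3), and dimension = 3 keeps all 125.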

Assuming each argument is an odd-length vector (so each has a centre), try this. It works by replacing each of a, b and c with its vector of index positions, computing the centre index of each vector, and counting how many coordinates in each row of the index grid sit at the centre. A row belongs to the dimension-d slice when at least n - d of its n coordinates are central, and that test is used as the index to select the appropriate rows from expand.grid(a, b, c):
slice.expand <- function(..., dimension = 1) {
  args <- list(...)
  n <- length(args)
  mid <- vapply(args, function(v) (length(v) + 1) / 2, numeric(1)) # centre index of each (odd-length) vector
  idx <- do.call(expand.grid, lapply(args, seq_along))             # grid of index positions
  keep <- rowSums(mapply(`==`, idx, mid)) >= n - dimension         # at least n - dimension central coordinates
  expand.grid(...)[keep, ]
}
# test
a <- b <- c <- LETTERS[1:3]
slice.expand(a, b, c, dimension = 1)
slice.expand(a, b, c, dimension = 2)
slice.expand(a, b, c, dimension = 3)

Related

How to find the common members between different vectors by a majority voting rule

I am trying to find the common elements between multiple vectors. The situation is a little tricky: the common elements do not need to be exactly the same, but may differ by some error, say +/- 1, and a common element does not need to appear in all of the vectors - it only needs to appear in a majority of them. Besides, these vectors have different lengths. Here is an example,
a <- c(5,7,11,18,27,30);
b <- c(5,8,18,26);
c <- c(6,7,10,26,30)
5 in a, 5 in b, and 6 in c will be regarded as a common element, taken as the floor of their average, i.e. 5;
7 in a, 8 in b, and 7 in c will be regarded as a common element, taken as the floor of their average, i.e. 7;
11 in a and 10 in c will be regarded as a common element, taken as the floor of their average, i.e. 10.
The same rules apply to 18, 26, and 30.
Therefore, the final result that I should get is c(5,7,10,18,26,30)
You can do this via a moving-window approach. First combine all the values into a single sorted vector, then slide a window of size 'majority' (in this case, 2) along it. If the values in the window fall within the error tolerance (in this case, 1), add the lowest of them to the list of common values and remove every value in its range; otherwise drop the first element and keep going.
Here it is as a function:
#vec_list -> list of the vectors to compare
#error -> margin of error to tolerate
#returns a vector of the common numbers
get_common_values = function(vec_list, error){
  all = sort(unlist(vec_list)) #put all the numbers in one vector and sort them
  majority = ceiling(length(vec_list)/2) #the number of vectors that constitutes a majority
  common = c() #initialize an empty vector to store the common values
  #now loop over 'all' with a moving window, removing values as we go
  while(length(all) >= majority){ #keep going until we run out of elements
    vals_i = all[1:majority] #get the 'majority' values at the front
    if(diff(range(vals_i)) <= error){ #if the window's spread is <= the error, this is a common value
      new_val = min(vals_i) #take the minimum of the window
      common = c(common, new_val) #add it to our vector of common values
      all = all[!(all %in% new_val:(new_val+error))] #remove all values that fall within this range
    } else {
      all = all[-1] #if the spread is greater than the error, drop the first element and continue
    }
  }
  return(common)
}
We can use it like this:
a = c(5,7,11,18,27,30)
b = c(5,8,18,26)
c = c(6,7,10,26,30)
get_common_values(list(a,b,c),1)
This returns 5 7 10 18 26 30.
It also works with more than three vectors, and with different error tolerances:
set.seed(0)
a = sample(1:100,8)
b = sample(1:100,10)
c = sample(1:100,7)
d = sample(1:100,11)
e = sample(1:100,9)
get_common_values(list(a,b,c,d,e),2)
Note that this assumes that there are no duplicates within each vector, which seems to be a valid assumption based on the info you've provided.

transforming a matrix in R by

I want to make a new matrix B from a previous matrix A, where B has the same dimensions as A and every position corresponds to a ranking of A.
In particular, for the value x at a location [i,j] in A, I want to find how many values are greater than x (i.e. sum(A > x), which I can compute for a single x, but not for every position at once), followed by division by the total number of observations*variables in the matrix A.
I think using the apply function should be able to create matrix B as I wish, but I'm having trouble finding a way to apply "sum" at each position (i.e., sum(A > x)/number of positions in A).
I think I could use apply(A, c(1,2), FUN), but I do not know what function I can use.
Thanks for any suggestions.
Short version: matrix((length(M) - rank(M))/length(M), nrow=nrow(M), ncol=ncol(M))
Long version:
length(M) will give you the number of elements in the matrix.
length(M) - rank(M) will give the number of elements greater than each element.
So you want (length(M) - rank(M)) / length(M) but formatted into a matrix like M, so
matrix((length(M) - rank(M))/length(M), nrow=nrow(M), ncol=ncol(M))
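A quick worked check on a small matrix. (Note that rank() averages ties by default; if M may contain duplicate values, use rank(M, ties.method = "max") so that length(M) - rank(M) still counts strictly greater elements.)

```r
M <- matrix(c(10, 40, 20, 30), nrow = 2)  # column-major: M[1,1]=10, M[2,1]=40, M[1,2]=20, M[2,2]=30
B <- matrix((length(M) - rank(M)) / length(M), nrow = nrow(M), ncol = ncol(M))
B  # B[1,1] = 0.75, since 3 of the 4 entries exceed 10
```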

Find n closest non-NA values to position t in vector

This is probably a simple question for those experienced in R, but it is something that I (a novice) am struggling with...
I have two examples of vectors that are common to the problem I am trying to solve, A and B:
A <- c(1,3,NA,3,NA,4,NA,1,7,NA,2,NA,9,9,10)
B <- c(1,3,NA,NA,NA,NA,NA,NA,NA,NA,2,NA,9)
#and three scalars
R <- 4
t <- 5
N <- 3
There is a fourth scalar, n, where 0<=n<=N. In general, N <= R.
I want to find the n closest non-NA values to position t that fall within a radius R centred on t. For example A, the search sequence is (3,NA,3,NA,4,NA,1), where t is the NA in the middle of the sequence.
The expected answer can be one of two results for A:
answerA1 <- c(3,4,1)
OR
answerA2 <- c(3,4,3)
The expected answer for B:
answerB <- c(1,3)
How would I accomplish this task in the most time- and space-efficient manner? One-liners, loops, etc. are welcome. If I have to choose a preference, it is for speed!
Thanks in advance!
Note:
For this case, I understand that choosing the third closest non-NA value may involve a preference for it to fall on either the right or the left of t (as shown by the two possible answers above). I have no preference for which side it falls on but, if there is a way to leave it to random chance (whether the third value falls to the right or the left), that would be ideal (though, again, it is not a requirement).
A relatively short solution is:
orderedA <- A[order(abs(seq_along(A) - t))][seq_len(2*R)]
n_obj <- min(N, length(na.omit(orderedA)))
res <- na.omit(orderedA)[seq_len(n_obj)]
res
#[1] 3 4 3
Breaking this down a little more, the steps are:
Order A by the absolute distance from the position of interest, t.
Code is: A[order(abs(seq_along(A) - t))]
Subset to the first 2*R elements (this gets the elements on either side of t within R).
Code is: [seq_len(2*R)]
Cap the number of elements to return at min(N, number of non-NA elements).
Code is: min(N, length(na.omit(orderedA)))
Drop NA.
Code is: na.omit()
Take the first n_obj elements determined in step 3.
Code is: [seq_len(n_obj)]
Something like this?
thingfinder <- function(A, R, t, n) {
  # assumes t - R >= 1 and t + R <= length(A)
  left  <- A[(t-1):(t-R)]                     # values left of t, nearest first
  right <- A[(t+1):(t+R)]                     # values right of t, nearest first
  raw_ans <- as.vector(t(cbind(left, right))) # interleave the two sides by distance
  ans <- raw_ans[!is.na(raw_ans)]
  return(ans[1:n])
}
thingfinder(A=c(1,3,NA,3,NA,4,NA,1,7,NA,2,NA,9,9,10), R=3, t=5, n=3)
## [1] 3 4 3
This would give priority to the left side, of course.
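If the random left/right tie-breaking mentioned in the question matters, one option (a sketch, not part of either answer) is to jitter the distances by less than half a unit before ordering, so positions at equal distance from t are ranked in random order:

```r
A <- c(1, 3, NA, 3, NA, 4, NA, 1, 7, NA, 2, NA, 9, 9, 10)
R <- 4; t <- 5; N <- 3

d <- abs(seq_along(A) - t)                 # distance of each position from t
ord <- order(d + runif(length(A), 0, 0.4)) # jitter < 0.5 only randomizes ties
candidates <- A[ord][d[ord] <= R]          # keep positions within the radius
res <- head(na.omit(candidates), N)
res  # c(3, 4, 1) or c(3, 4, 3), depending on which side wins the tie
```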
In case it is helpful to others, @Mike H. also provided me with a solution to return the index positions associated with the desired vector elements res:
A <- setNames(A, seq_along(A))
orderedA <- A[order(abs(seq_along(A) - t))][seq_len(2*R)]
n_obj <- min(N, length(na.omit(orderedA)))
res <- na.omit(orderedA)[seq_len(n_obj)]
positions <- as.numeric(names(res))

How to match elements in 2 matrices and create a new matrix with specific values dependent on previous matrices in R?

I have two matrices, A and B both are 100*100. Both these matrices have positive and negative numbers. I want to create a matrix C with the same dimensions and the elements in this matrix are dependent on the elements at the same position in matrices A and B.
For example, if I have 22 and 1 at position [1,1] in matrices A and B respectively, I want to assign the value 1 at the same position in matrix C, because both A and B have values above 0. Similarly, every value in C depends on whether the values in matrices A AND B (at the same position) are above 0 or not. This is how my code looks at the moment,
C<-matrix(0,100,100) #create a matrix with same dimensions and populated with 0
C[A>0 && B>0] = 1
My matrix A satisfies the condition A>0 as there are some negative and some positive values, matrix B also satisfies the condition B>0 as some values are negative and some positive. However my code does not result in a matrix C with values of 0 and 1, even when I know there are some positions which meet the requirement of both matrix A and B being above 0. Instead the matrix C always contains 0 for some reason.
Could any one let me know what I am doing wrong and how do I correct it or perhaps a different way to achieve this? Thanks
Does C[A>0 & B>0] = 1 work? && returns a single value, but & is vectorized so it will work on each cell individually.
This may not be the most efficient way to do it, but it works.
C <- matrix(0, 100, 100)
for (i in seq_along(C))
  if (A[i] > 0 && B[i] > 0)
    C[i] <- 1
When you create a sequence along a matrix using seq_along(), it goes through all elements in column-major order. (Thereby avoiding a double for loop.) And since the elements of A, B, and C match up, this should give you what you want.
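Both answers can be checked against each other on small stand-in matrices (3x3 here instead of 100x100). The logical matrix A > 0 & B > 0 can also be coerced to 0/1 directly, skipping the zero-filled template:

```r
set.seed(42)
A <- matrix(rnorm(9), 3, 3)   # small stand-ins for the 100x100 matrices
B <- matrix(rnorm(9), 3, 3)

C <- matrix(0, nrow(A), ncol(A))
C[A > 0 & B > 0] <- 1         # vectorized &: compares every cell pair

C2 <- (A > 0 & B > 0) + 0     # logical matrix coerced to numeric 0/1
identical(C, C2)              # TRUE
```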

Distances between two lists of position vectors

I am trying to get a matrix that contains the distances between the points in two lists.
The vector of points contain the latitude and longitude, and the distance can be calculated between any two points using the function distCosine in the geosphere package.
> Points_a
lon lat
1 -77.69271 45.52428
2 -79.60968 43.82496
3 -79.30113 43.72304
> Points_b
lon lat
1 -77.67886 45.48214
2 -77.67886 45.48214
3 -77.67886 45.48214
4 -79.60874 43.82486
I would like to get a matrix out that would look like:
d_11 d_12 d_13
d_21 d_22 d_23
d_31 d_32 d_33
d_41 d_42 d_43
I am struggling to think of a way to generate the matrix without just looping over Points_a and Points_b and calculating each combination. Can anyone suggest a more elegant solution?
You can use this:
outer(seq(nrow(Points_a)),
      seq(nrow(Points_b)),
      Vectorize(function(i, j) distCosine(Points_a[i, ], Points_b[j, ])))
(based on a tip by @CarlWitthoft)
According to the desired output you post, maybe you'll want the transpose t() of this, or simply replace _a with _b above.
EDIT: some explanation:
seq(nrow(Points_x)): creates a sequence from 1 to the number of rows of Points_x;
distCosine(Points_a[i,], Points_b[j,]): expression to compute the distance between points given by row i of Points_a and row j of Points_b;
function(i, j): makes the above an unnamed function in two parameters;
Vectorize(...): ensure that, given inputs i and j of length greater than one, the unnamed function above is called only once for each element of the vectors (see this for more info);
outer(x, y, f): creates "expanded" vectors x and y such that all combinations of its elements are present, and calls f using this input (see link above). The result is then reassembled into a nice matrix.
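The same outer()/Vectorize() pattern works for any pairwise function; here is a self-contained sketch with a plain Euclidean distance standing in for distCosine. (For geographic distances specifically, geosphere also provides distm(), e.g. distm(Points_b, Points_a, fun = distCosine), which builds the whole matrix in one call.)

```r
Points_a <- data.frame(x = c(0, 3), y = c(0, 4))
Points_b <- data.frame(x = c(0, 3, 6), y = c(0, 4, 8))

euclid <- function(p, q) sqrt(sum((p - q)^2))  # stand-in for distCosine

D <- outer(seq(nrow(Points_a)),
           seq(nrow(Points_b)),
           Vectorize(function(i, j) euclid(Points_a[i, ], Points_b[j, ])))
D  # 2 x 3 matrix: D[i, j] is the distance from Points_a[i, ] to Points_b[j, ]
```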
