permutations of choosing one element or other from two column matrix - r

Let's say we have a hypothetical complete schedule of potential outcomes from an experiment.
Y0<-c(10,15,20,20,10,15,15)
Y1<-c(15,15,30,15,20,15,30)
budgets<-matrix(data=c(Y0,Y1),nrow=7,ncol=2)
I would like to list all of the ways to choose two elements from Y1 and the remaining five from Y0. Ideally, this would look like an array of 21 lists, each with five elements labeled Y0 and two elements labeled Y1.
edit: These are matched pairs, so choosing Y0[1] removes Y1[1] from consideration.
Thanks in advance! I think there are many ways to approach this (sapply?) but would appreciate help on the details.

Here is a longer method; there is probably a more compact solution out there. It pairs every 5-element subset of Y0 with every 2-element subset of Y1, giving 21 * 21 rows:
# get within-group combinations as matrices (one combination per row)
grp0 <- t(combn(Y0, 5))
grp1 <- t(combn(Y1, 2))
# get all possible pairings of these rows
grpCombos <- expand.grid(1:nrow(grp0), 1:nrow(grp1))
# get all combinations as a matrix
allGroups <- cbind(grp0[grpCombos[,1],], grp1[grpCombos[,2],])

To get all the combinations of 2 elements from Y1 and the remaining 5 elements from Y0, choosing only one element from each position, try the following code:
cb <- as.data.frame(combn(1:7, 2))
sapply(cb, FUN = function(x) c(Y1[x], Y0[-x]))
Previous answer: if you instead want every way of choosing 2 of 7 within Y1 combined with every way of choosing 5 of 7 within Y0, the total number of combinations is 21 * 21.
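The question asks for a list of 21 labeled entries. A minimal sketch of that shape, assuming the matched-pairs reading from the edit (the name `assignments` is hypothetical), passes a function to combn() and keeps the result as a list:

```r
Y0 <- c(10, 15, 20, 20, 10, 15, 15)
Y1 <- c(15, 15, 30, 15, 20, 15, 30)

# For every pair of positions, take Y1 at those positions and Y0 at the
# remaining five; simplify = FALSE keeps one list entry per combination.
assignments <- combn(7, 2, FUN = function(idx) {
  list(Y1 = Y1[idx], Y0 = Y0[-idx])
}, simplify = FALSE)

length(assignments)   # one entry per way of choosing 2 positions out of 7
assignments[[1]]      # positions 1 and 2 come from Y1, the rest from Y0
```

Each entry holds two values labeled Y1 and five labeled Y0, and no position contributes to both.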

Related

Adding a row upwards in R

I'm using a recursive algorithm to generate samples and include them in a list. For that I was using rbind (since I don't know the final number of rows, I can't just declare it and use list[i, ] to assign the values).
The problem is that I start sampling from the last value to the first, so my list is upside down.
Is there a way to use rbind to add a row upwards instead of downwards?
Example for illustration:
Suppose you have x1 = c(1, 2) and x2 = c(3, 4)
if you do: rbind(x1, x2) you get:
1 2
3 4
But what I need is:
3 4
1 2
Remember that I can't just do rbind(x2, x1), because I'm sampling backwards, so I don't have all the values before binding.
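One possible approach, sketched here with the x1 and x2 from the example (the names `acc`, `rows` and `m` are hypothetical): rbind() stacks its arguments in the order given, so putting the new row first and the accumulated matrix second adds the row on top. An alternative is to collect the rows in a list and reverse it once at the end.

```r
x1 <- c(1, 2)
x2 <- c(3, 4)

# Option 1: prepend each newly sampled row to the accumulated matrix.
acc <- rbind(x1)        # first row produced (ends up at the bottom)
acc <- rbind(x2, acc)   # later row goes on top
acc

# Option 2: append rows to a list as they arrive, then reverse and bind once.
rows <- list()
rows[[length(rows) + 1]] <- x1
rows[[length(rows) + 1]] <- x2
m <- do.call(rbind, rev(rows))
m
```

The second option avoids re-copying the matrix on every step, which may matter when many rows are generated.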

Find n closest non-NA values to position t in vector

This is probably a simple question for those experienced in R, but it is something that I (a novice) am struggling with...
I have two examples of vectors that are common to the problem I am trying to solve, A and B:
A <- c(1,3,NA,3,NA,4,NA,1,7,NA,2,NA,9,9,10)
B <- c(1,3,NA,NA,NA,NA,NA,NA,NA,NA,2,NA,9)
#and three scalars
R <- 4
t <- 5
N <- 3
There is a fourth scalar, n, where 0<=n<=N. In general, N <= R.
I want to find the n closest non-NA values to t such that they fall within a radius R centered on t. That is, the search radius R comprises R+1 values. For example, for A the search-radius sequence is (3,NA,3,NA,4,NA,1), where the value at t is NA, the middle value of the sequence.
The expected answer can be one of two results for A:
answerA1 <- c(3,4,1)
OR
answerA2 <- c(3,4,3)
The expected answer for B:
answerB <- c(1,3)
How would I accomplish this task in the most time- and space-efficient manner? One liners, loops, etc. are welcome. If I have to choose a preference, it is for speed!
Thanks in advance!
Note:
For this case, I understand that the third closest non-NA value may fall on either the right or the left of t (as shown by the two possible answers above). I do not have a preference for which side it falls on, but if there is a way to leave that to random chance, that would be ideal (again, it is not a requirement).
A relatively short solution is:
orderedA <- A[order(abs(seq_len(length(A)) - t))][seq_len(R*2)]
n_obj <- min(sum(!is.na(orderedA)), N, length(na.omit(orderedA)))
res <- na.omit(orderedA)[seq_len(n_obj)]
res
#[1] 3 4 3
Breaking this down a little more, the steps are:
1. Order A by the absolute distance from the position of interest, t.
   Code is: A[order(abs(seq_len(length(A)) - t))]
2. Subset to the first R*2 elements (so this will get the elements on either side of t within R).
   Code is: [seq_len(R*2)]
3. Get the number of elements to keep: min(N, # of non-NA, len of non-NA).
   Code is: min(sum(!is.na(orderedA)), N, length(na.omit(orderedA)))
4. Drop NA.
   Code is: na.omit()
5. Take the first elements, as many as determined in step 3 (whichever is smaller).
   Code is: [seq_len(n_obj)]
Something like this?
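thingfinder <- function(A,R,t,n) {
  # walk outwards from t: positions t, t-1, ... on the left and t, t+1, ... on the right
  left <- A[t:(t-R-1)]
  right <- A[t:(t+R+1)]
  leftrightmat <- cbind(left,right)
  # interleave left and right so values are ordered by distance from t (left first)
  raw_ans <- as.vector(t(leftrightmat))
  ans <- raw_ans[!is.na(raw_ans)]
  return(ans[1:n])
}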
thingfinder(A=c(1,3,NA,3,NA,4,NA,1,7,NA,2,NA,9,9,10), R=3, t=5, n=3)
## [1] 3 4 3
This would give priority to the left side, of course.
In case it is helpful to others, @Mike H. also provided me with a solution to return the index positions associated with the desired vector elements res:
A <- setNames(A, seq_len(length(A)))
orderedA <- A[order(abs(seq_len(length(A)) - t))][seq_len(R*2)]
n_obj <- min(sum(!is.na(orderedA)), N, length(na.omit(orderedA)))
res <- na.omit(orderedA)[seq_len(n_obj)]
positions <- as.numeric(names(res))
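For the optional random tie-breaking mentioned in the question's note, a minimal sketch follows. It assumes the search window is positions t-R through t+R, clipped to the vector, which is one reading of the question. The function name `closest_non_na` is hypothetical. Ties in distance are broken by a random secondary sort key.

```r
closest_non_na <- function(x, t, R, N) {
  # candidate positions inside the window, clipped to the vector bounds
  idx <- seq(max(1, t - R), min(length(x), t + R))
  idx <- idx[!is.na(x[idx])]            # keep only non-NA positions
  d <- abs(idx - t)                     # distance of each candidate from t
  # sort by distance; equal distances are ordered by a random key
  ord <- order(d, runif(length(idx)))
  keep <- idx[ord][seq_len(min(N, length(idx)))]
  x[keep]
}

A <- c(1,3,NA,3,NA,4,NA,1,7,NA,2,NA,9,9,10)
B <- c(1,3,NA,NA,NA,NA,NA,NA,NA,NA,2,NA,9)
closest_non_na(A, t = 5, R = 4, N = 3)  # 3 and 4 in some order, then 3 or 1
closest_non_na(B, t = 5, R = 4, N = 3)  # only two non-NA values are in range
```

Returning `keep` instead of `x[keep]` would give the positions rather than the values.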

Averaging random elements of a vector in R

I have a list of, let's say, 4 items. I need to cut that list down to 2 by repeatedly taking two elements of the list and replacing them with their average.
This is the algorithm I came up with, but I do not know how to write it in R.
choose an x_i
choose an x_j not equal to x_i
find the average of x_i and x_j
choose a new x_(i+1) and x_(j+1) as long as they are not equal to x_i or x_j
for example:
x <- c(2,4,6,8)
y <- c((2+4)/2,(6+8)/2) or c((2+6)/2,(2+8)/2) or anything similar to that.
For the sake of closing this question as answered, the following does what is needed:
replicate(2, mean(sample(x, 2)))
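Each call to sample(x, 2) inside replicate() draws independently, so the two pairs can share an element. If the pairs must be disjoint, as the algorithm in the question describes, one possible sketch is to shuffle x once and average consecutive pairs. This assumes length(x) is even; the names `pairs` and `y` are hypothetical.

```r
x <- c(2, 4, 6, 8)

# Shuffle once, then lay the values out two per column:
# each column is one pair, and no element is used twice.
pairs <- matrix(sample(x), nrow = 2)
y <- colMeans(pairs)
y
```

Because every element is used exactly once, the averages always sum to sum(x) / 2.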

Convert a one column matrix to n x c matrix

I have a (nxc+n+c) by 1 matrix, and I want to drop the last n+c rows and convert the rest into an nxc matrix. Below is what I've tried, but it returns a matrix in which every element in a row is the same. I'm not sure why this is. Could someone help me out please?
tmp=x[1:n*c,]
Membership <- matrix(tmp, nrow=n, ncol=c)
You have a vector x of length n*c + n + c, yet the extraction in your code uses a comma (matrix-style indexing).
You should do tmp=x[1:(n*c)].
Note the importance of the parentheses: tmp=x[1:n*c] takes the range from 1 to n, multiplies it by c (giving a new set of indices), and then extracts based on those indices.
For example, you want to avoid:
(1:100)[1:5*5]
[1] 5 10 15 20 25
You can also avoid the index arithmetic altogether:
matrix(head(x, n*c), ncol=c)
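To see the whole thing end to end, here is a small sketch with hypothetical sizes n = 2 and c = 3, with x built as a one-column matrix as in the question. It compares the two index expressions and then builds the matrix:

```r
n <- 2
c <- 3                                   # variable name taken from the question
x <- matrix(seq_len(n * c + n + c), ncol = 1)

1:n * c      # ':' binds tighter than '*': (1:2) * 3, i.e. indices 3 and 6
1:(n * c)    # the intended indices 1 through 6

# Keep the first n*c rows and reshape; matrix() fills column by column.
Membership <- matrix(x[1:(n * c), ], nrow = n, ncol = c)
Membership
```

If the values are meant to run along rows rather than down columns, byrow = TRUE can be passed to matrix().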

What is the fastest way to lookup a large number of values using R?

I have a list of over 1,000,000 numbers. I have a lookup table that has a range of numbers and a category. For example, 0-200 is category A, 201-650 is category B (the ranges are not of equal length)
I need to simply iterate over the list of 1,000,000 numbers and get a list of the 1,000,000 corresponding categories.
EDIT:
For example, the first few elements of my list are - 100, 125.5, 807.5, 345.2, and it should return something like 1,1,8,4 as categories. The logic for the mapping is implemented in a function - categoryLookup(cd) and I'm using the following command to get the categories
cats <- sapply(list.cd, categoryLookup)
However, while this seems to be working quickly on lists of size up to 10000, it is taking a lot of time for the whole list.
What is the fastest way to do the same? Is there any form of indexing that can help speed up the process?
The numbers:
numbers <- sample(1:1000000)
groups:
groups <- sort(rep(letters, 40000))
lookup:
categories <- groups[numbers]
EDIT:
If you don't yet have the vector of "groups" you can create it first.
Assume you have data-frame with range info:
ranges <- data.frame(group=c("A","B","C"),
                     start=c(1,300001,600001),
                     end=c(300000,600000,1000000)
)
ranges
  group  start   end
1     A      1 3e+05
2     B 300001 6e+05
3     C 600001 1e+06
# if groups are sorted and don't overlap:
groups <- rep(ranges$group, (ranges$end-ranges$start)+1)
Then continue as before
categories <- groups[numbers]
EDIT: as @jbaums said, you have to add +1 to (ranges$end-ranges$start) in this case (already included in the example above). Also, in this case the starting coordinate should be 1, not 0.
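The index-based lookup above relies on the numbers being positive integers, whereas the question's examples include values such as 125.5. A hedged alternative sketch for that case uses base R's findInterval(), which does a vectorized search over sorted breakpoints. It assumes the ranges are sorted and non-overlapping; values below the first start, above the matching end, or in a gap between ranges are mapped to NA here, which is a design choice rather than anything stated in the question. The names `x`, `idx` and `cats` are hypothetical.

```r
ranges <- data.frame(group = c("A", "B", "C"),
                     start = c(1, 300001, 600001),
                     end   = c(300000, 600000, 1000000),
                     stringsAsFactors = FALSE)

x <- c(100, 125.5, 300001, 807000.5, 0.5, 2e6)

# Index of the last range whose start is <= x (0 if x is below every start).
idx <- findInterval(x, ranges$start)
idx[idx == 0] <- NA

cats <- ranges$group[idx]
# Discard matches where x lies beyond the end of the matched range.
cats[!is.na(idx) & x > ranges$end[idx]] <- NA
cats
```

This avoids both the per-element sapply() call and the need to materialize a vector with one entry per possible integer value.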
