I would like to sample a given vector with different sets of probabilities without a loop. Is there a way to do this?
For example, in this code I would like to replace the loop with some sort of apply() structure, or anything else really.
a <- c(1,2,3)
p <- matrix(c(.1,.1,.8,.1,.8,.1,.8,.1,.1), nrow=3)
s <- matrix(ncol=5, nrow=3)
for(i in 1:nrow(p)){
s[i,] <- sample(a, size=5, replace=T, prob=p[i,])
}
Thanks for the help!
Use apply on p itself:
t(apply(p, 1, sample, x=a, size=5, replace=TRUE))
[,1] [,2] [,3] [,4] [,5]
[1,] 3 3 1 3 3
[2,] 2 2 1 1 2
[3,] 1 1 1 1 1
Edit: I had a functional::Curry in here until flodel pointed out that it wasn't necessary: apply forwards any extra named arguments to the function via ..., which has the same effect as currying.
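To see why the row of p ends up as prob: apply() passes each row as the first unnamed argument, and since x, size and replace are already supplied by name, R matches the row to the next free formal of sample(), which is prob. A minimal sketch, using a hypothetical stand-in show_match (not part of the original answer) that has the same formals as sample():

```r
# show_match is a hypothetical helper with the same formals as sample();
# it just reports which values arrived in the prob argument
show_match <- function(x, size, replace = FALSE, prob = NULL) {
  paste(prob, collapse = " ")
}

a <- c(1, 2, 3)
p <- matrix(c(.1, .1, .8, .1, .8, .1, .8, .1, .1), nrow = 3)

# same call shape as the answer, with sample swapped for the stand-in
matched <- apply(p, 1, show_match, x = a, size = 5, replace = TRUE)
matched
```

Each element of matched is one row of p, which is what sample() would have received as its prob.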
I have a n x 3 x m array, call it I. It contains 3 columns, n rows (say n=10), and m slices. I have a computation that must be done to replace the third column in each slice based on the other 2 columns in the slice.
I've written a function insertNewRows(I[,,simIndex]) that takes a given slice and replaces the third column. The following for-loop does what I want, but it's slow. Is there a way to speed this up by using one of the apply functions? I cannot figure out how to get them to work in the way I'd like.
for(simIndex in 1:m){
I[,, simIndex] = insertNewRows(I[,,simIndex])
}
I can provide more details on insertNewRows if needed, but the short version is that it takes a probability based on the columns I[,1:2, simIndex] of a given slice of the array, and generates a binomial RV based on the probability.
It seems like one of the apply functions should work just by using
I = apply(FUN = insertNewRows, MARGIN = c(1,2,3)) but that just produces gibberish..?
Thank you in advance!
IK
The question has not defined the input, the transformation or the result, so we can't really answer it, but here is an example of adding a row of ones to a[,,i] for each i, so maybe that will suggest how you could solve the problem yourself.
This is how you could use sapply, apply, plyr::aaply, reshaping using matrix/aperm and abind::abind.
# input array and function
a <- array(1:24, 2:4)
f <- function(x) rbind(x, 1) # append a row of 1's
aa <- array(sapply(1:dim(a)[3], function(i) f(a[,,i])), dim(a) + c(1,0,0))
aa2 <- array(apply(a, 3, f), dim(a) + c(1,0,0))
aa3 <- aperm(plyr::aaply(a, 3, f), c(2, 3, 1))
aa4 <- array(rbind(matrix(a, dim(a)[1]), 1), dim(a) + c(1,0,0))
aa5 <- abind::abind(a, array(1, dim(a)[2:3]), along = 1)
dimnames(aa3) <- dimnames(aa5) <- NULL
sapply(list(aa2, aa3, aa4, aa5), identical, aa)
## [1] TRUE TRUE TRUE TRUE
aa[,,1]
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
## [3,] 1 1 1
aa[,,2]
## [,1] [,2] [,3]
## [1,] 7 9 11
## [2,] 8 10 12
## [3,] 1 1 1
aa[,,3]
## [,1] [,2] [,3]
## [1,] 13 15 17
## [2,] 14 16 18
## [3,] 1 1 1
aa[,,4]
## [,1] [,2] [,3]
## [1,] 19 21 23
## [2,] 20 22 24
## [3,] 1 1 1
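Carried over to the shape described in the question, the apply version might look like the sketch below. The body of insertNewRows here is a hypothetical stand-in (the question never shows it), and the toy array and its sizes are assumptions made only so the example runs.

```r
set.seed(1)
n <- 10
m <- 4
# toy n x 3 x m array; columns 1:2 hold values in (0, 1)
I <- array(runif(n * 3 * m), c(n, 3, m))

# hypothetical stand-in for insertNewRows(): replace column 3 with a
# Bernoulli draw whose probability is the product of columns 1 and 2
insertNewRows <- function(slice) {
  slice[, 3] <- rbinom(nrow(slice), size = 1, prob = slice[, 1] * slice[, 2])
  slice
}

# apply over margin 3 hands each n x 3 slice to the function; the result
# is an (n*3) x m matrix, so restore the original dimensions afterwards
I2 <- array(apply(I, 3, insertNewRows), dim(I))
```

Note that apply is itself a loop over the slices, so this is mainly a tidier form of the for-loop and not a guaranteed speed-up.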
I'm trying to turn
df<-matrix(1:4,nrow = 2,ncol = 2)
df
[,1] [,2]
[1,] 1 3
[2,] 2 4
into
matrix(c(2,4,1,3),nrow = 1,ncol = 4)
2 4 1 3
so that I can run it through a for loop to rbind many entries.
I've been trying
cbind(df[row 2,],df[row 1,])
but it's not working. Is there a simple way to do this that won't require me to separate the matrix and then bring it back together?
Here is another way. Without the call to matrix it returns a vector, not a matrix.
df <- matrix(1:4, 2)
matrix(c(t(df[nrow(df):1,])), 1)
# [,1] [,2] [,3] [,4]
#[1,] 2 4 1 3
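A small sketch of the vector-versus-matrix distinction mentioned above. Since the stated goal is to rbind many entries, the plain vector may already be enough, because rbind() accepts vectors as rows (the names v, m1 and stacked are just for illustration):

```r
df <- matrix(1:4, 2)

v  <- c(t(df[nrow(df):1, ]))  # plain vector, no dim attribute
m1 <- matrix(v, 1)            # the same values as a 1 x 4 matrix

# rbind() treats each vector as one row, so no conversion is needed
stacked <- rbind(v, v)
dim(stacked)
```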
We can use
t(c(t(df[nrow(df):1, ])))
# [,1] [,2] [,3] [,4]
#[1,] 2 4 1 3
Turning a comment into an answer, a fourth option is
rev(t(m[, ncol(m):1]))
# [1] 2 4 1 3
with
m <- matrix(1:4, 2)
Maybe you can try the code below
r <- unlist(rev(data.frame(t(df))))
or
r <- do.call(c,rev(split(df,1:nrow(df))))
or
r <- unlist(rev(split(df,1:nrow(df))))
I would like to replace the loops in the following code.
Test<-function(j){
card<-5
#matrix s is to hold the results
s <- matrix(rep(0,j*card),nrow=j,ncol=card,byrow=TRUE)
# Loop1
for (k in 1:j)
{
#A vector should be drawn from another matrix,
#for simplicity, I define a vector "sol" to be modified in Loop2
sol<-rep(1,card)
#Given the vector "sol", select a vector position randomly
#for a given no. of times (i.e. steps), say 10.
step<-10
# Loop2 - Modify value in sol
for (i in seq_len(step))
{
#Draw a position
r<-sample(seq_len(card),1)
#Each position has specific probabilities for
#assignment of possible values, meaning p is related to
#the position.
#For simplicity, just define the probabilities by random here.
p<-runif(3,0,1) # just create p for each step
p<-p/sum(p) #
#Finally, draw a value for the selected position and
#value of sol within this loop is kept changing.
sol[r]<-sample(1:3,1,prob=p)
}
# keep the result in matrix s.
s[k,]<-sol }
return(s)}
Given an input vector
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 1
It is expected to output a matrix like this:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 3 2 3
[2,] 1 1 1 1 3
[3,] 2 2 2 2 3
[4,] 2 1 2 2 1
[5,] 1 1 3 1 1
Each step in Loop2 depends on a probability vector, which is then used to change a value in sol. I then tried to replace Loop2 with sapply as follows:
sapply(seq_len(steps), function(x){
r<-runif(seq_len(card),1)
sol[r]<-sample(1:3,1,prob=p) #Try to modify value in sol
})
s[k,]<-sol #Actually, no change in sol.
However, the values in sol have not changed; they are still all 1s, i.e. 1,1,1,1,1.
How can Loop2 be replaced by a function from the apply family, or by some other function?
Thank you.
If I understand correctly what you're trying to achieve, you don't need apply() functions for this:
Test <- function(j) {
card <- 5
p<-runif(3,0,1)
p<-p/sum(p)
out <- matrix(sample(1:3, j*card, replace=T, prob=p), ncol=card, nrow=j)
return(out)
}
Test(5)
[,1] [,2] [,3] [,4] [,5]
[1,] 2 2 2 1 1
[2,] 1 2 3 2 2
[3,] 2 3 1 1 2
[4,] 1 2 1 2 1
[5,] 2 1 1 2 2
In order to refactor this function, notice that all the r <- sample(card,1) are independent draws from the multinomial distribution. This can be pulled out of the loop.
The second thing to note is that the conditional distribution of s[i,j] given r is 1 if the multinomial draw is zero, otherwise it is sample(3,1,prob=runif(3)). (The distribution does not change if a cell is selected repeatedly).
Put those two facts together, and we have this:
Test2 <- function(j,card=5,step=10) {
r <- t(rmultinom(j,step,rep(1,card)))
s <- apply(r, 1:2, function(x) if(x > 0) sample(3,1,prob=runif(3)) else 1)
return(s)
}
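As a quick sanity check on the reasoning above: a cell keeps its initial value of 1 untouched only if none of the steps selects it. Assuming each step picks a position uniformly at random, as in the original loop, that probability can be computed directly (p_untouched is a name introduced here for illustration):

```r
card <- 5    # number of positions, as in the question
step <- 10   # number of draws per row, as in the question

# probability that a given position is never selected in `step` draws
p_untouched <- (1 - 1 / card)^step
p_untouched
```

So roughly one cell in ten is expected to stay at 1 without ever being resampled, which is the "else 1" branch in Test2.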
What about this:
test2 <- function(j) {
card <- 5
# Create a matrix where each of the j*card rows is a p as defined in your original function.
p <- matrix(runif(3*j*card), ncol=3)
p <- t(apply(p, 1, function(x) x/sum(x)))
# For each row of p, draw a single value at random
draws <- apply(p, 1, function(x) sample(1:3, 1, prob=x))
# Format the output as a j*card matrix
out <- matrix(draws, ncol=card, byrow=TRUE)
return(out)
}
If test2() does what you want, it's roughly 300 times faster than Test() on my machine.
Sorry, people, I can't see the forest for the trees. I searched a lot but couldn't find a solution. I want, e.g., the mean for every unit (potentially the rowMeans) of a subset of variables in a matrix (or potentially a dataframe) in R. I would like to select the columns using an indexing vector as in tapply, which I called a1 in the example below.
> set.seed(23958)
> (dat <- matrix(sample(0:3, 10, replace = TRUE), ncol = 5))
[,1] [,2] [,3] [,4] [,5]
[1,] 2 3 0 2 1
[2,] 2 1 1 2 1
> set.seed(6112)
> (a1 <- sample(1:2, 5, replace = TRUE))
[1] 1 1 2 2 1
The solution in this example should look like this, but of course I would like to do it in a more comprehensive way. I was thinking I should use a function from the apply family, but I could not find out which one.
> cbind(rowMeans(dat[, a1 == 1]), rowMeans(dat[, a1 == 2]))
[,1] [,2]
[1,] 2.000000 1.0
[2,] 1.333333 1.5
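You can still use tapply here (cbind keeps one row per unit and one column per group, as in the expected output; drop=FALSE guards against groups with a single column):
do.call(cbind,
tapply(seq_len(ncol(dat)),a1,
function(i)rowMeans(dat[,i,drop=FALSE])))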
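A closely related sketch that uses split() with sapply() in place of tapply() (a swapped-in variant, not the answer's own code). Here dat and a1 are typed in from the question's printed output so that the example does not depend on the random number generator:

```r
# values copied from the question's printed dat and a1
dat <- matrix(c(2, 2, 3, 1, 0, 1, 2, 2, 1, 1), ncol = 5)
a1  <- c(1, 1, 2, 2, 1)

# split the column indices by group, then take row means per group;
# drop = FALSE keeps a matrix even when a group has a single column
res <- sapply(split(seq_len(ncol(dat)), a1),
              function(i) rowMeans(dat[, i, drop = FALSE]))
res
```

The result has one row per unit and one column per group, with the group labels as column names.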
If you transpose your data, you can use by:
t(do.call(rbind,by(t(dat),a1,colMeans)))
1 2
V1 2.000000 1.0
V2 1.333333 1.5
You could also use the aggregate function:
t(aggregate(t(dat), list(a1), mean))
I have a matrix of indices I where some of the indices are repeated. I put an example below.
I have another matrix A with dimensions compatible with the indices and initiated to 0 everywhere. I would like to do something like
A[I] += 1
I face two issues:
A[I] = A[I] + 1 is too inefficient
matrix I has redundant indices. For example rows 2 & 6 are identical and I would like to obtain A[1,2] = 2
A partial answer would be to create a 3 columns matrix with the two first columns being the product of unique(I) and the third column with the counts, but I don't see any solution for that either. Any pointer or help would be greatly appreciated!
> I is:
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 3
[4,] 1 4
[5,] 1 1
[6,] 1 2
[7,] 1 3
This may be quickest using sparse matrix methods (see the Matrix package and others).
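A minimal sketch of the sparse route, assuming the Matrix package (a recommended package that ships with standard R installations) and assuming a 2 x 5 target matrix, since the question does not state the dimensions of A. When a sparse matrix is built from (i, j, x) triplets, sparseMatrix() by default adds up the x values of duplicated index pairs, which is exactly the increment-with-repeats behaviour asked for:

```r
library(Matrix)

# index pairs from the question: rows 2 and 6 are both (1, 2), etc.
I <- cbind(1, c(1:4, 1:3))

# one unit of weight per index pair; duplicated (i, j) pairs are summed
S <- sparseMatrix(i = I[, 1], j = I[, 2], x = rep(1, nrow(I)),
                  dims = c(2, 5))

# dense copy, only needed if the rest of the code expects a base matrix
A <- as.matrix(S)
A
```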
Using standard matrices, you could collapse the identical rows using the xtabs function and then use matrix assignment (edited based on comment):
I <- cbind(1, c(1:4,1:3))
tmp <- as.data.frame(xtabs( ~I[,1]+I[,2] ))
A <- matrix(0, nrow=5, ncol=5)
tmp2 <- as.matrix(tmp[,1:2])
tmp3 <- as.numeric(tmp2)
dim(tmp3) <- dim(tmp2)
A[ tmp3 ] <- tmp[,3]
A
You could probably make it a little quicker by pulling the core functionality out of as.data.frame.table rather than converting to data frame and back again.
Here is another version that may be more efficient. It will overwrite some 0's with other 0's computed by xtabs:
I <- cbind(1:5,1:5)
A <- matrix(0, 5, 5)
tmp <- xtabs( ~I[,1]+I[,2] ) # rows of tmp follow I[,1] (row index), columns follow I[,2]
A[ as.numeric(rownames(tmp)), as.numeric(colnames(tmp)) ] <- c(tmp)
A
If the A matrix has dimnames and the I matrix holds the names instead of the indexes, then this latter version will also work (just remove the as.numeric calls).
Here you go:
## Reproducible versions of your A and I objects
A <- matrix(0, nrow=2, ncol=5)
## For computations that follow, you'll be better off having this as a data.frame
## (Just use `I <- as.data.frame(I)` to convert a matrix object I).
I <- read.table(text=" 1 1
1 2
1 3
1 4
1 1
1 2
1 3", header=FALSE)
## Create data.frame with number of times each matrix element should
## be incremented
I$count <- ave(I[,1], I[,1], I[,2], FUN=length)
I <- unique(I)
## Replace desired elements, using a two column matrix (the "third form of
## indexing" mentioned in the "Matrices and arrays" section of ?"[").
A[as.matrix(I[1:2])] <- I[[3]]
A
# [,1] [,2] [,3] [,4] [,5]
# [1,] 2 2 2 1 0
# [2,] 0 0 0 0 0