I have a three-dimensional array outmv. I split it along its third dimension with alply, because I want a list of two-dimensional matrices whose number of columns depends on v (see the mapply call) and whose number of rows is given by r.
I have two problems:
1. Can I keep alply from dropping the r dimension when r = 1? (My workaround with matrix() seems to work.)
2. If vend = 5, mapply mangles the desired output: for fun(1,5) the result is transposed, and for fun(2,5) it is even scrambled. I would like to get length(v) matrices, each with r rows, for further steps.
library(plyr)
fun <- function(r,vend){
m <- matrix(c(1:20),nrow = r, byrow = TRUE)
v <- c(5:vend)
outmv <- outer(m,v) # dim(outmv) = c(dim(m),length(v))
#outmv_dropped_dim <- alply(outmv,3,function(x) x)
#if r = 1, unfortunately, alply drops a dimension automatically,
#therefore, I cannot use function(x) = x, but
outmv_kept_dim <- alply(outmv,3,function(x) matrix(x,nrow = r,
ncol = ncol(m)))
mapply(function(x,y) {x[,c(1:y), drop = FALSE]},
x = outmv_kept_dim,
y = v)
}
#11 > vend > 5 works as desired
fun(2,6) #list elements have desired dimension
fun(1,6) #list elements have desired dimension
#vend = 5 does not work as desired
fun(1,5) #result is transposed
fun(2,5) #result is transposed and mixed and therefore worse
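One possible explanation, offered as a sketch rather than a definitive answer: mapply() defaults to SIMPLIFY = TRUE, so when v has length one the single result matrix is collapsed into one column, which would look transposed for r = 1 and scrambled for r = 2. The version below (fun2 is a hypothetical name) uses base R only, slicing the array by index instead of calling alply, and returns a list in every case.

```r
# Sketch: same computation as fun(), but always returning a list.
# fun2 is a hypothetical name; plyr is not needed here.
fun2 <- function(r, vend) {
  m <- matrix(1:20, nrow = r, byrow = TRUE)
  v <- 5:vend
  outmv <- outer(m, v)              # dim(outmv) is c(dim(m), length(v))
  lapply(seq_along(v), function(k) {
    # matrix() restores the row dimension that `[` drops when r = 1
    slice <- matrix(outmv[, , k], nrow = r)
    slice[, seq_len(v[k]), drop = FALSE]
  })
}

res <- fun2(2, 5)
dim(res[[1]])   # 2 rows, 5 columns
```

Keeping the alply-based code and passing SIMPLIFY = FALSE to mapply should have the same effect, since mapply then returns a plain list.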
How can I use the base R function quantile() separately across the elements of L that are named EFL and ESL?
Note: this is a toy example, L could contain any number of similarly named elements.
foo <- function(X) {
X <- as.matrix(X)
tab <- table(row(X), factor(X, levels = sort(unique(as.vector(X)))))
w <- diag(ncol(tab))
rosum <- rowSums(tab)
obs_oc <- tab * (t(w %*% t(tab)) - 1)
obs_c <- colSums(obs_oc)
max_oc <- tab * (rosum - 1)
max_c <- colSums(max_oc)
SA <- obs_c / max_c
h <- names(SA)
h[is.na(h)] <- "NA"
setNames(SA, h)
}
DAT <- read.csv("https://raw.githubusercontent.com/rnorouzian/m/master/X.csv", row.names = 1)
L <- replicate(50, foo(DAT[sample(1:nrow(DAT), replace = TRUE),]), simplify = FALSE)
# How to use `quantile()` separately across all similarly named elements (e.g., EFL, ESL) in `L[[i]]`, i = 1, ..., 50
# quantile(all EFL elements across `L`)
# quantile(all ESL elements across `L`)
My previous solution used do.call to rbind the list elements into a matrix and then computed the quantiles over each column of the resulting data.frame.
sapply(as.data.frame(do.call(rbind, L)), quantile)
However, when an element is missing from one of the vectors, that approach does not take it into account. To line the values up correctly you need to fill in the missing entries. I used data.table's rbindlist (you could also use plyr::rbind.fill) with fill=TRUE to fill the missing values with NA. It requires each item to be a data.frame, data.table, or list, so I converted each one to a data.frame; before doing so you need to transpose (t()) the vector so that each named element becomes a column. It could be written in a single line, but it is easier to read what is happening in multiple lines.
L2 = lapply(L, function(x){as.data.frame(t(x))})
df = data.table::rbindlist(L2, fill=TRUE) # or plyr::rbind.fill(L2)
sapply(df, quantile, na.rm = TRUE)
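Since the question asks for base R, the same idea can be sketched without extra packages by collecting values by name. The toy list below is made up for illustration (the second vector deliberately lacks ESL); it stands in for the bootstrap list L from the question.

```r
# Toy stand-in for L: named numeric vectors, one of them missing "ESL"
L_toy <- list(c(EFL = 0.1, ESL = 0.2),
              c(EFL = 0.3),
              c(EFL = 0.5, ESL = 0.4))

# All element names that occur anywhere in the list
nms <- unique(unlist(lapply(L_toy, names)))

# For each name, gather its values across the list (NA where absent)
# and compute the quantiles, ignoring the NAs
res <- sapply(nms, function(n) {
  quantile(unlist(lapply(L_toy, `[`, n)), na.rm = TRUE)
})
res
```

The result is a matrix with one column per name and one row per quantile, the same layout that sapply(df, quantile, na.rm = TRUE) produces.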
You can also use purrr::transpose:
Lt <- purrr::transpose(L)
quantile(unlist(Lt$EFL),.8)
quantile(unlist(Lt$ESL),.8)
I have the following problem.
I have multiple subarrays (say 2) that I have populated with character labels (1, 2, 3, 4, 5). My algorithm selects labels at random based on occurrence probabilities.
How can I get R to instead select labels 1:3 for subarray 1 and 4:5 for subarray 2, say, without using subsetting (i.e., [])? That is, I want a random subset of labels to be selected for each subarray, rather than assigning labels to each subarray manually with [].
I know sample() should help.
Using subsetting (which I don't want) one would do
x <- 1:5
sample(x[1:3], size, prob = probs[1:3])
but this assigns labels 1:3 to ALL subarrays.
Would
sample(sample(x), size, replace = TRUE, prob = probs)
work?
Any ideas? Please let me know if this is unclear.
Here is a small example, which selects labels from 1:5 for each of 10 subarrays.
set.seed(1)
N <- 10
K <- 2
Hstar <- 5
probs <- rep(1/Hstar, Hstar)
perms <- 5
## Set up container(s) to hold the identity of each individual from each permutation ##
num.specs <- ceiling(N / K)
## Create an ID for each haplotype ##
haps <- 1:Hstar
## Assign individuals (N) to each subpopulation (K) ##
specs <- 1:num.specs
## Generate permutations, assume each permutation has N individuals, and sample those individuals' haplotypes from the probabilities ##
gen.perms <- function() {
sample(haps, size = num.specs, replace = TRUE, prob = probs) # I would like each subarray to contain a random subset of 1:5.
}
pop <- array(dim = c(perms, num.specs, K))
for (i in 1:K) {
pop[,, i] <- replicate(perms, gen.perms())
}
pop
Hopefully this helps.
I think what you actually want is something like this:
num.specs <- 3
haps[sample(seq(haps), size = num.specs, replace = FALSE)]
[1] 3 5 4
That is, a random subset of your vector haps?
Not quite what you want (returns list of matrices instead of 3D array) but this might help
lapply(split(1:5, cut(1:5, breaks=c(0, 2, 5))), function(i) matrix(sample(i, 25, replace=TRUE), ncol=5))
Use cut and split to partition your vector of character labels before sampling them. Here I split your character labels at the value 2. Also, rather than sampling 5 numbers 5 times, you can sample 25 numbers once, and convert to matrix.
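If a 3D array is needed after all, the cut/split idea can be combined with the loop from the question. This is only a sketch: the break points c(0, 3, 5) are an assumption that hard-codes labels 1:3 for the first subarray and 4:5 for the second, and labels are drawn with equal probability.

```r
set.seed(1)
perms     <- 5
num.specs <- 5
K         <- 2
haps      <- 1:5

# Partition the labels: 1:3 for subarray 1, 4:5 for subarray 2 (assumed split)
groups <- split(haps, cut(haps, breaks = c(0, 3, 5)))

pop <- array(dim = c(perms, num.specs, K))
for (i in seq_len(K)) {
  # Caution: sample(x, ...) treats a length-one numeric x as 1:x,
  # so this relies on every group holding at least two labels.
  pop[, , i] <- matrix(sample(groups[[i]], perms * num.specs, replace = TRUE),
                       nrow = perms)
}
pop
```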
I want to multiply and then sum the unique pairs of a vector, excluding pairs made of the same element, such that for c(1:4):
(1*2) + (1*3) + (1*4) + (2*3) + (2*4) + (3*4) == 35
The following code works for the example above:
x <- c(1:4)
bar <- NULL
for( i in 1:length(x)) { bar <- c( bar, i * c((i+1) : length(x)))}
sum(bar[ 1 : (length(bar) - 2)])
However, my actual data is a vector of rational numbers, not integers, so the (i+1) portion of the loop will not work. Is there a way to look at the next element of the set after i, e.g. j, so that I could write i * c((j : length(x))?
I understand that for loops are usually not the most efficient approach, but I could not think of how to accomplish this via apply etc. Examples of that would be welcome, too. Thanks for your help.
An alternative to a loop would be to use combn and multiply the combinations using the FUN argument. Then sum the result:
sum(combn(x = 1:4, m = 2, FUN = function(x) x[1] * x[2]))
# [1] 35
Even better, use prod as FUN, as suggested by @bgoldst:
sum(combn(x = 1:4, m = 2, FUN = prod))
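For long vectors, enumerating every pair can be avoided altogether. Because (sum of x)^2 equals the sum of squares plus twice the sum of all pairwise products, the pairwise sum can be computed directly; pair_sum is a hypothetical helper name used only for this sketch, and it works for non-integer values as well.

```r
# Sum of products over all unordered pairs, via
# (sum(x))^2 = sum(x^2) + 2 * (sum of pairwise products)
pair_sum <- function(x) (sum(x)^2 - sum(x^2)) / 2

pair_sum(1:4)   # 35

# Cross-check against the combn approach on non-integer data
x <- c(0.5, 1.25, 2, 3.75)
all.equal(pair_sum(x), sum(combn(x, 2, FUN = prod)))
```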
I am trying to create a function that takes a vector k and returns a matrix with dimensions length(distMat[1,]) by length(k). distMat is a huge matrix and indSpam is a long vector. In my case, length(distMat[1,]) is 2412. When I pass k as a vector of length one, I get a vector of length 2412. I want to be able to pass k as a vector of length two and get a 2412x2 matrix. I am trying to use a while loop to go through the elements of k, but it only returns a vector of length 2412. What am I doing wrong?
predNeighbor = function(k, distMat, indSpam){
counter = 1
while (counter<(length(k)+1))
{
preMatrix = apply(distMat, 1, order)
orderedMatrix = t(preMatrix)
truncate = orderedMatrix[,1:k[counter]]
checking = indSpam[truncate]
checking2 = matrix(checking, ncol = k[counter])
number = apply(checking2, 1, sum)
return(number[1:length(distMat[1,])] > (k[counter]/2))
counter = counter + 1
}
}
I am trying to create a function that will take in a vector k and return to me a matrix with dimensions length(distMat[1,]) by length(k)
Here's a function that does this.
foo <- function(k, distMat) {
return(matrix(0, nrow = length(distMat[1, ]), ncol = length(k)))
}
If you have other requirements, please describe them in words.
Based on your comment, I think I understand your goal better. You have a function that returns a vector and you want to save its output as rows or columns in a matrix. This is a pretty common task. Let's do a simple example where k starts out as 1:10, and say we want to add some noise to it with a function noise_and_rank() and see how the rank changes.
In the case where the input to the function is always the same, replicate() works very well. It will automatically put everything in a matrix:
k <- 1:10
noise_and_rank <- function(k) {
rank(k + runif(length(k), min = -2, max = 2))
}
results <- replicate(n = 8, expr = {noise_and_rank(k)})
In the case where you want to iterate, i.e., the output from one step is the input for the next, a for loop is good, and we just pre-allocate a matrix of 0's to fill in one column/row at a time:
k <- 1:10
n.sim <- 8
results <- matrix(0, nrow = length(k), ncol = n.sim)
results[, 1] <- k
for(i in 2:n.sim) {
results[, i] <- noise_and_rank(results[, i - 1])
}
What your original question seems to be about is how to do the pre-allocation. If the input is always the same, using replicate() means you don't have to worry about it. If the input is different each time, then pre-allocate using matrix(); you don't need to write any special function.
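Applied to the function in the question, one column per value of k can be collected with sapply() instead of a while loop; note that in the original, return() sits inside the loop, so the function exits on the first pass. The rewrite below is a sketch under assumptions: indSpam is taken to be a 0/1 (or logical) vector indexed like the rows of distMat, and a strict majority vote among the k nearest entries is taken to be the intent.

```r
# Sketch of predNeighbor returning one column per element of k.
# Assumes indSpam is 0/1 or logical, indexed like the rows of distMat.
predNeighbor <- function(k, distMat, indSpam) {
  # Row i of ord holds the indices sorted by increasing distance from i
  ord <- t(apply(distMat, 1, order))
  sapply(k, function(kk) {
    votes <- matrix(indSpam[ord[, seq_len(kk), drop = FALSE]], ncol = kk)
    rowSums(votes) > kk / 2          # strict majority among kk neighbours
  })
}

# Toy data: two well-separated clusters on a line
pts     <- c(0, 1, 2, 10, 11, 12)
distMat <- as.matrix(dist(pts))
indSpam <- c(1, 1, 1, 0, 0, 0)
out <- predNeighbor(c(1, 3), distMat, indSpam)
dim(out)   # 6 rows, 2 columns
```

The ordering is computed once outside sapply(), since it does not depend on k.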
Assume a matrix that contains all bit strings of length r and is in order.
library(gtools)
mat<-permutations(n = 2, r = 5, v = c(0,1), repeats.allowed = TRUE)
mat<-cbind(mat, round(runif(nrow(mat)), digits = 2))
and several vectors each with r elements:
r <- 5
vec <- t(replicate(100, sample(c(0, 1), r, replace = TRUE)))
For each vector (i.e., each row in vec) I would like to identify the corresponding row in mat.
Note: I would like to list the result for each row, not just the unique elements.
Is there an efficient way to do this without using a for loop?
Try
indx1 <- do.call(`paste0`,as.data.frame(mat[,-6]))
indx2 <- do.call(`paste0`, as.data.frame(vec))
sapply(indx2, function(x) mat[indx1 %in% x,6])
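A variant of the same key-pasting idea, sketched with match() so that the lookup is a single vectorised call and returns one row index per row of vec, duplicates included. To keep the example free of extra packages, the bit strings are built with expand.grid() rather than gtools::permutations(), which is an assumption of this sketch; the row order differs from gtools, but match() does not depend on it.

```r
set.seed(42)
r <- 5

# All 2^r bit strings (row order differs from gtools::permutations)
bits <- as.matrix(expand.grid(rep(list(0:1), r)))
mat  <- cbind(bits, value = round(runif(nrow(bits)), digits = 2))

# 100 random bit vectors, one per row
vec <- matrix(sample(0:1, 100 * r, replace = TRUE), ncol = r)

# Collapse each row to a string key, then look the keys up
key_mat <- do.call(paste0, as.data.frame(bits))
key_vec <- do.call(paste0, as.data.frame(vec))
idx     <- match(key_vec, key_mat)   # row of mat for each row of vec

result <- mat[idx, r + 1]            # the value attached to each vector
```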