R - any function for multiple different-sized resampling - r

Is there a function that can solve this kind of different-sized random resampling problem? For example, given a vector data = c('a','a','b','c','d','e'), I want to randomly resample it into 3 groups of sizes 1, 3 and 2 respectively, like this:
input: samplefunc(data,size = c(1,3,2))
output: c('a') c('a','d','e') c('b','c')
I only found this "sample" function, but it is only for one size sample:
sample(x, size, replace = FALSE, prob = NULL)
size: a non-negative integer giving the number of items to choose.
Since I have to divide the data into many groups (not just 3), an existing function that does this would make things much easier than writing a for loop.

You can easily write your own function using, say, lapply, which would return a list of your samples:
samplefunc <- function(vec, size, ...) lapply(size, function(x) sample(vec, x, ...))
Usage would be as you imagined:
samplefunc(data, c(1, 3, 2))
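As a quick check of the lapply version, the following sketch (the seed is an arbitrary choice for reproducibility) confirms that the result is a list whose element lengths match the requested sizes. Note that each group is drawn independently from the full vector, so the same element can show up in more than one group.

```r
data <- c('a', 'a', 'b', 'c', 'd', 'e')
samplefunc <- function(vec, size, ...) lapply(size, function(x) sample(vec, x, ...))

set.seed(123)  # arbitrary seed, only for reproducibility
groups <- samplefunc(data, c(1, 3, 2))

is.list(groups)  # TRUE
lengths(groups)  # 1 3 2
```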
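As @thelatemail suggests, if you want to sample without replacement, so that the groups partition the vector (this requires sum(size) == length(vec)), you can try defining samplefunc as:
samplefunc <- function(vec, size) {
# label every element with a group index, shuffle the labels, then split;
# labelling by index (not by size) also works when sizes repeat, e.g. c(2, 2, 2)
groups <- sample(rep(seq_along(size), size))
unname(split(vec, groups))
}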

Related

how to use apply (or sapply) with columns of matrix or dataframe as function args

I know this is a bonehead newbie question, but I've been trying to figure it out for quite awhile and need some input. Basically, I'm trying to learn how to use the apply family to omit for loops, specifically how to set up the call so that columns of a matrix serve as arguments to the function. I'll use a simple call to the rbinom function as an example.
Example: this for loop works fine. The data are a set of integers and a set of probabilities
success <- rep(-1, times=10) # initialize result var
num <- sample.int(20, 10) # get 10 random integers
p <- runif(10) # get 10 random probabilities
for (i in 1:10) {
success[i]= rbinom(n=1, size=num[i],prob=p[i]) # number successes in 1 trial
}
But how do I do the same thing with the apply family? I first put the data into 2 columns of a matrix, thinking that was the right start. However, the following does NOT work, obviously due to my poor understanding of how to set up a call to apply.
myData <- matrix(nrow=10, ncol=2)
myData[,1] <- num
myData[,2] <- p
success <- apply(myData, rbinom, n=1, size=myData[,1], prob=myData[,2])
Any tips are greatly appreciated! I'm coming to R from Fortran, and trying to port over a lot of code that is loaded with DO loops, so I really need to get my head around this.
lapply, sapply, apply only deal with one vector/list at a time. That is, apply will only call its function for one column at a time. What you need is mapply or Map.
myData <- matrix(nrow=10, ncol=2)
myData[,1] <- num
myData[,2] <- p
mapply(rbinom, n = 1, myData[,1], myData[,2])
# [1] 5 4 11 8 3 3 17 8 0 11
Just like lapply returns a list, so does Map; similarly, just like sapply, mapply will return a vector or array if all return values are compatible, otherwise it returns a list as well.
These calls are equivalent:
sapply(1:3, function(z) z + 1)
mapply(function(z) z + 1, 1:3)
but mapply and Map allow an arbitrary number of lists/vectors, so for instance
func <- function(X,Y,Z) X^2+2*Y-Z
Map(func, 1:9, 11:19, 21:29)
## effectively the same as
list(
func(1, 11, 21),
func(2, 12, 22),
func(3, 13, 23),
...,
func(9, 19, 29)
)
The equivalent call of that with sapply for your data would be
sapply(seq_len(nrow(myData)), function(ind) {
rbinom(n = 1, size = myData[ind,1], prob = myData[ind,2])
})
though I personally feel that mapply is easier to read.
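To make the comparison concrete, here is a minimal self-contained sketch that runs the original loop, mapply, and Map on the same inputs. The seeds are arbitrary, and the names success_loop, success_mapply and success_map are only illustrative. Because all three variants make the same sequence of rbinom calls, resetting the seed before each one gives matching results.

```r
set.seed(42)               # arbitrary seed for the inputs
num <- sample.int(20, 10)  # 10 random sizes
p <- runif(10)             # 10 random probabilities

# explicit loop, as in the question
set.seed(1)
success_loop <- numeric(10)
for (i in seq_along(num)) {
  success_loop[i] <- rbinom(n = 1, size = num[i], prob = p[i])
}

# mapply passes num[i] and p[i] as the i-th pair of arguments
set.seed(1)
success_mapply <- mapply(rbinom, n = 1, size = num, prob = p)

# Map does the same, but always returns a list
set.seed(1)
success_map <- Map(function(s, pr) rbinom(n = 1, size = s, prob = pr), num, p)

all(success_loop == success_mapply)  # TRUE
```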

All combinations of two-way tables

How can I generate all two way tables from a data frame in R?
some_data <- data.frame(replicate(100, base::sample(1:4, size = 50, replace = TRUE)))
combos <- combn(names(some_data), 2)
The following does not work; I was planning to wrap a for loop around it and store the results from each iteration somewhere:
i=1
table(some_data[combos[, i][1]], some_data[combos[, i][2]])
Why does this not work? individual arguments evaluate as expected:
some_data[combos[, i][1]]
some_data[combos[, i][2]]
Calling it with the variable names directly yields the desired result, but how to loop through all combos in this structure?
table(some_data$X1, some_data$X2)
combn has a FUN argument that is applied to each combination, so we can use it to subset 'some_data' by each pair of column names and collect the table output in an array:
out <- combn(names(some_data), 2, FUN = function(i) table(some_data[i]))
Regarding the issue in the OP's post
table(some_data[combos[, i][1]], some_data[combos[, i][2]])
Here both arguments are one-column data.frames rather than vectors. If we extract the columns as vectors instead (note the added comma in some_data[, ...]), it works:
table(some_data[, combos[, i][1]], some_data[, combos[, i][2]])
or more compactly
table(some_data[combos[, i]])
Update
combn has simplify = TRUE by default, that is, it tries to convert the output to an array. An array has fixed dimensions, so this only works if every table has the same dimensions. If the columns do not all take the same set of values, the tables differ in dimension (unless each column is first converted to a factor with the levels specified) and the simplification fails with an error. One way around this is simplify = FALSE, which returns a list; a list has no such restriction.
Here is an example where the previous code fails
set.seed(24)
some_data2 <- data.frame(replicate(5, base::sample(1:10, size = 50,
replace = TRUE)))
some_data <- data.frame(some_data, some_data2)
out1 <- combn(names(some_data), 2, FUN = function(i)
table(some_data[i]), simplify = FALSE)
is.list(out1)
#[1] TRUE
length(out1)
#[1] 5460
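A smaller variant of the same idea, as a sketch: it generates the column pairs with combn(..., simplify = FALSE) and then uses lapply, so that each table can also be named after its pair of columns. The data frame small_data and the "X1_X2" naming scheme are assumptions made for illustration only.

```r
set.seed(42)  # arbitrary seed
small_data <- data.frame(replicate(4, sample(1:4, size = 50, replace = TRUE)))

# every pair of column names, kept as a list of length-2 character vectors
pairs <- combn(names(small_data), 2, simplify = FALSE)

# one two-way table per pair
tables <- lapply(pairs, function(cols) table(small_data[cols]))
names(tables) <- vapply(pairs, paste, character(1), collapse = "_")

length(tables)  # 6, i.e. choose(4, 2)
tables[["X1_X2"]]
```

Naming the list elements makes it possible to look up a table by its column pair instead of by position.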

Generating random Vectors in R

I am concerned with the following programming exercise in R:
Generate 10,000 4-dimensional vectors.
The components of the vector are generated from Bernoulli distribution with probability 0.5.
Detect all vectors with at least 3 '1'.
In order to generate one such sample I employ
sample(0:1, 4, replace = TRUE)
In order to generate vectors I use
x <- c(sample(0:1, 4, replace = TRUE))
Since I need 10,000 vectors, I use a for loop:
for(i in 1:10000){c(sample(0:1, 4, replace = TRUE))}
So, now I have 10,000 vectors.
In order to continue with the task, I should put all of them into a list.
Then, using a suitable if condition, I think it should be possible to conclude the task.
Can anyone help me?
Here is a solution for your problem:
set.seed(135)
n <- 10000
X <- matrix(rbinom(4*n, size=1, prob=0.5), nrow=n)
apply(X, 1, function(x) sum(x)>2)
@MarcoSandri's solution will be faster, but you could modify your own loop this way to make it work:
num = 0
for(i in 1:10000){
x = c(sample(0:1, 4, replace = TRUE))
if(sum(x) >= 3){
num = num + 1
}
}
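For the "detect" step, a vectorised sketch with rowSums may be convenient. It builds the same matrix as the first answer and then keeps the qualifying rows; the names has_three and selected are illustrative. With probability 0.5 per component, the expected share of vectors with at least three 1s is 5/16 = 0.3125.

```r
set.seed(135)
n <- 10000
X <- matrix(rbinom(4 * n, size = 1, prob = 0.5), nrow = n)

has_three <- rowSums(X) >= 3              # TRUE for rows with at least three 1s
selected <- X[has_three, , drop = FALSE]  # the detected vectors, one per row

sum(has_three)   # how many vectors qualify
mean(has_three)  # should be close to 5/16
```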

How to skip an error in a loop

I want to skip an error (if there is any) in a loop and continue with the next iteration. I want to compute 100 inverse matrices of a 2 by 2 matrix with elements randomly sampled from {0, 1, 2}. It is possible to get a singular matrix, for example:
1 0
2 0
Here is my code
set.seed(1)
count <- 1
inverses <- vector(mode = "list", 100)
repeat {
x <- matrix(sample(0:2, 4, replace = T), 2, 2)
inverses[[count]] <- solve(x)
count <- count + 1
if (count > 100) break
}
At the third iteration, the matrix is singular and the code stops running with an error message. In practice, I would like to bypass this error and continue to the next loop. I know I need to use a try or tryCatch function but I don't know how to use them. Similar questions have been asked here, but they are all really complicated and the answers are far beyond my understanding. If someone can give me a complete code specifically for this question, I really appreciate it.
This would put NULLs into inverses for the singular matrices:
inverses[[count]] <- tryCatch(solve(x), error=function(e) NULL)
If the first expression in a call to tryCatch raises an error, it executes and returns the value of the function supplied to its error argument. The function supplied to the error arg has to take the error itself as an argument (here I call it e), but you don't have to do anything with it.
You could then drop the NULL entries with inverses[!sapply(inverses, is.null)]. (Note that is.null(inverses) on its own tests the whole list, not each element.)
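Put together, a complete version of the loop could look like the sketch below. It swaps the repeat loop for a for loop over the pre-allocated list, which is a stylistic choice and not part of the original answer; the name valid is illustrative.

```r
set.seed(1)
inverses <- vector(mode = "list", 100)

for (count in seq_along(inverses)) {
  x <- matrix(sample(0:2, 4, replace = TRUE), 2, 2)
  # solve() signals an error for a singular matrix; the handler returns NULL instead
  inv <- tryCatch(solve(x), error = function(e) NULL)
  # only store successful results, so failed slots simply stay NULL
  if (!is.null(inv)) inverses[[count]] <- inv
}

# keep only the matrices that could be inverted
valid <- Filter(Negate(is.null), inverses)
length(valid)
```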
Alternatively, you could use the lower level try. The choice is really a matter of taste.
count <- 0
repeat {
if (count == 100) break
count <- count + 1
x <- matrix(sample(0:2, 4, replace = T), 2, 2)
x.inv <- try(solve(x), silent=TRUE)
if ('try-error' %in% class(x.inv)) next
else inverses[[count]] <- x.inv
}
If your expression generates an error, try returns an object with class try-error. It will print the message to screen if silent=FALSE. In this case, if x.inv has class try-error, we call next to stop the execution of the current iteration and move to the next one, otherwise we add x.inv to inverses.
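A tiny sketch of what try returns in each case, using the singular matrix from the question. inherits() is used here as an equivalent way of testing for the try-error class.

```r
singular <- matrix(c(1, 2, 0, 0), 2, 2)  # the singular example from the question
res <- try(solve(singular), silent = TRUE)
inherits(res, "try-error")  # TRUE: solve() failed

ok <- try(solve(diag(2)), silent = TRUE)
inherits(ok, "try-error")   # FALSE: ok holds the inverse
```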
Edit:
You could avoid using the repeat loop with replicate and lapply.
matrices <- replicate(100, matrix(sample(0:2, 4, replace=T), 2, 2), simplify=FALSE)
inverses <- lapply(matrices, function(mat) if (det(mat) != 0) solve(mat))
It's interesting to note that the second argument to replicate is treated as an expression, meaning it gets executed afresh for each replicate. This means you can use replicate to make a list of any number of random objects that are generated from the same expression.
Instead of using tryCatch you could simply calculate the determinant of the matrix with the function det. A matrix is singular if and only if the determinant is zero.
Hence, you could test whether the determinant is different from zero and calculate the inverse only if the test is positive:
set.seed(1)
count <- 1
inverses <- vector(mode = "list", 100)
repeat {
x <- matrix(sample(0:2, 4, replace = T), 2, 2)
# if (det(x)) inverses[[count]] <- solve(x)
# a more robust replacement for the above line (see comment):
if (is.finite(determinant(x)$modulus)) inverses[[count]] <- solve(x)
count <- count + 1
if (count > 100) break
}
Update:
It is, however, possible to avoid generating singular matrices. The determinant of a 2-by-2 matrix mat is defined as mat[1] * mat[4] - mat[3] * mat[2]. You could use this knowledge when sampling the random numbers: just do not sample a number that would produce a singular matrix. Which numbers are excluded depends, of course, on the numbers sampled before.
set.seed(1)
count <- 1
inverses <- vector(mode = "list", 100)
set <- 0:2 # the set of numbers to sample from
repeat {
# sample the first value
x <- sample(set, 1)
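# if the first value is zero, the second and third one are not allowed to be zero.
# (if/else is needed here: ifelse() with a scalar test would return only the first element of the set)
new_set <- if (x == 0) setdiff(set, 0) else set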
# sample the second and third value
x <- c(x, sample(new_set, 2, replace = T))
# calculate which 4th number would result in a singular matrix
not_allowed <- abs(-x[3] * x[2] / x[1])
# remove this number from the set
new_set <- setdiff(0:2, not_allowed)
# sample the fourth value and build the matrix
x <- matrix(c(x, sample(new_set, 1)), 2, 2)
inverses[[count]] <- solve(x)
count <- count + 1
if (count > 100) break
}
This procedure guarantees that all generated matrices have an inverse.
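A simpler way to reach the same goal is rejection sampling: draw a matrix, compute the determinant with the formula given earlier, and redraw if it is zero. This is a different technique from the constrained sampling in this answer, shown only as a sketch; random_invertible is a hypothetical helper name.

```r
set.seed(1)

# hypothetical helper: redraw until the 2-by-2 matrix has a non-zero determinant
random_invertible <- function() {
  repeat {
    x <- matrix(sample(0:2, 4, replace = TRUE), 2, 2)
    # exact integer arithmetic, so comparing with 0 is safe here
    if (x[1] * x[4] - x[3] * x[2] != 0) return(x)
  }
}

inverses <- lapply(seq_len(100), function(i) solve(random_invertible()))
length(inverses)  # 100
```

Redrawing samples uniformly among the invertible matrices, at the cost of some discarded draws.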
try is just a way of telling R: "If you commit an error inside the following parentheses, then skip it and move on."
So if you're worried that x <- matrix(sample(0:2, 4, replace = T), 2, 2) might give you an error, then all you have to do is:
try(x <- matrix(sample(0:2, 4, replace = T), 2, 2))
However, keep in mind then that x will be undefined if you do this and it ends up not being able to compute the answer. That could cause a problem when you get to solve(x) - so you can either define x before try or just "try" the whole thing:
try(
{
x <- matrix(sample(0:2, 4, replace = T), 2, 2)
inverses[[count]] <- solve(x)
}
)
The documentation for try explains your problem pretty well. I suggest you go through it completely.
Edit: The documentation example looked pretty straightforward and very similar to the op's question. Thanks for the suggestion though. Here goes the answer following the example in the documentation page:
# `idx` is used as a dummy variable here just to illustrate that
# all 100 entries are indeed calculated. You can remove it.
set.seed(1)
mat_inv <- function(idx) {
print(idx)
x <- matrix(sample(0:2, 4, replace = T), nrow = 2)
solve(x)
}
inverses <- lapply(1:100, function(idx) try(mat_inv(idx), TRUE))

R transition matrix into List of Lists

I would like to convert a vector into a transitions matrix first (which I managed). As a second step I would like apply the resulting function to a dataset where different respondents did different tasks.
As a result I would like to get a List which is nested on Respondent and Task.
Here is an example data frame:
Data <- data.frame(
respondent = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2),
task = c(1,1,1,1,1,2,2,2,2,2,1,1,1,1,1,2,2,2,2,2),
acquisition = sample(1:5, replace = TRUE)
)
and here are my result matrix and the function that takes the acquisition vector and generates a transition matrix:
result <- matrix(data = 0, nrow = 5, ncol = 5)
gettrans <- function(invec){
for (i in seq_len(length(invec) - 1)) { # 1:length(invec)-1 parses as (1:length(invec)) - 1, which starts at 0
result[invec[i],invec[i+1]] <- result[invec[i], invec[i+1]] + 1
}
return(result)
}
Now, I get a flattened result with
with(Data,aggregate(acquisition,by=list(respondent=respondent,task=task),gettrans))
However what I would like would look something like:
$respondent
[1]$task[1]
result
$respondent
[1]$task[2]
result
...
I played around with dlply but could not get that to work ...
Any suggestions appreciated!
dlply (from the plyr package, so load it with library(plyr) first) naturally gives you a flat list (rather than a list of lists). The standard way of calling it would be
(ans_as_list <- dlply(
Data,
.(respondent, task),
summarise,
res = gettrans(acquisition)
))
This should be suitable for most purposes, but if you really must have a list of lists, use llply (or equivalently, lapply) to restructure.
(ans_as_list_of_lists <- llply(levels(factor(Data$respondent)), function(lvl)
{
ans_as_list[grepl(paste("^", lvl, sep = ""), names(ans_as_list))]
}))
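If a plyr dependency is not wanted, a nested list can also be built in base R by splitting twice, first by respondent and then by task. This is an alternative sketch, not the method of the answer. It assumes a variant of gettrans that creates its own result matrix (the n_states parameter is hypothetical), and it samples 20 acquisition values with an arbitrary seed.

```r
# variant of gettrans that builds its own zero matrix instead of using a global one
gettrans <- function(invec, n_states = 5) {
  result <- matrix(0, nrow = n_states, ncol = n_states)
  for (i in seq_len(length(invec) - 1)) {
    result[invec[i], invec[i + 1]] <- result[invec[i], invec[i + 1]] + 1
  }
  result
}

set.seed(1)  # arbitrary seed
Data <- data.frame(
  respondent = rep(1:2, each = 10),
  task = rep(rep(1:2, each = 5), 2),
  acquisition = sample(1:5, 20, replace = TRUE)
)

# outer list: one element per respondent; inner list: one matrix per task
nested <- lapply(split(Data, Data$respondent), function(d) {
  lapply(split(d$acquisition, d$task), gettrans)
})

nested[["1"]][["2"]]  # transition matrix for respondent 1, task 2
```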
