Creating a matrix with random entries with given probabilities in R

I want to create a 100x100 matrix A whose entries a_ij are randomly selected from the set {0,1} with P(a_ij=1)=0.2 and P(a_ij=0)=0.8.
This is what I’ve tried so far:
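n <- 100
A <- matrix(0, n, n)        # start with an n x n matrix of zeros
mynumbers <- c(1, 0)        # possible values of each entry
myprobs <- c(0.2, 0.8)      # P(a_ij = 1) = 0.2, P(a_ij = 0) = 0.8
for (i in 1:n) {
  for (j in 1:n) {
    # draw one value for entry (i, j)
    A[i, j] <- sample(mynumbers, 1, replace = TRUE, prob = myprobs)
  }
}
A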
I’m not sure about the sample size being 1, but this only seems to work if I choose size = 1. Is this the correct way to do it? Thank you in advance!

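As @akrun noted, there are much easier ways. A 100 x 100 matrix has 10,000 entries, so you can draw all of them at once and then reshape them into a matrix. In rbinom(), prob = .2 is the success probability, i.e. P(a_ij = 1) = 0.2, and size = 1 means each draw is a single trial, so every value is 0 or 1. (Note that size means something different in sample(): there it is the number of values to draw, which is why size = 1 was needed inside your double loop.) The matrix() arguments should be self-evident.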
set.seed(2020)
trials <- rbinom(n = 10000, size = 1, prob = .2)
my.matrix <- matrix(trials, nrow = 100, ncol = 100)
Or, to more closely resemble your code:
n <- 10000
mynumbers<-c(1,0)
myprobs<-c(0.2,0.8)
trials2 <- sample(x = mynumbers,
size = n,
replace = TRUE,
prob = myprobs)
my.matrix2 <- matrix(trials2, nrow = 100, ncol = 100)
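A quick sanity check for either result, sketched under the assumption that you only want to confirm the shape and the share of ones. It also shows a third equivalent construction that is not from the answers above: a Uniform(0, 1) draw falls below 0.2 with probability 0.2, so thresholding runif() also gives Bernoulli(0.2) entries.

```r
set.seed(2020)
p <- 0.2
# rbinom with size = 1 gives independent Bernoulli(p) draws
A <- matrix(rbinom(100 * 100, size = 1, prob = p), nrow = 100, ncol = 100)
# Alternative: a Uniform(0, 1) draw is below p with probability p
B <- matrix(as.integer(runif(100 * 100) < p), nrow = 100, ncol = 100)

dim(A)               # 100 100
all(A %in% c(0, 1))  # TRUE: every entry is 0 or 1
mean(A)              # proportion of ones, should be close to 0.2
mean(B)              # same check for the runif-based matrix
```

With 10,000 entries the standard error of mean(A) is about sqrt(0.2 * 0.8 / 10000) = 0.004, so values far from 0.2 would point to a mistake.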

Related

Finding the index in an array of matrices that is closest to each element of another matrix in R

I have an array Q of size nquantiles x nfeatures x nfeatures. The slice Q[1,,] gives the first quantile of my data for every one of the nfeatures x nfeatures cells.
Given another matrix M (also nfeatures x nfeatures) that holds some other data, I want to know, for each element of M, which quantile in Q it lies closest to.
What would be the quickest way to do this?
I could use a double for loop over all rows and columns of M and do something similar to this: Finding the closest index to a value in R
But doing this for all nfeatures x nfeatures values will be very inefficient. I am hoping there is a vectorized way to approach this problem, but I am at a loss as to how.
Here is a reproducible example of the slow approach, which needs O(N^2) loop iterations for N = nfeatures:
#Generate some data
set.seed(235)
data = rnorm(n = 100, mean = 0, sd = 1)
list_of_matrices = list(matrix(data = data[1:25], ncol = 5, nrow = 5),
matrix(data = data[26:50], ncol = 5, nrow = 5),
matrix(data = data[51:75], ncol = 5, nrow = 5),
matrix(data = data[76:100], ncol = 5, nrow = 5))
#Get the quantiles (5 quantiles here)
Q <- apply(simplify2array(list_of_matrices), 1:2, quantile, probs = seq(0, 1, length.out = 5))
#dim(Q)
#Q should have dims nquantiles by nfeatures by nfeatures
#Generate some other matrix M (true-data)
M = matrix(data = rnorm(n = 25, mean = 0, sd = 1), nrow = 5, ncol = 5)
#Loop through rows and columns in M to find which index of the array matches up closest with element M[i,j]
results = matrix(data = NA, nrow = 5, ncol = 5)
for (i in 1:nrow(M)) {
for (j in 1:ncol(M)) {
true_value = M[i,j]
#Subset Q to the ith and jth element (vector of nquantiles)
quantiles = Q[,i,j]
results[i,j] = (which.min(abs(quantiles-true_value)))
}
}
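Two ways to drop the explicit double loop are sketched below; neither comes from the original thread, and both assume Q holds the quantiles along its first dimension, as built above. Option 1 uses sweep() to subtract M[i, j] from every quantile in Q[, i, j] at once, then apply() picks which.min per cell. Option 2 flattens Q into an nquantiles x (nrow(M) * ncol(M)) matrix so that max.col() finds the smallest distance for all cells in one call.

```r
# Rebuild Q and M exactly as in the question's example
set.seed(235)
data <- rnorm(n = 100, mean = 0, sd = 1)
list_of_matrices <- list(matrix(data[1:25], nrow = 5, ncol = 5),
                         matrix(data[26:50], nrow = 5, ncol = 5),
                         matrix(data[51:75], nrow = 5, ncol = 5),
                         matrix(data[76:100], nrow = 5, ncol = 5))
Q <- apply(simplify2array(list_of_matrices), 1:2, quantile,
           probs = seq(0, 1, length.out = 5))
M <- matrix(rnorm(n = 25, mean = 0, sd = 1), nrow = 5, ncol = 5)

# Option 1: subtract M[i, j] from every Q[, i, j] at once, then which.min per cell
closest_apply <- apply(abs(sweep(Q, c(2, 3), M)), c(2, 3), which.min)

# Option 2: flatten Q to nquantiles x (nrow(M) * ncol(M)); column k holds Q[, i, j]
# for the k-th cell of M in column-major order, matching as.vector(M)
nq <- dim(Q)[1]
dists <- abs(matrix(Q, nrow = nq) - rep(as.vector(M), each = nq))
# max.col on the negated distances gives, for each cell, the index of the
# smallest distance; ties.method = "first" mirrors which.min
closest_maxcol <- matrix(max.col(-t(dists), ties.method = "first"),
                         nrow = nrow(M), ncol = ncol(M))
closest_maxcol
```

Both should agree with the `results` matrix from the loop. Option 1 still iterates inside apply(), but over cheap which.min calls; Option 2 does the whole search with matrix operations, which tends to matter more as nfeatures grows.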

build matrix in a for loop automatically in R

Suppose I have code like this:
probv=c(0.5,0.1,0.2,0.3)
N=c(1,2,3,4)
g1=matrix(rbinom(n = 10, size = N[1], prob = probv[1]), nrow=5)
g2=matrix(rbinom(n = 10, size = N[2], prob = probv[2]), nrow=5)
g3=matrix(rbinom(n = 10, size = N[3], prob = probv[3]), nrow=5)
g4=matrix(rbinom(n = 10, size = N[4], prob = probv[4]), nrow=5)
I want to use a for loop like
for (i in 1:J) { ... }   # J = 4 in this case
or a one-line function to return the same output. How do I create each matrix g_i inside the loop?
This would also help when I increase the length of my vectors to 5, 6, 7, ...,
for example N = c(1,2,3,4,5) and probv = c(0.5,0.1,0.2,0.3,0.5):
I would not have to change my code to create another matrix called g5. The code would create it, and I would only need to change my input.
Thanks, Akrun.
What if my N is a three-dimensional array and I want to map over its last dimension? How do I change the Map method?
probv=c(0.5,0.1,0.2,0.3)
N=array(1:24,c(3,2,4))
g1=matrix(rbinom(n = 10, size = N[,,1], prob = probv[1]), nrow=5)
g2=matrix(rbinom(n = 10, size = N[,,2], prob = probv[2]), nrow=5)
g3=matrix(rbinom(n = 10, size = N[,,3], prob = probv[3]), nrow=5)
g4=matrix(rbinom(n = 10, size = N[,,4], prob = probv[4]), nrow=5)
We can use Map to loop over the 'N' and 'probv' vectors in parallel, pass each corresponding pair of values to rbinom, and build a matrix from the result. Map returns a list of matrices.
lst1 <- Map(function(x, y) matrix(rbinom(n = 10,
size = x, prob = y), nrow = 5), N, probv)
Or, using a for loop:
lst2 <- vector('list', length(N))
for(i in seq_along(N)) {
lst2[[i]] <- matrix(rbinom(n = 10, size = N[i], prob = probv[i]), nrow = 5)
}
names(lst2) <- paste0("g", seq_along(lst2))
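If you really need separate objects named g1, g2, ... in your workspace rather than a named list (an assumption about what you want; keeping the list is usually easier to work with), list2env() copies each named list element into an environment. A minimal sketch:

```r
probv <- c(0.5, 0.1, 0.2, 0.3)
N <- c(1, 2, 3, 4)
# One matrix per (size, prob) pair, as in the Map answer above
lst2 <- Map(function(x, y) matrix(rbinom(n = 10, size = x, prob = y), nrow = 5),
            N, probv)
names(lst2) <- paste0("g", seq_along(lst2))
# Copy g1, g2, ... from the list into the global environment
invisible(list2env(lst2, envir = globalenv()))
g4
```

Adding a fifth element to N and probv then creates g5 without any other change.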
For the updated question, where N is an array and each slice N[,,i] is used as size:
mnLength <- min(length(probv), dim(N)[3])
lst2 <- vector('list', mnLength)
for(i in seq_len(mnLength)) {
lst2[[i]] <- matrix(rbinom(n = 10, size = N[,,i], prob = probv[i]), nrow = 5)
}
names(lst2) <- paste0("g", seq_along(lst2))
lst2$g1
lst2$g2
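The follow-up also asked how to do the array case with Map. A sketch, assuming one slice N[, , i] per element of probv: split the array along its third dimension into a list, then Map over that list and probv exactly as before. As in the question's code, the 6 values of each 3 x 2 slice are recycled as size across the 10 draws.

```r
probv <- c(0.5, 0.1, 0.2, 0.3)
N <- array(1:24, c(3, 2, 4))
# Split the array along its third dimension: a list of 3 x 2 slices N[, , i]
slices <- lapply(seq_len(dim(N)[3]), function(i) N[, , i])
# Pair each slice with its probability; size = x is recycled over the 10 draws
lst3 <- Map(function(x, y) matrix(rbinom(n = 10, size = x, prob = y), nrow = 5),
            slices, probv)
names(lst3) <- paste0("g", seq_along(lst3))
lst3$g1
```

In R 3.6.0 and later, asplit(N, 3) should produce an equivalent list of slices.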

How to simulate a probability event function?

I have an event, defined by the code below (see previous question), that outputs the total number of successes in n binomial trials.
event <- function(n, size = 1, prob = 1/100){
  trials <- rbinom(n = n, size = size, prob = prob)
  sum(trials)
}
event(1000)
Here event(n) returns the number of times the event happened in n trials.
Now I want to run the function (with n = 1000) 300000 times and record how many times the event happened in each run. (So not n = 300000, but the values the function returns when it is called 300000 times.)
Original function:
successes <- function(n, size = 1, prob = 0.01){
trials <- rbinom(n = n, size = size, prob = prob)
sum(trials)
}
Use the replicate function:
results <- replicate(n = 300000, successes(1000, prob = 0.01), simplify = TRUE)
This returns a vector of length 300000 (3e5), holding the number of successes from each call.
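Since each call to successes() sums n independent Bernoulli(prob) trials, its result follows a Binomial(n, prob) distribution, so the whole simulation can also be drawn directly with rbinom(). A sketch comparing both, using fewer repetitions than 300000 (an arbitrary choice to keep it quick) and table() to see how often each count occurred:

```r
set.seed(1)
successes <- function(n, size = 1, prob = 0.01) {
  trials <- rbinom(n = n, size = size, prob = prob)
  sum(trials)
}
reps <- 10000  # use 300000 for the full simulation
sim_loop <- replicate(reps, successes(1000, prob = 0.01))
# Equivalent in distribution: one Binomial(1000, 0.01) draw per repetition
sim_direct <- rbinom(reps, size = 1000, prob = 0.01)

mean(sim_loop)    # close to 1000 * 0.01 = 10
mean(sim_direct)  # also close to 10
table(sim_loop)   # how many runs produced 0, 1, 2, ... events
```

The rbinom() version avoids generating 1000 individual trials per repetition, which makes a large difference at 300000 repetitions.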

Sample from one of two distributions

I want to repeatedly sample values based on a certain condition. For example, I want to create a sample of 100 values.
With probability 0.7 each value is drawn from one distribution, and otherwise from another.
Here is a way to do what I want:
set.seed(20)
A<-vector()
for (i in 1:100){
A[i]<-ifelse(runif(1,0,1)>0.7,rnorm(1, mean = 100, sd = 20),runif(1, min = 0, max = 1))
}
I am sure there are more elegant ways that avoid the for loop.
Any suggestions?
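You can sample an indicator that defines which distribution each value is drawn from. Here ind = 1 (probability 0.7) selects the normal; note that in the loop above runif(1, 0, 1) > 0.7 is true only with probability 0.3, so use prob = c(0.7, 0.3) if you want to match the loop exactly.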
ind <- sample(0:1, size = 100, prob = c(0.3, 0.7), replace = TRUE)
A <- ind * rnorm(100, mean = 100, sd = 20) + (1 - ind) * runif(100, min = 0, max = 1)
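This avoids the for loop, at the cost of drawing more random variables: all 100 normal and all 100 uniform values are generated, and ind keeps one from each pair.
If the proportion is fixed rather than random (exactly 70 values from one distribution and 30 from the other), you can draw the right number from each distribution and then shuffle the result: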
n <- 100
A <- sample(c(rnorm(0.7*n, mean = 100, sd = 20), runif(0.3*n, min = 0, max = 1)))
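Another loop-free variant, not from the answers above, keeps the random mixing proportion of the indicator approach without drawing extra variables: first draw how many of the 100 values come from the normal, k ~ Binomial(100, 0.7), then draw exactly k normals and 100 - k uniforms and shuffle. Because sample() puts the values in uniformly random positions, the result has the same distribution as 100 independent draws from the mixture. A minimal sketch, assigning the 0.7 to the normal as the answers do:

```r
set.seed(20)
n <- 100
p_normal <- 0.7
# Random number of values that come from the normal component
k <- rbinom(1, size = n, prob = p_normal)
# Draw exactly k normals and n - k uniforms, then shuffle their order
A <- sample(c(rnorm(k, mean = 100, sd = 20), runif(n - k, min = 0, max = 1)))
length(A)  # 100
```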

How can I set the bin centre values of a histogram myself?

Let's say I have a data frame like the one below:
mat <- data.frame(matrix(data = rexp(200, rate = 10), nrow = 100, ncol = 10))
Then I can compute a histogram for each of its columns using
matAllCols <- apply(mat, 2, hist)
Now if you look at the breaks of each histogram (e.g. matAllCols$X1$breaks), you can see that there are sometimes 11, sometimes 12, etc.
What I want is to fix this: for example, there should always be 12 breaks, and the distance between adjacent bin centres (stored in each histogram's mids component, e.g. matAllCols$X1$mids) should be 0.01.
Doing it for one column at a time seems simple, but when I try it for all columns it does not work. Also, this only sets the breaks; how to set the mids is not straightforward either:
matAllCols <- apply(mat, 2, function(x) hist(x , breaks = 12))
Is there any way to do this?
You can solve the problem by passing all the breakpoints between histogram cells as breaks. A single number for breaks is only treated as a suggestion, as documented at stat.ethz.ch/R-manual/R-devel/library/graphics/html/hist.html (and as @Colonel Beauvel said).
set.seed(1); mat <- data.frame(matrix(data = rexp(200, rate = 10), nrow = 100, ncol = 10))
# You need to check the data range to decide the breakpoints.
range(mat) # [1] 0.002025041 0.483281274
# You can set the breakpoints manually.
matAllCols <- apply(mat, 2, function(x) hist(x , breaks = seq(0, 0.52, 0.04)))
You are looking for
set.seed(1)
mat <- data.frame(matrix(data = rexp(200, rate = 10), nrow = 100, ncol = 10))
matAllCols <- apply(mat, 2, function(x) hist(x , breaks = seq(0, 0.5, 0.05)))
or simply
x <- rexp(200, rate = 10)
hist(x[x>=0 & x <=0.5] , breaks = seq(0, 0.5, 0.05))
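Neither answer sets the bin centres directly. Since hist() places each mid halfway between consecutive breaks, you can pick the centres and derive the breaks from them: with spacing w, each break sits w/2 from a centre. A sketch, where centred_breaks() is a hypothetical helper and the first centre 0.005 is an arbitrary assumption. Twelve breaks 0.01 apart only span 0.11, so values outside that range are dropped first (as in the second answer); otherwise hist() stops with a "some 'x' not counted" error.

```r
set.seed(1)
mat <- data.frame(matrix(data = rexp(200, rate = 10), nrow = 100, ncol = 10))

# Hypothetical helper: n_breaks breakpoints whose bin centres (mids) start at
# first_mid and are spaced width apart; each break sits width / 2 from a centre
centred_breaks <- function(first_mid, width, n_breaks) {
  first_mid - width / 2 + width * (0:(n_breaks - 1))
}

brks <- centred_breaks(first_mid = 0.005, width = 0.01, n_breaks = 12)  # about 0, 0.01, ..., 0.11

matAllCols <- apply(mat, 2, function(x) {
  # hist() errors if values lie outside the breaks, so keep only those inside
  x <- x[x >= min(brks) & x <= max(brks)]
  hist(x, breaks = brks, plot = FALSE)
})

length(matAllCols$X1$breaks)  # 12
matAllCols$X1$mids            # about 0.005, 0.015, ..., 0.105
```

Change first_mid or width to move or widen the bins; n_breaks is always one more than the number of bins.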
