Sampling in Matlab - r

So let me start off by saying that I do not have the Statistics Toolbox for Matlab, so I am trying to find a way to work around this. What I am trying to do is replicate R's sample function. For example, in R:
> x = sample(1:5,20,replace=T,prob=c(.1,.1,.1,.1,.6))
> x
[1] 5 5 5 4 5 2 5 5 1 5 5 5 5 5 5 3 5 1 5 5
So I am sampling the integers 1, 2, 3, 4, 5 with replacement, but each integer is sampled with a given probability; e.g., the integer 5 should be sampled about 60% of the time.
So my question is: how can I achieve this in Matlab?

Here's how you can perform weighted sampling with replacement without the Statistics Toolbox (randsample can do this, but it is part of that toolbox):
function r = sample(pop,n,weights)
%# each weight creates a "bin" of defined size. If the value of a random number
%# falls into the bin, we pick the value
%# turn weights into a normed cumulative sum
csWeights = cumsum(weights(:))/sum(weights);
%# lower edge of each bin: 0, w1, w1+w2, ...
csWeights = [0;csWeights(1:end-1)];
%# for each value: pick a random number and count how many lower edges it
%# reaches; that count is the index of the bin it falls into
idx = sum(bsxfun(@ge,rand(1,n),csWeights),1);
r = pop(idx);
end

The unweighted case is easy using randi.
function r = sample(pop, n)
imax = length(pop);
index = randi(imax, n, 1);
r = pop(index);
end
In the weighted case, something like this should do the trick:
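function r = sample(pop, n, prob)
cumprob = cumsum(prob) / sum(prob); % normalized cumulative probabilities
r = zeros(1, n);
for i = 1:n
    % first bin whose cumulative probability exceeds the random draw
    index = find(rand < cumprob, 1, 'first');
    r(i) = pop(index);
end
end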

Here's one way to make your own sample function:
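function x = sample(v, n, p)
pc = cumsum(p) / sum(p); % normalized cumulative probabilities
r = rand(1,n);
x = zeros(1,n);
% walk the bins from last to first, so each draw ends up with the value
% of the first bin whose cumulative probability exceeds it
for i = length(pc):-1:1
x(r<pc(i)) = v(i);
end
end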
It's not exactly efficient, but it does what you want. Call it like so:
v = [1 2 3 4 5];
p = [.1 .1 .1 .1 .6];
n = 20;
x = sample(v,n,p);
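All three answers rely on the same idea: normalize the weights, take their cumulative sum, and map a uniform random number to the bin it falls into (inverse-CDF sampling). As a language-neutral illustration, here is a minimal Python sketch of that idea; the function name and signature simply mirror the Matlab sample(pop, n, weights) above and are not from any library.

```python
import bisect
import itertools
import random

def sample(pop, n, weights, rng=random):
    """Draw n items from pop with replacement, item i having probability
    weights[i] / sum(weights), by inverting the cumulative distribution."""
    total = sum(weights)
    # upper edge of each item's bin on [0, 1]
    cum = list(itertools.accumulate(w / total for w in weights))
    out = []
    for _ in range(n):
        u = rng.random()                  # uniform on [0, 1)
        i = bisect.bisect_right(cum, u)   # first bin whose upper edge exceeds u
        # guard: rounding can leave cum[-1] slightly below 1
        out.append(pop[min(i, len(pop) - 1)])
    return out

# Same call as the R example: values 1..5, with 5 drawn about 60% of the time
print(sample([1, 2, 3, 4, 5], 20, [.1, .1, .1, .1, .6]))
```

In actual Python code, the standard library's random.choices(pop, weights=w, k=n) already performs weighted sampling with replacement; the sketch only spells out what happens inside.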

Related

R - partitionsSample - Error message saying n exceeds the maximum number of possible results

I am trying to split a number X into a fixed number of random values that sum to X, with the constraint that none of the values is greater than another number Y.
For example, 6 partitioned into 3 random values, none of them greater than 4 (possible results include 4,1,1 or 2,3,1 or 2,2,2).
For this I am using the partitionsSample function from the RcppAlgos package.
I have noticed that for some combinations the function gives an error. For example, this works fine:
library(RcppAlgos)
goal <- 6
nPartitions <-3
MaxValue <- 4
m <- partitionsSample(v = MaxValue, m = nPartitions, repetition = TRUE, target = goal, n = 1)
m
[,1] [,2] [,3]
[1,] 1 1 4
But if I change the goal from 6 to 7, I get an error message:
goal <- 7
nPartitions <-3
MaxValue <- 4
m <- partitionsSample(v = MaxValue, m = nPartitions, repetition = TRUE, target = goal, n = 1)
Error: n exceeds the maximum number of possible results
I can't understand this, because unless I am missing something there are valid solutions here, for example (1,2,4), (2,2,3) or (3,3,1).
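To double-check the reasoning in the question, a brute-force enumeration lists every multiset of m values from 1..max_value that sums to the target. The sketch below is in Python and uses hypothetical helper names (partitions, sample_partition); it does not reproduce how partitionsSample works internally. It confirms that valid partitions exist for goal = 7, so the constraints alone do not explain the error. Picking uniformly from the enumerated list is one possible workaround, but only for small inputs, since the number of candidates grows quickly.

```python
import random
from itertools import combinations_with_replacement

def partitions(target, m, max_value):
    """All multisets of m integers from 1..max_value that sum to target,
    returned as non-decreasing tuples (brute force, small inputs only)."""
    return [c for c in combinations_with_replacement(range(1, max_value + 1), m)
            if sum(c) == target]

def sample_partition(target, m, max_value, rng=random):
    """Pick one valid partition uniformly at random (hypothetical helper)."""
    candidates = partitions(target, m, max_value)
    if not candidates:
        raise ValueError("no partition satisfies the constraints")
    return rng.choice(candidates)

print(partitions(6, 3, 4))  # [(1, 1, 4), (1, 2, 3), (2, 2, 2)]
print(partitions(7, 3, 4))  # [(1, 2, 4), (1, 3, 3), (2, 2, 3)]
```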
Any help in understanding what is going on is highly appreciated!!

Generating a Random Permutation in R

I am trying to implement, in R, an example from Simulation (4th ed., 2006, Elsevier) by Sheldon M. Ross, which generates a random permutation. It reads as follows:
Suppose we are interested in generating a permutation of the numbers 1, 2, ..., n which is such that all n! possible orderings are equally likely. The following algorithm will accomplish this by first choosing one of the numbers 1, 2, ..., n at random and then putting that number in position n; it then chooses at random one of the remaining n-1 numbers and puts that number in position n-1; it then chooses at random one of the remaining n-2 numbers and puts it in position n-2; and so on.
Of course, we can easily get a random permutation of the numbers 1, 2, ..., n with
sample(1:n, replace=FALSE)
For example:
> set.seed(0); sample(1:5, replace=FALSE)
[1] 1 4 3 5 2
However, I want to get a similar result manually by following the algorithm steps above. So I tried:
## write the function
my_perm = function(n){
x = 1:n # initialize
k = n # position n
out = NULL
while(k>0){
y = sample(x, size=1) # choose one of the numbers at random
out = c(y,out) # put the number in position
x = setdiff(x,out) # the remaining numbers
k = k-1 # and so on
}
out
}
## test the function
n = 5; set.seed(0); my_perm(n) # set.seed for reproducibility
and got
[1] 2 2 4 5 1
which is obviously incorrect, because 2 appears twice. How can I fix this?
You have implemented the logic correctly, but there is one R-specific behavior you need to be aware of.
From ?sample:
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x
So when only one number is left in x, say 4, sample(x, size=1) samples from 1:4 and can return any of those numbers.
For example,
set.seed(0)
sample(4, 1)
#[1] 2
So you need to handle this case in your function; after that, the code works correctly:
my_perm = function(n){
x = 1:n # initialize
k = n # position n
out = NULL
while(k>1){ #Stop the while loop when k = 1
y = sample(x, size=1) # choose one of the numbers at random
out = c(y,out) # put the number in position
x = setdiff(x,out) # the remaining numbers
k = k-1 # and so on
}
out <- c(x, out) #Add the last number in the output vector.
out
}
## test the function
n = 5
set.seed(0)
my_perm(n)
#[1] 3 2 4 5 1
The vector passed to sample should have length greater than 1. You can handle the last remaining number with a condition:
my_perm = function(n){
x = 1:n
k = n
out = NULL
while(k>0){
if(length(x)>1){
y = sample(x, size=1)
}else{
y = x
}
out = c(y,out)
x = setdiff(x,out)
k = k-1
}
out
}
n = 5; set.seed(0); my_perm(n)
[1] 3 2 4 5 1
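Both fixes stop calling sample once a single number remains. The same steps can also be carried out in place by swapping, which is essentially the Fisher-Yates shuffle. Here is a minimal Python sketch (the name ross_permutation is just for illustration) that never samples from a one-element set, because the last remaining number simply stays in the first position:

```python
import random

def ross_permutation(n, rng=random):
    """Random permutation of 1..n following the steps quoted from Ross:
    choose one of the not-yet-placed numbers uniformly at random and put it
    in the last free position (positions n, n-1, ..., 2); the single number
    left over takes position 1. Done in place, this is the Fisher-Yates shuffle."""
    perm = list(range(1, n + 1))
    # 0-based: perm[0..k] are still unplaced, perm[k+1..] are already placed
    for k in range(n - 1, 0, -1):
        j = rng.randint(0, k)              # uniform over the k + 1 unplaced numbers
        perm[k], perm[j] = perm[j], perm[k]
    return perm

random.seed(0)
print(ross_permutation(5))
```

The loop stops at k = 1 for the same reason the first answer stops its while loop at k = 1: there is nothing left to choose for the first position.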

Creating an array with values that satisfy a criterion of smallest distance to the mean

I am trying to build credible bands in Julia, but there is one step I don't know how to do. The code is the following:
#Significance level 95%
alpha_sign = 0.05
#Generate random values (postΠ, the posterior, is defined elsewhere)
N_1 = 100
x = 0.0:0.01:1.0
Fs_1 = Array{Float64}(undef, length(x), N_1)
for k in 1:N_1
    f = rand(postΠ)
    Fs_1[:, k] = f.(x)
end
# sup|theta_i(t)-average| (average_across, the pointwise average, is defined elsewhere)
dif_b = Array{Float64}(undef, length(x), N_1)
for k in 1:N_1
    dif_b[:, k] = Fs_1[:, k] - average_across
end
#Defining a function that returns the n smallest values
using Base.Sort
function smallestn(a, n)
    sort(a; alg=Sort.PartialQuickSort(n))[1:n]
end
#Compute the maximum of the absolute difference across time
sup_b = Array{Float64}(undef, N_1)
for k in 1:N_1
    sup_b[k] = maximum(abs.(dif_b[:, k]))
end
#Keep the (1 - alpha) share of draws with the smallest distances
N_min = round(Int, (1 - alpha_sign) * N_1)
min_sup_b = smallestn(sup_b, N_min)
To simplify the problem, I created this example:
Imagine I have the matrix below and I want to create a matrix with the values that are closest to the mean.
using Statistics
X=[1,2,7,4,5]
av_X=mean(X,dims=1)
Question:
I can compute the distances, store them in a vector as in the code above, and then get the smallest values, but I need to map them back to the original matrix to extract the corresponding values.
How do I do that?
Thanks in advance!
using Statistics
arr = rand(1:20, (4, 4))
colmeans = [mean(col) for col in eachcol(arr)]
# pair each element's distance to its own column's mean with its CartesianIndex
# (the column is the second coordinate, hence last(Tuple(cart)))
deltas = map(cart -> abs(arr[cart] - colmeans[last(Tuple(cart))]) => cart, CartesianIndices(arr))
# sort each column of pairs by distance
sorteddeltas = sort(deltas, lt = (x, y) -> first(x) < first(y), dims=1)
sarr = zeros(Int, (4, 4))
for (i, d) in enumerate(sorteddeltas)
    sarr[i] = arr[last(d)]   # look the value up in the original matrix
end
println(arr) # [7 1 2 15; 18 7 14 10; 3 11 10 13; 7 14 20 8]
println(colmeans) # [8.75, 8.25, 11.5, 11.5]
println(sarr) # [7 7 10 10; 7 11 14 13; 3 14 20 15; 18 1 2 8]
println(sarr') # [7 7 3 18; 7 11 14 1; 10 14 20 2; 10 13 15 8]
This gives you, for each column, a sorted list of pairs: the distance from that column's mean, and the Cartesian coordinates of the element in the original matrix.
sarr is the original matrix with each column sorted by closeness to that column's mean.
I think the function you are looking for is findmin(). It gives both the minimum value and its index.
julia> x = randn(5)
5-element Vector{Float64}:
-0.025159738348978562
-0.24720173332739662
-0.32508319212563325
0.9470582053428686
1.1467087893336048
julia> findmin(x)
(-0.32508319212563325, 3)
If you want to do this for every column in a matrix, you can do something like:
julia> X = randn(3, 5)
3×5 Matrix{Float64}:
1.06405 1.03267 -0.826687 -1.68299 0.00319586
-0.129021 0.0615327 0.0756477 1.05258 0.525504
0.569748 -0.0877886 -1.48372 0.823895 0.319364
julia> min_inds = [findmin(X[:, i]) for i = 1:5]
5-element Vector{Tuple{Float64, Int64}}:
(-0.12902069012799203, 2)
(-0.08778864856976668, 3)
(-1.4837211369655696, 3)
(-1.6829919363620507, 1)
(0.003195860366775878, 1)
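The key step in both answers is to keep each element's index alongside its distance, and then use that index to go back to the original data. A minimal Python sketch of the same idea for the vector X from the question (the function name closest_to_mean is hypothetical): sort the indices by distance to the mean and keep the first k.

```python
from statistics import mean

def closest_to_mean(values, k):
    """Return the k values closest to the mean of values, together with their
    original (0-based) indices, so they can be looked up in the source data."""
    m = mean(values)
    # sort the indices, not the values, so each distance keeps its position
    order = sorted(range(len(values)), key=lambda i: abs(values[i] - m))
    idx = order[:k]
    return [values[i] for i in idx], idx

X = [1, 2, 7, 4, 5]
print(closest_to_mean(X, 3))  # ([4, 5, 2], [3, 4, 1])
```

For a matrix, the same function can be applied to each column, which is what the Julia answers do with eachcol and CartesianIndices.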

Simulation loops for R Ping Pong

I need to find the probability Pr(X = i), i = 2, ..., 6, by simulation in R, where two players A and B agree that the winner of a game gets 1 point and the loser 0 points; the match ends as soon as one of the players is ahead by 2 points or the number of games reaches 6. Suppose that the probabilities of A and B winning a game are 2/3 and 1/3, respectively, and that games are independent. Let X denote the number of games needed to end the match.
I am using the following code:
juegos<-rbinom(6,1,2/3)
juegos
A<-cumsum(juegos)
B<-cumsum(1-juegos)
K<-abs(A-B)==2
R<-rep(0,1000)
for(i in 1:1000)
{R[i]<-which.max(K)}
R
However, I don't know what the next step is to find the probabilities for i = 2, 4 and 6.
Here is one way that uses a function to simulate a single match:
# Function to simulate one match
one_match = function(p = 2/3){
g = 0
score = 0
while (g < 6){
g = g + 1
# Play one game & update score
if (runif(1) < p)
score = score + 1
else
score = score - 1
if (abs(score) == 2) break
}
return(g)
}
# Simulate matches
n_sims = 100000
outcomes = replicate(n_sims, one_match())
# Or, with a different winning probability, say p = 1/2
# outcomes = replicate(n_sims, one_match(p = 1/2))
# Estimate probabilities
probs = table(outcomes)/n_sims
print(probs)
Cheers!
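Because each game changes the score difference by exactly 1, the difference can only reach 2 after an even number of games, so X takes only the values 2, 4 and 6 (Pr(X = 3) = Pr(X = 5) = 0). Treating the match as pairs of games gives exact values to check a simulation against: with q = 1 - p, a pair ends the match with probability p² + q² and leaves the score level with probability 2pq, so Pr(X = 2) = p² + q², Pr(X = 4) = 2pq(p² + q²) and Pr(X = 6) = (2pq)². With p = 2/3 these are 5/9, 20/81 and 16/81. The Python sketch below translates the R one_match and adds two hypothetical helpers, exact_probs and simulate, to compare both:

```python
import random
from collections import Counter
from fractions import Fraction

def exact_probs(p=Fraction(2, 3)):
    """Exact Pr(X = i) for i = 2, 4, 6, treating the match as pairs of games."""
    q = 1 - p
    decided = p * p + q * q   # one player wins both games of the pair: match over
    level = 2 * p * q         # the pair is split: score level again
    return {2: decided, 4: level * decided, 6: level * level}

def one_match(p=2/3, rng=random):
    """Simulate one match (same rules as the R one_match); return games played."""
    score = 0
    for g in range(1, 7):
        score += 1 if rng.random() < p else -1
        if abs(score) == 2:
            return g
    return 6

def simulate(n_sims=100_000, p=2/3, rng=random):
    """Estimate Pr(X = i) from n_sims simulated matches."""
    counts = Counter(one_match(p, rng) for _ in range(n_sims))
    return {i: counts[i] / n_sims for i in sorted(counts)}

print(exact_probs())  # {2: Fraction(5, 9), 4: Fraction(20, 81), 6: Fraction(16, 81)}
random.seed(42)
print(simulate())
```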

Is there a general algorithm to identify a numeric series?

I am looking for a general-purpose algorithm to identify short numeric series in lists of at most a few hundred numbers. This will be used to identify series of masses in mass spectrometry (MS1) data.
For instance, given the following list, I would like to identify that 3 of these numbers fit the series N, N + 1, N + 2, etc.
426.24 <= N
427.24 <= N + 1/x
371.10
428.24 <= N + 2/x
851.47
451.16
The series all have the format N, N+1/x, N+2/x, N+3/x, N+4/x, etc., where x is an integer (in the example, x = 1). I think this constraint makes the problem very tractable. Any suggestions for a quick, efficient way to tackle this in R?
This routine generates series for x from 1 to 10 (you could extend the range) and checks how many terms of each series are contained in the original list of numbers.
N = c(426.24,427.24,371.1,428.24,851.47,451.16)
N0 = N[1]
x = list(1,2,3,4,5,6,7,8,9,10)
L = 20
Series = lapply(x, function(x){seq(from = N0, by = 1/x,length.out = L)})
countCoincidences = lapply(Series, function(x){sum(x %in% N)})
Result:
unlist(countCoincidences)
[1] 3 3 3 3 3 3 3 3 3 2
As you can see, x = 1 gives 3 coincidences, and so does every x up to x = 9; for x = 10, the 20 generated terms stop at 428.14, before reaching 428.24. Here you have to decide which x is the one you want.
Since you're looking for an arithmetic sequence, the difference k is constant. Thus, you can loop over the vector and subtract each value from the whole vector. If the vector contains a three-term sequence, subtracting its middle term from the vector yields -k, 0 and k, so you can find the sequence by looking for matches between vector - value and its negation, value - vector:
x <- c(426.24, 427.24, 371.1, 428.24, 851.47, 451.16)
unique(lapply(x, function(y){
s <- (x - y) %in% (y - x);
if(sum(s) > 1){x[s]}
}))
# [[1]]
# NULL
#
# [[2]]
# [1] 426.24 427.24 428.24
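Both answers compare floating-point numbers with exact matching (%in%), which can miss a match when a value such as N + 2/x is not exactly representable. A tolerance-based variant is sketched below in Python, with hypothetical names (find_series, max_x, tol, min_len). It treats each value as a possible start N, tries each x up to max_x, and grows the run N, N + 1/x, N + 2/x, ... as long as a value within tol of the next term is present:

```python
def _match(values, target, tol):
    """Return the first value within tol of target, or None."""
    for v in values:
        if abs(v - target) <= tol:
            return v
    return None

def find_series(values, max_x=10, tol=1e-6, min_len=3):
    """Find runs N, N + 1/x, N + 2/x, ... of at least min_len terms in values,
    for integer x in 1..max_x, matching with an absolute tolerance tol.
    Returns a list of (x, run) pairs, where run holds the matched values."""
    found = []
    for n0 in sorted(values):
        for x in range(1, max_x + 1):
            step = 1 / x
            # only start a run at its first term
            if _match(values, n0 - step, tol) is not None:
                continue
            run = [n0]
            k = 1
            while True:
                v = _match(values, n0 + k * step, tol)
                if v is None:
                    break
                run.append(v)
                k += 1
            if len(run) >= min_len:
                found.append((x, run))
    return found

masses = [426.24, 427.24, 371.10, 428.24, 851.47, 451.16]
print(find_series(masses))  # [(1, [426.24, 427.24, 428.24])]
```

The tolerance should reflect the instrument's mass accuracy; 1e-6 is only a placeholder. This is a simple quadratic scan, which should be fast enough for lists of a few hundred numbers.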
