Creating an array with values that obey a criteria of smallest distance to the mean - julia

I am trying to build credible bands in Julia, but there is one technical step I do not know how to do. The code is the following:
#Significance level 95%
alpha_sign=0.05
#Generate random values
N_1 = 100
x = 0.0:0.01:1.0 # x must be defined before it is used to size Fs_1
Fs_1 = Array{Float64}(undef, length(x), N_1);
for k in 1:100
f = rand(postΠ)
Fs_1[:,k] = f.(x)
end
# sup|theta_i(t)-average|
dif_b=Array{Float64}(undef, length(x),N_1);
for k in 1:100
dif_b[:,k] = Fs_1[:,k]-average_across
end
#Defining a function that allows to compute the n smallest values
using Base.Sort
function smallestn(a, n)
sort(a; alg=Sort.PartialQuickSort(n))[1:n]
end
#Compute the maximum of the difference across time
sup_b=Array{Float64}(undef, N_1)
for k in 1:100
sup_b[k]=(maximum(abs.(dif_b[:,k] )))
end
#Build a matrix with the smallest distances
N_min=(1-alpha_sign)*N_1
using Base.Sort
min_sup_b=smallestn(sup_b,95)
To simplify the problem I am creating this example:
Imagine I have the array below and I want to create a new array containing the values that are closest to the mean.
X=[1,2,7,4,5]
av_X=mean(X,dims=1)
Question:
I am able to compute the distances and store them into a vector as displayed in the code above and later get the smallest values but I need to get back to the original matrix to extract those values.
How do I do that?
Thanks in advance!

using Statistics
arr = rand(1:20, (4, 4))
colmeans = [mean(col) for col in eachcol(arr)]
# cart[2] is the column index, so each entry is compared with its own column mean
deltas = map(cart -> abs(arr[cart] - colmeans[cart[2]]) => cart, CartesianIndices(arr))
sorteddeltas = sort(deltas, lt = (x, y) -> first(x) < first(y), dims=1)
sarr = zeros(Int, (4, 4))
for (i, d) in enumerate(sorteddeltas)
sarr[i] = arr[last(d)]
end
println(arr) # [7 1 2 15; 18 7 14 10; 3 11 10 13; 7 14 20 8]
println(colmeans) # [8.75, 8.25, 11.5, 11.5]
println(sarr) # [7 7 10 10; 7 11 14 13; 3 14 20 15; 18 1 2 8] (ties keep their original order)
println(sarr') # [7 7 3 18; 7 11 14 1; 10 14 20 2; 10 13 15 8]
This gives you, for each column, a sorted list of pairs: the first element of each pair is the distance from the column mean, and the second is the Cartesian index of that value in the original matrix.
sarr is the original matrix with each column reordered by closeness to that column's mean.
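The same idea, carrying the original index along with each distance, also covers the credible-band case in the question, where whole curves are ranked by their largest deviation from the mean. The following is only a rough sketch in plain Python rather than Julia; the function name `closest_columns` and the toy data are made up for illustration, and each "curve" is assumed to be a list of equal length.

```python
def closest_columns(curves, keep):
    """Return (indices, curves) for the `keep` curves nearest the pointwise mean."""
    n_points = len(curves[0])
    # Pointwise mean across all curves.
    mean_curve = [sum(c[t] for c in curves) / len(curves) for t in range(n_points)]
    # Sup-norm distance of each curve from the mean, paired with its index.
    dists = [
        (max(abs(c[t] - mean_curve[t]) for t in range(n_points)), k)
        for k, c in enumerate(curves)
    ]
    # Sorting (distance, index) pairs keeps track of where each distance came from.
    nearest = sorted(dists)[:keep]
    idx = [k for _, k in nearest]
    return idx, [curves[k] for k in idx]


# Hypothetical toy data: four "curves" evaluated at two points.
curves = [[1.0, 2.0], [1.1, 2.1], [5.0, 9.0], [0.9, 1.9]]
idx, kept = closest_columns(curves, keep=3)
print(idx)   # [1, 0, 3]
print(kept)  # [[1.1, 2.1], [1.0, 2.0], [0.9, 1.9]]
```

The indices in `idx` are what let you go back to the original data; in the Julia code from the question, the analogous step would be indexing the columns of `Fs_1` with the positions of the smallest entries of `sup_b`.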

I think the function you are looking for is findmin(). It gives both the minimum value and its index.
julia> x = randn(5)
5-element Vector{Float64}:
-0.025159738348978562
-0.24720173332739662
-0.32508319212563325
0.9470582053428686
1.1467087893336048
julia> findmin(x)
(-0.32508319212563325, 3)
If you want to do this for every column in a matrix, you can do something like:
julia> X = randn(3, 5)
3×5 Matrix{Float64}:
1.06405 1.03267 -0.826687 -1.68299 0.00319586
-0.129021 0.0615327 0.0756477 1.05258 0.525504
0.569748 -0.0877886 -1.48372 0.823895 0.319364
julia> min_inds = [findmin(X[:, i]) for i = 1:5]
5-element Vector{Tuple{Float64, Int64}}:
(-0.12902069012799203, 2)
(-0.08778864856976668, 3)
(-1.4837211369655696, 3)
(-1.6829919363620507, 1)
(0.003195860366775878, 1)

Related

Outer product matrix multiplication

Trying to find the outer product of two matrices A and B
Here is what I have attempted:
function product(A, B)
n_a, m_a = size(A)
n_b, m_b = size(B)
AB = Array{Float64}(undef, n_a, m_b)
for i in 1:n_a
for j in 1:m_b
AB[i, j] = A[:, j] * B[j, :]'
end
end
return AB
end
product(A, B)
I get an error when attempting to run: Cannot `convert` an object of type Matrix{Float64} to an object of type Float64
I'm not exactly sure what you mean by "outer product of matrices", I'm only familiar with an outer product of vectors (which creates a matrix). Could you clarify what output you're looking for with an example?
In any event to address the immediate issue: The stacktrace is pointing to this line in your function:
AB[i, j] = A[:, j] * B[j, :]'
Let's take two example matrices and see what happens for i=j=1:
julia> x = [1 2; 3 4]
2×2 Matrix{Int64}:
1 2
3 4
julia> y = [2 3; 4 5]
2×2 Matrix{Int64}:
2 3
4 5
julia> x[:, 1] * y[1, :]'
2×2 Matrix{Int64}:
2 3
6 9
So the way you are slicing and transposing your matrices means you are calculating an outer product (as I know it) of two vectors, which gives you a matrix. Since AB is defined as a matrix of Float64s, the location AB[i, j] can only hold a single Float64 value, but you are trying to assign a Matrix{Float64} to it.
Again I'm not sure what exactly you are trying to achieve here, but if it's just a "normal" matrix multiplication, you should have
AB[i, j] = A[i, :]' * B[:, j]
in your inner loop. Making that change gives me:
julia> product(x, y) == x * y
true
for the x and y defined above.
Based on the image you linked to in the comment, you're calculating an array of matrices. This can be done as follows:
AB = [A[:,j]*B[j,:]' for j=1:size(A,2)]
Though, I don't believe that's the correct definition for matrix outer product. I think it should be like this:
AB = [r*c' for r in eachrow(A), c in eachcol(B)]
This will give you an n_a×m_b matrix of m_a×n_b matrices.
From the linked image and the text, I think the formula tries to show another representation for the usual matrix product, which is useful, especially when doing rank decomposition of matrices (as sums of rank one matrices).
As a concrete example, first defining a couple of matrices:
julia> using Random
julia> Random.seed!(1234);
julia> A = rand(1:10,(2,3))
2×3 Matrix{Int64}:
4 3 4
6 9 4
julia> B = rand(1:10,(3,4))
3×4 Matrix{Int64}:
10 8 1 7
8 6 2 10
5 8 5 7
Now define a product as a sum of outer products of vectors, and see it gives the same result as a usual matrix product:
julia> product(A, B) = sum( [ A[:,i] * B[i,:]' for i=1:size(A,2) ] )
product (generic function with 1 method)
julia> product(A,B)
2×4 Matrix{Int64}:
84 82 30 86
152 134 44 160
julia> A*B
2×4 Matrix{Int64}:
84 82 30 86
152 134 44 160
In this example, the resulting matrix can be at most rank 3, the number of columns in A (and rows in B). In fact, its rank is at most 2 because A has only 2 rows, but in general this product is applied to tall times flat matrices, and then the rank restriction is meaningful.
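For readers who want to see the sum of outer products spelled out without Julia's array syntax, here is a minimal sketch in plain Python using the same A and B as above. The helper names `outer` and `product_as_outer_sum` are invented for this illustration.

```python
def outer(u, v):
    """Outer product of two vectors as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def product_as_outer_sum(A, B):
    """Matrix product computed as a sum of rank-one outer products."""
    n_rows, n_inner, n_cols = len(A), len(B), len(B[0])
    total = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_inner):
        col_i = [row[i] for row in A]   # i-th column of A
        term = outer(col_i, B[i])       # times the i-th row of B
        for r in range(n_rows):
            for c in range(n_cols):
                total[r][c] += term[r][c]
    return total


A = [[4, 3, 4], [6, 9, 4]]
B = [[10, 8, 1, 7], [8, 6, 2, 10], [5, 8, 5, 7]]
result = product_as_outer_sum(A, B)
print(result)  # [[84, 82, 30, 86], [152, 134, 44, 160]]
```

Each pass through the loop adds one rank-one term, which is why the result cannot have rank higher than the number of terms.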

Generating a Random Permutation in R

I am trying to implement in R an example from Simulation (2006, 4ed., Elsevier) by Sheldon M. Ross, which generates a random permutation and reads as follows:
Suppose we are interested in generating a permutation of the numbers 1,2,... ,n
which is such that all n! possible orderings are equally likely.
The following algorithm will accomplish this by
first choosing one of the numbers 1,2,... ,n at random;
and then putting that number in position n;
it then chooses at random one of the remaining n-1 numbers and puts that number in position n-1 ;
it then chooses at random one of the remaining n-2 numbers and puts it in position n-2 ;
and so on
Surely, we can achieve a random permutation of the numbers 1,2,... ,n easily by
sample(1:n, replace=FALSE)
For example
> set.seed(0); sample(1:5, replace=FALSE)
[1] 1 4 3 5 2
However, I want to get similar results manually according to the above algorithmic steps. Then I try
## write the function
my_perm = function(n){
x = 1:n # initialize
k = n # position n
out = NULL
while(k>0){
y = sample(x, size=1) # choose one of the numbers at random
out = c(y,out) # put the number in position
x = setdiff(x,out) # the remaining numbers
k = k-1 # and so on
}
out
}
## test the function
n = 5; set.seed(0); my_perm(n) # set.seed for reproducible
and have
[1] 2 2 4 5 1
which is obviously incorrect because 2 appears twice. How can I fix the problem?
You have implemented the logic correctly, but there is one R-specific behaviour you need to be aware of.
From ?sample
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x
So when the last number is remaining in x, let's say that number is 4, sampling would take place from 1:4 and return any 1 number from it.
For example,
set.seed(0)
sample(4, 1)
#[1] 2
So you need to adjust your function for that after which the code should work correctly.
my_perm = function(n){
x = 1:n # initialize
k = n # position n
out = NULL
while(k>1){ #Stop the while loop when k = 1
y = sample(x, size=1) # choose one of the numbers at random
out = c(y,out) # put the number in position
x = setdiff(x,out) # the remaining numbers
k = k-1 # and so on
}
out <- c(x, out) #Add the last number in the output vector.
out
}
## test the function
n = 5
set.seed(0)
my_perm(n)
#[1] 3 2 4 5 1
The vector passed to sample should be longer than 1. You can handle the case where only one number remains by adding a condition:
my_perm = function(n){
x = 1:n
k = n
out = NULL
while(k>0){
if(length(x)>1){
y = sample(x, size=1)
}else{
y = x
}
out = c(y,out)
x = setdiff(x,out)
k = k-1
}
out
}
n = 5; set.seed(0); my_perm(n)
[1] 3 2 4 5 1
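As a cross-check of the algorithm itself, independent of R's `sample` behaviour, the steps from the book can be sketched in plain Python. This version picks a random index into the remaining numbers rather than a random value, so a single remaining number needs no special case. The optional `rng` parameter is an assumption added here to make runs reproducible.

```python
import random


def my_perm(n, rng=random):
    """Random permutation of 1..n, filling positions n, n-1, ..., 1 in turn."""
    remaining = list(range(1, n + 1))
    out = [None] * n
    for pos in range(n - 1, -1, -1):
        # Choose one of the remaining numbers uniformly at random by index.
        j = rng.randrange(len(remaining))
        out[pos] = remaining.pop(j)
    return out


perm = my_perm(5, random.Random(0))
print(perm)
```

The exact permutation depends on the random generator, so it will not match the R output for the same seed.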

take a sample that has a specific mean

Let's say I have a population like {1,2,3, ..., 23} and I want to generate a sample so that the sample's mean equals 6.
I tried to use the sample function, using a custom probability vector, but it didn't work:
population <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23)
mean(population)
minimum <- min(population)
maximum <- max(population)
amplitude <- maximum - minimum
expected <- 6
n <- length(population)
prob.vector = rep(expected, each=n)
for(i in seq(1, n)) {
if(expected > population[i]) {
prob.vector[i] <- (i - minimum) / (expected - minimum)
} else {
prob.vector[i] <- (maximum - i) / (maximum - expected)
}
}
sample.size <- 5
sample <- sample(population, sample.size, prob = prob.vector)
mean(sample)
The mean of the sample is about the mean of the population (oscillates around 12), and I wanted it to be around 6.
A good sample would be:
{3,5,6,8,9}, mean=6.2
{2,3,4,8,9}, mean=5.6
The problem is different from sample integer values in R with specific mean because I have a specific population and I can't just generate arbitrary real numbers, they must be inside the population.
[Figure: plot of the probability vector]
You can try this:
m = local({b=combn(1:23,5);
d = colMeans(b);
e = b[,d>5.5 &d<6.5];
function()sample(e[,sample(ncol(e),1)])})
m()
[1] 8 5 6 9 3
m()
[1] 6 4 5 3 13
breakdown:
b=combn(1:23,5) # all combinations of 5 numbers from 1:23, one per column
d = colMeans(b) # mean of each combination
e = b[,d>5.5 &d<6.5] # keep only the combinations whose mean is within 0.5 of 6
sample(e[,sample(ncol(e),1)]) # pick one such combination at random and shuffle it
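The same enumerate-and-filter approach can be sketched in plain Python. The function name `sample_with_mean` and the parameters `tol` and `rng` are assumptions made for this illustration. Like the R version, it enumerates every combination, which is fine for 23 values taken 5 at a time but grows quickly for larger populations.

```python
import itertools
import random


def sample_with_mean(population, size, target, tol=0.5, rng=random):
    """Pick a random subset of `population` whose mean is within `tol` of `target`."""
    # Enumerate all subsets of the requested size and keep the ones close enough.
    candidates = [
        c for c in itertools.combinations(population, size)
        if abs(sum(c) / size - target) < tol
    ]
    # rng.choice raises IndexError if no subset qualifies.
    chosen = list(rng.choice(candidates))
    rng.shuffle(chosen)  # the subset is drawn in sorted order, so shuffle it
    return chosen


s = sample_with_mean(range(1, 24), 5, 6, rng=random.Random(1))
print(s, sum(s) / len(s))
```

Every qualifying subset is equally likely here, which is a design choice of this approach rather than something the question requires.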

Is there a general algorithm to identify a numeric series?

I am looking for a general purpose algorithm to identify short numeric series from lists with a max length of a few hundred numbers. This will be used to identify series of masses from mass spectrometry (ms1) data.
For instance, given the following list, I would like to identify that 3 of these numbers fit the series N + 1, N +2, etc.
426.24 <= N
427.24 <= N + 1/x
371.10
428.24 <= N + 2/x
851.47
451.16
The series are all of the format: N, N+1/x, N+2/x, N+3/x, N+4/x, etc, where x is an integer (in the example x=1). I think this constraint makes the problem very tractable. Any suggestions for a quick/efficient way to tackle this in R?
This routine will generate series using x from 1 to 10 (you could increase it). And will check how many are contained in the original list of numbers.
N = c(426.24,427.24,371.1,428.24,851.47,451.16)
N0 = N[1]
x = list(1,2,3,4,5,6,7,8,9,10)
L = 20
Series = lapply(x, function(x){seq(from = N0, by = 1/x,length.out = L)})
countCoincidences = lapply(Series, function(x){sum(x %in% N)})
Result:
unlist(countCoincidences)
[1] 3 3 3 3 3 3 3 3 3 2
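As you can see, using x = 1 gives 3 matches. The same holds for every x up to 9, because a step of 1/x also lands on N + 1 and N + 2; with x = 10 the 20 generated terms stop short of 428.24, so only 2 match. You have to decide which x is the one you want.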
Since you're looking for an arithmetic sequence, the difference k is constant. Thus, you can loop over the vector and subtract each value from the sequence. If you have a sequence, subtracting the second term from the vector will result in values of -k, 0, and k, so you can find the sequence by looking for matches between vector - value and its opposite, value - vector:
x <- c(426.24, 427.24, 371.1, 428.24, 851.47, 451.16)
unique(lapply(x, function(y){
s <- (x - y) %in% (y - x);
if(sum(s) > 1){x[s]}
}))
# [[1]]
# NULL
#
# [[2]]
# [1] 426.24 427.24 428.24
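Both answers above rely on exact floating-point equality, which can be fragile for measured masses. As a hedged alternative, here is a small plain-Python sketch that walks the series N, N + 1/x, N + 2/x, ... from each candidate start and accepts values within a tolerance. The function name `find_series` and the default `tol` are assumptions for illustration, and x has to be supplied by the caller.

```python
def find_series(values, x, tol=1e-6):
    """Longest run start, start + 1/x, start + 2/x, ... found in `values`."""
    step = 1.0 / x
    best = []
    for start in values:
        run = [start]
        k = 1
        while True:
            expected = start + k * step
            # First value within `tol` of the next expected term, if any.
            match = next((v for v in values if abs(v - expected) <= tol), None)
            if match is None:
                break
            run.append(match)
            k += 1
        if len(run) > len(best):
            best = run
    # A single value on its own is not a series.
    return best if len(best) > 1 else []


masses = [426.24, 427.24, 371.10, 428.24, 851.47, 451.16]
print(find_series(masses, x=1))  # [426.24, 427.24, 428.24]
```

To search over charge states, you could call this for each x of interest and compare the lengths of the returned runs.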

Sampling in Matlab

So let me start off by saying that I do not have the statistics toolbox for Matlab so I am trying to find a way to work around this. In any case, what I am trying to do is to replicate the R sample function. For example, in R
> x = sample(1:5,20,replace=T,prob=c(.1,.1,.1,.1,.6))
> x
[1] 5 5 5 4 5 2 5 5 1 5 5 5 5 5 5 3 5 1 5 5
so I am sampling the integers 1,2,3,4,5 with replacement. But furthermore, I am sampling each integer with a certain proportion, i.e., the integer 5 should be sampled about 60% of the time.
So my question is: how can I achieve this in Matlab?
Here's how you can perform weighted sampling with replacement (something Matlab's randsample doesn't support, btw);
function r = sample(pop,n,weights)
%# each weight creates a "bin" of defined size. If the value of a random number
%# falls into the bin, we pick the value
%# turn weights into a normed cumulative sum
csWeights = cumsum(weights(:))/sum(weights);
csWeights = [0;csWeights(1:end-1)];
%# for each value: pick a random number, check against weights
idx = sum(bsxfun(@ge,rand(1,n),csWeights),1);
r = pop(idx);
The unweighted case is easy using randi.
function r = sample(pop, n)
imax = length(pop);
index = randi(imax, n, 1);
r = pop(index);
end
In the weighted case, something like this should do the trick:
function r = sample(pop, n, prob)
cumprob = cumsum(prob);
r = zeros(1, n);
for i = 1:n
index = find(rand < cumprob, 1, 'first'); % first bin whose cumulative probability exceeds the draw
r(i) = pop(index);
end
end
Here's one way to make your own sample function:
function x = sample(v, n, p)
pc = cumsum(p) / sum(p);
r = rand(1,n);
x = zeros(1,n);
for i = length(pc):-1:1
x(r<pc(i)) = v(i);
end
It's not exactly efficient, but it does what you want. Call it like so:
v = [1 2 3 4 5];
p = [.1 .1 .1 .1 .6];
n = 20;
x = sample(v,n,p);
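The cumulative-weight idea shared by these answers can be sketched in plain Python as well. The function name `weighted_sample` and the `rng` parameter are assumptions for this illustration. Python's standard library also offers `random.choices(population, weights=..., k=...)`, which does the same job directly.

```python
import bisect
import itertools
import random


def weighted_sample(pop, n, weights, rng=random):
    """Draw n items from pop with replacement, with probability proportional to weights."""
    total = sum(weights)
    # Normalised cumulative weights: the right edge of each item's "bin" in [0, 1].
    cum = [c / total for c in itertools.accumulate(weights)]
    out = []
    for _ in range(n):
        u = rng.random()
        # Index of the first bin whose right edge is strictly greater than u.
        idx = bisect.bisect_right(cum, u)
        # Guard against rounding leaving the last edge slightly below 1.
        out.append(pop[min(idx, len(pop) - 1)])
    return out


x = weighted_sample([1, 2, 3, 4, 5], 20, [0.1, 0.1, 0.1, 0.1, 0.6], random.Random(0))
print(x)
```

Using a binary search over the cumulative weights keeps each draw at O(log m) for m items, instead of scanning every bin.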
