matrix operations on matrix with "missing" elements - julia

I'm trying to analyse some experimental data in a matrix and I'm having some issues.
For example I'd like to scale the columns of a matrix so that the first row of each column is 1.
I'd like to do it in the neat/clean julia way that I'm now starting to learn but I'm struggling to find a good solution.
The problem comes from the fact that each column is the result of some experimental test, and they have different lengths. I've "fixed" this by creating the matrix in Excel, adding a missing in the empty cells at the bottom of each column and then copy-pasting it into Julia. I take it this is probably not the best way to deal with the issue?
Example: (normal matrices are much bigger though)
A=[1 2 3
4 5 6
missing missing 9]
After that, I'd like to do some analysis, one of which is scaling the matrix so that the first row is = [1 1 1...1]. I tried both map
map((x,y)->x./y,A[2:end,:],A[1,:])
but it seems to apply the top row to the first N elements of the first column only.
Alternatively I tried with mapslices but I'm getting the following error MethodError: Cannot `convert` an object of type Missing to an object of type Float64
I have the feeling I'm missing something and my googlefoo is failing me... any help is much appreciated!
PS: Apologies if I missed some already answered question or if I missed some guideline, I'll try to improve my question if needed. It's the first time I post here!

I'm not sure what your first question is; it seems hard to answer without knowing what the data you're processing looks like.
Your second question, if I understand correctly, should be as simple as:
julia> A ./ A[1, :]'
3×3 Matrix{Union{Missing, Float64}}:
1.0 1.0 1.0
4.0 2.5 2.0
missing missing 3.0
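To spell out why the transpose is there, here is a small sketch run against the example matrix from the question. A[1, :] is a plain 3-element vector, which broadcasting lines up with the rows of A, so it has to be turned into a row before dividing. Slicing with a range, A[1:1, :], keeps the row shape and should give the same result. The names scaled and scaled_alt are only for illustration.

```julia
A = [1 2 3
     4 5 6
     missing missing 9]

# A[1, :] drops the row dimension and returns a 3-element vector,
# so it is transposed into a 1×3 row before broadcasting.
scaled = A ./ A[1, :]'

# A[1:1, :] is already a 1×3 matrix, so no transpose is needed.
scaled_alt = A ./ A[1:1, :]

# isequal treats missing as equal to missing; == would return missing here.
println(isequal(scaled, scaled_alt))
```

In both forms the missing entries simply propagate through the division, which is why the padded cells stay missing in the result.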
Edit to add:
Whether a matrix is or isn't a good idea here depends on the wider context, but if you have some vectors of numbers of different lengths, you can just put them in a vector of vectors rather than a matrix, which means they don't all have to have the same length:
julia> x = rand(3); y = rand(5);
julia> A = [x, y]
2-element Vector{Vector{Float64}}:
[0.2654489138174001, 0.8598585826482341, 0.43527866751212607]
[0.4702376843007643, 0.7890927390349933, 0.6073796489306595, 0.9178238662871376, 0.5917433487576529]
julia> A ./ first.(A)
2-element Vector{Vector{Float64}}:
[1.0, 3.2392620119731323, 1.6397831931290188]
[1.0, 1.6780721013637205, 1.2916439264833126, 1.9518296745015802, 1.2583920185758093]
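If the data already sits in a matrix padded with missing, one possible way to get to the vector-of-vectors layout is to strip the padding column by column. This is a sketch under the assumption that missing only ever appears as padding at the bottom of a column; cols is a name made up for the example.

```julia
A = [1 2 3
     4 5 6
     missing missing 9]

# One vector per column, with the missing padding dropped.
cols = [collect(skipmissing(c)) for c in eachcol(A)]

# Scale each column by its own first element, as in the answer above.
scaled = cols ./ first.(cols)

println(scaled)
```

Each inner vector keeps its own length, so later per-column analysis does not have to deal with missing at all.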

Related

Julia: Return the entire vector if any of the element in vector is greater than 50

I am new to Julia and am trying to do some exercises.
I have a vector
A = [[112.01507313113326, 60.7645449470438, 44.284185340771124, 16.4524736204982]
[101.46307715025503, 45.051658067785084, 29.896435433335395, 9.8679853915780]]
and I have B=[100,50, 50,100]
I want to keep an entire row of A if any of its elements is greater than the corresponding value of B (in order).
When I use A[A.>B] I get only the elements that are greater than the B value.
Any help would be appreciated.
Assuming that A is (your code is incomplete):
A=[[112.01507313113326, 60.7645449470438, 44.284185340771124, 16.4524736204982] [101.46307715025503, 45.051658067785084, 29.896435433335395, 9.8679853915780]]
You could do something like:
julia> A[:,[any(col .> B) for col in eachcol(A)]]
4×2 Matrix{Float64}:
112.015 101.463
60.7645 45.0517
44.2842 29.8964
16.4525 9.86799
Since the OP states A is a vector, one can do an array comprehension,
[a for a in A if any(a .> B)]
or a direct indexing using broadcasting,
A[any.(broadcast.(>, A, (B,)))]
which both give the same vector:
2-element Vector{Vector{Float64}}:
[112.01507313113326, 60.7645449470438, 44.284185340771124, 16.4524736204982]
[101.46307715025503, 45.051658067785084, 29.896435433335395, 9.867985391578]
The shorter A[any.(A.>(B,))] also returns this vector here and, surprisingly, is much faster at this short length, but it is not equivalent in general: a > B on two whole vectors is a single lexicographic comparison rather than an elementwise one.
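A quick check of how comparing two whole vectors with > differs from the elementwise a .> B. The rows below are made up for the illustration (they are not the OP's data), and the names elementwise and lexicographic are hypothetical. As far as I can tell, > between two vectors compares them lexicographically and returns a single Bool, so it is worth confirming which of the two an expression is actually using.

```julia
B = [100, 50, 50, 100]

rows = [[10.0, 60.0, 1.0, 1.0],    # only the second element exceeds B
        [120.0, 1.0, 1.0, 1.0],    # the first element exceeds B
        [10.0, 1.0, 1.0, 1.0]]     # nothing exceeds B

# Elementwise test: does any element of the row exceed its partner in B?
elementwise = [any(r .> B) for r in rows]

# Whole-vector test: r > B is decided by the first position where r and B differ.
lexicographic = rows .> (B,)

println(elementwise)
println(lexicographic)
```

The first row is the interesting one: it should be kept under the elementwise rule, but the whole-vector comparison rejects it because its first element is smaller than B[1].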

About behaviour of / by vector in Julia

3/[2;2] gives
1×2 LinearAlgebra.Transpose{Float64,Array{Float64,1}}:
0.75 0.75
while 3 ./[2;2] gives
2-element Array{Float64,1}:
1.5
1.5
The second one is easy to comprehend. It broadcasts 3 and performs element wise division. But what is the reasoning behind having the first operation behave as it did? I assume it took the sum of the vector, which was 2x1, performed division of 3 by 4 and broadcast it to a 1x2 transposed vector. I can accept taking the sum of the vector to perform division, but why the transpose? Or why not just return a scalar?
It simply multiplies 3 by the pseudo-inverse of the right-hand side operand.
julia> ?/
...
Right division operator: multiplication of x by the inverse of y on the right.
Although it seems surprising at first sight, it is actually the natural behavior. A rowvector*columnvector gives a scalar and hence a scalar divided by a column vector should give a row vector, which is the case. Note that RowVector has been removed in 1.0 and what you get is actually a row vector represented with Transpose.
You can write @less 1 / [2;2] to see what actually happens.
Also take a look at this GitHub issue to understand the behaviour a bit more and this discourse topic for some use cases.
It seems it is calculating the pseudoinverse of the vector and then multiplying by 3.
Using @which 3/[2;2] and so on to see what actually happens, I found that it eventually calls the following method in stdlib/LinearAlgebra/generic.jl:
function _vectorpinv(dualfn::Tf, v::AbstractVector{Tv}, tol) where {Tv,Tf}
    res = dualfn(similar(v, typeof(zero(Tv) / (abs2(one(Tv)) + abs2(one(Tv))))))
    den = sum(abs2, v)
    # as tol is the threshold relative to the maximum singular value, for a vector with
    # single singular value σ=√den, σ ≦ tol*σ is equivalent to den=0 ∨ tol≥1
    if iszero(den) || tol >= one(tol)
        fill!(res, zero(eltype(res)))
    else
        res .= dualfn(v) ./ den
    end
    return res
end
which in the given case effectively becomes transpose([2;2])/sum(abs2, [2;2]) which is the pseudoinverse.
However, this is a bit above my head. So someone more qualified might prove me wrong.
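For what it's worth, the reading above can be checked numerically. This is only a sketch: it compares 3/[2;2] against pinv from LinearAlgebra and against the hand-written transpose(v)/sum(abs2, v), and then multiplies the result back by the vector. The variable names are made up for the example.

```julia
using LinearAlgebra

v = [2; 2]

by_division = 3 / v                              # 1×2 row, as in the question
by_pinv     = 3 * pinv(v)                        # scalar times the pseudoinverse
by_hand     = 3 * transpose(v) / sum(abs2, v)    # the formula from _vectorpinv

println(by_division ≈ by_pinv)
println(by_division ≈ by_hand)

# Row times column gives a scalar, so "dividing by v" and then
# multiplying by v should give back the original 3.
println(by_division * v)
```

The last line is the row-times-column argument from the earlier answer in concrete form: (3 / v) * v comes back as a scalar, which is why the quotient has to be a row.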

rank() function in R is ranking objects with floating points rather than integers

I'm quite new to R so this may seem quite trivial to many experienced programmers, sorry in advance!
I've got a numeric vector of length 8 that looks like this:
data <- c(45, 67, 23, 24, 5, 23, 45, 23)
When I type in: rank(data), R returns: [1] 6.5 8.0 3.0 5.0 1.0 3.0 6.5 3.0
However with my (very basic) understanding of rank, I expect R to return to me only whole numbers... such as:
[1] 6 8 2 5 1 3 7 4
How can rank() tell me the first element in data has a floating point ranking rather than a whole number ranking? Is it because there are values in data that are repeated and so rank() is trying to handle ties in a way that I am not expecting? If so, please tell me how I can fix this so I can get output that looks like what I previously expected. Also, any information on how rank() deals with NA values would be much appreciated. A basic description on rank() and what bells and whistles can be used would be fantastic! I've looked for videos on youtube and searched stackoverflow to no avail! Thanks so much.
From ?rank:
With some values equal (called ‘ties’), the argument ties.method determines the result at the corresponding indices. The "first" method results in a permutation with increasing values at each index set of ties. The "random" method puts these in random order whereas the default, "average", replaces them by their mean, and "max" and "min" replaces them by their maximum and minimum respectively, the latter being the typical sports ranking.
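Sounds like you're using the default setting of "average" for tie breaking, which uses the mean, which is not necessarily an integer. Passing ties.method = "first", i.e. rank(data, ties.method = "first"), gives the whole-number ranking you expected.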
The built-in documentation should always be your first stop in looking for help. In this case (and most cases), it details all the "bells and whistles"---here there aren't many: just tie-handling and NA-handling. It also has examples at the bottom.

What is the best way to form inner products?

I was delighted to learn that Julia allows a beautifully succinct way to form inner products:
julia> x = [1;0]; y = [0;1];
julia> x'y
1-element Array{Int64,1}:
0
This alternative to dot(x,y) is nice, but it can lead to surprises:
julia> @printf "Inner product = %f\n" x'y
Inner product = ERROR: type: non-boolean (Array{Bool,1}) used in boolean context
julia> @printf "Inner product = %f\n" dot(x,y)
Inner product = 0.000000
So while I'd like to write x'y, it seems best to avoid it, since otherwise I need to be conscious of pitfalls related to scalars versus 1-by-1 matrices.
But I'm new to Julia, and probably I'm not thinking in the right way. Do others use this succinct alternative to dot, and if so, when is it safe to do so?
There is a conceptual problem here. When you do
julia> x = [1;0]; y = [0;1];
julia> x'y
0
In the Julia version used in the question, that is actually turned into a matrix * vector product with dimensions of 1x2 and 2 respectively, resulting in a 1-element array rather than a scalar. Other languages, such as MATLAB, don't distinguish between a 1x1 matrix and a scalar quantity, but Julia does for a variety of reasons. It is thus never safe to use it as an alternative to the "true" inner product function dot, which is defined to return a scalar output.
Now, if you aren't a fan of the dots, you can consider sum(x.*y) or sum(x'y). Also keep in mind that column and row vectors are different: in fact, there is no such thing as a row vector in Julia, only a 1xN matrix. So you get things like
julia> x = [ 1 2 3 ]
1x3 Array{Int64,2}:
1 2 3
julia> y = [ 3 2 1]
1x3 Array{Int64,2}:
3 2 1
julia> dot(x,y)
ERROR: `dot` has no method matching dot(::Array{Int64,2}, ::Array{Int64,2})
You might have used a 2d row vector where a 1d column vector was required.
Note the difference between 1d column vector [1,2,3] and 2d row vector [1 2 3].
You can convert to a column vector with the vec() function.
The error message suggestion is dot(vec(x),vec(y)), but sum(x.*y) also works in this case and is shorter.
julia> sum(x.*y)
10
julia> dot(vec(x),vec(y))
10
Now, you can write x⋅y instead of dot(x,y).
To write the ⋅ symbol, type \cdot followed by the TAB key.
If the first argument is complex, it is conjugated.
Now, dot() and ⋅ also work for matrices.
Since version 1.0, you need
using LinearAlgebra
before you use the dot product function or operator.
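On Julia 1.x the situation looks simpler than in the original question. A minimal sketch, assuming a 1.x release with the LinearAlgebra standard library loaded: x'y, dot(x, y) and x ⋅ y should all return the same plain scalar, so the 1-element-array surprise should no longer come up. The complex vector z is made up here to show the conjugation of the first argument, and r and s are the row matrices from the earlier answer.

```julia
using LinearAlgebra

x = [1; 0]; y = [0; 1]

# All three spellings give a scalar on 1.x.
println(x'y)
println(dot(x, y))
println(x ⋅ y)
println(x'y isa Number)

# dot conjugates its first argument; sum(z .* z) does not.
z = [1 + 2im, 3 + 0im]
println(dot(z, z))
println(sum(z .* z))

# dot also accepts 1×3 matrices, without going through vec().
r = [1 2 3]; s = [3 2 1]
println(dot(r, s))
```

For real vectors the choice between these forms is mostly a matter of taste; for complex data the conjugation is the difference to keep in mind when swapping dot for sum(x .* y).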

Converting matrix multiplication and sum function from Matlab to R

I'm converting a rather complicated set of code from Matlab to R. I have zero experience in Matlab and am a functioning novice in R.
I have a segment of code which reads (in matlab):
dSii=(sum(tao.*Sik,1))'-(sum(m'))'.*Sii-beta.*Sii./N.*(Iii+sum(Iik)');
Which I've simplified and will focus on the first segment (if I can solve the first segment I'm confident I can perform the rest):
J = (sum(A.*B,1))' - ...
tao (or A) and Sik (or B) are matrices. So my assumption is I'm performing matrix multiplication here (A * B) and summing the resultant column. The '1' is what is throwing me off in that statement. In R, that 1 would likely indicate we're talking about a sum of rows as opposed to columns (indicated by 2). But I can't find any supporting documentation for that kind of Matlab statement.
I was thinking of using a statement like this (but of course, too many '1's and ',')
J<- (apply(A*B, 1), 1, sum)
Thanks for all your help. I searched for other examples here and elsewhere and couldn't find an answer. I'm willing to work for it but this is akin to me studying French (which I don't know) to translate in Spanish (which I'm moderate in) while interpreting the whole process in English. :D
Because of the different conventions in R and Matlab, the idiosyncrasies have to be learned for each (just like your language analogy!). The Matlab command sum(A.*B,1) means multiply A and B element-wise, so they must be the same shape, and then sum along dimension 1, i.e. add the rows together to get the column sums. For a matrix with more than one row, dimension 1 is the default, so sum(A.*B) would do the same thing as sum(A.*B,1). Because R treats * as element-wise multiplication for matrices (matrix multiplication is %*%), the following Matlab and R codes will produce the same column of numbers in J:
Matlab:
A=[[1,2,3];[4,5,6];[7,8,9]];
B=[[10,11,12];[13,14,15];[16,17,18]];
J=sum(A.*B,1)'; %the ' means to transpose the column sums to be a 3x1 matrix
R:
A<-matrix(c(1,2,3,4,5,6,7,8,9),3,byrow=T)
B<-matrix(c(10,11,12,13,14,15,16,17,18),3,byrow=T)
J<-matrix(colSums(A*B)) # no transpose needed here: nrow(J)==3

Resources