Remove null column on julia array - julia

I'm just beginning with Julia and want to remove the columns that contain only 0 values. I have an array like the one below, with a lot of all-zero columns that I would like to remove.
115×40 Array{Float64,2}:
-0.0 -0.0 -0.0 -0.0 … -0.0 0.0 -0.0
0.0 -0.0 -0.0 -0.0 0.0 0.0 0.0
-0.0 -0.0 -0.0 -0.0 -0.0 0.0 -0.0
0.0 0.0 -0.0 -0.0 -0.0 0.0 0.0
0.0 0.0 0.0 -0.0 -0.0 0.0 0.0
-0.0 1.0 -0.0 0.0 … -0.0 0.0 0.0
-0.0 -0.0 0.0 -0.0 -0.0 0.0 0.0
0.0 -0.0 -0.0 -0.0 0.0 0.0 0.0
0.0 -0.0 0.0 -0.0 -0.0 0.0 0.0
⋮ ⋱
0.0 1.0 -0.0 -0.0 0.0 -0.0 -0.0
-0.0 -0.0 0.0 -0.0 0.0 -0.0 -0.0
1.0 0.0 -0.0 -0.0 0.0 -0.0 0.0
-0.0 0.0 -0.0 -0.0 … 0.0 -0.0 -0.0
0.0 0.0 -0.0 0.0 -0.0 -0.0 -0.0
-0.0 -0.0 -0.0 0.0 -0.0 -0.0 -0.0
0.0 -0.0 -0.0 0.0 -0.0 0.0 0.0
-0.0 -0.0 -0.0 -0.0 -0.0 1.0 0.0
Does anyone know how to do this?
Regards,

Let a be the array, then
a[:, vec(mapslices(col -> any(col .!= 0), a, dims = 1))]
works. mapslices reduces a to a 1x40 matrix of booleans, indicating the non-zero columns, and we need to convert that to a Vector for indexing, hence vec (alternatively, one could dropdims).
Depending on your application, a view instead of a copy might be enough.
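As a minimal sketch of the copy-versus-view choice mentioned above, the following uses a small made-up matrix (the data and the names keep, a_copy and a_view are only for illustration). It builds the column mask with any(!iszero, a, dims = 1), which should be equivalent to the mapslices call for this purpose, and then indexes once with a copy and once with a view.

```julia
# Toy matrix (hypothetical data): columns 2 and 4 contain only zeros.
a = [1.0  0.0  0.0  -0.0;
     0.0  0.0  2.0   0.0;
     0.0 -0.0  0.0   0.0]

# One Bool per column: true if the column has at least one nonzero entry.
# Note that iszero(-0.0) is true, so negative zeros count as zeros.
keep = vec(any(!iszero, a, dims = 1))

a_copy = a[:, keep]          # allocates a new 3×2 array
a_view = view(a, :, keep)    # no copy; shares memory with `a`
```

The view is cheaper to create, but writing to a_view also changes a, so the copy is the safer choice when the original array must stay untouched.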

Related

Create a non-linear structure in higher dimensional space in Julia

Suppose I’m creating a non-linear structure in a 20-dim space. Right now I have code
using Random
"`Uniform(0,b)`, with `0` excluded for sure, and we really mean it."
struct PositiveUniform{T}
b::T
end
function Base.rand(rng::Random.AbstractRNG, pu::PositiveUniform)
while true
r = rand(rng)
r > 0 && return r * pu.b
end
end
m = rand(PositiveUniform(20))
mat_new = [cos(m),sin(m),cos(m),sin(m),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]'
for i in 1:84
m = rand(PositiveUniform(20))
vector = [cos(m),sin(m),cos(m),sin(m),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
mat_new = vcat(mat_new, vector')
end
mat_new'
This gives me mat_new'.
I'm wondering whether this matrix matches my expectation.
(edit: I made a 2D structure 20 x 85, padding with zeros, since that is what was wanted, not a 20D array.)
const cols = 85
const rows = 20
const vec85 = rand(cols) .* rows
const mat2D = vcat(cos.(vec85)', sin.(vec85)', cos.(vec85)', sin.(vec85)', zeros(rows - 4, cols))
display(mat2D)
displays:
20×85 Matrix{Float64}:
-0.917208 -0.999591 -0.95458 -0.681959 0.999834 … 0.704834 0.961039 0.982991 0.967226 0.306118
0.398409 0.0286128 -0.297954 -0.731391 0.0182257 0.709372 0.276413 0.183653 -0.253917 -0.951993
-0.917208 -0.999591 -0.95458 -0.681959 0.999834 0.704834 0.961039 0.982991 0.967226 0.306118
0.398409 0.0286128 -0.297954 -0.731391 0.0182257 0.709372 0.276413 0.183653 -0.253917 -0.951993
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
⋮ ⋱ ⋮
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Note I kept your algorithm (perhaps not intended?) of setting the argument to sin and cos as nrows times rand(). If you want to pad the 2D array to contain more than just your matrix, I would look at PaddedViews.jl or similar.
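One way to check whether such a matrix has the expected structure is to wrap the construction in a function and test a few properties. This is only a sketch: the helper name circle_in_high_dim and the seeded MersenneTwister are assumptions made for illustration, and the angle scaling (rows times rand()) is kept as in the code above.

```julia
using Random

# Hypothetical helper: place `cols` random points of a circle in a
# `rows`-dimensional space, padding the unused coordinates with zeros.
function circle_in_high_dim(rng, rows, cols)
    t = rand(rng, cols) .* rows          # same scaling as above: rows * rand()
    return vcat(cos.(t)', sin.(t)', cos.(t)', sin.(t)', zeros(rows - 4, cols))
end

m = circle_in_high_dim(MersenneTwister(1), 20, 85)

dims_ok   = size(m) == (20, 85)
on_circle = all(m[1, :] .^ 2 .+ m[2, :] .^ 2 .≈ 1)   # rows 1 and 2 lie on the unit circle
padded    = all(iszero, m[5:end, :])                  # rows 5 to 20 are zero padding
```

If all three flags are true, every column is a point of the form (cos t, sin t, cos t, sin t, 0, …, 0).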

Convert an unlabeled NxN matrix to a table of position and values in R

I have an unlabeled N x N matrix like the one below. It is saved in a csv.
0.5 0.4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.4 0.0 0.0 0.0 0.0 0.0 0.3 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.2 0.0 0.0 0.0 0.2 0.0 0.0 0.0 0.0 0.0
0.1 0.0 0.0 0.0 0.0 0.2 0.0 0.7 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.0 0.0
0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
I want to convert this into a data table with x coordinates, y coordinates, and values as the columns as I believe this is what needs to be done to plot the matrix as a heatmap.
I am completely unfamiliar with R, besides basic syntax, so please be verbose in any suggestions!
Thank you all so much for any help you can provide!
We can read the data with read.table or read.csv, convert the resulting data.frame to a matrix (as.matrix), turn that into a table (as.table), and then convert it back to a data.frame. The result is a data.frame in long format with three columns: row, column, and value.
as.data.frame(as.table(m1))
data
m1 <- as.matrix(read.table('file.txt', header = FALSE))

Cut Out Middle of String

This is what my data looks like
orthogroup12213.faa.aligned.treefile.rooting.0.gtpruned.rearrange.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 5.0 6.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 8.0
orthogroup12706.faa.aligned.treefile.rooting.0.gtpruned.rearrange.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
...
I want to end with something like this (without the .faa.aligned.treefile.rooting.0.gtpruned.rearrange.0) :
orthogroup12213 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 5.0 6.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 8.0
orthogroup12706 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
I have tried a variety of 'cut' functions but with no luck. Please help!
I would use sed:
sed 's/\.[^ ]*//'
This says, "Find the first dot, and all characters that follow it that aren't a space, and replace them with nothing."
Both lines in your example contain the same string .faa.aligned.treefile.rooting.0.gtpruned.rearrange.0. When this is a fixed string AND the first part is always exactly 15 characters long, you might use cut:
# bad solution, only cut
cut -c1-15,68- file
This solution is fragile: as soon as the length of the leading string or of the middle part changes, it breaks.
When you know that the string to remove starts with a dot and ends at the first space, you can use
# also bad
sed 's/[.]/ /' file | cut -d" " -f1,3-
It is nice to keep it simple with cut, but cut needs simple input.
First think about the best way to identify the middle string, then use something like sed or awk to remove it.
# example with sed
# dots are escaped so they match literally; double quotes let the shell expand $str
str='\.faa\.aligned\.treefile\.rooting\.0\.gtpruned\.rearrange\.0'
sed "s/$str//" file

Transform UpperTriangular to Cholesky in Julia

Having a dataset X, I am trying to perform a Cholesky factorization, followed by a Cholesky update. My setting is the following:
data = readtable("PCA_transformed_data_gt1000.csv",header= true)
data = delete!(data, :1)
n,d = size(data)
s = 6.6172
S0 = s*eye(d)
kappa_0 = 0.001
nu_0 = d
mu_0 = zeros(d)
S0 = LinAlg.chol(S0+kappa_0*dot(mu_0,mu_0'))
The type of S0 is
julia> typeof(S0)
UpperTriangular{Float64,Array{Float64,2}}
I am trying to perform the Cholesky update as
U = sqrt((1+1/kappa_0)) * LinAlg.lowrankdowndate!(S0, sqrt(kappa_0)*mu_0)
and get the following error
ERROR: MethodError: no method matching lowrankdowndate!(::UpperTriangular{Float64,Array{Float64,2}}, ::Array{Float64,1})
Closest candidates are:
lowrankdowndate!(::Base.LinAlg.Cholesky{T,S<:AbstractArray{T,2}}, ::Union{Base.ReshapedArray{T,1,A<:DenseArray,MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N}}},DenseArray{T,1},SubArray{T,1,A<:Union{Base.ReshapedArray{T,N,A<:DenseArray,MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N}}},DenseArray},I<:Tuple{Vararg{Union{Base.AbstractCartesianIndex,Colon,Int64,Range{Int64}},N}},L}}) at linalg/cholesky.jl:502
I tried something like
convert(S0,Base.LinAlg.Cholesky)
but got the following
ERROR: MethodError: First argument to `convert` must be a Type, got [2.57239 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0; 0.0 2.57239 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0; 0.0 0.0 2.57239 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 2.57239 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0 2.57239 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0 0.0 2.57239 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 2.57239 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.57239 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.57239 0.0 0.0 0.0 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.57239 0.0 0.0 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.57239 0.0 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.57239 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.57239 0.0 0.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.57239 0.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.57239 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.57239]
Any ideas how to perform that task?
There are actually two Cholesky factorization functions, and it seems you need the other one, cholfact, which returns a Cholesky object instead of an UpperTriangular matrix. From a Cholesky object, you can extract the upper triangular factor by indexing with :U like so:
C = LinAlg.cholfact(M)
U = C[:U] # <--- this is upper triangular
For the code in the question, this becomes:
data = readtable("PCA_transformed_data_gt1000.csv",header= true)
data = delete!(data, :1)
n,d = size(data)
s = 6.6172
S0 = s*eye(d)
kappa_0 = 0.001
nu_0 = d
mu_0 = zeros(d)
S1 = LinAlg.cholfact(S0+kappa_0*dot(mu_0,mu_0))
U = sqrt((1+1/kappa_0)) * LinAlg.lowrankdowndate!(S1, sqrt(kappa_0)*mu_0)[:U]
There are two changes: the dot product no longer takes a transpose (it is unnecessary and causes a problem in Julia 0.6), and the result of lowrankdowndate! is indexed with [:U] to get the upper triangular matrix. Also, S1 is used for the result of cholfact instead of overwriting S0, for type stability.
Hope this helps.
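The code above targets Julia 0.5/0.6. For Julia 1.x, a rough equivalent might look like the sketch below, assuming the LinearAlgebra standard library, where cholesky takes the place of cholfact and the factor is read as the property .U instead of [:U]. The matrix A and vector v are made-up values chosen so that the downdated matrix stays positive definite.

```julia
using LinearAlgebra

A = [4.0 1.0; 1.0 3.0]       # hypothetical symmetric positive definite matrix
v = [0.5, 0.2]               # hypothetical downdate vector

C  = cholesky(A)             # returns a Cholesky object, not an UpperTriangular
C2 = lowrankdowndate(C, v)   # factorization of A - v*v'; non-mutating variant
U  = C2.U                    # upper triangular factor

matches = U' * U ≈ A - v * v'
```

The mutating lowrankdowndate! also exists, but it overwrites both the factorization and the vector, so the non-mutating form is used here to keep C and v intact.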

syntax confusion: function call versus array indexing

Original title: "Kronecker product in Julia"
Assume:
p = 0.7;
PI = [p 1-p;1-p p];
and:
Q = zeros(20,20);
In Matlab we can run:
A=kron(PI(j,:),Q)
while in Julia:
A=kron[PI[j,:],Q]
this leads to the following error:
MethodError: no method matching getindex(::Base.#kron, ::Array{Float64,1}, ::Array{Float64,2})
How to address this and get a result similar to Matlab?
There are two uses of () in your line in Matlab:
A=kron(PI(j,:),Q)
The outer () surround the arguments being passed to the kron function, and the inner () provide the index into PI. In Julia (and Python, and C, and many languages) we use different symbols for these two distinct purposes.
In Julia, we use square brackets [ and ] for indexing, and ( and ) to surround function arguments.
So:
julia> kron(PI[1, :], Q)
40×20 Array{Float64,2}:
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
[etc.]
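One difference from Matlab is worth noting: PI[1, :] is a 2-element Vector in Julia, which kron treats as a column, hence the 40×20 result above. Matlab's PI(1,:) is a 1×2 row, which would give a 20×40 result. The sketch below, assuming Julia 1.x with the LinearAlgebra standard library, shows one way to get the Matlab shape by slicing with a range so that the row stays a matrix.

```julia
using LinearAlgebra

p = 0.7
PI = [p 1-p; 1-p p]
Q = zeros(20, 20)

col_shape = size(kron(PI[1, :], Q))     # PI[1, :] is a 2-element Vector
row_shape = size(kron(PI[1:1, :], Q))   # PI[1:1, :] is a 1×2 Matrix, like Matlab's PI(1,:)
```

Here col_shape is (40, 20) and row_shape is (20, 40).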
