How to search and manipulate sparse matrices in Julia - julia

I have a large sparse matrix. I would like to do be able to do two things:
Find a row that has only one non-zero value. Let's call its row index idx.
Zero out column idx, found in 1. I would like to do this efficiently as the matrix is large.
I have tried reading https://docs.julialang.org/en/v1/stdlib/SparseArrays/ but I can't see how to do either.

If I understand you correctly this should work:
julia> using SparseArrays
# Dummy data
julia> A = sparse([1, 1, 2, 2, 3, 3], [1, 2, 3, 1, 2, 3], [2, 3, 0, 0, 0, 5])
3×3 SparseMatrixCSC{Int64,Int64} with 6 stored entries:
[1, 1] = 2
[2, 1] = 0
[1, 2] = 3
[3, 2] = 0
[2, 3] = 0
[3, 3] = 5
# Count non-zero elements across rows
julia> using StatsBase
julia> valcounts = countmap(A.rowval[A.nzval .!= 0])
Dict{Int64,Int64} with 2 entries:
3 => 1
1 => 2
# Find the row(s) with only one non-zero element
julia> [k for k ∈ keys(valcounts) if valcounts[k] == 1]
1-element Array{Int64,1}:
3
# Set the non-zero element in the third row to zero
julia> A[3, A[3, :] .> 0] .= 0
1-element view(::SparseMatrixCSC{Int64,Int64}, 3, [3]) with eltype Int64:
0
julia> A
3×3 SparseMatrixCSC{Int64,Int64} with 6 stored entries:
[1, 1] = 2
[2, 1] = 0
[1, 2] = 3
[3, 2] = 0
[2, 3] = 0
[3, 3] = 0

Related

Can we have Matrix of vectors in julia?

Can we form a julia matrix whose elements are vectors? I have tried the following but it only formed a matrix of Int.
julia> [[1, 2] [2, 3]; [4, 5] [3, 5]]
4×2 Matrix{Int64}:
1 2
2 3
4 3
5 5
But I wanted [1, 2] to be the first element not 1. Is it possible?
Here is one way to do it:
julia> [[[1, 2]] [[2, 3]]; [[4, 5]] [[3, 5]]]
2×2 Matrix{Vector{Int64}}:
[1, 2] [2, 3]
[4, 5] [3, 5]
another would be:
julia> reshape([[1, 2], [4, 5], [2, 3], [3, 5]], 2, 2)
2×2 Matrix{Vector{Int64}}:
[1, 2] [2, 3]
[4, 5] [3, 5]
but maybe there is some better solution.

Julia - Repeat entries of a vector by another vector (inner)

I have an array x and I would like to repeat each entry of x a number of times specified by the corresponding entries of another array y, of the same length of x.
x = [1, 2, 3, 4, 5] # Array to be repeated
y = [3, 2, 1, 2, 3] # Repetitions for each element of x
# result should be [1, 1, 1, 2, 2, 3, 4, 4, 5, 5, 5]
Is there a way to do this in Julia?
Your x and y vectors constitute what is called a run-length encoding of the vector [1, 1, 1, 2, 2, 3, 4, 4, 5, 5, 5]. So if you take the inverse of the run-length encoding, you will get the vector you are looking for. The StatsBase.jl package contains the rle and inverse_rle functions. We can use inverse_rle like this:
julia> using StatsBase
julia> x = [1, 2, 3, 4, 5];
julia> y = [3, 2, 1, 2, 3];
julia> inverse_rle(x, y)
11-element Vector{Int64}:
1
1
1
2
2
3
4
4
5
5
5
You've given what I would have suggested as the answer already in your comment:
vcat(fill.(x, y)...)
How does this work? Start with fill:
help?> fill
fill(x, dims::Tuple)
fill(x, dims...)
Create an array filled with the value x. For example, fill(1.0, (5,5)) returns a 5×5 array of floats, with each element initialized to 1.0.
This is a bit more complicated than it needs to be for our case (where we only have one dimension to fill into), so let's look at a simple example:
julia> fill(1, 3)
3-element Vector{Int64}:
1
1
1
so fill(1, 3) just means "take the number one, and put this number into a one-dimensional array 3 times."
This of course is exactly what we want to do here: for every element in x, we want an array that holds this element multiple times, with the multiple given by the corresponding element in y. We could therefore loop over x and y and do something like:
julia> for (xᵢ, yᵢ) ∈ zip(x, y)
fill(xᵢ, yᵢ)
end
Now this loop doesn't return anything, so we'd have to preallocate some storage and assign to that within the loop. A more concise way of writing this while automatically returning an object would be a comprehension:
julia> [fill(xᵢ, yᵢ) for (xᵢ, yᵢ) ∈ zip(x, y)]
5-element Vector{Vector{Int64}}:
[1, 1, 1]
[2, 2]
[3]
[4, 4]
[5, 5, 5]
and even more concisely, we can just use broadcasting:
julia> fill.(x, y)
5-element Vector{Vector{Int64}}:
[1, 1, 1]
[2, 2]
[3]
[4, 4]
[5, 5, 5]
so from the comprehension or the broadcast we are getting a vector of vectors, each vector being an element of x repeated y times. Now all that remains is to put these together into a single vector by concatenating them vertically:
julia> vcat(fill.(x, y)...)
11-element Vector{Int64}:
1
1
1
2
2
3
4
4
5
5
5
Here we are using splatting to essentially do:
z = fill.(x, y)
vcat(z[1], z[2], z[3], z[4], z[5])
Note that splatting can have suboptimal performance for arrays of variable length, so a better way is to use reduce which is special cased for this and will give the same result:
reduce(vcat, fill.(x, y))
If performance is a priority, you can also do it the long, manual way:
function runlengthdecode(vals::Vector{T}, reps::Vector{<:Integer}) where T
length(vals) == length(reps) || throw(ArgumentError("Same number of values and counts expected"))
result = Vector{T}(undef, sum(reps))
resind = 1
for (valind, numrep) in enumerate(reps)
for i in 1:numrep
#inbounds result[resind] = vals[valind]
resind += 1
end
end
result
end
This runs about 12 times faster than the vcat/fill based method for the given data, likely because of avoiding creating all the intermediate filled vectors.
You can also instead use fill! on the preallocated result's #views, by replacing the loop in above code with:
for (val, numrep) in zip(vals, reps)
fill!(#view(result[resind:resind + numrep - 1]), val)
resind += numrep
end
which has comparable performance.
Also, for completeness, a comprehension can be quite handy for this. And it's faster than fill and vcat.
julia> [x[i] for i=1:length(x) for j=1:y[i]]
11-element Vector{Int64}:
1
1
1
2
2
3
4
4
5
5
5

Julia transpose vector of vectors

I have a vector of vectors, say
julia> m=[[1,2],[3,4],[5,6]]
3-element Vector{Vector{Int64}}:
[1, 2]
[3, 4]
[5, 6]
which I want to transpose, meaning that I want a 2-element vector with the corresponding 3-element vectors (1,3,5 and 2,4,6).
This could obviously be done with loops, but I suspect that this is slow and am sure that Julia has a better solution for it. The best one I could come up with so far looks like that:
julia> matrixM=reshape(collect(Iterators.flatten(m)), (size(m[1],1),size(m,1)))
2×3 Matrix{Int64}:
1 3 5
2 4 6
julia> map(i->matrixM[i,:], 1:size(matrixM,1))
2-element Vector{Vector{Int64}}:
[1, 3, 5]
[2, 4, 6]
You can use:
julia> using SplitApplyCombine
julia> invert([[1,2],[3,4],[5,6]])
2-element Vector{Vector{Int64}}:
[1, 3, 5]
[2, 4, 6]

Apply - creating a matrix by combining two other matrices, using value from a vector to select the one to combine column from

I have a task that I am doing with an ordinary for loop. I think it can be done by one of apply functions but can't find a way to do it. Can you please if it is possible to apply apply to the problem or if there is more efficient way of solving it ?
I am aware that I can make udf and do.call() but I think it would be the same as for loop.
The problem:
I have two matrices a and b, both (m x n) and a vector of length n. I want to create third matrix (m x n) which would recieve columns from a or b based on the values of a vector.
For example:
a=
[0, 0, 0, 0]
[0, 0, 0, 0]
[0, 0, 0, 0]
[0, 0, 0, 0]
b=
[1, 1, 1, 1]
[1, 1, 1, 1]
[1, 1, 1, 1]
[1, 1, 1, 1]
x=
[-1, -1, 1, 1]
if x[k] is -1, c recieves column from a, if x[k] is 1 then c recieves column from b, which yields:
c=
[0, 0, 1, 0]
[0, 0, 1, 0]
[0, 0, 1, 0]
[0, 0, 1, 0]
Reproducible example:
a <- matrix(rep(0, 16), nrow = 4, ncol = 4)
b <- matrix(rep(1, 16), nrow = 4, ncol = 4)
x <- c(-1,-1, 1,-1)
c <- matrix(NA, nrow = 4, ncol = 4)
for (i in 1:length(x)){
if (x[[i]] < 0){
c[,i] <- a[,i]
} else {
c[,i] <- b[,i]
}
}
Is there any more efficient solution ?
Regards,
P.
We can either use ifelse after making the 'x' as the same length as 'a/b' by replicating each of the 'x' elements. The col is a convenient function to do that.
c <- a
c[] <- ifelse(x[col(a)]==-1, a, b)
Or as in the previous step, we create a logical vector (x==1), coerce to binary with +, make the length the same as 'a', specify the ncol in the matrix.
matrix(+(x==1)[col(a)], ncol=ncol(a))
# [,1] [,2] [,3] [,4]
#[1,] 0 0 1 0
#[2,] 0 0 1 0
#[3,] 0 0 1 0
#[4,] 0 0 1 0

In R, apply the cut (segmentation) of one vector to the other

Here is an example of the problem I have:
I have a vector v:
v <- 1:10
I can use Hmisc::cut2 to evenly split it into 5 groups, for which I first need:
library(Hmisc)
cut2(v, g=5)
To check:
table(cut2(v, g=5))
[1, 3) [3, 5) [5, 7) [7, 9) [9,10]
2 2 2 2 2
Now I have another vector:
v2 <- 1:8
I want to apply the exact same cut of v to v2 such that for v2 there are also 5 groups whereas the last group [9, 10] has 0 element. Is there an easy way to do this? Thanks!
Another way is to use v2 in order to index from cut2(v, g = 5)
table(cut2(v, g = 5)[v2])
# [1, 3) [3, 5) [5, 7) [7, 9) [9,10]
# 2 2 2 2 0
Does this work for you?
table(cut2(v2,cuts=c(1,3,5,7,9,10)))
[ 1, 3) [ 3, 5) [ 5, 7) [ 7, 9) [ 9,10]
2 2 2 2 0

Resources