Can we form a Julia matrix whose elements are vectors? I tried the following, but it only produced a matrix of Int:
julia> [[1, 2] [2, 3]; [4, 5] [3, 5]]
4×2 Matrix{Int64}:
1 2
2 3
4 3
5 5
But I wanted [1, 2] to be the first element not 1. Is it possible?
Here is one way to do it:
julia> [[[1, 2]] [[2, 3]]; [[4, 5]] [[3, 5]]]
2×2 Matrix{Vector{Int64}}:
[1, 2] [2, 3]
[4, 5] [3, 5]
another would be:
julia> reshape([[1, 2], [4, 5], [2, 3], [3, 5]], 2, 2)
2×2 Matrix{Vector{Int64}}:
[1, 2] [2, 3]
[4, 5] [3, 5]
but maybe there is some better solution.
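Two further options, sketched here only as possibilities: a two-dimensional comprehension when each element can be computed from its position, or allocating a Matrix{Vector{Int}} and filling it in. The names M and N and the formula [i, i + j] are made up for illustration.

```julia
# Build each element from its (row, column) position with a 2-D comprehension.
M = [[i, i + j] for i in 1:2, j in 1:2]   # 2×2 Matrix{Vector{Int64}}

# Or allocate the container first and assign the vectors one by one.
N = Matrix{Vector{Int}}(undef, 2, 2)
N[1, 1] = [1, 2]; N[1, 2] = [2, 3]
N[2, 1] = [4, 5]; N[2, 2] = [3, 5]
```

In both cases the element type is fixed as a vector up front, so nothing gets concatenated.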
Related
I have an array x and I would like to repeat each entry of x a number of times specified by the corresponding entries of another array y, of the same length of x.
x = [1, 2, 3, 4, 5] # Array to be repeated
y = [3, 2, 1, 2, 3] # Repetitions for each element of x
# result should be [1, 1, 1, 2, 2, 3, 4, 4, 5, 5, 5]
Is there a way to do this in Julia?
Your x and y vectors constitute what is called a run-length encoding of the vector [1, 1, 1, 2, 2, 3, 4, 4, 5, 5, 5]. So if you take the inverse of the run-length encoding, you will get the vector you are looking for. The StatsBase.jl package contains the rle and inverse_rle functions. We can use inverse_rle like this:
julia> using StatsBase
julia> x = [1, 2, 3, 4, 5];
julia> y = [3, 2, 1, 2, 3];
julia> inverse_rle(x, y)
11-element Vector{Int64}:
1
1
1
2
2
3
4
4
5
5
5
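As a quick check that x and y really are the run-length encoding, a round trip through rle may help. This is only a sketch and assumes StatsBase is installed; the names v, vals, lens and decoded are illustrative.

```julia
using StatsBase

v = [1, 1, 1, 2, 2, 3, 4, 4, 5, 5, 5]

# rle returns the distinct run values and the length of each run.
vals, lens = rle(v)

# inverse_rle rebuilds the original vector from those two pieces.
decoded = inverse_rle(vals, lens)
```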
You've given what I would have suggested as the answer already in your comment:
vcat(fill.(x, y)...)
How does this work? Start with fill:
help?> fill
fill(x, dims::Tuple)
fill(x, dims...)
Create an array filled with the value x. For example, fill(1.0, (5,5)) returns a 5×5 array of floats, with each element initialized to 1.0.
This is a bit more complicated than it needs to be for our case (where we only have one dimension to fill into), so let's look at a simple example:
julia> fill(1, 3)
3-element Vector{Int64}:
1
1
1
so fill(1, 3) just means "take the number one, and put this number into a one-dimensional array 3 times."
This of course is exactly what we want to do here: for every element in x, we want an array that holds this element multiple times, with the multiple given by the corresponding element in y. We could therefore loop over x and y and do something like:
julia> for (xᵢ, yᵢ) ∈ zip(x, y)
           fill(xᵢ, yᵢ)
       end
Now this loop doesn't return anything, so we'd have to preallocate some storage and assign to that within the loop. A more concise way of writing this while automatically returning an object would be a comprehension:
julia> [fill(xᵢ, yᵢ) for (xᵢ, yᵢ) ∈ zip(x, y)]
5-element Vector{Vector{Int64}}:
[1, 1, 1]
[2, 2]
[3]
[4, 4]
[5, 5, 5]
and even more concisely, we can just use broadcasting:
julia> fill.(x, y)
5-element Vector{Vector{Int64}}:
[1, 1, 1]
[2, 2]
[3]
[4, 4]
[5, 5, 5]
so from the comprehension or the broadcast we are getting a vector of vectors, each vector being an element of x repeated y times. Now all that remains is to put these together into a single vector by concatenating them vertically:
julia> vcat(fill.(x, y)...)
11-element Vector{Int64}:
1
1
1
2
2
3
4
4
5
5
5
Here we are using splatting to essentially do:
z = fill.(x, y)
vcat(z[1], z[2], z[3], z[4], z[5])
Note that splatting a collection whose length is not known at compile time can perform poorly, so a better way is to use reduce, which is special-cased for vcat and gives the same result:
reduce(vcat, fill.(x, y))
If performance is a priority, you can also do it the long, manual way:
function runlengthdecode(vals::Vector{T}, reps::Vector{<:Integer}) where T
    length(vals) == length(reps) || throw(ArgumentError("Same number of values and counts expected"))
    result = Vector{T}(undef, sum(reps))
    resind = 1
    for (valind, numrep) in enumerate(reps)
        for i in 1:numrep
            @inbounds result[resind] = vals[valind]
            resind += 1
        end
    end
    result
end
This runs about 12 times faster than the vcat/fill-based method for the given data, likely because it avoids creating all the intermediate filled vectors.
You can also use fill! on views (@view) of the preallocated result instead, by replacing the loop in the code above with:
for (val, numrep) in zip(vals, reps)
    fill!(@view(result[resind:resind + numrep - 1]), val)
    resind += numrep
end
which has comparable performance.
Also, for completeness, a comprehension can be quite handy for this. And it's faster than fill and vcat.
julia> [x[i] for i=1:length(x) for j=1:y[i]]
11-element Vector{Int64}:
1
1
1
2
2
3
4
4
5
5
5
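A lazy variant of the same idea, given only as a sketch: Iterators.repeated produces each run without allocating it, and Iterators.flatten chains the runs together before a single collect. No timing claim is made for it. The name z is illustrative.

```julia
x = [1, 2, 3, 4, 5]
y = [3, 2, 1, 2, 3]

# Each Iterators.repeated(xᵢ, yᵢ) is a lazy run of yᵢ copies of xᵢ;
# flatten walks through the runs in order and collect materializes the result.
z = collect(Iterators.flatten(Iterators.repeated.(x, y)))
```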
I have a vector of vectors, say
julia> m=[[1,2],[3,4],[5,6]]
3-element Vector{Vector{Int64}}:
[1, 2]
[3, 4]
[5, 6]
which I want to transpose, meaning that I want a 2-element vector with the corresponding 3-element vectors (1,3,5 and 2,4,6).
This could obviously be done with loops, but I suspect that this is slow and am sure that Julia has a better solution for it. The best one I could come up with so far looks like this:
julia> matrixM=reshape(collect(Iterators.flatten(m)), (size(m[1],1),size(m,1)))
2×3 Matrix{Int64}:
1 3 5
2 4 6
julia> map(i->matrixM[i,:], 1:size(matrixM,1))
2-element Vector{Vector{Int64}}:
[1, 3, 5]
[2, 4, 6]
You can use:
julia> using SplitApplyCombine
julia> invert([[1,2],[3,4],[5,6]])
2-element Vector{Vector{Int64}}:
[1, 3, 5]
[2, 4, 6]
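If you would rather not add a package, something along these lines should also work in plain Julia. It is a sketch that assumes all inner vectors have the same length; the names t1 and t2 are illustrative.

```julia
m = [[1, 2], [3, 4], [5, 6]]

# Broadcast getindex over the outer vector: getindex.(m, i) picks the
# i-th entry of every inner vector.
t1 = [getindex.(m, i) for i in eachindex(first(m))]

# Alternatively, go through a matrix: hcat the inner vectors as columns,
# then copy each row out into its own vector.
t2 = collect.(eachrow(reduce(hcat, m)))
```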
For the matrix M <- matrix(1:24, nrow = 6, ncol = 4), if you want to subset its elements by the index pairs, which are stored as rows in matrix indexes <- cbind(1:3, 2:4), you can do:
M[indexes]
# [1] 7 14 21
However, I haven't found a simple way to combine index pairs and index grids when slicing higher-dimensional arrays. For example, take the array A <- array(1:24, dim = c(2,3,4)), of which the first dimension should be kept and the last two dimensions are to be subsetted by index pairs. A wordy solution might be:
sapply(1:nrow(indexes), \(i) A[, indexes[i,1], indexes[i,2]])
# and structure the matrix back to the array form, if A has even higher dimension.
# [,1] [,2] [,3]
# [1,] 7 15 23
# [2,] 8 16 24
But I still want something as clean as
A[, indexes[,1], indexes[,2], pairwise = TRUE] # does it exist?
Update 1
I also checked what NumPy offers for its arrays in Python, and I found the following example on this site:
X = np.arange(12).reshape((3, 4))
X
# array([[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]])
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]
# array([ 2, 5, 11])
X[row[:, np.newaxis], col] # broadcasting rules
# array([[ 2, 1, 3],
# [ 6, 5, 7],
# [10, 9, 11]])
The Python version of the solution is quite simple and clear. In the same spirit,
I have a large sparse matrix. I would like to be able to do two things:
1. Find a row that has only one non-zero value. Let's call its row index idx.
2. Zero out column idx, found in step 1.
I would like to do this efficiently as the matrix is large.
I have tried reading https://docs.julialang.org/en/v1/stdlib/SparseArrays/ but I can't see how to do either.
If I understand you correctly this should work:
julia> using SparseArrays
# Dummy data
julia> A = sparse([1, 1, 2, 2, 3, 3], [1, 2, 3, 1, 2, 3], [2, 3, 0, 0, 0, 5])
3×3 SparseMatrixCSC{Int64,Int64} with 6 stored entries:
[1, 1] = 2
[2, 1] = 0
[1, 2] = 3
[3, 2] = 0
[2, 3] = 0
[3, 3] = 5
# Count non-zero elements across rows
julia> using StatsBase
julia> valcounts = countmap(A.rowval[A.nzval .!= 0])
Dict{Int64,Int64} with 2 entries:
3 => 1
1 => 2
# Find the row(s) with only one non-zero element
julia> [k for k ∈ keys(valcounts) if valcounts[k] == 1]
1-element Array{Int64,1}:
3
# Set the non-zero element in the third row to zero
julia> A[3, A[3, :] .> 0] .= 0
1-element view(::SparseMatrixCSC{Int64,Int64}, 3, [3]) with eltype Int64:
0
julia> A
3×3 SparseMatrixCSC{Int64,Int64} with 6 stored entries:
[1, 1] = 2
[2, 1] = 0
[1, 2] = 3
[3, 2] = 0
[2, 3] = 0
[3, 3] = 0
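Note that the snippet above zeroes the entry in row 3, whereas the question asks to zero out column idx. A possible sketch of that, walking only the stored entries of the CSC structure, is below. The helpers rows_with_single_nonzero and zero_column! are hypothetical names and not part of SparseArrays; rowvals, nonzeros, nzrange and dropzeros! are.

```julia
using SparseArrays

# Hypothetical helper: indices of rows holding exactly one non-zero value,
# found by scanning the stored entries once.
function rows_with_single_nonzero(A::SparseMatrixCSC)
    counts = zeros(Int, size(A, 1))
    rows, vals = rowvals(A), nonzeros(A)
    for k in eachindex(vals)
        iszero(vals[k]) || (counts[rows[k]] += 1)
    end
    return findall(==(1), counts)
end

# Hypothetical helper: zero out column j by touching only its stored entries,
# then drop all explicitly stored zeros from the matrix.
function zero_column!(A::SparseMatrixCSC, j::Integer)
    vals = nonzeros(A)
    for k in nzrange(A, j)
        vals[k] = zero(eltype(A))
    end
    return dropzeros!(A)
end

A = sparse([1, 1, 2, 2, 3, 3], [1, 2, 3, 1, 2, 3], [2, 3, 0, 0, 0, 5])
idx = first(rows_with_single_nonzero(A))   # row 3 in this dummy data
zero_column!(A, idx)
```

Because a SparseMatrixCSC stores its entries column by column, clearing a column this way only visits that column's stored values.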
Here is an example of the problem I have:
I have a vector v:
v <- 1:10
I can use Hmisc::cut2 to evenly split it into 5 groups, for which I first need:
library(Hmisc)
cut2(v, g=5)
To check:
table(cut2(v, g=5))
[1, 3) [3, 5) [5, 7) [7, 9) [9,10]
2 2 2 2 2
Now I have another vector:
v2 <- 1:8
I want to apply the exact same cut of v to v2, so that v2 also has 5 groups and the last group [9,10] has 0 elements. Is there an easy way to do this? Thanks!
Another way is to use v2 in order to index from cut2(v, g = 5)
table(cut2(v, g = 5)[v2])
# [1, 3) [3, 5) [5, 7) [7, 9) [9,10]
# 2 2 2 2 0
Does this work for you?
table(cut2(v2,cuts=c(1,3,5,7,9,10)))
[ 1, 3) [ 3, 5) [ 5, 7) [ 7, 9) [ 9,10]
2 2 2 2 0