Julia: broadcasting `findfirst()` across rows of a matrix

I want to find the index of the first value in each row of a matrix that satisfies some condition. I want to figure out how to do this without using array comprehensions.
This is how I would do it with an array comprehension:
# let's say we want to find the first column index, per row, where a number in that row is below some threshold.
threshold = 0.5;
data = randn(50,100);
first_threshold_crossings = [findfirst(data[i,:] .< threshold) for i in 1:size(data,1)];
Yielding a list of indices that tells you where (column-wise) each row has a value that first drops below the threshold, going from left to right.
Any faster way you can imagine doing this?

Here's how you can do it:
julia> using Random # For RNG reproducibility
julia> A = rand(Random.MersenneTwister(0), 3, 3)
3×3 Array{Float64,2}:
0.823648 0.177329 0.0423017
0.910357 0.27888 0.0682693
0.164566 0.203477 0.361828
julia> findfirst.(x -> x < 0.1, eachrow(A))
3-element Array{Union{Nothing, Int64},1}:
3
3
nothing
Note that findfirst returns nothing if no index satisfies the condition.
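For the threshold example in the question, the same idea can be sketched with the partially applied comparison `<(threshold)` as the predicate (this assumes a reasonably recent Julia 1.x release). The small matrix `data` and the names `crossings` and `crossings_int` are made up for illustration. Rows that never cross the threshold yield `nothing`, which `something` can replace with a sentinel such as 0 if a plain integer vector is preferred.

```julia
# Hypothetical 3×3 example data; the last row never drops below the threshold.
data = [0.9 0.4 0.1;
        0.7 0.8 0.2;
        0.6 0.9 0.8]
threshold = 0.5

# <(threshold) builds the predicate x -> x < threshold;
# broadcasting applies findfirst to each row in turn.
crossings = findfirst.(<(threshold), eachrow(data))

# Replace `nothing` with 0 to get a plain vector of integers.
crossings_int = something.(crossings, 0)
```

Whether 0 is a sensible sentinel depends on how the indices are used afterwards; keeping `nothing` is safer if a missing crossing must not be mistaken for a valid index.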

Related

Julia: Return the entire vector if any of the element in vector is greater than 50

I am new to Julia and am trying to do some exercises.
I have a vector
A = [[112.01507313113326, 60.7645449470438, 44.284185340771124, 16.4524736204982]
[101.46307715025503, 45.051658067785084, 29.896435433335395, 9.8679853915780]]
and I have B = [100, 50, 50, 100]
I want to keep an entire row of A if any of its elements is greater than the corresponding value of B (in order).
When I use A[A.>B], I get only the elements that are greater than the B value.
Any help would be appreciated.
Assuming that A is (your code is incomplete):
A=[[112.01507313113326, 60.7645449470438, 44.284185340771124, 16.4524736204982] [101.46307715025503, 45.051658067785084, 29.896435433335395, 9.8679853915780]]
You could do something like:
julia> A[:,[any(col .> B) for col in eachcol(A)]]
4×2 Matrix{Float64}:
112.015 101.463
60.7645 45.0517
44.2842 29.8964
16.4525 9.86799
Since the OP states A is a vector, one can do an array comprehension,
[a for a in A if any(a.>B)]
or a direct indexing using broadcasting,
A[any.(A.>(B,))]
which both give the same vector:
2-element Vector{Vector{Float64}}:
[112.01507313113326, 60.7645449470438, 44.284185340771124, 16.4524736204982]
[101.46307715025503, 45.051658067785084, 29.896435433335395, 9.867985391578]
Surprisingly, the direct indexing with broadcasting is much faster at this short length.
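One caveat, as far as I can tell: `a > B` between two whole vectors compares them lexicographically, so `A .> (B,)` is not an element-by-element test. Below is a minimal sketch that keeps a vector when any of its elements exceeds the matching entry of `B`. The data and the names `keep` and `lex` are made up for illustration.

```julia
# Hypothetical data: the second vector is lexicographically smaller than B,
# yet its third element exceeds the third entry of B.
A = [[112.0, 60.0, 44.0, 16.0],
     [90.0, 45.0, 60.0, 9.0],
     [50.0, 40.0, 30.0, 20.0]]
B = [100, 50, 50, 100]

# Elementwise test inside each vector: a .> B is a BitVector, any() reduces it.
keep = filter(a -> any(a .> B), A)

# Whole-vector comparison, for contrast (lexicographic ordering).
lex = A[any.(A .> (B,))]
```

With this data `keep` holds the first two vectors, while `lex` holds only the first, so the two approaches agree on the question's data but not in general.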

How to find indices of all empty rows in a sparse matrix

I have a large sparse matrix M. I would like to find the indices of all the empty rows in the matrix. How can you do that?
Julia uses the compressed sparse column (CSC) format for sparse matrix storage, which means that the row indices of all stored values are available. You can thus find all rows which have no stored value by taking the set difference between 1:NROWS and the set of row indices:
julia> using SparseArrays
julia> A = rand(10, 10); A[3,:] .= 0; A[5,:] .= 0; S = sparse(A);
julia> idx = setdiff(Set(1:size(A, 1)), Set(S.rowval))
Set{Int64} with 2 elements:
3
5
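A variant of the same idea that returns a sorted vector instead of a Set, using the exported accessor `rowvals` rather than the `rowval` field. The small matrix and the names `has_entry` and `empty_rows` are assumptions for illustration. Like the approach above, this looks at stored entries, so a row holding only explicitly stored zeros would not be reported as empty.

```julia
using SparseArrays

# Hypothetical 5×4 matrix with entries only in rows 1, 2 and 4.
S = sparse([1, 2, 4, 4], [1, 3, 2, 4], [1.0, 2.0, 3.0, 4.0], 5, 4)

# Mark every row that owns at least one stored entry.
has_entry = falses(size(S, 1))
has_entry[rowvals(S)] .= true

# Rows never marked are empty; findall returns them in increasing order.
empty_rows = findall(.!has_entry)
```

The Boolean mask avoids building two Sets and makes a single pass over the stored row indices, which may matter for a large matrix.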

Assign values to Julia Matrix based on Condition

I have a matrix of zeros A which has dimension (m x n). I have another matrix of some integer values b. b has length n. I want to have A be set to the identity wherever b has values greater than 5. So basically, for every row of A where b has value greater than 5, set it to the identity.
I tried to do this, but it's not working. Does anyone have an idea of how to do this in Julia?
using LinearAlgebra
usable_values = filter((x) -> x > 5, b)
# A[:, usable_values] = I
A[:, b .> 5] = I
I'm not certain I understand what you mean by "set to the identity": the identity matrix must be square, and hence a row or column of a matrix can't be equal to the identity matrix. I'll operate under the assumption that you want the entries to have value 1. In that case,
A[:, findall(b .> 5)] .= 1
is a simple one-liner. Let's discuss the elements here:
As proposed above, filter will select out the elements of b bigger than 5. But you want the indices of those elements, for which findall is the appropriate function.
Note the use of broadcasted assignment, .=. This means to assign the RHS to each element of the left side. That way you don't need to create a matrix on the RHS.
The loop approach is fine too, but for reasons of performance I'd put that in a function. See the performance tips.
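A small check of the one-liner with fixed inputs (the sizes and the values of `b` are made up). A Boolean mask such as `b .> 5` can also be used directly as a column index, without going through `findall`.

```julia
# Hypothetical sizes and values: A is m×n and b has one entry per column.
m, n = 3, 4
b = [7, 2, 9, 5]

A = zeros(m, n)
A[:, findall(b .> 5)] .= 1   # columns 1 and 3 become all ones

# Same result using the Boolean mask itself as the column index.
A2 = zeros(m, n)
A2[:, b .> 5] .= 1
```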
If what you need is, for every row of A where b has a value greater than 5, to set that row to ones, this might be helpful while you wait for one of the gurus here to write the same thing in one line of code :)
n = 2
m = 5
A = zeros(m, n)
b = rand(1:10, m)
println(b)
for (cnt, value) in enumerate(b)
    if value > 5
        A[cnt, :] = ones(1, n)
    end
end
A
The result I get is:
b = [4, 2, 6, 8, 1]
5×2 Array{Float64,2}:
0.0 0.0
0.0 0.0
1.0 1.0
1.0 1.0
0.0 0.0
I am fairly new to the language, this is the best I can do to help, for now.
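For what it's worth, the loop above can probably be collapsed into a single broadcast assignment by using the mask as a row index. A sketch using the `b` from the printed result:

```julia
b = [4, 2, 6, 8, 1]   # values taken from the output shown above
A = zeros(length(b), 2)

# Every row whose entry in b exceeds 5 is filled with ones.
A[b .> 5, :] .= 1
```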

Julia version of R's Match?

From R's help pages of match():
Description:
‘match’ returns a vector of the positions of (first) matches of its
first argument in its second.
That is, I can give two vectors, match(v1,v2) returns a vector where the i-th element is the index where v1[i] appears in v2.
Is there such a similar function for Julia? I cannot find it.
It sounds like you're looking for indexin (just as search fodder, this is also called ismember by Matlab). It is very slightly different: it returns a vector where the i'th element is the last index where v1[i] appears in v2.
julia> v1 = [8,6,7,11]; v2 = -10:10;
idxs = indexin(v1, v2)
4-element Array{Int64,1}:
19
17
18
0
It returns zero for the index of an element in v1 that does not appear in v2. So you can "reconstruct" the parts of v1 that are in v2 simply by indexing by the nonzero indices:
julia> v2[idxs[idxs .> 0]]
3-element Array{Int64,1}:
8
6
7
If you look at the implementation, you'll see that it uses a dictionary to store and look up the indices. This means that it only makes one pass over v1 and v2 each, as opposed to searching through v2 for every element in v1. It should be much more efficient in almost all cases.
If it's important to match R's behavior and return the first index, we can crib off the base implementation and just build the dictionary backwards so the lower indices overwrite the higher ones:
function firstindexin(a::AbstractArray, b::AbstractArray)
    bdict = Dict{eltype(b), Int}()
    for i=length(b):-1:1
        bdict[b[i]] = i
    end
    [get(bdict, i, 0) for i in a]
end
julia> firstindexin([1,2,3,4], [1,1,2,2,3,3])
4-element Array{Int64,1}:
1
3
5
0
julia> indexin([1,2,3,4], [1,1,2,2,3,3])
4-element Array{Int64,1}:
2
4
6
0
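On current Julia (1.0 and later), to my knowledge `indexin` already returns the first matching index and uses `nothing` rather than 0 for values that are absent, so the outputs shown above reflect an older release. A short sketch of the newer behaviour (`found` and `values_found` are illustrative names):

```julia
v1 = [1, 2, 3, 4]
v2 = [1, 1, 2, 2, 3, 3]

# First index of each element of v1 in v2; `nothing` where there is no match.
idxs = indexin(v1, v2)

# Keep only the matched positions as plain Ints, then recover the matched values.
found = Int[i for i in idxs if i !== nothing]
values_found = v2[found]
```

If that holds for your Julia version, a helper like `firstindexin` is only needed when a 0 sentinel is preferred over `nothing`.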
I don't think this exists out of the box, but as @Khashaa's comment (and Tim Holy's answer to the other question) points out, you should be able to come up with your own definition fairly quickly. A first attempt:
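function matched(v1::Array, v2::Array)
    # One integer slot per element of v1; 0 marks "not found in v2".
    result = zeros(Int, length(v1))
    for i = 1:length(v1)
        # findfirst returns `nothing` when there is no match; map that to 0.
        result[i] = something(findfirst(==(v1[i]), v2), 0)
    end
    return result
end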
(note that I called the function matched because match is defined in Base for string matching, if you wanted to extend it you'd have to import Base.match first). You could certainly make this faster applying some of the tricks from the Julia docs' performance section if you care about performance.
This function should be doing what you're looking for if I understand correctly, try it with e.g.
v1 = [rand(1:10) for i = 1:100]
v2 = [rand(1:10) for i = 1:100]
matched(v1, v2)

R : adding the values in a [row,column] only if value is true in (same) row, (different) column

This is what I'm trying to code for in R...
Let's say I have 50 rows and 4 columns. If the value in (row 1, column 2) was greater than 5, then count the value in (row 1, column 4).
For an example: (row,column)
If (1,2) = (6) then count the value in (1,4)
If (2,2) = (8) then count the value in (2,4)
If (3,2) = (4) then DO NOT count the value in (3,4)
And so on....Then add the all the values from column 4.
How would I code this in R? I've tried creating a function, looping, if statements, etc.
This shouldn't be too hard! Just subset your data.frame or matrix (however your data is stored) to include only the values of the fourth column, only taking the rows for which the second column is greater than five.
sum(yourDataFrame[yourDataFrame[ ,2] > 5, 4])
Because many R functions are vectorized, it is often far easier (and faster!) to use vectorized functions like sum() than to loop. yourDataFrame[ ,2] > 5 returns a logical vector, which is used here as a row index so that only the rows where column 2 exceeds 5 are kept; sum() then adds the column-4 values of those rows. If you instead want to count how many rows qualify, apply sum() to the logical vector itself, since TRUE counts as 1 and FALSE as 0: sum(yourDataFrame[ ,2] > 5).
