Julia version of R's match?

From R's help pages of match():
Description:
‘match’ returns a vector of the positions of (first) matches of its
first argument in its second.
That is, given two vectors, match(v1,v2) returns a vector whose i-th element is the index where v1[i] first appears in v2.
Is there a similar function in Julia? I cannot find one.

It sounds like you're looking for indexin (just as search fodder, this is also called ismember by Matlab). It is very slightly different: it returns a vector where the i'th element is the last index where v1[i] appears in v2.
julia> v1 = [8,6,7,11]; v2 = -10:10;
julia> idxs = indexin(v1, v2)
4-element Array{Int64,1}:
19
17
18
0
It returns zero for the index of an element in v1 that does not appear in v2. So you can "reconstruct" the parts of v1 that are in v2 simply by indexing by the nonzero indices:
julia> v2[idxs[idxs .> 0]]
3-element Array{Int64,1}:
8
6
7
If you look at the implementation, you'll see that it uses a dictionary to store and look up the indices. This means that it only makes one pass over v1 and v2 each, as opposed to searching through v2 for every element in v1. It should be much more efficient in almost all cases.
If it's important to match R's behavior and return the first index, we can crib off the base implementation and just build the dictionary backwards so the lower indices overwrite the higher ones:
function firstindexin(a::AbstractArray, b::AbstractArray)
    bdict = Dict{eltype(b), Int}()
    for i=length(b):-1:1
        bdict[b[i]] = i
    end
    [get(bdict, i, 0) for i in a]
end
julia> firstindexin([1,2,3,4], [1,1,2,2,3,3])
4-element Array{Int64,1}:
1
3
5
0
julia> indexin([1,2,3,4], [1,1,2,2,3,3])
4-element Array{Int64,1}:
2
4
6
0
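For readers on Julia 1.x, here is a minimal sketch of the same idea written front to back instead of backwards. The name match_first is made up for illustration, and it returns 0 for "no match" to mirror the output above. As far as I know, indexin on Julia 1.x already returns the first index and uses nothing (rather than 0) for elements that are not found, so it is worth checking its behavior on your version before writing your own.

```julia
# Hypothetical helper: position of the first match of each element of `a` in `b`,
# with 0 meaning "not found" (mirrors the convention used above).
function match_first(a, b)
    first_pos = Dict{eltype(b), Int}()
    for (i, x) in enumerate(b)
        get!(first_pos, x, i)   # stores i only if x has not been seen yet
    end
    return [get(first_pos, x, 0) for x in a]
end

match_first([1, 2, 3, 4], [1, 1, 2, 2, 3, 3])
```

Because get! never overwrites an existing key, the dictionary ends up holding the lowest index for each value, which is the same result the reverse loop produces.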

I don't think this exists out of the box, but as @Khashaa's comment (and Tim Holy's answer to the other question) points out, you should be able to come up with your own definition fairly quickly. A first attempt:
function matched(v1::Array, v2::Array)
    result = zeros(Int, length(v1))  # indices are integers, so use an Int array
    for i = 1:length(v1)
        result[i] = findfirst(v2, v1[i])  # 0 if v1[i] is not in v2
    end
    return result
end
(Note that I called the function matched because match is already defined in Base for string matching; if you wanted to extend it, you'd have to import Base.match first.) You could certainly make this faster by applying some of the tricks from the performance section of the Julia docs, if you care about performance.
If I understand correctly, this function should do what you're looking for. Try it with, e.g.:
v1 = [rand(1:10) for i = 1:100]
v2 = [rand(1:10) for i = 1:100]
matched(v1,v2)
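On Julia 1.x the two-argument findfirst(v2, x) form no longer exists, as far as I know; findfirst takes a predicate instead and returns nothing when there is no match. A rough sketch under that assumption (matched_v1 is a hypothetical name, and 0 is kept as the "not found" marker):

```julia
# Hypothetical Julia 1.x variant of the loop-based approach.
function matched_v1(v1::AbstractVector, v2::AbstractVector)
    result = zeros(Int, length(v1))
    for (k, x) in enumerate(v1)
        pos = findfirst(isequal(x), v2)        # an index, or `nothing`
        result[k] = pos === nothing ? 0 : pos  # map `nothing` to 0
    end
    return result
end

matched_v1([8, 6, 7, 11], collect(-10:10))
```

This still scans v2 once per element of v1, so for long vectors a dictionary-based lookup is likely the better choice.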

Related

Julia: Return the entire vector if any of the element in vector is greater than 50

I am new to Julia and am trying to do some exercises.
I have a vector
A = [[112.01507313113326, 60.7645449470438, 44.284185340771124, 16.4524736204982]
[101.46307715025503, 45.051658067785084, 29.896435433335395, 9.8679853915780]]
and I have B=[100,50, 50,100]
I want to keep an entire row of A if any of its elements is greater than the corresponding value of B (in order).
When I use A[A.>B] I only get the elements that are greater than the B value.
Any help would be appreciated.
Assuming that A is (your code is incomplete):
A=[[112.01507313113326, 60.7645449470438, 44.284185340771124, 16.4524736204982] [101.46307715025503, 45.051658067785084, 29.896435433335395, 9.8679853915780]]
You could do something like:
julia> A[:,[any(col .> B) for col in eachcol(A)]]
4×2 Matrix{Float64}:
112.015 101.463
60.7645 45.0517
44.2842 29.8964
16.4525 9.86799
Since the OP states A is a vector, one can do an array comprehension,
[a for a in A if any(a.>B)]
or a direct indexing using broadcasting,
A[any.(A.>(B,))]
both of which give the same vector:
2-element Vector{Vector{Float64}}:
[112.01507313113326, 60.7645449470438, 44.284185340771124, 16.4524736204982]
[101.46307715025503, 45.051658067785084, 29.896435433335395, 9.867985391578]
Surprisingly, the direct indexing with broadcasting is much faster at this short length.
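One thing to watch with the broadcast form: A .> (B,) compares each inner vector to B as a whole, and as far as I can tell Julia resolves > between two vectors lexicographically rather than element by element. The results agree for the data in the question, but they can differ in general. A small sketch that makes the element-wise test explicit (exceeds is a hypothetical helper name):

```julia
A = [[112.01507313113326, 60.7645449470438, 44.284185340771124, 16.4524736204982],
     [101.46307715025503, 45.051658067785084, 29.896435433335395, 9.8679853915780]]
B = [100, 50, 50, 100]

# Hypothetical helper: true if any element of `a` exceeds the matching element of `b`.
exceeds(a, b) = any(a .> b)

kept = filter(a -> exceeds(a, B), A)

# A row where the two notions disagree: its first element is below B[1],
# but its second element is above B[2].
tricky = [99.0, 60.0, 0.0, 0.0]
elementwise   = exceeds(tricky, B)   # element-wise test
lexicographic = tricky > B           # whole-vector comparison
```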

Find numeric placement of letters

Looking to find the numeric placement of letters in a random letter vector using a function equivalent to foo.
myletters = ["a","c","b","d","z"]
foo(myletters)
# [1,3,2,4,26]
Edit: If you're looking for the numeric distance from 'a', here's one solution:
julia> Int.(first.(["a","c","b","d","z"])) .- Int('a') .+ 1
5-element Array{Int64,1}:
1
3
2
4
26
It will gracefully handle unicode (those simply are later code points and thus will have larger values) and longer strings (by only looking at the first character). Capitals, numbers, and some symbols will appear as negative numbers since their code points come before a.
Previous answer: I think you're looking for sortperm. It gives you a vector of indices that, if you index back into the original array with it, will put it in sorted order.
julia> sortperm(["a","c","b","d"])
4-element Array{Int64,1}:
1
3
2
4
I came up with the somewhat convoluted solution:
[reshape((1:26)[myletters[i] .== string.('a':'z')],1)[1] for i=1:length(myletters)]
Or using map
map(x -> reshape((1:26)[x .== string.('a':'z')],1)[1], myletters)
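If you would rather look the letter up in 'a':'z' than do code point arithmetic, a shorter sketch is possible. The helper name letter_index is made up for illustration; it assumes non-empty strings, only looks at the first character, and returns nothing for anything outside 'a':'z'.

```julia
# Hypothetical helper: 1-based position of the first character of `s` in 'a':'z',
# or `nothing` if that character is not a lowercase ASCII letter.
letter_index(s::AbstractString) = findfirst(isequal(first(s)), 'a':'z')

myletters = ["a", "c", "b", "d", "z"]
positions = letter_index.(myletters)
```

Returning nothing for capitals and symbols makes bad input easier to spot than a negative number would.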

Julia - Reshaping an array according to a vector

I have an array of arrays, a
49455-element Array{Array{AbstractString,1},1}
the length varies, this is just one of many possibilities
I need to do a b = vcat(a...) giving me
195158-element Array{AbstractString,1}:
and convert it to a SharedArray to have all cores work on the strings in it (I'll convert to a Char matrix behind the curtains, but this is not important)
In a, every element is an array of some number of strings, whose lengths I get with
map(x -> length(x), a)
49455-element Array{Int64,1}:
1
4
8
.
.
2
Is there a way I can easily restore the array b to the same dimensions as a?
With the Iterators.jl package:
# `a` holds original. `b` holds flattened version. `newa` should == `a`
using Iterators # install using Pkg.add("Iterators")
lmap = map(length,a) # same length vector defined in OP
newa = [b[ib+1:ie] for (ib,ie) in partition([0;cumsum(lmap)],2,1)]
This is somewhat neat, and can also be used to produce a generator for the original vectors, but a for loop implementation should be just as fast and clear.
As a complement to Dan Getz's answer, we can also use zip instead of Iterators.jl's partition:
tails = cumsum(map(length,a))
heads = [1;tails .+ 1][1:end-1]
newa = [b[i:j] for (i,j) in zip(heads,tails)]
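For comparison, here is what the plain for loop mentioned earlier might look like. This is only a sketch: unflatten is a hypothetical name, and it assumes b is an ordinary 1-based vector and that the lengths sum to length(b).

```julia
# Hypothetical helper: split the flat vector `b` back into chunks of the given lengths.
function unflatten(b::AbstractVector, lens::AbstractVector{<:Integer})
    out = Vector{Vector{eltype(b)}}()
    start = 1
    for n in lens
        push!(out, b[start:start+n-1])   # an empty range gives an empty chunk
        start += n
    end
    return out
end

a = [["x"], ["p", "q", "r", "s"], ["u", "v"]]
b = vcat(a...)
newa = unflatten(b, map(length, a))
```

It needs no extra package and keeps a running start index instead of building the cumulative sums up front.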

What is the best way to form inner products?

I was delighted to learn that Julia allows a beautifully succinct way to form inner products:
julia> x = [1;0]; y = [0;1];
julia> x'y
1-element Array{Int64,1}:
0
This alternative to dot(x,y) is nice, but it can lead to surprises:
julia> @printf "Inner product = %f\n" x'y
Inner product = ERROR: type: non-boolean (Array{Bool,1}) used in boolean context
julia> @printf "Inner product = %f\n" dot(x,y)
Inner product = 0.000000
So while I'd like to write x'y, it seems best to avoid it, since otherwise I need to be conscious of pitfalls related to scalars versus 1-by-1 matrices.
But I'm new to Julia, and probably I'm not thinking in the right way. Do others use this succinct alternative to dot, and if so, when is it safe to do so?
There is a conceptual problem here. When you do
julia> x = [1;0]; y = [0;1];
julia> x'y
0
That is actually turned into a matrix * vector product with dimensions 1x2 and 2 respectively, resulting in a 1-element array rather than a scalar. Other languages, such as MATLAB, don't distinguish between a 1x1 matrix and a scalar quantity, but Julia does for a variety of reasons. It is thus never safe to use it as an alternative to the "true" inner product function dot, which is defined to return a scalar output.
Now, if you aren't a fan of the dots, you can consider sum(x.*y) or sum(x'y). Also keep in mind that column and row vectors are different: in fact, there is no such thing as a row vector in Julia, more that there is a 1xN matrix. So you get things like
julia> x = [ 1 2 3 ]
1x3 Array{Int64,2}:
1 2 3
julia> y = [ 3 2 1]
1x3 Array{Int64,2}:
3 2 1
julia> dot(x,y)
ERROR: `dot` has no method matching dot(::Array{Int64,2}, ::Array{Int64,2})
You might have used a 2d row vector where a 1d column vector was required.
Note the difference between 1d column vector [1,2,3] and 2d row vector [1 2 3].
You can convert to a column vector with the vec() function.
The error message suggestion is dot(vec(x),vec(y)), but sum(x.*y) also works in this case and is shorter.
julia> sum(x.*y)
10
julia> dot(vec(x),vec(y))
10
Now, you can write x⋅y instead of dot(x,y).
To write the ⋅ symbol, type \cdot followed by the TAB key.
If the first argument is complex, it is conjugated.
Now, dot() and ⋅ also work for matrices.
Since version 1.0, you need
using LinearAlgebra
before you use the dot product function or operator.
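Putting those points together, a short sketch for Julia 1.x. One assumption worth flagging: on 1.x, as far as I know, x'y for two vectors gives a plain scalar rather than a 1-element array, so the pitfall from the question mostly applies to older versions.

```julia
using LinearAlgebra

x = [1; 0]; y = [0; 1]

a = dot(x, y)               # function form
b = x ⋅ y                   # operator form, typed as \cdot followed by TAB
c = x'y                     # assumption: a scalar on Julia 1.x
d = dot([1 2 3], [3 2 1])   # two 1x3 matrices
z = dot([im], [im])         # the first argument is conjugated
```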

Multidimensional Array Comprehension in Julia

I'm mucking about with Julia and can't seem to get multidimensional array comprehensions to work. I'm using a nightly build of 0.20-pre for OSX; this could conceivably be a bug in the build. I suspect, however, it's a bug in the user.
Let's say I want to wind up with something like:
5x2 Array
1 6
2 7
3 8
4 9
5 10
And I don't want to just call reshape. From what I can tell, a multidimensional array should be generated something like: [(x, y) for x in 1:5, y in 6:10]. But this generates a 5x5 Array of tuples:
julia> [(x, y) for x in 1:5, y in 6:10]
5x5 Array{(Int64,Int64),2}:
(1,6) (1,7) (1,8) (1,9) (1,10)
(2,6) (2,7) (2,8) (2,9) (2,10)
(3,6) (3,7) (3,8) (3,9) (3,10)
(4,6) (4,7) (4,8) (4,9) (4,10)
(5,6) (5,7) (5,8) (5,9) (5,10)
Or, maybe I want to generate a set of values and a boolean code for each:
5x2 Array
1 false
2 false
3 false
4 false
5 false
Again, I can only seem to create an array of tuples with {(x, y) for x in 1:5, y=false}. If I remove the parens around x, y I get ERROR: syntax: missing separator in array expression. If I wrap x, y in something, I always get output of that kind -- Array, Array{Any}, or Tuple.
My guess: there's something I just don't get here. Anybody willing to help me understand what?
I don't think a comprehension is appropriate for what you're trying to do. The reason can be found in the Array Comprehension section of the Julia Manual:
A = [ F(x,y,...) for x=rx, y=ry, ... ]
The meaning of this form is that F(x,y,...) is evaluated with the variables x, y, etc. taking on each value in their given list of values. Values can be specified as any iterable object, but will commonly be ranges like 1:n or 2:(n-1), or explicit arrays of values like [1.2, 3.4, 5.7]. The result is an N-d dense array with dimensions that are the concatenation of the dimensions of the variable ranges rx, ry, etc. and each F(x,y,...) evaluation returns a scalar.
A caveat here is that if you set one of the variables to a >1 dimensional Array, it seems to get flattened first; so the statement that "the result is... an array with dimensions that are the concatenation of the dimensions of the variable ranges rx, ry, etc" is not really accurate, since if rx is 2x2 and ry is 3, then you will not get a 2x2x3 result but rather a 4x3. But the result you're getting should make sense in light of the above: you are returning a tuple, so that's what goes in the Array cell. There is no automatic expansion of the returned tuple into the row of an Array.
If you want to get a 5x2 Array from a comprehension, you'll need to make sure x has a length of 5 and y has a length of 2. Then each cell would contain the result of the function evaluated with each possible pairing of elements from x and y as arguments. The thing is that the values in the cells of your example Arrays don't really require evaluating a function of two arguments. Rather what you're trying to do is just to stick two predetermined columns together into a 2D array. For that, use hcat or a literal:
hcat(1:5, 6:10)
[ 1:5 6:10 ]
hcat(1:5, falses(5))
[ 1:5 falses(5) ]
If you wanted to create a 2D Array where column 2 contained the result of a function evaluated on column 1, you could do this with a comprehension like so:
f(x) = x + 5
[ y ? f(x) : x for x=1:5, y=(false,true) ]
But this is a little confusing and it seems more intuitive to me to just do
x = 1:5
hcat( x, map(f,x) )
I think you are just reading the list comprehension wrong
julia> [x+5y for x in 1:5, y in 0:1]
5x2 Array{Int64,2}:
1 6
2 7
3 8
4 9
5 10
When you use them in multiple dimensions you get two variables and need a function for the cell values based on the coordinates
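To make the "function of the coordinates" idea concrete, here is a small sketch. The name cell is made up for illustration; it is simply the expression from the comprehension above pulled out into a function.

```julia
# Hypothetical name for the per-cell function: x selects the row value,
# y selects the column offset (0 for the first column, 1 for the second).
cell(x, y) = x + 5y

M = [cell(x, y) for x in 1:5, y in 0:1]
dims = size(M)              # one dimension per range
same = M == hcat(1:5, 6:10) # compare against the hcat construction
```

Each range contributes one dimension to the result, so a range of length 5 and a range of length 2 give a 5x2 array.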
For your second question I think that you should reconsider your requirements. Julia uses typed arrays for performance and storing different types in different columns is possible. To get an untyped array you can use {} instead of [], but I think the better solution is to have an array of tuples (Int, Bool) or even better just use two arrays (one for the ints and one for the bool).
julia> [(i,false) for i in 1:5]
5-element Array{(Int64,Bool),1}:
(1,false)
(2,false)
(3,false)
(4,false)
(5,false)
I kind of like the answer @fawr gave for the efficiency of the datatypes while retaining mutability, but this quickly gets you what you asked for (working off of Shawn's answer):
hcat(1:5,6:10)
hcat({i for i=1:5},falses(5))
The cell-array comprehension in the second part forces the datatype to be Any instead of IntXX
This also works:
hcat(1:5,{i for i in falses(5)})
I haven't found another way to explicitly convert an array to type Any besides the comprehension.
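On newer Julia versions the {} syntax is no longer available, as far as I know, so here is a sketch of two other ways to get an Any-typed result. Both are assumptions about current Julia rather than something tested on the version used above: converting with convert(Vector{Any}, ...) and using a typed literal Any[...].

```julia
ints  = collect(1:5)
flags = falses(5)

# Option 1: convert one column to Vector{Any} before concatenating.
as_any = convert(Vector{Any}, ints)
table1 = hcat(as_any, flags)

# Option 2: a typed concatenation literal.
table2 = Any[ints flags]
```

With an Any element type the second column keeps its Bool values instead of being promoted to integers, at the cost of the performance benefits of a concretely typed array.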
Your intuition was to write [(x, y) for x in 1:5, y in 6:10], but what you need is to wrap the ranges in zip, like this:
[i for i in zip(1:5, 6:10)]
Which gives you something very close to what you need, namely:
5-element Array{(Int64,Int64),1}:
(1,6)
(2,7)
(3,8)
(4,9)
(5,10)
To get exactly what you're looking for, you'll need:
hcat([[i...] for i in zip(1:5, 6:10)]...)'
This gives you:
5x2 Array{Int64,2}:
1 6
2 7
3 8
4 9
5 10
This is another (albeit convoluted) way:
x1 = 1
x2 = 5
y1 = 6
y2 = 10
x = [x for x in x1:x2, y in y1:y2]
y = [y for x in x1:x2, y in y1:y2]
xy = cat(2,x[:],y[:])
As @ivarne noted
[{x,false} for x in 1:5]
would work and give you something mutable
I found a way to produce numerical multidimensional arrays via vcat and the splat operator:
R = [ [x y] for x in 1:3, y in 4:6 ] # make the list of rows
A = vcat(R...) # make n-dim. array from the row list
Then R will be a 3x3 Array{Array{Int64,2},2} while A is a 9x2 Array{Int64,2}, as you want.
For the second case (a set of values and a Boolean code for each), one can do something like
R = [[x y > 5] for x in 1:3, y in 4:6] # condition is y > 5
A = vcat(R...)
where A will be a 9x2 Array{Int64,2}, where true/false is denoted by 1/0.
I have tested those in Julia 0.4.7.

Resources