Simplify Julia tuple creation - julia

I need to convert
2-element Vector{Matrix{Tuple{Real, Real}}}:
[(1, 2) (1.8, 2.1) (3, 2)]
[(1, 3) (2.2, 2.9) (3, 3)]
into
(Vector{Real}[[1, 1.8, 3], [1, 2.2, 3]], Vector{Real}[[2, 2.1, 2], [3, 2.9, 3]])
My naive approach is:
a=[[(1,2) (1.8,2.1) (3,2)],[(1,3) (2.2,2.9) (3,3)]]
b=([first.(s)|>vec for s in a],[last.(s)|>vec for s in a])
Is there a simpler way to write expressions like this (i.e. without repeating most of the [somefunc.(s)|>vec for s in a] expression)?
Is this an efficient solution when a contains more than 1e6 elements in both vectors?

One way to get the first element of each tuple (with v being your vector, called a in the question) is
map(i -> reshape(getindex.(i, 1), :), v)
So then
Tuple(map(i -> reshape(getindex.(i, j), :), v) for j in 1:2)
should give you the output you want, though I am not sure it is much nicer.
Performance is about the same as your version. It runs within a second for 1e6 elements on my laptop, so it should be okay unless you have really large vectors.
EDIT: I had written 8 in place of 6
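To avoid repeating the comprehension, the shared part can be moved into a small helper. The following is only a sketch, assuming Julia 1.x; the name component is made up for illustration and does not come from any package.

```julia
# The data from the question: two 1×3 matrices of 2-tuples.
a = [[(1, 2) (1.8, 2.1) (3, 2)], [(1, 3) (2.2, 2.9) (3, 3)]]

# `component` is a hypothetical helper: it takes the j-th entry of every
# tuple in every matrix and flattens each matrix into a vector.
component(a, j) = [vec(getindex.(s, j)) for s in a]

# Build the output tuple with one entry per tuple position.
b = ntuple(j -> component(a, j), 2)
```

Using ntuple keeps the number of tuple positions (2 here) in one place, so the same line would cover longer tuples as well.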

Related

Get only elements of one array that are in another array

I'm learning Julia coming from Python. I want to get the elements of an array b such that each element is in array a. My attempt in Julia is shown after doing what I need in python. My question is this: is there a better/faster way to do this in Julia? I'm suspicious about the simplicity of what I've written in Julia, and I worry that such a naive looking solution might have suboptimal performance (again coming from Python).
Python:
import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([7, 8, 2, 3, 5])
indices_b_in_a = np.nonzero(np.isin(b, a))
b_in_a = b[indices_b_in_a]
# array([2, 3])
Julia:
a = [1, 2, 3, 4];
b = [7, 8, 2, 3, 5];
indices_b_in_a = findall(ele -> ele in a, b);
b_in_a = b[indices_b_in_a];
#2-element Vector{Int64}:
# 2
# 3
Maybe this would be a helpful answer:
julia> intersect(Set(a), Set(b))
Set{Int64} with 2 elements:
2
3
# Or even
julia> intersect(a, b)
2-element Vector{Int64}:
2
3
Note that if b contains repeated values, this method does not exactly replicate your expected behavior, because intersect works on unique values. If you have repeated elements, use an element-by-element membership search instead; in that case, a binary search (on a sorted a) would be a good choice.
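The difference with repeated values can be seen in a small sketch (assuming Julia 1.x; the input values are made up for illustration). Filtering b against a Set built from a keeps duplicates and the order of b, and the Set gives fast lookups:

```julia
a = [1, 2, 3, 4]
b = [7, 2, 2, 3, 5]          # note the repeated 2

# intersect works on unique values, so the second 2 is dropped.
unique_hits = intersect(a, b)

# Element-by-element membership test; in(Set(a)) builds a one-argument
# predicate, and the Set makes each lookup cheap.
all_hits = filter(in(Set(a)), b)
```

Here unique_hits is [2, 3] while all_hits is [2, 2, 3].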
Another approach is using broadcasting in Julia:
julia> a = rand(1:100, 1000);
b = rand(1:3000, 5000);
julia> b[in.(b, Ref(a))]
161-element Vector{Int64}:
8
5
70
73
⋮
# Exactly the same approach with a slightly different syntax
julia> b[b.∈Ref(a)]
161-element Vector{Int64}:
8
5
70
73
30
63
73
⋮
Q: What is the role of Ref in the above code block?
Ans: Wrapping a in Ref makes a reference to a, so broadcasting treats it as a single value instead of iterating over it. Otherwise, broadcasting would try to iterate over the elements of a and b simultaneously, which is not the right solution (even if both objects have the same length).
Julia's syntax has its own idioms, but it's not that complicated. I said this because you mentioned:
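A small sketch of the difference (assuming Julia 1.x; the values are made up for illustration, and both vectors deliberately have the same length so that the version without Ref runs at all):

```julia
a = [1, 2, 3]
b = [3, 2, 7]

# Without Ref: broadcasting pairs the elements up, i.e. it computes
# in(3, 1), in(2, 2), in(7, 3).
pairwise = in.(b, a)

# With Ref: every element of b is tested against the whole of a.
membership = in.(b, Ref(a))

# Wrapping a in a one-element tuple has the same effect.
also_membership = in.(b, (a,))
```

The pairwise result only says whether b[i] equals a[i], which is rarely what a membership test is meant to do.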
I worry that such a naive looking solution...
Last but not least, do not forget to wrap your code in a function if you want to obtain a good performance in Julia.
Another approach using array comprehensions.
julia> [i for i in a for j in b if i == j]
2-element Vector{Int64}:
2
3

How to calculate Euclidean distance between a tuple and each tuple in a Vector using map in Julia?

I want to calculate the Euclidean distance between a tuple and each tuple within a Vector in Julia using the map function, like below (but I get two values instead of three):
julia> tups = [
(1, 3),
(11, 2),
(0, 1)
];
julia> map((x, y) -> √(sum((x.-y).^2)), tups, (3, 3))
2-element Vector{Float64}:
2.0
8.06225774829855
How can I make it work correctly?
Julia has the Distances package especially for these types of calculations. The 'Julian way' encourages interoperability between packages to allow benefitting from future development of the ecosystem. For example, new metric definitions, or specialized hardware code to compute distances.
For the problem in the post, the code would look like this:
julia> using Distances
julia> tups = [
(1, 3),
(11, 2),
(0, 1)
];
julia> euclidean.(tups,Ref((3,3)))
3-element Vector{Float64}:
2.0
8.06225774829855
3.605551275463989
Notice the use of broadcasting (the dot syntax euclidean.) instead of map. The Ref((3,3)) causes broadcasting to treat (3,3) as a single element rather than breaking it into a pair of Ints.
The code you've written is roughly equivalent to this:
[
func((1, 3), 3),
func((11, 2), 3)
]
The map function iterates over the given collections in lockstep and stops at the end of the shortest one:
julia> length((3, 3)), length(tups)
(2, 3)
So it iterates two times, not three. To make that work, you can repeat (3, 3) three times, or omit the (3, 3) argument altogether:
julia> map((x, y) -> √(sum((x.-y).^2)), tups, ((3, 3), (3, 3), (3, 3)))
3-element Vector{Float64}:
2.0
8.06225774829855
3.605551275463989
# OR
julia> map((x, y) -> √(sum((x.-y).^2)), tups, ((3, 3) for _∈1:3))
3-element Vector{Float64}:
2.0
8.06225774829855
3.605551275463989
# Or omit the last argument
julia> map(arg -> √((3 - arg[1])^2 + (3 - arg[2])^2), tups)
3-element Vector{Float64}:
2.0
8.06225774829855
3.605551275463989
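The last variant can be generalized so that the reference point is not hard-coded. This is only a sketch, assuming Julia 1.x; the name dist_to is hypothetical and introduced here for illustration.

```julia
tups = [(1, 3), (11, 2), (0, 1)]

# dist_to(p) returns a one-argument function that measures the Euclidean
# distance from p, so map only needs to iterate over tups.
dist_to(p) = t -> sqrt(sum(abs2, t .- p))

distances = map(dist_to((3, 3)), tups)

# For comparison: map over collections of different lengths stops at the
# shorter one, which is why the original call returned two values.
short = map(+, [1, 2, 3], (10, 20))
```

Because the point is captured in the closure, map sees a single collection and the length mismatch never arises.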

Julia: How to pivot a vector of vectors into vector of tuples, similar to zipping its elements

I have a vector containing an unknown number of vectors, which all have the same length. For example, like this:
julia> a
4-element Vector{Vector{Int64}}:
[1, 5]
[2, 6]
[3, 7]
[4, 8]
And I want to pivot it into a vector of tuples, exactly the way you would with zip(), like this:
julia> collect(zip(a...))
2-element Vector{NTuple{4, Int64}}:
(1, 2, 3, 4)
(5, 6, 7, 8)
However, I want to do this without using the splat (...) in there, since the splat will end up happening at compile-time if I understand correctly. I want something like reduce(zip, a), but that's obviously not right:
julia> collect(reduce(zip, a))
2-element Vector{Tuple{Tuple{Tuple{Int64, Int64}, Int64}, Int64}}:
(((1, 2), 3), 4)
(((5, 6), 7), 8)
So I guess I have two questions:
Am I correct that this use of splat is not performant?
What is the right way to do this operation performantly?
Thank you!
For reference, here is a performance characterization of the current approach. It actually appears surprisingly performant (though maybe it would be worse in-situ?):
julia> a = Any[collect(1:1_000_000), [2.0 for _ in 1:1_000_000]]
2-element Vector{Any}:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 999991, 999992, 999993, 999994, 999995, 999996, 999997, 999998, 999999, 1000000]
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0 … 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
julia> @time collect(zip(a...))
0.003288 seconds (4 allocations: 15.259 MiB)
1000000-element Vector{Tuple{Int64, Float64}}:
(1, 2.0)
(2, 2.0)
(3, 2.0)
(4, 2.0)
(5, 2.0)
(6, 2.0)
⋮
(999996, 2.0)
(999997, 2.0)
(999998, 2.0)
(999999, 2.0)
(1000000, 2.0)
You can use the invert function from SplitApplyCombine.jl
julia> invert(a)
2-element Vector{Vector{Int64}}:
[1, 2, 3, 4]
[5, 6, 7, 8]
julia> Tuple.(invert(a))
2-element Vector{NTuple{4, Int64}}:
(1, 2, 3, 4)
(5, 6, 7, 8)
The second version broadcasts Tuple over the result to get a vector of tuples instead of a vector of vectors. The first version is the faster among the two, but both versions are faster than the collect(zip(...)) method in my benchmarks.
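If adding a package is not an option, the same pivot can be sketched in plain Julia without splatting. This assumes Julia 1.1 or later (for eachrow) and inner vectors of equal length; it is not benchmarked here, so no performance claim is made.

```julia
a = [[1, 5], [2, 6], [3, 7], [4, 8]]

# Put the inner vectors side by side as the columns of a matrix ...
m = reduce(hcat, a)            # 2×4 Matrix{Int64}

# ... then every row of that matrix is one of the wanted tuples.
pivoted = [Tuple(r) for r in eachrow(m)]
```

reduce(hcat, a) avoids the splat, but the tuple length is still only known at run time, so the element type of pivoted may be less specific than with collect(zip(a...)).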

How to iterate over two or more vectors or tuples in Julia?

Can we iterate over two or more vectors or tuples in Julia?
julia> c=Tuple(x for x in a, b)
The above code does not work but shows what I want to do. I need to iterate over both a and b, one after the other.
Suppose,
julia> a=(1,2)
julia> b=(3,4)
and I want c to be:
julia> c=(1,2,3,4)
Use:
julia> c = Tuple(Iterators.flatten((a, b)))
(1, 2, 3, 4)
to get a Tuple as you requested. But if you are OK with a lazy iterator then just Iterators.flatten((a, b)) is enough.
Very short version:
julia> a=(1,2)
julia> b=(3,4)
julia> c = (a..., b...)
(1, 2, 3, 4)
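Since the question also mentions vectors, here is a short sketch of the same idea for them (assuming Julia 1.x; the variable names are made up for illustration):

```julia
a = [1, 2]
b = [3, 4]

# Eager: build a new vector holding the elements of a followed by b.
eager = vcat(a, b)

# Lazy: visit a and then b without allocating a combined collection.
visited = Int[]
for x in Iterators.flatten((a, b))
    push!(visited, x)
end
```

The lazy form is useful when the combined collection is only needed for a single pass.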

Julia: All possible sums of `n` entries of a Vector with unique integers, (with repetition)

Let's say I have a vector of unique integers, for example [1, 2, 6, 4] (sorting doesn't really matter).
Given some n, I want to get all possible values of summing n elements of the set, including summing an element with itself. It is important that the list I get is exhaustive.
For example, for n = 1 I get the original set.
For n = 2 I should get all values of summing 1 with all other elements, 2 with all others etc. Some kind of memory is also required, in the sense that I have to know from which entries of the original set did the sum I am facing come from.
For a given, specific n, I know how to solve the problem. I want a concise way of being able to solve it for any n.
EDIT: This question is for Julia 0.7 and above...
This is a typical task where you can use a dictionary in a recursive function (I am annotating types for clarity):
function nsum!(x::Vector{Int}, n::Int, d=Dict{Int,Set{Vector{Int}}}(),
               prefix::Vector{Int}=Int[])
    if n == 1
        # Last position: complete the sequence and record it under its sum.
        for v in x
            seq = [prefix; v]
            s = sum(seq)
            if haskey(d, s)
                push!(d[s], sort!(seq))
            else
                d[s] = Set([sort!(seq)])
            end
        end
    else
        # Extend the prefix by one element and recurse.
        for v in x
            nsum!(x, n-1, d, [prefix; v])
        end
    end
end
function genres(x::Vector{Int}, n::Int)
    n < 1 && error("n must be positive")
    d = Dict{Int, Set{Vector{Int}}}()
    nsum!(x, n, d)
    d
end
Now you can use it e.g.
julia> genres([1, 2, 4, 6], 3)
Dict{Int64,Set{Array{Int64,1}}} with 14 entries:
16 => Set(Array{Int64,1}[[4, 6, 6]])
11 => Set(Array{Int64,1}[[1, 4, 6]])
7 => Set(Array{Int64,1}[[1, 2, 4]])
9 => Set(Array{Int64,1}[[1, 4, 4], [1, 2, 6]])
10 => Set(Array{Int64,1}[[2, 4, 4], [2, 2, 6]])
8 => Set(Array{Int64,1}[[2, 2, 4], [1, 1, 6]])
6 => Set(Array{Int64,1}[[2, 2, 2], [1, 1, 4]])
4 => Set(Array{Int64,1}[[1, 1, 2]])
3 => Set(Array{Int64,1}[[1, 1, 1]])
5 => Set(Array{Int64,1}[[1, 2, 2]])
13 => Set(Array{Int64,1}[[1, 6, 6]])
14 => Set(Array{Int64,1}[[4, 4, 6], [2, 6, 6]])
12 => Set(Array{Int64,1}[[4, 4, 4], [2, 4, 6]])
18 => Set(Array{Int64,1}[[6, 6, 6]])
EDIT: In the code I use sort! and Set to avoid duplicate entries (remove them if you want duplicates). You could also pass the current index into x down through the recursive calls, so that inner loops start from the index reached by the outer call. That avoids generating duplicates at all, which would speed up the procedure.
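The index-tracking idea from the EDIT could look roughly like the sketch below. It assumes Julia 1.x and that x contains unique values, as stated in the question; the names nsums and rec are hypothetical, and the result stores plain vectors instead of sets because no duplicates are produced.

```julia
function nsums(x::Vector{Int}, n::Int)
    n < 1 && error("n must be positive")
    d = Dict{Int,Vector{Vector{Int}}}()
    # rec extends `prefix` using only positions >= start, so every
    # multiset of n elements is generated exactly once.
    function rec(start::Int, remaining::Int, prefix::Vector{Int})
        if remaining == 0
            push!(get!(d, sum(prefix), Vector{Int}[]), copy(prefix))
            return
        end
        for i in start:length(x)
            push!(prefix, x[i])
            rec(i, remaining - 1, prefix)   # i, not i + 1: repetition is allowed
            pop!(prefix)
        end
    end
    rec(1, n, Int[])
    return d
end

d = nsums([1, 2, 4, 6], 3)
```

For the example input this visits 20 combinations instead of the 64 ordered sequences that the version above generates and then deduplicates.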
I want a concise way of being able to solve it for any n.
Here is a concise solution using IterTools.jl:
Julia 0.6
using IterTools
n = 3
summands = [1, 2, 6, 4]
myresult = map(x -> (sum(x), x), reduce((x1, x2) -> vcat(x1, collect(product(fill(summands, x2)...))), [], 1:n))
(IterTools.jl is required for product())
Julia 0.7
using Base.Iterators: product
n = 3
summands = [1, 2, 6, 4]
map(x -> (sum(x), x), reduce((x1, x2) -> vcat(x1, vec(collect(product(fill(summands, x2)...)))), 1:n; init = Vector{Tuple{Int, NTuple{n, Int}}}[]))
(In Julia 0.7, the neutral element is no longer passed as the second positional argument but as the init keyword argument.)
How does this work?
Let's indent the one-liner (using the Julia 0.7 version; the idea is the same for the Julia 0.6 version):
map(
    # Map the possible combinations of `1:n` entries of `summands` to a tuple containing their sum and the summands used.
    x -> (sum(x), x),
    # Generate all possible combinations of `1:n` summands of `summands`.
    reduce(
        # Concatenate previously generated combinations with the new ones
        (x1, x2) -> vcat(
            x1,
            vec(
                collect(
                    # Cartesian product of all arguments.
                    product(
                        # Use `summands` for `x2` arguments.
                        fill(
                            summands,
                            x2)...)))),
        # Specify for what lengths we want to generate combinations.
        1:n;
        # Neutral element (empty array).
        init = Vector{Tuple{Int, NTuple{n, Int}}}[]))
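To see what the innermost product(fill(summands, x2)...) step produces, here is a tiny sketch for a single length (assuming Julia 1.x, where product lives in Base.Iterators; the two-element summands vector is made up to keep the output short):

```julia
using Base.Iterators: product

summands = [1, 2]

# fill(summands, 2) is [summands, summands]; splatting it into product
# yields every ordered pair, with the first position varying fastest.
combos = vec(collect(product(fill(summands, 2)...)))

# Attach the sum to each combination, as the outer map does.
totals = [(sum(t), t) for t in combos]
```

Note that the product enumerates ordered tuples, so (2, 1) and (1, 2) both appear.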
Julia 0.6
This is really just to get a free critique from the experts as to why my method is inferior to theirs!
using Combinatorics, BenchmarkTools
function nsum(a::Vector{Int}, n::Int)::Vector{Tuple{Int, Vector{Int}}}
    r = Vector{Tuple{Int, Vector{Int}}}()
    s = with_replacement_combinations(a, n)
    for i in s
        push!(r, (sum(i), i))
    end
    return sort!(r, by = x -> x[1])
end
@btime nsum([1, 2, 6, 4], 3)
It runs in circa 4.154 μs on my 1.8 GHz processor for n = 3. It produces a sorted array showing the sum (which may appear more than once) and how it is made up (which is unique to each instance of the sum).
