Set operations (union, intersection) on Swift array?

Are there any standard library calls I can use to either perform set operations on two arrays, or implement such logic myself (ideally as functionally and also efficiently as possible)?

Yes, Swift has the Set type (a struct in the standard library).
let array1 = ["a", "b", "c"]
let array2 = ["a", "b", "d"]
let set1:Set<String> = Set(array1)
let set2:Set<String> = Set(array2)
Swift 3.0+ can do operations on sets as:
set1.union(set2) // Union of two sets
set1.intersection(set2) // Intersection of two sets
set1.subtracting(set2) // Difference: elements of set1 that are not in set2
set1.symmetricDifference(set2) // exclusiveOr
Swift 2.0 can calculate on array arguments:
set1.union(array2) // {"a", "b", "c", "d"}
set1.intersect(array2) // {"a", "b"}
set1.subtract(array2) // {"c"}
set1.exclusiveOr(array2) // {"c", "d"}
Swift 1.2+ can calculate on sets:
set1.union(set2) // {"a", "b", "c", "d"}
set1.intersect(set2) // {"a", "b"}
set1.subtract(set2) // {"c"}
set1.exclusiveOr(set2) // {"c", "d"}
If you're using custom structs, you need to implement Hashable.
Thanks to Michael Stern in the comments for the Swift 2.0 update.
Thanks to Amjad Husseini in the comments for the Hashable info.

Swift Set operations
Example
let a: Set = ["A", "B"]
let b: Set = ["B", "C"]
union of A and B a.union(b)
let result = a.union(b)
var a2 = a
a2.formUnion(b)
//["A", "B", "C"]
symmetric difference of A and B a.symmetricDifference(b)
let result = a.symmetricDifference(b)
//["A", "C"]
difference A \ B a.subtracting(b)
let result = a.subtracting(b)
//["A"]
intersection of A and B a.intersection(b)
let result = a.intersection(b)
//["B"]
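Please note that the order of the elements in each result is not guaranteed: Set is an unordered collection, and the iteration order depends on the hash values.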

The most efficient method I know of uses Gödel numbers. Search for "Gödel encoding".
The idea is as follows. Suppose you have N possible numbers and need to make sets of them. For example, N = 100,000 and you want to make sets like {1,2,3}, {5, 88, 19000} etc.
The idea is to keep the list of N prime numbers in memory and for a given set {a, b, c, ...} you encode it as
prime[a]*prime[b]*prime[c]*...
So you encode a set as a BigNumber. Operations on BigNumbers, although slower than operations on machine integers, are still very fast.
To unite two sets A and B, you take
UNITE(A, B) = lcm(a, b)
the lowest common multiple of a and b, the numbers that encode A and B.
To make the intersection you take
INTERSECT(A, B) = gcd (a, b)
greatest common divisor.
and so on.
This encoding is called Gödelization; you can search for more. All of the language of arithmetic written using the logic of Frege can be encoded using numbers in this way.
The is-member? operation is very simple:
ISMEMBER(x, S) = remainder(s, prime[x]) == 0
Getting the cardinal is a little more complicated:
CARDINAL(S) = # of prime factors in s
You decompose the number s representing the set S into a product of prime factors and add their exponents. If the set does not allow duplicates, all exponents will be 1.
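The encoding described above can be sketched in Python, whose integers are arbitrary precision. This is only an illustration under stated assumptions: the helper names (first_primes, encode, decode and so on) are hypothetical, the universe is limited to 50 elements, and element k is represented by the k-th prime, counting from zero.

```python
from math import gcd

def first_primes(n):
    """Return the first n primes by trial division (fine for a small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

PRIMES = first_primes(50)  # assumption: element k is represented by PRIMES[k]

def encode(elements):
    """Encode a set of small integers as a product of distinct primes."""
    code = 1
    for k in set(elements):
        code *= PRIMES[k]
    return code

def decode(code):
    """Recover the set of elements whose prime divides the code."""
    return {k for k, p in enumerate(PRIMES) if code % p == 0}

def union(a, b):
    return a * b // gcd(a, b)  # lowest common multiple

def intersection(a, b):
    return gcd(a, b)

def is_member(k, code):
    return code % PRIMES[k] == 0

def cardinality(code):
    return len(decode(code))
```

The encoded integer grows with every element added, so the cost of each operation depends on the size of the numbers involved, not only on the number of elements.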

There aren't any standard library calls, but you may want to look at the ExSwift library. It includes a bunch of new functions on Arrays including difference, intersection and union.

You may want to follow same pattern as in Objective-C, which also lacks such operations, but there is a simple workaround:
how to intersect two arrays in objective C?

Related

How to update all elements of a nested map in elixir

I'm trying to implement in Elixir some of the maze generation algorithms from the excellent book Mazes for Programmers by Jamis Buck. In imperative languages like Go or V it's a piece of cake but with Elixir I'm stuck.
A maze is a grid of cells. A cell holds information about in which direction we can move. It is represented as a struct with boolean members (north: true or east: false, etc.). A grid is a map where keys are tuples {col, row} and values are Cells. If mz is a maze, mz.grid[{0, 0}] is the cell located at the upper left corner.
One of the basic operations is to open a path from one cell c1 to another c2, and most of the time, if we can go from c1 to c2, we can also go from c2 to c1, which means this operation modifies both cells. To implement this, I have a function open_to(maze, x, y, direction) which returns a tuple of two cells c1_new and c2_new where the direction information in each cell has been changed. Then I can update the grid with Map.put(maze.grid, {x, y}, c1_new). Same for c2_new.
One of the simplest algorithms, the binary tree algorithm, needs to visit all cells one by one and open a bidirectional link with one of the neighbors. Bidirectional means that both cells need to be updated, and the second cell may be visited only later. I'm stuck at this step as I can't find how to update the grid with the cells returned by open_to(). My Elixir pseudo code is as follows:
def generate(mz) do
Enum.map(mz.grid, fn({{x, y}, c1}) ->
neighbors = [Grid.cell_to(mz, x, y, :north), Grid.cell_to(mz, x, y, :east)]
c2_dir = select_the_neighbor(neighbors) # returns one of :north, :east or nil
# Here! open_to returns the updated cells but what to do with them?
{c1_new, c2_new} = if c2_dir != nil, do: Grid.open_to(mz, x, y, c2_dir)
end)
end
I believe the issue comes from the data structure I've chosen and from the way I go through it, but I can't find another way. Any help is appreciated
If I'm understanding the question, it's "how can each step in Enum.map/2 update the maze and have that visible to the other steps and in the final result?".
Because data structures in Elixir are immutable, you can't change the data another variable points to.
As a simple example, putting a key/value pair into a map creates an entirely new map:
iex(1)> map = %{a: 3}
%{a: 3}
iex(2)> Map.put(map, :a, 4)
%{a: 4}
iex(3)> map
%{a: 3}
In a similar fashion, Enum.map/2 isn't intended for modifying anything except the value currently being operated on (and even then only in the new list, not the original). If you want to update some value based on each cell you may be looking for Enum.reduce/3. It enumerates things like Enum.map/2, but it takes a value or "accumulator". The reducer function you pass in is called with an item and the accumulator and ought to return the updated value of the accumulator. Then the final value of that accumulator is what is returned from Enum.reduce/3.
So your pseudo code might look something like this:
def generate(mz) do
# don't use the c1 from enumerating the grid--it could be out of date
# you could also just enumerate the grid coordinates:
# - for x <- 0..width-1, y <- 0..height-1, do: {x, y}
# - Map.keys(mz.grid)
Enum.reduce(mz.grid, mz, fn {{x, y}, _dont_use_this_c1}, mz ->
neighbors = [Grid.cell_to(mz, x, y, :north), Grid.cell_to(mz, x, y, :east)]
if c2_dir = select_the_neighbor(neighbors) do
{c1_new, c2_new} = Grid.open_to(mz, x, y, c2_dir)
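# the cells live in mz.grid, so update that map and put it back into the maze
grid =
mz.grid
|> Map.put({x, y}, c1_new)
|> Map.put(find_x_y(x, y, c2_dir), c2_new)
%{mz | grid: grid}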
else
# you have to return a maze, or other iterations will try adding cells to nil
mz
end
end)
end
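The same accumulator pattern can be sketched in Python with functools.reduce, which may make the data flow easier to see. This is an illustration only, with assumptions that are not in the question: the grid is a plain dict from (x, y) to a frozenset of open directions, and prefer_east is a hypothetical stand-in for select_the_neighbor.

```python
from functools import reduce

OPPOSITE = {"north": "south", "east": "west"}
STEP = {"north": (0, -1), "east": (1, 0)}

def open_to(grid, pos, direction):
    """Return a NEW grid with a two-way link between pos and its neighbor."""
    x, y = pos
    dx, dy = STEP[direction]
    other = (x + dx, y + dy)
    return {
        **grid,
        pos: grid[pos] | {direction},
        other: grid[other] | {OPPOSITE[direction]},
    }

def prefer_east(grid, pos):
    """Hypothetical neighbor selection: east if possible, else north."""
    x, y = pos
    if (x + 1, y) in grid:
        return "east"
    if (x, y - 1) in grid:
        return "north"
    return None

def generate(grid, choose):
    """Visit every cell once, threading the updated grid through each step."""
    def step(acc, pos):
        direction = choose(acc, pos)
        # Always return a grid, even when no link is opened.
        return open_to(acc, pos, direction) if direction else acc
    return reduce(step, list(grid), grid)
```

Each step receives the grid produced by the previous step, so a cell that was updated as someone's neighbor is seen in its updated form when it is visited later. The original grid is left unchanged.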

Check equality of a value in all MPI ranks

Say I have some int x. I want to check if all MPI ranks get the same value for x. What's a good way to achieve this using MPI collectives?
The simplest I could think of is, broadcast rank0's x, do the comparison, and allreduce-logical-and the comparison result. This requires two collective operations.
...
x = ...
x_bcast = comm.bcast(x, root=0)
all_equal = comm.allreduce(x==x_bcast, op=MPI.LAND)
if not all_equal:
raise Exception()
...
Is there a better way to do this?
UPDATE:
From the OpenMPI user list, I received the following response. And I think it's quite a nifty trick!
A pattern I have seen in several places is to allreduce the pair p =
{-x,x} with MPI_MIN or MPI_MAX. If in the resulting pair p[0] == -p[1],
then everyone has the same value. If not, at least one rank had a
different value. Example:
bool is_same(int x) {
int p[2];
p[0] = -x;
p[1] = x;
MPI_Allreduce(MPI_IN_PLACE, p, 2, MPI_INT, MPI_MIN, MPI_COMM_WORLD);
return (p[0] == -p[1]);
}
Solutions based on logical operators assume that you can convert between integers and logicals without any data loss. I think that's dangerous. You could do a bitwise AND where you make sure you use all the bytes of your int/real/whatever.
You could do two reductions: one max and one min, and see if they give the same result.
You could also write your own reduction operator: operate on two ints, and do a max on the first, min on the second. Then test if the two are the same.
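The combined max/min reduction can be sketched without MPI by folding a list of per-rank values with functools.reduce. This only simulates the reduction in a single process. In a real program the combiner would be registered as a user-defined reduction operation (MPI_Op_create in the C API). The names combine and all_equal are hypothetical.

```python
from functools import reduce

def combine(a, b):
    """Associative, commutative combiner on (max_so_far, min_so_far) pairs."""
    return (max(a[0], b[0]), min(a[1], b[1]))

def all_equal(values_per_rank):
    """Simulate an allreduce where each rank contributes the pair (x, x)."""
    hi, lo = reduce(combine, [(v, v) for v in values_per_rank])
    return hi == lo
```

Because only max and min are used, no conversion between integers and logicals is involved, and a single reduction over a two-element payload is enough.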

Julia: Apply 1 dimensional Julia function to multi-dimensional array

I'm a "write Fortran in all languages" kind of person trying to learn modern programming practices. I have a one dimensional function ft(lx)=HT(x,f(x),lx), where x, and f(x) are one dimensional arrays of size nx, and lx is the size of output array ft. I want to apply HT on a multidimensional array f(x,y,z).
Basically I want to apply HT on all three dimensions to go from f(x,y,z) defined on (nx,ny,nz) dimensional grid, to ft(lx,ly,lz) defined on (lx,ly,lz) dimensional grid:
ft(lx,y,z) = HT(x,f(x,y,z) ,lx)
ft(lx,ly,z) = HT(y,ft(lx,y,z) ,ly)
ft(lx,ly,lz) = HT(z,ft(lx,ly,z),lz)
In f95 style I would tend to write something like:
FTx=zeros((lx,ny,nz))
for k=1:nz
for j=1:ny
FTx[:,j,k]=HT(x,f[:,j,k],lx)
end
end
FTxy=zeros((lx,ly,nz))
for k=1:nz
for i=1:lx
FTxy[i,:,k]=HT(y,FTx[i,:,k],ly)
end
end
FTxyz=zeros((lx,ly,lz))
for j=1:ly
for i=1:lx
FTxyz[i,j,:]=HT(z,FTxy[i,j,:],lz)
end
end
I know idiomatic Julia would require using something like mapslices. I was not able to understand how to go about doing this from the mapslices documentation.
So my question is: what would be the idiomatic Julia code, along with proper type declarations, equivalent to the Fortran style version?
A follow up sub-question would be: Is it possible to write a function
FT = HTnD((Tuple of x,y,z etc.),f(x,y,z), (Tuple of lx,ly,lz etc.))
that works with arbitrary dimensions? I.e. it would automatically adjust computation for 1,2,3 dimensions based on the sizes of input tuples and function?
I have a piece of code here which is fairly close to what you want. The key tool is Base.Cartesian.@nexprs which you can read up on in the linked documentation.
The three essential lines in my code are Lines 30 to 32. Here is a verbal description of what they do.
Line 30: reshape an n1 x n2 x ... nN-sized array C_{k-1} into an n1 x prod(n2,...,nN) matrix tmp_k.
Line 31: Apply the function B[k] to each column of tmp_k. In my code, there are some indirections here since I want to allow for B[k] to be a matrix or a function, but the basic idea is as described above. This is the part where you would want to bring in your HT function.
Line 32: Reshape tmp_k back into an N-dimensional array and circularly permute the dimensions such that the second dimension of tmp_k ends up as the first dimension of C_k. This makes sure that the next iteration of the "loop" implied by @nexprs operates on the second dimension of the original array, and so on.
As you can see, my code avoids forming slices along arbitrary dimensions by permuting such that we only ever need to slice along the first dimension. This makes programming much easier, and it can also have some performance benefits. For example, computing the matrix-vector products B * C[i1,:,i3] for all i1, i3 can be done easily and very efficiently by moving the second dimension of C into the first position of tmp and using gemm to compute B * tmp. Doing the same efficiently without the permutation would be much harder.
Following @gTcV's code, your function would look like:
using Base.Cartesian
ht(x,F,d) = mapslices(f -> HT(x, f, d), F, dims = 1)
@generated function HTnD(
xx::NTuple{N,Any},
F::AbstractArray{<:Any,N},
newdims::NTuple{N,Int}
) where {N}
quote
F_0 = F
Base.Cartesian.@nexprs $N k->begin
tmp_k = reshape(F_{k-1},(size(F_{k-1},1),prod(Base.tail(size(F_{k-1})))))
tmp_k = ht(xx[k], tmp_k, newdims[k])
F_k = Array(reshape(permutedims(tmp_k),(Base.tail(size(F_{k-1}))...,size(tmp_k,1))))
# https://github.com/JuliaLang/julia/issues/30988
end
return $(Symbol("F_",N))
end
end
A simpler version, which shows the usage of mapslices would look like this
function simpleHTnD(
xx::NTuple{N,Any},
F::AbstractArray{<:Any,N},
newdims::NTuple{N,Int}
) where {N}
for k = 1:N
F = mapslices(f -> HT(xx[k], f, newdims[k]), F, dims = k)
end
return F
end
you could even use foldl if you are a friend of one-liners ;-)
fold_HTnD(xx, F, newdims) = foldl((F, k) -> mapslices(f -> HT(xx[k], f, newdims[k]), F, dims = k), 1:length(xx), init = F)

Delete all duplicated elements in a vector in Julia 1.1

I am trying to write a code which deletes all repeated elements in a Vector. How do I do this?
I already tried using unique and union but they both delete all the repeated items but 1. I want all to be deleted.
For example: let x = [1,2,3,4,1,6,2]. Using union or unique returns [1,2,3,4,6]. What I want as my result is [3,4,6].
There are lots of ways to go about this. One approach that is fairly straightforward and probably reasonably fast is to use countmap from StatsBase:
using StatsBase
function f1(x)
d = countmap(x)
return [ key for (key, val) in d if val == 1 ]
end
or as a one-liner:
[ key for (key, val) in countmap(x) if val == 1 ]
countmap creates a dictionary mapping each unique value from x to the number of times it occurs in x. The solution can then be easily found by extracting every key from the dictionary that maps to val of 1, ie all elements of x that occur precisely once.
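It might be faster in some situations to use sort!(x) and then construct an index for the elements of the sorted x that only occur once, but this will be messier to code, and also the output will be in sorted order, which you may not want. Note that countmap returns a Dict, which is unordered, so the comprehension above does not necessarily return the elements in their original order either. To keep the order of x, filter x itself, e.g. filter(v -> d[v] == 1, x) with d = countmap(x).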
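For comparison, the same counting idea can be sketched in Python, where collections.Counter plays the role of countmap. The function name drop_repeated is hypothetical. Filtering the input list itself, not the dictionary keys, keeps the surviving elements in their original order.

```python
from collections import Counter

def drop_repeated(xs):
    """Keep only the elements that occur exactly once, in input order."""
    counts = Counter(xs)  # maps each value to its number of occurrences
    return [v for v in xs if counts[v] == 1]
```

Counting takes one pass and filtering takes a second, so the whole operation is linear in the length of the input.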

Choosing an arbitrary dimension to filter over?

In Julia, is there a good way to "choose to loop over an arbitrary dimension" d? For example, to apply a diffusion filter to a 2D array x, I want to do
for j = 1:size(x,2)
for i = 2:size(x,1)-1
x2[i,j] = x[i-1,j] - 2x[i,j] + x[i+1,j]
end
end
But I want to write a function diffFilter(x2,x,d) where x can be an array of arbitrary dimension and d is any dimension no greater than ndims(x), and it applies this x[i-1] - 2x[i] + x[i+1] filter along the dimension d (into x2 without allocating). Any idea how to do the indexing such that I can use that d to have that special part of the loop be the dth index?
You'll want to look at the pair of blog posts that Tim Holy has written on the subject:
http://julialang.org/blog/2016/02/iteration
http://julialang.org/blog/2016/03/arrays-iteration
That should give you a start on the subject.
The standard library function mapslices does this. You can write a function that applies the filter to a vector, and mapslices will take care of applying it to a particular dimension.
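To show the indexing idea behind looping over a chosen dimension, here is a rough Python sketch that stores the array as a flat list with column-major strides (first index fastest, as in Julia). It is not how mapslices is implemented. The names strides_for and diff_filter are hypothetical, and d is 0-based here.

```python
from itertools import product

def strides_for(shape):
    """Column-major strides: the first index varies fastest."""
    strides, step = [], 1
    for n in shape:
        strides.append(step)
        step *= n
    return strides

def diff_filter(out, x, shape, d):
    """Write x[i-1] - 2*x[i] + x[i+1] along dimension d into out.

    x and out are flat lists holding an array of the given shape.
    Boundary entries along d are left untouched in out.
    """
    strides = strides_for(shape)
    sd = strides[d]
    # Every combination of indices for the dimensions other than d.
    other = [range(n) if k != d else range(1) for k, n in enumerate(shape)]
    for idx in product(*other):
        base = sum(i * s for i, s in zip(idx, strides))
        for i in range(1, shape[d] - 1):
            p = base + i * sd
            out[p] = x[p - sd] - 2 * x[p] + x[p + sd]
    return out
```

Moving one step along dimension d is just adding the stride of d to the flat position, so the inner loop is the same for every d. The output list is filled in place and nothing else is allocated per slice.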
