Channel: communication between different cores in Julia

I would like to have a Channel between two functions running on two different cores. The following code does not work:

c1 = Channel(32)
@spawnat 2 put!(c1, 1)
@spawnat 3 println(c1)

The println tells me that the channel c1 on core 3 is empty, and I get no error. Somehow the function on core 3 sees a different channel c1 than the one on core 2.

Channel is for communication between coroutines (also known as green threads) within a single process.
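For example, a plain Channel connects two tasks in the same process; a minimal sketch:

c = Channel{Int}(32)
@async put!(c, 1)    # producer runs as a task (coroutine)
println(take!(c))    # consumer blocks until a value arrives; prints 1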
For distributed computing you normally use remotecall instead - see this example from the Julia manual:

$ ./julia -p 2

julia> r = remotecall(rand, 2, 2, 2)
Future(2, 1, 4, nothing)

julia> s = @spawnat 2 1 .+ fetch(r)
Future(2, 1, 5, nothing)

julia> fetch(s)
2×2 Array{Float64,2}:
 1.18526  1.50912
 1.16296  1.60607
Depending on your actual scenario, you should have a look at the following libraries for distributed computing with Julia:
SharedArrays - shared memory across processes on the same host
DistributedArrays.jl - data shared across different processes on different hosts
ParallelDataTransfer.jl - sending data between processes.

This works:

const c1 = RemoteChannel(() -> Channel{Int}(32))
@spawnat 2 put!(c1, 1)
@spawnat 3 println(take!(c1))
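For completeness, a self-contained version of the above (assuming two worker processes; on Julia 0.7+ the Distributed standard library must be loaded first):

using Distributed
addprocs(2)                                  # start workers 2 and 3

const c1 = RemoteChannel(() -> Channel{Int}(32))
@spawnat 2 put!(c1, 1)                       # worker 2 writes into the channel
@spawnat 3 println(take!(c1))                # worker 3 blocks until the value arrives

Unlike a plain Channel, the RemoteChannel is a handle that can be serialized to other processes while still referring to the same underlying channel.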

Related

append! vs push! in Julia

In Julia, you can permanently append elements to an existing vector using append! or push!. For example:
julia> vec = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> push!(vec, 4, 5)
5-element Vector{Int64}:
 1
 2
 3
 4
 5

# or

julia> append!(vec, 4, 5)
7-element Vector{Int64}:
 1
 2
 3
 4
 5
 4
 5
But what is the difference between append! and push!? According to the official docs, it is recommended to:

"Use push! to add individual items to a collection which are not already themselves in another collection. The result of the preceding example is equivalent to push!([1, 2, 3], 4, 5, 6)."

So this is the main difference between these two functions! But in the example above, I appended individual elements to an existing vector using append!. So why do they recommend using push! in these cases?
append!(v, x) will iterate x, and essentially push! the elements of x to v. push!(v, x) will take x as a whole and add it at the end of v. In your example there is no difference, since in Julia you can iterate a number (it behaves like an iterator with length 1). Here is a better example illustrating the difference:
julia> v = Any[];  # intentionally using Any just to show the difference

julia> x = [1, 2, 3]; y = [4, 5, 6];

julia> push!(v, x, y);

julia> append!(v, x, y);

julia> v
8-element Vector{Any}:
 [1, 2, 3]
 [4, 5, 6]
 1
 2
 3
 4
 5
 6
In this example, when using push!, x and y become elements of v; when using append!, the elements of x and y become elements of v.
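You can also verify that a number really does iterate like a length-1 collection, which is why append! worked on individual numbers above:

julia> for i in 4
           println(i)
       end
4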
Since Julia is still in its early phase, it is best to follow community standards, and one of those standards is that your code should make sense to other developers at first sight: a reader should know your intent immediately upon reading your code.
About append!, the docs say:

"For an ordered container collection, add the elements of each collection to the end of it. Specifying multiple collections to be appended requires at least Julia 1.6."
Support for appending multiple collections at once was added in Julia 1.6, so in a sense append! is the method that will see more use as Julia adoption grows; Python has a similarly named method, so people coming from there will likely reach for it too.
About push!, the docs say:

"Insert one or more items in collection. If collection is an ordered container, the items are inserted at the end (in the given order). If collection is ordered, use append! to add all the elements of another collection to it."
The docs advise using append! over push! when you are adding all the elements of another collection. So as a Julia user, if I see append! in your code, I should immediately know that you are extending a collection with the contents of other collections. That's just it. Otherwise push! and append! do the same thing (something that might change in the future), so please follow the community standard; it helps.
So use append! when you are adding the contents of other collections and push! when you are adding individual items. This way, any Julia user reading your code will know your intent right away; just don't mix them up.
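One practical consequence: with an element-typed vector the two are not interchangeable, since push! tries to convert its whole argument to the element type. A quick REPL check:

julia> v = Int[];

julia> append!(v, [1, 2])      # appends the elements of the array
2-element Vector{Int64}:
 1
 2

julia> push!(v, [3, 4])        # tries to store the array itself as an Int
ERROR: MethodError: Cannot `convert` an object of type Vector{Int64} to an object of type Int64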

Writing a chunk of MPI-distributed data via HDF5 in Fortran

I have a 3D array distributed across different MPI processes:

real :: DATA(i1:i2, j1:j2, k1:k2)

where i1, i2, ... are different for each MPI process, but the MPI grid is Cartesian.
For simplicity, let's assume I have a 120 x 120 x 120 array and 27 MPI processes distributed as 3 x 3 x 3 (so that each process holds an array of size 40 x 40 x 40).
Using the HDF5 library, I need to write only a slice of that data; say, a slice through the middle, perpendicular to the second axis. The resulting (global) array would be of size 120 x 1 x 120.
I'm a bit confused about how to properly use HDF5 here and how to generalize from writing the full DATA (which I can do). The problem is that not every MPI process will write. For instance, in the case above, only 9 processes have something to write; the others (which are on the +/-x and +/-z edges of the cube) do not, since they don't hold any part of the slab I need.
I tried the chunking technique described here, but it looks like that is just for a single process.
I would be very grateful if the HDF5 community could help me with this :)
When writing an HDF5 dataset in parallel, all MPI processes must participate in the operation (even if a given MPI process has no values to write).
If you are not bound to a specific library, take a look at HDFql. Based on what I could understand from the use-case you posted, here is an example of how to write data in parallel in Fortran using HDFql.
PROGRAM Example

    ! use HDFql module (make sure it can be found by the Fortran compiler)
    USE HDFql

    ! declare variables
    REAL(KIND=8), DIMENSION(40, 40, 40) :: values
    CHARACTER(2) :: start
    INTEGER :: state
    INTEGER :: x
    INTEGER :: y
    INTEGER :: z

    ! create an HDF5 file named "example.h5" and use (i.e. open) it in parallel
    state = hdfql_execute("CREATE AND USE FILE example.h5 IN PARALLEL")

    ! create a dataset named "dset" of data type double with three dimensions (size 120x120x120)
    state = hdfql_execute("CREATE DATASET dset AS DOUBLE(120, 120, 120)")

    ! populate variable "values" with certain values
    DO x = 1, 40
        DO y = 1, 40
            DO z = 1, 40
                values(z, y, x) = hdfql_mpi_get_rank() * 100000 + (x * 1600 + y * 40 + z)
            END DO
        END DO
    END DO

    ! register variable "values" for subsequent use (by HDFql)
    state = hdfql_variable_register(values)

    IF (hdfql_mpi_get_rank() < 3) THEN
        ! insert (i.e. write) values from variable "values" into dataset "dset" using a hyperslab
        ! that depends on the MPI rank (each participating rank writes 40x40x40 values)
        WRITE(start, "(I0)") hdfql_mpi_get_rank() * 40
        state = hdfql_execute("INSERT INTO dset(" // TRIM(start) // ":1:1:40) IN PARALLEL VALUES FROM MEMORY 0")
    ELSE
        ! ranks greater than or equal to 3 write nothing (but still participate in the collective operation)
        state = hdfql_execute("INSERT INTO dset IN PARALLEL NO VALUES")
    END IF

END PROGRAM
Please check the HDFql reference manual for additional information on how to work with HDF5 files in parallel (i.e. with MPI) using this library.

How do I access an element in a CuArray of Julia and change its value?

I want to change only one element, as shown in the code below.

using Flux, CuArrays
a = rand(3, 3) |> gpu
CuArrays.allowscalar(false)
a[1, 1] = 1.0f0
Because allowscalar is set to false, this naturally produces the error below.

ERROR: scalar setindex! is disallowed

But if the allowscalar call is removed, this warning appears instead.

Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with allowscalar(false)
I tried turning allowscalar on and off immediately before and after the part that accesses the element. That was about 20 times slower than leaving allowscalar set to true.
Next, I tried creating another matrix on the CPU and then adding the matrices together on the GPU, as shown below.

b = zeros(Float32, 3, 3)
b[1, 1] = 1.0f0
b = b |> gpu
a .+= b

However, this is about 4 times slower than work that stays on the GPU alone, such as below.

a .*= 1.0f0  # dummy calculation that does some processing on the GPU
a .+= a      # dummy calculation that does some processing on the GPU
How do I access an element in a CuArray and change its value?
I look forward to hearing from you soon.
"I turned allowscalar on and off before and after the part that accesses the element. Then, it was about 20 times slower than when allowscalar was set to true."
Toggling allowscalar should not affect performance; in fact, CuArrays itself does so when it needs to inspect individual elements with certain APIs. A macro version of the function makes this easy:
julia> a = CuArrays.rand(3,3);

julia> CuArrays.allowscalar(false)

julia> a[1, 1] = 1.0f0
ERROR: scalar setindex! is disallowed

julia> CuArrays.@allowscalar a[1, 1] = 1.0f0
1.0f0

julia> a
3×3 CuArray{Float32,2,Nothing}:
 1.0       0.277899   0.333898
 0.126213  0.0881365  0.794662
 0.94518   0.586488   0.656359
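If you want to avoid scalar indexing altogether, one common idiom is to assign through a one-element range with broadcasting, which runs as a GPU kernel rather than a scalar setindex! (a sketch):

a[1:1, 1:1] .= 1.0f0           # broadcast over a 1x1 slice, no scalar indexing
view(a, 1:1, 1:1) .= 1.0f0     # same effect through a view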

How to represent a performant heterogeneous stack in Julia

I would like to implement a simple concatenative language (such as Joy or Factor) as a DSL in Julia, and I am unsure how best to represent the stack.
The stack, which represents both data and program code, should be able to hold a sequence of items of different types: in the simplest case Ints, Symbols and, recursively, stacks again (to represent quoted code). The program will then heavily use push! and pop! to shuffle values between different such stacks.
One obvious implementation in Julia, which works but runs rather slowly, is to use cell arrays. For example, the Joy stack [ 1 [ 1 2 + ] i + ] (which evaluates to [4]) can be implemented in Julia as stack = Any[:+, :i, Any[:+, 2, 1], 1]. My typical code then looks like this:

x = pop!(callstack)
if isa(x, Int)
    push!(datastack, x)
elseif isa(x, Symbol)
    do_stuff(x, datastack)
end
This, however, runs really slowly and incurs huge memory allocations, probably because such code is not type-stable (a major performance bottleneck in Julia).
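One quick way to confirm the instability is @code_warntype on a toy function (f here is hypothetical):

julia> f(stack) = pop!(stack) + 1;

julia> @code_warntype f(Any[1, 2, 3])   # the result of pop! and the return value are both flagged ::Any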
Using C, I would represent the stack compactly as an array (or alternatively a linked list) of a union:

typedef union Stackelem {
    int val;
    char *sym;
    union Stackelem *quote;
} Stackelem;

Stackelem stack[n];
But how can I achieve such a compact representation of the heterogeneous stack in Julia, and how do I avoid the type instability?
This is one way; another would be to represent args with type Vector{Any}:

julia> immutable Exp
           head::Symbol
           args::Tuple
       end

julia> q = Exp(:+, (1, Exp(:-, (3, 4))))
Exp(:+,(1,Exp(:-,(3,4))))
Edit: another way to represent it might be:

immutable QuoteExp{T}; vec::Vector{T}; end
typealias ExpTyp Union{QuoteExp, Int, Symbol}
typealias Exp QuoteExp{ExpTyp}

and then you can do the following:

julia> x = Exp(ExpTyp[:+, 1, 2])
QuoteExp{Union{Int64,QuoteExp{T},Symbol}}(Union{Int64,QuoteExp{T},Symbol}[:+,1,2])

julia> x.vec[1]
:+

julia> x.vec[2]
1

julia> x.vec[3]
2

julia> push!(x.vec, :Scott)
4-element Array{Union{Int64,QuoteExp{T},Symbol},1}:
 :+
 1
 2
 :Scott

julia> x.vec[4]
:Scott
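Note that on Julia 1.0 and later, immutable and typealias are gone; the same design can be written with struct and a const union alias. A minimal sketch (names are illustrative), relying on the fact that modern Julia optimizes code operating on small Unions via union splitting:

struct QuoteExp
    vec::Vector{Union{Int, Symbol, QuoteExp}}   # self-reference is allowed in field types
end
const StackElem = Union{Int, Symbol, QuoteExp}

x = QuoteExp(StackElem[:+, 1, 2])
push!(x.vec, :Scott)    # x.vec[4] == :Scott
x.vec[1]                # :+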

How do you do parallel matrix multiplication in Julia?

Is there a good way to do parallel matrix multiplication in Julia? I tried using DArrays, but it was significantly slower than a single-threaded multiplication.
Parallel in what sense? If you mean single-machine, multi-threaded, then Julia does this by default, since OpenBLAS (the underlying linear algebra library) is multithreaded.
If you mean multiple-machine, distributed-computing-style, then you will encounter a lot of communication overhead that is only worth it for very large problems, and a customized approach might be needed.
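For the single-machine case, the BLAS thread count can be set from within Julia (on recent versions BLAS lives in the LinearAlgebra standard library); a quick sketch:

using LinearAlgebra

BLAS.set_num_threads(4)          # let OpenBLAS use 4 threads
A = rand(2000, 2000); B = rand(2000, 2000);
@time A * B;                     # the underlying gemm runs multithreaded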
The problem is most likely that direct (single-threaded) matrix multiplication is normally performed with an optimized library function. In the case of OpenBLAS, this is already multithreaded. For arrays of size 2000x2000, the simple matrix multiplication

@time c = sa * sb;

takes 0.3 seconds multithreaded and 0.7 seconds single-threaded.
Splitting the multiplication over a single dimension makes the times even worse, reaching around 17 seconds in single-threaded mode:

@time for j = 1:n
    sc[:,j] = sa[:,:] * sb[:,j]
end
Shared arrays
The solution to your problem might be the use of shared arrays, which share the same data across your processes on a single computer. Please note that shared arrays are still marked as experimental.

# create shared arrays and initialize them with random numbers
sa = SharedArray(Float64, (n,n), init = s -> s[localindexes(s)] = rand(length(localindexes(s))))
sb = SharedArray(Float64, (n,n), init = s -> s[localindexes(s)] = rand(length(localindexes(s))))
sc = SharedArray(Float64, (n,n));
Then you have to create a function which performs a cheap matrix multiplication on a subset of the matrix.

@everywhere function mymatmul!(n, w, sa, sb, sc)
    # works only for 4 workers and n divisible by 4
    range = 1 + (w-2) * div(n,4) : (w-1) * div(n,4)
    sc[:, range] = sa[:,:] * sb[:, range]
end
Finally, the main process tells the workers to work on their parts.

@time @sync begin
    for w in workers()
        @async remotecall_wait(w, mymatmul!, n, w, sa, sb, sc)
    end
end

This takes around 0.3 seconds, which is the same as the multithreaded single-process time.
It sounds like you're interested in dense matrices, in which case see the other answers. Should you be (or become) interested in sparse matrices, see https://github.com/madeleineudell/ParallelSparseMatMul.jl.
