Merge a large number of arrays by common column values in Julia

Expanding on a previous question I asked here, suppose we have a large number of arrays (say 500), of which the first 3 look like this:
5.0 3.5
6.0 3.6
7.0 3.0
5.0 4.5
6.0 4.7
8.0 3.0
5.0 4.0
6.0 3.2
8.0 4.0
and so on, stored in one array, so that we have an array of 500 arrays of the type above. I want to merge the 500 arrays into one array, by common values of the first column, calculating the mean values of the corresponding elements of the second column. The result must be the following array:
5.0 mean of all 5's values
6.0 mean of all 6's values
7.0 mean of all 7's values
8.0 mean of all 8's values
How can I achieve that? Thank you!

Also back with a slight modification of https://stackoverflow.com/a/50842721/2001017
function aggregate(m::Array{<:Array{<:Number,2},1})
result=sortrows(vcat(m...))
n = size(result,1)
if n <= 1
return result
end
key_idx = 1
key = result[key_idx,1]
count = 1
for i in 2:n
if key == result[i,1]
result[key_idx,2:end] += result[i,2:end]
count += 1
else
result[key_idx,2:end] /= count
count = 1
key = result[i,1]
key_idx += 1
result[key_idx,1] = key
result[key_idx,2:end] = result[i,2:end]
end
end
result[key_idx,2:end] /= count
return result[1:key_idx,:]
end
Demo:
x = [5.0 3.5
6.0 3.6
7.0 3.0]
y = [5.0 4.5
6.0 4.7
8.0 3.0]
z = [5.0 4.0
6.0 3.2
8.0 4.0]
a=[x,y,z]
julia> a
3-element Array{Array{Float64,2},1}:
[5.0 3.5; 6.0 3.6; 7.0 3.0]
[5.0 4.5; 6.0 4.7; 8.0 3.0]
[5.0 4.0; 6.0 3.2; 8.0 4.0]
julia> aggregate(a)
4×2 Array{Float64,2}:
5.0 4.0
6.0 3.83333
7.0 3.0
8.0 3.5
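For readers on Julia 1.x, where sortrows is no longer available, here is a minimal sketch of the same result using only Base and the Statistics standard library. The name aggregate_mean is hypothetical (it is not part of the answer above), and the masking approach scans the data once per key, so it favours brevity over speed:

```julia
using Statistics  # standard library, provides mean

# Hypothetical helper: stack all matrices, then average column 2 per distinct key.
function aggregate_mean(m::Vector{<:Matrix{<:Number}})
    rows = reduce(vcat, m)                 # one tall matrix holding every row
    ks = sort(unique(rows[:, 1]))          # distinct keys, ascending
    means = [mean(rows[rows[:, 1] .== k, 2]) for k in ks]
    return hcat(ks, means)
end

x = [5.0 3.5; 6.0 3.6; 7.0 3.0]
y = [5.0 4.5; 6.0 4.7; 8.0 3.0]
z = [5.0 4.0; 6.0 3.2; 8.0 4.0]
result = aggregate_mean([x, y, z])
```

For 500 large arrays a single pass that accumulates sums and counts will likely scale better; this version is only meant to show the intent compactly.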

Here's a version that is ~6 times faster than the answer from @PicaudVincent (based on his input data), but which doesn't sort the keys, so the rows of the returned matrix are in arbitrary order:
function accumarrays(A::Vector{Matrix{T}}) where {T}
d = Dict{T, Tuple{T, Int}}()
for a in A
for i in indices(a, 1)
ai = a[i, 1]
d[ai] = get(d, ai, (zero(T), 0)) .+ (a[i, 2], 1)
end
end
Aout = Matrix{typeof(one(T)/1)}(length(d), 2)
i = 0
for (key, val) in d
Aout[i+=1, 1] = key
Aout[i, 2] = val[1] / val[2]
end
return Aout
end
If you need the rows to be sorted, this works, but is just 4-5 times faster:
function accumarrays_(A::Vector{Matrix{T}}) where {T}
d = Dict{T, Tuple{T, Int}}()
for a in A
for i in indices(a, 1)
ai = a[i, 1]
d[ai] = get(d, ai, (zero(T), 0)) .+ (a[i, 2], 1)
end
end
dkeys = sort!(collect(keys(d)))
Aout = Matrix{typeof(one(T)/1)}(length(dkeys), 2)
for i in eachindex(dkeys)
val = d[dkeys[i]]
Aout[i, 1] = dkeys[i]
Aout[i, 2] = val[1] / val[2]
end
return Aout
end
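The two functions above are written for Julia 0.6. On Julia 1.x, indices(a, 1) is spelled axes(a, 1), and an uninitialized matrix needs the undef argument. A sketch of the sorted variant with those two changes follows; the name accumarrays_sorted is an assumption for illustration, and the timings quoted above were not re-measured for it:

```julia
# Sorted dictionary-based group mean, adapted for Julia 1.x.
function accumarrays_sorted(A::Vector{Matrix{T}}) where {T}
    d = Dict{T, Tuple{T, Int}}()               # key => (running sum, count)
    for a in A, i in axes(a, 1)                # axes replaces the old indices
        ai = a[i, 1]
        d[ai] = get(d, ai, (zero(T), 0)) .+ (a[i, 2], 1)
    end
    dkeys = sort!(collect(keys(d)))
    Aout = Matrix{typeof(one(T) / 1)}(undef, length(dkeys), 2)  # undef is required in 1.x
    for (i, key) in enumerate(dkeys)
        s, n = d[key]
        Aout[i, 1] = key
        Aout[i, 2] = s / n                     # mean = sum / count
    end
    return Aout
end

x = [5.0 3.5; 6.0 3.6; 7.0 3.0]
y = [5.0 4.5; 6.0 4.7; 8.0 3.0]
z = [5.0 4.0; 6.0 3.2; 8.0 4.0]
result = accumarrays_sorted([x, y, z])
```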

Related

How to use nested list comprehension in julia

How do I make the following code into nested list comprehension?
node_x = 5
node_y = 5
node_z = 5
xyz = Matrix(undef, node_x*node_y*node_z,3)
ii = 0
dx = 1.0
for k in 0:node_z-1
for j in 0:node_y-1
for i in 0:node_x-1
x = i * dx
y = j * dx
z = k * dx
ii += 1
#println([x, y, z])
xyz[ii, 1] = x
xyz[ii, 2] = y
xyz[ii, 3] = z
end
end
end
In Python with NumPy, I can write the following:
xyz = np.array([[i*dx, j*dx, k*dx] for k in range(node_z) for j in range(node_y) for i in range(node_x)])
Comprehensions can be nested just the same, it's just range that is a bit different, but in your case there is the start:end syntactic sugar:
julia> [[i*dx, j*dx, k*dx] for k in 1:node_z for j in 1:node_y for i in 1:node_x]
125-element Vector{Vector{Float64}}:
[1.0, 1.0, 1.0]
[2.0, 1.0, 1.0]
⋮
[4.0, 5.0, 5.0]
[5.0, 5.0, 5.0]
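To get a matrix laid out like your Python example, build each triple as a 1×3 row (spaces instead of commas) and concatenate the rows vertically. Note that the ranges here start at 1; use 0:node_x-1 and so on to reproduce the zero-based values of your loop: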
julia> vcat(([i*dx j*dx k*dx] for k in 1:node_z for j in 1:node_y for i in 1:node_x)...)
125×3 Matrix{Float64}:
1.0 1.0 1.0
2.0 1.0 1.0
⋮
4.0 5.0 5.0
5.0 5.0 5.0
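As a sketch of a variant that matches the zero-based values of the original loop, the ranges can be written 0:node-1, and reduce(vcat, ...) can stand in for splatting so that the 125 rows are not passed as 125 separate arguments. The variable name rows is only for illustration:

```julia
node_x, node_y, node_z = 5, 5, 5
dx = 1.0

# Each element is a 1×3 row; the first `for` is the outermost loop, so i varies fastest.
rows = [[i*dx j*dx k*dx] for k in 0:node_z-1 for j in 0:node_y-1 for i in 0:node_x-1]

# Stack the rows into a single 125×3 matrix.
xyz = reduce(vcat, rows)
```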

How to fill missing values in a loop using conditions in julia?

I would like to check the number of items in each iteration of a loop and, if there are fewer than expected, pad them with 0's. For example, I have created a loop that accesses an array's elements in ranges of 10:
x = range(1, 100, length=45) |> collect
n = trunc(Int, length(x)/10) + 1
s = 1
l = 10
for i in 1:n
print(x[s:l])
s += 10
l +=10
end
In the above code, the last iteration doesn't print any result because only 5 elements remain while 10 are expected. How can I check the number of elements in every iteration of this loop and, if there are fewer than expected, fill the remainder with 0's?
Please suggest and advise on achieving the expected operation.
Thanks!
I think that PaddedViews is what you are looking for:
julia> using PaddedViews
julia> PaddedView(0, x, (ceil(Int, length(x)/10)*10,))
50-element PaddedView(0.0, ::Vector{Float64}, (Base.OneTo(50),)) with eltype Float64:
1.0
3.25
5.5
7.75
10.0
12.25
14.5
16.75
⋮
97.75
100.0
0.0
0.0
0.0
0.0
0.0
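If you would rather not add a package, a minimal sketch with Base alone is to walk the vector in blocks with Iterators.partition and pad any short block with zeros. The names chunksize and chunks are assumptions for illustration:

```julia
x = collect(range(1, 100, length=45))
chunksize = 10

# Iterators.partition yields blocks of up to `chunksize` elements; the last may be shorter.
# vcat with a zero vector of the missing length pads every block to exactly `chunksize`.
chunks = [vcat(c, zeros(eltype(x), chunksize - length(c)))
          for c in Iterators.partition(x, chunksize)]

foreach(println, chunks)
```

Unlike PaddedView, this copies each block, which is fine for small data but worth keeping in mind for large arrays.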

How to create two nested for loops in a single line in Julia

I have seen it a few times where someone has a situation where they want to put two for loops on the same line nested in one another.
Just to confirm, is this possible in Julia and if so what does it look like? Thanks!
Correct, Julia allows you to tersely express nested for loops.
As an example, consider filling in a 3x3 matrix in column order:
julia> xs = zeros(3,3)
3×3 Array{Float64,2}:
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
julia> let a = 1
for j in 1:3, i in 1:3
xs[i,j] = a
a += 1
end
end
julia> xs
3×3 Array{Float64,2}:
1.0 4.0 7.0
2.0 5.0 8.0
3.0 6.0 9.0
The above loop is equivalent to this more verbose version:
julia> let a = 1
for j in 1:3
for i in 1:3
xs[i,j] = a
a += 1
end
end
end
This syntax is even supported for higher dimensions(!):
julia> for k in 1:3, j in 1:3, i in 1:3
@show (i, j, k)
end
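One caveat to the equivalence, as far as I know from the Julia manual: a break inside the combined form leaves the entire nest, whereas in the verbose form it only leaves the inner loop. A small sketch that makes the difference visible (both function names are made up for this example):

```julia
# Counts iterations before a `break` in the combined single-line form.
function count_combined()
    n = 0
    for j in 1:3, i in 1:3
        i == 2 && break      # exits the whole nest
        n += 1
    end
    return n
end

# Same loops written out separately.
function count_nested()
    n = 0
    for j in 1:3
        for i in 1:3
            i == 2 && break  # exits only the inner loop
            n += 1
        end
    end
    return n
end

combined = count_combined()
nested = count_nested()
```

Here combined is 1 and nested is 3, so the two forms are interchangeable only when no break is involved.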

Merge arrays by common column values in julia

Suppose we have the following 3 arrays in Julia:
5.0 3.5
6.0 3.6
7.0 3.0
5.0 4.5
6.0 4.7
8.0 3.0
5.0 4.0
6.0 3.2
8.0 4.0
I want to merge the 3 arrays into one array, by common values of the first column, summing the values of the second column. The result must be the following array:
5.0 12
6.0 11.5
7.0 3.0
8.0 7.0
I tried vcat and reduce but I don't get the intended result. Is there a relatively simple way to code this that is not slow? Thank you!
There are probably many ways to do it. If you want to avoid coding, you can use the DataFrames package. This is not the fastest solution, but it is short.
Assume you have arrays defined as variables:
x = [5.0 3.5
6.0 3.6
7.0 3.0]
y = [5.0 4.5
6.0 4.7
8.0 3.0]
z = [5.0 4.0
6.0 3.2
8.0 4.0]
Then you can do:
using DataFrames
Matrix(aggregate(DataFrame(vcat(x,y,z)), :x1, sum))
The :x1 part is there because, by default, the first column of a DataFrame is called :x1 if you do not give it an explicit name. In this recipe we convert the matrices to a DataFrame, aggregate them, and convert the result back to a matrix.
Without extra package, a possible solution can be something like
function aggregate(m::Array{<:Number,2}...)
result=sortrows(vcat(m...))
n = size(result,1)
if n <= 1
return result
end
key_idx=1
key=result[key_idx,1]
for i in 2:n
if key==result[i,1]
result[key_idx,2:end] += result[i,2:end]
else
key = result[i,1]
key_idx += 1
result[key_idx,1] = key
result[key_idx,2:end] = result[i,2:end]
end
end
return result[1:key_idx,:]
end
Demo:
x = [5.0 3.5
6.0 3.6
7.0 3.0]
y = [5.0 4.5
6.0 4.7
8.0 3.0]
z = [5.0 4.0
6.0 3.2
8.0 4.0]
aggregate(x,y,z)
Prints:
4×2 Array{Float64,2}:
5.0 12.0
6.0 11.5
7.0 3.0
8.0 7.0
Note: this solution also works with any number of columns
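On Julia 1.x, sortrows(A) is written sortslices(A, dims=1). A sketch of the same sort-then-accumulate idea with that change is below; merge_sum is a hypothetical name chosen to avoid clashing with the DataFrames function, and it likewise sums every column after the first:

```julia
# Sort all rows by key, then fold consecutive rows with equal keys into one.
function merge_sum(ms::AbstractMatrix{<:Number}...)
    rows = sortslices(vcat(ms...), dims=1)   # Julia 1.x replacement for sortrows
    n = size(rows, 1)
    n <= 1 && return rows
    cur = 1                                  # row currently being accumulated
    for i in 2:n
        if rows[i, 1] == rows[cur, 1]
            rows[cur, 2:end] .+= rows[i, 2:end]
        else
            cur += 1
            rows[cur, :] = rows[i, :]        # start a new group
        end
    end
    return rows[1:cur, :]
end

x = [5.0 3.5; 6.0 3.6; 7.0 3.0]
y = [5.0 4.5; 6.0 4.7; 8.0 3.0]
z = [5.0 4.0; 6.0 3.2; 8.0 4.0]
result = merge_sum(x, y, z)
```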
Given the following two assumptions:
the first column of each input array is sorted,
the first column of each input array is unique,
then for most input combinations (i.e. number of input arrays, sizes of arrays), the following algorithm should significantly outperform the other answers by taking advantage of the assumptions:
function f_ag(x::Matrix{T}...)::Matrix{T} where {T<:Number}
isempty(x) && error("Empty input")
any([ size(y,2) != 2 for y in x ]) && error("Input matrices must have two columns")
length(x) == 1 && return copy(x[1]) #simple case shortcut
nxmax = [ size(y,1) for y in x ]
nxarrinds = find(nxmax .> 0)
nxrowinds = ones(Int, length(nxarrinds))
z = Tuple{T,T}[]
while !isempty(nxarrinds)
xmin = minimum(T[ x[nxarrinds[j]][nxrowinds[j], 1] for j = 1:length(nxarrinds) ])
minarrinds = Int[ j for j = 1:length(nxarrinds) if x[nxarrinds[j]][nxrowinds[j], 1] == xmin ]
rowsum = sum(T[ x[nxarrinds[k]][nxrowinds[k], 2] for k in minarrinds ])
push!(z, (xmin, rowsum))
for k in minarrinds
nxrowinds[k] += 1
end
for j = length(nxarrinds):-1:1
if nxrowinds[j] > nxmax[nxarrinds[j]]
deleteat!(nxrowinds, j)
deleteat!(nxarrinds, j)
end
end
end
return [ z[n][j] for n = 1:length(z), j = 1:2 ]
end
If assumption 2 is violated, that is, the first column is not guaranteed to be unique, you can still take advantage of the sort order, but the algorithm is going to be more complicated again since you'll need to additionally look forward on each minimum index to check for duplicates. I'm not going to put myself through that pain at this point.
Also note, you could adjust the following line:
rowsum = sum(T[ x[nxarrinds[k]][nxrowinds[k], 2] for k in minarrinds ])
to this:
rowsum = input_func(T[ x[nxarrinds[k]][nxrowinds[k], 2:end] for k in minarrinds ])
and now you can input whatever function you like, and also have any number of additional columns in your input matrices.
There are probably some additional optimizations that could be added here, eg pre-allocating z, specialized routine when there are only two input matrices, etc, but I'm not going to bother with them.
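To illustrate what a specialized two-matrix routine might look like, here is a minimal sketch of a two-pointer merge. It relies on the same two assumptions (first column sorted and unique within each input), the name merge_two_sorted is hypothetical, and it has not been benchmarked against f_ag:

```julia
# Two-pointer merge of two key-sorted, key-unique two-column matrices, summing shared keys.
function merge_two_sorted(a::Matrix{T}, b::Matrix{T}) where {T<:Number}
    na, nb = size(a, 1), size(b, 1)
    out = Matrix{T}(undef, na + nb, 2)      # upper bound on the number of output rows
    i, j, n = 1, 1, 0
    while i <= na || j <= nb
        n += 1
        if j > nb || (i <= na && a[i, 1] < b[j, 1])
            out[n, 1], out[n, 2] = a[i, 1], a[i, 2]             # next key comes from a only
            i += 1
        elseif i > na || b[j, 1] < a[i, 1]
            out[n, 1], out[n, 2] = b[j, 1], b[j, 2]             # next key comes from b only
            j += 1
        else
            out[n, 1], out[n, 2] = a[i, 1], a[i, 2] + b[j, 2]   # shared key: sum the values
            i += 1
            j += 1
        end
    end
    return out[1:n, :]
end

x = [5.0 3.5; 6.0 3.6; 7.0 3.0]
y = [5.0 4.5; 6.0 4.7; 8.0 3.0]
merged = merge_two_sorted(x, y)
```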

Sparse Slicing of Sparse Arrays in Chapel

Given some A: [sps] over a sparse subdomain of a dom: domain(2), a slice A[A.domain.dim(1), k] yields the k-th column as a dense 1D-array. How do I retrieve the k-th (n-1)-dimensional slice of a sparse nD-array as a sparse (n-1)D-array?
var nv: int = 8,
D: domain(2) = {1..nv, 1..nv},
SD: sparse subdomain(D),
X: [SD] real;
SD += (1,2); X[1,2] = 1;
SD += (2,3); X[2,3] = 1;
SD += (3,1); X[3,1] = 1;
SD += (3,4); X[3,4] = 1;
SD += (4,5); X[4,5] = 1;
SD += (3,6); X[3,6] = 1;
SD += (6,8); X[6,8] = 1;
writeln(X);
writeln(X[X.domain.dim(1),2]);
returns
1.0
1.0
1.0 1.0 1.0
1.0
1.0
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
The expectation in the case that I succeed in sparse slicing would be a single 1.0 returned with the ability to retrieve this position of that entry by calling writeln() on slice.domain.
I think that, unfortunately, you are doing the right sort of thing and that you're just running afoul of the current (as of Chapel 1.16) limitations with respect to slicing sparse domains.
