Merge arrays by common column values in Julia

Suppose we have the following 3 arrays in Julia:
5.0 3.5
6.0 3.6
7.0 3.0
5.0 4.5
6.0 4.7
8.0 3.0
5.0 4.0
6.0 3.2
8.0 4.0
I want to merge the 3 arrays in one array, by common values of the first column, summing the values of the second column. The result must be the following array:
5.0 12
6.0 11.5
7.0 3.0
8.0 7.0
I tried vcat and reduce, but I don't get the intended result. Is there a relatively simple way to code this that avoids slow, time-consuming code? Thank you!

There are probably many ways to do it. If you want to avoid coding, you can use the DataFrames package. This is not the fastest solution, but it is short.
Assume you have arrays defined as variables:
x = [5.0 3.5
6.0 3.6
7.0 3.0]
y = [5.0 4.5
6.0 4.7
8.0 3.0]
z = [5.0 4.0
6.0 3.2
8.0 4.0]
Then you can do:
using DataFrames
Matrix(aggregate(DataFrame(vcat(x,y,z)), :x1, sum))
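The :x1 part is needed because, by default, the first column of a DataFrame is called :x1 if you do not give it an explicit name. In this recipe we convert the matrices to a DataFrame, aggregate them, and convert the result back to a matrix. Note that aggregate was deprecated in later DataFrames releases in favour of combine with groupby, and that newer releases expect DataFrame(matrix, :auto) to generate the default column names.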

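Without an extra package, a possible solution is something like the following (sortslices replaces sortrows, which was removed in Julia 1.0):
function aggregate(m::Array{<:Number,2}...)
    # stack all inputs and sort the rows by key (first column)
    result = sortslices(vcat(m...), dims=1)
    n = size(result,1)
    if n <= 1
        return result
    end
    key_idx = 1
    key = result[key_idx,1]
    for i in 2:n
        if key == result[i,1]
            # same key: accumulate into the current output row
            result[key_idx,2:end] += result[i,2:end]
        else
            # new key: start the next output row
            key = result[i,1]
            key_idx += 1
            result[key_idx,1] = key
            result[key_idx,2:end] = result[i,2:end]
        end
    end
    return result[1:key_idx,:]
end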
Demo:
x = [5.0 3.5
6.0 3.6
7.0 3.0]
y = [5.0 4.5
6.0 4.7
8.0 3.0]
z = [5.0 4.0
6.0 3.2
8.0 4.0]
aggregate(x,y,z)
Prints:
4×2 Array{Float64,2}:
5.0 12.0
6.0 11.5
7.0 3.0
8.0 7.0
Note: this solution also works with any number of columns.

Given the following two assumptions:
1. the first column of each input array is sorted,
2. the first column of each input array is unique,
then for most input combinations (i.e. number of input arrays and sizes of the arrays), the following algorithm should significantly outperform the other answers by taking advantage of these assumptions:
function f_ag(x::Matrix{T}...)::Matrix{T} where {T<:Number}
    isempty(x) && error("Empty input")
    any([ size(y,2) != 2 for y in x ]) && error("Input matrices must have two columns")
    length(x) == 1 && return copy(x[1]) # simple case shortcut
    nxmax = [ size(y,1) for y in x ]          # number of rows in each input
    nxarrinds = findall(nxmax .> 0)           # inputs with rows left (findall replaces find, removed in Julia 1.0)
    nxrowinds = ones(Int, length(nxarrinds))  # current row in each of those inputs
    z = Tuple{T,T}[]
    while !isempty(nxarrinds)
        # smallest key among the current rows
        xmin = minimum(T[ x[nxarrinds[j]][nxrowinds[j], 1] for j = 1:length(nxarrinds) ])
        minarrinds = Int[ j for j = 1:length(nxarrinds) if x[nxarrinds[j]][nxrowinds[j], 1] == xmin ]
        rowsum = sum(T[ x[nxarrinds[k]][nxrowinds[k], 2] for k in minarrinds ])
        push!(z, (xmin, rowsum))
        # advance every input that contributed to this key
        for k in minarrinds
            nxrowinds[k] += 1
        end
        # drop exhausted inputs (iterate backwards so deletions do not shift pending indices)
        for j = length(nxarrinds):-1:1
            if nxrowinds[j] > nxmax[nxarrinds[j]]
                deleteat!(nxrowinds, j)
                deleteat!(nxarrinds, j)
            end
        end
    end
    return [ z[n][j] for n = 1:length(z), j = 1:2 ]
end
If assumption 2 is violated, that is, the first column is not guaranteed to be unique, you can still take advantage of the sort order, but the algorithm is going to be more complicated again since you'll need to additionally look forward on each minimum index to check for duplicates. I'm not going to put myself through that pain at this point.
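Also note that you could adjust the following line:
rowsum = sum(T[ x[nxarrinds[k]][nxrowinds[k], 2] for k in minarrinds ])
to this:
rowsum = input_func([ x[nxarrinds[k]][nxrowinds[k], 2:end] for k in minarrinds ])
so that you can pass in whatever function you like (input_func is a placeholder) and have any number of additional columns in your input matrices. The element type of z and the final comprehension would need matching adjustments, since each rowsum is then no longer a single value of type T.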
There are probably some additional optimizations that could be added here, e.g. pre-allocating z or a specialized routine for the case of only two input matrices, but I'm not going to bother with them.
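To illustrate the idea of passing in an arbitrary reduction function, here is a minimal sketch that makes no assumptions about sorted or unique keys. It is not the merge-based algorithm above: it groups values in a Dict instead, trading speed for brevity. The name merge_by_key and its argument f are made up for this example, and it only handles two-column matrices.

```julia
# Group second-column values by first-column key, then reduce each group with `f`.
# `merge_by_key` and `f` are hypothetical names used only for this sketch.
function merge_by_key(f, mats::AbstractMatrix{T}...) where {T<:Number}
    groups = Dict{T,Vector{T}}()
    for m in mats, i in axes(m, 1)
        # get! creates an empty vector the first time a key is seen
        push!(get!(() -> T[], groups, m[i, 1]), m[i, 2])
    end
    ks = sort!(collect(keys(groups)))          # sorted keys for a stable row order
    return [ks map(k -> f(groups[k]), ks)]     # n×2 matrix: key, reduced value
end

x = [5.0 3.5; 6.0 3.6; 7.0 3.0]
y = [5.0 4.5; 6.0 4.7; 8.0 3.0]
z = [5.0 4.0; 6.0 3.2; 8.0 4.0]

sums  = merge_by_key(sum, x, y, z)
means = merge_by_key(v -> sum(v) / length(v), x, y, z)
```

Because f receives the full vector of values for each key, the same function covers both the sum and the mean variants of the question.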

Related

How to fill missing values in a loop using conditions in julia?

I would like to check the number of items in each iteration of a loop and, if there are fewer items than expected, fill the remainder with 0's. For example, I have created a loop which tries to access an array's elements in ranges of 10:
x = range(1, 100, length=45) |> collect
n = trunc(Int, length(x)/10) + 1
s = 1
l = 10
for i in 1:n
    print(x[s:l])
    s += 10
    l += 10
end
In the above code, the last iteration doesn't print any result because only 5 elements remain while the slice expects 10. Hence, I would like to know how I can check the number of elements in every iteration and, if there are fewer than expected, fill the rest with 0's.
Please suggest and advise on achieving the expected operation.
Thanks!
I think that PaddedViews is what you are looking for:
julia> using PaddedViews
julia> PaddedView(0, x, (ceil(Int, length(x)/10)*10,))
50-element PaddedView(0.0, ::Vector{Float64}, (Base.OneTo(50),)) with eltype Float64:
1.0
3.25
5.5
7.75
10.0
12.25
14.5
16.75
⋮
97.75
100.0
0.0
0.0
0.0
0.0
0.0
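If you would rather avoid a package, the same idea can be sketched in base Julia by clamping the upper index and copying each slice into a zero-filled buffer. The helper name padded_chunks and its width argument are made up for this illustration; this is one possible approach, not what PaddedViews does internally.

```julia
# Split `x` into chunks of `width` elements, padding the last chunk with zeros.
# `padded_chunks` is a hypothetical helper written for this sketch.
function padded_chunks(x::AbstractVector{T}, width::Integer) where {T}
    chunks = Vector{Vector{T}}()
    for s in 1:width:length(x)
        l = min(s + width - 1, length(x))   # clamp the upper bound to the array length
        chunk = zeros(T, width)             # start from an all-zero chunk
        chunk[1:(l - s + 1)] = x[s:l]       # copy whatever elements are available
        push!(chunks, chunk)
    end
    return chunks
end

x = collect(range(1, stop=100, length=45))
chunks = padded_chunks(x, 10)
foreach(println, chunks)                    # every chunk has exactly 10 elements
```

Unlike a PaddedView, this copies the data, which is fine for small arrays but wasteful for large ones.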

How to create two nested for loops in a single line in Julia

I have seen a few situations where someone wants to put two for loops on the same line, nested in one another.
Just to confirm, is this possible in Julia and if so what does it look like? Thanks!
Correct, Julia allows you to tersely express nested for loops.
As an example, consider filling in a 3x3 matrix in column order:
julia> xs = zeros(3,3)
3×3 Array{Float64,2}:
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
julia> let a = 1
           for j in 1:3, i in 1:3
               xs[i,j] = a
               a += 1
           end
       end
julia> xs
3×3 Array{Float64,2}:
1.0 4.0 7.0
2.0 5.0 8.0
3.0 6.0 9.0
The above loop is equivalent to this more verbose version:
julia> let a = 1
           for j in 1:3
               for i in 1:3
                   xs[i,j] = a
                   a += 1
               end
           end
       end
This syntax is even supported for higher dimensions(!):
julia> for k in 1:3, j in 1:3, i in 1:3
           @show (i, j, k)
       end
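One difference between the two forms is worth knowing: according to the Julia manual, a break inside a combined loop exits the entire nest of loops, not just the innermost one. A minimal sketch of that behaviour follows; the function names are made up for illustration.

```julia
# Count iterations until a `break` fires, in the single-line and nested forms.
function count_combined()
    n = 0
    for j in 1:3, i in 1:3
        i == 2 && break   # leaves the whole combined loop
        n += 1
    end
    return n
end

function count_nested()
    n = 0
    for j in 1:3
        for i in 1:3
            i == 2 && break   # leaves only the inner loop
            n += 1
        end
    end
    return n
end

println((count_combined(), count_nested()))
```

So the two forms are interchangeable only as long as the loop body does not rely on break to leave just the inner loop.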

Merge large number of arrays by common column values in julia

Expanding on a previous question I posted here, suppose we have a large number of arrays (say 500 arrays), like the first 3 shown below
5.0 3.5
6.0 3.6
7.0 3.0
5.0 4.5
6.0 4.7
8.0 3.0
5.0 4.0
6.0 3.2
8.0 4.0
and so on, stored in one array, so that we have an array of 500 arrays of the type above. I want to merge the 500 arrays into one array, by common values of the first column, calculating the mean values of the corresponding elements of the second column. The result must be the following array:
5.0 mean of all 5's values
6.0 mean of all 6's values
7.0 mean of all 7's values
8.0 mean of all 8's values
How can I achieve that? Thank you!
Back again with a slight modification of https://stackoverflow.com/a/50842721/2001017 (sortslices replaces sortrows, which was removed in Julia 1.0):
function aggregate(m::Array{<:Array{<:Number,2},1})
    # stack all inputs and sort the rows by key (first column)
    result = sortslices(vcat(m...), dims=1)
    n = size(result,1)
    if n <= 1
        return result
    end
    key_idx = 1
    key = result[key_idx,1]
    count = 1
    for i in 2:n
        if key == result[i,1]
            # same key: accumulate and count the contribution
            result[key_idx,2:end] += result[i,2:end]
            count += 1
        else
            # new key: finish the mean of the previous row, then start the next one
            result[key_idx,2:end] /= count
            count = 1
            key = result[i,1]
            key_idx += 1
            result[key_idx,1] = key
            result[key_idx,2:end] = result[i,2:end]
        end
    end
    result[key_idx,2:end] /= count
    return result[1:key_idx,:]
end
Demo:
x = [5.0 3.5
6.0 3.6
7.0 3.0]
y = [5.0 4.5
6.0 4.7
8.0 3.0]
z = [5.0 4.0
6.0 3.2
8.0 4.0]
a=[x,y,z]
julia> a
3-element Array{Array{Float64,2},1}:
[5.0 3.5; 6.0 3.6; 7.0 3.0]
[5.0 4.5; 6.0 4.7; 8.0 3.0]
[5.0 4.0; 6.0 3.2; 8.0 4.0]
julia> aggregate(a)
4×2 Array{Float64,2}:
5.0 4.0
6.0 3.83333
7.0 3.0
8.0 3.5
Here's a version that is ~6 times faster than the answer from @PicaudVincent (based on his input data), but which doesn't sort the keys, so the rows of the returned matrix are in arbitrary order:
function accumarrays(A::Vector{Matrix{T}}) where {T}
    # key => (running sum, count)
    d = Dict{T, Tuple{T, Int}}()
    for a in A
        for i in axes(a, 1)   # axes replaces indices (removed in Julia 1.0)
            ai = a[i, 1]
            d[ai] = get(d, ai, (zero(T), 0)) .+ (a[i, 2], 1)
        end
    end
    # Julia 1.x needs undef to allocate an uninitialized matrix
    Aout = Matrix{typeof(one(T)/1)}(undef, length(d), 2)
    i = 0
    for (key, val) in d
        Aout[i+=1, 1] = key
        Aout[i, 2] = val[1] / val[2]
    end
    return Aout
end
If you need the rows to be sorted, this works, but is just 4-5 times faster:
function accumarrays_(A::Vector{Matrix{T}}) where {T}
    # key => (running sum, count)
    d = Dict{T, Tuple{T, Int}}()
    for a in A
        for i in axes(a, 1)   # axes replaces indices (removed in Julia 1.0)
            ai = a[i, 1]
            d[ai] = get(d, ai, (zero(T), 0)) .+ (a[i, 2], 1)
        end
    end
    dkeys = sort!(collect(keys(d)))
    # Julia 1.x needs undef to allocate an uninitialized matrix
    Aout = Matrix{typeof(one(T)/1)}(undef, length(dkeys), 2)
    for i in eachindex(dkeys)
        val = d[dkeys[i]]
        Aout[i, 1] = dkeys[i]
        Aout[i, 2] = val[1] / val[2]
    end
    return Aout
end

What's Julia's equivalent of R's seq(..., length.out = n)

I can see from this link that the Julia equivalent of R's seq is n:m (http://www.johnmyleswhite.com/notebook/2012/04/09/comparing-julia-and-rs-vocabularies/).
But the case of seq(a,b, length.out = n) is not covered.
For example seq(1, 6, length.out=3) gives c(1.0, 3.5, 6.0). It is a really nice way to specify the number of outputs.
What's its equivalent in Julia?
As of Julia 1.0:
linspace has been deprecated. You can still use range:
julia> range(0, stop = 5, length = 3)
0.0:2.5:5.0
As @TasosPapastylianou noted, if you want this to be a vector of values, you can use collect:
julia> collect( range(0, stop = 5, length = 3) )
3-element Array{Float64,1}:
0.0
2.5
5.0
For Julia versions before 1.0, you are looking for the linspace function, which mirrors the function of the same name in MATLAB / Octave.
Also note that this returns a range-type object (a StepRangeLen) rather than an array:
julia> a = linspace(1,5,9)
1.0:0.5:5.0
julia> typeof(a)
StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}
julia> collect(a)
9-element Array{Float64,1}:
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
PS: similarly, there exists a range function which is equivalent to the start:step:stop syntax, similar to the seq(from=, to=, by=) syntax in R.
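As a brief recap for Julia 1.x, the forms discussed in these answers can be compared side by side using the example from the question. This is a minimal sketch assuming Julia 1.0 or later; LinRange is an additional Base type that the answers above do not mention.

```julia
# Three ways to get 3 evenly spaced values from 1 to 6 (R: seq(1, 6, length.out = 3)).
r1 = range(1, stop = 6, length = 3)   # lazy range; the step is derived from the length
r2 = 1:2.5:6                          # start:step:stop, like seq(from=, to=, by=)
r3 = LinRange(1, 6, 3)                # lazy range defined by its endpoints and length

v = collect(r1)                       # materialize as a Vector{Float64}
println(v)
```

All three are lazy objects; collect is only needed when an actual array is required, for example to modify elements in place.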

Determinant in Fortran95

This Fortran code calculates the determinant of an nxn matrix using the Laplace formula (expansion by minors). I fully understand how this process works mathematically.
But could somebody give me an insight into how the following code operates over, say, a given iteration? This section of the code contains the recursive function determinant(matrix) (assume some nxn matrix is read in and passed to it) and another function, cofactor, that it calls. There are aspects of the code I understand, but it's the recursion that is confusing me profoundly. I've tried to run through it step by step with a 3x3 matrix, but to no avail.
! Expansion of determinants using Laplace formula
recursive function determinant(matrix) result(laplace_det)
    real, dimension(:,:) :: matrix
    integer :: msize(2), i, n
    real :: laplace_det, det
    real, dimension(:,:), allocatable :: cf
    msize = shape(matrix)
    n = msize(1)
    if (n .eq. 1) then
        det = matrix(1,1)
    else
        det = 0
        do i=1, n
            allocate(cf(n-1, n-1))
            cf = cofactor(matrix, i, 1)
            det = det + ((-1)**(i+1))* matrix(i,1) * determinant(cf)
            deallocate(cf)
        end do
    end if
    laplace_det = det
end function determinant
function cofactor(matrix, mI, mJ)
    real, dimension(:,:) :: matrix
    integer :: mI, mJ
    integer :: msize(2), i, j, k, l, n
    real, dimension(:,:), allocatable :: cofactor
    msize = shape(matrix)
    n = msize(1)
    allocate(cofactor(n-1, n-1))
    l=0
    k = 1
    do i=1, n
        if (i .ne. mI) then
            l = 1
            do j=1, n
                if (j .ne. mJ) then
                    cofactor(k,l) = matrix(i,j)
                    l = l+ 1
                end if
            end do
            k = k+ 1
        end if
    end do
    return
end function cofactor
The main section I'm struggling with is these two calls and the operation of the respective cofactor calculation.
cf = cofactor(matrix, i, 1)
det = det + ((-1)**(i+1))* matrix(i,1) * determinant(cf)
Any input for an explanation would be greatly appreciated (as I said, an example of one iteration). This is my first post on Stack Overflow, as most of my questions reside on Math Stack Exchange (as you can probably tell from the mathematical nature of the question). Although I do have experience programming, the concept of recursion (especially in this example) is really boggling my mind.
If any edits are required, please go ahead; I'm not familiar with the etiquette on Stack Overflow.
Let us suppose that we pass the following 3x3 matrix to determinant():
2 9 4
7 5 3
6 1 8
In the routine, the following two lines are executed iteratively for i = 1,2,3:
cf = cofactor(matrix, i, 1)
det = det + ((-1)**(i+1))* matrix(i,1) * determinant(cf)
which corresponds to the Laplace expansion with respect to the first column. More specifically, one passes the above 3x3 matrix to cofactor() to get a 2x2 sub-matrix by removing the i-th row and 1st column of the matrix. The obtained 2x2 sub-matrix (cf) is then passed to determinant() in the next line to calculate the co-factor corresponding to this sub-matrix. So, in this first round of iterations, we are trying to calculate
det(A) = (+1)*2*det([5 3; 1 8]) + (-1)*7*det([9 4; 1 8]) + (+1)*6*det([9 4; 5 3])
Note here that the three determinants in the right-hand side are yet to be calculated by subsequent calls of determinant(). Let us consider one such subsequent call, e.g. for i=1. We are passing the following sub-matrix (stored in cf)
5 3
1 8
to determinant(). Then, the same procedure as described above is repeated, independently of the Laplace expansion for the parent 3x3 matrix. That is, determinant() now iterates over i=1,2 and tries to calculate
det([5 3; 1 8]) = (+1)*5*det([8]) + (-1)*1*det([3]) = 5*8 - 1*3 = 37
Note that the i in this subsequent call is different from the i of the previous call; they are local variables living inside a particular call of the routine and are totally independent of each other. Also note that the indices of a dummy array argument (like matrix(:,:)) always start from 1 in Fortran (unless otherwise specified). These operations are repeated until the size of the sub-matrix becomes 1.
But in practice, I believe that the easiest way to understand this kind of code is to write intermediate data and track the actual flow of data/routines. For example, we can insert a lot of print statements as
module mymod
    implicit none
contains
recursive function determinant(matrix) result(laplace_det)
    real :: matrix(:,:)
    integer :: i, n, p, q
    real :: laplace_det, det
    real, allocatable :: cf(:,:)
    n = size(matrix, 1)
    !***** output *****
    print "(a)", "Entering determinant() with matrix = "
    do p = 1, n
        print "(4x,100(f3.1,1x))", ( matrix( p, q ), q=1,n )
    enddo
    if (n == 1) then
        det = matrix(1,1)
    else
        det = 0
        do i = 1, n
            allocate( cf(n-1, n-1) )
            cf = cofactor( matrix, i, 1 )
            !***** output *****
            print "(4x,a,i0,a,i0,a)", "Getting a ", &
                    n-1, "-by-", n-1, " sub-matrix from cofactor():"
            do p = 1, n-1
                print "(8x, 100(f3.1,1x))", ( cf( p, q ), q=1,n-1 )
            enddo
            print "(4x,a)", "and passing it to determinant()."
            det = det + ((-1)**(i+1))* matrix( i, 1 ) * determinant( cf )
            deallocate(cf)
        end do
    end if
    laplace_det = det
    !***** output *****
    print *, " ---> Returning det = ", det
end function
function cofactor(matrix, mI, mJ)
    .... (same as the original code)
end function
end module
program main
    use mymod
    implicit none
    real :: a(3,3), det
    a( 1, : ) = [ 2.0, 9.0, 4.0 ]
    a( 2, : ) = [ 7.0, 5.0, 3.0 ]
    a( 3, : ) = [ 6.0, 1.0, 8.0 ]
    det = determinant( a )
    print "(a, es30.20)", "Final det = ", det
end program
then the output clearly shows how the data are processed:
Entering determinant() with matrix =
2.0 9.0 4.0
7.0 5.0 3.0
6.0 1.0 8.0
Getting a 2-by-2 sub-matrix from cofactor():
5.0 3.0
1.0 8.0
and passing it to determinant().
Entering determinant() with matrix =
5.0 3.0
1.0 8.0
Getting a 1-by-1 sub-matrix from cofactor():
8.0
and passing it to determinant().
Entering determinant() with matrix =
8.0
---> Returning det = 8.0000000
Getting a 1-by-1 sub-matrix from cofactor():
3.0
and passing it to determinant().
Entering determinant() with matrix =
3.0
---> Returning det = 3.0000000
---> Returning det = 37.000000
Getting a 2-by-2 sub-matrix from cofactor():
9.0 4.0
1.0 8.0
and passing it to determinant().
Entering determinant() with matrix =
9.0 4.0
1.0 8.0
Getting a 1-by-1 sub-matrix from cofactor():
8.0
and passing it to determinant().
Entering determinant() with matrix =
8.0
---> Returning det = 8.0000000
Getting a 1-by-1 sub-matrix from cofactor():
4.0
and passing it to determinant().
Entering determinant() with matrix =
4.0
---> Returning det = 4.0000000
---> Returning det = 68.000000
Getting a 2-by-2 sub-matrix from cofactor():
9.0 4.0
5.0 3.0
and passing it to determinant().
Entering determinant() with matrix =
9.0 4.0
5.0 3.0
Getting a 1-by-1 sub-matrix from cofactor():
3.0
and passing it to determinant().
Entering determinant() with matrix =
3.0
---> Returning det = 3.0000000
Getting a 1-by-1 sub-matrix from cofactor():
4.0
and passing it to determinant().
Entering determinant() with matrix =
4.0
---> Returning det = 4.0000000
---> Returning det = 7.0000000
---> Returning det = -360.00000
Final det = -3.60000000000000000000E+02
You can insert more print lines until the whole mechanism becomes clear.
BTW, the code on the Rosetta page seems much simpler than the OP's code because it creates a sub-matrix directly as a local array. The simplified version of the code reads
recursive function det_rosetta( mat, n ) result( accum )
    integer :: n
    real :: mat(n, n)
    real :: submat(n-1, n-1), accum
    integer :: i, sgn
    if ( n == 1 ) then
        accum = mat(1,1)
    else
        accum = 0.0
        sgn = 1
        do i = 1, n
            submat( 1:n-1, 1:i-1 ) = mat( 2:n, 1:i-1 )
            submat( 1:n-1, i:n-1 ) = mat( 2:n, i+1:n )
            accum = accum + sgn * mat(1, i) * det_rosetta( submat, n-1 )
            sgn = - sgn
        enddo
    endif
end function
Note that the Laplace expansion is made along the first row, and that the submat is assigned using array sections. The assignment can also be written simply as
submat( :, :i-1 ) = mat( 2:, :i-1 )
submat( :, i: ) = mat( 2:, i+1: )
where the upper and lower bounds of the array sections are omitted (then, the declared values of upper and lower bounds are used by default). The latter form is used in the Rosetta page.
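For readers coming from the Julia threads on this page, the same first-row expansion can be sketched in Julia, which may make the recursion easier to experiment with interactively. This is an illustrative translation, not code from the Rosetta page; the name laplace_det is made up, and the O(n!) cost makes it unsuitable for anything but small matrices.

```julia
# Recursive Laplace expansion along the first row (illustration only, O(n!) cost).
# `laplace_det` is a hypothetical name chosen for this sketch.
function laplace_det(mat::AbstractMatrix{T}) where {T<:Number}
    n = size(mat, 1)
    n == size(mat, 2) || throw(ArgumentError("matrix must be square"))
    n == 1 && return mat[1, 1]          # base case that ends the recursion
    accum = zero(T)
    sgn = one(T)
    for i in 1:n
        # Drop row 1 and column i to form the (n-1)x(n-1) sub-matrix.
        submat = mat[2:n, [1:i-1; i+1:n]]
        accum += sgn * mat[1, i] * laplace_det(submat)
        sgn = -sgn                      # alternate the sign of each term
    end
    return accum
end

a = [2.0 9.0 4.0
     7.0 5.0 3.0
     6.0 1.0 8.0]
println(laplace_det(a))   # -360.0, matching the Fortran output above
```

Each recursive call builds its own submat and has its own loop variable i, which is the same independence between calls that the traced Fortran output demonstrates.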
