I have two multi-dimensional arrays, A3D and B3D. The array A3D has dimensions 2 x 2 x n, while the array B3D has dimensions m x 2 x n. Every 2D subarray in A3D is a symmetric matrix. For every i, I want to compute B3D[:,:,i] * A3D[:,:,i] * transpose(B3D[:,:,i]) and store the result in a multi-dimensional array. I tried the following Julia code to accomplish the task. However, the computational time with my code was around 4 seconds, which is quite slow. I am wondering whether the performance of my code could be improved. Below is my Julia code. Thanks for looking at my problem.
m = 100;
n = 30_000; # this could be a very large number.
A3D = rand(2,2,n);
[A3D[:,:,i] = Symmetric(A3D[:,:,i]) for i in 1:n];
B3D = rand(m,2,n);
res3D = zeros(m,m,n);
# approach 1
@time [res3D[:,:,i] = eB*eA*transpose(eB)
for (eA, eB, i) in zip(eachslice(A3D,dims=3),eachslice(B3D,dims=3),1:n)];
UPDATE:
I added another approach to tackle my problem (see below). Approach 2 is a bit better than approach 1, but can the performance of my code be improved even further?
# approach 2
@inbounds for i = 1:n
res3D[:,:,i] = B3D[:,:,i]*A3D[:,:,i]*B3D[:,:,i]';
end
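Both approaches above allocate fresh temporaries on every iteration, because slicing with A3D[:,:,i] copies and because the two-matrix product also allocates. A possible further improvement (a rough, unbenchmarked sketch; the function name bab! is just something I made up) is to use views and in-place mul! with one preallocated buffer:

using LinearAlgebra

function bab!(res3D, A3D, B3D)
    m = size(B3D, 1)
    tmp = zeros(m, 2)                          # reusable buffer for B*A
    @inbounds for i in axes(A3D, 3)
        A = @view A3D[:, :, i]
        B = @view B3D[:, :, i]
        mul!(tmp, B, A)                        # tmp = B * A
        mul!(@view(res3D[:, :, i]), tmp, B')   # res3D[:,:,i] = tmp * B'
    end
    return res3D
end

@time bab!(res3D, A3D, B3D);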
Let's say I have a vector V, and I want to either turn this vector into multiple m x n matrices, or get multiple m x n matrices from this Vector V.
For the most basic example: Turn V = collect(1:75) into 3 5x5 matrices.
As far as I am aware this can be done by first using reshape reshape(V, 5, :) and then looping through it. Is there a better way in Julia without using a loop?
If possible, a solution that can easily change between row-major and column-major results is preferable.
TL;DR
m, n, n_matrices = 4, 2, 5
V = collect(1:m*n*n_matrices)
V = reshape(V, m, n, :)
V = permutedims(V, [2,1,3])
display(V)
From my limited knowledge about Julia:
When doing V = collect(1:m*n*n_matrices), you initialize a contiguous array in memory. From V you wish to create a container of m by n matrices. You can achieve this by doing reshape(V, m, n, :); then you can access the first matrix with V[:,:,1]. The "container" in this case is just another array (so you end up with a three-dimensional array), which here we interpret as "an array of matrices" (but you could also interpret it as a box). You can then transpose every matrix in your array by swapping the first two dimensions: permutedims(V, [2,1,3]).
How this works
From what I understand, n-dimensional arrays in Julia are contiguous arrays in memory as long as you don't do any "skipping" (e.g. V[1:2:end]). For example, the 2 x 4 matrix A:
1 3 5 7
2 4 6 8
is in memory just 1 2 3 4 5 6 7 8. You simply interpret the data in a specific way: the first two numbers make up the first column, the next two numbers make up the next column, and so on. The reshape function simply specifies how you want to interpret the data in memory. So if we did reshape(A, 4, 2), we would interpret the numbers in memory as "the first four values make the first column, the second four values make the second column", and we would get:
1 5
2 6
3 7
4 8
We are basically doing the same thing here, but with an extra dimension.
From my observations, it also seems that permutedims reallocates memory in this case. Also, feel free to correct me if I am wrong.
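That observation is easy to check with a small sketch (the variable names here are just for illustration): reshape shares memory with the original vector, while permutedims allocates a copy:

V = collect(1:12)
R = reshape(V, 3, 2, :)          # 3 x 2 x 2, same memory as V
P = permutedims(R, [2, 1, 3])    # 2 x 3 x 2, freshly allocated
R[1] = 99
V[1] == 99          # true: reshape is just another view of V's data
P[1, 1, 1] == 99    # false: permutedims copied the data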
Old answer:
I don't know much about Julia, but in Python using NumPy I would have done something like this:
np.reshape(V, (-1, m, n))
EDIT: As @BatWannaBe states, the result is technically one array (but three-dimensional). You can always interpret a three-dimensional array as a container of 2D arrays, which from my understanding is what you ask for.
I need a function is_coprime(m, n) that answers in constant time. In my case, 1 < m, n < 100, so I can easily pre-compute the values and store them in a 100x100 array.
My approach would be to store gcd(m, n) == 1 for each (m, n) index in the array. However, this is also time-consuming, so I was wondering whether there is a well-known algorithmic solution for this type of problem, where all gcd / coprimality values over a 1 to N range are needed.
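Since no language is specified, here is a rough sketch in Julia of a sieve-style precomputation that avoids computing any gcds (the function names are mine, not from a library). Two numbers are coprime unless some d >= 2 divides both, so it suffices to mark every pair of multiples of each d:

function coprime_table(N)
    table = trues(N, N)
    for d in 2:N
        for m in d:d:N, n in d:d:N
            table[m, n] = false    # m and n share the factor d
        end
    end
    return table
end

is_coprime(table, m, n) = table[m, n]

table = coprime_table(100)
is_coprime(table, 6, 9)   # false (common factor 3)
is_coprime(table, 8, 9)   # true

The nested loops perform roughly N^2 * (1/4 + 1/9 + ...) < N^2 iterations in total, and each lookup afterwards is a single array access.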
I'm learning Julia, but have relatively little programming experience outside of R. I'm taking this problem directly from rosalind.info and you can find it here if you'd like a bit more detail.
I'm given two strings: a motif and a sequence, where the motif is a substring of the sequence, and I'm tasked with finding the index of the beginning position of the substring each time it occurs in the sequence.
For example:
Sequence: "GATATATGCATATACTT"
Motif: "ATAT"
ATAT is found three times, once beginning at index 2, once at index 4, and once at index 10. This is assuming 1-based indexing. So the final output would be: 2 4 10
Here's what I have so far:
f = open("motifs.txt")
stream = readlines(f)
sequence = chomp(stream[1])
motif = chomp(stream[2])
println("Sequence: $sequence")
println("Motif: $motif")
result = searchindex(sequence, motif)
println("$result")
close(f)
My main problem seems to be that there isn't a searchindexall option. The current script gives me only the index of the first time the motif is encountered (index 2). I've tried a variety of for loops that haven't ended in much success, so I'm hoping that someone can give me some insight on this.
Here is one solution with while loops:
sequence = "GATATATGCATATACTT"
motif = "ATAT"
function find_indices(sequence, motif)
# initialise empty array of integers
found_indices = Array{Int, 1}()
# set initial values for search helpers
start_at = 1
while true
# search string for occurrence of motif
result = searchindex(sequence, motif, start_at)
# if motif not found, terminate while loop
result == 0 && break
# add new index to results
push!(found_indices, result-1+start_at)
start_at += result + 1
end
return found_indices
end
This gives what you want:
julia> find_indices(sequence, motif)
2
4
10
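On current Julia (1.x) searchindex no longer exists, but the same idea can be written with findfirst/findnext, which return the index range of a match or nothing. This is a sketch, not part of the original answer, and the function name is mine; positions are character indices:

function find_all_indices(sequence, motif)
    found = Int[]
    r = findfirst(motif, sequence)
    while r !== nothing
        push!(found, first(r))
        r = findnext(motif, sequence, first(r) + 1)   # + 1 keeps overlapping matches
    end
    return found
end

find_all_indices("GATATATGCATATACTT", "ATAT")   # [2, 4, 10]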
If performance is not critical, a regular expression can be a good choice.
julia> map(x->x.offset, eachmatch(r"ATAT", "GATATATGCATATACTT", true))
3-element Array{Any,1}:
2
4
10
PS. The third argument of eachmatch means "overlap"; don't forget to set it to true.
If better performance is required, you may want to spend some time implementing an algorithm like Knuth-Morris-Pratt (KMP), as sketched below.
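In case it helps, here is a rough sketch of such a KMP search in Julia, returning all (possibly overlapping) match positions as character indices. It is not part of the answer above; treat it as an illustration rather than production code:

function kmp_search(sequence::AbstractString, motif::AbstractString)
    s, p = collect(sequence), collect(motif)
    m = length(p)
    # failure function: length of the longest proper prefix of p[1:i] that is also a suffix
    fail = zeros(Int, m)
    k = 0
    for i in 2:m
        while k > 0 && p[k+1] != p[i]
            k = fail[k]
        end
        if p[k+1] == p[i]
            k += 1
        end
        fail[i] = k
    end
    # scan the sequence, reusing the failure function on mismatches
    hits = Int[]
    k = 0
    for i in eachindex(s)
        while k > 0 && p[k+1] != s[i]
            k = fail[k]
        end
        if p[k+1] == s[i]
            k += 1
        end
        if k == m
            push!(hits, i - m + 1)
            k = fail[k]          # fall back so overlapping matches are found
        end
    end
    return hits
end

kmp_search("GATATATGCATATACTT", "ATAT")   # [2, 4, 10]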
I am trying to create a function that takes the sum of the first n odd integers, i.e the summation from i=1 to n of (2i-1).
If n = 1 it should output 1
If n = 2 it should output 4
I'm having problems using a for loop, which only outputs the nth term:
n <-2
for (i in 1:n)
{
y<-((2*i)-1)
}
y
In R programming we try to avoid for loops:
cumsum(seq(1, 2*n, by = 2))
Or just use 'sum' if you don't want the series of partial sums.
There's actually no need to use a loop or to construct the sequence of the first n odd numbers here -- this is an arithmetic series, and the sum of its first n elements has the closed form 2 * n*(n+1)/2 - n = n^2:
sum.first.n.odd <- function(n) n^2
sum.first.n.odd(1)
[1] 1
sum.first.n.odd(2)
[1] 4
sum.first.n.odd(100)
[1] 10000
This should be a good deal more efficient than any solution based on for or sum because it never computes the elements of the sequence.
[[Just seeing the title -- the OP apparently knows the analytic result and wanted something else...]]
Try this:
sum=0
n=2
for(i in seq(1,2*n,2)){
sum=sum+i
}
But, of course, R is rather slow when working with loops. That's why one should avoid them.
In the following code I am using the Julia Optim package for finding an optimal matrix with respect to an objective function.
Unfortunately, the provided optimize function only supports vectors, so I have to transform the matrix into a vector before passing it to the optimize function, and also transform it back when using it in the objective function.
function opt(A0,X)
I1(A) = sum(maximum(X*A,1))
function transform(A)
# reshape matrix to vector
return reshape(A,prod(size(A)))
end
function transformback(tA)
# reshape vector to matrix
return reshape(tA, size(A0))
end
obj(tA) = -I1(transformback(tA))
result = optimize(obj, transform(A0), method = :nelder_mead)
return transformback(result.minimum)
end
I think Julia is allocating new space for this every time and it feels slow, so what would be a more efficient way to tackle this problem?
As long as an array's elements are of an immutable type, which includes all primitives, the elements are stored in one big contiguous block of memory. So you can break the dimension rules and simply treat a 2-dimensional array as a 1-dimensional array, which is what you want to do. So you don't actually need to reshape -- but I don't think reshape is your problem.
Arrays are column major and contiguous
Consider the following function
function enumerateArray(a)
for i = 1:*(size(a)...)
print(a[i])
end
end
This function multiplies all of the dimensions of a together and then loops from 1 to that number, indexing a as if it were one-dimensional.
When you define a as follows:
julia> a = [ 1 2; 3 4; 5 6]
3x2 Array{Int64,2}:
1 2
3 4
5 6
The result is
julia> enumerateArray(a)
135246
This illustrates a couple of things:
1. Yes, it actually works.
2. Matrices are stored in column-major format.
reshape
So, the question is: why doesn't reshape use that fact? Well, it does. Here's the Julia source for reshape in array.c:
a = (jl_array_t*)allocobj((sizeof(jl_array_t) + sizeof(void*) + ndimwords*sizeof(size_t) + 15)&-16);
So yes, a new array object is created, but only the new dimension information is allocated; it points back to the original data, which is not copied. You can verify this simply like this:
b = reshape(a,6);
julia> size(b)
(6,)
julia> size(a)
(3,2)
julia> b[4]=100
100
julia> a
3x2 Array{Int64,2}:
1 100
3 4
5 6
So setting the 4th element of b sets the (1,2) element of a.
As for overall slowness
I1(A) = sum(maximum(X*A,1))
will allocate new temporary arrays (one for the product X*A and another for the column maxima) on every call.
You can use a couple of macros to track this down: @profile and @time. @time will additionally record the amount of memory allocated and can be put in front of any expression.
For example
julia> A = rand(1000,1000);
julia> X = rand(1000,1000);
julia> @time sum(maximum(X*A,1))
elapsed time: 0.484229671 seconds (8008640 bytes allocated)
266274.8435928134
The statistics recorded by @profile are output using Profile.print()
Also, most methods in Optim actually allow you to supply Arrays, not just Vectors. You could generalize the nelder_mead function to do the same.
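For what it's worth, on more recent versions of Optim.jl the reshaping can often be dropped entirely: optimize accepts array-shaped starting points, and the method is passed as NelderMead() rather than :nelder_mead. A sketch under those assumptions (also using the post-0.7 maximum(...; dims = 1) syntax), not a tested drop-in replacement for the code above:

using Optim

function opt(A0, X)
    I1(A) = sum(maximum(X * A, dims = 1))
    obj(A) = -I1(A)                    # A keeps its matrix shape throughout
    result = optimize(obj, A0, NelderMead())
    return Optim.minimizer(result)
end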