I have a code which follows particles and outputs the particles at N timeslices. So what I have in the end is an array TEMP(1:M,0:N) where M is the total number of particles. Now the problem is easily made parallel by dividing up the particles. So each processor does work on an array TEMP(MYSTART:MYEND,0:N), while the master does it's work in TEMP(1:M,0:N). Now I need to recombine the array (there are actually multiple arrays like this but focus on one for now). Using a SENDV-RECV loop I coded up:
IF (myid == master) THEN
ALLOCATE(B_lines(1:nlines,0:nsteps),...
ELSE
ALLOCATE(B_lines(mystart:myend,0:nsteps),...
END IF
CALL MPI_BARRIER(MPI_COMM_FIELDLINES,ierr_mpi)
IF (ierr_mpi /=0) CALL andle_err(MPI_BARRIER_ERR,'fieldlines_init',ierr_mpi)
IF (myid == master) THEN
ALLOCATE(buffer_mast(4,0:nsteps))
DO i = myend+1, nlines
CALL MPI_RECV(buffer_mast,4*(nsteps+1),MPI_DOUBLE_PRECISION,&
MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_FIELDLINES,status,ierr_mpi)
IF (ierr_mpi /=0) CALL handle_err(MPI_RECV_ERR,'fieldlines_init_mgrid',ierr_mpi)
sender = status(MPI_SOURCE)
j = status(MPI_TAG)
R_lines(j,:) = buffer_mast(1,:)
Z_lines(j,:) = buffer_mast(2,:)
PHI_lines(j,:) = buffer_mast(3,:)
B_lines(j,:) = buffer_mast(4,:)
END DO
DEALLOCATE(buffer_mast)
ELSE
IF (mystart <= nlines) THEN
ALLOCATE(buffer_slav(4,0:nsteps))
DO j = mystart, myend
buffer_slav(1,:) = R_lines(j,:)
buffer_slav(2,:) = Z_lines(j,:)
buffer_slav(3,:) = PHI_lines(j,:)
buffer_slav(4,:) = B_lines(j,:)
CALL MPI_SEND(buffer_slav,4*(nsteps+1),MPI_DOUBLE_PRECISION,master,j,MPI_COMM_FIELDLINES,ierr_mpi)
IF (ierr_mpi /=0) CALL handle_err(MPI_SEND_ERR,'fieldlines_init_mgrid',ierr_mpi)
END DO
DEALLOCATE(buffer_slav)
END IF
END IF
Now this works just fine but scales poorly. Even with only 64 cores the code spends a great deal of time just sending the data back and forth. Now I'd like to take advantage of GATHERV. So I created a subroutine which I call like:
CALL FIELDLINES_TRANSMIT_2DDBL(mystart,myend,0,nsteps,B_lines(mystart:myend,0:nsteps),&
numprocs,mnum,moffsets,myid,master,MPI_COMM_FIELDLINES,ier)
And looks like:
SUBROUTINE FIELDLINES_TRANSMIT_2DDBL(n1,n2,m1,m2,data_in,nproc,mnum,moffsets,id,root,COMM_local,ier)
USE stel_kinds, ONLY: rprec
IMPLICIT NONE
INCLUDE 'mpif.h' ! MPI
INTEGER, INTENT(in) :: n1,n2,m1,m2,nproc,id,root,COMM_local
INTEGER, INTENT(in) :: mnum(nproc), moffsets(nproc)
REAL(rprec), INTENT(inout) :: data_in(n1:n2,m1:m2)
INTEGER, INTENT(inout) :: ier
INTEGER, PARAMETER :: ndims=2
INTEGER, PARAMETER :: sstart(2) = (/0,0/) ! Starting offsets
INTEGER :: dbl_size, localsize, ARRAY_SEND_TYPE, RESIZED_ARRAY_SEND_TYPE
INTEGER :: asize(ndims), ssize(ndims), mrec(nproc)
INTEGER(KIND=MPI_ADDRESS_KIND):: low_bound,extent
DOUBLE PRECISION, ALLOCATABLE :: buffer_temp(:,:)
IF (ier <0) RETURN
mrec = 1
ssize(1) = n2-n1+1
ssize(2) = m2-m1+1
localsize = mnum(id+1)
ALLOCATE(buffer_temp(ssize(1),ssize(2)))
buffer_temp(1:ssize(1),1:ssize(2)) = data_in(n1:n2,m1:m2)
asize = ssize
CALL MPI_BCAST(asize, 2, MPI_INTEGER, root, COMM_local, ier)
CALL MPI_TYPE_CREATE_SUBARRAY(ndims,asize,ssize,sstart,MPI_ORDER_FORTRAN,&
MPI_DOUBLE_PRECISION,ARRAY_SEND_TYPE,ier)
CALL MPI_TYPE_COMMIT(ARRAY_SEND_TYPE,ier)
CALL MPI_TYPE_SIZE(MPI_DOUBLE_PRECISION, dbl_size,ier)
low_bound = 0
extent = dbl_size
CALL MPI_TYPE_CREATE_RESIZED(ARRAY_SEND_TYPE,low_bound,extent,RESIZED_ARRAY_SEND_TYPE,ier)
CALL MPI_TYPE_COMMIT(RESIZED_ARRAY_SEND_TYPE,ier)
IF (id == root) THEN
localsize = PRODUCT(ssize)
mrec(1) = localsize
CALL MPI_GATHERV(MPI_IN_PLACE,1,MPI_DOUBLE_PRECISION,&
data_in,mrec, moffsets,RESIZED_ARRAY_SEND_TYPE,&
root,COMM_local,ier)
ELSE
CALL MPI_GATHERV(buffer_temp,localsize,MPI_DOUBLE_PRECISION,&
buffer_temp,mrec, moffsets,RESIZED_ARRAY_SEND_TYPE,&
root,COMM_local,ier)
END IF
CALL MPI_TYPE_FREE(ARRAY_SEND_TYPE,ier); IF (ier <0) RETURN
CALL MPI_TYPE_FREE(RESIZED_ARRAY_SEND_TYPE,ier); IF (ier <0) RETURN
DEALLOCATE(buffer_temp)
ier = 0
CALL MPI_BARRIER(COMM_local, ier)
RETURN
END SUBROUTINE FIELDLINES_TRANSMIT_2DDBL
Now I had another code which had opposite indexing (0:nsteps,1:nlines) and it appears to work just fine, but when I try to pass the arrays indexed (1:nlines,0:nsteps) the resulting array appears to all messed up. Any advice would be appreciated.
I'm not exactly clear about all the details of your setup, but here is a simple solution that illustrates how to do it under the restriction that the work divides exactly among processes, i.e. nlines is a multiple of size.
In this case, the pattern at the master is a simple vector (i.e. a 2D subarray). The only trick is to resize the vector appropriately so that the contributions line up correctly at the receiver. You don't need to use Gatherv - each process sends a bunch of doubles and the master receives a single vector from everyone.
The solution for the general case depends on how you split up the data when it does not divide exactly. You could use this solution if all the processes have the same amount of data except for the last one who has less data, but you pad out the data on the last process so that it can send and receive the same amount of data as the rest (including some dummy data).
I initialise the data so that, for example, at step 3 the values of the lines distributed across processes are 301, 302, 303, ... 300+nlines. If I run on 4 processes with 12 lines and 4 steps then the output looks correct:
laptop:~$ mpiexec -n 4 ./gather2d
rank, mystart, myend 2 7 9
rank, mystart, myend 1 4 6
rank, mystart, myend 0 1 3
rank, mystart, myend 3 10 12
1 101 201 301 401
2 102 202 302 402
3 103 203 303 403
4 104 204 304 404
5 105 205 305 405
6 106 206 306 406
7 107 207 307 407
8 108 208 308 408
9 109 209 309 409
10 110 210 310 410
11 111 211 311 411
12 112 212 312 412
Here's the code:
program gather2d
use mpi
implicit none
integer :: nlines, nsteps
integer :: comm, rank, size, ierr, arraytype, arraytyperesized
integer :: mystart, myend
integer :: i, j
integer :: dblesize
double precision, allocatable, dimension(:,:) :: B_lines
integer(kind=mpi_address_kind) lb, extent
comm = MPI_COMM_WORLD
nlines = 12
nsteps = 4
call MPI_Init(ierr)
call MPI_Comm_size(comm, size, ierr)
call MPI_Comm_rank(comm, rank, ierr)
mystart = nlines/size * rank + 1
myend = nlines/size * (rank+1)
write(*,*) 'rank, mystart, myend ', rank, mystart, myend
if (rank == 0) then
allocate(B_lines(1:nlines, 0:nsteps))
else
allocate(B_lines(mystart:myend, 0:nsteps))
end if
do i = mystart, myend
do j = 0, nsteps
B_lines(i,j) = i+100*j
end do
end do
call MPI_Type_vector(nsteps+1, myend-mystart+1, nlines, &
MPI_DOUBLE_PRECISION, arraytype, ierr)
call MPI_Type_size(MPI_DOUBLE_PRECISION, dblesize, ierr)
lb = 0
extent = (myend-mystart+1)*dblesize
call MPI_Type_Create_resized(arraytype, lb, extent, arraytyperesized, ierr)
call MPI_Type_commit(arraytyperesized, ierr)
if (rank == 0) then
call MPI_Gather(MPI_IN_PLACE, (nsteps+1)*(myend-mystart+1), &
MPI_DOUBLE_PRECISION, &
B_lines, 1, arraytyperesized, 0, comm, ierr)
else
call MPI_Gather(B_lines, (nsteps+1)*(myend-mystart+1), &
MPI_DOUBLE_PRECISION, &
B_lines, 1, arraytyperesized, 0, comm, ierr)
end if
if (rank == 0) then
do i = 1, nlines
write(*,*) (int(B_lines(i,j)), j = 0, nsteps)
end do
end if
call MPI_Finalize(ierr)
end program gather2d
OK so I found a memory hungry way to do this. Basically I put everything is globally sized arrays and use MPI_REDUCE. This is much faster than my old queue system but at the cost of memory. Here's the code
SUBROUTINE TRANSMIT_2DDBL(n1,n2,m1,m2,data_in,n1_gbl,n2_gbl,id,root,COMM_local,ier)
IMPLICIT NONE
INCLUDE 'mpif.h' ! MPI
INTEGER, INTENT(in) :: n1,n2,m1,m2,id,root,COMM_local
INTEGER, INTENT(in) :: n1_gbl, n2_gbl
DOUBLE PRECISION, INTENT(inout) :: data_in(n1:n2,m1:m2)
INTEGER, INTENT(inout) :: ier
INTEGER :: nt_gbl
DOUBLE PRECISION, ALLOCATABLE :: buffer_temp(:,:)
IF (ier <0) RETURN
nt_gbl=(n2_gbl-n1_gbl+1)*(m2-m1+1)
ALLOCATE(buffer_temp(n1_gbl:n2_gbl,m1:m2))
buffer_temp = 0
buffer_temp(n1:n2,m1:m2) = data_in(n1:n2,m1:m2)
IF (id == root) THEN
CALL MPI_REDUCE(MPI_IN_PLACE,buffer_temp,nt_gbl,MPI_DOUBLE_PRECISION,MPI_SUM,&
root,COMM_local,ier)
data_in = buffer_temp
ELSE
CALL MPI_REDUCE(buffer_temp,buffer_temp,nt_gbl,MPI_DOUBLE_PRECISION,MPI_SUM,&
root,COMM_local,ier)
END IF
DEALLOCATE(buffer_temp)
ier = 0
CALL MPI_BARRIER(COMM_local, ier)
RETURN
END SUBROUTINE TRANSMIT_2DDBL
Of course the root process needs to pass n1_gbl and n2_gbl as n1 and n2 respectively before entering. But it seems to improve performance for me.
Related
I've taken a project named with "Symbolic Linear Algebra" which is about doing basic operations on infinite matrices like addition, multiplication, accessing specific element etc. I will be implementing those on Julia.
For specifying those infinite matrices we'll have some mathematical cases like:
So the visual representation of matrix will be like:
For example let's say we want to find A + A' for this example. Here our cases change so we need to rewrite those cases to get desired output right ? I know Mathematica does this but how can I implement this? Yes, this was too general so let me ask some questions;
Let's start with taking cases as input. There can be many cases with different rules like if i % 2 == 0 or i == j like in this example how can I provide a generic input ?
Let's say that I'm done with input and I want to make those simple operations. How can I combine those cases in a programming language like Julia ?
I've wrote some non-generic dumb code to see how things will go so I will provide my code to apply minimum reproducible example but don't take it seriously, I think I'm just looking for a clue or a roadmap to get rid of the question marks in my head.
using Parameters
struct inf_matrix
mod_of :: Integer
mod_value :: Integer
i_coefficient :: Integer
j_coefficient :: Integer
value :: Integer
end
function single_demo(_mod_of :: Integer, _mod_value :: Integer, _i_coefficient :: Integer, _j_coefficient :: Integer, _value :: Integer)
test_matrix = inf_matrix(_mod_of, _mod_value, _i_coefficient, _j_coefficient, _value)
return test_matrix
end
function get_elem(st::inf_matrix ,i :: Integer, j :: Integer)
#This function is not completed yet
if (i % st.mod_of == st.mod_value) && (2 * st.i_coefficient == j)
return st.value;
else
return -1
end
end
demo_1 = single_demo(2, 0 ,1, 2, 1)
println(get_elem(demo_1, 1, 0))
Any help would be appreciated.
Here is how you could do this
import Base: getindex, +, *
abstract type InfiniteMatrix end
struct InfiniteIdentity <: InfiniteMatrix end
getindex(::InfiniteIdentity, i, j) = i .== j'
struct InfiniteConstant <: InfiniteMatrix
c
end
getindex(m::InfiniteConstant, i::Integer, j::Integer) = m.c
getindex(m::InfiniteConstant, i, j) = fill(m.c, size(i)..., size(j)...)
struct InfiniteMatrixFilter <: InfiniteMatrix
condition::Function
iftrue::InfiniteMatrix
iffalse::InfiniteMatrix
end
getindex(m::InfiniteMatrixFilter, i, j) = ifelse.(m.condition.(i,j'), m.iftrue[i,j], m.iffalse[i,j])
struct InfiniteMatrixFunction <: InfiniteMatrix
f::Function
args
end
getindex(m::InfiniteMatrixFunction, i, j) = m.f(getindex.(m.args, Ref(i), Ref(j))...)
+(m1::InfiniteMatrix, m2::InfiniteMatrix) = InfiniteMatrixFunction(+, (m1, m2))
*(n::Number, m::InfiniteMatrix) = InfiniteMatrixFunction(x -> n*x, (m,))
julia> i = InfiniteIdentity()
InfiniteIdentity()
julia> c1 = InfiniteConstant(1)
InfiniteConstant(1)
julia> (2i+3c1)[1:5, 1:5]
5×5 Array{Int64,2}:
5 3 3 3 3
3 5 3 3 3
3 3 5 3 3
3 3 3 5 3
3 3 3 3 5
julia> m = InfiniteMatrixFilter((i,j) -> i%2 == 0, c1, 0c1)
InfiniteMatrixFilter(var"#43#44"(), InfiniteConstant(1), InfiniteMatrixFunction(var"#41#42"{Int64}(0), (InfiniteConstant(1),)))
julia> m[1:5, 1:5]
5×5 Array{Int64,2}:
0 0 0 0 0
1 1 1 1 1
0 0 0 0 0
1 1 1 1 1
0 0 0 0 0
(this is only a proof of concept and it's not optimized or bugfree)
I was curious how quick and accurate, algorithm from Rosseta code ( https://rosettacode.org/wiki/Ackermann_function ) for (4,2) parameters, could be. But got StackOverflowError.
julia> using Memoize
#memoize ack3(m, n) =
m == 0 ? n + 1 :
n == 0 ? ack3(m-1, 1) :
ack3(m-1, ack3(m, n-1))
# WARNING! Next line has to calculate and print number with 19729 digits!
julia> ack3(4,2) # -> StackOverflowError
# has to be -> 2003529930406846464979072351560255750447825475569751419265016973710894059556311
# ...
# 4717124577965048175856395072895337539755822087777506072339445587895905719156733
EDIT:
Oscar Smith is right that trying ack3(4,2) is unrealistic. This is version translated from Rosseta's C++:
module Ackermann
function ackermann(m::UInt, n::UInt)
function ack(m::UInt, n::BigInt)
if m == 0
return n + 1
elseif m == 1
return n + 2
elseif m == 2
return 3 + 2 * n;
elseif m == 3
return 5 + 8 * (BigInt(2) ^ n - 1)
else
if n == 0
return ack(m - 1, BigInt(1))
else
return ack(m - 1, ack(m, n - 1))
end
end
end
return ack(m, BigInt(n))
end
end
julia> import Ackermann;Ackermann.ackermann(UInt(1),UInt(1));#time(a4_2 = Ackermann.ackermann(UInt(4),UInt(2)));t = "$a4_2"; println("len = $(length(t)) first_digits=$(t[1:20]) last digits=$(t[end-20:end])")
0.000041 seconds (57 allocations: 33.344 KiB)
len = 19729 first_digits=20035299304068464649 last digits=445587895905719156733
Julia itself does not have an internal limit to the stack size, but your operating system does. The exact limits here (and how to change them) will be system dependent. On my Mac (and I assume other POSIX-y systems), I can check and change the stack size of programs that get called by my shell with ulimit:
$ ulimit -s
8192
$ julia -q
julia> f(x) = x > 0 ? f(x-1) : 0 # a simpler recursive function
f (generic function with 1 method)
julia> f(523918)
0
julia> f(523919)
ERROR: StackOverflowError:
Stacktrace:
[1] f(::Int64) at ./REPL[1]:1 (repeats 80000 times)
$ ulimit -s 16384
$ julia -q
julia> f(x) = x > 0 ? f(x-1) : 0
f (generic function with 1 method)
julia> f(1048206)
0
julia> f(1048207)
ERROR: StackOverflowError:
Stacktrace:
[1] f(::Int64) at ./REPL[1]:1 (repeats 80000 times)
I believe the exact number of recursive calls that will fit on your stack will depend upon both your system and the complexity of the function itself (that is, how much each recursive call needs to store on the stack). This is the bare minimum. I have no idea how big you'd need to make the stack limit in order to compute that Ackermann function.
Note that I doubled the stack size and it more than doubled the number of recursive calls — this is because of a constant overhead:
julia> log2(523918)
18.998981503278365
julia> 2^19 - 523918
370
julia> log2(1048206)
19.99949084151746
julia> 2^20 - 1048206
370
Just fyi, even if you change the max recursion depth, you won't get the right answer as Julia uses 64 bit integers, so integer overflow with make stuff not work. To get the right answer, you will have to use big ints to have any hope. The next problem is that you probably don't want to memoize, as almost all of the computations are not repeated, and you will be computing the function more than 10^19729 different inputs, which you really do not want to store.
Here is a code for the fibonnaci sequence taken from rosettacode.com
FIBNCI: MOV C, A ; C will store the counter
DCR C ; decrement, because we know f(1) already
MVI A, 1
MVI B, 0
LOOP: MOV D, A
ADD B ; A := A + B
MOV B, D
DCR C
JNZ LOOP ; jump if not zero
RET ; return from subroutine
If the value taken from A is originally 0 and we decrement C, does C become -1? if so what happens to that value at the 2nd DCR; and what does the the JNZ instruction see or do?
This is my first contact with assembly language so its a bit confusing at the moment. I'm thinking that if C is already -1 and counting when we reach the JNZ instruction, wouldn't this code be stuck in a loop? Or is the JNZ looking elsewhere?
Intel 8080 registers A, B, C, ... are 8 bit.
So if A was 0, then C becomes -1, which is encoded in 8 bits as 0b11111111 (all eight bits set to 1). When you treat that as unsigned 8 bit value, it's equal to 255.
Now if you would increment that value by 1, it would become 256, which in binary is 0b100000000 -> as C is 8 bit wide, that value would be truncated to 0b00000000, which is 0. So -1 + 1 = 0, as expected (and 255 + 1 = 0 in unsigned Math, because you hit the 8 bit limit, so the value "overflows").
The second DCR will decrease that -1/255 value, the C will then contain -2 (which equals to 254 unsigned, as 255 - 1 = 254, and in binary looks like 0b11111110).
JNZ will loop till zero, so that means the loop will run 255 times (for A=0 argument, for A=1 it will run 256 times), until the C does reach again zero from the 255 (meanwhile the A and B registers containing F(n-2) and F(n-1) will overflow many times, thus rendering the result unusable ... the last correct result is for A=13 being 233 I think (too lazy to verify))
For the start check for input < 2:
FIBNCI: CPI 2 ;return if A < 2
RC ;F(0) = 0, F(1) = 1
The rest of the code seems to be OK. F(2) = 1, F(3) = 2, F(4) = 3, ...
You could modify the code to use double add (DAD) to get a 16 bit result. The largest input for 8 bit result is decimal 13: fib(13) = 233. The largest input for 16 bit result is 24: F(24) = 46368.
This is a follow-up question to my previous Fortran question.
I have a working Fortran program that has a subroutine that filters an array. This is the program:
program test
integer, parameter :: n = 3
integer, parameter :: m = 4
double precision, dimension(n,m) :: A
double precision, dimension(:,:), allocatable :: B
A(1,:) = [11, 22, 43, 55]
A(2,:) = [15, 56, 65, 63]
A(3,:) = [54, 56, 32, 78]
print*, 'A :'
print*, int(A)
CALL extractB(A, B)
print*, 'B'
print*, int(B)
contains
subroutine extractB(A, B)
implicit none
double precision, dimension(:,:), intent(in) :: A
double precision, dimension(:,:), allocatable :: B
integer :: nrowB, i, pos
nrowB = count( A(:,2)==56)
allocate( B(nrowB, size(A,2)-1 ) )
pos = 1
do i = 1, size(A,1)
if(A(i,2)==56)then
B(pos,1) = A(i,1)
B(pos,2:) = A(i,3:)
pos = pos+1
end if
end do
end subroutine extractB
end program
The program compiles, runs, and it does what it has to do very well.
I want to call the extractB subroutine with R. I have asked similar questions and found was able to make them work, but this one is somehow different and not working.
My fortran subrutine is in the mytest.f90 file and has this code:
subroutine extractB(A, B)
implicit none
double precision, dimension(:,:), intent(in) :: A
double precision, dimension(:,:), allocatable :: B
integer :: nrowB, i, pos
nrowB = count( A(:,2)==56)
allocate( B(nrowB, size(A,2)-1 ) )
pos = 1
do i = 1, size(A,1)
if(A(i,2)==56)then
B(pos,1) = A(i,1)
B(pos,2:) = A(i,3:)
pos = pos+1
end if
end do
end subroutine extractB
I compile it in R and load the library with these commands:
system("R CMD SHLIB ./Fortran/mytest.f90")
dyn.load("./Fortran/mytest.so")
Then, in R, i create and pass a data frame to the subroutine
A = data.frame(c(11,15,54), c(22,56,56), c(43,65,32), c(55,63,78))
X<-.Fortran("extractB", A = unlist(A), B = numeric(6))
After that R crashes
*** caught segfault ***
address (nil), cause 'unknown'
Traceback:
1: .Fortran("extractB", A = unlist(A), B = numeric(6))
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
If I change the subroutine by removing setting the dimensions by hand:
subroutine extract(A, B)
implicit none
double precision, dimension(3,4), intent(in) :: A
double precision, dimension(2,3) :: B
integer :: i, pos
pos = 1
do i = 1, size(A,1)
if(A(i,2)==56)then
B(pos,1) = A(i,1)
B(pos,2:) = A(i,3:)
pos = pos+1
end if
end do
end subroutine extract
recompile the library, and reload it. I can run
X<-.Fortran("extract", A = unlist(A), B = numeric(6))
dim(X$A) <- dim(A)
dim(X$B) <- c(2,3)
and get what I want
> X
$A
[,1] [,2] [,3] [,4]
[1,] 11 22 43 55
[2,] 15 56 65 63
[3,] 54 56 32 78
$B
[,1] [,2] [,3]
[1,] 15 65 63
[2,] 54 32 78
Any way of fixing this?
Thanks a lot for the help!
Though, as Vladimir F pointed out, this will not work with allocatables like this, you can allocate memory, that can be used by another language in Fortran. But you should use the pointer attribute in that case. However, you will not be able to make use of assumed shape arrays either. Instead of inferring the size of A, you should pass it in explicitly.
If you are willing to change the interface for this, you can get your functionality with something along these lines, (see also M. S. B.s answer to a related question):
subroutine extractB(A_ptr, lenX, lenY, B_ptr, nrowB) bind(c,name=extractB)
use iso_c_binding
implicit none
type(c_ptr), value :: A_ptr
integer(kind=c_int), value :: lenX, lenY
type(c_ptr) :: B_ptr
integer(kind=c_int) :: nrowB
real(kind=c_double), pointer :: A(:,:)
real(kind=c_double), pointer :: B(:,:)
integer :: i, pos
c_f_pointer(A_ptr, A, [lenX, lenY])
nrowB = count( A(:,2)==56)
allocate( B(nrowB, size(A,2)-1 ) )
!...
B_ptr = c_loc(B)
end subroutine extractB
I've got a matrix calculator program, but I'm getting the wrong answer for my dot product multiplier.
Here's my multiply subroutine:
subroutine multiply(m1,m2,res,row1,row2,col1,col2)
integer, intent(in) :: row1,row2,col1,col2
real, intent(in), dimension(row1,col1) :: m1
real, intent(in), dimension(row2,col2) :: m2
real, intent(out), dimension(row1,col2) :: res
integer :: i,j,k
do i = 1, col2
do j = 1, col1
res(j, i) = 0
enddo
do j = 1, col1
do k = 1, row1
res(k, i) = res(k, i) + m1(k, j)*m2(j, i)
enddo
enddo
enddo
And here's my output, just in case that's the problem.
subroutine output(r,c,matrix,name)
integer, intent(in) :: r
integer, intent(in):: c
character(len=10) :: name
real, intent(out), dimension(3,3) :: matrix
integer i,j
print *,name
do i = 1, r
write(*,"(100F6.1)") ( matrix(i,j), j=1,c )
enddo
end subroutine
If it helps, it works excellently for a 3x3 matrix, but not for two rectangular matrix. Here's what happens when I do a 2x3 * 3x2 matrix. At this point, i'm desperate for help. Anything you could suggest would be great. Thanks!
EDIT: Here's all of my code in its current state with your suggestions.
PROGRAM G6P5
integer :: r1,r2,c1,c2,i,j,k,s
real :: input
real, dimension (3,3) :: mat1, mat2, rmat
write (*,*) 'Please make a selection:'
write (*,*) 'Enter 1 to add matrices'
write (*,*) 'Enter 2 to subtract matrices'
write (*,*) 'Enter 3 to multiply matrices'
write (*,*) 'Enter 4 to transpose a matrix'
write (*,*) 'Enter 5 to quit'
read *, s
select case (s)
case (1)
print *, 'Enter # of rows & columns (1-10) (ex. 3 3 = 3x3)'
read *, r1,c1
print *, 'Matrix 1:'
call fillmatrix(r1,c1,mat1)
r2 = r1
c2 = c1
print *, 'Matrix 2:'
call fillmatrix(r2,c2,mat2)
call output(r1,c1,mat1,'Matrix 1: ')
call output(r2,c2,mat2,'Matrix 2: ')
rmat = mat1+mat2
call output(r1,c1,rmat,'Sum: ')
case (2)
print *, 'Enter # of rows & columns (1-10) (ex. 3 3 = 3x3)'
read *, r1,c1
print *, 'Matrix 1:'
call fillmatrix(r1,c1,mat1)
r2 = r1
c2 = c1
print *, 'Matrix 2:'
call fillmatrix(r2,c2,mat2)
rmat = mat1-mat2
call output(r1,c1,mat1,'Matrix 1: ')
call output(r2,c2,mat2,'Matrix 2: ')
call output(r1,c1,rmat,'Sum: ')
case (3)
print *, 'Enter # of rows & columns for matrix 1'
print *, '(1 through 10, ex: 3 3 = 3x3)'
read *, r1,c1
print *, 'Matrix 1:'
call fillmatrix(r1,c1,mat1)
print *, 'Enter # of rows & columns for matrix 2'
print *, '(1 through 10, ex: 3 3 = 3x3)'
read *, r2,c2
print *, 'Matrix 2:'
call fillmatrix(r2,c2,mat2)
if (c1.eq.r2) then
call multiply(mat1,mat2,rmat,r1,r2,c1,c2)
call output(r1,c1,mat1,'Matrix 1: ')
call output(r2,c2,mat2,'Matrix 2: ')
call output(r1,c2,rmat,'Product: ')
end if
case (4)
print *, 'Enter # of rows & columns for matrix 1'
print *, '(1 through 10, ex: 3 3 = 3x3)'
read *, r1,c1
print *, 'Matrix 1:'
call fillmatrix(r1,c1,mat1)
call transpose(mat1,rmat,r1,c1)
call output(r1,c1,rmat,'Transpose:')
case (5)
print *,'5'
case default
print *,'default'
end select
! call fillmatrix(rows,columns,mat1)
! write (*,*) matrix1
END PROGRAM
subroutine fillmatrix(r,c,matrix)
integer, intent(in) :: r
integer, intent(in):: c
real, intent(out), dimension(3,3) :: matrix
integer i,j
do i=1,r
do j = 1,c
write (*,'(A,I2,A,I2,A)') 'Enter value (',i,',',j,').'
read*, matrix(i,j)
enddo
enddo
end subroutine
subroutine multiply(m1,m2,res,row1,row2,col1,col2)
integer, intent(in) :: row1,row2,col1,col2
real, intent(in), dimension(row1,col1) :: m1
real, intent(in), dimension(row2,col2) :: m2
real, intent(out), dimension(row1,col2) :: res
integer :: i,j,k
res = 0
do i = 1, row1
do j = 1, col2
do k = 1, col1 ! col1 must equal row2
res(i, j) = res(i, j) + m1(i, k)*m2(k, j)
enddo ! k
enddo ! j
enddo ! i
end subroutine
subroutine transpose(m1,res,row,col)
integer, intent(in) :: row,col
real, intent(in), dimension(row,col) :: m1
real, intent(out), dimension(row,col) :: res
integer :: i,j,k
do i = 1,col
do j = 1,row
res(i,j) = m1(j,i)
enddo
enddo
end subroutine
subroutine output(r,c,matrix,name)
integer, intent(in) :: r
integer, intent(in):: c
character(len=10) :: name
real, intent(in), dimension(r,c) :: matrix
integer i,j
print *,name
do i = 1, r
write(*,"(100F6.1)") ( matrix(i,j), j=1,c )
enddo
end subroutine
You seem to have the indexing a little confused. Try this.
res = 0
do i = 1, col2
do j = 1, row1
do k = 1, col1
res(j, i) = res(j, i) + m1(j, k)*m2(k, i)
enddo
enddo
enddo
I also notice that in your output routine you have the line
real, intent(out), dimension(3,3) :: matrix
If you are sending matrix into this routine, it should be intent(in). Also, if you are printing a 2x2 matrix, then the dimension(3,3) is also incorrect. You should change this line to
real, intent(in) :: matrix(r,c)
One last thing you might consider. Your matrix is always 3x3, but you are not always using all the elements. That is why you pass the number of rows and columns to the subroutines. The issue here is that the actual size of the matrix needs to match these numbers. To do this, you need to use the slice notation.
Instead of
call sub(2,2,mat)
use
call sub(2,2,mat(1:2,1:2))
This is because the first method is essentially equivalent to
call sub(2,2,mat(1:3,1:3))
Which will result in a mismatch between what you are passing to the subroutine and what the subroutine expects. This can cause funny things to happen, as you saw.