Is MPI_Init equivalent to MPI_Init_thread with desired = MPI_THREAD_SINGLE?
PS. There are plenty of questions on MPI_Init vs MPI_Init_thread here, but none of them addresses this point.
The standard states this explicitly:
A call to MPI_INIT has the same effect as a call to MPI_INIT_THREAD
with a required = MPI_THREAD_SINGLE.
I want to query the size of an OpenCL kernel argument so that I can ensure that I send it a variable of the correct size. I am able to query lots of other properties of each kernel argument using clGetKernelArgInfo, as follows:
clGetKernelArgInfo(k, argc, CL_KERNEL_ARG_TYPE_NAME, sizeof(argType), &argType, &retSize);
This will tell me the string name of the type, for example. But that's not good enough, especially in complex cases where it's a struct and the string name is the same on host and device, but the packing is different, so the size is different. The things that I can query, according to https://man.opencl.org/clGetKernelArgInfo.html, are:
CL_KERNEL_ARG_ADDRESS_QUALIFIER
CL_KERNEL_ARG_ACCESS_QUALIFIER
CL_KERNEL_ARG_TYPE_NAME
CL_KERNEL_ARG_TYPE_QUALIFIER
CL_KERNEL_ARG_NAME
Any ideas?
FYI, this is NOT a duplicate of Get OpenCL Kernel-argument information because that is asking how to use the argument query function, not asking how to query the argument size.
There's no standard way to check before setting the argument as far as I'm aware, but the clSetKernelArg call will return CL_INVALID_ARG_SIZE if the sizes don't match properly, so that should allow you to detect and handle errors accordingly:
CL_INVALID_ARG_SIZE if arg_size does not match the size of the data type for an argument that is not a memory object or if the argument is a memory object and arg_size != sizeof(cl_mem) or if arg_size is zero and the argument is declared with the __local qualifier or if the argument is a sampler and arg_size != sizeof(cl_sampler).
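The size mismatch the question worries about can be reproduced in plain C, without any OpenCL calls. The following is only a minimal sketch: the struct and function names are hypothetical, and the packed attribute is a GCC/Clang extension rather than standard C. It shows that two structs with identical members can report different sizes, which is the situation in which clSetKernelArg would be expected to return CL_INVALID_ARG_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel-argument struct using the compiler's natural alignment. */
struct params_natural {
    uint8_t flag;
    int32_t count;
};

/* Same members, but packed. The packed attribute is a GCC/Clang extension. */
struct __attribute__((packed)) params_packed {
    uint8_t flag;
    int32_t count;
};

/* The value a host program would pass as arg_size for each layout. */
size_t natural_arg_size(void) { return sizeof(struct params_natural); }
size_t packed_arg_size(void)  { return sizeof(struct params_packed); }

/* Returns 1 when both layouts have the same size, 0 otherwise. */
int layouts_match(void)
{
    /* Packing only removes padding, so it never makes the struct larger. */
    assert(packed_arg_size() <= natural_arg_size());
    return natural_arg_size() == packed_arg_size();
}
```

If layouts_match() returns 0 for the host and device definitions of a struct, passing sizeof of the host struct as arg_size is the kind of mismatch the error code above is meant to report.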
I'm writing a memory-heavy CUDA computation program. I need to use mathematical functions, like the ones in math.h, within my kernel. So I did some research and stumbled upon "cuda_fp16.h", which is supposed to add a lot of mathematical functions for use on the device. However, if I want to use one of those math functions (e.g. cos(i), which is part of this library), the compiler tells me that I cannot call a __host__ function on the device. It's clear to me that this is impossible, but the cuda_fp16.h library should add exactly such __device__ math functions. Within "cuda_fp16.h" there are also errors saying that the type __half is not defined.
I have looked at the definition of the cos() that I was using, and it leads me to something within math.h. So my guess is that it just takes the function from there instead of from cuda_fp16.h.
#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "cuda_fp16.h"
__global__ void computation(double x, double y) //function that should upon being called compute the cosine of y.
{
x = cos(y);
}
This is a very simple example of what I am trying to do; just to get the kernel to compute some kind of mathematical function of a value.
I expect the whole thing to be able to compile, since I included the library that would allow such a function to be computed by a __device__ function. However it does not compile, and tells me that I can not call the __host__ function cos on the device.
I have found the problem. In the code itself, I had an int instead of a double as an argument for the function. If the argument for cos() is an int, then it uses the <math.h> version of the function instead of the CUDA one. The CUDA one gets called with a float and double. So the code I posted as an example is how it actually should work, I just hadn't realised that I had given an integer as an argument instead of the actual wanted double.
This may be a specific question, but I think it pertains to how memory is handled by these two compilers (Compaq Visual Fortran Optimizing Compiler Version 6.5 and gfortran under MinGW). I am trying to get an idea of best practices for using pointers in Fortran 90 (which I must use). Here is an example code, which should work "out of the box" with one warning from the gfortran compiler, "POINTER valued function appears on RHS of assignment", and no warnings from the other compiler.
module vectorField_mod
implicit none
type vecField1D
private
real(8),dimension(:),pointer :: x
logical :: TFx = .false.
end type
contains
subroutine setX(this,x)
implicit none
type(vecField1D),intent(inout) :: this
real(8),dimension(:),target :: x
logical,save :: first_entry = .true.
if (first_entry) nullify(this%x); first_entry = .false.
if (associated(this%x)) deallocate(this%x)
allocate(this%x(size(x)))
this%x = x
this%TFx = .true.
end subroutine
function getX(this) result(res)
implicit none
real(8),dimension(:),pointer :: res
type(vecField1D),intent(in) :: this
logical,save :: first_entry = .true.
if (first_entry) nullify(res); first_entry = .false.
if (associated(res)) deallocate(res)
allocate(res(size(this%x)))
if (this%TFx) then
res = this%x
endif
end function
end module
program test
use vectorField_mod
implicit none
integer,parameter :: Nx = 15000
integer :: i
real(8),dimension(Nx) :: f
type(vecField1D) :: f1
do i=1,10**4
f = i
call setX(f1,f)
f = getX(f1)
call setX(f1,f)
if (mod(i,5000).eq.1) then
write(*,*) 'i = ',i,f(1)
endif
enddo
end program
This program runs in both compilers. However, changing the loop from 10**4 to 10**5 causes a serious memory problem with gfortran.
Using Ctrl-Alt-Del and opening "Performance", I can see the physical memory usage increase rapidly when the gfortran build runs, while it barely moves for the Compaq build. I usually cancel before my computer crashes, so I'm not sure of the behavior after it reaches the maximum.
This doesn't seem to be the appropriate way to use pointers (which I need in derived data types). So my question is: how can I safely use pointers while maintaining the same sort of interface and functionality?
P.S. I know that the main program does not seem to do anything constructive, but the point is that I don't think the loop should be limited by memory; it should only be limited by run time.
Any help is greatly appreciated.
This code has a few problems, perhaps caused by misunderstandings around the language. These problems have nothing to do with the specific compiler - the code itself is broken.
Conceptually, note that:
There is one and only one instance of a saved variable in a procedure per program in Fortran 90.
The variable representing the function result always starts off undefined each time a function is called.
If you want a pointer in a calling scope to point at the result of a function with a pointer result, then you must use pointer assignment.
If a pointer is allocated, you need to have a matching deallocate.
There is a latent logic error, in that the saved first_entry variables in the getX and setX procedures are conflated with object specific state in the setX procedure and procedure instance specific state in the getX procedure.
The first time setX is ever called the x pointer component of the particular this object will be nullified due to the if statement (there's an issue of poor style there too - be careful having multiple statements after an if statement - it is only the first one that is subject to the conditional!). If setX is then called again with a different this, first_entry will have been set to false and the this object will not be correctly set-up. I suspect you are supposed to be testing this%TFX instead.
Similarly, the first time getX is called the otherwise undefined function result variable res will be nullified. However, in all subsequent calls the function result will not be nullified (the function result starts off undefined each execution of the function) and will then be erroneously used in an associated test and also perhaps erroneously in a deallocate statement. (It is illegal to call associated (or deallocate for that matter) on a pointer with an undefined association status - noting that an undefined association status is not the same thing as dissociated.)
getX returns a pointer result - one that is created by the pointer being allocated. This pointer is then lost because "normal" assignment is used to access the value that results from evaluating the function. Because this pointer is lost there can't be (and so there isn't...) a matching deallocate statement to reverse the pointer allocation. The program therefore leaks memory. What almost certainly should be happening is that the thing that captures the value of the getX function in the main program (f in this case, but f is used for multiple things, so I'll call it f_ptr...) itself should be a pointer, and it should be pointer assigned - f_ptr => getX(f1). After the value of f_ptr has been used in the subsequent setX call and write statement, it can then be explicitly deallocated.
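The ownership problem described above can be sketched in C, purely as an analogy, since Fortran pointer semantics differ in detail. All names here (vec_field, get_x_copy, read_x) are hypothetical stand-ins for vecField1D, getX and the corrected calling code. The point is that the caller must keep hold of the returned pointer so that it can release it later.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C stand-in for the vecField1D type. */
struct vec_field {
    double *x;
    size_t  n;
};

/* Like getX: returns a freshly allocated copy that the caller now owns. */
double *get_x_copy(const struct vec_field *field)
{
    double *res = malloc(field->n * sizeof *res);
    if (res != NULL)
        memcpy(res, field->x, field->n * sizeof *res);
    return res;
}

/* Copies the values into dst and releases the temporary.
   Returns 0 on success, -1 if the allocation failed.
   Writing memcpy(dst, get_x_copy(field), ...) instead would drop the
   pointer and leak, which mirrors "f = getX(f1)" in the Fortran program. */
int read_x(const struct vec_field *field, double *dst)
{
    assert(field != NULL && dst != NULL);
    double *tmp = get_x_copy(field);   /* keep the pointer, like f_ptr => getX(f1) */
    if (tmp == NULL)
        return -1;
    memcpy(dst, tmp, field->n * sizeof *dst);
    free(tmp);                         /* the matching deallocate */
    return 0;
}
```

Every call to get_x_copy is paired with exactly one free, so memory use stays flat no matter how many times the loop runs.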
The potential for accidental use of normal assignment when pointer assignment is intended is one of the reasons that use of functions with pointer results is discouraged. If you need to return a pointer - then use a subroutine.
Fortran 95 simplifies management of pointer components by allowing default initialization of those components to NULL. (Note that you are using default initialization in your type definition - so your code isn't Fortran 90 anyway!)
Fortran 2003 (or Fortran 95 + the allocatable TR - which is a language level supported by most maintained compilers) introduces allocatable function results - which remove many of the potential errors that can otherwise be made using pointer functions.
Fortran 95 + allocatable TR support is so ubiquitous these days and the language improvements and fixes made to that point are so useful that (unless you are operating on some sort of obscure platform) limiting the language level to Fortran 90 is frankly ridiculous.
The foreign function interface allows Haskell to work with the C world. The Haskell side works with pointers through Storable instances. For example, if I have an array of integers in the C world, a plausible representation in the Haskell world would be Ptr Int. Now suppose I want to translate the C expression a[0] = a[0] + 1. The only way to do that on the Haskell side is to peek the int out and then poke back the result of the addition. The problem with this approach is that a temporary value is created. (I am not sure an optimizing compiler can always avoid that.)
Most people might consider this effect harmless, but think of a situation where the pointer refers to sensitive data. I have created this pointer on the C side in such a way that its content is guaranteed never to be swapped out of memory (using the mlock system call). Peeking the value on the Haskell side no longer guarantees the security of the sensitive data.
So what is the best way to avoid that in the Haskell world? Has anybody else run into similar problems with low-level pointer manipulation in Haskell?
I just built a test case with the code:
import Foreign (Ptr, peek, poke)
import Foreign.C.Types (CInt)
foo :: Ptr CInt -> IO ()
foo p = peek p >>= poke p . (+1)
And using GHC 7.6.3 -fllvm -O2 -ddump-asm I see the relevant instructions:
0x0000000000000061 <+33>: mov 0x7(%r14),%rax
0x0000000000000065 <+37>: incl (%rax)
So it loads an address into rax and increments the memory at that address. Seems to be what you'd get in other languages, but let's see.
With C, I think the fair comparison is:
void foo(int *p)
{
p[0]++;
}
Which results in:
0x0000000000000000 <+0>: addl $0x1,(%rdi)
All this said, I freely admit that it is not clear to me what you are concerned about so I might have missed your point and in doing so addressed the wrong thing.
I am somewhat puzzled by the following program
module test
implicit none
type TestType
integer :: i
end type
contains
subroutine foo(test)
type (TestType), intent(out) :: test
test%i = 5
end subroutine
subroutine bar(test)
type (TestType), intent(out) :: test
test%i = 6
end subroutine
end module
program hello
use test
type(TestType) :: t
call foo(t)
print *, t%i
call bar(t)
print *, t%i
end program hello
and its derivatives. More on those later. As we know, Fortran transfers routine arguments as a pass-by-reference, meaning that the entity emerging at the dummy argument test for both foo and bar is the same memory space granted on the stack in program hello. So far so good.
Suppose I define in program hello the type(TestType) :: t as a pointer, and allocate it.
program hello
use test
type(TestType), pointer :: t
allocate(t)
call foo(t)
print *, t%i
call bar(t)
print *, t%i
deallocate(t)
end program hello
The code works as before, the only difference being that the object was not allocated on the stack, but on the heap.
Now assume to go back to the stack-allocated program and that subroutine bar is instead defined as
subroutine bar(test)
type (TestType), pointer :: test
test%i = 6
end subroutine
The program does not compile anymore because you must use the heap-allocated version to make it work, or to be more accurate it is mandatory to pass a pointer to the routine when the routine is defined to accept a pointer as a dummy argument. On the other hand, if the dummy argument does not contain the pointer keyword, the routine would accept both pointers and non-pointers.
This makes me wonder... what's the point of declaring a dummy argument a pointer ?
Reposted from comp.lang.fortran, an answer by Tobias Burnus:
Now assume to go back to the stack-allocated program and that
subroutine bar is instead defined as
subroutine bar(test)
type (TestType), pointer :: test
test%i = 6
end subroutine
The program does not compile anymore because you must use the
heap-allocated version to make it work,
That's not quite correct: You can also not pass an ALLOCATABLE variable
to a dummy with POINTER attribute. I think one (practical) reason is
that the pointer address can escape and you would thus cause alias
problems. A formal reason is that an ALLOCATABLE is simply not a
POINTER; additionally, the standard does not talk about heap vs. stack
vs. static memory. And in fact, local arrays [with constant bounds] will
often be created in static memory and not on the stack (unless you use
OpenMP or the RECURSIVE attribute). Thus, your "stack" example could
also be a "static memory" example, depending on the compiler and the
used options.
or to be more accurate it is
mandatory to pass a pointer to the routine when the routine is defined
to accept a pointer as a dummy argument.
That's also not completely true. In Fortran 2008 you can pass a
non-POINTER, which has the TARGET attribute, to a pointer dummy which
has the INTENT(IN) attribute. (Pointer intent is relative to the pointer
association status; for non-pointer dummies the intents are about the
value stored in the variable.)
This makes me wonder... what's the point of declaring a dummy argument
a pointer ?
Well, if the argument has the POINTER attribute, you can allocate and
free the pointer target, you can associate the pointer with some target
etc. Up to Fortran 95 it was not possible to have ALLOCATABLE dummy
arguments thus a pointer had to be used if a (dummy) argument had to be
allocated in a procedure.
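As a rough C analogy only (Fortran does not specify how arguments are passed, and every name below is hypothetical), the difference resembles passing a pointer to an object versus passing a pointer to the caller's pointer. Only the second form lets the callee allocate the target or change what the caller's pointer refers to.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical C counterpart of TestType. */
struct test_type {
    int i;
};

/* Comparable to a non-pointer dummy: the callee can only change the value. */
void set_value(struct test_type *t)
{
    assert(t != NULL);
    t->i = 6;
}

/* Comparable to a POINTER dummy: the callee can also allocate the target
   and change what the caller's pointer refers to. Returns 0 on success. */
int allocate_and_set(struct test_type **t)
{
    assert(t != NULL);
    *t = malloc(sizeof **t);
    if (*t == NULL)
        return -1;
    (*t)->i = 6;
    return 0;
}

/* Frees the target and leaves the caller's pointer null. */
void release(struct test_type **t)
{
    assert(t != NULL);
    free(*t);
    *t = NULL;
}
```

set_value works on any existing object, much like the non-pointer dummy that accepts both pointers and non-pointers, whereas allocate_and_set needs access to the caller's pointer itself.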
If you can, you should try to use ALLOCATABLEs rather than POINTERs -
they are easier to use, do not leak memory and pose no
alias-analysis problems to the compiler. On the other hand, if you want
to create, e.g., a linked list, you need a pointer. (Though, for a heap
usage, also Fortran 2008's allocatable components could be used.*)
*I mean:
type t
type(t), allocatable :: next
end type
where the component is of the same type as the type being defined;
before F2008 this was only allowed for pointers but not for allocatables.
and by R. Maine
As we know, Fortran transfers routine arguments as a
pass-by-reference,
We apparently know incorrectly, then. The standard never specifies that
and, indeed goes quite a lot out of its way to avoid such specification.
Although yours is a common misconception, it was not strictly accurate
even in most older compilers, particularly with optimization turned on.
A strict pass-by-reference would kill many common optimizations.
With recent standards, pass-by-reference is all but disallowed in some
cases. The standard doesn't use those words in its normative text, but
there are things that would be impractical to implement with
pass-by-reference.
When you start getting into things like pointers, the error of assuming
that everything is pass-by-reference will start making itself more
evident than before. You'll have to drop that misconception or many
things will confuse you.
I think other people have answered the rest of the post adequately. Some
also addressed the above point, but I wanted to emphasize it.
Hope this answers your question.