What are the differences between the MPI_* functions and PMPI_* functions?


I'm looking at the mpi.h header, and I'm confused about the PMPI_Init function. It's placed right after the MPI_Init declaration, and it looks exactly the same. However, Msmpi.dll (for instance) doesn't have the MPI_Init export, only the PMPI_Init.
What are these PMPI_ functions?

You are looking at the MPI profiling interface. For each MPI function, there is also a similar PMPI function, which just differs by the prefix.
As a user, you should only call the MPI version and just ignore the PMPI version.
This is a mechanism that allows tool developers to intercept calls to the MPI functions and call the PMPI versions internally. Usually this is implemented such that all functions are implemented as PMPI functions and with MPI functions as weak symbols pointing to them. The tool can then replace the weak symbols with their own wrapper implementations and still call the PMPI functions internally.
// Normal case
user --calls--> libmpi:MPI_Init --redirects to--> libmpi:PMPI_Init (implementation)
// Tool case
user --calls--> libtool:MPI_Init (does tool things) --calls--> libmpi:PMPI_Init (implementation)
You can find more information in Section 14.2 of the MPI standard. In general, I highly recommend looking up function signatures and the like in the standard rather than in the header.
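The two call chains above can be mimicked outside MPI. What follows is only a toy Python model of the layering, not an MPI binding: `libmpi`, `implementation_init`, `tool_init`, and `call_log` are hypothetical stand-ins, and rebinding an attribute plays the role that replacing a weak symbol plays at link time in a real C library.

```python
import time
from types import SimpleNamespace


def implementation_init():
    """Stand-in for the real work behind libmpi's PMPI_Init."""
    return "initialized"


# Normal case: the public MPI_ name simply points at the PMPI_ implementation,
# loosely like a weak symbol in a C library.
libmpi = SimpleNamespace(PMPI_Init=implementation_init,
                         MPI_Init=implementation_init)

call_log = []  # what the hypothetical tool records


def tool_init():
    """Tool wrapper: do tool things, then forward to the PMPI_ entry point."""
    start = time.perf_counter()
    result = libmpi.PMPI_Init()
    call_log.append(("MPI_Init", time.perf_counter() - start))
    return result


# Tool case: the tool takes over the public name; PMPI_Init stays untouched.
libmpi.MPI_Init = tool_init

# User code is unchanged: it still calls the MPI_ name.
status = libmpi.MPI_Init()
```

The user-facing call is the same in both cases, which is why a tool built this way needs no changes to the user's code.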

The PMPI_ entry points are part of the MPI Profiling Interface.
By default, each PMPI_ symbol and its MPI_ namesake refer to the same implementation, but because both are defined as part of the API, tools can easily insert themselves around MPI calls to do simple performance profiling or tracing. There are lots of examples of how they work and how to use them.
Most profiling tools for MPI codes make use of this to do things like time MPI communication routines, count the number of messages being sent/received with particular sizes, etc., without having to modify the user code; you just have to link in the profiling library.
The Profiling interface doesn't have to be used strictly for profiling, of course - there have been projects that have used the profiling interface for communications correctness checking (making sure sends and receives were matched), simple heuristic deadlock testing, etc.
The profiling interface was the only standard tools interface to the MPI library for some time, but there is now also a richer Tools interface.

Related

What are some popular frameworks available for implementing CQRS, Event Sourcing and Saga for data consistency and distributed transactions?

I would like to know some popular frameworks that are available for implementing CQRS, ES, Saga in the application.
As a part of my research, I have to compare these frameworks and evaluate them based on various -ilities.

I have to compare these [event-sourcing] frameworks and evaluate them based on various -ilities.
The premise of the question is that you need a framework to implement event sourcing but, in fact, you do not.
Greg Young, one of the most influential proponents of event sourcing, frequently expresses his misgivings about frameworks. See, for instance, his QCon London 2013 keynote, especially around the 9-minute mark.
Event sourcing is conceptually simple and doesn't need the kind of magic that frameworks typically bring with them. For instance, rebuilding the state from a stream of events is simply a left fold over that stream. Moreover, you don't necessarily need a specialised database; I know people who have successfully implemented event sourcing by simply appending events to a file.
If your research aims at comparing event-sourcing frameworks, I would argue that you should consider the case where no framework is used at all.
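To make the "left fold" and "append to a file" points concrete, here is a minimal framework-free sketch in Python. The bank-account domain, the event shapes, and the names `apply_event`, `append_event`, `load_events`, and `log_path` are all made up for illustration; a real system would also need to handle concurrency, versioning, and snapshots.

```python
import json
import os
import tempfile
from functools import reduce


def apply_event(balance, event):
    """Pure transition function: current state + one event -> next state."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    return balance  # ignore event types this model does not know


def append_event(path, event):
    """Append one event as a JSON line; earlier lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")


def load_events(path):
    """Read the events back in the order they were appended."""
    with open(path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]


log_path = os.path.join(tempfile.mkdtemp(), "account-events.jsonl")
for event in (
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
):
    append_event(log_path, event)

# Rebuilding the state is a left fold over the stream, starting from 0.
balance = reduce(apply_event, load_events(log_path), 0)
```

Because `apply_event` is a pure function, replaying the same file always yields the same state, which is the property the fold formulation relies on.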

Axon is a popular framework/server for building CQRS/ES applications.
EventStoreDB is a popular EventStore database for the EventSourcing part.
A simple starting point if you want to write your own framework/library is to check out some of the code I co-authored at https://www.cqrs.nu/

If you are looking for a managed solution, you can also check out what we at Serialized provide.

In addition to Axon, on the JVM there's also the Akka ecosystem (the cluster sharding, persistence, sharded daemon process, and projection modules are the most relevant to CQRS/ES/DDD). One benefit of Akka Persistence is the ability to choose from a variety of datastores to use as an event store (JDBC SQL databases and Cassandra are the most common, but there are many more supported). My experience with it has been that it is capable of exceptionally high availability and since it allows a stateful event-sourced application to be deployed as if it's stateless (e.g. in Kubernetes without needing an operator) there's a lot of deployment flexibility. Note that because it's built on the actor model, a lot of JVM observability tooling doesn't work particularly well with it (often assuming a stronger mapping of threads to tasks), so certain commercially-licensed observability tooling is recommended.
Kalix also provides a polyglot event-sourcing implementation (all you need is to express domain logic in a language which supports gRPC).
Disclaimer: almost a year after answering this question, I became employed by Lightbend, the maintainers of Akka and provider of Kalix.

Is Clojure's core.async similar to Jane Street's OCaml Core Async?

In this blog post the author writes:
However, Grenchman is built on the Core and Async libraries from Jane Street, one of the largest industrial users of OCaml. Async allows for monadic faux-concurrency that avoids a lot of the callback headaches of other event-driven tools, but it is fairly monolithic.
On the Jane Street Documentation Page for Core Async they describe it as:
In particular, we think that Async does a better [job] of controlling the concurrency of your program, making it easier to reason about possible race conditions.
My question is - are there similarities between core.async in Clojure and Core Async in OCaml? I ask because the 'faux concurrency to avoid callback headaches' sounds quite similar to the application of core.async in Clojure.

I cannot detect major similarities. The concept of Clojure's core.async seems to be mostly based upon Go's concurrency model - many of the names are the same, like channels for communication and even the go macro for executing code asynchronously, like Go's keyword that the language itself is named for.
The concept of Jane Street's Async on the other hand is summarized in this sentence from the introductory documentation:
In a nutshell, the idea is to use non-preemptive user-level threads and first-class blocking operations with blocking expressed in the type system.
It uses the special type Deferred.t to communicate results of asynchronous computations, which is more similar to Clojure futures than to channels. It also completely eschews OS threads and uses user threads instead, whereas core.async does make use of OS threads (at least if they're available).
Edit: Upon further investigation, there is a clear similarity in that both libraries focus on providing means for combining multiple blocking operations without tying up OS threads. And Async does also provide (besides Deferred.t) channels through the Pipe module.
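The difference between a single deferred result and a channel is not specific to either library. As a rough analogy only, written with Python's asyncio rather than Clojure or OCaml: `asyncio.Future` stands in for a Deferred.t-like value that is filled exactly once, and `asyncio.Queue` stands in for a channel or Pipe that carries many values. The names `deferred`, `channel`, and `producer` are illustrative, and nothing here claims to match the actual core.async or Async APIs.

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()

    # Deferred-like: one value, filled in later, awaited without blocking
    # an OS thread.
    deferred = loop.create_future()
    loop.call_soon(deferred.set_result, 42)
    single = await deferred

    # Channel-like: a stream of values passed between cooperative tasks.
    channel = asyncio.Queue()

    async def producer():
        for i in range(3):
            await channel.put(i)
        await channel.put(None)  # sentinel marking the end of the stream

    task = asyncio.create_task(producer())
    received = []
    while (item := await channel.get()) is not None:
        received.append(item)
    await task
    return single, received


result = asyncio.run(main())
```

Both halves run on a single OS thread with cooperative scheduling, which loosely mirrors the "non-preemptive user-level threads" idea quoted above.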

Byte code instrumentation - implement native or java agent?

If I want to build a profiler using byte code instrumentation, should I write a native agent using JVMTI or should I write a Java agent using the java.lang.instrument package?
If I want to use libraries like ASM - which seems to be mandatory if you want to create a serious profiler - I have to use a Java agent. This confuses me, since I thought a native agent could do everything a Java agent can do and more. But to me, it seems easier to write a Java agent.
Are there alternatives? Should one use a Java agent and a native agent combined anyway?

Nearly everyone writes a Java agent (with ASM or BCEL) because they don't want to write a C/C++ bytecode instrumentor from scratch, and none are publicly available.
What you won't be able to do is instrument and profile/monitor the primordial JVM, and accessing native functions requires JNI calls. There are also several JVMTI calls that may be unavailable to you (if memory serves).
I wrote my own instrumentor in C several years ago, and I'm in the process of writing a new one which I hope to open source (depending on my evil overlords :-) )
How about a halfway house: a separate pre-started JVM that your native agent sends bytecode to? In that JVM, your easy-peasy-to-write ASM-based instrumentor does the hard work and sends the resulting bytecode back to the native agent over the wire. Yeah, it seems a bit over-complicated, but it's easier than writing your own BCI library.

Disabling ALL asynchronous execution in CUDA programs

According to the CUDA programming guide, you can disable asynchronous kernel launches at run time by setting an environment variable (CUDA_LAUNCH_BLOCKING=1).
This is a helpful tool for debugging. I also want to determine the benefit in my code from using concurrent kernels and transfers.
I want to also disable other concurrent calls, in particular cudaMemcpyAsync.
Does CUDA_LAUNCH_BLOCKING affect these kinds of calls in addition to kernel launches? I suspect not. What would be the best alternative? I can add cudaStreamSynchronize calls, but I would prefer a run time solution. I can run in the debugger, but that will affect the timing and defeat the purpose.

Setting CUDA_LAUNCH_BLOCKING won't affect the streams API at all. If you add some debug code to force all your streams code to use stream 0, all the calls other than kernel calls will revert to synchronous behaviour.

Best Publish/Subscribe "Middleware" [closed]

I'm in the market for a good open source network based Pub/Sub (observer pattern) library. I haven't found any I like:
JMS - tied to Java, treats message contents as dumb binary blobs
NDDS - $$, use of IDL
CORBA/ICE - Pub/Sub is built on-top of RPC, CORBA API is non-intuitive
JBOSS/ESB - not too familiar with
It would be nice if such a package could do the following:
Network based
Aware of payload data, users should not have to worry about endian/serialization issues
Multiple language support (C++, ruby, Java, python would be nice)
No auto-generated code (no IDLs!)
Intuitive subscription/topic management
For fun, I've created my own. Thoughts?

You might want to look into RabbitMQ.

As pointed out by an earlier post in this thread, one of your options is OpenSplice DDS, which is an open source implementation of the OMG DDS Standard (the same standard implemented by NDDS).
The main advantages of OpenSplice DDS over the other middleware you are considering can be summarized as:
Performance
Rich support for QoS (Persistence, Fault-Tolerance, Timeliness, etc.)
Data Centricity (e.g. possibility of querying and filtering data streams)
Something that I'd like to understand is what your issues with IDL are. DDS uses IDL as a language-independent way of specifying user data types. However, DDS is not limited to IDL; you could use XML if you prefer. The advantage of specifying your data types, and decoupling their representation from a specific programming language, is that the middleware can:
(1) take away from you the burden of serializing data,
(2) generate very time/space efficient serialization,
(3) ensure end-to-end type safety,
(4) allow content filtering on the whole data type (not just the header like in JMS), and
(5) enable on-the-wire interoperability across programming languages (e.g. Java, C/C++, C#, etc.)
Depending on the system or application you are designing, some of the properties above might not be useful or relevant. In that case, you can simply define one or a few "DDS types" that act as holders for your serialized data.
If you think about JMS, it provides you with 5 different topic types you can use to send your data. With DDS you can do the same, but you have the flexibility to define exactly the topic types.
Finally, you might want to check out this blog entry on Scala and DDS for a longer discussion on why types and static-typing are good especially in distributed systems.
-AC

We use the RTI DDS implementation. It costs $$, but it supports many quality of service parameters.
There is a free DDS implementation called OpenDDS, but I've not used it.
I don't see how you can get around the need to predefine your data types if the target language is statically typed.

Look a bit deeper into the various JMS implementations.
Most of them are not Java only, they provide client libraries for other languages too.
Sun's OpenMQ has at least a C++ interface, and Apache ActiveMQ provides client-side libraries for many common languages.
When it comes to message formats, they're usually decoupled from the message middleware itself. You could define your own message format. You could define your own XML schema and send XML messages. You could send BER-encoded ASN.1 using some third-party library if you want.
Or format and parse the data with a JSON library.

You might be interested in the MUSCLE library (disclaimer: I wrote it, so I may be biased). I think it meets all of the criteria you specified.
https://public.msli.com/lcs/muscle/

Three I've used:
IBM MQ Series - Too Expensive, hard to work with.
TIBCO Rendezvous - (renamed now to EMS?) Was very fast, used UDP, could also be used with no central server. My favorite, but expensive and requires a maintenance fee.
ActiveMQ - I'm using this currently but finding it crashes frequently. It also requires some projects ported from Java, like spring.net. It works, but I can't recommend it due to stability issues.
Also used MSMQ in an attempt to build my own Pub/Sub, but since it doesn't handle it out of the box, you're stuck writing a considerable amount of code.

There is also OpenSplice DDS. This one is similar to RTI's DDS, except that it's LGPL!

IBM WebSphere MQ; the licence is not too expensive if you work at a corporate level.

You might take a look at PubSubHubbub. It's an extension to Atom/RSS to allow pubsub through webhooks. The interface is HTTP and XML, so it's language-agnostic. It's gaining increasing adoption now that Google Reader, FriendFeed and FeedBurner are using it. The main use case is blogs and stuff, but of course you can have any sort of payload.
The only open source implementation I know of so far is this one for the Google AppEngine. They say support for self-hosting is coming.
