Neighborhood collectives: reduce operation on a graph

I need a neighborhood allreduce (an MPI_Ineighbor_allreduce) for collective communication in MPI; unfortunately it is not included yet.
The obvious, though not very efficient, solution is to use MPI_Neighbor_alltoall
at the expense of a larger buffer. Do you have any suggestions?
Is there any plan to include this in future releases?
Thanks

There is no such thing as MPI_Neighbor_allreduce in the MPI standard.
If you want it, feel free to ask for it at http://mpi-forum.org/
By the way, did you mean to use MPI_Neighbor_allgather instead of MPI_Neighbor_alltoall in order to implement the neighbor allreduce?
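In the meantime, the allgather route can be sketched as follows: gather one value from every in-neighbor with MPI_Neighbor_allgather, then reduce locally. A minimal sketch, assuming a distributed graph communicator created elsewhere (e.g. with MPI_Dist_graph_create_adjacent), a sum of a single double per process, and a hypothetical function name:

    /* Emulated neighborhood allreduce (sum): gather one double from each
       in-neighbor, then reduce locally. 'comm' must be a distributed
       graph communicator. */
    #include <mpi.h>
    #include <stdlib.h>

    double neighbor_allreduce_sum(double myval, MPI_Comm comm)
    {
        int indegree, outdegree, weighted;
        MPI_Dist_graph_neighbors_count(comm, &indegree, &outdegree, &weighted);

        double *recvbuf = malloc(indegree * sizeof(double));

        /* Send 'myval' to all out-neighbors; receive one value from
           each in-neighbor. */
        MPI_Neighbor_allgather(&myval, 1, MPI_DOUBLE,
                               recvbuf, 1, MPI_DOUBLE, comm);

        double sum = myval;   /* whether to include your own value is a
                                 semantic choice; included here */
        for (int i = 0; i < indegree; ++i)
            sum += recvbuf[i];

        free(recvbuf);
        return sum;
    }

Compared with MPI_Neighbor_alltoall, the send buffer stays at a single element, since every neighbor receives the same value.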

Related

How to know all the ranks that are part of a group in MPI outside that group?

Is there a way for a rank 'R' to know the processes in a particular MPI_Group 'Grp', despite 'R' not belonging to 'Grp'? I want to do it without using any point-to-point communication calls, collective communication calls like Gather, Gatherv, Scatter, etc., or shared memory. Is it possible to use MPI_Group_translate_ranks for this purpose?
If an MPI process is not in a particular group, then it won't have a handle to be able to query it. In MPI, the only way you get handles to groups, communicators, etc. is to be part of the creation of that handle.
So to answer your question, no, there probably isn't a way to learn information about groups of which you are not a member.
That being said, your question is still rather unclear. You're asking to do something that I haven't ever heard of someone wanting to do before. If you can give a better demonstration of what exactly you're trying to do and why, you might get a better answer with another solution.
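For what it's worth, MPI_Group_translate_ranks does map ranks between two groups, but only on a process that already holds both group handles, which is exactly the handle you won't have from outside the group. A hedged sketch of its legitimate use (the function name and the fixed-size buffers are illustrative only):

    #include <mpi.h>
    #include <stdio.h>

    /* Translate the ranks of 'grp' into MPI_COMM_WORLD ranks. Works
       only because the calling process holds the handle 'grp'. */
    void print_group_world_ranks(MPI_Group grp)
    {
        MPI_Group world_grp;
        MPI_Comm_group(MPI_COMM_WORLD, &world_grp);

        int n;
        MPI_Group_size(grp, &n);

        int in_grp[64], in_world[64];   /* assumes n <= 64 */
        for (int i = 0; i < n; ++i)
            in_grp[i] = i;

        MPI_Group_translate_ranks(grp, n, in_grp, world_grp, in_world);

        for (int i = 0; i < n; ++i)
            printf("group rank %d = world rank %d\n", i, in_world[i]);

        MPI_Group_free(&world_grp);
    }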

Vector norms and Finding Maximum (value and index)

I'm running some performance-sensitive code and looking to improve speed. I am using vnormdiff and findmax a lot and wondered whether these are the most efficient functions around? Any thoughts greatly appreciated.
Whenever you encounter a performance problem, it's good to look at your problem from two angles. First, is my overall algorithm the best it can be? If you're using an O(N^2) algorithm but an O(N) is available, that could make an enormous difference. It sounds like you're examining neighbors, so some of the more refined nearest-neighbor algorithms (which depend on dimensionality) might be of assistance.
Second, no discussion about optimization can really get started without profiling information. There's documentation on Julia's profiler here, and a graphical tool for inspecting it here.
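As a language-agnostic illustration of the algorithmic angle (sketched in C, since the original functions are Julia's; all names here are hypothetical), fusing the norm-of-difference and the maximum search into one pass avoids a temporary vector and traverses memory once instead of twice:

    #include <math.h>
    #include <stddef.h>

    /* Single pass over x and y (n >= 1 assumed): returns ||x - y||_2
       and reports the maximum of x and its index via out-parameters. */
    double fused_normdiff_findmax(const double *x, const double *y, size_t n,
                                  double *maxval, size_t *maxidx)
    {
        double sumsq = 0.0;
        *maxval = x[0];
        *maxidx = 0;
        for (size_t i = 0; i < n; ++i) {
            double d = x[i] - y[i];
            sumsq += d * d;            /* accumulate squared difference */
            if (x[i] > *maxval) {      /* track max value and its index */
                *maxval = x[i];
                *maxidx = i;
            }
        }
        return sqrt(sumsq);
    }

Whether such a fusion actually pays off is exactly the kind of question the profiler should settle.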

Tuning Mathematical Parallel Codes

Assuming that I am interested in performance rather than portability of my linear algebra iterative multi-threaded solver and that I have the results of profiling my code in hand, how do I go about tuning my code to run optimally on that machine of my choice?
The algorithm involves Matrix-Vector multiplications, norms and dot-products. (FWIW, I am working on CG and GMRES).
I am working on codes whose matrix size is roughly equivalent to the full size of the RAM (~6GB). I'll be working on an Intel i3 laptop. I'll be linking my codes against Intel MKL.
Specifically,
Is there a good resource (PDF/book/paper) for learning manual tuning? There are numerous things I learnt by doing, for instance that manual unrolling isn't always optimal, or details about compiler flags, but I would prefer a centralized resource.
I need something to translate profiler information into improved performance. For instance, my profiler tells me that the stack of one processor is being accessed by another, or that my mulpd ASM is taking too much time. I have no clue what these mean and how I could use this information to improve my code.
My intention is to spend as much time as needed to squeeze out as much compute power as possible. It's more of a learning experience than for actual use or distribution as of now.
(I am concerned about manual tuning not auto-tuning)
Misc Details:
This differs from usual performance tuning since the major portions of the code are linked to Intel's proprietary MKL library.
Because of memory bandwidth issues in O(N^2) matrix-vector multiplications and dependencies, there is a limit to what I could manage on my own through simple observation.
I write in C and Fortran and I have tried both and as discussed a million times on SO, I found no difference in either if I tweak them appropriately.
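For concreteness, when linked against MKL those kernels map onto standard CBLAS calls; a rough sketch (dense storage assumed here, though a production CG/GMRES code would normally keep A in a sparse format):

    #include <mkl_cblas.h>   /* or <cblas.h> with another BLAS */

    /* One CG-style round of kernels: Ap = A*p, then the dot products
       and norm that the iteration needs. */
    void cg_kernels_sketch(int n, const double *A, const double *p,
                           double *Ap, const double *r)
    {
        /* Matrix-vector product: Ap = 1.0*A*p + 0.0*Ap */
        cblas_dgemv(CblasRowMajor, CblasNoTrans, n, n,
                    1.0, A, n, p, 1, 0.0, Ap, 1);

        double rr   = cblas_ddot(n, r, 1, r, 1);    /* r . r   */
        double pAp  = cblas_ddot(n, p, 1, Ap, 1);   /* p . Ap  */
        double rnrm = cblas_dnrm2(n, r, 1);         /* ||r||_2 */
        (void)rr; (void)pAp; (void)rnrm;            /* consumed by the solver */
    }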
Gosh, this still has no answers. After you've read this you'll still have no useful answers ...
You imply that you've already done all the obvious and generic things to make your codes fast. Specifically you have:
chosen the fastest algorithm for your problem (either that, or your problem is to optimise the implementation of an algorithm rather than to optimise the finding of a solution to a problem);
worked your compiler like a dog to squeeze out the last drop of execution speed;
linked in the best libraries you can find which are any use at all (and tested to ensure that they do in fact improve the performance of your program);
hand-crafted your memory access to optimise r/w performance;
done all the obvious little tricks that we all do (e.g. when comparing the norms of 2 vectors you don't need to take a square root to determine that one is 'larger' than another; a sketch follows this list);
hammered the parallel scalability of your program to within a gnat's whisker of the S==P line on your performance graphs;
always executed your program on the right size of job, for a given number of processors, to maximise some measure of performance;
and still you are not satisfied !
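(The sketch promised in the list above: since sqrt is monotonic, the squared norms order the same way, so the comparison needs no square root. Names are illustrative.)

    #include <stddef.h>

    /* Squared Euclidean norm: no sqrt needed for comparisons. */
    static double sqnorm(const double *v, size_t n)
    {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i)
            s += v[i] * v[i];
        return s;
    }

    /* Nonzero iff ||x|| > ||y||. */
    int norm_greater(const double *x, const double *y, size_t n)
    {
        return sqnorm(x, n) > sqnorm(y, n);
    }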
Now, unfortunately, you are close to the bleeding edge and the information you seek is not to be found easily in books or on web-sites. Not even here on SO. Part of the reason for this is that you are now engaged in optimising your code on your platform and you are in the best position to diagnose problems and to fix them. But these problems are likely to be very local indeed; you might conclude that no-one else outside your immediate research group would be interested in what you do, I know you wouldn't be interested in any of the micro-optimisations I do on my code on my platform.
The second reason is that you have stepped into an area that is still an active research front and the useful lessons (if any) are published in the academic literature. For that you need access to a good research library, if you don't have one nearby then both the ACM and IEEE-CS Digital Libraries are good places to start. (Post or comment if you don't know what these are.)
In your position I'd be looking at journals on 2 topics: peta- and exa-scale computing for science and engineering, and compiler developments. I trust that the former is obvious, the latter may be less obvious: but if your compiler already did all the (useful) cutting-edge optimisations you wouldn't be asking this question and compiler-writers are working hard so that your successors won't have to.
You're probably looking for optimisations which, like loop unrolling, were relatively difficult to find implemented in compilers 25 years ago and which were therefore bleeding-edge back then, and which themselves will be old and established in another 25 years.
EDIT
First, let me make explicit something that was originally only implicit in my 'answer': I am not prepared to spend long enough on SO to guide you through even a summary of the knowledge I have gained in 25+ years in scientific/engineering and high-performance computing. I am not given to writing books, but many are and Amazon will help you find them. This answer was way longer than most I care to post before I added this bit.
Now, to pick up on the points in your comment:
on 'hand-crafted memory access' start at the Wikipedia article on 'loop tiling' (see, you can't even rely on me to paste the URL here) and read out from there; you should be able to quickly pick up the terms you can use in further searches (a small tiling sketch follows the take-away list below).
on 'working your compiler like a dog' I do indeed mean becoming familiar with its documentation and gaining a detailed understanding of the intentions and realities of the various options; ultimately you will have to do a lot of testing of compiler options to determine which are 'best' for your code on your platform(s).
on 'micro-optimisations', well here's a start: Performance Optimization of Numerically Intensive Codes. Don't run away with the idea that you will learn all (or even much) of what you want to learn from this book. It's now about 10 years old. The take away messages are:
performance optimisation requires intimacy with machine architecture;
performance optimisation is made up of 1001 individual steps and it's generally impossible to predict which ones will be most useful (and which ones actually harmful) without detailed understanding of a program and its run-time environment;
performance optimisation is a participation sport, you can't learn it without doing it;
performance optimisation requires obsessive attention to detail and good record-keeping.
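(The tiling sketch promised earlier: blocking a matrix transpose so each tile stays cache-resident. The tile size B is hypothetical and must be tuned per platform; that tuning is itself an instance of the record-keeping point above.)

    #define B 64   /* tile size: a platform-dependent guess, tune it */

    /* Tiled out-of-place transpose of an n x n row-major matrix. */
    void transpose_tiled(int n, const double *a, double *at)
    {
        for (int ii = 0; ii < n; ii += B)
            for (int jj = 0; jj < n; jj += B)
                /* work on one B x B tile at a time for cache reuse */
                for (int i = ii; i < ii + B && i < n; ++i)
                    for (int j = jj; j < jj + B && j < n; ++j)
                        at[j * n + i] = a[i * n + j];
    }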
Oh, and never write a clever piece of optimisation that you can't easily un-write when the next compiler release implements a better approach. I spend a fair amount of time removing clever tricks from 20-year-old Fortran that was justified (if at all) on the grounds of boosting execution performance but which now just confuses the programmer (it annoys the hell out of me too) and gets in the way of the compiler doing its job.
Finally, one piece of wisdom I am prepared to share: these days I do very little optimisation that is not under one of the items in my first list above; I find that the cost/benefit ratio of micro-optimisations is unfavourable to my employers.

Can someone suggest a good way to understand how MPI works?

Can someone suggest a good way to understand how MPI works?
If you are familiar with threads, then you can treat each node as a thread (to an extent).
You send a message (work) to a node and it does some work and then returns you some results.
Similar behaviors between threads & MPI:
Both involve partitioning work and processing the parts separately.
Both incur overhead as more nodes/threads get involved. MPI overhead is more significant compared to threads: passing messages around nodes causes significant overhead, and if work is not carefully partitioned you might end up with the time spent passing messages exceeding the computational time required to process the job.
Different behaviors:
They have different memory models: each MPI node does not share memory with the others and does not know anything about the rest of the world unless you send something to it.
Here you can find some learning materials http://www.mcs.anl.gov/research/projects/mpi/
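A minimal sketch of that 'send work, get a result back' model, with two ranks and one message each way:

    #include <mpi.h>
    #include <stdio.h>

    /* Run with: mpirun -np 2 ./a.out
       Rank 0 sends a number to rank 1, which squares it and replies. */
    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            double work = 3.0, result;
            MPI_Send(&work, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&result, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("result = %f\n", result);   /* prints 9.000000 */
        } else if (rank == 1) {
            double work;
            MPI_Recv(&work, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            work *= work;                      /* the 'work' happens here */
            MPI_Send(&work, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }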
Parallel programming is one of those subjects that is "intrinsically" complex (as opposed to the "accidental" complexity, as noted by Fred Brooks).
I used Parallel Programming with MPI by Peter Pacheco. This book gives a good overview of the basic MPI topics, available APIs, and common patterns for parallel program construction.

I am compiling a rules of programming mindset for my team: What are yours? [closed]

I have been working on a list for a while that helps me share the why of programming approach and thought as much as how to do something.
For this, I wanted to build a list of things that are:
best practice,
best thought,
best approach...
that help a programmer's ability to analyze, think, approach, solve and implement in the most effective way.
I have seen dozens of incredibly valuable comments in questions throughout Stack Overflow, but I couldn't find a place where we keep them together. There is the "most controversial opinion" question on Stack Overflow; however, I'm just looking for sagely insights that can be shared and that help my team and me approach and solve problems better through better programming.
Hopefully this can be one place to gather the one or two liners that are concise, profound and easy to share, repeat, review. If we keep it to one rule per answer it might be easiest to vote up/down.
I'll start with the first.
DRY - Don't Repeat Yourself - In code, comments or documentation.
Always leave the code a little better than when you found it.
Code does not exist until entered into a versioning control system.
Don't be afraid to admit "I don't know" and ask.
10 minutes asking someone could save a day pulling your hair out!
KISS - Keep it simple, stupid.
Pick the simplest solution that works.
Don't make things (too) complicated before they need to be.
Just because everyone else is using some complicated framework to solve their problem, doesn't mean you have to.
Don't reinvent the wheel
If there ought to be a function for it in the core library - there probably is.
Maintainability is important.
Write code as if the person who will end up maintaining it is crazy and knows where you live.
Someone else won't fix it.
If a problem comes to your attention, take ownership long enough to ensure it will be taken care of one way or another.
Don't optimize unless there's a demonstrable problem.
Most of the time when people try to optimize code before it's been proved necessary, they'll spend a lot of resources, make the code harder to read and maintain, and achieve no noticeable effect. Sometimes they'll even make it worse.
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."
- Donald Knuth
How hard can it be?
Don't let any problem intimidate you.
Don't Gather Requirements -- Dig for Them
Requirements rarely lie on the surface. They're buried deep beneath layers of assumptions, misconceptions, and politics.
via The Pragmatic Programmer
Follow the SOLID principles:
Single Responsibility Principle (SRP)
There should never be more than one reason for a class to change.
Open-Closed Principle (OCP)
Software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification.
Liskov Substitution Principle (LSP)
Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it.
Interface Segregation Principle (ISP)
Clients should not be forced to depend upon interfaces that they do not use.
Dependency Inversion Principle (DIP)
A. High level modules should not depend upon low level modules. Both should depend upon abstractions.
B. Abstractions should not depend upon details. Details should depend upon abstractions.
Best Practice: Use your brain
Don't follow any trend/principle/pattern without thinking about it
I think almost everything that is listed under "The Zen of Python" applies for every "Rules of Programming Mindset" list. Start with 'python -c "import this"':
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Test Driven Development (TDD) makes coders sleep better at night
Just to clarify: Some people seem to think TDD is just an incompetent coder's way of limping from A to B without borking everything up too much, and that if you know what you're doing, that means there is no need for (unit) testing methodologies. That completely misses the point of Test Driven Development. TDD is about three (update: apparently four) things:
Refactoring magic. Having a full set of tests means you can pull off otherwise insane refactoring stunts, juggling the entire structure of your application without missing even one of the two hundred crazy subtle side effects that result from it. Even the best programmers are reluctant to refactor their core classes and interfaces without good (unit) test coverage, because it's damn near impossible to track down all the little 'ripple effects' it causes without them.
Detecting pitfalls early. If you are writing tests the right way, it means forcing yourself to consider all the fringe cases. Often, this leads to better design choices once the actual development begins, because the coder has already considered some of the trickier situations that may call for a different inheritance structure or a more flexible design pattern. The need for these changes is often not apparent - or intuitive - during initial planning and analysis, but those exact changes can make the application much easier to extend and maintain down the line.
Ensuring that tests get written. TDD requires you to write the tests before writing the code. Sure, that can be a pain in the ass, since writing tests is tedious compared to writing actual code - and often takes longer, too. However, doing so is the only way to make sure the tests will be written at all. If you think you'll remember to write the tests once the code is done, you're almost always wrong.
Forcing you to write better code. Since TDD forces all code to be testable (you don't write code before there is a test for it), it requires you to write more decoupled code so that you can test the components in isolation. So TDD forces you to write better code. (Thanks, Esko)
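A toy illustration of that test-first cycle, sketched in C with plain assert (a real project would use a test framework; clamp is an invented example): write the failing test first, then just enough code to pass it, then refactor with the test as a safety net.

    #include <assert.h>

    int clamp(int value, int lo, int hi);   /* written AFTER the test */

    /* Step 1: the test, written before the implementation exists. */
    void test_clamp(void)
    {
        assert(clamp(5, 0, 10) == 5);    /* in range: unchanged */
        assert(clamp(-3, 0, 10) == 0);   /* below range: clamped to lo */
        assert(clamp(42, 0, 10) == 10);  /* above range: clamped to hi */
    }

    /* Step 2: the minimal implementation that makes the test pass. */
    int clamp(int value, int lo, int hi)
    {
        if (value < lo) return lo;
        if (value > hi) return hi;
        return value;
    }

    int main(void) { test_clamp(); return 0; }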
Google before you ask your colleague and interrupt his coding.
Less code is better than more, as long as it makes more sense than lots of code.
Habits of the lazy coder
The first time you are asked to do something, do it (right).
The second time you are asked to do it, make a tool that does it automatically.
And the third time, if the tool doesn't cut it, design a domain specific language for generating more tools.
(not to be taken too seriously)
Be a Catalyst for Change
You can't force change on people. Instead, show them how the future might be and help them participate in creating it.
via The Pragmatic Programmer
Don't Panic When Debugging
Take a deep breath and THINK! about what could be causing the bug.
via The Pragmatic Programmer
You may copy and paste to get it working, but you may not leave it that way.
Duplicated code is an intermediate step, not a final product.
It's Both What You Say and the Way You Say It
There's no point in having great ideas if you don't communicate them effectively.
via The Pragmatic Programmer
Always code as if the person who ends up maintaining your code is a violent psychopath who knows where you live.
From: Coding Horror
Build Breaker Buys Lunch
Publish Early, Publish Often
Build it correct first. Make it fast second.
Frequently conduct code reviews
Code review, and consequently refactoring, is an ongoing task. Here are a few goodies about code review, in my opinion:
It improves code quality.
It helps refactor reusable code into reusable libraries.
It helps you learn from your fellow developers.
It helps you learn from your mistakes and refresh your memory about a genius piece of code you wrote before.
Anything that could affect how the application runs should be treated as code, and that means putting it in version control. Especially build scripts and database schema and data (.sql) files.
Take part in open source development
If you are using open source code in your projects, remember to post your bugfixes and improvements back to the community. It's not a development best practice per se, but it's definitely a programmer mindset to strive for.
Understand the tools you use
Don't use a pattern until you've understood why you're using it; don't use a tool without knowing why; don't rely on your framework or language designer always being right for your situation, but also don't assume they're wrong until proven to be!
Convention over Configuration
Especially where conventions are strong and some flexibility can be sacrificed.

Resources