MPI velocity sending - mpi

I have a simple question, but it requires a deep knowledge of computer architecture.
I'm using MPI, and all processors need to read some data from the same text file. Is it quicker to let each processor access the text file directly, or is it better to read the file with a single processor and then send the data to the others?

Related

Is it possible for OpenCL to cache some data while running between kernels?

I currently have a problem scenario where I'm doing graph computation tasks and I always need to update my vertex data on the host side, iterating through the computations to get the results. During this process, the edge data is unchanged. I want to know if there is a way to use OpenCL to repeatedly write data, run the kernel, and read the data, while keeping the unchanged data on the device side to reduce communication costs. By the way, I am currently only able to run OpenCL 1.2.
Question 1:
Is it possible for OpenCL to cache some data while running between kernels
Yes, it is possible in the OpenCL programming model. Please check Buffer Objects, Image Objects and Pipes in the official OpenCL documentation (note that pipes were introduced in OpenCL 2.0, so they are not available under 1.2). Buffer objects can be manipulated by the host using OpenCL API calls.
Also the following OpenCL StackOverflow posts will further clarify your concept regarding caching in OpenCL:
OpenCL execution strategy for tree like dependency graph
OpenCL Buffer caching behaviour
Memory transfer between host and device in OpenCL?
You should also look into techniques such as double buffering in OpenCL.
Question 2:
I want to know if there is a way that I can use OpenCL to repeatedly write data, run the kernel, and read the data, some unchanged data can be saved on the device side to reduce communication costs
Yes, it is possible. You can do it through either batch processing or data tiling. Because each transfer carries its own overhead, batching many small transfers into one larger transfer performs significantly better than making each transfer separately. There are many examples of batching or data tiling. One is this:
OpenCL Kernel implementing im2col with batch
Miscellaneous:
If possible, please use a more recent version of OpenCL. Version 1.2 is old.
You have not mentioned your hardware. Note that the programming model can differ between hardware accelerators such as FPGAs and GPUs.

Hex file verification inside microcontroller

As we all know, the hex file is the heart of our application code, which is programmed into the microcontroller's flash memory for execution. My question is: before this hex file is executed, will it be verified by the microcontroller, or will it just be executed once all start-up processes have finished?
Disclaimer: Because I don't know all microcontrollers, this is not a complete answer.
The flashed binary executable will just be executed.
Some microcontrollers check for a certain value at a fixed address to decide whether to start the built-in bootloader or a flashed user program.
If you need the user program to be checked, you will need to implement this yourself. I have worked with such systems, and it is quite common, especially in safety-related environments.
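As a rough illustration of such a self-implemented check (not taken from any particular microcontroller or vendor SDK), startup code could compute a CRC-32 over the program image and compare it with a value stored at build time. The names `crc32_compute` and `image_is_valid`, and the choice of CRC-32 itself, are assumptions made for this sketch. Where the image and the expected value live in flash is device-specific and not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 using the reflected polynomial 0xEDB88320
 * (the variant used by zlib and Ethernet). Slow but table-free. */
uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; apply the polynomial when the dropped bit was 1. */
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return ~crc;
}

/* Hypothetical helper: returns 1 if the image matches the expected CRC,
 * 0 otherwise. A bootloader could refuse to jump to the program on 0. */
int image_is_valid(const uint8_t *image, size_t len, uint32_t expected_crc)
{
    return crc32_compute(image, len) == expected_crc;
}
```

In a real system the expected value would typically be appended to the image by the build tooling, and the check would run before control is handed to the user program.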
Concerning the format of hex files:
Intel HEX, as well as other formats like SREC, are human-readable text representations of binary data. The usual reason for the checksums in these formats is to ensure data consistency during transmission, which was done via unreliable channels back when the formats were invented.
Another advantage is the limitation to 7-bit ASCII characters that can be transferred losslessly via old internet protocols.
However, the "real" contents, the binary data, are stored directly in the flash memory of the microcontroller. Checksums might be used by the receiving software (for example the bootloader) in the microcontroller when the user program is being flashed. But after flashing they are gone.
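To make the per-record checksum concrete, here is a minimal sketch of how receiving software might validate a single Intel HEX record. In this format every byte of a record, including the trailing checksum byte, must sum to zero modulo 256. The function name `ihex_record_ok` is an assumption for illustration. The sketch checks only the framing and the checksum, and does not interpret record types or addresses.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Converts one hex digit to its value, or returns -1 if it is not hex. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)toupper((unsigned char)c);
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Returns 1 if `line` is a well-formed Intel HEX record whose bytes
 * (count, address, type, data, checksum) sum to 0 modulo 256. */
int ihex_record_ok(const char *line)
{
    assert(line != NULL);
    size_t n = strlen(line);
    /* Ignore a trailing CR/LF. */
    while (n > 0 && (line[n - 1] == '\n' || line[n - 1] == '\r')) n--;
    /* ':' plus at least 5 bytes: count, 2 address bytes, type, checksum. */
    if (n < 11 || line[0] != ':' || (n - 1) % 2 != 0) return 0;

    size_t nbytes = (n - 1) / 2;
    unsigned sum = 0;
    unsigned count = 0;
    for (size_t i = 0; i < nbytes; i++) {
        int hi = hex_nibble(line[1 + 2 * i]);
        int lo = hex_nibble(line[2 + 2 * i]);
        if (hi < 0 || lo < 0) return 0;
        unsigned byte = (unsigned)(hi * 16 + lo);
        if (i == 0) count = byte;   /* first byte is the data length */
        sum += byte;
    }
    /* The record length must match the declared data byte count. */
    if (nbytes != (size_t)count + 5) return 0;
    return (sum & 0xFFu) == 0;
}
```

Only the decoded data bytes would then be written to flash, which is why the checksums no longer exist on the device afterwards.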

MPI one-sided file I/O

I have some questions about performing file I/O using MPI.
A set of files is distributed across different processes.
I want each process to be able to read the files held by the other processes.
For example, in one-sided communication, each process exposes a window that is visible to the other processes. I need exactly the same functionality for files. (Create 'windows' for all files and share them so that any process can read any file from any offset.)
Is this possible in MPI? I have read a lot of MPI documentation, but couldn't find exactly this.
The simple answer is that you can't do that automatically with MPI.
You can convince yourself by noting that MPI_File_open() is a collective call that takes an intra-communicator as its first argument and returns a file handle to the opened file as its last argument. All processes in this communicator open the file, and therefore all of them must be able to see the file. So unless a process sees a file, it cannot get an MPI_File handle to access it.
Now, that doesn't mean there's no solution. A possibility could be to do by hand exactly what you described, namely:
Each MPI process individually opens the file it sees and is responsible for; then
Each of these processes reads its local file into a buffer;
These individual buffers are all exposed, using either one global MPI_Win memory window or several individual ones, ready for one-sided read accesses; and finally
All read accesses to any data that was previously stored in these individual local files are now done through MPI_Get() calls using the memory window(s).
The real limitation of this approach is that it requires fully reading all of the individual files, so you need sufficient memory per node to store each of them. I'm well aware that this is a very big caveat that could make the solution completely impractical. However, if the memory is sufficient, this is an easy approach.
Another, even simpler, solution would be to store the files on a shared file system, or to have them all copied to all local file systems. I imagine this isn't an option, since the question wouldn't have been asked otherwise...
Finally, as a last resort, one possibility I see would be to dedicate an MPI process (or an OpenMP thread of an MPI process) per node to serve each file. This process would just act as a "file server", answering "read" requests coming from the other MPI processes by reading the requested data from the file and sending it back via MPI. It's a bit lengthy to write, but it should work.

Intercept outputs from a Program in Windows 7

I have an executable program which outputs data to the hard disk, e.g. C:\documents.
I need some means to intercept the data in Windows 7 before it gets to the hard drive. Then I will encrypt the data and send it on to the hard disk. Unfortunately, the .exe file does not support the redirection operator, i.e. > in the command prompt. Do you know how I can achieve such a thing in any programming language (C, C++, Java, PHP)?
The encryption can only be done before the plain data is sent to the disk not after.
Any ideas most welcome. Thanks
This is virtually impossible in general. Many programs write to disk using memory-mapped files. In such a scheme, a memory range is mapped to (part of) a file, and writes to the file can't be distinguished from writes to memory. A statement like p[OFFSET_OF_FIELD_X] = 17; is logically a write to the file. Furthermore, the OS keeps track of the synchronization of memory and disk. Not all logical writes to memory are directly translated into physical writes to disk. From time to time, at the whim of the OS, dirty memory pages are copied back to disk.
Even in the simpler case of CreateFile/WriteFile, there's little room to intercept the data on the fly. The closest you could get is to use Microsoft Detours. I know of at least one snake-oil encryption program (WxVault, crapware shipped on Dells) that does that. It repeatedly crashed my application in the field, which is why my program unpatches any attempt to intercept data on the fly. So not even such hacks are robust against programs that dislike interference.
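The point about memory-mapped writes can be seen in a small sketch. It uses the POSIX calls `mmap` and `msync` rather than the Windows equivalents (`CreateFileMapping` and `MapViewOfFile`), purely because that keeps the example short. The function name `poke_file_byte` and its parameters are assumptions for illustration. The only "write" the program performs is an ordinary store through a pointer, so there is no write call to hook.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Sets the file at `path` to `file_size` bytes, maps it, and stores one
 * byte at `offset`. Returns 0 on success, -1 on failure. */
int poke_file_byte(const char *path, size_t file_size,
                   size_t offset, unsigned char value)
{
    assert(path != NULL && offset < file_size);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)file_size) != 0) { close(fd); return -1; }

    unsigned char *p = mmap(NULL, file_size, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED) return -1;

    p[offset] = value;              /* a plain store: logically a file write */

    /* Without msync the OS would still write the dirty page back,
     * but at a time of its own choosing. */
    int rc = msync(p, file_size, MS_SYNC);
    if (munmap(p, file_size) != 0) rc = -1;
    return rc;
}
```

After the call returns, reading the file through normal file I/O shows the stored byte, even though the program never issued a write call for it.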

How is MPI I/O Implemented?

Long-Winded Background
I'm working on parallelising some code for cardiac electrophysiology simulations. Since users can specify their own simulations using an in-built scripting language, I have no way of knowing how to manage the trade-off of communication vs. computation. To combat this, I'm making a sort-of runtime profiler, which will decide how to handle the domain decomposition once it's seen the simulation to be run and the hardware environment that it has to work with.
My question is this:
How is MPI I/O implemented behind the scenes? Is each process actually writing to a single file on some other node, or is each process writing to some sparse file, which will get spliced back together when the file is closed?
Knowing this will help me decide whether to consider I/O operations as communication or computation, and adjust the balance accordingly…
Thanks in advance for any insight you can offer.
Ross
The mechanism for I/O is implementation dependent. In addition, there is not a single style of I/O. Some I/O is cached by the remote ranks and collected by the mpirun process at the end of the run. Some I/O is written to local scratch space as required. Some I/O is written to a NAS/SAN-style high-performance shared file system.
Some MPI implementations use third-party libraries to support I/O to parallel file systems, and those details may be proprietary. Some file systems are local disks, others are SAN over fiber or InfiniBand.
How are you planning to actually measure the time spent in I/O? Are you planning to use the PMPI profiling interface to intercept all the calls into the library?
