MPI parallel write to a TIFF file

I'm trying to write a TIFF file in MPI code. Different processors have different parts of the image, and I want to write the image to the file in parallel.
The write fails; only the first processor can write to it.
How do I do this?
There is no error reported by my implementation; it just does not work.
I used h = TIFFOpen(file, "a+") on each processor to open the same file (I am not sure whether this is the right way or not). Then each processor responsible for a directory writes the header at its own place using TIFFSetDirectory(h, directorynumber), writes the content of that directory, and finalizes with TIFFWriteDirectory(h). The result is that only the first directory gets written to the file.
I thought I might need to open the file using MPI-IO, but then the handle would not come from TIFFOpen?

Different MPI tasks are independent programs, running as independent processes (possibly on different hosts) from the OS point of view. The TIFF library is not designed to handle parallel operations, so when every rank opens the same file, the first process succeeds and all the rest fail because they find the file already opened (on a shared filesystem).
Unless you are dealing with huge images (e.g. astronomical images), where parallel I/O is important for performance (and you need a filesystem that supports it; I am aware of IBM GPFS), I would avoid writing a custom TIFF driver with MPI-IO.
Instead, the typical solution is to gather (MPI_Gather()) the image parts on the process with rank == 0 and let only that process save the TIFF file.

Related

How to use LibTiff.NET Tiff2Pdf in .NET 6

I want to provide support to convert single-page and multi-page tiff files into PDFs. There is an executable in Bit Miracle's LibTiff.NET called Tiff2Pdf.
How do I use Tiff2Pdf in my application to convert tiff data stream (not a file) into a pdf data stream (not a file)?
I do not know if there is an API exposed because the documentation only lists Tiff2Pdf as a tool. I also do not see any examples in the examples folder using it in a programmatic way to determine if it can handle data streams or how to use it in my own program.
The libtiff tools expect a filename, so the runs shown below simply convert x.tif to various destinations; the first uses the default output.
tiff2pdf x.tif
We can see it writes the PDF as a stream to the console (standard output), since without a destination there is no file to write to. On a second run, however, we can redirect it:
tiff2pdf x.tif > a.pdf
or alternatively specify a destination:
tiff2pdf -o b.pdf x.tif
So in order to use those tools we need a filesystem to receive the file objects. The destination folder or file can be on a memory-file-system drive or folder, but you need to set that up first.
NuGet is a package manager that simply bundles the lib, and as I don't use .NET you're a bit out on a limb, since Bit Miracle are not offering free support (hence pointing you at Stack Overflow, a very common tech support PLOY: Pass Liability Over Yonder). However, looking at https://github.com/BitMiracle/libtiff.net/tree/master/Samples
they suggest memory in some sample names, such as https://github.com/BitMiracle/libtiff.net/tree/master/Samples/ConvertToSingleStripInMemory ; perhaps you can get more ideas there?
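If shelling out to the command-line tool is acceptable in your application, the stdout redirection shown above can be wrapped so that only a temporary input file ever touches the filesystem and the output stays a byte stream. A sketch of the wrapper, illustrated in Python (translating it to .NET's Process class is mechanical); it assumes tiff2pdf is on PATH, and the same wrapper works for any file-in/stdout-out tool:

```python
import os
import subprocess
import tempfile

def convert_via_tool(data: bytes, tool=("tiff2pdf",)) -> bytes:
    """Write 'data' to a temp file, run 'tool' on it, return captured stdout."""
    fd, path = tempfile.mkstemp(suffix=".tif")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # tiff2pdf writes the PDF to stdout when no -o destination is given.
        result = subprocess.run([*tool, path], capture_output=True, check=True)
        return result.stdout
    finally:
        os.remove(path)
```

For a quick check, any tool that echoes a file to stdout (such as `cat`) can stand in for tiff2pdf.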

What are the differences between a Program, an Executable, and a Process?

What are the differences between a Program, an Executable, and a Process?
In simple words -
Program: a set of instructions in human-readable format (HelloWorld.c).
Executable: the compiled form of a Program (a HelloWorld.exe file).
Process: the Executable being run by the OS; the one you see in Task Manager or Task List (the HelloWorld.exe process when we double-click it).
A Program, or Computer Program, essentially provides a sequence of instructions (or algorithms, if you prefer) to the operating system or computer. These computer programs come in an executable form that the Operating System recognizes and can use to directly execute the instructions.
Essentially, an Executable is a file in a format that the computer can directly execute, as opposed to source files, which cannot be directly executed and must first be compiled. An executable is the result of a compilation. I mentioned that the operating system recognizes executables; it does so via the file extension. A common extension used for Windows executable files is .exe.
Once an executable has been executed, a process begins. A process is simply an instance of a computer program; you can think of it as the execution of the instructions contained in that program. When you view the Task Manager on a Windows computer you can see all of the current processes. Processes own resources such as virtual memory, operating system descriptors (handles, data sources, sinks, etc.), security attributes and various other elements required to run effectively.
A process is basically a program in execution. Associated with each process is its address space, a list of memory locations from 0 to some maximum, which the process can read and write. The address space contains the executable program, the program’s data, and its stack. Also associated with each process is a set of resources, commonly including registers (including the program counter and stack pointer), a list of open files, outstanding alarms, lists of related processes, and all the other information needed to run the program. A process is fundamentally a container that holds all the information needed to run a program, which is a set of instructions defined by a user/developer.
A program is a set of instructions and a passive entity. A program is part of a process, while a process is the running state of the program and a unit of work in a system.
Program: It is a passive entity, like the contents of a file stored on the hard disk. In other words, it is just like another text file on your disk. Mostly, it will be in human-readable format (e.g. a .java file).
Executable: It is again a passive entity. It is just another file on the disk, derived by compiling the Program. So it is a machine-readable version of the Program file (e.g. a .class file). Note that it is still sitting out there on disk, not currently being executed.
Process: It is the active part of the Program/Executable. A Program/Executable loaded into memory (RAM) and executing is called a Process. A Process consists of a set of instructions, and the CPU executes these instructions one by one (e.g. the JVM loads your .class file and gives instructions to the CPU).
Also, you can have two processes executing the same Program/Executable.
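That last point can be demonstrated directly: one executable on disk can back several processes, each with its own PID. A small Python sketch using the interpreter itself as the executable:

```python
import subprocess
import sys

# The same executable (the Python interpreter) started twice:
# one program on disk, two independent processes in memory.
cmd = [sys.executable, "-c", "import os; print(os.getpid())"]
pid_a = subprocess.run(cmd, capture_output=True, text=True).stdout.strip()
pid_b = subprocess.run(cmd, capture_output=True, text=True).stdout.strip()
print(pid_a, pid_b)  # two different process IDs for the same executable
```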
A program is a collection of source files in some high-level language that you write to do some function, for example, C++ files that implement sorting lists. An executable is the file that the compiler creates from these source files, containing machine instructions that can execute on the CPU. A process is the active execution of the executable on the CPU and in memory. It includes the memory management information, the current PC, SP, HP, registers, etc.
A process is a part of a program; it is the part where the logic of that particular program exists.
A program is given as a set of processes. In some cases we may divide a problem into a number of parts, and at those times we write separate logic for each part, known as a process.
Consider it like this.
A program is a blueprint, like a blueprint for a building. There is no building, only an abstraction of what a building would look like.
A process is the actual construction of the building, built according to the blueprint.
While constructing a building, many things happen at the same time: you are preparing the concrete, building multiple rooms, laying the electrical cables, etc. These would be Threads.
No difference. Remember, there is no spoon.
1. A program is a static entity, but a process is a dynamic entity.
2. A program is nothing but the contents of a file, whereas a process is a program in execution.
3. A program does not use the CPU register set, but a process uses the CPU register set to store intermediate and final results.

Killing a Unix zipping process

I'm using the xz zipping utility on a PBS cluster; I've just realised that the time I've allowed for my zipping jobs won't be long enough, and so would like to restart them (and then, presumably, I'll need to include the .xz that has already been created in the new archive file?). Is it safe to kill the jobs, or is this likely to corrupt the .xz files that have already been created?
I am not sure about the implications of using xz in a cluster, but in general killing an xz process (or any decent compression utility) should only affect the file being compressed at the time the process terminates. More specifically:
Any output files from input files that have already been compressed should not be affected. The resulting .xz compressed files should remain perfectly usable.
Any input files that have not been processed yet should not be altered at all.
The input file that was being compressed at the time of termination should not be affected.
Provided that the process is terminated using the SIGTERM signal, rather than a signal that cannot be caught, like SIGKILL, xz should clean up after itself before exiting. More specifically, it should not leave any partial output files around.
If xz is killed violently, the worst that should (as opposed to might) happen is for a partial compressed file to remain on the disk, right alongside its corresponding input file. You may want to ensure that such files are cleaned up properly; a good way is to have xz work in a separate directory from the actual storage area and move files in and out for compression.
That said, depending on the importance of the compressed data, you may still want to incorporate measures to detect and deal with any corrupt files. There can be a lot of pathological situations where things do not happen as they are supposed to...
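Detecting corrupt files is straightforward here, because the .xz container embeds integrity checks: a truncated or damaged file fails verification (on the command line, `xz -t file.xz` performs exactly this test). A small sketch of the same idea using Python's standard lzma module:

```python
import lzma

original = b"some data worth compressing" * 100
compressed = lzma.compress(original)

# An intact archive round-trips cleanly.
assert lzma.decompress(compressed) == original

# A truncated archive (as might be left behind by a violent kill)
# fails the format's built-in integrity check.
truncated = compressed[: len(compressed) // 2]
try:
    lzma.decompress(truncated)
    print("corruption not detected")
except lzma.LZMAError:
    print("corruption detected")
```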

Is it better to execute a file over the network or copy it locally first?

My winforms app needs to run an executable that's sitting on a share. The exe is about 50MB (it's a setup.exe type of file). My app will run on many different machines/networks with varying speeds (some fast, but some awfully slow, like barely 10baseT speeds).
Is it better to execute the file straight from the share or is it more efficient to copy it locally and then execute it? I am talking in terms of annoying the user the least.
Locally is better. A copy will read each byte of the file a single time, no more, no less. As you execute, you may revisit code that has fallen out of cache, and it gets pulled over the network again.
As a setup program, I would assume that the engine will want to do some kind of CRC or other integrity check too, which means it's reading the entire file anyway.
It is always better to execute it locally than running it over the network.
If your application is small and does not need to load many different resources during runtime, then it is OK to run it over the network. It might even be preferable, because running it over the network means the code is read (downloaded and loaded into memory) once, as opposed to manually downloading the file and then running it, which reads the code twice. For example, you can run a clock-widget application over the network.
On the other hand, if your application does read a lot of resources during runtime, then it is absolutely a bad idea to run it over the network, because each read of a resource will go over the network, which is very slow. For example, you probably don't want to be running Eclipse over the network.
Another factor to take into consideration is how many concurrent users will be accessing the application at the same time. If there are many, you should copy the application locally and run it from there.
I believe the OS always copies the file to a local temp folder before it is actually executed. There are no round trips to and from the network after it gets a copy; it only happens once. This is sort of like how a browser works: it first retrieves the file, saves it locally, then runs it off the local temp location where it saved it. In other words, there is no need to copy it manually unless you want to keep a copy for yourself.
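The copy-then-run approach reduces the network cost to one sequential read, after which every page fault and resource read is local. A minimal sketch of the pattern in Python (the share path is whatever UNC path your app uses; a real installer launcher would also verify the copy and clean up afterwards):

```python
import os
import shutil
import subprocess
import tempfile

def run_from_local_copy(remote_path: str) -> int:
    """Copy an executable from a share to a local temp dir, then run it locally."""
    local_dir = tempfile.mkdtemp()
    local_path = os.path.join(local_dir, os.path.basename(remote_path))
    shutil.copy2(remote_path, local_path)            # one sequential network read
    return subprocess.run([local_path]).returncode   # all further reads are local
```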

Writing to and reading from the same file, at the same time (disk being asynchronous?)

We're creating a web service where we're writing files to disk. Sometimes these files will be read at the same time as they are written.
If we do this - writing and reading from the same file - we sometimes end up with files that are of the same length, but where some of the data inside is not the same. With a 350 MB file we get maybe 20-40 bytes that differ.
This problem mostly occurs when we have 3-4 files being written and read at the same time. Could this be because there is no guarantee that after a "write" to disk the data is actually written, i.e., the disk is asynchronous?
Also, the computer we're testing on is just a standard macbook pro, so no fancy disks of any kind.
The bug might be somewhere else, but we just wanted to ask the question and see if anybody knew something about this writing+reading thing.
All modern OSs support concurrent reading and writing to files (obviously, given a single writer). So this is not an OS level bug. But do make sure you do not have multiple threads/processes trying to append data to the file.
Check your application code. Check the buffers you are using. Make sure your application is synchronized and there are no race conditions between readers and writers.
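If your design permits it, one common way to guarantee that readers never observe a half-written file is to write into a temporary file in the same directory and then atomically rename it over the destination; readers then see either the old complete file or the new complete one, never a mix. A sketch of that pattern in Python (POSIX rename atomicity assumed):

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Write to a temp file in the same directory, then atomically swap it in."""
    dir_name = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ensure the bytes are on disk before the rename
        os.replace(tmp, path)     # atomic on POSIX: readers see old or new, never a mix
    except Exception:
        os.unlink(tmp)
        raise
```

Writing the temp file in the same directory matters: a rename is only atomic within one filesystem.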