size of _POSIX_PATH_MAX - unix

Is the size of _POSIX_PATH_MAX the same for all Unix flavors (Linux, Solaris)?

Strictly speaking, _POSIX_PATH_MAX is a constant in <limits.h>: the smallest value POSIX lets an implementation choose for PATH_MAX. What varies is the actual limit, PATH_MAX, and no, it's not even necessarily the same for given instances of the exact same version of the kernel. In most kernels it's a configurable parameter. It will often require a kernel recompile or relink to change, but it can change without having a whole new kernel.
On some (I think most nowadays) systems that macro doesn't expand to an integer literal; it expands to a call that returns an integer. So if the kernel allows the system to be reconfigured at runtime, it will return the current value of the parameter.
I would simply assume that it can't change during the lifetime of your program. If you assume it can change at any time you end up with race conditions where the value changes in between the time you read it and the time you use it. If you just explicitly state that your program assumes it never changes during the lifetime of the program, then system admins who run it will have to adopt the practice they should be adopting anyway and only change the kernel parameter at startup.
There are three POSIX specified calls that will interest you here:
pathconf and fpathconf
sysconf
I would recommend hunting down other sources as well to get a good feel for which variables are widely supported and which aren't.

How can the processor discern a far return from a near return?

Reading Intel's big manual, I see that if you want to return from a far call, that is, a call to a procedure in another code segment, you simply issue a return instruction (possibly with an immediate argument that moves the stack pointer up n bytes after the pointer popping).
This, apparently, if I'm interpreting things correctly, is enough for the hardware to pop both the segment selector and offset into the correct registers.
But, how does the system know that the return should be a far return and that both an offset AND a selector need to be popped?
If the hardware just pops the offset pointer and not the selector after it, then you'll be pointing to the right offset but wrong segment.
There is nothing special about the far return command compared to the near return version.
They both look identical as far as I can tell.
I assume then that the processor, perhaps at the micro-architecture level, keeps track of which calls are far and which are near so that when they're returned from, the system knows how many bytes to pop and where to pop them (pointer registers and segment selector registers).
Is my assumption correct?
What do you guys know about this mechanism?
The processor doesn't track whether or not a call should be far or near; the compiler decides how to encode the function call and return using either far or near opcodes.
As it is, FAR calls have no use on modern processors because you don't need to change any segment register values; that's the point of a flat memory model. Segment registers still exist, but the OS sets them up with base=0 and limit=0xffffffff so just a plain 32-bit pointer can access all memory. Everything is NEAR, if you need to put a name on it.
Normally you don't even think about segmentation, so you don't call anything NEAR either. But the manual still describes the call/ret opcodes we use for normal code as the NEAR versions.
FAR and NEAR were used on old x86 processors, which used a segmented memory model. Programs at that time needed to choose which memory model they wished to use, ranging from "tiny" to "large". If your program was small enough to fit in a single segment, it could be compiled using NEAR calls and returns exclusively. If it was "large", the opposite was true. For anything in between, you had the power to choose whether individual functions needed to be callable from (and able to return to) code in another segment.
Most modern programs (besides bootloaders and the like) run on a different construct: they expect a flat memory model. Behind the scenes the OS will swap out memory as needed (with paging not segmentation), but as far as the program is concerned, it has its virtual address space all to itself.
But, to answer your question, the difference in the call/return is the opcode used; the processor obeys the command given to it. If you get it wrong (say, you give it a FAR return opcode when in flat mode), it'll fail.
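To show that the distinction really is in the encoding, here is a small sketch that classifies the first opcode byte of a return instruction. The byte values are the ones listed for RET in Intel's instruction set reference; the enum and function names are made up for illustration.

```c
#include <stdint.h>

typedef enum { RET_OTHER, RET_NEAR, RET_FAR } ret_kind;

/* Classify an x86 opcode byte as a near return, a far return, or neither. */
ret_kind classify_ret(uint8_t opcode)
{
    switch (opcode) {
    case 0xC3:              /* RET: pop the offset only */
    case 0xC2:              /* RET imm16: pop the offset, then release imm16 bytes */
        return RET_NEAR;
    case 0xCB:              /* far RET: pop the offset and the CS selector */
    case 0xCA:              /* far RET imm16: same, then release imm16 bytes */
        return RET_FAR;
    default:
        return RET_OTHER;   /* not one of the four RET encodings */
    }
}
```

Assemblers usually spell the far form as retf (or lret in AT&T syntax), which is why the two can look like "the same instruction" in the manual even though the bytes differ.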

Asterisk pre-emption and callers in a channel

I would like to have pre-emption calls in Asterisk. I think there is no Asterisk support for this feature, so I'm trying to implement it following an algorithm similar to the one shown in this thread: Asterisk - Pre-emption calls
So I'm having problems in this step:
check if B in call with lower priority caller( ASTDB or REALTIME or fastagi script).
I know how to check whether B is in a call, for example with the DEVICE_STATE(device) function, but I can't find out who the other party is in order to look up their priority.
So, how can I tell whether a user is in a call, and who the other party in that call is?
Thanks a lot.
You can read variables of any channel using
SHARED(varname[,channel])
-= Info about function 'SHARED' =-
[Synopsis]
Gets or sets the shared variable specified.
[Description]
Implements a shared variable area, in which you may share variables between
channels.
The variables used in this space are separate from the general namespace
of the channel and thus ${SHARED(foo)} and ${foo} represent two completely
different variables, despite sharing the same name.
Finally, realize that there is an inherent race between channels operating
at the same time, fiddling with each others' internal variables, which is
why this special variable namespace exists; it is to remind you that variables
in the SHARED namespace may change at any time, without warning. You should
therefore take special care to ensure that when using the SHARED namespace,
you retrieve the variable and store it in a regular channel variable before
using it in a set of calculations (or you might be surprised by the
result).
Of course, you have to set the variables first.
You can store the name of the channel currently in the call, either in a variable or in ASTDB, using an in-call macro.
The overall complexity of a solution like the one you want is above average; it needs someone with at least 1-2 years of extensive Asterisk experience.

How to write with a single node in MPI

I want to implement some file io with the routines provided by MPI (in particular Open MPI).
Due to possible limitations of the environment, I wondered, if it is possible to limit the nodes, which are responsible for IO, so that all other nodes are required to perform a hidden mpi_send to this group of processes, to actually write the data. This would be nice in cases, where e.g. the master node is placed on a node with high-performance filesystem and the other nodes have only access to a low-performance filesystem, where the binaries are stored.
Actually, I already found some information, which might be helpful, but I couldn't find further information, how to actually implement these things:
1: There is an info key MPI_IO belonging to the communicator, which tells which ranks provide standard-conforming IO-routines. As this is listed as an environmental inquiry, I don't see, where I could modify this.
2: There is an info key io_nodes_list which seems to belong to file-related info-objects. Unfortunately, the possible values for this key are not documented and Open MPI doesn't seem to implement them in any way. Actually, I can't even get the filename from the info-object which is returned by mpi_file_get_info...
As a workaround, I could imagine two things: On the one hand, I could perform the IO with standard Fortran routines, or on the other hand, create a new communicator, which is responsible for IO. But in both cases, the processes, which are responsible for IO have to check for possible IO from the other processes to perform manual communication and file interaction.
Is there a nice and automatic way to restrict the IO to certain nodes? If yes, how could I implement this?
You explicitly asked about OpenMPI, but there are two MPI-IO implementations in OpenMPI. The old workhorse is ROMIO, the MPI-IO implementation shared among just about every MPI implementation. OpenMPI also has OMPIO, but I don't know a whole lot about tuning that one.
Next, if you want things to happen automatically for you, you'll have to use collective i/o. The independent I/O routines cannot send a message to anyone else -- they are independent and there's no way to know if the other side will be listening.
With those preliminaries out of the way...
You are asking about "I/O aggregation". There is a bit of information here, in the context of another optimization called "deferred open" (which OMPIO calls Lazy Open):
https://press3.mcs.anl.gov/romio/2003/08/05/deferred-open/
In short, you can definitely say "only these N processes should do I/O", and then the collective I/O library will exchange data and make sure that happens. The optimization was developed some 15-odd years ago for just the situation you proposed: some nodes being better connected to storage than others (as was the case on the old ASCI Red machine, to give you a sense for how old this optimization is...)
I don't know where you got io_nodes_list. You probably want to use the MPI-IO info keys cb_config_list and cb_nodes.
So, you've got a cluster with master1, master2, master3, and compute1, compute2, compute3 (or whatever the hostnames actually are). You can do something like this (in C, sorry; I'm not proficient in Fortran):
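MPI_Info info;
MPI_File fh;
/* Name the hosts whose processes should act as I/O aggregators. */
MPI_Info_create(&info);
MPI_Info_set(info, "cb_config_list", "master1:1,master2:1,master3:1");
MPI_File_open(MPI_COMM_WORLD, filename, MPI_MODE_CREATE|MPI_MODE_WRONLY, info, &fh);
MPI_Info_free(&info);  /* the open file keeps its own copy of the hints */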
With these hints, MPI_File_write_all will aggregate all the I/O through the MPI processes on master1, master2, and master3. ROMIO won't blow up your memory because it will chunk up the I/O into a smaller working set (specified with the "cb_buffer_size" hint: cranking this up, if you have the memory, is a good way to get better performance).
There is a ton of information about the hints you can set in the ROMIO users guide:
http://www.mcs.anl.gov/research/projects/romio/doc/users-guide/node6.html

Why does System V shared memory have separate get and attach functions?

Using System V shared memory IPC requires calls to the following two functions:
int shmget(key_t key, size_t size, int shmflg);
void *shmat(int shmid, const void *shmaddr, int shmflg);
Why are they designed to be separate, instead of having a single function that accepts these arguments, performs both functions and simply returns the address?
We can consider files as an analogy. open on a string (the file path) gives us a file descriptor, and we use that to read/write from the file. We close on the file descriptor when we're done. This design seems natural, we don't have to open with a string to get a descriptor, and then attach to the descriptor.
As an example of what I have in mind, take a look at the FreeBSD sendmail shared memory implementation.
This kind of separation (shm_open and mmap) also exists with POSIX shared memory, but the reason was that mmap existed before shm_open was implemented and could be reused, and mmap requires a descriptor (source: UNIX Network Programming Vol. 2, R. Stevens, chapter 13, page 326).
Shared memory is probably one of the fastest forms of IPC because data need not be copied. The problem associated with it is synchronizing access between multiple processes. You could do this using semaphores or record locks. On Unix we often end up using the latter for shared memory: they are not as efficient, but they are simple, the system cleans up after them well, and you don't need some of the bling that semaphores bring along.
Let's look into how these work to understand why they are implemented as such.
In comes the shmid_ds structure used by the Linux kernel (http://www.tldp.org/LDP/lpg/node68.html).
Its shm_nattch member is the counter of current attaches. shmget gets you a shm id and sets up things like the ipc_perm, the timestamps (atime, ctime), the pids, and the requested segment size (shm_segsz).
Next, shmctl kicks in and does the housekeeping through IPC_STAT, IPC_SET and IPC_RMID: reading or setting metadata such as the permissions, removing the segment behind a shm id, or even locking and unlocking it.
Once the segment is ready, a process uses shmat to attach it to its address space, at a location that depends on the flags and address parameters. Once it attaches, the kernel increments shm_nattch. To detach, we call shmdt. Removal of the identifier and the associated data structure is not automatic: some process has to do it by calling shmctl with IPC_RMID, subject to the permissions in shm_perm.
As you can see this is all very similar to how one would use semaphores and the implementation makes sense.
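The lifecycle described above can be sketched in a few lines. This is only an illustration under assumptions of my own: the function name attach_count_demo is made up, the segment is private (IPC_PRIVATE) and 4096 bytes, and error handling is kept minimal. It creates one segment, attaches it twice, and asks the kernel for shm_nattch.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical demo: returns the attach count seen by the kernel after
 * attaching the same segment twice, or -1 on any failure. */
long attach_count_demo(void)
{
    long result = -1;
    struct shmid_ds ds;

    /* Step 1: shmget only creates the segment and hands back its id. */
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    /* Step 2: each shmat maps the same segment again and bumps shm_nattch. */
    void *a = shmat(id, NULL, 0);
    void *b = shmat(id, NULL, 0);

    /* Step 3: IPC_STAT copies the kernel's bookkeeping into ds. */
    if (a != (void *)-1 && b != (void *)-1 &&
        shmctl(id, IPC_STAT, &ds) == 0)
        result = (long)ds.shm_nattch;

    /* Step 4: detach, then remove explicitly; removal is never automatic. */
    if (a != (void *)-1) shmdt(a);
    if (b != (void *)-1) shmdt(b);
    shmctl(id, IPC_RMID, NULL);
    return result;
}
```

With a single combined "get and attach" call there would be no id to hand to shmctl, and no way to attach the same segment a second time without creating or looking it up again.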
One possible reason I could think of is this:
(From the manpage of shmget)
After a fork(2) the child inherits the attached shared memory segments.
After an execve(2) all attached shared memory segments are detached from the process.
Upon _exit(2) all attached shared memory segments are detached from the process.
Well, technically attaching and detaching is basic reference counting on the shared memory segment that is reserved during shmget.
The functions of allocating the shared memory segment (via shmget) and reference counting it (up or down, via shmat and shmdt respectively) are separate so that code can be reused during fork and exec.
If they were both packed into the same function, you would anyways need a separate function, which just does reference counting (to be invoked during fork/exec). So, I think this design is simply to promote code reuse, and avoid code duplication.

Are there any reasons why one should use MPI's Wtime?

I've been wondering whether there are any particular reasons why one should use Wtime instead of other time measurement methods? Is it more accurate or reliable?
The only reason I see is platform independence.
Since MPI_Wtime() guarantees that the beginning time at all ranks is the same, it can be used not only to measure the time between any two points on the same rank, but also to conveniently compare the time taken by different ranks to reach a certain point.
There may be other applications for this globally synced clock too, but right now I can think only of this one.
MPI_Wtime() does not guarantee global synchronization among processes on different nodes (the MPI_WTIME_IS_GLOBAL attribute tells you whether the clocks are synchronized). It does provide a synchronized clock for processes on the same node, but so does gettimeofday().
According to the manual for MPI_Wtime (Open MPI 4.0.0):
On POSIX platforms, this function may utilize a timer that is cheaper to invoke than the gettimeofday() system call, but will fall back to gettimeofday() if a cheap high-resolution timer is not available. The ompi_info command can be consulted to see if Open MPI supports a native high-resolution timer on your platform; see the value for "MPI_WTIME support" (or "options:mpi-wtime" when viewing the parsable output). If this value is "native", a method that is likely to be cheaper than gettimeofday() will be used to obtain the time when MPI_Wtime is invoked.
