What are the differences between a Program, an Executable, and a Process? - unix

In simple words:
Program: A program is a set of instructions in human-readable form (HelloWorld.c).
Executable: An executable is the compiled form of a program (HelloWorld.exe).
Process: A process is the executable being run by the OS. It is what you see in Task Manager or the task list (the HelloWorld.exe process that starts when we double-click the file).

A Program or Computer Program essentially provides a sequence of instructions (or algorithms, if you prefer) to the operating system or computer. These computer programs come in an executable form that the Operating System recognizes and can use to directly execute the instructions.
Essentially, an Executable is a file in a format that the computer can directly execute, as opposed to source files, which cannot be directly executed and must first be compiled. An executable is the result of a compilation. I mentioned that the operating system recognizes executables; on Windows it does so via the extension. A common extension used for Windows executable files is .exe. On Unix-like systems, the execute permission bit and the file's format matter rather than the extension.
Once an executable has been executed, a process begins. A process is simply an instance of a computer program. You can think of a process as the execution of the instructions contained in a computer program. When you view the Task Manager on a Windows computer you can see all of the current processes. Processes own resources such as virtual memory, operating system descriptors (handles, data sources, sinks, etc.), security attributes and various other elements required to run.

A process is basically a program in execution. Associated with each process is its address space, a list of memory locations from 0 to some maximum, which the process can read and write. The address space contains the executable program, the program’s data, and its stack. Also associated with each process is a set of resources, commonly including registers (including the program counter and stack pointer), a list of open files, outstanding alarms, lists of related processes, and all the other information needed to run the program. A process is fundamentally a container that holds all the information needed to run a program, which is a set of instructions defined by a user/developer.

A program is a set of instructions and a passive entity. A program is part of a process, while a process is the running state of the program and a unit of work in the system.

Program: It is a passive entity, like the contents of a file stored on the hard disk. In other words, it is just another text file on your disk. Mostly, it will be in human-readable format (e.g. a .java file).
Executable: It is again a passive entity. It is just another file on the disk, derived by compiling the program. So it is a machine-readable version of the program file (e.g. a .class file). Note that it is still sitting on disk and is not currently being executed.
Process: It is the active part of the program/executable. A program/executable loaded into memory (RAM) and executing is called a process. A process consists of a set of instructions, which the CPU executes one by one (e.g. the JVM loads your .class file and gives instructions to the CPU).
Also, you can have two processes executing the same program/executable.
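As a rough illustration of that last point, the following Python sketch starts the same executable twice and collects the process ID each child reports. Using the Python interpreter (sys.executable) as the shared executable, and the helper name spawn_two_processes, are assumptions made for this example only.

```python
import subprocess
import sys


def spawn_two_processes():
    """Start the same executable twice and return the PID each child reports."""
    # Hypothetical child program: it only prints its own process ID.
    child_code = "import os; print(os.getpid())"
    # Both children are launched from one and the same executable file.
    children = [
        subprocess.Popen(
            [sys.executable, "-c", child_code],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(2)
    ]
    # communicate() waits for the child to exit and returns (stdout, stderr).
    return [int(child.communicate()[0]) for child in children]


if __name__ == "__main__":
    print(spawn_two_processes())
```

Each run should print two different PIDs: one executable on disk, two separate processes.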

A program is a collection of source files in some high-level language that you write to perform some function, for example, C++ files that implement sorting lists. An executable is the file that the compiler creates from these source files, containing machine instructions that can execute on the CPU. A process is the active execution of the executable on the CPU and in the memory. It includes the memory management information, the current PC, SP, HP, registers, etc.

A process is a part of a program. A process is the part where the logic of that particular program exists.
A program is given as a set of processes. In some cases we may divide a problem into a number of parts; we then write separate logic for each part, known as a process.

Consider it like this.
A program is a blueprint, like a blueprint for a building. There is no building, only an abstraction of what a building would look like.
A process is the actual construction of the building, carried out according to the blueprint.
While constructing a building, there are many things happening at the same time. You are preparing the concrete, building multiple rooms at the same time, laying the electrical cables, etc. These would be threads.

No difference. Remember, there is no spoon.

1. A program is a static entity, but a process is a dynamic entity.
2. A program is nothing but the contents of a file, whereas a process is a program in execution.
3. A program does not use the CPU register set, but a process uses the CPU register set to store intermediate and final results.

Related

What are the pros and cons of locking the actual file vs an empty lock file?

My program is writing to a binary file, and there could be multiple instances of the program accessing the same binary file for the same user. In Unix/Linux, I see some programs (particularly daemon processes) locking an empty lock file instead of the actual shared data that needs to be locked (so instead of locking ~/.data/foo they lock ~/.data/foo.lck). What are the pros and cons of locking the actual file vs an empty lock file?
flock is not supported over NFS or other network file systems on every version of Unix (it wasn't even supported by Linux until 2.6.12). On the other hand, O_CREAT|O_EXCL is much more reliable over many more file systems, and has been so for much longer.
Even on systems that do support flock on network filesystems (or cases where you don't need that flexibility), O_CREAT|O_EXCL together with flock is very useful because it distinguishes between a clean shutdown and a non-clean shutdown. flock helpfully goes away automatically, but it also, unhelpfully, doesn't distinguish why it went away.
flocking the file itself prevents atomic writes (copy, erase old, rename), or any other case where you might erase the existing file. "The actual file" doesn't always have the same inode over the entire run of the program. So a separate file is much more convenient in those cases as well. This is very common in those foo.lck cases, because often you're locking foo for a short period of time, and might erase it in the process.
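A minimal Python sketch of combining O_CREAT|O_EXCL with flock on a separate lock file might look like the following. It assumes a Unix-like system (the fcntl module) and a local file system; the function names acquire_lock and release_lock and the "leftover" flag are hypothetical names chosen for this illustration. The idea is that a lock file that already exists but is not flocked suggests the previous holder did not shut down cleanly.

```python
import fcntl
import os


def acquire_lock(lock_path):
    """Return (fd, leftover). Raises BlockingIOError if another holder is alive.

    leftover is True when the lock file already existed but nobody held the
    flock, which hints that the previous holder exited without cleaning up.
    """
    while True:
        try:
            # O_EXCL makes creation fail if the lock file is already there.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o644)
            leftover = False
        except FileExistsError:
            try:
                fd = os.open(lock_path, os.O_RDWR)
            except FileNotFoundError:
                continue  # the holder removed the file in between; try again
            leftover = True
        try:
            # Non-blocking exclusive lock; fails if a live process holds it.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)
            raise
        # Guard against locking a file that was unlinked after we opened it.
        try:
            same_file = os.stat(lock_path).st_ino == os.fstat(fd).st_ino
        except FileNotFoundError:
            same_file = False
        if same_file:
            return fd, leftover
        os.close(fd)


def release_lock(fd, lock_path):
    """Clean shutdown: remove the lock file, then drop the flock."""
    os.unlink(lock_path)
    os.close(fd)  # closing the descriptor releases the flock
```

A caller would wrap its work on the data file between acquire_lock and release_lock, and could run a recovery step when leftover is True. This is only a sketch: a process that loses the race between creating the file and flocking it gets BlockingIOError, and the winner may then see leftover as True even though nothing crashed.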
I see three cons of an empty lock file:
The user permissions of the directory should allow you to create a file.
In case of disk space issues, this might fail.
In case your program crashes, the lockfile is still present.
I see one con of modifying the actual file's name:
In case your program crashes, your file has been altered (only the filename, but it might generate confusion).
Obviously, I see one big advantage of the empty lock file:
your original file does not change at all.
By the way, I believe this question is better suited for the SoftwareEngineering community.

Is there a way to automatically make a copy of a file each time it is updated in Unix?

I have an application that updates some files in Unix server. Since I cannot modify this application, is there any way I can make sure that these files are copied before each update so I can have a history of the changes?
Is there a way/tool in Unix so I can do that?
If on Linux (specifically) you could use inotify(7) facilities (perhaps via incrontab ...)
Alternatively, you might periodically run (through some crontab(5) entry) a script that runs make with a dedicated Makefile (since GNU make is designed to care about timestamps) to manage e.g. backups. Or you could periodically run some rsync command.
However, it smells like you need some revision control (also known as version control system). I strongly recommend git; you could use it before and after running your application (e.g. write some wrapping shell script doing that).
But there is probably no universal solution (e.g. what if the monitored application keeps a file descriptor open for a long time and writes the file little by little...). You should explain in much more detail what is happening and what you want.
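The periodic approach could be sketched in Python roughly as follows. This is not inotify and not git; it is a plain polling helper meant to be called from a cron job or a loop, and it copies the file after it notices a change rather than before the update. The function name snapshot_if_changed and the history-directory naming scheme are assumptions for illustration.

```python
import shutil
from pathlib import Path


def snapshot_if_changed(src, history_dir, last_mtime_ns=None):
    """Copy src into history_dir if its modification time changed.

    Returns (mtime_ns, copy_path); copy_path is None when nothing was copied.
    """
    src = Path(src)
    mtime_ns = src.stat().st_mtime_ns
    if mtime_ns == last_mtime_ns:
        return last_mtime_ns, None
    history_dir = Path(history_dir)
    history_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: original name plus the modification time.
    copy_path = history_dir / f"{src.name}.{mtime_ns}"
    shutil.copy2(src, copy_path)  # copy2 also preserves timestamps
    return mtime_ns, copy_path
```

The caller keeps the returned mtime_ns and passes it back on the next run. As noted above, a file that is being written little by little may be copied in a half-written state, so this only gives a best-effort history.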

MPI parallel write to a TIFF file

I'm trying to write a TIFF file in MPI code. Different processors have different parts of the image, and I want to write the image to the file in parallel.
The write fails; only the 1st processor can write to it.
How do I do this?
My implementation reports no error; it just does not work.
I used h=TIFFOpen(file, "a+") on each processor to open the same file (I am not sure whether this is the right way or not). Each processor that is responsible for a directory then writes the header at its own position using TIFFSetDirectory(h, directorynumber), and then the content of each directory is written. I finalize with TIFFWriteDirectory(h). The result is that only the first directory is written to the file.
I thought I might need to open the file using MPI_IO, but then it would not go through TIFFOpen?
Different MPI tasks are independent programs, running on independent hosts from the OS point of view. In your case the TIFF library is not designed to handle parallel operations, so opening the file will succeed for the first process and fail for all the rest because they find the file already opened (on a shared filesystem).
Unless you are dealing with huge images (e.g. astronomical images), where parallel I/O is important for performance (you need a filesystem supporting it, however... I am aware of IBM GPFS), I would avoid writing a custom TIFF driver with MPI_IO.
Instead, the typical solution is to gather (MPI_Gather()) the image parts on the process with rank==0 and let only that process save the TIFF file.

Is it better to execute a file over the network or copy it locally first?

My winforms app needs to run an executable that's sitting on a share. The exe is about 50MB (it's a setup.exe type of file). My app will run on many different machines/networks with varying speeds (some fast, but some awfully slow, like barely 10baseT speeds).
Is it better to execute the file straight from the share or is it more efficient to copy it locally and then execute it? I am talking in terms of annoying the user the least.
Locally is better. A copy will read each byte of the file a single time, no more, no less. As you execute, you may revisit code that is out of cache, etc and gets pulled again.
As a setup program, I would assume that the engine will want to do some kind of CRC or other integrity check too, which means it's reading the entire file anyway.
It is always better to execute it locally than running it over the network.
If your application is small and does not need to load many different resources at runtime, then it is OK to run it over the network. It might even be preferable, because when you run it over the network the code is read once (downloaded and loaded into memory), as opposed to manually downloading the file and then running it, which reads the code twice. For example, you can run a clock widget application over the network.
On the other hand, if your application reads a lot of resources at runtime, then it is absolutely a bad idea to run it over the network, because each read of a resource will go over the network, which is very slow. For example, you probably don't want to be running Eclipse over the network.
Another factor to take into consideration is how many concurrent users will be accessing the application at the same time. If there are many, you should copy the application locally and run it from there.
I believe the OS always copies the file to a local temp folder before it is actually executed. There are no round trips to or from the network after it gets a copy; that happens only once. This is sort of like how a browser works: it first retrieves the file, saves it locally, then runs it from the local temp location where it saved it. In other words, there is no need to copy it manually unless you want to keep a copy for yourself.

Strategy for handling user input as files

I'm creating a script to process files provided to us by our users. Everything happens within the same UNIX system (running on Solaris 10).
Right now our design is this:
1. User places file into upload directory.
2. Script placed on cron to run every 10 minutes.
3. Script looks for files in upload directory, processes them, deletes immediately afterward.
For historical/legacy reasons, #1 can't change. Also, deleting the file after processing is a requirement.
My primary concern is concurrency. It is very likely that the situation will arise where the analysis script runs while an input file is still being written to. In this case, data will be lost, and this is (obviously) unacceptable.
Since we have no control over the user's chosen means of placing the input file, we cannot require them to obtain a file lock. As I understand, file locks are advisory only on UNIX. Therefore a user must choose to adhere to them.
I am looking for advice on best practices for handling this problem. Thanks
Obviously all the best solutions involve the client providing some kind of trigger indicating that it has finished uploading. That could be a second file, an atomic move of the file to a processing directory after writing it to a stage directory, or a REST web service. I will assume you have no control over your clients and are unable or unwilling to change anything about them.
In that case, you still have a few options:
You can use a pretty simple heuristic: check the file size, wait 5 seconds, check the file size. If it didn't change, it's probably good to go.
If you have super-user privileges, you can use lsof to determine if anyone has this file open for writing.
If you have access to the thing that handles upload (HTTP, FTP, a setuid script that copies files?) you can put triggers in there of course.
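The file-size heuristic from the first option could be sketched in Python like this. The function name looks_finished and the injectable sleep parameter (there only to make the helper easy to try out) are assumptions for this example; the 5 second default comes from the suggestion above.

```python
import os
import time


def looks_finished(path, wait_seconds=5.0, sleep=time.sleep):
    """Return True if the file size did not change across the wait.

    A heuristic only: a slow or stalled upload can still look finished.
    """
    size_before = os.stat(path).st_size
    sleep(wait_seconds)
    size_after = os.stat(path).st_size
    return size_before == size_after
```

The cron script would call looks_finished on each file in the upload directory and skip any file for which it returns False, leaving it for the next run.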

Resources