Unix architecture - Does a daemon process run at kernel level or user level? - unix

At what level do daemon processes such as init, httpd, ftpd and dhcpd run? Is it kernel level, or user level like the shell, library functions and applications?
I have read several Unix books and internet articles, but none of them mention where daemons run.

They run in user space, although some of them run with root privileges. There is no requirement for a daemon (in general) to run in kernel space. Kernel space is reserved for tasks that handle the lowest level of interaction with the hardware (drivers) and that back the vital functions of the OS (memory management, file systems, etc.).

Related

Running performance tools using RBAC

I recently started a job that will involve a lot of performance tweaking.
I was wondering whether tools like eBPF and perf can be used with RBAC, or whether full root access will be required. Getting root access might be difficult. We're mainly using fairly old Linux machines (RHEL 6.5), and I'm not too familiar with RBAC. At home I have used DTrace on Solaris, macOS and FreeBSD, but there I have the root password.
Red Hat lists several profiling and tracing solutions for RHEL 6, including perf, in its Performance Tuning Guide and Developer Guide:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-analyzperf-perf
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/developer_guide/perf-using
Chapter 3, "Monitoring and Analyzing System Performance", of the Performance Tuning Guide mentions several tools: GNOME System Monitor, KDE System Guard, Performance Co-Pilot (PCP), top/ps/vmstat/sar, tuned and ktune, MRG Tuna, and the application profilers SystemTap, OProfile, Valgrind (which is not a real profiler but a CPU emulator with instruction and cache event counting) and perf.
Chapter 5, "Profiling", of the Developer Guide lists Valgrind, OProfile, SystemTap, perf and ftrace.
Usually, profiling the kernel or the whole system is allowed only for root or for a user with the CAP_SYS_ADMIN capability. Some profiling is also restricted by the following sysctl variables.
kernel.perf_event_paranoid (documented in https://www.kernel.org/doc/Documentation/sysctl/kernel.txt):
perf_event_paranoid:
Controls use of the performance events system by unprivileged
users (without CAP_SYS_ADMIN). The default value is 2.
 -1: Allow use of (almost) all events by all users
     Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
     Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>=1: Disallow CPU event access by users without CAP_SYS_ADMIN
>=2: Disallow kernel profiling by users without CAP_SYS_ADMIN
kernel.kptr_restrict (https://www.kernel.org/doc/Documentation/sysctl/kernel.txt), which also changes perf's ability to profile the kernel:
kptr_restrict:
This toggle indicates whether restrictions are placed on
exposing kernel addresses via /proc and other interfaces.
More recent versions of Ubuntu and RHEL (7.4) also have kernel.yama.ptrace_scope: http://security-plus-data-science.blogspot.com/2017/09/some-security-updates-in-rhel-74.html
... use kernel.yama.ptrace_scope to set who can ptrace. The different
values have the following meaning:
# 0 - Default attach security permissions.
# 1 - Restricted attach. Only child processes plus normal permissions.
# 2 - Admin-only attach. Only executables with CAP_SYS_PTRACE.
# 3 - No attach. No process may call ptrace at all. Irrevocable until next boot.
You can temporarily set it like this:
echo 2 > /proc/sys/kernel/yama/ptrace_scope
To profile a program you need to be allowed to debug it, for example by attaching with gdb or strace (the ptrace capability). I don't know RHEL or its RBAC, so you should check what is available to you. Generally, perf profiling of your own user-space programs on software events is available in more cases. Access to per-process CPU hardware counters, profiling of other users' programs, and profiling of the kernel are more restricted. I would expect a correctly configured RBAC not to allow you, or even root, to profile the kernel, because perf can inject tracing probes and leak information from the kernel or from other users.
Qeole says in a comment that eBPF is not implemented for RHEL6 (it was added in RHEL7.6, and XDP, the eXpress Data Path, in RHEL8), so you can only try ftrace for tracing or stap (SystemTap) for advanced tracing.
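As a rough illustration of the sysctl settings discussed above, here is a small Python sketch that reads the three values from /proc/sys and lists what the quoted documentation says a user without CAP_SYS_ADMIN cannot do at a given perf_event_paranoid level. The function names (read_sysctl, blocked_for_unprivileged) are made up for this example, and the table only mirrors the kernel.txt excerpt quoted here; the levels on an older kernel may differ, so treat the output as a hint rather than a verdict.

```python
from pathlib import Path
from typing import List, Optional

# Restrictions as listed in the kernel.txt excerpt quoted in this answer.
# Each entry is (lowest level at which it applies, what is disallowed).
RESTRICTIONS = [
    (0, "ftrace function tracepoints and raw tracepoint access"),
    (1, "CPU event access"),
    (2, "kernel profiling"),
]


def blocked_for_unprivileged(level: int) -> List[str]:
    """Return what a user without CAP_SYS_ADMIN cannot do at this level."""
    return [what for minimum, what in RESTRICTIONS if level >= minimum]


def read_sysctl(name: str, proc_sys: str = "/proc/sys") -> Optional[str]:
    """Read a sysctl such as 'kernel.perf_event_paranoid'; None if unavailable."""
    path = Path(proc_sys, *name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None  # sysctl not present on this kernel, or not readable


if __name__ == "__main__":
    for name in ("kernel.perf_event_paranoid",
                 "kernel.kptr_restrict",
                 "kernel.yama.ptrace_scope"):
        print(name, "=", read_sysctl(name))
    raw = read_sysctl("kernel.perf_event_paranoid")
    if raw is not None:
        print("blocked without CAP_SYS_ADMIN:",
              blocked_for_unprivileged(int(raw)))
```

Reading these files needs no special privileges, so it is a cheap first check before asking an administrator for anything.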

How to see the process table in unix?

What is the UNIX command to see the process table? Remember that the table contains:
process status
pointers
process size
user ids
process ids
event descriptors
priority
etc
The "process table" as such lives in the kernel's memory. Some systems (such as AIX, Solaris and Linux--which is not "unix") have a /proc filesystem which makes those tables visible to ordinary programs. Without that, programs such as ps (on very old systems such as SunOS 4) required elevated privileges to read the /dev/kmem (kernel memory) special device, as well as having detailed knowledge about the kernel memory layout.
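To make the /proc idea concrete, here is a minimal Python sketch that walks a Linux-style /proc and prints a few per-process fields. It assumes the Linux layout, where /proc/<pid>/status is a text file of "Key: value" lines (other systems expose /proc differently), and the helper names parse_status and process_table are invented for this example.

```python
import os


def parse_status(text):
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def process_table(proc="/proc"):
    """Yield (pid, fields) for every numeric directory under proc."""
    pids = sorted(int(e) for e in os.listdir(proc) if e.isdigit())
    for pid in pids:
        try:
            path = os.path.join(proc, str(pid), "status")
            with open(path, errors="replace") as f:
                yield pid, parse_status(f.read())
        except OSError:
            continue  # the process exited while we were scanning


if __name__ == "__main__":
    if os.path.isdir("/proc/self"):
        for pid, st in process_table():
            # Kernel threads have no VmSize line, hence the default.
            print(pid, st.get("PPid"), st.get("Uid"), st.get("State"),
                  st.get("VmSize", "-"), st.get("Name"))
    else:
        print("no Linux-style /proc on this system")
```

This is roughly the information ps collects on such systems, without needing access to /dev/kmem.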
Your question is open-ended, and the answer to any specific question you may have can be looked up in the man pages, as #Alfasin suggests in his answer. A lot depends on what you are trying to do.
As #ThomasDickey points out in his response, in UNIX and most of its derivatives the command for viewing processes running in the background or foreground is the ps command.
ps stands for 'process status', which answers your first bullet item. The command accepts over 30 options, and depending on what information you seek and on the permissions granted to you by the system administrator, you can get various types of information from it.
For example, for the second bullet item on your list, depending on what you are looking for, you can get information on three different types of pointers: the session pointer (with option 'sess'), the terminal session pointer ('tsess') and the process pointer ('uprocp').
The rest of your items that you have listed are mostly available as standard output of the command.
Some UNIX variants implement a view of the system process table inside the file system to support programs such as ps. This is normally mounted on /proc (see #ThomasDickey's response above).
Typical reasons for understanding how the command works include system-administration responsibilities such as tracking the origin of processes, killing runaway or orphaned processes, examining the size of a process and setting limits where necessary, etc. UNIX developers can also use it in conjunction with IPC features. An understanding of the process table and process status will also help with associated UNIX features such as the kvm interface, which is used to examine a crash dump or to get or set the kernel state.
Hope this helps

Single node, multiple MPI tasks

I need to debug an MPI code for which I only have access to a single node/machine. The problem is that the bug I am looking for only arises when running on more than one node; when running, for example, two MPI tasks on the same node, everything goes fine. I assume that my MPI implementation (MVAPICH2) cleverly treats tasks running on the same node by, for example, replacing network communications with IPC strategies or even memcpy.
So my question is: how could I run two MPI tasks on a single node while making MPI treat them as tasks on different nodes? Is that possible?
You can disable the MVAPICH2 shared memory device by setting the MV2_USE_SHARED_MEM environment variable to 0:
mpiexec ... -env MV2_USE_SHARED_MEM 0 ... ./executable
Make sure that your MVAPICH2 was built with the TCP/IP device, otherwise your ranks won't be able to communicate with shared memory support turned off.

System processes in Unix

One book on Unix programming says
The init process never dies. It is a normal user process, not a system process within the kernel, like the swapper, although it does run with superuser privileges.
What makes a process a system process? Is the system process embedded within the kernel code? Do all system processes run with superuser privileges?
The book probably refers to processes that run entirely in kernel mode. In some versions of Unix, there isn't any actual executable file that implements these processes - the kernel "fakes" an entry in the process (and/or thread) list, just so it has something to schedule and something to account CPU time to. In other implementations, there is an executable, but it invokes one system call that never returns.
IOW, it's your first interpretation ("embedded within the kernel code").
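On Linux specifically, such kernel-mode processes show up as kernel threads, and the kernel marks them with the PF_KTHREAD flag, which is visible in the flags field of /proc/<pid>/stat. The following Python sketch checks that flag; it is Linux-only, the function name is_kernel_thread is made up for this example, and other Unix systems mark their system processes in different ways.

```python
import os

PF_KTHREAD = 0x00200000  # "I am a kernel thread" in the Linux kernel's sched.h


def is_kernel_thread(stat_line):
    """Check the flags field of a /proc/<pid>/stat line for PF_KTHREAD."""
    # The command name is parenthesised and may itself contain spaces or
    # parentheses, so split after the LAST ')'. The fields that follow are:
    # state, ppid, pgrp, session, tty_nr, tpgid, flags, ...
    after_comm = stat_line.rsplit(")", 1)[1].split()
    flags = int(after_comm[6])
    return bool(flags & PF_KTHREAD)


if __name__ == "__main__":
    # PID 1 is init; on Linux, PID 2 is normally kthreadd.
    for pid in (1, 2, os.getpid()):
        try:
            with open("/proc/%d/stat" % pid, errors="replace") as f:
                kind = "kernel thread" if is_kernel_thread(f.read()) else "user process"
            print(pid, kind)
        except OSError:
            print(pid, "not readable (no Linux-style /proc?)")
```

Run on a typical Linux machine, this reports init as a user process even though it runs with superuser privileges, which is the distinction the book is drawing.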
I think there is some confusion between a kernel-mode process and a process with superuser privileges.
The book probably wants to say that init does not run in kernel mode, but still runs with superuser privileges. I hope I am correct.
There are two kinds of modes: user mode and kernel mode. All system calls execute in kernel mode so that they have access to operating system functionality.
Read more on Protected Mode

A program to kill long-running runaway programs

I manage Unix systems where, sometimes, programs like CGI scripts run forever, sometimes eating a lot of CPU time and wasting resources.
I want a program (typically invoked from cron) which can kill these runaways, based on the following criteria (combined with AND and OR):
Name (given by a regexp)
CPU time used
elapsed time (for programs which are blocked on an I/O)
I do not really know what to type in a search engine for this sort of program. I could certainly write it myself in Python, but I'm lazy, and maybe a good program already exists?
(I did not tag my question with a language name since a program in Perl or Ruby or whatever would work as well)
Try using system-level quota enforcement instead. Most systems allow you to set a per-process CPU time limit for different users.
Examples:
Linux: /etc/security/limits.conf
FreeBSD: /etc/login.conf
CGI scripts can usually be run under their own user ID, for example using mod_suid for Apache.
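The same per-process CPU limit can also be applied by whatever launches the program. Here is a minimal Python sketch using the standard resource module to set RLIMIT_CPU in the child before exec; the wrapper name run_with_cpu_limit is hypothetical, and it assumes a Unix-like system where exceeding the soft limit delivers SIGXCPU, which terminates a process that does not handle it.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds, wall_timeout=None):
    """Run argv with a soft RLIMIT_CPU of cpu_seconds; return its exit status."""
    def apply_limit():
        # Runs in the child between fork() and exec(). Only the soft limit
        # is lowered; the existing hard limit is left untouched.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, hard))

    done = subprocess.run(argv, preexec_fn=apply_limit, timeout=wall_timeout)
    # A negative value means the child was killed by that signal number.
    return done.returncode


if __name__ == "__main__":
    # A busy loop that would otherwise run forever.
    status = run_with_cpu_limit(
        [sys.executable, "-c", "while True: pass"], 1, wall_timeout=60)
    print("child exit status:", status)
```

Note that this limits CPU time only; a program blocked on I/O uses almost no CPU and would need a wall-clock limit instead.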
This might be something more like what you were looking for:
http://devel.ringlet.net/sysutils/timelimit/
Most of the watchdog-like programs or libraries only check whether a given process is running, so I'd say you'd be better off writing your own, using the existing libraries that give out process information.
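If you do write your own, the selection logic is short. Below is a minimal Python sketch, not an existing tool, that parses ps output and picks processes whose name matches a regexp AND that exceed either a CPU-time or an elapsed-time limit (one possible way to combine the criteria from the question). The function names, the \.cgi$ pattern and the limits are hypothetical, and the time parsing assumes the [[dd-]hh:]mm:ss format that ps prints for time and etime. It only prints the candidates instead of killing them.

```python
import re
import subprocess


def parse_ps_time(text):
    """Convert a ps time such as '1-02:03:04' or '05:30' to whole seconds."""
    days, _, rest = text.rpartition("-")
    parts = [float(p) for p in rest.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0.0)  # pad missing hours (and minutes)
    hours, minutes, seconds = parts
    return int(((int(days or 0) * 24 + hours) * 60 + minutes) * 60 + seconds)


def find_runaways(ps_output, name_re, max_cpu=None, max_elapsed=None):
    """Return PIDs whose name matches and that exceed either time limit."""
    pattern = re.compile(name_re)
    pids = []
    for line in ps_output.splitlines():
        fields = line.strip().split(None, 3)  # pid, cpu time, elapsed, name
        if len(fields) < 4:
            continue
        pid, cpu, elapsed, name = fields
        if not pattern.search(name):
            continue
        over_cpu = max_cpu is not None and parse_ps_time(cpu) > max_cpu
        over_elapsed = (max_elapsed is not None
                        and parse_ps_time(elapsed) > max_elapsed)
        if over_cpu or over_elapsed:
            pids.append(int(pid))
    return pids


if __name__ == "__main__":
    try:
        # An empty header ("pid=") suppresses the header line.
        out = subprocess.run(
            ["ps", "-e", "-o", "pid=", "-o", "time=", "-o", "etime=",
             "-o", "comm="],
            capture_output=True, text=True, check=True).stdout
        for pid in find_runaways(out, r"\.cgi$", max_cpu=300, max_elapsed=3600):
            print("would kill", pid)  # e.g. os.kill(pid, signal.SIGTERM)
    except (OSError, subprocess.CalledProcessError, ValueError) as exc:
        print("could not inspect processes:", exc)
```

Invoked from cron, a script like this covers the name, CPU time and elapsed time criteria; swapping the print for os.kill is the only change needed once the selection looks right.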
