runc container cpu usage

Where should I look for the CPU usage of a specific runc container?
There is no CPU-related file under /proc/<cid>/ (<cid> being the PID obtained from the runc list command) that gives the CPU usage for that specific container.
In /sys/fs/cgroup there are files under the cpu, cpuacct and cpu,cpuacct directories, but I don't see any way to extract the CPU usage of a specific container from those files.
Is there any way I can get this information?

Got it: there are folders created with the name of the container you are running under /sys/fs/cgroup/cpu,cpuacct/user.slice/, and /sys/fs/cgroup/cpu,cpuacct/user.slice/<container_folder>/cpuacct.usage gives the CPU time used by that container.
Note: this is specifically about runc containers; I don't know about others.
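For example, a quick way to read it from the shell (a sketch; the exact cgroup path can differ between distributions and depending on how the container was started, and "mycontainer" is a placeholder for the container's name):
# cumulative CPU time used by the container, in nanoseconds
cat /sys/fs/cgroup/cpu,cpuacct/user.slice/mycontainer/cpuacct.usage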

Related

MemSQL: High CPU usage

My cluster has one MASTER AGGREGATOR and one LEAF. After running for two months, the CPU usage on the LEAF is very high, almost 100%. Is this normal?
By the way, its table data is 545 MB in size.
This is not normal for MemSQL operation. Note that the Ops console shows you all CPU use on that host, not just what MemSQL is using. I recommend running 'top' or similar to determine which process(es) are consuming resources.
You can also run 'SHOW PROCESSLIST' on any node to see if there is a long-running MemSQL process.
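For example (a sketch; it assumes the node speaks the MySQL protocol on the default port 3306 and that you can connect as root):
# see which process on the host is actually consuming the CPU
top
# look for long-running MemSQL queries on the node
mysql -h 127.0.0.1 -P 3306 -u root -p -e "SHOW PROCESSLIST"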

On what parameters does the boot sequence vary?

Does every Unix flavor have the same boot sequence code? I mean, different kernel versions are being released for different flavors, so is it possible that the code for the boot sequence differs once the kernel is loaded? Or do they always keep their boot sequence (and code) common?
Edit: I want to know in detail how the boot process works.
Where does the MBR find GRUB? How is this information stored? Is it hard-coded by default?
Is there any block-level partition architecture defined for the boot sequence?
How does GRUB locate the kernel image? Is there a common place where the kernel image is stored?
I searched a lot on the web, but it only shows the common architecture: BIOS -> MBR -> GRUB -> Kernel -> Init.
I want to know the details of everything. What should I do to learn all of this? Is there any way I could debug the boot process?
Thanks in advance!
First of all, the boot process is extremely platform and kernel dependent.
The point is normally to get the kernel image loaded somewhere in memory and run it, but the details may differ:
where do I get the kernel image? (file on a partition? fixed offset on the device? should I just map a device in memory?)
what should be loaded? (only a "core" image? also a ramdisk with additional data?)
where should it be loaded? Is additional initialization (CPU/MMU status, device initialization, ...) required?
are there kernel parameters to pass? Where should they be put for the kernel to see?
where is the configuration for the bootloader itself stored (hard-coded, files on a partition, ...)? How to load the additional modules? (bootloaders like GRUB are actually small OSes by themselves)
Different bootloaders and OSes may do this stuff differently. The "UNIX-like" bit is not relevant; an OS starts being ostensibly UNIXy (POSIX syscalls, init process, POSIX userland, ...) mostly after the kernel starts running.
Even on common x86 PCs the start differs deeply between "traditional BIOS" and UEFI mode (in the latter case, the UEFI firmware itself can load and start the kernel, without additional bootloaders being involved).
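For instance, on a UEFI machine you can list the firmware's boot entries directly (assuming the efibootmgr tool is installed and you have root):
sudo efibootmgr -v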
Coming down to the start of a modern Linux distribution on x86 in BIOS mode with GRUB2, the basic idea is to quickly get up and running a system that can deal with "normal" PC abstractions (disk partitions, files on filesystems, ...), keeping to a minimum the code that has to deal with hard-coded disk offsets.
GRUB is not a monolithic program, but is composed of stages:
1. when booting, the BIOS loads and executes the code stored in the MBR, which is the first stage of GRUB; since the amount of code that can be stored there is extremely limited (a few hundred bytes), all it does is act as a trampoline for the next GRUB stage (somehow, it "boots GRUB");
2. the MBR code contains, hard-coded, the address of the first sector of the "core image"; this, in turn, contains the code to load the rest of the "core image" from disk (again, hard-coded as a list of disk sectors);
3. once the core image is loaded, the ugly work is done, since the GRUB core image normally contains basic file system drivers, so it can load additional configuration and modules from regular files on the boot partition.
What happens now depends on the configuration of the specific boot entry; for booting Linux, usually there are two files involved, the kernel image and the initrd:
4. the initrd contains the "initial ramdisk", a barebones userland mounted as / in the early boot process (before the kernel has mounted the real filesystems); it mostly contains device-detection helpers, device drivers, filesystem drivers, ... so that the kernel can load on demand the code needed to mount the "real" root partition;
5. the kernel image is a (usually compressed) executable image in some format, which contains the actual kernel code; the bootloader extracts it in memory (following some rules), puts the kernel parameters and the initrd memory position in some memory location, and then jumps to the kernel entry point, whence the kernel takes over the boot process.
From there, the "real" Linux boot process starts, which normally involves loading device drivers, starting init, mounting disks and so on.
Again, this is all (x86, BIOS, Linux, GRUB2)-specific; points 1-2 are different on architectures without an MBR, and are skipped completely if GRUB is loaded straight from UEFI; 1-3 are different/avoided if UEFI (or some other loader) is used to load the kernel image directly. The initrd may not be involved if the kernel image already bundles all that is needed to start (typical of embedded images); details of points 4-5 are different for different OSes (although the basic idea is usually similar). And, on embedded machines, the kernel may be placed directly at a "magic" location that is automatically mapped in memory and run at start.
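If you want to poke at some of these pieces yourself, a couple of starting points (a sketch for a BIOS/MBR system; /dev/sda is assumed to be the boot disk):
# dump the 512-byte MBR that the BIOS loads (point 1, GRUB's first stage lives here)
sudo dd if=/dev/sda bs=512 count=1 2>/dev/null | xxd | head
# the modules and configuration that the core image loads from the boot partition (point 3)
ls /boot/grub        # or /boot/grub2, depending on the distribution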

Confusion as to how fork() and exec() work

Consider the following:
Where I'm getting confused is in the step "child duplicate of parent". If you're running a process such as, say, Skype, and it forks, is it copying Skype and then overwriting that copy with some other program? Moreover, what if the child process has memory requirements far different from the parent's? Wouldn't assigning the same address space as the parent be a problem?
I feel like I'm thinking about this all wrong, perhaps because I'm imagining the processes to be entire programs in execution rather than some simple instruction like "copy data from X to Y".
All modern Unix implementations use virtual memory. That allows them to get away with not actually copying much when forking. Instead, the child's memory map contains pointers to the parent's memory until the child starts modifying it.
When a child process exec's a program, that program is copied into memory (if it wasn't already there) and the process's memory map is updated to point to the new program.
fork(2) is difficult to understand. It has been explained a lot; read also the fork (system call) wikipage and several chapters of Advanced Linux Programming. Notice that fork does not copy the running program (i.e. the /usr/bin/skype ELF executable file); it lazily copies (using copy-on-write techniques, by configuring the MMU) the address space (in virtual memory) of the forking process. Each process has its own address space (but might share some segments with other processes; see mmap(2) and execve(2) ...). Since each process has its own address space, changes in the address space of one process do not (usually) affect the parent process. However, processes may have shared memory, but then they need to synchronize: see shm_overview(7) & sem_overview(7)...
By definition of fork, just after the fork syscall the parent and child processes have nearly identical state (in particular, the address space of the child is a copy of the address space of the parent). The only difference is the return value of fork.
And execve overwrites the address space and registers of the current process.
Notice that on Linux all processes (with a few exceptions, like kernel-started processes such as /sbin/modprobe, etc.) are obtained by fork-ing, ultimately descending from the initial /sbin/init process of pid 1.
Finally, system calls (listed in syscalls(2)) like fork are elementary operations from the application's point of view, since the real processing is done inside the Linux kernel. Play with strace(1). See also this answer and that one.
A process is often some machine state (registers) + its address space + some kernel state (e.g. file descriptors), etc. (but read about zombie processes).
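To see this in action with strace, as suggested above (a sketch; ls is just an arbitrary program, and the trailing 'true' forces the shell to actually fork for ls instead of exec-ing it directly):
strace -f -e trace=fork,vfork,clone,execve sh -c 'ls; true'
The trace shows the shell clone()-ing (the fork), and the child then calling execve() on ls, which replaces the child's address space while the parent's is untouched.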
Take time to follow all the links I gave you.

Upgrading an Amazon EC2 instance from t1.micro to medium, instance storage remains the same

We have been using a micro instance during our development phase. But now, as we are about to go live, we want to upgrade our instance to the medium type.
I followed these simple steps: stop the running instance, change the instance type to medium, and then start the instance again. I can see that the instance has been upgraded in terms of memory, but the storage still shows as 8 GB. But according to the configuration mentioned, an m1.medium instance should have 1x410 GB of storage.
Am I doing anything wrong or missing out something? Please help!
Keep in mind, EBS storage (which you are currently using) and Instance storage (which is what you are looking for) are two different things in EC2.
EBS storage is similar to a SAN volume. It exists outside of the host. You can create multiple EBS volumes of up to 1TB and attach them to any instance size. Smaller instances have lower available bandwidth to EBS volumes so they will not be able to effectively take advantage of all that many volumes.
Instance storage is essentially hard drives attached to the host. While it's included in the instance cost, it comes with some caveats. It is not persistent: if you stop your instance, or the host fails for any reason, the data stored on the instance store will be lost. For this reason, it has to be explicitly enabled when the instance is first launched.
Generally, it's not recommended to use instance storage unless you are comfortable with, and have designed your infrastructure around, the non-persistence of instance storage.
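A quick way to see the difference on a running instance (a sketch; it assumes the AWS CLI is configured and i-xxxxxxxx is a placeholder for your instance ID):
# lists the EBS volumes attached to the instance; instance-store volumes do not show up here
aws ec2 describe-instances --instance-ids i-xxxxxxxx --query 'Reservations[].Instances[].BlockDeviceMappings'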
The sizes mentioned for the instance types are just defaults. If you create an image from a running micro instance, it will keep that storage size as its default, even if the image is later started as a medium instance.
But you can change the storage size when launching the instance, and you can also change the default storage size when creating an image.
WARNING: this changes the size of the storage volume. It will not necessarily resize the existing partition on it, nor will it necessarily resize the file system on that partition. On Linux it resized everything automagically (IIRC); on a Windows instance you will have to resize things yourself. For other OSes I have no idea.
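The same storage-size change can also be made from the command line at launch time (a sketch; ami-xxxxxxxx is a placeholder, /dev/sda1 is assumed to be the AMI's root device name, and 100 GB is just an example size):
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type m1.medium \
    --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100}}]'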
I had a similar situation. I created an m2.medium instance with 400 GB of storage, but when I logged into the shell and issued the command
df -h
... it showed an 8 GB partition.
However, the command
sudo fdisk -l
showed that the device was indeed 400 GB. The problem is that Amazon created a default 8 GB filesystem on it, and that filesystem needs to be expanded to the full size of the device. The command to do this is:
sudo resize2fs -f /dev/xvda1
where /dev/xvda1 is the mounted root volume. Use the 'df -h' command to be sure you have the right volume name.
Then simply reboot the instance, log in again, and you'll see that df now reports nearly 400 GB of available space. Problem solved.

numactl --physcpubind processor migration

I'm trying to launch my MPI application (Open MPI 1.4.5) with numactl. Since the load balancing using --cpunodebind apparently doesn't distribute my processes in a round-robin manner among the available nodes, I wanted to explicitly restrict my processes to a fixed set of CPUs. In this way I plan to ensure a balanced load between the nodes in terms of the number of threads running on each node. According to the numactl manual, --physcpubind seems to do the job.
The problem is, from what I could extract from this post, that with --physcpubind processes are still allowed to migrate inside this CPU set. Another problem is that some CPUs from the set remain unused while others are assigned two or more processes, which then run at only 50% or less CPU usage each. Why is this happening, and is there any workaround for this phenomenon?
Kind regards
I think you can try this (it worked for me):
numactl --cpunodebind={cpu-core} chrt -r 98 {your-app}
The chrt command lets you set a scheduling policy; you can choose among the following:
Policy options:
-b, --batch set policy to SCHED_BATCH
-d, --deadline set policy to SCHED_DEADLINE
-f, --fifo set policy to SCHED_FIFO
-i, --idle set policy to SCHED_IDLE
-o, --other set policy to SCHED_OTHER
-r, --rr set policy to SCHED_RR (default)
EDIT: The number 98 is the priority; in my case I am running a time-critical process.
Also, you may need to isolate the CPUs you are using to prevent the scheduler from assigning/moving processes to/from them.
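One common way to do that isolation (a sketch; it assumes you want to reserve cores 2-7 for the MPI processes): boot with isolcpus=2-7 on the kernel command line so the scheduler keeps ordinary tasks off those cores, then bind the application explicitly to them:
numactl --physcpubind=2-7 chrt -r 98 {your-app}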
