calculate network traffic per process zabbix - unix

I'm using Zabbix 3.2. I want to calculate traffic statistics on a network interface broken down by program name.
For total incoming traffic we use net.if.in[if,]; in the same way, is it possible to retrieve the traffic used by each running process, like Nethogs does? If so, please share the item key, or a shell script that does the same.
Thanks in advance.

You haven't specified the operating system, but the question is tagged 'unix' and you mention nethogs and shell scripts - I'll assume Linux.
It might be a bit too much to monitor traffic for all of the processes - there could be hundreds of them, and even though many would not use the network, on a server system many would.
It is also important how you want to structure the data. For example, do you want to split it up per process name, or per individual process? Or maybe even process name and its parameters - in case of running several Java JVMs on the same box. You would have to decide on all this, as it will affect the data collection.
As for sending data to Zabbix, the simplest approach on the Zabbix side would be monitoring by process name only and creating the items in advance, if you know all the process names you will be interested in. If you do not know them, you will have to use Zabbix low-level discovery to create items automatically as new processes appear.
And we finally get to the data collection part. Here, it indeed might be the easiest to use nethogs (keeping in mind that UDP is not supported). You can run nethogs in "trace" mode, which is pretty much the same as the "batch" mode for top. In this mode, output is simply printed to stdout.
nethogs -c 1 -d 60 -t
Here, the parameters mean:
-c - how many times to print output
-d - for how long to sleep between iterations, including the time before the first output
-t - tracing or batch mode
Nethogs also supports setting traffic output type with the -v flag. You'd have to decide how you want to visualise this:
0 - KB/s
1 - total KB
2 - total B
3 - total MB
With Zabbix, you probably will not want to use modes 1 or 3 - it is better to store data in bytes and let Zabbix add the multiplier as needed. In case of the KB/s mode (0), it is probably worth adding an item multiplier of 1024 to store data in bytes and again benefit from the automatic unit handling in Zabbix. Note that in any case you will want to run nethogs instances back-to-back, to avoid windows where you are not collecting data. One way to minimise the possibility of such gaps would be running nethogs constantly (without supplying the -c option) and redirecting output to a file. A script would then parse the file and send the data to Zabbix with zabbix_sender.
You wouldn't run this as a normal Zabbix user parameter, neither as an active nor as a passive check - it would block for too long. Consider using atd (see this howto) or nohup to launch a script that sends data to Zabbix with zabbix_sender instead.
Note that you must run nethogs as root - use sudo for that.
I'm not aware of any existing scripts for this, but the following might get you started:
nethogs -c 1 -d 1 -t | awk 'BEGIN {FS="[[:space:]/]+"}; /Refreshing/,0 \
{if ($1 != "Refreshing:" && $1 != "unknown") {print $(NF-4), $(NF-1), $NF}}'
Here, awk grabs only program lines and prints out program name and sent/received traffic.
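To tie this to Zabbix, a rough and untested sketch of such a collection script follows. The item keys nethogs.sent[...] and nethogs.received[...] and the server name are assumptions - they have to match the trapper items you actually create (or discover) on the Zabbix side, and the script still has to run as root:
#!/bin/sh
# Sketch only: take one nethogs sample and push it to Zabbix via zabbix_sender.
# zabbix.example.com and the item keys below are placeholders/assumptions.
HOST=$(hostname)
nethogs -c 1 -d 60 -t 2>/dev/null \
  | awk 'BEGIN {FS="[[:space:]/]+"}; /Refreshing/,0 \
      {if ($1 != "Refreshing:" && $1 != "unknown") {print $(NF-4), $(NF-1), $NF}}' \
  | while read program sent received; do
      printf '%s nethogs.sent[%s] %s\n' "$HOST" "$program" "$sent"
      printf '%s nethogs.received[%s] %s\n' "$HOST" "$program" "$received"
    done \
  | zabbix_sender -z zabbix.example.com -i -
zabbix_sender reads "hostname key value" lines from standard input when given -i -, so each nethogs sample becomes two values per program.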

Related

unix pipe is streaming?

zcat big.txt.gz | split -l 1000000 - prefix
where big.txt.gz is 150 GB, say it has ~1 billion lines.
In this case, does the unix pipe "stream" the data into split, or is the zcat operation completed, and then split is performed afterwards?
It was not clear to me from other pages if the above command would crash because it couldn't hold all of the gunzipped data in the pipe buffer before executing split, or if the gunzipped data would be "streamed" into split.
In general, the streaming behavior of a unix pipe is unclear to me - when does a pipe wait until all previous operations are finished before feeding input into the next command's stdin?
For example, if I were to link several more commands, would it crash due to lack of memory? e.g.
zcat big.txt.gz | tr 'a' 'b' | sed 's/foo/bar/g' | grep 'hello'
A pipe has a limited capacity. [...] Applications should not rely on a particular capacity: an application should be designed so that a reading process consumes data as soon as it is available, so that a writing process does not remain blocked.
I'm not sure why there's any doubt here: the processes are running simultaneously, and the upstream process is writing while the downstream process is reading. Or at least that's the ideal specifically requested by this man page.
Now, it's possible that a given command may try to suck in all of its input before doing anything, and that too large an input may crash that command. But that's very different from the pipe buffer getting overfilled.
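If you want to see the streaming behavior for yourself, one quick experiment is to pipe an effectively endless producer into a consumer that stops early; if the pipe had to buffer the whole upstream output first, this would never return:
# seq would take ages to finish on its own; head exits after three lines and
# the resulting SIGPIPE terminates seq almost immediately.
seq 1 1000000000 | head -n 3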

Is using the -L flag and a addprocs script the more powerful version of -p and --machinefile?

So I have a moderately complex set of requirements for my worker processes.
I want to use the master-slave topology and a nondefault working directory.
I also want to mix both local and remote workers.
As far as I can tell from reading the --machine-file section of the documentation, it will not let me do that.
So I am looking at the -L <file> parameter:
>julia -h
...
-L, --load <file> Load <file> immediately on all processors
...
So if I do not use the -p or --machine-file flags, then there is initially only one processor, so "all processors" just means the only processor.
So I tried this out
start_workers.jl
addprocs([
        ("cluster_c4_1", :auto),
        ("cluster_c4_2", :auto)
    ],
    dir="/mnt/",
    topology=:master_slave
)
addprocs(
    dir="/mnt/",
    topology=:master_slave
)
test.jl
println("*************")
println(workers())
println("-------------")
Running it:
>julia -L start_workers.jl test.jl
*************
[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]
-------------
So it looks all good, got my 20 workers.
Have I done anything unreasonable? Is this the best way?
That's exactly how I'm deploying it on an HPC cluster under the Torque scheduler. In fact, I'm in the process of rewriting the cluster manager to support more options when adding processes through the Torque scheduling system in particular, so I've spent quite a bit of time looking into this.
You might also want to be aware that there are various cluster managers in the ClusterManagers package (Pkg.add("ClusterManagers")) that extend the ability of addprocs under a variety of environments, such as when you need to request the resources from a scheduler. It looks like passwordless ssh is possible for you, so the default cluster manager is sufficient in your case.
I don't believe there is any way of defining the extra topology and directory parameters on the command line, so your approach is correct.

How to see the process table in unix?

What's the UNIX command to see the process table? Remember that the table contains:
process status
pointers
process size
user ids
process ids
event descriptors
priority
etc
The "process table" as such lives in the kernel's memory. Some systems (such as AIX, Solaris and Linux--which is not "unix") have a /proc filesystem which makes those tables visible to ordinary programs. Without that, programs such as ps (on very old systems such as SunOS 4) required elevated privileges to read the /dev/kmem (kernel memory) special device, as well as having detailed knowledge about the kernel memory layout.
Your question is open-ended, and an answer to a specific question you may have can be looked up in the man page, as @Alfasin suggests in his answer. A lot depends on what you are trying to do.
As @ThomasDickey points out in his response, in UNIX and most of its derivatives the command for viewing processes running in the background or foreground is in fact the ps command.
ps stands for 'process status', answering your first bullet item. The command has over 30 options, and depending on what information you seek and the permissions granted to you by the system administrator, you can get various types of information from it.
For example, for the second bullet item on your list above, depending on what you are looking for, you can get information on 3 different types of pointers - the session pointer (with option 'sess'), the terminal session pointer (tsess), and the process pointer (uprocp).
The rest of the items that you have listed are mostly available as standard output of the command.
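For instance, on a Linux system with the procps version of ps, something like the following prints most of the fields from the list above (the exact column selection is a sketch and varies between implementations):
ps -eo pid,ppid,uid,stat,pri,ni,vsz,rss,comm | head
Here pid/ppid/uid cover the id columns, stat is the process status, pri and ni show priority, and vsz/rss give the process size in kilobytes.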
Some UNIX variants implement a view of the system process table inside the file system to support the running of programs such as ps. This is normally mounted on /proc (see @ThomasDickey's response above).
Typical reasons for understanding the workings of the command include system-administration responsibilities such as tracking the origin of initiated processes, killing runaway or orphaned processes, examining the size of a process and setting limits where necessary, etc. UNIX developers can also use it in conjunction with ipc features, etc. An understanding of the process table and status will help with associated UNIX features such as the kvm interface to examine a crash dump, etc., or to get or set the kernel state.
Hope this helps

numactl --physcpubind processor migration

I'm trying to launch my MPI application (Open MPI 1.4.5) with numactl. Since apparently the load balancing using --cpunodebind doesn't distribute my processes in a round-robin manner among the available nodes, I wanted to restrict my processes to a fixed set of CPUs. In this way I plan to ensure a balanced load between the nodes in terms of the number of threads running on each node. --physcpubind seems to do the job according to the numactl manual.
The problem is - from what I could extract from this post - that, using --physcpubind, processes are allowed to migrate inside this cpu-set. Another problem is that some cpus from this set remain unused while others are assigned two or more processes and thus run with only 50% or less CPU usage each. Why is this happening, and is there any workaround for this phenomenon?
Kind regards
I think you can try this (it worked for me):
numactl --cpunodebind={cpu-core} chrt -r 98 {your-app}
The chrt command lets you set a scheduling policy; you can choose among the following:
Policy options:
-b, --batch set policy to SCHED_BATCH
-d, --deadline set policy to SCHED_DEADLINE
-f, --fifo set policy to SCHED_FIFO
-i, --idle set policy to SCHED_IDLE
-o, --other set policy to SCHED_OTHER
-r, --rr set policy to SCHED_RR (default)
EDIT: The number 98 is the priority; in my case I am running a time-critical process.
Also, you may need to isolate the cpus you are using to prevent the scheduler from assigning/moving processes to/from them.
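One way to do that (a sketch; the core numbers and application name are placeholders) is to isolate a set of cores from the general scheduler at boot time and then bind exactly one process to each isolated core:
# kernel command line: keep cores 2-7 away from the general scheduler
#   isolcpus=2-7
# then bind each process to a single isolated physical CPU, e.g.:
numactl --physcpubind=2 chrt -r 98 ./my_app &
numactl --physcpubind=3 chrt -r 98 ./my_app &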

Sender and receiver to transfer files over ssh on request?

I created a program that iterates over a bunch of files and invokes for some of them:
scp <file> user@host:<remotefile>
However, in my case there may be thousands of small files that need to be transferred, and scp opens a new ssh connection for each of them, which has quite some overhead.
I was wondering whether there is a solution where I keep one process running that maintains the connection, and I can send it "requests" to copy over single files.
Ideally, I'm looking for a combination of some sender and receiver program, such that I can start a single process (1) at the beginning:
ssh user@host receiverprogram
And for each file, I invoke a command (2):
senderprogram <file> <remotefile>
and pipe the output of (2) to the input of (1), and this would cause the file to be transferred. In the end, I can just send process (1) some signal to terminate.
Preferably the sender and receiver programs are open source C programs for Unix. They may communicate using a socket instead of a pipe, or any other creative solution.
However, it is an important constraint that each file gets transferred at the moment I iterate over it: it is not acceptable to collect a list of files and then invoke one instance of scp to transfer all the files at once at the end. Also, I have only simple shell access to the receiving host.
Update: I found a solution for the problem of the connection overhead using the multiplexing features of ssh, see my own answer below. Yet, I'm starting a bounty because I'm curious to find if there exists a sender/receiver program as I describe here. It seems there should exist something that can be used, e.g. xmodem/ymodem/zmodem?
I found a solution from another angle. Since version 3.9, OpenSSH supports session multiplexing: a single connection can carry multiple login or file transfer sessions. This avoids the set-up cost per connection.
For the case of the question, I can first open a connection that sets up a control master (-M) with a socket (-S) in a specific location. I don't need a session (-N).
ssh user@host -M -S /tmp/%r@%h:%p -N
Next, I can invoke scp for each file and instruct it to use the same socket:
scp -o 'ControlPath /tmp/%r@%h:%p' <file> user@host:<remotefile>
This command starts copying almost instantaneously!
You can also use the control socket for normal ssh connections, which will then open immediately:
ssh user@host -S /tmp/%r@%h:%p
If the control socket is no longer available (e.g. because you killed the master), this falls back to a normal connection. More information is available in this article.
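Putting the pieces together, a minimal wrapper for the iteration itself might look like this (the host, socket path and file selection are placeholders; ssh expands %r, %h and %p in the control path):
#!/bin/sh
# Sketch: open one master connection, reuse it for every copy, then shut it down.
SOCK='/tmp/%r@%h:%p'
ssh -M -N -S "$SOCK" user@host &           # control master in the background
sleep 1                                    # crude wait for the master to come up
for f in *.dat; do                         # whatever iteration produces the files
    scp -o "ControlPath=$SOCK" "$f" user@host:/remote/dir/
done
ssh -S "$SOCK" -O exit user@host           # tear the master down when finished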
This way would work, and for other things, this general approach is more or less right.
(
iterate over file list
for each matching file
echo filename
) | cpio -H newc -o | ssh remotehost 'cd location && cpio -H newc -imud'
It might work to use sftp instead of scp, and to place it into batch mode. Make the batch command file a pipe or UNIX domain socket and feed commands to it as you want them executed.
Security on this might be a little tricky at the client end.
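A rough, untested sketch of that idea with a named pipe (paths and host are placeholders):
# Feed sftp batch commands through a FIFO; requires non-interactive auth.
mkfifo /tmp/sftp-cmds
sftp -b /tmp/sftp-cmds user@host &   # sftp reads its commands from the pipe
exec 3>/tmp/sftp-cmds                # keep a writer open so sftp doesn't see EOF
echo "put local1.txt /remote/dir/local1.txt" >&3
echo "put local2.txt /remote/dir/local2.txt" >&3
exec 3>&-                            # closing the last writer ends the batch session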
Have you tried sshfs?
You could:
sshfs remote_user@remote_host:/remote_dir /mnt/local_dir
Where
/remote_dir is the directory you want to send files to on the system you are sshing into
/mnt/local_dir is the local mount location
With this setup you can just cp a file into local_dir and it will be sent over sftp to remote_host into remote_dir
Note that there is a single connection, so there is little in the way of overhead
You may need to use the flag -o ServerAliveInterval=15 to maintain an indefinite connection
You will need to have fuse installed locally and an SSH server supporting (and configured for) sftp
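A hypothetical end-to-end run of that (host and paths are placeholders):
sshfs -o ServerAliveInterval=15 user@host:/remote_dir /mnt/local_dir
cp some_file /mnt/local_dir/      # travels over the existing SFTP session
fusermount -u /mnt/local_dir      # unmount when done (Linux/FUSE)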
Maybe you are looking for this:
ZSSH
zssh (Zmodem SSH) is a program for interactively transferring files to a remote machine while using the secure shell (ssh). It is intended to be a convenient alternative to scp, allowing you to transfer files without having to open another session and re-authenticate yourself.
Use rsync over ssh if you can collect all the files to send in a single directory (or hierarchy of directories).
If you don't have all the files in a single place, please give some more information as to what you want to achieve and why you can't pack all the files into an archive and send that over. Why is it so vital that each file is sent immediately? Would it be OK if the file was sent with a short delay (like when 4K worth of data has accumulated)?
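For the single-directory case, a hedged example of what that could look like (paths and host are placeholders):
# One connection, incremental transfer, compression over the wire.
rsync -az -e ssh /local/outbox/ user@host:/remote/inbox/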
It's a nice little problem. I'm not aware of a prepackaged solution, but you could do a lot with simple shell scripts. I'd try this at the receiver:
#!/bin/ksh
# this is receiverprogram
while true
do
  typeset -i size
  read filename           # read filename sent by sender below
  read size               # read size of file sent
  read -N $size contents  # read all the bytes of the file
  print -n "$contents" > "$filename"
done
At the sender side I would create a named pipe and read from the pipe, e.g.,
mkfifo $HOME/my-connection
ssh remotehost receiver-script < $HOME/my-connection
Then to send a file I'd try this script
#!/bin/ksh
# this is senderprogram
FIFO=$HOME/my-connection
localname="$1"
remotename="$2"
print "$remotename" > $FIFO
size=$(stat -c %s "$localname")
print "$size" > $FIFO
cat "$localname" > $FIFO
If the file size is large you probably don't want to read it at one go, so something on the order of
BUFSIZ=8192
rm -f "$filename"
while ((size >= BUFSIZ)); do
  read -N $BUFSIZ buffer
  print -n "$buffer" >> "$filename"
  size=$((size - BUFSIZ))
done
read -N $size buffer
print -n "$buffer" >> "$filename"
Eventually you'll want to extend the script so you can pass through chmod and chgrp commands. Since you trust the sending code, it's probably easiest to structure the thing so that the receiver simply calls shell eval on each line, then sends stuff like
print filename='"'"$remotename"'"' > $FIFO
print "read_and_copy_bytes " '$filename' "$size" > $FIFO
and then define a local function read_and_copy_bytes. Getting the quoting right is a bear, but otherwise it should be straightforward.
Of course, none of this has been tested! But I hope it gives you some useful ideas.
Seems like a job for tar? Pipe its output to ssh, and on the other side pipe the ssh output back to tar.
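A sketch of that pattern (the directories are placeholders):
# Stream a tar archive over the ssh connection and unpack it on the far side.
tar -C /local/dir -cf - . | ssh user@host 'tar -C /remote/dir -xf -'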
I think that the GNOME desktop uses a single SSH connection when accessing a share through SFTP (SSH). I'm guessing that this is what's happening because I see a single SSH process when I access a remote share this way. So if this is true you should be able to use the same program for this purpose.
Newer versions of GNOME use GVFS through GIO in order to perform all kinds of I/O through different backends. The Ubuntu package gvfs-bin provides various command line utilities that let you manipulate the backends from the command line.
First you will need to mount your SSH folder:
gvfs-mount sftp://user@host/
And then you can use gvfs-copy to copy your files. I think that all file transfers will be performed through a single SSH process. You can even use ps to see which process is being used.
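A hypothetical invocation, assuming the URI layout used by the gvfs-bin tools:
gvfs-copy somefile.txt sftp://user@host/remote/dir/somefile.txt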
If you feel more adventurous you can even write your own program in C or in some other high level language that provides an API to GIO.
One option is Conch, an SSH client and server implementation written in Python using the Twisted framework. You could use it to write a tool which accepts requests via some other protocol (HTTP or Unix domain sockets, FTP, SSH or whatever) and triggers file transfers over a long-running SSH connection. In fact, I have several programs in production which use this technique to avoid multiple SSH connection setups.
There was a very similar question here a couple of weeks ago. The accepted answer proposed to open a tunnel when ssh'ing to the remote machine and to use that tunnel for scp transfers.
Perhaps CurlFTPFS might be a valid solution for you.
It looks like it just mounts an external computer's folder to your computer via FTP. Once that's done, you should be able to use your regular cp commands and everything will be done securely.
Unfortunately I was not able to test it out myself, but let me know if it works for ya!
Edit 1: I have been able to download and test it. As I feared, it does require that the remote machine have an FTP server. However, I have found another program which has exactly the same concept as what you are looking for. sshfs allows you to connect to the remote computer without needing any special server. Once you have mounted one of its folders, you can use your normal cp commands to move whatever files you need. Once you are done, it should be a simple matter of umount /path/to/mounted/folder. Let me know how this works out!
rsync -avlzp user@remotemachine:/path/to/files /path/to/this/folder
This will use SSH to transfer files, in a non-slow way
Keep it simple, write a little wrapper script that does something like this.
tar the files
send the tar-file
untar on the other side
Something like this:
tar -cvzf test.tgz files ....
scp test.tgz user@other.site.com:.
ssh user@other.site.com tar -xzvf test.tgz
/Johan
