I am new to the administration front. I have a requirement:
Identify the user processes (I have a list of users who submit the processes) which are still active and which were submitted 3/4 days ago.
My approach:
Have a text file with the list of users.
Loop over it, find the current processes spawned by each user, and store them in a file.
Substitute a date variable with the matching format and grep for it.
However, I am stuck on how to get "submitted 3/4 days ago": with my code it only matches a single day.
#!/bin/sh
rm -f psinfo.txt
rm -f psinfo_backdated.txt
for i in `cat user.lst `;
do
ps -ef | grep "$i" >> psinfo.txt
done
grep `date -d'2 day ago' +%b%d` psinfo.txt > psinfo_backdated.txt
I would really appreciate your comments and answers on this.
If someone can tell me whether I can grep for a date range in a file, e.g. everything earlier than Apr27, I can make my script work.
A time format like Apr27 is not suitable for the task, not least because it doesn't contain the year, which matters around the turn of a year. Fortunately, there is a much better format for the start time of a process. Replace
ps -ef | grep "$i" >> psinfo.txt
with
ps -o etime=ELAPSED_TIME -o user,pid,ppid,cpu,tty,time,args -u "$i" >> psinfo.txt
(you might want to drop fields you don't need from the second -o…). The time since start is then represented in the form
[[days-]hours:]minutes:seconds
This you can easily filter with awk, e.g. to get processes started 3 or more days ago:
awk '{ if ($1~/-/ && int($1)>2) print }' psinfo.txt >psinfo_backdated.txt
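Putting it together, a minimal sketch of the whole revised script could look like this (assuming a ps that supports -o etime, and reusing the file names from the original script):
#!/bin/sh
# Sketch only: collect elapsed-time info for each user in user.lst,
# then keep processes that have been running for 3 or more days.
rm -f psinfo.txt psinfo_backdated.txt
while read -r u
do
    ps -o etime=ELAPSED_TIME -o user,pid,args -u "$u" >> psinfo.txt
done < user.lst
# etime is [[days-]hours:]minutes:seconds, so a day count is only
# present when the first field contains a "-"
awk '$1 ~ /-/ && int($1) > 2' psinfo.txt > psinfo_backdated.txt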
Here is what my log file looks like:
[BCDID::16 T::LIVE_ANALYZER GCDID::16] {"t":"20:50:05","k":"7322","h":"178","s":-53.134575556764}
[BCDID::16 T::LIVE_ANALYZER GCDID::16] {"t":"20:50:06","k":"2115","h":"178","s":-53.134575556764}
[BCDID::16 T::LIVE_ANALYZER GCDID::16] {"t":"20:50:07","k":"1511","h":"178","s":-53.134575556764}
There are multiple log files with similar entries, and they are updated every second.
Here "t":"20:50:05" is the time.
What I want to do is get all log entries between two specific times, from all files, starting from the end of the files.
I tried tail files*.log | grep -e "20:50:07 | 20:50:05" but it does not return anything.
How do I get all log entries between the given times, starting from the end of each log file?
If you're looking for a range for records, and the format of the lines is consistent, the easiest way is probably to isolate the time field, strip out the colons, and leverage the power of arithmetic operators.
A one-liner awk solution, for example:
tail files*.log | awk -v from="205006" -v to="205007" -F"\"" '{ timeasint=$4; gsub(":","",timeasint); if (timeasint >= from && timeasint <= to) print $0 }'
would get you:
[BCDID::16 T::LIVE_ANALYZER GCDID::16] {"t":"20:50:06","k":"2115","h":"178","s":-53.134575556764}
[BCDID::16 T::LIVE_ANALYZER GCDID::16] {"t":"20:50:07","k":"1511","h":"178","s":-53.134575556764}
Of course you couldn't span across midnight (i.e., 23:59:59 - 00:00:01), but for that you'd need dates as well as times in your log anyway.
If you had dates, my suggestion would be converting them to epoch stamps (using date -d "string" or some other suitable method) and comparing the epoch stamps as integers.
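As a minimal sketch of that idea (assuming GNU date and a hypothetical log line that carried a date as well as a time), the comparison becomes plain integer arithmetic once everything is an epoch stamp:
# Hypothetical date+time strings; in practice they would be pulled from the log line
from=$(date -d "2019-05-01 20:50:05" +%s)
to=$(date -d "2019-05-02 00:00:01" +%s)
entry=$(date -d "2019-05-01 23:59:59" +%s)
if [ "$entry" -ge "$from" ] && [ "$entry" -le "$to" ]; then
    echo "entry is inside the range"
fi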
I have an unusual problem involving the output from the ps -ef command on AIX 7.1.
A shell script monitors processes by parsing this output. I've noticed that on two occasions a process (a Perl program) was omitted from this list. Everything I've read on the subject says this is not possible. The program in question starts via crontab at 6am and runs until 11pm, when it self-terminates. I checked the output of ps -ef immediately after the monitor script reported the process missing, and it displays:
user 1249864 9569338 0 06:00:00 - 0:19 /usr/bin/perl -w /path/to/omittedProgram.pl
... which means it's the same process that was started at 6am. The program did not terminate and then restart.
What is causing it to be omitted from the ps -ef output?
Edit: This is the program that examines the output of ps -ef, which has been running successfully for about five years. I've only noticed this problem twice, but both have been in the last 2 months:
# set global variables
PROCESS_FILE=/tmp/processList.txt
TEMP_FILE=/tmp/greppedProcesses.tmp
BOX=`uname -n`
DATE=`date`
EMAIL_LIST="Support@email.address"
# Get list of running processes
ps -ef > $PROCESS_FILE
checkProcess() {
PROCESS_NAME=$1
PROCESS_ABBREVIATION=$2
PROCESS_COUNT=$3
UNIQUE_PROCESS_IDENTIFIER=$4
GREPPED_LINES=$TEMP_FILE-$PROCESS_ABBREVIATION
grep $UNIQUE_PROCESS_IDENTIFIER $PROCESS_FILE | grep -v grep > $GREPPED_LINES
NUM=`cat $GREPPED_LINES | wc -l`
if [[ $NUM -ne $PROCESS_COUNT ]]
# Incorrect number of processes running!
then MESSAGE=`printf "The \"$PROCESS_NAME\" process count is %1d, but it should be $PROCESS_COUNT!!!" $NUM`
echo "Monitor - starting on $DATE\n\n$MESSAGE\n\n`cat $GREPPED_LINES`" | mail -s "Problem with $PROCESS_NAME on $BOX" $EMAIL_LIST
fi
# Delete the temp file
rm $GREPPED_LINES
}
checkProcess "Full Name of Program" "Program Abbreviation" <expected number of processes running> "Unique string to identify program in ps output"
checkProcess ... (for other processes) ...
exit 0
This might be a long shot in your case, but I had the same experience with "ps -ef" in the past (I don't remember the exact OS where I saw it, but my script had to work on Linux, AIX, Solaris and HP-UX).
The "ps -ef" output might be limited to a certain number of columns when used inside a script executed without a terminal. The user, pid, ppid and cputime columns are dynamic and sometimes break the format (when the data is larger than the reserved space).
For example, if the PID of the process gets too large, the name of the process might be cut so that it doesn't appear within the already limited number of columns displayed by "ps -ef", and your monitor script would fail.
You could keep the file containing the "ps -ef" output and check whether this is the problem. No need to wait for the issue to happen again; just check if you have extra-long process names in the file (anything longer than the process you're looking for).
My workaround for this problem is to specify a large enough number of columns, like this: COLUMNS=8192 ps -ef > file.out. The variable is set just for this one command.
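To spot the truncation, one quick check (just a sketch, using the same file.out as above) is to look at the longest line ps produced; if the maximum sits at a fixed width such as 80 or 128 characters, the output is almost certainly being cut:
awk '{ if (length($0) > max) max = length($0) } END { print "longest line:", max, "characters" }' file.out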
I just heard from my server support team that the AIX 7.1 TL4 SP4 patch will fix this! We're installing it on our servers now and hopefully this won't happen again.
I have a huge file, with around 200,000 records in it. I have been testing some cases where I have to figure out whether file naming patterns match some specific strings. Here's how I proceeded:
I stored the test strings in a file (let's say, for one case, there are 10 of them). The actual file contains string records separated by newlines, totaling up to 200,000 records. To check whether the test string patterns are present in the large file, I wrote a small nested for loop.
for i in `cat TestString.txt`
do
for j in `cat LargeFile.txt`
do
if [[ $i == $j ]]
then
echo "Match" >> result.txt
fi
done
done
This nested loop actually has to do the traversal (if I'm not wrong about the concept) 10 x 200,000 times. Normally I wouldn't consider that too much of a load on the server, but it is taking forever: the script has been running for the past 4 hours, with of course some "Match" results.
Does anyone have any ideas on how to speed this up? I've found plenty of answers with a Python or Perl touch, but I'm honestly looking for something in plain Unix shell.
Thanks
Try the following:
grep -f TestString.txt LargeFile.txt >> result.txt
Check out grep
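One possible refinement, since the original loop compares whole lines for equality: -F treats each test string as a fixed string rather than a regular expression, and -x restricts grep to whole-line matches, which mirrors the $i == $j test:
grep -F -x -f TestString.txt LargeFile.txt >> result.txt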
while read line
do
cat LargeFile.txt | grep "$line" >> result.txt
done < TestString.txt
grep will output any matching strings. This may be faster. Note that your TestString.txt file should not have any blank lines or grep will return everything from LargeFile.txt.
I am looking for a Unix command to get a single line, by line number, from a big file (with around 5 million records). For example, to get the 10th line, I want to do something like
command file-name 10
Is there any such command available? We could do this by looping through each record, but that would be a time-consuming process.
This forum entry suggests:
sed -n '52p' (file)
for printing the 52nd line of a file.
Going forward, there are a lot of other ways to do it, and related tricks.
If you want multiple lines to be printed,
sed -n -e 'Np' -e 'Mp'
where N and M are the only lines that will be printed. Refer to "10 Awesome Examples for Viewing Huge Log Files in Unix".
command | sed -n '10p'
or
sed -n '10p' file
You could do something like:
head -n<lineno> <file> | tail -n1
That would give you the first <lineno> lines, and then only the last line of that output (your line).
Edit: It seems all the solutions here are pretty slow. However, by definition you'll have to iterate through the records, because the operating system has no way to index line-oriented files: files are byte-oriented. (In some sense, all these programs are going to do is count the number of \n or \r characters.) In lieu of a great answer, I'll also present the timings on my system of several of these commands!
[mjschultz@mawdryn ~]$ time sed -n '145430980p' br.txt
0b10010011111111010001101111010111
real 0m25.871s
user 0m17.315s
sys 0m2.360s
[mjschultz@mawdryn ~]$ time head -n 145430980 br.txt | tail -n1
0b10010011111111010001101111010111
real 0m41.112s
user 0m39.385s
sys 0m4.291s
[mjschultz@mawdryn ~]$ time awk 'NR==145430980{print;exit}' br.txt
0b10010011111111010001101111010111
real 2m8.835s
user 1m38.076s
sys 0m3.337s
So, on my system, it looks like the sed -n '<lineno>p' <file> solution is fastest!
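One tweak worth trying (a sketch only; I haven't timed it on the same file): tell sed to quit as soon as it has printed the requested line, so it doesn't keep reading the rest of the file:
sed -n '145430980{p;q;}' br.txt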
You can use awk:
awk 'NR==10{print;exit}' file
Put an exit after printing the 10th line so that awk won't process the 5-million-record file any further.
I am working on adding some Nagios alerts to our system, some of which will monitor the rate of certain events hitting the nginx/apache logs (or parse values from those logs). The way I've approached the problem so far is with a simple shell script that tail -f's the log for 25 seconds or so into a temporary file, kills the process, and then runs awk, etc. over the temp file. The goal here is to get a log "sample" over 25 seconds and then perform analysis.
This is less than ideal, obviously, because of the increase in disk IO due to these temp files; what I really would like is an "enhanced" tail -f that would terminate the pipe cleanly after a certain number of seconds. I.e.:
tail -f --interval '5 seconds' | grep "/serve"
Would tail the log for 5 seconds and show me all the lines that have "/serve".
I'd imagine I can whip up a Ruby script to do this pretty quickly, but I wanted to make sure there wasn't a more unixy way to accomplish it. At a high level, is there a better way of taking samples of a log from the last N seconds? (And no, I'd rather not be parsing timestamps, etc.)
Found the solution. "apt-get install timeout" :)
Edit: Actually this kills tail rather than letting it exit gracefully, so we lose the entire pipe. What I want to work is:
timeout -15 5 tail -f /mnt/log/nginx/nginx-access.log | grep '/javascripts' | wc -l
To tell me how many javascript files served in last 5 seconds, etc.
A slightly different approach:
(tail -f /var/log/messages & P=$! ; sleep 5; kill -9 $P) | grep /serve
I'm thinking that, as a Nagios user myself, you do not want probe processes pausing for arbitrary amounts of time. That is going to, in the worst case, make Nagios check other things less often, or "clump" the checks.
What about a script that runs quickly (instantly) and parses the last few lines of the file, returning only interesting things with a timestamp later than a given time?
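As a rough sketch of that approach (the log path and the leading epoch-timestamp field are assumptions on my part, not your actual format): read only the tail of the file and let awk drop anything older than the cutoff:
# Hypothetical log format: first field is an epoch timestamp
cutoff=$(date -d '25 seconds ago' +%s)    # GNU date; adjust the window as needed
tail -n 1000 /var/log/app.log | awk -v cutoff="$cutoff" '$1 >= cutoff' | grep '/serve'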
GNU's tail has a --pid flag that can be used for this (tail will exit once a process with that PID no longer exists). Just start up a sleep process in the background and tell tail to exit when it does. Like so:
sleep 5 & tail --pid=$! -f /var/log/system.log
tail will exit with a 0 exit code when the time is up.
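For the counting example from the question, that could look something like this (adding -n 0 so only lines written during the 5-second window are counted, which is an extra assumption on my part):
sleep 5 & tail --pid=$! -n 0 -f /mnt/log/nginx/nginx-access.log | grep -c '/javascripts'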