Unix: get lines between timestamps in multiple files

I keep daily log files (like logfile-2022-01-01.log, logfile-2022-01-02.log, and so on).
Every line in the files starts with a timestamp, e.g. [2022-05-01 10:00:34.550] ...some strings..., the format being YYYY-MM-DD HH:MM:SS.sss.
I need to filter all the lines between two timestamps, which may mean searching across more than one file.
For instance:
logfile-2022-01-01.log
[2022-01-01 00:00:25.550] here comes some logging info
[2022-01-01 00:02:25.550] here comes some more logging info
....
[2022-01-01 23:58:29.480] here comes some more logging info
logfile-2022-01-02.log
[2022-01-02 00:01:25.550] here comes some logging info from the next day
[2022-01-02 00:04:25.550] here comes some more logging info from the next day
....
[2022-01-02 23:59:29.480] here comes some more logging info from the next day
I wish to extract the lines between 2022-01-01 20:00:00 (this is contained in the first file) and 2022-01-02 08:00:00 (this is contained in the second file).
I'm expecting to get something like this:
[2022-01-01 23:58:29.480] here comes some more logging info
[2022-01-02 00:01:25.550] here comes some logging info from the next day
[2022-01-02 00:04:25.550] here comes some more logging info from the next day
Any ideas on how to achieve this?
So far I've tried using this:
grep logfile-2022-01-01.log logfile-2022-01-02.log | grep "here comes some" | awk '/^2022-01-01 20:00/,/^2022-01-02 08:00/ {print}'
grep logfile-2022-01-01.log logfile-2022-01-02.log | grep "here comes some" | awk '$1" "$2 > "2022-01-01 20:00" && $1" "$2 < "2022-01-02 08:00"'
grep logfile-2022-01-01.log logfile-2022-01-02.log | grep "here comes some" | awk -v beg='2022-01-01 20:00' -v end='2022-01-02 08:00' '{cur=$1" "$2} beg<=cur && cur<=end'
All three run without errors but don't print anything.

Adding some lines to both input files so we can confirm matching on specific strings; also updating file names to match the timestamp date (05 instead of 01):
$ head logfile*
==> logfile-2022-05-01.log <==
[2022-05-01 00:00:25.550] here comes some logging info
[2022-05-01 00:02:25.550] here comes some more logging info
[2022-05-01 23:56:30.332] here comes more logging info
[2022-05-01 23:58:29.480] here comes some more logging info
==> logfile-2022-05-02.log <==
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:02:39.224] here comes logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day
Tweaking one of the OP's current attempts:
$ cat logfile-2022-05-01.log logfile-2022-05-02.log | grep "here comes some" | awk -F'[][]' '$2 >= "2022-05-01 20:00" && $2 <= "2022-05-02 08:00"'
Where:
replace the first grep with cat
add an awk dual field delimiter of ] and [
modify awk to compare only the 2nd field
modify the awk tests to use inclusive ranges
update file names and datetime stamps for May (05) instead of Jan (01)
This generates:
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day
While this generates the OP's desired results (per comment, the OP has stated duplicate lines are OK), once you decide to use awk there's typically no need for separate cat and grep calls.
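For instance, a minimal sketch that folds the string match and the range test into a single awk call (same files, string and range as above):
awk -F'[][]' '/here comes some/ && $2 >= "2022-05-01 20:00" && $2 <= "2022-05-02 08:00"' logfile-2022-05-01.log logfile-2022-05-02.log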
One unified awk idea that utilizes input variables while also removing duplicate (consecutive) lines:
start='2022-05-01 20:00:00'
end='2022-05-02 08:00:00'
string='here comes some'
awk -F'[][]' -v start="$start" -v end="$end" -v str="$string" '
$2 >= start { printme=1 }
$2 > end ".000" { printme=0 } # assumes "end" does not include milliseconds
printme && $0 ~ str { if ($0==last) next # skip duplicate consecutive lines
print last=$0
}
' logfile-2022-05-??.log
This generates:
[2022-05-01 23:58:29.480] here comes some more logging info
[2022-05-02 00:01:25.550] here comes some logging info from the next day
[2022-05-02 00:04:25.550] here comes some more logging info from the next day

Related

Get all lines that meet time condition from log file

Here is what my log file looks like:
[BCDID::16 T::LIVE_ANALYZER GCDID::16] {"t":"20:50:05","k":"7322","h":"178","s":-53.134575556764}
[BCDID::16 T::LIVE_ANALYZER GCDID::16] {"t":"20:50:06","k":"2115","h":"178","s":-53.134575556764}
[BCDID::16 T::LIVE_ANALYZER GCDID::16] {"t":"20:50:07","k":"1511","h":"178","s":-53.134575556764}
There are multiple log files with similar entries and they are updated every second.
here "t" : "20:50:05" is Time.
What I want to do is, get all logs between specific time from all files from the end of the files.
I tried with tail files*.log | grep -e "20:50:07 | 20:50:05" but it does not return anything.
How do I get all log entries between the given times, starting from the end of all the log files?
If you're looking for a range of records, and the format of the lines is consistent, the easiest way is probably to isolate the time field, strip out the colons, and leverage the power of arithmetic operators.
A one-liner awk solution, for example:
tail files*.log | awk -v from="205006" -v to="205007" -F"\"" '{ timeasint=$4; gsub(":","",timeasint); if (timeasint >= from && timeasint <= to) print $0 }'
would get you:
[BCDID::16 T::LIVE_ANALYZER GCDID::16] {"t":"20:50:06","k":"2115","h":"178","s":-53.134575556764}
[BCDID::16 T::LIVE_ANALYZER GCDID::16] {"t":"20:50:07","k":"1511","h":"178","s":-53.134575556764}
Of course you couldn't span across midnight (i.e., 23:59:59 to 00:00:01), but for that you'd need dates as well as times in your log anyway.
If you had dates, my suggestion would be to convert them to epoch stamps (using date -d "string" or some other suitable method) and compare the epoch stamps as integers.
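As a rough sketch of that idea (assuming, hypothetically, log lines that begin with a "YYYY-MM-DD HH:MM:SS" timestamp, plus GNU date and GNU awk for mktime):
from=$(date -d '2022-01-01 20:00:00' +%s)
to=$(date -d '2022-01-02 08:00:00' +%s)
gawk -v from="$from" -v to="$to" '{
    ts = substr($0, 1, 19)     # "YYYY-MM-DD HH:MM:SS"
    gsub(/[-:]/, " ", ts)      # mktime() expects "YYYY MM DD HH MM SS"
    t = mktime(ts)
    if (t >= from && t <= to) print
}' files*.log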

Unix script is taking a long time, can we optimise it?

seq_no=1
for line in `cat temp1_other.txt`
do
pk=`echo "$line" | cut -d '|' -f41`
seq_no=`expr "$seq_no" + 1`
line1=`sed -n ''$seq_no'p' temp1_other.txt`
pk_next=`echo "$line1" | cut -d '|' -f41`
if [ "$pk" == "$pk_next" ]; then
echo $line >> exam_duplicate.txt
else
echo $line >> exam_non_duplicate.txt
fi
done
I'm trying to read a file and compare the current line's key column with the next line's key column to check for duplicate records. For a 60k-70k line file it takes more than 20 minutes; can we optimise it or achieve it with some other logic? A while loop also takes a long time. The records are sorted using the "sort" command.
Sample Data:
Sam|1|IT|1st_Sem
Sam|1|CS|1st_Sem
Sam|1|CS|2nd_Sem
Peter|2|IT|2nd_sem
Ron|2|ECE|3rd_sem
Suppose the 2nd column is the key column: if the 2nd column matches the next line's 2nd column, the line should go to the duplicate file; if not, it should go to the non-duplicate file.
Sam|1|IT|1st_Sem
Sam|1|CS|1st_Sem
Peter|2|IT|2nd_sem
should go to the duplicate file, and the rest to the non-duplicate file.
Are you running Linux/bash? Then you can try:
tac temp1_other.txt | sort -k2,2 -t'|' -u > exam_non_duplicate.txt
The sort only looks at the second field and keeps the first record it sees.
You want the last record to be the non-duplicate one, so we reverse the input with tac instead of cat.
Now, for the file with all the duplicates, you can try:
grep -vFxf exam_non_duplicate.txt temp1_other.txt > exam_duplicate.txt
This solution will fail when you have real duplicates (completely identical lines) and one of them appears in exam_non_duplicate.txt.
Spawning an external cut (plus sed and expr) for every line is going to kill your performance. Do the whole thing in awk:
awk '{ this = $2 }                  # key column for the current line
NR>1 {
    # the previous line is a duplicate if its key matches the current key
    output = "exam" (this != prev ? "_non" : "") "_duplicate.txt";
    print last > output
}
{ prev = this; last = $0 }' FS=\| input-file
(This uses your example keying on column 2. Change $2 as necessary.) Note that this will not write the final line of the file anywhere, but that's easy enough to handle.
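One way to handle that final line (a sketch, on the assumption that the last record counts as a non-duplicate, since there is no following line to compare it against) is to append an END block to the same script:
END { if (NR) print last > "exam_non_duplicate.txt" }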

Using grep to filter calls to service in logs

I'm trying to filter the number of times a service is called by a different user in a log file.
I was thinking about using uniq -c, but almost all lines are unique thanks to the timestamp. What I want is to ignore the parts of the line I don't need and just focus on the service name and the call id which identifies each separate call.
The log format is something like this:
27/02/2017 00:00:00 [useremail#email.com] [sessioninfo(**callId**)] **serviceName**
callId and serviceName being the strings I want to filter on.
My required output would be the count of distinct callIds found on the lines where the service is called.
For example for the input :
27/02/2017 00:00:00 [useremail#email.com] [sessioninfo(12345)] service1
27/02/2017 00:00:01 [useremail1#email.com] [sessioninfo(12346)] service1
27/02/2017 00:00:02 [useremail2#email.com] [sessioninfo(12347)] service1
27/02/2017 00:00:00 [useremail#email.com] [sessioninfo(12345)] service1
The output would be 3, because one of the lines reuses a callId that has already appeared.
Is there any way I could achieve this with grep, or would I need to create a more advanced script to do the job?
You may use the following awk:
awk -F '[\\(\\)\\]]+' '{ print $3 " " $4 }' somelog.log
You may combine it later with sort and then uniq and get the count:
awk -F '[\\(\\)\\]]+' '{ print $3 " " $4 }' somelog.log | sort | uniq
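To get the single number the OP asked for (3 for the sample input above), one could additionally pipe through wc -l, e.g.:
awk -F '[\\(\\)\\]]+' '{ print $3 " " $4 }' somelog.log | sort | uniq | wc -l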
What I want is to ignore the parts of the line I don't need.
In your case, what you need is the -f option of uniq:
-f num Ignore the first num fields in each input line when doing comparisons. A
field is a string of non-blank characters separated from adjacent fields
by blanks. Field numbers are one based, i.e., the first field is field one.
So you would sort the log file, find unique lines (discounting the first three fields) with uniq -f3 and then find the number of such lines with wc -l.
i.e.
sort out.log | uniq -f 3 | wc -l

Monitoring overrunning user processes on Unix over a period of time

I am new to the administration front. I have a requirement:
Identify the user processes (I have a list of the users who submit the processes) which are still active and were submitted 3-4 days ago.
My Approach on this:
Have a text file with list of users.
Loop over the users, find the current processes spawned by each, and store them in a file.
Substitute a date variable with the format and grep.
However, I am stuck on how to get "submitted 3-4 days ago"; with my code it only matches a single day.
#!/bin/sh
rm -f psinfo.txt
rm -f psinfo_backdated.txt
for i in `cat user.lst `;
do
ps -ef | grep "$i" >> psinfo.txt
done
grep `date -d'2 day ago' +%b%d` psinfo.txt > psinfo_backdated.txt
I really need your comments and answers on this, gurus. If someone can tell me whether we can grep a date range from a file, like "less than Apr27", I can make my script work.
A time format like Apr27 is not suitable for the task, not least because it doesn't contain the year, which matters around the turn of a year. Fortunately, there is a much better format for the start time of a process. Replace
ps -ef | grep "$i" >> psinfo.txt
with
ps -oetime=ELAPSED_TIME -ouser,pid,ppid,cpu,tty,time,args -u$i >>psinfo.txt
(you might want to drop fields you don't need from the second -o…). The time since start is then represented in the form
[[days-]hours:]minutes:seconds
This you can easily filter with awk, e.g. to get processes started 3 or more days ago:
awk '{ if ($1~/-/ && int($1)>2) print }' psinfo.txt >psinfo_backdated.txt
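A quick sanity check of that filter (a sketch fed with made-up elapsed-time values rather than real ps output):
printf '%s\n' '3-12:34:56 user1 cmdA' '12:34 user2 cmdB' '1-02:00:00 user3 cmdC' |
awk '{ if ($1~/-/ && int($1)>2) print }'
Only the first line is printed: int("3-12:34:56") evaluates to 3 (days), while the other entries have been running for less than 3 days.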

Send multiple outputs to a file without overwriting

I'm trying to send the output of two commands in UNIX to a file called "log.txt"
Right now I've been trying:
# date ; quota -v myName > log.txt
The intent is to have my log.txt file look like:
Mon Sep 11 14:13:34 PDT 2006
Disk Quota for ....
...
...
Where the first line represents the date command and the rest represent the quota command.
Is there a way to send the outputs of both of these commands to the same log.txt file without overwriting each other?
Use parentheses to group your commands for the redirect of standard output.
(date ; quota -v myName) > log.txt
For example:
# (date; echo "hi") > foo
# cat foo
Sat Feb 9 23:09:15 PST 2013
hi
Louis's answer of >> works better if you want to have many commands in a big script. The first command should use > so that it truncates any existing contents of the file. All the other commands use >> to append to the file.
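For example, a minimal sketch of that pattern (truncate once, then keep appending; the extra uptime call is just for illustration):
date > log.txt                # > truncates any existing contents and writes the date
quota -v myName >> log.txt    # >> appends the quota output
uptime >> log.txt             # further commands keep appending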
You want to append, so use >>. Something like:
date >> log.txt && quota -v myName >> log.txt