Unix Shell Scripting to calculate a sum

I have the following data:
24692 -rw-rw-r--+ 1 da01 da01 25284427 Aug 31 09:06 collected_BOT.227031
24660 -rw-rw-r--+ 1 da01 da01 25248756 Aug 31 09:35 collected_BOT.227032
24748 -rw-rw-r--+ 1 da01 da01 25338868 Aug 31 10:03 collected_BOT.227033
24740 -rw-rw-r--+ 1 da01 da01 25331322 Aug 31 10:31 collected_BOT.227034
sample:
grep 1303 collected_BOT.227034 | more
1559254293,151840703,AJ1X,10178801756650692,VA,VB,0,0,2,2,1303,1,L1O,6797,129,1,3,601,0,GVW1,9110,551,17,000000,0001,000000,,6,4,,1,1,,0
1559254294,151840704,AJ2X,10178801756650693,VA,VB,0,0,2,2,1303,1,L2O,6797,203,1,3,601,0,GVW2,9110,552,17,000000,0001,000000,,6,4,,1,1,,0
1559254295,151840705,AJ3X,10178801756650694,VA,VB,0,0,2,2,1303,1,L3O,6797,664,1,3,601,0,GVW3,9110,552,17,000000,0001,000000,,6,4,,1,1,,0
$15 = duration
I just want to calculate the total of $15 in file collected_BOT.227034 (but only when $11 == 1303).

awk -F, '$11==1303{sum+=$15} END {print sum}' collected_BOT.227034
-F, sets the field separator to a comma
$11==1303 checks whether the 11th field exactly matches the number 1303
If so, the value of the 15th field is added to the sum variable (whose initial value is zero by default)
END {print sum} prints the value of sum after all lines of the input file have been processed
Edit:
Thanks @Mark Setchell for pointing out that $11==1303 can be used instead of $11 ~ /^1303$/.
Also, use print sum + 0 if the output should be 0 even when no lines match; alternatively, add an explicit BEGIN{sum=0} block.

Great solution @sp asic.
No need to use regular expression for field $11 though:
awk -F, '$11=="1303" {sum+=$15} END {print sum}' collected_BOT.227034
(Beware: use == and not =, because the latter does nothing except perform a (successful) assignment to field $11, so every line would match.)
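To see the difference the + 0 makes when nothing matches, here is a minimal sketch; the file name and its contents are made up for illustration:

```shell
# Build a tiny sample in the same comma-separated layout (hypothetical data).
printf '%s\n' \
  'a,b,c,d,e,f,g,h,i,j,1303,l,m,n,10' \
  'a,b,c,d,e,f,g,h,i,j,9999,l,m,n,20' \
  'a,b,c,d,e,f,g,h,i,j,1303,l,m,n,5' > sample.csv

# Sum field 15 where field 11 is 1303 -> prints 15
awk -F, '$11==1303{sum+=$15} END {print sum}' sample.csv

# With no matching lines, plain "print sum" prints an empty line;
# "print sum+0" forces a numeric 0 instead.
awk -F, '$11==7777{sum+=$15} END {print sum+0}' sample.csv
```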

grep and awk, combine commands?

I have file that looks like:
This is a RESTRICTED site.
All connections are monitored and recorded.
Disconnect IMMEDIATELY if you are not an authorized user!
sftp> cd outbox
sftp> ls -ltr
-rw------- 1 0 0 1911 Jun 12 20:40 61N0584832_EDIP000749728818_MFC_20190612203409.txt
-rw------- 1 0 0 1878 Jun 13 06:01 613577165_EDIP000750181517_MFC_20190613055207.txt
I want to print only the .txt file names, ideally in one command.
I can do:
grep -e '^-' outfile.log > outfile.log2
..which gives only the lines that start with '-'.
-rw------- 1 0 0 1911 Jun 12 20:40 61N0584832_EDIP000749728818_MFC_20190612203409.txt
-rw------- 1 0 0 1878 Jun 13 06:01 613577165_EDIP000750181517_MFC_20190613055207.txt
And then:
awk '{print $9}' outfile.log2 > outfile.log3
..which gives the desired output:
61N0584832_EDIP000749728818_MFC_20190612203409.txt
613577165_EDIP000750181517_MFC_20190613055207.txt
And so the question is, can these 2 commands be combined into 1?
You may use a single awk:
awk '/^-/{ print $9 }' file > outputfile
Or
awk '/^-/{ print $9 }' file > tmp && mv tmp file
It works like this:
/^-/ - finds each line starting with -
{ print $9 } - prints Field 9 of the matching lines only.
Seems like matching the leading - is not really what you want. If you want to just get the .txt files as output, filter on the file name:
awk '$9 ~ /\.txt$/{print $9}' input-file
Using grep with PCRE enabled (-P) flag:
grep -oP '^-.* \K.*' outfile.log
61N0584832_EDIP000749728818_MFC_20190612203409.txt
613577165_EDIP000750181517_MFC_20190613055207.txt
'^-.* \K.*' : everything from the leading - up to the last whitespace is matched but discarded (anything to the left of \K is matched and ignored), and the part to the right of \K is printed.
Since the OP clearly writes I want to print only the .txt file names, we should test for a .txt file, and since the file name is always the last column, we can make it more portable by testing only the last field, like this:
awk '$NF ~ /\.txt$/{print $NF}' outfile.log > outfile.log2
61N0584832_EDIP000749728818_MFC_20190612203409.txt
613577165_EDIP000750181517_MFC_20190613055207.txt
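A quick way to sanity-check the last-field approach above is to run it on a captured listing; the log file name and its contents here are fabricated for illustration:

```shell
# Fabricated sftp session capture (hypothetical file and contents).
cat > sample_session.log <<'EOF'
This is a RESTRICTED site.
sftp> ls -ltr
-rw------- 1 0 0 1911 Jun 12 20:40 61N0584832_EDIP000749728818_MFC_20190612203409.txt
-rw------- 1 0 0 1878 Jun 13 06:01 613577165_EDIP000750181517_MFC_20190613055207.txt
EOF

# Keep only lines whose last field ends in .txt, and print that field.
awk '$NF ~ /\.txt$/{print $NF}' sample_session.log
```

Banner lines and the sftp> prompts fall through harmlessly because their last field does not end in .txt.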

Drop first 4 columns

I have a command that drops the first 4 columns, but unfortunately if the 2nd column name and the 4th column name are similar, it truncates at the 2nd column; only if the 2nd and 4th column names are different does it truncate at the 4th column. Is there anything wrong with my command?
awk -F"|" 'NR==1 {h=substr($0, index($0,$5)); next}
{file= path ""$1""$2"_"$3"_"$4"_03042017.csv"; print (a[file]++?"": "DETAILS 03042017" ORS h ORS) substr($0, index($0,$5)) > file}
END{for(file in a) print "EOF " a[file] > file}' filename
Input:
Account Num | Name | Card_Holder_Premium | Card_Holder| Type_Card | Balance | Date_Register
01 | 02 | 03 | 04 | 05 | 06 | 07
Output:
_Premium | Card_Holder| Type_Card | Balance | Date_Register
04 | 05 | 06 | 07
My desired output:
Card_Holder| Type_Card | Balance | Date_Register
05 | 06 | 07
Is this all you're trying to do?
$ sed -E 's/([^|]+\| ){4}//' file
April | May | June
05 | 06 | 07
$ awk '{sub(/([^|]+\| ){4}/,"")}1' file
April | May | June
05 | 06 | 07
The method you use to remove columns using index is not correct. As you have figured out, index can be confused and match the previous field when the previous field contains the same words as the next field.
The correct way is the one advised by Ed Morton.
The code below, based on Ed Morton's suggestion, gives you the output you expect:
awk -F"|" 'NR==1 {sub(/([^|]+\|){3}/,"");h=$0;next} \
{file=$1$2"_"$3"_"$4"_03042017.csv"; sub(/([^|]+\|){3}/,""); \
print (a[file]++?"": "DETAILS 03042017" ORS h ORS) $0 > file} \
END{for(file in a) print "EOF " a[file] > file}' file1.csv
#Output
DETAILS 03042017
Card_Holder| Type_Card | Balance | Date_Register
04 | 05 | 06 | 07
EOF 1
Due to the whitespace you have included in your fields, the filename of the generated file appears as 01 02 _ 03 _ 04 _03042017.csv. With your real data this filename should appear correctly.
In any case, I just adapted Ed Morton's answer to your code. If you are happy with this solution, you should accept Ed Morton's answer.
PS: I removed a space from Ed Morton's answer since it seems to work a bit better with your not-so-clean data.
Ed Suggested:
awk '{sub(/([^|]+\| ){4}/,"")}1' file
#Mind this space ^
With this space, the pattern might fail to match your data if there is no space after each field (e.g. April|May).
On the other hand, by removing this space, Ed's solution correctly matches fields in either format, April | May or April|May.

What does ^ character mean in grep ^d?

When I do ls -l | grep ^d it lists only directories in the current directory.
What I'd like to know is what does the character ^ in ^d mean?
The caret ^ and the dollar sign $ are meta-characters that match the empty string at the beginning and at the end of a line, respectively. The grep is matching only lines that start with "d".
To complement the good answer by The New Idiot, I want to point out that this:
ls -l | grep ^d
Shows all directories in the current directory. That's because ls -l puts a d at the beginning of each directory's info.
The format of ls -l is like:
-rwxr-xr-x 1 user group 0 Jun 12 12:25 exec_file
-rw-rw-r-- 1 user group 0 Jun 12 12:25 normal_file
drwxr-xr-x 16 user group 4096 May 24 12:46 dir
^
|___ see the "d"
To make it more clear, you can use ls -lF to append a / to the end of the directories' info:
-rwxr-xr-x 1 user group 0 Jun 12 12:25 exec_file*
-rw-rw-r-- 1 user group 0 Jun 12 12:25 normal_file
drwxr-xr-x 16 user group 4096 May 24 12:46 dir/
So ls -lF | grep /$ will do the same as ls -l | grep ^d.
It has two meanings. One is as 'The New Idiot' pointed out above. The other, equally useful, is within a character class expression, where it means negation: grep -E '[^[:digit:]]' accepts any character except a digit. The ^ must be the first character within [].
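Both uses of ^ can be seen side by side in a short sketch (the input lines are made up):

```shell
# Anchor: ^d matches only lines that *start* with d.
printf 'drwxr-xr-x dir\n-rw-r--r-- file\n' | grep '^d'
# -> drwxr-xr-x dir

# Negation: inside [...], ^ inverts the class; [^0-9] matches lines
# containing at least one non-digit character.
printf 'abc\n123\n' | grep '[^0-9]'
# -> abc
```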

Unix: Increment date column by one day in csv file

Help needed. I want to increment the Date column (which is a string) in a csv by one day.
e.g. (Date Format yyyy-MM-dd)
Col1,Col2,Col3
ABC,001,1900-01-01
XYZ,002,2000-01-01
Expected OutPut
Col1,Col2,Col3
ABC,001,1900-01-02
XYZ,002,2000-01-02
There's one standard Unix utility that has all the date magic from September 14, 1752 through December 31, 9999 built in: the calendar, cal. Instead of reinventing the wheel and doing messy date calculations, we will use its intelligence to our advantage. The basic problem is: given a date, is it the last day of a month? If not, simply increment the day. If yes, reset the day to 1 and increment the month (and possibly the year).
However, the output of cal is unspecified and it may look like this:
$ cal 2 1900
February 1900
Su Mo Tu We Th Fr Sa
1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28
What we would need is a list of days, 1 2 3 ... 28. We can do this by skipping everything up to the "1":
set -- $(cal 2 1900)
while test $1 != 1; do shift; done
Now the number of args gives us the number of days in February 1900:
$ echo $#
28
Putting it all together in a script:
#!/bin/sh
read -r header
printf "%s\n" "$header"
while IFS=,- read -r col1 col2 y m d; do
case $m-$d in
(12-31) y=$((y+1)) m=01 d=01;;
(*)
set -- $(cal $m $y)
# Shift away the month and weekday names.
while test $1 != 1; do shift; done
# Is the day the last day of a month?
if test ${d#0} -eq $#; then
# Yes: increment m and reset d=01.
m=$(printf %02d $((${m#0}+1)))
d=01
else
# No: increment d.
d=$(printf %02d $((${d#0}+1)))
fi
;;
esac
printf "%s,%s,%s-%s-%s\n" "$col1" "$col2" $y $m $d
done
Running it on this input:
Col1,Col2,Col3
ABC,001,1900-01-01
ABC,001,1900-02-28
ABC,001,1900-12-31
XYZ,002,2000-01-01
XYZ,002,2000-02-28
XYZ,002,2000-02-29
yields
Col1,Col2,Col3
ABC,001,1900-01-02
ABC,001,1900-03-01
ABC,001,1901-01-01
XYZ,002,2000-01-02
XYZ,002,2000-02-29
XYZ,002,2000-03-01
I made one little assumption: The first two columns don't contain a - or escaped comma. If they do, the IFS=,- read will act up.
Using the date command, this can be done in awk:
awk 'BEGIN{FS=OFS=","}NR>1{("date -d\""$3" +1 day\" +%Y-%m-%d")|getline newdate; $3=newdate; print}' file.in
If you can extract the date from the file, you can use this:
d="1900-01-01" # date from file
date --date @$(( $(date --date "$d" +%s) + 86400 )) +%Y-%m-%d
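If GNU date is available, its relative-date syntax also handles the tricky boundaries directly, which makes a quick check of the script's expected output easy; a minimal sketch:

```shell
# GNU date understands "+1 day" and handles month and leap-year rollovers.
date -d '2000-02-28 +1 day' +%Y-%m-%d   # -> 2000-02-29 (leap year)
date -d '1900-02-28 +1 day' +%Y-%m-%d   # -> 1900-03-01 (not a leap year)
date -d '1900-12-31 +1 day' +%Y-%m-%d   # -> 1901-01-01
```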

How to properly grep filenames only from ls -al

How do I tell grep to only print out lines if the "filename" matches when I'm piping through ls? I want it to ignore everything on each line until after the timestamp. There must be some easy way to do this on a single command.
As you can see, without it, if I searched for the file "rwx", it would return not only the line with rwx.c, but also the first three lines because of permissions. I was going to use AWK but I want it to display the whole last line if I search for "rwx".
Any ideas?
EDIT: Thanks for the hacks below. However, it would be great to have a more bug-free method. For example, if I had a file named "rob rob", I wouldn't be able to use the stated solutions.
drwxrwxr-x 2 rob rob 4096 2012-03-04 18:03 .
drwxrwxr-x 4 rob rob 4096 2012-03-04 12:38 ..
-rwxrwxr-x 1 rob rob 13783 2012-03-04 18:03 a.out
-rw-rw-r-- 1 rob rob 4294 2012-03-04 18:02 function1.c
-rw-rw-r-- 1 rob rob 273 2012-03-04 12:54 function1.c~
-rw-rw-r-- 1 rob rob 16 2012-03-04 18:02 rwx.c
-rw-rw-r-- 1 rob rob 16 2012-03-04 18:02 rob rob
The following will list only file name, and one file in each row.
$ ls -1
To include . files
$ ls -1a
Please note that the argument is number "1", not letter "l".
Why don't you use grep and match the file name following the timestamp?
grep -P "[0-9]{2}:[0-9]{2} $FILENAME(\.[a-zA-Z0-9]+)?$"
The [0-9]{2}:[0-9]{2} is for the time, the $FILENAME is where you'd put rob rob or rwx, and the trailing (\.[a-zA-Z0-9]+)? is to allow for an optional extension.
Edit: @JonathanLeffler below points out that when files are older than about 6 months the time column gets replaced by a year - this is what happens on my computer anyhow. You could do ([0-9]{2}:[0-9]{2}|(19|20)[0-9]{2}) to allow time OR year, but you may be best off using awk (?).
[foo@bar ~/tmp]$ls -al
total 8
drwxrwxr-x 2 foo foo 4096 Mar 5 09:30 .
drwxr-xr-- 83 foo foo 4096 Mar 5 09:30 ..
-rw-rw-r-- 1 foo foo 0 Mar 5 09:30 foo foo
-rw-rw-r-- 1 foo foo 0 Mar 5 09:29 rwx.c
-rw-rw-r-- 1 foo foo 0 Mar 5 09:29 tmp
[foo@bar ~/tmp]$export filename='foo foo'
[foo@bar ~/tmp]$echo $filename
foo foo
[foo@bar ~/tmp]$ls -al | grep -P "[0-9]{2}:[0-9]{2} $filename(\.[a-zA-Z0-9]+)?$"
-rw-rw-r-- 1 cha66i cha66i 0 Mar 5 09:30 foo foo
(You could additionally extend to matching the whole line if you wanted:
^ # start of line
[d-]([r-][w-][x-]){3} + # permissions & space (note: is there a 't' or 's'
# sometimes where the 'd' can be??)
[0-9]+ # whatever that number is
[\w-]+ [\w-]+ + # user/group (are spaces allowed in these?)
[0-9]+ + # file size (modify for -h switch??)
(19|20)[0-9]{2}- # yyyy (modify if you want to allow <1900)
(1[012]|0[1-9])- # mm
(0[1-9]|[12][0-9]|3[012]) + # dd
([01][0-9]|2[0-3]):[0-6][0-9] +# HH:MM (24hr)
$filename(\.[a-zA-Z0-9]+)? # filename & optional extension
$ # end of line
You get the point; tailor to your needs.)
Assuming that you aren't prepared to do:
ls -ld $(ls -a | grep rwx)
then you need to exploit the fact that there are 8 columns with space separation before the file name starts. Using egrep (or grep -E), you could do:
ls -al | egrep "^([^ ]+ +){8}.*rwx"
This looks for 'rwx' after the 8th column. If you want the name to start with rwx, omit the .*. If you want the name to end with rwx, add a $ at the end. Note that I used double quotes so you could interpolate a variable in place of the literal rwx.
This was tested on Mac OS X 10.7.3; the ls -l command consistently gives three columns for the date field:
-r--r--r-- 1 jleffler staff 6510 Mar 17 2003 README,v
-r--r--r-- 1 jleffler staff 26676 Mar 3 21:44 ccs.nmd
Your ls -l seems to be giving just two columns, so you'd need to change the {8} to {7} for your machine - and beware migrating between systems.
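The column-skipping regex from the answer above can be exercised on two fabricated listing lines (BSD-style three-column dates, so 8 columns precede the name):

```shell
# Two fabricated ls -l style lines; only the second has "ccs" after column 8.
printf '%s\n' \
  '-r--r--r--  1 jleffler staff   6510 Mar 17  2003 README,v' \
  '-r--r--r--  1 jleffler staff  26676 Mar  3 21:44 ccs.nmd' |
grep -E '^([^ ]+ +){8}.*ccs'
# -> -r--r--r--  1 jleffler staff  26676 Mar  3 21:44 ccs.nmd
```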
Well, if you're working with filenames that don't have spaces in them, you could do something like this:
grep 'rwx\S*$'
Aside from the fact that you can use pattern matching with ls (for example in ksh and bash),
which is probably what you should do, you can use the fact that the filename occurs at a
fixed position. awk (gawk, nawk or whatever you have) is a better choice for this.
If you have to use grep it smells like homework to me. Please tag it that way.
Assume the filename starting position, based on this output from ls -l on Linux, is 56:
-rwxr-xr-x 1 Administrators None 2052 Feb 28 20:29 vote2012.txt
ls -l | awk ' substr($0,56) ~/your pattern even with spaces goes here/'
e.g.,
ls -l | awk ' substr($0,56) ~/^val/'
will find files starting with "val"
As a simple hack, just add a space before your filename so you don't match the beginning of the output:
ls -al | grep '\srwx'
Edit: OK, this is not as robust as it should be. Here's awk:
ls -l | awk ' $9 ~ /rwx/ { print $0 }'
This works for me, unlike ls -l and the others, as some folks pointed out. I like this because it's really generic and gives me the base file name, stripping any leading path.
ls -1 /path_name |awk -F/ '{print $NF}'
Only one command you needed for this --
ls -al | gawk '{print $9}'
You can use this:
ls -p | grep -v /
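A small sketch of how this works: ls -p appends a / to directory names, so grep -v / drops them, leaving plain files (the directory and file names here are made up):

```shell
# Hypothetical layout: one subdirectory and one regular file.
mkdir -p demo/subdir
touch demo/file.txt

# ls -p marks directories with a trailing /; grep -v / filters them out.
ls -p demo | grep -v /
# -> file.txt
```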
this is super old, but i needed the answer and had a hard time finding it. i didn't really care about the one-liner part; i just needed it done. this is down and dirty and requires that you count the columns. i'm not looking for an upvote here, just leaving some options for future searcher-ers.
the helpful awk trick is here -- Using awk to print all columns from the nth to the last
if
YOUR_FILENAME="rob rob"
and
WHERE_FILENAMES_START=8
ls -al | while read x; do
y=$(echo "$x" | awk -v start="$WHERE_FILENAMES_START" '{for(i=start; i<=NF; ++i) printf $i""FS; print ""}')
[[ "$YOUR_FILENAME " = "$y" ]] && echo "$x"
done
if you save it as a bash script and swap out the vars with $2 and $1, throw the script in your usr bin... then you'll have your clean simple one-liner ;)
output will be:
> -rw-rw-r-- 1 rob rob 16 2012-03-04 18:02 rob rob
the question was for a one-liner so...
ls -al | while read x; do [[ "$YOUR_FILENAME " = "$(echo "$x" | awk -v start="$WHERE_FILENAMES_START" '{for(i=start; i<=NF; ++i) printf $i""FS; print ""}')" ]] && echo "$x" ; done
(lol ;P)
on another note: mathematical.coffee your answer was rad. it didn't solve my version of this problem, so i didn't upvote, but i liked your regex breakdown :D
