Search quantities on specific lines of text file and sum - unix

I have a text file some miscellaneous data and with some money expenses. I want to search all dollar quantities between specific lines and sum them. Specific lines meaning search for dollar quantities between lines 6 and 8.
Here's an example of my text file:
Mary had a little $5.00 lamb
Bing bang bow
Blah blah blah
STARBUCKS Jan 8th, 2019 $7.00
MCDONALD'S Jan 10th, 2019 $6.00
UBER Jan 11th, 2019 $20.01
The expected answer is $33.01
I found that in VI I can search dollar quantities like this:
/$\d\{2}\|\$\d\{1}
I also saw in my search results that AWK can search numbers and sum them, but I couldn't figure out how to tailor those suggestions to my problem.

Use $ as field separator. If there is a second column (NF==2) sum values in second column.
awk -F '$' 'NF==2{sum+=$2} END{print sum}' file

You could use awk with some pattern matching :
awk '$NF ~/^\$.*$/{amt+=substr($NF,2)}END{print "$" amt}' file
$33.01

A very generic solution uses a regular expression with positive lookbehind:
grep -oP --regexp='(?<=\$)[0-9\.]*' inputFile | paste -s -d+ | bc
The regex (?<=\$)[0-9\.]* matches only sequences of digits and '.' if they are preceded by '$'
A modified solution using awk looks like this:
grep -oP --regexp='(?<=\$)[0-9\.]*' inputFile | awk '{s+=$1} END {print s}'
Both commands return 33.01
To limit the summation to specified lines, you can prepend awk 'NR>5 && NR<9{print $0}':
awk 'NR>5 && NR<9{print $0}' inputFile | grep -oP --regexp='(?<=\$)[0-9\.]*' | awk '{s+=$1} END {print s}'

You can try Perl
$ perl -ne ' /\$(\S+)/ and $sum+=$1 ; END { print $sum } ' quantile.txt
38.01
the given input is
$ cat quantile.txt
Mary had a little $5.00 lamb
Bing bang bow
Blah blah blah
STARBUCKS Jan 8th, 2019 $7.00
MCDONALD'S Jan 10th, 2019 $6.00
UBER Jan 11th, 2019 $20.01

if your data in 'd'
perl -ne 'BEGIN{$s=0} if($.>=6) {/\$([\d.]+)/; $s+=$1} END{print "total=$s"}' d

Related

Replace last 9 delimeters "," with "|" in Unix

I want to replace last 9 "," delimeters with "|" in a file.
For example, from:
abcd,3,5,5,7,7,1,2,3,4
"ashu,pant,something",3,5,5,7,7,8,7,8,8,8
to:
abcd|3|5|5|7|7|1|2|3|4
"ashu,pant,something"|3|5|5|7|7|8|7|8|8|8
Help would be really appreciated.
Not exactly the same but replace all after the second occurrence with GNU sed:
$ echo \"ashu,pant\",3,5,5,7,7,87,8,8,8 |
sed 's/,/|/2g'
"ashu,pant"|3|5|5|7|7|87|8|8|8
Edit to match your changed requirements:
Hackish, but first reverse lines and replace all commas with pipes, then replace pipes with commas starting from 10th occurrence:
$ echo -e \"ashu,pant\",3,5,5,7,7,87,8,8,8\\nabcd,3,5,5,7,7,1,2,3,4 |
rev |
sed 's/,/|/g; s/|/,/10g' |
rev
"ashu,pant"|3|5|5|7|7|87|8|8|8
abcd|3|5|5|7|7|1|2|3|4
You could also use GNU awk and FPAT to replace all comma outside of quotes:
$ echo -e \"ashu,pant\",3,5,5,7,7,87,8,8,8\\nabcd,3,5,5,7,7,1,2,3,4 |
awk 'BEGIN{FPAT="([^,]+)|(\"[^\"]+\")";OFS="|"}{$1=$1}1'
"ashu,pant"|3|5|5|7|7|87|8|8|8
abcd|3|5|5|7|7|1|2|3|4
awk '{gsub(/[[:digit:]]/," |&")gsub(/, /,"")}1' file
output
abcd|3|5|5|7|7|1|2|3|4
"ashu,pant,something"|3|5|5|7|7|8|7|8|8|8

Unix utilities, sum the data under the save entries

I have this little problem that I want to ask:
So I have a file named "quest", which has:
Tom 100 John 10 Tom 100
How do I use awk to output something like:
Tom 200
I'd appreciate your help. I tried to look up online but I am not sure what I am look for. Thanks ahead!!
I do know how to use regular expression /Tom/ to grep the entry, but I am not sure how to proceed from there.
You can try something like:
$ awk '{
for(i=1; i<=NF; i+=2)
names[$i] = ((names[$i]) ? names[$i]+$(i+1) : $(i+1))
}
END{
for (name in names) print name, names[name]
}' quest
Tom 200
John 10
You basically iterate over the fields creating keys for all odd fields and assigning values of even fields to them. If the key already exists, you just add to the existing value.
This expects your file format to have Names on odd fields (for eg. 1, 3, 5 .. etc) and values on even fields (eg 2, 4, 6 .. etc).
In the END block, you just print entire array content.
I guess you need calculate all users' mark, not only Tom, here is the code:
xargs -n2 < file|awk '{a[$1]+=$2}END{for (i in a) print i,a[i]}'
Tom 200
John 10
and one-liner of awk
awk '{for (i=1;i<=NF;i+=2) a[$i]+=$(i+1)}END{for (i in a) print i,a[i]}' file
Tom 200
John 10
$ echo 'Tom 100 John 10 Tom 100' | grep -o '[0-9]*' | paste -sd+ | bc
210
grep -o '[0-9]*' produces
100
10
100
paste -sd+ produces
100+10+100
bc calculates the result.
However, this only works for small input since bc has limitation in input size.
In that case you can use awk '{s+=$0}END{print s}' instead of paste -sd+ | bc.
However note that GNU Awk treats all number as floting point, it produces inaccurate result when number is too large.
awk '/Tom/{
for(i=1;i<=NF;i++)
if($i=="Tom")s+=$(i+1);
print "Tom",s;s=0}' your_file
Test
Here is a way to do it in awk (no loop):
awk -v RS=" " '{n=$1;getline;a[n]+=$1} END {for (i in a) print i,a[i]}' quest
Tom 200
John 10
If there are more than one line like this
cat quest
Tom 100 John 10 Tom 100
Paul 20 Tom 40 John 10
Then do this with gnu awk:
awk -v RS=" |\n" '{n=$1;getline;a[n]+=$1} END {for (i in a) print i,a[i]}' quest
Paul 20
Tom 240
John 20
And if you do not like getline
awk -v RS=" |\n" 'NR%2 {n=$1;next}{a[n]+=$1} END {for (i in a) print i,a[i]}' quest

Unix Command for counting number of words which contains letter combination (with repeats and letters in between)

How would you count the number of words in a text file which contains all of the letters a, b, and c. These letters may occur more than once in the word and the word may contain other letters as well. (For example, "cabby" should be counted.)
Using sample input which should return 2:
abc abb cabby
I tried both:
grep -E "[abc]" test.txt | wc -l
grep 'abcdef' testCount.txt | wc -l
both of which return 1 instead of 2.
Thanks in advance!
You can use awk and use the return value of sub function. If successful substitution is made, the return value of the sub function will be the number of substitutions done.
$ echo "abc abb cabby" |
awk '{
for(i=1;i<=NF;i++)
if(sub(/a/,"",$i)>0 && sub(/b/,"",$i)>0 && sub(/c/,"",$i)>0) {
count+=1
}
}
END{print count}'
2
We keep the condition of return value to be greater than 0 for all three alphabets. The for loop will iterate over every word of every line adding the counter when all three alphabets are found in the word.
I don't think you can get around using multiple invocations of grep. Thus I would go with (GNU grep):
<file grep -ow '\w+' | grep a | grep b | grep c
Output:
abc
cabby
The first grep puts each word on a line of its own.
Try this, it will work
sed 's/ /\n/g' test.txt |grep a |grep b|grep c
$ cat test.txt
abc abb cabby
$ sed 's/ /\n/g' test.txt |grep a |grep b|grep c
abc
cabby
hope this helps..

How to properly grep filenames only from ls -al

How do I tell grep to only print out lines if the "filename" matches when I'm piping through ls? I want it to ignore everything on each line until after the timestamp. There must be some easy way to do this on a single command.
As you can see, without it, if I searched for the file "rwx", it would return not only the line with rwx.c, but also the first three lines because of permissions. I was going to use AWK but I want it to display the whole last line if I search for "rwx".
Any ideas?
EDIT: Thanks for the hacks below. However, it would be great to have a more bug-free method. For example, if I had a file named "rob rob", I wouldn't be able to use the stated solutions.
drwxrwxr-x 2 rob rob 4096 2012-03-04 18:03 .
drwxrwxr-x 4 rob rob 4096 2012-03-04 12:38 ..
-rwxrwxr-x 1 rob rob 13783 2012-03-04 18:03 a.out
-rw-rw-r-- 1 rob rob 4294 2012-03-04 18:02 function1.c
-rw-rw-r-- 1 rob rob 273 2012-03-04 12:54 function1.c~
-rw-rw-r-- 1 rob rob 16 2012-03-04 18:02 rwx.c
-rw-rw-r-- 1 rob rob 16 2012-03-04 18:02 rob rob
The following will list only file name, and one file in each row.
$ ls -1
To include . files
$ ls -1a
Please note that the argument is number "1", not letter "l".
Why don't you use grep and match the file name following the timestamp?
grep -P "[0-9]{2}:[0-9]{2} $FILENAME(\.[a-zA-Z0-9]+)?$"
The [0-9]{2}:[0-9]{2} is for the time, the $FILENAME is where you'd put rob rob or rwx, and the trailing (\.[a-zA-Z0-9]+)? is to allow for an optional extension.
Edit: #JonathanLeffler below points out that when files are older than bout 6 months the time column gets replaced by a year - this is what happens on my computer anyhow. You could do ([0-9]{2}:[0-9]{2}|(19|20)[0-9]{2}) to allow time OR year, but you may be best of using awk (?).
[foo#bar ~/tmp]$ls -al
total 8
drwxrwxr-x 2 foo foo 4096 Mar 5 09:30 .
drwxr-xr-- 83 foo foo 4096 Mar 5 09:30 ..
-rw-rw-r-- 1 foo foo 0 Mar 5 09:30 foo foo
-rw-rw-r-- 1 foo foo 0 Mar 5 09:29 rwx.c
-rw-rw-r-- 1 foo foo 0 Mar 5 09:29 tmp
[foo#bar ~/tmp]$export filename='foo foo'
[foo#bar ~/tmp]$echo $filename
foo foo
[foo#bar ~/tmp]$ls -al | grep -P "[0-9]{2}:[0-9]{2} $filename(\.[a-zA-Z0-9]+)?$"
-rw-rw-r-- 1 cha66i cha66i 0 Mar 5 09:30 foo foo
(You could additionally extend to matching the whole line if you wanted:
^ # start of line
[d-]([r-][w-][x-]){3} + # permissions & space (note: is there a 't' or 's'
# sometimes where the 'd' can be??)
[0-9]+ # whatever that number is
[\w-]+ [\w-]+ + # user/group (are spaces allowed in these?)
[0-9]+ + # file size (modify for -h switch??)
(19|20)[0-9]{2}- # yyyy (modify if you want to allow <1900)
(1[012]|0[1-9])- # mm
(0[1-9]|[12][0-9]|3[012]) + # dd
([01][0-9]|2[0-3]):[0-6][0-9] +# HH:MM (24hr)
$filename(\.[a-zA-Z0-9]+)? # filename & optional extension
$ # end of line
. You get the point, tailor to your needs.)
Assuming that you aren't prepared to do:
ls -ld $(ls -a | grep rwx)
then you need to exploit the fact that there are 8 columns with space separation before the file name starts. Using egrep (or grep -E), you could do:
ls -al | egrep "^([^ ]+ +){8}.*rwx"
This looks for 'rwx' after the 8th column. If you want the name to start with rwx, omit the .*. If you want the name to end with rwx, add a $ at the end. Note that I used double quotes so you could interpolate a variable in place of the literal rwx.
This was tested on Mac OS X 10.7.3; the ls -l command consistently gives three columns for the date field:
-r--r--r-- 1 jleffler staff 6510 Mar 17 2003 README,v
-r--r--r-- 1 jleffler staff 26676 Mar 3 21:44 ccs.nmd
Your ls -l seems to be giving just two columns, so you'd need to change the {8} to {7} for your machine - and beware migrating between systems.
Well, if you're working with filenames that don't have spaces in them, you could do something like this:
grep 'rwx\S*$'
Aside frrm the fact that you can use pattern matching with ls, exaple ksh and bash,
which is probably what you should do, you can use the fact that filename occur in a
fixed position. awk (gawk, nawk or whaever you have) is a better choice for this.
If you have to use grep it smells like homework to me. Please tag it that way.
Assume the filename starting position is based on this output from ls -l in linux: 56
-rwxr-xr-x 1 Administrators None 2052 Feb 28 20:29 vote2012.txt
ls -l | awk ' substr($0,56) ~/your pattern even with spaces goes here/'
e.g.,
ls -l | awk ' substr($0,56) ~/^val/'
will find files starting with "val"
As a simple hack, just add a space before your filename so you don't match the beginning of the output:
ls -al | grep '\srwx'
Edit: OK, this is not as robust as it should be. Here's awk:
ls -l | awk ' $9 ~ /rwx/ { print $0 }'
This works for me, unlike ls -l & others as some folks pointed out. I like this because its really generic & gives me the base file name, which removes the path names before the file.
ls -1 /path_name |awk -F/ '{print $NF}'
Only one command you needed for this --
ls -al | gawk '{print $9}'
You can use this:
ls -p | grep -v /
this is super old, but i needed the answer and had a hard time finding it. i didn't really care about the one-liner part; i just needed it done. this is down and dirty and requires that you count the columns. i'm not looking for an upvote here, just leaving some options for future searcher-ers.
the helpful awk trick is here -- Using awk to print all columns from the nth to the last
if
YOUR_FILENAME="rob rob"
and
WHERE_FILENAMES_START=8
ls -al | while read x; do
y=$(echo "$x" | awk '{for(i=$WHERE_FILENAMES_START; i<=NF; ++i) printf $i""FS; print ""}')
[[ "$YOUR_FILENAME " = "$y" ]] && echo "$x"
done
if you save it as a bash script and swap out the vars with $2 and $1, throw the script in your usr bin... then you'll have your clean simple one-liner ;)
output will be:
> -rw-rw-r-- 1 rob rob 16 2012-03-04 18:02 rob rob
the question was for a one-liner so...
ls -al | while read x; do [[ "$YOUR_FILENAME " = "$(echo "$x" | awk '{for(i=WHERE_FILENAMES_START; i<=NF; ++i) printf $i""FS; print ""}')" ]] && echo "$x" ; done
(lol ;P)
on another note: mathematical.coffee your answer was rad. it didn't solve my version of this problem, so i didn't upvote, but i liked your regex breakdown :D

sed/awk or other: one-liner to increment a number by 1 keeping spacing characters

EDIT: I don't know in advance at which "column" my digits are going to be and I'd like to have a one-liner. Apparently sed doesn't do arithmetic, so maybe a one-liner solution based on awk?
I've got a string: (notice the spacing)
eh oh 37
and I want it to become:
eh oh 36
(so I want to keep the spacing)
Using awk I don't find how to do it, so far I have:
echo "eh oh 37" | awk '$3>=0&&$3<=99 {$3--} {print}'
But this gives:
eh oh 36
(the spacing characters where lost, because the field separator is ' ')
Is there a way to ask awk something like "print the output using the exact same field separators as the input had"?
Then I tried yet something else, using awk's sub(..,..) method:
' sub(/[0-9][0-9]/, ...) {print}'
but no cigar yet: I don't know how to reference the regexp and do arithmetic on it in the second argument (which I left with '...' for now).
Then I tried with sed, but got stuck after this:
echo "eh oh 37" | sed -e 's/\([0-9][0-9]\)/.../'
Can I do arithmetic from sed using a reference to the matching digits and have the output not modify the number of spacing characters?
Note that it's related to my question concerning Emacs and how to apply this to some (big) Emacs region (using a replace region with Emacs's shell-command-on-region) but it's not an identical question: this one is specifically about how to "keep spaces" when working with awk/sed/etc.
Here is a variation on ghostdog74's answer that does not require the number to be anchored at the end of the string. This is accomplished using match instead of relying on the number to be in a particular position.
This will replace the first number with its value minus one:
$ echo "eh oh 37 aaa 22 bb" | awk '{n = substr($0, match($0, /[0-9]+/), RLENGTH) - 1; sub(/[0-9]+/, n); print }'
eh oh 36 aaa 22 bb
Using gsub there instead of sub would replace both the "37" and the "22" with "36". If there's only one number on the line, it doesn't matter which you use. By doing it this way, though, it will handle numbers with trailing whitespace plus other non-numeric characters that may be there (after some whitespace).
If you have gawk, you can use gensub like this to pick out an arbitrary number within the string (just set the value of which):
$ echo "eh oh 37 aaa 22 bb 19" |
awk -v which=2 'BEGIN { regex = "([0-9]+)\\>[^0-9]*";
for (i = 1; i < which; i++) {regex = regex"([0-9]+)\\>[^0-9]*"}}
{ match($0, regex, a);
n = a[which] - 1; # do the math
print gensub(/[0-9]+/, n, which) }'
eh oh 37 aaa 21 bb 19
The second (which=2) number went from 22 to 21. And the embedded spaces are preserved.
It's broken out on multiple lines to make it easier to read, but it's copy/pastable.
$ echo "eh oh 37" | awk '{n=$NF+1; gsub(/[0-9]+$/,n) }1'
eh oh 38
or
$ echo "eh oh 37" | awk '{n=$NF+1; gsub(/..$/,n) }1'
eh oh 38
something like
number=`echo "eh oh 37" | grep -o '[0-9]*'`
sed 's/$number/`expr $number + 1`/'
How about:
$ echo "eh oh 37" | awk -F'[ \t]' '{$NF = $NF - 1;} 1'
eh oh 36
The solution will not preserve the number of decimals, so if the number is 10, then the result is 9, even if one would like to have 09.
I did not write the shortest possible code, it should stay readable
Here I construct the printf pattern using RLENGTH so it becomes %02d (2 being the length of the matched pattern)
$ echo "eh oh 10 aaa 22 bb" |
awk '{n = substr($0, match($0, /[0-9]+/), RLENGTH)-1 ;
nn=sprintf("%0" RLENGTH "d", n)
sub(/[0-9]+/, nn);
print
}'
eh oh 09 aaa 22 bb

Resources