sed/awk or other: one-liner to increment a number by 1 keeping spacing characters - unix

EDIT: I don't know in advance at which "column" my digits are going to be and I'd like to have a one-liner. Apparently sed doesn't do arithmetic, so maybe a one-liner solution based on awk?
I've got a string: (notice the spacing)
eh oh 37
and I want it to become:
eh oh 36
(so I want to keep the spacing)
Using awk, I can't figure out how to do it; so far I have:
echo "eh oh 37" | awk '$3>=0&&$3<=99 {$3--} {print}'
But this gives:
eh oh 36
(the spacing characters were lost, because the field separator is ' ')
Is there a way to ask awk something like "print the output using the exact same field separators as the input had"?
Then I tried yet something else, using awk's sub(..,..) method:
' sub(/[0-9][0-9]/, ...) {print}'
but no cigar yet: I don't know how to reference the matched digits and do arithmetic on them in the second argument (which I left as '...' for now).
Then I tried with sed, but got stuck after this:
echo "eh oh 37" | sed -e 's/\([0-9][0-9]\)/.../'
Can I do arithmetic from sed using a reference to the matching digits and have the output not modify the number of spacing characters?
Note that it's related to my question concerning Emacs and how to apply this to some (big) Emacs region (using a replace region with Emacs's shell-command-on-region) but it's not an identical question: this one is specifically about how to "keep spaces" when working with awk/sed/etc.

Here is a variation on ghostdog74's answer that does not require the number to be anchored at the end of the string. This is accomplished using match instead of relying on the number to be in a particular position.
This will replace the first number with its value minus one:
$ echo "eh oh 37 aaa 22 bb" | awk '{n = substr($0, match($0, /[0-9]+/), RLENGTH) - 1; sub(/[0-9]+/, n); print }'
eh oh 36 aaa 22 bb
Using gsub there instead of sub would replace both the "37" and the "22" with "36". If there's only one number on the line, it doesn't matter which you use. Unlike the anchored version, though, this one also handles lines where the number is followed by whitespace and other non-numeric text.
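For instance, the gsub variant on the same input should give:
$ echo "eh oh 37 aaa 22 bb" | awk '{n = substr($0, match($0, /[0-9]+/), RLENGTH) - 1; gsub(/[0-9]+/, n); print }'
eh oh 36 aaa 36 bb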
If you have gawk, you can use gensub like this to pick out an arbitrary number within the string (just set the value of which):
$ echo "eh oh 37 aaa 22 bb 19" |
awk -v which=2 'BEGIN { regex = "([0-9]+)\\>[^0-9]*";
for (i = 1; i < which; i++) {regex = regex"([0-9]+)\\>[^0-9]*"}}
{ match($0, regex, a);
n = a[which] - 1; # do the math
print gensub(/[0-9]+/, n, which) }'
eh oh 37 aaa 21 bb 19
The second (which=2) number went from 22 to 21. And the embedded spaces are preserved.
It's broken out on multiple lines to make it easier to read, but it's copy/pastable.
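For instance, setting which=1 should decrement the first number instead (same gawk-only command, expected output shown):
$ echo "eh oh 37 aaa 22 bb 19" |
awk -v which=1 'BEGIN { regex = "([0-9]+)\\>[^0-9]*";
for (i = 1; i < which; i++) {regex = regex"([0-9]+)\\>[^0-9]*"}}
{ match($0, regex, a);
n = a[which] - 1;
print gensub(/[0-9]+/, n, which) }'
eh oh 36 aaa 22 bb 19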

$ echo "eh oh 37" | awk '{n=$NF+1; gsub(/[0-9]+$/,n) }1'
eh oh 38
or
$ echo "eh oh 37" | awk '{n=$NF+1; gsub(/..$/,n) }1'
eh oh 38
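For the decrement asked about in the question, the same idea should work with a minus instead:
$ echo "eh oh 37" | awk '{n=$NF-1; gsub(/[0-9]+$/,n) }1'
eh oh 36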

something like
number=$(echo "eh oh 37" | grep -o '[0-9]*')
echo "eh oh 37" | sed "s/$number/$(expr "$number" + 1)/"

How about:
$ echo "eh oh 37" | awk -F'[ \t]' '{$NF = $NF - 1;} 1'
eh oh 36
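A quick sketch with doubled spaces (assuming the padding is plain spaces, not tabs) suggests why the spacing survives: each run-together separator produces an empty field, and the record is rebuilt with the default single-space OFS:
$ echo "eh  oh  37" | awk -F'[ \t]' '{$NF = $NF - 1;} 1'
eh  oh  36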

The solutions above will not preserve the number of digits, so if the number is 10, the result is 9, even if one would like to have 09.
I did not write the shortest possible code, so that it stays readable.
Here I construct the sprintf format using RLENGTH, so it becomes %02d (2 being the length of the matched number):
$ echo "eh oh 10 aaa 22 bb" |
awk '{n = substr($0, match($0, /[0-9]+/), RLENGTH)-1 ;
nn=sprintf("%0" RLENGTH "d", n)
sub(/[0-9]+/, nn);
print
}'
eh oh 09 aaa 22 bb
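For instance, the padding should keep the original width even when the subtraction drops a digit (expected output from the same script):
$ echo "eh oh 100 aaa 22 bb" |
awk '{n = substr($0, match($0, /[0-9]+/), RLENGTH)-1 ;
nn=sprintf("%0" RLENGTH "d", n)
sub(/[0-9]+/, nn);
print
}'
eh oh 099 aaa 22 bb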

Related

using stream editor with specified character positions

I am trying to come up with code that satisfies the following:
1.) replace the first instance of 3 with 5;
2.) 1.) can only take place if 3 is the first digit of the number.
for example,
38765 -> 58765
43765 will not be transformed.
so far I have,
sed 's/^3/5/' *.txt
but I just cannot figure out a way to specify the condition when position 1 == 3.
What can I do to make improvements?
Sed:
$ echo 38765 | sed 's/^3/5/'
58765
$ echo 43765 | sed 's/^3/5/'
43765
i.e. just replace a leading 3 with a 5.
To replace 3 in the second position:
$ echo 33765 | sed 's/\(^.\)3/\15/'
35765
More generic approach:
$ echo 33333 | sed 's/\(^.\{3\}\)3/\15/'
33353
The \{3\} is the number of characters before the one to replace (0 to 4 here).
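So, for example, replacing the fifth character should be:
$ echo 33333 | sed 's/\(^.\{4\}\)3/\15/'
33335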

Removing blank spaces in specific column in pipe delimited file in AIX

Good morning. Long time reader, first time emailer so please be gentle.
I'm working on AIX 5.3 and have a 42 column pipe delimited file. There are telephone numbers in columns 15 & 16 (land|mobile) which may or may not contain spaces depending on who has keyed in the data.
I need to remove these space from columns 15 & 16 only ie
Column 15 | Column 16 **Currently**
01942 665432|07865346122
01942756423 |07855 333567
Column 15 | Column 16 **Needs to be**
01942665432|07865346122
01942756423|07855333567
I have a quick & dirty script which unfortunately is proving to be anything but quick because it's a while loop reading every single line, cutting the field on the pipe delimiter, assigning it to a variable, using sed on column 15 & 16 only to strip blank spaces then writing it out to a new file ie
cat $file | while read output
do
.....
fourteen=$( echo $output | cut -d'|' -f14 )
fifteen=$( echo $output | cut -d'|' -f15 | sed 's/ //g' )
echo ".....$fourteen|$fifteen..." > $new_file
done
I know there must be a better way to do this, probably using AWK, but am open to any kind of suggestion anyone can offer as the script as it stands is taking half an hour plus to process 176,000 records.
Thanks in advance.
Yes, awk is better suited here
$ cat ip.txt
a|foo bar|01942 665432|07865346122|123
b|i j k |01942756423 |07855 333567|90870
$ awk 'BEGIN{FS=OFS="|"} {gsub(" ","",$3); gsub(" ","",$4)} 1' ip.txt
a|foo bar|01942665432|07865346122|123
b|i j k |01942756423|07855333567|90870
BEGIN{FS=OFS="|"} set | as input and output field separator
gsub(" ","",$3) replace all spaces with nothing only for column 3
gsub(" ","",$4) replace all spaces with nothing only for column 4
1 idiomatic way to print the input record (including any modifications made)
Change 3 and 4 to whatever field you need
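Applied to the original 42-column file, that should become something like the following (an untested sketch, assuming the phone numbers really are in fields 15 and 16, and reusing the $file and $new_file names from the question):
awk 'BEGIN{FS=OFS="|"} {gsub(" ","",$15); gsub(" ","",$16)} 1' "$file" > "$new_file"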
In case first line should not be affected, add a condition
awk 'BEGIN{FS=OFS="|"} NR>1{gsub(" ","",$3); gsub(" ","",$4)} 1' ip.txt

How to convert multiple lines into fixed column lengths

To convert rows into tab-delimited, it's easy
cat input.txt | tr "\n" " "
But I have a long file with 84046468 lines. I wish to convert this into a file with 1910147 rows and 44 tab-delimited columns. The first column is a text string such as chrXX_12345_+ and the other 43 columns are numerical strings. Is there a way to perform this transformation?
There are NAs present, so I guess sed and substituting "\n" for "\t" if the string preceding is a number doesn't work.
sample input.txt
chr10_1000103_+
0.932203
0.956522
1
0.972973
1
0.941176
1
0.923077
1
1
0.909091
0.9
1
0.916667
0.8
1
1
0.941176
0.904762
1
1
1
0.979592
0.93617
0.934783
1
0.941176
1
1
0.928571
NA
1
1
1
0.941176
1
0.875
0.972973
1
1
NA
0.823529
0.51366
chr10_1000104_-
0.952381
1
1
0.973684
sample output.txt
chr10_1000103_+ 0.932203 (numbers all tab-delimited)
chr10_1000104_- etc
(sorry, a lot of numbers to type manually)
sed '
# use a delimiter
s/^/M/
:Next
# put a counter
s/^/i/
# test counter
/^\(i\)\{44\}/ !{
$ !{
# not 44 line or end of file, add the next line
N
# loop
b Next
}
}
# remove marker and counter
s/^i*M//
# replace new line by tab
s/\n/\t/g' YourFile
Note: some seds have a limit of around 255 on this counter, so 44 is OK.
Here's the right approach using 4 columns instead of 44:
$ cat file
chr10_1000103_+
0.932203
0.956522
1
chr10_1000104_-
0.952381
1
1
$ awk '{printf "%s%s", $0, (NR%4?"\t":"\n")}' file
chr10_1000103_+ 0.932203 0.956522 1
chr10_1000104_- 0.952381 1 1
Just change 4 to 44 for your real input.
If you are seeing control-Ms in your output it's because they are present in your input, so use dos2unix or similar to remove them before running the tool, or with GNU awk you could just set -v RS='\r\n'.
When posting questions it's important to make it as clear, simple, and brief as possible so that as many people as possible will be interested in helping you.
BTW, cat input.txt | tr "\n" " " is a UUOC and should just be tr "\n" " " < input.txt
Not the best solution, but should work:
line="nonempty"; while [ ! -z "$line" ]; do for i in $(seq 44); do read line; echo -n "$line "; done; echo; done < input.txt
If there is an empty line in the file, it will terminate. For a more permanent solution I'd try perl.
edit:
If you are concerned with efficiency, just use awk.
awk '{ printf "%s\t", $1 } NR%44==0{ print "" }' < input.txt
You may want to strip the trailing tab character with | sed 's/\t$//' or make the awk script more complicated.
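For example, a minimal sketch of the "more complicated" version, which avoids the trailing tab by choosing the separator inside printf (essentially the same idea as the printf answer above):
awk '{ printf "%s%s", $1, (NR%44 ? "\t" : "\n") }' input.txt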
This might work for you (GNU sed):
sed '/^chr/!{H;$!d};x;s/\n/\t/gp;d' file
If a line does not begin with chr, append it to the hold space and then delete it, unless it is the last line. If the line does start with chr, or it is the last line, swap to the hold space, replace all newlines with tabs, and print the result.
N.B. the start of the next line will be left untouched in the pattern space which becomes the new hold space.
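Using the 4-line-per-record sample file from the awk answer above, the expected output would be (tab-separated):
$ sed '/^chr/!{H;$!d};x;s/\n/\t/gp;d' file
chr10_1000103_+ 0.932203        0.956522        1
chr10_1000104_- 0.952381        1       1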

Unix utilities, sum the data under the same entries

I have this little problem that I want to ask:
So I have a file named "quest", which has:
Tom 100 John 10 Tom 100
How do I use awk to output something like:
Tom 200
I'd appreciate your help. I tried to look it up online but I am not sure what I am looking for. Thanks ahead!
I do know how to use regular expression /Tom/ to grep the entry, but I am not sure how to proceed from there.
You can try something like:
$ awk '{
for(i=1; i<=NF; i+=2)
names[$i] = ((names[$i]) ? names[$i]+$(i+1) : $(i+1))
}
END{
for (name in names) print name, names[name]
}' quest
Tom 200
John 10
You basically iterate over the fields creating keys for all odd fields and assigning values of even fields to them. If the key already exists, you just add to the existing value.
This expects your file format to have Names on odd fields (for eg. 1, 3, 5 .. etc) and values on even fields (eg 2, 4, 6 .. etc).
In the END block, you just print entire array content.
I guess you need to calculate all users' marks, not only Tom's; here is the code:
xargs -n2 < file|awk '{a[$1]+=$2}END{for (i in a) print i,a[i]}'
Tom 200
John 10
and one-liner of awk
awk '{for (i=1;i<=NF;i+=2) a[$i]+=$(i+1)}END{for (i in a) print i,a[i]}' file
Tom 200
John 10
$ echo 'Tom 100 John 10 Tom 100' | grep -o '[0-9]*' | paste -sd+ | bc
210
grep -o '[0-9]*' produces
100
10
100
paste -sd+ produces
100+10+100
bc calculates the result.
However, this only works for small inputs, since bc has a limit on input line length.
In that case you can use awk '{s+=$0}END{print s}' instead of paste -sd+ | bc.
Note, however, that GNU awk treats all numbers as floating point, so it produces inaccurate results when the numbers are too large.
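Putting that together, a sketch of the awk variant of the same pipeline:
$ echo 'Tom 100 John 10 Tom 100' | grep -o '[0-9]*' | awk '{s+=$0} END{print s}'
210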
awk '/Tom/{
for(i=1;i<=NF;i++)
if($i=="Tom")s+=$(i+1);
print "Tom",s;s=0}' your_file
Test (with the sample line "Tom 100 John 10 Tom 100"):
Tom 200
Here is a way to do it in awk (no loop):
awk -v RS=" " '{n=$1;getline;a[n]+=$1} END {for (i in a) print i,a[i]}' quest
Tom 200
John 10
If there is more than one line, like this:
cat quest
Tom 100 John 10 Tom 100
Paul 20 Tom 40 John 10
Then do this with gnu awk:
awk -v RS=" |\n" '{n=$1;getline;a[n]+=$1} END {for (i in a) print i,a[i]}' quest
Paul 20
Tom 240
John 20
And if you do not like getline
awk -v RS=" |\n" 'NR%2 {n=$1;next}{a[n]+=$1} END {for (i in a) print i,a[i]}' quest
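This should produce the same totals as the getline version (expected output; the order may differ, since for (i in a) is unordered):
Paul 20
Tom 240
John 20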

How to delete words which start with some specific pattern in a file in unix

I want to delete all words in my file that start with 3: or 4:
For Example -
Input is
13 1:12 2:14 3:11
10 1:9 2:7 4:10 5:2
16 3:7 8:24
7 4:7 6:54
Output should be
13 1:12 2:14
10 1:9 2:7 5:2
16 8:24
7 6:54
Can someone tell me if it is possible using sed command or awk command.
This might work for you (GNU sed):
sed 's/\b[34]:\S*\s*//g' file
Looks for a word boundary and then either 3 or 4 followed by : and zero or more non-spaces followed by zero or more spaces and deletes them throughout the line.
With sed
sed -r 's/ 3:[0-9]*| 4:[0-9]*//g'
$ cat input.txt
13 1:12 2:14 3:11
10 1:9 2:7 4:10 5:2
16 3:7 8:24
7 4:7 6:54
$ cat input.txt | sed -r 's/ 3:[0-9]*| 4:[0-9]*//g'
13 1:12 2:14
10 1:9 2:7 5:2
16 8:24
7 6:54
Explanation:
-r = use extended regular expressions.
3:[0-9]*: searches for a space, then 3, then :, then [0-9] (a digit between 0 and 9); the * means zero or more repetitions of the preceding pattern, so it also matches any further digits after the first one following the :.
|: means OR.
4:[0-9]*: same as above, except that it searches for 4 instead of 3.
//: the replacement string; here it is empty, so sed replaces the match with nothing.
/g: replace every match on the line, not just the first.
With awk:
awk '{for (i=1; i<=NF; i++)
{if (! sub("^[34]:", "", $i)) d=d$i" "}
print d; d=""
}' file
It loops through the fields and stores in the variable d only those that do not start with 3: or 4:. This is done by checking whether the sub() function returns true or not. When the loop through the line is done, the d variable is printed.
For your given file:
$ awk '{for (i=1; i<=NF; i++) {if (! sub("^[34]:", "", $i)) d=d$i" "} print d; d=""}' file
13 1:12 2:14
10 1:9 2:7 5:2
16 8:24
7 6:54
sed 's/[[:blank:]][34]:[^[:blank:]]\{1,\}[[:blank:]]*/ /g' YourFile
POSIX compliant, and assuming (as in the sample) that no line's first word starts with 3: or 4:.
Assuming all words contain : and have at least one digit after the :
sed -E "s/ [34]:[^ ]+//g" inputfile
This matches a space, 3 or 4, a colon, and then one or more non-space characters. It replaces the whole match with nothing, and does so for the whole line.
