using stream editor with specified character positions - unix

I am trying to come up with a command that satisfies two conditions:
1.) Replace the first instance of 3 with 5.
2.) The replacement in 1.) can only take place if 3 is the first digit of the number.
for example,
38765 -> 58765
43765 will not be transformed.
so far I have,
sed 's/^3/5/' *.txt
but I just cannot figure out a way to specify the condition when position 1 == 3.
What can I do to make improvements?

Sed:
$ echo 38765 | sed 's/^3/5/'
58765
$ echo 43765 | sed 's/^3/5/'
43765
i.e. just replace a leading 3 with a 5.
To replace 3 in the second position:
$ echo 33765 | sed 's/\(^.\)3/\15/'
35765
More generic approach:
$ echo 33333 | sed 's/\(^.\{3\}\)3/\15/'
33353
The 3 inside \{3\} is the number of characters before the one to replace, 0-4 for this five-digit input.
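The generic form above can be wrapped in a small shell function. This is only a sketch: the name replace_at and its 1-based position argument are made up for illustration, and it assumes a sed that accepts \{0\} as an interval (GNU sed does).

```shell
# Hypothetical helper: change a 3 into a 5 only if it sits at
# 1-based character position $1 of each input line.
replace_at() {
    pos=$1
    skip=$((pos - 1))                    # characters that must precede the 3
    sed "s/^\(.\{$skip\}\)3/\15/"        # \1 keeps the skipped prefix, then 5
}

echo 38765 | replace_at 1    # leading 3 is replaced
echo 43765 | replace_at 1    # first character is 4, line is left alone
echo 33333 | replace_at 4    # only the fourth character changes
```

Because the pattern is anchored with ^ and counts an exact number of characters, a 3 anywhere else on the line never matches.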

Related

Removing blank spaces in specific column in pipe delimited file in AIX

Good morning. Long time reader, first time emailer so please be gentle.
I'm working on AIX 5.3 and have a 42 column pipe delimited file. There are telephone numbers in columns 15 & 16 (land|mobile) which may or may not contain spaces depending on who has keyed in the data.
I need to remove these space from columns 15 & 16 only ie
Column 15 | Column 16 **Currently**
01942 665432|07865346122
01942756423 |07855 333567
Column 15 | Column 16 **Needs to be**
01942665432|07865346122
01942756423|07855333567
I have a quick & dirty script which unfortunately is proving to be anything but quick because it's a while loop reading every single line, cutting the field on the pipe delimiter, assigning it to a variable, using sed on column 15 & 16 only to strip blank spaces then writing it out to a new file ie
cat $file | while read output
do
.....
fourteen=$( echo $output | cut -d'|' -f14 )
fifteen=$( echo $output | cut -d'|' -f15 | sed 's/ //g' )
echo ".....$fourteen|$fifteen..." > $new_file
done
I know there must be a better way to do this, probably using AWK, but am open to any kind of suggestion anyone can offer as the script as it stands is taking half an hour plus to process 176,000 records.
Thanks in advance.
Yes, awk is better suited here
$ cat ip.txt
a|foo bar|01942 665432|07865346122|123
b|i j k |01942756423 |07855 333567|90870
$ awk 'BEGIN{FS=OFS="|"} {gsub(" ","",$3); gsub(" ","",$4)} 1' ip.txt
a|foo bar|01942665432|07865346122|123
b|i j k |01942756423|07855333567|90870
BEGIN{FS=OFS="|"} sets | as both the input and the output field separator
gsub(" ","",$3) replaces all spaces with nothing, only in column 3
gsub(" ","",$4) replaces all spaces with nothing, only in column 4
1 is the idiomatic way to print the input record (including any modifications made)
Change 3 and 4 to whatever fields you need (15 and 16 for the file in the question).
If the first line (for example a header) should not be affected, add a condition:
awk 'BEGIN{FS=OFS="|"} NR>1{gsub(" ","",$3); gsub(" ","",$4)} 1' ip.txt
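If the list of columns changes often, the field numbers can be passed in rather than hard-coded. A minimal sketch, assuming a comma-separated list of field numbers; the function name strip_spaces_in_fields is hypothetical:

```shell
# Hypothetical helper: remove spaces from the listed pipe-delimited fields only.
# $1 is a comma-separated list of field numbers, e.g. "15,16".
strip_spaces_in_fields() {
    awk -v cols="$1" '
        BEGIN { FS = OFS = "|"; n = split(cols, c, ",") }
        { for (i = 1; i <= n; i++) gsub(/ /, "", $(c[i])) }   # touch only listed fields
        1                                                     # print the record
    '
}

printf 'a|foo bar|01942 665432|07855 333567|x y\n' | strip_spaces_in_fields 3,4
```

For the 42-column file in the question this would be called as strip_spaces_in_fields 15,16 < "$file" > "$new_file", which reads the file once instead of spawning several processes per line.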

Always show trailing zeros in bc

How to show trailing zeros in integer or how to convert it to float in bc?
I know about scale, but it works only with floats:
$ echo "scale=3; 3/2" | bc
1.500
$ echo "scale=3; 1+1" | bc
2
I want the result to look like 2.000. I guess it's not difficult to do it with sed, but I'm a novice at this.
Divide by 1 to convert to a number with a fractional part (parenthesize the expression so the division applies to the whole result):
$ echo "scale=3; (1+1)/1" | bc
2.000
Once a divide of any kind has been done, if that value ends up being part of the output, it will be printed with the specified number of digits.
They're not "floats" in bc -- they're "fixed-point" -- numbers with a fixed number of digits after the decimal point. Internally, they're just integers divided by a fixed power of 10 (set by the scale variable).
echo "scale=3; $1+$2" | bc | sed 's/^[0-9]*$/&\.000/g'
It works fine for me:
$ echo "scale=3; 3/2" | bc | sed 's/^[0-9]*$/&\.000/g'
1.500
$ echo "scale=3; 3+2" | bc | sed 's/^[0-9]*$/&\.000/g'
5.000
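The sed filter above only matches unsigned integers and hard-codes three zeros. A slightly more general sketch follows; the function name pad_integers is made up, and it assumes its argument is the same value that was given to scale:

```shell
# Hypothetical helper: append ".000..." to lines that are plain integers
# (optionally negative), leaving lines that already have a fraction alone.
# $1 is the number of decimal places, assumed to equal bc's scale.
pad_integers() {
    zeros=$(printf "%0${1}d" 0)                       # e.g. 3 -> 000
    sed "s/^-\{0,1\}[0-9][0-9]*\$/&.$zeros/"
}

printf '%s\n' 5 -7 1.500 | pad_integers 3
```

It is meant to sit at the end of the pipeline, as in echo "scale=3; 3+2" | bc | pad_integers 3.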

How to convert multiple lines into fixed column lengths

To convert rows into tab-delimited, it's easy
cat input.txt | tr "\n" " "
But I have a long file with 84046468 lines. I wish to convert this into a file with 1910147 rows and 44 tab-delimited columns. The first column is a text string such as chrXX_12345_+ and the other 43 columns are numerical strings. Is there a way to perform this transformation?
There are NAs present, so I guess sed and substituting "\n" for "\t" if the string preceding is a number doesn't work.
sample input.txt
chr10_1000103_+
0.932203
0.956522
1
0.972973
1
0.941176
1
0.923077
1
1
0.909091
0.9
1
0.916667
0.8
1
1
0.941176
0.904762
1
1
1
0.979592
0.93617
0.934783
1
0.941176
1
1
0.928571
NA
1
1
1
0.941176
1
0.875
0.972973
1
1
NA
0.823529
0.51366
chr10_1000104_-
0.952381
1
1
0.973684
sample output.txt
chr10_1000103_+ 0.932203 (numbers all tab-delimited)
chr10_1000104_- etc
(sorry, a lot of numbers to type manually)
sed '
# use a delimiter
s/^/M/
:Next
# put a counter
s/^/i/
# test counter
/^\(i\)\{44\}/ !{
   $ !{
      # not the 44th line or end of file, add the next line
      N
      # loop
      b Next
      }
   }
# remove marker and counter
s/^i*M//
# replace new line by tab (\t is understood by GNU sed)
s/\n/\t/g' YourFile
Some sed versions have a limit of around 255 on the counter, so 44 is OK.
Here's the right approach using 4 columns instead of 44:
$ cat file
chr10_1000103_+
0.932203
0.956522
1
chr10_1000104_-
0.952381
1
1
$ awk '{printf "%s%s", $0, (NR%4?"\t":"\n")}' file
chr10_1000103_+ 0.932203 0.956522 1
chr10_1000104_- 0.952381 1 1
Just change 4 to 44 for your real input.
If you are seeing control-Ms in your output, it's because they are present in your input, so use dos2unix or similar to remove them before running the tool, or with GNU awk you could just set -v RS='\r\n'.
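Here is the same awk idea as a sketch with the column count passed in as a variable. The function name fold_lines is hypothetical. It also strips a trailing carriage return from each line, so DOS-style input does not need a separate dos2unix pass, and it terminates a final incomplete row with a newline.

```shell
# Hypothetical helper: join every $1 consecutive input lines into one
# tab-delimited row.
fold_lines() {
    awk -v n="$1" '
        { sub(/\r$/, "")                                  # drop a trailing CR, if any
          printf "%s%s", $0, (NR % n ? "\t" : "\n") }     # tab inside a row, newline at its end
        END { if (NR % n) print "" }                      # close an incomplete last row
    '
}

printf 'chr10_1000103_+\n0.932203\n0.956522\n1\nchr10_1000104_-\n0.952381\n1\n1\n' | fold_lines 4
```

For the real input this would be fold_lines 44 < input.txt > output.txt.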
When posting questions it's important to make it as clear, simple, and brief as possible so that as many people as possible will be interested in helping you.
BTW, cat input.txt | tr "\n" " " is a UUOC and should just be tr "\n" " " < input.txt
Not the best solution, but should work:
line="nonempty"; while [ ! -z "$line" ]; do for i in $(seq 44); do read line; echo -n "$line "; done; echo; done < input.txt
If there is an empty line in the file, it will terminate. For a more permanent solution I'd try perl.
edit:
If you are concerned with efficiency, just use awk.
awk '{ printf "%s\t", $1 } NR%44==0{ print "" }' < input.txt
You may want to strip the trailing tab character with | sed 's/\t$//' or make the awk script more complicated.
This might work for you (GNU sed):
sed '/^chr/!{H;$!d};x;s/\n/\t/gp;d' file
If a line does not begin with chr append it to the hold space and then delete it unless it is the last. If the line does start chr or it is the last line, then swap to the hold space and replace all newlines by tabs and print out the result.
N.B. the start of the next line will be left untouched in the pattern space which becomes the new hold space.
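The same "start a new row at each chr line" logic can also be sketched in awk, which avoids the hold space. This assumes, as the sed version does, that every record begins with a line starting with chr and that no data line does; the function name join_by_header is made up.

```shell
# Hypothetical helper: begin a new output row at every line starting with
# "chr" and append all following lines to it, separated by tabs.
join_by_header() {
    awk '
        /^chr/ && NR > 1 { print "" }                     # finish the previous row
        { printf "%s%s", (/^chr/ ? "" : "\t"), $0 }       # header starts a row, data is tab-prefixed
        END { if (NR) print "" }                          # finish the last row
    '
}

printf 'chr10_1000103_+\n0.932203\nNA\nchr10_1000104_-\n0.952381\n' | join_by_header
```

Unlike the count-based versions, this keeps rows aligned even if a record has more or fewer than 43 values.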

remove white space line from same named files in multiple subdirectories in unix

I have multiple files ( > 1000) with same name in different subdirectories
dir1/out.txt
# white row
1 2 3 4 5
3 3 4 5 6
4 1 4 5 8
# white row
dir2/out.txt
# white row
1 2 3 4 5
3 3 4 5 6
4 1 4 5 8
# white row
dir3/out.txt
# white row
1 2 3 4 5
3 3 4 5 6
4 1 4 5 8
# white row
I want to remove all white spaces (usually at the heading row, the tail row and in between rows).
Is there a quick way to do this in Unix? Apologies for the simple question.
Edit:
I am not trying to remove every space, rather just whole lines that are white space.
This will find all the files named out.txt in subdirectories of the present working directory and delete empty and whitespace-only lines from each file.
find . -name "out.txt" -exec sed -i '/^[[:space:]]*$/d' '{}' \;
Note: You need write permission to modify these files.
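Since sed -i is not available everywhere, here is a sketch of the same find-and-clean step that goes through a temporary file instead. The helper name strip_blank_lines and the throwaway demo tree are made up for illustration, and the read loop assumes file names contain no newlines.

```shell
# Hypothetical helper: delete empty and whitespace-only lines from one file,
# writing through a temporary file instead of relying on sed -i.
strip_blank_lines() {
    tmp=$(mktemp) &&
    sed '/^[[:space:]]*$/d' "$1" > "$tmp" &&
    cat "$tmp" > "$1"                    # overwrite in place, keeping permissions
    rm -f "$tmp"
}

# Demo on a throwaway directory tree.
demo=$(mktemp -d)
mkdir "$demo/dir1" "$demo/dir2"
printf '\n1 2 3 4 5\n \t \n3 3 4 5 6\n\n' > "$demo/dir1/out.txt"
printf '   \n4 1 4 5 8\n' > "$demo/dir2/out.txt"

find "$demo" -name out.txt | while IFS= read -r f; do
    strip_blank_lines "$f"
done

cat "$demo/dir1/out.txt" "$demo/dir2/out.txt"
```

Running find on its own first, without the loop, is a cheap way to confirm that only the intended files are selected.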
To remove just blank lines, use
sed -i '/^$/d' file
To remove blank lines that contain only spaces or tabs, use
sed -i '/^[[:blank:]]*$/d' file
To remove all spaces from file, use
sed 's/ //g' file > file.new && /bin/mv file.new file
That's a space char; if the white space might include tab chars, then use
sed 's/[[:blank:]]//g' file
If you're using GNU sed on Linux, then you can do
sed -i 's/[[:blank:]]//g' file
And if you want to delete blank lines, then add
sed -i 's/[[:blank:]]//g;/^$/d' file
You'd wrap all of this in a find cmd to get your file names, like
cd $baseDir ; find . -name '*.txt' -print | xargs sed -i 's/[[:blank:]]//g;/^$/d'
Use just the first part,
find . -name '*.txt' -print
And adjust until you see the correct filename list appearing.
Then test the 2nd half, by forcing the find output to have just 1 test filename as output, i.e.
find . -name 'myTestOut.txt' | xargs ...
I don't have an easy way to test this now, but this sort of question gets asked every day here on S.O., search by [unix] [linux] [xargs] [sed] .
I hope this helps.

sed/awk or other: one-liner to increment a number by 1 keeping spacing characters

EDIT: I don't know in advance at which "column" my digits are going to be and I'd like to have a one-liner. Apparently sed doesn't do arithmetic, so maybe a one-liner solution based on awk?
I've got a string: (notice the spacing)
eh oh 37
and I want it to become:
eh oh 36
(so I want to keep the spacing)
Using awk I don't find how to do it, so far I have:
echo "eh oh 37" | awk '$3>=0&&$3<=99 {$3--} {print}'
But this gives:
eh oh 36
(the spacing characters were lost, because the field separator is ' ')
Is there a way to ask awk something like "print the output using the exact same field separators as the input had"?
Then I tried yet something else, using awk's sub(..,..) method:
' sub(/[0-9][0-9]/, ...) {print}'
but no cigar yet: I don't know how to reference the regexp and do arithmetic on it in the second argument (which I left with '...' for now).
Then I tried with sed, but got stuck after this:
echo "eh oh 37" | sed -e 's/\([0-9][0-9]\)/.../'
Can I do arithmetic from sed using a reference to the matching digits and have the output not modify the number of spacing characters?
Note that it's related to my question concerning Emacs and how to apply this to some (big) Emacs region (using a replace region with Emacs's shell-command-on-region) but it's not an identical question: this one is specifically about how to "keep spaces" when working with awk/sed/etc.
Here is a variation on ghostdog74's answer that does not require the number to be anchored at the end of the string. This is accomplished using match instead of relying on the number to be in a particular position.
This will replace the first number with its value minus one:
$ echo "eh oh 37 aaa 22 bb" | awk '{n = substr($0, match($0, /[0-9]+/), RLENGTH) - 1; sub(/[0-9]+/, n); print }'
eh oh 36 aaa 22 bb
Using gsub there instead of sub would replace both the "37" and the "22" with "36". If there's only one number on the line, it doesn't matter which you use. By doing it this way, though, it will handle numbers with trailing whitespace plus other non-numeric characters that may be there (after some whitespace).
If you have gawk, you can use gensub like this to pick out an arbitrary number within the string (just set the value of which):
$ echo "eh oh 37 aaa 22 bb 19" |
awk -v which=2 'BEGIN { regex = "([0-9]+)\\>[^0-9]*";
for (i = 1; i < which; i++) {regex = regex"([0-9]+)\\>[^0-9]*"}}
{ match($0, regex, a);
n = a[which] - 1; # do the math
print gensub(/[0-9]+/, n, which) }'
eh oh 37 aaa 21 bb 19
The second (which=2) number went from 22 to 21. And the embedded spaces are preserved.
It's broken out on multiple lines to make it easier to read, but it's copy/pastable.
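For awks without gensub, the same "pick the Nth number" idea can be sketched with only match and substr, walking the line one number at a time. The function name decrement_nth is hypothetical. This version also zero-pads the result to the original width and does nothing special for a number that is already 0.

```shell
# Hypothetical helper: subtract 1 from the Nth number on each line ($1 = N),
# leaving every other character, including runs of spaces, untouched.
decrement_nth() {
    awk -v which="$1" '{
        rest = $0; out = ""
        for (i = 1; i <= which && match(rest, /[0-9]+/); i++) {
            num = substr(rest, RSTART, RLENGTH)
            if (i == which)
                num = sprintf("%0" RLENGTH "d", num - 1)   # keep the original width
            out = out substr(rest, 1, RSTART - 1) num      # text before the number, then the number
            rest = substr(rest, RSTART + RLENGTH)          # continue after the number
        }
        print out rest
    }'
}

echo "eh  oh   37 aaa 22 bb 19" | decrement_nth 2
```

If the line has fewer than N numbers, it is printed unchanged.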
$ echo "eh oh 37" | awk '{n=$NF+1; gsub(/[0-9]+$/,n) }1'
eh oh 38
or
$ echo "eh oh 37" | awk '{n=$NF+1; gsub(/..$/,n) }1'
eh oh 38
something like
number=$(echo "eh oh 37" | grep -o '[0-9][0-9]*')
echo "eh oh 37" | sed "s/$number/$(expr "$number" + 1)/"
How about:
$ echo "eh oh 37" | awk -F'[ \t]' '{$NF = $NF - 1;} 1'
eh oh 36
The solutions above do not preserve the number of digits, so if the number is 10, the result is 9, even if one would like to have 09.
I did not write the shortest possible code; it should stay readable.
Here I construct the printf pattern using RLENGTH so it becomes %02d (2 being the length of the matched pattern):
$ echo "eh oh 10 aaa 22 bb" |
awk '{n = substr($0, match($0, /[0-9]+/), RLENGTH)-1 ;
nn=sprintf("%0" RLENGTH "d", n)
sub(/[0-9]+/, nn);
print
}'
eh oh 09 aaa 22 bb
