Compare size of each cell in Unix scripting

I want to compare each cell size/length and change its content depending on its length.
The current table is of format
AB
CD
AB
AB
CD
155668/01
AB
1233/10
I want to replace the cells whose length is more than 2 with DE.
Output
AB
CD
AB
AB
CD
DE
AB
DE
I tried
awk -F "," '{ if($(#1) > "2") {print "DE"} else {print $1 }}'
It says syntax error.
If I use wc -m in place of $(#, the output is the same as the input.

The easiest way is to use sed:
sed '/^..$/!s/.*/DE/' file
In awk, you could say:
awk '!/^..$/ { $0 = "DE" } 1' file
In both cases, the idea is the same: if the line does not consist of exactly two characters, replace the whole line with DE. In sed, the whole line is .*; in awk, it is $0.
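A quick way to check that the two one-liners agree is to run both on the sample data from the question. This is a minimal sketch; the function names replace_sed and replace_awk are made up for illustration, and both read standard input.

```shell
#!/bin/sh
# Hypothetical wrappers around the two one-liners; both read standard input.
replace_sed() {
    # Any line that is NOT exactly two characters long is replaced with DE.
    sed '/^..$/!s/.*/DE/'
}

replace_awk() {
    # Same test in awk: assigning to $0 replaces the whole line, and the
    # always-true pattern 1 prints every line.
    awk '!/^..$/ { $0 = "DE" } 1'
}

# Sample data from the question.
sample='AB
CD
AB
AB
CD
155668/01
AB
1233/10'

printf '%s\n' "$sample" | replace_sed
printf '%s\n' "$sample" | replace_awk
```

Both calls should print the expected output given in the question.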

Try this -
$ awk '{print (length($1)>2?"DE":$1)}' f
AB
CD
AB
AB
CD
DE
AB
DE

The idiomatic way would be:
awk 'length($1) > 2 { $1 = "DE" } 1'
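The length-based and regex-based commands are not quite equivalent. The regex version replaces every line that is not exactly two characters, including empty and one-character lines, while length($1) > 2 only touches first fields longer than two characters. The edge-case input below is an assumption for illustration, not data from the question.

```shell
#!/bin/sh
# Assumed edge cases: a one-character line, a normal line, an empty line, a long line.
edge='A
AB

ABC'

by_regex()  { awk '!/^..$/ { $0 = "DE" } 1'; }
by_length() { awk 'length($1) > 2 { $1 = "DE" } 1'; }

printf '%s\n' "$edge" | by_regex    # DE, AB, DE, DE (one per line)
printf '%s\n' "$edge" | by_length   # A, AB, an empty line, DE
```

Which behaviour is right depends on whether short or empty cells should also become DE; the question only asks about lengths greater than 2, which is what the length version tests.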

Related

replace specific columns on lines not starting with specific character in a text file

I have a text file that looks like this:
>long_name
AAC-TGA
>long_name2
CCTGGAA
And a list of column numbers: 2, 4, 7. Of course I can have these as a variable like:
cols="2 4 7"
I need to replace those columns with a single character, e.g. an N, in every row that doesn't start with >, to get:
>long_name
ANCNTGN
>long_name2
CNTNGAN
Additional details: the file has ~200K lines. All lines that don't start with > are the same length. Column indices will never exceed the length of the non-> lines.
It seems to me that some combination of sed and awk must be able to do this quickly, but I cannot for the life of me figure out how to link it all together.
E.g. I can use sed to work on all lines that don't start with a > like this (in this case replacing all spaces with N's):
sed -i.bak '/^[^>]/s/ /N/g' input.txt
And I can use AWK to replace specific columns of lines as I want to like this (I think...):
awk '$2=N'
But I am struggling to stitch this together.
With GNU awk, set the input and output field separators to the empty string so that each character becomes a field, which you can then update directly:
awk -v cols='2 4 7' '
BEGIN {
split(cols,f)
FS=OFS=""
}
!/^>/ {
for (i in f)
$(f[i])="N"
}
1' file
Also see Save modifications in place with awk.
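As a portable alternative to in-place editing, the usual pattern is to write awk's output to a temporary file and move it over the original. The sketch below uses the substr-based command (which works in any awk) wrapped in a made-up helper, replace_cols_inplace; GNU awk 4.1 and later can do the same with gawk -i inplace. It assumes mktemp is available, as it is on common Linux and BSD systems.

```shell
#!/bin/sh
# Hypothetical helper: replace_cols_inplace FILE "COLS"
# Rewrites FILE so that the given 1-based columns become N on lines not starting with >.
replace_cols_inplace() {
    file=$1
    cols=$2
    tmp=$(mktemp) || return 1
    if awk -v cols="$cols" '
        BEGIN { split(cols, c) }
        !/^>/ { for (i in c) $0 = substr($0, 1, c[i]-1) "N" substr($0, c[i]+1) }
        1' "$file" > "$tmp"
    then
        mv "$tmp" "$file"      # replace the original only if awk succeeded
    else
        rm -f "$tmp"           # clean up and report failure
        return 1
    fi
}

# Demo on a throwaway copy of the sample data.
demo=$(mktemp)
printf '>long_name\nAAC-TGA\n>long_name2\nCCTGGAA\n' > "$demo"
replace_cols_inplace "$demo" '2 4 7'
cat "$demo"
```

One design caveat: mv gives the file the temporary file's permissions (mktemp creates it readable only by you); writing back with cat "$tmp" > "$file" instead keeps the original permissions.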
You can generate a list of replacement commands first and then pass them to sed:
$ printf '2 4 7' | sed -E 's|[0-9]+|/^>/! s/./N/&\n|g'
/^>/! s/./N/2
/^>/! s/./N/4
/^>/! s/./N/7
$ printf '2, 4, 7' | sed -E 's|[^0-9]*([0-9]+)[^0-9]*|/^>/! s/./N/\1\n|g'
/^>/! s/./N/2
/^>/! s/./N/4
/^>/! s/./N/7
$ sed -f <(printf '2 4 7' | sed -E 's|[0-9]+|/^>/! s/./N/&\n|g') ip.txt
>long_name
ANCNTGN
>long_name2
CNTNGAN
You can also use {} grouping:
$ printf '2 4 7' | sed -E 's|^|/^>/!{|; s|[0-9]+|s/./N/&; |g; s|$|}|'
/^>/!{s/./N/2; s/./N/4; s/./N/7; }
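Because the grouped form is a single line, it can also be passed directly as the sed script argument, which avoids the bash-only <(...) process substitution. A sketch, assuming GNU sed (for -E and for accepting the generated one-line {...} group); the sample input is piped in rather than read from ip.txt so the example is self-contained.

```shell
#!/bin/sh
cols='2 4 7'
# Turn "2 4 7" into /^>/!{s/./N/2; s/./N/4; s/./N/7; } (spacing may differ slightly).
script=$(printf '%s' "$cols" | sed -E 's|^|/^>/!{|; s|[0-9]+|s/./N/&; |g; s|$|}|')
printf '%s\n' "$script"

# Apply it. Each s/./N/K replaces the K-th character; since every replacement is
# one character for one character, later positions do not shift.
printf '>long_name\nAAC-TGA\n>long_name2\nCCTGGAA\n' | sed -e "$script"
```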
Using any awk in any shell on every UNIX box:
$ awk -v cols='2 4 7' '
BEGIN { split(cols,c) }
!/^>/ { for (i in c) $0=substr($0,1,c[i]-1) "N" substr($0,c[i]+1) }
1' file
>long_name
ANCNTGN
>long_name2
CNTNGAN

Use awk to replace word in file

I have a file with some lines:
a
b
c
d
I would like to cat this file into a awk command to produce something like this:
letter is a
letter is b
letter is c
letter is d
using something like this:
cat file.txt | awk 'letter is $1'
But it's not printing out as expected:
$ cat raw.txt | awk 'this is $1'
a
b
c
d
At the moment, you have no { action } block, so your condition evaluates the two empty variables this and is, concatenating them with the first field $1, and checks whether the result is true (a non-empty string). It is, so the default action prints each line.
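This is easy to see with an empty line in the input: there $1 is empty, so the concatenated pattern is the empty string, which is false, and the line is not printed. The three-line input is made up for the demonstration.

```shell
#!/bin/sh
# 'this is $1' is a pattern with no action: the uninitialized variables this and is
# are empty strings, so the pattern is just $1. Non-empty means true, so the line is
# printed unchanged; the empty line in the middle gives an empty pattern and is skipped.
printf 'a\n\nb\n' | awk 'this is $1'
```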
It sounds like you want to do this instead:
awk '{ print "letter is", $1 }' raw.txt
Although in this case, you might as well just use sed:
sed 's/^/letter is /' raw.txt
This command matches the start of each line and adds the string.
Note that I'm passing the file as an argument, rather than using cat with a pipe.
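Two details of these commands show up on input with more than one word per line (the input a b is an assumption for illustration). In awk's print, a comma inserts the output field separator OFS (a single space by default), whereas strings written next to each other are concatenated with nothing in between. And the awk version prints only the first field, while the sed version keeps the whole line.

```shell
#!/bin/sh
input='a b'

printf '%s\n' "$input" | awk '{ print "letter is", $1 }'             # letter is a
printf '%s\n' "$input" | awk -v OFS=':' '{ print "letter is", $1 }'  # letter is:a
printf '%s\n' "$input" | awk '{ print "letter is " $1 }'             # letter is a
printf '%s\n' "$input" | sed 's/^/letter is /'                       # letter is a b
```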
Not sure if you wanted sed or awk, but this is in awk:
$ awk '{print "letter is " $1}' file
letter is a
letter is b
letter is c
letter is d

Unix Command for counting number of words which contains letter combination (with repeats and letters in between)

How would you count the number of words in a text file which contains all of the letters a, b, and c. These letters may occur more than once in the word and the word may contain other letters as well. (For example, "cabby" should be counted.)
For this sample input, the count should be 2:
abc abb cabby
I tried both:
grep -E "[abc]" test.txt | wc -l
grep 'abcdef' testCount.txt | wc -l
both of which return 1 instead of 2.
Thanks in advance!
You can use awk and the return value of the sub function: sub returns the number of substitutions it made, so the value is greater than 0 only when the letter occurs in the word.
$ echo "abc abb cabby" |
awk '{
for(i=1;i<=NF;i++)
if(sub(/a/,"",$i)>0 && sub(/b/,"",$i)>0 && sub(/c/,"",$i)>0) {
count+=1
}
}
END{print count+0}'
2
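The condition requires the return value to be greater than 0 for all three letters. The for loop iterates over every word of every line and increments the counter when all three letters are found in the word. (sub deletes the matched letters from $i, which is harmless here because the fields are never printed.)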
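A variant that leaves the fields untouched is to test each word with the ~ match operator instead of deleting letters with sub. In this sketch (count_abc_words is a made-up name), the END block prints n+0 so that the result is 0 rather than an empty line when no word matches.

```shell
#!/bin/sh
# count_abc_words: reads files given as arguments, or standard input if there are none.
count_abc_words() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /a/ && $i ~ /b/ && $i ~ /c/)   # word contains a, b and c
                n++
    }
    END { print n+0 }' "$@"
}

echo "abc abb cabby" | count_abc_words    # 2
```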
I don't think you can get around using multiple invocations of grep. Thus I would go with (GNU grep; -E is needed so that + means "one or more"):
<file grep -owE '\w+' | grep a | grep b | grep c
Output:
abc
cabby
The first grep puts each word on a line of its own.
Try this, it will work
sed 's/ /\n/g' test.txt |grep a |grep b|grep c
$ cat test.txt
abc abb cabby
$ sed 's/ /\n/g' test.txt |grep a |grep b|grep c
abc
cabby
Hope this helps.
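Both grep pipelines print the matching words; since the question asks for a count, piping the result into wc -l gives the number. This sketch (count_with_grep is a hypothetical name) uses tr -s to put each word on its own line, which, unlike replacing single spaces, also copes with tabs and runs of several spaces.

```shell
#!/bin/sh
# count_with_grep is a hypothetical name; it reads text on standard input.
count_with_grep() {
    # tr -s squeezes every run of whitespace into a single newline (one word per line),
    # the three greps keep words containing a, b and c, and wc -l counts them.
    tr -s '[:space:]' '\n' | grep a | grep b | grep c | wc -l
}

echo "abc abb cabby" | count_with_grep
```

Some wc implementations pad the number with leading spaces.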

grep a line with single number

I have a file with a few lines like this:
1 ab
11 ad
41 ac
1 af
1 ag
and I want the lines where the number is 1:
1 ab
1 af
1 ag
How can I achieve this?
If I write this:
grep "1" file.txt
then I get all the lines that contain 1, even if that's not the entire number:
1 ab
11 ad
41 ac
1 af
1 ag
The -w option tells grep to match the pattern only as a whole word:
grep -w 1 file.txt
You can write:
grep '^1 ' file.txt
to get all lines that start with a 1 followed by a space. (The ^ means "start-of-line".)
grep -w '^1' file.txt
to get lines starting with the number 1 (quoting the pattern keeps the shell from treating ^ specially).
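A bracket expression also lets you accept a tab as well as a space after the number. grep does not treat \t as a tab inside brackets ([ \t] matches a space, a backslash, or the letter t), so use the [:blank:] class:
grep '^1[[:blank:]]' file.txt
^ -> beginning of line
[[:blank:]] -> a space or tab after "1"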
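The approaches differ when a 1 appears as a separate word later on the line. In this sketch the extra line 5 x 1 is an assumption added to the question's data: grep -w prints it because -w matches 1 as a whole word anywhere, while the anchored regex and an awk comparison of the whole first field with the string "1" do not.

```shell
#!/bin/sh
# The question's data plus one assumed extra line, "5 x 1".
data='1 ab
11 ad
41 ac
1 af
1 ag
5 x 1'

printf '%s\n' "$data" | grep -w 1             # 1 ab, 1 af, 1 ag, and also 5 x 1
printf '%s\n' "$data" | grep '^1[[:blank:]]'  # 1 ab, 1 af, 1 ag
printf '%s\n' "$data" | awk '$1 == "1"'       # 1 ab, 1 af, 1 ag (string comparison of field 1)
```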

Joining two consecutive lines using awk or sed

How would I join two lines using awk or sed?
I have data that looks like this:
abcd
joinabcd
efgh
joinefgh
ijkl
joinijkl
I need an output like the one below:
joinabcdabcd
joinefghefgh
joinijklijkl
awk '!(NR%2){print $0 p}{p=$0}' infile
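The awk one-liner relies on two things: !(NR%2) is true on even-numbered lines, and p always holds the previous line because it is assigned after the print. Spelled out with comments (join_pairs is a hypothetical name):

```shell
#!/bin/sh
# join_pairs is a hypothetical wrapper around the one-liner above.
join_pairs() {
    awk '
        !(NR % 2) { print $0 p }   # even line: print it followed by the previous line
        { p = $0 }                 # every line: remember it for the next record
    ' "$@"
}

printf 'abcd\njoinabcd\nefgh\njoinefgh\nijkl\njoinijkl\n' | join_pairs
```

With an odd number of input lines, the last line is never printed, since it has no partner.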
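You can use printf with a ternary. Use a "%s" format so that a % in the data is not read as a conversion, and print the even line before the saved odd line to get the requested order:
awk '{ printf "%s", ((NR % 2 == 0) ? $0 p "\n" : ""); p = $0 }' infile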
awk 'BEGIN{i=1}{line[i++]=$0}END{j=1; while (j<i) {print line[j+1] line[j]; j+=2}}' yourfile
No need for sed.
Here it is in sed:
sed 'h;s/.*//;N;G;s/\n//g' < filename
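Step by step, with the pattern space shown for the first pair of lines (the function name join_pairs_sed is made up; lines starting with # are comments in a sed script):

```shell
#!/bin/sh
join_pairs_sed() {
    sed '
        # First line of a pair, e.g. abcd: copy it to the hold space.
        h
        # Empty the pattern space.
        s/.*//
        # Append a newline and the next line: pattern space is now \njoinabcd
        N
        # Append a newline and the held line: \njoinabcd\nabcd
        G
        # Delete both newlines: joinabcdabcd
        s/\n//g
    ' "$@"
}

printf 'abcd\njoinabcd\nefgh\njoinefgh\nijkl\njoinijkl\n' | join_pairs_sed
```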
They say imitation is the sincerest form of flattery.
Here's a Perl solution inspired by Dimitre's awk code:
perl -lne 'print "$_$p" if $. % 2 == 0; $p = $_' infile
$_ is the current line
$. is the line number
A variation on the sed script above takes the following:
1008
-2734406.132904
2846
-2734414.838455
4636
-2734413.594009
6456
-2734417.316269
8276
-2734414.779617
and turns it into:
1008 -2734406.132904
2846 -2734414.838455
4636 -2734413.594009
6456 -2734417.316269
8276 -2734414.779617
The sed command is: sed 'h;s/.*//;G;N;s/\n//;s/\n/ /' (the first s deletes the newline that G puts at the start of the pattern space; without it, every output line would begin with a space).
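Because this variation keeps the lines in their original order, the hold space is not really needed. Two shorter ways to join each pair of lines with a space, shown on the first four lines of the sample above: sed's N command appends the next line to the pattern space, and paste - - reads two lines from standard input for every output line.

```shell
#!/bin/sh
# First four lines of the sample above.
data='1008
-2734406.132904
2846
-2734414.838455'

# N appends the next line to the pattern space; then the embedded newline becomes a space.
printf '%s\n' "$data" | sed 'N;s/\n/ /'
# paste reads one line from standard input for each "-", joining them with the -d delimiter.
printf '%s\n' "$data" | paste -d ' ' - -
```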
This is the answer to the question "How to make the count and the file appear on the same line" in the command:
find . -type f -exec fgrep -ci "MySQL" {} \; -print
Bryon Nicolson's answer produced the best result.
