Enclose columns containing alphabets with single quotes using awk


Can awk process this?
Input
Neil,23,01-Jan-1990
25,Reena,19900203
Output
'Neil',23,'01-Jan-1990'
25,'Reena',19900203

awk approach:
awk -F, '{for(i=1;i<=NF;i++) if($i~/[[:alpha:]]/) $i="\047"$i"\047"}1' OFS="," file
The output:
'Neil',23,'01-Jan-1990'
25,'Reena',19900203
if($i~/[[:alpha:]]/) - if field contains alphabetic character
\047 - octal code of single quote ' character
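As a minimal sketch of the same idea wrapped in a reusable shell function: the name quote_alpha is hypothetical, the quote character is passed in with -v instead of the \047 escape, and [A-Za-z] stands in for [[:alpha:]] on the assumption that only ASCII letters matter.

```shell
# Hypothetical helper: quote every comma-separated field that contains a letter.
quote_alpha() {
  awk -F, -v OFS=, -v q="'" '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /[A-Za-z]/) $i = q $i q   # wrap the field in single quotes
    print
  }'
}

printf '%s\n' 'Neil,23,01-Jan-1990' '25,Reena,19900203' | quote_alpha
```

Passing the quote through a variable avoids nesting a single quote inside the single-quoted awk program.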

My first attempt was incorrect:
sed -r 's/([^,]*[a-zA-Z]+[^,]*)(,{0,1})/"\1"\2/g' inputfile
@Sundeep pointed out in a comment that single quotes are needed and that the command can be shorter.
My first attempt also matched the trailing , (or the end of the line), which made the pattern more complex than necessary. It is enough to match between the separators and make sure there is an alphabetic character somewhere in between (\x27 is the hex escape for a single quote, as understood by GNU sed):
sed 's/[^,]*[a-zA-Z][^,]*/\x27&\x27/g' inputfile

You might use this script:
script.awk
BEGIN { OFS=FS="," }
{
    for (i = 1; i <= NF; i++) {
        if (!match($i, /^[0-9]+$/)) $i = "'" $i "'"
    }
    print
}
and run it like this: awk -f script.awk yourfile
Explanation
The first line sets both the input and output field separators to ,.
The loop tests whether each field contains only digits (/^[0-9]+$/);
if not, the field is put in single quotes.
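The two awk answers use different tests, and they disagree on fields that are neither purely numeric nor alphabetic. A small sketch of the difference, using hypothetical function names and a made-up input line:

```shell
# Hypothetical helpers for comparison; neither name comes from the answers above.
quote_if_letter() {
  awk -F, -v OFS=, -v q="'" '{ for (i = 1; i <= NF; i++) if ($i ~ /[A-Za-z]/) $i = q $i q; print }'
}
quote_unless_digits() {
  awk -F, -v OFS=, -v q="'" '{ for (i = 1; i <= NF; i++) if ($i !~ /^[0-9]+$/) $i = q $i q; print }'
}

printf '%s\n' '12-34,,abc,7' | quote_if_letter      # 12-34,,'abc',7
printf '%s\n' '12-34,,abc,7' | quote_unless_digits  # '12-34','','abc',7
```

For the sample data in the question both give the same result; which one is appropriate depends on how fields such as 12-34 or empty fields should be treated.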

Related

awk/sed/grep to search for substring within string of second semicolon separated part/column and return only first part/column plus the substring

I have a Unix file containing semicolon separated records like below, having 2nd part/column a string with comma separated values, like below:
789651234512;TEST-10=5,TEST-136=6,TEST-3=1,TEST-4=2,TEST-5=3,TEST-9=4,TEST-9013=100
132567123784;TEST-3=1,TEST-136=5,TEST-15=4,TEST-4=2,TEST-5=3
132564013784;TEST-3=1,TEST-15=4,TEST-4=2,TEST-5=8
132496583212;TEST-13=4,TEST-136=7,TEST-23=1,TEST-6=2,TEST-5=3,TEST-4=5,TEST-6=11
I want to find all TEST-136=X, when it exists, where X can be any integer of 1 to 3 digits, and return them like this for the above example:
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
I am using the below awk, but that returns whole string of 2nd part/column:
awk -F'[;]' '/TEST-136/{ print $1";"$2 }' file.txt
However, I need to get only the 1st part/column and also the TEST-136=X part of the 2nd part/column, as said.

This assumes one match per line/record:
$ awk -F';' 'match($0, /TEST-136=[[:digit:]]+/) {print $1, substr($0,RSTART,RLENGTH)}' OFS=';' kostas.txt
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
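If a record could contain more than one TEST-136=X entry, one option is to call match() in a loop and consume the string piece by piece. This is only a sketch: the function name extract_all and the sample input are made up, and [0-9]+ is used in place of [[:digit:]]+.

```shell
# Hypothetical helper: print "key;TEST-136=N" once for every occurrence in column 2.
extract_all() {
  awk -F';' -v OFS=';' '{
    rest = $2
    while (match(rest, /TEST-136=[0-9]+/)) {
      print $1, substr(rest, RSTART, RLENGTH)   # the matched text
      rest = substr(rest, RSTART + RLENGTH)     # continue after the match
    }
  }'
}

printf '%s\n' '111;TEST-136=6,TEST-3=1,TEST-136=42' '222;TEST-3=1' | extract_all
```

The second input line has no match, so it produces no output.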

This might work for you (GNU sed):
sed -En 's/^([^;]*;).*(TEST-136=[^,]*).*/\1\2/p' file

Simple Perl,
$ perl -F";" -lane ' /(TEST-136=\w+)/ and print "$F[0];$1" ' kostas.txt
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
$

Another awk
$ awk -F"[;,]" ' { for(i=2;i<=NF;i++) if($i~/TEST-136/) print $1 ";" $i } ' kostas.txt
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
$

Use sed to replace all occurrences of strings which start with 'xy' and of length 5 or more

I am running AIX 6.1
I have a file that contains words starting with specific characters, say 'xy', 'Xy', 'xY' or 'XY' (case insensitive), and I need to mask the entire word with asterisks '*' if the word is, say, 5 characters or longer.
e.g. I need a sed command which when run against a file containing the below line...
This is a test line xy12345 xy12 Xy123 Xy11111 which I need to replace specific strings
should give below as the output
This is a test line ******* xy12 ***** ******* which I need to replace specific strings
I tried the below commands (did not yet come to the stage where I restrict to word lengths) but it does not work and displays the full line without any substitutions.
I tried using \< and \> as well as \b for word identification.
sed 's/\<xy\(.*\)\>/******/g' result2.csv
sed 's/\bxy\(.*\)\b/******/g' result2.csv

You can try with awk:
echo 'This is a test line xy12345 xy12 Xy123 Xy11111 which I need to replace specific strings' | awk 'BEGIN{RS=ORS=" "} !(/^[xX][yY]/ && length($0)>=5)'
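The awk record separator is set to a space so that each word becomes a record and its length can be tested. Because the pattern is negated and there is no action, matching words are dropped from the output rather than masked.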
This works with GNU awk in --posix and --traditional modes.

With sed, as a mental exercise:
sed -E '
s/(^|[[:blank:]])([xyXY])([xyXY].{2}[^[:space:]]*)([^[:space:]])/\1#\3#/g
:A
s/(#[^#[:blank:]]*)[^#[:blank:]](#[#]*)/\1#\2/g
tA
s/#/*/g'
This requires that the text contain no # characters.

A simple POSIX awk version:
awk '{for(i=1;i<=NF;++i) if ($i ~ /^[xX][yY]/ && length($i)>=5) gsub(/./,"*",$i)}1'
This, however, does not keep the spacing intact (multiple spaces are converted to a single one). The following does:
awk 'BEGIN{RS=ORS=" "}(/^[xX][yY]/ && length($0)>=5){gsub(/./,"*")}1'
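One thing to watch for with RS=" " is that the last word of each line still carries its newline, so that character can be counted by length() and replaced by gsub(/./, ...). A minimal sketch of an alternative that walks each line with match() and copies the original spacing through untouched; mask_xy is a hypothetical name, and words are assumed to be separated by spaces or tabs.

```shell
# Hypothetical helper: mask words starting with xy (any case) that are 5+ characters long.
mask_xy() {
  awk '{
    out = ""; rest = $0
    # Peel off one run of non-blank characters at a time.
    while (match(rest, /[^ \t]+/)) {
      word = substr(rest, RSTART, RLENGTH)
      if (word ~ /^[xX][yY]/ && length(word) >= 5) gsub(/./, "*", word)
      out = out substr(rest, 1, RSTART - 1) word   # leading blanks + word
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest                                  # keep any trailing blanks
  }'
}

printf '%s\n' 'a  xy12345 xy12 Xy123' | mask_xy   # a  ******* xy12 *****
```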

You may use awk:
s='This is a test line xy12345 xy12 Xy123 Xy11111 which I need to replace specific strings xy123 xy1234 xy12345 xy123456 xy1234567'
echo "$s" | awk 'BEGIN {
    ORS=RS=" "
}
{
    for(i=1;i<=NF;i++) {
        if(length($i) >= 5 && $i~/^[Xx][Yy][a-zA-Z0-9]+$/)
            gsub(/./,"*", $i);
        print $i;
    }
}'
A one liner:
awk 'BEGIN {ORS=RS=" "} { for(i=1;i<=NF;i++) {if(length($i) >= 5 && $i~/^[Xx][Yy][a-zA-Z0-9]+$/) gsub(/./,"*", $i); print $i; } }'
# => This is a test line ******* xy12 ***** ******* which I need to replace specific strings ***** ****** ******* ******** *********
Details
BEGIN {ORS=RS=" "} - in the BEGIN block, set both the input record separator and the output record separator to a space
{ for(i=1;i<=NF;i++) {if(length($i) >= 5 && $i~/^[Xx][Yy][a-zA-Z0-9]+$/) gsub(/./,"*", $i); print $i; } } - iterate over each field (with for(i=1;i<=NF;i++)); if the current field ($i) is 5 or more characters long (length($i) >= 5) and (&&) it matches xy in any letter case followed by 1 or more alphanumeric chars ($i~/^[Xx][Yy][a-zA-Z0-9]+$/), replace each char with * (with gsub(/./,"*", $i)); then print the current field value.

This might work for you (GNU sed):
sed -r ':a;/\bxy\S{5,}\b/I!b;s//\n&\n/;h;s/[^\n]/*/g;H;g;s/\n.*\n(.*)\n.*\n(.*)\n.*/\2\1/;ta' file
If the current line does not contain a string which begins with xy case insensitive and 5 or more following characters, then there is no work to be done.
Otherwise:
Surround the string by newlines
Copy the pattern space (PS) to the hold space (HS)
Replace all characters other than newlines with *'s
Append the PS to the HS
Replace the PS with the HS
Swap the strings between the newlines retaining the remainder of the first line
Repeat

How to use Awk to filter rows using a column value under double quotes

"A","B",123,"C","AAB"
"A","BB",234,"CC","BA"
"AA","B",123,"CC","CBB"
"AA","BB",213,"C","CCA"
I want to get those rows where $1 == AA
awk 'BEGIN { FS = ","; OFS = FS;} {if ($1=="AA") print}'
but it's not working. It works if the data is not in double quotes.

Just match the literal " with an escape character. This is a straightforward filter that matches the literal "AA" in the first column. Since awk works on a pattern { action } basis, the test for whether the first column is "AA" can be written directly as the pattern, with no explicit { print }.
When the condition is true for a line, awk applies the default action, which prints the line (the same way awk 1 file prints every line).
awk -v FS=, '$1=="\"AA\""' file
Also, you can avoid escapes, by putting the match string in a variable under single-quotes and let it match the variable
awk -v FS=, -v m='"AA"' '$1==m' file

The following awk may help:
awk -F, '{val=$1;gsub(/\"/,"",val)} val=="AA"' Input_file
Second solution:
awk -F"[\",]" '$2=="AA"' Input_file
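To see why the second solution tests $2 rather than $1, it can help to print how a line is split when both " and , act as separators. A small sketch (show_fields is a hypothetical name; the input is a shortened line from the question):

```shell
# Hypothetical helper: list each field with its index when FS is the class [",].
show_fields() {
  awk -F'[",]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

printf '%s\n' '"AA","B",123' | show_fields
# 1=[]
# 2=[AA]
# 3=[]
# 4=[]
# 5=[B]
# 6=[]
# 7=[123]
```

Every quote and every comma ends a field, so the text before the opening quote becomes an empty $1 and the first value lands in $2.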

transpose a column in unix

I have a Unix file which has data like this.
1379545632,
1051908588,
229102020,
1202084378,
1102083491,
1882950083,
152212030,
1764071734,
1371766009,
(FYI, there is no empty line between two numbers as you see above. It's just because of the editor here. It's just a column with all the numbers one below the other.)
I want to transpose it and print as a single line.
Like this:
1379545632,1051908588,229102020,1202084378,1102083491,1882950083,152212030,1764071734,1371766009
Also remove the last comma.
Can someone help? I need a shell/awk solution.

tr -d '\n' < file.txt
This deletes the newlines and joins everything into one line. To remove the last comma, pipe the result through sed 's/,$//'.

With GNU awk for multi-char RS:
$ printf 'x,\ny,\nz,\n' | awk -v RS='^$' '{gsub(/\n|(,\n$)/,"")} 1'
x,y,z

awk 'BEGIN { ORS="" } { print }' file
ORS is the output record separator, written after each record that print emits.
Setting it to the empty string joins all the lines; the trailing comma is kept.
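A sketch of a plain awk variant that also drops the last comma and ends the output with a newline. It strips the trailing comma from every line and puts the separator back between lines; join_lines is a hypothetical name.

```shell
# Hypothetical helper: join comma-terminated lines into a single line.
join_lines() {
  awk '{ sub(/,$/, ""); out = out sep $0; sep = "," } END { print out }'
}

printf 'x,\ny,\nz,\n' | join_lines   # x,y,z
```

Because sep is empty for the first line and "," afterwards, no separator is written before the first value or after the last one.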

Replacing a String Pattern with another sequence in unix

I want to replace the string TaskID_1 with a sequence number starting from 1001; TaskID_1 can occur on any number of lines in my input file.
Similarly, I need to replace all occurrences of TaskID_2 in my input file with the next sequence value, 1002.
Input file:
12345|45345|TaskID_1|dksj|kdjfdsjf|12
1245|425345|TaskID_1|dksj|kdjfdsjf|12
1234|25345|TaskID_2|dksj|kdjfdsjf|12
123425|65345|TaskID_2|dksj|kdjfdsjf|12
123425|15325|TaskID_1|dksj|kdjfdsjf|12
11345|55315|TaskID_2|dksj|kdjfdsjf|12
6345|15345|TaskID_3|dksj|kdjfdsjf|12
72345|25345|TaskID_4|dksj|kdjfdsjf|12
9345|411345|TaskID_3|dksj|kdjfdsjf|12
The output file should look like:
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12

Here's one way using awk:
awk 'BEGIN { FS=OFS="|" } { $3=1000 + NR }1' file
Or less verbosely:
awk -F '|' '{ $3=1000 + NR }1' OFS='|' file
Results:
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1002|dksj|kdjfdsjf|12
1234|25345|1003|dksj|kdjfdsjf|12
123425|65345|1004|dksj|kdjfdsjf|12
123425|15325|1005|dksj|kdjfdsjf|12
11345|55315|1006|dksj|kdjfdsjf|12
6345|15345|1007|dksj|kdjfdsjf|12
72345|25345|1008|dksj|kdjfdsjf|12
9345|411345|1009|dksj|kdjfdsjf|12
For the first example, the field separator and output field separator are set to a single pipe character. This is done in the BEGIN block, so that it is executed only once, and not on every line of input. We then set the third column to 1000 plus an incrementing value. We could use ++i for this, but NR (the current record number, here the line number) already provides it and avoids an extra variable. The 1 at the end is an always-true pattern, so the default action prints each line. A more verbose solution would look like:
awk 'BEGIN { FS=OFS="|" } { $3=1000 + NR; print }' file
EDIT:
Using the updated data file, try:
awk 'BEGIN { FS=OFS="|" } { sub(/.*_/,"",$3); $3+=1000 }1' file
Results:
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12
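The command above relies on the number embedded in each TaskID. If the IDs should instead be numbered in order of first appearance, whatever their suffix, an associative array is one way to do it. This is a sketch under that assumption; renumber and the sample lines are made up.

```shell
# Hypothetical helper: give each distinct value of column 3 the next number from 1001.
renumber() {
  awk 'BEGIN { FS = OFS = "|" }
  {
    if (!($3 in seen)) seen[$3] = 1000 + (++n)   # first time this ID is seen
    $3 = seen[$3]
    print
  }'
}

printf '%s\n' 'a|b|TaskID_7|x' 'c|d|TaskID_2|y' 'e|f|TaskID_7|z' | renumber
# a|b|1001|x
# c|d|1002|y
# e|f|1001|z
```

For the input file in the question, where TaskID_1 to TaskID_4 first appear in ascending order, this gives the same result as adding 1000 to the suffix.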

A Perl solution using Steve's logic of adding 1000:
perl -pne 's/TaskID_(\d+)/$1+1000/e;' file
This replaces the 'TaskID_n' with 1000+n. 'e' is used to evaluate the replacement.

Replacing TaskID_ with 100 is easy with sed for single-digit IDs:
$ sed 's/TaskID_/100/' file
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12
To store this change back to the file use the -i option:
sed -i 's/TaskID_/100/' file
Note: this only works for single-digit IDs, TaskID_[0-9]. It maps TaskID_23 to 10023, not 1023.
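The difference can be checked on a made-up line with a two-digit ID, comparing the textual sed replacement with the arithmetic awk version from the earlier answer:

```shell
line='1|2|TaskID_23|x'   # hypothetical sample record

# Textual replacement: "TaskID_" becomes "100", so the digits are simply appended.
sed_out=$(printf '%s\n' "$line" | sed 's/TaskID_/100/')

# Arithmetic: strip everything up to the underscore, then add 1000.
awk_out=$(printf '%s\n' "$line" | awk 'BEGIN { FS = OFS = "|" } { sub(/.*_/, "", $3); $3 += 1000 } 1')

echo "$sed_out"   # 1|2|10023|x
echo "$awk_out"   # 1|2|1023|x
```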

I can't come up with a better solution than the one steve suggested in awk.
So here's a worse solution, using only bash.
#!/bin/bash
# Split on | for read only, and keep backslashes literal with -r.
while IFS='|' read -r f1 f2 f3 f4 f5 f6; do
    printf '%s|%s|%d|%s|%s|%s\n' "$f1" "$f2" "$((${f3#*_}+1000))" "$f4" "$f5" "$f6"
done < input
It's "worse" only because it'll be much slower than awk, which is fast and efficient with this sort of problem.

perl -F"\|" -lane '$F[2]=~s/.*_/100/g;print join("|",@F)' your_file
Tested below:
> cat temp
12345|45345|TaskID_1|dksj|kdjfdsjf|12
1245|425345|TaskID_1|dksj|kdjfdsjf|12
1234|25345|TaskID_2|dksj|kdjfdsjf|12
123425|65345|TaskID_2|dksj|kdjfdsjf|12
123425|15325|TaskID_1|dksj|kdjfdsjf|12
11345|55315|TaskID_2|dksj|kdjfdsjf|12
6345|15345|TaskID_3|dksj|kdjfdsjf|12
72345|25345|TaskID_4|dksj|kdjfdsjf|12
9345|411345|TaskID_3|dksj|kdjfdsjf|12
> perl -F"\|" -lane '$F[2]=~s/.*_/100/g;print join("|",@F)' temp
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12
>
