sed command - branching to label - unix

I am not able to get the same output as the example given in the SED tutorial for branching below,
http://www.grymoire.com/Unix/Sed.html#uh-59
Quoting the code here:
#!/bin/sh
sed '
:again
s/([ ^I]*)//
t again
'
The spaces are still in the brackets after this filter.
[UPDATE]
Here is my output:
$echo "( ( test ) )" | sed '
> :again
> s/([ ]*)//
> t again
> '
( ( test ) )
$
Shouldn't that be ((test))?
How do I get the script to delete the blank spaces in the nested parenthesis as demonstrated by the author?
[/UPDATE]
[UPDATE2]
$echo " ( ( ) ) " | sed '
> :again
> s/\([ ]*\)//
> t again
> '
Prompt is not back.
[/UPDATE2]
Also, how do I enter the "^I" character? I think it is the horizontal tab, but I cannot key it in the way I enter other control characters via PuTTY (e.g. to get "Enter", I type Ctrl-V followed by the Enter key, but this isn't working for Tab). I tried with spaces only (using the regex [ ]* instead of [ ^I]*), but this also failed to work.

Good for you for working through some tutorials.
Assuming you're using vi or vim, all you need to do to include a tab char inside the [ .. ] grouping is to press the Tab key. (I use PuTTY all the time, and if pressing Tab doesn't insert a tab char into the document or command line, then you have a PuTTY configuration problem.)
The ^I comes from vi's list mode. List mode is handy for seeing where the line-feed chars (\n) are: each one is shown as $ (which in a regex is the end-of-line anchor, the other anchor being ^ for beginning of line).
So turn on vi list mode with :set list and you'll see all tab chars displayed as ^I and all ends of lines as $.
As you say
How do I get the script to delete the blank spaces in the nested parenthesis as demonstrated
That is slightly ambiguous, because sed's extended-regex mode (-E, or -r in GNU sed) uses plain parens as grouping chars to create capture groups such as \1 for the replacement side of the s/pat/repl/ substitute cmd, whereas in the default basic-regex mode plain parens are literal and \( \) do the grouping.
Given that your example has no numbered replacement value in the replacement side, I'll assume that the purpose is to remove a literal () pair AND that it should work as indicated. Once you :set list and add a tab char inside the [ ... ], it should work. If not, please edit your question with any error messages that might appear.
I hope this helps.
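If typing a literal tab at the prompt is awkward (many shells bind Tab to completion), one possible alternative is to let printf generate the tab and splice it into the sed script. This is only a sketch; the variable name tab and the function name strip_blank_parens are illustrative and do not come from the tutorial:

```shell
# Build a literal tab without typing one; "tab" is a hypothetical helper variable.
tab=$(printf '\t')

strip_blank_parens() {
    # Double quotes let the shell expand $tab inside the bracket expression.
    sed -e ':again' -e "s/([ $tab]*)//" -e 't again'
}

printf 'x(\t )y\n' | strip_blank_parens   # prints: xy
```

Each part of the script is passed with its own -e so the label does not depend on how a particular sed treats text after :again on the same line.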

( test ) does not match the regex ([ ]*). ([ ]*) only matches strings that contain nothing but spaces inside parens. Perhaps you are looking for ([ ]* to remove leading spaces inside and [ ]*) to remove trailing spaces.
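To make the difference concrete, here is a small sketch (the function names are illustrative). The first function is the tutorial's loop, which only removes parenthesis pairs containing nothing but spaces; the second applies the two trimming substitutions suggested above. Note that in sed's default basic-regex syntax a plain ( is literal while \( \) forms a group, so the \([ ]*\) variant from UPDATE2 matches the empty string on every pass and t again never stops branching, which would explain why the prompt did not come back.

```shell
# Tutorial loop: repeatedly delete "(", optional spaces, ")".
strip_empty_parens() {
    sed -e ':again' -e 's/([ ]*)//' -e 't again'
}

# Trim the spaces just inside each parenthesis instead.
trim_inside_parens() {
    sed -e 's/([ ]*/(/g' -e 's/[ ]*)/)/g'
}

echo 'a ( ( ) ) b'  | strip_empty_parens   # prints: a  b
echo '( ( test ) )' | strip_empty_parens   # unchanged: ( ( test ) )
echo '( ( test ) )' | trim_inside_parens   # prints: ((test))
```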

Related

Use sed to replace all occurrences of strings which start with 'xy' and of length 5 or more

I am running AIX 6.1
I have a file which contains strings/words starting with some specific characters, say 'xy' or 'Xy' or 'xY' or 'XY' (case insensitive), and I need to mask the entire word/string with asterisks '*' if the word is longer than, say, 5 characters.
e.g. I need a sed command which when run against a file containing the below line...
This is a test line xy12345 xy12 Xy123 Xy11111 which I need to replace specific strings
should give below as the output
This is a test line xy12 which I need to replace specific strings
I tried the commands below (I have not yet got to the stage of restricting by word length), but they do not work: the full line is displayed without any substitutions.
I tried using \< and \> as well as \b for word identification.
sed 's/\<xy\(.*\)\>/******/g' result2.csv
sed 's/\bxy\(.*\)\b/******/g' result2.csv
You can try with awk:
echo 'This is a test line xy12345 xy12 Xy123 Xy11111 which I need to replace specific strings' | awk 'BEGIN{RS=ORS=" "} !(/^[xX][yY]/ && length($0)>=5)'
The awk record separator is set to a space in order to be able to get the length of each word.
This works with GNU awk in --posix and --traditional modes.
With sed, as a mental exercise:
sed -E '
s/(^|[[:blank:]])([xyXY])([xyXY].{2}[^[:space:]]*)([^[:space:]])/\1#\3#/g
:A
s/(#[^#[:blank:]]*)[^#[:blank:]](#[#]*)/\1#\2/g
tA
s/#/*/g'
This requires that the text not contain any # characters.
A simple POSIX awk version :
awk '{for(i=1;i<=NF;++i) if ($i ~ /^[xX][yY]/ && length($i)>=5) gsub(/./,"*",$i)}1'
This, however, does not keep the spacing intact (multiple spaces are converted to a single one), the following does:
awk 'BEGIN{RS=ORS=" "}(/^[xX][yY]/ && length($0)>=5){gsub(/./,"*")}1'
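The spacing remark can be checked with a short sketch (mask_by_field is just an illustrative wrapper around the field-based command). Assigning to any field makes awk rebuild the record with single-space separators, while a line in which no field is changed is printed exactly as read:

```shell
mask_by_field() {
    awk '{ for (i = 1; i <= NF; ++i) if ($i ~ /^[xX][yY]/ && length($i) >= 5) gsub(/./, "*", $i) } 1'
}

printf '%s\n' 'a   xy12345   xy12 b' | mask_by_field   # prints: a ******* xy12 b
printf '%s\n' 'a   xy12   b'         | mask_by_field   # prints: a   xy12   b
```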
You may use awk:
s='This is a test line xy12345 xy12 Xy123 Xy11111 which I need to replace specific strings xy123 xy1234 xy12345 xy123456 xy1234567'
echo "$s" | awk 'BEGIN {
ORS=RS=" "
}
{
for(i=1;i<=NF;i++) {
if(length($i) >= 5 && $i~/^[Xx][Yy][a-zA-Z0-9]+$/)
gsub(/./,"*", $i);
print $i;
}
}'
A one liner:
awk 'BEGIN {ORS=RS=" "} { for(i=1;i<=NF;i++) {if(length($i) >= 5 && $i~/^[Xx][Yy][a-zA-Z0-9]+$/) gsub(/./,"*", $i); print $i; } }'
# => This is a test line ******* xy12 ***** ******* which I need to replace specific strings ***** ****** ******* ******** *********
Details
BEGIN {ORS=RS=" "} - start of the awk program: set both the input record separator (RS) and the output record separator (ORS) to a space
{ for(i=1;i<=NF;i++) {if(length($i) >= 5 && $i~/^[Xx][Yy][a-zA-Z0-9]+$/) gsub(/./,"*", $i); print $i; } } - iterate over each field (with for(i=1;i<=NF;i++)); if the current field ($i) is 5 or more characters long (length($i) >= 5) and (&&) it matches xy in any letter case followed by 1 or more alphanumeric chars ($i~/^[Xx][Yy][a-zA-Z0-9]+$/), replace each char with * (with gsub(/./,"*", $i)); then print the current field value.
This might work for you (GNU sed):
sed -r ':a;/\bxy\S{5,}\b/I!b;s//\n&\n/;h;s/[^\n]/*/g;H;g;s/\n.*\n(.*)\n.*\n(.*)\n.*/\2\1/;ta' file
If the current line does not contain a string which begins with xy case insensitive and 5 or more following characters, then there is no work to be done.
Otherwise:
Surround the string by newlines
Copy the pattern space (PS) to the hold space (HS)
Replace all characters other than newlines with *'s
Append the PS to the HS
Replace the PS with the HS
Swap the strings between the newlines retaining the remainder of the first line
Repeat

How to format text using UNIX commands?

I'm trying to display, in a specific way, all the files in a directory that have the same contents. If a file is unique, it does not need to be displayed. Files that are identical to each other need to be displayed on the same line, separated by commas.
For example,
c176ada8afd5e7c6810816e9dd786c36 2group1
c176ada8afd5e7c6810816e9dd786c36 2group2
e5e6648a85171a4af39bbf878926bef3 4group1
e5e6648a85171a4af39bbf878926bef3 4group2
e5e6648a85171a4af39bbf878926bef3 4group3
e5e6648a85171a4af39bbf878926bef3 4group4
2d43383ddb23f30f955083a429a99452 unique
3925e798b16f51a6e37b714af0d09ceb unique2
should be displayed as,
2group1, 2group2
4group1, 4group2, 4group3, 4group4
I know which files are considered unique in a directory from using md5sum, but I do not know how to do the formatting part. I think the solution involves awk or sed, but I am not sure. Any suggestions?
Awk solution (for your current input):
awk '{ a[$1]=a[$1]? a[$1]", "$2:$2 }END{ for(i in a) if(a[i]~/,/) print a[i] }' file
a[$1]=a[$1]? a[$1]", "$2:$2 - accumulates the group names (from field $2) for each unique hash given by the 1st field $1. The array a is indexed by hash, and each value is the comma-separated list of group names.
for(i in a) - iterates over the array items
if(a[i]~/,/) print a[i] - if the hash is associated with more than one group (i.e. the value contains a comma), print the item
The output:
2group1, 2group2
4group1, 4group2, 4group3, 4group4
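Because for(i in a) visits keys in an unspecified order, the groups may come out in a different order than shown. A possible variant that prints groups in first-seen order is sketched below; the function name group_duplicates and the arrays count, names and order are illustrative, and it assumes file names contain no whitespace (only $2 is used):

```shell
group_duplicates() {
    awk '
        !($1 in count) { order[++n] = $1 }    # remember each hash the first time it appears
        { names[$1] = (count[$1]++ ? names[$1] ", " : "") $2 }
        END {
            for (k = 1; k <= n; k++)
                if (count[order[k]] > 1) print names[order[k]]
        }
    '
}

# Typical use, assuming md5sum is available:
#   md5sum * | group_duplicates
```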
Given the input you provided, you essentially want to collect all the second columns where the first column is the same. So the first step is to use awk to hash the second columns by the first. I leverage the solution posted here: Concatenate lines by first column by awk or sed
awk '{table[$1]=table[$1] $2 ",";} END {for (key in table) print key " => " table[key];}' file
c176ada8afd5e7c6810816e9dd786c36 => 2group1,2group2,
e5e6648a85171a4af39bbf878926bef3 => 4group1,4group2,4group3,4group4,
3925e798b16f51a6e37b714af0d09ceb => unique2,
2d43383ddb23f30f955083a429a99452 => unique,
And if you really want to filter out the unique ones, keep only the lines with more than two comma-separated fields (telling awk to use ',' as the separator; because of the trailing comma, a unique entry has exactly two fields):
awk '{table[$1]=table[$1] $2 ",";} END {for (key in table) print key " => " table[key];}' file | awk -F ',' 'NF > 2'
c176ada8afd5e7c6810816e9dd786c36 => 2group1,2group2,
e5e6648a85171a4af39bbf878926bef3 => 4group1,4group2,4group3,4group4,
perl:
perl -lane '
push @{$groups{$F[0]}}, $F[1]
} END {
for $g (keys %groups) {
print join ", ", @{$groups{$g}} if @{$groups{$g}} > 1
}
' file
The order of the output is indeterminate.
This might work for you (GNU sed):
sed -r 'H;x;s/((\S+)\s+\S+)((\n[^\n]+)*)\n\2\s+(\S+)/\1,\5\3/;x;$!d;x;s/.//;s/^\S+\s*//Mg;s/\n[^,]+$//Mg;s/,/, /g' file
Gather up all the lines of the file and use pattern matching to collapse the lines. At the end of the file, remove the keys and any unique lines and then print the remainder.

Split line into multiple lines of 42 Unix after last given char

I have a text file in unix formed from multiple long lines
ALTER Tit como(titel('42423432;434235111;757567562;2354679;5543534;6547673;32322332;54545453'))
ALTER Mit como(Alt('432322;434434211;754324237562;2354679;5543534;6547673;32322332;54545453'))
I need to split each line in multiple lines of no longer than 42 characters.
The split should be done at the end of last ";", and
so my ideal output file will be :
ALTER Tit como(titel('42423432;434235111; -
757567562;2354679;5543534;6547673; -
32322332;54545453'))
ALTER Mit como(Alt('432322;434434211; -
754324237562;2354679;5543534;6547673; -
32322332;54545453'))
I used fold -w 42 givenfile.txt | sed 's/ $/ -/g'
It splits the lines, but it doesn't add the "-" at the end of each line and doesn't split after the ";".
Any help is much appreciated.
Thanks !
awk -F';' '
w{
print""
}
{
w=length($1)
printf "%s",$1
for (i=2;i<=NF;i++){
if ((w+length($i)+1)<42){
w+=length($i)+1
printf";%s",$i
} else {
w=length($i)
printf"; -\n%s",$i
}
}
}
END{
print""
}
' file
This produces the output:
ALTER Tit como(titel('42423432;434235111; -
757567562;2354679;5543534;6547673; -
32322332;54545453'))
ALTER Mit como(Alt('432322;434434211; -
754324237562;2354679;5543534;6547673; -
32322332;54545453'))
How it works
Awk implicitly loops through each line of its input and each line is divided into fields. This code uses a single variable w to keep track of the current width of the output line.
-F';'
Tell awk to break fields on semicolons.
w{print""}
If the last line was not completed, w>0, then print a newline to terminate it before we start with a new line.
w=length($1); printf "%s",$1
Print the first field of the new line and set w according to its length.
Loop over the remaining fields:
for (i=2;i<=NF;i++){
if ((w+length($i)+1)<42){
w+=length($i)+1
printf";%s",$i
} else {
w=length($i)
printf"; -\n%s",$i
}
}
This loops over the second to final fields of this line. Whenever we reach the point where we can't print another field without exceeding the 42 character limit, we print ; -\n.
END{print""}
Print a newline at the end of the file.
This might work for you (GNU sed):
sed -r 's/.{1,42}$|.{1,41};/& -\n/g;s/...$//' file
This globally matches either 1 to 42 characters followed by the end of the line or 1 to 41 characters followed by a ;, and appends " -\n" (space, hyphen, newline) to each match. The last piece of each line then ends with three characters too many, so they are deleted.

pattern matching and delete all the lines except the last occurence

I have a text file with 100+ lines. I want to search for a pattern and delete all the matching lines except the last occurrence.
Here are the lines from the txt file.
My search patterns are "string1=", "string2=", "string3=", "string4=" and "string5=".
string1=hi
string2=hello
string3=welcome
string3=welcome1
string3=
string4=hi
string5=hello
I want to go through each line, keep the empty "string3=" line in the file, and remove "string3=welcome" and "string3=welcome1".
Please help me.
For a single pattern, you can start with something like this:
grep "string3" input | tail -1
#!/usr/bin/perl
my %h;
while (<STDIN>) {
my ($k, $v) = split /=/;
$h{$k} = $v;
}
foreach my $k ( sort keys %h ) {
print "$k=$h{$k}";
}
The perl script here takes your list on stdin and produces the output you describe. It assumes you want the keys (string*) in sorted order.
If you want only the lines whose keys are string1 to string5, you can put a match at the beginning of the while loop, like so:
next if ! /^string[1-5]=/;
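If the original line order should be kept rather than sorted, one possible alternative is a two-pass awk, sketched here under the assumption that the data is in a regular file that can be read twice. The function name keep_last_per_key is illustrative:

```shell
keep_last_per_key() {
    # $1 is the file to filter; awk reads it twice.
    # Pass 1 (NR == FNR) records the line number of the last occurrence of each key.
    # Pass 2 prints a line only if it is that last occurrence.
    awk -F= 'NR == FNR { last[$1] = FNR; next } last[$1] == FNR' "$1" "$1"
}

# Usage: keep_last_per_key input
```

With the sample lines from the question this keeps string1=hi, string2=hello, string3=, string4=hi and string5=hello, in that order. Lines without an = are treated as a key consisting of the whole line.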

ive been searching this to get a sense but i am still confused

I'm confused about the $ symbol in Unix.
According to the definition, it stands for the value stored in the variable that follows it. I'm not following the definition. Could you please give me an example of how it is used?
Thanks
You define a variable like this:
greeting=hello
export name=luc
and use like this:
echo $greeting $name
If you use export that means the variable will be visible to subshells.
EDIT: If you want to assign a string containing spaces, you have to quote it either using double quotes (") or single quotes ('). Variables inside double quotes will be expanded whereas in single quotes they won't:
axel@loro:~$ name=luc
axel@loro:~$ echo "hello $name"
hello luc
axel@loro:~$ echo 'hello $name'
hello $name
hello $name
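The effect of export can be seen by asking a child shell what it sees. This is only a small sketch reusing the variable names from above; the unset line is there just to start from a clean state:

```shell
unset greeting name      # start clean in case either is already set
greeting=hello           # plain shell variable: not passed to child processes
export name=luc          # exported: copied into the environment of child processes

# Single quotes keep the current shell from expanding the variables,
# so the child sh expands them itself.
sh -c 'echo "greeting=[$greeting] name=[$name]"'   # prints: greeting=[] name=[luc]
```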
In shell scripts, when you assign a value to a variable you do not need the $ symbol. You need it only when you want to access the value of that variable.
Examples:
VARIABLE=100000
echo "$VARIABLE"
# $(( )) does integer arithmetic; a plain $VARIABLE+10 would give the text 100000+10
othervariable=$((VARIABLE + 10))
echo "$othervariable"
Another thing: in an assignment, do not leave spaces before or after the = symbol.
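As a short sketch of what $ does in different positions (the names as_text and as_number are illustrative): a plain $VARIABLE is substituted as text, $(( )) evaluates integer arithmetic, and ${VARIABLE} marks where the variable name ends.

```shell
VARIABLE=100000
as_text=$VARIABLE+10            # plain substitution: the + is just another character
as_number=$((VARIABLE + 10))    # arithmetic expansion

echo "$as_text"        # prints: 100000+10
echo "$as_number"      # prints: 100010
echo "${VARIABLE}th"   # prints: 100000th
```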
Here is a good bash tutorial:
http://linuxconfig.org/Bash_scripting_Tutorial
mynameis.sh:
#!/bin/sh
finger | grep "`whoami` " | tail -n 1 | awk '{FS="\t";print $2,$3;}'
finger: prints all logged-in users. Example result:
login Name Tty Idle Login Time Office Office Phone
xuser Forname Nickname tty7 3:18 Mar 9 07:23 (:0)
...
grep: keeps only the lines containing the given string (in this example we need to keep the xuser line if our login name is xuser)
http://www.gnu.org/software/grep/manual/grep.html
whoami: prints my loginname
http://linux.about.com/library/cmd/blcmdl1_whoami.htm
tail -n 1 : shows only the last line of results
http://unixhelp.ed.ac.uk/CGI/man-cgi?tail
the awk script: prints the second and third column of the result: Forname, Nickname
http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_toc.html
