Delete text with delimiter in Unix

I have a text file in the format below. I need to remove the text between the first and second semicolon (the delimiter), but retain the second semicolon.
$ cat test.txt
abc;def;ghi;jkl
mno;pqr;stu,xxx
My expected output:
abc;ghi;jkl
mno;stu,xxx
I tried using sed 's/^([^;][^;]*);.*$/\1/', but it removes everything after the first semicolon. I also tried cut -d ';' -f2, but this only gives the 2nd field as output.

Using cut
cut -d";" -f2 --complement file
-d is for the delimiter, i.e. ";" in your case
-f is for the fields, i.e. keep the fields listed
--complement reverses the selection, i.e. removes the fields listed (this option is a GNU cut extension; it is not in POSIX cut)
So:
$ cat test.txt
abc;def;ghi;jkl
mno;pqr;stu;xxx
$ cut -d";" -f2 --complement test.txt
abc;ghi;jkl
mno;stu;xxx

You may use this sed:
sed 's/;[^;]*//' file
abc;ghi;jkl
mno;stu,xxx

You can do it directly by removing the 2nd occurrence of [^;]*; (a field followed by its semicolon), e.g.
sed 's/[^;]*;//2' test.txt
Example Use/Output
$ sed 's/[^;]*;//2' test.txt
abc;ghi;jkl
mno;stu,xxx
Thanks to EdMorton for improvements here as well.
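The numeric flag on the s command (standard POSIX sed) picks which match to replace, so the same pattern can delete a different field just by changing the number. A small sketch on made-up sample lines; note that it cannot delete the last field, since that field has no trailing ;.

```shell
# [^;]*; matches one field plus the semicolon that ends it; the numeric
# flag N replaces only the Nth such match on each line.
printf 'abc;def;ghi;jkl\n' | sed 's/[^;]*;//3'   # abc;def;jkl

# An empty 2nd field is still a match (just ";"), so //2 removes it too.
printf 'abc;;ghi;jkl\n' | sed 's/[^;]*;//2'      # abc;ghi;jkl
```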
If you did want to use awk, you could simply remove the 2nd field (together with the semicolon in front of it) with sub(), e.g.
awk -F';' '{sub(/;[^;]*/,"")}1' test.txt
(same output)
Thanks to EdMorton for the improvement to the original.
Or, as Cyrus suggests, use cut to delete field 2, e.g.
cut -d';' -f-1,3- test.txt
(same output)
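If the field to delete varies, the awk approach can be generalised into a small shell function. delete_field is a hypothetical helper written for illustration, not part of any answer above; it rebuilds each line from every field except field N, so unlike the sed //N trick it can also drop the last field.

```shell
# delete_field N: hypothetical helper that reads ';'-separated lines on
# stdin and prints them without field N.
delete_field() {
  awk -v n="$1" 'BEGIN { FS = OFS = ";" }
  {
    out = ""; sep = ""
    for (i = 1; i <= NF; i++) {
      if (i == n) continue      # skip the field being deleted
      out = out sep $i
      sep = OFS                 # separator only between kept fields
    }
    print out
  }'
}

printf 'abc;def;ghi;jkl\nmno;pqr;stu,xxx\n' | delete_field 2
# abc;ghi;jkl
# mno;stu,xxx
```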

Trying to fix the OP's attempt here: with sed you could try the following code. Simple explanation: the 1st back reference captures everything up to the 1st occurrence of ;, the text from the 1st ; to the 2nd ; is matched but not kept, and the rest of the line goes into the 2nd back reference. The substitution then joins the 1st and 2nd back references with a ;.
sed -E 's/^([^;]*);[^;]*;(.*)/\1;\2/' Input_file
Or, as per Ed's comment, try the following:
sed -E 's/^([^;]*);[^;]*/\1/' Input_file
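Without -E, sed uses basic regular expressions, where grouping is written \( \) and plain ( ) match literal parentheses. A sketch of the same fix in BRE syntax, for seds that lack -E:

```shell
# BRE version: \( \) captures field 1; ";[^;]*" then matches the first
# semicolon plus the 2nd field, which are dropped; the rest of the line
# (starting with the 2nd semicolon) is left untouched.
printf 'abc;def;ghi;jkl\nmno;pqr;stu,xxx\n' | sed 's/^\([^;]*\);[^;]*/\1/'
# abc;ghi;jkl
# mno;stu,xxx
```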

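Super lazy awk solution (works with gawk, mawk or mawk2):
awk 'sub(/;[^;]*/,"")' test.txt
sub() returns the number of replacements made, so it doubles as the pattern: lines without a ; are not printed.
A more verbose solution, but one that makes it clearer what it's doing:
awk 'BEGIN {FS=";+"; OFS=";"} ($2="")||($0=$0)&&($1=$1)' test.txt
$2="" clears the 2nd field; because the value assigned is the empty string, the expression is false, which is why the logical OR (||) is needed to continue.
$0=$0 re-splits the record on FS=";+", treating the doubled ; as one separator, and $1=$1 rebuilds it with OFS=";", removing the extra ;. As long as the first field is not empty or 0, the whole pattern is true and the line is printed.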
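To see what each assignment in the verbose one-liner does, the hypothetical trace_steps function below runs the same steps on one sample line and prints $0 after each of them:

```shell
# trace_steps: hypothetical demo function; same FS/OFS as the one-liner,
# but each assignment is followed by a print of the current record.
trace_steps() {
  awk 'BEGIN { FS = ";+"; OFS = ";" }
  {
    $2 = ""; print "after $2=\"\": " $0            # rebuilt with OFS, doubled ;
    $0 = $0; print "after $0=$0: " $0 " NF=" NF    # same text, re-split on ;+
    $1 = $1; print "after $1=$1: " $0              # rebuilt again with OFS
  }'
}

printf 'abc;def;ghi;jkl\n' | trace_steps
# after $2="": abc;;ghi;jkl
# after $0=$0: abc;;ghi;jkl NF=3
# after $1=$1: abc;ghi;jkl
```

Note that $0=$0 leaves the text unchanged; only NF drops from 4 to 3. It is the final $1=$1 that writes the cleaned-up line.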

Related

Using sed to replace symbol after semicolon

I'm trying to use the sed command to change a word after a semicolon, like so (fileGrades.txt):
Student;Grade;Comment;
Eric;1;None;
Smith;2;None;
Thomas;1;None;
Chad;3;Nice work;
Now the sed command should find Eric and Chad and change both of their grades to 2, but leave the rest untouched. I was thinking of doing it with the method below, but it didn't work, as it would not let me use the semicolon to know where to change the grade.
sed -i 's/Chad;*/Chad;2/g' fileGrades.txt
I also tried using wildcards such as *, ^ and ., but that didn't work either.
You can use
sed -E -i 's/(Eric|Chad);[0-9]*/\1;2/g' fileGrades.txt
Details:
-E - enables POSIX ERE (extended regular expressions)
-i - edits the input file in place
s/(Eric|Chad);[0-9]*/\1;2/ - matches and captures Eric or Chad into Group 1 (\1), then matches ; and zero or more digits, and replaces the match with the Group 1 value, ; and 2.
Demo:
#!/bin/bash
s='Student;Grade;Comment;
Eric;1;None;
Smith;2;None;
Thomas;1;None;
Chad;3;Nice work;'
sed -E 's/(Eric|Chad);[0-9]*/\1;2/g' <<< "$s"
Output:
Student;Grade;Comment;
Eric;2;None;
Smith;2;None;
Thomas;1;None;
Chad;2;Nice work;
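The pattern above is not anchored, so it also fires on a name that merely ends in Eric or Chad. Assuming the name is always the first column (as in the sample), a sketch with ^ avoids that; MacEric is a made-up name used only to show the difference. With the anchor there can be at most one match per line, so the g flag is no longer needed.

```shell
# ^ ties the match to the start of the line, so only a first column of
# exactly "Eric" or "Chad" (followed by ;) has its grade replaced.
printf 'Eric;1;None;\nMacEric;1;None;\nChad;3;Nice work;\n' |
  sed -E 's/^(Eric|Chad);[0-9]*/\1;2/'
# Eric;2;None;
# MacEric;1;None;
# Chad;2;Nice work;
```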
This is a tailor-made problem for awk; for your shown samples, use the following awk code:
awk 'BEGIN{FS=OFS=";"} FNR==1{print;next} $1=="Eric" || $1=="Chad"{$2=2} 1' Input_file
Once you are happy with the above code's results, try the following code to save the output into Input_file itself:
awk 'BEGIN{FS=OFS=";"} FNR==1{print;next} $1=="Eric" || $1=="Chad"{$2=2} 1' Input_file > temp && mv temp Input_file
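If the names or the grade change often, the same awk logic can take them as variables instead of hard-coding them. set_grade is a hypothetical helper for illustration; it assumes names never contain a comma, since the list is split on commas.

```shell
# set_grade GRADE NAMES: hypothetical helper that sets column 2 to GRADE
# for every row whose first column is one of the comma-separated NAMES.
set_grade() {
  awk -v grade="$1" -v names="$2" '
    BEGIN {
      FS = OFS = ";"
      n = split(names, list, ",")
      for (i = 1; i <= n; i++) want[list[i]] = 1   # set of names to update
    }
    FNR == 1 { print; next }       # keep the header line unchanged
    $1 in want { $2 = grade }      # exact match on the name column
    1                              # print every (possibly modified) line
  '
}

printf 'Student;Grade;Comment;\nEric;1;None;\nSmith;2;None;\nThomas;1;None;\nChad;3;Nice work;\n' |
  set_grade 2 'Eric,Chad'
```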

Sed/Awk: how to insert text in a series pattern in a line

I need to insert '\N' between every 2 consecutive commas in a line, like below:
"abc,,,,5,,,3.2,,"
to:
"abc,\N,\N,\N,5,\N,\N,3.2,\N,"
Also, the number of consecutive commas is not fixed; it may be 6, 7 or more, so I need a flexible way to handle it.
I didn't find a clear solution on Google.
You can just use the following sed command:
sed 's/,,/,\\N,/g;s/,,/,\\N,/g'
Demo:
$ echo 'abc,,,,5,,,3.2,,' | sed 's/,,/,\\N,/g;s/,,/,\\N,/g'
abc,\N,\N,\N,5,\N,\N,3.2,\N,
Explanations:
s/,,/,\\N,/g replaces ,, with ,\N, globally on the line. Because each match consumes both of its commas, one pass over a longer run only handles every other pair (,,,, becomes ,\N,,\N,), so a second pass over the pattern space is needed to be sure all the replacements took place: s/,,/,\\N,/g;s/,,/,\\N,/g.
Additional notes:
To address your doubts about whether this approach is flexible, I have prepared the following input file:
$ cat input_comma.txt
abc,,,,5,,,3.2,,
,,,,,,def,
1,,,,,,1.2
6commas,,,,,,
7commas,,,,,,,
As you can see, it does not matter how many successive commas are present in the input:
$ sed 's/,,/,\\N,/g;s/,,/,\\N,/g' input_comma.txt
abc,\N,\N,\N,5,\N,\N,3.2,\N,
,\N,\N,\N,\N,\N,def,
1,\N,\N,\N,\N,\N,1.2
6commas,\N,\N,\N,\N,\N,
7commas,\N,\N,\N,\N,\N,\N,
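Instead of a fixed number of passes, sed can also loop until nothing is left to replace, using a label and the t command (both POSIX). A minimal sketch; the separate -e arguments keep the label on its own line, which some non-GNU seds require.

```shell
# :a defines a label; "ta" branches back to it whenever the preceding
# s/// replaced anything, so the loop repeats until no ",," is left.
printf 'abc,,,,5,,,3.2,,\n7commas,,,,,,,\n' | sed -e ':a' -e 's/,,/,\\N,/g' -e 'ta'
# abc,\N,\N,\N,5,\N,\N,3.2,\N,
# 7commas,\N,\N,\N,\N,\N,\N,
```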
With awk, the same two-pass approach can be implemented:
$ echo "test,,,mmm,,,,aa,," | awk '{gsub(/,,/,",\\N,");gsub(/,,/,",\\N,")} 1'
test,\N,\N,mmm,\N,\N,\N,aa,\N,
Could you please try the following:
awk '{gsub(/,,/,",\\N,");gsub(/,,/,",\\N,")} 1' Input_file
With perl:
perl -pe '1 while s/,,/,\\N,/g'

Replace characters in a delimited part of a file

I have the file teste.txt with the following content:
02183101399205000 GBTD9VBYMBQ 04455927964
02183101409310000 XBQMPL1C93B 27699484827
54183101003651000 1WFG3SNVDG9 71530894204
I executed the command:
sed -e 's/^\(.\{18\}\)[0-9]/\1#/g' teste.txt
The result is:
02183101399205000 GBTD9VBYMBQ 04455927964
02183101409310000 XBQMPL1C93B 27699484827
54183101003651000 #WFG3SNVDG9 71530894204
Only the 19th position in line 3 is changed from 1 to #.
I would like to know how I can change all numeric characters from the 19th to the 30th position.
The expected result is:
02183101399205000 GBTD#VBYMBQ 04455927964
02183101409310000 XBQMPL#C##B 27699484827
54183101003651000 #WFG#SNVDG# 71530894204
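An awk command to accomplish your goal (it assumes the characters to change all fall in the second space-separated field, as in the sample):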
awk '{ gsub(/[0-9]/,"#",$2); print }' teste.txt
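The awk answer relies on the 19th to 30th characters being exactly the second space-separated field. If the columns are fixed-width but not guaranteed to line up with fields, a position-based sketch with substr() works on the character range directly; mask_digits is a hypothetical helper name.

```shell
# mask_digits FROM TO: hypothetical helper that replaces every digit in
# character positions FROM..TO (1-based, inclusive) with '#'.
mask_digits() {
  awk -v from="$1" -v to="$2" '{
    head = substr($0, 1, from - 1)            # characters before FROM
    mid  = substr($0, from, to - from + 1)    # the FROM..TO slice
    tail = substr($0, to + 1)                 # characters after TO
    gsub(/[0-9]/, "#", mid)                   # mask digits in the slice only
    print head mid tail
  }'
}

printf '02183101399205000 GBTD9VBYMBQ 04455927964\n02183101409310000 XBQMPL1C93B 27699484827\n54183101003651000 1WFG3SNVDG9 71530894204\n' |
  mask_digits 19 30
```

On the sample this produces the expected result from the question; with a real file, replace the printf with mask_digits 19 30 < teste.txt.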
This might work for you (GNU sed):
sed -r 's/./&\n/30;s//\n&/19;h;s/[0-9]/#/g;H;x;s/\n.*\n(.*)\n.*\n(.*)\n.*/\2\1/' file
Surround the string from the 19th to the 30th character with newlines and make a copy. Replace all digits with #. Append this string to the original and use pattern matching to rearrange the strings into a new string with the unchanged parts on either side of the changed part, discarding the introduced newlines at the same time.
An alternative method, utilising the fact that the fields are space separated:
sed -r ':a;s/( \S*)[0-9](\S* )/\1#\2/;ta' file
In fact the two methods can be combined:
sed -r 's/./&\n/30;s//\n&/19;:a;s/(\n.*)[0-9](.*\n)/\1#\2/;ta;s/\n//g' file

How to read nth line and mth field of text file in unix

Suppose I have a |-delimited file:
Line1: 1|2|3|4
Line2: 5|6|7|8
Line3: 9|9|1|0
Now I need to read the 3rd field of the 2nd line, which is 7 in the above example. How can I do that using cut or sed? I'm new to Unix, please help.
A job for awk:
awk -F '|' 'NR==2{print $3}' file
or
awk -F '|' -v row=2 -v col=3 'NR==row{print $col}' file
Output:
7
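Wrapped as a small reusable function (get_field is a hypothetical name), adding exit makes awk stop reading as soon as the requested line has been printed, which can matter on large files. The sample data assumes the Line1: labels in the question are not part of the file.

```shell
# get_field ROW COL: hypothetical helper printing field COL of line ROW
# from '|'-delimited input; exit stops reading once that row is printed.
get_field() {
  awk -F '|' -v row="$1" -v col="$2" 'NR == row { print $col; exit }'
}

printf '1|2|3|4\n5|6|7|8\n9|9|1|0\n' | get_field 2 3   # 7
```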
This should work:
sed -n '2p' file |awk -F '|' '{print $3}'
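Since the question asks specifically for cut or sed, the awk half of that pipeline can also be replaced by cut; a sketch on the same sample data:

```shell
# sed -n '2p' prints only line 2; cut then keeps the 3rd '|'-separated field.
printf '1|2|3|4\n5|6|7|8\n9|9|1|0\n' | sed -n '2p' | cut -d '|' -f 3   # 7
```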
This might work for you (GNU sed):
sed -rn '2s/^(([^|]*)\|?){3}.*/\2/p' file
Turn off automatic printing with the -n option and enable extended regular expressions (easier to write) with the -r option. Use pattern matching and back references to replace the whole of the second line with the third field of that line, and print the result.
The address of the substitution command is limited to only the second line.
The regexp repeats a group of non-delimiter characters followed by an optional delimiter a specific number of times ({3}). The inner (second) group retains only the non-delimiter characters of each repetition. Each repetition replaces the previous one, so the back reference reports the last one; the .* consumes the remainder of the line, so only the third field (the contents of the second group) is printed.
N.B. the delimiter is not present after the final column and is therefore optional: \|?
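Without GNU -r, the same idea can be sketched in POSIX basic regular expressions, using \{2\} to skip the first two fields (each followed by its |) and a second group to capture the third. In BRE a plain | is an ordinary character.

```shell
# \([^|]*|\)\{2\} skips two fields plus their delimiters, \([^|]*\)
# captures the 3rd field, and .* discards the rest of the line.
printf '1|2|3|4\n5|6|7|8\n9|9|1|0\n' | sed -n '2s/^\([^|]*|\)\{2\}\([^|]*\).*/\2/p'   # 7
```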

Swap columns in a dictionary file

I need to change Finnish-Czech dictionary into the Czech-Finnish dictionary.
I tried this command:
sed -ne 's/\([^a-z A-Z]*\) \(.*\)$/\2 \1/ p' finnish-czech.txt
But the first back-reference doesn't work. I realized that the end of the back-reference is wrong: instead of taking only the first column, it takes everything.
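The separator is <TAB>:
sed -rn 's/^([^\t]*)\t([^\t]*)$/\2\t\1/p' finnish-czech.txt
Match the Finnish field (^([^\t]*)), then a TAB (\t), then the Czech field (([^\t]*)$). The -n option is needed together with the p flag; without it every swapped line is printed twice.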
This is a simple job for awk:
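awk -F'\t' '{ print $2 "\t" $1; }' <finnish-czech.txt
For each line, this prints the second tab-separated field, then a tab, then the first field. Setting the field separator to a tab keeps entries that contain spaces intact.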
One possible complication is that your file seems to have carriage returns before the newlines; you will probably want to remove them with tr -d '\r' or similar.
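Combining both points above, a sketch that strips the carriage returns with tr and swaps the columns with a tab as the awk field separator, so entries containing spaces stay intact. swap_columns is a hypothetical helper and the sample entries are made up.

```shell
# swap_columns: hypothetical helper; deletes every \r, then prints the
# two tab-separated columns in reverse order, still separated by a tab.
swap_columns() {
  tr -d '\r' | awk 'BEGIN { FS = OFS = "\t" } { print $2, $1 }'
}

printf 'talo\tdum\r\nhyvaa paivaa\tdobry den\r\n' | swap_columns
```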
