Unix: multi and single character delimiter in cut or awk commands - unix

This is the string I have:
my_file1.txt-myfile2.txt_my_file3.txt
I want to remove all the characters after the first "_" that follows the first ".txt".
From the above example, I want the output to be my_file1.txt-myfile2.txt. I have to search for the first occurrence of ".txt", continue parsing until I find the underscore character, and remove everything from there on.
Is it possible to do this with sed, awk, cut, or similar commands?

You can't do this job with cut but you can with sed and awk:
$ sed 's/\.txt/\n/g; s/\([^\n]*\n[^_]*\)_.*/\1/; s/\n/.txt/g' file
my_file1.txt-myfile2.txt
$ awk 'match($0,/\.txt[^_]*_/){print substr($0,1,RSTART+RLENGTH-2)}' file
my_file1.txt-myfile2.txt

Please try the following, written based on the samples shown.
awk '{sub(/\.txt_.*/,".txt")} 1' Input_file
This simply replaces everything from .txt_ to the end of the line with .txt, then prints the line.
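The rule stated in the question (find the first ".txt", then cut at the first "_" after it) can also be followed literally with awk's index() and substr(). This is only a sketch: the function name trim_after_txt is hypothetical, and lines without a matching ".txt" or "_" are assumed to pass through unchanged.

```shell
# Hypothetical helper: cut each input line at the first "_" that
# follows the first ".txt"; other lines are printed unchanged.
trim_after_txt() {
  awk '{
    p = index($0, ".txt")            # position of the first ".txt" (0 if absent)
    if (p) {
      rest = substr($0, p + 4)       # text after that ".txt"
      u = index(rest, "_")           # first underscore in the remainder
      if (u) $0 = substr($0, 1, p + 3 + u - 1)
    }
    print
  }'
}

printf '%s\n' 'my_file1.txt-myfile2.txt_my_file3.txt' | trim_after_txt
```

Unlike the sub() one-liner, this does not require the underscore to come immediately after a ".txt".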

Related

delete text with delimiter in unix

I have a text file in the format below. I need to remove the text between the first and second semicolons (the delimiter), but retain the second semicolon.
$ cat test.txt
abc;def;ghi;jkl
mno;pqr;stu,xxx
My expected output
abc;ghi;jkl
mno;stu,xxx
I tried using sed 's/^([^;][^;]*);.*$/\1/', but it removes everything after the first semicolon. I also tried cut -d ';' -f2, but this only gives the 2nd field as output.
Using cut
cut -d";" -f2 --complement file
-d is for the delimiter, i.e. ";" in your case
-f is for fields, i.e. keep the fields listed
--complement reverses the selection, i.e. removes the fields listed
So:
$ cat test.txt
abc;def;ghi;jkl
mno;pqr;stu;xxx
$ cut -d";" -f2 --complement test.txt
abc;ghi;jkl
mno;stu;xxx
You may use this sed:
sed 's/;[^;]*//' file
abc;ghi;jkl
mno;stu,xxx
You can do it directly by simply removing the 2nd occurrence of the characters in question, e.g.
sed 's/[^;]*;//2' test.txt
Example Use/Output
$ sed 's/[^;]*;//2' test.txt
abc;ghi;jkl
mno;stu,xxx
Thanks to @EdMorton for improvements here as well.
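To see what the numeric flag on the s command does, the same substitution can be run with different occurrence numbers. This is a small sketch; the helper name drop_nth_chunk is made up for illustration.

```shell
# Hypothetical helper: delete the Nth "text;" chunk from each input line.
# The number after the final slash tells sed which match to replace.
drop_nth_chunk() {
  sed "s/[^;]*;//$1"
}

for n in 1 2 3; do
  printf 'abc;def;ghi;jkl\n' | drop_nth_chunk "$n"
done
# prints:
# def;ghi;jkl
# abc;ghi;jkl
# abc;def;jkl
```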
If you did want to use awk, you could simply replace the 2nd field with nothing as well, e.g.
awk -F';' '{sub(/;[^;]*/,"")}1' test.txt
(same output)
With thanks to @EdMorton for the improvement to the original.
Or, as Cyrus suggests, with cut, deleting field 2, e.g.
cut -d';' -f-1,3- test.txt
(same output)
To fix the OP's attempt with sed, you could try the following. The first back reference captures everything up to the first ;. The text from the first ; to the second ; is matched but not captured, and the rest of the line is captured in the second back reference. The substitution then writes back the first and second back references, joined by a ;.
sed -E 's/^([^;]*);[^;]*;(.*)/\1;\2/' Input_file
Or, as per Ed's comment, try the following:
sed -E 's/^([^;]*);[^;]*/\1/' Input_file
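The two sed variants above agree on the sample data but differ on a line that has only one semicolon. A short sketch of the difference follows; the function names strict and loose are hypothetical labels, not part of the answers.

```shell
# "strict" needs a second ";" to match; "loose" does not.
strict() { sed -E 's/^([^;]*);[^;]*;(.*)/\1;\2/'; }
loose()  { sed -E 's/^([^;]*);[^;]*/\1/'; }

printf 'abc;def;ghi\na;b\n' | strict
# abc;ghi
# a;b      (no second ";", so the line is left alone)

printf 'abc;def;ghi\na;b\n' | loose
# abc;ghi
# a        (the second field is removed even when it is the last one)
```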
A super lazy awk solution (works with gawk, mawk, or mawk2):
awk 'sub(/;[^;]+/,"")' test.txt
A more verbose solution that makes it clearer what it's doing:
awk 'BEGIN {FS=";+"; OFS=";"} ($2="")||($0=$0)&&($1=$1)' test.txt
($2="") clears the 2nd field, but because the value assigned is the null string, the expression evaluates to 0 (false), so a logical OR (||) is needed to continue.
$0=$0 re-splits the record and $1=$1 rebuilds it, which cleans up the extra ;. Since the combined result is true, the line is also printed.
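If the truthiness tricks above feel too terse, the same effect can be spelled out with an explicit loop over the fields. This is a sketch only; drop_field2 is a hypothetical name, and it assumes a single ; as the separator.

```shell
# Hypothetical helper: print every field except the 2nd, joined by ";".
drop_field2() {
  awk -F';' -v OFS=';' '{
    out = $1                                   # always keep field 1
    for (i = 3; i <= NF; i++) out = out OFS $i # append fields 3..NF
    print out
  }'
}

printf 'abc;def;ghi;jkl\nmno;pqr;stu,xxx\n' | drop_field2
# abc;ghi;jkl
# mno;stu,xxx
```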

Replace characters in a delimited part of a file

I have the file teste.txt with the following content:
02183101399205000 GBTD9VBYMBQ 04455927964
02183101409310000 XBQMPL1C93B 27699484827
54183101003651000 1WFG3SNVDG9 71530894204
I execute the command
sed -e 's/^\(.\{18\}\)[0-9]/\1#/g' teste.txt
The result is:
02183101399205000 GBTD9VBYMBQ 04455927964
02183101409310000 XBQMPL1C93B 27699484827
54183101003651000 #WFG3SNVDG9 71530894204
Only the 19th position in line 3 is changed from 1 to #.
I would like to know how I can change all numeric characters from the 19th to the 30th position.
The expected result is:
02183101399205000 GBTD#VBYMBQ 04455927964
02183101409310000 XBQMPL#C##B 27699484827
54183101003651000 #WFG#SNVDG# 71530894204
An awk command to accomplish your goal:
awk '{ gsub(/[0-9]/,"#",$2); print }' teste.txt
This might work for you (GNU sed):
sed -r 's/./&\n/30;s//\n&/19;h;s/[0-9]/#/g;H;x;s/\n.*\n(.*)\n.*\n(.*)\n.*/\2\1/' file
Surround the string, which is from the 19th to the 30th character, by newlines and make a copy. Replace all digits by #'s. Append this string to the original and use pattern matching to rearrange the strings to make a new string with the unchanged parts either side of the changed part, at the same time discarding the introduced newlines.
An alternative method, utilising the fact that the fields are space separated:
sed -r ':a;s/( \S*)[0-9](\S* )/\1#\2/;ta' file
In fact the two methods can be combined:
sed -r 's/./&\n/30;s//\n&/19;:a;s/(\n.*)[0-9](.*\n)/\1#\2/;ta;s/\n//g' file
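A position-based variant can also be written in awk with substr(), which avoids the newline markers. The sketch below hard-codes columns 19 to 30 from the question; the function name mask_digits is hypothetical.

```shell
# Hypothetical helper: replace digits with "#" in columns 19-30 only.
mask_digits() {
  awk '{
    head = substr($0, 1, 18)     # columns 1-18, left untouched
    mid  = substr($0, 19, 12)    # columns 19-30, the part to mask
    tail = substr($0, 31)        # column 31 onwards, left untouched
    gsub(/[0-9]/, "#", mid)
    print head mid tail
  }'
}

printf '%s\n' '54183101003651000 1WFG3SNVDG9 71530894204' | mask_digits
# 54183101003651000 #WFG#SNVDG# 71530894204
```

Because it works on character positions, this does not depend on the fields being separated by single spaces.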

Join lines depending on the line beginning

I have a file that, occasionally, has split lines. The split is signaled by the fact that the line starts with a space, empty line or a nonnumeric character. E.g.
40403813|7|Failed|No such file or directory|1
40403816|7|Hi,
The Conversion System could not be reached.|No such file or directory||1
40403818|7|Failed|No such file or directory|1
...
I'd like to join the split line back with the previous line (as shown below):
40403813|7|Failed|No such file or directory|1
40403816|7|Hi, The Conversion System could not be reached.|No such file or directory||1
40403818|7|Failed|No such file or directory|1
...
using a Unix command like sed/awk. I'm not clear on how to join a line with the preceding one.
Any suggestion?
awk to the rescue!
awk -v ORS='' 'NR>1 && /^[0-9]/{print "\n"} NF' file
This prints a newline only when the current line starts with a digit; otherwise it appends the line to the previous one (you may want to add a space to ORS if the line break didn't preserve the space).
Don't do anything based on the values of the strings in your fields as that could go wrong. You COULD get a wrapping line that starts with a digit, for example. Instead just print after every complete record of 5 fields:
$ awk -F'|' '{rec=rec $0; nf+=NF} nf>=5{print rec; nf=0; rec=""}' file
40403813|7|Failed|No such file or directory|1
40403816|7|Hi, The Conversion System could not be reached.|No such file or directory||1
40403818|7|Failed|No such file or directory|1
Try:
awk 'NF{printf("%s",$0 ~ /^[0-9]/ && NR>1?RS $0:$0)} END{print ""}' Input_file
OR
awk 'NF{printf("%s",/^[0-9]/ && NR>1?RS $0:$0)} END{print ""}' Input_file
This checks whether each line starts with a digit. If it does and it is not the first line, a newline (RS) is printed before it; otherwise the line is simply appended to the output. The END block prints a final newline, which would otherwise be missing at the end of the output.
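The same idea can be written with an explicit buffer, which also makes it easy to insert a space at each join. This is a sketch under the assumption that every real record starts with a digit (so the caveat about wrapped lines starting with a digit still applies); join_split_lines is a hypothetical name.

```shell
# Hypothetical helper: glue continuation lines onto the previous record.
join_split_lines() {
  awk '
    /^[0-9]/ {                   # a digit at line start begins a new record
      if (buf != "") print buf   # flush the record collected so far
      buf = $0
      next
    }
    NF {                         # non-empty continuation line
      sub(/^[ \t]+/, "")         # drop leading blanks before joining
      buf = (buf == "" ? $0 : buf " " $0)
    }
    END { if (buf != "") print buf }
  '
}

printf '%s\n' '40403816|7|Hi,' 'The Conversion System could not be reached.|No such file or directory||1' | join_split_lines
# 40403816|7|Hi, The Conversion System could not be reached.|No such file or directory||1
```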
If you only ever have the line split into two, you can use this sed command:
sed 'N;s/\n\([^[:digit:]]\)/\1/;P;D' infile
This appends the next line to the pattern space, checks if the linebreak is followed by something other than a digit, and if so, removes the linebreak, prints the pattern space up to the first linebreak, then deletes the printed part.
If a single line can be broken across more than two lines, we have to loop over the substitution:
sed ':a;N;s/\n\([^[:digit:]]\)/\1/;ta;P;D' infile
This branches from ta to :a if a substitution took place.
To use with Mac OS sed, the label and branching command must be separate from the rest of the command:
sed -e ':a' -e 'N;s/\n\([^[:digit:]]\)/\1/;ta' -e 'P;D' infile
If the continuation lines always begin with a single space:
perl -0000 -lape 's/\n / /g' input
If the continuation lines can begin with an arbitrary amount of whitespace:
perl -0000 -lape 's/\n(\s+)/$1/g' input
It is probably more idiomatic to write:
perl -0777 -ape 's/\n / /g' input
You can use sed this way when the file contains no \r characters:
tr "\n" "\r" < inputfile | sed 's/\r\([^0-9]\)/\1/g' | tr '\r' '\n'

sed - find text between 2 strings and use it for replace

I have a file with many lines like below:
townValue.put("Aachen");
townValue.put("Aalen");
townValue.put("Ahlen");
townValue.put("Arnsberg");
townValue.put("Aschaffenburg");
townValue.put("Augsburg");
I want to change these lines to:
townValue.put("Aalen", "Aalen");
townValue.put("Ahlen", "Ahlen");
townValue.put("Arnsberg", "Arnsberg");
townValue.put("Aschaffenburg", "Aschaffenburg");
townValue.put("Augsburg", "Augsburg");
How can I achieve this with sed or awk? This seems to be a special find & replace task that I couldn't find on the net yet.
Thanks for the help
Use sed -e 's/"[^"]*"/&, &/':
$ cat 1
townValue.put("Aachen");
townValue.put("Aalen");
townValue.put("Ahlen");
townValue.put("Arnsberg");
townValue.put("Aschaffenburg");
townValue.put("Augsburg");
$ sed -e 's/"[^"]*"/&, &/' 1
townValue.put("Aachen", "Aachen");
townValue.put("Aalen", "Aalen");
townValue.put("Ahlen", "Ahlen");
townValue.put("Arnsberg", "Arnsberg");
townValue.put("Aschaffenburg", "Aschaffenburg");
townValue.put("Augsburg", "Augsburg");
According to sed(1):
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
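The man page excerpt mentions both & and the \1 to \9 escapes. As a small illustration, the same duplication can be written either way; the function names below are hypothetical.

```shell
# "&" repeats the whole match (quotes included);
# "\1" repeats only the captured group, so the quotes are written out by hand.
dup_with_amp()     { sed 's/"[^"]*"/&, &/'; }
dup_with_backref() { sed 's/"\([^"]*\)"/"\1", "\1"/'; }

printf '%s\n' 'townValue.put("Aachen");' | dup_with_amp
printf '%s\n' 'townValue.put("Aachen");' | dup_with_backref
# both print: townValue.put("Aachen", "Aachen");
```

The back-reference form is handy when the replacement should not contain everything that was matched, for example when the second copy needs different quoting.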
Code for awk. Because of the large number of quotes in the command line, I recommend using a script:
awk -f script file
script
BEGIN {FS=OFS="\""}
$3=", \""$2"\""$3
$ cat file
townValue.put("Aachen");
townValue.put("Aalen");
townValue.put("Ahlen");
townValue.put("Arnsberg");
townValue.put("Aschaffenburg");
townValue.put("Augsburg");
$ awk -f script file
townValue.put("Aachen", "Aachen");
townValue.put("Aalen", "Aalen");
townValue.put("Ahlen", "Ahlen");
townValue.put("Arnsberg", "Arnsberg");
townValue.put("Aschaffenburg", "Aschaffenburg");
townValue.put("Augsburg", "Augsburg");

Replacing the last column in a unix file with another value

I would like to replace the last column values with another value using the vim editor or the sed command. I tried the command below, but it sometimes replaces data that is already present in my 2nd column.
:%s/,3/,2/
My sample data:
410339,166,1430,3
410340,112,1840,3
410341,109,1315,3
410342,123,1435,3
410343,230,3200,3
410344,857,36975,3
410345,125,4440,3
410346,105,1460,3
410348,122,1150,3
410349,314,2380,3
410350,132,4650,3
410351,136,7465,3
410352,103,1775,3
410353,101,1095,3
410354,101,1360,3
You'll need to indicate that you want the end-of-line:
:%s/,3$/,2/
To complete your attempt in Vim, just anchor the matched expression to the end of the line with $.
:%s/,3$/,2/
Produces:
410339,166,1430,2
410340,112,1840,2
410341,109,1315,2
410342,123,1435,2
410343,230,3200,2
410344,857,36975,2
410345,125,4440,2
410346,105,1460,2
410348,122,1150,2
410349,314,2380,2
410350,132,4650,2
410351,136,7465,2
410352,103,1775,2
410353,101,1095,2
410354,101,1360,2
I would use awk:
awk '{$NF=2}1' FS=, OFS=,
This unconditionally makes the last column have the value 2. If the desired replacement is a string, you will need to use quotes or the more flexible:
awk '{$NF=r}1' FS=, OFS=, r="the string to put in the last column"
You can restrict the replacement to those lines whose last column has the value 3 with:
awk '$NF==3{$NF=r}1' FS=, OFS=, r="the string to put in the last column"
And, to do this in vim, just do:
:%! awk ...
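As a quick check of the restricted awk form, the sketch below feeds it two made-up lines: one ends in 3, the other has a 3 in the second column and ends in 13. The wrapper name set_last_if and the second data line are assumptions for illustration.

```shell
# Hypothetical helper: set the last comma-separated field to $2,
# but only on lines where it currently equals $1.
set_last_if() {
  awk -F, -v OFS=, -v old="$1" -v new="$2" '$NF == old { $NF = new } 1'
}

printf '410339,166,1430,3\n410340,3,1840,13\n' | set_last_if 3 2
# 410339,166,1430,2
# 410340,3,1840,13
```

Only the last field is compared, so the 3 in the second column and the trailing 13 are left as they are.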
With GNU sed:
If the last column is only ever '3':
sed -i 's/3$/2/' file
If the last column could be something like '13':
sed -i 's/,3$/,2/' file
The -i flag:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied)
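A minimal sketch of the backup behaviour described above, run in a throwaway directory. The file name data.csv and the .bak suffix are arbitrary choices, and the second data line is made up to show that ",13" is not touched.

```shell
# Work in a temporary directory so nothing real is modified.
workdir=$(mktemp -d)
printf '410339,166,1430,3\n410340,3,1840,13\n' > "$workdir/data.csv"

# Edit in place; the original is kept as data.csv.bak.
sed -i.bak 's/,3$/,2/' "$workdir/data.csv"

cat "$workdir/data.csv"       # edited file
cat "$workdir/data.csv.bak"   # untouched copy of the original
```

Writing the suffix directly after -i (with no space) keeps a copy to fall back on if the expression turns out to be wrong.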
