sed doesn't recognize special characters - unix

I have a big ttl file and I get some errors when I load it.
There are some special characters in the text that generate errors:
<https://permid.org/1-5037622197>
a tr-org:Organization ;
mdaas:HeadquartersAddress "Germany\n"^^xsd:string ;
tr-common:hasPermId "5037622197"^^xsd:string;
tr-org:hasActivityStatus tr-org:statusActive ;
fibo-be-le-cb:isDomiciledIn <http://sws.geonames.org/2921044/> ;
vcard:organization-name "ARWOBAU Immobilien und Beteiligungs GmbH"^^xsd:string .
I want to get rid of all the ^^xsd:string occurrences. I used the sed command below, but it does not seem to work:
sed -i -e 's/^^xsd:string//g' test.txt
test.txt is the example above
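A likely cause (a hedged sketch, assuming GNU sed): the caret is a regular-expression anchor, so the pattern ^^xsd:string only ever matches at the very start of a line and never finds the mid-line text. Escaping the carets makes sed treat them as literal characters:
# escape the carets so they match literal ^ characters instead of anchoring the pattern
sed -i -e 's/\^\^xsd:string//g' test.txt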

Related

How to print \n as a text without creating newline within a quote in R cat

I want to print the following lines to a text file using the cat function in R:
shebang line
for file in *.out; do sed $'s/Cluster/\\\n&/g' $file > "$(basename "$file" .out)_split.out2"; done
To do this I use:
cat("shebang line","for file in *.out; do sed $'s/Cluster/\\\n&/g' $file > \"$(basename \"$file\" .out)_split.out2\"; done",file="output.txt",sep="\n",append=TRUE)
But in output.txt, I get:
shebang line
for file in *.out; do sed $'s/Cluster/\
&/g' $file > "$(basename "$file" .out)_split.out2"; done
It seems to create an extra newline and doesn't print the \n within the single quotes.
Considering that I do need to use \n as a separator, how can I print the character as is without creating a newline?
Thanks
Try more escaping: in an R string literal, every backslash that should appear in the output must be written as \\, so the literal text \\\n has to be written as \\\\\\n in the R string:
cat("shebang line","for file in *.out; do sed $'s/Cluster/\\\\\\n&/g' $file > \"$(basename \"$file\" .out)_split.out2\"; done",file="output.txt",sep="\n",append=TRUE)

Sed command garbled on Solaris [duplicate]

I have this data in file.txt:
1234-abca-dgdsf-kds-2;abc dfsfds 2
123-abcdegfs-sdsd;dsfdsf dfd f
12523-cvjbsvndv-dvd-dvdv;dsfdsfpage
I want to replace the string after "-" and up to ";" with just ";", so that I get:
1234;abc dfsfds 2
123;dsfdsf dfd f
12523;dsfdsfpage
I tried with the command:
sed -e "s/-.*;/;" file.txt
But it gives me the following error:
sed command garbled
Why is this happening?
sed substitution commands are defined as follows:
's/REGEXP/REPLACEMENT/[FLAGS]'
(substitute) Match the regular-expression against the content of the pattern space. If found, replace matched string with REPLACEMENT.
However, you are saying:
sed "s/-.*;/;"
That is:
sed "s/REGEXP/REPLACEMENT"
and is hence missing the closing "/" at the end of the expression. Just add it to get:
sed "s/-.*;/;/"
# ^
You are missing a slash at the end of the sed command:
It should be "s/-.*;/;/"
In -.*; the * is greedy, so this would fail if there is more than one ; on the line:
echo "12523-cvjbsvndv-dvd-dvdv;dsfdsfpage;test" | sed -e "s/-.*;/;/"
12523;test
Change it to -[^;]*:
echo "12523-cvjbsvndv-dvd-dvdv;dsfdsfpage;test" | sed -e "s/-[^;]*;/;/"
12523;dsfdsfpage;test
This should work:
sed 's/-.*;/;/g' file > newFile

Split line with multiple delimiters in Unix

I have the below lines in a file
id=1234,name=abcd,age=76
id=4323,name=asdasd,age=43
except that the real file has many more tag=value fields on each line.
I want the final output to be like
id,name,age
1234,abcd,76
4323,asdasd,43
I want all the values to the left of each = to come out as a comma-separated header row, and the values to the right of each = to follow below as comma-separated rows, one per input line.
Is there a way to do this with awk or sed, or is a for loop required?
I am working on Solaris 10; the local sed is not GNU sed (so there is no -r option, nor -E).
$ cat tst.awk
BEGIN { FS="[,=]"; OFS="," }
NR==1 {
    for (i=1; i<NF; i+=2) {
        printf "%s%s", $i, (i<(NF-1) ? OFS : ORS)
    }
}
{
    for (i=2; i<=NF; i+=2) {
        printf "%s%s", $i, (i<NF ? OFS : ORS)
    }
}
$ awk -f tst.awk file
id,name,age
1234,abcd,76
4323,asdasd,43
Assuming they don't really exist in your input, I removed the ...s etc. that were cluttering up your example before running the above. If that stuff really does exist in your input, clarify how you want the text "(n number of fields)" to be identified and removed (string match? position on line? something else?).
EDIT: since you like the brevity of the cat|head|sed; cat|sed approach posted in another answer, here's the equivalent in awk:
$ awk 'NR==1{h=$0;gsub(/=[^,]+/,"",h);print h} {gsub(/[^,]+=/,"")} 1' file
id,name,age
1234,abcd,76
4323,asdasd,43
FILE=yourfile.txt
# first line (header)
cat "$FILE" | head -n 1 | sed -r "s/=[^,]+//g"
# other lines (data)
cat "$FILE" | sed -r "s/[^,]+=//g"
sed -r '1 s/^/id,name,age\n/;s/id=|name=|age=//g' my_file
edit: or use
sed '1 s/^/id,name,age\n/;s/id=\|name=\|age=//g'
output
id,name,age
1234,abcd,76 ...(n number of fields)
4323,asdasd,43...
The following simply combines the best of the sed-based answers so far, showing you can have your cake and eat it too. If your sed does not support the -r option, chances are that -E will do the trick; all else failing, one can replace R+ by RR* where R is [^,]
sed -r '1s/=[^,]+//g; s/[^,]+=//g'
(That is, the portable incantation would be:
sed "1s/=[^,][^,]*//g; s/[^,][^,]*=//g"
)

add information to a text file

I have a text file and I would like to insert some more text in the middle of it.
I have got this input text:
[etc....]
relay_recipient_maps =
btree:/opt/pmx69/postfix/etc/vdm_valid_users_yoda,
btree:/opt/pmx69/postfix/etc/vdm_valid_users_luke,
btree:/opt/pmx69/postfix/etc/vdm_valid_users_rd2d2,
btree:/opt/pmx69/postfix/etc/vdm_valid_users_c3p0,
btree:/opt/pmx69/postfix/etc/vdm_valid_users_dark_vador
[etc...]
I want to insert an anakin entry after the "dark_vador" line (the last line of the block), but I have no idea how to do it.
Here's one way using GNU sed. Run like:
sed -f script.sed file.txt
Contents of script.sed:
/relay_recipient_maps =/,/[^,]$/ {
    /relay_recipient_maps =/ n
    /[^,]$/ {
        s/[ \t]*$/,/
        a\ btree:/opt/pmx69/postfix/etc/vdm_valid_users_anakin
    }
}
or:
/relay_recipient_maps =/,/[^,]$/ {
    /btree.*[^,]$/ {
        s/[ \t]*$/,/
        a\ btree:/opt/pmx69/postfix/etc/vdm_valid_users_anakin
    }
}
Results:
[etc....]
relay_recipient_maps =
btree:/opt/pmx69/postfix/etc/vdm_valid_users_yoda,
btree:/opt/pmx69/postfix/etc/vdm_valid_users_luke,
btree:/opt/pmx69/postfix/etc/vdm_valid_users_rd2d2,
btree:/opt/pmx69/postfix/etc/vdm_valid_users_c3p0,
btree:/opt/pmx69/postfix/etc/vdm_valid_users_dark_vador,
btree:/opt/pmx69/postfix/etc/vdm_valid_users_anakin
[etc...]
It depends on what shell you are using.
You can do this within bash, zsh and maybe some others:
echo "the dark vador line" >> file_with_content.txt
This will append the string to the end of the file.
You can use sed:
sed -e '/dark_vador/ s/$/Anakin/' file.txt
For every line that contains the pattern dark_vador, the string Anakin will be appended to the end of the line.
If you only want to do this for the first match, you can quit after appending:
sed -e '/dark_vador/ { s/$/Anakin/; q }' file.txt
The s and q commands must be grouped together so they are only executed on lines that have dark_vador, otherwise q would quit after the first line read (even if it doesn't contain dark_vador).
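For contrast, a quick sketch of the ungrouped variant (illustrative only, not from the original answer): here q has no address of its own, so it runs on the very first input line and sed exits after printing just that line, whether or not it matched dark_vador.
# the unconditional q makes sed print line 1 and quit immediately
sed -e '/dark_vador/ s/$/Anakin/; q' file.txt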
If you would like to append a line after the dark_vador line, you can use the a command:
sed -e '/dark_vador/ a\
Anakin' file.txt
This will create a new line with Anakin after every line with dark_vador. As above, you can quit if you only want to add a line after the first match:
sed -e '/dark_vador/ { a\
Anakin
q }' file.txt
Hope this helps =)
This is a GNU sed alternative:
sed '/^relay_recipient_maps/,/[^,]$/{/btree:.*[^,]$/s##&,\n btree:/opt/pmx69/postfix/etc/vdm_valid_users_Anakin#}' file.txt
How it works:
Within the block that starts at the "relay_recipient_maps" line and runs up to the first line not ending with a comma, substitute the "btree:" line that does not end with a comma (that would be the last btree line in the block) with that same line plus a "," and the desired new line.
This assumes the "relay_recipient_maps" block always ends with a btree: line not ending with ","

Removing trailing / starting newlines with sed, awk, tr, and friends

I would like to remove all of the empty lines from a file, but only when they are at the end/start of a file (that is, if there are no non-empty lines before them, at the start; and if there are no non-empty lines after them, at the end.)
Is this possible outside of a fully-featured scripting language like Perl or Ruby? I’d prefer to do this with sed or awk if possible. Basically, any light-weight and widely available UNIX-y tool would be fine, especially one I can learn more about quickly (Perl, thus, not included.)
From Useful one-line scripts for sed:
# Delete all leading blank lines at top of file (only).
sed '/./,$!d' file
# Delete all trailing blank lines at end of file (only).
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file
Therefore, to remove both leading and trailing blank lines from a file, you can combine the above commands into:
sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba' file
So I'm going to borrow part of @dogbane's answer for this, since that sed line for removing the leading blank lines is so short...
tac is part of coreutils, and reverses a file. So do it twice:
tac file | sed -e '/./,$!d' | tac | sed -e '/./,$!d'
It's certainly not the most efficient, but unless you need efficiency, I find it more readable than everything else so far.
Here's a one-pass solution in awk: it does not start printing until it sees a non-empty line, and when it sees an empty line it remembers it until the next non-empty line.
awk '
    /[[:graph:]]/ {
        # a non-empty line
        # set the flag to begin printing lines
        p=1
        # print the accumulated "interior" empty lines
        for (i=1; i<=n; i++) print ""
        n=0
        # then print this line
        print
    }
    p && /^[[:space:]]*$/ {
        # a potentially "interior" empty line. remember it.
        n++
    }
' filename
Note, due to the mechanism I'm using to consider empty/non-empty lines (with [[:graph:]] and /^[[:space:]]*$/), interior lines with only whitespace will be truncated to become truly empty.
As mentioned in another answer, tac is part of coreutils, and reverses a file. Combining the idea of doing it twice with the fact that command substitution will strip trailing new lines, we get
echo "$(echo "$(tac "$filename")" | tac)"
which doesn't depend on sed. You can use echo -n to strip the remaining trailing newline off.
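For example, a small usage sketch (stripped.txt is just a hypothetical output name):
# echo -n suppresses the final newline that a bare echo would add
echo -n "$(echo "$(tac "$filename")" | tac)" > stripped.txt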
Here's an adapted sed version, which also considers "empty" those lines with just spaces and tabs on them.
sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'
It's basically the accepted answer version (considering BryanH's comment), but the dot . in the first command was changed to [^[:blank:]] (anything not blank) and the \n inside the second command's address was changed to [[:space:]] to allow newlines, spaces and tabs.
An alternative version, without using the POSIX classes, but your sed must support inserting \t and \n inside […]. GNU sed does, BSD sed doesn't.
sed -e :a -e '/[^\t ]/,$!d; /^[\n\t ]*$/{ $d; N; ba' -e '}'
Testing:
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n'
foo
foo
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -n l
$
\t $
$
foo$
$
foo$
$
\t $
$
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'
foo
foo
prompt$
Using awk:
awk '{ a[NR]=$0; if ($0 && !s) s=NR; }
     END {
         e=NR;
         for (i=NR; i>1; i--)
             if (a[i]) { e=i; break; }
         for (i=s; i<=e; i++)
             print a[i];
     }' yourFile
This can be solved easily with the GNU sed -z option:
sed -rz 's/^\n+//; s/\n+$/\n/g' file
Hello
Welcome to
Unix and Linux
For an efficient, non-recursive version of the trailing-newline strip (which also covers whitespace-only lines), I've developed this sed script.
sed -n '/^[[:space:]]*$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^[[:space:]]*$/H'
It uses the hold buffer to store all blank lines and prints them only after it finds a non-blank line. Should someone want only the newlines, it's enough to get rid of the two [[:space:]]* parts:
sed -n '/^$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^$/H'
I've tried a simple performance comparison with the well-known recursive script
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba'
on a 3MB file with 1MB of random blank lines around a random base64 text.
shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M > bigfile
base64 </dev/urandom | dd bs=1 count=1M >> bigfile
shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M >> bigfile
The streaming script took roughly 0.5 seconds to complete; the recursive one hadn't finished after 15 minutes. Win :)
For completeness' sake, the leading-line-stripping sed script is already streaming fine. Use whichever is most suitable for you.
sed '/[^[:blank:]]/,$!d'
sed '/./,$!d'
Using bash
$ filecontent=$(<file)
$ echo "${filecontent/$'\n'}"
In bash, using cat, wc, grep, sed, tail and head:
# number of first line that contains non-empty character
i=`grep -n "^[^\B*]" <your_file> | sed -e 's/:.*//' | head -1`
# number of the last one
j=`grep -n "^[^\B*]" <your_file> | sed -e 's/:.*//' | tail -1`
# overall number of lines:
k=`cat <your_file> | wc -l`
# how much empty lines at the end of file we have?
m=$(($k-$j))
# let strip last m lines!
cat <your_file> | head -n-$m
# now we have to strip first i lines and we are done 8-)
cat <your_file> | tail -n+$i
Man, it's definitely worth learning a "real" programming language to avoid that ugliness!
@dogbane has a nice simple answer for removing leading empty lines. Here's a simple awk command which removes just the trailing lines. Use this with @dogbane's sed command to remove both leading and trailing blanks.
awk '{ LINES=LINES $0 "\n"; } /./ { printf "%s", LINES; LINES=""; }'
This is pretty simple in operation.
Add every line to a buffer as we read it.
For every line which contains a character, print the contents of the buffer and then clear it.
So the only things that get buffered and never displayed are any trailing blanks.
I used printf instead of print to avoid the automatic addition of a newline, since I'm using newlines to separate the lines in the buffer already.
This AWK script will do the trick:
BEGIN {
    ne=0;
}
/^[[:space:]]*$/ {
    ne++;
}
/[^[:space:]]+/ {
    for (i=0; i < ne; i++)
        print "";
    ne=0;
    print
}
The idea is simple: empty lines do not get echoed immediately. Instead, we wait until we get a non-empty line, first echo out as many empty lines as were seen before it, and only then echo the new non-empty line.
perl -0pe 's/^\n+|\n+(\n)$/\1/gs'
Here's an awk version that removes trailing blank lines (both empty lines and lines consisting of nothing but white space).
It is memory efficient; it does not read the entire file into memory.
awk '/^[[:space:]]*$/ {b=b $0 "\n"; next;} {printf "%s",b; b=""; print;}'
The b variable buffers up the blank lines; they get printed when a non-blank line is encountered. When EOF is encountered, they don't get printed. That's how it works.
If using gnu awk, [[:space:]] can be replaced with \s. (See full list of gawk-specific Regexp Operators.)
If you want to remove only those trailing lines that are empty, see @AndyMortimer's answer.
A bash solution.
Note: Only useful if the file is small enough to be read into memory at once.
[[ $(<file) =~ ^$'\n'*(.*)$ ]] && echo "${BASH_REMATCH[1]}"
$(<file) reads the entire file and trims trailing newlines, because command substitution ($(....)) implicitly does that.
=~ is bash's regular-expression matching operator, and =~ ^$'\n'*(.*)$ optionally matches any leading newlines (greedily), and captures whatever comes after. Note the potentially confusing $'\n', which inserts a literal newline using ANSI C quoting, because escape sequence \n is not supported.
Note that this particular regex always matches, so the command after && is always executed.
Special array variable BASH_REMATCH contains the results of the most recent regex match, and array element [1] contains what the (first and only) parenthesized subexpression (capture group) captured, which is the input string with any leading newlines stripped. The net effect is that ${BASH_REMATCH[1]} contains the input file content with both leading and trailing newlines stripped.
Note that printing with echo adds a single trailing newline. If you want to avoid that, use echo -n instead (or use the more portable printf '%s').
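For example, a minimal sketch reusing the match from above:
# printf '%s' emits the captured text exactly, with no extra trailing newline
[[ $(<file) =~ ^$'\n'*(.*)$ ]] && printf '%s' "${BASH_REMATCH[1]}"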
I'd like to introduce another variant for gawk v4.1+
result=($(gawk '
    BEGIN {
        lines_count = 0;
        empty_lines_in_head = 0;
        empty_lines_in_tail = 0;
    }
    /[^[:space:]]/ {
        found_not_empty_line = 1;
        empty_lines_in_tail = 0;
    }
    /^[[:space:]]*?$/ {
        if ( found_not_empty_line ) {
            empty_lines_in_tail ++;
        } else {
            empty_lines_in_head ++;
        }
    }
    {
        lines_count ++;
    }
    END {
        print (empty_lines_in_head " " empty_lines_in_tail " " lines_count);
    }
' "$file"))
empty_lines_in_head=${result[0]}
empty_lines_in_tail=${result[1]}
lines_count=${result[2]}
if [ $empty_lines_in_head -gt 0 ] || [ $empty_lines_in_tail -gt 0 ]; then
echo "Removing whitespace from \"$file\""
eval "gawk -i inplace '
{
if ( NR > $empty_lines_in_head && NR <= $(($lines_count - $empty_lines_in_tail)) ) {
print
}
}
' \"$file\""
fi
Because I was writing a bash script with some helper functions anyway, I found it convenient to write these:
function strip_leading_empty_lines()
{
    while read line; do
        if [ -n "$line" ]; then
            echo "$line"
            break
        fi
    done

    cat
}

function strip_trailing_empty_lines()
{
    acc=""
    while read line; do
        acc+="$line"$'\n'
        if [ -n "$line" ]; then
            echo -n "$acc"
            acc=""
        fi
    done
}
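Presumably the two functions are meant to be chained as filters, something like this (a usage sketch, not part of the original answer):
# strip leading blank lines first, then trailing blank lines
strip_leading_empty_lines < file | strip_trailing_empty_lines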
