No way to move pound (£) symbol in UNIX? - unix

I need each delimiter to appear at the start of its own line; sample input and output files are below for reference. The delimiters actually used are £{ and ^$^.
Note: the file to be rearranged is very large.
I have tried the following, but none of them work:
tr £{ \\n
sed 's/£{/\n/g'
awk '{ gsub("£{", "\n") } 1'
Input File:
£{firstlinecontinues£{secondstartsfromhereandit
keepsoncontinueingtillend£{herecomes3rdand£{fi
nallyfourthisalsohere
Output File:
£{firstlinecontinues
£{secondstartsfromhereanditkeepsoncontinueingtillend
£{herecomes3rdand
£{finallyfourthisalsohere

With GNU awk for multi-char RS and \s:
$ awk -v RS='£{' 'NR>1{gsub(/\s/,""); print RS $0}' file
£{firstlinecontinues
£{secondstartsfromhereanditkeepsoncontinueingtillend
£{herecomes3rdand
£{finallyfourthisalsohere

awk 'BEGIN{RS="(£{\|\^\$\^)"; OFS=ORS=""}{$1=$1;print $0 (FNR>1?"\n":"") RT}' file

Since the £ symbol is encoded as two bytes in UTF-8 (octal 302 and 243), I was able to produce the desired result with this perl command:
perl -pe 's/(\302\243)/\n$1/g' data.txt
NOTE: Here's what I see on my system:
echo "£" | od -c
0000000 302 243 \n
0000003
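The answers above differ in whether they also remove the original line breaks. As a minimal sketch using only tr and POSIX awk features, the two steps can be made explicit: delete every newline, then start a new line before each delimiter. The temporary file and sample text are assumptions that mirror the question's input, only the £{ delimiter is handled, and awk sees the whole file as one record, which may be a concern for very large inputs.

```shell
# Hypothetical sample mirroring the question's input (not the real data).
tmp=$(mktemp)
printf '%s\n' '£{firstlinecontinues£{secondstartsfromhereandit' \
  'keepsoncontinueingtillend£{herecomes3rdand£{fi' \
  'nallyfourthisalsohere' > "$tmp"

# 1) tr deletes every existing newline.
# 2) gsub puts a newline in front of each "£{" (& is the matched text).
# 3) sub removes the empty first line that step 2 creates.
result=$(tr -d '\n' < "$tmp" |
  awk '{ gsub(/£\{/, "\n&"); sub(/^\n/, ""); print }')

printf '%s\n' "$result"
rm -f "$tmp"
```

The brace is escaped so that no awk implementation reads it as the start of an interval expression.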

Related

using sed or awk to double quote comma separate and concatenate a list

I have the following list in a text file:
10.1.2.200
10.1.2.201
10.1.2.202
10.1.2.203
I want to encase in "double quotes", comma separate and join the values as one string.
Can this be done in sed or awk?
Expected output:
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203"
The easiest is something like this (in pseudo code):
Read a line;
Put the line in quotes;
Keep that quoted line in a stack or string;
At the end (or while constructing the string), join the lines together with a comma.
Depending on the language, that is fairly straightforward to do:
With awk:
$ awk 'BEGIN{OFS=","}{s=s ? s OFS "\"" $1 "\"" : "\"" $1 "\""} END{print s}' file
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203"
Or, to avoid the 'wall of quotes', define a quote character in a variable:
$ awk 'BEGIN{OFS=",";q="\""}{s=s ? s OFS q$1q : q$1q} END{print s}' file
With sed:
$ sed -E 's/^(.*)$/"\1"/' file | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g'
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203"
(With Perl and Ruby, with a join function, it is easiest to push the elements onto a stack and then join that.)
Perl:
$ perl -lne 'push @a, "\"$_\""; END{print join(",", @a)}' file
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203"
Ruby:
$ ruby -ne 'BEGIN{@arr=[]}; @arr.push "\"#{$_.chomp}\""; END{puts @arr.join(",")}' file
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203"
Here is another alternative:
sed 's/.*/"&"/' file | paste -sd,
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203"
awk -F'\n' -v RS="\0" -v OFS='","' -v q='"' '{NF--}$0=q$0q' file
should work for the given example.
Tested with gawk:
kent$ cat f
10.1.2.200
10.1.2.201
10.1.2.202
10.1.2.203
kent$ awk -F'\n' -v RS="\0" -v OFS='","' -v q='"' '{NF--}$0=q$0q' f
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203"
$ awk '{o=o (NR>1?",":"") "\""$0"\""} END{print o}' file
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203"

Display matched string to end of line

How can I find a particular string in a file and display the matched string plus the rest of the line?
For example, I have this line in a.txt:
This code gives ORA-12345 in my code.
I am searching for the string 'ORA-'.
The output should be:
ORA-12345 in my code
I tried using grep:
grep 'ORA-*' a.txt
but it prints the whole line.
# Create test data:
echo "junk ORA-12345 more stuff" > a.tst
echo "junk ORB-12345 another stuff" >> a.tst
# Actual command:
# the -o (--only-matching) flag will print only the matched result, and not the full line
cat a.tst | grep -o 'ORA-.*$' # ORA-12345 more stuff
As fedorqui pointed out you can use:
grep -o 'ORA-.*$' a.tst
An additional answer in awk:
awk '$0 ~ "ORA" {print substr($0, match($0, "ORA"))}' a.tst
From the inside out, here's what's going on:
match($0, "ORA") finds where in the line ORA appears. In this case, it happens to be position 17.
substr($0, match($0, "ORA")) then returns from position 17 to the end of the line.
$0 ~ "ORA" makes sure that the above is applied only to those lines that contain ORA.
With sed:
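A possible simplification, offered as a sketch rather than as the answerer's method: match() returns 0 when nothing matches and sets RSTART to the match position otherwise, so a single call can serve as both the filter and the offset. The temporary file and its two sample lines are made up for illustration.

```shell
# Hypothetical test data: one line with the marker, one without.
tmp=$(mktemp)
printf '%s\n' 'This code gives ORA-12345 in my code.' \
  'no error on this line' > "$tmp"

# match() is used as the pattern: lines without "ORA-" yield 0 and are skipped.
# For matching lines, RSTART holds the position where "ORA-" begins.
result=$(awk 'match($0, /ORA-/) { print substr($0, RSTART) }' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Matching on "ORA-" rather than "ORA" also avoids picking up unrelated words that merely contain those three letters.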
echo "This code gives ORA-12345 in my code." | sed 's/.*ORA-/ORA-/'

awk — getting minus instead of FILENAME

I am trying to add the filename to the end of each line as a new field. It works except instead of getting the filename I get -.
Base file:
070323111|Hudson
What I want:
070323111|Hudson|20150106.csv
What I get:
070323111|Hudson|-
This is my code:
mv $1 $1.bak
cat $1.bak | awk '{print $0 "|" FILENAME}' > $1
- is how awk presents the filename when there is no such information. Since you are doing cat $1.bak | awk ..., awk is not reading from a file but from stdin.
Instead, just do:
awk '...' file
in your case:
awk '{print $0 "|" FILENAME}' $1.bak > $1
From man awk:
FILENAME
The name of the current input file. If no files are specified on the
command line, the value of FILENAME is “-”. However, FILENAME is
undefined inside the BEGIN rule (unless set by getline).
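The difference can be checked with a small sketch. The temporary directory is an assumption made for the demonstration; the file name and record are taken from the question. What awk reports for stdin may vary between implementations, so the sketch only relies on the file-argument case producing the real name.

```shell
# Scratch directory holding a copy of the question's sample record.
dir=$(mktemp -d)
printf '070323111|Hudson\n' > "$dir/20150106.csv"

# Reading from stdin: awk has no file name to report.
from_stdin=$(awk '{ print $0 "|" FILENAME }' < "$dir/20150106.csv")

# Passing the file as an argument: FILENAME is the name as given.
from_file=$(cd "$dir" && awk '{ print $0 "|" FILENAME }' 20150106.csv)

printf 'stdin: %s\nfile:  %s\n' "$from_stdin" "$from_file"
rm -rf "$dir"
```

FILENAME holds the argument exactly as typed, so a path such as dir/20150106.csv would be appended with its directory part included.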

Append to same line using grep

I have a file with multiple lines. I'm trying to find lines that match a certain pattern and then append them to an output file, all on the same line.
Ex:
Input file:
ABCD
other text
EFGH
other text
IJKLM
I'm trying to get the output to be:
ABCD EFGH IJKLM
An easy way to make grep output matches separated by spaces instead of newlines is to wrap it in a command substitution, $(...), and let echo join the resulting words:
echo $(grep -o '^[A-Z]*$' input.txt) >> output.txt
Or you could use tr:
grep -o '^[A-Z]*$' input.txt | tr '\n' ' ' >> output.txt
Or perl:
grep -o '^[A-Z]*$' input.txt | perl -pe 'chomp; s/$/ /'
You can use tr to translate the newlines to spaces:
grep $EXPRESSION $INPUT_FILE | tr '\n' ' ' >> $OUTPUT_FILE
If you like perl, you can also use
perl -nl40e 'print if /PATTERN/' files....
like
perl -nl40e 'print if /[A-Z]/' file
for your input produces
ABCD EFGH IJKLM
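Here is a short awk:
awk 'NR%2==1' ORS=" " file
ABCD EFGH IJKLM
It prints every odd-numbered line (1st, 3rd, 5th, ...) and joins them on one line separated by spaces, so it relies on the wanted lines alternating with the others.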
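If the wanted lines do not strictly alternate with the others, selecting by pattern instead of by line number may be more robust. The following is a sketch under that assumption; the temporary file reproduces the question's sample, and sep is an illustrative variable name that keeps a trailing space out of the output.

```shell
# Hypothetical copy of the question's input file.
tmp=$(mktemp)
printf '%s\n' ABCD 'other text' EFGH 'other text' IJKLM > "$tmp"

# Print each all-uppercase line, preceded by a space except for the first,
# then finish the output line with a single newline.
result=$(awk '/^[A-Z]+$/ { printf "%s%s", sep, $0; sep = " " }
              END { print "" }' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```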

AWK to print field $2 first, then field $1

Here is the sample input:
name1@gmail.com|com.emailclient.account
name2@msn.com|com.socialsite.auth.account
I'm trying to achieve this:
Emailclient name1@gmail.com
Socialsite name2@msn.com
If I use AWK like this:
cat foo | awk 'BEGIN{FS="|"} {print $2 " " $1}'
it messes up the output by overlaying field 1 on the top of field 2.
Any tips/suggestions? Thank you.
A couple of general tips (besides the DOS line ending issue):
cat is for concatenating files, it's not the only tool that can read files! If a command doesn't read files then use redirection like command < file.
You can set the field separator with the -F option so instead of:
cat foo | awk 'BEGIN{FS="|"} {print $2 " " $1}'
Try:
awk -F'|' '{print $2" "$1}' foo
This will output:
com.emailclient.account name1@gmail.com
com.socialsite.auth.account name2@msn.com
To get the desired output you could do a variety of things. I'd probably split() the second field:
awk -F'|' '{split($2,a,".");print a[2]" "$1}' file
emailclient name1@gmail.com
socialsite name2@msn.com
Finally to get the first character converted to uppercase is a bit of a pain in awk as you don't have a nice built in ucfirst() function:
awk -F'|' '{split($2,a,".");print toupper(substr(a[2],1,1)) substr(a[2],2),$1}' file
Emailclient name1@gmail.com
Socialsite name2@msn.com
If you want something more concise (at the cost of an extra process, and note that \U requires GNU sed) you could do:
awk -F'|' '{split($2,a,".");print a[2]" "$1}' file | sed 's/^./\U&/'
Emailclient name1@gmail.com
Socialsite name2@msn.com
Use a dot or a pipe as the field separator:
awk -v FS='[.|]' '{
printf "%s%s %s.%s\n", toupper(substr($4,1,1)), substr($4,2), $1, $2
}' << END
name1@gmail.com|com.emailclient.account
name2@msn.com|com.socialsite.auth.account
END
gives:
Emailclient name1@gmail.com
Socialsite name2@msn.com
Maybe your file has CRLF line terminators, so every line ends with \r\n.
awk then sees $2 as $2\r. The \r moves the cursor back to the start of the line.
The output is effectively $2, then \r, then $1: $2 is printed first, the cursor returns to the start of the line, and $1 is printed over it. So field 2 is overlaid by field 1.
The awk is OK. I'm guessing the file came from a Windows system and has a CR (^M, ASCII 0x0d) at the end of each line.
This causes the cursor to go to the start of the line after $2.
Use dos2unix, or vi with :se ff=unix, to get rid of the CRs.
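When converting the file first is not convenient, the carriage return can also be stripped inside awk itself. This is a sketch, not something taken from the answers above; the address user@example.com and the temporary file are placeholders invented for the example.

```shell
# Hypothetical one-line input with a Windows (CRLF) line ending.
tmp=$(mktemp)
printf 'user@example.com|com.emailclient.account\r\n' > "$tmp"

# sub() removes a trailing \r from the record; modifying $0 makes awk
# re-split the fields, so $2 no longer ends in a carriage return.
result=$(awk -F'|' '{ sub(/\r$/, ""); print $2 " " $1 }' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Lines that already use Unix endings pass through unchanged, because sub() simply finds nothing to replace.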
