Unix command to remove whitespace in a delimited text file

I have a text file.
Input file:
sno|name|lab|result|dep
1|aaa|ALB|<= 3.67|CHE
2|bbb|WBC|> 7.2|FVC
3|ccc|RBC|> 14|CHE
Output file:
sno|name|lab|result|dep
1|aaa|ALB|<=3.67|CHE
2|bbb|WBC|>7.2|FVC
3|ccc|RBC|>14|CHE
How do I remove the whitespace in column 4 (result)?

If you can remove spaces from everything, just use sed:
sed 's/ //g' input.txt > output.txt
Or even tr (translate):
tr -d ' ' < input.txt > output.txt
Otherwise, if you need to edit just the fourth column, use awk. The following command treats | as the field separator (-F \|) and uses | as the output field separator when printing (-vOFS=\|).
awk -F \| -vOFS=\| '{gsub(/ /, "", $4); print; }' input.txt > output.txt
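For example, on the sample input above this produces exactly the desired output:
$ awk -F \| -vOFS=\| '{gsub(/ /, "", $4); print; }' input.txt
sno|name|lab|result|dep
1|aaa|ALB|<=3.67|CHE
2|bbb|WBC|>7.2|FVC
3|ccc|RBC|>14|CHE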

Related

Merge a string to a line extracted from a text file in UNIX

I want to merge a string, ABC, with a line that I have extracted from a file.
The following command extracts lines 20-25 of file_ABC, takes only the first column, and transposes it into a single row (line):
sed -n '20,25p' < file_ABC | awk '{print $1}' | paste -s
This is the result:
2727778 14734 0 0 0 2713044
I would like to add the string ABC at the first position of this line:
ABC 2727778 14734 0 0 0 2713044
Any suggestion on how to do that?
A quick hack would be to use something like
printf 'ABC\t%s\n' "$(sed -n '20,25p' < file_ABC | awk '{print $1}' | paste -s)"
You could modify your initial command instead to use awk for everything, though:
awk '
BEGIN {printf "ABC"}
NR>=20 && NR<=25 {printf "\t%s", $1}
END {print ""}
' file_ABC
This might work for you (GNU sed):
sed '20,25{s/\s.*//;H};$!d;x;s/^/ABC/;s/\n/ /g' file
Gather up the first column fields by appending them to the hold space for rows 20 to 25 only. At the end of the file prepend ABC and replace the introduced newlines by spaces.
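The same script spread out with comments (GNU sed allows # comment lines) may be easier to follow:
sed '
# for lines 20 through 25 only:
20,25{
# keep just the first column (delete from the first whitespace onward)
s/\s.*//
# and append it to the hold space
H
}
# delete every line except the last
$!d
# on the last line, swap the collected fields into the pattern space
x
# prepend ABC and replace the embedded newlines with spaces
s/^/ABC/
s/\n/ /g
' file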
For fun, bash only:
filename=file_ABC
words=("${filename##*_}")
i=0
while read -r word rest_of_line; do
((++i < 20 )) && continue
(( i > 25 )) && break
words+=("$word")
done < "$filename"
join() { local IFS=$1; shift; echo "$*"; }
join $'\t' "${words[@]}"
But this will be much slower than a single awk call.
If you want to keep it all in one script:
$ awk 'BEGIN {line="ABC"}
NR>=20 && NR<=25 {line=line FS $1}
NR==25 {print line; exit}' file
An improved version, as suggested by @EdMorton:
$ awk 'NR>=20 {line=line OFS $1}
NR==25 {print "ABC" line; exit}' file
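Run against the question's file, this prints the desired line (OFS defaults to a space):
$ awk 'NR>=20 {line=line OFS $1} NR==25 {print "ABC" line; exit}' file_ABC
ABC 2727778 14734 0 0 0 2713044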

Using sed or awk to double-quote, comma-separate, and concatenate a list

I have the following list in a text file:
10.1.2.200
10.1.2.201
10.1.2.202
10.1.2.203
I want to wrap each value in "double quotes", separate them with commas, and join them into one string.
Can this be done in sed or awk?
Expected output:
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203","10.1.2.204"
The easiest is something like this (in pseudocode):
Read a line;
Put the line in quotes;
Keep that quoted line in a stack or string;
At the end (or while constructing the string), join the lines together with a comma.
Depending on the language, that is fairly straightforward to do:
With awk:
$ awk 'BEGIN{OFS=","}{s=s ? s OFS "\"" $1 "\"" : "\"" $1 "\""} END{print s}' file
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203"
Or, to reduce the 'wall of quotes', define a quote character:
$ awk 'BEGIN{OFS=",";q="\""}{s=s ? s OFS q$1q : q$1q} END{print s}' file
With sed:
$ sed -E 's/^(.*)$/"\1"/' file | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g'
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203"
(Perl and Ruby have a join function, so it is easiest to push the elements onto an array and then join them.)
Perl:
$ perl -lne 'push @a, "\"$_\""; END{print join(",", @a)}' file
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203"
Ruby:
$ ruby -ne 'BEGIN{@arr=[]}; @arr.push "\"#{$_.chomp}\""; END{puts @arr.join(",")}' file
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203"
Here is another alternative:
sed 's/.*/"&"/' file | paste -sd,
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203"
awk -F'\n' -v RS="\0" -v OFS='","' -v q='"' '{NF--}$0=q$0q' file
This should work for the given example: RS="\0" makes gawk read the whole file as one record, -F'\n' turns each line into a field, and NF-- both drops the empty field left by the trailing newline and forces awk to rebuild $0 with the OFS (",") between fields.
Tested with gawk:
kent$ cat f
10.1.2.200
10.1.2.201
10.1.2.202
10.1.2.203
kent$ awk -F'\n' -v RS="\0" -v OFS='","' -v q='"' '{NF--}$0=q$0q' f
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203"
$ awk '{o=o (NR>1?",":"") "\""$0"\""} END{print o}' file
"10.1.2.200","10.1.2.201","10.1.2.202","10.1.2.203"

awk — getting minus instead of FILENAME

I am trying to add the filename to the end of each line as a new field. It works, except that instead of the filename I get -.
Base file:
070323111|Hudson
What I want:
070323111|Hudson|20150106.csv
What I get:
070323111|Hudson|-
This is my code:
mv $1 $1.bak
cat $1.bak | awk '{print $0 "|" FILENAME}' > $1
- is how the filename is presented when there is no such information. Since you are doing cat $1.bak | awk ..., awk is not reading from a file but from stdin.
Instead, just do:
awk '...' file
in your case:
awk '{print $0 "|" FILENAME}' $1.bak > $1
From man awk:
FILENAME
The name of the current input file. If no files are specified on the
command line, the value of FILENAME is “-”. However, FILENAME is
undefined inside the BEGIN rule (unless set by getline).
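A quick way to see the difference, using the 20150106.csv file from the question:
$ awk '{print FILENAME; exit}' 20150106.csv
20150106.csv
$ cat 20150106.csv | awk '{print FILENAME; exit}'
-
(Depending on the awk implementation, the second command prints - or an empty line.)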

Append to same line using grep

I have a file with multiple lines. I'm trying to find lines that match a certain pattern and then append them to an output file, all on the same line.
Ex:
Input file:
ABCD
other text
EFGH
other text
IJKLM
I'm trying to get the output to be :
ABCD EFGH IJKLM
An easy way to make grep output its matches separated by spaces instead of newlines is to wrap it in a command substitution, $(...), and let the shell's word splitting join them:
echo $(grep -o '^[A-Z]*$' input.txt) >> output.txt
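For the sample input this prints the expected line:
$ echo $(grep -o '^[A-Z]*$' input.txt)
ABCD EFGH IJKLM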
Or you could use tr:
grep -o '^[A-Z]*$' input.txt | tr '\n' ' ' >> output.txt
Or perl:
grep -o '^[A-Z]*$' input.txt | perl -pe 'chomp; s/$/ /'
You can use tr to translate the newlines to spaces:
grep $EXPRESSION $INPUT_FILE | tr '\n' ' ' >> $OUTPUT_FILE
If you like Perl, you can also use -l40, which sets the output record separator to a space (octal 040):
perl -nl40e 'print if /PATTERN/' files....
like
perl -nl40e 'print if /[A-Z]/' file
for your input produces
ABCD EFGH IJKLM
Here is a short awk:
awk 'NR%2==1' ORS=" " file
ABCD EFGH IJKLM
It prints every odd-numbered line (NR%2==1), joined into one line by setting the output record separator (ORS) to a space; note that it relies on the wanted lines alternating with the others, unlike the grep approaches.

How do I get the distinct list of special characters from a file using grep or sed?

I have a file which contains about 30,000 records delimited by '|'. I need to get a distinct list of only the special characters in the file.
For Eg:
123|fasdf|%df&|pap,came|!
234|%^&asdf|34|'":|
My output should be:
|%&,!^'":
Any help would be greatly appreciated.
grep -o '[|%&,!^":]' input | sort -u
You have to list all your special characters inside the brackets. (A literal single quote is hard to embed in a single-quoted shell string, which is why it is missing from the class here; the complemented class in the update below avoids the problem.)
This will return each unique special character on its own line. If you really need a single string of these characters, remove the newlines afterwards, e.g.:
grep -o '[|%&,!^":]' input | sort -u | tr -d '\n'
UPDATE:
If you want every character that is not in the a-zA-Z0-9 set, you can complement the class instead:
grep -o '[^a-zA-Z0-9]' input | sort -u | tr -d '\n'
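On the sample input this gives (the order depends on your locale; C-locale output shown):
$ grep -o '[^a-zA-Z0-9]' input | sort -u | tr -d '\n'
!"%&',:^|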
echo "123|fasdf|%df&|pap,came|! 234|%^&asdf|34|'\":|" \
| { tr -d '[:alnum:]'; printf "\n"; } \
| sed 's/\(.\)/\1_/g' \
| awk -v 'RS=_' '{print $0}' \
| sort -u \
| awk '{printf $0}END{printf "\n"}'
output
!"%&',:^||
You can replace the first line (the echo ...) with cat fileName.
