I am trying to replace every alternate space with a newline in Unix.
I tried using the tr command but was unable to modify it to replace only the alternate spaces.
Sample input:
0 1 2 3 4 5
Sample output:
0 1
2 3
4 5
How can we achieve this?
awk might help in this case:
echo "0 1 2 3 4 5" | awk '
{
    for (i = 1; i <= NF; i++)
    {
        if ((i-1) % 2 == 0)
        {
            # odd-numbered field: print it followed by a space, no newline
            printf "%s ", $i
        }
        else
        {
            # even-numbered field: print it and end the line
            print $i
        }
    }
}
'
awk splits the line on whitespace into 6 fields. We then loop through all the fields and output each one: odd-numbered fields are printed with printf "%s ",$i, which appends a space but no newline, and even-numbered fields are printed with print $i, which ends the line.
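A minimal variation, sketched as a POSIX shell function (pair_fields is an illustrative name, not from the answer above), picks the separator with a conditional expression instead of if/else. It also copes with an odd number of fields by ending the line after the last field:

```shell
#!/bin/sh
# pair_fields is an illustrative helper name.
pair_fields() {
  awk '{
    # After an odd-numbered field that is not the last, print a space;
    # otherwise end the output line.
    for (i = 1; i <= NF; i++)
      printf "%s%s", $i, ((i % 2 && i < NF) ? " " : "\n")
  }'
}

echo "0 1 2 3 4 5" | pair_fields   # 0 1 / 2 3 / 4 5 on three lines
echo "0 1 2 3 4" | pair_fields     # the final 4 ends up on its own line
```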
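Alternatively, with sed (the \n in the replacement is a GNU sed extension):
echo "0 1 2 3 4 5" | sed 's/\([^ ][^ ]* [^ ][^ ]*\) */\1\n/g'
This can be made shorter with GNU sed, which supports the + quantifier (\+ in basic regular expressions, or + with -E).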
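For example, assuming GNU sed (both -E and \n in the replacement are used here as GNU sed features), the shorter form could look like this. Matching exactly one space after each pair, instead of " *", also avoids the extra empty line that the longer version prints after the final pair:

```shell
#!/bin/sh
# GNU sed: -E enables extended regexes, so [^ ]+ means "one or more non-spaces".
# Each "pair plus one following space" is replaced by the pair and a newline;
# the last pair has no trailing space, so no empty line is added after it.
echo "0 1 2 3 4 5" | sed -E 's/([^ ]+ [^ ]+) /\1\n/g'
```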
I have a record like this:
1664|41.0000|0.683333|0.6560|
Command
$ awk -F"|" '/AL_ALL_CALLS_1.6P/ { if($22>0 && $182!="" && !$183)
    print $3,$7,$10,$12,$15,$22,$24,$36,$39,$40,$96,$103,$182,$184,$186}' CDR_File_1.txt |
  awk -F"|" '{ for (i=1;i<=NF;i++) { if ($i=="") { $i="0" } } OFS=" ";print }' |
  awk -F" " '{print $1,$2,$3,$4,$5,$6,$6/60,$7,$8,$9,$10,$11,$12,$13,$14,$15}' |
  sed "s/ /|/g" |
  awk -F"[|.]" '{for (i=1;i<=NF;i++) {if ($i==$i+0) {n=split($i,a,"."); $i=sprintf("%d %d", a[1], a[2])}}}1' |
  head -1
Output
1664 0 41 0 0 0 0 0 683333 0 0 0 6560
Expected
1664 41 0000 0 683333 0 6560
Just check if a given field is a number and, in such case, split it:
awk '/anu/ { # lines containing "anu"
for (i=1;i<=NF;i++) { # loop through the fields
if ($i==$i+0) { # if it is a number
n=split($i,a,".") # slice the number
$i=sprintf("%d %d", a[1], a[2]) # put it back together with a space
}
}
}1' file # print the line
See it in action:
$ awk '/anu/ {for (i=1;i<=NF;i++) {if ($i==$i+0) {n=split($i,a,"."); $i=sprintf("%d %d", a[1], a[2])}}}1' file
45 0 0 25 abc anurag.jain
25.12 1.25 xyz stack
The key point here is the usage of the format-control letter %d in printf to remove the now superfluous leading zeroes:
$ awk 'BEGIN {printf "%d %d", 0000001, 01}'
1 1
Also, the usage of $var == $var +0 to check if a field is a number or not:
$ awk 'BEGIN {print "a" == "a" + 0}'
0
$ awk 'BEGIN {print 23.0 == 23.0 + 0}'
1
From your updated question I see you don't need to remove the extra zeros, so $i=sprintf("%s %s", a[1], a[2]) is enough. Also, since some fields are integers that need no processing (the $i==$i+0 test is true for them too, which is why 1664 became 1664 0), it is better to select only the decimal fields, for example with $i~/^[0-9]+\.[0-9]+$/.
$ awk -F"|" '{for (i=1;i<=NF;i++) {if ($i~/^[0-9]+\.[0-9]+$/) {n=split($i,a,"."); $i=sprintf("%s %s", a[1], a[2])}}}1' file
1664 41 0000 0 683333 0 6560
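To see why the integer field was split as well, here is a small comparison of the two tests on a few sample values: the numeric test $1 == $1 + 0 is true for both 1664 and 41.0000, while the regex matches only values with a decimal point. classify_fields is an illustrative helper name:

```shell
#!/bin/sh
# classify_fields is an illustrative helper name.
# For each input line prints: value, numeric-test result, decimal-regex result.
classify_fields() {
  awk '{ print $1, ($1 == $1 + 0), ($1 ~ /^[0-9]+\.[0-9]+$/) }'
}

printf '1664\n41.0000\nabc\n' | classify_fields
# 1664 1 0
# 41.0000 1 1
# abc 0 0
```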
awk's default field separator (a single space) treats any run of spaces and tabs between two fields as one separator:
echo "1  2"|awk '{for (i=1;i<=NF;i++) print $i}'
# which gives (note the two spaces between 1 and 2 in the input)
1
2
How can I add "=" to this existing delimiter? I tried the following, but it now treats each single space as a delimiter, which spoils the result above:
echo "1  2"|awk -F"[ |=]" '{for (i=1;i<=NF;i++) print $i}'
# which gives an empty field between 1 and 2
1

2
How can I make any amount of space count as a single delimiter here? Thanks in advance.
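You can specify a regular expression as the delimiter; the + makes any run of spaces and = signs count as one separator (inside the brackets, | is just a literal pipe character):
echo "1  2"|awk -F"[ |=]+" '{for (i=1;i<=NF;i++) print $i}'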
It also means that
echo "1 2 3==5"|awk -F"[ |=]+" '{for (i=1;i<=NF;i++) print $i}'
would print
1
2
3
5
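One caveat, shown here as a sketch: with a regular-expression FS, leading separators produce an empty first field, whereas awk's default whitespace splitting ignores them. Stripping the leading separators first is one possible workaround (split_fields is an illustrative name):

```shell
#!/bin/sh
printf '  1  2 3==5\n' | awk -F'[ |=]+' '{ print NF }'   # prints 5: $1 is empty
printf '  1  2 3==5\n' | awk '{ print NF }'              # prints 3: default splitting

# split_fields is an illustrative helper name. Modifying $0 with sub()
# makes awk re-split the record using FS, so $1 is no longer empty.
split_fields() {
  awk -F'[ |=]+' '{ sub(/^[ |=]+/, ""); for (i = 1; i <= NF; i++) print $i }'
}

printf '  1  2 3==5\n' | split_fields   # 1, 2, 3, 5 on separate lines
```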
Joining rows into a single delimited line is easy:
cat input.txt | tr "\n" " "
But I have a long file with 84046468 lines. I wish to convert this into a file with 1910147 rows and 44 tab-delimited columns. The first column is a text string such as chrXX_12345_+ and the other 43 columns are numerical strings. Is there a way to perform this transformation?
There are NAs present, so I guess using sed to replace "\n" with "\t" only when the preceding string is a number won't work.
sample input.txt
chr10_1000103_+
0.932203
0.956522
1
0.972973
1
0.941176
1
0.923077
1
1
0.909091
0.9
1
0.916667
0.8
1
1
0.941176
0.904762
1
1
1
0.979592
0.93617
0.934783
1
0.941176
1
1
0.928571
NA
1
1
1
0.941176
1
0.875
0.972973
1
1
NA
0.823529
0.51366
chr10_1000104_-
0.952381
1
1
0.973684
sample output.txt
chr10_1000103_+ 0.932203 (numbers all tab-delimited)
chr10_1000104_- etc
(sorry, too many numbers to type out manually)
sed '
# add a marker
s/^/M/
:Next
# increment the counter (one "i" per line collected)
s/^/i/
# test the counter
/^\(i\)\{44\}/ !{
  $ !{
    # fewer than 44 lines and not at end of file: append the next line
    N
    # loop
    b Next
  }
}
# remove marker and counter
s/^i*M//
# replace newlines with tabs (\t is a GNU sed extension)
s/\n/\t/g' YourFile
Some sed implementations may limit the repeat count in \{n\} to 255 (the POSIX minimum for RE_DUP_MAX), so this works for up to 255 columns; 44 is fine.
Here's the right approach using 4 columns instead of 44:
$ cat file
chr10_1000103_+
0.932203
0.956522
1
chr10_1000104_-
0.952381
1
1
$ awk '{printf "%s%s", $0, (NR%4?"\t":"\n")}' file
chr10_1000103_+ 0.932203 0.956522 1
chr10_1000104_- 0.952381 1 1
Just change 4 to 44 for your real input.
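If the line count is not an exact multiple of the column count, the last row produced above ends with a tab and no terminating newline. A sketch that takes the column count as a parameter and finishes a partial last row in an END block (to_rows and n are illustrative names):

```shell
#!/bin/sh
# to_rows is an illustrative helper name; $1 is the number of columns.
to_rows() {
  awk -v n="$1" '
    { printf "%s%s", $0, (NR % n ? "\t" : "\n") }
    # if the line count is not a multiple of n, finish the last partial row
    END { if (NR % n) print "" }
  '
}

# Usage: to_rows 44 < input.txt > output.txt
```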
If you are seeing control-Ms in your output, it's because they are present in your input. Use dos2unix or similar to remove them before running the tool or, with GNU awk, just set -v RS='\r\n'.
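If neither dos2unix nor GNU awk is available, a portable alternative is to remove the carriage return inside awk before printing. A sketch combining that with the printf approach (strip_cr_rows is an illustrative name):

```shell
#!/bin/sh
# strip_cr_rows is an illustrative helper name; $1 is the number of columns.
strip_cr_rows() {
  awk -v n="$1" '{
    sub(/\r$/, "")    # drop a DOS carriage return at the end of the line, if any
    printf "%s%s", $0, (NR % n ? "\t" : "\n")
  }'
}

# Usage: strip_cr_rows 44 < input.txt
```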
When posting questions it's important to make it as clear, simple, and brief as possible so that as many people as possible will be interested in helping you.
BTW, cat input.txt | tr "\n" " " is a UUOC (useless use of cat) and should just be tr "\n" " " < input.txt
Not the best solution, but it should work:
line="nonempty"; while [ ! -z "$line" ]; do for i in $(seq 44); do read line; echo -n "$line "; done; echo; done < input.txt
If there is an empty line in the file, it may stop early. For a more robust solution I'd try perl.
edit:
If you are concerned with efficiency, just use awk.
awk '{ printf "%s\t", $1 } NR%44==0{ print "" }' < input.txt
You may want to strip the trailing tab character with | sed 's/\t$//' or make the awk script more complicated.
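A different tool, paste, can also do this without producing a trailing tab (this is not from the answers above). Each - operand makes paste read the next line of standard input, so n dashes join n lines per row, separated by tabs. columns_with_paste is an illustrative name, and the loop simply builds the list of n dashes in plain POSIX sh:

```shell
#!/bin/sh
# columns_with_paste is an illustrative helper name; $1 is the number of columns.
columns_with_paste() {
  n=$1
  dashes=
  i=0
  while [ "$i" -lt "$n" ]; do
    dashes="$dashes -"
    i=$((i + 1))
  done
  # $dashes is deliberately unquoted so it splits into n separate "-" operands
  paste $dashes
}

# Usage: columns_with_paste 44 < input.txt
```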
This might work for you (GNU sed):
sed '/^chr/!{H;$!d};x;s/\n/\t/gp;d' file
If a line does not begin with chr, append it to the hold space and then delete it, unless it is the last line. If the line does start with chr, or it is the last line, swap the pattern and hold spaces, replace all newlines in the collected record by tabs, and print the result.
N.B. after the swap, the current chr line is left in the hold space, where it becomes the start of the next record.
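Since the record boundary is marked by the chr prefix, the same logic can also be sketched in awk. It does not depend on a fixed column count and treats NA values like any other line (join_on_chr is an illustrative name):

```shell
#!/bin/sh
# join_on_chr is an illustrative helper name.
join_on_chr() {
  awk '
    # a line starting with chr begins a new row (end the previous one first)
    /^chr/ { if (NR > 1) print ""; printf "%s", $0; next }
    # any other line is appended to the current row after a tab
    { printf "\t%s", $0 }
    # terminate the final row
    END { if (NR > 0) print "" }
  '
}

# Usage: join_on_chr < input.txt > output.txt
```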
How can I print every second row as a tab-delimited second column, as below? Thanks in advance.
input
wex
2
cr_1.b
4
output
wex 2
cr_1.b 4
Here's another option that doesn't depend on the length of lines:
awk '{ if (NR % 2 == 1) tmp=$0; else print tmp, $0; }' <filename>
If you really want a tab separator, use printf "%s\t%s\n",tmp,$0; instead.
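A sketch of that tab-separated variant, setting OFS instead of calling printf and also printing a leftover line when the input has an odd number of lines (pair_lines is an illustrative name):

```shell
#!/bin/sh
# pair_lines is an illustrative helper name.
# -v OFS='\t' makes print separate its arguments with a tab.
pair_lines() {
  awk -v OFS='\t' '
    NR % 2 { tmp = $0; next }    # odd line: remember it
    { print tmp, $0 }            # even line: print the pair
    END { if (NR % 2) print tmp } # odd line count: print the last line alone
  '
}

# Usage: pair_lines < filename
```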
Assuming you have no blank lines in your input file, this should do the trick:
awk 'length(f) > 0 { print f "\t" $0; f = ""; next } { f = $0 }' file
How can I print the last-but-one record of a file using awk?
Something like:
awk '{ prev_line=this_line; this_line=$0 } END { print prev_line }' < file
Essentially, keep a record of the line before the current one, until you hit the end of the file, then print the previous line.
edit to respond to comment:
To just extract the second field in the penultimate line:
awk '{ prev_f2=this_f2; this_f2=$2 } END { print prev_f2 }' < file
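The same idea generalizes to the k-th line from the end by keeping only the last k lines in a small circular buffer indexed by NR % k. A sketch, where kth_from_last and k are illustrative names and k=2 gives the penultimate line:

```shell
#!/bin/sh
# kth_from_last is an illustrative helper name; $1 is k (1 = last line).
kth_from_last() {
  awk -v k="$1" '
    { buf[NR % k] = $0 }
    # line NR-k+1 is stored at index (NR-k+1) % k, which equals (NR+1) % k
    END { if (NR >= k) print buf[(NR + 1) % k] }
  '
}

# Usage: kth_from_last 2 < file
```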
You can do it with awk but you may find that:
tail -n 2 inputfile | head -n 1
will be a quicker solution: it grabs the last two lines of the input, then takes the first of those two.
The following transcript shows how this works:
pax$ echo '1
> 2
> 3
> 4
> 5' | tail -2 | head -1
4
If you must use awk, you can use:
pax$ echo '1
> 2
> 3
> 4
> 5' | awk '{last = this; this = $0} END {print last}'
4
It works by keeping the last and current line in variables last and this and just printing out last when the file is finished.