How can I print every second row as a tab-delimited second column, as shown below? Thanks in advance.
input
wex
2
cr_1.b
4
output
wex 2
cr_1.b 4
Here's another option that doesn't depend on the length of lines:
awk '{ if (NR % 2 == 1) tmp=$0; else print tmp, $0; }' <filename>
If you really want a tab separator, use printf "%s\t%s\n",tmp,$0; instead.
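For reference, here is a minimal sketch of the printf variant applied to the sample input from the question. The function name pair_lines is made up for illustration, and the END rule (which prints a final unpaired line on its own) is my own addition, not part of the answer above.

```shell
# pair_lines: join each odd line with the following even line, tab-separated.
# (pair_lines is a hypothetical helper name used only for this sketch.)
pair_lines() {
    awk 'NR % 2 == 1 { tmp = $0; next }      # odd line: remember it
         { printf "%s\t%s\n", tmp, $0 }      # even line: print the pair
         END { if (NR % 2 == 1) print tmp }  # unpaired last line, if any
    ' "$@"
}

# Sample input from the question.
printf 'wex\n2\ncr_1.b\n4\n' | pair_lines
```

With no file arguments the function reads standard input; with arguments it reads those files instead.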
Assuming you have no blank lines in your input file, this should do the trick (the next keeps the second rule from immediately storing the line that was just printed):
awk 'length(f) > 0 { print f "\t" $0; f = ""; next } { f = $0 }' file
I have a tab separated file. I am using the below code:
awk -v var="MAS_CONTROL_WL_column_nmbr.dat" 'BEGIN{RS="\n"}
{ while(getline line < var){ printf("%s\t",$line)};close(var);
printf( "\n") }' MAS_CONTROL_WL.tsv > test.tsv
This code prints the columns whose numbers are listed in the column number file, but the problem I am facing is that a \t is printed after the last column.
How can I remove it?
First a test file:
$ cat > foo
1
2
3
And the awk:
$ awk -v var=foo '
BEGIN { RS="\n" }
{
out="" # introducing output buffer
while((getline line < var) > 0) { # > 0 avoids an endless loop if var cannot be read
out=out sprintf("%s%s",(out==""?"":"\t"),line) # controlling tabs
}
close(var)
print out # output output buffer
}' foo | cat -T # useful use of cat
Output:
1^I2^I3
1^I2^I3
1^I2^I3
Instead of printing "field-tab" for every field, print the first field without a tab, then append the rest as "tab-field":
awk -v var="MAS_CONTROL_WL_column_nmbr.dat" '
BEGIN{RS="\n"}
{
if ((getline line < var) > 0) printf("%s",$line);
while ((getline line < var) > 0) printf("\t%s",$line);
close(var);
printf( "\n");
}
' MAS_CONTROL_WL.tsv > test.tsv
In case you still need an answer to your original question (removing the \t after the last column): sed -i 's/[[:space:]]$//' your_file.tsv removes one whitespace character from the end of each line of your file. Use 's/[[:space:]]*$//' instead to remove all trailing whitespace.
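Another way to avoid the trailing tab, sketched here as an alternative rather than as what the answers above do: read the column numbers once in BEGIN, then print each selected field with the separator in front of every field except the first. The name select_columns and the variable names are hypothetical, and the sketch assumes the column file holds one positive integer per line.

```shell
# select_columns COLFILE [DATAFILE...]: print the fields of each tab-separated
# record whose numbers are listed one per line in COLFILE.
# (select_columns and the variable names are hypothetical, for illustration.)
select_columns() {
    colfile=$1
    shift
    awk -v colfile="$colfile" '
        BEGIN {
            FS = OFS = "\t"
            # Read the column numbers once, not once per input record.
            while ((getline line < colfile) > 0)
                cols[++n] = line + 0
            close(colfile)
        }
        {
            # Put the separator before every field except the first,
            # so no tab is left at the end of the line.
            for (i = 1; i <= n; i++)
                printf "%s%s", (i > 1 ? OFS : ""), $(cols[i])
            printf "\n"
        }
    ' "$@"
}

# Usage with the file names from the question:
#   select_columns MAS_CONTROL_WL_column_nmbr.dat MAS_CONTROL_WL.tsv > test.tsv
```

Reading the list once also avoids reopening the column file for every input record.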
I have file as
1|dev|Smith|78|minus
1|ana|jhon|23|plus
1|ana|peter|22|plus
2|dev|dash|45|minus
2|dev||44|plus
I want the following output: for each unique combination of columns 1 and 2, print the multiple values of columns 3 and 5.
1|dev|Smith|minus
1|ana|jhon;peter|plus;plus
2|dev|dash;|minus;plus
I can accumulate multiple records into one for a single column, but I want to do it for two columns in one command:
awk -F"|" '{if(a[$1"|"$2])a[$1"|"$2]=a[$1"|"$2]";"$5; else
a[$1"|"$2]=$5;}END{for (i in a)print i, a[i];}' OFS="|" input.txt > output.txt
It is giving output as
2|dev|minus;plus
1|ana|plus;plus
1|dev|minus
If datamash is okay (without -s it groups adjacent rows only, which is fine for this input):
$ # -g 1,2 tells to group by 1st and 2nd column
$ # collapse 3 collapse 5 tells to combine those column values
$ datamash -t'|' -g 1,2 collapse 3 collapse 5 < ip.txt
1|dev|Smith|minus
1|ana|jhon,peter|plus,plus
2|dev|dash,|minus,plus
$ # easy to change , to ; if input file doesn't contain ,
$ datamash -t'|' -g 1,2 collapse 3 collapse 5 < ip.txt | tr ',' ';'
1|dev|Smith|minus
1|ana|jhon;peter|plus;plus
2|dev|dash;|minus;plus
In awk, though not the usual way: first set the value to $3|$5, then grow it outwards on both sides, prepending each new $3 with a ; and appending each new $5 with a ;. Because new $3 values are prepended, you get ;dash instead of dash;:
$ awk '
BEGIN { FS=OFS="|" }
{
a[$1 OFS $2]=$3(a[$1 OFS $2]?";"a[$1 OFS $2]";":"|")$5
}
END {
for(i in a)
print i,a[i]
}' file
2|dev|;dash|minus;plus
1|ana|peter;jhon|plus;plus
1|dev|Smith|minus
The proper awk way would probably be closer to the following (note that it skips empty values, so it prints dash rather than dash;):
$ awk '
BEGIN { FS=OFS="|" }
{
i=$1 OFS $2
a[i] = a[i] ( a[i]=="" || $3=="" ? "" : ";" ) $3
b[i] = b[i] ( b[i]=="" || $5=="" ? "" : ";" ) $5
}
END {
for(i in a)
print i,a[i],b[i]
}' file
2|dev|dash|minus;plus
1|ana|jhon;peter|plus;plus
1|dev|Smith|minus
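If the empty third field should stay visible (dash; as in the question's expected output) and the groups should come out in the order they first appear, something along these lines might work. This is a sketch under my own assumptions, not the answer's method: the function name group_fields and the arrays seen, order, c3 and c5 are all made up for illustration.

```shell
# group_fields: for each unique (column 1, column 2) pair, collect the values
# of columns 3 and 5, separated by ";", keeping empty values.
# (group_fields and the array names are hypothetical, for illustration.)
group_fields() {
    awk '
        BEGIN { FS = OFS = "|" }
        {
            key = $1 OFS $2
            if (key in seen) {
                # Later record for this key: always append, even if empty,
                # so the two lists stay aligned.
                c3[key] = c3[key] ";" $3
                c5[key] = c5[key] ";" $5
            } else {
                seen[key] = 1
                order[++n] = key          # remember first-seen order
                c3[key] = $3
                c5[key] = $5
            }
        }
        END {
            for (j = 1; j <= n; j++)
                print order[j], c3[order[j]], c5[order[j]]
        }
    ' "$@"
}

# Sample input from the question.
printf '1|dev|Smith|78|minus\n1|ana|jhon|23|plus\n1|ana|peter|22|plus\n2|dev|dash|45|minus\n2|dev||44|plus\n' | group_fields
```

Keeping a separate order array avoids relying on for (i in a), whose iteration order is unspecified.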
I have a file processing task that I need a hand in. I have two files (matched_sequences.list and multiple_hits.list).
INPUT FILE 1 (matched_sequences.list):
>P001 ID
ABCD .... (very long string of characters)
>P002 ID
ABCD .... (very long string of characters)
>P003 ID
ABCD ... ( " " " " )
INPUT FILE 2 (multiple_hits.list):
ID1
ID2
ID3
....
What I want to do is match the second column (ID2, ID4, etc.) with a list of IDs stored in multiple_hits.list. Then create a new matched_sequences file similar to the original but which excludes all IDs found in multiple_hits.list (about 60 out of 1000). So far I have:
#!/bin/bash
X=$(cat matched_sequences.list | awk '{print $2}')
Y=$(cat multiple_hits.list | awk '{print $1}')
while read matched_sequences.list
do
[ $X -ne $Y ] && (cat matched_sequences.list | awk '{print $1" "$2}') > new_matched_sequences.list
done
I get the following error raised:
-bash: read: `matched_sequences.list': not a valid identifier
Many thanks in advance!
EXPECTED OUTPUT (new_matched_sequences.list):
Same as INPUT FILE 1 with all IDs in multiple_hits.list excluded
#!/usr/bin/awk -f
function chomp(s) {
sub(/^[ \t]*/, "", s)
sub(/[ \t\r]*$/, "", s)
return s
}
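BEGIN {
# The last argument is the list of IDs to exclude. Decrementing ARGC
# keeps awk from reading it as ordinary input.
file = ARGV[--ARGC]
while ((getline line < file) > 0) {
a[chomp(line)]++
}
# Paragraph mode: records are separated by blank lines, one field per line.
RS = ""
FS = "\n"
ORS = "\n\n"
}
{
# The ID is the last space-separated word of the header line.
id = chomp($1)
sub(/^.* /, "", id)
}
# Print the record unless its ID is in the exclusion list.
!(id in a)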
Usage:
awk -f script.awk matched_sequences.list multiple_hits.list > new_matched_sequences.list
A shorter awk answer is possible, with a tiny script that first reads the file with the IDs to exclude and then the file containing the sequences. The script would be as follows (the comments make it long; in fact it's just three useful lines):
BEGIN { grab_flag = 0 }
# grab_flag will be used when we are reading the sequences file
# (not absolutely necessary to set here, though, because we expect the file will start with '>')
FNR == NR { hits[$1] = 1 ; next } # command executed for all lines of the first file: record IDs stored in multiple_hits.list
# otherwise we are reading the second file, containing the sequences:
/^>/ { if (hits[$2] == 1) grab_flag = 0 ; else grab_flag = 1 } # sets the flag indicating whether we have to output the sequence or not
grab_flag == 1 { print }
And if you call this script exclude.awk, you will invoke it this way:
awk -f exclude.awk multiple_hits.list matched_sequences.list
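The same idea can be squeezed a little further. The sketch below is only a variant of the script above, wrapped in a shell function whose name (exclude_hits) and variable names are made up for illustration. It assumes every header line looks like ">NAME ID" and that the ID file is not empty (otherwise FNR == NR would also match the second file).

```shell
# exclude_hits HITS SEQS: print SEQS without the records whose ID is in HITS.
# (exclude_hits and the variable names are hypothetical, for illustration.)
exclude_hits() {
    # $1: file of IDs to exclude, $2: sequence file with ">NAME ID" headers
    awk '
        FNR == NR { hits[$1]; next }       # first file: remember each ID
        /^>/      { keep = !($2 in hits) } # header: decide for this record
        keep                               # print header and sequence lines
    ' "$1" "$2"
}

# Usage with the file names from the question:
#   exclude_hits multiple_hits.list matched_sequences.list > new_matched_sequences.list
```

Using ($2 in hits) tests membership without creating empty array entries for IDs that are not in the list.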
I have a unix file with the following contents.
$cat myfile.txt
abc:1
abc:2
hello:3
hello:6
wonderful:1
hai:2
hai:4
hai:8
How do I get the max value for each text in the file above?
'abc' value 2
'hello' value 6
'hai' value 8
'wonderful' value 1
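Based on the current example in your question, minus the first line of expected output. Note that this simply keeps the last value seen for each key, so it only gives the max because the values in your example are in ascending order: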
awk -F':' '{arr[$1]=$2 ; next} END {for (i in arr) {print i, arr[i]} } ' inputfile
Your example input and expected output are very confusing. I posted this mainly to get feedback from the OP.
This version does not assume the data is sorted, because it sorts numerically on the second field first (it also works with already sorted data):
sort -t: -k2n inputfile | awk -F':' '{arr[$1]=$2 ; next} END {for (i in arr) {print i, arr[i]} } '
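If you would rather not depend on input order or on a sort at all, the maximum can be tracked explicitly. This is a minimal sketch, assuming the values after the colon are numeric; the function name max_per_key and the array name max are made up for illustration.

```shell
# max_per_key: print the largest numeric value seen for each key in
# "key:value" input. (max_per_key is a hypothetical name for this sketch.)
max_per_key() {
    awk -F: '
        # Keep the largest numeric value seen so far for each key.
        !($1 in max) || $2 + 0 > max[$1] { max[$1] = $2 + 0 }
        END { for (k in max) print k, max[k] }
    ' "$@"
}

# Deliberately unsorted sample; the final sort only makes the output order stable.
printf 'abc:1\nabc:2\nhello:6\nhello:3\nhai:8\nhai:2\n' | max_per_key | LC_ALL=C sort
```

The $2 + 0 forces a numeric comparison, so 10 is treated as larger than 9.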
How to print the last but one record of a file using awk?
Something like:
awk '{ prev_line=this_line; this_line=$0 } END { print prev_line }' < file
Essentially, keep a record of the line before the current one, until you hit the end of the file, then print the previous line.
edit to respond to comment:
To just extract the second field in the penultimate line:
awk '{ prev_f2=this_f2; this_f2=$2 } END { print prev_f2 }' < file
You can do it with awk but you may find that:
tail -2 inputfile | head -1
will be a quicker solution - it grabs the last two lines of the complete set then the first of those two.
The following transcript shows how this works:
pax$ echo '1
> 2
> 3
> 4
> 5' | tail -2 | head -1
4
If you must use awk, you can use:
pax$ echo '1
> 2
> 3
> 4
> 5' | awk '{last = this; this = $0} END {print last}'
4
It works by keeping the last and current line in variables last and this and just printing out last when the file is finished.
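The two-variable trick generalizes to "the n-th record from the end" if you keep the last n lines in a small ring buffer. The sketch below is my own extension, not something from the answers above. The function name nth_from_end is made up, n is assumed to be a positive integer, and nothing is printed when the input has fewer than n lines.

```shell
# nth_from_end N [FILE...]: print the N-th line counted from the end.
# (nth_from_end is a hypothetical name; N must be a positive integer.)
nth_from_end() {
    # $1: how far from the end (1 = last line, 2 = last but one, ...)
    n=$1
    shift
    awk -v n="$n" '
        { buf[NR % n] = $0 }                 # ring buffer of the last n lines
        END { if (NR >= n) print buf[(NR - n + 1) % n] }
    ' "$@"
}

# Last but one line of a five-line input.
printf '1\n2\n3\n4\n5\n' | nth_from_end 2
```

Only n lines are held in memory at any time, so this also works on large files.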