My input
1
abc
1cde
efg
xxx
1
abc
pattern1
pattern2
efg
xxx
1
abc
cde
efg
xxx
My expected output (print each block that starts with 1 if it contains both pattern1 and pattern2):
1
abc
pattern1
pattern2
efg
xxx
I have so far:
sed -n '/^1/ {x;/pattern1/ {N;/\n.*pattern2/p};d} $/^1/ {h;/pattern1/ {N;/\n.*pattern2/p};d}}H' My file
BTW my file is a very big file, please show me a method that can do it quickly.
Thanks so much.
sed is for s/old/new/ - that is all. For anything else you should be using awk.
It looks like your expected output can't actually be produced from your sample input, so this is an untested guess (we don't have anything concrete to test against), but it sounds like you might want:
awk -v RS= -v ORS='\n\n' '/pattern1/ && /pattern2/' file
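If the blocks are not separated by blank lines, as in the sample input shown in the question, paragraph mode (RS=) sees the whole file as a single record. Here is a minimal sketch for that case, assuming every block starts with a line that is exactly 1. The emit helper and the temporary sample file are made up for illustration and are not from the question:

```shell
# Sample data: three blocks, each starting with a line that is exactly "1".
tmp=$(mktemp)
printf '%s\n' 1 abc 1cde efg xxx 1 abc pattern1 pattern2 efg xxx 1 abc cde efg xxx > "$tmp"

# Buffer each block in rec; print it only if it contains both patterns.
result=$(awk '
function emit() { if (rec ~ /pattern1/ && rec ~ /pattern2/) printf "%s", rec }
/^1$/ { emit(); rec = "" }     # a new block starts: flush the previous one
      { rec = rec $0 "\n" }    # append the current line to the block
END   { emit() }               # flush the last block
' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

This reads the file once and keeps only one block in memory at a time, which should suit a very big file. If blocks may start with any line beginning with 1, the /^1$/ test would need to change accordingly.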
It's been asked several times, but it's not clear to me yet.
I have the following text in a file (data.txt, tab-delimited):
ABC 12
ABC-AS 14
DEF 18
DEF-AS 9
Now I want to search for ABC and DEF without getting ABC-AS and DEF-AS in the results.
grep -w ABC data.txt
Output:
ABC
ABC-AS
grep --no-group-separator -w "ABC" data.txt
ABC
ABC-AS
grep --group-separator="\t" -w "ABC" data.txt
ABC
ABC-AS
With a regex
grep -E "(ABC|DEF)[^-]" data.txt
Details
(ABC|DEF): Match "ABC" or "DEF"
[^-]: Any single character except "-" (so the match must be followed by some other character, here the tab)
Output
ABC 12
DEF 18
Try this, which selects only those matches that exactly match the whole line:
grep --line-regexp "ABC" data.txt
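For context: grep -w treats "-" as a non-word character, so the ABC inside ABC-AS still counts as a whole word, and --line-regexp requires the entire line (including the number column) to equal the pattern. Since the file is tab-delimited, another option is to compare the first field exactly. Here is a minimal sketch, with the sample file recreated in a temporary location purely for illustration:

```shell
tmp=$(mktemp)
printf 'ABC\t12\nABC-AS\t14\nDEF\t18\nDEF-AS\t9\n' > "$tmp"

# Compare the whole first tab-separated field, so ABC-AS and DEF-AS never match.
result=$(awk -F'\t' '$1 == "ABC" || $1 == "DEF"' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

This prints the ABC and DEF rows only. It relies on the assumption, taken from the question, that the name is always the first tab-separated column.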
I'm new to this so please if this has been answered somewhere else kindly refer me to that question. I searched extensively and there are sort of similar questions but none are really applicable to my problem.
I want to count the number of unique names per class. I have a sheet with a list of names (column 1) and their class (column 2). I need to know how many unique names are in the list, per class. The list is tab delimited.
I think probably awk will be able to solve this quickly, but I'm really not that skilled in awk.
Example input:
Name Class
ABCD protein-coding
ABCD protein-coding
DCFG lincRNA
GTFR lincRNA
Desired output:
Class Count
protein-coding 1
lincRNA 2
$ cat f
Name Class
ABCD protein-coding
ABCD protein-coding
DCFG lincRNA
GTFR lincRNA
$ awk 'FNR>1{a[$2]+=!( ($1,$2) in b); b[$1,$2]}END{for(i in a)print i, a[i]}' f
lincRNA 2
protein-coding 1
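You can sort the file and then count the unique values as follows (skip the header, deduplicate the name/class pairs, then sort the classes so that uniq -c sees equal classes on adjacent lines):
code:
tail -n +2 test_file.txt | sort -u | cut -f2 | sort | uniq -c
Output:
2 lincRNA
1 protein-coding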
With GNU awk for true multi-dimensional arrays:
$ awk 'NR>1{names[$2][$1]} END{for (class in names) print class, length(names[class])}' file
lincRNA 2
protein-coding 1
With any awk:
$ awk 'NR>1{if (!seen[$2,$1]++) cnt[$2]++} END{for (class in cnt) print class, cnt[class]}' file
lincRNA 2
protein-coding 1
If working with uniq and sort, a solution may be:
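sed 1d input.tsv | sort -t $'\t' | uniq | awk '{print $2}' | sort | uniq -c | awk 'BEGIN{print "Class\tCount"}{print $2"\t"$1}'
I skipped the header with sed 1d and wrote the output separated with tabs. The sort before uniq -c groups equal classes together, which uniq needs in order to count them correctly.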
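To get the exact layout of the desired output (a header line and tab-separated columns) from awk alone, something like the following may work. It is only a sketch: the sample data is the example input from the question written to a temporary file, and the body is sorted because awk does not guarantee any order for "for (c in cnt)":

```shell
tmp=$(mktemp)
printf 'Name\tClass\nABCD\tprotein-coding\nABCD\tprotein-coding\nDCFG\tlincRNA\nGTFR\tlincRNA\n' > "$tmp"

# Count each (class, name) pair once; sort the body because the order of
# "for (c in cnt)" is unspecified in awk.
result=$(
  printf 'Class\tCount\n'
  awk -F'\t' -v OFS='\t' 'NR > 1 && !seen[$2, $1]++ { cnt[$2]++ }
       END { for (c in cnt) print c, cnt[c] }' "$tmp" | sort
)
printf '%s\n' "$result"
rm -f "$tmp"
```

The classes come out in alphabetical order (lincRNA before protein-coding) rather than in order of first appearance. Setting -F'\t' keeps names that contain spaces intact.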
My file is as below
file name = test
1 abc
2 xyz
3 pqr
How can I convert the second column of the file to upper case without using awk or sed?
You can use tr to transform from lowercase to uppercase. cut will extract the single columns and paste will combine the separated columns again.
Assumption: Columns are delimited by tabs.
paste <(cut -f1 file) <(cut -f2 file | tr '[:lower:]' '[:upper:]')
Replace file with your file name (that is test in your case).
In pure bash
#!/bin/bash
while read -r col1 col2;
do
printf "%s%7s\n" "$col1" "${col2^^}"
done < file > output-file
Input-file
$ cat file
1 abc
2 xyz
3 pqr
Output-file
$ cat output-file
1 ABC
2 XYZ
3 PQR
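The cut/tr/paste answer uses process substitution, which needs bash or a similar shell. The same idea can be sketched for a plain POSIX shell by writing the first column to a temporary file and feeding the uppercased second column to paste on standard input. The temporary directory and file names below are illustrative, and the columns are assumed to be tab-delimited:

```shell
tmp=$(mktemp -d)
printf '1\tabc\n2\txyz\n3\tpqr\n' > "$tmp/test"

# Column 1 goes to a helper file; column 2 is uppercased and read by paste as "-".
cut -f1 "$tmp/test" > "$tmp/col1"
result=$(cut -f2 "$tmp/test" | tr '[:lower:]' '[:upper:]' | paste "$tmp/col1" -)
printf '%s\n' "$result"
rm -rf "$tmp"
```

Unlike the read loop, this keeps the original tab between the columns and does not spawn a process per line.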
I have this little problem that I want to ask:
So I have a file named "quest", which has:
Tom 100 John 10 Tom 100
How do I use awk to output something like:
Tom 200
I'd appreciate your help. I tried to look it up online, but I am not sure what I am looking for. Thanks in advance!
I do know how to use regular expression /Tom/ to grep the entry, but I am not sure how to proceed from there.
You can try something like:
$ awk '{
for(i=1; i<=NF; i+=2)
names[$i] = ((names[$i]) ? names[$i]+$(i+1) : $(i+1))
}
END{
for (name in names) print name, names[name]
}' quest
Tom 200
John 10
You basically iterate over the fields creating keys for all odd fields and assigning values of even fields to them. If the key already exists, you just add to the existing value.
This expects your file to have names in the odd fields (e.g. 1, 3, 5, ...) and values in the even fields (e.g. 2, 4, 6, ...).
In the END block, you just print entire array content.
I guess you need to calculate every user's total, not only Tom's; here is the code:
xargs -n2 < file|awk '{a[$1]+=$2}END{for (i in a) print i,a[i]}'
Tom 200
John 10
And as an awk one-liner:
awk '{for (i=1;i<=NF;i+=2) a[$i]+=$(i+1)}END{for (i in a) print i,a[i]}' file
Tom 200
John 10
$ echo 'Tom 100 John 10 Tom 100' | grep -o '[0-9]*' | paste -sd+ | bc
210
grep -o '[0-9]*' produces
100
10
100
paste -sd+ produces
100+10+100
bc calculates the result.
However, this only works for small input, since bc has a limit on input size.
In that case you can use awk '{s+=$0}END{print s}' instead of paste -sd+ | bc.
Note, however, that GNU Awk treats all numbers as floating point, so it produces inaccurate results when the numbers are too large.
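To make the floating-point caveat concrete, here is a small sketch assuming an awk that stores numbers as IEEE double-precision values (the usual case). Above 2^53 not every integer is representable, so adding 1 can be lost to rounding:

```shell
# 2^53 + 1 cannot be represented as a double, so it rounds back to 2^53.
lost=$(awk 'BEGIN { x = 2^53; print ((x + 1 == x) ? "precision lost" : "exact") }')
printf '%s\n' "$lost"
```

Sums that stay well below 2^53 (about 9 * 10^15) are exact for integer input. For larger values, bc is one alternative; gawk also offers an arbitrary-precision mode (-M) when it has been built with that support.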
awk '/Tom/{
for(i=1;i<=NF;i++)
if($i=="Tom")s+=$(i+1);
print "Tom",s;s=0}' your_file
Test
Here is a way to do it in awk (no loop):
awk -v RS=" " '{n=$1;getline;a[n]+=$1} END {for (i in a) print i,a[i]}' quest
Tom 200
John 10
If there is more than one line, like this:
cat quest
Tom 100 John 10 Tom 100
Paul 20 Tom 40 John 10
Then do this with GNU awk:
awk -v RS=" |\n" '{n=$1;getline;a[n]+=$1} END {for (i in a) print i,a[i]}' quest
Paul 20
Tom 240
John 20
And if you do not like getline
awk -v RS=" |\n" 'NR%2 {n=$1;next}{a[n]+=$1} END {for (i in a) print i,a[i]}' quest
I have two files like this:
abc.txt
a
b
z
1
10
and abcd.txt
a
b
c
d
1
10
100
1000
I would like:
a
b
1
10
I would like to use grep -w -f abc.txt abcd.txt to take every line of abc.txt as a pattern and print the lines of abcd.txt that match it as an entire word. If I just use grep -f, I also get lines such as 100, since the pattern '10' matches '100'. But grep -w -f abc.txt abcd.txt produces:
a
b
1
and doesn't print the 10. So what is the best way to match every line in abc.txt against the entire line in abcd.txt?
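One way to match whole lines is to combine -f with -x (the pattern must match the entire line) and -F (treat patterns as fixed strings rather than regular expressions). Here is a minimal sketch, with both files from the question recreated in a temporary directory for illustration:

```shell
tmp=$(mktemp -d)
printf '%s\n' a b z 1 10 > "$tmp/abc.txt"
printf '%s\n' a b c d 1 10 100 1000 > "$tmp/abcd.txt"

# -F: patterns are fixed strings, -x: a pattern must match the whole line,
# -f: read one pattern per line from abc.txt.
result=$(grep -Fxf "$tmp/abc.txt" "$tmp/abcd.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

On this sample it prints a, b, 1 and 10, in the order they appear in abcd.txt. If lines still go missing on the real files, invisible characters such as trailing spaces or carriage returns are worth checking; that is a guess, since the question does not show the raw bytes.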