Grep part of large file without splitting it - unix

How can I grep a certain part of a large file, for example lines 1000 to 2000, everything up to line 1000, or everything from line 1000 onward?
I don't want to split the file in smaller files.

You could use sed to pre-process the file. EDIT: added a q per Kent's suggestion, so sed stops reading once it reaches line 2000.
sed -n '1000,2000p;2000q' file.txt | grep 'abc'
For lines 1000 through the end of the file:
sed -n '1000,$p' file.txt | grep 'abc'

As a minor improvement over the sed solution by @ravoori, refactor the grep into the sed:
sed -n '1000,2000{/pattern/p;};2000q' file.txt
If you have the pattern in a variable, use double quotes around it (the pattern must not contain /):
sed -n '1000,2000{/'"$pattern"'/p;};2000q' file.txt
Or, equivalently, in awk:
awk 'NR>=1000 && /pattern/ {print} NR==2000 {exit}' file.txt
Or with the pattern in a variable:
awk -v pat="$pattern" 'NR>=1000 && $0~pat {print} NR==2000 {exit}' file.txt

I'd suggest
head -n 2000 FILE.TXT | tail -n +1000 | grep XXX
as the neatest solution, because head stops after the first 2000 lines and never reads the rest of the huge file. It essentially achieves what q does in the sed solution.
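The range-plus-early-exit approach from the answers above can be wrapped in a small reusable function. This is a minimal sketch assuming a POSIX shell and awk; the name `grep_range` is hypothetical. It uses awk so the pattern can come from a shell variable without quoting tricks, and it exits at the end line so the rest of the file is never read:

```shell
# grep_range START END PATTERN FILE
# Hypothetical helper: print the lines START..END (inclusive) of FILE that
# match the extended regex PATTERN, then stop reading the file.
# Note: awk -v interprets backslash escapes, so write \\. for a literal dot.
grep_range() {
  awk -v start="$1" -v end="$2" -v pat="$3" '
    NR >= start && $0 ~ pat { print }   # inside the range and matching
    NR >= end { exit }                  # stop after the last wanted line
  ' "$4"
}

# Example: search lines 1000-2000 of file.txt for "abc"
# grep_range 1000 2000 'abc' file.txt
```

Putting the `exit` rule after the `print` rule makes the end line itself part of the search.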

Related

Extract filename

I am new to sed and Unix, and I would like to transform the following file:
1500:../someFile.C:111 error
1869:../anotherFile.C:222 error
1869:../anotherFile2.Cxx:333 error
// thousands more lines with the same structure
into the following:
someFile.C
anotherFile.C
anotherFile2.Cxx
Basically, I just want to extract the filename from every line.
So far, I have read the documentation on sed and the second answer here. My best attempt was to use a regex as follows:
sed "s/.\*\/.:.*//g" myFile.txt
Lots of ways to do this.
Sure, you could use sed:
sed 's/^[^:]*://;s/:.*//;s#\.\./##' input.txt
sed 's%.*:\.\./\([^:]*\):.*%\1%' input.txt
Or you could use a series of grep -o instances in a pipe:
grep -o ':[^:]*:' input.txt | grep -o '[^:]\{1,\}' | grep -o '/.*' | grep -o '[^/]\{1,\}'
You could even use awk:
awk -F: '{sub(/\.\.\//,"",$2); print $2}' input.txt
But the simplest way would probably be to use cut:
cut -d: -f2 input.txt | cut -d/ -f2
You can capture the substring between the last / and the following : and replace the whole string with the captured string (\1).
sed 's#.*/\([^:]\+\).*#\1#g' myFile.txt
someFile.C
anotherFile.C
anotherFile2.Cxx
Or, with a little less escaping, use sed's -r flag for extended regular expressions (-E on BSD/macOS sed):
sed -r 's#.*/([^:]+).*#\1#g' myFile.txt
Or, if you want to use grep (this only works if your grep supports the -P flag, which enables PCRE):
grep -oP '.*/\K[^:]+' myFile.txt
someFile.C
anotherFile.C
anotherFile2.Cxx
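The `\+` in the first capture-group sed command is a GNU extension to basic regular expressions. A sketch of a more portable variant, assuming only POSIX sed, uses the `\{1,\}` interval instead and anchors the capture on the colon after the file name. The function name `extract_filename` is hypothetical:

```shell
# extract_filename [FILE...]
# Hypothetical helper: for lines like "1500:../someFile.C:111 error",
# print the text between the last "/" before a ":" and that ":".
#   .*/                 greedy: up to the last "/" that still lets the rest match
#   \([^:]\{1,\}\)      capture one or more non-colon characters (the file name)
#   :.*                 the colon after the name and the rest of the line
extract_filename() {
  sed 's#.*/\([^:]\{1,\}\):.*#\1#' "$@"
}
```

With no file arguments, the function reads standard input, so it also works at the end of a pipe.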

How to delete any lines containing numbers?

I need to delete all lines that contain numbers so I have only words left.
sed -i '/^[[:digit:]]*$/d' filename.TXT
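Not sure if this sed command works.
This sed will delete all lines containing any digit (your pattern ^[[:digit:]]*$ only matches lines made up entirely of digits, or empty lines):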
sed '/[0-9]/d' filename.txt
awk solution
awk '!/[0-9]/' filename.txt
grep solution
grep -v '[0-9]' filename.txt
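The question used `sed -i` to edit the file in place, but `-i` behaves differently in GNU sed (`sed -i`) and BSD/macOS sed (which expects a backup-suffix argument, e.g. `sed -i ''`). A sketch that avoids `-i` entirely, assuming `mktemp` is available, writes the result to a temporary file and copies it back. The name `delete_digit_lines` is hypothetical:

```shell
# delete_digit_lines FILE
# Hypothetical helper: remove every line of FILE that contains a digit,
# without relying on sed -i.
delete_digit_lines() {
  tmp=$(mktemp) || return 1
  # /[[:digit:]]/ matches a digit anywhere in the line, unlike
  # /^[[:digit:]]*$/, which only matches all-digit (or empty) lines.
  sed '/[[:digit:]]/d' "$1" > "$tmp" &&
    cat "$tmp" > "$1"   # write back into the original file, keeping its permissions
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Copying back with `cat` rather than `mv` keeps the original file's permissions and ownership.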

How to use sed to extract text between two bar signs (i.e. '|')?

I would like to extract text that falls between two | signs in a file with multiple lines. For instance, I want to extract P16 from sp|P16|SM2. I have found a possible answer here. However, I cannot apply the answer to my case. I am using the following:
sed -n '/|/,/|/ p' filename
or this by escaping the | sign:
sed -n '/\|/,/\|/ p' filename
But the result is all the lines of the file, unchanged, even though I am using -n to suppress automatic printing of the pattern space. Any ideas what I am missing?
[EDIT]:
I can get the desired result using the following. However, I would like an explanation of why the commands above do not work:
sed 's/^sp|//' filename | sed 's/|.*//'
The tool for this task is cut:
$ echo "sp|P16|SM2" | cut -d'|' -f2
P16
awk is a better choice for column-based data:
awk -F'|' '{print $2}' filename
will give you P16
sed one-liner:
The following sed one-liner will only leave the 2nd column for you:
kent$ echo "sp|P16|SM2"|sed 's/[^|]*|//;s/|[^|]*//'
P16
Or using grouping:
kent$ echo "sp|P16|SM2"|sed 's/.*|\([^|]*\)|.*/\1/'
P16
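Short explanation of why your two commands didn't work:
1) sed -n '/|/,/|/ p' filename
This is a range address: it prints every line from a line containing | through the next line containing |. When every line contains |, as in your file, every line is printed.
2) sed -n '/\|/,/\|/ p' filename
sed uses BRE by default. If you escape the | (in GNU sed), you give it a special meaning: alternation, the logical OR. Here both alternatives are empty, so the regex matches every line. Again, the /pat1/,/pat2/ address is the wrong tool for your case: it selects lines, not text within a line.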
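A sketch in the spirit of the substitution answers: combining `-n` with the `p` flag of the `s` command prints a line only when the substitution succeeded, which is closer to what the question's `-n` was meant to achieve. The name `second_field` is hypothetical:

```shell
# second_field [FILE...]
# Hypothetical helper: print the text between the first and second "|"
# on each line; lines with fewer than two "|" are skipped.
#   -n          suppress automatic printing
#   s/.../\1/p  print the pattern space only if the substitution matched
second_field() {
  sed -n 's/^[^|]*|\([^|]*\)|.*/\1/p' "$@"
}
```

In a basic regular expression an unescaped `|` is an ordinary character, so no escaping is needed here.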

How to delete duplicate lines in file in unix?

I can delete duplicate lines in files using the following commands:
1) sort -u and uniq
Is that possible using sed or awk?
There's a "famous" awk idiom:
awk '!seen[$0]++' file
It has to keep the unique lines in memory, but it preserves the file order.
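The same awk idiom written out with comments, as a sketch of why it works; `dedupe` is a hypothetical name:

```shell
# dedupe [FILE...]
# Hypothetical helper: print each distinct line once, in order of first appearance.
dedupe() {
  awk '
    {
      # seen[$0]++ returns the count *before* incrementing: 0 the first time
      # a line appears, so !seen[$0]++ is true only for the first occurrence.
      if (!seen[$0]++) print
    }
  ' "$@"
}
```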
sort and uniq are all you need to remove duplicates:
sort filename | uniq > filename2
If the file consists of numbers, use sort -n.
After sorting, we can use this sed command:
sed -E '$!N; /^(.*)\n\1$/!P; D' filename
If the file is unsorted, combine it with sort:
sort filename | sed -E '$!N; /^(.*)\n\1$/!P; D'
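The sed script in this answer emulates `uniq`: it keeps two lines in the pattern space and prints the first only when the next one differs. A commented sketch of the same script, written here as a basic regular expression (POSIX BREs support the `\1` back-reference, so `-E` is not needed); the name `uniq_sed` is hypothetical:

```shell
# uniq_sed [FILE...]
# Hypothetical helper: drop adjacent duplicate lines, like uniq.
# Duplicates must be adjacent, so sort unsorted input first.
#   $!N                 unless on the last line, append "\n" + the next line
#   /^\(.*\)\n\1$/!P    if the two lines differ, print up to the first newline
#   D                   delete up to the first newline and restart the cycle
#                       with what is left, without reading new input
uniq_sed() {
  sed '$!N; /^\(.*\)\n\1$/!P; D' "$@"
}

# Example: sort filename | uniq_sed
```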

How can I grep for a word in a specific part of a file?

How can I grep for a word in lines 100 to 200 of a file using grep and sed?
grep "word" file(s)
for instance
grep word *
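searches for word in all files in the current directory. To print lines 100-200, do
sed -n '100,200p' file
So combined you get
sed -n '100,200p' * | grep word
Note that sed treats several files as one continuous stream, so this covers lines 100-200 of the concatenated input, not of each file (GNU sed's -s option treats the files separately).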
You can use awk.
awk 'NR>=100 && NR<=200 && /regex/' "$FILE"
If you don’t mind using perl instead, the most straightforward solution is
$ perl -nle 'print if (100 .. 200) && /regex/' somefile
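For searching lines 100-200 of each file separately when several files are given, a sketch using awk's `FNR` (the record number within the current file) and `FILENAME`, assuming a POSIX awk; the name `grep_lines_each` is hypothetical:

```shell
# grep_lines_each PATTERN FILE...
# Hypothetical helper: for each FILE, print "FILE:LINE:text" for the lines
# 100-200 of that file that match the extended regex PATTERN.
grep_lines_each() {
  pat=$1
  shift
  awk -v pat="$pat" '
    FNR >= 100 && FNR <= 200 && $0 ~ pat { print FILENAME ":" FNR ":" $0 }
  ' "$@"
}

# Example: grep_lines_each word *
```

`NR` keeps counting across files, while `FNR` restarts at 1 for each new file, which is what makes the range per-file.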

Resources