How to delete any lines containing numbers? - unix

I need to delete all lines that contain numbers so I have only words left.
sed -i '/^[[:digit:]]*$/d' filename.TXT
I'm not sure whether this sed command works.

This sed command deletes every line that contains at least one digit:
sed '/[0-9]/d' filename.txt
An awk solution:
awk '!/[0-9]/' filename.txt
A grep solution:
grep -v '[0-9]' filename.txt
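The question's regex /^[[:digit:]]*$/ only matches lines made up entirely of digits (and, because * allows zero digits, empty lines), whereas /[0-9]/ matches a digit anywhere in the line. A small sketch comparing the two on made-up input; the function names drop_digit_lines and drop_all_digit_lines are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers for illustration; both filter stdin to stdout.

# Deletes every line that contains at least one digit anywhere.
drop_digit_lines() {
    sed '/[0-9]/d'
}

# The question's original regex: deletes only lines made up entirely of
# digits. Because * also matches zero digits, empty lines are deleted too.
drop_all_digit_lines() {
    sed '/^[[:digit:]]*$/d'
}

# Made-up sample input (note the empty fourth line).
sample='apple
route66
12345

banana'

printf '%s\n' "$sample" | drop_digit_lines      # apple, (empty line), banana
printf '%s\n' "$sample" | drop_all_digit_lines  # apple, route66, banana
```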

Related

Extract filename

I am new to sed and Unix, and I would like to replace the contents of the following file:
1500:../someFile.C:111 error
1869:../anotherFile.C:222 error
1869:../anotherFile2.Cxx:333 error
//thousands of more lines with same structure
with the following:
someFile.C
anotherFile.C
anotherFile2.Cxx
Basically, I just want to extract the filename from every line.
So far, I have read the documentation on sed and the second answer here. My best attempt was to use a regex as follows:
sed "s/.\*\/.:.*//g" myFile.txt
Lots of ways to do this.
Sure, you could use sed:
sed 's/^[^:]*://;s/:.*//;s#\.\./##' input.txt
sed 's%.*:\.\./\([^:]*\):.*%\1%' input.txt
Or you could use a series of grep -o instances in a pipe:
grep -o ':[^:]*:' input.txt | grep -o '[^:]\{1,\}' | grep -o '/.*' | grep -o '[^/]\{1,\}'
You could even use awk:
awk -F: '{sub(/\.\.\//,"",$2); print $2}' input.txt
But the simplest way would probably be to use cut:
cut -d: -f2 input.txt | cut -d/ -f2
You can capture the substring between the last / and the following : and replace the whole line with the captured string (\1):
sed 's#.*/\([^:]\+\).*#\1#g' myFile.txt
someFile.C
anotherFile.C
anotherFile2.Cxx
Or, with a little less escaping, use sed with the -r flag (GNU and BSD sed also accept -E):
sed -r 's#.*/([^:]+).*#\1#g' myFile.txt
Or, if you want to use grep (this only works if your grep supports the -P flag, which enables PCRE):
grep -oP '.*/\K[^:]+' myFile.txt
someFile.C
anotherFile.C
anotherFile2.Cxx
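For comparison, the same extraction can be sketched in pure POSIX shell, using IFS=: read to split each line and the ${path##*/} parameter expansion to drop the directory part. This technique is not one of the answers above, and it assumes every line looks like NUMBER:PATH:LINE message with no ':' inside PATH; extract_filenames is a hypothetical name:

```shell
#!/bin/sh
# Alternative sketch: pure POSIX shell, no sed.
# Assumes each line is  NUMBER:PATH:LINE message  and PATH contains no ':'.
extract_filenames() {
    # Split on ':'; 'rest' absorbs everything after the second ':'.
    while IFS=: read -r num path rest; do
        # ${path##*/} removes the longest prefix ending in '/',
        # leaving only the file name (or PATH unchanged if it has no '/').
        printf '%s\n' "${path##*/}"
    done
}

printf '%s\n' \
    '1500:../someFile.C:111 error' \
    '1869:../anotherFile.C:222 error' \
    '1869:../anotherFile2.Cxx:333 error' | extract_filenames
```

A shell read loop starts no extra processes, but on thousands of lines it is usually slower than a single sed, awk or cut run.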

How to delete duplicate lines in a file in Unix?

I can delete duplicate lines in a file using the sort -u and uniq commands. Is it possible to do this with sed or awk?
There's a "famous" awk idiom:
awk '!seen[$0]++' file
It has to keep the unique lines in memory, but it preserves the file order.
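How the idiom works: seen is an associative array keyed by the whole line. seen[$0]++ evaluates to the old count (0, which is false, the first time a line appears) and then increments it, so !seen[$0]++ is true only for the first occurrence, and awk's default action prints that line. A minimal sketch on made-up input; dedupe_keep_order is a hypothetical wrapper name:

```shell
#!/bin/sh
# Hypothetical wrapper around the awk idiom: keeps the first occurrence
# of each line and preserves the original order. Reads stdin or files.
dedupe_keep_order() {
    awk '!seen[$0]++' "$@"
}

printf '%s\n' pear apple pear banana apple | dedupe_keep_order
# pear
# apple
# banana
```

By contrast, sort -u on the same input would print apple, banana, pear.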
sort and uniq are all you need to remove duplicates:
sort filename | uniq > filename2
If the file consists of numbers, use sort -n.
After sorting, you can also use this sed command, which removes adjacent duplicate lines:
sed -E '$!N; /^(.*)\n\1$/!P; D' filename
If the file is unsorted, pipe it through sort first:
sort filename | sed -E '$!N; /^(.*)\n\1$/!P; D'
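A sketch of how the sed version works, step by step. It assumes a sed that accepts back-references in -E regexes (GNU sed does); sed_uniq is a hypothetical name. Because it only compares each line with the one after it, it behaves like uniq and needs sorted input:

```shell
#!/bin/sh
# Sketch: remove adjacent duplicate lines with sed, like uniq.
#   $!N              append the next line, unless this is the last line
#   /^(.*)\n\1$/!P   print the first line unless both lines are identical
#   D                delete through the first newline and restart the
#                    cycle on what remains, without reading new input
sed_uniq() {
    sed -E '$!N; /^(.*)\n\1$/!P; D'
}

# Duplicates must be adjacent, so sort the input first.
printf '%s\n' b a b c a | sort | sed_uniq
# a
# b
# c
```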

Grep part of a large file without splitting it

How can I grep a certain part of a large file, for example lines 1000 to 2000, everything up to line 1000, or everything from line 1000 on?
I don't want to split the file in smaller files.
You could use sed to pre-process the file. (Edit: added a q per Kent's suggestion, so sed stops reading after line 2000.)
sed -n '1000,2000{p;2000q}' file.txt | grep 'abc'
For line 1000 through the end of the file:
sed -n '1000,$p' file.txt | grep 'abc'
As a minor improvement over the sed solution by @ravoori, fold the grep into the sed:
sed -n '1000,2000{/pattern/p;};2000q' file.txt
If you have the pattern in a variable, put the variable in double quotes:
sed -n '1000,2000{/'"$pattern"'/p;};2000q' file.txt
Or equivalently in awk:
awk 'NR>=1000 && /pattern/; NR==2000{exit}' file.txt
or with a variable:
awk -v pat="$pattern" 'NR>=1000 && $0~pat; NR==2000{exit}' file.txt
I'd suggest
head -n 2000 FILE.TXT | tail -n +1000 | grep XXX
as the neatest solution, because head does not have to read the whole huge file, only the first 2000 lines. It essentially achieves what q does in the sed solution.
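To check that the sed, awk and head/tail variants select the same lines, here is a sketch on a generated 3000-line file. grep_range is a hypothetical helper, and it assumes the pattern contains no '/', since the pattern is spliced into the sed address:

```shell
#!/bin/sh
# Hypothetical helper: print lines START..END of FILE that match the basic
# regex PATTERN, then quit at line END so the rest of a huge file is not read.
# Assumption: PATTERN contains no '/', because it is spliced into the sed address.
grep_range() {
    file=$1 start=$2 end=$3 pattern=$4
    sed -n "${start},${end}{/${pattern}/p;};${end}q" "$file"
}

# Build a made-up 3000-line file ("line 1" .. "line 3000"), removed on exit.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
awk 'BEGIN { for (i = 1; i <= 3000; i++) print "line " i }' > "$tmp"

# Lines 1000..2000 ending in "99": line 1099, line 1199, ..., line 1999
grep_range "$tmp" 1000 2000 '99$'

# The same selection with the awk and head/tail approaches:
awk 'NR >= 1000 && /99$/; NR == 2000 { exit }' "$tmp"
head -n 2000 "$tmp" | tail -n +1000 | grep '99$'
```

Note that tail -n +1000 starts at the 1000th line of its input, so the range includes both line 1000 and line 2000, matching the sed address 1000,2000.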

How to extract a set of strings between two characters or words using sed

I have a particular string like this:
|/billing/gcdr/ftpdir|fw43/collectors/ANHMCA04ANT|
I need to extract the following portion from it:
/billing/gcdr/ftpdir
Is it possible to do this with sed? If so, please help.
Besides sed and awk, you can also use cut:
echo "|/billing/gcdr/ftpdir|fw43/collectors/ANHMCA04ANT|" | cut -d'|' -f2
This is not a sed solution, but if you can live with awk:
echo "|/billing/gcdr/ftpdir|fw43/collectors/ANHMCA04ANT|"| awk -F"|" '{print $2}'
will yield:
/billing/gcdr/ftpdir
Explanation of the awk command:
awk -F"|" '{print $2}'
-F"|" sets the field separator: awk splits the input line into fields on the '|' character (instead of the default whitespace), and '{print $2}' then prints the second field of that line. The first field is empty because the line starts with '|', which is why the path is $2.
sed can easily do that:
sed 's/^|//;s/|.*//;'
The first sed command (s/^|//) removes the leading | character, and the second one (s/|.*//) removes the next | and everything after it.
To test it, run:
echo "|/billing/gcdr/ftpdir|fw43/collectors/ANHMCA04ANT|" | sed 's/^|//;s/|.*//;'
This might work for you:
echo '|/billing/gcdr/ftpdir|fw43/collectors/ANHMCA04ANT|' |
sed 's/^|\([^|]*\).*/\1/'
/billing/gcdr/ftpdir
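The two sed steps above (strip the leading |, then cut at the next |) map directly onto POSIX parameter expansion, which avoids starting any external process. A sketch assuming the wanted text is always the first field after a leading |; first_field is a hypothetical name:

```shell
#!/bin/sh
# Alternative sketch using only POSIX parameter expansion (no sed/awk/cut).
# Assumes the wanted text is the first |-delimited field after a leading '|'.
first_field() {
    s=${1#"|"}                  # drop the leading '|'
    printf '%s\n' "${s%%"|"*}"  # drop the first remaining '|' and everything after it
}

first_field '|/billing/gcdr/ftpdir|fw43/collectors/ANHMCA04ANT|'
# /billing/gcdr/ftpdir
```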

Print specific lines using sed

I'm trying to print only the lines of the file "main" that do not start with a letter.
I've tried sed -n '/^[a-z]/ /!w' main
and it gives me "w': event not found"
With sed as requested:
sed '/^[[:alpha:]]/d' main
or
sed -n '/^[^[:alpha:]]/p' main
or
sed -n '/^[[:alpha:]]/!p' main
Note: you could use [a-z] in place of [[:alpha:]], but I prefer the latter because it is safe to use across different locales.
There are many other ways to print these lines:
sed -n '/^[^a-zA-Z]/p' main
sed -n '/^[^a-z]/Ip' main
awk 'BEGIN{IGNORECASE=1}!/^[a-z]/' main
grep -vi "^[a-z]" main
ruby -ne 'print unless /^[a-z]/i' main
A pure shell loop (IFS= and -r keep each line intact; empty lines are not printed):
while IFS= read -r line
do
    case "$line" in
        [!a-zA-Z]*) printf '%s\n' "$line";;
    esac
done < main
grep -v '^[a-z]' main
will do it.
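The answers differ in what counts as a letter: grep -v '^[a-z]' only excludes lines starting with a lowercase letter, while [[:alpha:]] also excludes uppercase starts. A sketch on made-up input; the function names are hypothetical, and LC_ALL=C is set for the [a-z] case so the range means exactly the lowercase ASCII letters:

```shell
#!/bin/sh
# Hypothetical helpers comparing two of the answers above.

# Excludes only lines that start with a lowercase ASCII letter.
not_lower_start() {
    LC_ALL=C grep -v '^[a-z]'
}

# Excludes lines that start with any letter, upper or lower case.
not_alpha_start() {
    sed '/^[[:alpha:]]/d'
}

# Made-up sample: the fourth line starts with a space.
sample='apple
Banana
42 cherries
 indented
-dash'

printf '%s\n' "$sample" | not_lower_start   # Banana, 42 cherries, " indented", -dash
printf '%s\n' "$sample" | not_alpha_start   # 42 cherries, " indented", -dash
```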
