Extract filename - unix

So I am new to SED and Unix and I would like to replace the following file:
1500:../someFile.C:111 error
1869:../anotherFile.C:222 error
1869:../anotherFile2.Cxx:333 error
//thousands of more lines with same structure
With the following file:
someFile.C
anotherFile.C
anotherFile2.Cxx
Basically, I just want to extract the filename from every line.
So far, I have read the documentation on sed and the second answer here. My best attempt was to use a regex as follows:
sed "s/.\*\/.:.*//g" myFile.txt

Lots of ways to do this.
Sure, you could use sed:
sed 's/^[^:]*://;s/:.*//;s#\.\./##' input.txt
sed 's%.*:\.\./\([^:]*\):.*%\1%' input.txt
Or you could use a series of grep -o instances in a pipe:
grep -o ':[^:]*:' input.txt | grep -o '[^:]\{1,\}' | grep -o '/.*' | grep -o '[^/]\{1,\}'
You could even use awk:
awk -F: '{sub(/\.\.\//,"",$2); print $2}' input.txt
But the simplest way would probably be to use cut:
cut -d: -f2 input.txt | cut -d/ -f2
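To try any of these commands before running them on the real file, a small sample helps. The sketch below assumes the three lines from the question; the temporary file name sample.txt is made up for illustration.

```shell
# Build a three-line sample with the same structure as the question's input.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
printf '%s\n' \
  '1500:../someFile.C:111 error' \
  '1869:../anotherFile.C:222 error' \
  '1869:../anotherFile2.Cxx:333 error' > "$sample"

# Field 2 (split on ':') is the path; field 2 of that (split on '/') is the name.
names=$(cut -d: -f2 "$sample" | cut -d/ -f2)
printf '%s\n' "$names"

rm -rf "$tmpdir"
```

Note that cut -d/ -f2 relies on every path having exactly one directory component (../). If some paths are deeper, stripping everything up to the last slash with sed 's#.*/##' is likely the safer choice.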

You can capture the substring between the last / and the following : and replace the whole line with the captured string (\1).
sed 's#.*/\([^:]\+\).*#\1#g' myFile.txt
someFile.C
anotherFile.C
anotherFile2.Cxx
Or, with a little less escaping, use sed with the -r flag:
sed -r 's#.*/([^:]+).*#\1#g' myFile.txt
Or, if you want to use grep (this only works if your grep supports the -P flag, which enables PCRE):
grep -oP '.*/\K[^:]+' myFile.txt
someFile.C
anotherFile.C
anotherFile2.Cxx
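The \+ quantifier is a GNU extension to basic regular expressions, so other sed implementations may not accept it. A portable sketch of the same idea, assuming the file name is always followed by a colon, could look like this:

```shell
# Plain POSIX basic regex: no \+ and no -r needed.
line='1869:../anotherFile2.Cxx:333 error'

# .*/ is greedy, so it consumes everything up to the last slash;
# the group then captures the characters before the next colon.
name=$(printf '%s\n' "$line" | sed 's#.*/\([^:]*\):.*#\1#')
printf '%s\n' "$name"
```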

Related

How can I put a sed command into a while loop?

Hoping someone kind can help me pls!
I have a file input.list:
/scratch/user/IFS/IFS001/IFS003.GATK.recal.bam
/scratch/user/IFS/IFS002/IFS002.GATK.recal.bam
/scratch/user/EGS/ZFXHG22/ZFXHG22.GATK.recal.bam
and I want to extract the bit before .GATK.recal.bam - I have found a solution for this:
sed 's/\.GATK\.recal\.bam.*//' input.list | sed 's#.*/##'
I now want to incorporate this into a while loop but it's not working... please can someone take a look and show me where I'm going wrong. My attempt is below:
while read -r line; do ID=${sed 's/\.GATK\.recal\.bam.*//' $line | sed 's#.*/##'}; sbatch script.sh $ID; done < input.list
Apologies for the easy Q...
You can use the output of the sed command as input for the loop:
sed 'COMMAND' input.file | while read -r id ; do
some_command "${id}"
done
Instead of the loop, xargs could also be used:
sed 'COMMAND' input.file | xargs -n1 some_command
PS: GNU sed can execute the result of an s operation as a command (the e flag). I wouldn't recommend this in production, for portability reasons at least, but it's probably worth mentioning:
sed 's#.*/\(.*\)\.GATK\.recal\.bam.*#sbatch script.sh \1#e' input.file
You can do this in plain bash (if that's the shell you're using; ksh93 and zsh will be very similar), no sed required:
while read -r line; do
id="${line##*/}" # Remove everything up to the last / in the string
id="${id%.GATK.recal.bam}" # Remove the trailing suffix
sbatch script.sh "$id"
done < input.list
At the very least you can use a single sed call per line:
id=$(sed -e 's/\.GATK\.recal\.bam$//' -e 's#.*/##' <<<"$line")
or with plain sh
id=$(printf "%s\n" "$line" | sed -e 's/\.GATK\.recal\.bam$//' -e 's#.*/##')
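For reference, the attempt in the question fails for two reasons: ${...} is parameter expansion, not command substitution (that is $(...)), and sed treats $line as a file name rather than as text to edit. Below is a self-contained sketch of the parameter-expansion loop. The submit function is a hypothetical stand-in for sbatch script.sh so the loop can be tried without a scheduler, and the temporary file stands in for input.list.

```shell
# Hypothetical stand-in for "sbatch script.sh": it only records its argument.
submitted=""
submit() {
  submitted="$submitted$1 "
}

# Sample paths written to a temporary file standing in for input.list.
list=$(mktemp)
printf '%s\n' \
  '/scratch/user/IFS/IFS002/IFS002.GATK.recal.bam' \
  '/scratch/user/EGS/ZFXHG22/ZFXHG22.GATK.recal.bam' > "$list"

while IFS= read -r line; do
  id=${line##*/}              # strip everything up to the last /
  id=${id%.GATK.recal.bam}    # strip the fixed suffix
  submit "$id"
done < "$list"

rm -f "$list"
printf '%s\n' "$submitted"
```

Because the loop reads from a redirection rather than from a pipe, it runs in the current shell, so variables set inside it are still visible afterwards.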

How to delete any lines containing numbers?

I need to delete all lines that contain numbers so I have only words left.
sed -i '/^[[:digit:]]*$/d' filename.TXT
Not sure if sed works.
This sed will delete all lines containing any number:
sed '/[0-9]/d' filename.txt
awk solution
awk '!/[0-9]/' filename.txt
grep solution
grep -v '[0-9]' filename.txt
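The command in the question is anchored with ^ and $, so it only matches lines made up entirely of digits. A small sketch with made-up sample lines shows the difference:

```shell
sample=$(mktemp)
printf '%s\n' 'apple' '12345' 'route66' 'banana' > "$sample"

# Anchored pattern from the question: removes only lines consisting solely
# of digits (and empty lines, since * also matches zero digits).
anchored=$(sed '/^[[:digit:]]*$/d' "$sample")

# Unanchored pattern: removes every line that has a digit anywhere.
unanchored=$(sed '/[0-9]/d' "$sample")

rm -f "$sample"
printf 'anchored:\n%s\nunanchored:\n%s\n' "$anchored" "$unanchored"
```

Here route66 survives the anchored version but not the unanchored one.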

How to delete duplicate lines in file in unix?

I can delete duplicate lines in a file using the sort -u and uniq commands. Is this also possible using sed or awk?
There's a "famous" awk idiom:
awk '!seen[$0]++' file
It has to keep the unique lines in memory, but it preserves the file order.
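A quick sketch of the difference in ordering, using a few made-up input lines:

```shell
input=$(printf '%s\n' pear apple pear fig apple)

# Order-preserving: seen[$0]++ is 0 (false) only the first time a line
# appears, so the negation prints each line exactly once.
first_seen=$(printf '%s\n' "$input" | awk '!seen[$0]++')

# sort -u also removes duplicates, but the original order is lost.
sorted=$(printf '%s\n' "$input" | sort -u)

printf 'awk:\n%s\nsort -u:\n%s\n' "$first_seen" "$sorted"
```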
sort and uniq are all you need to remove duplicates:
sort filename | uniq > filename2
If the file consists of numbers, use sort -n.
After sorting we can use this sed command
sed -E '$!N; /^(.*)\n\1$/!P; D' filename
If the file is unsorted, combine it with sort. Note that with -E the group is written with unescaped parentheses:
sort filename | sed -E '$!N; /^(.*)\n\1$/!P; D'
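A minimal sketch of the sort-then-sed combination on made-up input. It assumes a sed that accepts -E and back-references in extended regular expressions, as GNU sed does:

```shell
# $!N   append the next line to the pattern space (except on the last line)
# !P    print the first line unless both lines in the pattern space are equal
# D     drop the first line and restart the cycle with the remainder
deduped=$(printf '%s\n' b a c a b | sort | sed -E '$!N; /^(.*)\n\1$/!P; D')
printf '%s\n' "$deduped"
```

Like uniq, this only compares adjacent lines, which is why the input has to be sorted first.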

about unix command "sed"

I want to do the following substitution in a text file:
the original string: "---a---"
after substitution : "---\a---"
and I run the following command:
sed -r -e "s/-(a)-/-\\\1-/g" test.txt
but it doesn't give the right result. What command args should I use?
Remember that backslashes are significant in Bash's double-quoted strings as well as in sed itself. Either use single quotes:
sed -r -e 's/-(a)-/-\\\1-/g' test.txt
Or escape the backslashes again:
sed -r -e "s/-(a)-/-\\\\\\1-/g" test.txt
If you echo the strings, you'll see what's happening:
$ echo "s/-(a)-/-\\\1-/g"
s/-(a)-/-\\1-/g
$ echo 's/-(a)-/-\\\1-/g'
s/-(a)-/-\\\1-/g
$ echo "s/-(a)-/-\\\\\\1-/g"
s/-(a)-/-\\\1-/g
The first one (your original) just looks like a literal backslash followed by a literal 1 to sed.
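The three variants can also be compared directly on the sample string. This is only a sketch; -E is used as the more widely supported spelling of -r.

```shell
# Single quotes: the shell passes the script to sed unchanged.
single=$(printf '%s\n' '---a---' | sed -E 's/-(a)-/-\\\1-/g')

# Double quotes: the shell collapses each \\ to \ first, so six are needed.
double=$(printf '%s\n' '---a---' | sed -E "s/-(a)-/-\\\\\\1-/g")

# The original attempt: sed receives \\1, a literal backslash and a literal 1.
broken=$(printf '%s\n' '---a---' | sed -E "s/-(a)-/-\\\1-/g")

printf '%s\n' "$single" "$double" "$broken"
```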
Try sed -r -e "s/-(a)-/-\\\\\\1-/g" or sed -r -e 's/-(a)-/-\\\1-/g'
The problem is that \ is captured by bash if you use double quotes.
With " you will have to do something like:
[jaypal:~/Temp] echo "---a---" | sed -r "s/-(a)-/-\\\\\1-/g"
---\a---
You have to replace 1 with a
sed -r -e "s/-(a)-/-\\\a-/g" test.txt
so many answers ......
Kaizen ~/so_test
$ echo "---a---" | sed -n 's/a/\\a/p'
---\a---
since you have a text file the following should work :
sed -i 's/a/\\a/g' filename.txt ;
does this help ?

Grep part of a large file without splitting it

How can I grep a certain part of a large file from lines 1000 to 2000, up to line 1000 or from line 1000 for example?
I don't want to split the file in smaller files.
you could use sed to pre-process. EDIT: adding a q per Kent's suggestion
sed -n '1000,2000{p;2000q}' file.txt | grep 'abc'
for line 1000 through end of file
sed -n '1000,$p' file.txt | grep 'abc'
As a minor improvement over the sed solution by @ravoori, fold the grep into the sed:
sed -n '1000,2000{/pattern/p;}; 2000q' file.txt
If you have the pattern in a variable, put it in double quotes:
sed -n '1000,2000{/'"$pattern"'/p;}; 2000q' file.txt
Or equivalently in Awk:
awk 'NR>2000{exit} NR>=1000 && /pattern/' file.txt
or with a variable
awk -v pat="$pattern" 'NR>2000{exit} NR>=1000 && $0~pat' file.txt
I'd suggest
head -n 2000 FILE.TXT | tail -n 1001 | grep XXX
as the neatest solution, because head does not have to read the whole file, only the first 2000 lines. That is essentially what q achieves in the sed solution.
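One way to convince yourself that the approaches agree is to run them on a generated file in which every line holds its own line number. The sketch below assumes an inclusive range of lines 1000 through 2000 and uses 99 as a stand-in pattern.

```shell
big=$(mktemp)
# Each line of the test file contains its own line number.
awk 'BEGIN { for (i = 1; i <= 5000; i++) print i }' > "$big"

# All three print the lines in 1000..2000 that contain "99",
# and all three stop reading shortly after line 2000.
with_sed=$(sed -n '1000,2000{/99/p;}; 2000q' "$big")
with_awk=$(awk 'NR > 2000 { exit } NR >= 1000 && /99/' "$big")
with_head=$(head -n 2000 "$big" | tail -n 1001 | grep '99')

rm -f "$big"
printf '%s\n' "$with_sed" | head -n 3
```

Note the tail -n 1001: lines 1000 through 2000 inclusive are 1001 lines, so tail -n 1000 would start at line 1001.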

Resources