Extract filename

Extract filename - unix

I am new to sed and Unix, and I would like to replace the following file:
1500:../someFile.C:111 error
1869:../anotherFile.C:222 error
1869:../anotherFile2.Cxx:333 error
//thousands of more lines with same structure
With the following file:
someFile.C
anotherFile.C
anotherFile2.Cxx
Basically, I just want to extract the filename from every line.
So far, I have read the documentation on sed and the second answer here. My best attempt was to use a regex as follows:
sed "s/.\*\/.:.*//g" myFile.txt

Lots of ways to do this.
Sure, you could use sed:
sed 's/^[^:]*://;s/:.*//;s#\.\./##' input.txt
sed 's%.*:\.\./\([^:]*\):.*%\1%' input.txt
Or you could use a series of grep -o instances in a pipe:
grep -o ':[^:]*:' input.txt | grep -o '[^:]\{1,\}' | grep -o '/.*' | grep -o '[^/]\{1,\}'
You could even use awk:
awk -F: '{sub(/\.\.\//,"",$2); print $2}' input.txt
But the simplest way would probably be to use cut:
cut -d: -f2 input.txt | cut -d/ -f2
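To check that these approaches agree, here is a small sketch that runs the cut pipeline and the second sed command on the three sample lines from the question. The temporary file and the variable names (input, with_cut, with_sed) are only for illustration.

```shell
# Sample input copied from the question; the temp file is just a stand-in.
input=$(mktemp)
cat > "$input" <<'EOF'
1500:../someFile.C:111 error
1869:../anotherFile.C:222 error
1869:../anotherFile2.Cxx:333 error
EOF

# Split on ':' and keep field 2 ("../someFile.C"),
# then split that on '/' and keep field 2 ("someFile.C").
with_cut=$(cut -d: -f2 "$input" | cut -d/ -f2)

# One substitution: capture what sits between ":../" and the next ':'.
with_sed=$(sed 's%.*:\.\./\([^:]*\):.*%\1%' "$input")

printf '%s\n' "$with_cut"
rm -f "$input"
```

Both variables should end up holding the same three names. Note that the cut version assumes exactly one / in each path, as in the sample.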

You can capture the substring between the last / and the following : and replace the whole line with the captured group (\1):
sed 's#.*/\([^:]\+\).*#\1#g' myFile.txt
someFile.C
anotherFile.C
anotherFile2.Cxx
Or, with a little less escaping, use sed with the -r flag (extended regular expressions):
sed -r 's#.*/([^:]+).*#\1#g' myFile.txt
Or, if you want to use grep, this works only if your grep supports the -P flag, which enables PCRE:
grep -oP '.*/\K[^:]+' myFile.txt
someFile.C
anotherFile.C
anotherFile2.Cxx

Related

How can I put a sed command into a while loop?

Hoping someone kind can help me pls!
I have a file input.list:
/scratch/user/IFS/IFS001/IFS003.GATK.recal.bam
/scratch/user/IFS/IFS002/IFS002.GATK.recal.bam
/scratch/user/EGS/ZFXHG22/ZFXHG22.GATK.recal.bam
and I want to extract the bit before .GATK.recal.bam - I have found a solution for this:
sed 's/\.GATK\.recal\.bam.*//' input.list | sed 's#.*/##'
I now want to incorporate this into a while loop, but it's not working. Please can someone take a look and show me where I'm going wrong? My attempt is below:
while read -r line; do ID=${sed 's/\.GATK\.recal\.bam.*//' $line | sed 's#.*/##'}; sbatch script.sh $ID; done < input.list
Apologies for the easy Q...

You can use the output of the sed command as input for the loop:
sed 'COMMAND' input.file | while read -r id ; do
some_command "${id}"
done
Instead of the loop, xargs could also be used:
sed 'COMMAND' input.file | xargs -n1 some_command
PS: GNU sed can execute the result of an s operation as a command (the e flag). I wouldn't recommend using this in production, for portability reasons at least, but it's probably worth mentioning. The leading .*/ strips the directory part so that only the ID is passed to sbatch:
sed 's#.*/\(.*\)\.GATK\.recal\.bam.*#sbatch script.sh \1#e' input.file

You can do this in plain bash (if you're using that shell; ksh93 and zsh will be very similar), no sed required:
while read -r line; do
id="${line##*/}" # Remove everything up to the last / in the string
id="${id%.GATK.recal.bam}" # Remove the trailing suffix
sbatch script.sh "$id"
done < input.list
At the very least you can use a single sed call per line:
id=$(sed -e 's/\.GATK\.recal\.bam$//' -e 's#.*/##' <<<"$line")
or with plain sh
id=$(printf "%s\n" "$line" | sed -e 's/\.GATK\.recal\.bam$//' -e 's#.*/##')
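For what it's worth, the attempt in the question fails for two reasons: ${ ... } is parameter expansion, not command substitution (that is $( ... )), and sed treats $line as the name of a file to read rather than as text to edit. Here is a dry-run sketch of the parameter-expansion loop on the three paths from the question. The echo stands in for the real sbatch call, and the temp file and the ids variable are only there so the result can be inspected.

```shell
# Paths copied from the question; the temp file stands in for input.list.
list=$(mktemp)
cat > "$list" <<'EOF'
/scratch/user/IFS/IFS001/IFS003.GATK.recal.bam
/scratch/user/IFS/IFS002/IFS002.GATK.recal.bam
/scratch/user/EGS/ZFXHG22/ZFXHG22.GATK.recal.bam
EOF

ids=""
while IFS= read -r line; do
    id=${line##*/}            # drop everything up to the last /
    id=${id%.GATK.recal.bam}  # drop the trailing suffix
    ids="$ids$id "            # collected only so the result can be checked
    echo "would run: sbatch script.sh $id"   # dry run instead of sbatch
done < "$list"
rm -f "$list"
```

Because the loop reads from a redirect rather than a pipe, it runs in the current shell, so ids is still set after done.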

How to delete any lines containing numbers?

I need to delete all lines that contain numbers so I have only words left.
sed -i '/^[[:digit:]]*$/d' filename.TXT
Not sure if sed works.

This sed will delete all lines containing any number:
sed '/[0-9]/d' filename.txt
An awk solution:
awk '!/[0-9]/' filename.txt
A grep solution:
grep -v '[0-9]' filename.txt
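To see why the pattern in the question did not do the job: /^[[:digit:]]*$/ is anchored at both ends, so it only matches lines made up entirely of digits (and, because of the *, empty lines), while /[0-9]/ matches any line with a digit anywhere. A quick sketch on made-up sample text (the sample lines and variable names are mine, not from the question):

```shell
# Five sample lines: a word, an all-digit line, a mixed line, an empty line, a word.
sample=$(printf 'apple\n12345\nroom 101\n\nbanana\n')

# Pattern from the question: removes only all-digit lines and empty lines.
only_all_digit=$(printf '%s\n' "$sample" | sed '/^[[:digit:]]*$/d')

# Pattern from the answer: removes every line that contains a digit.
any_digit=$(printf '%s\n' "$sample" | sed '/[0-9]/d')

printf 'question pattern keeps:\n%s\n' "$only_all_digit"
printf 'answer pattern keeps:\n%s\n' "$any_digit"
```

With the question's pattern, "room 101" survives; with the answer's pattern it is removed, and the empty line is kept.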

How to delete duplicate lines in file in unix?

I can delete duplicate lines in a file using the sort -u and uniq commands. Is that also possible using sed or awk?

There's a "famous" awk idiom:
awk '!seen[$0]++' file
It has to keep the unique lines in memory, but it preserves the file order.
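How it works: seen[$0]++ evaluates to 0 the first time a line appears (and then increments the counter), so !seen[$0]++ is true only for first occurrences, and awk's default action prints the line. A small sketch comparing it with sort -u on made-up input (the sample data and variable names are just for illustration):

```shell
# Sample input with duplicates, deliberately not in sorted order.
input=$(printf 'b\na\nb\nc\na\n')

# awk idiom: keeps the first occurrence of each line, in original order.
deduped=$(printf '%s\n' "$input" | awk '!seen[$0]++')

# sort -u: also removes duplicates, but the output is sorted.
sorted=$(printf '%s\n' "$input" | sort -u)

printf 'awk:     %s\n' "$(printf '%s' "$deduped" | tr '\n' ' ')"
printf 'sort -u: %s\n' "$(printf '%s' "$sorted" | tr '\n' ' ')"
```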

sort and uniq are all you need to remove duplicates:
sort filename | uniq > filename2
If the file consists of numbers, use sort -n.

After sorting, we can use this sed command to drop adjacent duplicate lines:
sed -E '$!N; /^(.*)\n\1$/!P; D' filename
If the file is unsorted, you can combine it with sort. With -E the group must be written as ( ), not \( \):
sort filename | sed -E '$!N; /^(.*)\n\1$/!P; D'
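A quick sketch of the sed approach on made-up input, assuming GNU sed (the sample data and the variable name unique are only for illustration). N appends the next line to the pattern space, P prints the first line unless both lines are identical, and D deletes the first line and restarts the cycle with the remainder.

```shell
# Unsorted sample input with duplicates.
unique=$(printf 'c\na\nb\na\nc\n' | sort | sed -E '$!N; /^(.*)\n\1$/!P; D')

printf '%s\n' "$unique"
```

Since only adjacent lines are compared, the sort step is what makes this remove all duplicates rather than just consecutive repeats.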

about unix command "sed"

I want to do the following substitution in a text file:
the original string: "---a---"
after substitution : "---\a---"
and I run the following command:
sed -r -e "s/-(a)-/-\\\1-/g" test.txt
but it doesn't give the right result. What command args should I use?

Remember that backslashes are significant in Bash's double-quoted strings as well as in sed itself. Either use single quotes:
sed -r -e 's/-(a)-/-\\\1-/g' test.txt
Or escape the backslashes again:
sed -r -e "s/-(a)-/-\\\\\\1-/g" test.txt
If you echo the strings, you'll see what's happening:
$ echo "s/-(a)-/-\\\1-/g"
s/-(a)-/-\\1-/g
$ echo 's/-(a)-/-\\\1-/g'
s/-(a)-/-\\\1-/g
$ echo "s/-(a)-/-\\\\\\1-/g"
s/-(a)-/-\\\1-/g
The first one (your original) just looks like a literal backslash followed by a literal 1 to sed.
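The same thing can be checked end to end by storing each script in a variable and running it on the sample string. This is only a sketch: the variable names are made up, and it uses -E, which GNU sed accepts as another spelling of -r.

```shell
# The sed script as written with each quoting style.
single='s/-(a)-/-\\\1-/g'        # single quotes: passed to sed untouched
double="s/-(a)-/-\\\\\\1-/g"     # double quotes: bash halves each \\ pair
broken="s/-(a)-/-\\\1-/g"        # the original: sed receives -\\1-

out_single=$(printf '%s\n' '---a---' | sed -E "$single")
out_double=$(printf '%s\n' '---a---' | sed -E "$double")
out_broken=$(printf '%s\n' '---a---' | sed -E "$broken")

printf '%s\n' "$out_single" "$out_double" "$out_broken"
```

The first two produce ---\a---, while the original produces ---\1---, because sed reads \\ as a literal backslash followed by a plain 1.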

Try sed -r -e "s/-(a)-/-\\\\\\1-/g" or sed -r -e 's/-(a)-/-\\\1-/g'
The problem is that \ is interpreted by bash if you use double quotes.

With double quotes, you will have to do something like:
[jaypal:~/Temp] echo "---a---" | sed -r "s/-(a)-/-\\\\\1-/g"
---\a---

Alternatively, replace \1 with a literal a:
sed -r -e "s/-(a)-/-\\\a-/g" test.txt

so many answers ......
Kaizen ~/so_test
$ echo "---a---" | sed -n 's/a/\\a/p'
---\a---
Since you have a text file, the following should work (note that it escapes every a in the file, not only those between dashes):
sed -i 's/a/\\a/g' filename.txt
does this help ?

Grep part of large file without splitting it

How can I grep only a certain part of a large file, for example lines 1000 to 2000, everything up to line 1000, or everything from line 1000 onward?
I don't want to split the file in smaller files.

You could use sed to pre-process. EDIT: adding a q per Kent's suggestion:
sed -n '1000,2000{p;2000q}' file.txt | grep 'abc'
For line 1000 through the end of the file:
sed -n '1000,$p' file.txt | grep 'abc'

As a minor improvement over the sed solution by @ravoori, fold the grep into the sed:
sed -n '1000,2000{/pattern/p;};2000q' file.txt
If you have the pattern in a variable, use double quotes:
sed -n '1000,2000{/'"$pattern"'/p;};2000q' file.txt
Or equivalently in Awk:
awk 'NR>2000{exit} NR>=1000 && /pattern/' file.txt
or with a variable:
awk -v pat="$pattern" 'NR>2000{exit} NR>=1000 && $0~pat' file.txt

I'd suggest
head -n 2000 FILE.TXT | tail -n 1001 | grep XXX
as the neatest solution, because head does not have to read the whole huge file, just the first 2000 lines. It essentially achieves what q does in the sed solution. (tail -n 1001 keeps lines 1000 through 2000 inclusive.)
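To sanity-check the line arithmetic, here is a sketch that uses seq as a stand-in for a large file, so that line N simply contains the number N. It assumes seq is available (it is not part of POSIX, though most systems ship it); the variable names are made up.

```shell
# seq 5000 stands in for a large file: line N contains the number N.
sed_range=$(seq 5000 | sed -n '1000,2000{p;2000q;}')

# head keeps lines 1..2000; the last 1001 of those are lines 1000..2000.
head_tail_range=$(seq 5000 | head -n 2000 | tail -n 1001)

first_line=$(printf '%s\n' "$sed_range" | head -n 1)
last_line=$(printf '%s\n' "$sed_range" | tail -n 1)
line_count=$(printf '%s\n' "$sed_range" | grep -c '')

echo "first=$first_line last=$last_line count=$line_count"
```

An inclusive range from 1000 to 2000 is 1001 lines, which is why tail needs -n 1001 rather than -n 1000 to match the sed range exactly.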
