gnu parallel pipe sed - no input files error - unix

I am using the following sed script to perform some find and replace:
parallel --pipepart --block 1000M -a input.txt sed -ise 's/cat/dog/g; s/abc/xyz/g; s/def/22/g' > output.txt
But I am getting the following error:
sed: no input files
I tried removing the -i option but the outcome is still the same.
input.txt contains a mix of letters and digits and is around 30 million lines long.

You need -q:
parallel -q --pipepart --block 1000M -a input.txt sed -se 's/cat/dog/g; s/abc/xyz/g; s/def/22/g' > output.txt
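To understand why, see the manual sections below (without -q, the command line is passed through a shell again, so the quotes around the sed script are lost):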
https://www.gnu.org/software/parallel/man.html#QUOTING
https://www.gnu.org/software/parallel/parallel_design.html#Always-running-commands-in-a-shell
Also --block -1 --lb may be more efficient than --block 1000M.
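Before running the job over 30 million lines, it may help to check the sed script on a single line first. The sketch below does that without parallel; the sample line and the variable names are made up for illustration and are not from the question.

```shell
# Hypothetical sample line; the real input.txt is not shown in the question.
sample='cat abc def concatenate'

# The same three substitutions as in the question, kept in one quoted string
# so the shell passes them to sed as a single argument.
script='s/cat/dog/g; s/abc/xyz/g; s/def/22/g'

result=$(printf '%s\n' "$sample" | sed -e "$script")
printf '%s\n' "$result"
```

Note that the substitutions match substrings, so a word such as concatenate is changed as well.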

Related

How can I put a sed command into a while loop?

Hoping someone kind can help me pls!
I have an file input.list:
/scratch/user/IFS/IFS001/IFS003.GATK.recal.bam
/scratch/user/IFS/IFS002/IFS002.GATK.recal.bam
/scratch/user/EGS/ZFXHG22/ZFXHG22.GATK.recal.bam
and I want to extract the bit before .GATK.recal.bam - I have found a solution for this:
sed 's/\.GATK\.recal\.bam.*//' input.list | sed 's#.*/##'
I now want to incorporate this into a while loop but it's not working... please can someone take a look and guide me where I'm doing wrong. My attempt is below:
while read -r line; do ID=${sed 's/\.GATK\.recal\.bam.*//' $line | sed 's#.*/##'}; sbatch script.sh $ID; done < input.list
Apologies for the easy Q...
You can use the output of the sed command as input for the loop:
sed 'COMMAND' input.file | while read -r id ; do
some_command "${id}"
done
Instead of the loop, xargs could also be used:
sed 'COMMAND' input.file | xargs -n1 some_command
PS: GNU sed can execute the result of an s command as a shell command (the e flag). I wouldn't recommend this in production, for portability reasons at least, but it is probably worth mentioning. The pattern also strips the directory part so that only the ID is passed on:
sed 's#.*/\(.*\)\.GATK\.recal\.bam.*#sbatch script.sh \1#e' input.file
You can do this in plain bash, with no sed required (ksh93 and zsh will be very similar):
while read -r line; do
id="${line##*/}" # Remove everything up to the last / in the string
id="${id%.GATK.recal.bam}" # Remove the trailing suffix
sbatch script.sh "$id"
done < input.list
At the very least you can use a single sed call per line:
id=$(sed -e 's/\.GATK\.recal\.bam$//' -e 's#.*/##' <<<"$line")
Or with plain sh:
id=$(printf "%s\n" "$line" | sed -e 's/\.GATK\.recal\.bam$//' -e 's#.*/##')
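The parameter-expansion approach can be tried without a scheduler. This is a minimal sketch: the paths are copied from the question, extract_id is a hypothetical helper name, and printing the ID stands in for the sbatch call.

```shell
# Sample paths copied from the question's input.list.
paths='/scratch/user/IFS/IFS001/IFS003.GATK.recal.bam
/scratch/user/IFS/IFS002/IFS002.GATK.recal.bam
/scratch/user/EGS/ZFXHG22/ZFXHG22.GATK.recal.bam'

# extract_id is a hypothetical helper name, not part of any tool.
extract_id() {
    id=${1##*/}                 # drop everything up to the last /
    id=${id%.GATK.recal.bam}    # drop the trailing suffix
    printf '%s\n' "$id"
}

# Printing the ID stands in for: sbatch script.sh "$id"
ids=$(printf '%s\n' "$paths" | while IFS= read -r line; do
    extract_id "$line"
done)
printf '%s\n' "$ids"
```

Both expansions are POSIX, so the same loop should also work in sh, not only in bash.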

Counting the number of lines starting with a specific letter in UNIX?

How do I count the number of lines starting with "N" in a fastq file in UNIX? I have tried sed but I am not getting the expected output.
Try:
sed -n '/^N/p' file.txt | wc -l
I don't know the layout of a fastq file, but can you use grep?
grep -c "^N" file.txt
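One caveat, sketched below under the assumption that the file uses the common four-line FASTQ record layout (header, sequence, separator, quality): quality lines can also start with "N", so counting every line may overcount. The sample records are invented for illustration.

```shell
# Two made-up FASTQ records, assuming four lines per record.
fastq='@r1
NACGT
+
IIIII
@r2
ACGTN
+
NIIII'

# Counts every line starting with N, including the quality line of r2.
all_lines=$(printf '%s\n' "$fastq" | grep -c '^N')

# Counts only sequence lines (the 2nd line of each 4-line record).
seq_lines=$(printf '%s\n' "$fastq" |
    awk 'NR % 4 == 2 && /^N/ { n++ } END { print n + 0 }')

printf 'all lines: %s, sequence lines: %s\n' "$all_lines" "$seq_lines"
```

If the sequences in the file are wrapped over several lines, the NR % 4 test does not apply and a proper FASTQ parser would be the safer choice.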

Extract filename

So I am new to SED and Unix and I would like to replace the following file:
1500:../someFile.C:111 error
1869:../anotherFile.C:222 error
1869:../anotherFile2.Cxx:333 error
//thousands of more lines with same structure
With the following file:
someFile.c
anotherFile.c
anotherFile2.Cxx
Basically, I just want to extract the filename from every line.
So far, I have read the documentation on sed and the second answer here. My best attempt was to use a regex as follows:
sed "s/.\*\/.:.*//g" myFile.txt
Lots of ways to do this.
Sure, you could use sed:
sed 's/^[^:]*://;s/:.*//;s#\.\./##' input.txt
sed 's%.*:\.\./\([^:]*\):.*%\1%' input.txt
Or you could use a series of grep -o instances in a pipe:
grep -o ':[^:]*:' input.txt | grep -o '[^:]\{1,\}' | grep -o '/.*' | grep -o '[^/]\{1,\}'
You could even use awk:
awk -F: '{sub(/\.\.\//,"",$2); print $2}' input.txt
But the simplest way would probably be to use cut:
cut -d: -f2 input.txt | cut -d/ -f2
You can capture the substring between the last / and the following : and replace the whole line with the captured string (\1):
sed 's#.*/\([^:]\+\).*#\1#g' myFile.txt
someFile.C
anotherFile.C
anotherFile2.Cxx
Or, with a little less escaping, use sed with the -r flag:
sed -r 's#.*/([^:]+).*#\1#g' myFile.txt
Or, if you want to use grep (this only works if your grep supports the -P flag, which enables PCRE):
grep -oP '.*/\K[^:]+' myFile.txt
someFile.C
anotherFile.C
anotherFile2.Cxx
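To compare two of the approaches side by side, here is a small sketch run on the three sample lines from the question. The sed pattern uses [^:]* rather than \+ so that it does not rely on GNU extensions; the variable names are only for illustration.

```shell
# Sample lines copied from the question.
input='1500:../someFile.C:111 error
1869:../anotherFile.C:222 error
1869:../anotherFile2.Cxx:333 error'

# Field 2 between colons, then the part after the first slash.
with_cut=$(printf '%s\n' "$input" | cut -d: -f2 | cut -d/ -f2)

# Everything after the last slash, up to the next colon.
with_sed=$(printf '%s\n' "$input" | sed 's#.*/\([^:]*\).*#\1#')

printf '%s\n' "$with_sed"
```

The cut pipeline assumes exactly one directory level (../name); the sed version takes whatever follows the last slash, so it should cope with deeper paths too.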

How to make sed read its script from a file?

Recently I came across following grep command:
/usr/xpg4/bin/grep -Ff grep.txt input.txt > output.txt
which, as I understand it, means: search input.txt for the fixed strings listed in grep.txt and write the matching lines to output.txt.
I want to do something similar for sed, i.e. I want to keep the sed commands in a separate file (say sed.txt), apply them to an input file (say input.txt) and create an output file (say output.txt).
I tried the following:
/usr/xpg4/bin/sed -f sed.txt input.txt > output.txt
It does not work and I get the following error:
sed: command garbled
The contents of files mentioned above are as below:
sed.txt
sed s/234/acn/ input.txt
sed s/78gt/hit/ input.txt
input.txt
234GH
5234BTW
89er
678tfg
234
234YT
tfg456
wert
78gt
gh23444
Your sed.txt should only contain sed commands: No prefixing with sed or suffixing with an input file. In your case it should probably be:
# sed.txt
s/234/acn/
s/78gt/hit/
When run on your input:
$ /usr/xpg4/bin/sed -f sed.txt input.txt
acnGH
5acnBTW
89er
678tfg
acn
acnYT
tfg456
wert
hit
ghacn44
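Rather than keeping the sed commands in a separate text file, you may want to create a standalone sed script. Make the file below executable (chmod +x myscript.sed) and it can be run directly on your data files:
./myscript.sed inputfile.txt > outputfile.txt
The contents of myscript.sed (adjust the interpreter path if sed lives elsewhere on your system, e.g. /usr/bin/sed):
#!/bin/sed -f
s/234/acn/
s/78gt/hit/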
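The sed -f approach can be reproduced end to end in a scratch directory. This is a sketch only: it uses plain sed from PATH rather than /usr/xpg4/bin/sed, relies on mktemp -d being available, and feeds in just a few of the question's input lines.

```shell
# Work in a throwaway directory; the file names mirror the question.
workdir=$(mktemp -d)

# sed.txt holds only sed commands, one per line.
cat > "$workdir/sed.txt" <<'EOF'
s/234/acn/
s/78gt/hit/
EOF

# A few lines taken from the question's input.txt.
printf '%s\n' 234GH 678tfg 78gt gh23444 > "$workdir/input.txt"

output=$(sed -f "$workdir/sed.txt" "$workdir/input.txt")
printf '%s\n' "$output"

rm -rf "$workdir"
```

The line 678tfg is left unchanged because it contains 78t, not 78gt.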

Removing first line from stdin and redirect to stdout

I need to redirect all of the stdout of a program except the first line into a file.
Is there a common unix program that removes lines from stdin and spits the rest out to stdout?
Others have already mentioned "tail". sed will also work:
sed 1d
As will Awk:
awk 'NR > 1'
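With tail, starting output at line 2 (tail reads stdin when no file is given):
tail -n +2
With sed and explicit redirection of input and output:
sed -e 1d < input > output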
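A quick way to convince yourself that the three tools behave the same is to feed them identical input. In this sketch the three-line input stands in for the program's output and is invented for illustration.

```shell
# Hypothetical program output: a header line followed by data.
input='header
row1
row2'

via_sed=$(printf '%s\n' "$input" | sed 1d)
via_awk=$(printf '%s\n' "$input" | awk 'NR > 1')
via_tail=$(printf '%s\n' "$input" | tail -n +2)

printf '%s\n' "$via_sed"
```

In real use the left side of the pipe would be the program itself and the result would be redirected to a file, e.g. some_program | sed 1d > output, where some_program is a placeholder.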
