Brackets in system() command in R - r

I have been trying all day to find a way to run this line (which works in bash) from R, and I keep getting errors about the round brackets. I understand that the paste command gets confused when dealing with brackets, but I have tried escaping the brackets and putting them in quotes like this "')'", and nothing works, so I am out of ideas. Does anybody have any idea how this could work in R?
system(paste("sortBed -i <(awk -v a=1 -v b=2 -v c=3 -v d=4 '{OFS=FS=\"\t\"} {if ($d < 0.5) print \"value\"$a, $b-\"'$d'\", $c+\"'$d'\"}' file.in > file.out", sep=""))
sh: -c: line 0: syntax error near unexpected token `('

The reason seems to be that R's system() runs the command through the Bourne shell (sh) instead of the Bourne-again shell (bash). Process substitution, <(...), is a bash extension that a plain sh is not required to support. For example, the command
> system("paste <(echo 'Hi')")
will fail, and the error message names the Bourne shell:
sh: -c: line 0: syntax error near unexpected token `('
One solution is to echo the command from the Bourne shell and pipe it into bash:
> system("echo \"paste <(echo 'Hi')\" | bash")
Hi
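A variation on the same idea, as a minimal sketch: instead of echoing the command into bash, pass it to bash with -c. The snippet below is plain shell; from R the equivalent string would presumably be system("bash -c \"paste <(echo 'Hi')\""). It assumes bash is installed and on the PATH.

```shell
# Hand the whole command line to bash explicitly, so that <( ... ) is parsed
# by bash regardless of which shell started it.
out=$(bash -c "paste <(echo 'Hi')")
echo "$out"
```

Because the process substitution sits inside double quotes, the calling shell never tries to interpret it; only bash does.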

I get the same error as you when running the line from R. As far as I can see, the closing parenthesis of the process substitution is missing, but adding it doesn't prevent the error. Also, the tab should be written as \\t inside the R string, so that the backslash is passed on to the awk script.
One solution that worked for us in this case is to pipe the output from awk directly into sortBed.
system(paste("awk -v a=1 -v b=2 -v c=3 -v d=4 '{OFS=FS=\"\\t\"} {if ($d < 0.5) print \"value\"$a, $b-\"'$d'\", $c+\"'$d'\"}' file.in | sortBed -i stdin > file.out", sep=""))
We didn't really get the output process substitution to work, so if anyone has any suggestions for that it would be nice to hear.

Related

Single quotes in awk's system

I am trying to run bioawk (an extension of awk for fasta files) from awk's system functionality:
awk -v var=$i '{system("~/bin/bioawk-master/bioawk -c fastx '\''{if ($name==\""var"\"){print \">\"$name\"\\\\n\"$seq}}'\'' ../../prokka/"$2"/"$1"/"$1".ffn")}'
The result prints a literal "\n" between the values of $name and $seq instead of the intended newline.
What it prints:
NAME\nSEQUENCE
What I would like it to print:
NAME
SEQUENCE
When I print the bioawk command that I want to run with:
awk -v var=$i '{system("echo ~/bin/bioawk-master/bioawk -c fastx '\''{if ($name==\""var"\"){print \">\"$name\"\\\\n\"$seq}}'\'' ../../prokka/"$2"/"$1"/"$1".ffn")}'
I get:
~/bin/bioawk-master/bioawk -c fastx {if ($name=="CANHHJNM_03494"){print ">"$name"\n"$seq}} ../../prokka/p190631-dr-tm-dc-sp-pi/EP41/EP41.ffn
I can see that it is missing the single quotes surrounding the curly brackets. I thought having '\'' would solve this issue, but obviously it doesn't. Any help with this problem would be much appreciated.
I'm not sure this will solve your problem, but the (second) easiest way to handle single quotes in an awk script is to define the quote externally as a variable:
$ awk -v q="'" 'BEGIN{print q "single_quoted" q}'
'single_quoted'
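To connect this back to system(): here is a small sketch in which the q variable puts single quotes around an argument of a command that awk hands to the shell. The echo command and the string "a   b" are placeholders chosen only for illustration; they are not from the question.

```shell
# q holds a literal single quote, so the awk program itself never has to
# contain one and the outer shell quoting stays simple.
out=$(awk -v q="'" 'BEGIN {
    # Builds the shell command:  echo '"'"'a   b'"'"'  i.e. echo with a quoted argument
    cmd = "echo " q "a   b" q
    system(cmd)
}')
echo "$out"
```

The three spaces survive in the output, which shows that the quotes really reached the shell that system() started.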

file line count in ksh unix

I am trying to get line count from a file like below in a ksh script. But it returns nothing :
filerecordcount= $((`wc -l <../data/act.dat`))
I also tried these :
filerecordcount= `wc -l <../data/act.dat`
filerecordcount= $(wc -l <../data/act.dat)
When I print the variable, no value is printed:
print "Record Count in .dat file : $filerecordcount." 1>&2;
But when I run the same command from the command prompt, it returns the count:
wc -l<../data/act.dat
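You must not put a space after the = in the assignment. With the space, the shell sets filerecordcount to an empty string only for the command that follows, so the variable is never set in the script. Use the version below; it will work fine. And don't forget to print the variable filerecordcount.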
UPDATE:
filerecordcount=$((`wc -l <../data/act.dat`))
Simplify, simplify. Your backquotes are doing the command substitution, and their output is then being processed by $((...)) as an arithmetic expression. It's a little redundant.
filerecordcount=$(wc -l < ../data/act.dat)
No space after the =, and just one level of command substitution.
Alternately you can use process substitution:
read filerecordcount < <(wc -l < ../data/act.dat)
Or you could even do this without a subshell, using a loop:
filerecordcount=0
while read junk; do ((filerecordcount++)); done < ../data/act.dat
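A self-contained sketch to try the two approaches side by side. It uses a temporary file with three lines instead of ../data/act.dat, and the loop increments with $((...)) rather than ((...++)) so that it also runs in a plain POSIX sh; both choices are mine, not part of the answers above.

```shell
# Create a small stand-in data file with exactly three lines.
datafile=$(mktemp)
printf 'a\nb\nc\n' > "$datafile"

# No space after "=": the output of wc is assigned to the variable.
filerecordcount=$(wc -l < "$datafile")

# The same count from a read loop, without running wc at all.
loopcount=0
while read -r junk; do
    loopcount=$((loopcount + 1))
done < "$datafile"

# $((...)) strips the leading blanks that some wc implementations print.
echo "wc: $((filerecordcount)), loop: $loopcount"
rm -f "$datafile"
```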

awk getline not accepting external variable from a file

I have a file test.sh from which I am executing the following awk command.
awk -f x.awk < result/output.txt >>difference.txt
x.awk
while (getline < result/$bld/$DeviceType)
The variables DeviceType and bld are set in test.sh.
I have exported them, for example:
export DeviceType=$line
Even so, when I execute test.sh, the script stops at the following line
awk -f x.awk < result/output.txt >>difference.txt
and I am getting
awk: x.awk:4: (FILENAME=- FNR=116) fatal: division by zero attempted
error.
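The awk script is read by awk, not touched by the shell. Inside an awk script, $bld means 'the field designated by the number in the variable bld' (that's the awk variable bld). Because the path is not a quoted string, awk also treats the slashes as division operators, which is presumably where the "division by zero" error comes from.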
You can set awk variables on the command line (officially with the -v option):
awk -v bld="$bld" -v dev="$DeviceType" -f x.awk < result/output.txt >> difference.txt
Whether that does what you want is still debatable. Most likely you need x.awk to contain something like:
BEGIN { file = sprintf("result/%s/%s", bld, dev); }
{ while ((getline < file) > 0) print }
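For anyone who wants to try this end to end, here is a minimal sketch. The temporary directory, the values b1 and devA, and the extra root variable are made up for the demonstration; only the -v and getline pattern comes from the answer above.

```shell
# Build a throwaway result/<bld>/<DeviceType> file to read from.
dir=$(mktemp -d)
mkdir -p "$dir/result/b1"
printf 'one\ntwo\n' > "$dir/result/b1/devA"

# Shell variables (hypothetical values).
bld=b1
DeviceType=devA

# Pass the shell values in with -v; awk then builds the path as a string.
out=$(awk -v root="$dir" -v bld="$bld" -v dev="$DeviceType" 'BEGIN {
    file = sprintf("%s/result/%s/%s", root, bld, dev)
    while ((getline line < file) > 0)
        print "read: " line
    close(file)
}')
echo "$out"
rm -rf "$dir"
```

Comparing the result of getline with > 0 matters: getline returns -1 when the file cannot be opened, and a bare while (getline < file) would then loop forever.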
awk is not shell just like C is not shell. You should not expect to be able to access shell variables within an awk program any more than you can access shell variables within a C program.
To pass the VALUE of shell variables to an awk script, see http://cfajohnson.com/shell/cus-faq-2.html#Q24 for details but essentially:
awk -v awkvar="$shellvar" '{ ... use awkvar ...}'
is usually the right approach.
Having said that, whatever you're trying to do it looks like the wrong approach. If you are considering using getline, make sure to read http://awk.freeshell.org/AllAboutGetline first and understand all of the caveats but if you tell us what it is you're trying to do with sample input and expected output we can almost certainly help you come up with a better approach that has nothing to do with getline.

unix script using sed

I'm trying to get the following script to work, but I'm having some issues:
g++ -g -c $1
DWARF=echo $1 | sed -e `s/(^.+)\.cpp$/\1/`
and I'm getting:
./dcompile: line 3: test3.cpp: command not found
./dcompile: command substitution: line 3: syntax error near unexpected token `^.+'
./dcompile: command substitution: line 3: `s/(^.+)\.cpp$/\1/'
sed: option requires an argument -- 'e'
and then a bunch of sed usage text. What I want to do is pass in a .cpp file, extract the file name without the .cpp, and put it into the variable DWARF. I would also like to use the variable DWARF later to do the following:
readelf --debug-dump=info $DWARF+".o" > $DWARF+".txt"
But I'm not sure how to do string concatenation on the fly, so please help with both of those issues.
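You actually need to execute the command. sed also needs -E (extended regular expressions) for the unescaped ( ) and + to act as a group and a quantifier:
DWARF=$(echo "$1" | sed -E 's/^(.+)\.cpp$/\1/')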
The error message is a shell error because your original statement
DWARF=echo $1 | sed -e `s/(^.+)\.cpp$/\1/`
is actually parsed like this
run s/(^.+)\.cpp$/\1/
set DWARF=echo
run the command $1 | ...
So when it says test3.cpp: command not found, I assume that you are running with the argument test3.cpp and the shell is literally trying to execute that file.
You also need to wrap the sed script in single quotes, not backticks.
In bash (or any POSIX shell) you can crop the extension off $1 with
${1%.cpp}
If you need to set the DWARF variable, use
DWARF="${1%.cpp}"
or just reference $1 directly:
readelf --debug-dump=info "${1%.cpp}.o" > "${1%.cpp}.txt"
This chops off only the rightmost .cpp, so test.cpp.cpp becomes test.cpp.
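On the string concatenation part of the question: the shell has no + operator for strings; you simply write the pieces next to each other. A short sketch, where src stands in for "$1" and the names obj and txt are only for illustration:

```shell
src="test3.cpp"            # stands in for "$1" inside the script

# Strip the .cpp suffix with parameter expansion.
DWARF="${src%.cpp}"

# Concatenate by juxtaposition; the braces mark where the variable name ends.
obj="${DWARF}.o"
txt="${DWARF}.txt"

echo "$DWARF $obj $txt"
```

With these two names the readelf call would read readelf --debug-dump=info "$obj" > "$txt".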
You can use awk for this (note that it keeps only the text before the first dot):
$ var="testing.cpp"
$ DWARF=$(awk -F. '{print $1}' <<< "$var")
$ echo "$DWARF"
testing

Unix command to ignore lines that begins with X

I'm currently using a compiler that constantly prints warnings, and I don't want to see them. I've noticed that all warnings begin with the string "Note :", so I figured it should be possible to filter out these lines.
I compile with
jrc *.jr
Is there a unix command that alters the output it gives to not print out the lines that begin with "Note :"?
grep -v "^Note :"
Also, you may want to redirect stderr to stdout, since compilers usually write warnings to stderr:
command 2>&1 | grep -v "^Note :"
Another way would be to use sed:
sed '/^Note :/d'
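A quick way to check the filter without the real compiler, as a sketch: fake_compiler below is a hypothetical stand-in for jrc that writes three made-up lines to stderr, and the pattern assumes the warnings begin with "Note :" exactly as quoted in the question.

```shell
# Hypothetical stand-in for the compiler: prints two notes and one error to stderr.
fake_compiler() {
    echo "Note : unchecked call" >&2
    echo "error: missing semicolon" >&2
    echo "Note : deprecated API" >&2
}

# Merge stderr into stdout, then drop every line that starts with "Note :".
out=$(fake_compiler 2>&1 | grep -v '^Note :')
echo "$out"
```

Without the 2>&1, the notes would bypass the pipe and still show up on the terminal.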
