I have created a salmonify function in R that takes all the fastq files in a folder and runs salmon quant on them from the command line, one after another, using the system function. However, after the system calls complete successfully, the rest of the salmonify function does not run: it does not print the final statement, and if I nest the salmonify function in another function (to run it for multiple study folders), it stops at the same place.
I am using Linux (Ubuntu 18.04). Salmon is the latest version (1.1.0), and R is version 3.6.
The code is:
salmonify_single = function(hardrive, study, index){
salmon_file = "salmon quant -i "
code1= " -l A -r fastq_files/"
code3 = " -o Salmon/quant/"
name = paste0(hardrive, "/", study, "_data/fastq_files")
L = list.files(name)
O = gsub(x = L, pattern = "_1.fastq", replacement = "")
code4 = O[! duplicated(O)]
lst = as.list(L)
lst3 = list()
for (i in seq_along(lst)) {
x = lst[[i]]
lst3[i] = paste0(salmon_file, index, code1, x[1], code3, code4[i])
}
for (i in lst3) {
print(i)
}
fold_name = paste0(hardrive, "/", study, "_data")
setwd(fold_name)
for (i in lst3) {
system(i)
}
print(paste0("Completed salmonify for ", study))
}
The output for printing the list that will run from
for (i in lst3) {
print(i)
}
is :
> salmonify_paired("DATA/RNAseq_Analysis3",
+ "dikovskaya", "salmon_index_k31")
[1] "salmon quant -i salmon_index_k31 -l A -1 fastq_files/SRR2095296_1.fastq -2 fastq_files/SRR2095296_2.fastq -o Salmon/quant/SRR2095296"
[1] "salmon quant -i salmon_index_k31 -l A -1 fastq_files/SRR2095297_1.fastq -2 fastq_files/SRR2095297_2.fastq -o Salmon/quant/SRR2095297"
[1] "salmon quant -i salmon_index_k31 -l A -1 fastq_files/SRR2095298_1.fastq -2 fastq_files/SRR2095298_2.fastq -o Salmon/quant/SRR2095298"
[1] "salmon quant -i salmon_index_k31 -l A -1 fastq_files/SRR2095299_1.fastq -2 fastq_files/SRR2095299_2.fastq -o Salmon/quant/SRR2095299"
[1] "salmon quant -i salmon_index_k31 -l A -1 fastq_files/SRR2095300_1.fastq -2 fastq_files/SRR2095300_2.fastq -o Salmon/quant/SRR2095300"
[1] "salmon quant -i salmon_index_k31 -l A -1 fastq_files/SRR2095301_1.fastq -2 fastq_files/SRR2095301_2.fastq -o Salmon/quant/SRR2095301"
Then the salmon function begins, and then I can't get it to print Completed salmonify for.... Any help here would be much appreciated.
Best,
James
Related
I have the following bash script in which an R script is called
#!/bin/bash
declare -x a=33
declare -x b=1
declare -x c=0
Rscript --vanilla MWE.R $a $b $c
echo $a $b $c
I want to modify the bash variables in the R script and return their modified values in the bash script because I am then passing the modified variables somewhere else. The R script is
#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
Rb = as.numeric(args[2])
Rc = as.numeric(args[3])
Rb = Rb + 1
Rc = Rc + 1
args[2]=Rb
args[3]=Rc
print(c(args[1],args[2],args[3]))
However, the output of the print and echo respectively are:
[1] "33" "2" "1"
33 1 0
which shows that the new values aren't passed from R to bash. What am I doing wrong?
Rscript runs as a child process, and a child process cannot change the variables of the bash script that launched it, so you will need to capture the R output in the bash script.
One of the many possibilities is to use an array:
#!/bin/bash
declare a=33
declare b=1
declare c=0
declare -a RESULT
RESULT=($(Rscript --vanilla MWE.R $a $b $c))
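# print() emits: [1] "33" "2" "1", so RESULT[0] holds the "[1]" index label
# and each value still carries its double quotes; strip them on assignment.
a=${RESULT[1]//\"/}
b=${RESULT[2]//\"/}
c=${RESULT[3]//\"/}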
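One way to sidestep the index label and the quotes is to have the child print plain space-separated values and let the shell split them. This is a minimal sketch, assuming the R script is changed to write its result with cat() rather than print(). The child function is a hypothetical stand-in for the Rscript call so that the example runs without R:

```shell
#!/bin/sh
# Hypothetical stand-in for: Rscript --vanilla MWE.R "$a" "$b" "$c"
# It prints the first value unchanged and the other two incremented by one.
child() {
    echo "$1 $(( $2 + 1 )) $(( $3 + 1 ))"
}

a=33
b=1
c=0

# Capture the child's stdout and word-split it into $1, $2, $3
set -- $(child "$a" "$b" "$c")
a=$1
b=$2
c=$3

echo "$a $b $c"   # prints: 33 2 1
```

Using set -- keeps the sketch POSIX sh compatible; in bash an array, as above, works equally well.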
I need to build long command lines in R and pass them to system(). I find it very inconvenient to use paste0/paste, or even sprintf, to build each command line. Is there a simpler way to do it?
Instead of this hard-to-read-and-too-many-quotes:
cmd <- paste("command", "-a", line$elem1, "-b", line$elem3, "-f", df$Colum5[4])
or:
cmd <- sprintf("command -a %s -b %s -f %s", line$elem1, line$elem3, df$Colum5[4])
Can I have this:
cmd <- buildcommand("command -a %line$elem1 -b %line$elem3 -f %df$Colum5[4]")
For a tidyverse solution see https://github.com/tidyverse/glue. Example
name="Foo Bar"
glue::glue("How do you do, {name}?")
With version 1.1.0 (CRAN release on 2016-08-19), the stringr package has gained a string interpolation function str_interp() which is an alternative to the gsubfn package.
# sample data
line <- list(elem1 = 10, elem3 = 30)
df <- data.frame(Colum5 = 1:4)
# do the string interpolation
stringr::str_interp("command -a ${line$elem1} -b ${line$elem3} -f ${df$Colum5[4]}")
#[1] "command -a 10 -b 30 -f 4"
This comes pretty close to what you are asking for. When any function f is prefaced with fn$, i.e. fn$f, character interpolation will be performed, replacing each backtick-quoted `...` with the result of running ... as an R expression.
library(gsubfn)
cmd <- fn$identity("command -a `line$elem1` -b `line$elem3` -f `df$Colum5[4]`")
Here is a self contained reproducible example:
library(gsubfn)
# test inputs
line <- list(elem1 = 10, elem3 = 30)
df <- data.frame(Colum5 = 1:4)
fn$identity("command -a `line$elem1` -b `line$elem3` -f `df$Colum5[4]`")
## [1] "command -a 10 -b 30 -f 4"
system
Since any function can be used we could operate directly on the system call like this. We have used echo here to make it executable but any command could be used.
exitcode <- fn$system("echo -a `line$elem1` -b `line$elem3` -f `df$Colum5[4]`")
## -a 10 -b 30 -f 4
Variation
This variation would also work. fn$f also performs substitution of $whatever with the value of variable whatever. See ?fn for details.
with(line, fn$identity("command -a $elem1 -b $elem3 -f `df$Colum5[4]`"))
## [1] "command -a 10 -b 30 -f 4"
Another option would be to use whisker.render from https://github.com/edwindj/whisker which is a {{Mustache}} implementation in R. Usage example:
require(dplyr); require(whisker)
bedFile="test.bed"
whisker.render("processing {{bedFile}}") %>% print
Not really a string interpolation solution, but another very good option for this problem is to use the processx package instead of system(): the arguments are passed as a character vector, so you don't need to quote anything.
library(GetoptLong)
# sample values, assumed here so that the example runs
region = c(1, 2)
value = 4
name = "name"
str = qq("region = (#{region[1]}, #{region[2]}), value = #{value}, name = '#{name}'")
cat(str)
qqcat("region = (#{region[1]}, #{region[2]}), value = #{value}, name = '#{name}'")
https://cran.r-project.org/web/packages/GetoptLong/vignettes/variable_interpolation.html
My main question is how to split strings on the command line into parameters using a terminal command in Linux?
For example
on the command line:
./my_program hello world "10 20 30"
The parameters are set as:
$1 = hello
$2 = world
$3 = 10 20 30
But I want:
$1 = hello
$2 = world
$3 = 10
$4 = 20
$5 = 30
How can I do it correctly?
You can reset the positional parameters $@ by using the set builtin. If you do not double-quote $@, the shell will word-split it, producing the behavior you desire:
$ cat my_program.sh
#! /bin/sh
i=1
for PARAM; do
echo "$i = $PARAM";
i=$(( $i + 1 ));
done
set -- $@
echo "Reset \$@ with word-split params"
i=1
for PARAM; do
echo "$i = $PARAM";
i=$(( $i + 1 ));
done
$ sh ./my_program.sh foo bar "baz buz"
1 = foo
2 = bar
3 = baz buz
Reset $@ with word-split params
1 = foo
2 = bar
3 = baz
4 = buz
As an aside, I find it mildly surprising that you want to do this. Many shell programmers are frustrated by the shell's easy, accidental word-splitting — they get "John", "Smith" when they wanted to preserve "John Smith" — but it seems to be your requirement here.
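One caveat with relying on unquoted expansion is that the shell also performs filename globbing on the resulting words. A minimal sketch of the same re-splitting idea with globbing switched off around the split; show_split is a hypothetical helper name, not part of the script above:

```shell
#!/bin/sh
# Hypothetical helper: re-split all arguments on whitespace and report them.
show_split() {
    set -f          # disable globbing so a literal * is not expanded
    set -- $*       # unquoted expansion: the shell word-splits every argument
    set +f          # restore globbing
    echo "$# args: $*"
}

result=$(show_split hello world "10 20 30")
echo "$result"   # prints: 5 args: hello world 10 20 30
```

The three original arguments become five positional parameters, which is the layout asked for in the question.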
Use xargs:
echo "10 20 30" | xargs ./my_program hello world
xargs is a command on Unix and most Unix-like operating systems used
to build and execute command lines from standard input. Commands such as
grep and awk can accept the standard input as a parameter, or argument
by using a pipe. However, others such as cp and echo disregard the
standard input stream and rely solely on the arguments found after the
command. Additionally, under the Linux kernel before version 2.6.23,
and under many other Unix-like systems, arbitrarily long lists of
parameters cannot be passed to a command,[1] so xargs breaks the list
of arguments into sublists small enough to be acceptable.
(source)
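To see how xargs appends the words it reads from standard input after the fixed arguments, here is a small sketch that uses printf in place of ./my_program (an assumption made so that the example runs anywhere):

```shell
#!/bin/sh
# xargs splits its standard input on whitespace and appends the words
# to the command line: printf '<%s>' hello world 10 20 30
joined=$(echo "10 20 30" | xargs printf '<%s>' hello world)
echo "$joined"   # prints: <hello><world><10><20><30>
```

Each bracketed item is one separate argument as the command received it.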
I have a Unix script to get files via ftp that looks something like this:
#!/bin/sh
HOST='1.1.1.1'
USER='user'
PASSWD='pass'
FILE='1234'
ftp -n $HOST <<END_SCRIPT
quote USER $USER
quote PASS $PASSWD
cd .LogbookPlus
get $FILE
quit
END_SCRIPT
exit 0
Instead of getting a specific file, I want to get the last modified file in a folder, or all files created in the last 24 hours. Is this possible via ftp?
This is really pushing the FTP client further than it should be pushed, but it is possible.
Note that the LS_FILE_OFFSET might be different on your system and this won't work at all if the offset is wrong.
#!/bin/sh
HOST='1.1.1.1'
USER='user'
PASSWD='pass'
DIRECTORY='.LogbookPlus'
FILES_TO_GET=1
LS_FILE_OFFSET=57 # Check directory_listing to see where filename begins
rm -f directory_listing
# get listing from directory sorted by modification date
ftp -n $HOST > directory_listing <<fin
quote USER $USER
quote PASS $PASSWD
cd $DIRECTORY
ls -t
quit
fin
# parse the filenames from the directory listing
files_to_get=`cut -c $LS_FILE_OFFSET- < directory_listing | head -n $FILES_TO_GET`
# make a set of get commands from the filename(s)
cmd=""
for f in $files_to_get; do
cmd="${cmd}get $f
"
done
# go back and get the file(s)
ftp -n $HOST <<fin
quote USER $USER
quote PASS $PASSWD
cd $DIRECTORY
$cmd
quit
fin
exit 0
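Because the fixed LS_FILE_OFFSET is fragile, one possible alternative is to pick the file name out by field position instead of by character column. The following is only a sketch: the listing text is made-up sample data in the common nine-column ls -l layout, and a real server may format its listing differently.

```shell
#!/bin/sh
# Hypothetical 'ls -t' style listing, newest first (sample data, not real server output)
listing='-rw-r--r--   1 user     group        1024 Nov 20 02:07 newest.log
-rw-r--r--   1 user     group         512 Nov 19 23:10 older file.log
drwxr-xr-x   2 user     group        4096 Nov 18 10:00 subdir'

# Keep regular files only, blank out the first 8 columns,
# trim the leading blanks that remain, and take the first (newest) name.
newest=$(printf '%s\n' "$listing" |
    awk '/^-/ { for (i = 1; i <= 8; i++) $i = ""; sub(/^ +/, ""); print }' |
    head -n 1)

echo "$newest"   # prints: newest.log
```

Note that rebuilding the record this way collapses runs of blanks inside a file name into single spaces, so names containing consecutive spaces would still need special handling.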
You really should have given more information about the systems you are using; for example, not every ftp server supports the ls -t that @JesseParker uses. I took the opportunity to put some ideas that I have used myself for some time into a script that uses awk to do the dirty deeds. As you can see, knowing what flavor of Unix your client uses would be beneficial. I have tested this script under Debian Wheezy GNU/Linux and FreeBSD 9.2.
#!/bin/sh
# usage: <this_script> <num_files> <date...> [ <...of...> <...max....> <...age...> ... ]
#
# Fetches files from preconfigured ftp server to current directory.
# Maximum number of files is <num_files>
# Only files that have a newer modification time than given date are considered.
# This date is given according to the local 'date' command, which is very different
# on BSD and GNU systems, e.g.:
#
# GNU:
# yesterday
# last year
# Jan 01 1970
#
# BSD:
# -v-1d # yesterday (now minus 1 day)
# -v-1y # last year (now minus 1 year)
# -f %b %e %C%y Jan 01 1970 # format: month day century year
#
# Script tries to autodetect date system, YMMV.
#
# BUGS:
# Does not like quotation marks (") in file names, maybe much more.
#
# Should not have credentials inside this file, but maybe have them
# in '.netrc' and not use 'ftp -n'.
#
# Plenty more.
#
HOST='1.1.1.1'
USER='user'
PASSWD='pass'
DIR='.LogbookPlus'
# Date format for numerical comparison. Can be simply +%s if supported.
DATE_FMT=+%C%y%m%d%H%M%S
# The server's locale for date strings.
LC_SRV_DATE=C
# The 'date' command from BSD systems and that from the GNU coreutils
# are completely different. Test for the appropriate system here:
if LC_ALL=C date -j -f "%b %e %C%y" "Jan 01 1970" $DATE_FMT > /dev/null 2>&1 ; then
SYS_TYPE=BSDish
elif LC_ALL=C date -d "Jan 01 1970" $DATE_FMT > /dev/null 2>&1 ; then
SYS_TYPE=GNUish
else
echo "sh: don't know how to date ;-) sorry!"
exit 1;
fi
# Max. number of files to get (newest files first)
MAX_NUM=$(( ${1:-1} + 0 )) # ensure argv[1] is treated as a number
shift
# Max. age of files. Only files newer than this will be considered.
if [ GNUish = "$SYS_TYPE" ] ; then
MAX_AGE=$( date "$DATE_FMT" -d "${*:-yesterday}" )
elif [ BSDish = "$SYS_TYPE" ] ; then
MAX_AGE=$( date -j "${*:--v-1d}" "$DATE_FMT" )
fi
# create temporary file
TMP_FILE=$(mktemp)
trap 'rm -f "$TMP_FILE"' EXIT INT TERM HUP
ftp -i -n $HOST <<END_FTP_SCRIPT | \
awk -v max_age="$MAX_AGE" \
-v max_num="$MAX_NUM" \
-v date_fmt="$DATE_FMT" \
-v date_loc="$LC_SRV_DATE" \
-v sys_type="$SYS_TYPE" \
-v tmp_file="$TMP_FILE" '
BEGIN {
# columns in the 'dir' output from the ftp server:
# drwx------ 1 user group 4096 Apr 8 2009 Mail
# -rw------- 1 user group 13052 Nov 20 02:07 .bash_history
perm=1; links=2; user=3; group=4; size=5; month=6; day=7; yeartime=8; # name=$9..$NF
if ( "BSDish" == sys_type ) {
date_cmd="LC_ALL=" date_loc " date -j -f"
} else if ( "GNUish" == sys_type ) {
date_cmd="LC_ALL=" date_loc " date -d"
} else {
print "awk: don'\''t know how to date ;-) sorry!" > "/dev/stderr"
exit 1;
}
files[""] = ""
file_cnt = 0
out_cmd = "sort -rn | head -n " max_num " > " tmp_file
}
$perm ~ /^[^-]/ { # skip non-regular files
next
}
{
if ( "BSDish" == sys_type ) {
if ( $yeartime ~ /[0-9][0-9][0-9][0-9]/ ) {
ts_fmt = "\"%b %e %C%y\""
} else if ( $yeartime ~ /[0-9][0-9]:[0-9][0-9]/ ) {
ts_fmt = "\"%b %e %H:%M\""
} else {
print "has neither year nor time: " $8
exit 1
}
} else { # tested in BEGIN: must be "GNUish"
ts_fmt = ""
}
cmd = date_cmd " " ts_fmt " \"" $month " " $day " " $yeartime "\" " date_fmt
cmd | getline timestamp
close( cmd )
if ( timestamp > max_age ) {
# clear everything but the file name
$perm=$links=$user=$group=$size=$month=$day=$yeartime=""
files[ file_cnt,"name" ] = $0
files[ file_cnt,"time" ] = timestamp
++file_cnt
}
}
END {
for( i=0; i<file_cnt; ++i ) {
print files[ i,"time" ] "\t" files[ i,"name" ] \
| out_cmd
}
close( out_cmd )
print "quote USER '$USER'\nquote PASS '$PASSWD'\ncd \"'$DIR'\""
i = 0
while( (getline < tmp_file) > 0 ) {
$1 = "" # drop timestamp
gsub( /^ /,"" ) # strip leading space
print "get \"" $0 "\""
}
print "quit"
}
' \
| ftp -v -i -n $HOST
quote USER $USER
quote PASS $PASSWD
cd "$DIR"
dir .
quit
END_FTP_SCRIPT
I am working on a UNIX box, and trying to run an application, which gives some debug logs to the standard output. I have redirected this output to a log file, but now wish to get the lines where the error is being shown.
My problem here is that a simple
cat output.log | grep FAIL
does not help, because it shows only the lines that contain FAIL. I want more information along with this, such as the 2-3 lines above each line with FAIL. Is there any way to do this via a simple shell command? I would like to have a single command line (it can have pipes) to do the above.
grep -C 3 FAIL output.log
Note that this also gets rid of the useless use of cat (UUOC).
grep -A $NUM
This will print $NUM lines of trailing context after matches.
-B $NUM prints leading context.
man grep is your best friend.
So in your case:
grep -A 3 -B 3 FAIL output.log
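A quick way to compare the three context options is to run them against a small throwaway log. This sketch assumes a grep that supports -A, -B and -C (GNU and BSD grep both do), and the log contents are made up for illustration:

```shell
#!/bin/sh
# Build a small sample log (made-up contents)
log=$(mktemp)
printf '%s\n' 'step 1 ok' 'step 2 ok' 'step 3 FAIL' 'step 4 ok' 'step 5 ok' > "$log"

before=$(grep -B 1 FAIL "$log")   # the match plus 1 line of leading context
after=$(grep -A 1 FAIL "$log")    # the match plus 1 line of trailing context
both=$(grep -C 1 FAIL "$log")     # 1 line of context on each side

printf '%s\n' "$both"
# prints:
# step 2 ok
# step 3 FAIL
# step 4 ok

rm -f "$log"
```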
I have two implementations of what I call sgrep, one in Perl, one using just pre-Perl (pre-GNU) standard Unix commands. If you've got GNU grep, you've no particular need of these. It would be more complex to deal with forwards and backwards context searches, but that might be a useful exercise.
Perl solution:
#!/usr/perl/v5.8.8/bin/perl -w
#
# @(#)$Id: sgrep.pl,v 1.6 2007/09/18 22:55:20 jleffler Exp $
#
# Perl-based SGREP (special grep) command
#
# Print lines around the line that matches (by default, 3 before and 3 after).
# By default, include file names if more than one file to search.
#
# Options:
# -b n1 Print n1 lines before match
# -f n2 Print n2 lines following match
# -n Print line numbers
# -h Do not print file names
# -H Do print file names
use strict;
use constant debug => 0;
use Getopt::Std;
my(%opts);
sub usage
{
print STDERR "Usage: $0 [-hnH] [-b n1] [-f n2] pattern [file ...]\n";
exit 1;
}
usage unless getopts('hnf:b:H', \%opts);
usage unless @ARGV >= 1;
if ($opts{h} && $opts{H})
{
print STDERR "$0: mutually exclusive options -h and -H specified\n";
exit 1;
}
my $op = shift;
print "# regex = $op\n" if debug;
# print file names if -h omitted and more than one argument
$opts{F} = (defined $opts{H} || (!defined $opts{h} and scalar @ARGV > 1)) ? 1 : 0;
$opts{n} = 0 unless defined $opts{n};
my $before = (defined $opts{b}) ? $opts{b} + 0 : 3;
my $after = (defined $opts{f}) ? $opts{f} + 0 : 3;
print "# before = $before; after = $after\n" if debug;
my @lines = (); # Accumulated lines
my $tail = 0; # Line number of last line in list
my $tbp_1 = 0; # First line to be printed
my $tbp_2 = 0; # Last line to be printed
# Print lines from @lines in the range $tbp_1 .. $tbp_2,
# leaving $leave lines in the array for future use.
sub print_leaving
{
my ($leave) = @_;
while (scalar(@lines) > $leave)
{
my $line = shift @lines;
my $curr = $tail - scalar(@lines);
if ($tbp_1 <= $curr && $curr <= $tbp_2)
{
print "$ARGV:" if $opts{F};
print "$curr:" if $opts{n};
print $line;
}
}
}
# General logic:
# Accumulate each line at end of @lines.
# ** If current line matches, record range that needs printing
# ** When the line array contains enough lines, pop line off front and,
# if it needs printing, print it.
# At end of file, empty line array, printing requisite accumulated lines.
while (<>)
{
# Add this line to the accumulated lines
push @lines, $_;
$tail = $.;
printf "# array: N = %d, last = $tail: %s", scalar(@lines), $_ if debug > 1;
if (m/$op/o)
{
# This line matches - set range to be printed
my $lo = $. - $before;
$tbp_1 = $lo if ($lo > $tbp_2);
$tbp_2 = $. + $after;
print "# $. MATCH: print range $tbp_1 .. $tbp_2\n" if debug;
}
# Print out any accumulated lines that need printing
# Leave $before lines in array.
print_leaving($before);
}
continue
{
if (eof)
{
# Print out any accumulated lines that need printing
print_leaving(0);
# Reset for next file
close ARGV;
$tbp_1 = 0;
$tbp_2 = 0;
$tail = 0;
@lines = ();
}
}
Pre-Perl Unix solution (using plain ed, sed, and sort - though it uses getopt which was not necessarily available back then):
#!/bin/ksh
#
# @(#)$Id: old.sgrep.sh,v 1.5 2007/09/15 22:15:43 jleffler Exp $
#
# Special grep
# Finds a pattern and prints lines either side of the pattern
# Line numbers are always produced by ed (substitute for grep),
# which allows us to eliminate duplicate lines cleanly. If the
# user did not ask for numbers, these are then stripped out.
#
# BUG: if the pattern occurs in the first line or two and
# the number of lines to go back is larger than the line number,
# it fails dismally.
set -- `getopt "f:b:hn" "$@"`
case $# in
0) echo "Usage: $0 [-hn] [-f x] [-b y] pattern [files]" >&2
exit 1;;
esac
# Tab required - at least with sed (perl would be different)
# But then the whole problem would be different if implemented in Perl.
number="'s/^\\([0-9][0-9]*\\) /\\1:/'"
filename="'s%^%%'" # No-op for sed
f=3
b=3
nflag=no
hflag=no
while [ $# -gt 0 ]
do
case $1 in
-f) f=$2; shift 2;;
-b) b=$2; shift 2;;
-n) nflag=yes; shift;;
-h) hflag=yes; shift;;
--) shift; break;;
*) echo "Unknown option $1" >&2
exit 1;;
esac
done
pattern="${1:?'No pattern'}"
shift
case $# in
0) tmp=${TMPDIR:-/tmp}/`basename $0`.$$
trap "rm -f $tmp ; exit 1" 0
cat - >$tmp
set -- $tmp
sort="sort -t: -u +0n -1"
;;
*) filename="'s%^%'\$file:%"
sort="sort -t: -u +1n -2"
;;
esac
case $nflag in
yes) num_remove='s/[0-9][0-9]*://';;
no) num_remove='s/^//';;
esac
case $hflag in
yes) fileremove='s%^$file:%%';;
no) fileremove='s/^//';;
esac
for file in $*
do
echo "g/$pattern/.-${b},.+${f}n" |
ed - $file |
eval sed -e "$number" -e "$filename" |
$sort |
eval sed -e "$fileremove" -e "$num_remove"
done
rm -f $tmp
trap 0
exit 0
The shell version of sgrep was written in February 1989, and bug fixed in May 1989. It then remained unchanged except for an administrative change (SCCS to RCS transition) in 1997 until 2007, when I added the -h option. I switched to the Perl version in 2007.
http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx
You can use sed to print specific lines. Let's say you want line 20:
sed '20 p' -n FILE_YOU_WANT_THE_LINE_FROM
Done.
-n suppresses the default echoing of every line from the file. The part in quotes is the sed rule to apply: it selects line 20 and prints it.
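The same addressing also accepts a range, which gives context around a known line number. A minimal sketch, assuming seq is available to generate 30 numbered sample lines in place of a real file:

```shell
#!/bin/sh
# Print only line 20 of the input
line20=$(seq 1 30 | sed -n '20p')

# Print lines 18 through 22, i.e. line 20 with two lines of context on each side
around=$(seq 1 30 | sed -n '18,22p')

echo "$line20"   # prints: 20
echo "$around"   # prints 18, 19, 20, 21, 22 on separate lines
```

Unlike grep -C, this only helps once you already know the line number of interest.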
With GNU grep on Windows:
$ grep --context 3 FAIL output.log
$ grep --help | grep context
-B, --before-context=NUM print NUM lines of leading context
-A, --after-context=NUM print NUM lines of trailing context
-C, --context=NUM print NUM lines of output context
-NUM same as --context=NUM