Unix piping in system() does not work in R

I am using the system() command to run a Bash command from R, but every time I try to pipe the results of one command into the next (using '|'), I get an error.
For example:
system('grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p | awk -F "\t" '{print $2}'')
returns the error:
Error: unexpected '{' in "system('grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p | awk -F "\t" '{"
If I remove awk -F "\t" '{print $2}' so that I'm left with
system('grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p')
I get the following:
/usr/bin/grep: 2-: No such file or directory
[1] 2
I have to keep removing parts until I am left with only system('grep ^SN bam_stats.txt'), i.e. with no pipes left, for it to work.
Here is a sample from the file 'bam_stats.txt' from which I'm extracting information:
SN filtered sequences: 0
SN sequences: 137710356
SN is sorted: 1
SN 1st fragments: 68855178
SN last fragments: 68855178
SN reads mapped: 137642653
SN reads mapped and paired: 137602018 # paired-end technology bit set + both mates mapped
SN reads unmapped: 67703
SN percentage of properly paired reads (%): 99.8
Can someone tell me why piping is not working? Apologies if this is a stupid question. Please let me know if I should provide more information.
Thank you in advance.

I don't know R, but IF R's implementation of system() just passes its argument to a shell then, in terms of standard Unix quoting, your example
system('grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p | awk -F "\t" '{print $2}'')
contains 2 strings within quotes and a string in the middle that's outside of quotes:
Inside: grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p | awk -F "\t"
Outside: {print $2}
Inside: <a null string>
because the two single quotes around {print $2} actually end the first quoted string and then start a second one.
You don't need sed, grep, or cut if you're using awk anyway, so try just this:
system('awk -F"\t" "/^SN/ && (++cnt==8){print \\$3}" bam_stats.txt')
(Note the doubled backslash in \\$3: R consumes one backslash itself, and a bare \$ is an unrecognized escape in an R string.)
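If you want to keep the original pipeline instead, one way around the clash (a sketch, again assuming system() hands the string to a shell) is to use double quotes for the R string so that awk's single quotes reach the shell intact:
system("grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p | awk -F '\\t' '{print $2}'")
Here R turns \\t into \t, the shell passes '\t' to awk untouched, and awk interprets it as a tab field separator.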

Related

Can someone explain the following unix command?

I want to validate the file. As part of the validation, I need to check the length of each column, whether it is null or not null, and the primary key constraint of that file.
cat File_name | awk -F '|' '{print NF}' | sort | uniq
This command splits each line of the file into tokens using the pipe | as the delimiter, prints the number of tokens on each row (the NF variable), sorts the output (sort command), and at the end keeps only the unique counts (uniq command).
The pipeline can be optimised by getting rid of the cat command (awk can read the file directly) and by using sort's -u parameter to get unique records:
awk -F '|' '{print NF}' file_name | sort -u
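For the rest of the asker's validation (column count and empty fields), a small awk sketch along the same lines could be a starting point; the | delimiter comes from the question, but the expected field count of 5 is only a placeholder:
awk -F '|' '
    NF != 5 { printf "line %d: expected 5 fields, got %d\n", NR, NF }   # 5 is a placeholder
    { for (i = 1; i <= NF; i++) if ($i == "") printf "line %d: field %d is empty\n", NR, i }
' file_name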

How to print the last but one record of a file using sed? [duplicate]

I have a file that has the following as the last three lines. I want to retrieve the penultimate line, i.e. 100.000;8438; 06:46:12.
.
.
.
99.900; 8423; 06:44:41
100.000;8438; 06:46:12
Number of patterns: 8438
I don't know the line number. How can I retrieve it using a shell script? Thanks in advance for your help.
Try this:
tail -2 yourfile | head -1
A short sed one-liner inspired by https://stackoverflow.com/a/7671772/5287901
sed -n 'x;$p'
Explanation:
-n quiet mode: don't automatically print the pattern space
x: exchange the pattern space and the hold space (the hold space now stores the current line, and the pattern space the previous line, if any)
$: on the last line, p: print the pattern space (the previous line, which at that point is the penultimate line)
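A quick check on the asker's sample (assuming it is saved as file.txt):
sed -n 'x;$p' file.txt
which prints 100.000;8438; 06:46:12.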
ed and sed can do it as well.
str='
99.900; 8423; 06:44:41
100.000;8438; 06:46:12
Number of patterns: 8438
'
printf '%s' "$str" | sed -n -e '${x;1!p;};h' # print last line but one
printf '%s\n' H '$-1p' q | ed -s <(printf '%s' "$str") # same
printf '%s\n' H '$-2,$-1p' q | ed -s <(printf '%s' "$str") # print last line but two
From: Useful sed one-liners by Eric Pement
# print the next-to-the-last line of a file
sed -e '$!{h;d;}' -e x # for 1-line files, print blank line
sed -e '1{$q;}' -e '$!{h;d;}' -e x # for 1-line files, print the line
sed -e '1{$d;}' -e '$!{h;d;}' -e x # for 1-line files, print nothing
You don't need all of them, just pick one.
tail -n +2 <filename>
This prints from the second line to the last line.
To clarify what has already been said: if
ec2thisandthat | sort -k 5 | grep 2012- | awk '{print $2}'
outputs
snap-e8317883
snap-9c7227f7
snap-5402553f
snap-3e7b2c55
snap-246b3c4f
snap-546a3d3f
snap-2ad48241
snap-d00150bb
then appending | tail -2 | head -1 returns the penultimate line:
snap-2ad48241
tac <file> | sed -n '2p'
(tac reverses the file, so the second line of the reversed output is the penultimate line of the original.)

grep multiple files get count of unique cut

I think I'm close on this, and saw similar questions but couldn't get it to work as I want. So, I have several log files and I would like to count the occurrences of several different service calls by date.
First I tried the command below; the cut is just to get the first field (the date) and the 11th field (the name of the service call), which is specific to my log file:
grep -E "invoking webservice" *.log* | cut -d ' ' -f1 -f11 | sort | uniq -c
But this returned something that looks like:
5 log_1.log:2017-12-05 getLegs()
10 log_1.log:2017-12-05 getArms()
7 log_2.log:2017-12-05 getLegs()
13 log_2.log:2017-12-04 getLegs()
What I really want is:
12 2017-12-05 getLegs()
10 2017-12-05 getArms()
13 2017-12-04 getLegs()
I've seen examples where they cat * first, but that gives the same problem:
cat * | grep -E "invoking webservice" *.log* | cut -d ' ' -f1 -f11 | sort | uniq -c
What am I doing wrong? As always, thanks a lot!
Your issue seems to be that grep prefixes the matched lines with the filenames. (grep has this behavior when multiple filenames are specified, to disambiguate the results.) You can pass the -h flag to grep to not print the filenames:
grep -h "invoking webservice" *.log | cut -d ' ' -f1,11 | sort | uniq -c
(Note also that cut expects a single field list: -f1,11 rather than -f1 -f11.)
Note that I dropped the -E flag, because it is used to enable extended regex support, and your example doesn't need it.
Alternatively, you could use cat to dump the content of the files to standard output and pipe that to grep. That works because it removes the need for filename parameters for grep (your earlier cat attempt still passed *.log* to grep, so grep read the files itself and ignored its standard input):
cat *.log | grep "invoking webservice" | cut -d ' ' -f1,11 | sort | uniq -c
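As an aside, awk alone could do the whole job in one pass; a sketch, assuming the same field positions as in the question (date in field 1, service name in field 11):
awk '/invoking webservice/ { count[$1 " " $11]++ }
     END { for (k in count) print count[k], k }' *.log
awk reads the files directly, so no filename prefix appears; the END loop prints in arbitrary order, so pipe through sort if the order matters.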

long running process id based on grep condition and send mail

ps -eaf | grep LaunchKTRProcess | grep -v grep
This command gives me the full details of the process, but then I have to check its running time manually and kill it.
This
ps -e | sed 1d | egrep -v '^ *[^ ]+ +[^ ]+ +([^ ]|00:0.):'
prints the ps entries of all processes with more than ten minutes of CPU time, per the TIME column of ps -e (I use sed 1d to remove the ps header line, because not every ps has an option to suppress it); you can filter the output based on further conditions. Then
| awk '{print $1}'
extracts the PIDs (1st column).
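Tying the pieces together, a sketch of the whole task (the mail address and the use of mail(1) are placeholders):
# find LaunchKTRProcess PIDs past the ten-minute CPU mark
pids=$(ps -e | sed 1d | grep LaunchKTRProcess |
       egrep -v '^ *[^ ]+ +[^ ]+ +([^ ]|00:0.):' | awk '{print $1}')
if [ -n "$pids" ]; then
    # report, then terminate
    echo "Killing long-running LaunchKTRProcess PIDs: $pids" |
        mail -s "LaunchKTRProcess cleanup" admin@example.com
    kill $pids
fi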

How to keep a file's format if you use the uniq command (in shell)?

In order to use the uniq command, you have to sort your file first.
But in the file I have, the order of the information is important, so how can I keep the original format of the file but still get rid of the duplicate content?
Another awk version:
awk '!_[$0]++' infile
This awk keeps the first occurrence of each line: _[$0] is zero (so !_[$0]++ is true and the line is printed via awk's default action) only the first time a line is seen, and is incremented afterwards. It is the same algorithm as the other answers use:
awk '!($0 in lines) { print $0; lines[$0]; }'
Here's one that only needs to store duplicated lines (as opposed to all lines) using awk:
sort file | uniq -d | awk '
FNR == NR { dups[$0] }
FNR != NR && (!($0 in dups) || !lines[$0]++)
' - file
There's also the "line-number, double-sort" method:
nl -n ln | sort -u -k 2 | sort -k 1n | cut -f 2-
(nl numbers each line, sort -u -k 2 de-duplicates on the content while keeping the first occurrence, sort -k 1n restores the original order, and cut -f 2- strips the line numbers again.)
You can run uniq -d on the sorted version of the file to find the duplicate lines, then run some script that says:
if this_line is in duplicate_lines {
    if not i_have_seen[this_line] {
        output this_line
        i_have_seen[this_line] = true
    }
} else {
    output this_line
}
Using only uniq and grep:
Create d.sh:
#!/bin/sh
sort "$1" | uniq > "${1}_uniq"
while IFS= read -r line; do
    # take the first (and only) matching line from the uniq'd copy...
    grep -m1 -Fx "$line" "${1}_uniq" >> "${1}_out"
    # ...then drop it, so later duplicates of the same line find nothing
    grep -v -Fx "$line" "${1}_uniq" > "${1}_uniq2"
    mv "${1}_uniq2" "${1}_uniq"
done < "$1"
rm "${1}_uniq"
Example:
./d.sh infile
You could use some horrible O(n^2) thing, like this (Pseudo-code):
file2 = EMPTY_FILE
for each line in file1:
    if not line in file2:
        file2.append(line)
This is potentially rather slow, especially if implemented at the Bash level. But if your files are reasonably short, it will probably work just fine, and would be quick to implement (not line in file2 is then just grep -v, and so on).
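For instance, a minimal Bash sketch of that idea, using grep -qxF for the whole-line literal membership test (file1 is the input, file2 the de-duplicated output):
: > file2                      # start with an empty output file
while IFS= read -r line; do
    grep -qxF -- "$line" file2 || printf '%s\n' "$line" >> file2
done < file1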
Otherwise you could of course code up a dedicated program, using some more advanced data structure in memory to speed it up.
sort file1 | uniq | while IFS= read -r line; do
    grep -n -m1 -Fx "$line" file1 >> out
done
sort -n out
First do the sort; then, for each unique value, grep for its first match (-m1), preserving the line numbers (-n). Finally, sort the output numerically (-n) by line number. You could then remove the line numbers with sed or awk.
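For example, a sketch with sed:
sort -n out | sed 's/^[0-9]*://'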
