Running R + awk from bash

A script ran fine in R, but failed when it was executed with R -q -e from bash.
The script that ran fine in R was:
R> sizes <- read.table(pipe("ls -l /tmp | awk '!/^total/ {print $5}'"))
R> summary(sizes)
The command pattern from bash followed a previous discussion, but generated error messages:
R -q -e "x <- read.table(pipe("ls -l /tmp | awk '!/^total/ {print $5}'"));summary(x)"
awk: line 1: extra ')'
awk: line 1: extra ')'
awk: line 1: syntax error at or near ;
What's wrong with the above command?
root@kali:~# uname -a
Linux kali 3.18.0-kali3-586 #1 Debian 3.18.6-1~kali2 (2015-03-02) i686 GNU/Linux

Try this
ls -l /tmp | awk '!/^total/ {print $5}' | R --slave -e 'x <- scan(file="stdin"); summary(x)'
If you're trying to get stats on all the files in a particular directory hierarchy something like this is probably better:
find /tmp -type f -exec du {} \; | awk '{print $1}' | R --slave -e 'x <- scan(file="stdin"); summary(x)'
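For the record, the original one-liner fails because the inner double quotes end the -e string early; the shell then runs the pipe itself and awk receives the trailing R code, hence the extra ')' messages. A sketch of the same command with the inner quotes and the $ escaped for bash:
R -q -e "x <- read.table(pipe(\"ls -l /tmp | awk '!/^total/ {print \$5}'\")); summary(x)"
The \" survives for R and the \$ keeps $5 away from the shell, but piping the data into R --slave as above avoids the nesting entirely.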

How can pipes and grep and wc be combined to just give a count of the phrase “syntax ok”

Something like the following…
cd /usr/IBMIHS/bin/ |
apachectl -t -f /usr/IBMIHS/conf/AAA/httpd.conf |
apachectl -t -f /usr/IBMIHS/conf/AAA/siteAA.conf |
grep "^Syntax OK" | wc
Simply group the commands with curly braces and use grep -c:
{
apachectl -t -f /usr/IBMIHS/conf/AAA/httpd.conf
apachectl -t -f /usr/IBMIHS/conf/AAA/siteAA.conf
} |& grep -c "Syntax OK"
From man grep
-c, --count
Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match option (see below), count non-matching lines.
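Note that apachectl -t normally writes its "Syntax OK" message to stderr rather than stdout, which is what |& (bash 4+ shorthand for 2>&1 |) takes care of above. A portable spelling of the same grouping:
{
apachectl -t -f /usr/IBMIHS/conf/AAA/httpd.conf
apachectl -t -f /usr/IBMIHS/conf/AAA/siteAA.conf
} 2>&1 | grep -c "Syntax OK"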

dynamically pass string to Rscript argument with sed

I wrote a script in R that has several arguments. I want to iterate over 20 directories and execute my script on each while passing in a substring from the file path as my -n argument using sed. I ran the following:
find . -name 'xray_data' -exec sh -c 'Rscript /Users/Caitlin/Desktop/DeMMO_Pubs/DeMMO_NativeRock/DeMMO_NativeRock/R/scipts/dataStitchR.R -f {} -b "{}/SEM_images" -c "{}/../coordinates.txt" -z ".tif" -m ".tif" -a "Unknown|SEM|Os" -d "overview" -y "overview" --overview "overview.*tif" -p FALSE -n "`sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/' "{}"`"' sh {} \;
which results in this error:
ubs/DeMMO_NativeRock/DeMMO_NativeRock/R/scipts/dataStitchR.R -f {} -b "{}/SEM_images" -c "{}/../coordinates.txt" -z ".tif" -m ".tif" -a "Unknown|SEM|Os" -d "overview" -y "overview" --overview "overview.*tif" -p FALSE -n "`sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/' "{}"`"' sh {} \;
sh: command substitution: line 0: syntax error near unexpected token `('
sh: command substitution: line 0: `sed -e s/.*DeMMO.*[/](.*)_.*[/]xray_data/1/ "./DeMMO1/D1T3rep_Dec2019_Ellison/xray_data"'
When I try to use sed with my pattern on an example file path, it works:
echo "./DeMMO1/D1T1exp_Dec2019_Poorman/xray_data" | sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/'
which produces the correct substring:
D1T1exp_Dec2019
I think there's an issue with using single quotes inside the interpreted string, but I don't know how to deal with it. I have tried replacing the single quotes around the sed pattern with double quotes, as well as removing them; both result in this error:
sed: RE error: illegal byte sequence
How should I extract the substring from the file path dynamically in this case?
To loop through the output of find, use a while read loop:
while IFS= read -ru "$fd" -d '' files; do
echo "$files" ##: do whatever you want to do with the files here.
done {fd}< <(find . -type f -name 'xray_data' -print0)
This avoids embedding commands in quotes.
It uses a dynamically allocated fd, just in case something inside the loop eats/slurps stdin.
Also, -print0 delimits the files with NUL bytes, so it is safe enough to handle spaces, tabs, and newlines in path and file names.
A good habit is to put an echo in front of every command you plan to run on the files, so you can see what is going to be executed before it actually happens.
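Putting it together, a sketch of how the Rscript call could sit inside that loop, deriving the sample ID with plain parameter expansion instead of sed (the DeMMO*/<sample>_<suffix>/xray_data path layout is assumed from the question):
while IFS= read -ru "$fd" -d '' dir; do
  name=${dir%/xray_data}   # ./DeMMO1/D1T3rep_Dec2019_Ellison
  name=${name##*/}         # D1T3rep_Dec2019_Ellison
  sampleID=${name%_*}      # D1T3rep_Dec2019, same result as the sed pattern
  echo Rscript dataStitchR.R -f "$dir" -n "$sampleID"  # drop the echo once the output looks right
done {fd}< <(find . -name 'xray_data' -print0)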
This is the solution that ultimately worked for me due to issues with quotes in sed:
for dir in `find . -name 'xray_data'`;
do sampleID="`basename $(dirname $dir) | cut -f1 -d'_'`";
Rscript /Users/Caitlin/Desktop/DeMMO_Pubs/DeMMO_NativeRock/DeMMO_NativeRock/R/scipts/dataStitchR.R -f "$dir" -b "$dir/SEM_images" -c "$dir/../coordinates.txt" -z ".tif" -m ".tif" -a "Unknown|SEM|Os" -d "overview" -y "overview" --overview "overview.*tif" -p FALSE -n "$sampleID";
done

xargs to copy one file into several

I have a directory that has one file with information (call it masterfile.inc) and several files that are empty (call them file1.inc-file20.inc)
I'm trying to formulate an xargs command that copies the contents of masterfile.inc into all of the empty files.
So far I have
ls -ltr | awk '{print $9}' | grep -v masterfile | xargs -I {} cat masterfile.inc > {}
Unfortunately, all this does is create a single file literally named {} and print masterfile.inc into it N times.
Is there something I'm missing with the syntax here?
Thanks in advance
The redirection > {} is handled once by your interactive shell before xargs ever runs, which is why you end up with one file named {}. You can use this command to copy the file 20 times instead:
$ tee <masterfile.inc >/dev/null file{1..20}.inc
Note: file{1..20}.inc will expand to file1.inc, file2.inc, ..., file20.inc
If your destination filenames are random:
$ shopt -s extglob
$ tee <masterfile.inc >/dev/null $(ls !(masterfile.inc))
Note: $(ls !(masterfile.inc)) will expand to every file in the current directory except masterfile.inc (please don't use spaces in filenames)
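If you would rather avoid extglob and tee altogether, a plain loop does the same job (filenames taken from the question):
for f in file{1..20}.inc; do
  cp masterfile.inc "$f"
done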
While the tee trick is really brilliant, you might be interested in a solution that is easier to adapt to other situations. Here using GNU Parallel:
ls -ltr | awk '{print $9}' | grep -v masterfile | parallel "cat masterfile.inc > {}"
It takes literally 10 seconds to install GNU Parallel:
wget pi.dk/3 -qO - | sh -x
Watch the intro videos to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

problem in a shell command

I am trying the following command on the command line:
ps -u `id | cut -f2 -d"=" | cut -f1 -d"("` -f | grep ppLSN | awk '{print $9}' | awk '{FS="=";print $2}' | grep KLMN | wc -l
The command returns the value 7.
But when I put the same command inside a script, abc_sh, like below:
ps -u `id | cut -f2 -d"=" | cut -f1 -d"("` -f | grep ppLSN | awk '{print $9}' | awk '{FS="=";print $2}' | grep $XYZ | wc -l
and call the script on the command line as abc_sh XYZ=KLMN, it does not work and returns 0.
The problem is with the grep in the command: grep $XYZ.
Could anybody please tell me why this is not working?
Because your $1 variable (first argument to the script) is set to XYZ=KLMN.
Just use abc_sh KLMN and grep $1 instead of grep $XYZ.
(Assuming we are talking about bash here)
The other alternative is defining a temporary environment variable in which case you would have to call it like this: XYZ=KLMN abc_sh
EDIT:
Found what you were after: you have to use set -k (see SHELL BUILTIN COMMANDS in the bash manual)
-k All arguments in the form of assignment statements are
placed in the environment for a command, not just those
that precede the command name.
So
vinko@parrot:~$ more abc
#!/bin/bash
echo $XYZ
vinko@parrot:~$ set -k
vinko@parrot:~$ ./abc XYZ=KLMN
KLMN
vinko@parrot:~$ set +k
vinko@parrot:~$ ./abc XYZ=KLMN
vinko@parrot:~$
So, the place where this was working probably has set -k in one of the startup scripts (bashrc or profile.)
Try any of these to set a temporary environment variable:
XYZ=KLMN abc_sh
env XYZ=KLMN abc_sh
(export XYZ=KLMN; abc_sh)
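For completeness, a minimal sketch of the script rewritten to take the pattern as its first argument (the ppLSN and field-9 details are carried over from the question):
#!/bin/bash
# usage: abc_sh KLMN
ps -u "$(id -u)" -f | grep ppLSN | awk '{split($9,a,"="); print a[2]}' | grep -c "$1"
Here grep -c replaces the trailing grep | wc -l pair.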
You are chaining together a lot of commands. A single awk program can do most of that work:
ps -u `id -u` -f | awk -v x="$XYZ" -v p="ppLSN" '$0~p{
m=split($9,a,"=")
if(a[2]~x){count++}
}
END{print count}'
Call this script:
#!/bin/ksh
ps -u $(id -u) -o args | grep "$XYZ" | cut -f2- -d " "
Like this:
XYZ=KLMN abc_sh

Multiple grep search/ignore patterns

I usually use the following pipeline to grep for a particular search string and yet ignore certain other patterns:
grep -Ri 64 src/install/ | grep -v \.svn | grep -v "file"| grep -v "2\.5" | grep -v "2\.6"
Can this be achieved in a succinct manner? I am using GNU grep 2.5.3.
Just pipe your unfiltered output into a single instance of grep and use an extended regexp to declare what you want to ignore:
grep -Ri 64 src/install/ | grep -v -E '(\.svn|file|2\.5|2\.6)'
Edit: To search multiple files, maybe try:
find ./src/install -type f -print |\
grep -v -E '(\.svn|file|2\.5|2\.6)' | xargs grep -i 64
Edit: Ooh, I forgot to add the simple trick that avoids a cringeworthy use of multiple grep instances, namely
ps -ef | grep something | grep -v grep
Replacing that with
ps -ef | grep "[s]omething"
removes the need for the second grep: [s]omething still matches the literal text something, but no longer matches the grep process's own command line in the ps output.
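On systems that have it, pgrep avoids the problem entirely:
pgrep -fl something
where -f matches against the full command line and -l lists the matched name next to the PID.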
Use the -e option to specify multiple patterns:
grep -Ri 64 src/install/ | grep -v -e '\.svn' -e file -e '2\.5' -e '2\.6'
You might also be interested in the -F flag, which indicates that patterns are fixed strings instead of regular expressions. Now you don't have to escape the dot:
grep -Ri 64 src/install/ | grep -vF -e .svn -e file -e 2.5 -e 2.6
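If the ignore list keeps growing, grep can also read one pattern per line from a file with -f (the file name here is made up):
printf '%s\n' .svn file 2.5 2.6 > ignore.txt
grep -Ri 64 src/install/ | grep -vFf ignore.txt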
I noticed you were grepping out ".svn". You probably want to skip any directories named ".svn" in your initial recursive grep. If I were you, I would do this instead:
grep -Ri 64 src/install/ --exclude-dir .svn | grep -vF -e file -e 2.5 -e 2.6
You can use awk instead of grep (this works on a single file rather than recursively):
awk '/64/&&!/(\.svn|file|2\.[56])/' file
You may want to use ack-grep, which allows excluding with Perl regexes as well and skips all the version-control directories by default; it is great for grepping source code.
The following script will remove all files except those passed as arguments:
#!/bin/bash
echo cleanup_all $#
if [[ $# -eq 0 ]]; then
FILES=`find . -type f`
else
EXCLUDE_FILES_EXP="("
for EXCLUDED_FILE in "$@"
do
EXCLUDE_FILES_EXP="$EXCLUDE_FILES_EXP./$EXCLUDED_FILE|"
done
# strip last char
EXCLUDE_FILES_EXP="${EXCLUDE_FILES_EXP%?}"
EXCLUDE_FILES_EXP="$EXCLUDE_FILES_EXP)"
echo excluded files expression : $EXCLUDE_FILES_EXP
FILES=`find . -type f | egrep -v $EXCLUDE_FILES_EXP`
fi
echo removing $FILES
for FILE in $FILES
do
echo "cleanup: removing file $FILE"
rm "$FILE"
done
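Invocation would look something like this (the script name is assumed):
./cleanup.sh keep1.txt keep2.txt    # removes every file under . except ./keep1.txt and ./keep2.txt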
