How can I use system or system2 when the command pipes to head, like "cmd | head"? - r

I noticed that when I run a long command in Linux (CentOS 7.3, R 4.0.3 in the terminal) and pipe it to head, only the first lines of output are shown and the command stops:
ls -R /opt # on my system this produces tons of output for tens of seconds
ls -R /opt | head # prints just the first 10 lines and the command stops straight away
When I try the equivalent in R, I cannot get the same behaviour:
system(command = "ls -R /opt | head") # takes a long time (I assume the time it takes ls -R /opt to finish)
Is there a way to get the same behaviour in R as the one I get on the system command line?

Related

How to log all Bash (busybox/ash) commands by all users on a server?

How can we locally log all Bash commands run, along with user and time stamp?
I am looking for a way to customize Ash sessions with my own sets of aliases and whatnots. What is the Ash equivalent of Bash's bashrc files?
Standard solution not working:
export PROMPT_COMMAND='RETRN_VAL=$?;logger -p local6.debug "$(whoami) [$$]: $(history 1 | sed "s/^[ ]*[0-9]\+[ ]*//" ) [$RETRN_VAL]"'
os: linux 4.9
bash: .ash
logger: busybox (1.27.2) & syslog-ng (3.10)
This is for rsyslog. Edit /etc/rsyslog.conf and add the line:
local6.* /var/log/mylog
Instead of PROMPT_COMMAND, export PS1, for example in /etc/profile:
export PS1='$(RETRN_VAL=$?;logger -p local6.debug "[$$]: $(history | tail -1) [$RETRN_VAL]")\u#\h:\w\$ '
Of course you can change the command to get rid of unnecessary characters from history output.

Linux top not printing full command name to file in batch mode as a nohup process

I am trying to find the CPU utilization of a process from top. Before that, I had to test the command below:
top -b -c -d1 -n2
I am using the -c option to print the full command name, as the process name gets truncated without -c.
Now when I run this as nohup sh test.sh &, the output in nohup.out contains the truncated process name, so I am not able to grep for the process name:
159 neutron 30 0 127620 22765 5479 S 0.0 0.6 399:02.56 /usr/bin/p+t
But when I run this as sh test.sh &, it prints the full command name to the terminal.
Why is the full command name not printed in batch mode, despite using -c with top?
What is the difference between the command name with -c enabled and the process name?
Or, to put it another way: are the process name and command name different, with the process name picked up from /proc/pid/status by commands like ps or top?
We can set the COLUMNS environment variable before the top command to increase the available width.
COLUMNS=1000 top -b -c -d1 -n2
Another way is to use ps to find the PIDs of the processes by name, specifying the ps output format. That output can then be fed to top to get the CPU usage for each process by PID.
ps -eo pid,comm,args
comm = command name only, without the arguments
args = full argument list used to launch the process
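The ps-then-top approach could be sketched roughly as below. This is only a sketch, assuming a procps-style ps and top (where top -p takes a comma-separated PID list); pids_for_name is a hypothetical helper, and it compares only the first word of the command name.

```shell
#!/bin/sh
# pids_for_name NAME: print a comma-separated list of the PIDs whose command
# name (the ps "comm" column) equals NAME.
# Hypothetical helper for illustration; assumes procps ps.
# "-o pid= -o comm=" gives empty column headers, so no header line is printed.
# The list is capped at 20 entries because procps top accepts at most 20 PIDs.
pids_for_name() {
    ps -e -o pid= -o comm= |
        awk -v name="$1" '$2 == name { print $1 }' |
        head -n 20 |
        paste -s -d, -
}
```

With a hypothetical process name myproc, the result could then be passed on as: COLUMNS=1000 top -b -c -d1 -n2 -p "$(pids_for_name myproc)"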

Rscript not working in qsub cluster

I have two R scripts named iHS.hist.R and Fst.hist.R. I know both scripts work. When I use the following commands in my directory in my Ubuntu terminal, I get a histogram plot for each script (two in total if I run both scripts):
module load R
Rscript iHS.hist.R
or I could run Rscript Fst.hist.R.
The point is that I know they both work.
The problem is that each R script takes about 20 minutes to run because my data is pretty big, and unfortunately it's only going to get bigger. I have access to a cluster and I would like to make use of it. I have created two .sh scripts to send to the cluster with qsub, but I am running into issues. Here is my iHS.hist.sh script for my iHS.hist.R script:
#PBS -N iHS.plots
#PBS -S /bin/bash
#PBS -l walltime=2:00:00
#PBS -l nodes=1:ppn=8
#PBS -l mem=4gb
#PBS -o $HOME/${PBS_JOBNAME}.o${PBS_JOBID}.log
#PBS -e $HOME/${PBS_JOBNAME}.e${PBS_JOBID}.err
###############related commands
###edit it
#code in qsub
###############cut columns we don't need
###
cut -f1,2,3,4 /group/stranger-lab/ebeiter/test/SNPsnap_mdd_5_100/matched_snps_annotated.txt > /group/stranger-lab/ebeiter/test/SNPsnap_mdd_5_100/cut.matched_snps_annotated.txt
cut -f1,2 /group/stranger-lab/ebeiter/test/SNPsnap_mdd_5_100/input_snps_insufficient_matches.txt > /group/stranger-lab/ebeiter/test/SNPsnap_mdd_5_100/cut.input_snps_insufficient_matches.txt
###
###############only needed columns remain
cd /group/stranger-lab/ebeiter
module load R
Rscript iHS.hist.R
The cut commands at the beginning set up the data in the right format.
I have tried qsub iHS.hist.sh and it gives me a job. I check on it, and after about 10 minutes it finishes, so I'm assuming it's running my R script. I check the error file and it's empty. I check the log file and it does not show the usual null device 1 that I get after the JPEG is completed in my R script. I also don't get the output JPEG file when the cluster job is done. I do get the output JPEG file if I just run the R script on its own, as shown at the top. Any idea what is going on?

Concatenating input to svn list command with output, then pass it to grep

I currently have the following shell command which is only partially working:
svn list $myrepo/libs/ |
xargs -P 10 -L 1 -I {} echo $myrepo/libs/ {} trunk |
sed 's/ //g' |
xargs -P 20 -L 1 svn list --depth infinity |
grep .xlsx
where $myrepo corresponds to the svn server address.
The libs folder contains a number of subfolders (currently about 30, although eventually up to 100), each of which contains a number of tags, branches and a trunk. I wish to get a list of the xlsx files contained only within the trunk folder of each of these subfolders. The command above works fine; however, it only returns the path relative to $myrepo/libs/subfolder/trunk/, so I get this back:
1/2/3/file.xlsx
Because of the potentially large number of files I would have to search through, I am performing it in two parallel steps by using xargs -P (I do not have and cannot use GNU parallel). I am also trying to do this in one command so it can be used in php/perl/etc. and avoid multiple system calls.
What I would like to do is concatenate the input to this part of the command:
xargs -P 20 -L 1 svn list --depth infinity
with the output from it, to give the following:
$myrepo/libs/subfolder/trunk/1/2/3/file.xlsx
Then pass this to the grep to find the xlsx files.
I appreciate any assistance that could be provided.
If I manage to correctly divine your intention, something like this might work for you.
svn list "$myrepo/libs/" |
xargs -P 20 -n 1 sh -c 'svn list -R "$0/libs/${1%/}/trunk" |
sed -n "s%.*\.xlsx$%$0/libs/${1%/}/trunk/&%p"' "$myrepo"
Briefly, we postprocess the output from the inner svn list to filter to just .xlsx files and tack the full SVN path back on at the same time. This way, the processing happens where the repo path is still known.
We hack things a bit by passing in "$myrepo" as "$0" to the subordinate sh so we don't have to export this variable. The input from the outer svn list comes as $1.
(The repos I have access to have a slightly different layout so there could be a copy/paste error somewhere.)
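To see the "$0" trick in isolation, here is a minimal sketch that runs without an SVN server. The repository URL is hypothetical, printf stands in for both svn list calls, and the libs/subfolder/trunk layout is taken from the question; -P is left out so the output order stays deterministic.

```shell
#!/bin/sh
# list_xlsx REPO: read subfolder names (one per line, with a trailing slash
# as svn list prints them) from stdin and print full paths of .xlsx files.
# The inner printf is a stand-in for: svn list -R "$0/libs/${1%/}/trunk"
list_xlsx() {
    # REPO is passed as the first argument after the script text, so the
    # child sh sees it as $0; each line supplied by xargs arrives as $1.
    xargs -n 1 sh -c 'printf "%s\n" "1/2/3/file.xlsx" "1/readme.txt" |
        sed -n "s%.*\.xlsx$%$0/libs/${1%/}/trunk/&%p"' "$1"
}

# Hypothetical repository URL and subfolder names, for illustration only.
printf '%s\n' alpha/ beta/ | list_xlsx "https://svn.example.invalid/repo"
```

Because sed uses % as its delimiter here, a repository URL containing % or & would need escaping first.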

how to delete all files except the latest three in a folder

I have a folder that contains some Subversion revision checkouts (these are checked out when running a Capistrano deployment recipe).
What I really want is to keep the latest 3 revisions that the Capistrano script checks out and delete the others. I am planning to do this with a command run on the terminal via a run command; Capistrano itself has nothing to do with it, it is just a Unix command.
I was trying to get a list of all files except the latest three and then delete them. I could get the list using the following command:
(ls -t /var/path/to/folder |head -n 3; ls /var/path/to/folder)|sort|uniq -u|xargs
Now if I add rm -Rf to the end of this command, it reports that the files to delete were not found. That's expected, because the command returns only the name of each folder, not its full path.
Is there any way to delete these files/folders using one Unix command?
Alright, there are a few things wrong with your script.
First, and most problematically, is this line:
ls -t /var/path/to/folder |head -n 3;
ls -t will return a list of files in order of their last modification time, starting with the most recently modified. head -n 3 says to only list the first three lines. So what this is saying is "give me a list of only the three most recently modified files", which I don't think is what you want.
The second ls appends the full listing of the directory, so after sort those three names appear twice, and uniq -u, which prints only lines that are not repeated, drops them. That does leave everything except the three newest entries, but it is a roundabout way to get there, and it still produces bare names rather than full paths.
Note that sort -u is not a substitute for sort | uniq -u here: sort -u keeps one copy of each duplicated line, whereas uniq -u removes duplicated lines entirely. You don't need this part anyway.
Finally, the actual removal of the directory. On that part, you had it right in your question: just use rm -r
Here's the easiest way I can think to do this:
ls -t1 /var/path/to/folder | tail -n +4 | xargs rm -r
Here's what's happening here:
ls -t1 is printing a list, one file/directory per line, of all files in /var/path/to/folder, ordering by the most recent modification date.
tail -n +4 is printing all lines in the output of ls -t1 starting with the fourth line (i.e. the three most recently modified files won't be listed)
xargs rm -r says to delete any file output from the tail. The -r means to recursively delete files, so if it encounters a directory, it will delete everything in that directory, then delete the directory itself.
Note that I'm not sorting anything or removing any duplicates. That's because:
ls only reports a file once, so there are no duplicates to remove
You're deleting every file passed anyway, so it doesn't matter in what order they're deleted.
Does all of that make sense?
Edit:
Since I was wrong about ls printing the full path when passed an absolute directory, and since you might not be able to perform a cd, perhaps you could use find instead to turn each name back into a full path.
For example:
ls -t1 /var/path/to/folder | tail -n +4 | xargs -I {} find /var/path/to/folder -maxdepth 1 -name {} | xargs rm -r
Below is a useful way of doing the task.
For Linux and HP-UX:
ls -t1 | tail -n +51 | xargs rm -r # to leave the latest 50 files/directories.
for SunOS:
rm `(ls -t |head -n 100; ls)|sort|uniq -u`
Hi, I found a way to do this: we can chain the commands with the Unix && operator,
so the command will look like this:
cd /var/path/to/folder && ls -t1 /var/path/to/folder | tail -n +4 | xargs rm -r
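The cd-then-delete idea can be wrapped up roughly as below. This is a sketch only: keep_newest is a hypothetical name, and a while read loop is swapped in for xargs so that an empty list is a no-op and names containing spaces survive. Like the commands above, it still parses ls output, so names containing newlines are not handled.

```shell
#!/bin/sh
# keep_newest DIR N: delete everything in DIR except the N most recently
# modified entries. Hypothetical helper for illustration.
keep_newest() {
    dir=$1
    keep=$2
    # Run in a subshell so the cd does not change the caller's directory.
    (
        cd "$dir" || exit 1
        # ls -t1 lists newest first; tail -n +K starts printing at line K,
        # so K = N + 1 skips exactly the N newest entries.
        ls -t1 | tail -n "+$((keep + 1))" |
            while IFS= read -r entry; do
                rm -r -- "./$entry"
            done
    )
}
```

For the folder in the question, the call would be keep_newest /var/path/to/folder 3.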
