R's pipe() function and Ubuntu console give different results

I am using R on an Ubuntu machine with an up-to-date configuration.
In R, I get the result below:
> read.fwf(pipe('ps -ef | grep /var/lib/docker/'), width = 60)
V1
1 root 29155 29151 0 11:18 pts/0 00:00:00 sh -c ps -ef
2 root 29157 29155 0 11:18 pts/0 00:00:00 grep /var/li
However, in the Ubuntu console I get a different result:
ps -ef | grep /var/lib/docker/
root 29150 2509 0 11:17 pts/0 00:00:00 grep --color=auto /var/lib/docker/
I want R to fetch the PID of /var/lib/docker/, which according to Ubuntu is 2509.
Can anyone help me understand why I am getting a different result, and how to fetch the PID correctly?
Thanks,

Use ps() from the ps package. This function returns a data.frame with one row per process, including its process id (pid) and name.
library(ps)
pid_df <- ps()
pid_df$pid[grep("docker", pid_df$name)]
or in one line (using grepl() rather than grep(), because subset() needs a logical condition):
subset(ps(), grepl("docker", name))$pid
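As for why the two outputs differ: in both cases the only lines matched are the grep command itself, plus, in R, the sh -c wrapper that pipe() starts. In ps -ef output the second column is the PID and the third is the parent PID, so 2509 in the console output is the parent of grep (the interactive shell), not a Docker process. Here is a minimal sketch of filtering out the grep line, run against canned ps -ef text so the result is reproducible. The dockerd line and its PID 3021 are hypothetical, added only for illustration.

```shell
# Canned `ps -ef` output. The dockerd line (PID 3021) is hypothetical.
sample_ps='root      2509  2400  0 11:00 pts/0    00:00:00 bash
root      3021     1  0 10:58 ?        00:00:04 /usr/bin/dockerd --data-root /var/lib/docker/
root     29150  2509  0 11:17 pts/0    00:00:00 grep --color=auto /var/lib/docker/'

# Naive filter: the grep process is counted too, because its own
# command line contains the search string.
naive=$(printf '%s\n' "$sample_ps" | grep -c '/var/lib/docker/')

# Skip lines whose command (field 8) is grep, then print the PID (field 2).
pids=$(printf '%s\n' "$sample_ps" |
    awk '$8 != "grep" && index($0, "/var/lib/docker/") { print $2 }')

echo "lines matched by plain grep: $naive"
echo "PIDs after excluding grep:   $pids"
```

On a live system, pgrep -f /var/lib/docker/ is another option, since pgrep does not report itself.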

Related

Executing Linux/Unix Command From Within R Using Variables

I'm trying to make a call from within R to execute BASH commands, to get my feet wet:
I wanted to simply capture a listing of my current files located in a specific directory through use of the "ls -al" command. The output would be sent to text file called a01_test.txt.
The directory I would like to capture the contents of is "C:\Users\user00\a01_TEST" which is referenced as "/mnt/c/Users/user00/a01_TEST/" from a WSL Ubuntu 20.04.5 LTS perspective.
The directory contains five (5) files: file_01.txt, file_02.txt ,..., file_05.txt.
FYI, I am running R (R version 4.2.0 (2022-04-22 ucrt)) via RStudio (2022.07.1 Build 554) on Windows 11 (Version 10.0.22000 Build 22000).
I tried:
PATH_UNIX <- "/mnt/c/Users/user00/a01_TEST/"
FILENAME_TEST <-"a01_test.txt"
paste0("system(\"bash -c \'ls -al ",PATH_UNIX," >",PATH_UNIX,FILENAME_TEST,"\'\")")
However that only returned a command prompt -- nothing else:
> paste0("system(\"bash -c \'ls -al ",PATH_UNIX," >",PATH_UNIX,FILENAME_TEST,"\'\")")
[1] "system(\"bash -c 'ls -al /mnt/c/Users/user00/a01_TEST/ >/mnt/c/Users/user00/a01_TEST/a01_test.txt'\")"
>
I thought one could test the code using:
cat(print(paste0("system(\"bash -c \'ls -al ",PATH_UNIX," >",PATH_UNIX,FILENAME_TEST,"\'\")")))
which resulted in:
> cat(print(paste0("system(\"bash -c \'ls -al ",PATH_UNIX," >",PATH_UNIX,FILENAME_TEST,"\'\")")))
[1] "system(\"bash -c 'ls -al /mnt/c/Users/user00/a01_TEST/ >/mnt/c/Users/user00/a01_TEST/a01_test.txt'\")"
system("bash -c 'ls -al /mnt/c/Users/user00/a01_TEST/ >/mnt/c/Users/user00/a01_TEST/a01_test.txt'")
If I do not use variables, such as, PATH_UNIX and FILENAME_TEST and code the entire path manually, I can create a text file (a01_test.txt) giving me the desired listing of the directory's contents:
system("bash -c 'ls -al /mnt/c/Users/user00/a01_TEST > /mnt/c/Users/user00/a01_TEST/a01_test.txt'")
which results in:
> system("bash -c 'ls -al /mnt/c/Users/user00/a01_TEST > /mnt/c/Users/user00/a01_TEST/a01_test.txt'")
[1] 0
>
giving me the file called "a01_test.txt" containing the directory's contents:
total 0
drwxrwxrwx 1 user00 user00 4096 Nov 3 2022 .
drwxrwxrwx 1 user00 user00 4096 Nov 3 05:07 ..
-rwxrwxrwx 1 user00 user00 0 Nov 3 2022 a01_test.txt
-rwxrwxrwx 1 user00 user00 0 Nov 3 05:26 file_01.txt
-rwxrwxrwx 1 user00 user00 0 Nov 3 05:26 file_02.txt
-rwxrwxrwx 1 user00 user00 0 Nov 3 05:26 file_03.txt
-rwxrwxrwx 1 user00 user00 0 Nov 3 05:26 file_04.txt
-rwxrwxrwx 1 user00 user00 0 Nov 3 05:26 file_05.txt
Any assistance to make use of the variables PATH_UNIX & FILENAME_TEST to make a call to Linux/Unix to obtain a directory listing would be appreciated.
sprintf (see ?sprintf for further details) is a convenient way to build a formatted command string that can subsequently be passed to system:
PATH_UNIX <- '/mnt/c/Users/user00/a01_TEST/'
FILENAME_TEST <- 'a01_test.txt'
# Prefix the file name with PATH_UNIX so the listing is written into that directory, not the current one
cmdstr <- sprintf('bash -c \'ls -al %s > %s%s\'', PATH_UNIX, PATH_UNIX, FILENAME_TEST)
message('bash command string = ', cmdstr)
system(command = cmdstr)
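For comparison, here is a rough sketch of the same idea on the shell side: build the command string from two variables, print it, then run it. The scratch directory from mktemp and the two touched files are stand-ins for the real a01_TEST folder, and the single quotes around each path assume the paths contain no single quote themselves.

```shell
# Hypothetical stand-ins for PATH_UNIX and FILENAME_TEST, using a scratch directory.
PATH_UNIX="$(mktemp -d)/"
FILENAME_TEST="a01_test.txt"
touch "${PATH_UNIX}file_01.txt" "${PATH_UNIX}file_02.txt"

# Build the command string first so it can be inspected before it runs.
# Single quotes keep paths with spaces intact (assumes no ' inside the paths).
cmdstr="ls -al '${PATH_UNIX}' > '${PATH_UNIX}${FILENAME_TEST}'"
echo "command string = $cmdstr"

# Run it; an exit status of 0 means success, like the [1] 0 that system() returns in R.
sh -c "$cmdstr"
status=$?
echo "exit status: $status"
```

Printing the string before running it is the same check that message() performs in the R version.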
Expanding on the solution provided by br00t, and doing some testing, one could also use the paste0() function:
# DESIRED CMD TO BE PASSED VIA BASH
cat(paste0("system(bash -c \'ls -al ",PATH_UNIX," >",PATH_UNIX,FILENAME_TEST,"\')"))
# OUTPUT:
# system(bash -c 'ls -al /mnt/c/Users/user00/a01_TEST/ >/mnt/c/Users/user00/a01_TEST/a01_test.txt')
# PLACE DESIRED CMD IN A VAR:
cmdstr_test <- paste0("bash -c \'ls -al ",PATH_UNIX," > ",PATH_UNIX,FILENAME_TEST,"\'")
# CHECK VAR:
message('bash command string = ', cmdstr_test)
# OUTPUT:
# bash command string = bash -c 'ls -al /mnt/c/Users/user00/a01_TEST/ > /mnt/c/Users/user00/a01_TEST/a01_test.txt'
# RUN COMMAND USING system() function:
system(command = cmdstr_test)
# OUTPUT (Will get "0", if successful)
> system(command = cmdstr_test)
[1] 0
>

Snakemake: MissingInputException in snakemake pipeline

I'm trying a SnakeMake pipeline and I'm stuck on an error I really don't understand.
I've got a directory (raw_data) in which I have the input files :
ll /home/nico/labo/etudes/Optimal/data/raw_data
total 41M
drwxrwxr-x 2 nico nico 4,0K mars 6 16:09 ./
drwxrwxr-x 11 nico nico 4,0K mars 6 16:14 ../
-rw-rw-r-- 1 nico nico 15M févr. 27 12:21 sampleA_R1.fastq.gz
-rw-rw-r-- 1 nico nico 19M févr. 27 12:22 sampleA_R2.fastq.gz
-rw-rw-r-- 1 nico nico 3,4M févr. 27 12:21 sampleB_R1.fastq.gz
-rw-rw-r-- 1 nico nico 4,3M févr. 27 12:22 sampleB_R2.fastq.gz
This directory contains 4 files for 2 samples.
I created a config json file for the SnakeMake pipeline named config_snakemake_Optimal_mapping_BaL.json:
{
"fastqExtension": "fastq.gz",
"fastqDir": "/home/nico/labo/etudes/Optimal/data/raw_data",
"outputDir": "/home/nico/labo/etudes/Optimal/data/mapping_BaL",
"logDir": "logs",
"reference": {
"fasta": "/home/nico/labo/references/genomes/HIV1/BaL_AY713409/BaL_AY713409.fasta",
"index": "/home/nico/labo/references/genomes/HIV1/BaL_AY713409/BaL_AY713409.fasta.bwt"
}
}
And finally the SnakeMake file snakefile_bwa_samtools.py:
import subprocess
from os.path import join
### Globals ---------------------------------------------------------------------
# A Snakemake regular expression matching fastq files.
SAMPLES, = glob_wildcards(join(config["fastqDir"], "{sample}_R1."+config["fastqExtension"]))
print(SAMPLES)
### Rules -----------------------------------------------------------------------
# Pipeline output files
rule all:
input: expand(join(config["outputDir"], "{sample}.bam.bai"), sample=SAMPLES)
# Reads alignment on reference genome and BAM file creation
rule bwa_mem_to_bam:
input:
index = config["reference"]["index"],
fasta = config["reference"]["fasta"],
fq1_ID = "{sample}_R1."+config["fastqExtension"],
fq2_ID = "{sample}_R2."+config["fastqExtension"],
fq1 = join(config["fastqDir"], "{sample}_R1."+config["fastqExtension"]),
fq2 = join(config["fastqDir"], "{sample}_R2."+config["fastqExtension"])
output:
temp(join(config["outputDir"], "{sample}.bamUnsorted"))
version:
subprocess.getoutput(
"man bwa | tail -n 1 | cut -d ' ' -f 1 | cut -d '-' -f 2"
)
log:
join(config["outputDir"], config["logDir"], "{sample}.bwa_mem.log")
message:
"Alignment of {input.fq1_ID} and {input.fq2_ID} on {input.fasta} with BWA version {version}."
shell:
"bwa mem {input.fasta} {input.fq1} {input.fq2} 2> {log} | samtools view -Sbh - > {output}"
# Sorting the BAM files on genomic positions
rule bam_sort:
input:
join(config["outputDir"], "{sample}.bamUnsorted")
output:
join(config["outputDir"], "{sample}.bam")
log:
join(config["outputDir"], config["logDir"], "{sample}.samtools_sort.log")
version:
subprocess.getoutput(
"samtools --version | "
"head -1 | "
"cut -d' ' -f2"
)
message:
"Genomic sorting of {input} with samtools version {version}."
shell:
"samtools sort -f {input} {output} 2> {log}"
# Indexing the BAM files
rule bam_index:
input:
join(config["outputDir"], "{sample}.bam")
output:
join(config["outputDir"], "{sample}.bam.bai")
message:
"Indexing {input}."
shell:
"samtools index {input}"
I run this pipeline:
snakemake --cores 3 --snakefile /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py --configfile /home/nico/labo/etudes/Optimal/data/snakemake_config_files/config_snakemake_Optimal_mapping_BaL.json
and I've got the following error outputs:
['sampleB', 'sampleA']
MissingInputException in line 18 of /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py:
Missing input files for rule bwa_mem_to_bam:
sampleB_R1.fastq.gz
sampleB_R2.fastq.gz
or, depending on the run:
['sampleB', 'sampleA']
PeriodicWildcardError in line 40 of /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py:
The value _unsorted in wildcard sample is periodically repeated (sampleB_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted). This would lead to an infinite recursion. To avoid this, e.g. restrict the wildcards in this rule to certain values.
The samples are correctly detected, since they appear in the list (the first line of both outputs), so I'm surely messing up the wildcards in the rule bwa_mem_to_bam, but I really don't get why.
Any clue?
I had a quick look at your code.
Why didn't the first version work?
Look at how you declare fq1_ID and fq1 (and likewise fq2_ID and fq2): you didn't assign the same string. For fq1 you add the directory of the file, which is not present for fq1_ID, so Snakemake searches the working directory (the current directory if the -d option is not set) for a file with that bare name. It does this because these variables are in the input section, where every entry is treated as a file the rule needs.
So removing fq1_ID and fq2_ID from input gets rid of the missing-file problem.
Hugo
Finally, I succeeded in running the pipeline by removing the fq1_ID and fq2_ID variables from the rule bwa_mem_to_bam and replacing input.fq1_ID and input.fq2_ID in the rule's message with input.fq1 and input.fq2.
The message is less elegant, but the pipeline runs correctly. I still don't understand exactly where the mistake was; if someone can explain, I'm still listening!
The correct code for rule bwa_mem_to_bam:
rule bwa_mem_to_bam:
input:
index = config["reference"]["index"],
fasta = config["reference"]["fasta"],
fq1 = join(config["fastqDir"], "{sample}_R1."+config["fastqExtension"]),
fq2 = join(config["fastqDir"], "{sample}_R2."+config["fastqExtension"])
output:
temp(join(config["outputDir"], "{sample}.bamUnsorted"))
version:
subprocess.getoutput(
"man bwa | tail -n 1 | cut -d ' ' -f 1 | cut -d '-' -f 2"
)
log:
join(config["outputDir"], config["logDir"], "{sample}.bwa_mem.log")
message:
"Alignment of {input.fq1} and {input.fq2} on {input.fasta} with BWA version {version}."
shell:
"bwa mem {input.fasta} {input.fq1} {input.fq2} 2> {log} | samtools view -Sbh - > {output}"
Thanks Hugo for checking my code and your explanation, it makes sense!
I finally had a flash of insight when waking up this morning (the best kind), and realized that I had neglected the params part of the rule: fq1_ID and fq2_ID are not inputs but params.
I changed the code to that:
rule bwa_mem_to_bam:
input:
index = config["reference"]["index"],
fasta = config["reference"]["fasta"],
fq1 = join(config["fastqDir"], "{sample}_R1.fastq.gz"),
fq2 = join(config["fastqDir"], "{sample}_R2.fastq.gz")
output:
temp(join(config["outputDir"],"{sample}_unsorted.bam"))
params:
fq1_ID = "{sample}_R1.fastq.gz",
fq2_ID = "{sample}_R2.fastq.gz",
ref_ID = os.path.basename(config["reference"]["fasta"])
version:
subprocess.getoutput(
"man bwa | tail -n 1 | cut -d ' ' -f 1 | cut -d '-' -f 2"
)
log:
join(config["outputDir"], config["logDir"], "{sample}.bwa_mem.log")
message:
"Alignment of {params.fq1_ID} and {params.fq2_ID} on {params.ref_ID} with BWA version {version}."
shell:
"bwa mem {input.fasta} {input.fq1} {input.fq2} 2> {log} | samtools view -Sbh - > {output}"
And it works just fine!
snakemake --cores 3 --snakefile /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py --configfile /home/nico/labo/etudes/Optimal/data/snakemake_config_files/config_snakemake_Optimal_mapping_BaL.json
Provided cores: 3
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
2 bam_index
2 bam_sort
2 bwa_mem_to_bam
7
Alignment of sampleB_R1.fastq.gz and sampleB_R2.fastq.gz on BaL_AY713409.fasta with BWA version 0.7.12.
Alignment of sampleA_R1.fastq.gz and sampleA_R2.fastq.gz on BaL_AY713409.fasta with BWA version 0.7.12.
1 of 7 steps (14%) done
Genomic sorting of sampleB_unsorted.bam with samtools version 1.2.
Removing temporary output file /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleB_unsorted.bam.
2 of 7 steps (29%) done
Indexing sampleB.bam.
3 of 7 steps (43%) done
4 of 7 steps (57%) done
Genomic sorting of sampleA_unsorted.bam with samtools version 1.2.
Removing temporary output file /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleA_unsorted.bam.
5 of 7 steps (71%) done
Indexing sampleA.bam.
6 of 7 steps (86%) done
localrule all:
input: /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleB.bam.bai, /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleA.bam.bai
7 of 7 steps (100%) done
And I finally get my correct messages:
Alignment of sampleB_R1.fastq.gz and sampleB_R2.fastq.gz on
BaL_AY713409.fasta with BWA version 0.7.12.
Alignment of sampleA_R1.fastq.gz and sampleA_R2.fastq.gz on BaL_AY713409.fasta
with BWA version 0.7.12.

Filter output of 'ps aux'

Running ps aux returns:
USER 131 2.1 0.1 23423 423 FFF/5 R 10:12 0:00 -bash
USER 131 2.1 0.1 23423 423 FFF/5 R 10:12 0:00 -test
USER 131 2.1 0.1 23423 423 FFF/5 R 10:12 0:00 -test1
I am attempting to filter on bash with wildcards so that just
USER 131 2.1 0.1 23423 423 FFF/5 R 10:12 0:00 -bash
is returned:
ps aux|grep "*bash*"
which returns an invalid option error:
grep: invalid option -- 'p'
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
How can I filter the output for bash?
You should just use ps aux | grep 'bash' and it will work the way you want. In a grep pattern, * is the regex repetition operator ("zero or more of the preceding item"), not the shell's * wildcard character.
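A small sketch of the difference, run against the three sample lines from the question rather than live ps output. The regex spelling of the glob *bash* would be .*bash.*, but since grep already matches anywhere in the line, the plain word is enough.

```shell
# The three sample lines from the question.
sample='USER 131 2.1 0.1 23423 423 FFF/5 R 10:12 0:00 -bash
USER 131 2.1 0.1 23423 423 FFF/5 R 10:12 0:00 -test
USER 131 2.1 0.1 23423 423 FFF/5 R 10:12 0:00 -test1'

# Plain substring: grep matches anywhere in the line, no wildcards needed.
plain=$(printf '%s\n' "$sample" | grep 'bash')

# Regex equivalent of the glob *bash*: selects exactly the same lines.
regex=$(printf '%s\n' "$sample" | grep '.*bash.*')

# Glob-style pattern: in a basic regex a leading * is a literal asterisk,
# so this looks for the text "*bas" followed by zero or more "h".
# (Some grep versions may also print a warning, hence 2>/dev/null.)
literal_count=$(printf '%s\n' "$sample" | grep -c '*bash*' 2>/dev/null || true)

echo "$plain"
echo "lines matched by the glob-style pattern: $literal_count"
```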
ps aux | grep bash | grep -v grep
returns all bash processes, excluding the grep command itself.
Some versions of ps support this directly. For example, to list all processes whose name is bash, run ps like this:
ps -C bash

Hung parallel processes in R: icc vs gcc

I've noticed strange behaviour with launching parallel processes in R that only appears when R is built with icc. The spawned parallel processes are not killed when the main process ends.
Example code is as follows:
library(foreach)
library(doMC)
registerDoMC(cores=4)
d <- rep(1,16)
t <- foreach(i=1:4, .combine=c) %dopar% {
s <- foreach(1:4, .combine=c) %do% 1*1
}
identical(t, d)
Here we see the 4 spawned processes are orphaned at the completion of the script.
build$ Rscript HungRProcs.R
Loading required package: iterators
Loading required package: parallel
[1] TRUE
build$ ps -elf | grep R
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
1 S root 1427 2 0 80 0 - 0 worker May15 ? 00:00:00 [SCIF INTR 0]
0 S build 19173 26999 0 80 0 - 35960 poll_s 12:27 pts/1 00:00:00 vim RStats-3.0.3-dw.spec
1 S walling 24425 1 1 80 0 - 51468 hrtime 13:11 pts/5 00:00:00 /home1/00157/walling/software/R-3.1.0/bin/exec/R --slave --no-restore --file=HungRProcs.R --args
1 S walling 24426 1 1 80 0 - 51468 hrtime 13:11 pts/5 00:00:00 /home1/00157/walling/software/R-3.1.0/bin/exec/R --slave --no-restore --file=HungRProcs.R --args
1 S walling 24427 1 1 80 0 - 51468 hrtime 13:11 pts/5 00:00:00 /home1/00157/walling/software/R-3.1.0/bin/exec/R --slave --no-restore --file=HungRProcs.R --args
1 S walling 24428 1 1 80 0 - 51468 hrtime 13:11 pts/5 00:00:00 /home1/00157/walling/software/R-3.1.0/bin/exec/R --slave --no-restore --file=HungRProcs.R --args
0 R walling 24430 21882 0 80 0 - 27561 - 13:11 pts/5 00:00:00 ps -elf
0 S walling 24431 21882 0 80 0 - 25814 pipe_w 13:11 pts/5 00:00:00 grep R
The configure used for the icc build is as follows:
build$ ./configure --prefix=/home1/00157/walling/software/R-3.1.0 CC=icc F77=ifort FC=ifort CXX=icpc
If built with gcc, the spawned processes are terminated when the main process completes. The configure used for the gcc build is as follows:
build$ ./configure --prefix=/home1/00157/walling/software/R-3.1.0 CC=gcc F77=gfortran FC=gfortran CXX=gcc
I have run tests against both R 3.0.3 and 3.1.0, with different parallel backends via doMC, doSNOW and straight mclapply. I've also tested with multiple versions of the GNU compiler and Intel compiler and on both CentOS 5.10 and 6.5. All test cases have resulted in the same behaviour.
Any ideas why the compiler would affect proper termination of spawned sub-processes?

How to properly grep filenames only from ls -al

How do I tell grep to only print out lines if the "filename" matches when I'm piping through ls? I want it to ignore everything on each line until after the timestamp. There must be some easy way to do this on a single command.
As you can see, without it, if I searched for the file "rwx", it would return not only the line with rwx.c, but also the first three lines because of permissions. I was going to use AWK but I want it to display the whole last line if I search for "rwx".
Any ideas?
EDIT: Thanks for the hacks below. However, it would be great to have a more bug-free method. For example, if I had a file named "rob rob", I wouldn't be able to use the stated solutions.
drwxrwxr-x 2 rob rob 4096 2012-03-04 18:03 .
drwxrwxr-x 4 rob rob 4096 2012-03-04 12:38 ..
-rwxrwxr-x 1 rob rob 13783 2012-03-04 18:03 a.out
-rw-rw-r-- 1 rob rob 4294 2012-03-04 18:02 function1.c
-rw-rw-r-- 1 rob rob 273 2012-03-04 12:54 function1.c~
-rw-rw-r-- 1 rob rob 16 2012-03-04 18:02 rwx.c
-rw-rw-r-- 1 rob rob 16 2012-03-04 18:02 rob rob
The following lists only the file names, one per line.
$ ls -1
To include . files
$ ls -1a
Please note that the argument is number "1", not letter "l".
Why don't you use grep and match the file name following the timestamp?
grep -P "[0-9]{2}:[0-9]{2} $FILENAME(\.[a-zA-Z0-9]+)?$"
The [0-9]{2}:[0-9]{2} is for the time, the $FILENAME is where you'd put rob rob or rwx, and the trailing (\.[a-zA-Z0-9]+)? is to allow for an optional extension.
Edit: @JonathanLeffler below points out that when files are older than about 6 months the time column gets replaced by a year (this is what happens on my computer, anyhow). You could use ([0-9]{2}:[0-9]{2}|(19|20)[0-9]{2}) to allow time OR year, but you may be best off using awk (?).
[foo@bar ~/tmp]$ls -al
total 8
drwxrwxr-x 2 foo foo 4096 Mar 5 09:30 .
drwxr-xr-- 83 foo foo 4096 Mar 5 09:30 ..
-rw-rw-r-- 1 foo foo 0 Mar 5 09:30 foo foo
-rw-rw-r-- 1 foo foo 0 Mar 5 09:29 rwx.c
-rw-rw-r-- 1 foo foo 0 Mar 5 09:29 tmp
[foo@bar ~/tmp]$export filename='foo foo'
[foo@bar ~/tmp]$echo $filename
foo foo
[foo@bar ~/tmp]$ls -al | grep -P "[0-9]{2}:[0-9]{2} $filename(\.[a-zA-Z0-9]+)?$"
-rw-rw-r-- 1 foo foo 0 Mar 5 09:30 foo foo
(You could additionally extend to matching the whole line if you wanted:
^ # start of line
[d-]([r-][w-][x-]){3} + # permissions & space (note: is there a 't' or 's'
# sometimes where the 'd' can be??)
[0-9]+ # whatever that number is
[\w-]+ [\w-]+ + # user/group (are spaces allowed in these?)
[0-9]+ + # file size (modify for -h switch??)
(19|20)[0-9]{2}- # yyyy (modify if you want to allow <1900)
(1[012]|0[1-9])- # mm
(0[1-9]|[12][0-9]|3[01]) + # dd
([01][0-9]|2[0-3]):[0-5][0-9] +# HH:MM (24hr)
$filename(\.[a-zA-Z0-9]+)? # filename & optional extension
$ # end of line
. You get the point, tailor to your needs.)
Assuming that you aren't prepared to do:
ls -ld $(ls -a | grep rwx)
then you need to exploit the fact that there are 8 columns with space separation before the file name starts. Using egrep (or grep -E), you could do:
ls -al | egrep "^([^ ]+ +){8}.*rwx"
This looks for 'rwx' after the 8th column. If you want the name to start with rwx, omit the .*. If you want the name to end with rwx, add a $ at the end. Note that I used double quotes so you could interpolate a variable in place of the literal rwx.
This was tested on Mac OS X 10.7.3; the ls -l command consistently gives three columns for the date field:
-r--r--r-- 1 jleffler staff 6510 Mar 17 2003 README,v
-r--r--r-- 1 jleffler staff 26676 Mar 3 21:44 ccs.nmd
Your ls -l seems to be giving just two columns, so you'd need to change the {8} to {7} for your machine - and beware migrating between systems.
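Here is a minimal sketch of the column-counting idea, run against a canned copy of part of the listing from the question so it gives the same result on any machine. That layout has seven columns before the name, hence {7}. The helper name match_name is made up for this example.

```shell
# Canned listing in the question's layout (date and time are two columns).
listing='drwxrwxr-x 2 rob rob 4096 2012-03-04 18:03 .
-rwxrwxr-x 1 rob rob 13783 2012-03-04 18:03 a.out
-rw-rw-r-- 1 rob rob 16 2012-03-04 18:02 rwx.c
-rw-rw-r-- 1 rob rob 16 2012-03-04 18:02 rob rob'

# match_name PATTERN: skip the first 7 space-separated columns,
# then look for PATTERN anywhere in what is left (the file name).
match_name() {
    printf '%s\n' "$listing" | grep -E "^([^ ]+ +){7}.*$1"
}

matches=$(match_name 'rwx')      # the permission columns are not searched
spaced=$(match_name 'rob rob')   # the user and group columns are not searched

echo "$matches"
echo "$spaced"
```

The pattern is still interpreted as a regular expression, so names containing characters such as . or * would need escaping.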
Well, if you're working with filenames that don't have spaces in them, you could do something like this:
grep 'rwx\S*$'
Aside from the fact that you can use pattern matching with ls itself (for example in ksh and bash),
which is probably what you should do, you can use the fact that the filename occurs at a
fixed position. awk (gawk, nawk or whatever you have) is a better choice for this.
If you have to use grep it smells like homework to me. Please tag it that way.
Assume the filename starts at column 56, based on this output from ls -l on Linux:
-rwxr-xr-x 1 Administrators None 2052 Feb 28 20:29 vote2012.txt
ls -l | awk ' substr($0,56) ~/your pattern even with spaces goes here/'
e.g.,
ls -l | awk ' substr($0,56) ~/^val/'
will find files starting with "val"
As a simple hack, just add a space before your filename so you don't match the beginning of the output:
ls -al | grep '\srwx'
Edit: OK, this is not as robust as it should be. Here's awk:
ls -l | awk ' $9 ~ /rwx/ { print $0 }'
This works for me, unlike ls -l and others, as some folks pointed out. I like this because it's really generic and gives me the base file name, with the leading path removed.
ls -1 /path_name |awk -F/ '{print $NF}'
Only one command is needed for this:
ls -al | gawk '{print $9}'
You can use this:
ls -p | grep -v /
this is super old, but i needed the answer and had a hard time finding it. i didn't really care about the one-liner part; i just needed it done. this is down and dirty and requires that you count the columns. i'm not looking for an upvote here, just leaving some options for future searcher-ers.
the helpful awk trick is here -- Using awk to print all columns from the nth to the last
if
YOUR_FILENAME="rob rob"
and
WHERE_FILENAMES_START=8
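# pass the start column into awk with -v; inside single quotes the shell does not expand $WHERE_FILENAMES_START
ls -al | while IFS= read -r x; do
y=$(echo "$x" | awk -v start="$WHERE_FILENAMES_START" '{for(i=start; i<=NF; ++i) printf "%s%s", $i, FS; print ""}')
[[ "$YOUR_FILENAME " = "$y" ]] && echo "$x"
done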
if you save it as a bash script and swap out the vars with $2 and $1, throw the script in your usr bin... then you'll have your clean simple one-liner ;)
output will be:
> -rw-rw-r-- 1 rob rob 16 2012-03-04 18:02 rob rob
the question was for a one-liner so...
ls -al | while IFS= read -r x; do [[ "$YOUR_FILENAME " = "$(echo "$x" | awk -v start="$WHERE_FILENAMES_START" '{for(i=start; i<=NF; ++i) printf "%s%s", $i, FS; print ""}')" ]] && echo "$x" ; done
(lol ;P)
on another note: mathematical.coffee your answer was rad. it didn't solve my version of this problem, so i didn't upvote, but i liked your regex breakdown :D
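For names with spaces such as "rob rob", another option that avoids counting ls columns altogether is to let find match on the name and then ask ls for the long listing of just those entries. This is a sketch rather than a tested drop-in: the scratch directory and files are created here for illustration, and -mindepth/-maxdepth are assumed to be available (GNU and BSD find both have them).

```shell
# Scratch directory holding the file names from the question (illustration only).
dir=$(mktemp -d)
touch "$dir/a.out" "$dir/function1.c" "$dir/rwx.c" "$dir/rob rob"

pattern='rob rob'

# find matches on the file name only, so permissions, owner and group
# never enter the comparison; ls -ld then prints the full line per match.
matches=$(find "$dir" -mindepth 1 -maxdepth 1 -name "*${pattern}*" -exec ls -ld {} +)

echo "$matches"
```

Unlike the grep and awk versions, the pattern here is a shell-style glob, so * and ? act as wildcards and hidden files are included as with ls -a.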

Resources