What is the syntax to pass a list to ack?

I have installed Strawberry Perl on my Windows 7 laptop.
I want to search for a list of words using grep or ack on Windows.
I have been able to perform basic searches using ack, but I don't know how to pass a list of words to it. I want to pass a list of words to ack and find the line numbers where these words occur. The words do not have to occur on the same line.
For example, if I am searching for "doll" and "house", I could have doll on line 12 and house on line 244.
I tried something like ack "doll"|"house", but it throws the following error:
Expressions are only allowed as the first element of a pipeline.
At line:1 char:22
+ ack "doll" | "house" <<<<
+ CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
+ FullyQualifiedErrorId : ExpressionsMustBeFirstInPipeline

So you're saying you want to find all the files where "doll" appears and where "house" appears. So here's what you do.
First, find all the files that contain "doll", and print a list of them.
ack -l doll
Then search that list of files for the word "house".
ack house $(ack -l doll)
Or, if you're not running a shell that lets you do that (e.g. on Windows), you can do:
ack -l doll | ack -x house
ack -x says "Take the list of files to search from STDIN".
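If the goal is the one stated in the question (line numbers of lines that contain either word, rather than files that contain both), alternation inside a single quoted pattern is enough. PowerShell rejected the original command because the | sat outside the quotes and was parsed as a pipeline operator. Below is a minimal sketch using grep, which the question also mentions. It assumes a POSIX shell, and the sample file and its contents are made up for illustration. ack takes Perl regular expressions, so ack "doll|house" should behave the same way, but that command is not exercised here.

```shell
# Build a small sample file (hypothetical content, for illustration only).
sample=$(mktemp)
printf '%s\n' 'a doll sat' 'nothing here' 'in the house' > "$sample"

# -n prints line numbers; -E enables alternation with |.
# The | is inside the quotes, so the shell hands it to grep as part of the pattern.
matches=$(grep -nE 'doll|house' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

Each output line has the form line-number:text, so the two words can show up on entirely different lines.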


Exact limit to avoid "Argument list too long"

I know that there are some tricks to avoid the shell's limit, which leads to "argument list too long", but I want to understand why the limit hits in my case (even though it should not). As far as I know the limit of chars in an argument for a command should be able to be determined by the following steps:
Get the maximum argument size by getconf ARG_MAX
Subtract the size of your environment, retrieved by env|wc -c
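The two steps above can be combined into one calculation. This is only a sketch of the estimate described in the question, not a statement of how the kernel actually accounts for arguments. The variable names are made up for illustration.

```shell
# Step 1: the system's limit for arguments plus environment, in bytes.
arg_max=$(getconf ARG_MAX)

# Step 2: the bytes currently taken up by the environment.
env_bytes=$(env | wc -c)

# Naive estimate of what is left for the argument list.
available=$(( arg_max - env_bytes ))
echo "ARG_MAX=$arg_max env=$env_bytes available=$available"
```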
On my machine with Fedora 30 and zsh 5.7.1 this should allow me argument lists with a length of up to 2085763 chars. But I already hit the limit with only 1501000 chars. What did I miss in my calculation?
Minimal working example for reproduction:
Setting up files:
$ for i in {10000..100000}; do touch Testfile_${i}".txt"; done
$ ls Testfile*
zsh: argument list too long: ls
Now I deleted files stepwise (1000 files per step) to check when the argument line was short enough to be handled again:
for i in {10000..100000..1000}; do echo $(ls|wc -l); rm Testfile_{$i..$((i + 1000))}.txt; ls Testfile_*|wc -l; done
The message zsh: argument list too long: ls stops between 79000 and 78000 remaining files. Each filename has a length of 18 chars (19, including the separating whitespace), so at this point the argument line should have a total length of 79000*19=1501000 or 78000*19=1482000 chars, respectively.
This result is of the same order of magnitude as the expected value of 2085763 chars, but it is still noticeably off. What could explain the difference of roughly 500000 chars?
ADDENDUM1:
As suggested in the comments, I ran xargs --show-limits and the output roughly matches my expectation.
$ xargs --show-limits
Your environment variables take up 4783 bytes
POSIX upper limit on argument length (this system): 2090321
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2085538
Size of command buffer we are actually using: 131072
Maximum parallelism (--max-procs must be no greater): 2147483647
ADDENDUM2:
Following the comment by @Jens, I added 9 bytes of overhead per word (8 bytes for the pointer, 1 for the terminating NUL byte). Now I get the following results (I do not know how the whitespace is handled, so I leave it out for the moment):
79000*(18+9)= 2133000
78000*(18+9)= 2106000
Both values are much closer to the theoretical limit than before; indeed, they are even a bit above it. So with some safety margin I'm more confident that I can estimate the maximum argument length in advance.
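For reference, the arithmetic from this addendum can be reproduced with shell arithmetic. The sketch below only restates the model from the comment (18 characters per filename plus 9 bytes of overhead) and compares it with the limit quoted in the question. The variable names are made up for illustration.

```shell
name_len=18       # length of a name such as Testfile_10000.txt
overhead=9        # 8-byte pointer + 1 NUL byte, as assumed in the addendum
limit=2085763     # expected limit quoted in the question

per_arg=$(( name_len + overhead ))
est_79000=$(( 79000 * per_arg ))
est_78000=$(( 78000 * per_arg ))

# Positive numbers mean the estimate lies above the quoted limit.
echo "79000 files: $est_79000 bytes (limit + $(( est_79000 - limit )))"
echo "78000 files: $est_78000 bytes (limit + $(( est_78000 - limit )))"
```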
Further reading:
There are more posts about this topic, of which none answers the question in a satisfying way, but still they provide good material:
https://unix.stackexchange.com/a/120842/211477
https://www.in-ulm.de/~mascheck/various/argmax/
If you were looking to count files, this worked for me on macOS Ventura (13.1):
find . -maxdepth 2 -name "*.zip" | wc -l
I had 1039999 zip files and the standard "ls */*.zip | wc -l" just died ("zsh: argument list too long: ls").

What does this grep command do?

I'm trying to understand this Unix command, but I'm not an expert. Could someone explain it in more detail?
grep '^.\{167\}02'
What does it perform?
It finds lines that start (^) with any (.) 167 characters followed by 02. In other words, characters 168 and 169 of the line must be 02.
From the man page (man grep)
grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN. By default, grep prints the matching lines.
Note the part about standard input: if you don't specify the files you want to search in, grep will wait and read your keyboard input, matching the regex against each new line that you type.
If you want to test it, I suggest using a simpler regex with fewer characters, such as ^.\{3\}02, and seeing what happens:
$ grep '^.\{3\}02'
02
002
0002
00002 <-- this matches and will later be printed and highlighted
00002
You don't normally run grep and type the lines yourself to see whether they match. Instead, you give it files as arguments, or feed it other input through a pipe:
ls -la | grep '^.\{167\}02'
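To see the full-length pattern in action without typing 167 characters by hand, a test line can be generated. This is a minimal sketch; the filler character and the variable names are arbitrary choices for illustration.

```shell
# 167 filler characters: pad an empty string to width 167, then turn spaces into x.
prefix=$(printf '%167s' '' | tr ' ' 'x')

# The first line has 02 at positions 168-169, the second line is too short to match.
hits=$(printf '%s\n' "${prefix}02 tail" "x02" | grep -c '^.\{167\}02')
echo "matching lines: $hits"

# The same positions can be inspected directly with cut.
field=$(printf '%s\n' "${prefix}02 tail" | cut -c168-169)
echo "characters 168-169: $field"
```

Patterns like this are typically used on fixed-width records, where a field always sits at the same column.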

Ack — Ignoring multiple directories without repeating the flag

Is it possible to ignore multiple directories in Ack, without repeating the flag?
e.g. I know the following works (i.e. setting multiple flags):
ack --ignore-dir=install --ignore-dir=php 'teststring'
I was hoping that I could separate directories with commas, like I can do with the extensions as follows:
ack --ignore-file=ext:css,scss,orig 'teststring'
However, the following comma separated ignore flag doesn't work:
ack --ignore-dir=install,php 'textstring'
Is it possible to use some short-hand equivalent, so I don't have to repeatedly type out the --ignore-dir flag?
It's actually similar to how you would specify the include patterns for grep:
ack <term> --ignore-dir={dir_a,dir_b}
However, this format does not work with a single directory, because the shell only expands braces that contain a comma-separated list. So
ack <term> --ignore-dir={log}
will not work.
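The braces are handled by the shell, not by ack: ack never sees them when the expansion succeeds. A quick way to check what ack would receive is to echo the arguments. The sketch below assumes bash is installed and calls it explicitly, since plain POSIX sh does not do brace expansion.

```shell
# Two directories: bash expands the braces into two separate flags.
multi=$(bash -c 'echo --ignore-dir={dir_a,dir_b}')

# One directory: nothing to expand, so the braces are passed through literally.
single=$(bash -c 'echo --ignore-dir={log}')

printf '%s\n' "$multi" "$single"
```

In the second case ack is asked to ignore a directory literally named {log}, which is why the single-directory form has no effect.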
Since you're using ack 2, you can put --ignore-dir=install and --ignore-dir=php in a .ackrc file in the root of your project. Then, every ack invocation in that tree will use those flags.
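A sketch of what that could look like follows. The temporary directory stands in for the project root and is only there so the example does not touch a real project. An .ackrc holds one option per line, and lines starting with # are comments.

```shell
project=$(mktemp -d)   # hypothetical stand-in for the project root

# Write the two flags from the question into the project's .ackrc.
cat > "$project/.ackrc" <<'EOF'
# Directories to skip on every ack run in this tree
--ignore-dir=install
--ignore-dir=php
EOF

cat "$project/.ackrc"
```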
So to ignore single directory use
ack <term> --ignore-dir=dir_a
and to ignore multiple directories use
ack <term> --ignore-dir={dir_a,dir_b}
One approach could be to select the directories to exclude with a regular expression in the -G option, combined with the option --invert-file-match. Based on your question, something like the following:
ack -a -G 'install|php' --invert-file-match 'textstring' .

List files that contains telephone numbers in a unix directory

I have a directory called testDir that contains 1000 files. Some of them contain telephone numbers and some of them don't. The telephone number format is "12-3456789".
How do I get the number of files that contain telephone numbers?
EDIT: I am not familiar with Unix, so I couldn't work out the answer myself.
A simple solution could be:
grep -lE "[0-9]{2}-[0-9]{7}" * | wc -l
EDIT:
grep searches for a pattern in files.
-E enables extended regular expressions (you could use egrep instead)
-l restricts grep's output to the names of the matching files
wc counts
-l counts lines (-w counts words, but it could give incorrect results if filenames contain spaces)
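The command can be tried out on a small throwaway directory first. In this sketch the directory, the file names, and their contents are all made up for illustration: two of the three files contain a number in the 12-3456789 format.

```shell
dir=$(mktemp -d)   # hypothetical stand-in for testDir
printf 'call 12-3456789\n'                > "$dir/a.txt"
printf 'no number here\n'                 > "$dir/b.txt"
printf 'fax: 98-7654321 and 11-2223334\n' > "$dir/c.txt"

# grep -l prints each matching file once, however many numbers it contains;
# wc -l then counts those names. tr strips the padding some wc versions add.
count=$(cd "$dir" && grep -lE '[0-9]{2}-[0-9]{7}' * | wc -l | tr -d ' ')
echo "$count"

rm -r "$dir"
```

c.txt holds two numbers but is counted only once, which is what the question asks for.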

Can I create an ack file type based on a filename, not extension?

I would like to include files with a specific name -- not an extension -- in my ack search. Is this possible with ack?
If I use the -G option, this would exclude all other file types. (So I can't put it in my .ackrc file.)
I tried to use --type-set mytype=filename.txt but this only works for extensions, so this would search for files including the pattern .filename.txt, thus not find filename.txt. (That's also what the ack --help types shows: --mytype .filename.txt, not --mytype filename.txt.)
Does anyone have any ideas?
man ack says that the files to be searched in can be given through standard input.
So this should work:
find . -name filename.txt | ack PATTERN -
Unfortunately, it doesn't. It gives ack: Ignoring 1 argument on the command-line while acting as a filter., which apparently is a bug in ack. Once this bug is fixed, we should be able to use
find . -name filename.txt | ack --nofilter PATTERN -
If you're using zsh, you can also do it like this:
ack 'Pattern' **/filename.txt
What you're asking is "can I make a filetype that ack recognizes based on a filename", and the answer is "No, not in ack 1.x, but you can in ack 2.0". ack 2.0 is in alpha release, and we hope to have a beta by Christmas.
As @Christian pointed out above, you can specify the given filename on the command line, but that bypasses filetype checking entirely.
I know this is a late reply, but could you simply specify the filename when you run ack?
ack 'My Text' filename.txt
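When the file can sit anywhere in the tree and ack's filter-mode problem gets in the way, one fallback is to let find select the files by name and hand them to a searcher as arguments. The sketch below swaps in grep for ack so that it runs anywhere; the directory layout, file contents, and the pattern needle are made up for illustration. Replacing the grep part with ack 'needle' is an untested assumption.

```shell
root=$(mktemp -d)   # hypothetical project tree
mkdir -p "$root/a" "$root/b"
printf 'needle here\n' > "$root/a/filename.txt"
printf 'needle too\n'  > "$root/b/other.txt"      # wrong name, must be skipped
printf 'nothing\n'     > "$root/b/filename.txt"   # right name, no match

# find picks files by exact name; -exec ... {} + passes them to grep in batches.
# /dev/null is added so grep always prints the file name, even for a single file.
found=$(cd "$root" && find . -name filename.txt -exec grep -n 'needle' /dev/null {} +)
echo "$found"

rm -r "$root"
```

Only files named filename.txt are searched, so the match in other.txt is not reported.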
