Another grep advanced - unix

Q1. I want to grep something like this:
grep -Ir --exclude-dir="some*dirs" "my-text" ~/somewhere
but I don't want to show the whole lines containing "my-text"; I only want to see the list of files.
Q2. I want to see the list of files containing "my-text" but not containing "another-text". How can I do that?
Sorry, but I could not find the answer in man grep or on Google.

Q1. You mustn't have googled very hard on that one.
man grep
-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match.
Q2. Unless you expect both patterns to be on the same line, you'll need multiple invocations of grep. Something like:
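$ grep -Irl my-text ~/somewhere | xargs grep -L another-text
Here -L (--files-without-match) lists the files that contain no match at all. Note that -v -l would instead list every file that has at least one non-matching line.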
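A minimal sketch of both answers, run against a throwaway directory. The directory layout and the file names (a.txt, b.txt, c.txt, skipme) are made up for illustration. It uses -L rather than -v -l for Q2, because -v -l reports any file that has at least one line without the pattern.

```shell
# Build a scratch tree (hypothetical file names, for illustration only).
dir=$(mktemp -d)
mkdir "$dir/skipme"
printf 'my-text\n'               > "$dir/a.txt"
printf 'my-text\nanother-text\n' > "$dir/b.txt"
printf 'unrelated\n'             > "$dir/c.txt"
printf 'my-text\n'               > "$dir/skipme/d.txt"

# Q1: names of files that contain "my-text", skipping the excluded directory.
with_text=$(grep -Irl --exclude-dir="skip*" "my-text" "$dir" | sort)

# Q2: of those files, keep only the ones that do NOT contain "another-text".
without_other=$(grep -Irl --exclude-dir="skip*" "my-text" "$dir" | xargs grep -L "another-text")

printf 'Q1:\n%s\nQ2:\n%s\n' "$with_text" "$without_other"
```

File names containing whitespace would break the plain xargs call; in that case grep --null combined with xargs -0 is the usual remedy.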


Unix Text Processing - how to remove part of a file name from the results?

I'm searching through text files using grep and sed commands and I also want the file names displayed before my results. However, I'm trying to remove part of the file name when it is displayed.
The file names are formatted like this: aja_EPL_1999_03_01.txt
I want to have only the date without the beginning letters and without the .txt extension.
I've been searching for an answer and it seems like it's possible to do that with a sed or a grep command by using something like this to look forward and back and extract between _ and .txt:
(?<=_)\d+(?=\.)
But I must be doing something wrong, because it hasn't worked for me and I possibly have to add something as well, so that it doesn't extract only the first number, but the whole date. Thanks in advance.
Edit: Adding also the working command I've used just in case. I imagine whatever command is needed would have to go at the beginning?
sed '/^$/d' *.txt | grep -P '(^([A-ZÖÄÜÕŠŽ].*)?[Pp][Aa][Ll]{2}.*[^\.]$)' *.txt --colour -A 1
The results look like this:
aja_EPL_1999_03_02.txt:PALLILENNUD : korraga üritavad ümbermaailmalendu kaks meeskonda
A desired output would be this:
1999_03_02:PALLILENNUD : korraga üritavad ümbermaailmalendu kaks meeskonda
First off, you might want to think about your regular expression. While the one you have you say works, I wonder if it could be simplified. You told us:
(^([A-ZÖÄÜÕŠŽ].*)?[Pp][Aa][Ll]{2}.*[^\.]$)
It looks to me as if this is intended to match lines that start with a case insensitive "PALL", possibly preceded by any number of other characters that start with a capital letter, and that lines must not end in a dot (with -P, the backslash inside the brackets merely escapes the dot). So valid lines might be any of:
PALLILENNUD : korraga üritavad etc etc
Õlu on kena. Do I have appalling speling?
Peeter Pall is a limnologist at EMU!
If you'd care to narrow down this description a little and perhaps provide some examples of lines that should be matched or skipped, we may be able to do better. For instance, your outer parentheses are probably unnecessary.
Now, let's clarify what your pipe isn't doing.
sed '/^$/d' *.txt
This reads all your .txt files as an input stream, deletes any empty lines, and prints the output to stdout.
grep -P 'regex' *.txt --otheroptions
This reads all your .txt files, and prints any lines that match regex. It does not read stdin.
So .. in the command line you're using right now, your sed command is utterly ignored, as sed's output is not being read by grep. You COULD instruct grep to read from both files and stdin:
$ echo "hello" > x.txt
$ echo "world" | grep "o" x.txt -
x.txt:hello
(standard input):world
But that's not what you're doing.
By default, when grep reads from multiple files, it will precede each match with the name of the file from whence that match originated. That's also what you're seeing in my example above -- two inputs, one x.txt and the other - a.k.a. stdin, separated by a colon from the match they supplied.
While grep does include the most minuscule capability for filtering (with -o, or GNU grep's \K with optional Perl compatible RE), it does NOT provide you with any options for formatting the filename. Since you can't change how grep prints the filename, you're limited to either parsing the output you've got, or using some other tool.
Parsing is easy, if your filenames are predictably structured as they seem to be from the two examples you've provided.
For this, we can ignore that these lines contain a file and data. For the purpose of the filter, they are a stream which follows a pattern. It looks like you want to strip off all characters from the beginning of each line up to and not including the first digit. You can do this by piping through sed:
sed 's/^[^0-9]*//'
Or you can achieve the same effect by using grep's minimal filtering to return every match starting from the first digit:
grep -o '[0-9].*'
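As a rough illustration of the sed filter, the sketch below recreates two files named like the ones in the question (their contents are invented) and trims the grep output. The second sed expression, which drops the .txt before the colon, is an addition that is not part of the filter above; it is there only because the desired output in the question has no extension.

```shell
# Scratch directory with two files named like the ones in the question.
dir=$(mktemp -d)
printf 'PALLILENNUD : first line\n'  > "$dir/aja_EPL_1999_03_01.txt"
printf 'PALLILENNUD : second line\n' > "$dir/aja_EPL_1999_03_02.txt"

cd "$dir" || exit 1

# grep prefixes each match with its file name because several files are given.
raw=$(grep 'PALL' *.txt)

# Strip everything before the first digit, then drop ".txt" before the colon.
trimmed=$(printf '%s\n' "$raw" | sed -e 's/^[^0-9]*//' -e 's/\.txt:/:/')

printf '%s\n' "$trimmed"
```

This relies on the file name prefix containing no digits and on the matched text not containing the string ".txt:" earlier than the file name does.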
If this kind of pipe-fitting is not to your liking, you may want to replace your entire grep with something in awk that combines functionality:
$ awk '
  /\.$/ {next}                              # skip lines ending in a dot
  /^([A-ZÖÄÜÕŠŽ].*)?[Pp][Aa][Ll][Ll]/ {     # lines to match, "pall" in any case
    f=FILENAME
    sub(/^[^0-9]*/,"",f)                    # strip unwanted part of filename, like sed
    sub(/\.txt$/,"",f)                      # drop the extension as well
    printf "%s:%s\n", f, $0
    if ((getline) > 0)                      # simulate the "-A 1" from grep
      printf "%s:%s\n", f, $0
  }' *.txt
Note that I haven't tested this, because I don't have your data to work with.
Also, awk doesn't include any of the fancy terminal-dependent colourization that GNU grep provides through the --colour option.

Unix grep duplicate vowels search

I am stuck on a homework question. The question asks to display the lines, with grep and I can't use -w option, that contain no duplicate vowels.
My teacher said to find the grep command that could display two or more 'a's in a line which would, I think, be grep 'a.*a' file and then find the grep command that would display two or more 'u's which, I think, would be grep 'u.*u' file, combine them and then I should be able to get it. But I don't know how I would combine the grep commands.
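You can combine different regular expressions with |, provided you tell grep to use extended regular expressions with -E:
grep -E 'a.*a|e.*e|i.*i|o.*o|u.*u' file
That prints the lines that do have a repeated vowel; add -v to invert the match and get the lines with no duplicate vowels.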
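A small sketch of the combined pattern, using a made-up word list as input. The | alternation needs -E (or has to be written \| in GNU grep's basic syntax), and -v turns "lines with a repeated vowel" into "lines without one". The pattern only looks at lowercase vowels.

```shell
# Sample input (made-up words, one per line).
words=$(printf '%s\n' banana sky treble bird queue)

# Lines in which some vowel occurs at least twice.
dup=$(printf '%s\n' "$words" | grep -E 'a.*a|e.*e|i.*i|o.*o|u.*u')

# Inverting the match with -v leaves the lines with no repeated vowel.
nodup=$(printf '%s\n' "$words" | grep -vE 'a.*a|e.*e|i.*i|o.*o|u.*u')

printf 'repeated vowel:\n%s\nno repeated vowel:\n%s\n' "$dup" "$nodup"
```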

Grep: Recursive option produces unexpected behavior when fed pipe-input

I've been using this utility successfully for many years, in many environments. But I'm noticing that on one particular environment, it produces very unexpected results.
grep -r 'search-term1' . | grep 'search-term2'
The above code greps recursively for all instances of search-term1, in the current-dir. The results are then piped to another grep, which selects only those lines that also contain search-term2. This works exactly as I would expect.
grep -r 'search-term1' . | grep -r 'search-term2'
The only difference in the above code is that the -r recursive flag is specified in both grep commands. I would expect the behavior to not change for this particular case. After all, the input to the 2nd grep is a pipe-input, and there's nothing further to be found recursively.
I have been using the command successfully, for many years, in many different environments (both unix and mac-os). However, the most recent environment that I started working in (unix), breaks the above behavior. The second piped grep searches for all instances of search-term2, not only in the piped-input, but also all files in my current directory. Because of this, instead of getting only results that contain both search-terms, I get all results in current-dir that contain the 2nd search term.
Is there any reason why this one particular environment produces this odd behavior? Is there any way I can avoid this, while still preserving the -r flag?
FAQ:
Q: Why am I using the -r flag on a piped input?
Ans: I actually have grep saved as an alias, with many different options and flags that I always want to use as a default. The recursive flag is one of them. I would like to always use this alias, instead of having to type out all the flags every time.
Q: If you want to search for all instances matching both search terms, why not do (insert-superior-method-here) instead?
Ans: You're probably right. I'm sure there are things I can change in my usual habits that would workaround this issue. However, as intellectual curiosity, I would like to find out why recursive-greps-on-pipes work as intended on most environments, but not all, and if that can somehow be resolved.
The behavior of the -r flag changed in grep version 2.11: it now implicitly uses the working directory as the input if no file arguments are given. From the release notes:
If no file operand is given, and a command-line -r or equivalent
option is given, grep now searches the working directory.
You aren't giving the second grep any file arguments so it defaults to the current directory despite there being pipe input.
As a workaround, give the second grep an explicit - operand so that it reads standard input:
grep -r 'search-term1' . | grep -r 'search-term2' -
grep -r 'search-term1' . | grep -r -d skip 'search-term2' may also work around the problem.
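The explicit - operand can be tried out in a scratch directory. The three file names below are invented for the demonstration; second.txt contains only the second term, so it would show up in the output if the second grep searched the working directory instead of the pipe.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'search-term1 and search-term2\n' > both.txt
printf 'search-term1 only\n'             > first.txt
printf 'search-term2 only\n'             > second.txt

# The explicit "-" operand makes the second grep read the pipe,
# so -r does not fall back to searching the working directory.
result=$(grep -r 'search-term1' . | grep -r 'search-term2' -)

printf '%s\n' "$result"
```

Depending on the grep version, the surviving line may or may not carry a "(standard input):" prefix, so scripts should not rely on its exact form.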

how can I highlight just one item from the ls output

real beginner in Unix commands so not sure if the following is actually possible but here goes.
Is it possible to highlight just one item in a ls output?
I.e.: in a directory I use the following
ls -l --color=auto
this lists 4 items in green
file1.xls
file2.xls
file3.xls
file4.xls
But I want to highlight a specific item, in this case file2.
Is this possible?
The ls program will not do this for you. But you could filter the results from ls through a custom script which modifies the text to highlight just one item. It would be simpler if no color was originally given; then you could match on the given filename (for example as the pattern in an awk script, or in a sed script) and modify just that one item, adding colors.
That is, certainly it is possible. Writing a sample script is a different question.
How you approach the problem depends on what you want from the output. If that is (literally) the output from ls with a single filename in color, then a script would be the normal approach. You could use grep as suggested in the other answer, which raises a few issues:
commenting on ls -l --color=auto makes it sound as if you are using GNU ls, hence likely using Linux. An appropriate tag for the question would be linux rather than unix. If you ask for unix, the answers should differ.
supposing that you are using Linux. Then likely you have GNU grep, which can do colors. That would let you do something like this:
ls -l | grep --color=always file2 |less -R
however, there is a known bug in GNU grep's use of color (see xterm FAQ "grep --color" does not show the right output).
using grep like this shows only the matching lines. For ls that might be a good choice. For matches in a manual page -- definitely not.
Alternatively, less (which is found more often on Unix systems than GNU grep) also can highlight matches (not in color) and would show the file you are looking for in context. You could do this:
ls -l | less -p file2
(Both grep and less use patterns aka regular expressions, but I left the example simple — read the documentation to learn more).
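If the goal is to keep every line of the listing and color only one name, a commonly used trick (not mentioned above, and assuming GNU grep's color handling) is to add an alternative that matches the empty string at the end of every line. The sketch below uses the four file names from the question in a temporary directory.

```shell
dir=$(mktemp -d)
touch "$dir/file1.xls" "$dir/file2.xls" "$dir/file3.xls" "$dir/file4.xls"

# "file2|$" matches every line: either the literal name or the empty
# string at the end of the line. With GNU grep, only the non-empty
# match ("file2") is wrapped in color escape sequences.
out=$(ls -l "$dir" | grep --color=always -E 'file2|$')

printf '%s\n' "$out"
```

Because --color=always emits escape sequences even into a pipe, paging the result needs less -R, as in the example above.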
If you're a beginner I would strongly suggest you learn the grep command if you want to filter results. It is a Unix user's best friend (mine anyway).
Use grep to only display the list items you want to see...
ls -l | grep "file2"
NOTE: This is no different to typing ls -l file2 by the way but your pattern could be expanded based on what you actually want displayed on the screen.
So if you had a directory full of ".txt", ".xls" and ".doc" files and you wanted to see only the ".txt" files with the word "work" in the name (work1.txt) you could write:
ls -ls | grep "work" | grep "txt"
This would list work1.txt, work2.txt, work3.txt and so on.
This is a very basic example but I use grep extensively whilst in the unix shell and would advise using this to filter all results instead of colours.
A little side note: grep -v will show you everything but the pattern you give it.
ls -l | grep -v '\.txt$' will show everything BUT .txt files. (The dot is escaped and the pattern anchored with $, because an unescaped . matches any character.)
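A short sketch of chaining and inverting filters, using invented file names in a temporary directory. The patterns are written as '\.txt$' so that the dot is literal and only names ending in .txt match.

```shell
dir=$(mktemp -d)
touch "$dir/work1.txt" "$dir/work2.txt" "$dir/work1.doc" "$dir/notes.txt"

# Chain two filters: names containing "work" that also end in ".txt".
work_txt=$(ls "$dir" | grep 'work' | grep '\.txt$')

# -v inverts the match: everything that does not end in ".txt".
not_txt=$(ls "$dir" | grep -v '\.txt$')

printf 'work + txt:\n%s\nnot txt:\n%s\n' "$work_txt" "$not_txt"
```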

Is it feasible to narrow down the result returned by ls() with grep in R, much like the `ls -l | grep` command in UNIX?

In Terminal/shell script, you can list all files in the current directory with ls -l, and then pipe it to execute an additional command. For example, ls -l | grep -i "calc" returns all files whose filename includes calc. In R, you can list all objects currently stored in the workspace, with ls() command.
However, I want to narrow down the list returned by ls() with something like grep in R, where the input is the list returned by ls() and the output is that list filtered by grep (or something similar), much like the UNIX pipe mentioned above. Is this feasible in R?
Also, is it feasible to narrow down the list with xargs-like functionality in R? I would like to get only the objects whose source contains the literal if: when a function in the list returned by ls() has an if-else condition inside it, I want to display that function in the console. In Terminal you can do this with find . | xargs grep "if" (of course that searches files in the current directory, not R objects in the workspace; I show it only for illustration).
Note that this is not a post on how to call shell commands from within R. It's not what I want to do.
I use OS X 10.9.3 and R 3.1.0.
ls() has a pattern parameter that might be what you need:
pattern an optional regular expression. Only names matching pattern
are returned. glob2rx can be used to convert wildcard patterns
to regular expressions.
For the second part of your question, you could use capture.output(getAnywhere()) and grep to look inside function source. You'll need to pass in the functions to that and I'd make that whole operation a function to keep the implementation clean.
You can do
grep("calc", list.files(), value=TRUE, ignore.case=TRUE)
which should "emulate" ls -l | grep -i "calc" (ignore.case=TRUE plays the role of -i). See ?list.files and ?grep.
