How do I use grep to find words of specified or unspecified length? - unix

In the Unix command line (CentOS7) I have to use the grep command to find all words with:
At least n characters
At most n characters
Exactly n characters
I have searched the posts here for answers and came up with grep -E '^.{8}' /sample/dir but this only gets me the words with at least 8 characters.
Using the $ at the end returns nothing. For example:
grep -E '^.{8}$' /sample/dir
I would also like to trim the info in /sample/dir so that I only see the specific information. I tried using a pipe:
cut -f1,7 -d: | grep -E '^.{8}' /sample/dir
Depending on the order, this only gets me one or the other, not both.
I only want the usernames at the beginning of each line, not all words in each line for the entire file.
For example, if I want to find userids on my system, these should be the results:
1.
tano-ahsoka
skywalker-a
kenobi-obiwan
ahsoka-t
luke-s
leia-s
ahsoka-t
kenobi-o
grievous
I'm looking for two responses here as I have already figured out number 1.
Numbers 2 and 3 are not working for some reason.
If possible, I'd also like to apply the cut for all three outputs.
Any and all help is appreciated, thank you!

You can run one grep for extracting the words, and another for filtering based on length.
grep -oE '(\w|-)+' file | grep -Ee '^.{8,}$'
grep -oE '(\w|-)+' file | grep -Ee '^.{,8}$'
grep -oE '(\w|-)+' file | grep -Ee '^.{8}$'
Adjust the patterns to your requirements. To search a directory instead of a single file, add -r; you may also need -h to keep grep from prefixing each match with its filename. Note that {,8} is a GNU extension; {0,8} or {1,8} is the portable spelling.
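To combine the length filter with cut, as the question asks, cut has to read the file and grep has to read the pipe (in the question's attempt, grep was given the file directly and so ignored cut's output). A minimal sketch, assuming passwd-style colon-separated lines; the sample file and its contents are made up for illustration:

```shell
# Hypothetical sample data in /etc/passwd format (colon-separated fields).
sample=$(mktemp)
printf '%s\n' \
  'tano-ahsoka:x:1001:1001::/home/tano-ahsoka:/bin/bash' \
  'luke-s:x:1002:1002::/home/luke-s:/bin/bash' \
  'grievous:x:1003:1003::/home/grievous:/bin/bash' > "$sample"

# cut extracts the username (field 1); grep filters what arrives on the pipe.
at_least=$(cut -d: -f1 "$sample" | grep -E '^.{8,}$')   # 8 or more characters
at_most=$(cut -d: -f1 "$sample" | grep -E '^.{1,8}$')   # 8 or fewer characters
exactly=$(cut -d: -f1 "$sample" | grep -E '^.{8}$')     # exactly 8 characters

# To keep fields 1 and 7 while filtering on the username length only,
# anchor the length test to the first field, then cut.
with_shell=$(grep -E '^[^:]{8}:' "$sample" | cut -d: -f1,7)

printf 'at least 8:\n%s\n' "$at_least"
printf 'at most 8:\n%s\n' "$at_most"
printf 'exactly 8:\n%s\n' "$exactly"
printf 'exactly 8, with shell:\n%s\n' "$with_shell"

rm -f "$sample"
```

The anchors ^ and $ apply to whatever line grep receives, which is why they only measure the username once cut has stripped the other fields.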

Depending on your implementation of grep, it might work to use:
grep -o -E '\<\w{8}\>' # exactly 8
grep -o -E '\<\w{8,}\>' # 8 or more
grep -o -E '\<\w{,8}\>' # 8 or fewer

Related

how to list a specific string or number in a file in Unix

For example, if your file has the following lines:
1=10200|2=2343i|3=otit|5=89898|54=9546i96i|10=2459
1=10200|54=9546i96i|10=2459|2=2343i|3=otit|5=8
1=10200|5=IGY|14=897|459=122|132=1|54=9546i96i|10=2459
1=10200|2=2343i|5=0|54=9546i96i
The output should be
5=89898
5=8
5=IGY
5=0
You could use grep with the -o flag to return only the regexp matches.
Assuming you have a file.txt that you want to parse:
cat file.txt | grep -o -E "(\||^)5=[^|]*" | grep -o "5=[^|]*"
This will match anything that starts with 5= up until the first |.
By running this command on the input you provided I get:
5=89898
5=8
5=IGY
5=0
Cheers
Edit: as Walter A suggested, my previous solution did not cover all cases.
I have added an extra parsing step: first, you get all strings that match 5=... at the start of a line, or |5=..., and then you remove the |.
Use (^|[|]) for matching start of field (start of line or |) and remember/match string until next | or end-of-line.
sed -nr 's/.*(^|[|])(5=[^|]*).*/\2/p' file
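An alternative that sidesteps the regular expression is to let awk split each line on | and print the field that starts with 5=. This is only a sketch under the assumption that fields never contain a literal |; the temporary file stands in for file.txt:

```shell
# Recreate the sample input from the question in a temporary file.
data=$(mktemp)
printf '%s\n' \
  '1=10200|2=2343i|3=otit|5=89898|54=9546i96i|10=2459' \
  '1=10200|54=9546i96i|10=2459|2=2343i|3=otit|5=8' \
  '1=10200|5=IGY|14=897|459=122|132=1|54=9546i96i|10=2459' \
  '1=10200|2=2343i|5=0|54=9546i96i' > "$data"

# Split on "|" and print every field whose key is exactly 5.
# Anchoring with ^5= keeps 54=... and 459=... from matching.
result=$(awk -F'|' '{ for (i = 1; i <= NF; i++) if ($i ~ /^5=/) print $i }' "$data")

printf '%s\n' "$result"
rm -f "$data"
```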

grepping lines from a document using xargs

Let's say I have queries.txt.
queries.txt:
cat
dog
123
Now I want to use them as queries to find lines in myDocument.txt using grep.
cat queries.txt | xargs grep -f myDocument.txt
myDocument has lines like
cat
i have a dog
123
mouse
It should return the first three lines, but it doesn't. Instead, grep tries to treat the queries as file names. What am I doing wrong?
Here, you just need:
grep -f queries.txt myDocument.txt
This causes grep to read the regular expressions from the file queries.txt and then apply them to myDocument.txt.
In the xargs version, you were effectively writing:
grep -f myDocument.txt cat dog 123
If you absolutely must use xargs, then you'll need to write:
xargs -I % grep -e % myDocument.txt < queries.txt
This avoids a UUOC (Useless Use of cat) award by redirecting standard input from queries.txt. It uses the -I % option to specify where the replacement text should go in the command line. Using the -e option means that if the pattern is, say, --help, you won't run into problems with (GNU) grep treating it as an option (and therefore printing its help message).
The grep -e option will take a pattern string as an argument. -f treats the argument as a file name of a file with patterns in it.
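The difference is easy to reproduce with the two files from the question. A small sketch, using a temporary directory so nothing in the current directory is touched:

```shell
# Recreate queries.txt and myDocument.txt from the question.
dir=$(mktemp -d)
printf '%s\n' cat dog 123 > "$dir/queries.txt"
printf '%s\n' 'cat' 'i have a dog' '123' 'mouse' > "$dir/myDocument.txt"

# -f reads one pattern per line from queries.txt;
# the remaining argument is the file to search.
matches=$(grep -f "$dir/queries.txt" "$dir/myDocument.txt")

printf '%s\n' "$matches"
rm -rf "$dir"
```

If the queries are meant as literal strings rather than regular expressions, adding -F makes grep treat them that way.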

Unix Pipes for Command Argument [duplicate]

I am looking for insight as to how pipes can be used to pass standard output as the arguments for other commands.
For example, consider this case:
ls | grep Hello
The structure of grep follows the pattern: grep SearchTerm PathOfFileToBeSearched. In the case I have illustrated, the word Hello is taken as the SearchTerm and the result of ls is used as the file to be searched. But what if I want to switch it around? What if I want the standard output of ls to be the SearchTerm, with the argument following grep being PathOfFileToBeSearched? In a general sense, I want to have control over which argument the pipe fills with the standard output of the previous command. Is this possible, or does it depend on how the script for the command (e.g., grep) was written?
Thank you so much for your help!
grep itself will be built such that if you've not specified a file name, it will open stdin (and thus get the output of ls). There's no real generic mechanism here - merely convention.
If you want the output of ls to be the search term, you can do this via the shell. Make use of a subshell and substitution thus:
$ grep $(ls) filename.txt
In this scenario ls is run in a subshell, and its stdout is captured and inserted in the command line as an argument for grep. Note that if the ls output contains spaces, this will cause confusion for grep.
There are basically two options for this: shell command substitution and xargs. Brian Agnew has just written about the former. xargs is a utility which takes its stdin and turns it into arguments of a command to execute. So you could run
ls | xargs -n1 -J % grep -- % PathOfFileToBeSearched
and it would, for each file output by ls, run grep -- filename PathOfFileToBeSearched to grep for the filename output by ls within the other file you specify. This is an unusual xargs invocation; usually it's used to add one or more arguments at the end of a command, while here it should add exactly one argument in a specific place, so I've used the -n and -J arguments to arrange that (-J is a BSD xargs option; GNU xargs offers -I % instead). The more common usage would be something like
ls | xargs grep -- term
to search all of the files output by ls for term. Of course, if you just want files in the current directory, you can do this more simply without a pipeline:
grep -- term *
and likewise in your reversed arrangement,
for filename in *; do
grep -- "$filename" PathOfFileToBeSearched
done
There's one important xargs caveat: whitespace characters in the filenames generated by ls won't be handled well. To handle them, provided your find and xargs support -print0 and -0 (the GNU and BSD versions do), you can use find instead:
find . -mindepth 1 -maxdepth 1 -print0 | xargs -0 -n1 -J % grep -- % PathOfFileToBeSearched
This uses NUL characters to separate filenames instead of whitespace.
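For systems where xargs lacks a BSD-style replacement option, the same "one argument in a specific place" idea can be sketched with -I, which is specified by POSIX. The directory, file names, and index file below are hypothetical, and -F is added so that the dots in the filenames are matched literally:

```shell
# Hypothetical setup: a directory of files, and a separate index file to search.
names=$(mktemp -d)
index=$(mktemp)
touch "$names/alpha.txt" "$names/beta.txt"
printf '%s\n' \
  'see alpha.txt for details' \
  'nothing here' \
  'beta.txt is empty' > "$index"

# For each name printed by ls, run: grep -F -- NAME "$index"
# -I % substitutes one input line for each % in the command.
hits=$(ls "$names" | xargs -I % grep -F -- % "$index")

printf '%s\n' "$hits"
rm -rf "$names" "$index"
```

This still inherits the caveat about unusual characters in filenames, since xargs interprets quotes and backslashes in its input unless -0 is used.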

"grep | xargs grep" with search conditions on different strings

I want to grep files that contain text "wp_" but do not contain text "wp3_". E.g. I've got a file with two strings:
wp_123
wp3_123
I try $ grep -lr wp_ ~/tmp | xargs grep -vl wp3_
It outputs this file name! But if I remove the linebreak, it's working like I want, i.e. handles string "wp_123 wp3_123" correctly.
How to make it work with search conditions on different strings?
P.S. Sorry for kind of duplicate, but seems that nobody noticed my comment during last hour...
This should work
$ grep -lr 'wp_' ~/tmp | xargs grep -L 'wp3_'
The single quotes are not necessary in this case, but are a good habit to prevent pattern characters from being interpreted by the shell. In your original attempt, -vl means "print each file with at least one line that does not match". Here, -L means "print each file with no lines that match".
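The difference between -vl and -L can be checked on two small files. This is an illustrative sketch; the file names both.txt and only_wp.txt are made up, and the results are piped through sort only to make the order predictable:

```shell
# both.txt contains wp_ and wp3_ on separate lines; only_wp.txt contains wp_ alone.
dir=$(mktemp -d)
printf '%s\n' 'wp_123' 'wp3_123' > "$dir/both.txt"
printf '%s\n' 'wp_456' > "$dir/only_wp.txt"

# -vl: list files that have at least one line NOT matching wp3_ (both files qualify).
with_vl=$(grep -lr 'wp_' "$dir" | xargs grep -vl 'wp3_' | sort)

# -L: list files in which NO line matches wp3_ (only only_wp.txt qualifies).
with_L=$(grep -lr 'wp_' "$dir" | xargs grep -L 'wp3_' | sort)

printf 'with -vl:\n%s\n' "$with_vl"
printf 'with -L:\n%s\n' "$with_L"
rm -rf "$dir"
```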

Returning Data Until N'th Occurrence of Character X

Within a directory I have multiple files that each contain multiple version numbers. I am grepping each file in the directory for these version numbers, sorting them to find the most recent one, and then piping the result into 'tail -1' to get only the most recent version number rather than every grep result.
The data looks something like this:
file1: asdf garbage 1.2.4.1 garbagetext asdf
file2: fsdaf garbage asdfsda 4.3.2.10 fdsaf
and so on. I have already accomplished extracting the most recent version number. I did this with the following:
grep -o '[0-9]\{1,\}\.[0-9]\{1,\}\.[0-9]\{1,\}\.[0-9]\{1,\}' * | sort | tail -1
The next part is what I am having trouble with. I am trying to extract the number (whether it is one digit or two) before the first period and return that result. Then, presumably with a slightly different command, I want to do the same for the number after the first period, again for the number after the second period, and finally for the number after the third period.
I have little to no experience with sed or awk, but after a little research I believe either one of these tools is the way to accomplish this.
Thank you!
Edit: Alright I got it, but I am certain this can be done in a much easier way. What I have is the following:
grep -o '[0-9]\{1,\}\.[0-9]\{1,\}\.[0-9]\{1,\}\.[0-9]\{1,\}' * | sort | tail -1 | grep -o '[0-9]\{1,\}' | sed -n 2p
or sed -n 1p, 3p, 4p depending on which value I want.
To get the latest version number:
grep -P -o "\d+\.\d+\.\d+\.\d+" * |sed 's/.*://g'|awk -F'.' '{v[$0]=($1"."$2$3$4)+0;}END{m=0;for(x in v)if(v[x]>m){m=v[x];n=x;}print n}'
To extract the individual numbers:
kent$ echo "10.2.30.4"|awk -F'.' -v OFS="\n" '$1=$1'
10
2
30
4
You can put the two lines together.
To extract a version number, without having to know how many dots are in it, I would use
grep -o '[0-9.]\+' filename | sort --version-sort | tail -1
(assuming you have GNU sort, with the --version-sort option)
To get just the major version number, pipe the above into one of
sed 's/\..*//'
while read -r line; do echo "${line%%.*}"; done
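Putting the pieces together, each component can be pulled out with cut once the latest version is known. A minimal sketch, assuming GNU sort (for -V, the short form of --version-sort) and a made-up list of versions in place of the grep output:

```shell
# Hypothetical list of version numbers, as grep -o might have produced them.
versions='1.2.4.1
4.3.2.9
4.3.2.10'

# Version sort compares each dotted component numerically, so 4.3.2.10
# sorts after 4.3.2.9 (a plain lexical sort would put it before).
latest=$(printf '%s\n' "$versions" | sort -V | tail -n 1)

# cut splits on "." and selects one component at a time.
first=$(printf '%s\n' "$latest" | cut -d. -f1)
second=$(printf '%s\n' "$latest" | cut -d. -f2)
third=$(printf '%s\n' "$latest" | cut -d. -f3)
fourth=$(printf '%s\n' "$latest" | cut -d. -f4)

printf 'latest=%s first=%s second=%s third=%s fourth=%s\n' \
  "$latest" "$first" "$second" "$third" "$fourth"
```

Using cut -d. -fN avoids the need for a second grep and sed -n Np, and works regardless of how many digits each component has.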
