What does 'r:|[._-]=* r:|=*' accomplish in ZSH Completion matching? - zsh

Does r:|[._-]=* r:|=* accomplish anything in ZSH completion matching? Isn't completion already matching forward on the right-hand side?
I've tried a few examples with it on and off, and it seems to accomplish nothing.
Completion already completes forward toward the right-hand side, and in my trials anything that matched originally still matches with or without that matcher list. I can't think of an example that matches with the extra rule but fails to match without it.

Related

how access specific part of data as an input of AWK

Suppose I want to access an online dictionary and look up a specific word. I would like only the relevant part of the data, the word and its translation, as input to AWK. Any ideas?
In other words, I want only a small portion of the data on my machine. How can I avoid downloading all of it, and hopefully save space and time? Is there any way to do so without downloading all the data to the local machine?
This question is related to my last question here.
Edit 1:
I chose a dictionary as an example because, when you look up a word, it is enough to access a specific part of the data; there is no need to process all of it.
I am not an expert in programming, so I was thinking I could modify this answer to make it work (that is why I added the AWK tag again). I don't use any specific OS or tool. This is just a basic idea to see what the possibilities are, so I don't know how I can improve the tags.
awk cannot download. You must download the file and pipe it into a command that terminates as soon as it finds a result:
wget -qqO- http://example.com/path |grep -wim1 "word"
wget -qqO- URL produces no output other than the content of the given URL, which it writes to standard output so you can then parse it. grep -wim1 "word" finds the first line containing "word" as a whole word and then terminates. If you don't need the match printed, you can use -wiq instead. If the dictionary has one word per line (and nothing else), you're better off with -x instead of -w, so that you match "can" in its entirety rather than "can't" (' is a word boundary). Remove the -i if you want the match to be case-sensitive.
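As a rough Python counterpart to the pipeline above, the sketch below reads a response line by line and stops at the first case-insensitive whole-word match, so the rest of the body is never read. The names first_match and first_match_from_url are made up for illustration, and \b only approximates grep's -w word boundaries.

```python
import re
import urllib.request


def first_match(lines, word, whole_line=False):
    """Return the first matching line without its newline, or None."""
    if whole_line:
        # Rough equivalent of grep -x: the whole line must equal the word.
        pattern = re.compile(re.escape(word), re.IGNORECASE)
        matches = pattern.fullmatch
    else:
        # Rough equivalent of grep -w: the word must sit between word boundaries.
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        matches = pattern.search
    for line in lines:
        text = line.rstrip("\r\n")
        if matches(text):
            return text  # stop here; later lines are never consumed
    return None


def first_match_from_url(url, word, whole_line=False):
    """Stream url and stop reading at the first match (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # The response object yields the body one line (as bytes) at a time.
        decoded = (raw.decode("utf-8", errors="replace") for raw in response)
        return first_match(decoded, word, whole_line)
```

With the lines "cannot", "I can't" and "can", first_match(lines, "can") returns "I can't", while whole_line=True returns "can", which mirrors the -w versus -x difference described above.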
In the comments, you asked:
it may improve to jump to the start of the "w" character maybe, so as not to download the whole data from "a" to "w". is it possible? I guess not
Some programs can "resume" downloads and you may be able to play with that, but you'd have to guess where to start. This would be a lot of work and you might seek too far and therefore fail to get a match.
If you are querying this dictionary more than once, I'd recommend downloading it and saving it so you can query it locally. Even the largest dictionary I know of is only 213MB (compressed; search it with zgrep), though I am assuming you're talking about a traditional word list rather than a hash table or other arbitrary data form. Of course, anything larger would take so long to download that you'd only want to do it once.
If you really don't want to store it locally, you should probably consider a database rather than a flat file.

How can I write a regular expression that will allow a URL with or without the "HTTP://" part

I'm looking to write a regular expression to validate a potential web address.
In 'http://www.microsoft.com' for example, I would like to make the 'http://' optional, so that if only 'www.microsoft.com' were entered into my textbox, it would still work.
I've done some research on regular expressions and my question specifically, but I'm not getting anywhere with finding one or really understanding how to write one.
I already have the regex provided in VS to validate an internet address; I'm just unsure how to modify it to make parts optional.
Regular Expressions are kind of difficult (in my opinion). If you want to use Regex, more power to you.
You could use something simple like this, too.
If links.StartsWith("https://") OrElse links.StartsWith("http://") OrElse links.StartsWith("www.") Then
    ' links are valid
End If
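For comparison, here is a minimal Python sketch of both approaches: the prefix check suggested above, and a regex in which the scheme is wrapped in an optional group, (?:https?://)?. The pattern is a simplified illustration written for this example, not the validator that ships with Visual Studio, and the function names are hypothetical.

```python
import re

ALLOWED_PREFIXES = ("https://", "http://", "www.")


def has_allowed_prefix(link):
    """Prefix check: str.startswith accepts a tuple of alternatives."""
    return link.lower().startswith(ALLOWED_PREFIXES)


# The trailing ? after the first group is what makes the scheme optional.
# Simplified on purpose: no ports, credentials, or IP addresses.
URL_PATTERN = re.compile(
    r"(?:https?://)?[\w-]+(?:\.[\w-]+)+(?:/\S*)?",
    re.IGNORECASE,
)


def looks_like_url(text):
    """True if the whole string matches the illustrative pattern."""
    return URL_PATTERN.fullmatch(text) is not None
```

Under these assumptions, looks_like_url accepts both "http://www.microsoft.com" and "www.microsoft.com" and rejects "ftp://www.microsoft.com". The same trick, a group followed by ?, applies to any other part of an existing pattern that should become optional.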

Drupal Views Exposed Filters with approximate matches

I've got a view with exposed filters to help find matches out of thousands of entries. What I'm looking for is exact matches up top (this is done and working) followed by "approximate" matches underneath. The approximate matches may have one or two elements that are not what the user specified, but should be presented as options anyway. Are there any modules that support this functionality?
You could create 2 different views. One being the page you already created with the exposed filter, and the other being a block with almost the exact same settings, but make the filter less strict (ex. "contains" vs. "is equal to"). Then you could print the block below your results on the page, possibly in the footer, and call it something along the lines of "Approximate Matches". If you're not sure how to print a block, there's a good description here.
There may be a more efficient way of doing this, but this is the first thing that came to mind.

How to find everything that is not matching a regular expression

I would like to search through thousands of HTML files for bad practices involving height, width, or other CSS properties.
For instance, I would like to find every place where height is given without units: height:40 should be found, but height:40px shouldn't.
For that I am using the search program Agent Ransack, which lets me use regular expressions to search within files.
Currently my regular expression is:
(height:)[\s]*[0-9]*\.?[0-9]+(px)
This finds everything that looks like height:40px. (Later on I want to add width and other properties.)
My question is: how do I negate all of that?
Or is there any other good application to search files for regular expressions?
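Use a negative lookahead (?!regex), e.g.:
height:\s*\d+(?:\.\d+)?(?!px|[\d.])
The [\d.] is needed to prevent backtracked alternatives from matching: without it, the engine could give back a digit or the decimal part and still report a match inside height:40px or height:40.5px.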
Consider using a program that already does this, and add your own rules. In general, the practice of checking source code for problems like this is called 'linting'. A quick search turns up tools such as CSSLint, which is open source and allows custom rules:
https://github.com/stubbornella/csslint/wiki/Rules
http://csslint.net/
Use the negative lookahead this way:
(?!height:\s*\d+(?:\.\d+)?px)height:\s*\d+(?:\.\d+)?
Since you are working with CSS, you may want to treat other units as valid too, such as pt, em, %, etc., as in the regex below:
(?!height:\s*\d+(?:\.\d+)?(?:px|pt|em|%|cm|mm|in|ex|pc))height:\s*\d+(?:\.\d+)?
You can test it on Rubular.
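To try the leading-lookahead form without Agent Ransack, the following Python sketch applies the same pattern to a string of CSS. The function name find_unitless_heights and the sample input are invented for this example. Note that the pattern also flags height:0, which CSS allows without a unit.

```python
import re

# Units treated as valid, taken from the regex above.
UNITS = r"(?:px|pt|em|%|cm|mm|in|ex|pc)"
NUMBER = r"\d+(?:\.\d+)?"

# The negative lookahead rejects any position where "height:<number><unit>"
# would match; what remains is a height with a number but no unit.
MISSING_UNIT = re.compile(
    r"(?!height:\s*" + NUMBER + UNITS + r")height:\s*" + NUMBER
)


def find_unitless_heights(css):
    """Return every 'height:<number>' occurrence that has no unit after it."""
    return [m.group(0) for m in MISSING_UNIT.finditer(css)]


sample = "a{height:40} b{height:40px} c{height: 12.5em} d{height:40.5}"
print(find_unitless_heights(sample))  # ['height:40', 'height:40.5']
```

Extending this to width or other properties would mean replacing the literal height with an alternation such as (?:height|width) in both halves of the pattern.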

Good whitelist for search terms

I'm implementing a simple search on a website, and right now I'm working on sanitizing the input. My plan is to make a whitelist of allowed characters. I'm using PHP, and so far I've got the following regex:
preg_replace('/[^a-z0-9 -]/i', '', $s);
So, I'm removing anything that's not alphanumeric or a space or a hyphen.
Is there a generally accepted whitelist for this sort of thing, or does it just depend on the application? I'm going to be searching on book titles, author names and book blurbs.
What about 2010 (A space odyssey)? What about Giscard d`Estaing's autobiography? ... This is really impossible to answer generally, it will depend on your application and data structures.
You want to look into the fulltext search functions of the database of your choice, or even specialized search appliances like Sphinx.
Clarify what engine you will use first to actually perform your search, and the rules on what you need to strip out will become much clearer.
Google has some pretty advanced rules for searches, but their basic rule is this:
Generally, punctuation is ignored, including @#$%^&*()=+[]\ and other special characters.
However, Google makes exceptions for common search terms, like C++, C#, or $100.
If you want a search as sophisticated as Google's, you can make rules against the above punctuation and have some exceptions. However, for a simple search, just ignore the characters that Google generally ignores.
There's not a generic regular expression to solve this problem. Your code strips out a lot of things you might want to keep, like commas, exclamation points, (semi-)colons, and non-English letters. If you have a full list of all of the titles in your database, you should be able to write a script that will construct a list of all characters found in all of your titles. If your regular expression strips out any of those characters, then you risk having problems (although passing this test doesn't mean that you won't run into problems).
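The character audit described above could look something like this in Python. It is only a sketch: the titles would really come from your database, the function name stripped_characters is made up, and the pattern is a Python rendering of the PHP whitelist from the question.

```python
import re

# Same whitelist as the preg_replace call in the question:
# anything that is not a letter a-z, a digit, a space, or a hyphen is stripped.
WHITELIST_STRIP = re.compile(r"[^a-z0-9 -]", re.IGNORECASE)


def stripped_characters(titles):
    """Return the sorted characters that occur in titles but would be removed."""
    seen = set()
    for title in titles:
        seen.update(title)  # add every character of the title
    return sorted(ch for ch in seen if WHITELIST_STRIP.fullmatch(ch))


print(stripped_characters(["2010 (A space odyssey)", "Red-haired"]))  # ['(', ')']
```

An empty result means the whitelist keeps every character that currently appears in the titles, which, as noted above, still doesn't guarantee trouble-free searches.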
Depending on how the rest of your search is implemented, you may be able to strip out valid characters and still return relevant search results. In this case, you would want your expression to allow non-English characters (since you don't want to split a word) but you might be able to remove all punctuation marks that aren't inside of a quote-delimited phrase. For example, searching for red haired should give you all of the results you would get from searching for red-haired plus a few extra.
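One way to sketch that idea in Python: replace punctuation with spaces everywhere except inside double-quoted phrases, and keep letters from any alphabet. The function name normalize_query is hypothetical, and how the result should be fed to the actual search engine is left open.

```python
import re

QUOTED = re.compile(r'"[^"]*"')  # a double-quoted phrase
PUNCT = re.compile(r"[^\w\s]")   # in Python 3, \w also covers non-English letters


def normalize_query(query):
    """Blank out punctuation outside quoted phrases and collapse whitespace."""
    parts = []
    pos = 0
    for m in QUOTED.finditer(query):
        parts.append(PUNCT.sub(" ", query[pos:m.start()]))  # unquoted text
        parts.append(m.group(0))                            # quoted phrase, kept as-is
        pos = m.end()
    parts.append(PUNCT.sub(" ", query[pos:]))               # text after the last quote
    # Whitespace is collapsed everywhere, including inside quoted phrases.
    return " ".join("".join(parts).split())


print(normalize_query("red-haired"))                # red haired
print(normalize_query('café, "red-haired" girl!'))  # café "red-haired" girl
```

Replacing punctuation with a space rather than deleting it is what turns red-haired into red haired instead of redhaired.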
