I would like to find messages that contain the sequence "--->", but the Kibana results are wrong.
How do I escape these characters to get correct results?
Thanks,
First, I think you should check your mappings to see whether your fields are marked as not_analyzed (or use the keyword analyzer). If they are analyzed with the standard analyzer instead, you won't see any search results, because the standard analyzer strips characters like these when a document is indexed.
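To illustrate why the standard analyzer loses the sequence, here is a very rough Python stand-in. It only lowercases runs of word characters, which is not the real Unicode word-segmentation algorithm Elasticsearch uses, but it shows that a run of punctuation like "--->" leaves nothing to index.

```python
import re

def rough_standard_tokens(text: str) -> list[str]:
    """Very rough stand-in for the standard analyzer: lowercased word runs only.

    The real analyzer follows Unicode word-boundary rules; this sketch only
    demonstrates that pure punctuation such as "--->" yields no tokens.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]

print(rough_standard_tokens("Error ---> Retry"))
```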
What happens if you search for it in quotes, including your special characters?
This Stack Overflow post could help you. You could also have a look at the Special Characters section of the Lucene query syntax documentation. Hope it helps!
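A minimal sketch of the escaping step, assuming the classic Lucene query-parser syntax (the set below follows the list in the Lucene Special Characters section; Elasticsearch's query_string documents a slightly longer reserved list). Escaping only helps if the characters were actually indexed, which is why the mapping check comes first.

```python
# Special characters in the classic Lucene query syntax:
#   + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
# Escaping every single & and | also covers the operators && and ||.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')

def escape_lucene(term: str) -> str:
    """Prefix each Lucene special character in term with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)

print(escape_lucene("--->"))  # \-\-\->
```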
I am having bad luck validating multiple email addresses separated by commas or semicolons.
ValidationExpression="(\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*([;: ]+)?)+"
This validation works fine when multiple emails are separated by commas or colons, but I also need it to accept Enter, i.e. when the user presses Enter after each email.
How can I modify the expression above so that it accepts Enter as well as commas and semicolons?
Thanks in advance
I'm using this in ASP.NET.
I don't really see how that works with commas, since the separator class [;: ] contains no comma, but this should make it work with multiple lines:
(\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*([;:\s]+)?)+
If you want it to work with commas too, I think it would need to be this:
(\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*([,;:\s]+)?)+
I should point out that I did not do anything to validate whether or not that was actually a good regex for validating email addresses. I just added the requested functionality.
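Here is a quick way to check the difference in Python, assuming the validator requires the whole input to match (approximated here with re.fullmatch). The two patterns are the original and the comma-plus-\s version, differing only in the separator class; \s also covers the \r\n a browser submits when the user presses Enter in a multi-line text box.

```python
import re

# Original separator class [;: ] versus the suggested [,;:\s].
ORIGINAL = re.compile(r"(\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*([;: ]+)?)+")
SUGGESTED = re.compile(r"(\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*([,;:\s]+)?)+")

def is_valid_list(pattern: re.Pattern, text: str) -> bool:
    # fullmatch approximates the validator's "whole input must match" rule.
    return pattern.fullmatch(text) is not None

typed = "alice@example.com,\r\nbob@example.org"
print(is_valid_list(ORIGINAL, typed), is_valid_list(SUGGESTED, typed))  # False True
```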
I highly recommend The Regex Coach to help you build regex.
I also highly recommend http://www.regular-expressions.info/ as a reference.
I have some word lists that were extracted from an aspell dictionary.
The problem is that some of the words that aspell returns aren't valid words.
I'd like to know if there is a way to check whether the returned words exist in a given language.
Thanks!
You could validate them using the Google spelling API.
I'm implementing a simple search on a website, and right now I'm working on sanitizing the input. My plan is to make a whitelist of allowed characters. I'm using PHP, and so far I've got this regex:
preg_replace('/[^a-z0-9 -]/i', '', $s);
So I'm removing anything that isn't an ASCII letter, a digit, a space, or a hyphen.
Is there a generally accepted whitelist for this sort of thing, or does it just depend on the application? I'm going to be searching on book titles, author names and book blurbs.
What about 2010 (A space odyssey)? What about Giscard d'Estaing's autobiography? This is really impossible to answer in general; it depends on your application and data structures.
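To make this concrete, here is a Python port of the question's preg_replace call (same character class, case-insensitive) applied to two titles like the ones above:

```python
import re

# Port of preg_replace('/[^a-z0-9 -]/i', '', $s): delete everything that is
# not an ASCII letter, a digit, a space or a hyphen.
NOT_WHITELISTED = re.compile(r"[^a-z0-9 -]", re.IGNORECASE)

def sanitize(s: str) -> str:
    return NOT_WHITELISTED.sub("", s)

for title in ["2010 (A space odyssey)", "Giscard d'Estaing's autobiography"]:
    print(sanitize(title))
```

The parentheses and apostrophes silently disappear, so a search for "d'Estaing" no longer matches the text as stored unless the stored titles are sanitized the same way.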
You want to look into the full-text search functions of your database of choice, or even specialized search engines like Sphinx.
Decide first which engine will actually perform your search, and the rules on what you need to strip out will become much clearer.
Google has some pretty advanced rules for searches, but their basic rule is this:
Generally, punctuation is ignored, including @#$%^&*()=+[]\ and other special characters.
However, Google makes exceptions for common search terms, like C++, C#, or $100.
If you want a search as sophisticated as Google's, you can write rules that strip the punctuation above while allowing some exceptions. However, for a simple search, just ignore the characters that Google generally ignores.
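A rough Python sketch of that approach: strip the punctuation from the quoted rule word by word, except for a hypothetical exception list (C++, C#) and dollar amounts. The EXCEPTIONS set and the MONEY pattern are illustrative assumptions, not Google's actual rules.

```python
import re

# Punctuation the quoted rule says is generally ignored: @#$%^&*()=+[]\
IGNORED = re.compile(r"[@#$%^&*()=+\[\]\\]")
# Hypothetical exceptions whose punctuation carries meaning.
EXCEPTIONS = {"c++", "c#"}
MONEY = re.compile(r"\$\d+")

def normalize_query(query: str) -> str:
    words = []
    for word in query.split():
        if word.lower() in EXCEPTIONS or MONEY.fullmatch(word):
            words.append(word)  # keep exception terms untouched
        else:
            cleaned = IGNORED.sub("", word)
            if cleaned:  # drop words that were pure punctuation
                words.append(cleaned)
    return " ".join(words)

print(normalize_query("C++ books (new) under $100"))
```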
There's not a generic regular expression to solve this problem. Your code strips out a lot of things you might want to keep, like commas, exclamation points, (semi-)colons, and non-English letters. If you have a full list of all of the titles in your database, you should be able to write a script that will construct a list of all characters found in all of your titles. If your regular expression strips out any of those characters, then you risk having problems (although passing this test doesn't mean that you won't run into problems).
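The character-inventory script described above could look something like this in Python. Here titles is assumed to be any iterable of strings loaded from your database, and the pattern mirrors the PHP whitelist from the question.

```python
import re
from collections import Counter

# Mirrors the PHP whitelist [^a-z0-9 -] with the /i flag.
STRIPPED = re.compile(r"[^a-z0-9 -]", re.IGNORECASE)

def stripped_characters(titles) -> Counter:
    """Count every character in titles that the whitelist regex would remove."""
    counts = Counter()
    for title in titles:
        counts.update(STRIPPED.findall(title))
    return counts

print(stripped_characters(["2010 (A space odyssey)", "Brontë: a life"]))
```

Any character that shows up in this report is one your sanitizer would destroy in real data.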
Depending on how the rest of your search is implemented, you may be able to strip out valid characters and still return relevant search results. In this case, you would want your expression to allow non-English characters (since you don't want to split a word) but you might be able to remove all punctuation marks that aren't inside of a quote-delimited phrase. For example, searching for red haired should give you all of the results you would get from searching for red-haired plus a few extra.
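One way to sketch that idea in Python, assuming quoted phrases are delimited by plain double quotes: keep quoted phrases verbatim, and elsewhere replace every character that is neither a Unicode word character nor whitespace with a space, so red-haired becomes red haired while accented letters survive.

```python
import re

# A double-quoted phrase, or a run of characters containing no quotes.
SEGMENT = re.compile(r'"[^"]*"|[^"]+')
# Anything that is neither a word character (Unicode-aware) nor whitespace.
PUNCTUATION = re.compile(r"[^\w\s]")

def clean_query(query: str) -> str:
    parts = []
    for segment in SEGMENT.findall(query):
        if segment.startswith('"'):
            parts.append(segment)  # quoted phrase: keep exactly as typed
        else:
            parts.append(PUNCTUATION.sub(" ", segment))  # red-haired -> red haired
    # Collapse the extra whitespace introduced above.
    return " ".join(" ".join(parts).split())

print(clean_query('red-haired "Giscard d\'Estaing" café!'))
```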
I am using ^[\w-\.\+]+@([\w-]+\.)+[\w-]{2,4}$ to validate email addresses. When I use it from the .aspx.cs code-behind, it correctly validates IDN email addresses, but when I use it directly from the .aspx page, it doesn't work.
return Regex.IsMatch(
email,
@"^[\w-\.\+]+@([\w-]+\.)+[\w-]{2,4}$",
RegexOptions.Singleline);
The address I would like to validate looks like pelai@ÖßÜÄÖ.com.
I'm bad at regex; do you know what I am doing wrong?
You may want to take a look at regexlib.com; they have a fantastic selection of user-contributed patterns for these extremely common types of matches.
http://regexlib.com/Search.aspx?k=email
First, correctly validating an e-mail address is somewhat more complex than a regex can handle. That aside, the regex is probably not at fault; more likely it's how you use it.
Edit (after seeing your code): are you sure the string being tested contains no whitespace or similar characters? Put a breakpoint right there and inspect the string; that might give you an idea of what is going wrong.
You should escape the dash (-) within the first character class; there is no need to escape the dot and plus:
[\w\-.+]
or
[\w.+-]
There is no need to escape the dash if it is the last character in the class.
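A small Python check of the two forms above. Engines also differ on the original [\w-\.\+]: Python's re, for example, refuses to compile it ("bad character range"), which is one more reason to write the class unambiguously.

```python
import re

# Escaping the dash and placing it last give the same character class;
# dot and plus are literal inside a class, so they need no backslash.
ESCAPED_DASH = re.compile(r"[\w\-.+]")
DASH_LAST = re.compile(r"[\w.+-]")

def same_class(chars: str) -> bool:
    """True if both classes accept or reject every character in chars."""
    return all(
        (ESCAPED_DASH.fullmatch(c) is None) == (DASH_LAST.fullmatch(c) is None)
        for c in chars
    )

def compiles(pattern: str) -> bool:
    """True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(same_class("a_9-.+@ #"))  # True
print(compiles(r"[\w-\.\+]"))   # False: \w cannot be a range endpoint in Python
```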
By "directly from the aspx page", you probably mean in a RegularExpressionValidator?
Then you need to be aware that the regex is also run by a different engine: JavaScript, which has its own regex implementation. This means that regexes that work in .NET might fail in JS.
The implementations are not too different; the basics are identical. But there can be differences in details (such as how an unescaped - is handled), and JS lacks some "advanced features" (although your regex doesn't look too "advanced" ;-) ).
Do you see any error messages in the browser?
The problem is the non-ASCII characters in your test address, ÖßÜÄÖ (which you only ever mentioned in a comment to @HansKesting's answer). In .NET, \w matches all Unicode letters and digits, and even several characters besides _ that are classified as connecting punctuation, but in JavaScript it only matches [A-Za-z0-9_].
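You can reproduce the difference in Python: by default its \w is Unicode-aware (close to .NET here), while the re.ASCII flag restricts \w to [A-Za-z0-9_], like JavaScript. The pattern uses the [\w.+-] form of the first class, since Python rejects the original [\w-\.\+].

```python
import re

PATTERN = r"^[\w.+-]+@([\w-]+\.)+[\w-]{2,4}$"

dotnet_like = re.compile(PATTERN)         # Unicode-aware \w
js_like = re.compile(PATTERN, re.ASCII)   # \w limited to [A-Za-z0-9_]

address = "pelai@ÖßÜÄÖ.com"
print(bool(dotnet_like.match(address)))  # True
print(bool(js_like.match(address)))      # False
```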
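JavaScript also lacks support for Unicode properties (like \p{L} for letters) and blocks (\p{IsLatin}), so you would have to list any non-ASCII characters you want to allow by their Unicode escapes (\uXXXX). (ES2018 later added property escapes such as \p{L}, but only with the u flag, which a validator pattern won't necessarily get, and still no blocks.) If you just want to support Latin-1 letters, I suppose you could use [\w\u00C0-\u00FF] (note that this range also includes × and ÷), but IDN is supposed to support more than just Latin-1, isn't it?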
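A sketch of the Latin-1 workaround, again emulating JavaScript's ASCII-only \w with Python's re.ASCII flag. The range U+00C0 to U+00FF covers Ö, ß, Ü, Ä and á, but it also admits × (U+00D7) and still rejects letters such as ő (U+0151):

```python
import re

# ASCII \w plus the Latin-1 Supplement range U+00C0-U+00FF, dash placed last.
EMAIL = re.compile(
    r"^[\w\u00C0-\u00FF.+-]+@([\w\u00C0-\u00FF-]+\.)+[\w\u00C0-\u00FF-]{2,4}$",
    re.ASCII,  # emulate JavaScript's ASCII-only \w
)

for candidate in ["pelai@ÖßÜÄÖ.com", "pelai@kőbánya.hu", "pelai@a×b.com"]:
    print(candidate, bool(EMAIL.match(candidate)))
```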
By the way, JavaScript also doesn't support Singleline mode (newer engines have the ES2018 s flag), and even if it did, you wouldn't be able to use it here. JS does support Multiline and IgnoreCase modes, but there's no way to set them on both the server and client side: the inline modifiers (?i) and (?m) don't work in JS, and the RegexOptions argument only works server-side.
Fortunately, you don't really need Singleline mode anyway; it allows the . metacharacter to match linefeeds, but the only dots in your regex are matching literal dots.
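A quick Python check of that last point, using re.DOTALL as the counterpart of Singleline and the [\w.+-] form of the class: since every dot in the pattern is either escaped or inside a character class, the flag makes no difference.

```python
import re

PATTERN = r"^[\w.+-]+@([\w-]+\.)+[\w-]{2,4}$"
plain = re.compile(PATTERN)
dotall = re.compile(PATTERN, re.DOTALL)  # Python's equivalent of Singleline

samples = ["first.last@example.com", "a\nb@example.com", "a@example.c\nom"]
plain_results = [bool(plain.match(s)) for s in samples]
dotall_results = [bool(dotall.match(s)) for s in samples]
print(plain_results == dotall_results)  # True
```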
Is there a reliable way to check whether a user has entered Arabic words into a form before it is submitted? Can JavaScript handle this, or can only server-side code like .NET do it?
I'm thinking that, if possible, the script should prevent the user from entering Arabic words into the form and show an alert popup.
Please share any examples if you have any idea how to do it.
Thanks
In Unicode, Arabic characters fall in specific ranges. You can use a regular expression in JavaScript to check whether a string contains any characters in those ranges. (You could also do that in C#.) Here's a really helpful tool that lets you select the ranges you want to search for and creates a JS-compatible regex for them.
For example, [\u0600-\u06FF\u0750-\u077F] will match any character in the Unicode ranges for "Arabic" and/or "Arabic Supplement".
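The same check in Python, using the two ranges above. Note that it flags a string containing any such character, including Arabic punctuation and digits, and it doesn't cover other Arabic blocks such as Arabic Extended-A or the presentation forms.

```python
import re

# Arabic (U+0600-U+06FF) and Arabic Supplement (U+0750-U+077F).
ARABIC = re.compile(r"[\u0600-\u06FF\u0750-\u077F]")

def contains_arabic(text: str) -> bool:
    return ARABIC.search(text) is not None

print(contains_arabic("hello"), contains_arabic("مرحبا"))  # False True
```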
You could use the Google Ajax Language API to detect this. Here is an example.