Wildcard search not returning results for search terms containing a single-quote (') character

We're using Google Cloud Search, and searching a particular data source for "O'Conn*" doesn't return any results.
...
"valueFilter": {
"operatorName": "lastname",
"value": {
"stringValue": "O'CONN*"
}
...
The field is set as wildcardsearchable:true, and all the records were re-indexed afterwards. The data source contains over 40 records that should match. Wildcard search does work, but not for any wildcard search term that contains a single quote ('). Here are a few tests I've done:
"O'CONN*" - No match
"*CONN*" - Plenty of matches including "O'Connor"
"O'CONNOR" - Matches all "O'Connor" (not a wildcard search)
Would you know a way to perform this search? Do we need to escape the single quote in any way?
I thought about replacing the single quote with another character, removing it, or adding an alternative form of the term before indexing, but that would open a can of worms: we'd need to process search terms before sending them, detect and restore the original form before the field is displayed on our search results page, etc.
I found a post that likely explains why I'm experiencing this behaviour, but it's for Lucene search. I couldn't find a setting similar to the one described in that post in Google Cloud Search settings or documentation.
Thanks
I tried:
Escaping the single quote in different ways;
Putting the wildcard character at different positions to test different theories;
Searching Google's documentation for "Cloud Search wildcard single quote", with no satisfactory results.

Related

Azure Cognitive search - Fuzzy search - Stay consistent between suggest api and search api

I have implemented both suggest + autocomplete and a listing page that uses the search API, but I cannot get consistent results between what is suggested and what I have in the listing.
So my query in suggest mode is:
https://xxx/indexes/my-index/docs/suggest?suggesterName=generalSearchSuggester&top=3&fuzzy=true&$select=sys_Id,Name,Url&search=nin&api-version=2020-06-30
This returns 3 results:
Nina
Nina25
Nick
And with the search api my query is:
https://xxx/indexes/my-index/docs?api-version=2020-06-30&&count=true&queryType=full&searchMode=any&%24skip=0&%24top=16&search=nin*~1&%24select=Name
This returns 2 results:
Nina
Nina25
On this page: https://learn.microsoft.com/en-us/rest/api/searchservice/suggestions I see "The edit distance is 1 per query string", so I guess this corresponds to ~1, but I don't understand how to make the two consistent.
Regards,
In your search example you are using a combination of a wildcard search with fuzzy search. To use fuzzy search as documented, remove the * from your query and specify the edit distance with the tilde character directly.
https://xxx/indexes/my-index/docs?api-version=2020-06-30&&count=true&queryType=full&searchMode=any&%24skip=0&%24top=16&search=nin~1&%24select=Name
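The corrected query string can also be assembled in code. This is a minimal sketch using only Python's standard library; `xxx` and `my-index` are the placeholders from the question, and `build_search_url` is a hypothetical helper, not part of any Azure SDK.

```python
from urllib.parse import urlencode


def build_search_url(service: str, index: str, term: str, distance: int) -> str:
    """Build a Search API URL for a pure fuzzy query (hypothetical helper)."""
    params = {
        "api-version": "2020-06-30",
        "count": "true",
        "queryType": "full",             # full Lucene syntax is required for "~"
        "searchMode": "any",
        "$skip": 0,
        "$top": 16,
        "search": f"{term}~{distance}",  # fuzzy only: no trailing "*"
        "$select": "Name",
    }
    # urlencode percent-encodes "$" as %24, as in the URL above.
    return f"https://{service}/indexes/{index}/docs?{urlencode(params)}"


print(build_search_url("xxx", "my-index", "nin", 1))
```

The point of the sketch is the `search` value: a single term followed by `~` and the edit distance, with no `*` mixed in.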
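This will match tokens within an edit distance of 1 of nin. For the terms in your results:
nin~1 matches nina (1 edit: insert a)
nin~2 matches nick (2 edits: replace n with c, insert k)
nina25 is 3 edits away (insert a, 2 and 5), which is outside the 0 to 2 range described in the documentation quoted below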
Fuzzy search matches on terms that are similar, including misspelled
words. To do a fuzzy search, append the tilde ~ symbol at the end of a
single word with an optional parameter, a value between 0 and 2, that
specifies the edit distance. For example, blue~ or blue~1 would return
blue, blues, and glue.
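The edit distances listed above can be checked with a small function. This sketch assumes the engine counts insertions, deletions, substitutions and adjacent transpositions (the optimal string alignment variant of Damerau-Levenshtein distance); `osa_distance` is a hypothetical name, and the cap of 2 comes from the documentation quoted above.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions and
    adjacent transpositions (optimal string alignment variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


MAX_EDITS = 2  # upper bound from the documentation quoted above

for term in ["nina", "nick", "nina25"]:
    edits = osa_distance("nin", term)
    print(term, edits, "within ~2" if edits <= MAX_EDITS else "beyond ~2")
```

Under that assumption, nina (1) and nick (2) are reachable with nin~1 and nin~2 respectively, while nina25 (3) is beyond the documented maximum.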
Autocomplete vs Search
The intent of the autocomplete suggester is to give you fuzzy suggestions on what to search for. It's telling you that you can search for either nina, nina25 or nick. When searching for these terms you will get all the results containing the token nina (or nina25 or nick).

Riak: searchable list of maps (with CRDTs)

I have a use case that consists of storing several maps under one attribute. The equivalent JSON list is:
[
{"tel": "+33123456789", "type": "work"},
{"tel": "+33600000000", "type": "mobile"},
{"tel": "+79001234567", "type": "mobile"}
]
Also, I'd like to have it searchable at the object level. For instance:
- search entries that have a mobile number
- search entries that have a mobile number and whose number starts with the string "+7"
Since Riak doesn't support sets of maps (only sets of registers), I was wondering if there is a trick to achieve this. So far I've had 2 ideas:
Map of maps
The idea is to generate keys (randomly?) for the objects and store each object of the list in a parent map, whose keys exist only so that each object has a key.
It seems to me that this doesn't allow searching the objects (the maps inside the parent map), because Riak's Solr search requires the full path to the attribute. One cannot simply write the Solr query phone_numbers.*.tel:+7*. Compound searches (e.g. entries that have a mobile number whose number starts with the string "+7") also seem hard to achieve.
Sets with simulated multi-valued attributes
This solution consists of using a set and inserting all the values of the object as a single string, with separators between them. Yes, it's a hack. An attribute value would look like $tel:+79001234567$type:mobile$, with : as the name/value separator and $ as the attribute separator.
Search could be feasible using the * wildcard (ugly, but still), except for the problem of escaping the separators: what if the type attribute includes a :? Are there separators that Riak accepts but that would not be acceptable in a string (I'm thinking of control characters)?
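One way around the separator-collision problem, rather than hunting for a character that can never occur, is to escape the separators inside values before joining. This is a sketch under stated assumptions: the backslash escape character and the helper names `encode_entry`/`decode_entry` are hypothetical, not Riak conventions, and the Solr query string you send would still need Solr's own escaping of characters such as `:`, `+` and `\`.

```python
from typing import Dict

SEP = "$"   # attribute separator, as proposed in the question
KV = ":"    # name/value separator, as proposed in the question
ESC = "\\"  # escape character: an assumption, not a Riak convention


def _escape(text: str) -> str:
    # Escape the escape character first, then both separators.
    for ch in (ESC, SEP, KV):
        text = text.replace(ch, ESC + ch)
    return text


def encode_entry(entry: Dict[str, str]) -> str:
    """Flatten one map into a set member such as $tel:...$type:mobile$."""
    # Sorting the keys gives every entry the same attribute order,
    # so one wildcard pattern can describe "tel ... then type ...".
    pairs = [_escape(k) + KV + _escape(v) for k, v in sorted(entry.items())]
    return SEP + SEP.join(pairs) + SEP


def decode_entry(encoded: str) -> Dict[str, str]:
    """Inverse of encode_entry."""
    result = {}
    key, buf, escaped = None, [], False
    for ch in encoded[1:]:  # skip the leading separator
        if escaped:
            buf.append(ch)  # an escaped character is taken literally
            escaped = False
        elif ch == ESC:
            escaped = True
        elif ch == KV and key is None:
            key, buf = "".join(buf), []
        elif ch == SEP:
            if key is not None:  # an unescaped separator closes the pair
                result[key] = "".join(buf)
            key, buf = None, []
        else:
            buf.append(ch)
    return result


entry = {"tel": "+79001234567", "type": "mobile"}
print(encode_entry(entry))  # $tel:+79001234567$type:mobile$
assert decode_entry(encode_entry(entry)) == entry
```

Because the keys are sorted, a mobile entry whose number starts with +7 always has the shape `$tel:+7...$type:mobile$`, which is what a wildcard query would target.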
In a nutshell
I'm looking for a solution whatever the hackiness of it, as long as it works. Any idea is welcome.

URL Manipulation With Google Analytics Advanced Filters

In Google Analytics, I have a view for a web site in which I'm trying to use Advanced filters to codify a transformation on the "Request URI" field:
- If the Request URI matches "/product/[productid]/someproductscreen", then I want to strip "/[productid]" from the Request URI so I can combine all visits to /someproductscreen across all products.
- All Request URIs that do not match the pattern above should be passed into the view unmodified.
When I view the traffic in the Site Content > All Pages report, I don't want to see any values of "/[productid]" in the URIs in the "Page" column. I'd like all visits to a particular product page to roll up under a URI like "/product/warranty" or "/product/description".
Unfortunately, I find it difficult to figure this out on my own, because of the lag before results show up in Google Analytics after a change, combined with my shaky grasp of how regular expressions are used in Advanced Filters.
GA Advanced Filter
Assuming your [product id] was 3 or more consecutive digits, i.e. /product/123456789/someproductscreen, then this would work:
Advanced Filter
Field A: Request URI: ^/product/\d{3,}(.*)
Field B:
Output to: Request URI: /product/{id}$A1
Check Field A Required and Override Output Field
The above configuration will rewrite the Request URI from:
/product/123456789/someproductscreen
/product/12345
/some/other/url
to:
/product/{id}/someproductscreen
/product/{id}
/some/other/url
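Given the reporting lag mentioned in the question, it can help to check the rewrite locally first. This sketch emulates the filter above with Python's `re`, assuming GA's regex engine behaves the same as Python's for this simple pattern (the two engines are not identical in general); `apply_filter` is a hypothetical helper.

```python
import re

# Field A from the configuration above; GA matches case-insensitively by default.
FIELD_A = re.compile(r"^/product/\d{3,}(.*)", re.IGNORECASE)


def apply_filter(request_uri: str) -> str:
    """Emulate the Advanced Filter above (hypothetical helper)."""
    match = FIELD_A.search(request_uri)
    if match is None:
        # Field A is required, so non-matching URIs are left unmodified.
        return request_uri
    # "Output to: /product/{id}$A1" is literal text followed by group 1.
    return "/product/{id}" + match.group(1)


for uri in ["/product/123456789/someproductscreen", "/product/12345", "/some/other/url"]:
    print(apply_filter(uri))
```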
You mention you'd want to see /product/warranty. That would hide the fact that the URI was rewritten, so I suggest leaving a placeholder where the ID was removed. I use {id}, but it could be any string, e.g. <product id>.
Level Up the Regex
Link to regex101 example
Regular expressions are used by GA filters. In the example above, we used a regex to match a product ID made up entirely of digits. A more general form of that expression is:
^(/.*/)(\d{3,})(.*)
This matches when the Request URI starts with a path ending in a slash, (/.*/), followed by three or more digits, (\d{3,}). Finally, (.*) captures the remainder of the URI. We use groups so we can access their values in a later step.
GA Advanced Filters keep the groups extracted from Field A and Field B, and we use this feature to rebuild the Request URI in the "Output To -> Constructor" field. Below is an example of condensing dynamic IDs into a static string:
$A1{id}$A3
$A1 will extract 1st group from Field A. $A3 would extract the third group from Field A if it were to exist. {id} is a static string that is a placeholder for the dynamic value.
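To see how a constructor such as `$A1{id}$A3` is expanded, here is a sketch that substitutes `$A<n>` references with the groups captured by Field A. It is an approximation in Python's `re`, assuming GA expands a missing or unmatched group to an empty string; `build_output` is a hypothetical helper.

```python
import re
from typing import Optional


def build_output(field_a: str, constructor: str, request_uri: str) -> Optional[str]:
    """Expand $A1..$A9 in a constructor string (hypothetical helper)."""
    match = re.search(field_a, request_uri, re.IGNORECASE)
    if match is None:
        return None  # Field A did not match, so the filter would not apply

    def group_text(ref) -> str:
        n = int(ref.group(1))
        if n > match.re.groups:
            return ""  # the group does not exist in Field A
        return match.group(n) or ""  # an unmatched optional group gives ""

    # A function replacement keeps "{id}" and any backslashes literal.
    return re.sub(r"\$A([1-9])", group_text, constructor)


print(build_output(r"^(/.*/)(\d{3,})(.*)", "$A1{id}$A3",
                   "/product/123456789/someproductscreen"))
```

Note that `(/.*/)` is greedy, so when a URI contains several numeric segments, it is the last one preceded by a slash that gets replaced.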
If your product IDs were alphanumeric, we'd simply need to find a pattern that matches them. You didn't provide any example IDs, so here are a few common ID patterns found in URLs:
[A-Z]-\d+ // matches Z-764537389
\d{4}-\d{3}-\d{2} // matches 1234-123-12
Easy mode, right? What if you need to match an RFC 4122-compliant UUID in the URL? No problem:
[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}
// matches 0df98a02-c438-4c57-8d1c-2f6041804e2c
Note: GA Advanced Filter regexes are case-insensitive by default; this can be overridden in the filter settings.
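The ID patterns above can be tried against their sample IDs before being pasted into a filter. This sketch uses Python's `re.fullmatch` with `re.IGNORECASE` to approximate GA's case-insensitive default, assuming these simple patterns behave the same in both engines; in a real filter the pattern would be embedded in a larger expression like the ones above.

```python
import re

# The ID patterns from the answer, each paired with its sample ID.
ID_PATTERNS = {
    r"[A-Z]-\d+": "Z-764537389",
    r"\d{4}-\d{3}-\d{2}": "1234-123-12",
    r"[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}":
        "0df98a02-c438-4c57-8d1c-2f6041804e2c",
}

for pattern, sample in ID_PATTERNS.items():
    # re.IGNORECASE approximates GA's case-insensitive default.
    matched = re.fullmatch(pattern, sample, re.IGNORECASE) is not None
    print(f"{sample}: {matched}")

# With case-insensitivity, [A-Z] also accepts lower-case letters.
print(re.fullmatch(r"[A-Z]-\d+", "z-42", re.IGNORECASE) is not None)  # True
print(re.fullmatch(r"[A-Z]-\d+", "z-42") is not None)                 # False
```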
Here: https://regex101.com/r/kRUJnU/1
Start playing with this tool; it will become really helpful in the future, since custom filters with regex matching and capturing groups are REALLY important in GA.
EDIT: How to go from regex101 to GA.
In the image below you can see how I deleted the last part of URLs that look like:
www.mysite.com/vuelos/carrito/checkout/46787654567898765
Or something like:
www.mysite.com/vuelos/carrito/46787654567898765

Google Analytics doesn't apply my filter

I created a filter on my account.
This filter is a custom filter, search and replace.
I use
"Request URI" for Filter Field,
\?.* for Search String
I also attached this filter to my specific view.
My problem is, if I go to the view->Reporting->Behavior->Site Content->All Pages, I see that the filter is not applied. I see pages such as "/xy.html?id=12345".
I would expect "/xy.html" only. I've read somewhere that filters don't work on past data, but I made some test visits after applying the filter and the URLs weren't changed :(
If I click on verify, I get this message: "This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small."
Your filter definition should use regular expressions for search&replace.
Search String: ([^?]*)(\?.*)
Replace String: \1
This will search for two parts: 1. all symbols before the very first "?"; 2. the first "?" and all symbols after it in your URI.
The replacement keeps only the first part (all symbols before the very first "?").
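A search and replace of this kind can be checked locally before waiting on GA. This sketch uses Python's `re.sub`, assuming GA's `\1` back-reference behaves like Python's for this pattern; `([^?]*)` is used to mean "everything before the first ?", and `strip_query` is a hypothetical helper.

```python
import re

SEARCH = r"([^?]*)(\?.*)"  # group 1: before the first "?", group 2: "?" and the rest
REPLACE = r"\1"            # keep only group 1


def strip_query(request_uri: str) -> str:
    # URIs without a "?" do not match, so re.sub returns them unchanged.
    return re.sub(SEARCH, REPLACE, request_uri)


print(strip_query("/xy.html?id=12345"))  # /xy.html
```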
Make sure you google some regex basics.
Filters only apply to newly collected data, never to the historical data already collected in your properties.

Filter to Group URL on Visitors Flow

I found a similar question earlier here:
Google Analytics Visitors Flow: grouping URLs?
However, I'm confused because people suggest different ways to write the Replace String, and whichever way I try, I can't make it work.
I have an e-commerce site with hundreds of different pages. The different parts of the website are:
http://example.com/sv/ (Root)
http://example.com/sv/category/1-name/
http://example.com/sv/product/1-name/
http://example.com/sv/designer-tool/1-name/
http://example.com/sv/checkout/
In the Visitors Flow, I want to see how many people go from Root to Category, from Category to Product, from Product to Designer Tool, and from Designer Tool to Checkout. However, with so many different pages it becomes very difficult to follow the flow because, for example, the product pages are not grouped together.
So instead, I would like to remove the 1-name/ part at the end and only see /sv/category/, /sv/product/ and /sv/designer-tool/.
In the earlier post I understand you can use an advanced filter to do this. I have set the following settings:
Type: Search & Replace
Field: Request URI
Search String: ^/(category|product|designer-tool)(/\d*)(.*)
Replace String: /$A1$A3
I guess that my search string and my replace string are wrong. Any ideas?
EDIT: I updated my filter to the following:
Search String: ^/sv/(category|product|designer-tool)(/\d*)(.*)$
Replace String: /sv/\1/
Still testing and unsure if it's the correct way to set it up.
I was able to solve this with the Search String and Replace String from my edit above.
So basically what I did was:
Create a secondary view/profile for your site. If you apply the filter to your one and only view/profile, you won't be able to see any detailed data about specific pages, because the filter removes it.
Add an Advanced Filter with the following settings:
Type: Search & Replace
Field: Request URI
Search String: ^/sv/(category|product|designer-tool)(/\d*)(.*)$
Replace String: /sv/\1/
You need to wait 24h after creating your new profile/view before you can see any data in it.
So my confusion was about the Search and Replace Strings. The Search String is a regular expression matched against everything after your TLD. For example, for http://www.example.com/sv/mypage/1-post/, the Search String only searches within /sv/mypage/1-post/.
The Replace String is what the whole match of the Search String is replaced with. In my case, I matched all URLs that had /sv/category/1-string/. I wanted to keep only the "category" part, so I replaced the whole string with /sv/category/ by entering the Replace String /sv/\1/
/sv/ means just what it says. \1 means that it should take the value of the first () of my Search String (In this case "category"). The ending / is just an ending slash.
All in all, any URL that looked like http://example.com/sv/category/1-string/ was changed to http://example.com/sv/category/, so I can now see data for all my categories as a group instead of as individual pages.
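The Search and Replace pair above can be checked against the site's URL structure locally. This sketch uses Python's `re.sub`, assuming GA's `\1` back-reference behaves like Python's for this pattern; `group_page` is a hypothetical helper.

```python
import re

SEARCH = r"^/sv/(category|product|designer-tool)(/\d*)(.*)$"
REPLACE = r"/sv/\1/"  # \1 is the section name captured by the first group


def group_page(request_uri: str) -> str:
    # Non-matching URIs such as /sv/ or /sv/checkout/ pass through unchanged.
    return re.sub(SEARCH, REPLACE, request_uri)


for uri in ["/sv/", "/sv/category/1-name/", "/sv/product/1-name/",
            "/sv/designer-tool/1-name/", "/sv/checkout/"]:
    print(uri, "->", group_page(uri))
```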
