Azure Cognitive Search - Fuzzy search - Stay consistent between suggest API and search API

I have implemented both suggest/autocomplete and a listing page that uses the search API, but I cannot get consistent results between what is suggested and what appears in the listing.
So my query in suggest mode is:
https://xxx/indexes/my-index/docs/suggest?suggesterName=generalSearchSuggester&top=3&fuzzy=true&$select=sys_Id,Name,Url&search=nin&api-version=2020-06-30
This returns 3 results:
Nina
Nina25
Nick
And with the search api my query is:
https://xxx/indexes/my-index/docs?api-version=2020-06-30&&count=true&queryType=full&searchMode=any&%24skip=0&%24top=16&search=nin*~1&%24select=Name
This returns 2 results:
Nina
Nina25
On this page: https://learn.microsoft.com/en-us/rest/api/searchservice/suggestions I see "The edit distance is 1 per query string", so I guess this corresponds to ~1, but I don't understand how to make the two consistent.
Regards,

In your search example you are using a combination of a wildcard search with fuzzy search. To use fuzzy search as documented, remove the * from your query and specify the edit distance with the tilde character directly.
https://xxx/indexes/my-index/docs?api-version=2020-06-30&&count=true&queryType=full&searchMode=any&%24skip=0&%24top=16&search=nin~1&%24select=Name
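This will match tokens within an edit distance of 1 of nin. The edit distances from nin to the three suggested names are:
nina: 1 (matched by nin~1)
nick: 2 (matched by nin~2)
nina25: 3 (would need nin~3, which is above the maximum of 2 given in the documentation quoted below, so a fuzzy term alone will not reach it)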
Fuzzy search matches on terms that are similar, including misspelled
words. To do a fuzzy search, append the tilde ~ symbol at the end of a
single word with an optional parameter, a value between 0 and 2, that
specifies the edit distance. For example, blue~ or blue~1 would return
blue, blues, and glue.
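To check which names a given ~N can reach, the edit distances can be computed by hand or with a small helper. The sketch below uses plain Levenshtein distance, which is an approximation: the service's own metric may also treat a transposition as a single edit, and it compares analyzed (typically lower-cased) tokens rather than the raw field values. The function name is made up for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: inserts, deletes and substitutions cost 1."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    for name in ("Nina", "Nick", "Nina25"):
        # Lower-case first, assuming the index analyzer does the same.
        print(name, edit_distance("nin", name.lower()))
```

This prints 1 for Nina, 2 for Nick and 3 for Nina25, which is why nin~1 on its own only reaches Nina.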
Autocomplete vs Search
The intent of the autocomplete suggester is to give you fuzzy suggestions on what to search for. It's telling you that you can search for either nina, nina25 or nick. When searching for these terms you will get all the results containing the token nina (or nina25 or nick).
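As a rough sketch of that flow, assuming the listing is driven by the term the user picks from the suggestions, the search URL can be built from that term, optionally with a fuzzy distance appended. The host https://xxx and index name are the placeholders from the question; the function name and its parameters are made up for this example, and terms containing Lucene special characters would still need escaping.

```python
from urllib.parse import urlencode


def build_search_url(base_url, index, term, fuzzy_distance=None,
                     select="Name", top=16, skip=0):
    """Build a full-Lucene search URL for a single term.

    fuzzy_distance=None searches for the term as-is; 0, 1 or 2 appends ~N.
    """
    if fuzzy_distance is not None and fuzzy_distance not in (0, 1, 2):
        raise ValueError("fuzzy edit distance must be 0, 1 or 2")
    search = term if fuzzy_distance is None else f"{term}~{fuzzy_distance}"
    params = {
        "api-version": "2020-06-30",
        "$count": "true",
        "queryType": "full",
        "searchMode": "any",
        "$skip": skip,
        "$top": top,
        "$select": select,
        "search": search,
    }
    # urlencode percent-encodes "$" as %24, as in the question's URL.
    return f"{base_url}/indexes/{index}/docs?{urlencode(params)}"


if __name__ == "__main__":
    # Fuzzy search on what the user typed.
    print(build_search_url("https://xxx", "my-index", "nin", fuzzy_distance=1))
    # Exact search on a suggestion the user selected.
    print(build_search_url("https://xxx", "my-index", "nina25"))
```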

Related

Wildcard search not returning results for search terms containing single-quote ( ' ) character

We're using Google Cloud Search, and searching a particular datasource for "O'Conn*" doesn't return any results.
...
"valueFilter": {
"operatorName": "lastname",
"value": {
"stringValue": "O'CONN*"
}
...
The field is set as wildcardsearchable:true and all the records were re-indexed afterwards. The datasource contains over 40 records that should match. Wildcard search does work, but not for wildcard search terms that contain a single quote ('). Here are a few tests I've done:
"O'CONN*" - No match
"*CONN*" - Plenty of matches including "O'Connor"
"O'CONNOR" - Matches all "O'Connor" (not a wildcard search)
Would you know a way to perform the search? Do we need to escape the single quote in any way?
I thought about replacing the single quote with another character, removing it, or adding an alternative form of the term before indexing, but then we'd be opening a can of worms: processing search terms before sending them, detecting and restoring the value before the field is displayed on our search result page, etc.
I found a post that likely explains why I'm experiencing this behaviour, but it's for Lucene search. I couldn't find a setting similar to the one described in that post in Google Cloud Search settings or documentation.
Thanks
I tried:
Escaping the single quote in different ways;
Putting the wildcard character at different positions to test different theories.
I've searched Google's documentation for "Cloud Search wildcard single quote", but found no satisfactory results.

Autocomplete works incorrectly with Cyrillic

When I send a request to https://autocomplete.geocode.ls.hereapi.com/6.2/suggest.json?query=Вильнюс with a Cyrillic query, nothing comes back, but with a Latin query, https://autocomplete.geocode.ls.hereapi.com/6.2/suggest.json?query=Viln, all is well. What is the problem, or what am I doing wrong?
You're not doing anything wrong. Autocomplete is designed to give you addresses that contain (perfectly match) your input string, and the results are sorted by relevance.
When you make your query in Russian and provide only "Вильнюс" as input, the service finds a lot of results (street names) that it considers more relevant than the city. The city name is also found, but since the service doesn't think that this is what you're searching for, it puts the city much lower in the results list. You don't see it because you're limiting your query to the first 10 matches (with the maxresults=10 parameter). If you change the maxresults parameter to 20, for example, you will see that Vilnius appears in 16th place in the API response.
If you want the service to better understand what you're querying for, you'll need to provide additional information. For example, if you continue typing and your input string is now "Вильнюс " (with a space at the end) or "Вильнюс Л" (a space and another letter), the service will understand what you mean and will return the result you want.
Another way of providing more information to change how the service ranks the results is to add a spatial filter, such as the country, mapview, or prox parameters mentioned in the API Reference section of the documentation. Alternatively, the resultType parameter can help you filter out all the results with street names and return only city names, if that's what you want. These are just some of the options available; the right one for you will depend on your use case.
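A minimal sketch of how such a request could be assembled, so that the Cyrillic input is percent-encoded correctly and extra parameters are easy to add. The function name is made up for this example, the country value "LTU" is an assumption about the expected code format, and any credentials the service requires are not shown in the question and are left out here.

```python
from urllib.parse import urlencode

AUTOCOMPLETE_URL = "https://autocomplete.geocode.ls.hereapi.com/6.2/suggest.json"


def build_suggest_url(query, maxresults=10, **filters):
    """Build an autocomplete URL.

    filters are passed through unchanged, e.g. country=..., mapview=...,
    prox=... or resultType=... as named in the answer above.
    """
    params = {"query": query, "maxresults": maxresults}
    params.update(filters)
    # urlencode encodes non-ASCII text as UTF-8 percent escapes.
    return f"{AUTOCOMPLETE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Ask for more matches so that lower-ranked results become visible.
    print(build_suggest_url("Вильнюс", maxresults=20))
    # Narrow the ranking with a country filter (value is an assumption).
    print(build_suggest_url("Вильнюс", country="LTU"))
```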

Geocoder Request No Response - Fail

Is it possible for the Geocoding API to work sometimes and fail at other times for some reason?
Here is the request I am trying to make:
http://geocoder.cit.api.here.com/6.2/geocode.xml?app_id=DemoAppId01082013GAL&app_code=AJKnXv84fjrb0KIHawS0Tg&gen=4&country=Australia&state=Tas&district=Wynyard&postalcode=7321&street=86 Jackson Street
and here is the demo request from the official website:
http://geocoder.cit.api.here.com/6.2/geocode.xml
?app_id=DemoAppId01082013GAL
&app_code=AJKnXv84fjrb0KIHawS0Tg
&gen=7
&housenumber=425
&street=W+Randolph
&city=Chicago
I am using the free version, and I have no idea why it works sometimes and not at other times.
Thank you
When you are making a structured address query, by default, all parts of the address need to match. Given that there is no international standard for addresses, the HERE geocoder could be placing parts of the address in an alternative part of the structure.
In your case Wynyard is recognized as a city, not a district. You may want this to fail as an invalid address, but you can also tell the Geocoder to be a little more lenient by using the FlexibleAdminValues option of the AdditionalData parameter (see the User Guide):
FlexibleAdminValues
N (positive integer <= 1). Customizes flexibility in the input values for the admin hierarchy defined in LocationFilterType. The value is a bitmask defining which hierarchies might be swapped without impacting the match level:
0: No swapping at all (default). Exact admin hierarchy values are expected as input
1: City and District swapping
Please note this option is for geocoding addresses and needs at least street level input to work as designed. It will not return expected results when the input is a named place only (e.g. city or district name).
So the following URL will work for you, provided you have a street address:
http://geocoder.cit.api.here.com/6.2/geocode.xml?app_id=APP_ID&app_code=APP_CODE&gen=7&AdditionalData=FlexibleAdminValues,1&country=Australia&state=tas&district=Wynyard&...etc
Another alternative is to not use the structured input parameters but let the HERE Geocoder sort out the identification and categorization of the input tokens.
By using the searchtext parameter and providing all your data as the input value the Geocoder can match and score the tokens.
E.g.: http://geocoder.cit.api.here.com/6.2/geocode.xml?app_id=DemoAppId01082013GAL&app_code=AJKnXv84fjrb0KIHawS0Tg&gen=7&searchtext=Australia%20Tas%20Wynyard%207321%2086%20Jackson%20Street
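Both alternatives can be sketched with one small URL builder. The function name and its keyword arguments are made up for this example; the endpoint, gen=7, AdditionalData=FlexibleAdminValues,1 and searchtext come from the URLs above, and APP_ID / APP_CODE stand in for your own credentials.

```python
from urllib.parse import urlencode

GEOCODE_URL = "http://geocoder.cit.api.here.com/6.2/geocode.xml"


def build_geocode_url(app_id, app_code, *, searchtext=None,
                      flexible_admin=False, **address):
    """Build either a free-text or a structured geocoding URL.

    searchtext:     free-form address; the Geocoder categorizes the tokens.
    address:        structured fields such as country=, state=, district=,
                    postalcode=, street= (used when searchtext is None).
    flexible_admin: allow city/district swapping for structured input.
    """
    params = {"app_id": app_id, "app_code": app_code, "gen": 7}
    if searchtext is not None:
        params["searchtext"] = searchtext
    else:
        if flexible_admin:
            params["AdditionalData"] = "FlexibleAdminValues,1"
        params.update(address)
    return f"{GEOCODE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Structured query that tolerates Wynyard being a city, not a district.
    print(build_geocode_url(
        "APP_ID", "APP_CODE", flexible_admin=True,
        country="Australia", state="Tas", district="Wynyard",
        postalcode="7321", street="86 Jackson Street",
    ))
    # Free-text query with the same data.
    print(build_geocode_url(
        "APP_ID", "APP_CODE",
        searchtext="Australia Tas Wynyard 7321 86 Jackson Street",
    ))
```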

Filter to Group URL on Visitors Flow

I have found a similar question earlier here:
Google Analytics Visitors Flow: grouping URLs?
However, I'm confused because people suggest different ways to write the Replace String, and whichever way I try, I am not able to make it work.
So I have an e-commerce site with hundreds of different pages. The different parts of the website are:
http://example.com/sv/ (Root)
http://example.com/sv/category/1-name/
http://example.com/sv/product/1-name/
http://example.com/sv/designer-tool/1-name/
http://example.com/sv/checkout/
In the visitors flow, I want to see how many people go from, for example, Root to Category, from Category to Product, from Product to Designer Tool, and from Designer Tool to Checkout. However, with so many different pages it becomes very difficult to follow the visitors flow because, for example, the product pages are not grouped together.
So instead of the above, I would like to remove the trailing 1-name/ part and only see /sv/category/, /sv/product/, /sv/designer-tool/.
In the earlier post I understand you can use an advanced filter to do this. I have set the following settings:
Type: Search & Replace
Field: Request URI
Search String: ^/(category|product|designer-tool)(/\d*)(.*)
Replace String: /$A1$A3
I guess that my search string and my replace string are wrong. Any ideas?
EDIT: I updated my filter to the following:
Search String: ^/sv/(category|product|designer-tool)(/\d*)(.*)$
Replace String: /sv/\1/
Still testing and unsure if it's the correct way to set it up.
I was able to solve this by the Search String and the Replace String in my edit above.
So basically what I did was:
Create a secondary view/profile for your site. If you apply the filter to your one and only view/profile, you will no longer be able to see detailed data about specific pages, because the filter rewrites those URLs.
Add an Advanced Filter with the following settings:
Type: Search & Replace
Field: Request URI
Search String: ^/sv/(category|product|designer-tool)(/\d*)(.*)$
Replace String: /sv/\1/
You need to wait 24h after creating your new profile/view before you can see any data in it.
So my confusion was about the Search String and the Replace String. The Search String is a regular expression that is matched against everything after the domain name (the request URI). For example, for http://www.example.com/sv/mypage/1-post/, the Search String is only matched against /sv/mypage/1-post/.
The Replace String is what the whole match is replaced with. In my case, I matched all URLs of the form /sv/category/1-string/. I only wanted to keep the "category" part, so I replaced the whole string with /sv/category/ by entering the Replace String /sv/\1/
/sv/ means just what it says. \1 is the value captured by the first () group of my Search String (in this case "category"). The final / is just a trailing slash.
All in all, any URL that looked like http://example.com/sv/category/1-string/ is changed to http://example.com/sv/category/, so I can now see data for all my categories as a group instead of as individual pages.
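The pattern can be tried out locally before waiting 24h for a new view to fill up. The sketch below uses Python's re module as a stand-in, which is an assumption: Google Analytics has its own regex engine, but this particular pattern and the \1 back-reference behave the same way here. The function name is made up for this example.

```python
import re

# Search String and Replace String from the filter above.
SEARCH = re.compile(r"^/sv/(category|product|designer-tool)(/\d*)(.*)$")
REPLACE = r"/sv/\1/"


def group_uri(request_uri: str) -> str:
    """Collapse /sv/<section>/<id>-<name>/ to /sv/<section>/.

    URIs that do not match the pattern are returned unchanged.
    """
    return SEARCH.sub(REPLACE, request_uri)


if __name__ == "__main__":
    for uri in ("/sv/", "/sv/category/1-name/", "/sv/product/1-name/",
                "/sv/designer-tool/1-name/", "/sv/checkout/"):
        print(uri, "->", group_uri(uri))
```

Root and checkout do not match the pattern and pass through unchanged, while the three grouped sections lose their trailing 1-name/ part.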

DBpedia: Get list of Chinese universities and their addresses to populate Google map?

I'm trying to get a list of Chinese universities and their addresses, at minimum the city/town name. I will use these addresses to populate a Google map, fiddle here.
I saw interesting code such as:
SELECT ?resource ?value
WHERE {
    ?resource a <http://dbpedia.org/class/yago/CitiesAndTownsInDenmark> .
    ?resource <http://dbpedia.org/property/populationTotal> ?value .
    FILTER (?value > 100000)
}
ORDER BY ?resource ?value
Since CitiesAndTownsInChina doesn't work,
1. Where can I find the exact name of the class I am targeting? and
2. Where can I find DBpedia's operator manual?
Note: I am a very active user on Wikipedia and well aware of all the data available there, but the DBpedia ontology/syntax/keywords are quite hard to grasp.
Personal note: queries on http://dbpedia.org/snorql/ , http://dbpedia.org/sparql/ , http://querybuilder.dbpedia.org/
(Expanding on my reply to How to find cities with more than X population in a certain country)
CitiesAndTownsInDenmark exists because people use the category http://en.wikipedia.org/wiki/Category:Cities_and_towns_in_Denmark in wikipedia. Wikipedia categories are pretty loose and as a result there's a lot of variation in style, so even if a useful category exists the name may not be guessable.
In addition categories are maintained manually, and may not be consistently applied.
A good place to start is looking at the data. Visiting http://dbpedia.org/page/Beijing I see yago:MetropolitanAreasOfChina which seems promising, but if you follow that link you'll see it's not well populated.
Consequently, avoid relying on the existence of such categories and instead query directly for populated places in a country. This information comes from Wikipedia infoboxes, which are much more consistent than categories. Taking Beijing as an example again, I found:
select ?s {
    ?s a <http://dbpedia.org/ontology/PopulatedPlace> ;
       <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/China>
}
(The relevant properties and values for my query were found by copying link location in the Beijing page)
with the result:
"http://dbpedia.org/resource/Hulunbuir"
"http://dbpedia.org/resource/Guangzhou"
"http://dbpedia.org/resource/Chongqing"
"http://dbpedia.org/resource/Kuqa_County"
"http://dbpedia.org/resource/Changzhou"
... nearly 3000 results ...
You'll notice that position is encoded multiple times (geo:lat and long, georss:point, various dbpprop:latd longd things) and, excitingly, there seem to be two values. You can either simply deal with the multiple values in whichever format you prefer, or try picking just one using GROUP BY and SAMPLE.
As for a manual, almost everything I know of is an academic paper and not very useful. However, the data is reasonably self-documenting.
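The GROUP BY and SAMPLE idea could look roughly like the sketch below, which picks one geo:lat / geo:long pair per place. The Python part only builds the request URL and reads a result in the standard SPARQL JSON results format; it makes no network call. The variable names, the function names, and the format request parameter are assumptions for this example, and the query has not been tuned for the public endpoint's result limits.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# One latitude/longitude per populated place in China.
QUERY = """
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?s (SAMPLE(?lat) AS ?latitude) (SAMPLE(?long) AS ?longitude)
WHERE {
    ?s a <http://dbpedia.org/ontology/PopulatedPlace> ;
       <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/China> ;
       geo:lat ?lat ;
       geo:long ?long .
}
GROUP BY ?s
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Build a GET URL; 'format' is assumed to be honoured by the endpoint."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{endpoint}?{urlencode(params)}"


def rows_from_results(payload):
    """Turn a SPARQL JSON results document into (uri, lat, long) tuples."""
    return [
        (
            binding["s"]["value"],
            float(binding["latitude"]["value"]),
            float(binding["longitude"]["value"]),
        )
        for binding in payload["results"]["bindings"]
    ]


if __name__ == "__main__":
    print(build_request_url(QUERY))
```

The tuples returned by rows_from_results can then be fed to whatever code places the markers on the map.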
For your first question:
You can see the possible classes by querying one member of your intended set of entities (e.g. Shanghai).
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?type WHERE {
    <http://dbpedia.org/resource/Shanghai> rdf:type ?type.
    FILTER regex(str(?type), ".*China", "i").
} LIMIT 100
which gives this result:
dbpedia:class/yago/MetropolitanAreasOfChina [http]
dbpedia:class/yago/PortCitiesAndTownsInChina [http]
dbpedia:class/yago/MunicipalitiesOfThePeople'sRepuBlicOfChina [http]
dbpedia:class/yago/PopulatedCoastalPlacesInChina [http]
They are CamelCase versions of the categories that you will find at the bottom of Wikipedia pages. I was fooled for a while by the erroneous capitalization of RepuBlic, and finally saw that the class contains only 4 cities, so it is of limited use to you.
So I would propose going with @user205512's answer and getting the cities by linking 2 properties.
For your second question:
I would advise you to search/ask on http://answers.semanticweb.com
