Is there an existing machine-readable enumeration/list/taxonomy of sports/games? - standards

I'm looking for a machine-readable enumeration/list of sports, ideally categorized in some form. This [1] is the closest I could find online, but it isn't machine readable (and contains ambiguity too, e.g. one sport matches to more than one category).
[1] http://en.wikipedia.org/wiki/List_of_sports


In Watson Conversation, how to turn on fuzzy matching only for synonyms and not for the value?

I am modelling a conversation in Watson Conversation. The conversation is about the facilities available at airports. I have configured airport names as an entity, added variations of the airport names as synonyms, and kept the IATA code for the airport as the entity value. For example, Schiphol airport in Amsterdam looks like the below
I have turned on fuzzy matching on this entity so that I can catch the typos people make when they try to say Schiphol or Amsterdam. However, Watson is now capturing the word "am" in the sentence below as this entity
I am wondering if there is wifi in schiphol airport?
How do I stop Watson from fuzzy matching on the entity value but only do it on synonyms?
I don't think it's possible to enable Fuzzy matching on synonyms only, at the moment. I see a couple of "easy" workarounds.
Option 1: Leverage Watson's confidence level.
If you place <? entities ?> within your node's response and test it in the Try it out panel, you'll notice that each detected entity has a confidence level associated with it. This is Watson's confidence level in the entity, expressed as a float ranging from 0 to 1.
When testing it with your entity value and synonyms, I got 90% confidence (i.e., 0.9) for amsterdaam but only 70% for am.
So assuming these numbers hold for you, you could use entities[0].confidence > 0.7 as the condition in your node to decide when to assign the airport to a context variable and when to ask for clarification from the user in the response.
You might have to do some testing to see if you can find a confidence value that works reasonably well as a threshold for your @airport entity values and their common misspellings.
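If the check is easier to do in the application that receives Watson's response than in the node condition, the same threshold idea can be sketched in Python. This is only an illustration: the entity name airport, the 0.7 threshold and the sample data are assumptions, and the field names entity, value and confidence are assumed to mirror the entities array in the response.

```python
# Hypothetical threshold; tune it against your own entity values and typos.
CONFIDENCE_THRESHOLD = 0.7


def pick_airport(entities, threshold=CONFIDENCE_THRESHOLD):
    """Return the value of the first 'airport' entity above the threshold, else None."""
    for ent in entities:
        if ent.get("entity") == "airport" and ent.get("confidence", 0.0) > threshold:
            return ent["value"]
    return None


# Illustrative data shaped like the entities array of a response.
detected = [
    {"entity": "airport", "value": "AMS", "confidence": 0.7},  # e.g. matched on "am"
    {"entity": "airport", "value": "AMS", "confidence": 0.9},  # e.g. matched on "amsterdaam"
]

airport = pick_airport(detected)
```

When pick_airport returns None, the application would ask the user for clarification instead of storing the airport.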
Option 2: Use two entities
@airport-code: Use the airport code for the value, no synonyms, and no fuzzy matching enabled.
@airport-name: Use the airport name for the value, various synonyms including the city, and fuzzy matching enabled.
Depending on how your chatbot works, this might be an acceptable compromise or it might complicate your logic too much.
I'll give you an example. If your $airport context variable will work whether AMS or Amsterdam Airport Schiphol is stored, this solution solves the problem for you as is. If not, there is an extra step before you can assign the value to the $airport context variable. Namely, you may need to implement a lookup to retrieve the airport code for the given airport name.
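A minimal sketch of such a lookup in Python, assuming the application keeps its own table of names and codes. The table contents and the function name to_airport_code are hypothetical.

```python
# Hypothetical mapping from lower-cased airport names and synonyms to IATA codes.
AIRPORT_CODES = {
    "amsterdam airport schiphol": "AMS",
    "schiphol": "AMS",
    "amsterdam": "AMS",
}


def to_airport_code(name_or_code):
    """Return the IATA code for an airport name or code, or None if unknown."""
    text = name_or_code.strip()
    # A known three-letter code is passed through unchanged (upper-cased).
    if len(text) == 3 and text.upper() in AIRPORT_CODES.values():
        return text.upper()
    return AIRPORT_CODES.get(text.lower())
```

With this in place, the $airport context variable always holds a code, whichever of the two entities matched.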

Bing Search with market return strange result

I am seeing strange behaviour with Bing Web Search.
I have a search query "hawkers" OR "hawkersco" OR "#hawkersco" OR "#hawkers" OR "www.hawkersco.com" with market = 'es-ES', safeSearch = Strict and responseFilter = webPages.
So I expect that each result will contain at least one of these words and that the results will be Spanish posts. In fact, I get mostly posts in English, and they do not contain these keywords.
If I search for these keywords one by one, without the OR operator, I get the expected Spanish posts.
Please explain why this happens. How should I write the search query to get the expected results?
Check the specification for the Bing Web Search API. Possibly this is as simple as changing market to mkt (since you listed all the other parameters as used). That also means you should set a value for setLang.
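As a rough sketch, the request URL with mkt and setLang could be built like this in Python. The endpoint URL and the setLang value "es" are assumptions to verify against the API version in use; the other values are the ones from the question, and the query is shortened for readability.

```python
from urllib.parse import urlencode

# Assumed v7 endpoint; check it against the API version you are using.
ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def build_search_url(query, market="es-ES", language="es"):
    """Build a Web Search request URL using mkt and setLang."""
    params = {
        "q": query,
        "mkt": market,        # market, instead of a parameter named "market"
        "setLang": language,  # assumed value for Spanish
        "safeSearch": "Strict",
        "responseFilter": "webPages",
    }
    return ENDPOINT + "?" + urlencode(params)


url = build_search_url('"hawkers" OR "hawkersco"')
```

urlencode takes care of escaping the quotes and spaces in the query, so the OR expression can be passed as plain text.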
You're not getting Spanish posts at all?
In that case, see here.
Bing results are ranked by relevance, regardless of market or language.
If a result is deemed relevant, it can rank higher than results in the
selected language and appear in the results.
Freshness also affects the results: relevant (popular) sites in your
language need to attain sufficient relevance in the selected time
period.
You cannot rely on Bing returning a single language exclusively, with
the settings as they are.

CSV list of all universities - google maps

I have a CSV list of university names from around the world, about 13,000 names. I'm looking for a way to pull the addresses of these universities. The Google Maps API / Google Places API looks promising, but requires lat/long to map the locations.
The end goal is to mark each school as 1 if it is in the US and 0 if it is outside the US.
Any thoughts on how to search for these colleges in Maps and pull out the addresses, or at least the country?
Example:
Is there nothing else in the CSV, only the names? That's going to make it hard; I'd bet the names aren't always unique in the world.
You could write something that takes several passes at "biting the apple". For instance, if the university has a US state name in it, check those off as 1s, then find another rule to take "another bite", until the apple is gone.
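The multi-pass idea might look like this in Python. It is a sketch only: the state list is deliberately partial, the function names are hypothetical, and further passes (other rules, or a geocoding lookup) would be appended to the list of rules.

```python
# Partial list for illustration; a real pass would include every state.
US_STATE_NAMES = ("California", "Texas", "Michigan", "Florida", "Ohio")


def pass_state_name(name):
    """Pass 1: a US state name inside the university name suggests a US school."""
    lowered = name.lower()
    if any(state.lower() in lowered for state in US_STATE_NAMES):
        return 1
    return None  # undecided, leave it for a later pass


def classify(names, passes):
    """Apply each pass in order; the first pass that decides a name wins."""
    results = {}
    for name in names:
        results[name] = None
        for rule in passes:
            verdict = rule(name)
            if verdict is not None:
                results[name] = verdict
                break
    return results


labels = classify(
    ["University of Michigan", "Universidad de Sevilla"],
    [pass_state_name],
)
```

Names still mapped to None after all passes are the ones left for manual review or an API lookup. Because the result is keyed by name, duplicate names in the CSV collapse into one entry.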
On top of @WEBjuju's answer: since you only want to mark whether the school is in the US or outside the US, you can use the "country" type in Place Types in the Google Places API, by setting the option as country='us'.
https://developers.google.com/places/supported_types?csw=1#table2
You may also want to cross check with this list of schools.
https://www.4icu.org/reviews/index2.htm
https://en.wikipedia.org/wiki/Lists_of_universities_and_colleges_by_country

Show number of sessions from organic keywords by language

I need to see how many sessions come from a certain organic keyword, broken down by language. The problem is that there are many variations of language codes, for example en and en-us, so all my keywords are split.
There is a certain number of sessions for keyword A for en
And there is also a certain number of sessions for keyword A for en-us
Example of keyword sessions being split because of the language code: http://prntscr.com/483p1i
How do I show traffic from both variations of the language code so it is not split in two?
The same problem applies to other languages. How can I get the report that I need? I tried Acquisition > Keywords > Organic and also Audience > Geo > Language, but I always get stuck with multiple language codes for the same language.
Set language as the secondary dimension, choose the advanced filter, select language as the dimension to filter by and select "contains" as the condition (you could also use regular expressions, but this is the simple option). Then "en" will give you en-us, en-gb and other combinations.
The first line in the resulting data table will give you the totals for the chosen language.
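If the report is exported instead, the variants can also be merged outside the interface. A small Python sketch, assuming rows of (keyword, language code, sessions); the row layout and the numbers are made up for illustration.

```python
from collections import defaultdict


def sessions_by_base_language(rows):
    """Sum sessions per (keyword, base language), so en and en-us count together."""
    totals = defaultdict(int)
    for keyword, language, sessions in rows:
        base = language.lower().split("-")[0]  # "en-us" -> "en"
        totals[(keyword, base)] += sessions
    return dict(totals)


# Hypothetical exported rows: (keyword, language code, sessions).
rows = [
    ("keyword a", "en", 120),
    ("keyword a", "en-us", 80),
    ("keyword a", "de-de", 15),
]

totals = sessions_by_base_language(rows)
```

Splitting on the hyphen groups by the part before the region, which avoids matching unrelated codes that merely contain the letters "en".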

Travel APIs how to integrate them all?

I may start working on a project very similar to Hipmunk.com, which pulls hotel cost information by calling different APIs (like Expedia, Orbitz, Travelocity, Hotels.com etc.)
I did some research on this, but I am not able to find any unique hotel ID or any field to match the hotels between the different APIs. Does anyone have experience with how to match a hotel from Expedia with the same hotel from Orbitz, Travelocity etc.?
Thanks
EDIT: Google is also doing the same thing: http://www.google.com/hotelfinder/
From what I have seen of GDS systems and these APIs, there is rarely a unique identifier shared between systems for e.g. hotels.
Airports, airlines and countries do have standard identifiers (IATA/ICAO codes for airports and airlines, ISO 3166 codes for countries): http://www.iso-code.com/airports.2.html
I would guess you are going to need your own internal mapping to identify and disambiguate the properties.
:|
When you get started with hotel APIs, the choice of free ones isn't really that big, see e.g. here for an overview.
The most extensive and accessible one is Expedia's EAN http://developer.ean.com/ which includes Sabre and Venere with unique IDs but still each structured differently.
That is, you are looking into different database tables.
You do get several identifiers such as name, address, and coordinates, which can serve for unique identification, assuming they are free of errors. Which is an assumption.
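One way to combine those fields, sketched in Python: treat two records as the same hotel when their normalized names agree and their coordinates are close. The record layout (name, lat, lon), the 150 m tolerance and the sample hotels are assumptions, and real data would probably need fuzzier name comparison.

```python
import math
import re


def normalize_name(name):
    """Lower-case the name and collapse punctuation and spacing."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def same_hotel(a, b, max_distance_m=150.0):
    """Heuristic match: same normalized name and coordinates within the tolerance."""
    if normalize_name(a["name"]) != normalize_name(b["name"]):
        return False
    return distance_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_distance_m


# Made-up records standing in for results from two different APIs.
from_api_one = {"name": "Hotel Example, Amsterdam", "lat": 52.3702, "lon": 4.8952}
from_api_two = {"name": "hotel example amsterdam", "lat": 52.3703, "lon": 4.8953}
elsewhere = {"name": "Hotel Example Amsterdam", "lat": 52.3800, "lon": 4.8952}
```

Matches found this way would feed the internal mapping table; records that match on only one of the two criteria are candidates for manual review.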
