How to classify Wikidata items? - wikidata

I am trying to classify items into the main categories supported by Wikidata:
Generic, Person, Organization, Events, Works, Terms, Place, Others.
These categories are listed here:
https://www.wikidata.org/wiki/Wikidata:List_of_properties
I could not find a property that specifies the main category. I looked into
the P31 "instance of" property and P279 "subclass of" but they are not what I need.
For example, for "IBM", P31 returns "public company" and "software house", and for "Swiss International Air Lines" it returns "airline".
So I cannot tell that they are both organizations.
Is there a way to do this?
One option would be to check the properties of an item:
if an item has P21 "sex or gender", then it is a human (or animal).
But I don't think that is reliable, since no property is mandatory.
I'm using the Wikidata Toolkit for my queries.

Wikidata used to have a main type property, but it was deleted in favour of "instance of" (P31) and a more flexible schema.
You can see lots of archived discussion about the main type at https://www.wikidata.org/wiki/Property_talk:P107
You probably want to take a look at the SPARQL endpoint at http://query.wikidata.org
Q4830453 is business enterprise / company.
To find all items that are a company or a subclass of company just do:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?item
WHERE {
  ?item wdt:P31/wdt:P279* wd:Q4830453 .
}
The query takes a little time; there are currently 150k results.
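The same property path can be turned around to test a single item against a handful of root classes. The following is a minimal sketch, not a definitive implementation: the `ROOT_CLASSES` mapping is illustrative (Q4830453 comes from the answer above, Q5 "human" is an added assumption for the Person category), and the helper names are hypothetical. It only builds the ASK queries and request URLs for the public endpoint; sending them is left to whichever HTTP client you use.

```python
import re
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative root classes only. Q4830453 is taken from the answer above;
# Q5 ("human") is an assumption added for the Person category.
ROOT_CLASSES = {"Person": "Q5", "Company": "Q4830453"}

_QID = re.compile(r"Q[1-9][0-9]*")


def build_ask_query(item_qid, class_qid):
    """Return an ASK query: is the item an instance of the class or of a subclass?"""
    for qid in (item_qid, class_qid):
        if not _QID.fullmatch(qid):
            raise ValueError(f"not a Wikidata item id: {qid!r}")
    return (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        f"ASK {{ wd:{item_qid} wdt:P31/wdt:P279* wd:{class_qid} }}"
    )


def build_request_url(query):
    """Encode the query as a GET URL that asks for a JSON response."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def candidate_urls(item_qid):
    """One request URL per coarse category in ROOT_CLASSES."""
    return {
        label: build_request_url(build_ask_query(item_qid, root))
        for label, root in ROOT_CLASSES.items()
    }
```

For an ASK query the JSON response carries a single "boolean" field, so the first category whose request answers true can be used as the item's coarse class.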

Related

How to get multilingual sites after Wikidata reconciliation in OpenRefine

I have a column of reconciled entities in OpenRefine which include entities like Q56085233 and I would like to retrieve all links inside "Multilingual sites", if possible with a separator or only one at a time.
For instance, Q56085233 has two pages, one from "commons" (https://commons.wikimedia.org/wiki/Category:Wikimedia_Tamazight) and the other from "meta" (https://meta.wikimedia.org/wiki/Wikimedians_of_Tamazight_User_Group).
Is there a way to retrieve both websites with the "Add columns from reconciled values" function? Moreover, is it possible to first fetch only "meta" pages, and then only "commons" pages?
The Wikidata reconciliation service supports special "properties" to fetch such things, as documented at https://wikidata.reconci.link/ (look for "special properties" there).
The links inside the "Multilingual sites" are called "sitelinks". For meta.wikimedia.org they can be fetched by using the Smetawiki code in the "Add columns from reconciled values" dialog. You could similarly fetch sitelinks for other Wikimedia sites (Sitwiki for the Italian Wikipedia, for instance).

Getting Papers based on certain keyword from microsoft academic

How can we get the IDs of papers based on a keyword?
E.g., to get all paper IDs for computer science from the Microsoft Academic API.
Microsoft academic
graph search API
Given your example, I assume you mean "field of study" for computer science.
Please try this query:
https://api.projectoxford.ai/academic/v1.0/evaluate?expr=Composite(F.FN='computer science')&model=latest&attributes=Id
Currently, values of the attribute "W" (keyword) are single words, so to match both "computer" and "science" you need to use an And() query.
See the example below (I added the "Ti" [title] attribute to the output):
https://api.projectoxford.ai/academic/v1.0/evaluate?expr=And(W='computer', W='science')&model=latest&attributes=Id,Ti
Please note that you need to specify the "Ocp-Apim-Subscription-Key" field in the header of the HTTP request; its value should be your subscription key.
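As a minimal sketch of how the two URLs and the header fit together, the snippet below builds the request object with Python's standard library. The endpoint, parameters and header name are copied from the answer above; the function name and the placeholder key are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint copied verbatim from the answer above.
EVALUATE_URL = "https://api.projectoxford.ai/academic/v1.0/evaluate"


def build_evaluate_request(expr, subscription_key, attributes="Id"):
    """Build (but do not send) an evaluate request for the given expression."""
    params = urlencode({"expr": expr, "model": "latest", "attributes": attributes})
    return Request(
        EVALUATE_URL + "?" + params,
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
    )


# Field-of-study lookup, as in the first URL above.
by_field = build_evaluate_request(
    "Composite(F.FN='computer science')", "YOUR-KEY-HERE"
)

# Keyword lookup with both words required, also returning titles.
by_words = build_evaluate_request(
    "And(W='computer', W='science')", "YOUR-KEY-HERE", attributes="Id,Ti"
)
```

Passing either object to `urllib.request.urlopen` would perform the call; `urlencode` takes care of escaping the quotes, parentheses and spaces in the expression.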

How to extract Companies per country from Freebase

I am new to Freebase. I am trying to extract all companies per country (the country of the headquarters). The simplest approach I could think of was to list them all and filter by country, as in this test:
[{
  "name": null,
  "type": "/organization/organization",
  "/location/location/containedby": "Japan",
  "limit": 4
}]
The problem is that I get schools as well. DBpedia has a class called "Company", but in Freebase it is not clear how to distinguish companies when there is no obvious type for them. I thought organization/organization would do, but it is too general; there is also the Business domain.
Why not use /business/business_operation or /business/consumer_company or some other more appropriate type if /organization/organization is too broad?
A bigger issue with your query is that it will only find entities contained directly in Japan, not those contained in locations that are themselves contained by Japan (e.g. prefectures, cities, etc.). You may want to investigate using the Freebase Search API instead of MQL, since I think it will compute the closure for you (or do radius searches). Alternatively, you'll probably need to run a few variations of your query with different levels of location nesting.
Here are some example search queries/filters:
https://developers.google.com/freebase/v1/search-output
restaurants near SF Ferry building - filter=(all type:restaurant (within radius:1000ft lon:-122.39 lat:37.7955))
https://developers.google.com/freebase/v1/search-cookbook
Japanese volcanos - filter: (all category:volcano (any part_of:japan))
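The "few variations with different levels of location nesting" could be generated rather than written by hand. This is only a sketch under assumptions: the function name is hypothetical, the type is one of those suggested above, and wrapping the nested containedby clause in a list is a guess at how MQL expects a multi-valued property to be constrained, so check the generated queries against the MQL documentation before relying on them.

```python
import json

CONTAINED_BY = "/location/location/containedby"


def containedby_variations(country, max_depth=3,
                           org_type="/business/business_operation", limit=4):
    """Return one query per nesting depth: directly in the country,
    in something contained by the country, and so on."""
    queries = []
    for depth in range(1, max_depth + 1):
        clause = country
        # Each extra level wraps the previous clause in another containedby.
        for _ in range(depth - 1):
            clause = [{CONTAINED_BY: clause}]
        queries.append([{
            "name": None,  # serialised as JSON null
            "type": org_type,
            CONTAINED_BY: clause,
            "limit": limit,
        }])
    return queries


for query in containedby_variations("Japan"):
    print(json.dumps(query))
```

Results from the different depths would then need to be merged and de-duplicated on the client side.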

Amazon MWS Report Type "_GET_MERCHANT_LISTINGS_DATA_" result attributes meaning

In the Amazon MWS API, when requesting a report of type "_GET_MERCHANT_LISTINGS_DATA_",
what is the difference between the returned attributes:
product-id
listing-id
asin1
I have also tried to find a reference for the tab-delimited report types, but it seems to be scattered all around the web. The best description I found was part of the instructions for the Amazon Inventory Loader. (Note: may require an MWS seller login; the corresponding XLS does not have all columns described on the linked webpage.) That page should answer most of your questions.
Since the link above might require a login, here's a short description on what these columns do:
asin1 refers to an item's Amazon Standard Identification Number. Every item on Amazon has such a number; there is even a Wikipedia entry describing what it is.
product-id along with product-id-type refers to the item's non-Amazon standard identification number, if such a thing exists (otherwise it'll contain a copy of the item's ASIN).
product-id-type=1 -> product-id is ASIN
product-id-type=2 -> product-id is ISBN.
product-id-type=3 -> product-id is UPC
product-id-type=4 -> product-id is EAN (now called GTIN)
sku is your own item identifier such as part number. You created the link between an ASIN and your own SKU by creating the product. (I know you didn't ask for this, but this is for the sake of completeness)
listing-id There does not seem to be a lot of documentation on what these are. There is a page explaining how to find out an item's listing id. It does not say why you'd ever want to know, though. I assume a listing ID identifies a certain seller's (your) offer for a specific item, but all MWS requests I've ever made required me to refer to either an ASIN or my own SKU; there may be others that require this id.
Sidenote: I find it weird that a single listing-id may relate to more than one ASIN - otherwise, why are there columns named asin2 and asin3?
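To make the product-id-type codes concrete, here is a small sketch that reads a tab-delimited report and labels each row's product-id. The code table is taken from the list above; the function name, the extra "product-id-kind" key and the sample row are made up for illustration, and a real report has more columns than shown here.

```python
import csv
import io

# Codes as listed in the answer above.
PRODUCT_ID_TYPES = {"1": "ASIN", "2": "ISBN", "3": "UPC", "4": "EAN"}


def read_listings(report_text):
    """Parse a tab-delimited report and add a readable product-id kind per row."""
    reader = csv.DictReader(
        io.StringIO(report_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for row in reader:
        code = row.get("product-id-type", "")
        row["product-id-kind"] = PRODUCT_ID_TYPES.get(code, "unknown")
        rows.append(row)
    return rows


# Invented sample data, only to show the shape of the input.
sample = (
    "listing-id\tproduct-id\tproduct-id-type\tasin1\n"
    "L001\t012345678905\t3\tB000000001\n"
)
for listing in read_listings(sample):
    print(listing["asin1"], listing["product-id-kind"], listing["product-id"])
```

Quoting is disabled because the report is plain tab-separated text, so a stray quote character in a field is kept as data rather than treated as a delimiter.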

DBpedia: Get list of Chinese universities and their addresses to populate a Google map?

I'm trying to get a list of Chinese universities and their addresses, the minimum being the city/town name. I will use these addresses to populate a Google map, fiddle here.
I saw interesting code such as:
SELECT ?resource ?value
WHERE {
  ?resource a <http://dbpedia.org/class/yago/CitiesAndTownsInDenmark> .
  ?resource <http://dbpedia.org/property/populationTotal> ?value .
  FILTER (?value > 100000)
}
ORDER BY ?resource ?value
Since CitiesAndTownsInChina doesn't work:
1. Where can I find the exact name of the class I am targeting? and
2. Where can I find DBpedia's operator manual?
Note: I am a very active user on Wikipedia and well aware of all the data available there, but the DBpedia ontology/syntax/keywords are quite hard to grasp.
Personal note: queries on http://dbpedia.org/snorql/ , http://dbpedia.org/sparql/ , http://querybuilder.dbpedia.org/
(Expanding on my reply to How to find cities with more than X population in a certain country)
CitiesAndTownsInDenmark exists because people use the category http://en.wikipedia.org/wiki/Category:Cities_and_towns_in_Denmark in Wikipedia. Wikipedia categories are pretty loose, and as a result there's a lot of variation in style, so even if a useful category exists its name may not be guessable.
In addition categories are maintained manually, and may not be consistently applied.
A good place to start is looking at the data. Visiting http://dbpedia.org/page/Beijing I see yago:MetropolitanAreasOfChina which seems promising, but if you follow that link you'll see it's not well populated.
As a consequence, avoid relying on the existence of such categories and query directly for populated places in a country. This information comes from Wikipedia infoboxes, which are much more consistent than categories. Taking Beijing as an exemplar again, I found:
select ?s {
  ?s a <http://dbpedia.org/ontology/PopulatedPlace> ;
     <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/China>
}
(The relevant properties and values for my query were found by copying link location in the Beijing page)
with the result:
"http://dbpedia.org/resource/Hulunbuir"
"http://dbpedia.org/resource/Guangzhou"
"http://dbpedia.org/resource/Chongqing"
"http://dbpedia.org/resource/Kuqa_County"
"http://dbpedia.org/resource/Changzhou"
... nearly 3000 results ...
You'll notice that position is encoded multiple times (geo:lat and geo:long, georss:point, various dbpprop:latd / longd properties), and some places seem to have two values. You can either deal with the multiple values in whichever format you prefer, or try picking just one using GROUP BY and SAMPLE.
As for a manual, almost everything I know of is academic papers, and not very useful. However, the data is reasonably self-documenting.
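If you would rather handle the duplicates on the client, the effect of GROUP BY with SAMPLE can be approximated after fetching the rows. A minimal sketch, assuming the results have already been reduced to (resource, latitude, longitude) tuples; the function name is hypothetical and the coordinates below are rough, invented values used only to show the behaviour.

```python
def pick_one_position(rows):
    """Keep a single (lat, long) pair per resource: the first one seen."""
    chosen = {}
    for resource, lat, lon in rows:
        # setdefault leaves an existing entry untouched, so later
        # duplicates for the same resource are ignored.
        chosen.setdefault(resource, (lat, lon))
    return chosen


# Invented example rows with a duplicated resource.
rows = [
    ("http://dbpedia.org/resource/Guangzhou", 23.13, 113.26),
    ("http://dbpedia.org/resource/Guangzhou", 23.12, 113.25),
    ("http://dbpedia.org/resource/Chongqing", 29.56, 106.55),
]
positions = pick_one_position(rows)
```

Like SAMPLE, this makes no promise about which of the competing values is the "right" one; it only guarantees one marker per place on the map.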
For your first question:
You can see the possible classes by querying one member of your intended set of entities (e.g. Shanghai).
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?type WHERE {
  <http://dbpedia.org/resource/Shanghai> rdf:type ?type .
  FILTER regex(str(?type), ".*China", "i") .
} LIMIT 100
which gives this result:
dbpedia:class/yago/MetropolitanAreasOfChina [http]
dbpedia:class/yago/PortCitiesAndTownsInChina [http]
dbpedia:class/yago/MunicipalitiesOfThePeople'sRepuBlicOfChina [http]
dbpedia:class/yago/PopulatedCoastalPlacesInChina [http]
They are CamelCase versions of the categories that you will find at the bottom of Wikipedia pages. I was fooled for a while by the erroneous capitalization of "RepuBlic" and finally saw that this class contains only 4 cities, so it is of limited use to you.
So I would propose going with user205512's answer and getting the cities by linking 2 properties.
For your second question:
I would advise you to search/ask on http://answers.semanticweb.com

Resources