How to search in Wikidata in specific categories? - wikidata

I am currently searching in Wikidata with the following query:
https://www.wikidata.org/w/api.php?action=wbsearchentities&language=da&limit=20&format=json&search=jordb%C3%A6r&uselang=da
I need to find different ingredients and food stuff.
So the query searches for strawberries in Danish. My problem is that I get results like paintings and persons. Is there any way to search in specific categories like food, or somehow limit the "noise" of "false" hits?
I tried looking at Wikidata and searching on Google, but it's not clear to me what options I have.

You can use the Wikidata Query Service to do this.
To find all the food in Danish, you could use a query like this:
SELECT DISTINCT ?food ?label WHERE {
?food (wdt:P31?/wdt:P279*) wd:Q2095.
?food rdfs:label ?label.
SERVICE wikibase:label { bd:serviceParam wikibase:language "da". }
FILTER((LANG(?label)) = "da")
} ORDER BY ?label
query link
Or, to get all the food items labeled 'Jordbær' in Danish, you could do something like this:
SELECT DISTINCT ?food ?foodLabel WHERE {
?food (wdt:P31?/wdt:P279*) wd:Q2095 ;
rdfs:label "Jordbær"@da .
SERVICE wikibase:label { bd:serviceParam wikibase:language "da". }
}
query link
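Both queries above can also be run programmatically over HTTP. The following is a minimal sketch using only the Python standard library; the endpoint is the public Wikidata SPARQL endpoint, and the result shape (`results` → `bindings`) follows the standard SPARQL JSON results format. The `User-Agent` string is a hypothetical example (Wikimedia asks clients to identify themselves).

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = '''
SELECT DISTINCT ?food ?foodLabel WHERE {
  ?food (wdt:P31?/wdt:P279*) wd:Q2095 ;
        rdfs:label "Jordb\u00e6r"@da .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "da". }
}
'''

def build_request_url(query: str) -> str:
    """Encode the query for a GET request that asks for JSON results."""
    return ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})

def run_query(query: str):
    """Fetch the results and flatten them to a list of entity URIs.
    (Live network call; needs internet access.)"""
    req = urllib.request.Request(
        build_request_url(query),
        headers={"User-Agent": "food-search-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [row["food"]["value"] for row in data["results"]["bindings"]]
```

The same `run_query` helper works for the broader "all food in Danish" query as well; only the query string changes.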

I'm afraid there is, at the moment, no easy way to get this kind of tailored search result, but having the same need (for books, in our case), we ended up with two workarounds:
1 - search and filter
make a general search
collect the QIDs: here Q13158, Q14458220, Q12320330, etc.
fetch their claims
parse the result to get each entity's list of P31 values
filter the claims to keep only entities with a P31 value in the desired domain. For instance, to keep only books, we keep entities that have a claim P31 → Q571, or what we consider aliases of Q571. This list is static because books are "rather" consistent in their P31 values; for your domain, you would probably need to generate that list dynamically with a SPARQL query, to get a complete list of things that are considered subclasses of food or ingredients.
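The steps above can be sketched as follows. The `wbgetentities` endpoint and the claim-parsing path match the MediaWiki API's JSON shape; the allowed-QID set passed to the filter is a hypothetical example, not a complete food ontology (in practice you would generate it with a SPARQL query, as noted above).

```python
import urllib.parse

API = "https://www.wikidata.org/w/api.php"

def api_url(**params) -> str:
    """Build a MediaWiki API URL, e.g. for wbsearchentities or wbgetentities."""
    params["format"] = "json"
    return API + "?" + urllib.parse.urlencode(params)

def p31_values(entity: dict) -> set:
    """Extract an entity's P31 ('instance of') target QIDs from its claims."""
    claims = entity.get("claims", {}).get("P31", [])
    return {c["mainsnak"]["datavalue"]["value"]["id"]
            for c in claims
            if c["mainsnak"].get("datavalue")}

def filter_by_domain(entities: dict, allowed: set) -> list:
    """Keep only entities whose P31 values intersect the allowed QID set."""
    return [qid for qid, ent in entities.items()
            if p31_values(ent) & allowed]

# Step 3 of the workflow would fetch claims for the collected QIDs, e.g.:
claims_url = api_url(action="wbgetentities",
                     ids="Q13158|Q14458220|Q12320330",
                     props="claims")
# ...then pass the decoded response's "entities" dict to filter_by_domain.
```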
2 - filter and search
make the SPARQL request that gets all the valid results (see the query.wikidata.org documentation), once and for all (but it needs to be updated periodically)
put all those results in your own search engine. See our Wikidata Subset Search Engine project
then, when needed, make your request to that search engine instead

Related

Search as you type functionality in amazon CloudSearch

How should I go about implementing search-as-you-type in Amazon CloudSearch to search Amazon DynamoDB, like the way Algolia does it?
You can search-as-you-type by using a prefix search every time the user enters a character -- it would look something like this:
(prefix field=name 'dri')
The prefix search is necessary because a regular search for q=dri would not match drive, drivel, etc.
Here are the prefix search docs: http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-text.html#searching-text-prefixes
If you don't want to specify the fields for your prefix search you can use a query of the form q=dri* | dri (the non-* term is necessary because q=dri* does not match the word "dri" -- it requires there to be at least one additional character).
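As a small sketch, the two query shapes described above can be built like this (string construction only; submitting them to a CloudSearch domain endpoint, e.g. via boto3's `cloudsearchdomain` client with `queryParser='structured'` or `'simple'`, is assumed and not shown):

```python
def structured_prefix(field: str, term: str) -> str:
    """Structured-syntax prefix query, e.g. (prefix field=name 'dri')."""
    return "(prefix field=%s '%s')" % (field, term)

def simple_prefix(term: str) -> str:
    """Simple-syntax query that matches both prefixed words and the
    exact term itself (the bare term is needed because q=dri* alone
    does not match the word "dri")."""
    return "%s* | %s" % (term, term)
```

You would call one of these on every keystroke with the user's input so far.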

How to model Not In query in Couch DB [duplicate]

Folks, I was wondering what is the best way to model documents and/or map functions to allow "Not Equals" queries.
For example, my documents are:
1. { name : 'George' }
2. { name : 'Carlin' }
I want to trigger a query that returns every document where name is not equal to 'John'.
Note: I don't have all possible names beforehand. So the parameter in the query can be any random text, like 'John' in my example.
In short: there is no easy solution.
You have four options:
sending a multi range query
filter the view response with a server-side list function
using a CouchDB plugin
use the mango query language
sending a multi range query
You can request the view with two ranges defined by startkey and endkey. You have to choose the ranges so that the key "John" is not requested.
Unfortunately, you have to find the commit that exists somewhere and compile your CouchDB with it; it's not included in the official source.
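The two-range idea can be sketched as two ordinary view requests whose key ranges together cover everything except the key "John". The database, design document and view names below are hypothetical; using "\u0000" as the smallest possible suffix after an exact key follows CouchDB's collation order, and startkey/endkey values must be JSON-encoded in the URL.

```python
import json
import urllib.parse

VIEW = "http://localhost:5984/people/_design/app/_view/by_name"

def range_url(**params) -> str:
    """Build a view request URL; CouchDB expects startkey/endkey
    (and inclusive_end) as JSON-encoded values."""
    enc = {k: json.dumps(v) for k, v in params.items()}
    return VIEW + "?" + urllib.parse.urlencode(enc)

# Everything strictly before the key "John":
below = range_url(endkey="John", inclusive_end=False)
# Everything strictly after the key "John":
above = range_url(startkey="John\u0000")
```

Your client then issues both requests and concatenates the rows.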
filter the view response with a server-side list function
It's not recommended, but you can use a list function and ignore the rows with the key "John" in your response. It's like doing it with a JavaScript array.
using a CouchDB plugin
Create an additional index with e.g. couchdb-lucene. The Lucene server has such query capabilities.
use the "mango" query language
It's included in the CouchDB 2.0 developer preview. It's not ready for production, but it will definitely be included in the stable release.
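With Mango, the "not equals" case becomes a simple selector POSTed to the database's `_find` endpoint. A sketch of the request body (the `people` database name is a hypothetical example; note that `$ne` cannot use an index efficiently, so CouchDB scans and filters):

```python
import json

query = {
    "selector": {"name": {"$ne": "John"}},
    "fields": ["name"],
}

# This JSON string would be POSTed to
# http://localhost:5984/people/_find
# with Content-Type: application/json.
body = json.dumps(query)
```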

Freebase get singer from song

I want to develop an app that pulls up the singers of any song we query for. So if someone types in "Carry On" from the Some Nights album, the app is supposed to return everyone who sang that song. Thanks.
You can search for this using the Freebase Search API and Search Metaschema like this:
https://www.googleapis.com/freebase/v1/search?query=Carry+On&filter=(all+/music/release_track/release:"Some+Nights")&output=(/music/release_track/release+/music/release_track/recording./music/recording/artist)
There are three parts to this API request: the query, the filter and the output parameter. The query is simply the name of the track that you're looking for:
query=Carry+On
The filter parameter constrains the results to only tracks which are part of an album release named "Some Nights"
filter=(all+/music/release_track/release:"Some+Nights")
The output parameter tells the API which properties to return in the response. In this case we want to know which release the track is part of and which artist recorded the track.
output=(/music/release_track/release+/music/release_track/recording./music/recording/artist)
You'll notice that this query actually returns 8 matching tracks right now. This is because there were many different releases of the album which all contained recordings of that track (and not necessarily the exact same recording).
For what you're building it sounds like you should be able to just take the first result. You can constrain the search API to only return the first result by adding a limit parameter to the request:
limit=1
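Putting the four parts together, the request URL can be assembled like this (string construction only, mirroring the URL shown above; actually fetching it is left out):

```python
import urllib.parse

BASE = "https://www.googleapis.com/freebase/v1/search"

def freebase_search_url(track: str, album: str, limit: int = 1) -> str:
    """Assemble the search URL from the query, filter, output and
    limit parameters described above."""
    params = {
        "query": track,
        "filter": '(all /music/release_track/release:"%s")' % album,
        "output": "(/music/release_track/release"
                  " /music/release_track/recording./music/recording/artist)",
        "limit": limit,
    }
    return BASE + "?" + urllib.parse.urlencode(params)

url = freebase_search_url("Carry On", "Some Nights")
```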

rails nested model query

I'm trying to work out how to use active record to return some data based on a nested model.
My relationship is setup as below:
User - Has many books
Book - Has many users
UserBook - belongs to user and belongs to book
I can access users through books like so:
book.users.first
book.users.second
etc.
I'd like to select all the books that do not have a particular user.
I have generated a query like this (note that the near method is provided by the Geocoder gem):
Book.near(location, distance).joins(:users).where("users.id != #{@current_user.id}")
I believe the syntax is correct, no errors occur, however, the query still returns books with the current user.
The issue appears to be that if book.users contains a user whose id is not the current user's id AND also contains the current user's id, the book is still returned.
I can get the desired result using code like this, but I presume there is a way to get ActiveRecord to do it for me.
search = Book.near(location, distance).reject do |book|
  book.users.include?(@current_user)
end

How to structure data in Riak?

I'm trying to figure out how to model data in Riak. Let's say you are building something like a CMS with two features, news and products. You need to be able to store this information for multiple clients X and Y. How would you typically structure this?
One bucket per client and then two keys news and products. Store multiple objects under each key and then use map/reduce to order them.
Store both the news and the products in the same bucket, but with a new autogenerated key for each news item and product item. That is, one bucket for X and one for Y.
One bucket per client/feature combination, that is, the buckets would be X-news, X-products, Y-news and Y-products. Then use map/reduce on the whole bucket to return the results in order.
Which would be the best way to handle this problem?
I'd create 2 buckets: news and products.
Then I'd prefix keys in each bucket with client names.
I'd probably also include dates in news keys for easy date ranging.
news/acme_2011-02-23_01
news/acme_2011-02-23_02
news/bigcorp_2011-02-21_01
And optionally prefix product names with category names
products/acme_blacksmithing_anvil
products/bigcorp_databases_oracle
Then in your map/reduce you could use key filtering:
// BigCorp News items
{
"inputs":{
"bucket":"news",
"key_filters":[["starts_with", "bigcorp"]]
}
// ... rest of mapreduce job
}
// Acme Blacksmithing items
{
"inputs":{
"bucket":"products",
"key_filters":[["starts_with", "acme_blacksmithing"]]
}
// ... rest of mapreduce job
}
// News for all clients from Feb 12th to 19th
{
"inputs":{
"bucket":"news",
"key_filters":[["tokenize", "_", 2],
["between", "2011-02-12", "2011-02-19"]]
}
// ... rest of mapreduce job
}
An even more efficient approach than key filtering (as per Kev Burns's recommendation) is to use Secondary Indexes or Riak Search to model this scenario.
Take a look at my answers to Which clustered NoSQL DB for a Message Storing purpose? and Links in Riak: what can they do/not do, compared to graph databases? for a discussion of similar cases.
You have several decisions to make, depending on your use case. In all cases, you would start out with a company bucket, so that each company has a unique key.
1) Whether to store the items of interest in 2 separate buckets (news and products) or in one (something like items_of_interest) depends on your preference and ease of querying. If you're always going to be querying for both news and products for a company in a single query, you might as well store them in a single bucket. But I recommend using 2 separate ones, to keep easier track of them, especially if you'll have something like separate tabs or pages for "Company X - Products" and "Company X - News". And if you need to combine them into a single feed, you would make 2 queries (one for news and one for products), and combine them in the client code (by date or whatever).
2) If a news/product item can have one and only one company that it belongs to, create a secondary index on company_key for each item. That way, you can easily fetch all news or products for a company via a secondary index (2i) query for that company.
3) If there's a many-to-many relationship (if a news/product item can belong to several companies (perhaps the news item is about a joint venture for 2 separate companies)), then I recommend modeling the relationship as a separate Riak object. For example, you could create a mentions bucket, and for each company mentioned in a news story, you would insert a Mention object, with its own unique key, a secondary index for company_key, and the value would contain a type ('news' or 'product') and an item_key (news key or product key).
Extracting relationships to separate Riak objects like this allows you to do a lot of interesting things -- tag them arbitrarily using Riak Search, query them for subscription event notifications, etc.
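The secondary-index lookup from point 2 can be sketched against Riak's HTTP API, whose standard 2i query path is /buckets/&lt;bucket&gt;/index/&lt;index&gt;/&lt;value&gt;. The host, port, bucket and index names below are hypothetical examples following the naming used above (the `_bin` suffix marks a binary/string index):

```python
def secondary_index_url(host: str, bucket: str,
                        index: str, value: str) -> str:
    """Build a Riak 2i exact-match query URL."""
    return "http://%s:8098/buckets/%s/index/%s/%s" % (
        host, bucket, index, value)

# All news items indexed under company "acme":
url = secondary_index_url("localhost", "news", "company_key_bin", "acme")
```

A GET on that URL returns the matching keys, which you then fetch individually.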
