Resource-efficient way to obtain a list of all Wikidata items with labels, altLabels and descriptions?

I want to check whether a string s is contained in any Wikidata item's label, altLabel or description and, if so, return all of them. The sheer number of Wikidata items prohibits the use of SPARQL, because it will reach a timeout, so I need to do it locally. I did the same for properties before by performing this query and parsing the result locally:
SELECT ?property ?propertyLabel ?propertyDescription
       (GROUP_CONCAT(DISTINCT(?altLabel); separator = ", ") AS ?altLabel_list)
WHERE {
  ?property a wikibase:Property .
  OPTIONAL { ?property skos:altLabel ?altLabel . FILTER (lang(?altLabel) = "en") }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?property ?propertyLabel ?propertyDescription
It produces a table that looks similar to this "official" one on Wikidata.
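In case it helps, this is roughly how I run that query and parse the result locally (a minimal sketch with requests against the public WDQS endpoint; the variable names match the query above):

import requests

QUERY = """
SELECT ?property ?propertyLabel ?propertyDescription
       (GROUP_CONCAT(DISTINCT(?altLabel); separator = ", ") AS ?altLabel_list)
WHERE {
  ?property a wikibase:Property .
  OPTIONAL { ?property skos:altLabel ?altLabel . FILTER (lang(?altLabel) = "en") }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?property ?propertyLabel ?propertyDescription
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "property-label-fetcher/0.1 (example contact)"},
)
resp.raise_for_status()

# Flatten the SPARQL JSON bindings into plain dictionaries.
properties = [
    {
        "property": row["property"]["value"],
        "label": row.get("propertyLabel", {}).get("value", ""),
        "description": row.get("propertyDescription", {}).get("value", ""),
        "aliases": row.get("altLabel_list", {}).get("value", ""),
    }
    for row in resp.json()["results"]["bindings"]
]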
What is a space- (and ideally time-)efficient way of obtaining a list/table of all Wikidata items with labels, descriptions and altLabels, just like the one above? Specifically, can I somehow avoid downloading the whole Wikidata dump, parsing it and building the list myself on standard hardware?
I found this tool, but I am not sure if it is capable of doing what I need. I do not want to waste community resources either.

The wdumps tool works and would seem to be the closest to what you're asking for, i.e. a complete list. If you look at the list of recent runs of the tool, you may find what you need anyway, because it's a common ask.
Aside from working with the whole list locally, the documentation recommends the SPARQL interface to Wikidata's "regular" search engine, like this:
SELECT ?item ?label
WHERE
{
  SERVICE wikibase:mwapi
  {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "Generator";
                    mwapi:generator "search";
                    mwapi:gsrsearch "inlabel:Frankfurt";
                    mwapi:gsrlimit "max".
    ?item wikibase:apiOutputItem mwapi:title.
  }
  ?item rdfs:label ?label.
  FILTER CONTAINS(?label, "Frankfurt")
}
And, as a third possibility, I want to mention the interface at https://query.wikidata.org/bigdata/ldf. This is a little-known API for the data. Judging by its speed and its documentation, it is very efficient. But, as the linked example query shows, there are half a billion labels, so even a fast method of access such as this will be a challenge.
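If you do want to experiment with the LDF interface, the sketch below shows the general idea. I'm assuming the endpoint follows the standard Triple Pattern Fragments conventions (subject/predicate/object query parameters, paged results with a hydra next-page link in the metadata), so check its documentation for the exact details:

import requests

LDF_ENDPOINT = "https://query.wikidata.org/bigdata/ldf"

# Ask for the first page of all rdfs:label triples; leaving subject and object
# unset requests every triple with that predicate (TPF-style).
resp = requests.get(
    LDF_ENDPOINT,
    params={"predicate": "http://www.w3.org/2000/01/rdf-schema#label"},
    headers={"Accept": "text/turtle"},
)
resp.raise_for_status()
print(resp.text[:2000])  # one page only; follow the advertised next-page link for the rest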

Related

Sort Weaviate results based on number field

We are using Weaviate to serve e-commerce results.
Our Weaviate database stores all the products we sell.
Based on the customer and the search term we create a vector and use this to query the database. This property is called search_engine_query_vector.
For example if a customer has a habit of buying expensive products and searches for "TV" the system will most likely create a vector which is "closer" to the more expensive TVs in the database. So their first page of results is the most expensive TVs.
While this works well 99% of the time, we also want people to be able to sort based on price.
For this we will send a query to Weaviate, where we only return products which are close to our vector (it is assumed this is all the TVs), like below:
(client.query
    .get("Product", ["sku", "responseBody", "_additional { certainty }",
                     "stores { ...on Store {storeId salesPrice additionalResponseBody}}"])
    .with_near_vector({"vector": search_engine_query_vector, "similarity": TV_CUTOFF})
    .limit(10)
    .sort_base_on_price())
My question: is there functionality in the API analogous to sort_base_on_price?
You can assume price is a number field in the schema.
Great to hear you're working with Weaviate for an e-commerce solution.
Weaviate added an initial version of sorting functionality in version 1.13!
With Weaviate version 1.13.0 we also released python-client v3.5.0, which introduces this functionality too. You can find the method documentation for Python here, or for other clients here!
For your use case, you could also try the following GraphQL query in the Weaviate Console:
{
  Get {
    Product(
      nearVector: {
        vector: [search_engine_query_vector],
        certainty: TV_CUTOFF
      }
      sort: {
        order: desc,
        path: ["price_field"]
      }
      limit: 10
    ) {
      sku
      responseBody
      stores {
        ... on Store {
          storeId
          salesPrice
          additionalResponseBody
        }
      }
      _additional {
        certainty
      }
    }
  }
}
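If you would rather stay in the Python client, something along these lines should be close (a sketch assuming the v3 client's with_sort method, which takes a dict with path and order keys; the vector, cutoff and price_field name are placeholders):

import weaviate

client = weaviate.Client("http://localhost:8080")  # placeholder URL

search_engine_query_vector = [0.1, 0.2, 0.3]  # placeholder for your real query vector
TV_CUTOFF = 0.7                               # placeholder certainty threshold

result = (
    client.query
    .get("Product", ["sku", "responseBody",
                     "stores { ... on Store { storeId salesPrice additionalResponseBody } }",
                     "_additional { certainty }"])
    .with_near_vector({"vector": search_engine_query_vector, "certainty": TV_CUTOFF})
    .with_sort({"path": ["price_field"], "order": "desc"})  # assumed price property name
    .with_limit(10)
    .do()
)
print(result)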

Trouble getting a SPARQL query using a URI

I'm having trouble acquiring instances from a subclass on Protege using SPARQL. The task is to create an ontology using information from a Website. To start this I have tried to get the contact information such as the Email addresses and Phone Numbers. I have provided screenshots of the Classes and Individual Tabs below:
Classes:
Individuals:
I want it to display a list of the two email addresses. I heard there was a way to get them using the URI.
How do I get the URI and how do I enter it after the rdf:type in the query's code?
As you can see in your screenshot, the URI of the 'Contact Email' class is http://www.semanticweb.org/cthom/ontologies/2021/untitled-ontology-34#Contact_Email. So, to get all instances of this class, you can use the following SPARQL query:
select ?contactEmail
where { ?contactEmail a <http://www.semanticweb.org/cthom/ontologies/2021/untitled-ontology-34#Contact_Email> }
Tip: try one of the online SPARQL tutorials to get a bit more familiar with the language.
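If you want to sanity-check the query outside Protégé, you can also run it with rdflib against the saved ontology file (the file name below is a placeholder for wherever you exported the ontology):

from rdflib import Graph

g = Graph()
g.parse("untitled-ontology-34.owl", format="xml")  # placeholder path; adjust format if you saved as Turtle

results = g.query("""
    SELECT ?contactEmail
    WHERE { ?contactEmail a <http://www.semanticweb.org/cthom/ontologies/2021/untitled-ontology-34#Contact_Email> }
""")

for row in results:
    print(row.contactEmail)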

How to store keywords in firebase firestore

My application uses keywords extensively: everything is tagged with keywords, so whenever a user wants to search or add data I have to show keywords in an autocomplete box.
As of now I am storing keywords in a separate collection, as below:
export interface IKeyword {
Id:string;
Name:string;
CreatedBy:IUserMin;
CreatedOn:firestore.Timestamp;
}
export interface IUserMin {
UserId:string;
DisplayName:string;
}
export interface IKeywordMin {
Id:string;
Name:string;
}
My main document holds an array of keywords:
export interface MainDocument{
Field1:string;
Field2:string;
........
other fields
........
Keywords:IKeywordMin[];
}
But the problem is that autocomplete reads data frequently, so my document-read quota increases very fast.
Is there a way to implement this without increasing reads for keywords? The keywords are not the real data we need to get.
Below is my query to get the main documents:
query = query.where("Keywords", "array-contains-any", keywords)
I use the query below to get keywords in the autocomplete text box:
query = query.orderBy("Name").startAt(searchTerm).endAt(searchTerm+ '\uf8ff').limit(20)
This query runs many times as the user types in the autocomplete box, which causes more document reads.
Does this answer your question? https://fireship.io/lessons/typeahead-autocomplete-with-firestore/
Though the recommended solution is to use a third-party tool:
https://firebase.google.com/docs/firestore/solutions/search
To reduce document reads: one solution that comes to mind (though I'm not sure it suits your use case) is Firestore's caching feature. By default, the Firestore client always tries to reach the server to get the latest changes to your documents, and only falls back to the cached data on the client device when the server is unreachable. You can take advantage of this feature by using the cache first and reaching the server only when you want. For web applications this feature is disabled by default; you can enable it as described in
https://firebase.google.com/docs/firestore/manage-data/enable-offline
I found a solution, thought I would share it here.
Create a new collection named typeaheads in the format below:
export interface ITypeAHead {
Prefix:string;
CollectionName:string;
FieldName:string;
MatchingValues:ILookupItem[]
}
export interface ILookupItem {
Key:string;
Value:string;
}
Depending on the minimum number of letters, add either 2 or 3 letters to Prefix, and search based on the prefix, collection and field. So most probably you will end up with 2 or 3 document reads per search.
Hope this helps someone else.
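For illustration, the lookup side could look roughly like this (a sketch using the google-cloud-firestore Python client; the collection, field names and document shape are the ones above, and the two-letter minimum prefix is an assumption):

from google.cloud import firestore

db = firestore.Client()

def typeahead(search_term: str, collection_name: str, field_name: str):
    """Return cached MatchingValues for a prefix using a single document read."""
    prefix = search_term[:2].lower()  # assumed minimum prefix length of 2
    docs = (
        db.collection("typeaheads")
        .where("Prefix", "==", prefix)
        .where("CollectionName", "==", collection_name)
        .where("FieldName", "==", field_name)
        .limit(1)
        .stream()
    )
    for doc in docs:
        values = doc.to_dict().get("MatchingValues", [])
        # Narrow further in memory, so typing more letters costs no extra reads.
        return [v for v in values if v["Value"].lower().startswith(search_term.lower())]
    return []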

Dealing with numbers using Spring ExampleMatcher

I am new to Java and Spring, and I am building a system using Spring JPA. I am now working on my service and controller classes, and I would like to create a dynamic query. I have created a form in which the user can enter values in the fields, or leave them blank. I then use ExampleMatcher to create an example based on the non-null fields and query the database for objects that match them.
It is working fine with Strings, and it works ok with numbers, in case the number entered by the user is matching the number in the database. What I would like to ask the community is: how can we, using Spring ExampleMatcher, add logic so that the query relating to numbers is not Select * from Projects where project.return = 10 but for instance Select * from Projects where project.return >=10?
It is a pretty basic question, but I have looked everywhere on the web and could not find an answer. All sources that I found said that ExampleMatcher deals only with Strings, but I find it strange that such a powerful system has no logic to deal with greater-than / less-than criteria on numbers.
My code for the example matcher:
ExampleMatcher matcher = ExampleMatcher.matching()
        .withIgnoreNullValues()
        .withIgnoreCase()
        .withIgnorePaths("projectId", "businessPlans", "projectReturn", "projectAddress.addressId");
I would like to add something like:
.withMatcher("projectAmountRaised", IsMoreThan(Long.parseLong()));
What I would have loved to have, but it is deprecated:
public static List getStockDailyRecordCriteria(Date startDate, Date endDate,
        Long volume, Session session) {
    Criteria criteria = session.createCriteria(StockDailyRecord.class);
    if (startDate != null) {
        criteria.add(Expression.ge("date", startDate));
    }
    if (endDate != null) {
        criteria.add(Expression.le("date", endDate));
    }
    if (volume != null) {
        criteria.add(Expression.ge("volume", volume));
    }
    criteria.addOrder(Order.asc("date"));
    return criteria.list();
}
I am thus looking for something similar... I could create a broad results list from just Strings criteria using ExampleMatcher, and then write my own logic to delete objects that do not fit number criteria, but I am sure there is a more elegant approach.
Thank you a lot for your help, and for your indulgence!
This is how you can use QBE and pageable with additional filters:
ExampleMatcher matcher = UntypedExampleMatcher.matching()
        .withIgnoreCase()
        .withIgnorePaths("startDate");

MyDao probe = new MyDao();
// populate the probe fields you want matched by example

final Example<MyDao> example = Example.of(probe, matcher);

// Query-by-example for the equality part, plus an explicit range criterion for the ignored field
Query q = new Query(new Criteria().alike(example)).with(pageable);
q.addCriteria(Criteria.where("startDate").gte(probe.getStartDate()));

List<MyDao> list = mongoTemplate.find(q, example.getProbeType(), "COLLECTION_NAME");
PageableExecutionUtils.getPage(list, pageable, () -> mongoTemplate.count(q, example.getProbeType(), "COLLECTION_NAME"));

APIGEE querying data that DOESN'T match condition

I need to fetch from the BaaS data store all records that don't match a condition.
I use a query string like:
https://api.usergrid.com/<org>/<app>/<collection>?ql=location within 10 of 30.494697,50.463509 and Partnership eq 'Reject'
That works fine (I don't URL-encode the string after ql).
But any attempt to put "not" in this query causes "The query cannot be parsed".
I also tried <>, !=, NE, and some variations of "not".
How do I configure the query to fetch all records in the range where Partnership is NOT equal to 'Reject'?
Not operations are supported, but they are not performant because they require a full scan. When coupled with a geolocation call, it could be quite slow. We are working on improving this in the Usergrid core.
Having said that, in general it is much better to invert the call if possible. For example, instead of adding the property only when the case is true, always write the property to every new entity (even when false), then edit the property when the case becomes true.
Instead of doing this:
POST
{
  "name": "fred"
}
PUT
{
  "name": "fred",
  "had_cactus_cooler": true
}
Do this:
POST
{
  "name": "fred",
  "had_cactus_cooler": "no"
}
PUT
{
  "name": "fred",
  "had_cactus_cooler": "yes"
}
In general, try to put your data in the way you want to get it out. Since you know upfront that you want to query on whether this property exists, simply add it, but with a negative value. Then update it when the condition becomes true.
You should be able to use this syntax:
https://api.usergrid.com/<org>/<app>/<collection>?ql=location within 10 of 30.494697,50.463509 and not Partnership eq 'Reject'
Notice that the not operator comes before the expression (as indicated in the docs).
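If you want to test this from a script, a quick sketch with Python requests takes care of the URL encoding of the ql parameter for you (the org/app/collection segments and the token are placeholders; the "entities" key is the usual shape of a Usergrid response, but verify against your own output):

import requests

# Placeholders: substitute your own org, app, collection and access token.
url = "https://api.usergrid.com/<org>/<app>/<collection>"
params = {
    "ql": "location within 10 of 30.494697,50.463509 and not Partnership eq 'Reject'",
}
resp = requests.get(url, params=params,
                    headers={"Authorization": "Bearer <access_token>"})
resp.raise_for_status()
for entity in resp.json().get("entities", []):
    print(entity.get("uuid"), entity.get("Partnership"))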
