Can Riak do facet queries?

Riak has both secondary indexes and (Solr-ish) search.
But can it do faceted searches like Solr does? That is:
fetch facets that are applicable to the results returned
drill down into facets by constraining facet values
bucket the ranges (e.g., cities that start with a C)

The Riak 2.0 release coming later this year includes integrated Solr support, i.e. it ships with Solr 4.x. The project is called "Yokozuna" and has been under development for the last year. If enabled, it allows you to create indexes and associate a Riak bucket with an index; all objects stored under that bucket are then converted to Solr documents and shipped to Solr for indexing. You can then query via a pass-through HTTP interface (which allows you to use standard Solr clients) or via Riak's protobuf search interface. Basically, it combines the distributed and highly available aspects of Riak with the robust search capabilities of Solr. Here are various links to learn more.
Code: https://github.com/basho/yokozuna
Slides Berlin Buzzwords June 2013: https://speakerdeck.com/rzezeski/yokozuna-scaling-solr-with-riak
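To make the faceting part concrete, here is a rough Python sketch of the Yokozuna flow against a local Riak 2.x node with search enabled: create an index, attach it to a bucket, then send a faceted query through the Solr pass-through. The index name, bucket name, field name, and port are illustrative assumptions; the facet parameters are simply forwarded to Solr.
import requests
base = "http://localhost:8098"  # default Riak HTTP port
# 1. Create a search index and associate a bucket with it (illustrative names).
requests.put(base + "/search/index/people_idx", json={})
requests.put(base + "/buckets/people/props",
             json={"props": {"search_index": "people_idx"}})
# 2. Query through the Solr pass-through; standard Solr facet parameters
#    should be forwarded as-is to Solr.
resp = requests.get(base + "/search/query/people_idx",
                    params={"q": "*:*", "wt": "json",
                            "facet": "true", "facet.field": "city_s"})
print(resp.json().get("facet_counts"))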

Riak's Solr-compatible interface is more of a marketing feature than something usable in real applications. Secondary indexes support only simple exact-match and value-range queries. So out of the box Riak cannot do it; some time ago this was clearly stated in the official wiki, but that sentence is gone and only some traces are left: http://news.ycombinator.com/item?id=2377680.
But this functionality can be implemented fairly easily, either using MapReduce with the search results as input, or simply on the client by iterating over the search results and building a data structure with the possible filters and the counts of items matching each of them (see the sketch below).
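As a sketch of the client-side approach (not Riak-specific), you can fold the fetched results into per-field counters; the result shape and field names below are illustrative assumptions.
from collections import Counter, defaultdict
def build_facets(results, facet_fields):
    # Count how many matching documents carry each value of each facet field.
    facets = defaultdict(Counter)
    for doc in results:
        for field in facet_fields:
            if field in doc:
                facets[field][doc[field]] += 1
    return facets
results = [{"city": "Chicago", "role": "admin"},
           {"city": "Cleveland", "role": "user"},
           {"city": "Boston", "role": "user"}]
facets = build_facets(results, ["city", "role"])
print(facets["city"])  # Counter({'Chicago': 1, 'Cleveland': 1, 'Boston': 1})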

Related

How to add an ontology to `virtuoso-sparql-endpoint-quickstart`?

I'm new to the world of Docker. I want to query an ontology locally. I have already configured
virtuoso-sparql-endpoint-quickstart.
It works, and my endpoint is http://localhost:8890/sparql.
Now I want to query my own ontology (not DBpedia). So can I still use the same endpoint? How can I add my ontology to Virtuoso?
Please note that an ontology is a vocabulary used to describe one or more classes of entities. The descriptions themselves are typically referred to as instance data, and queries are usually run over such instance data. (There are a few ontologies used to describe ontologies, and these descriptions are also instance data, and queries might be made against them.)
There are a number of ways to load data into Virtuoso. The most useful for most people is the Bulk Load facility. For most purposes, you'll want to load your data into one or more distinct Named Graphs, such that queries can be scoped to one, some, or all of those Named Graphs.
Any and all queries can be made against the same http://localhost:8890/sparql endpoint. Results will vary depending on the Named Graphs identified in your query.
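As a small sketch, once your data has been loaded into a Named Graph you can scope a query to it from Python with SPARQLWrapper; the graph IRI used here is an illustrative assumption, so substitute the one you loaded into.
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://localhost:8890/sparql")
# Scope the query to the Named Graph the ontology/instance data was loaded into.
sparql.setQuery("""
    SELECT ?s ?p ?o
    FROM <http://example.org/my-ontology>
    WHERE { ?s ?p ?o }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)
for b in sparql.query().convert()["results"]["bindings"]:
    print(b["s"]["value"], b["p"]["value"], b["o"]["value"])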

How can I sort my results by property length?

I have these user vertices:
g.addV("user").property(single,"name", "bob")
g.addV("user").property(single,"name", "thomas")
g.addV("user").property(single,"name", "mike")
I'd like to return these sorted by the length of the name property.
bob
mike
thomas
Is this possible with Gremlin on AWS Neptune without storing a separate nameLength property to sort on?
Currently the Gremlin language does not have a step that can return the length of a string. This is something that may be added to Gremlin in a future version, possibly in the 3.6 release. You can of course do it using closures (in-line code), but many hosted TinkerPop graph stores, including Amazon Neptune, do not allow arbitrary code blocks to be run as part of Gremlin queries. For now, this will need to be handled on the application side when using Neptune, or, as you suggest, by using a nameLength property. This is an area where the TinkerPop community recognizes that some additional steps are needed and plans to prioritize this work.
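For example, with gremlinpython the sort can be done on the application side after pulling back just the names; the Neptune endpoint below is a placeholder.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
conn = DriverRemoteConnection("wss://your-neptune-endpoint:8182/gremlin", "g")
g = traversal().withRemote(conn)
# Fetch only the name values, then sort by string length client-side.
names = g.V().hasLabel("user").values("name").toList()
for name in sorted(names, key=len):
    print(name)  # bob, mike, thomas
conn.close()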

Weaviate Search Graph Vs. GA of IBM Graph

The IBM Graph service is usually described in terms of how it can add and store properties, in the form of key/value pairs associated with the data, on both vertices and the edges connecting them, rather than the more traditional form of storing data in tabular form using rows and columns. But how does the GA of IBM Graph compare to a Knowledge Graph built with Weaviate Search Graph (GraphQL - RESTful - Semantic Search - Semantic Classification - Knowledge Representation)?
The answer is based on the assumption that you mean JanusGraph because of this article.
JanusGraph is a Graph DBMS optimized for distributed clusters.
Weaviate is a GraphQL based semantic search engine.
The main difference is that JanusGraph focuses on large graphs, whereas Weaviate focuses on search, with results represented in graph format.
You might pick JanusGraph if you want to store and analyze a large graph.
You might pick Weaviate if you build a search engine and/or store data in graph format.
Something Weaviate can do that other search engines (whether graph-based or not) cannot is index your data semantically; e.g., you can find a data object about the publication Vogue by searching for "fashion".
Weaviate query example:
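(A minimal sketch with the Weaviate Python client, v3-style API; the Publication class, its name field, and the local endpoint are illustrative assumptions, and nearText requires a text vectorizer module to be enabled.)
import weaviate
client = weaviate.Client("http://localhost:8080")
result = (
    client.query
    .get("Publication", ["name"])
    .with_near_text({"concepts": ["fashion"]})  # semantic, not keyword, match
    .with_limit(5)
    .do()
)
print(result["data"]["Get"]["Publication"])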
Disclaimer:
I'm part of the team working on Weaviate.

CosmosDB API selection: does it dictate how the data is stored, or only how we communicate with the instance?

When creating a CosmosDB instance, we can choose the API that we will use to communicate with the instance (e.g. SQL, MongoDB, Cassandra, etc.)
What is not clear to me is: does this selection dictate how the data is stored, or only the way we communicate with the instance? For example, if we choose MongoDB, does that mean CosmosDB will store data in a MongoDB fashion?
The choice of API does not change how the data is stored. Cosmos DB always stores data using something called atom-record-sequence (ARS) which is essentially a set of primitive types, structs and arrays. The database engine translates the native ARS format into the data structures used by the various APIs (i.e. json documents, table rows, etc.)
So the answer to your question is that the choice of API only impacts how you communicate with the databases for that Cosmos DB account.
As David Makogon points out in his comment on another answer, while the way the data is stored is the same regardless of the API used, the content of the data will be different, because each API requires its own metadata so that the underlying data can be projected into the format expected by each API.
Here is a good technical overview of how Cosmos works under the hood.
https://azure.microsoft.com/en-us/blog/a-technical-overview-of-azure-cosmos-db/
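As a small illustration, here is a sketch using the Core (SQL) API Python SDK (azure-cosmos); the account, key, database, and container names are placeholders. Had the account been created with the MongoDB API instead, you would talk to it with a MongoDB driver against its connection string, but the data would still live in the same ARS representation underneath.
from azure.cosmos import CosmosClient
client = CosmosClient("https://your-account.documents.azure.com:443/",
                      credential="your-key")
container = client.get_database_client("mydb").get_container_client("items")
# "pk" stands in for whatever partition key path the container was created with.
container.upsert_item({"id": "1", "pk": "users", "name": "bob"})
item = container.read_item(item="1", partition_key="users")
print(item["name"])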
Data is always stored in the same fashion (as a bunch of JSON documents); only the way you interact with the data changes.
https://learn.microsoft.com/en-us/azure/cosmos-db/introduction#develop-applications-on-cosmos-db-using-popular-open-source-software-oss-apis

Is there a definitive list of all possible labels or objects or tags for the Microsoft Cognitive Services Computer Vision API?

I'm building a web application using multiple APIs that are part of the Microsoft Cognitive Services bundle.
The API returns detected objects such as person, man, fly, kite, etc. I require a list of all possible objects that the API is capable of detecting and also the hierarchy(if available).
It is information I need from a database normalization perspective. Is there any documentation that I am missing out on?
There are thousands of objects to detect, and their list is not available publicly.
That being said, the image categories are available publicly in the documentation:
Computer Vision can categorize an image broadly or specifically, using the list of 86 categories in the following diagram.
If you generally need a list of objects to use, you can rely on publicly available object datasets, including the following (arranged from oldest to newest):
COIL100
SFU
SOIL-47
ETHZ Toys
NORB
CalTech 101
PASCAL VOC
GRAZ-02
ALOI
LabelME
Tiny
CIFAR10 and CIFAR100
ImageNet
BOSS
Office
BigBIRD
MS-COCO
iLab-20M
CURE-OR
However, it is recommended to normalize your database based on the JSON you receive from the API. For example, you already know that you will receive objects when using Detect Objects and categories when using Analyze Image, so you can work with that (see the sketch below).
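As a sketch of that approach, the snippet below calls the Detect Objects endpoint and collects the distinct labels (plus parent labels, when present) for a lookup table; the resource endpoint, key, image URL, and API version are placeholders, and the response shape may vary slightly between API versions.
import requests
endpoint = "https://your-resource.cognitiveservices.azure.com"
key = "your-subscription-key"
resp = requests.post(
    endpoint + "/vision/v3.2/detect",
    headers={"Ocp-Apim-Subscription-Key": key,
             "Content-Type": "application/json"},
    json={"url": "https://example.com/some-image.jpg"},
)
resp.raise_for_status()
labels = set()
for obj in resp.json().get("objects", []):
    labels.add(obj["object"])
    parent = obj.get("parent")
    while parent:  # walk up the hierarchy when the API provides one
        labels.add(parent["object"])
        parent = parent.get("parent")
print(sorted(labels))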
