Searchable data in Cosmos DB with Graph API - azure-cosmosdb

My team uses CosmosDb to store data.
For our use case some of this data needs to be searchable.
Currently the Gremlin support implemented in Cosmos DB offers some filters, but not enough to suit our needs, which are mainly text search.
This would be used to implement a fuzzy search for a vertex, say a person, where name, email, and company name would all be included in the searchable text.
In https://github.com/Azure/azure-documentdb-dotnet/issues/413 there was some talk of string filters, but there have been no updates for a while.
My question is: would it be better to use Azure Search for this use case?
We could add a step in the pipeline that synchronizes our data to an Azure Search service on every CRUD operation, but this would mean slower CRUD as well as data duplication. The consumer of our API would also have to use a search endpoint to get an ID, and then do an additional lookup afterwards to get any related data.

If you can expose the data you want to make searchable to Azure Search using a "vanilla" (non-Gremlin) SQL API query, consider using Azure Search indexers for Cosmos DB. However, for simple string-matching searches Azure Search may be overkill - use it if you need more sophisticated search (natural-language awareness in many languages, custom tokenization, custom scoring, etc.).
If you need a tighter integration between Cosmos DB Graph API and Azure Search, vote for this UserVoice suggestion.
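If Azure Search really is overkill and the dataset is small, the fuzzy matching itself can even be done client-side after a plain fetch. Below is a minimal sketch using Python's standard-library difflib; it assumes the person vertices have already been retrieved as plain dicts, and the field names (`name`, `email`, `company`) are made up for illustration:

```python
import difflib

def fuzzy_search(people, query, cutoff=0.6):
    """Rank people whose name, email, or company fuzzily matches the query."""
    query = query.lower()
    matches = []
    for person in people:
        # Compare the query against each word of the combined searchable text.
        tokens = f"{person['name']} {person['email']} {person['company']}".lower().split()
        score = max(difflib.SequenceMatcher(None, query, token).ratio() for token in tokens)
        if score >= cutoff:
            matches.append((score, person))
    # Best matches first.
    return [person for _, person in sorted(matches, key=lambda pair: -pair[0])]

people = [
    {"id": "1", "name": "Jane Doe", "email": "jane@contoso.com", "company": "Contoso"},
    {"id": "2", "name": "John Smith", "email": "john@fabrikam.com", "company": "Fabrikam"},
]
print(fuzzy_search(people, "jnae")[0]["name"])  # typo for "jane" → Jane Doe
```

This obviously does not scale to large graphs, but it shows the two-step pattern (match first, then fetch related data) without a second service to keep in sync.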

Related

Effective way to query BMC Helix Remedy database other than Smart Reporting

Currently we are connecting to the AR System through the Oracle database for this purpose. I need to know whether there is any alternative way to access or query the Remedy database effectively. Is there any built-in API we can utilise that would increase the efficiency of the work?
You could use the REST API, which lets you query the forms directly.
Please check the following URL:
REST API Doc
It returns a JSON object containing all the data.
In order to get access to all forms, you need to create a "service" user with a fixed license and permissions to the forms you would like to read through the API.
You can query the Oracle back end directly, with a few caveats. It should only be for reading data, not writing or modifying data; otherwise, you could break data integrity as well as bypass workflow that should fire. Also, this direct access does not enforce any permissions, nor does it translate any of the data. For example, selection fields come back as a number instead of their string value, dates are in epoch format, and so on.
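If you do read the Oracle back end directly, you end up re-implementing Remedy's translations yourself. A small Python sketch of what that looks like, assuming a hypothetical selection-field mapping and an epoch-seconds date column (the field names and status labels are invented for illustration):

```python
from datetime import datetime, timezone

# Hypothetical mapping for one selection field; Remedy stores only the integer.
STATUS_LABELS = {0: "New", 1: "Assigned", 2: "In Progress", 3: "Resolved", 4: "Closed"}

def translate_row(row):
    """Translate raw values read straight from the Oracle back end."""
    return {
        # Selection fields come back as numbers.
        "status": STATUS_LABELS.get(row["status"], str(row["status"])),
        # Date/time fields come back as seconds since the Unix epoch.
        "submit_date": datetime.fromtimestamp(row["submit_date"], tz=timezone.utc),
    }

raw = {"status": 1, "submit_date": 1577836800}  # 2020-01-01 00:00:00 UTC
print(translate_row(raw))
```

The real schema has many such fields, which is exactly the work the APIs and the ODBC driver do for you.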
There is a Remedy ODBC driver, which is no longer being updated and does not support joins. However, you can open multiple connections with it and join the results manually. On the plus side, it does handle permissions and translations for you.
https://docs.bmc.com/docs/ars1911/odbc-database-access-introduction-896318914.html
If you know in advance which joins you will be doing, you should set up join forms within Remedy. That way the joins are done efficiently in the database. Otherwise, you are stuck with either of the above solutions or with one of the APIs, which don't support ad-hoc joins.
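The "open multiple connections and join them manually" approach above amounts to an application-side hash join. A minimal Python sketch over two already-fetched result sets (the row shapes and column names here are invented for illustration):

```python
def manual_join(left_rows, right_rows, key):
    """Join two result sets fetched over separate ODBC connections.

    Builds a lookup table from the second result set, then merges each
    matching pair of rows - a simple application-side hash join.
    """
    index = {row[key]: row for row in right_rows}
    return [{**left, **index[left[key]]} for left in left_rows if left[key] in index]

# Imagine each list came back from its own ODBC connection/query.
incidents = [{"incident_id": "INC1", "assignee": "alice"},
             {"incident_id": "INC2", "assignee": "carol"}]
people = [{"assignee": "alice", "email": "alice@example.com"}]

print(manual_join(incidents, people, "assignee"))
```

This works for modest result sets; for anything large or frequently needed, the join-form approach keeps the work in the database where it belongs.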

Is there a reason not to model a social network with the SQL API?

In one of our apps we need to introduce a feature of users being able to choose friends, see friends activity, etc.
So far we use CosmosDb container with SQL API (for things app does beside this social network aspect).
I am wondering whether there is a reason not to model it with the SQL API, and to go strictly with Gremlin instead.
I've seen examples on the Microsoft site of modeling a basic social network with the ordinary SQL API, but I am not sure whether I am missing something that would bite me down the road if I don't go with Gremlin.
You should be safe in choosing either. From docs:
Each API operates independently, except the Gremlin and SQL API, which are interoperable.
The stored data is JSON in both cases.
More on choosing an API.
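As a concrete illustration of modeling this with plain JSON documents, here is a Python sketch with users and friendship edges stored as sibling documents distinguished by a `type` property. The document shapes and the SQL query in the comment are illustrative assumptions, not a prescribed schema:

```python
# Hypothetical document shapes for a social graph kept in one container,
# distinguished by a "type" property - all plain JSON, queryable via SQL API.
docs = [
    {"id": "u1", "type": "user", "name": "Alice"},
    {"id": "u2", "type": "user", "name": "Bob"},
    {"id": "f1", "type": "friendship", "userId": "u1", "friendId": "u2"},
]

def friends_of(docs, user_id):
    """Roughly the equivalent of:
    SELECT f.friendId FROM c f WHERE f.type = 'friendship' AND f.userId = @userId
    """
    return [d["friendId"] for d in docs
            if d.get("type") == "friendship" and d.get("userId") == user_id]

print(friends_of(docs, "u1"))  # → ['u2']
```

Single-hop queries like "list my friends" are easy either way; Gremlin mainly starts paying off for multi-hop traversals ("friends of friends who like X"), which get verbose as self-joins in SQL.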

CosmosDB creation - API option (SQL vs Graph)

I'm interested in using Azure Cosmos DB for its graph capability.
Looking through the docs I saw that it stores graph vertices and edges as JSON documents (with an agreed schema), and so it can be accessed as a plain old DocumentDB.
Taking this into consideration, what is the meaning of the API selection you need to make when creating a new instance (link)?
e.g.:
what am I losing if I create the DB as SQL (DocumentDB) and
manipulate data via the graph part of the client (e.g. CreateGremlinQuery)?
what am I losing if I create the DB as Graph and
manipulate data via the DocumentDB part of the client (e.g. CreateDocumentAsync)?
UPDATE: I am aware of the portal difference (as described below by Jesse Carter). I am interested in whether this switch drives anything else under the hood in the specific scenario of choosing between SQL (DocumentDB) and Graph.
There is no functional difference from the perspective of interacting with your Cosmos Collection through either SQL or Graph APIs regardless of which API you choose at creation time.
HOWEVER, there is a difference from the perspective of the Azure portal when navigating your resources. Collections created specifically using the Graph API will get tagged as such and enable additional UI features in the portal for executing Gremlin queries and basic graph visualization.
If you don't care about those querying abilities in the Azure portal, then you're fine to create the collection with either option.
The API selection is there to avoid confusion for users who are only familiar with Gremlin and don't want to learn DocumentDB.
If you are an advanced user, using both the graph and DocumentDB APIs will give you more power.
Note that we are committed to making the Gremlin and DocumentDB SQL integration even more seamless.
Please drop us a note at askcosmosdbgraphapi@microsoft.com if you want to learn more or set up a time to talk to us.
Jayanta
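To illustrate the interoperability both answers describe, here is roughly what graph data looks like when read back through the DocumentDB (SQL) side. The property names (`_isEdge`, `_sink`, `_vertexId`) and the array-of-objects format for vertex properties are an internal, undocumented schema observed in practice - treat this Python sketch as an assumption, not a contract:

```python
# Approximate JSON shapes of a Gremlin vertex and edge as seen via the SQL API.
# These internal property names are undocumented and may change.
vertex = {
    "id": "thomas",
    "label": "person",
    # Vertex properties are stored as arrays of {id, _value} objects.
    "name": [{"id": "some-guid", "_value": "Thomas"}],
}
edge = {
    "id": "edge-guid",
    "label": "knows",
    "_isEdge": True,
    "_vertexId": "thomas",  # source vertex
    "_sink": "mary",        # target vertex
}

def is_edge(doc):
    """A SQL-side filter like WHERE c._isEdge = true separates edges from vertices."""
    return doc.get("_isEdge", False)

print(is_edge(edge), is_edge(vertex))  # → True False
```

The practical takeaway: documents written through either API land in the same collection, so the creation-time choice mostly affects portal tooling, not the stored data.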

How to work around a lack of an `IN` clause in Firebase Query with a query for 500 filters

I need to create a typical find friends feature in my mobile app that's using Firebase. The user would upload a list of hashed contact emails or phone numbers from their address book and the server would return a list of usernames that are already using the application. The typical user would have around 500 contacts in their address book.
This would be pretty straightforward to set up using a traditional SQL or Mongo database but in Firebase this would be difficult because I don't see any WHERE IN clauses with Firebase Query and it seems like it would be very inefficient using a Firebase Database for this. Even if I created a specific HashedPhoneNumbers collection with the hash being the id, it still seems like a monster query. Is there a way to make this query run efficiently in Firebase?
i.e. SELECT username from Users WHERE phoneHash IN [list of 500 phone hashes]
Alternatively, if I were to use Google Cloud Datastore, it looks like it supports chaining a bunch of AND email_hash = XXX filters together, but I don't know how efficient that would be with a list of 500 filters chained together.
The Firebase Database does not support this kind of query directly, but there are methods with which you can achieve the same thing. For those, I recommend you see The Firebase Database For SQL Developers series and NoSQL Data Modeling Techniques, where they are explained.
For storing data like this, I also recommend Cloud Storage for Firebase.
In conclusion, I kindly recommend using Firebase.
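One common workaround for the missing IN clause is to key a lookup node by the hash itself (e.g. a hypothetical /usersByPhoneHash/{hash} path), so that each uploaded contact becomes one direct-key read instead of a filtered query. The sketch below simulates that in Python, with a plain dict standing in for the Firebase node:

```python
# Simulated /usersByPhoneHash node: the phone hash IS the key, so the
# "IN [500 hashes]" query becomes 500 cheap direct-key lookups (which a
# real client would issue in parallel).
users_by_phone_hash = {
    "hash_a": "alice",
    "hash_b": "bob",
}

def find_friends(contact_hashes):
    """One O(1) key lookup per contact; misses simply drop out."""
    return [users_by_phone_hash[h] for h in contact_hashes if h in users_by_phone_hash]

print(find_friends(["hash_a", "hash_x", "hash_b"]))  # → ['alice', 'bob']
```

The trade-off is maintaining the extra lookup node on every signup, which is standard denormalization practice in NoSQL data modeling.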

Putting records into the Elasticsearch index before the relational database

I have an application which consumes RSS feeds and makes them searchable by performing the following steps:
pulling article from the feed URL
storing that data in a relational DB
indexing the data in Elasticsearch
I'd like to reverse this process so that I can use the RSS River Elasticsearch plugin to pull data from feeds. However, this plugin integrates directly with Elasticsearch, bypassing my relational DB (which is a problem for other parts of the application which rely on each article having a record in the DB).
How can I have Elasticsearch notify the DB when a new article has been indexed (and de-indexed)?
Edit
Currently I'm using Ruby on Rails 4 with a PostgreSQL DB. RSS feeds are fetched in the background using Sidekiq to manage jobs. They go directly into PG and are then indexed by Elasticsearch. I'm using Chewy to provide an interface to the ES index. It doesn't support callbacks like the ones I'm looking for (no Ruby library does, as far as I know).
Searching queries ES for matches then loads the records from PG to display results.
It sounds like you are looking for the sort of notification/trigger functionality described in this feature request. In the absence of that feature I think the approach suggested in that thread by the user "cravergara" is your best bet - that is, you can alter the RSS river Elasticsearch plugin to update your DB whenever an article is indexed.
That would handle the indexing requirement. To sync the de-indexing, you should make sure that any code that deletes your Elasticsearch documents also deletes the corresponding DB records.
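The alteration suggested above boils down to performing the DB write and the ES index (or delete) in the same step, so neither store can drift from the other. A schematic Python sketch with in-memory stand-ins for both stores (the function names and shapes are made up for illustration):

```python
# In-memory stand-ins for the Elasticsearch index and the relational DB.
es_index, db = {}, {}

def index_article(article):
    """Index in ES and write the DB record in the same step - the
    callback the stock RSS river plugin lacks."""
    es_index[article["id"]] = article
    db[article["id"]] = article  # an INSERT in the real application

def delete_article(article_id):
    """De-index and delete the DB record together, as the answer advises."""
    es_index.pop(article_id, None)
    db.pop(article_id, None)

index_article({"id": "a1", "title": "Hello"})
assert "a1" in es_index and "a1" in db
delete_article("a1")
print("a1" in es_index, "a1" in db)  # → False False
```

In the real setup the two writes should at least be retried or logged on partial failure, since there is no cross-store transaction tying PG and ES together.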
