Apache Solr - Lucene - Zip Code Radius Search - collections

I have an existing collection of PERSON records already loaded into my Solr server. Each record has a ZIPCODE field. I know that Solr now supports spatial search with the geodist() function; the problem is that I do not have the lat and long for each person.
Can I add another collection to Solr mapping the ZIP codes to lats and longs, then JOIN them like you would in SQL? Is this even possible with Solr?

AFAIK, there isn't a way to translate a ZIP code to lat/long in Solr.
Most geospatial queries (not restricted to Solr) use latitude and longitude to perform a radius search, so this isn't a Solr-specific problem.
I would recommend enriching the data that is imported into Solr using a geocoding API.
You can update your existing index to populate these fields for every document, but if possible I would prefer to recreate the Solr index with this data.
I wouldn't add another collection for this purpose; JOIN isn't supported in Solr, as it isn't a relational data store.
Edit:
Solr does support JOIN, but in your case I would still not go for it unless I had to.
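As an illustration, here is a minimal SolrJ sketch of the enrichment step. The collection name (persons), the field names, and the zip-to-coordinate lookup are all placeholders; in a real pipeline the coordinates would come from a geocoding API or a ZIP-code database.

```java
import java.util.Map;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ZipEnrichment {
    public static void main(String[] args) throws Exception {
        // Placeholder lookup; populate from a geocoding API or ZIP-code DB.
        Map<String, String> zipToLatLon = Map.of("30301", "33.7490,-84.3880");

        try (SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/persons").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "person-1");
            doc.addField("zipcode", "30301");
            // "location" is assumed to be a spatial field in the schema
            // (e.g. backed by LatLonPointSpatialField)
            doc.addField("location", zipToLatLon.get("30301"));
            solr.add(doc);
            solr.commit();
        }
    }
}
```

Once the location field is populated, a radius search becomes a standard spatial filter, e.g. fq={!geofilt sfield=location pt=33.7490,-84.3880 d=10}.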

Related

cosmos database to be indexed in pull approach + files

I have items and files. There is a 1:m relationship between items and files. Items are stored in a relational database and files in folders; the association between items and files is stored in the relational database. Files can be PDFs, Word docs, emails, etc. I intend to do a POC of Cognitive Search to be able to search items and their associated documents.
My current understanding is that a pull approach might be cheaper than the push approach when using Cognitive Search (the latency requirements are not stringent and eventual consistency is OK). Hence, I intend to move the data into a Cosmos database, which can then be indexed via the pull approach. I'm curious how this works with the documents: would I need to crack them on-prem?
There is also the option of attachments and blob storage of documents. The latter is most likely more future-proof. I would think that if I put documents into blob storage, Cognitive Search indexing would still need to crack the documents and apply skills?
This sounds like a good approach. In terms of data sources, Cognitive Search supports Cosmos DB, blob storage, and some relational databases. I would probably:
Create a new Cognitive Search resource in the Azure portal.
In that Cognitive Search resource, click "Import data" to create a new indexer (this is the "pull" option that you mention above). You may want to do this twice, assuming that your items are in CosmosDB or a relational DB, and your documents are stored separately in blob storage.
The first indexer has a data source which points to your items/relationship data in whatever DB you decide to put them in, applies any skills that you want, and puts everything in an index.
The second indexer has a different data source which points to your documents in blob storage, applies any skills that you want, and puts everything in the same index.
If you use indexers, they will take care of the document cracking. If you push data directly into the index, you will need to crack the documents yourself.
This gives a simple walkthrough of creating an indexer with the portal (skillset is optional, and change the data source to your own data): https://learn.microsoft.com/en-us/azure/search/cognitive-search-quickstart-blob
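For reference, here is a rough sketch (not a definitive implementation) of what the second indexer looks like when created through the REST API instead of the portal. The service endpoint, key, container, and index names are all hypothetical placeholders; the JSON bodies follow the documented data source and indexer contracts.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateBlobIndexer {
    // Placeholder service endpoint and admin key
    static final String SERVICE = "https://my-search-service.search.windows.net";
    static final String API_KEY = "<admin-api-key>";

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Data source pointing at the blob container holding the documents
        String dataSource = """
            { "name": "files-ds",
              "type": "azureblob",
              "credentials": { "connectionString": "<storage-connection-string>" },
              "container": { "name": "files" } }""";
        put(client, "/datasources/files-ds?api-version=2023-11-01", dataSource);

        // Indexer that pulls from the data source, cracks the documents,
        // and writes the output into an existing index
        String indexer = """
            { "name": "files-indexer",
              "dataSourceName": "files-ds",
              "targetIndexName": "items-index" }""";
        put(client, "/indexers/files-indexer?api-version=2023-11-01", indexer);
    }

    static void put(HttpClient client, String path, String body) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(SERVICE + path))
                .header("Content-Type", "application/json")
                .header("api-key", API_KEY)
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> resp =
                client.send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + path);
    }
}
```

The first indexer would look the same, with a Cosmos DB (or SQL) data source in place of the blob one.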

Is data always stored in JSON format even if we choose the MongoDB API, Cassandra, Tables, or Gremlin (Graph API)?

I was going through the indexing policies article by Microsoft (https://learn.microsoft.com/en-us/azure/cosmos-db/index-overview), which states that "Every time an item is stored in a container, its content is projected as a JSON document, then converted into a tree representation." I could relate to this when we opt for the SQL API or the Azure Table Storage API. I am still wondering: what is the ultimate underlying structure inside Cosmos DB? Does it vary according to the data model we choose?

How can I add fields named Latitude and Longitude using GeoFire in Firebase?

I'm developing a location-tracking app with Android Studio and need to store coordinates in a normalized manner (one variable in one field). Can I modify the location data created by GeoFire in Firebase? I would like to have a single field called Latitude and another one called Longitude. Currently the coordinates are both stored below one field called 'l'; I would like to modify this, or copy them as separate children of the current userID in the Location node. Would greatly appreciate any help, thanks!
There's nothing built into GeoFire for you to control the property names. But since GeoFire is open source, you can modify it to use the property names you want.
Just be aware that GeoFire uses these shorter names to limit the bandwidth it uses, so your users would see a bandwidth increase.
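If modifying GeoFire itself is too invasive, another option is to write plain Latitude/Longitude children yourself alongside the GeoFire entry. A minimal sketch, assuming the Firebase Realtime Database Android SDK and hypothetical Location/GeoFire node names:

```java
import com.firebase.geofire.GeoFire;
import com.firebase.geofire.GeoLocation;
import com.google.firebase.database.DatabaseReference;
import com.google.firebase.database.FirebaseDatabase;
import java.util.HashMap;
import java.util.Map;

public class LocationWriter {
    public static void saveLocation(String userId, double lat, double lon) {
        DatabaseReference root = FirebaseDatabase.getInstance().getReference();

        // Normalized fields: one variable per field, as asked above
        Map<String, Object> coords = new HashMap<>();
        coords.put("Latitude", lat);
        coords.put("Longitude", lon);
        root.child("Location").child(userId).updateChildren(coords);

        // Keep GeoFire's compact "l"/"g" entry in parallel under its own
        // node so radius queries still work
        GeoFire geoFire = new GeoFire(root.child("GeoFire"));
        geoFire.setLocation(userId, new GeoLocation(lat, lon));
    }
}
```

The duplication costs some storage and bandwidth, but it keeps GeoFire's query support intact while giving you the normalized fields.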

What's the recommended way of manually indexing a record in OrientDB?

What's the recommended way of manually indexing a newly inserted record, and re-indexing a changed record, in OrientDB?
I know about OrientDB's automatic indexes, but I need to map the indexed property to a lowercase version (for case-insensitive search), so I will need to index records manually. Is it recommended/possible to write a database procedure to (re-)index a record?
In OrientDB you have both:
automatic indexes, updated by OrientDB (as in an RDBMS)
manual indexes, maintained by you, which work like a persistent map
For more information look at the official guide: https://github.com/orientechnologies/orientdb/wiki/Indexes
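To make the manual-index idea concrete, here is a rough sketch using OrientDB's legacy Java document API. The index name personNameLower and the Person class are assumptions, and the index is assumed to already exist (e.g. created with CREATE INDEX personNameLower UNIQUE STRING); on every insert or update you re-put the record under its lowercased key:

```java
import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.index.OIndex;
import com.orientechnologies.orient.core.record.impl.ODocument;

public class ManualIndexExample {
    public static void main(String[] args) {
        // Connection details are placeholders
        ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/mydb")
                .open("admin", "admin");
        try {
            // Save a record, then index it under a lowercased key
            ODocument person = new ODocument("Person").field("name", "Alice");
            person.save();

            OIndex<?> idx = db.getMetadata().getIndexManager()
                    .getIndex("personNameLower");
            idx.put(person.field("name").toString().toLowerCase(), person);

            // Case-insensitive lookup goes through the same lowercased key
            Object hit = idx.get("alice");
            System.out.println(hit);
        } finally {
            db.close();
        }
    }
}
```

On updates you would remove the old key and put the new one; wrapping the record save and the index maintenance in a single transaction keeps the two consistent.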

Can Riak do facet queries?

Riak has both secondary indexes and (solr-ish) search.
But can it do faceted searches like Solr does? That is:
fetch facets that are applicable to the results returned
drill down into facets by constraining facet values
bucket the ranges (e.g., cities that start with a C)
The Riak 2.0 release coming later this year includes integrated Solr support, i.e. it ships with Solr 4.x included. The project is called "Yokozuna" and has been under development for the last year. If enabled, it allows you to create indexes and associate a Riak bucket with an index, and all objects stored under that bucket will be converted to Solr documents and then shipped to Solr for indexing. You can then query via a pass-through HTTP interface (which allows you to use standard Solr clients) or via Riak's protobuf search interface. Basically, it combines the distributed and highly available aspects of Riak with the robust search capabilities of Solr. Here are various links to learn more.
Code: https://github.com/basho/yokozuna
Slides Berlin Buzzwords June 2013: https://speakerdeck.com/rzezeski/yokozuna-scaling-solr-with-riak
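Since the HTTP interface is a pass-through to Solr, standard Solr facet parameters should work through it. A small sketch (the index name people and the field city_s are hypothetical; the host/port are Riak's HTTP defaults):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RiakFacetQuery {
    public static void main(String[] args) throws Exception {
        // Match everything, then ask Solr to facet on the city_s field
        String url = "http://localhost:8098/search/query/people"
                + "?wt=json&q=*:*&facet=true&facet.field=city_s";
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> resp =
                client.send(req, HttpResponse.BodyHandlers.ofString());
        // The body is a normal Solr JSON response, including facet_counts
        System.out.println(resp.body());
    }
}
```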
Riak's Solr-compatible interface is more of a marketing feature than something actually usable in real applications. Secondary indexes support simple exact-match and value-range queries. So out of the box Riak cannot do it; some time ago this was clearly stated in the official wiki, but that sentence is gone and only some traces are left: http://news.ycombinator.com/item?id=2377680.
But this functionality can be implemented fairly easily using MapReduce with the search results as input, or simply on the client by running through the search results and generating a data structure with the possible filters and the counts of available items matching each of them.
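A minimal sketch of the client-side variant: walk the matching records once and count occurrences of each facet value. The Document record stands in for whatever representation your Riak client returns; the city field is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClientSideFacets {
    // Stand-in for a record returned by the Riak client
    record Document(String city) {}

    // Count how many results fall under each facet value
    static Map<String, Integer> facetByCity(List<Document> results) {
        Map<String, Integer> counts = new HashMap<>();
        for (Document doc : results) {
            counts.merge(doc.city(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Document> results = List.of(
                new Document("Chicago"), new Document("Cleveland"),
                new Document("Chicago"), new Document("Boston"));

        // Plain facet counts, e.g. {Chicago=2, Cleveland=1, Boston=1}
        System.out.println(facetByCity(results));

        // Bucketing, e.g. "cities that start with a C" via first-letter buckets
        Map<String, Integer> byLetter = new HashMap<>();
        for (Document d : results) {
            byLetter.merge(d.city().substring(0, 1), 1, Integer::sum);
        }
        System.out.println(byLetter); // e.g. {B=1, C=3}
    }
}
```

Drilling down is then just re-running the search with the chosen facet value added as a filter.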
