Adding a shard to a collection using the Collections API in Solr

I am using Apache Solr to create collections and shards. I am able to create a collection using
sudo curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=demo&numShards=2&replicationFactor=1'
Here, the collection name is "demo" and the number of shards is 2.
However, when I try to add a new shard using
sudo curl 'http://localhost:8983/solr/admin/collections?action=CREATESHARD&shard=shard3&collection=demo'
it returns this error:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">1</int></lst><lst name="error"><str name="msg">shards can be added only to 'implicit' collections</str><int name="code">400</int></lst>
</response>

From the documentation for CREATESHARD:
Shards can only be created with this API for collections that use the 'implicit' router. Use SPLITSHARD for collections using the 'compositeId' router. A new shard with a name can be created for an existing 'implicit' collection.
So the proper way to do this is to issue a SPLITSHARD command instead, and then remove the old shard after the two new shards have been created. From the SPLITSHARD documentation:
Splitting a shard will take an existing shard and break it into two pieces. The original shard will continue to contain the same data as-is but it will start re-routing requests to the new shards. The new shards will have as many replicas as the original shard. After splitting a shard, you should issue a commit to make the documents visible, and then you can remove the original shard (with the Core API or Solr Admin UI) when ready.
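The split-then-remove workflow can be sketched as the sequence of HTTP requests involved. This minimal Python sketch only builds the URLs (it sends nothing), assuming the base URL from the question and a parent shard named "shard1". It uses the Collections API's DELETESHARD action for the final step in place of the Core API or Admin UI mentioned above; DELETESHARD only removes inactive shards, which the parent becomes after a successful split.

```python
from typing import List
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL, taken from the question


def split_shard_urls(collection: str, shard: str) -> List[str]:
    """Return the request URLs for split, commit, delete parent, in that order."""
    split = f"{SOLR}/admin/collections?" + urlencode(
        {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    )
    # Commit so the documents in the two new sub-shards become visible.
    commit = f"{SOLR}/{collection}/update?" + urlencode({"commit": "true"})
    # After the split the parent shard is inactive, so it can be deleted.
    delete = f"{SOLR}/admin/collections?" + urlencode(
        {"action": "DELETESHARD", "collection": collection, "shard": shard}
    )
    return [split, commit, delete]


for url in split_shard_urls("demo", "shard1"):
    print(url)
```

Each printed URL can be passed to curl in order; check that the SPLITSHARD call has finished successfully before issuing the DELETESHARD request.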

Shards can only be created with this API for collections that use the 'implicit' router (i.e., when the collection was created, router.name=implicit). A new shard with a name can be created for an existing 'implicit' collection.
Reference: https://solr.apache.org/guide/8_6/collection-management.html
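If you want to be able to call CREATESHARD later, the collection has to be created with the implicit router from the start. A minimal Python sketch that builds both URLs, assuming the host and port from the question; the collection name "demo2" is hypothetical. With router.name=implicit, shard names are listed explicitly in the shards parameter instead of numShards.

```python
from typing import List
from urllib.parse import urlencode

COLLECTIONS_API = "http://localhost:8983/solr/admin/collections"  # assumed host/port


def create_implicit_collection_url(name: str, shard_names: List[str]) -> str:
    # The implicit router takes explicit shard names rather than a shard count.
    params = {
        "action": "CREATE",
        "name": name,
        "router.name": "implicit",
        "shards": ",".join(shard_names),
        "replicationFactor": 1,
    }
    return f"{COLLECTIONS_API}?{urlencode(params)}"


def create_shard_url(collection: str, shard: str) -> str:
    # CREATESHARD is only accepted for collections that use the implicit router.
    return f"{COLLECTIONS_API}?" + urlencode(
        {"action": "CREATESHARD", "collection": collection, "shard": shard}
    )


print(create_implicit_collection_url("demo2", ["shard1", "shard2"]))
print(create_shard_url("demo2", "shard3"))
```

The trade-off is that Solr no longer hashes documents across shards for such a collection, so your indexing code decides which shard each document goes to.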

Related

Re-run all changes in Lease Collection

I created several new pipelines in Azure Data Factory to process the CosmosDB Change Feed (which is written to Blob Storage so that ADF can load it into an on-prem SQL Server), and I'd like to "resnap" the data from the leases collection to force a full re-sync. Is there a way to do this?
For clarity, my set-up is:
Change Feed -> Azure Function to process the changes -> Blob Storage to hold the JSON documents -> Azure Data Factory, which picks up the Blob Storage documents and maps them to on-prem SQL Server stored proc inserts/updates.
The simplest way to do it is to delete the lease documents and make sure that the StartFromBeginning setting is set to true. Once restarted, the change feed service will recreate the leases (if the appropriate setting is configured to true) and reprocess all the documents.
The other way is to update every single lease document and reset its continuation token ("checkpoint") to null. However, I don't recommend this method, since you might accidentally miss a lease, which can lead to issues.
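For completeness, the second option (clearing each checkpoint) could look like the minimal Python sketch below. It works on plain dictionaries so it makes no SDK calls, and it assumes that each lease document stores its checkpoint in a field named "ContinuationToken"; inspect your own lease container before relying on that name. Documents without the field are skipped rather than modified.

```python
from typing import Any, Dict, List

# Assumption: the checkpoint lives in this field of every lease document.
CHECKPOINT_FIELD = "ContinuationToken"


def reset_checkpoints(docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the lease documents with their checkpoint cleared.

    Documents that lack the checkpoint field are left out of the result,
    and the input documents are not mutated.
    """
    reset = []
    for doc in docs:
        if CHECKPOINT_FIELD in doc:
            updated = dict(doc)
            updated[CHECKPOINT_FIELD] = None
            reset.append(updated)
    return reset
```

Each returned document would then be written back to the leases container with the processor stopped. Comparing the number of rewritten documents with the number of leases you expect is one way to catch the missed-lease problem described above.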

How to delete all data in a partition?

I have a CosmosDB collection with a number of different partitions. I want to delete all of the data in one of the partitions so I tried to run the command:
db.myCollection.deleteAll({PartitionKey: 'pop-9q'})
Where PartitionKey is the field that I partition/shard based on. But when I execute this it returns the not very helpful message:
ERROR: An Error has occurred
Why would I be getting this message and how can I either get more details on the cause or find a resolution?
At the time of writing, you cannot perform a bulk delete. Please upvote and comment on this feature request: Add the ability to delete ALL data in a partition
Additionally, which API are you using? For the Gremlin API you could execute something like the following: g.V().drop()
The Microsoft.Azure.Cosmos SDK has since added this ability. It is currently only available as a preview feature, which you have to opt into via the portal.
See here for more details:
https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-delete-by-partition-key?tabs=dotnet-example
Sample code included there:
// Get reference to the container
var container = cosmosClient.GetContainer("DatabaseName", "ContainerName");
// Delete by logical partition key
ResponseMessage deleteResponse = await container.DeleteAllItemsByPartitionKeyStreamAsync(new PartitionKey("Contoso"));
if (deleteResponse.IsSuccessStatusCode) {
    Console.WriteLine("Delete all documents with partition key operation has successfully started");
}
As @Mike said, a "delete all data" feature is not yet supported in the Cosmos DB SQL API or Mongo API. I notice that you have already commented on the link above. As a workaround for the Cosmos DB SQL API, you can use a bulk-delete stored procedure.
(sample code: https://gist.github.com/deepumi/2a23c5380202bddf0b85e83baf5833be)
For the Mongo API, unfortunately, stored procedures are not supported either. You could create an HTTP-triggered Azure Function that runs the bulk delete code whenever you need it, or merge that code into your application.
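The "bulk delete code" for a SQL API account can be as simple as querying the ids in one logical partition and deleting them one by one. A minimal Python sketch, assuming a container object that behaves like the azure-cosmos SDK's ContainerProxy (its query_items and delete_item methods); the function name is hypothetical. The ids are collected first so the query's result pages are not being consumed while items are deleted.

```python
from typing import Any


def delete_partition(container: Any, partition_key: str) -> int:
    """Delete every item in one logical partition and return how many were deleted.

    `container` is expected to provide query_items(query=..., partition_key=...)
    and delete_item(item=..., partition_key=...), as azure-cosmos ContainerProxy does.
    """
    ids = [
        item["id"]
        for item in container.query_items(
            query="SELECT c.id FROM c", partition_key=partition_key
        )
    ]
    for item_id in ids:
        # One request per item, so large partitions cost proportionally more RUs.
        container.delete_item(item=item_id, partition_key=partition_key)
    return len(ids)
```

Unlike a stored procedure, this runs outside the database, so it is not bounded by the stored procedure execution time limit, but it is also not transactional: a failure partway through leaves some items deleted.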

Determine if Cosmos DB NotFound due to missing collection vs. document

Is there a way to programmatically determine from a DocumentClientException where StatusCode == HttpStatusCode.NotFound whether it was the document, the collection, or the database that was not found?
I'm trying to figure out whether I can implement on-demand collection provisioning and only call DocumentClient.CreateDocumentCollectionIfNotExistsAsync when I need to. I'm trying to avoid calling it before making every request (presumably this adds an extra network roundtrip to every request). Likewise, I'm trying to avoid calling it on error recovery when I know it won't help.
From experimentation with the local emulator, the only field I see varying in these three cases is DocumentClientException.Error.Message, and only when the database cannot be found. I generally try to avoid exception dispatching based on human-readable messages.
Wrong database name:
StatusCode: HttpStatusCode.NotFound
Error.Message: {\"Errors\":[\"Owner resource does not exist\"]}...
Correct database name, wrong collection name:
StatusCode: HttpStatusCode.NotFound
Error.Message: {\"Errors\":[\"Resource Not Found\"]}...
Correct database name, correct collection name, incorrect document ID:
StatusCode: HttpStatusCode.NotFound
Error.Message: {\"Errors\":[\"Resource Not Found\"]}...
I'm planning to use a database with its own offer. Since collections inside a database with its own offer are cheap, I'm trying to see whether I can segregate each tenant in my multi-tenant application into its own collection. Each tenant ends up having a different indexing and default TTL policy. The set of collections is not fixed and changes dynamically during runtime as new tenants sign up. I cannot predict when I will need to add a new collection. There's no new tenant notification: I just get a request that I need to handle by creating a document in a possibly non-existent collection. There's a process to garbage collect unused collections.
I'm using the NuGet package Microsoft.Azure.DocumentDB.Core Version 1.9.1 in a .NET Core 2.1 app targeting a SQL API Cosmos DB instance.
If you look at the Message property in detail, you should see one of the following strings, which tells you whether the 404 Not Found response was caused by a missing document or a missing collection:
ResourceType: Document
ResourceType: Collection
It's not ideal, but you can try to extract this information from the error message with a regex.
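The regex itself is small. This minimal Python sketch assumes only what the answer states, namely that the message contains "ResourceType: Document" or "ResourceType: Collection"; the same pattern can be used from .NET's Regex class. Returning None when the marker is absent makes it easy to fall back to treating the error as "unknown".

```python
import re
from typing import Optional

# Matches e.g. "ResourceType: Document" or "ResourceType: Collection".
_RESOURCE_TYPE = re.compile(r"ResourceType:\s*(\w+)")


def not_found_resource_type(message: str) -> Optional[str]:
    """Return the ResourceType named in a 404 error message, or None if absent."""
    match = _RESOURCE_TYPE.search(message)
    return match.group(1) if match else None
```

A caller could then call CreateDocumentCollectionIfNotExistsAsync only when the result is "Collection", and skip that recovery step for "Document".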

Does Firebase Realtime Database REST API support multi path updates at different entity locations?

I am using the REST API of Firebase Realtime Database from an AppEngine Standard project with Java. I am able to successfully put data under different locations, however I don't know how I could ensure atomic updates to different paths.
To put some data separately at a specific location I am doing:
requestFactory.buildPutRequest("dbUrl/path1/17/", new ByteArrayContent("application/json", json1.getBytes())).execute();
requestFactory.buildPutRequest("dbUrl/path2/1733455/", new ByteArrayContent("application/json", json2.getBytes())).execute();
Now to ensure that when saving a /path1/17/ a /path2/1733455/ is also saved, I've been looking into multi path updates and batched updates (https://firebase.google.com/docs/firestore/manage-data/transactions#batched-writes, only available in Cloud Firestore?) However, I did not find whether this feature is available for the REST API of the Firebase Realtime Database as well or only through the Firebase Admin SDK.
The example here shows how to do a multi path update at two locations under the "users" node.
curl -X PATCH -d '{
"alanisawesome/nickname": "Alan The Machine",
"gracehopper/nickname": "Amazing Grace"
}' \
'https://docs-examples.firebaseio.com/rest/saving-data/users.json'
But I don't have a common upper node for path1 and path2.
I tried setting the URL to the database URL without any nodes (https://db.firebaseio.com.json) and adding the nodes to the JSON object sent, but I get an error: nodename nor servname provided, or not known.
This would be possible with the Admin SDK I think, according to this blog post: https://firebase.googleblog.com/2015/09/introducing-multi-location-updates-and_86.html
Any ideas if these atomic writes can be achieved with the REST API?
Thank you!
If the updates are going to a single database, there is always a common path.
In your case you'll run the PATCH command against the root of the database (json1 and json2 below stand for the JSON values you want to write at each path):
curl -X PATCH -d '{
"path1/17": json1,
"path2/1733455": json2
}' 'https://yourdatabase.firebaseio.com/.json'
The key difference from your URL seems to be the / before .json. Without it, you're trying to connect to a domain on the json TLD, which doesn't exist (yet) as far as I know.
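The same root-level PATCH can be built in code. This minimal Python sketch only constructs the request (it does not send it); the database URL and the values are placeholders, and any authentication your security rules require is left out. Each key in the body is a path relative to the root, and the Realtime Database applies a multi-location update atomically.

```python
import json
import urllib.request
from typing import Any, Dict


def multi_path_patch(db_url: str, updates: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a multi-path PATCH against the database root."""
    url = db_url.rstrip("/") + "/.json"  # note the "/" before ".json"
    body = json.dumps(updates).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PATCH", headers={"Content-Type": "application/json"}
    )


req = multi_path_patch(
    "https://yourdatabase.firebaseio.com",
    {"path1/17": {"example": 1}, "path2/1733455": {"example": 2}},
)
print(req.get_method(), req.full_url)
```

Sending it would be a matter of urllib.request.urlopen(req); in the Java code from the question, the equivalent is a single PATCH to the root URL instead of two separate PUT requests.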
Note that the documentation link you provide for Batched Updates is for Cloud Firestore, which is a completely separate database from the Firebase Realtime Database.

Creating collections in solr4.4 cloud with different schemas

How can I create collections with different schemas in SolrCloud 4.4?
I want to create collections in SolrCloud 4.4, each with two shards. However, each collection should have its own schema, i.e. every collection has a different schema.
You would have to upload one config to ZooKeeper for each schema, and then make two Collections API calls like:
curl -v "http://localhost:8080/solr/admin/collections?action=CREATE&name=collection1&numShards=2&collection.configName=collection1"
and
curl -v "http://localhost:8080/solr/admin/collections?action=CREATE&name=collection2&numShards=2&collection.configName=collection2"
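Both steps can be generated per collection. This minimal Python sketch prints the commands rather than running them, and assumes Solr on port 8080 (as in the answer), ZooKeeper at localhost:2181, and a local config directory named after each collection; all three are hypothetical and should be adjusted. It uses the zkcli.sh script shipped with Solr 4.x (under example/scripts/cloud-scripts/) to upload each config.

```python
from typing import List
from urllib.parse import urlencode

# Assumptions: adjust the Solr URL, ZooKeeper address, and config paths to your setup.
SOLR = "http://localhost:8080/solr"
ZK_HOST = "localhost:2181"


def upconfig_argv(confdir: str, confname: str) -> List[str]:
    # Uploads one config directory (containing its own schema.xml) under confname.
    return ["zkcli.sh", "-zkhost", ZK_HOST, "-cmd", "upconfig",
            "-confdir", confdir, "-confname", confname]


def create_url(name: str, config_name: str, num_shards: int = 2) -> str:
    # collection.configName ties the new collection to the config uploaded with upconfig.
    params = {"action": "CREATE", "name": name, "numShards": num_shards,
              "collection.configName": config_name}
    return f"{SOLR}/admin/collections?{urlencode(params)}"


for coll in ("collection1", "collection2"):
    print(" ".join(upconfig_argv(f"./{coll}/conf", coll)))
    print(create_url(coll, coll))
```

Naming each config after its collection is just a convention; several collections can also share one config name if they should share a schema.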
