I can't delete a collection; it returns false every single time.
When I run getCollections() it lists a lot of collections like tmp.mr.mapreduce_1299189206_1618_inc (the ones I want to drop). I thought they were deleted on disconnection, but in my case they're not.
Then when I do db["tmp.mr.mapreduce_1299188705_5595"].drop() I always get false back and the collection is not deleted.
The logs are not really helpful:
Wed Mar 9 11:05:51 [conn4] CMD: drop customers.tmp.mr.mapreduce_1299188705_5595
Now I've maxed out my namespaces and I cannot create more collections. Help?
BTW, I can take the server down; this is not production (and even in production I could take it down).
By default MongoDB allows roughly 24,000 namespaces (a 16 MB namespace file), so that's a lot of dead M/R collections. According to the logs, the DB is getting the request to drop the collection, so the question now is whether or not MongoDB has gotten into a bad state.
Can you take it down and re-start to ensure that all connections are closed?
Is it a lot of data? Can you take it down and run --repair?
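Once it's back up, it may also be worth scripting the cleanup rather than dropping the temporaries one at a time. A minimal sketch with pymongo, where the database name is taken from the log line above and the connection details are assumptions:

# Drop all dead map/reduce temporaries, which share the "tmp.mr." prefix.
# Assumes a local mongod and a database named "customers" (from the log above).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["customers"]

for name in db.list_collection_names():
    if name.startswith("tmp.mr."):
        db.drop_collection(name)
        print("dropped", name)

If the drops still fail after a clean restart, that points back to the bad-state / --repair route.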
I am using Celery with Flask. After working for a good long while, my Celery setup has started raising a celery.backends.rpc.BacklogLimitExceeded error.
My config values are below:
CELERY_BROKER_URL = 'amqp://'
CELERY_TRACK_STARTED = True
CELERY_RESULT_BACKEND = 'rpc'
CELERY_RESULT_PERSISTENT = False
Can anyone explain why the error is appearing and how to resolve it?
I have checked the docs here, which don't provide any resolution for the issue.
Possibly because the process consuming the results is not keeping up with the process producing them? This can lead to a large number of unprocessed results building up - this is the "backlog". When the size of the backlog exceeds an arbitrary limit, BacklogLimitExceeded is raised by Celery.
You could try adding more consumers to process the results, or set a shorter value for the result_expires setting.
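As a sketch, the expiry could be shortened like this, keeping your rpc backend. This uses the new-style lowercase setting names (result_backend, result_expires) that the CELERY_* names above map onto, and the one-hour value is just an example:

from celery import Celery

app = Celery('tasks', broker='amqp://')
app.conf.result_backend = 'rpc://'
app.conf.result_persistent = False
app.conf.result_expires = 60 * 60  # evict results after an hour (default is 1 day)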
The discussion on this closed celery issue may help:
Seems like the database backends would be a much better fit for this purpose.
The amqp/RPC result backends need to send one message per state update, while for the database-based backends (redis, sqla, django, mongodb, cache, etc.) every new state update overwrites the old one.
The "amqp" result backend is not recommended at all since it creates one queue per task, which is required to mimic the database-based backends, where multiple processes can retrieve the result.
The RPC result backend is preferred for RPC-style calls where only the process that initiated the task can retrieve the result.
But if you want persistent, multi-consumer results, you should store them in a database.
Using rabbitmq as a broker and redis for results is a great combination, but using an SQL database for results works well too.
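A minimal sketch of that combination (the broker and Redis connection strings are assumptions - point them at your own instances):

from celery import Celery

app = Celery(
    'tasks',
    broker='amqp://guest:guest@localhost:5672//',
    backend='redis://localhost:6379/0',
)

@app.task
def add(x, y):
    return x + y

With a key/value backend like Redis, each task keeps a single result entry instead of a queue per task, so a backlog can't build up the way it does under the rpc/amqp backends.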
I'm a SolrCloud newbie. My setup is 3 shards, 3 replicas, and an external ZooKeeper.
Today I found shard3 down; replica3 had taken over as leader, so indexing was going to replica3 rather than shard3. I stopped Tomcat/SOLR in reverse order (R3,R2,R1,S3,S2,S1) and restarted in forward order (S1,S2,S3,R1,R2,R3). I did not delete any tlog or replication.properties files. The cloud graph shows all hosts with their correct assignments. As I understand it, these assignments are set in ZooKeeper on first startup.
My question is how does the data that was indexed to replica3 get back to the revived shard3?
And surprisingly shard3 = 87G while replica3 = 80G.
Confused!
Dan,
The size of a replica is not important, only the number of documents it holds.
The way Solr works, your collection can contain deleted documents that are only purged during merge operations; that extra 7G may well be deleted documents.
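One way to check that theory is to compare maxDoc (live documents plus deletes not yet merged away) against numDocs (live documents only) on each core. A sketch using the core admin STATUS API, with host and port as assumptions:

import json
from urllib.request import urlopen

url = 'http://localhost:8983/solr/admin/cores?action=STATUS&wt=json'
status = json.load(urlopen(url))

for core, info in status['status'].items():
    idx = info['index']
    print(core, 'numDocs=%s maxDoc=%s deleted=%s'
          % (idx['numDocs'], idx['maxDoc'], idx['maxDoc'] - idx['numDocs']))

If shard3's maxDoc sits well above its numDocs, the size difference is just not-yet-merged deletes.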
1) As far as I know, when shard3 is up, live, and running, it is ZooKeeper that coordinates the data sync between the shard and replica3.
2) Regarding your second question, maybe replica3 is in an optimized state and hence you are seeing a smaller data size, while shard3 is yet to be optimized by SOLR. (This is just a wild guess.)
We have a long request that does a catalog search, calculates some information, and then stores it in another document. A call is made to index the document after the store.
In the logs we're getting errors such as INFO ZPublisher.Conflict ConflictError at /pooldb/csvExport/runAgent: database conflict error (oid 0x017f1eae, class BTrees.IIBTree.IISet, serial this txn started with 0x03a30beb4659b266 2013-11-27 23:07:16.488370, serial currently committed 0x03a30c0d2ca5b9aa 2013-11-27 23:41:10.464226) (19 conflicts (0 unresolved) since startup at Mon Nov 25 15:59:08 2013)
and the transaction is getting aborted and restarted.
All the documentation I've read says that, due to MVCC in ZODB, ReadConflicts no longer occur. Since this is written in RestrictedPython, putting in a savepoint is not a simple option (but likely my only choice, I'm guessing).
Is there another way to avoid this conflict? If I do need to use savepoints, can anyone think of a security reason why I shouldn't whitelist the savepoint transaction method for use in PythonScripts?
The answer is that there really is no way to run one large write transaction while other small transactions are touching the same objects at the same time. BTrees are supposed to have special conflict-resolution code, but it doesn't seem to work in the cases I'm dealing with.
The only way I can see to manage this is something like this code, which breaks the reindex into smaller transactions to reduce the chance of a conflict. There really should be something like this built into Zope/Plone.
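For illustration, a minimal sketch of that batching idea (this is not the code referenced above; the function name and the catalog-brains input are assumptions):

import transaction

def reindex_in_batches(brains, batch_size=100):
    # Commit every `batch_size` objects so each transaction window stays
    # short, which lowers the odds of colliding with concurrent writers.
    for i, brain in enumerate(brains, start=1):
        brain.getObject().reindexObject()
        if i % batch_size == 0:
            transaction.commit()
    transaction.commit()  # flush the final partial batch

Note that only real commits shorten the conflict window; savepoints don't end the transaction, so they mainly bound memory use and the amount of work lost on a retry.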
After moving most of our BT applications from a BizTalk 2009 to a BizTalk 2010 environment, we began the work of removing old applications and unused hosts. In this process we ended up with a zombie host instance.
This has resulted in bts_CleanupDeadProcesses starting to fail with the error “Executed as user: RH\sqladmin. Could not find stored procedure 'dbo.int_ProcessCleanup_ProcessLabusHost'. [SQLSTATE 42000] (Error 2812). The step failed.”
Looking at the CleanupDeadProcesses job, I found the zombie host instance in the BTMsgBox.ProcessHeartBeats table, with dtNextHeartbeatTime set to the time when the host was removed.
(I'm assuming that the Host Instance Processes don't exist in your services any longer, and that the SQL Agent job fails)
From looking at the source of the [dbo].[bts_CleanupDeadProcesses] job, it loops through the dbo.ProcessHeartbeats table with a cursor (btsProcessCurse, lol) looking for 'dead' heartbeats.
Each process instance has its own cleanup sproc int_ProcessCleanup_[HostName] and a sproc for the heartbeat watchdog to call, viz. bts_ProcessHeartbeat_[HostName] (although FWIW the sproc calls it @ApplicationName), filtered by WHERE (s.dtNextHeartbeatTime < @dtCurrentTime).
It is thus tempting to just delete the record for your deleted / zombie host (or, if you aren't that brave, to simply update the dtNextHeartbeatTime on the heartbeat record for your dead host instance to sometime next century). Either way, the SQL Agent job should then skip the dead instance.
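As an illustration only, the "next century" variant might look like the sketch below. The vchApplicationName column name is an assumption (inspect dbo.ProcessHeartbeats to find the real identifying column), and as said, don't run it against prod without a trial first:

import pyodbc

conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=.;DATABASE=BizTalkMsgBoxDb;Trusted_Connection=yes'
)
with conn:  # pyodbc commits on a clean exit from the block
    cur = conn.cursor()
    # Push the zombie host's next heartbeat far into the future so the
    # cleanup job's WHERE (dtNextHeartbeatTime < @dtCurrentTime) skips it.
    cur.execute(
        "UPDATE dbo.ProcessHeartbeats "
        "SET dtNextHeartbeatTime = '2100-01-01' "
        "WHERE vchApplicationName = ?",
        'ProcessLabusHost',  # hypothetical host name, from the error above
    )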
An alternative could be to re-create the host and instances with the same names through the Admin Console, just to delete them again (properly this time). This might however cause additional problems, as BizTalk won't be able to create the two sprocs above because of the undeleted objects.
Obviously, though, I wouldn't do this on your prod environment until you've confirmed it works in a trial run first.
It looks like someone else got stuck with a similar situation here
And there is also a good dive into the details of how the heartbeat mechanism works by XiaoDong Zhu here
Have you tried BTSTerminator? That works for one-off cleanups.
http://www.microsoft.com/en-us/download/details.aspx?id=2846
Within Alfresco, I want to delete a node, but I don't want it to be usable by any other users in a cluster environment.
I know that I can use LockService to lock a node (in a cluster environment), as in the following lines:
lockService.lock(deleteNode);
nodeService.deleteNode(deleteNode);
lockService.unlock(deleteNode);
The last line may cause an exception because the node has already been deleted, and indeed the exception it raises is:
A system error happened during the operation: Node does not exist: workspace://SpacesStore/cb6473ed-1f0c-4fa3-bfdf-8f0bc86f3a12
So how do I ensure consistency in a cluster environment when deleting a node, preventing two users from accessing the same node at the same time when one wants to update it and the other wants to delete it?
Depending on your cluster environment (e.g. the same DB server used by all Alfresco instances), transactions will most likely be enough to ensure no stale content is used:
serverA(readNode)
serverB(deleteNode)
serverA(updateNode) <--- transaction failure
The JobLockService allows more control for more complex operations, which might involve multiple dynamic nodes (or no nodes at all, e.g. sending emails or similar):
serverA(acquireLock)
serverB(acquireLock) <--- wait for the lock to be released
serverA(readNode1)
serverA(if something then updateNode2)
serverA(updateNode1)
serverA(releaseLock)
serverB(readNode2)
serverB(releaseLock)