How to resolve celery.backends.rpc.BacklogLimitExceeded error

I am using Celery with Flask. After working fine for a good long while, my Celery setup has started showing a celery.backends.rpc.BacklogLimitExceeded error.
My config values are below:
CELERY_BROKER_URL = 'amqp://'
CELERY_TRACK_STARTED = True
CELERY_RESULT_BACKEND = 'rpc'
CELERY_RESULT_PERSISTENT = False
Can anyone explain why the error is appearing and how to resolve it?
I have checked the docs here, which don't provide any resolution for the issue.

Possibly because the process consuming the results is not keeping up with the process that is producing them? This can result in a large number of unprocessed results building up - this is the "backlog". When the size of the backlog exceeds an arbitrary limit, Celery raises BacklogLimitExceeded.
You could try adding more consumers to process the results, or setting a shorter value for the result_expires setting, as sketched below.
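For instance, a minimal sketch of the expiry suggestion, using the old-style uppercase setting name to match the question's config (the one-hour value is just an example):
# Expire stored results after one hour so unconsumed results
# cannot pile up indefinitely.
CELERY_TASK_RESULT_EXPIRES = 3600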
The discussion on this closed celery issue may help:
Seems like the database backends would be a much better fit for this purpose.
The amqp/RPC result backends need to send one message per state update, while for the database-based backends (redis, sqla, django, mongodb, cache, etc.) every new state update overwrites the old one.
The "amqp" result backend is not recommended at all, since it creates one queue per task, which is required to mimic the database-based backends where multiple processes can retrieve the result.
The RPC result backend is preferred for RPC-style calls where only the process that initiated the task can retrieve the result.
But if you want persistent multi-consumer result you should store them in a database.
Using rabbitmq as a broker and redis for results is a great combination, but using an SQL database for results works well too.
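For example, a minimal sketch of that combination in the question's config style, assuming a Redis server on localhost (the database index is arbitrary):
# Keep RabbitMQ as the broker, but store results in Redis instead of RPC.
CELERY_BROKER_URL = 'amqp://'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
CELERY_TRACK_STARTED = True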

Related

Cosmos DB Emulator hangs when pumping continuation token, segmented query

I have just added a new feature to an app I'm building. It uses the same working Cosmos/Table storage code that other features use to query and pump result segments from the Cosmos DB Emulator via the Tables API.
The emulator is running with:
/EnableTableEndpoint /PartitionCount=50
This is because I read that the emulator defaults to 5 unlimited containers and/or 25 limited ones, and since this is a Tables API app, the table containers are created as unlimited.
The table being queried is the 6th to be created and contains just 1 document.
It either takes around 30 seconds to run a simple query, "tripping" my Too Many Requests error handling/retry in the process, or it hangs seemingly forever with no results returned, and the emulator has to be shut down.
My understanding is that with 50 partitions I can make 10 unlimited tables/collections, since each is "worth" 5. See documentation.
I have tried with rate limiting on and off, and jacked the RU/s to 10,000 on the table. It always fails to query this one table. The data, including the files on disk, has been cleared many times.
It seems like a bug in the emulator. Note that the "Sorry..." error that I would expect to see upon creation of the 6th unlimited table, as per the docs, is never encountered.
After switching to a real Cosmos DB instance on Azure, this is looking like a problem with my dodgy code.
Confirmed: my dodgy code.
Stand down everyone. As you were.

Spring Kafka batch within time window

Spring Boot environment listening to Kafka topics (@KafkaListener / @StreamListener).
Configured the listener factory to operate in batch mode:
ConcurrentKafkaListenerContainerFactory#setBatchListener
or via application.properties:
spring.kafka.listener.type=batch
How do I configure the framework so that, given two numbers N and T, it will try to fetch N records for the listener but won't wait more than T seconds, as described here: https://doc.akka.io/docs/akka/2.5/stream/operators/Source-or-Flow/groupedWithin.html
Some properties I've looked at:
max-poll-records ensures you won't get more than N records in a batch
fetch-min-size get at least this amount of data in a fetch request
fetch-max-wait but don't wait more than necessary
idleBetweenPolls just sleep a bit between polls :)
It seems like fetch-min-size combined with fetch-max-wait should do it but they compare bytes, not messages/records.
It is obviously possible to implement that by hand; I'm asking whether it's possible to configure Spring to do that for me.
It seems like fetch-min-size combined with fetch-max-wait should do it but they compare bytes, not messages/records.
That is correct; unfortunately, Kafka provides no mechanism such as fetch.min.records.
I don't anticipate that Spring would layer this functionality on top of the kafka-clients; it would be better to ask for a new feature in Kafka itself.
Spring does not manipulate the records returned from the poll at all, except that you can now specify subBatchPerPartition to get batches containing just one partition, in order to properly support zombie fencing when using exactly-once read/process/write.
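If you do end up implementing it by hand, here is a minimal sketch of the "N records or T seconds, whichever comes first" idea using the plain kafka-python client rather than Spring (the topic name, group id, and process() handler are made up for illustration):
import time
from kafka import KafkaConsumer

N = 100   # max records per batch
T = 5.0   # max seconds to wait for a full batch

consumer = KafkaConsumer('events', bootstrap_servers='localhost:9092',
                         group_id='batcher')

def next_batch():
    # Keep polling until we have N records or the deadline passes.
    batch = []
    deadline = time.monotonic() + T
    while len(batch) < N:
        remaining_ms = int((deadline - time.monotonic()) * 1000)
        if remaining_ms <= 0:
            break
        polled = consumer.poll(timeout_ms=remaining_ms,
                               max_records=N - len(batch))
        for records in polled.values():
            batch.extend(records)
    return batch  # may be empty if nothing arrived within T seconds

while True:
    batch = next_batch()
    if batch:
        process(batch)  # hypothetical batch handler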

how to use the example of scrapy-redis

I have read the example of scrapy-redis but still don't quite understand how to use it.
I have run the spider named dmoz and it works well. But when I start another spider named mycrawler_redis, it just gets nothing.
Besides, I'm quite confused about how the request queue is set. I didn't find any piece of code in the example project which illustrates the request queue setting.
And if the spiders on different machines want to share the same request queue, how can I get that done? It seems that I should first make the slave machine connect to the master machine's redis, but I'm not sure where to put the relevant code: in spider.py, or do I just type it on the command line?
I'm quite new to scrapy-redis and any help would be appreciated!
If the example spider is working and your custom one isn't, there must be something that you have done wrong. Update your question with the code, including all relevant parts, so we can see what went wrong.
Besides, I'm quite confused about how the request queue is set. I didn't find any piece of code in the example project which illustrates the request queue setting.
As far as your spider is concerned, this is done by appropriate project settings, for example if you want FIFO:
# Enables scheduling storing requests queue in redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
# Don't cleanup redis queues, allows to pause/resume crawls.
SCHEDULER_PERSIST = True
# Schedule requests using a queue (FIFO).
SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.SpiderQueue'
As far as the implementation goes, queuing is done via RedisSpider, which your spider must inherit from. You can find the code for enqueuing requests here: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/scheduler.py#L73
As for the connection, you don't need to manually connect to the redis machine, you just specify the host and port information in the settings:
REDIS_HOST = 'localhost'
REDIS_PORT = 6379
And the connection is configured in connection.py: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/connection.py
Examples of usage can be found in several places: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/pipelines.py#L17
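To tie it together, here is a minimal sketch of a spider that pulls its start URLs from the shared redis queue, assuming the settings above (the spider name and key are made up for illustration):
from scrapy_redis.spiders import RedisSpider

class MySpider(RedisSpider):
    name = 'my_redis_spider'
    # The redis list every machine's spider instance reads URLs from.
    redis_key = 'my_redis_spider:start_urls'

    def parse(self, response):
        # Replace with real parsing logic.
        yield {'url': response.url}
You would then seed the shared queue from the master, for example by pushing a URL onto that key with redis-cli's lpush, and every connected spider instance will pick up work from it.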

SolrCloud shard recovery

I'm a SolrCloud newbie; my setup is 3 shards, 3 replicas, and an external Zookeeper.
Today I found shard3 down, replica3 had taken over as leader, so indexing was occurring to replica3 not shard3. I stopped Tomcat/SOLR in reverse order (R3,R2,R1,S3,S2,S1) and restarted in forward order (S1,S2,S3,R1,R2,R3). I did not delete any tlog or replication.properties files. The cloud graph shows all hosts with their correct assignments. As I understand it these assignments are set in Zookeeper on the first startup.
My question is how does the data that was indexed to replica3 get back to the revived shard3?
And surprisingly shard3 = 87G while replica3 = 80G.
Confused!
Dan,
The size of the replicas is not important, only the number of documents the collection has.
The way Solr works, you can have deleted documents in your collection that are only removed during merge operations; this extra 7G may well be deleted documents.
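If you want to check this theory, you can ask Solr to merge segments and expunge deletions, then compare the index sizes again. A rough sketch over HTTP using Python's requests (the host and collection name are placeholders, and this rewrites the index, so use it with care):
import requests

# A commit with expungeDeletes forces merges that drop deleted documents.
url = 'http://localhost:8983/solr/mycollection/update'
resp = requests.get(url, params={'commit': 'true', 'expungeDeletes': 'true'})
resp.raise_for_status()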
1) As far as I know, when shard3 is up, live, and running, it is Zookeeper which does the data sync job between the shard and replica3.
2) Regarding your second question, maybe replica3 is in an optimized state, and hence you are seeing a smaller data size, while shard3 is yet to be optimized by SOLR. (This is just a wild guess.)

How the Connection is calculated in Firebase

How are connections calculated?
Let's assume that I have a web app which on load sends a message to all connected clients, and let's say I have 5 connected clients. Does it mean that as long as the browser tab with the web app is open it will count as 1 connection, so that I will have 6 concurrent connections, and that this counts towards what you define as a "Connection" on the pricing page?
If not, please explain how you calculate a "Connection". Thanks.
This question was bugging me ever since I ran through the thinkster.io angular+firebase tutorial and saw my firebase analytics tab showing a peak concurrent count of 6 even though I only remember having the one page open. I looked back at the code and thought it could be down to how the tutorial has you create a new Firebase(url) for each location in your firebase.
I wanted to test the difference between creating a new Firebase(url) vs taking the root reference and then accessing the .child() location. My theory was that new Firebase(url) would create a new connection each time, while .child() would re-use the existing connection.
Setup
Created two new firebases each with identical data
Setup an angularjs project using yeoman
Included angularfire
Code
For simplicity, I just put everything in the main controller of the generated code.
To test out the connections created with new Firebase() I did the following:
$scope.fb_root = $firebase(new Firebase(FBURL_NEW));
$scope.fb_root_apps = $firebase(new Firebase(FBURL_NEW + '/apps'));
$scope.fb_root_someApp = $firebase(new Firebase(FBURL_NEW + '/apps/someApp'));
$scope.fb_root_users = $firebase(new Firebase(FBURL_NEW + '/users'));
$scope.fb_root_mike = $firebase(new Firebase(FBURL_NEW + '/users/mike'));
To test out the connections created with ref.$child() I did the following:
$scope.fb_child = $firebase(new Firebase(FBURL_CHILD));
$scope.fb_child_apps = $scope.fb_child.$child("apps");
$scope.fb_child_someApp = $scope.fb_child_apps.$child("someApp");
$scope.fb_child_users = $scope.fb_child.$child("users");
$scope.fb_child_mike = $scope.fb_child_users.$child("mike");
I then bound these objects in my view so I can see them, and I played around with updating data via my firebase forge and watching the data update live on my app.
Results
I opened up my local app into 17 browser tabs, hoping that a large number of tabs would exaggerate any differences between the connection methods.
What I found is that each tab only opened a single web socket connection back to firebase for each firebase db. So at the end of the test, both methods resulted in the same peak count of 17 connections.
Conclusion
From this simple test I think it's safe to say that the Firebase JS library does a good job of managing its connection.
Regardless of whether your code calls new Firebase() a bunch of times or references child locations via .child(), the library will only create a single connection as far as your metering is concerned. That connection will stay online for as long as your app is open.
So in your example - yes I believe you will see 6 concurrent connections, 1 for the app where someone is sending the message, and 5 for the apps receiving the message.
Update
One other thing worth mentioning is that Firebase measures connections for paid plans based on the 95th percentile of usage during the month. This is listed in the FAQ section of their Pricing page at https://www.firebase.com/pricing.html
Update 11-Mar-16: Firebase no longer appears to measure connections based on the 95th percentile. Instead, the 101st concurrent connection is denied.
https://www.firebase.com/pricing.html:
All our plans have a hard limit on the number of database connections. Our Free and Spark plans are limited to 100. The limit cannot be raised. All other plans have a courtesy limit of 10,000 database connections. This can be removed to permanently allow Unlimited connections if you email us at firebase-support@google.com. The reason we impose this courtesy limit is to prevent abuse and to ensure that we are prepared to handle our largest customers. Please contact us at least 24 hours in advance so we can lift this limit and ensure we have enough capacity available for your needs.