I just created a dockerized, load-balanced version of OCB: supervisord runs separate Orion instances and Nginx balances requests across them. This is only for testing purposes.
My question is: if I use this approach, would I run into trouble with ONTIMEINTERVAL subscriptions? (I don't want 'n' notifications, one per OCB process.)
Any help will be much appreciated.
Current Orion version (0.23.0) works in the following way: at creation time, the ONTIMEINTERVAL subscribeContext is dispatched by the LB to one of the CB nodes, which creates a permanent thread in charge of sending notification messages at the notification frequency.
However, there are two kinds of problems:
If the client wants to cancel the subscription by sending unsubscribeContext, that request could be received by a CB not managing the subscription. Thus, the operation may result in the subscription being deleted from the DB while notifications continue being sent.
Let's consider that at a given moment CB1 manages subscriptions S1 and S2 and CB2 manages S3 and S4. Let's consider that CB2 fails and is restarted. At start-up, CB2 will "see" four subscriptions (S1, S2, S3 and S4), so 4 threads are created, and the final result is that S1 and S2 notifications are duplicated (being sent at the same time by CB1 and the restarted CB2).
Thus, in sum, ONTIMEINTERVAL subscriptions are discouraged in HA and/or horizontal scaling scenarios. However, note that any use case based on ONTIMEINTERVAL can be "reversed" by running queryContext-based polling at the same frequency at the notification receptor, so this usually isn't a big problem.
EDIT: ONTIMEINTERVAL subscriptions were removed in Orion 1.0.0. They had several problems (such as the ones described in the answer above). Actually, they aren't really needed, as any use case based on ONTIMEINTERVAL notifications can be converted to an equivalent one in which the receptor runs queryContext at the same frequency (taking advantage of queryContext features such as pagination and filtering).
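To make that polling-based alternative concrete, here is a minimal sketch in Python; the broker URL, entity id/type and the handle_notification callback are assumptions for illustration, not part of the original setup:

import time
import requests

ORION = "http://localhost:1026"  # assumption: Orion (or the Nginx LB in front of it) listens here

def handle_notification(body):
    # placeholder for whatever the notification receptor would do with the data
    print(body)

def poll_entity(entity_id="Room1", entity_type="Room", period_s=60):
    # Emulates an ONTIMEINTERVAL notification by running queryContext at a fixed frequency
    payload = {"entities": [{"type": entity_type, "isPattern": "false", "id": entity_id}]}
    while True:
        resp = requests.post(ORION + "/v1/queryContext", json=payload)
        resp.raise_for_status()
        handle_notification(resp.json())
        time.sleep(period_s)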
Spring Boot environment listening to Kafka topics (@KafkaListener / @StreamListener)
Configured the listener factory to operate in batch mode:
ConcurrentKafkaListenerContainerFactory#setBatchListener
or via application.properties:
spring.kafka.listener.type=batch
How can I configure the framework so that, given two numbers N and T, it will try to fetch N records for the listener but won't wait more than T seconds, as described here: https://doc.akka.io/docs/akka/2.5/stream/operators/Source-or-Flow/groupedWithin.html
Some properties I've looked at:
max-poll-records: ensures you won't get more than N records in a batch
fetch-min-size: get at least this amount of data in a fetch request
fetch-max-wait: but don't wait longer than necessary
idleBetweenPolls: just sleep a bit between polls :)
It seems like fetch-min-size combined with fetch-max-wait should do it but they compare bytes, not messages/records.
It is obviously possible to implement that by hand; I'm asking whether it's possible to configure Spring to do that for me.
It seems like fetch-min-size combined with fetch-max-wait should do it but they compare bytes, not messages/records.
That is correct, unfortunately, Kafka provides no mechanism such as fetch.min.records.
I don't anticipate that Spring would layer this functionality on top of the kafka-clients; it would be better to ask for a new feature in Kafka itself.
Spring does not manipulate the records returned from the poll at all, except that you can now specify subBatchPerPartition to get batches containing just one partition, in order to properly support zombie fencing when using exactly-once read/process/write.
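If you do end up implementing the N-records-or-T-seconds batching by hand, the accumulation loop could look roughly like the sketch below. It uses the kafka-python client rather than Spring Kafka, purely to illustrate the idea; the topic name, broker address and thresholds are made up:

import time
from kafka import KafkaConsumer  # kafka-python client, used only to illustrate the loop

N_RECORDS = 500   # target batch size (assumption)
T_SECONDS = 5     # maximum time to wait for a batch (assumption)

consumer = KafkaConsumer("my-topic",
                         bootstrap_servers="localhost:9092",
                         enable_auto_commit=False)

def next_batch():
    # Collect up to N_RECORDS, but never wait longer than T_SECONDS overall.
    batch = []
    deadline = time.monotonic() + T_SECONDS
    while len(batch) < N_RECORDS:
        remaining_ms = int((deadline - time.monotonic()) * 1000)
        if remaining_ms <= 0:
            break
        polled = consumer.poll(timeout_ms=remaining_ms,
                               max_records=N_RECORDS - len(batch))
        for records in polled.values():
            batch.extend(records)
    return batch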
I have built a micro-service with an API called deleteToken. When invoked, this API changes the status of the tuple in the DB corresponding to the token (identified by token id) to "MARK_DELETE". Once a tuple has status "MARK_DELETE", then after 30 days a REST call should be made to a downstream service API called deleteTokenFromPartner. There is no mandate that the call to deleteTokenFromPartner has to be made right after 30 days; making it a few hours after the 30 days is also fine.

So what I thought was that I would write a scheduler (using Quartz or the Java Executor service) that runs once every day. It will query the DB, find all rows that have status="MARK_DELETE" and whose status update is older than 30 days, and then iteratively call deleteTokenFromPartner for each of those rows. There is one DB, which is highly available, and we should not have any consistency issues since we delete after 30 days.

The problem I am seeing is that, as this is a micro-service with N instances, every instance will query the DB, get the same set of rows and make calls for the same rows. Can I make any tweak so that these duplicated calls are avoided? FYI, we don't make any config changes using hostnames, and it would also be fine if only one instance is capable of running the scheduler.
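A rough sketch of the daily job described above might look like this in Python; the table and column names, connection string and partner endpoint are all assumptions for illustration (and this sketch does not yet address the duplicate-call problem across instances):

import time
from datetime import datetime, timedelta

import requests
import sqlalchemy as sa

# Assumptions for illustration only: table/column names, DB URL, partner endpoint.
engine = sa.create_engine("postgresql://user:pass@db-host/tokens")
PARTNER_URL = "https://partner.example.com/deleteTokenFromPartner"

def purge_marked_tokens():
    # Find tokens marked MARK_DELETE more than 30 days ago and call the
    # downstream deleteTokenFromPartner API for each of them.
    cutoff = datetime.utcnow() - timedelta(days=30)
    with engine.begin() as conn:
        rows = conn.execute(
            sa.text("SELECT token_id FROM tokens "
                    "WHERE status = 'MARK_DELETE' AND status_updated_at < :cutoff"),
            {"cutoff": cutoff}).fetchall()
    for (token_id,) in rows:
        requests.post(PARTNER_URL, json={"tokenId": token_id}).raise_for_status()

while True:  # run the job once every day
    purge_marked_tokens()
    time.sleep(24 * 60 * 60)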
I am using Celery with Flask. After working fine for a good long while, my Celery setup is now showing a celery.backends.rpc.BacklogLimitExceeded error.
My config values are below:
CELERY_BROKER_URL = 'amqp://'
CELERY_TRACK_STARTED = True
CELERY_RESULT_BACKEND = 'rpc'
CELERY_RESULT_PERSISTENT = False
Can anyone explain why the error is appearing and how to resolve it?
I have checked the docs here, which don't provide any resolution for the issue.
Possibly because your process consuming the results is not keeping up with the process that is producing the results? This can result in a large number of unprocessed results building up - this is the "backlog". When the size of the backlog exceeds an arbitrary limit, BacklogLimitExceeded is raised by celery.
You could try adding more consumers to process the results? Or set a shorter value for the result_expires setting?
The discussion on this closed celery issue may help:
Seems like the database backends would be a much better fit for this purpose.
The amqp/RPC result backends need to send one message per state update, while for the database-based backends (redis, sqla, django, mongodb, cache, etc.) every new state update overwrites the old one.
The "amqp" result backend is not recommended at all since it creates one queue per task, which is required to mimic the database based backends where multiple processes can retrieve the result.
The RPC result backend is preferred for RPC-style calls where only the process that initiated the task can retrieve the result.
But if you want persistent multi-consumer result you should store them in a database.
Using rabbitmq as a broker and redis for results is a great combination, but using an SQL database for results works well too.
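For example, switching this Flask/Celery setup to a redis result backend with a bounded result lifetime could look roughly like the sketch below; the redis URL and expiry value are assumptions:

from celery import Celery

celery = Celery(
    "app",
    broker="amqp://",                    # keep RabbitMQ as the broker
    backend="redis://localhost:6379/0",  # store results in redis instead of rpc (assumed URL)
)
celery.conf.result_expires = 3600        # drop results after one hour so no backlog builds up
celery.conf.task_track_started = True    # equivalent of CELERY_TRACK_STARTED = True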
I have a single web client and a few Lambda functions which use the Admin SDK. I've noticed recently that I've bumped into the 100 simultaneous connection limit, but I really shouldn't be anywhere near that limit. Also, it would appear that the connections established by my Lambda functions are not dropping off even after the function has completed.
Any idea on:
how I can prevent this run-up on connections from happening?
how I can release connections established by past Lambda scripts?
how can I monitor which processes/threads/stacks are holding connections?
Note: this is a testing environment I'm working out of so I'd prefer to keep this in the free tier and my requirements should definitely not be running into the 100 active limit. I am on a paid plan in prod.
I attempt to avoid calling initializeApp more than once by using the following connection code. In the example I'm talking about I only have a single database as a backend and so the default "name" of DEFAULT is used each time.
const runningApps = new Set(firebase.apps.map(i => i.name));
this.app = runningApps.has(name)
  ? firebase.app()
  : firebase.initializeApp({
      credential: firebase.credential.cert(serviceAccount),
      databaseURL: config.databaseUrl
    });
I'm now trying to explicitly close connections with goOffline, but that leads to another issue: on the second connection (i.e., where the DEFAULT app is already set up and it just reuses the connection already established) I get the following logging:
# Generated as result of `goOnline`
Connecting to Firebase: [https://xyz.firebaseio.com]
appears to be already connected
# Listening on ".info/connected" comes back as true, resulting in:
AbstractedAdmin: connected to [DEFAULT]
# but then I get this error
NotAllowed: You must first connect before using the database() API at Object._getFirebaseType
The fact that you have unexpected incoming connections to the database, makes it seem like the stale instances keep an open connection.
Best I can think of is to call goOffline() in your function before it completes to explicitly disconnect. That would probably also mean you have to call goOnline at the start of the function, since it might be running on an instance that previously went offline. Both goOnline and goOffline are synchronous calls afaik, but there's definitely going to be some time between going online and the data becoming available in your app.
If Lambda has a way for you to detect life-cycle events of its instances, that would be the preferred place to call goOffline and goOnline.
admin.initializeApp should only get called once in your script/node app.
The Firebase SDKs talk HTTP/2 to the Firebase cloud system, so I'm not sure why you would encounter max-connection issues, as unique sockets are not stood up per call.
One thing to look out for is that calls to 3rd party APIs (such as SendGrid) are not supported on the free tier.
How are the connections being calculated?
Let's assume that I have a web app which on load sends a message to all connected clients, and let's say I have 5 connected clients. Does it mean that as long as the browser tab with the web app is open it will count as 1 connection, which means that I will have 6 concurrent connections, and that this counts towards what you define as a "Connection" on the pricing page?
If not, please explain how you calculate the "Connection". Thanks
This question was bugging me ever since I ran through the thinkster.io angular+firebase tutorial and saw my Firebase analytics tab showing a peak concurrent count of 6, even though I only remember having the one page open. I looked back at the code and thought it could be due to how the tutorial has you create a new Firebase(url) for each location in your firebase.
I wanted to test the difference between creating a new Firebase(url) vs taking the root reference and then accessing the .child() location. My theory was that new Firebase(url) would create a new connection each time, while .child() would re-use the existing connection.
Setup
Created two new firebases each with identical data
Setup an angularjs project using yeoman
Included angularfire
Code
For simplicity, I just put everything in the main controller of the generated code.
To test out the connections created with new Firebase() I did the following:
$scope.fb_root = $firebase(new Firebase(FBURL_NEW));
$scope.fb_root_apps = $firebase(new Firebase(FBURL_NEW + '/apps'));
$scope.fb_root_someApp = $firebase(new Firebase(FBURL_NEW + '/apps/someApp'));
$scope.fb_root_users = $firebase(new Firebase(FBURL_NEW + '/users'));
$scope.fb_root_mike = $firebase(new Firebase(FBURL_NEW + '/users/mike'));
To test out the connections created with ref.$child() I did the following:
$scope.fb_child = $firebase(new Firebase(FBURL_CHILD));
$scope.fb_child_apps = $scope.fb_child.$child("apps");
$scope.fb_child_someApp = $scope.fb_child_apps.$child("someApp");
$scope.fb_child_users = $scope.fb_child.$child("users");
$scope.fb_child_mike = $scope.fb_child_users.$child("mike");
I then bound these objects in my view so I can see them, and I played around with updating data via my firebase forge and watching the data update live on my app.
Results
I opened up my local app into 17 browser tabs, hoping that a large number of tabs would exaggerate any differences between the connection methods.
What I found is that each tab only opened up one single web socket connection back to firebase for each firebase db. So at the end of the test, both methods resulted in the same peak count of 17 connections.
Conclusion
From this simple test I think it's safe to say that the Firebase JS library does a good job of managing its connection.
Regardless of whether your code calls new Firebase() a bunch of times or references child locations via .child(), the library will only create a single connection as far as your metering is concerned. That connection will stay online for as long as your app is open.
So in your example - yes I believe you will see 6 concurrent connections, 1 for the app where someone is sending the message, and 5 for the apps receiving the message.
Update
One other thing worth mentioning is that Firebase measures connections for paid plans based on the 95th percentile of usage during the month. This is listed in the FAQ section of their Pricing page at https://www.firebase.com/pricing.html
Update 11-Mar-16: Firebase no longer appears to measure connections based on the 95th percentile. Instead, the 101st concurrent connection is denied.
From https://www.firebase.com/pricing.html:

All our plans have a hard limit on the number of database connections. Our Free and Spark plans are limited to 100. The limit cannot be raised. All other plans have a courtesy limit of 10,000 database connections. This can be removed to permanently allow unlimited connections if you email us at firebase-support@google.com. The reason we impose this courtesy limit is to prevent abuse and to ensure that we are prepared to handle our largest customers. Please contact us at least 24 hours in advance so we can lift this limit and ensure we have enough capacity available for your needs.