How to implement SignalR scale-out without using existing backplane options

I am using SignalR hosted on multiple servers behind a load balancer. I store the connection id and the user id in a custom database table in SQL Server. Whenever needed, I send notifications to selected users. This works fine in a single-server environment. How do I scale the SignalR implementation with a custom database table, without using the existing backplane options?

I am not sure what your current implementation looks like, because your explanation mixes a few things. If you have multiple servers behind a load balancer, you have presumably already applied some scale-out techniques (I think so!). But you say it works fine in a single-server environment and not across multiple servers, so let's review what is mandatory for multiple servers (scale-out):
Communication between instances: any message published on one instance must be available on all the other instances. The classic implementation is some kind of queue. SignalR supports Redis out of the box, and you can use SQL Server, although the limitations of any SQL-based solution are well known. Azure offers Redis Cache as a PaaS.
In-memory storage: on a single server you can keep state in process memory, but across servers that state has to move into shared storage. Again, Redis offers a shared-memory solution if you have a server available, and it is hard to do this properly without something like Redis. A lower-performance alternative is implementing that shared storage in SQL Server.
Authentication: the out-of-the-box security implementation stores an encrypted ticket in a cookie, but with multiple servers each server has its own unique key, so a cookie issued by one server cannot be read by another. If this is the mechanism you use, you have to either share the key across servers or implement your own DataProtector.
Full examples are well beyond the scope of this explanation; even skeleton code without the actual method bodies would take several pages. I suggest you take a look at the three items above, which are mandatory to scale out your application. A rough sketch of the custom SQL-table approach you asked about follows.
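Purely as an illustration of the first two points, here is a minimal, hand-rolled sketch of a SQL-table "backplane" for classic ASP.NET SignalR 2.x. The tables SignalRConnections (ConnectionId, UserId, ServerName) and PendingNotifications (Id, UserId, Payload, Delivered) and the AppDb connection-string name are hypothetical placeholders, not taken from your setup; a production version would also need delivery bookkeeping per connection, cleanup of stale connections, overlap protection in the timer, and error handling.

    using System;
    using System.Collections.Generic;
    using System.Configuration;
    using System.Data.SqlClient;
    using System.Threading.Tasks;
    using System.Timers;
    using Microsoft.AspNet.SignalR;

    // Each server records which connections it owns in the shared table.
    public class NotificationHub : Hub
    {
        public override Task OnConnected()
        {
            // Remember which server owns this connection.
            Db.Exec("INSERT INTO SignalRConnections (ConnectionId, UserId, ServerName) VALUES (@p0, @p1, @p2)",
                    Context.ConnectionId, Context.User.Identity.Name, Environment.MachineName);
            return base.OnConnected();
        }

        public override Task OnDisconnected(bool stopCalled)
        {
            Db.Exec("DELETE FROM SignalRConnections WHERE ConnectionId = @p0", Context.ConnectionId);
            return base.OnDisconnected(stopCalled);
        }
    }

    // Started once per server (e.g. from Application_Start). Senders insert one row per
    // target user into PendingNotifications; every server polls the table and delivers
    // only to the connections it owns.
    public class NotificationDispatcher
    {
        private readonly Timer _timer = new Timer(500); // poll interval: latency vs. DB load

        public void Start()
        {
            _timer.Elapsed += (s, e) => Dispatch();
            _timer.Start();
        }

        private void Dispatch()
        {
            var hub = GlobalHost.ConnectionManager.GetHubContext<NotificationHub>();
            var pending = new List<Tuple<long, string, string>>(); // Id, ConnectionId, Payload

            using (var con = new SqlConnection(Db.ConnStr))
            {
                con.Open();
                using (var cmd = new SqlCommand(
                    @"SELECT n.Id, c.ConnectionId, n.Payload
                      FROM PendingNotifications n
                      JOIN SignalRConnections c ON c.UserId = n.UserId
                      WHERE c.ServerName = @server AND n.Delivered = 0", con))
                {
                    cmd.Parameters.AddWithValue("@server", Environment.MachineName);
                    using (var reader = cmd.ExecuteReader())
                        while (reader.Read())
                            pending.Add(Tuple.Create(reader.GetInt64(0), reader.GetString(1), reader.GetString(2)));
                }

                foreach (var item in pending)
                {
                    // "notify" is whatever handler name your JavaScript clients register.
                    hub.Clients.Client(item.Item2).notify(item.Item3);

                    // Simplified: assumes each user is connected to exactly one server.
                    using (var upd = new SqlCommand(
                        "UPDATE PendingNotifications SET Delivered = 1 WHERE Id = @id", con))
                    {
                        upd.Parameters.AddWithValue("@id", item.Item1);
                        upd.ExecuteNonQuery();
                    }
                }
            }
        }
    }

    internal static class Db
    {
        public static readonly string ConnStr =
            ConfigurationManager.ConnectionStrings["AppDb"].ConnectionString;

        public static void Exec(string sql, params object[] args)
        {
            using (var con = new SqlConnection(ConnStr))
            using (var cmd = new SqlCommand(sql, con))
            {
                for (var i = 0; i < args.Length; i++)
                    cmd.Parameters.AddWithValue("@p" + i, args[i]);
                con.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }

Note that this is essentially what the built-in SQL Server backplane does, only more robustly; past a certain load a Redis-based backplane will be the better option.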

Related

Responsibility of the http server vs responsibility of the web app hosted using this server

I'm evaluating various hosting options for an ASP.NET Core application.
In the new ASP.NET programming model you process a request with a set of middleware components (conceptually a mixture of the older IHttpModule and IHttpHandler).
You can have a middleware responsible for authentication, serving static files, or compressing the response before it is sent (just to name a few).
Here comes the confusion.
Where should the border between the server's responsibilities and the app's responsibilities be drawn?
Which side should be responsible for compressing the response? With IIS this was handled by the server and configured in web.config. Kestrel doesn't provide this functionality AFAIK, so you need to implement a custom middleware in the app to handle it for you. Which approach is more appropriate?
What about authentication? IIS provides settings for authentication (anonymous, impersonation, forms auth). On the other hand, in ASP.NET Core we can also write app middleware to handle this for us.
OK, SSL is handled by the server, because it sits lower in the protocol stack and the app operates on HTTP(S) only.
What responsibilities should server have? What responsibilities should an app have?
The server is responsible for implementing the base HTTP protocol, managing connections, etc. It may also choose to offer other features (e.g. Windows auth), but we recommend against that unless it can provide a distinct advantage over a middleware implementation. For example, Windows auth could be implemented in middleware, but it would be much more difficult due to some of the connection-management constraints. Compression could be implemented in middleware just as easily as in the server.
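To make the "compression in the app" option concrete, here is a minimal sketch using the response compression middleware shipped in the Microsoft.AspNetCore.ResponseCompression package (check its availability for your ASP.NET Core version); the trivial endpoint at the end is just a placeholder:

    using Microsoft.AspNetCore.Builder;
    using Microsoft.AspNetCore.Http;
    using Microsoft.Extensions.DependencyInjection;

    public class Startup
    {
        public void ConfigureServices(IServiceCollection services)
        {
            // Registers compression for compressible MIME types (text/plain, etc.).
            services.AddResponseCompression();
        }

        public void Configure(IApplicationBuilder app)
        {
            // Register early so responses produced further down the pipeline
            // (static files, MVC, etc.) are compressed before leaving the app.
            app.UseResponseCompression();

            app.Run(async context =>
            {
                context.Response.ContentType = "text/plain";
                await context.Response.WriteAsync(new string('a', 10000));
            });
        }
    }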
As stated on Wikipedia: "The primary function of a web server is to store, process and deliver web pages to clients."
The thing is that all famous http servers (nginx, apache, IIS, ...) come with a lot of modules that can handle lots of different tasks including the ones you mentioned in your question (authentication, compression, ...).
It's quite likely that the more modules you add, the slower your HTTP server will be. IIS, for instance, is far from being known as the fastest HTTP server around, but if you remove all the modules and use it just for serving resources, it becomes really fast, because that is what it was built for back in the day!
The question of responsibility is the same for every kind of software application.
Think about databases, whose main role is to store data. RDBMSs like Oracle or SQL Server are pretty good at it. But with every new version they also release new functionality that has nothing to do with storing data. And people use it! ;-)
How many times have people used their DB as a search engine? I have seen people sending mail with SQL Server! But the worst was people trying to call web services from within stored procedures ;-)
It's always tempting to have one tool that does everything, but you need to keep in mind that it was not built for every purpose. I'd rather use a bunch of lightweight tools, each with a single responsibility that it handles correctly.
Now, back to your question: I think making use of middleware is a good approach. That way you have control over the entire pipeline and you know exactly what your requests have been through (a minimal custom middleware is sketched below). Middleware is also testable! Getting rid of all the unnecessary modules will definitely give you a more lightweight HTTP server.
The righteous "it depends" answer is also acceptable. If you run some tests and realize that the gzip compression module is 10x faster than the middleware, go with the module! Don't be dogmatic either!
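For reference, a pipeline component in ASP.NET Core is just a function over HttpContext, which is what makes the "control over the entire pipeline" point concrete. A minimal, hypothetical inline middleware that times every request might look like this:

    using System;
    using System.Diagnostics;
    using Microsoft.AspNetCore.Builder;
    using Microsoft.AspNetCore.Http;

    public class Startup
    {
        public void Configure(IApplicationBuilder app)
        {
            // A tiny inline middleware: it wraps everything registered after it,
            // so it sees every request on the way in and every response on the way out.
            app.Use(async (context, next) =>
            {
                var stopwatch = Stopwatch.StartNew();
                await next(); // hand off to the rest of the pipeline
                Console.WriteLine($"{context.Request.Path} took {stopwatch.ElapsedMilliseconds} ms");
            });

            app.Run(async context => await context.Response.WriteAsync("Hello"));
        }
    }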

How to Design a Database Monitoring Application

I'm designing a database monitoring application. Basically, the database will be hosted in the cloud and record-level access to it will be provided via custom-written clients for Windows, iOS, Android, etc. The basic scenario can be implemented via web services (ASP.NET Web API). For example, the client will make a GET request to the web service to fetch an entry. However, one of the requirements is that the client should automatically refresh its UI if another user (using a different instance of the client) updates the same record, AND the auto-refresh needs to happen within a second of the record being updated, so that the info is always up to date.
Polling could be an option, but the active clients could number in the hundreds of thousands, so I'm looking for a solution that is more robust and lightweight on the server. I'm versed in .NET and C++/Windows and I could roll out a complete solution in C++/Windows using I/O completion ports, but that feels like overkill and would require too much development time. I looked into ASP.NET Web API, but its limitation is that it cannot push notifications out to clients. Are there any frameworks/technologies in the Windows ecosystem that can address this scenario and scale easily as well? Any good options outside the Windows ecosystem, e.g. Node.js?
You did not specify which database you can use, so if you are able to use MS SQL Server, you may want to look up the SqlDependency feature. If configured and used correctly, it will notify you of any changes in the database.
Pair this with SignalR or any real-time front-end framework of your choice and you'll have real-time updates as you described.
One catch, though, is that SqlDependency only tells you that something changed; tracking down which record it was is up to you. That adds an extra layer of difficulty, but it is still much better than polling (a minimal sketch follows below).
You may want to search through the sqldependency tag here at SO to go from here to where you want your app to be.
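A minimal sketch of the SqlDependency-plus-SignalR combination described above, assuming classic ASP.NET SignalR 2.x, Service Broker enabled on the database, and a hypothetical dbo.Records table; the hub and the client-side method name are placeholders:

    using System;
    using System.Data.SqlClient;
    using Microsoft.AspNet.SignalR;

    public class RecordsHub : Hub { }

    public class RecordWatcher
    {
        private readonly string _connStr;

        public RecordWatcher(string connectionString)
        {
            _connStr = connectionString;
            SqlDependency.Start(_connStr); // once per app domain, e.g. in Application_Start
            Subscribe();
        }

        private void Subscribe()
        {
            using (var con = new SqlConnection(_connStr))
            // Query notification rules apply: explicit column list, two-part table name, no SELECT *.
            using (var cmd = new SqlCommand("SELECT Id, Status FROM dbo.Records", con))
            {
                var dependency = new SqlDependency(cmd);
                dependency.OnChange += OnRecordsChanged;

                con.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read()) { /* optionally refresh a local snapshot here */ }
                }
            }
        }

        private void OnRecordsChanged(object sender, SqlNotificationEventArgs e)
        {
            // The notification only says that *something* changed; working out which
            // record it was (e.g. by comparing against a snapshot) is up to you.
            if (e.Type == SqlNotificationType.Change)
            {
                GlobalHost.ConnectionManager
                          .GetHubContext<RecordsHub>()
                          .Clients.All.recordsChanged(); // client-side handler name
            }

            // Subscriptions are one-shot: re-register to keep receiving notifications.
            Subscribe();
        }
    }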
My first thought was a web service call that "stays alive", or the HTML5 protocol called WebSockets. You can maintain lots of connections, but hundreds of thousands seems too large. Therefore the web service needs a way to contact the clients over stateless connections, so you could build a web service in the client that the server-side web services can call back into. Firewalls may make this a problem, though.
If firewalls are not an issue, then you may not need a web service in the client; you can instead implement a server socket on the client.
For mobile clients, if implementing a server socket is not a possibility then use push notifications. Perhaps look at https://stackoverflow.com/a/6676586/4350148 for a similar issue.
Finally you may want to consider a content delivery network.
One last point is that hopefully you don't need to contact all 100,000 users within 1 second. I am assuming that with so many users you have quite a few servers.
Take a look at Maximum concurrent Socket.IO connections regarding the maximum number of open WebSocket connections.
Also consider whether your estimate of on the order of 100,000 simultaneous users is accurate.

Is there a direct way to query and update App data from within a proxy or do I have to use the management API?

I need to change attributes of an App, and I understand I can do that with management server API calls.
The two issues with using the management server APIs are:
Performance: it means making calls to the management server when it might be possible to do this directly in the message processor. Performance issues can probably be mitigated with caching.
Availability: having to use the management server APIs means the system depends on the management server being available, whereas doing it directly in the proxy itself would reduce the number of failure points.
Any recommended alternatives?
Ultimately, all entities are stored in Cassandra (for the runtime).
Your best choice for getting any info about an entity is the Access Entity policy, which does not hit the management server. But just for your information: most of the time you do not even need an Access Entity policy. When you use a verify-API-key or verify-access-token policy, all the related entity details are made available as flow variables by the message processor, so no additional Access Entity calls should be required.
When you are updating any entity (like a developer or an application), I assume it is a management-type use case and not a runtime use case, so using the management APIs should be fine.
If your use case requires a runtime API call that in turn updates an attribute on the application, then possibly that attribute should not be part of the application at all. Think about moving it out to a cache, a KVM, or some other place you can access from the message processor (just a thought, without completely knowing your use cases).
The design of the system is that all entity editing goes through the Management Server, which in turn is responsible for performing the edits in a performant and scalable way. The Management Server is also responsible for knowing which message processors need to be informed of the changes via zookeeper registration. This also ensures that if a given Message Processor is unavailable because it, for example, is being upgraded, it will get the updates whenever it becomes available. The Management Server is the source of truth.
In the case of developer app attributes (or really any app metadata), the values are cached for 3 minutes (I think), so the Message Processor may not see the new values for up to 3 minutes.
As far as availability, the Management Server is designed to be highly available, relying on the same underlying architecture as the message processor design.

how to sync data between company's internal database and externally hosted application's database

My organisation (a small non-profit) currently has an internal production .NET system with a SQL Server database. The customers (all local to our area) submit requests manually, and our office staff then input them into the system.
We are now gearing up towards online public access, so that customers will be able to see the status of their existing requests online and, in future, also create new requests online. A new ASP.NET application will be developed for this.
We are trying to decide whether to host this application on-site on our servers (with direct access to the existing database) or use an external hosting service provider.
Hosting externally would mean keeping a copy of the Requests database on the hosting provider's server. What would be the recommended way to keep the request data synced in real time between the hosted database and our existing production database?
Trying to sync back and forth between two in-use databases will be a constant headache. The question I would ask you is: if you have the means to host the application on-site, why wouldn't you go that route?
If you have a good reason not to host on-site but you do have some web infrastructure available to you, you may want to consider creating a web service which provides access to your database via a set of well-defined methods (a minimal sketch follows below). Or, on the flip side, you could make the remotely hosted database behind your website the production database and use a web service to access it from your office system.
In either case, providing access to a single database will be much easier than trying to keep two different ones constantly and flawlessly in sync.
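As an illustration of the "well-defined methods over a single database" idea, here is a minimal ASP.NET Web API sketch; the RequestStatusDto shape, the dbo.Requests table, and the RequestsDb connection-string name are all hypothetical placeholders:

    using System.Configuration;
    using System.Data.SqlClient;
    using System.Web.Http;

    // Shape returned to the public clients; expose only what they need.
    public class RequestStatusDto
    {
        public int Id { get; set; }
        public string Status { get; set; }
    }

    // GET api/requests/{id} -> status of a single request (default Web API routing).
    public class RequestsController : ApiController
    {
        private static readonly string ConnStr =
            ConfigurationManager.ConnectionStrings["RequestsDb"].ConnectionString;

        public IHttpActionResult Get(int id)
        {
            using (var con = new SqlConnection(ConnStr))
            using (var cmd = new SqlCommand(
                "SELECT Id, Status FROM dbo.Requests WHERE Id = @id", con))
            {
                cmd.Parameters.AddWithValue("@id", id);
                con.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    if (!reader.Read())
                        return NotFound();

                    return Ok(new RequestStatusDto
                    {
                        Id = reader.GetInt32(0),
                        Status = reader.GetString(1)
                    });
                }
            }
        }
    }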
If a web service is not practical (or you have concerns about availability), you may want to consider a queuing system for synchronization: any change to the database (local or hosted) is also added to a message queue, and each side monitors the queue for changes that need to be made and then applies them (sketched below). This would account for one of the databases being unavailable at any given time.
That being said, I agree with #LeviBotelho: syncing two databases is a nightmare and should probably be avoided if you can. If you must, you can also look into SQL Server replication.
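A rough sketch of that queue idea, using a plain SQL table as the queue; the SyncQueue table, the connection-string name, and the payload format are hypothetical, and the ApplyChange step is left as a stub:

    using System;
    using System.Collections.Generic;
    using System.Configuration;
    using System.Data.SqlClient;
    using System.Threading;

    // Every change on the source side also inserts a row into dbo.SyncQueue;
    // this worker runs on the other side, polls that queue and applies what it finds.
    public class SyncWorker
    {
        private static readonly string SourceConnStr =
            ConfigurationManager.ConnectionStrings["RemoteQueue"].ConnectionString;

        public void Run(CancellationToken token)
        {
            while (!token.IsCancellationRequested)
            {
                var batch = new List<Tuple<long, string>>(); // (Id, Payload)

                using (var source = new SqlConnection(SourceConnStr))
                {
                    source.Open();
                    using (var read = new SqlCommand(
                        "SELECT TOP 50 Id, Payload FROM dbo.SyncQueue WHERE Processed = 0 ORDER BY Id", source))
                    using (var reader = read.ExecuteReader())
                        while (reader.Read())
                            batch.Add(Tuple.Create(reader.GetInt64(0), reader.GetString(1)));

                    foreach (var change in batch)
                    {
                        ApplyChange(change.Item2); // translate the payload into an INSERT/UPDATE locally

                        using (var done = new SqlCommand(
                            "UPDATE dbo.SyncQueue SET Processed = 1 WHERE Id = @id", source))
                        {
                            done.Parameters.AddWithValue("@id", change.Item1);
                            done.ExecuteNonQuery();
                        }
                    }
                }

                Thread.Sleep(TimeSpan.FromSeconds(5)); // polling interval
            }
        }

        private void ApplyChange(string payload)
        {
            // Hypothetical: interpret the payload (e.g. a JSON description of the change)
            // and execute the corresponding command against the local database.
        }
    }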
Ultimately the data is the same: customer-submitted data. Currently it is entered by the customers through you; eventually it will be entered directly by them. I see no need to have two different databases with the same data. The replication errors alone, when they pop up (and they will), will be a needless headache for your team.

ASP.NET In a Web Farm

What issues do I need to be aware of when I am deploying an ASP.NET application as a web farm?
All session state information would need to be replicated across servers. The simplest way is to use the SQL Server session state provider, as noted.
Any disk access, such as dynamic files stored by users, would need to be in a location available to all servers, such as some form of network-attached storage. Script files, images, HTML and so on would simply be replicated on each server.
Attempting to store information in the Application object, or to load information on application startup, would need to be reviewed: those events fire separately on each machine in the farm (the first time a request hits it), so each server ends up with its own copy.
Machine keys across the servers are a very big one, as others have suggested. You may also have problems if you are using SSL against an IP address rather than a domain name.
You'll have to consider which load-balancing strategy you're going to use, as this could change your approach.
Sessions are a big one: make sure you use SQL Server for managing sessions and that all servers point to the same SQL Server instance.
One of the big ones I've run across is issues with different machineKeys spread across the different servers. ASP.NET uses the machineKey for various encryption operations such as ViewState and FormsAuthentication tickets. If you have different machineKeys, you can end up with servers not understanding postbacks from other servers. Take a look here if you want more information: http://msdn.microsoft.com/en-us/library/ms998288.aspx
1. Don't use sessions; use profiles instead. You can configure a SQL cluster to serve them. Sessions will query your session database far too often, while profiles just load themselves once and that's it.
2. Use a distributed caching store like memcached for caching data, and the ASP.NET cache for things you'll need a lot (a small read-through helper is sketched after this list).
3. Use a SAN or an EMC to serve your static content.
4. Use S3 or something similar as a fallback for point 3.
5. Have a decent load balancer, so you can easily update one server at a time without ever needing to shut down the site.
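For point 2, a small read-through helper over the built-in ASP.NET cache might look like the sketch below (the LocalCache name is made up). Keep in mind that in a web farm each server holds its own copy, so only cache data that can tolerate being slightly stale per server; anything that must be shared belongs in the distributed cache.

    using System;
    using System.Web;
    using System.Web.Caching;

    public static class LocalCache
    {
        // Returns the cached value for the key, or loads it and caches it for "ttl".
        public static T GetOrAdd<T>(string key, TimeSpan ttl, Func<T> load) where T : class
        {
            var cached = HttpRuntime.Cache.Get(key) as T;
            if (cached != null)
                return cached;

            var value = load();
            HttpRuntime.Cache.Insert(
                key, value,
                null,                     // no cache dependency
                DateTime.UtcNow.Add(ttl), // absolute expiration
                Cache.NoSlidingExpiration);
            return value;
        }
    }

    // Usage (hypothetical): var cfg = LocalCache.GetOrAdd("site-config", TimeSpan.FromMinutes(5), LoadConfigFromDb);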
HOW TO: Set Up Multi-Server ASP.NET Web Applications and Web Services
Log aggregation is easily overlooked: before processing HTTP logs, you may need to combine them into a single log that includes the requests sent to all of the servers.

Resources