I have an OpenCPU server with my package installed on it. The package contains a function that generates a plot image. When I send a POST request to this function via OpenCPU, I get a session id in the response, which corresponds to a folder on the OpenCPU server containing the session resources, the image being one of them. I pass this image URL (served by the OpenCPU server) on to my application, which uses it to create a PDF report.
Now I have scaled this whole scenario by creating multiple OpenCPU instances containing my package, sitting behind a load balancer. When I do the same thing, I get an image URL back, but when my application requests that URL the image may not be found, because the request may have been routed to a different OpenCPU instance.
How can I approach a solution to this problem? One thing I have done for now is to upload the image to a public instance and return that path to the application, but that is too tightly coupled.
Thanks.
Load balancing is always a bit complicated, so if possible it is easier to just move to a larger server. Most (cloud) providers offer (virtual) instances with many cores and 100GB+ RAM, which will allow you to support many users.
If you really need load balancing there are a few methods.
One approach is to map the ocpu-store directory on the OpenCPU servers to a shared NFS server. By default, OpenCPU stores all sessions in the /tmp/ocpu-store directory on the server. You can set a different location with the tmpdir option in your /etc/opencpu/server.conf. There is an example configuration file that sets tmpdir at /etc/opencpu/server.conf.d/ec2.conf.disabled on your server (rename it to activate it).
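For example, a minimal sketch of such an override, assuming the shared NFS volume is mounted at /mnt/ocpu-nfs (a placeholder path). On current OpenCPU releases the server configuration is JSON, so compare with the ec2.conf.disabled example for the exact syntax on your version:

    {
      "tmpdir": "/mnt/ocpu-nfs"
    }

Restart the OpenCPU service after changing this so the new tmpdir is picked up.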
If you don't want to set up an NFS server, a simpler approach is to configure your load balancer to always send a particular client to a particular backend. For example, if you use nginx you can set the load balancing method to ip_hash.
Obviously this method requires that clients do not change IP address during the session, and it will only be effective if your clients connect from a variety of IP addresses.
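A minimal sketch of such an nginx configuration, assuming two OpenCPU backends (the hostnames are placeholders):

    upstream ocpu_backend {
        ip_hash;
        server ocpu1.example.com;
        server ocpu2.example.com;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://ocpu_backend;
        }
    }

With ip_hash, all requests from the same client IP go to the same backend, so the follow-up request for the image URL lands on the instance that created the session.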
First off, I'll explain my situation. I'm building a server for storing and retrieving data for my phone application. I'm new to NGINX. As I understand it, the point of using load balancing/reverse proxy is to improve performance and reliability by distributing the workload across multiple servers, but I don't understand how that works with image/video files. Let's say below is my NGINX config file:
upstream backend {
    server backend1.example.com;
    server backend2.example.com;
    server backend3.example.com;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://backend;
    }
}
I have a few questions here.
First, when I upload an image/video file, do I need to upload it to all of those backend servers, or is there another way?
Second, if I just save it to a separate server that only stores images, and proxy requests for image or video downloads to that specific server, then what is the point of load balancing for image/video files, since the reverse proxy is supposed to improve performance and reliability by distributing the workload across multiple servers?
Third, is Amazon S3 really better for storing files? Is it cheaper?
I'm looking for a solution I can run on my own servers rather than using third parties.
Thx for any help!!
You can either use shared storage (e.g. NFS), upload to all servers, or incorporate a strategy to distribute files between servers, storing each file on a single server.
The first two options are logically the same and provide failover, hence improving reliability.
The third option, as you note, does not improve reliability much (perhaps somewhat: if one server fails, the other may still serve some of the files). It can improve performance, though, if you have many concurrent requests for different files and distribute them evenly between servers. This is not achieved through nginx load balancing but rather by routing to different servers based on the request (e.g. the file name or a key).
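A minimal nginx sketch of that idea, assuming two hypothetical storage backends and using the upstream hash directive to route by request URI, so the same file name always maps to the same server:

    upstream media_backend {
        hash $request_uri consistent;
        server media1.example.com;
        server media2.example.com;
    }

    server {
        listen 80;
        location /media/ {
            proxy_pass http://media_backend;
        }
    }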
For the shared storage solution you can use, for example, NFS. There are many resources that go into deeper detail, for example https://unix.stackexchange.com/questions/114699/nfs-automatic-fail-over-or-load-balanced-or-clustering
For the duplicate upload solution, you can either send the file twice from the client or do it server side with some code. The server-side solution has the benefit of a single file transfer from the client, with the copy to the second server going only over the fast internal network. In a simple case this can be achieved, for example, by receiving the file in a servlet, storing the incoming data to disk and simultaneously uploading it to another servlet on the second server over HTTP or another protocol.
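A rough C# sketch of that relay idea (the answer describes servlets, but the pattern is language-agnostic); the host name and paths are placeholders, and a real handler would add validation and error handling:

    // Hypothetical ASP.NET upload handler: stores the file locally,
    // then forwards the same file to a second storage server.
    public class UploadHandler : System.Web.IHttpHandler
    {
        public void ProcessRequest(System.Web.HttpContext context)
        {
            var file = context.Request.Files["file"];   // file posted by the client
            var localPath = System.IO.Path.Combine(@"D:\media",
                System.IO.Path.GetFileName(file.FileName));
            file.SaveAs(localPath);                     // first copy: local disk

            using (var client = new System.Net.WebClient())
            {
                // Second copy: relay to the mirror server over the internal network.
                client.UploadFile("http://media2.example.com/upload.ashx", localPath);
            }
        }

        public bool IsReusable { get { return false; } }
    }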
Note that setting up any of these options correctly can involve quite significant effort, testing and maintenance.
Here comes S3: ready-to-use distributed/shared storage with a simple API, integrations, clients and a reasonable price. For a simple solution it is usually not cheaper in terms of raw storage cost, but it is much cheaper in terms of R&D. It also has the option to serve files and content over HTTP (balanced, reliable and distributed), so you can either have clients download files directly from S3 hosts or issue permanent or temporary redirects there from your own HTTP servers.
I'm writing a website that is going to start using a load balancer and I'm trying to wrap my head around it.
Does IIS just do all the balancing for you?
Do you have a separate web layer that sits on the distributed server that does some work before sending to the sub server, like auth or other work?
It seems like a lot of the articles I keep reading don't really give me a straight answer, or I'm just not understanding them correctly. I'd like to get my head around how true load balancing works from a technical side, and if anyone has any code to share that would also be nice.
I understand caching is going to be a problem, but that's a different topic; session as well.
IIS does not have a load balancer by default, but you can use at least two Microsoft technologies:
Application Request Routing, which integrates with IIS; there you should ideally have a separate web layer to do the routing work,
Network Load Balancing, which is integrated with Microsoft Windows Server; there you can join existing servers into an NLB cluster.
Neither of those technologies requires any code per se; it is a matter of infrastructure. But you must of course keep the load-balanced environment in mind during development. For example, to make web sites truly balanced, they should be stateless. Otherwise you will have to provide so-called stickiness between the client and the server, so the same client always connects to the same server.
To make a service stateless, do not persist any state (Session, for example, in the case of an ASP.NET website) on the server itself but on an external server shared between all servers in the farm. It is common, for example, to use an external ASP.NET session state server (StateServer or SQLServer mode) for all sites in the cluster.
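For example, a minimal web.config sketch pointing all farm members at a shared ASP.NET state service (the host name and port are placeholders; SQLServer mode would use sqlConnectionString instead):

    <system.web>
      <!-- Out-of-process session state shared by every server in the farm -->
      <sessionState mode="StateServer"
                    stateConnectionString="tcpip=state01.example.com:42424"
                    timeout="20" />
    </system.web>

The servers in the farm also typically need to share the same machineKey so forms authentication tickets and view state validate on every node.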
EDIT:
Just to clarify a few things, here are a few words about both of the technologies mentioned:
NLB works at the network level (as a networking driver, in fact), so it has no knowledge of the applications it is used with. You create so-called clusters consisting of a few machines/servers and expose them as a single IP address. Another machine can then use this IP like any other IP, but connections will automatically be routed to one of the cluster's machines. The cluster is configured on each server; there is no external, additional routing machine. Depending on the cluster settings, as already mentioned, stickiness can be enabled or disabled (called Single or None affinity here). There is also a load weight parameter, so you can set weighted load distribution, for example sending more connections to the fastest machine. But this parameter is static; it cannot be adjusted dynamically based on network, CPU or any other usage. In fact, NLB does not care whether the target application is even running; it just routes network traffic to the selected machine. It does notice when a server goes offline, though, and stops routing traffic there. The advantage of NLB is that it is quite lightweight and requires no additional machines.
ARR is much more sophisticated; it is built as a module on top of IIS and is designed to make routing decisions at the application level. Network load balancing is only one of its features, as it is a more complete routing solution. It has "rule-based routing, client and host name affinity, load balancing of HTTP server requests, and distributed disk caching", as Microsoft states. You create Server Farms there, with many options such as the load balancing algorithm, load distribution and client stickiness. You can define health tests and routing rules to forward requests to other servers. The disadvantage of all of this is that ARR should be installed on a dedicated machine, so it takes more resources (and costs more).
NLB & ARR - as using a single ARR machine can be the single point of failure, Microsoft states that it is worth consideration to create a NLB cluster of ARR machines.
Does IIS just do all the balancing for you?
Yes, if you configure Application Request Routing.
Do you have a separate web layer that sits on the distributed server
Yes.
that does some work before sending to the sub server, like auth or other work?
No, ARR is pretty 'dumb':
IIS ARR doesn't provide any pre-authentication. If pre-auth is a requirement then you can look at Web Application Proxy (WAP) which is available in Windows Server 2012 R2.
It just acts as a transparent proxy that accepts and forwards requests, while adding some caching when configured.
For authentication you can look at Windows Server 2012's Web Application Proxy.
Some tips, and perhaps items to get yourself fully acquainted with:
ARR, as the answers above state, is a "proxy" that handles the traffic from your users to your servers.
You can handle State as Konrad points out, or you can have ARR do "sticky" sessions (ensure that a client always goes to "this server" - presumably the server that maintains state for that specific client). See the discussion/comments on that answer - it's great.
I haven't worn an IT/server hat for a long time and frankly haven't touched clustering hands-on (it was always "handled for me automagically" by some provider), so I asked our host: "what/how is replication among our cluster/farm done?" The question covers things like:
If I'm only working on/setting things up on one server, does that get replicated across the X VMs in our cluster/farm? How long does that take?
What about dynamically generated code and/or user-generated files (file system)? If a file is on VM1's file system, and I have 10 load-balanced VMs, and the client can hit any one of them at any time, then...?
What about encryption? E.g. if you use DPAPI to encrypt web.config sections (e.g. db connection strings), what is the impact of that (because it's based on the machine key, and, well, the obvious thing is that now you have multiple machines or VMs)? RSA re-write....?
SSL: ARR can handle this for you as well, and that's great! But with all power comes a "con": if you check/validate in your code, e.g. HttpRequest.IsSecureConnection, it will always be false. Your servers/VMs don't have the cert; ARR does. The encrypted connection is between the client and ARR; the connection from ARR to your servers/VMs isn't encrypted. As the link explains, if you prefer it the other way around (no offloading), you can... but that means all your servers/VMs then need a cert (and how that relates to "replication" above starts popping into your head).
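If your code needs to know whether the original client connection was HTTPS while SSL is offloaded at ARR, one common workaround is to inspect a forwarded header instead of IsSecureConnection; ARR typically adds an X-ARR-SSL header when the front-end connection was HTTPS (verify this on your own setup). A hedged sketch:

    // Treat the request as secure if either the connection itself is HTTPS
    // or the ARR front end says it terminated HTTPS for us.
    public static bool IsClientConnectionSecure(System.Web.HttpRequest request)
    {
        if (request.IsSecureConnection)
            return true;                                   // direct HTTPS, no offloading

        // Header added by ARR when SSL offloading is in use (check your ARR version/config).
        return !string.IsNullOrEmpty(request.Headers["X-ARR-SSL"]);
    }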
Not meant to be comprehensive, just listing things out from memory...Hth
In Azure, I have a web role exposed to the public and two worker roles accessible only within the private network. Now I want to load balance the worker roles internally, so I have set an internal endpoint on the worker roles, but what address should I use to communicate with the workers? It can't be the internal IP address, because that is specific to a particular instance and wouldn't go through the load balancer, right?
Thx a lot!
There is no internal load balancer in Windows Azure. The only load balancer is the one that has the public IP Addresses.
If you want to load balance only internal addresses (workers) you have to maintain it yourself. Meaning you have to install some kind of a load balancer on Azure VM, which is part of the same VNet. That load balancer may be of your choice (Windows or Linux). And you have to implement a watchdog service for when topology changes - i.e. workers are being recycled, hardware failure, scaling events. I would not recommend this approach unless this is absolutely necessary.
The last option is to keep a (cached) pool of the IP endpoints of all the workers and randomly choose one when you need it.
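A minimal sketch of that last option using the Azure service runtime API, assuming the worker role is named "Worker" and its internal endpoint is named "InternalHttp" (both names are placeholders for whatever is in your service definition):

    using System;
    using System.Linq;
    using System.Net;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public static class WorkerEndpoints
    {
        private static readonly Random Rng = new Random();

        // Pick a random internal endpoint of the "Worker" role for this request.
        public static IPEndPoint PickWorker()
        {
            var endpoints = RoleEnvironment.Roles["Worker"].Instances
                .Select(i => i.InstanceEndpoints["InternalHttp"].IPEndpoint)
                .ToArray();
            return endpoints[Rng.Next(endpoints.Length)];
        }
    }

Because instances come and go, re-reading the list periodically (or on the RoleEnvironment.Changed event) is the "watchdog" part mentioned above.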
Azure-based Internal Load Balancers (ILB) have been available since May 20, 2014.
http://azure.microsoft.com/blog/2014/05/20/internal-load-balancing/
They can be used for SQL AlwaysOn deployments and for publishing an internal endpoint accessible from your VNET only (i.e. not publicly routable).
Note: I was searching for ILB help and spotted this thread; I thought it was worth updating. If not, let me know and I will delete this.
You can configure the IIS in your WebRole to act as a reverse proxy using Application Request Routing, then configure its rules to load-balance requests using your chosen algorithm.
The easiest way is to modify your WebRole.cs to obtain the list of (internal) endpoints of your WorkerRole and then add them programmatically (see example here).
Alternatively, you can use a startup script to invoke appcmd to achieve the same result.
Lastly, you'll have to change your client settings to point requests at the (proxied) IIS endpoint instead of the regular WorkerRole endpoints.
Please note that Azure now supports Internal Load Balancing (ILB).
http://azure.microsoft.com/blog/2014/05/20/internal-load-balancing
We have deployed an application on the server.
The problem is that sometimes the application goes down due to some issue (e.g. while downloading a huge volume of data into Excel).
The application comes back up after manually restarting IIS.
We are building a new application, so we are not working on fixing this issue in the old one.
As a workaround, we are trying to build an exe with the following requirement:
Ping the application deployed on the server and find out whether it is up or down. If it is down, restart IIS.
Is it possible to ping a local website in IIS? Is there any other way to apply a temporary fix?
Hmmm, that kind of stability isn't good. However, you're interested in monitoring a URL and determining whether it is active...
TBH, I'm sure there are a few monitoring applications knocking around, some even free if that's your thing, that will recognise specific ports and utilise appropriate protocols such as HTTP. But if you fancy having a go yourself you could always use HttpWebRequest to mock up a request to the server and hopefully it will respond in a timely manner. Typically, if you're just touching the server, you can use a 'HEAD' request, which receives just the header data rather than all of the data. Check out this example.
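A minimal sketch along those lines, assuming the site answers on http://localhost/myapp/ (a placeholder URL); if the HEAD request fails or times out, it shells out to iisreset, which requires the watchdog to run with administrative rights:

    using System;
    using System.Diagnostics;
    using System.Net;

    class SiteWatchdog
    {
        static void Main()
        {
            var request = (HttpWebRequest)WebRequest.Create("http://localhost/myapp/");
            request.Method = "HEAD";          // only fetch headers, not the body
            request.Timeout = 10000;          // 10 seconds

            try
            {
                using (var response = (HttpWebResponse)request.GetResponse())
                {
                    Console.WriteLine("Site is up: " + response.StatusCode);
                }
            }
            catch (WebException ex)
            {
                Console.WriteLine("Site appears to be down: " + ex.Message);
                // Restart IIS, as the workaround described in the question.
                Process.Start("iisreset").WaitForExit();
            }
        }
    }

Run it from Task Scheduler every few minutes rather than keeping a long-lived process.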
I am developing a C# ASP.NET 4.0 application that will reside on a Windows Server 2003 machine. By accessing this application from a networked computer, any user can upload files to the Windows server. But also, once these files are stored on the server, he/she should be able to copy them from the Windows server to another networked computer.
I have found a way to upload files to a specified location on the server disk, but now I need to send these files from the server disk to the client computers.
My question is: is there any way to send or copy files from the server to other client computers (not the one that is accessing the web service) without needing a program receiving those files on the client computers? FTP, WCF, cmd commands, sockets?
Any idea?
If you want users of your webapp to download files, I'd look into an "ashx generic handler." It will allow you to send files back down to clients over HTTP(s).
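A minimal sketch of such a handler, assuming the uploaded files live under a hypothetical D:\uploads folder on the server; a real version should validate the file name to prevent path traversal:

    <%@ WebHandler Language="C#" Class="DownloadHandler" %>

    using System.IO;
    using System.Web;

    // Download.ashx: streams a previously uploaded file back to the browser.
    public class DownloadHandler : IHttpHandler
    {
        public void ProcessRequest(HttpContext context)
        {
            // e.g. /Download.ashx?name=report.pdf
            string name = Path.GetFileName(context.Request.QueryString["name"]);
            string path = Path.Combine(@"D:\uploads", name);

            context.Response.ContentType = "application/octet-stream";
            context.Response.AddHeader("Content-Disposition", "attachment; filename=" + name);
            context.Response.TransmitFile(path);
        }

        public bool IsReusable { get { return false; } }
    }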
If you want remote users to tell your web server to copy files to other machines ON THE SAME LAN AS THE SERVER, you would write that using normal System.IO operations.
Over a LAN, if you have the correct permissions and so on, you can write to a disk on a different machine using File.Copy -- there's nothing special about that.
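For instance, a one-liner assuming the target machine exposes a share the application's identity can write to (the machine and share names are placeholders):

    // Copy a previously uploaded file to a share on another LAN machine.
    System.IO.File.Copy(@"D:\uploads\report.pdf", @"\\CLIENT-PC\incoming\report.pdf", overwrite: true);

The account running the application pool needs write permission on that share and on the underlying NTFS folder.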
If we're talking about remote machines over the internet, that's a different story. Something has to be listening whether it's FTP, WCF, DropBox, etc.
If the problem is that it can be painful to get something like WCF to work from a client due to problems like firewall issues under Windows 7, you could take a different route and have the client periodically ping the server looking for new content. To give the server a point of reference, the ping could contain the name or creation date of the most recent file received. The server could reply with a list of new files, and then the client could make several WCF calls, one by one, to pull the content down. This pattern keeps all the client traffic outbound.
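A rough sketch of that client-pull pattern; the answer describes WCF calls, but for brevity this uses plain HTTP against hypothetical endpoints (/files/since and /files/{name}) that the server would need to expose:

    using System;
    using System.IO;
    using System.Net;
    using System.Threading;

    class ContentPoller
    {
        static void Main()
        {
            DateTime lastSeen = DateTime.MinValue;   // reference point sent to the server

            while (true)
            {
                using (var client = new WebClient())
                {
                    // Ask the server which files are newer than what we already have.
                    string list = client.DownloadString(
                        "http://server.example.com/files/since?stamp=" +
                        Uri.EscapeDataString(lastSeen.ToString("o")));

                    foreach (string name in list.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries))
                    {
                        // Pull each new file down; all traffic is outbound from the client.
                        client.DownloadFile("http://server.example.com/files/" + name.Trim(),
                                            Path.Combine(@"C:\incoming", name.Trim()));
                    }
                }
                lastSeen = DateTime.UtcNow;
                Thread.Sleep(TimeSpan.FromMinutes(5));   // poll interval
            }
        }
    }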
You can, if you run the program under an account that has access to that computer. However, having this sort of access on your network, which would let the outside world put an unfiltered file on your internal network, is just asking to be hacked.
Finally, I decided to install a FileZilla FTP server on each client computer, and my page is working very well. Another option is to create a workgroup on the Windows server and put every client computer in that workgroup, so that the Windows server has access to the computers in the same workgroup.
Here are some links that may help with creating the workgroups:
http://helpdeskgeek.com/networking/cannot-see-other-computers-on-network-in-my-network-places/
http://www.computing.net/answers/windows-2003/server-2003-workgroup-setup-/1004.html