I'm trying to make a distributed RPC-type web application that uses websockets for its main interface. I want to use a queuing system (like RabbitMQ) in order to distribute the expensive jobs that are requested through the websocket connections.
Basically, the flow would go like this:
A client sends a job via websocket connection to the server
The server would send this message to a RabbitMQ exchange to be processed by a worker
The worker would execute the job and add the result of the job to a response queue
The server would check the response queue and send the result of the job back to the client via websocket connection.
As far as I can tell, on the server I need two event loops that share memory. The websocket server needs to be listening for incoming jobs, and a RabbitMQ consumer needs to be listening for job results to send back to the clients.
What are the appropriate technologies for me to use here? I've considered the following:
multithreading the application and starting one event loop on each thread
using two processes with shm (shared memory)
using two processes that communicate via socket (either a unix socket or maybe even set up the workers as special websocket clients)
hooking into the websocket server's event loop to check the result queue
I'm new to both websockets and distributed computing, so I really have no idea which of these (or maybe something I didn't think of) would work best for me.
As far as I can tell, on the server I need two event loops that share memory. The websocket server needs to be listening for incoming jobs, and a RabbitMQ consumer needs to be listening for job results to send back to the clients.
Since you can have multiple clients sending jobs concurrently, you will need a multithreaded server (unless your application processes clients one at a time). There are multiple approaches to implementing a multithreaded server, each with its own advantages and disadvantages. Take a look at the following:
A thread per request (+: throughput potentially maximized; -: threads are expensive to create, you must manage concurrency)
A thread per client (+: less thread-management overhead; -: doesn't scale to very many connections, you must still manage concurrency)
A thread pool (+: avoids the overhead of thread creation, scales up to N concurrent connections (N = size of the thread pool); -: you must manage concurrency between the N threads)
It's up to you to choose one of the above approaches (I would opt for a thread per client as it is relatively easy to implement and the chance that you'll have tens of thousands of clients is relatively small).
Notice that this is a multithreaded approach and not an event-driven approach! Since you are not limited to one thread (in which case it would have to be event-driven in order to process multiple clients "concurrently"), I wouldn't go for that option, as it is more difficult to implement. (Programmers sometimes speak of "callback hell" with event-driven approaches.)
This is how I would implement it (one thread per client, Java) :
Basically, the flow would go like this:
A client sends a job via websocket connection to the server
Server part :
public class Server {
    private static ServerSocket server_skt;
    private static ... channel; // Channel used to communicate with the RabbitMQ distributed priority queue.

    // Constructor
    Server(int port) throws IOException {
        server_skt = new ServerSocket(port);
        /*
         * Set up the connection with the distributed queue
         * channel = ...;
         */
    }

    public static void main(String argv[]) throws IOException {
        Server server = new Server(5555); // Make the server instance
        while(true) {
            // Always waiting for new clients to connect
            try {
                System.out.println("Waiting for a client to connect...");
                // Spawn a new thread for communication with the client (hence the one-thread-per-client approach)
                new CommunicationThread(server_skt.accept(), server.channel).start(); // Will listen for new jobs and send them
            } catch(IOException e) {
                System.out.println("Exception occurred: " + e.getStackTrace());
            }
        }
    }
}
The server would send this message to a RabbitMQ exchange to be processed by a worker
...
The server would check the response queue and send the result of the job back to the client via websocket connection.
public class CommunicationThread extends Thread {
    private Socket client_socket;
    private InputStream client_in;
    private OutputStream client_out;
    private ... channel;      // Channel used to communicate with RabbitMQ
    private ... resultQueue;
    private volatile boolean active = true; // Set to false to stop this thread

    public CommunicationThread(Socket socket, ... channel) { // replace ... with the type of the RabbitMQ channel
        try {
            this.client_socket = socket;
            this.client_in = client_socket.getInputStream();
            this.client_out = client_socket.getOutputStream();
            this.channel = channel;
            this.resultQueue = ...;
            System.out.println("Client connected : " + client_socket.getInetAddress().toString());
        } catch(IOException e) {
            System.out.println("Could not initialize communication properly. -- CommunicationThread.\n");
        }
    }

    public yourJobType readJob() {
        // Read input from the client (e.g. read a String from "client_in")
        // Make a job from it (e.g. map the String to a job)
        // return the job
    }

    @Override
    public void run() {
        try {
            while(active) {
                /*
                 * Always listen for incoming jobs (sent by the client) and for results (to be sent back to the client)
                 */
                // Read client input (only if available, otherwise this would block!)
                if(client_in.available() > 0) {
                    yourJobType job = readJob();
                    channel.basicPublish(...); // Send the job to RabbitMQ
                }
                /* Check the result queue (THIS is why reading client input MUST be NON-BLOCKING! Otherwise the while loop
                 * could be blocked on reading input and the result queue wouldn't be checked until the next job arrives)
                 */
                ResultType next_result = resultQueue.poll(); // Could be "null" if the queue is empty
                if(next_result != null) {
                    // There is a result
                    client_out.write(next_result.toByteArray());
                    client_out.flush();
                }
            }
            client_in.close();
            client_out.close();
        } catch(IOException e) {
            System.out.println("Communication with the client failed: " + e.getMessage());
        }
    }
}
Note that when reading from the result queue it is important that you only read results of jobs sent by that client.
If you have one result queue containing the results of jobs (of all clients) and you retrieve a result like in the code above, then that result could be the result of another client's job, hence sending the result back to the wrong client.
To fix this you could poll() the result queue with a filter and a wildcard (*), or have a result queue for each client, so you know that a result retrieved from that queue will be sent to the corresponding client.
(*) : You could assign an ID to every client. When receiving a job from a client, pair the job with the client ID (e.g. in a tuple < clientID, job >) and put it in the queue. And do the same for the results (pair the result with the client ID and put it in the result queue). Then in the run() method of CommunicationThread you would have to poll the result queue only for results of the form < clientID, ? >.
Important: You'll also have to assign an ID to every job! Because sending job A and then job B doesn't guarantee that the result of job A will arrive before the result of job B. (Job B could be less time-consuming than job A, and thus its result could be sent back to the client before job A's result.)
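A rough sketch of that ID-tagging idea with one result queue per client (the Result type, the ResultRouter class, and the routing behaviour are assumptions for illustration, not part of RabbitMQ's API): a single RabbitMQ consumer reads the shared response queue and hands each result to the in-memory queue belonging to the client that submitted the job, which the corresponding CommunicationThread then poll()s.

```
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class ResultRouter {

    // Placeholder result type: the worker echoes back the clientId/jobId it received.
    static class Result {
        final String clientId, jobId, payload;
        Result(String clientId, String jobId, String payload) {
            this.clientId = clientId; this.jobId = jobId; this.payload = payload;
        }
    }

    // One in-memory result queue per connected client.
    private final Map<String, BlockingQueue<Result>> perClient = new ConcurrentHashMap<>();

    // Called when a CommunicationThread starts: gives it its own queue to poll().
    BlockingQueue<Result> register(String clientId) {
        return perClient.computeIfAbsent(clientId, id -> new LinkedBlockingQueue<>());
    }

    // Called by the single RabbitMQ consumer for every result message it receives.
    void route(Result result) {
        BlockingQueue<Result> queue = perClient.get(result.clientId);
        if (queue != null) {
            queue.add(result);
        } // else: client already disconnected; drop or dead-letter the result.
    }

    // Tag outgoing jobs so out-of-order results can still be matched to requests.
    static String newJobId() {
        return UUID.randomUUID().toString();
    }
}
```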
(PS : It's up to you to see how to implement the workers (executed by server with one thread for each worker? Or executed by other processes?))
The above answer is a possible, multithreaded solution. I only discussed the server part; the client part should send jobs and wait for results (how to implement this depends on your goals: do clients first send all jobs and then receive the results, or can this be mixed?).
There are other ways it could be implemented, but for a beginner in distributed computing I think this is the easiest solution (using thread pools, ... would make it trickier).
I am using the new HttpClient shipped with JDK 11 to make many requests (to Github's API, but I think that's irrelevant), especially GETs.
For each request, I build and use an HttpClient, like this:
final ExecutorService executor = Executors.newSingleThreadExecutor();
final HttpClient client = HttpClient
        .newBuilder()
        .followRedirects(HttpClient.Redirect.NORMAL)
        .connectTimeout(Duration.ofSeconds(10))
        .executor(executor)
        .build();
try {
    //send request and return parsed response;
} finally {
    //manually close the specified executor because HttpClient doesn't implement Closeable,
    //so I'm not sure when it will release resources.
    executor.shutdownNow();
}
This seems to work fine, except that every now and then I get the exception below, and requests no longer work until I restart the app:
Caused by: java.net.ConnectException: Cannot assign requested address
...
Caused by: java.net.BindException: Cannot assign requested address
at java.base/sun.nio.ch.Net.connect0(Native Method) ~[na:na]
at java.base/sun.nio.ch.Net.connect(Net.java:476) ~[na:na]
at java.base/sun.nio.ch.Net.connect(Net.java:468) ~[na:na]
Note that this is NOT the JVM_Bind case.
I am not calling localhost or listening on a localhost port. I am making GET requests to an external API. However, I've also checked the /etc/hosts file and it seems fine, 127.0.0.1 is mapped to localhost.
Does anyone know why this happens and how could I fix it? Any help would be greatly appreciated.
You can try using one shared HttpClient for all requests, since it manages a connection pool internally and can keep connections to the same host alive (if supported). Performing a lot of requests with different HttpClients is inefficient, because you'll have n thread pools and n connection pools, where n is the number of clients, and they won't share the underlying connections to the host.
Usually, an application creates a single instance of HttpClient in some kind of main() and provides it as a dependency to users.
E.g.:
public static void main(String... args) {
    final HttpClient client = HttpClient
            .newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();
    new GithubWorker(client).start();
}
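For completeness, a minimal sketch of what the GithubWorker from the snippet above might look like (the class body and the repository URI are assumptions for illustration; the point is only that the single injected client is reused for every request):

```
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GithubWorker {

    private final HttpClient client; // the single shared instance, injected once

    public GithubWorker(HttpClient client) {
        this.client = client;
    }

    public void start() {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.github.com/repos/openjdk/jdk"))
                .GET()
                .build();
        // sendAsync does not block the calling thread; the same client (and its
        // internal connection pool) is reused for every request.
        client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenAccept(response -> System.out.println(response.statusCode()))
                .join(); // block here only to keep the example self-contained
    }
}
```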
Update: how to stop the current client
According to the JavaDoc of the JDK-internal HttpClientImpl.stop method:
// Called from the SelectorManager thread, just before exiting.
// Clears the HTTP/1.1 and HTTP/2 cache, ensuring that the connections
// that may be still lingering there are properly closed (and their
// possibly still opened SocketChannel released).
private void stop() {
    // Clears HTTP/1.1 cache and close its connections
    connections.stop();
    // Clears HTTP/2 cache and close its connections.
    client2.stop();
    // shutdown the executor if needed
    if (isDefaultExecutor) delegatingExecutor.shutdown();
}
This method is called from SelectorManager.shutdown (SelectorManager is created in HttpClient's constructor), and shutdown() is called in a finally block around the busy loop in SelectorManager.run() (yes, it extends Thread). The busy loop is while (!Thread.currentThread().isInterrupted()). So to reach that finally block you need to either break the loop with an exception or interrupt the running thread.
I'm trying to understand the idea of a non-blocking web server and it seems like there is something I'm missing.
I understand there are several reasons why a web request can "block" (pseudocode):
CPU bound
string on_request(arg)
{
DO_SOME_HEAVY_CPU_CALC
return "done";
}
IO bound
string on_request(arg)
{
DO_A_CALL_TO_EXTERNAL_RESOURCE_SUCH_AS_WEB_IO
return "done";
}
sleep
string on_request(arg)
{
sleep(VERY_VERY_LONG_TIME);
return "done";
}
Can all three benefit from a non-blocking server?
How do the cases that do benefit from a non-blocking web server actually achieve that?
I mean, when looking at the Tornado server documentation, it seems like it "frees" the thread. I know that a thread can be put to sleep and wait for a signal from the operating system (at least on Linux). Is this the meaning of "freeing" the thread? Is this some higher-level implementation, something that actually creates a new thread that waits for new requests instead of the "sleeping" one?
Am I missing something here?
Thanks
Basically, the way non-blocking socket I/O works is by using polling and a state machine. So your scheme for many connections would be something like this:
Create many sockets and make them nonblocking
Switch the state of them to "connect"
Initiate the connect operation on each of them
Poll all of them until some events fire up
Process the fired up events (connection established or connection failed)
Switch the state of those established to "sending"
Prepare the Web request in a buffer
Poll "sending" sockets for WRITE operation
Send the data for those which got the WRITE event set
For those which have all the data sent, switch the state to "receiving"
Poll "receiving" sockets for READ operation
For those which have the READ event set, perform read and process the read data according to the protocol
Repeat if the protocol is bidirectional, or close the socket if it is not
Of course, at each stage you need to handle errors, and remember that the state of each socket is different (one may be connecting while another may already be reading).
Regarding polling I have posted an article about how different polling methods work here: http://www.ulduzsoft.com/2014/01/select-poll-epoll-practical-difference-for-system-architects/ - I suggest you check it.
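To make the state machine concrete, here is a minimal sketch of the same flow for a single connection using Java NIO (the host, port, and request bytes are arbitrary examples, and the response is assumed to fit in one buffer):

```
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class NonBlockingHttpGet {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        SocketChannel ch = SocketChannel.open();
        ch.configureBlocking(false);                            // 1. make the socket non-blocking
        ch.connect(new InetSocketAddress("example.com", 80));   // 2-3. initiate the connect
        ch.register(selector, SelectionKey.OP_CONNECT);

        ByteBuffer request = ByteBuffer.wrap(
                "GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"
                        .getBytes(StandardCharsets.US_ASCII));
        ByteBuffer response = ByteBuffer.allocate(8192);         // assumed large enough here

        while (selector.select() > 0) {                          // 4. poll until events fire
            for (SelectionKey key : selector.selectedKeys()) {
                SocketChannel c = (SocketChannel) key.channel();
                if (key.isConnectable() && c.finishConnect()) {
                    key.interestOps(SelectionKey.OP_WRITE);      // 5-7. connected -> "sending" state
                } else if (key.isWritable()) {
                    c.write(request);                            // 8-9. send when WRITE fires
                    if (!request.hasRemaining()) {
                        key.interestOps(SelectionKey.OP_READ);   // 10. all sent -> "receiving" state
                    }
                } else if (key.isReadable()) {
                    if (c.read(response) == -1) {                // 11-13. read until EOF, then close
                        c.close();
                        System.out.println(new String(response.array(), 0,
                                response.position(), StandardCharsets.US_ASCII));
                        return;
                    }
                }
            }
            selector.selectedKeys().clear();
        }
    }
}
```

With many connections you would register all the channels with the same selector and keep a per-channel state object, which is exactly the state machine described above.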
To benefit from a non-blocking server, your code must also be non-blocking - you can't just run blocking code on a non-blocking server and expect better performance. For example, you must remove all calls to sleep() and replace them with non-blocking equivalents like IOLoop.add_timeout (which in turn involves restructuring your code to use callbacks or coroutines).
How To Use Linux epoll with Python http://scotdoyle.com/python-epoll-howto.html may give you some points about this topic.
Using a SignalR persistent connection with a JS long polling client we see inconsistent reconnection behavior in different scenarios. When the client machine's network cable is unplugged the JS connection does not enter the reconnecting state and it never (at least not after 5 minutes) reaches the disconnected state. For other scenarios such as a restart of the IIS web application a long polling JS connection does enter the reconnecting state and successfully reconnects. I understand that the reasoning behind this is that keep-alive is not supported for the long polling transport.
I can see that a suggestion has been made on github to better support reconnections for the long polling transport (https://github.com/SignalR/SignalR/issues/1781), but it seems that there is no commitment to change it.
First, is there a proper workaround for detecting disconnections on the client in the case of long polling.
Second, does anyone know if there are plans to support reconnection in the case described?
Cheers
We've debated different alternatives to support a keep-alive "like" feature for long polling; however, due to how long polling works under the covers, it's not easy to implement without affecting the vast majority of users. As we continue to debate the "correct" solution, I'll leave you with one workaround for detecting network failure in the long polling client (if it's absolutely needed).
Create a server method, let's call it Ping:
public class MyHub : Hub
{
    public void Ping()
    {
    }
}
Now on the client create an interval in which you will "ping" the server:
var proxy = $.connection.myHub,
    intervalHandle;

...

$.connection.hub.disconnected(function() {
    clearInterval(intervalHandle);
});

...

$.connection.hub.start().done(function() {
    // Only when long polling
    if($.connection.hub.transport.name === "longPolling") {
        // Ping every 10s
        intervalHandle = setInterval(function() {
            // Ensure we're connected (don't want to be pinging in any other state).
            if($.connection.hub.state === $.signalR.connectionState.connected) {
                proxy.server.ping().fail(function() {
                    // Failed to ping the server; we could either try one more time to make sure
                    // we can't reach the server, or we could fail right here.
                    TryAndRestartConnection(); // Your method
                });
            }
        }, 10000);
    }
});
Hope this helps!
I have been exploring Netty for the past days, as I am writing a quick and tight HTTP server that should receive lots of requests, and Netty's HTTP server implementation is quite simple and does the job.
My next step is as part of the request handling, I need to launch an HTTP request to an external web server. My intuition is to implement an asynchronous client that can send a lot of requests simultaneously, but I am a little confused as what is the right approach. My understanding is that Netty server uses a worker thread for each incoming message, therefore that worker thread would not be freed to accept new messages until my handler finishes its work.
Here is the catch: even if I have an asynchronous HTTP client in hand, it won't matter if I need to wait for each response and process it back with my server handler - the same worker thread would remain blocked all this time. The alternative is to use the async nature of the client, returning a future object quickly to release the thread and place a listener (meaning I have to return a 200 or 202 status to the client), and check my future object to indicate when the response is received so I can push it to the client.
Does this make sense? Am I way off with my assumptions? What is a good practice to implement such kind of Netty acceptor server + external client with high concurrency?
Thanks,
Assuming you're asking about Netty 4.
Netty configured with a ServerBootstrap will have a fixed number of worker threads that it uses to accept requests and run the channel pipeline, like so:
Two threads accepting / processing requests:
bootstrap.group(new NioEventLoopGroup(2));
One thread accepting requests, two threads processing:
bootstrap.group(new NioEventLoopGroup(1), new NioEventLoopGroup(2));
In your case, you have a channel pipeline that includes a bunch of HTTP codec decoding/encoding stuff and your own handler, which itself makes an outgoing HTTP request. You're right that you don't want to block the server from accepting incoming requests or decoding incoming HTTP messages, and there are two things you can do to mitigate that; you've already struck on the first.
Firstly, you want to use an async Netty client to make the outgoing requests, and have a listener write the response to the original request's channel when the outgoing request returns. This means you don't block and wait, so you can handle many more concurrent outgoing requests than the number of threads available to process those requests.
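A rough sketch of that first idea (the UpstreamClient interface and its sendAsync method are placeholders for whatever async client you use, not Netty API): the inbound handler fires the outgoing request, returns immediately, and a completion callback writes the upstream response back on the original channel.

```
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.handler.codec.http.DefaultFullHttpResponse;
import io.netty.handler.codec.http.FullHttpRequest;
import io.netty.handler.codec.http.FullHttpResponse;
import io.netty.handler.codec.http.HttpResponseStatus;
import io.netty.handler.codec.http.HttpVersion;
import java.util.concurrent.CompletableFuture;

public class ForwardingHandler extends SimpleChannelInboundHandler<FullHttpRequest> {

    // Hypothetical async client: completes the future off the calling thread.
    interface UpstreamClient {
        CompletableFuture<FullHttpResponse> sendAsync(FullHttpRequest request);
    }

    private final UpstreamClient upstream;

    public ForwardingHandler(UpstreamClient upstream) {
        this.upstream = upstream;
    }

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest req) {
        // Returns immediately; the worker thread is free to serve other channels.
        upstream.sendAsync(req.retain()).whenComplete((response, error) -> {
            if (error == null) {
                ctx.writeAndFlush(response);
            } else {
                ctx.writeAndFlush(new DefaultFullHttpResponse(
                        HttpVersion.HTTP_1_1, HttpResponseStatus.BAD_GATEWAY));
            }
            req.release(); // balance the retain() above
        });
    }
}
```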
Secondly, you can have your custom handler run in its own EventExecutorGroup, which means it runs in a separate threadpool from the acceptor / http codec channel handlers, like so:
// Two separate threads to execute your outgoing requests.
EventExecutorGroup separateExecutorGroup = new DefaultEventExecutorGroup(2);

bootstrap.childHandler(new ChannelInitializer<SocketChannel>() {
    @Override
    public void initChannel(SocketChannel ch) {
        ChannelPipeline pipeline = ch.pipeline();
        .... http codec stuff ....
        pipeline.addLast(separateExecutorGroup, customHandler);
    }
});
Meaning your outgoing requests don't hog the threads that would be used for accepting / processing incoming ones.
I have an API which uses netty to open client connection to a tcp server. The server may send data to the client at any time. I'm facing the following scenario:
Client connects to server
Sends data to server
Disconnects and the JVM exits (not sure which happens first)
This is what I expect:
Client connects to server
Sends data to server
Client simply keeps the connection open, waiting to receive data or for the user of the client API to send data.
This is an outline of my connection method (obviously there is a much larger API around it):
```
public FIXClient connect(String host, int port) throws Throwable {
    ...
    ChannelPipeline pipe = org.jboss.netty.channel.Channels.pipeline(...);

    ChannelFactory factory = new NioClientSocketChannelFactory(
            Executors.newCachedThreadPool(),
            Executors.newCachedThreadPool());

    ClientBootstrap bootstrap = new ClientBootstrap(factory);
    bootstrap.setPipeline(pipe);
    bootstrap.setOption("tcpNoDelay", true);
    bootstrap.setOption("keepAlive", true);

    ChannelFuture future = bootstrap.connect(new InetSocketAddress(host, port));

    // Forcing the connect call to block:
    // don't want clients to deal with async connect calls.
    future.awaitUninterruptibly();

    if(future.isSuccess()) {
        this.channel = future.getChannel();
        //channel.getCloseFuture(); //TODO notifies whenever the channel closes
    } else {
        throw future.getCause(); // Wrap this in a more specific exception
    }

    return this;
}
```
That has nothing to do with Netty... You need to make sure your "main" method does not exit if you call it from there. Otherwise it's the job of the container.
There are a couple of ways you can do this, but one thing I have observed is that with this code:
ChannelFactory factory = new NioClientSocketChannelFactory(
Executors.newCachedThreadPool(),
Executors.newCachedThreadPool());
... if you make a successful connection, your JVM will not shut down of its own accord for some time, until you force it (like a kill) or you call releaseExternalResources() on your channel factory. This is because:
The threads created by Executors.newCachedThreadPool() are non-daemon threads.
At least 1 thread would be created once you submit your connection request.
The cached thread pool threads have a keep alive time of 60 seconds, meaning they don't go away until they've been idle for 60 seconds, so that would be 60 seconds after your connect and send (assuming that they both completed).
So I'm not sure if you're diagnosing the issue correctly. Having said that, I recommend you handle the task this way:
Once you boot in your main method (in the main thread)
Now launch all your actual useful work in new threads.
Once the useful threads have been launched, in the main thread, call Thread.currentThread().join(). Since main is always non-daemon, you have made sure the JVM will not shut down until you're good and ready.
At some point, unless you want to kill -9 the JVM as a shutdown strategy, you will want a controlled shutdown, so you can add a shutdown hook to shut down Netty and then interrupt the main thread (a sketch of this follows below).
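A minimal sketch of that last step, assuming a Netty 3 NioClientSocketChannelFactory like in the question (the class and method names here are just for illustration):

```
import org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory;

public class ShutdownSupport {

    // Call this from main() after the connect() call has succeeded.
    public static void waitForShutdown(final NioClientSocketChannelFactory factory) {
        final Thread mainThread = Thread.currentThread();

        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            @Override
            public void run() {
                // Netty 3: stops the boss/worker threads created by the cached pools.
                factory.releaseExternalResources();
                // Wake main() out of its join() so it can finish any remaining cleanup.
                mainThread.interrupt();
            }
        }));

        try {
            // Joining the current (non-daemon) thread blocks until it is interrupted,
            // so the JVM stays alive until a controlled shutdown is requested.
            mainThread.join();
        } catch (InterruptedException expected) {
            // Interrupted by the shutdown hook: we are done.
        }
    }
}
```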
I hope that's helpful.