I have a Wi-Fi (NetworkType.UNMETERED) constraint set on my Worker.
The Worker uploads a heavy file to my server, blocking on an async upload task via Tasks.await(task). After the upload finishes, the Worker should return Result.success().
I tried testing what happens when the Wi-Fi constraint becomes unmet while the Worker is blocked on the upload task.
First, onStopped() is called, as expected. Then, after a delay of about 2 seconds, the Worker proceeds past the Tasks.await() statement (the task having failed). At that point I check isStopped(), and if it returns true, I return Result.retry().
I expect that when the Wi-Fi comes back, WorkManager should start my Worker over, preferably immediately, but that never happens, so I'm pretty much stuck here.
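For reference, a minimal sketch of the setup described above (startUpload() is a hypothetical stand-in for my real upload call; the rest uses the androidx.work and Play Services Tasks APIs):

import android.content.Context;
import androidx.annotation.NonNull;
import androidx.work.Worker;
import androidx.work.WorkerParameters;
import com.google.android.gms.tasks.Task;
import com.google.android.gms.tasks.Tasks;
import java.util.concurrent.ExecutionException;

public class UploadWorker extends Worker {

    public UploadWorker(@NonNull Context context, @NonNull WorkerParameters params) {
        super(context, params);
    }

    @NonNull
    @Override
    public Result doWork() {
        try {
            // Blocks this worker thread until the upload task completes or fails.
            Tasks.await(startUpload());
        } catch (ExecutionException | InterruptedException e) {
            // By the time the await fails, onStopped() has already fired,
            // so isStopped() can be checked here.
            return isStopped() ? Result.retry() : Result.failure();
        }
        return Result.success();
    }

    // Hypothetical placeholder for the real async upload call.
    private Task<Void> startUpload() {
        return Tasks.forResult(null);
    }
}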
What exactly is the flow when one of the constraints of my running Worker becomes unmet?
Scenario: the server is in the middle of processing an HTTP request when it shuts down. Execution may have reached any of several points in the code. How are such cases typically handled? A typical example: some downstream HTTP calls had to be made as part of handling the incoming request. How do you determine whether those calls were made before the shutdown occurred? I assume it's not possible to persist every action in the code flow. Suggestions and views are welcome.
There are two kinds of shutdowns to consider here.
There are graceful shutdowns: when the execution environment politely asks your process to stop (e.g. systemd sends a SIGTERM) and expects it to exit on its own. If your process doesn’t exit within a few seconds, the environment proceeds to kill the process in a more forceful way.
A typical way to handle a graceful shutdown is:
listen for the signal from the environment
when you receive the signal, stop accepting new requests...
...and then wait for all current requests to finish
Exactly how you do this depends on your platform/framework. For instance, Go’s standard net/http library provides a Server.Shutdown method.
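As an illustration of the same pattern in Java, here is a minimal sketch using the JDK's built-in com.sun.net.httpserver (the port and delay are arbitrary choices for this example):

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;

public class GracefulServer {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", exchange -> {
            byte[] body = "ok".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();

        // On SIGTERM the JVM runs shutdown hooks before exiting.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            // Stop accepting new requests, then wait up to 10 seconds
            // for in-flight exchanges to finish.
            server.stop(10);
        }));
    }
}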
In a typical system, most shutdowns will be graceful. For example, when you need to restart your process to deploy a new version of code, you do a graceful shutdown.
There can also be unexpected shutdowns: e.g. when you suddenly lose power or network connectivity (a disconnected server is usually as good as a dead one). Such faults are harder to deal with. There’s an entire body of research dedicated to making distributed systems robust to arbitrary faults. In the simple case, when your server only writes to a single database, you can open a transaction at the beginning of a request and commit it before returning the response. This will guarantee that either all the changes are saved to the database or none of them are. But if you call multiple downstream services as part of one upstream HTTP request, you need to coordinate them, for example, with a saga.
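For the single-database case, a sketch of the transaction-per-request idea in Java/JDBC (the handler and the two write helpers are hypothetical):

import java.sql.Connection;
import java.sql.SQLException;

public class TransactionalHandler {

    // One transaction per request: either both writes are committed
    // together, or (if the process dies mid-request) neither is.
    void handleRequest(Connection db) throws SQLException {
        db.setAutoCommit(false);
        try {
            insertOrder(db);      // hypothetical write #1
            decrementStock(db);   // hypothetical write #2
            db.commit();          // the all-or-nothing point
        } catch (SQLException e) {
            db.rollback();        // undo any partial work
            throw e;
        }
    }

    private void insertOrder(Connection db) throws SQLException { /* placeholder */ }
    private void decrementStock(Connection db) throws SQLException { /* placeholder */ }
}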
For some applications, it may be OK to ignore unexpected shutdowns and simply deal with any inconsistencies manually if/when they arise. This depends on your application.
I have a NiFi processor that calls an external service that can take days to return a result. During this time the processor can call Thread.sleep() periodically to relinquish the CPU.
The issue is that even if Thread.sleep() is called in the onTrigger() method, the NiFi processor will not read in and handle new FlowFiles, since it is waiting for onTrigger() to finish. From NiFi's perspective, the thread is still blocked waiting for the asynchronous call to finish.
Is there a way to maintain concurrency when asynchronous calls are being made in the onTrigger() method of a NiFi processor?
Val Bonn's suggestion of pushing asynchronous FlowFiles back to a WAIT queue works well. As asynchronous requests come in, Java Process objects are created and held in memory. The FlowFile is then routed to a WAIT relationship, which is connected back into the processor. Periodically, FlowFiles from the WAIT queue are checked against the corresponding Process to see if it has completed; if so, they are routed to a SUCCESS relationship, otherwise they are penalized. This allows many long-running asynchronous processes to be kicked off without allocating precious CPU resources to each incoming request.

One source of complexity was handling processor shutdowns invoked from the UI. In these situations an onStopped method is invoked that waits for all in-memory processes to complete and archives their stderr and stdout to disk. When the processor is started again, the archive is read back in and paired against any FlowFiles in the WAIT queue.
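A rough sketch of that loop-back pattern (the relationship names, the processes map, and startExternalCall() are assumptions for illustration, not the actual code):

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.flowfile.attributes.CoreAttributes;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;

public class LongRunningCallProcessor extends AbstractProcessor {

    static final Relationship REL_WAIT = new Relationship.Builder().name("wait").build();
    static final Relationship REL_SUCCESS = new Relationship.Builder().name("success").build();

    // One OS-level Process handle per outstanding external call, keyed by FlowFile UUID.
    private final Map<String, Process> processes = new ConcurrentHashMap<>();

    @Override
    public Set<Relationship> getRelationships() {
        return Set.of(REL_WAIT, REL_SUCCESS);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        String id = flowFile.getAttribute(CoreAttributes.UUID.key());
        Process p = processes.get(id);
        if (p == null) {
            processes.put(id, startExternalCall(flowFile)); // hypothetical helper
            session.transfer(flowFile, REL_WAIT);           // WAIT loops back into this processor
        } else if (!p.isAlive()) {
            processes.remove(id);
            session.transfer(flowFile, REL_SUCCESS);        // external call finished
        } else {
            flowFile = session.penalize(flowFile);          // back off before checking again
            session.transfer(flowFile, REL_WAIT);
        }
    }

    // Hypothetical: launches the external service call as an OS process.
    private Process startExternalCall(FlowFile flowFile) {
        try {
            return new ProcessBuilder("external-service-client").start();
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        }
    }
}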
If I call WinUsb_AbortPipe() just as WinUsb_ReadPipe() starts, I get into a deadlock state. I ran the debug trace log that is provided here. Below are the last 5 lines of the log, where the problem occurs. I think ReadPipe must have missed the signal, and AbortPipe is waiting for ReadPipe to complete.
[0]4E34.4B58::06/09/2015-15:42:12.528 - IOCTL_WINUSB_READ_PIPE
[0]4E34.4B58::06/09/2015-15:42:12.528 - PIPE129: (00000019) The read has been added to the raw io queue
[0]4E34.4B58::06/09/2015-15:42:12.528 - PIPE129: (00000019) The read is being handled
[2]4E34.4ECC::06/09/2015-15:42:12.529 - IOCTL_WINUSB_ABORT_PIPE
[2]4E34.4B58::06/09/2015-15:42:12.529 - PIPE129: (00000019) Reading 64 bytes from the device
In my design, the IN endpoints read asynchronously into buffers. I found it best to set the read operation's timeout to infinite, because the driver hates it when I cause STALLs (I ran into other issues with that). So I need the disconnect sequence to wake up the reading threads so they realize we need to close. Is there any way to do that safely?
My workaround is to call WinUsb_ResetPipe() instead. This causes WinUsb_ReadPipe() to unblock, and it doesn't seem to lock up the way WinUsb_AbortPipe() sometimes does. The only evidence I have that this works is several hours of successful test runs, so I can't guarantee it is a real solution.
I have a bash script where I kill a running process by sending the SIGTERM signal to its process ID. However, I want to know the return code of the process I just signaled.
Is that possible?
I cannot use wait, because the process to kill was not started from my script, and I'm receiving:
"pid ##### is not a child of this shell"
I did some tests on the command line: in the console where the process was running, after I sent the SIGTERM signal (from another console), I checked the exit code and it was 143.
I want to kill the process from a different script and capture that number.
As shellter said, you cannot get the exit code of a process except by using wait() (or waitpid(), etc.), and you can only do that if you are its parent; see the sketch after the list below.
But even if you could, think about this:
When you send a process a SIGTERM, only one of three things can happen:
The process has not installed any signal handler for SIGTERM. In this case it dies immediately as a result of the signal. But in this case the exit code is uninteresting – you already know what it is. On most platforms it is 143 (128 + integer value of SIGTERM), indicating, unsurprisingly, that the process has died as a result of SIGTERM.
The process has configured SIGTERM to be ignored. In this case, nothing happens, the process does not die, and so there is no exit code to obtain anyway.
The process has installed a signal handler for SIGTERM. In this case, the handler is invoked. The handler might do anything at all: possibly nothing, possibly exit immediately, possibly carry out some cleanup operation and exit later, possibly something completely different. Even if the process does exit, that's only an indirect result of the signal, and it happens at a later time, so there is no exit code to obtain that comes directly from the delivery of the signal.
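To make the parent-only restriction concrete, here is a small Java sketch (assuming a Unix-like OS; sleep stands in for the real process). A process you spawned yourself can be waited on, and case 1 above yields 143:

public class TermExitCode {
    public static void main(String[] args) throws Exception {
        // We spawned this child, so we are its parent and may wait on it.
        Process child = new ProcessBuilder("sleep", "60").start();
        child.destroy();             // sends SIGTERM on Unix
        int code = child.waitFor();  // reaps the child and fetches its status
        System.out.println(code);    // 143 = 128 + 15 (SIGTERM), if no handler was installed
    }
}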
My understanding of the (JavaScript) hub client is that if a connection is lost, it enters a 'Reconnecting...' phase which attempts to reconnect. If it can't do so, it will enter a 'Disconnected' state which is where it'll stay until asked to start again.
How long is the 'Reconnecting...' phase meant to last before it gives up? I've read 40 seconds before, but my client seems to take much less time: about 10, maybe less. [EDIT: Never mind this part; I had configured a 10-second disconnect timeout on the server as a test... and forgot. I understand this is set by the server during the negotiate. Makes sense!] ... I'd prefer to have the client continually retry until it is told to abort. Can this be done, and would it cause issues?
Another question; during the Reconnecting... phase, if I attempt to call a hub method (again, in JS) it never seems to complete. I'm using the returned Deferred to check for 'done' and 'fail' events, but neither seems to get called. Is this by design?
Thanks.
You can definitely have it continually reconnect.
Handle the disconnected event on the client and call connection.start:
$.connection.hub.disconnected(function() {
    setTimeout(function() {
        $.connection.hub.start();
    }, 5000); // Restart the connection after 5 seconds
});
The only issue this could cause is that client machines could end up firing endless requests at a server that isn't there. This becomes even more troublesome when you introduce the mobile market into the situation (it drains the battery like crazy).
When you attempt to call a hub method while reconnecting, SignalR will try to send your command. Since there are two channels, one for receiving data and one for sending (for all transports except WebSockets), in some cases it can still be possible to send requests while you're offline. Therefore SignalR does not know that a request failed until the browser reports that it could not successfully make the request.
Hope this helps!
I might have a clue... Touching the Web.config triggers an appPool recycle, meaning that a new worker process is created for new requests while the existing process continues for a while, until the remaining requests end or the timeout is reached. Requests that do not finish within the timeout period are terminated.
The SignalR client reconnects to the new process while the long-running task is still executing in the old process, so when in the long-running task you call
GlobalHost.ConnectionManager.GetHubContext<ForceHub>();
you actually get a reference to the "old" hub while the client is connected to the "new" hub.
That's why the test performed by Wasp worked: he was making a new request to publish on the SignalR hub, and that request was processed in the newly created worker process.
You could try to configure a SignalR backplane (https://www.asp.net/signalr/overview/performance/scaleout-in-signalr); it's really easy to configure using SQL Server (https://www.asp.net/signalr/overview/performance/scaleout-with-sql-server). The backplane should be capable of connecting the two worker processes, and hopefully you will get the notification on the client.
If this is the problem, notifications generated by new requests will work even without the backplane. Note that the real purpose of the backplane is to scale out SignalR, that is, to connect a farm of web servers.
Also keep in mind that running long-running tasks inside IIS is hard to get right, since, among other things, IIS does regular appPool recycles and imposes timeout limits on request execution. I recommend reading the following post: http://www.hanselman.com/blog/HowToRunBackgroundTasksInASPNET.aspx
“If you think you can just write a background task yourself, it's likely you'll get it wrong. I'm not impugning your skills, I'm just saying it's subtle. Plus, why should you have to?”
Hope this helps