Just getting into Meteor, which by many accounts seems like a great project. One potential issue (which it may not be) is there doesn't seem to be a meteor stop or another programmatic way to shut down meteor gracefully. Please let me know if I am wrong about this!
Are there potential concerns about maintaining database integrity (for example), if we interrupt the process using CTRL-C or shutting it down via an Activity Monitor? And are there steps we can take to reduce or eliminate such issues?
Caveat: I recognize the above questions are somewhat vague, and I understand that this is usually considered harmful on Stack, but I hope they are still answerable ones.
Thanks,
It does look like there is a cleanup which takes place before the process is terminated (https://github.com/meteor/meteor/blob/master/tools/cleanup.js).
The first signal sent is SIGINT which is a polite way to ask the process to shut down (and give it time to finish its last running thread)
As for database integrity, the mongod process also tries to clean up after itself before it shuts down, and if it is forced to shut down it can recover quickly from its journal files on restart.
That being said, I'm not sure whether a longer-running thread is allowed to finish or is killed immediately. But Meteor does give it a chance to terminate gracefully at first, then escalates to a SIGHUP and finally a SIGTERM (which is still a graceful termination signal). At no point does Meteor send a SIGKILL or SIGSTOP.
So Meteor apps should be safe from Ctrl+C termination. With Activity Monitor termination, it depends on what type of signal is sent (i.e., Force Quit or just Quit).
So to add some closure to this: if your MongoDB is externally managed (e.g., on a production deployment server), Meteor doesn't stop it, as mongo-runner.js notes:
// Since it is externally managed, asking it to actually stop would be
// impolite, so our stoppable handle is a noop
if (process.env.MONGO_URL) {
  launch_callback();
  return handle;
}
I use MPICH2. When I launch processes with mpiexec, the failure of one process will crash all other processes. How to avoid this?
In MPICH, there is a flag called -disable-auto-cleanup which will prevent the process manager from automatically cleaning up all processes when a single process fails.
However, MPI itself does not have much support for fault tolerance and this is something that the Fault Tolerance Working Group is working on adding in a future version of the MPI Standard.
For now, the best you can do is change the default MPI Error Handler away from MPI_ERRORS_ARE_FATAL, which causes all processes to abort, to something else like MPI_ERRORS_RETURN which would return the error code to the application and allow it to do something else. However, you're not likely to be able to communicate anymore after a failure has occurred, especially if you are trying to use collective communication.
I am implementing an ASP.NET application that needs to service conventional http requests but the responses require data that I need to acquire from providers that are executables that provide their data over sockets. My plan to implement was:
1) In Application_Start, start a new thread that starts a socket server
2) In Session_Start, launch the session-specific process that will ultimately connect to the socket server, and from there do a Monitor.Wait on a session-specific lock object which I've stored in Application.Contents by Session key
3) When the socket server sees a new connection, make the data available to the appropriate session Contents and do a Monitor.Pulse on the session-specific lock object
Is this technically feasible in IIS? Can this concept function as a stable system?
Before answering, please bear in mind I am not asking "is this the recommended approach", I am aware it is not and if I had the option to write this system from scratch I would do this differently. I'm also not able to change the fact that the programs communicate using sockets.
Given the constraints this approach makes sense.
Shutdown and recycling of IIS worker processes are always thorny issues when it comes to keeping state in a web app. Note that your worker process can recycle pretty much at any time for many reasons. Some of those reasons are unavoidable: server reboot, app deployment, a bug leading to a process crash. So you need to think through what happens in those cases: all sessions will be lost while the child processes still run. Suggested solution: add the children to a Windows Job Object and configure the Job to be killed when the parent exits.
With overlapped IIS worker recycling you can have two functioning workers running at the same time. You must deal with that possibility.
Consider the possibility that the child process immediately crashes. It will never make a connection. Make sure your app doesn't hang waiting for the connection forever.
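To illustrate the last point, here is a minimal sketch of a bounded wait. It is written in Python rather than .NET, purely to show the shape of the idea; `SessionSlot`, `publish` and `wait_for_data` are made-up names, not part of any real API. In the actual ASP.NET app the equivalent would presumably be the `Monitor.Wait` overload that takes a timeout.

```python
import threading


class SessionSlot:
    """Hypothetical stand-in for the session-specific lock object plus its data."""

    def __init__(self):
        self._cond = threading.Condition()
        self._data = None
        self._ready = False

    def publish(self, data):
        # Called by the socket-server thread once the child connects (the "Pulse").
        with self._cond:
            self._data = data
            self._ready = True
            self._cond.notify_all()

    def wait_for_data(self, timeout_seconds):
        # Called by the request thread (the "Wait"). Returns None if the child
        # never connected within the timeout, so the request cannot hang forever.
        with self._cond:
            arrived = self._cond.wait_for(lambda: self._ready, timeout=timeout_seconds)
            return self._data if arrived else None
```

The caller then treats `None` as "the provider never showed up" and can fail the request or restart the child instead of blocking a worker thread indefinitely.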
I have a strange case where one of my applications is causing the IIS (7.0) request queue to fill up. Requests are not being terminated after 30s as they should be. This then takes all DB connections from the pool and renders that app useless (other apps are unaffected).
I have no idea a) Why they are stalling in the first place, and b) why IIS is letting them sit there stalled rather than killing them. I would guess my app is locking something, perhaps something the GC is trying to reclaim.
My question is where do I start on debugging such an issue? I have no idea. It's currently happening only in production, but reasonably regularly (maybe once every 4 hours) on all web servers.
PS: There is potentially an argument that this question is better on serverfault than on SO, but given that I think this is a development problem with the app rather than an admin one, I have started on SO for now. I am however happy to re-post there if needed.
For reference, using WinDbg was the solution I used. I attached WinDbg to the w3wp process for the app pool once requests had queued. I could then view the call stack of each thread, and although they differed, most of them were sat waiting on a lock inside ResourceManager.
I still don't know why it was locking there; I thought ResourceManager was thread safe. I re-wrote some code to cache the output of ResourceManager in another class, and that seems to avoid the lock.
I have an application with a file receive location. After the host instance has been running for a few hours the receive location fails to identify new files dropped into the folder that it is monitoring. It doesn't forget about them altogether, it's just that performance grinds to a crawl. The receive location is configured to poll the target folder every 60 seconds but after host instance has been running for an hour or so, then it seems that the target folder is being polled only every thirty minutes. If I restart the host instance then the files waiting in the target folder are collected right away and performance is fine for the next hour or so.
The same application runs fine in a different environment.
There are no obvious entries in the event log related to the problem.
All the BizTalk SQL jobs are running fine except for Backup BizTalk Server (BizTalkMgmtDb).
Any suggestions gratefully received.
Thanks
Rob
Here are some additional tools which may help you identify and diagnose BizTalk database issues.
BizTalk MsgBox Viewer
Here is a tool to repair identified errors:
Terminator
Use at your own risk... read the blogs and docs. Start with the message box viewer and let us know your results.
Without more details, the biggest tell is that your Backup Job is failing. If the backup job is failing, it may not be properly configured. If it is properly configured and still failing, then you've got other issues. Can you give us some more information about your BizTalk install.
What version are you running?
What are your database sizes?
What are your purge and archive settings like?
Are there any long-running blocks in your SQL Server DB coming from BizTalk?
Another thing to consider is the user accounts the send, receive and orchestration hosts are running under. Please check the BizTalk Administration Console. If they are all running the same account, sometimes the orchestrations can starve the send and receive processes of CPU time. I believe priority is given to orchestrations then receive, then send. Even if you are just developing, it is useful to use separate accounts for this. This also improves security.
The Wrox BizTalk Server 2006 will also supply tuning advice.
What other things are going on with the server? Is BizTalk pegged otherwise or is it idle?
You mention that the solution does not have any problems in another environment, so it's likely that there is a configuration problem.
Check the following:
** On SQL Server, set some upper memory limit for SQL Server. By default, SQL Server uses whatever it can get and then hangs onto it, so set a reasonable limit so that your system can operate without spending a lot of time paging memory onto and from your hard drive(s).
** Ensure that you have available disk space - maybe you are running low - this can lead to all kinds of strange problems.
** Try to split up the system's paging file among its physical drives (if you have more than one drive on the system). Also consider using a faster drive, or if you have lots of cash laying around, get a SAN.
** In BizTalk, is tracking enabled? If so, are you also tracking message bodies? Disable tracking or message body tracking and see if there is a difference.
** Start performance monitor and monitor the following counters when running your solution
Object: BizTalk Messaging
Instance: (select the receiving host) %%
Counter: Documents Received/Sec
Object: BizTalk Messaging
Instance: (select the transmitting host) %%
Counter: Documents Sent/Sec
Object: XLANG/s Orchestrations
Instance: (select the processing host) %%
Counter: Orchestrations Completed/Sec.
%% You may have only one host, so just use it. Since BizTalk configurations vary, I am using generic names for hosts.
The preceding counters monitor the most basic aspects of your server, but may help to narrow down places to look further. You can, of course, add CPU and Memory too. If you have time (days...maybe weeks) you could monitor for processes that allocate memory and never release it. Use the following counter...
Object: Memory
Counter: Pool Nonpaged Bytes
A slow, steady increase in this counter indicates that a process is not releasing memory, which affects everything on the system.
Let us know how things turn out!
I had the same problem: when my orchestration was idle for some time, it took a long time to process the first message. An article by EvYoung helped me solve this problem.
"This is caused by application domain unloading within the BizTalk host process. If an AppDomain is shutdown after idle, the next message that comes needs to wait for the Orchestration to compile again. Depending on the complexity of your design, this can be a noticeable wait. To prevent this in low latency requirement scenario, you can modify the BTSNTSVC.EXE.config file and set SecondsIdleBeforeShutdown property to -1. This will prevent AppDomain shutdown due to idle."
You can find the article in here:
http://blogs.msdn.com/b/biztalkcpr/archive/2008/05/08/thoughts-on-orchestration-performance.aspx
It took me too long to respond, but I thought I might help someone. Cheers :)
Some good suggestions from others. I will add :
Do you have any custom receive pipeline components on the receive location? If so, perhaps one is leaking memory, or calling some external component (e.g. a database) which is taking a long time?
How big are the files you are receiving ?
On the File transport properties of your receive location, turn "file renaming" on. Do the files get renamed within 60s?
I found a bunch of scripts in the project I have been newly assigned to that are the "shutdown" scripts. They just do some basic searches and run the Unix kill command. Is there any reason they shouldn't shutdown the process this way? Does this ensure that dynamically allocated memory will return properly? Are there any other negative effects? I've operated under an intuition that this is a last resort way of terminating a process.
The kill command sends a signal to a Unix process. That signal defaults to SIGTERM, which is a polite request for the program to exit.
When a process exits for any reason, the Unix OS does clean up its memory allocations, file handles and other resources. The only resources that do not get cleaned up are those that are supposed to be shared, like the contents of files and of shared memory (like System V IPC).
Many programs do not need to do any special cleanup on exit and use the default SIGTERM behavior, which is to let the OS stop the process.
If a program does need special behavior, it can install a signal handler, and it can then run a function to handle the signal.
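As a rough illustration of that last point, here is a minimal Python sketch of a SIGTERM handler. The names `handle_sigterm`, `run_until_signalled`, `do_work` and `cleanup` are made up for the example; the only real API used is the standard `signal` module.

```python
import signal

shutdown_requested = False


def handle_sigterm(signum, frame):
    # Keep the handler tiny: just record the request and let the main loop
    # do the actual cleanup at a safe point.
    global shutdown_requested
    shutdown_requested = True


# Replace the default SIGTERM action (terminate immediately) with our handler.
signal.signal(signal.SIGTERM, handle_sigterm)


def run_until_signalled(do_work, cleanup):
    # do_work and cleanup are hypothetical callables supplied by the application.
    while not shutdown_requested:
        do_work()
    cleanup()
```

With this in place, a plain `kill <pid>` lets the current unit of work finish and then runs `cleanup` before the process exits.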
Now the SIGKILL signal, which is number 9, is evil, but also necessary. This signal never gets to the process itself; the OS simply stops the process. This should only be used when really, really necessary. It often becomes necessary in multithreaded programs that get into deadlocks or programs that have installed a TERM signal handler, but screwed up during their exit process.
kill is a polite request for the program to end: it sends a SIGTERM, and the program gets the chance to clean up its memory, close its handles and take care of other such niceties.
kill -9 tells the operating system to grab the process by the balls and throw it the hell out of the bar. Obviously it is not concerned with niceties, although it does reclaim all the memory, as it's the operating system's responsibility to keep track of that. But because it's a forceful shutdown, you may have problems when trying to run the program again (not cleaning up .pid files, for example).
See also [wikipedia](http://en.wikipedia.org/wiki/Kill_(Unix))
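On the leftover .pid file problem: one common way to cope is to check at startup whether the PID in the file still belongs to a live process. A minimal Python sketch, assuming a Unix system; `pid_is_running` and `claim_pid_file` are made-up helper names. Note this is not race-free, and a reused PID can fool it.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this PID currently exists (Unix only)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def claim_pid_file(path):
    """Write our PID to path unless a live process already owns the file."""
    try:
        with open(path) as f:
            old_pid = int(f.read().strip())
        if old_pid > 0 and pid_is_running(old_pid):
            return False
    except (FileNotFoundError, ValueError):
        pass  # no PID file, or unreadable contents: treat it as stale
    with open(path, "w") as f:
        f.write(str(os.getpid()))
    return True
```

A program killed with -9 leaves its file behind, but the next start sees that the recorded PID is dead and simply overwrites it.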
Each process runs in its own protected address space, and when the process ends (whether it exits voluntarily or is killed by an external signal) that address space is fully reclaimed. So yes, all of its memory is released properly.
Depending on the process, it may or may not cause other problems next time you try to run it. For example, it may have some files open and leave them in an inconsistent state if it's killed unexpectedly. (The files will be closed automatically, but it could be in the middle of writing some application data, for example, and the files may contain incomplete/inconsistent data if interrupted.)
Typically when the system is shutting down, all processes will be sent signal 15 (SIGTERM), at which they can perform whatever cleanup/shutdown actions they need to do. Then a short time later, they'll get signal 9 (SIGKILL), which immediately kills them, without giving them any chance to react in any way. This gives all processes a chance to clean up for themselves, and then forcefully kills any processes that aren't responding promptly.
kill -9 is the last resort, not kill.
Yes memory is reclaimed (this is the OS's responsibility)
The programs can respond to the signal however they want, it's up to the particular program to do "the right thing"
kill by default will send a terminate signal which will allow the process to exit gracefully. If the process does not seem to exit in a timely fashion, some scripts will then fall back on kill -9 which forces an exit, 'ready or not'.
In all cases OS managed things such as dynamic memory will be returned, files closed etc. But application level things may not be tidied up on a -9 kill.
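The "ask politely, then force" fallback can be sketched in a few lines of Python. This assumes the process was started by the script itself as a `subprocess.Popen` child; `stop_process` and `grace_seconds` are made-up names, and the 5 second default is arbitrary.

```python
import subprocess


def stop_process(proc, grace_seconds=5.0):
    """Ask a child process to exit; force it only if it ignores the request."""
    proc.terminate()  # sends SIGTERM on Unix
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on Unix, which cannot be caught or ignored
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"
```

A shell shutdown script would do the same thing with `kill`, a sleep or polling loop, and `kill -9` as the final step.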
kill merely sends a signal to the process. The process can trap signals (except for signal 9) and run code to perform shutdown. An app's shutdown is supposed to be brief, but it may not be instantaneous.
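You can see the "except for signal 9" part for yourself with a small Python check (Unix only; `can_install_handler` is a made-up helper name): the OS refuses the attempt to install a handler for SIGKILL.

```python
import signal


def can_install_handler(signum):
    """Return True if the OS lets this process install a handler for signum."""
    try:
        previous = signal.signal(signum, lambda s, f: None)
    except (OSError, ValueError):
        return False  # e.g. SIGKILL: the kernel rejects the request
    if previous is not None:
        signal.signal(signum, previous)  # put the original handler back
    return True
```

Calling it with `signal.SIGTERM` succeeds, while `signal.SIGKILL` is rejected, which is why a -9 never gives the app a chance to run its shutdown code.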
In any case, once the process exits, the operating system will reclaim dynamically allocated memory, close open file descriptors, and release other resources.
There could be some resources that survive, for example if the app held shared memory or sockets that are also held by other (still living) processes.