I'm in the (ever-going) process of diagnosing a baffling problem with an application we use at work.
First, some notes about this application:
Required to run as the root user
Runs on the Solaris 10 operating system
Compiled for C++14
Normal shutdown is conducted by receiving SIGTERM
Writes a log file (explicitly sets permissions to 660) to a data directory (with default 770 permissions)
The application runs fine and does everything it's supposed to do, up until the point it terminates. Upon termination, the application is changing the permissions on the data directory from 770 to 660.
My coworkers are as baffled as I am. Even our system administrator doesn't understand why this is happening.
Things I've tried:
Print statements: The application reports the directory permissions are 770 until the exit or return statements
Check the logging: The logging mechanism is shared with several other applications, none of which have this issue
Running as myself: The directory's permissions are not changed on termination
Change umask to 027: The directory's permissions are still changed to 660
Check system logs: The sudo and messages logs do not show any calls to chmod for the directory (except those made to change the permissions back)
Due to the nature of the application, I cannot provide any of the code here for review/inspection. Further, many of the standard diagnostics tools are unavailable on the system in question.
However, I'm hopeful the gurus here can provide insight into what might be causing this problem, where to look going forward, or (ideally) how to fix it.
You can use the following simple dTrace script to get a stack trace from any process that calls chmod() (or any of its variants such as fchmod() or fchmodat()):
#!/usr/sbin/dtrace -s
syscall::fchmodat:entry
{
printf{ "\nExecname: %s\n", execname );
ustack();
}
You can filter by execname to only print chmod call stacks from your executable with
#!/usr/sbin/dtrace -s
syscall::fchmodat:entry
/ execname == "yourExecName" /
{
ustack();
}
You can add more or less stack frames with ustack( 10 ); to print, for example, 10 stack frames. If you want longer or shorter function names in the stack trace, you can specify the string length with ustack( 10, 50 ); to print 10 stack frames with each function name printing up to 50 characters.
If your binary has been completely stripped of symbol names you may not get function names, only addresses.
As it's a C++ binary, you might have to demangle the function names.
Once you get a stack trace, you can start working on what exactly is happening.
Related
I am trying to align my logging with the best practice of using STDERR.
So, I would like to understand what happens with the logs sent to STDERR.
Symfony official docs (https://symfony.com/doc/current/logging.html):
In the prod environment, logs are written to STDERR PHP stream, which
works best in modern containerized applications deployed to servers
without disk write permissions.
If you prefer to store production logs in a file, set the path of your
log handler(s) to the path of the file to use (e.g. var/log/prod.log).
This time I want to follow the STDERR stream option.
When I was writing to a specific file, I knew exactly where to look for that file, open it and check the logged messages.
But with STDERR, I don't know where to look for my logs.
So, using monolog, I have the configuration:
monolog:
handlers:
main:
type: fingers_crossed
action_level: error
handler: nested
excluded_http_codes: [404, 405]
nested:
type: stream
path: "php://stderr"
level: debug
Suppose next morning I want to check the logs. Where would I look?
Several hours of reading docs later, my understanding is as follows:
First, the usage of STDERR over STDOUT is preferred for errors because it is not buffered (gathering all output waiting for the script to end), thus errors are thrown immediately to the STDERR stream. Also, this way the normal output doesn't get mixed with errors.
Secondly, the immediate intuitive usage is when running a shell script, because in the Terminal one will directly see the STDOUT and STDERR messages (by default, both streams output to the screen).
But then, the non-intuitive usage of STDERR is when logging a website/API. We want to log the errors, and we want to be able to monitor the already occurred errors, that is to come back later and check those errors. Traditional practice stores errors in custom defined log-files. More modern practice recommends sending errors to STDERR. Regarding Symfony, Fabien Potencier (the creator of Symfony), says:
in production, stderr is a better option, especially when using
Docker, SymfonyCloud, lambdas, ... So, I now recommend to use
php://stderr
(https://symfony.com/blog/logging-in-symfony-and-the-cloud).
And he further recommends using STDERR even for development.
Now, what I believe to be missing from the picture (at least for me, as non-expert), is the guidance on HOW to access and check the error logs. Okay, we send the errors to STDERR, and then? Where am I going to check the errors next morning? I get it that containerized platforms (clouds, docker etc) have specific tools to easily and nicely monitor logs (tools that intercept STDERR and parse the messages in order to organize them in specific files/DBs), but that's not the case on a simple server, be it a local server or on a hosting.
Therefore, my understanding is that sending errors to STDERR is a good standardization when:
Resorting to using a third-party tool for log monitoring (like ELK, Grail, Sentry, Rollbar etc.)
When knowing exactly where your web-server is storing the STDERR logs. For instance, if you try (I defined a new STD_ERR constant to avoid any pre-configs):
define('STD_ERR', fopen('php://stderr', 'wb'));
fputs(STD_ERR, "ABC error message.");
you can find the "ABC error message" at:
XAMPP Apache default (Windows):
..\xampp\apache\logs\error.log
Symfony5 server (Windows):
C:\Users\your_user\.symfony5\log\ [in the most recent folder, as the logs rotate]
Symfony server (Linux):
/home/your_user/.symfony/log/ [in the most recent folder, as the logs rotate]
For Symfony server, you can actually see the logs paths when starting the server, or by command "symfony server:log".
One immediate advantage is that these STDERR logs are stored outside of the app folders, and you do not need to maintain extra writable folders or deal with the permissions etc. Of course, when developing/hosting multiple sites/apps, you need to configure the error log (the STDERR storage) location per app (in Apache that would be inside each <VirtualHost> conf ; with Symfony server, I am not sure). Personally, without a third-party tool for monitoring logs, I would stick with custom defined log files (no STDERR), but feel free to contradict me.
I created a client-server application and now I would like to deploy it.
While development process I started the server on a terminal and when I wanted to stop it I just had to type "Ctrl-C".
Now want to be able to start it in background and stop it when I want by just typing:
/etc/init.d/my_service {stop|stop}
I know how to do an initscript, but the problem is how to actually stop the process ?
I first thought to retrieve the PID with something like:
ps aux | grep "my_service"
Then I found a better idea, still with the PID: Storing it on a file in order to retrieve it when trying to stop the service.
Definitely too dirty and unsafe, I eventually thought about using sockets to enable the "stop" process to tell the actual process to shut down.
I would like to know how this is usually done ? Or rather what is the best way to do it ?
I checked some of the files in the init.d and some of them use PID files but with a particular command "start-stop-daemon". I am a bit suspicious about this method which seems unsafe to me.
If you have a utility like start-stop-daemon available, use it.
start-stop-daemon is flexible and can use 4 different methods to find the process ID of the running service. It uses this information (1) to avoid starting a second copy of the same service when starting, and (2) to determine which process ID to kill when stopping the service.
--pidfile: Check whether a process has created the file pid-file.
--exec: Check for processes that are instances of this executable
--name: Check for processes with the name process-name
--user: Check for processes owned by the user specified by username or uid.
The best one to use in general is probably --pidfile. The others are mainly intended to be used in case the service does not create a PID file. --exec has the disadvantage that you cannot distinguish between two different services implemented by the same program (i.e. two copies of the same service). This disadvantage would typically apply to --name also, and, additionally, --name has a chance of matching an unrelated process that happens to share the same name. --user might be useful if your service runs under a dedicated user ID which is used by nothing else. So use --pidfile if you can.
For extra safety, the options can be combined. For example, you can use --pidfile and --exec together. This way, you can identify the process using the PID file, but don't trust it if the PID found in the PID file belongs to a process that is using the wrong executable (it's a stale/invalid PID file).
I have used the option names provided by start-stop-daemon to discuss the different possibilities, but you need not use start-stop-daemon: the discussion applies just as well if you use another utility or do the matching manually.
I have a few work flows where I would like R to halt the Linux machine it's running on after completion of a script. I can think of two similar ways to do this:
run R as root and then call system("halt")
run R from a root shell script (could run the R script as any user) then have the shell script run halt after the R bit completes.
Are there other easy ways of doing this?
The use case here is for scripts running on AWS where I would like the instance to stop after script completion so that I don't get charged for machine time post job run. My instance I use for data analysis is an EBS backed instance so I don't want to terminate it, simply suspend. Issuing a halt command from inside the instance is the same effect as a stop/suspend from AWS console.
I'm impressed that works. (For anyone else surprised that an instance can stop itself, see notes 1 & 2.)
You can also try "sudo halt", as you wouldn't need to run as a root user, as long as the user account running R is capable of running sudo. This is pretty common on a lot of AMIs on EC2.
Be careful about what constitutes an assumption of R quitting - believe it or not, one can crash R. It may be better to have a separate script that watches the R pid and, once that PID is no longer active, terminates the instance. Doing this command inside of R means that if R crashes, it never reaches the call to halt. If you call it from within another script, that can be dangerous, too. If you know Linux well, what you're looking for is the PID from starting R, which you can pass to another script that checks ps, say every 1 second, and then terminates the instance once the PID is no longer running.
I think a better solution is to use the EC2 API tools (see: http://docs.amazonwebservices.com/AWSEC2/latest/APIReference/ for documentation) to terminate OR stop instances. There's a difference between the two of these, and it matters if your instance is EBS backed or S3 backed. You needn't run as root in order to terminate the instance - the fact that you have the private key and certificate shows Amazon that you're the BOSS, way above the hoi polloi who merely have root access on your instance.
Because these credentials can be used for mischief, be careful about running API tools from a given server, you'll need your certificate and private key on the server. That's a bad idea in the event that you have a security problem. It would be better to message to a master server and have it shut down the instance. If you have messaging set up in any way between instances, this can do all the work for you.
Note 1: Eric Hammond reports that the halt will only suspend an EBS instance, so you still have storage fees. If you happen to start a lot of such instances, this can clutter things up. Your original question seems unclear about whether you mean to terminate or stop an instance. He has other good advice on this page
Note 2: A short thread on the EC2 developers forum gives advice for Linux & Windows users.
Note 3: EBS instances are billed for partial hours, even when restarted. (See this thread from the developer forum.) Having an auto-suspend close to the hour mark can be useful, assuming the R process isn't working, in case one might re-task that instance (i.e. to save on not restarting). Other useful tools to consider: setTimeLimit and setSessionTimeLimit, and various checkpointing tools (I have a Q that mentions a couple). Using an auto-kill is useful if one has potentially badly behaved code.
Note 4: I recently learned of the shutdown command in package fun. This is multi-platform. See this blog post for commentary, and code is here. Dangerous stuff, but it could be useful if you want to adapt to Windows. I haven't tried it, though.
Update 1. Three more ideas:
You could use .Last() and runLast = TRUE for q() and quit(), which could shut down the instance.
If using littler or a script that invokes the script via Rscript, the same command line functions could be used.
My favorite package of today, tcltk2 has a neat timer mechanism, called tclTaskSchedule() that can be used to schedule the execution of an expression. You could then go crazy with the execution of stuff just before a hourly interval has elapsed.
system("echo 'rootpassword' | sudo halt")
However, the downside is having your root password in plain text in the script.
AFAIK those ways you mentioned are the only ones. In any case the script will have to run as root to be able to shut down the machine (if you find a way to do it without root that's possibly an exploit). You ask for an easier way but system("halt") is just an additional line at the end of your script.
sudo is an option -- it allows you to run certain commands without prompting for any password. Just put something like this in /etc/sudoers
<username> ALL=(ALL) PASSWD: ALL, NOPASSWD: /sbin/halt
(of course replacing with the name of user running R) and system('sudo halt') should just work.
I have a bit of a problem with Xen. Each time I try to run xm create I get the following error:
dom0:~# xm create -c staros.xm
Using config file "./staros.xm". Started domain StarOS-3 xenconsole: Could not read tty from store: No such file or directory
Is this familiar to anyone?
I believe my config is in order. At first I suspected the path to qemu-dm wasn't set correctly.
The error you are describing could mean two things:
It is documenting a well known race in xenstore
The psuedo TTY needed to attach to a domain's console is stored in xenstore in several places. The Xen console client establishes an inotify style watch on that value, so that it can reconnect to the console if the backing file descriptor happens to change. However, takes a few seconds for that information to be populated in xenstore from the time that the domain is initially created.
If you post the output of xm info, it would be easy to see if you are dealing with a well known race.
The backing psuedo terminal can't be created
Common reasons for this would be /dev/pts not being mounted. If you run xenstore-ls /local/domain/{domain_id} after starting the domain without the -c option, you will see the contents of the store for that domain. Look for the line (near the bottom) that says
tty="/dev/pts/{pty}"
Verify that the pty does, in fact, exist.
The xen console daemon uses two actual file descriptors to make it happen. The first is a psuedo file descriptor (obtained via xs_fileno()) on that specific piece of information in the node, so it can poll() to see if that information changes. The second is a real FD returned from open() (yes, O_NONBLOCK is passed) which actually reads/writes to the psuedo tty.
It looks like it's not even finding the psuedo FD from xenstore, which means the backing pty is likely existentially challenged.
// Java programmers, when I mean method, I mean a 'way to do things'...
Hello All,
I'm writing a log miner script to monitor various log files at my company, It's written in Perl though I have access to Python and if I REALLY need to, C (though my company doesn't like binary files). It needs to be able to go through the last 24 hours, take the log code and check it if we should ignore or email the appropriate people (me). The script would run as a cron job on Solaris servers. Now here is what I had in mind (this is only pseudo-ish... and badly written pesudo)
main()
{
$today = Get_Current_Date();
$yesterday = Subtract_One_Day($today);
`grep $yesterday '/path/to/log' > /tmp/log` # Get logs from previous day
`awk '{print $X}' > /tmp/log_codes`; # Get Log Code
SubRoutine_to_Compare_Log_Codes('/tmp/log_codes');
}
Another thought was to load the log file into memory and read it in there... that is all fine and dandy except for a two small problems.
These servers are production servers and serve a couple million customers...
The Log files average 3.3GB (which are logs for about two days)
So not only would grep take a while to go through each file, but It would use up CPU and Memory in the process which need to be used elsewhere. And loading into memory a 3.3GB file is not of the wisest ideas. (At least IMHO). Now I had a crazy idea involving assembly code and memory locations but I don't know SPARC assembly sooo flush that idea.
Anyone have any suggestions?
Thanks for reading this far =)
Possible solutions: 1) have the system start a new log file every midnight -- this way you could mine the finite-size log file of the previous day at a reduced priority; and 2) modify the logging system so that it automatically extracts certain messages for further processing on the fly.