I run SAS batch jobs on a UNIX server and usually run into the problem that, in batch, I cannot overwrite SAS datasets that were created locally under my user ID, without changing the authorization level of each file in Windows. Is it possible to sign on using my user ID and password when initializing the batch job, so that I get full authorization (to my own files) in batch?
Another issue is that I don't have authorization to run UNIX commands using PIPE in a local remote session on the server, and hence cannot terminate my own sessions. It is, on the other hand, possible to run PIPE in batch, but that only allows me to terminate batch jobs. So I also wonder: is it possible to run a PIPE command in batch using my ID and password, since the batch user does not have authorization to cancel "local remote sessions" running under my user?
Example code for terminating a process:
%let processid = 6938710;
%let unixcmd = "kill &processid";
%PUT executing &unixcmd;
filename unixcmd pipe &unixcmd.;
data _null_;  /* reading the fileref is what actually executes the command */
   infile unixcmd;
   input;
run;
There's a good and complete answer to your first point on the following SAS support page.
You can use the umask Unix command to specify the default file permission policy used for the permanent datasets created during a SAS session (be it batch or not).
If you are launching a Unix script which invokes a SAS batch session, you can put a umask command just before the sas invocation.
Otherwise you can adopt a more permanent solution including the umask command in one of the places specified in the above SAS support article.
You are probably interested in something like:
umask 002
This will assign a rw-rw-r-- file permission to all new datasets.
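For example, a minimal wrapper script might look like this (the program and log paths are placeholders, and it assumes the sas executable is on the PATH):
#!/bin/sh
# make files created by this batch run group-writable (rw-rw-r--)
umask 002
# hypothetical program and log paths - adjust to your installation
sas /home/myuser/programs/myjob.sas -log /home/myuser/logs/myjob.log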
I outlined my small project in a different post - to summarize it again quickly, I am trying to do the following:
Write an R script that pulls data from a website
Schedule the R script to automatically run daily at the same time
Write / append the R script's output to a database
I am familiar with R web-scraping packages (rvest, rselenium) for doing the first bullet. For the 2nd bullet, just today I learned how to create a crontab to run my script when I desire; however, the crontab does not run the script when my computer is off, or so I've read.
How can I have it such that the crontab is run even with my computer off? I am somewhat (not really) familiar with EC2 instances, but if I have my R script in an EC2 instance, could I schedule a crontab for the script there and then it would run with my computer off?
Thanks in advance for help!
Since cron is a service that runs on the instance you can't have it start the EC2 instance for you - it's a catch-22.
You can treat EC2 instances as computers that run in someone else's cellar (most of the time at least). You wouldn't expect a computer to run code when it's not turned on and it's exactly the same for an EC2 instance.
I suggest you consider if this is really the setup you want, it sounds to me that you'd be better served using AWS Lambda combined with one of Amazon's hosted data stores (RDS, DynamoDB, SimpleDB, or even S3). The downside here is that you're limited to JavaScript, Python, and Java and as such can't use R (well, you can, but it's messy since you'll have to package everything you need in a JS/Python/Java app and start it from there).
If you really want to run your R script on the EC2 instance you can start the instance with a lambda and then shut it down from your script. Just make sure your instance isn't set to terminate on shutdown.
Regardless of which path you choose, you will need to create a lambda and run it from a scheduled CloudWatch Event.
Then you just need to implement the lambda, either to run your script or to use the EC2 API to start the instance.
If you use the lambda to start the EC2 instance you should not use cron on the instance to run the script at a specific time, but run it on startup. Then you have your script shut down the instance when it's finished.
Here's an example Python script for starting an EC2 instance from a lambda to get you started:
import logging
import boto3
# Set up logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
# Set up a boto session to get credentials and region
session = boto3.session.Session()
# Set up EC2
ec2 = session.resource("ec2")
# The instance to start
instance_id = "i-1234567890abcd"
def lambda_handler(event, context):
    logger.info('Start handling event.')
    logger.info('Starting instance ' + instance_id)
    instance = ec2.Instance(instance_id)
    response = instance.start()
    try:
        # CurrentState is a dict like {'Code': 0, 'Name': 'pending'}
        current_state = response['StartingInstances'][0]['CurrentState']['Name']
    except (KeyError, IndexError):
        logger.warning('Unexpected response when starting instance: {}'.format(response))
    else:
        if current_state not in ('pending', 'running'):
            logger.warning('Instance {} is in unexpected state {} after starting'.format(instance_id, current_state))
        else:
            logger.info('Started instance ' + instance_id)
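On the instance side, if you go the start-with-a-lambda route, a minimal sketch of the run-on-startup approach could be a boot-time script along these lines (the script path is hypothetical, and it assumes the instance's shutdown behavior is set to stop rather than terminate):
#!/bin/sh
# run at boot, e.g. from /etc/rc.local or a systemd unit
Rscript /home/ec2-user/scraper.R   # hypothetical path to the scraping script
shutdown -h now                    # stops (not terminates) an EBS-backed instance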
I'm executing a batch file inside an R script. I'd like to run this and another large section of the R script twice using a foreach loop.
library(foreach)

foreach(i = 1:2, .combine = rbind) %do% {
  shell.exec("\\\\network\\path\\to\\batch\\script.ext")
  # ...rest of the R script...
}
One silly problem though is that this batch file generates data, and that data is connected to SQL Server localdb inside the loop. I thought at first that the script would execute the batch file, wait for it to finish and then move on. However (seems obvious in hindsight), the script instead executes the batch file, tries to grab data that hasn't been created yet (because the file isn't finished running) and then executes the batch file again before it finishes the first time.
I've been trying to find a way to delay the rest of the script from executing until the batch script has finished, but have not come up with anything yet. I'd appreciate any insights anyone has.
Use system2 instead of shell.exec. system2 calls are blocking - meaning, the function waits until the external program has finished running. On most systems this can be used directly to run scripts. On Windows, you may have to invoke rundll32 to execute a script:
cmd = c('rundll32.exe', 'Shell32.dll,ShellExecute', 'NULL', 'open', shQuote(scriptpath))
system2(cmd[1], args = cmd[-1])
Windows users may use shell, which by default has wait = TRUE and therefore causes R to wait for the command's completion. You may choose whether or not to "intern" the result directly.
On unix-like systems, use system, which also defaults to wait=TRUE.
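Applied to the loop from the question, a minimal sketch might look like this (wait = TRUE is the default and is spelled out only for emphasis):
library(foreach)
foreach(i = 1:2, .combine = rbind) %do% {
  # unlike shell.exec, this blocks until the batch script has finished
  shell('"\\\\network\\path\\to\\batch\\script.ext"', wait = TRUE)
  # ...rest of the R script, which can now safely read the generated data...
}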
If your batch file simply launches another process and terminates, then it may need to be modified to either wait for completion or return a suitable process or file indicator that can be monitored.
Can anyone show me an example of script that can be run from sqoop2 client in batch mode?
I referred to http://sqoop.apache.org/docs/1.99.2/Sqoop5MinutesDemo.html
and it says we can run sqoop2 client in batch mode using the following command
sqoop.sh client /path/to/your/script.sqoop
but that script.sqoop isn't like a sqoop1 script, so what should it look like?
The batch file is nothing but a list of the same commands you would otherwise type in interactive mode (plus comment lines starting with a pound sign).
However! Some commands require manual input, thus cannot be easily fully automated (e.g., 'create link' command). See this thread for details.
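As a sketch, a script.sqoop could look like the following, assuming a job with jid 1 was already created interactively (the host name is a placeholder, and the exact command set varies between 1.99.x releases):
# script.sqoop - the same commands as interactive mode; '#' starts a comment
set server --host sqoop2.example.com --port 12000 --webapp sqoop
show version --all
start job --jid 1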
I have a few workflows where I would like R to halt the Linux machine it's running on after completion of a script. I can think of two similar ways to do this:
run R as root and then call system("halt")
run R from a root shell script (could run the R script as any user) then have the shell script run halt after the R bit completes.
Are there other easy ways of doing this?
The use case here is for scripts running on AWS where I would like the instance to stop after script completion so that I don't get charged for machine time after the job runs. The instance I use for data analysis is EBS backed, so I don't want to terminate it, simply suspend it. Issuing a halt command from inside the instance has the same effect as a stop/suspend from the AWS console.
I'm impressed that works. (For anyone else surprised that an instance can stop itself, see notes 1 & 2.)
You can also try "sudo halt", so you don't need to run R as the root user, as long as the user account running R is capable of running sudo. This is pretty common on a lot of AMIs on EC2.
Be careful about what constitutes an assumption of R quitting - believe it or not, one can crash R. Issuing the halt command inside of R means that if R crashes, it never reaches the call to halt, and calling it from within another script can be dangerous, too. It may be better to have a separate script that watches the R PID and, once that PID is no longer active, stops the instance. If you know Linux well, what you're looking for is the PID from starting R, which you can pass to another script that checks ps, say every second, and then stops the instance once the PID is no longer running.
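A minimal sketch of such a watchdog, assuming it is started with the R process's PID as its argument and that passwordless sudo for halt is configured:
#!/bin/sh
# usage: watchdog.sh <R-pid>
# poll once a second until the R process disappears, then halt the machine
while kill -0 "$1" 2>/dev/null; do
    sleep 1
done
sudo halt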
I think a better solution is to use the EC2 API tools (see: http://docs.amazonwebservices.com/AWSEC2/latest/APIReference/ for documentation) to terminate OR stop instances. There's a difference between the two of these, and it matters if your instance is EBS backed or S3 backed. You needn't run as root in order to terminate the instance - the fact that you have the private key and certificate shows Amazon that you're the BOSS, way above the hoi polloi who merely have root access on your instance.
Because these credentials can be used for mischief, be careful about running the API tools from a given server: you'll need your certificate and private key on that server, which is a bad idea in the event that you have a security problem. It would be better to send a message to a master server and have it shut down the instance. If you have messaging set up in any way between instances, this can do all the work for you.
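For example, with the EC2 API tools installed and your credentials configured on the controlling machine, stopping an instance is a one-liner (the instance id is hypothetical):
ec2-stop-instances i-1234567890abcdef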
Note 1: Eric Hammond reports that the halt will only suspend an EBS instance, so you still incur storage fees. If you happen to start a lot of such instances, this can clutter things up. Your original question seems unclear about whether you mean to terminate or stop an instance; he has other good advice on this page.
Note 2: A short thread on the EC2 developers forum gives advice for Linux & Windows users.
Note 3: EBS instances are billed for partial hours, even when restarted. (See this thread from the developer forum.) Having an auto-suspend close to the hour mark can be useful, assuming the R process isn't still working, in case you might re-task that instance (i.e., to save the cost of restarting it). Other useful tools to consider: setTimeLimit and setSessionTimeLimit, and various checkpointing tools (I have a Q that mentions a couple). Using an auto-kill is useful if one has potentially badly behaved code.
Note 4: I recently learned of the shutdown command in package fun. This is multi-platform. See this blog post for commentary, and code is here. Dangerous stuff, but it could be useful if you want to adapt to Windows. I haven't tried it, though.
Update 1. Three more ideas:
You could use .Last() and runLast = TRUE for q() and quit(), which could shut down the instance (see the sketch after this list).
If using littler or invoking the script via Rscript, the same command-line functions could be used.
My favorite package of today, tcltk2, has a neat timer mechanism, called tclTaskSchedule(), that can be used to schedule the execution of an expression. You could then go crazy with the execution of stuff just before an hourly interval has elapsed.
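A minimal sketch of the first idea, assuming passwordless sudo for /sbin/halt is configured (see the sudoers note below):
# shut down the instance when the R session ends normally
.Last <- function() system("sudo halt")
# ...do the actual work...
quit(save = "no", runLast = TRUE)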
system("echo 'rootpassword' | sudo halt")
However, the downside is having your root password in plain text in the script.
AFAIK those ways you mentioned are the only ones. In any case the script will have to run as root to be able to shut down the machine (if you find a way to do it without root that's possibly an exploit). You ask for an easier way but system("halt") is just an additional line at the end of your script.
sudo is an option -- it allows you to run certain commands without prompting for any password. Just put something like this in /etc/sudoers
<username> ALL=(ALL) PASSWD: ALL, NOPASSWD: /sbin/halt
(of course replacing <username> with the name of the user running R) and system('sudo halt') should just work.
I'm running batch Java on an IBM mainframe under JZOS. The job creates 0 - 6 ".txt" outputs depending upon what it finds in the database. I then need to convert those files from Unix to MVS (EBCDIC), and I'm using the OCOPY command running under IKJEFT01. However, when a particular output was not created, I get a JCL error and the job ends. I'd like to check for the presence or absence of each file name and set a condition code to control whether the IKJEFT01 steps are executed, but I don't know what to use to access the Unix file pathnames.
I have resolved this issue by writing a COBOL program to check the converted MVS files and set return codes to control the execution of subsequent JCL steps. The completed job is now undergoing user acceptance testing. Perhaps it sounds like a kludge, but it does work and I'm happy to share this solution.
The simplest way to do this in JCL is to use BPXBATCH as follows:
//EXIST EXEC PGM=BPXBATCH,
// PARM='pgm /bin/cat /full/path/to/USS/file.txt'
//*
// IF (EXIST.RC = 0) THEN
//* do whatever you need to
// ENDIF
If the file exists, the step ends with CC 0 and the IF succeeds. If the file does not exist, you get a non-zero CC (256, I believe), and the IF fails.
Since there is no //STDOUT DD statement, there's no output written to JES.
The only drawback is that it is another job step, and if you have a lot of procs (like a compile/assemble job), you can run into the 255 step limit.