Terminate and restart Google DataLab instance? - jupyter-notebook

I am finding that, when working with larger datasets, the kernel may die, something I also experience on my local machine. Sometimes it comes back and sometimes not. Even the Tree panel won't respond when I try to terminate an errant kernel, e.g. "restart" does not work and the server itself seems to die, so the tree view won't respond or refresh. On my local machine I just kill the terminal instance and start over.
What is the "proper" way to restart everything?
FWIW, the instance seems pegged at 150% CPU utilization at the moment.
Related: is there any way to allow long running stuff to work?
I am trying to use a report generator (pandas-profiling) on a 2 million record dataset. It works on my local machine.

Found it here: https://cloud.google.com/datalab/getting-started
FWIW, these commands can also be run from the command line shell on the Cloud Console page (see https://cloud.google.com/shell/docs/), without the SDK on your machine. You need to modify the commands slightly since you will already be logged into your project.
Stopping/starting VM instances
You may want to stop a Cloud Datalab managed VM instance to avoid incurring ongoing charges. To stop a Cloud Datalab managed machine instance, go to a command prompt, and run:
$ gcloud auth login
$ gcloud config set project <YOUR PROJECT ID>
$ gcloud preview app versions stop main
After confirming that you want to continue, wait for the command to complete, and make sure that the output indicates that the version has stopped. If you used a non-default instance name when deploying, please use that name instead of "main" in the stop command, above (and in the start command, below).
For restarting a stopped instance, run:
$ gcloud auth login
$ gcloud config set project <YOUR PROJECT ID>
$ gcloud preview app versions start main

Related

airflow logs: break long logs into smaller multiple files

I see that airflow logs are stored at
base_log_folder/dag_id/task_id/date_time/1.log
i.e:
base_log_folder/dag_id={dag_id}/run_id={run_id}/task_id={task_id}/attempt={try_number}.log
Sometimes my logs are huge and I know it's not a good idea to check them from the web UI, because Chrome can't handle logs of that size.
I have access to the server and can check the logs there.
So how can I break the logs into smaller files, i.e.:
{try_number}_1.log
{try_number}_2.log
{try_number}_3.log
...
I also noted that the log file {try_number}.log is only created when the task is completed.
While the task is running I can check the logs in the web UI, but I don't see any file in the corresponding log folder.
So I need two things for logging from the server side:
break large log files into smaller files
see the log file live while the task is running, not only after the task is completed
In Airflow 2.4.0 there is an option to view the full log or only the first fragment, so huge logs are not loaded automatically.
Starting with Airflow 2.5.0, the web UI also auto-tails logs (PR).
Airflow does show live logs. If you set, for example, a Sensor task that pokes a resource, you will see the poking attempts in the log while the task is running. It's important to note that there are local logs and remote logs (docs):
In the Airflow UI, remote logs take precedence over local logs when remote logging is enabled. If remote logs can not be found or accessed, local logs will be displayed. Note that logs are only sent to remote storage once a task is complete (including failure). In other words, remote logs for running tasks are unavailable (but local logs are available).
Huge logs are often a sign of not using log levels. If you have entries that are only relevant for debugging, log them at DEBUG rather than INFO; that way you have better control over the log size displayed in the UI via the AIRFLOW__LOGGING__LOGGING_LEVEL variable.
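If you still need to break an already-written log file into smaller pieces on the server side, a plain Python sketch along these lines could do it (the chunk size and the {try_number}_N.log naming come from the question, not from any Airflow feature, and the example path is a placeholder):

import os

def split_log(path, lines_per_chunk=100000):
    # Split e.g. attempt=1.log into attempt=1_1.log, attempt=1_2.log, ...
    base, ext = os.path.splitext(path)
    part, out = 0, None
    with open(path) as src:
        for i, line in enumerate(src):
            if i % lines_per_chunk == 0:
                if out:
                    out.close()
                part += 1
                out = open("{}_{}{}".format(base, part, ext), "w")
            out.write(line)
    if out:
        out.close()

# Hypothetical usage:
# split_log("base_log_folder/dag_id=my_dag/run_id=xyz/task_id=my_task/attempt=1.log")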

Windows Server 2016 mount shared folder as disc permanently also on restart

I have local access to a shared folder with credentials:
net use Z: \\server\folder /u:user /persistent:yes passwd
The files in this folder are to be used by other processes.
The problem is that when the machine is restarted the connection is lost: I can see that the network drive Z is there, but it does not automatically connect, so the processes no longer have access to the folder.
I have tried several solutions, to no avail:
Creating a Windows scheduled task to run a script with 'net use Z:...',
a. with my_user I get this error in Event Viewer: "The application-specific permission settings do not grant Local Activation permission for the COM Server application with CLSID"
b. with System, I get no errors but the connection is not created.
I've tried to log stdout and stderr to debug the script, but it does not even create the log file. (The script used is shown below.)
@echo off
> map.log 2>&1 (
net use Z: \\server\folder /u:user /persistent:yes passwd
)
Creating a symlink was another solution I had found, but it is not suitable for my case, because it does not allow credentials. It seems that the resource has to be either unprotected or already available locally.
Creating a startup script from the Local Group Policy Editor; I get the same error in Event Viewer.
Running 'net use Z:...' via a Python script installed as a Windows service:
import subprocess

proc = subprocess.Popen(r'net use Z: \\server\folder /u:user pass', shell=True,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
print("out:%s, err:%s" % (out, err))
If I run the Python script manually it works, while running it as a service creates a disconnected drive.
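A sketch of a debugging variant that captures the command output and appends it to an absolute path, in case the service's working directory is the reason no log file shows up (the paths and credentials here are placeholders):

import subprocess

# Hypothetical debugging variant: capture output and append it to an absolute log path,
# since a service may run with a different working directory than an interactive session.
LOG_PATH = r'C:\temp\map_drive.log'  # placeholder

proc = subprocess.run(r'net use Z: \\server\folder /u:user pass',
                      shell=True, capture_output=True, text=True)
with open(LOG_PATH, 'a') as log:
    log.write('returncode=%s\nout=%s\nerr=%s\n' % (proc.returncode, proc.stdout, proc.stderr))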
Can anyone give me advice on how to proceed?
Thanks

Google Cloud Build - how to do a build/deploy to Firebase with a stateful container that "spins down" to 0 (like cloud run)

We have most, but not all, of our build artifacts in a Git repo (Bitbucket).
Our current build looks like this and takes 30+ minutes to build/deploy to Firebase; we would like to reduce the build time.
We are not using Google Cloud Build at the moment, but before heading down that path, I want to find out if that would even be fruitful.
We have all of the code cloned from the git repo (Bitbucket), to a GCE VM.
Then 1 TB of static data is copied into a directory under the git repo area; these are artifacts needed for the deploy.
We do not want to check that 1 TB of data into the git repo: it is from a third party, it is rarely updated, and it would be far too heavy a directory to pull into developers' IDE environments, so it is pointless to do so.
We launch a build script (a bash script) on the GCE VM to build the code and deploy to Firebase; it takes about 30 minutes.
We want the builds to go faster, possibly by using Cloud Build.
With this:
a git repo
external files that need to remain in a stateful container, not copied over each time, due to the time it would take
how do we create a stateful container that would only require a git update (pull origin master) and then fire off a build/deploy to Firebase?
We want to avoid ingress traffic to the Firebase deploy using external build services where the 1TB of data that remains the same each and every time is sent to Firebase, where we would be billed.
Cloud Run containers are not stateful. GCE VMs are stateful, but that requires keeping them up 24x7x365 so that any developer anywhere can run a build, which may take only 30 minutes out of any given day, and we don't know when that will be, so leaving a VM up 24x7x365 is mostly wasteful.
We want to avoid building a stateless container where the code is checked out fresh each and every time (a git pull origin master will do) and where the 1 TB of artifacts has to be copied into the container each and every time, which takes time.
We just want to do:
git pull origin master
Fire off the build as the next step in the script
spin down the container and have it save its state for the next build, minimizing time on each run by keeping the artifacts from the previous 'git pull origin master' and preserving the 1 TB of files we copied to the container.
The ideal situation would be to have a container that is stateful, that spins down when not in use, and "spins up", or is made active for use when we need to do a build.
It would retain the previous git update (git pull origin master), and would retain all artifacts outside the git repo that we copy over. We also need shell access to the container (ssh, scp) etc.
A stateful 'Cloud Run' option would be ideal, but I don't know of such a thing (stateful containers with GCP that we can run and only be billed for runtime/compute time)
One solution is to use a VM for this. Add a startup script that does the following:
git pull origin master
Fire off the build as the next step in the script
Add this line, which stops your compute instance:
gcloud compute instances stop $(curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/id) --zone=$(curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/zone)
By the way, each time you start your VM it will run the startup script and shut down automatically. You keep your persistent disk, and thus your 1 TB of data, and you pay little thanks to the automatic stop.
If you have to wait for an external build, there are 2 solutions:
Either set a sleep timer in the startup script and shut down after it,
Or customize this tutorial -> at the end of your build, publish a message to Pub/Sub, which triggers a function that stops your instance.
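A minimal sketch of such a function, assuming a first-generation Pub/Sub-triggered Cloud Function and the google-cloud-compute client library (the project, zone, and instance names are placeholders):

from google.cloud import compute_v1

PROJECT = "my-project"      # placeholder
ZONE = "us-central1-a"      # placeholder
INSTANCE = "build-vm"       # placeholder

def stop_build_vm(event, context):
    # Pub/Sub-triggered entry point: stop the build VM when the "build finished" message arrives.
    client = compute_v1.InstancesClient()
    client.stop(project=PROJECT, zone=ZONE, instance=INSTANCE)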
EDITED to reply to comments
Here again, 2 solutions:
You can create a custom role with only the permissions required. You can see all Compute Engine roles here. If you give users access to the console, I recommend list (to view the VMs), start, and stop. Otherwise, only start and/or stop if you write a script.
You can create a private Cloud Function or Cloud Run service. Assign a service account as its identity, with enough of a role to start the VM (even if that grants more permissions than required -> that's not good practice; prefer least privilege with a custom role), and grant the role function.invoker or run.invoker to the user (depending on whether you use Functions or Cloud Run) so they can call this private endpoint and start the VM without having any rights on the VM itself (only the right to perform an HTTP call).

How to run cronjob with computer shut off (EC2 instance)

I outlined my small project in a different post - to summarize it again quickly, I am trying to do the following:
Write an R script that pulls data from a website
Schedule the R script to automatically run daily at the same time
Write / append the R script's output to a database
I am familiar with R web-scraping packages (rvest, RSelenium) for doing the first bullet. For the 2nd bullet, just today I learned how to create a crontab to run my script when I desire; however, the crontab does not run the script when my computer is off, or so I've read.
How can I have it so that the crontab runs even with my computer off? I am somewhat (not really) familiar with EC2 instances, but if I have my R script in an EC2 instance, could I schedule a crontab for the script there so that it runs with my computer off?
Thanks in advance for help!
Since cron is a service that runs on the instance you can't have it start the EC2 instance for you - it's a catch-22.
You can treat EC2 instances as computers that run in someone else's cellar (most of the time at least). You wouldn't expect a computer to run code when it's not turned on and it's exactly the same for an EC2 instance.
I suggest you consider if this is really the setup you want, it sounds to me that you'd be better served using AWS Lambda combined with one of Amazon's hosted data stores (RDS, DynamoDB, SimpleDB, or even S3). The downside here is that you're limited to JavaScript, Python, and Java and as such can't use R (well, you can, but it's messy since you'll have to package everything you need in a JS/Python/Java app and start it from there).
If you really want to run your R script on the EC2 instance you can start the instance with a lambda and then shut it down from your script. Just make sure your instance isn't set to terminate on shutdown.
Regardless of which path you choose, you will need to create a lambda and run it from a scheduled CloudWatch Event.
Then you just need to implement the lambda, either to run your script or to use the EC2 API to start the instance.
If you use the lambda to start the EC2 instance you should not use cron on the instance to run the script at a specific time, but run it on startup. Then you have your script shut down the instance when it's finished.
Here's an example Python script for starting an EC2 instance from a lambda to get you started:
import logging

import boto3

# Set up logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

# Set up a boto session to get credentials and region
session = boto3.session.Session()

# Set up EC2
ec2 = session.resource("ec2")

# The instance to start
instance_id = "i-1234567890abcd"


def lambda_handler(event, context):
    logger.info('Start handling event.')
    logger.info('Starting instance ' + instance_id)
    instance = ec2.Instance(instance_id)
    response = instance.start()
    try:
        # CurrentState is a dict like {'Code': 0, 'Name': 'pending'}; compare its Name
        current_state = response['StartingInstances'][0]['CurrentState']['Name']
    except (KeyError, IndexError):
        logger.warning('Unexpected response when starting instance: {}'.format(response))
    else:
        if current_state not in ('pending', 'running'):
            logger.warning('Instance {} is in unexpected state {} after starting'.format(instance_id, current_state))
        else:
            logger.info('Started instance ' + instance_id)
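On the instance side, the "shut itself down when the script finishes" step could look roughly like this (a sketch assuming IMDSv1 is reachable, boto3 is installed on the instance, and the instance profile allows ec2:StopInstances):

import urllib.request

import boto3

def stop_self():
    # Look up this instance's ID and region from the metadata service, then stop it.
    def metadata(path):
        url = "http://169.254.169.254/latest/meta-data/" + path
        return urllib.request.urlopen(url, timeout=2).read().decode()

    instance_id = metadata("instance-id")
    region = metadata("placement/availability-zone")[:-1]  # e.g. "eu-west-1a" -> "eu-west-1"
    boto3.client("ec2", region_name=region).stop_instances(InstanceIds=[instance_id])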

have R halt the EC2 machine it's running on

I have a few work flows where I would like R to halt the Linux machine it's running on after completion of a script. I can think of two similar ways to do this:
run R as root and then call system("halt")
run R from a root shell script (could run the R script as any user) then have the shell script run halt after the R bit completes.
Are there other easy ways of doing this?
The use case here is for scripts running on AWS where I would like the instance to stop after script completion so that I don't get charged for machine time after the job runs. The instance I use for data analysis is an EBS-backed instance, so I don't want to terminate it, simply suspend it. Issuing a halt command from inside the instance has the same effect as a stop/suspend from the AWS console.
I'm impressed that works. (For anyone else surprised that an instance can stop itself, see notes 1 & 2.)
You can also try "sudo halt", so you don't need to run R as the root user, as long as the user account running R is allowed to run sudo. This is pretty common on a lot of AMIs on EC2.
Be careful about what constitutes an assumption of R quitting - believe it or not, one can crash R. It may be better to have a separate script that watches the R pid and, once that PID is no longer active, terminates the instance. Doing this command inside of R means that if R crashes, it never reaches the call to halt. If you call it from within another script, that can be dangerous, too. If you know Linux well, what you're looking for is the PID from starting R, which you can pass to another script that checks ps, say every 1 second, and then terminates the instance once the PID is no longer running.
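For example, a small watcher along these lines could be started alongside R (a sketch, not tested; it assumes the R process's PID is passed as an argument and that the account can run halt via passwordless sudo):

import os
import subprocess
import sys
import time

pid = int(sys.argv[1])  # PID of the R process, passed in by whatever launched R

while True:
    try:
        os.kill(pid, 0)  # signal 0 only checks whether the process still exists
    except OSError:
        break            # R has exited (or crashed)
    time.sleep(1)

subprocess.call(["sudo", "halt"])  # assumes a NOPASSWD sudoers entry for /sbin/halt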
I think a better solution is to use the EC2 API tools (see: http://docs.amazonwebservices.com/AWSEC2/latest/APIReference/ for documentation) to terminate OR stop instances. There's a difference between the two of these, and it matters if your instance is EBS backed or S3 backed. You needn't run as root in order to terminate the instance - the fact that you have the private key and certificate shows Amazon that you're the BOSS, way above the hoi polloi who merely have root access on your instance.
Because these credentials can be used for mischief, be careful about running the API tools from a given server: you'll need your certificate and private key on the server, which is a bad idea in the event that you have a security problem. It would be better to send a message to a master server and have it shut down the instance. If you have messaging set up in any way between instances, this can do all the work for you.
Note 1: Eric Hammond reports that halt will only suspend an EBS instance, so you still have storage fees. If you happen to start a lot of such instances, this can clutter things up. Your original question seems unclear about whether you mean to terminate or stop an instance. He has other good advice on this page.
Note 2: A short thread on the EC2 developers forum gives advice for Linux & Windows users.
Note 3: EBS instances are billed for partial hours, even when restarted. (See this thread from the developer forum.) Having an auto-suspend close to the hour mark can be useful, assuming the R process isn't working, in case one might re-task that instance (i.e. to save on not restarting). Other useful tools to consider: setTimeLimit and setSessionTimeLimit, and various checkpointing tools (I have a Q that mentions a couple). Using an auto-kill is useful if one has potentially badly behaved code.
Note 4: I recently learned of the shutdown command in package fun. This is multi-platform. See this blog post for commentary, and code is here. Dangerous stuff, but it could be useful if you want to adapt to Windows. I haven't tried it, though.
Update 1. Three more ideas:
You could use .Last() and runLast = TRUE for q() and quit(), which could shut down the instance.
If using littler or a script that invokes the script via Rscript, the same command line functions could be used.
My favorite package of today, tcltk2, has a neat timer mechanism called tclTaskSchedule() that can be used to schedule the execution of an expression. You could then go crazy with the execution of stuff just before an hourly interval has elapsed.
system("echo 'rootpassword' | sudo halt")
However, the downside is having your root password in plain text in the script.
AFAIK those ways you mentioned are the only ones. In any case the script will have to run as root to be able to shut down the machine (if you find a way to do it without root that's possibly an exploit). You ask for an easier way but system("halt") is just an additional line at the end of your script.
sudo is an option -- it allows you to run certain commands without prompting for any password. Just put something like this in /etc/sudoers
<username> ALL=(ALL) PASSWD: ALL, NOPASSWD: /sbin/halt
(of course replacing <username> with the name of the user running R) and system('sudo halt') should just work.
