Clone GCP Notebook in another Zone - jupyter-notebook

I would like to clone a GCP Vertex AI notebook in a different availability zone. The zone the notebook was created in has not been available for several days! I can't find a way to do this from the User Managed Notebook interface. So I created a clone of the notebook's VM in the Compute Engine interface, but then I don't have a url proxy to access Jupyter Lab.
Any guidance would be appreciated.
Thanks,
Jay

When you say "The zone the notebook was created in has not been available for several days", that sounds very alarming and probably inaccurate, because it would mean a GCP zone is down at the disk or compute level (https://status.cloud.google.com/index.html shows no related alert, and I don't see this in our monitoring systems).
A Notebooks VM is composed of two disks: a boot disk and a data disk, which is mounted at /home/jupyter. Disks can't be moved to a different zone.
I would suggest creating a new notebook and copying the files over. Documentation here: https://cloud.google.com/vertex-ai/docs/workbench/user-managed/migrate
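One way to do the copy, assuming the old VM (or the Compute Engine clone you made) is still reachable over SSH, is to stage the data disk's contents through Cloud Storage; the bucket name below is a placeholder:
# On the old instance (or your clone): back up the data disk
gsutil -m cp -r /home/jupyter gs://your-backup-bucket/notebook-backup
# On the new notebook instance in a healthy zone: restore it
gsutil -m cp -r gs://your-backup-bucket/notebook-backup/jupyter/* /home/jupyter/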

Related

sharing a jupyter notebook on a server

I made a Jupyter notebook that can read HDF5 files and use some functions to analyze the data. I would like to put this notebook on a server containing different HDF5 files and make it available to people who work elsewhere. An example function would show the expression of some genes in a sample. Those people could open this notebook and add a list of specific genes to look at.
I am looking at JupyterLab, but I can see that people can read and modify the notebook. I would like that even if they modify it, when they finish, the notebook is as it was before they opened it. Do you think that is possible? I think I could do it locally with "read-only", but I don't know how to do it in JupyterLab.
I am a newbie for things related to servers. I will really appreciate your help and suggestions.
Thanks a lot :)
Some suggestions:
You could supply the notebook via a code-sharing resource like GitHub and let them clone/copy/download and use it as they see fit via their own resources. This ensures they aren't changing your source notebook. And you don't have to worry about computational resources, because they can run it where they prefer, such as on their own cluster or at a cloud center.
Typically, I'd suggest the MyBinder project at https://mybinder.org/ for what you describe, as it lets you share temporary active sessions that launch on remote servers. You set up a GitHub repository with the notebooks and either the data or a way to retrieve the data built into the notebook; when a temporary session launches, users can work through your notebook. This way they can extend, modify, and run their own data without changing your source notebook (a minimal repo sketch follows the examples below). However, the resources are limited since the service is free, see here.
Examples:
solve for sediment transport
Resolving the measurement uncertainty paradox in ecological management
A quick introduction to RNAseq
bendit-binder
blast-binder
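A minimal Binder-ready repository only needs the notebook plus a dependency file; the repository name and package list below are placeholders:
git init my-analysis && cd my-analysis
# add the notebook, then pin its dependencies so Binder can build the image
printf 'h5py\npandas\nmatplotlib\n' > requirements.txt
git add analysis.ipynb requirements.txt
git commit -m "Binder-ready notebook"
# after pushing to GitHub, launch via https://mybinder.org/v2/gh/<user>/my-analysis/HEAD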
I've seen others use Code Ocean, see an example here.
There are other national/government-funded centers that allow similar hosting of services and resources that can be shared with others. CyVerse is one that is now running in the United States and several other places, such as CyVerseUK in association with the Earlham Institute, and elsewhere. They offer notebooks served via their VICE apps in their Discovery Environment. Their resource allows more computational power and storage than the free, public MyBinder service.
You can use the Mercury framework. It is an open-source tool that converts Python notebooks into interactive documents: web apps, reports, dashboards, slides.
It requires a YAML header in the first cell of the notebook. In the header you define how the notebook is presented in Mercury (title, description, show-code). Additionally, you can add widgets to the notebook. Widgets are directly connected to variables in the notebook. The variables whose values are controlled by widgets should be in a separate cell.
The YAML header
There are two widgets defined in the YAML header below. The name and points widgets have the same names as variables in the code. Users can change the widget values, click Run, and the notebook will be executed with the new values.
---
title: My notebook
description: Notebook with plot
show-code: False
params:
    name:
        input: text
        label: What is your name?
        value: Piotr
    points:
        input: slider
        label: How many points?
        value: 200
        min: 10
        max: 250
---
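The notebook itself then defines matching variables in a separate cell, which Mercury overwrites with the widget values at run time; a minimal sketch of that cell's contents, matching the header above:
name = "Piotr"   # overwritten by the text widget
points = 200     # overwritten by the slider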
The Web Application from the Notebook
The final notebook, after execution, can be downloaded as a standalone HTML or PDF file.
Deployment
The Mercury framework is built on top of Django. It can be deployed on any machine with Python available. Deployment to Heroku is very simple and can be done with two commands (you need to have the Heroku CLI tool):
heroku create my-amazing-web-app-name-here
git push heroku main
Please make sure a Procfile is created for the Heroku deployment with:
web: mercury run 0.0.0.0:$PORT
You can also deploy Mercury with docker-compose; please check the docs for more details. There is also an article on how to share a Jupyter notebook with non-programmers using the Mercury tool.
You can install a local GitLab server instead of publishing on GitHub, then publish your Jupyter notebooks in GitLab and share your server URL with people. Our company runs everything internally this way.

Really no possible way to privately share libraries in Azure Notebooks?

I'm looking into which service to choose when it comes to Jupyter notebooks hosted in the cloud.
Google Colaboratory (GCL) seems to be able to share privately (with people with Google Apps accounts) and to collaborate with them in real time.
However, for me, Azure Notebooks (AZNB) has some benefits that I prefer over GCL, but I cannot NOT have the sharing and collaboration features. They seem like such obvious features that I'm posing the question here: am I missing something right in front of me?
Note: For AZNB, I've found the "share" option, which forces the notebook to be public, and the "clone" option, which lets others work on a copy of my library. This is not what I am looking for.
For now (29 Sept 2018) there is no official way.
You can track the discussion here:
https://github.com/Microsoft/AzureNotebooks/issues/329
I can suggest one option. The Azure Notebooks service allows use of the terminal console and an aznbsetup.sh file, and has git preinstalled.
You can put your notebooks in a private git repo, send the link and credentials to your collaborators, and give them some manual instructions:
Create empty library
Open the terminal
Run cd ~/library and git clone your_private_repo (a sketch of these commands follows this list)
All changes can then be handled with standard git operations: pull, push, etc.
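In the terminal, the clone step might look like this (the repository URL and token are placeholders):
cd ~/library
# clone a private repo; embedding a personal access token avoids an interactive prompt
# (cloning into "." assumes the library directory is empty)
git clone https://username:ACCESS_TOKEN@github.com/yourorg/private-notebooks.git .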
For auto-updates you can provide an aznbsetup.sh with:
#!/bin/bash
cd ~/library
git pull
It's worked for me.

Live migration on Openstack

I'm working on a project on OpenStack. I have installed OpenStack by creating two virtual machines, one for the controller node and the other for the compute node.
Actually, I want to test an example of live migration on OpenStack, and I have found a video which describes the approach. As the video shows, I need to have two compute nodes, and I want to know whether I can just create a second compute node now, or whether this second compute node should have been created during the installation of OpenStack.
This is the link of the video that I have watched: https://www.youtube.com/watch?v=_4vJUYFGbEM
Thank you
It doesn't matter when you add the compute nodes (during the install or later on). Please also remember that live migration piggybacks on the hypervisor, so depending on the hypervisor one uses, this may or may not be possible.
Please look at http://docs.openstack.org/admin-guide/compute-configuring-migrations.html#section-configuring-compute-migrations to ensure that the migration capability exists.
It simply boils down to a few things:
The storage is not moved in a live migration, so if you have a VM with instance storage, you will need a shared file system like NFS. If the instance is backed by a Cinder volume, you will be able to do the migration without shared storage.
The nova-compute service needs to be installed on the destination.
The hypervisor version should be the same.
I hope this clarifies.
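Once the prerequisites are in place, triggering the migration is a single command; the instance and host names below are placeholders, and the exact CLI syntax depends on your OpenStack release:
# confirm nova-compute is up on both nodes
openstack compute service list
# live-migrate an instance to the second compute node
openstack server migrate --live compute2 my-instance
# older releases used the nova client instead:
# nova live-migration my-instance compute2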
Either works. OpenStack allows you to dynamically add and remove compute nodes from a cloud environment.
Please refer to http://docs.openstack.org/admin-guide/compute-configuring-migrations.html for extra details.
Live migration of light instances can be done over the network without shared storage, but for heavy instances shared storage or a shared volume is preferred. Since you have two compute nodes, their Nova storage should be shared storage.
Long answer short, from my perspective:
You can add or remove a compute node at any time from an OpenStack installation.
To add a compute node, follow the installation guide for adding a new compute node right from the environment setup.
Also, don't forget to install the networking components on your new compute node.

How do I access data on my EBS Volume from R-Studio Server on Ubuntu EC2 Instance

Not a programming question, but definitely related to big data analysis.
I have set up RStudio Server on an Ubuntu EC2 instance for the first time and successfully opened RStudio Server in my browser. I also have the PuTTY SSH client.
I had a file in an S3 bucket. I passed this command to bring it from S3 to my EBS volume:
s3cmd get s3://data-analysis/input-data/filename.csv .
I assume this command downloads the file from S3 onto the EBS volume.
1) How do I set the path in RStudio Server to my mounted EBS volume?
2) Why do I not see the contents of my EBS volume in the RStudio Files pane (bottom right side)?
I also tried to list the contents of my volume over SSH using this:
$ ls /dev/xvdal
I have scoured the internet looking for help on this but have not found the nuts-and-bolts details for this problem anywhere. Please help!
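A few shell checks can narrow this down; note that ls on a device node such as /dev/xvda1 lists only the device entry itself, never the files on it. The paths below are the usual defaults but may differ on your instance:
pwd                    # s3cmd's trailing "." saved the file in the current directory
ls -lh ~/filename.csv  # check the SSH user's home directory for the download
lsblk                  # show block devices and where each one is mounted
df -h /home            # show which device backs the home directory RStudio sees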

use julia language without internet connection (mirror?)

Problem:
I would like to make Julia available to our developers on our corporate network, which has no internet access at all (no proxy), due to sensitive data.
As far as I understand, Julia is designed to use GitHub.
For instance julia> Pkg.init() tries to access:
git://github.com/JuliaLang/METADATA.jl
Example:
I solved this problem for R by creating a local CRAN repository (rsync) and setting up a local webserver.
I also solved this problem for python the same way by creating a local PyPi repository (bandersnatch) + webserver.
Question:
Is there a way to create a local repository for metadata and packages for julia?
Thank you in advance.
Roman
Yes, one of the benefits of using the Julia package manager is that you should be able to fork METADATA and host it anywhere you'd like (and keep a branch where you can actually check new packages before allowing your clients to update). You might be one of the first people to actually set up such a system, so expect that you will need to submit some issues (or better yet, pull requests) in order to get everything working smoothly.
See the extra arguments to Pkg.init() where you specify the METADATA repo URL.
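For example, pointing Pkg.init() at an internal mirror could look like the following (the URL is a placeholder, and this applies to the old package manager of Julia 0.x):
julia -e 'Pkg.init("git://git.internal.example.com/METADATA.jl")'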
If you want a simpler solution to manage, I would also think about a two-tier setup where you install packages on one system (connected to the internet) and then copy the resulting ~/.julia directory to the restricted system. If the packages you use have binary dependencies, you might run into problems if you don't have similar systems on both sides, or if some of the dependencies are installed globally, but Pkg.build("Pkgname") might be helpful.
This is how I solved it (for now), using the second suggestion by ivarne. I use a two-tier setup: two networks, one connected to the internet (office network) and one air-gapped network (development network).
System information: openSuSE-13.1 (both networks), julia-0.3.5 (both networks)
Tier one (office network)
installed julia on an NFS share, /sharename/local/julia.
soft linked /sharename/local/bin/julia to /sharename/local/julia/bin/julia
appended /sharename/local/bin/ to $PATH using a script in /etc/profile.d/scriptname.sh
created /etc/gitconfig on all office network machines: [url "https://"] insteadOf = git:// (to solve proxy server problems with GitHub; an equivalent one-line command is shown after this list)
now every user on the office network can simply run julia
Pkg.add("PackageName") is then used to install various packages.
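The /etc/gitconfig entry can also be written with git itself, system-wide:
git config --system url."https://".insteadOf "git://"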
The two networks are connected periodically (with certain security measures ssh, firewall, routing) for automated data exchange for a short period of time.
Tier two (development network)
installed julia on an NFS share, as in tier one.
When the networks are connected I use a shell script with rsync -avz --delete to synchronize the .julia directory of tier one to tier two for every user.
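That sync step, sketched as a script (the paths and user names are hypothetical):
#!/bin/bash
# run while the two networks are connected; mirrors each user's package directory
for user in alice bob; do
    rsync -avz --delete /tier1/home/$user/.julia/ /tier2/home/$user/.julia/
done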
Conclusion (so far):
It seems to work reasonably well.
As ivarne suggested, there are problems if a package is installed AND something more than just file copying is done (compilation?) on tier one; the package won't run on tier two. But this can be resolved with Pkg.build("Pkgname").
PackageCompiler.jl seems like the best tool for using modern Julia (v1.8) on secure systems. The following approach requires a build server with the same architecture as the deployment server, something your institution probably already uses for developing containers, etc.
Build a sysimage with PackageCompiler's create_sysimage()
Upload the build (sysimage and depot) along with the Julia binaries to the secure system
Alias a script to julia, similar to the following example:
#!/bin/bash
set -Eeu -o pipefail
# ignore any system-wide load path; point Julia at the bundled project and depot
unset JULIA_LOAD_PATH
export JULIA_PROJECT=/Path/To/Project
export JULIA_DEPOT_PATH=/Path/To/Depot
export JULIA_PKG_OFFLINE=true
# load the prebuilt sysimage and forward all arguments to the real binary
/Path/To/julia -J/Path/To/sysimage.so "$@"
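For the build step on the connected machine, a minimal invocation might look like this (the package list and output path are placeholders):
julia --project=/Path/To/Project -e 'using PackageCompiler; create_sysimage(["DataFrames", "CSV"]; sysimage_path="sysimage.so")'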
I've been able to run a research pipeline on my institution's secure system, for which there is a public version of the approach.
