Saving Google Colab Notebook on Github - jupyter-notebook

Checkpoints in Google Colab
One of the answers to the question above says that, to save checkpoints in Google Colab, we should push the notebook to GitHub. Does pushing to GitHub also save all the files that are created in the VM environment of the Colab notebook? If not, please suggest an alternative solution. Thanks in advance.

Files in the VM environment will not be saved to GitHub. To save specific files, you'll need to write a script that saves them to your local machine, for example:
https://gist.github.com/korakot/e7f04fa7bd3a8a67b729da279ab1713a
Or you can save the files using the Colab Drive integrations:
https://datascience.stackexchange.com/questions/27964/how-to-download-dynamic-files-created-during-work-on-google-colab
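As a rough illustration of the "write a script" route, here is a minimal sketch that zips a folder of generated files and then triggers a browser download. The function names and the example paths are made up for illustration; `files.download` comes from the `google.colab` package and only works inside a Colab session.

```python
import shutil
from pathlib import Path


def archive_outputs(src_dir, archive_base):
    """Zip everything under src_dir and return the path of the .zip file."""
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(src_dir))


def download_in_colab(path):
    """Trigger a browser download of `path`; only works inside Colab."""
    from google.colab import files  # importable only in a Colab runtime
    files.download(path)
```

In a Colab cell this might be used as `download_in_colab(archive_outputs("/content/outputs", "/content/outputs_backup"))`, where both paths are hypothetical.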

Related

google colab does not import modules

I was trying to find a way to install modules permanently. I came across a post that explains how to install packages to Google Drive, mount the drive, and then use "sys.path.append" to tell Python where to look for the new package.
This method works as expected when the module is imported directly from code written in the notebook itself.
However, when I tried to run an existing project from its .py file (using "!python myCode.py"), the "sys.path.append" call had no effect: the script could not find the modules installed in Google Drive.
In short, with the approach in the link above, packages can only be imported from code written directly in the notebook. It did not work for my .py files, i.e. when I used "!python myCode.py".
Any suggestions on how to solve this problem? Do you have the same problem as well?
thanks,
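One likely explanation is that "!python myCode.py" starts a separate Python process, and changes made to sys.path inside the notebook are not passed on to it. Environment variables are inherited, though, so setting PYTHONPATH for the child process is one way around it. The sketch below shows the idea with the standard library only; the function name is made up for illustration.

```python
import os
import subprocess
import sys


def run_script_with_extra_path(script, extra_path):
    """Run `script` in a new interpreter that can import from `extra_path`."""
    env = os.environ.copy()
    existing = env.get("PYTHONPATH")
    # Prepend the extra directory, keeping any PYTHONPATH that is already set.
    env["PYTHONPATH"] = (
        extra_path if not existing else extra_path + os.pathsep + existing
    )
    return subprocess.run(
        [sys.executable, script], env=env, capture_output=True, text=True
    )
```

In a notebook cell the same effect can probably be had with something like `!PYTHONPATH=/content/drive/MyDrive/packages python myCode.py`, where the Drive folder is a hypothetical install location. Another option is to put the sys.path.append call at the top of myCode.py itself.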

Saving R output in Google Colab

How do I save R output into a file in Google Colab? It could be saved on Google Drive or on my local drive; either would work.
For example, if I wanted to save a list of R objects in an RDS file, I would normally use something like this in RStudio.
saveRDS(list(a, b, c, d), file = "C:\\sim1.rds")
I am looking to do something similar in Google Colab.
I recently found the answer, so I am writing it here in case it is useful for others.
To save output to Google Drive, we first need to mount it using the following.
from google.colab import drive
drive.mount('/content/drive')
Then we can change into the MyDrive folder using the following.
cd /content/drive/MyDrive
Now that we are in MyDrive, we can run the code and save the outputs there. Then we can download them to our laptop.
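If the output file was already written somewhere else in the session (for example under /content), it can also be copied into the mounted folder afterwards. This is only a sketch, assuming Python cells are available and Drive is mounted at /content/drive as above; the function name is made up for illustration.

```python
import shutil
from pathlib import Path


def persist_to_drive(local_file, drive_dir="/content/drive/MyDrive"):
    """Copy a file from the session storage into the mounted Drive folder."""
    dest_dir = Path(drive_dir)
    if not dest_dir.is_dir():
        # Usually means drive.mount('/content/drive') has not been run yet.
        raise FileNotFoundError(f"{dest_dir} not found; is Drive mounted?")
    return Path(shutil.copy2(local_file, dest_dir))
```

For instance, `persist_to_drive("/content/sim1.rds")` would place a copy of a hypothetical sim1.rds in MyDrive.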

how to load csv files in google colab for R?

How do I load CSV files in Google Colab for R?
For Python there are many answers, but can someone explain how a file can be imported in R on Google Colab?
Assuming you mean "get a CSV file from my local system into the Colaboratory environment", and not just importing it from inside the Colab file paths as per Korakot's suggestion (your question wasn't very clear), I think you have two main options:
1. Upload a file directly through the file browser in the side menu.
Just click the upload icon there and upload your file. Then you can run the normal R import functions using the internal path, like korakot put in this answer.
2. Connect your Google Drive.
Assuming you're using a notebook like the one created by Thong Nguyen, you can use a Python call to mount your own Google Drive, like this one:
cat(system('python3 -c "from google.colab import drive\ndrive.mount(\'/content/drive\')"', intern=TRUE, wait=TRUE), sep='\n')
... which will initiate the login process to Google Drive and allow you to access your Drive files as if they were folders in Colab. There's more info about this process here.
If you use Colab with R as the runtime type (so Python code would not work), you could also simply upload the file as MAIAkoVSky suggested in step 1 and then import it with
data <- read.csv('/content/your-file-name-here.csv')
The file path can also be copied by right-clicking the file in the interface.
Please be aware that the files will disappear once you disconnect from Colab. You will need to upload them again for the next session.
You can call the read.csv function like this:
data = read.csv('sample_data/mnist_test.csv')

Google Dataproc with Jupyter - Downloading files generated by notebook

We're using Google Cloud Dataproc for quick data analysis, and we use Jupyter notebooks a lot. A common case for us is to generate a report which we then want to download as a CSV.
In a local Jupyter env this is possible using FileLink, for example:
from IPython.display import FileLink
df.to_csv(path)
FileLink(path)
This doesn't work with Dataproc because the notebooks are kept on a Google Storage bucket and the links generated are relative to that prefix, for example http://my-cluster-m:8123/notebooks/my-notebooks-bucket/notebooks/my_csv.csv
Does anyone know how to overcome this? Of course we can scp the file from the machine but we're looking for something more convenient.
To share the report, you can save it to Google Cloud Storage (GCS) instead of a local file.
To do so, convert your pandas DataFrame to a Spark DataFrame and write it to GCS:
from pyspark import SparkContext
from pyspark.sql import SQLContext
sparkDf = SQLContext(SparkContext.getOrCreate()).createDataFrame(df)
sparkDf.write.csv("gs://<BUCKET>/<path>")

How to run a downloaded Jupyter notebook on Google Colaboratory?

I am using Google Colaboratory and GitHub.
I create a new Google Colab notebook, and I clone my GitHub project into it using a simple !git clone <github_link> in the notebook.
Now, I have a Jupyter notebook in my GitHub project that I need to run on Google Colab. How do I do that?
There is no real need to download the notebook. If your notebook is already in a GitHub repo, all you need to do is:
Open your notebook file on GitHub in any browser (so the URL ends in .ipynb).
Change the URL from https://github.com/full_path_to_your_notebook to https://colab.research.google.com/github/full_path_to_your_notebook
And that should work.
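The URL change described above can be sketched as a small helper. The function name and the example URL in the usage note are made up for illustration; the two prefixes are the ones given in the steps.

```python
GITHUB_PREFIX = "https://github.com/"
COLAB_PREFIX = "https://colab.research.google.com/github/"


def github_to_colab_url(github_url):
    """Turn a GitHub notebook URL into the matching 'open in Colab' URL."""
    if not github_url.startswith(GITHUB_PREFIX):
        raise ValueError("expected a URL starting with " + GITHUB_PREFIX)
    if not github_url.endswith(".ipynb"):
        raise ValueError("expected a URL ending in .ipynb")
    # Swap the site prefix and keep the rest of the path unchanged.
    return COLAB_PREFIX + github_url[len(GITHUB_PREFIX):]
```

For example, `github_to_colab_url("https://github.com/user/repo/blob/main/demo.ipynb")` gives `https://colab.research.google.com/github/user/repo/blob/main/demo.ipynb`.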
You can upload the notebook to Google Drive first, then open it from there.
Go to drive.google.com
Go into the directory “Colab Notebooks”
Choose “New” > “File upload”
After uploading, click the new file
Choose “Open with Colaboratory” at the top
The two most practical ways both go through the Google Drive web interface.
The first method is what @Korakot Choavavanich described.
The advantage of this method is that it provides a search window for finding your file in your Google Drive storage.
The second method is even more convenient, and maybe more appropriate for your case:
In the Google Drive web interface, navigate to the folder where your file is located, in your case inside the cloned GitHub repository.
Then:
right-click on the file | Open with | Colaboratory
Your file is then converted into a Colab notebook automatically (this takes at least half a minute).
The advantage of this method is that you can create the Colab file directly in the folder.
My tip is to create a copy of the original Jupyter file (I added "COLABO" to the file name), as the code to sync your Google Drive and save files will differ from that in a local Jupyter notebook.
One way is to connect your Google Drive to the Colaboratory notebook, as described in the following link:
Link to images within google drive from a colab notebook
After that, you can download your GitHub repo to your Google Drive location. Then browse through your Google Drive and open the notebook using Colaboratory itself.
import sys, os
sys.path.append('models/research')
sys.path.append('models/research/object_detection')
This helped me. I was also looking for a solution and found it in this Colab notebook:
https://colab.research.google.com/drive/1EQ3Lt_ez-oKTtVMebh6Tm3XSyPPOHAf3#scrollTo=oC-_mxCxCNP6
The best option I have found, if you have cloned the GitHub repo containing the .ipynb file, is to copy the code from each cell and execute it in Colab. That way you won't face any difficulties.
Upload the .ipynb file directly to Colab. Just go to Colab; among the tabs at the top there should be an upload option. Choose the file and upload it there.
This may be a new feature not mentioned in the other answers.
Right now Colab allows running Jupyter notebooks directly from GitHub, even from private repos.
Log in to your Google account
Access colab.research.google.com
Select the GitHub tab
Choose to include private repositories if needed
Go through the authentication process in the newly opened window
Select from your repos and notebooks
Then clone your repo from inside the opened notebook
