Modifying a Jupyter notebook in init code

Is it possible to modify the contents of a notebook in the notebook startup code? I want to run some init code and add "header" cells to every notebook on a machine based on that code's output: for instance, grabbing the hash of the current HEAD from a local git repo, or pulling a file from S3 to the local file system.
I can put a bunch of scripts, either .py or .ipy, in the ~/.ipython/profile_default/startup/ directory, and I'd like to modify the notebook that is currently being opened using those scripts (or some other scripts if that's possible).
According to the docs, the shell has already been set up when those scripts run, so I'm thinking there should be some way of accessing, at a minimum, the local path of the notebook that was opened. I could then use nbformat (github) to modify the contents.
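For what it's worth, the nbformat side is straightforward once a path is known. A minimal sketch, assuming the notebook path were somehow available (which is exactly the open question) and a hypothetical add_header_cell helper run inside a git checkout:

import subprocess
import nbformat

def add_header_cell(nb_path):
    # hypothetical helper: nb_path still has to be discovered somehow
    head = subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip()
    nb = nbformat.read(nb_path, as_version=4)
    # prepend a markdown "header" cell recording the current git HEAD
    nb.cells.insert(0, nbformat.v4.new_markdown_cell("Git HEAD: " + head))
    nbformat.write(nb, nb_path)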
Alternatively, I could use NotebookApp or ContentsManager to modify the running notebook, but I'm not exactly sure how to do that, and the notebook docs are pretty light on the actual API for those classes. This might not be possible, as the init code is executed in the kernel, which does not know what the front end is: the kernel could be connected to a console rather than a notebook, or to both a notebook and a console.
So:
Can I access the filename of the current notebook in a startup script?
Should I rather be looking to modify the notebook cells through NotebookApp, FileContentsManager, or some other internal class?
Related: there is an open issue for template files (https://github.com/jupyter/notebook/issues/332). This is not what I'm looking for; the template files are static, and I need to modify the notebook based on the result of a computation.

Related

Azure Databricks: How do we access R Scripts present on DBFS?

I'm new to Databricks. I am trying to access a .R file that is present in DBFS storage, but I cannot figure out how to do so. Any help is really appreciated.
I can read data from the storage using the /dbfs file path, and I can also source the code from the script, but I want to make edits to the script.
You need an editor to do that. For example, you can set up RStudio on your cluster and connect to it via the RStudio UI; in this case you can edit R files directly on DBFS.
But really, the simplest approach for you would be to use the Databricks CLI fs command to copy the file to your local machine, make changes in the editor of your choice, and upload the file back.
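A sketch of that round trip, assuming a hypothetical script location of dbfs:/scripts/analysis.R:

databricks fs cp dbfs:/scripts/analysis.R ./analysis.R
# edit analysis.R locally in your editor of choice, then push it back
databricks fs cp ./analysis.R dbfs:/scripts/analysis.R --overwrite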

How do I run SQL commands through Notepad++?

I am trying to start out using Notepad++ to run SQLite commands. I have tried following two brief YouTube tutorials to get me going. I can run the initial .bat file, but still cannot run the .sql file.
I have a Windows system environment Path variable set to the folder containing sqlite3.exe
"C:\Users\Adam\sqlite\"
I have saved the following file RunSQLite.bat in the folder containing sqlite3.exe
sqlite3.exe testDB.db
I have created a second file queries.sql
SELECT 34;
When I try to run queries.sql from Notepad++, using the Run command:
C:\Users\Adam\sqlite\RunSQLite.bat "$(FULL_CURRENT_PATH)"
the only file that appears to run is RunSQLite.bat, giving the output:
SQLite version 3.36.0 2021-06-18 18:36:39
Enter ".help" for usage hints.
sqlite>
Can anyone tell me where I have gone wrong?
Thanks in advance.
Running C:\Users\Adam\sqlite\RunSQLite.bat "$(FULL_CURRENT_PATH)" does exactly the same thing it would do at the shell. RunSQLite.bat does not use any arguments, so the Run command in Notepad++ is working as expected.
sqlite3 takes input from an external file with the .read command.
Path issues notwithstanding, a .bat file something like this should accomplish the task:
sqlite3.exe testDB.db ".read %1"
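With that change, the Run command from the question should work unchanged. Wrapping it in cmd /k (described in the next answer) keeps the console window open so the query output stays visible:

cmd /k C:\Users\Adam\sqlite\RunSQLite.bat "$(FULL_CURRENT_PATH)"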
Notepad++ is a text editor, so you can now use it to edit your SQL file. After selecting Language > SQL, Notepad++ will highlight SQL syntax as you type. Try typing some SQL, like
SELECT "Hi";
SELECT * FROM mydatabase WHERE id LIKE 'ID%';
You will see color, bold, and other possible formatting applied to the text you type. If you save the file as something.sql, and then load something.sql in your SQL client, the client will run the SQL commands from that file. If you have an existing somethingElse.sql file, you can open it in Notepad++, which will auto-recognize that it’s SQL and apply the syntax highlighting, allowing you to edit it and save it.
By using the Run > Run dialog, you can run an arbitrary command. For example, if your SQL client has a command-line mode accessed through sqlclient.exe, you could type
c:\path\to\sqlclient.exe $(FILE_NAME)
If you just run it, that probably won’t show you any results… but if you ran
cmd /k c:\path\to\sqlclient.exe $(FILE_NAME)
it will open a new cmd.exe Windows command prompt and show the output from that file.
If, instead of running it, you hit "Save", you can give the command a name (which will show up later in the Run menu) and/or a keyboard shortcut, so that you can easily reuse it many times.
If you want to do something more fancy, use the NppExec plugin, which includes a better batch/scripting language. Once again, you can save the NppExec script, and make it show up in the Macro menu.
If Python is a programming language you know or could learn (or if, like me, you know enough other programming languages that you can fake the Python), then the Python Script plugin will allow you to do even fancier stuff. Python is a complete programming language with many libraries that could act as an interface between your SQL source file and your database engine, and PythonScript has access to a full python2.7 interpreter. For example, you could write a script in Python which executes the commands from your SQL, grabs the results from your database engine, and displays them in Notepad++, either inline with your original SQL code or in a new text document. You are really limited only by your imagination and knowledge of Python.
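As a rough illustration of that last idea, here is a minimal standalone sketch using Python's built-in sqlite3 module; db_path and sql_path are hypothetical, and inside PythonScript you would get the current file from the editor instead:

import sqlite3

def run_sql_file(db_path, sql_path):
    conn = sqlite3.connect(db_path)
    with open(sql_path) as f:
        # naive split on ";" so each statement's rows can be printed;
        # a real SQL parser would also handle semicolons inside strings
        for statement in f.read().split(";"):
            statement = statement.strip()
            if statement:
                for row in conn.execute(statement):
                    print(row)
    conn.commit()
    conn.close()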

In Jupyter notebook/lab, how can I automatically reset variables on exit?

I have a Jupyter notebook containing sensitive data that I would like not to be cached inside the notebook. This would avoid Jupyter's tendency to mix data and code.
In a notebook I can reset all variables using
%reset
Is there any way to run this automatically on exit, or on shutdown of the notebook or server?
Or is there a command-line script that could be run over a .ipynb, e.g. in a nightly cron job, to purge the file of stored variables (or - even better - only certain variables)?
Thanks!
nbclean allows some fairly complex customization of what gets cleaned and altered in the resulting notebook. You could set up a cron job with a script running that on your schedule, or use GitHub Actions to trigger it on events such as a push.
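If stripping outputs is enough (the .ipynb file caches results in cell outputs, not live variables), a minimal nbformat-based sketch that a nightly cron job could invoke might look like this; the command-line handling is just illustrative:

import sys
import nbformat

def purge_outputs(path):
    nb = nbformat.read(path, as_version=4)
    for cell in nb.cells:
        if cell.cell_type == "code":
            # outputs and execution counts are where results live in the file
            cell.outputs = []
            cell.execution_count = None
    nbformat.write(nb, path)

purge_outputs(sys.argv[1])  # e.g. run from cron over each .ipynb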

Unable to use correct file paths in R/RStudio

Disclaimer: I am very new here.
I am trying to learn R via RStudio through a tutorial, and very early on I have encountered an extremely frustrating issue: when I try to use the read.table function, the program consistently reads my files (written as "~/Desktop/R/FILENAME") as going through the path "C:/Users/Chris/Documents/Desktop/R/FILENAME". Note that the program is treating my Desktop folder as if it were inside my Documents folder, which is preventing me from reading any files. I have already set and re-set my working directory multiple times, and even re-downloaded R and RStudio, and I still encounter this error.
When I enter the entire file path instead of using the "~" shortcut, the program is able to access the files successfully, but I don't want to have to type out the full file path every single time I need to access a file.
Does anyone know how to fix this issue? Is there any further internal issue with how my computer is viewing the desktop in relation to my other files?
The ~ tells R to look in your default (home) directory, which on Windows is your Documents folder; this is why you are getting this error. You can change the default directory in the RStudio settings or in your R profile. It just depends on how you want to set up your project. For example:
Put all the files in the working directory (getwd() will tell you the working directory for the project). Then you can just call the files by filename, and you will get tab completion (awesome!). You can change the working directory with setwd(), but remember to use the full path, not just ~/XX. This might be the easiest option for you if you want to minimise typing.
If you use a lot of scripts, or work on multiple computers or cross-platform, the above solution isn't quite as good. In this situation, you can keep all your files in a base directory, and then in your script use the file.path function to construct the paths:
base_dir <- 'C:/Desktop/R/'
read.table(file.path(base_dir, "FILENAME"))
I actually keep the base_dir assignment as a code snippet in RStudio, so I can easily insert it into scripts and know explicitly what is going on, as opposed to configuring it in RStudio or an R profile. There is a conditional in the code snippet which detects the platform and assigns the directory correctly.
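That snippet might look something like this minimal sketch (both paths are hypothetical):

# detect the platform and assign the base directory accordingly
base_dir <- if (.Platform$OS.type == "windows") {
  "C:/Users/Chris/Desktop/R"
} else {
  "~/Desktop/R"
}
read.table(file.path(base_dir, "FILENAME"))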
When R reports "cannot open the connection", it means one of two things:
The file does not exist at that location. You can verify whether the file is there by pasting the full path echoed back in the error message into Windows File Explorer. Sometimes the error is as simple as an extra subdirectory. (This seems to be the problem with your current code: the Windows Desktop folder is never nested inside Documents.)
The file exists at the location, but R does not have permission to access the folder. This requires changing the Windows folder permissions to grant R read and write access to the folder.
On Windows, if you launch RStudio from the folder you consider the "project workspace home", then all path references can use the dot as "relative to the workspace home", e.g. "./data/inputfile.csv".

Automating an R script using Mac's Automator and Calendar

I have been trying to run a script automatically using the steps that I found online.
I am trying to run the following R script called AUTO.R
Here is what the script contains:
library(quantmod)
obs <- last(Ad(getSymbols("SPY", auto.assign=FALSE)))
saveRDS(obs, "SAMPLE.rds")
When I build the application, it prints "Workflow completed".
I believe all is well until the time comes to run the script. The alarm pop-up is displayed on my desktop from Calendar, but nothing runs. After a few minutes, the folder where the .rds file should be saved still does not contain anything.
Two suggested changes:
Your Automator task should be more like just /usr/local/bin/Rscript --vanilla /Users/rimeallthetime/Desktop/AUTO.R
You should explicitly set the path in saveRDS; i.e. saveRDS(obs, "/Users/rimeallthetime/Desktop/SAMPLE.rds")
Honestly, though, you should at least make a ~/bin dir (i.e. a directory called bin under your home directory, so in your case /Users/rimeallthetime/bin) and put both the workflow and the R script in there. I'd also suggest creating a separate directory for output files rather than using the desktop.
UPDATE
I just let the calendar event run, and this is really a crude way to automate what you want to do. You'd be better off in the long run using launchd; that way it's fully automated and requires no human intervention at all (but you may need to adjust your script to send you a notification or "append" to the .rds file).
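For reference, a minimal launchd sketch is a property list saved under ~/Library/LaunchAgents and loaded with launchctl load; the label, paths, and schedule below are hypothetical (daily at 09:00):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>local.auto-r</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/Rscript</string>
    <string>--vanilla</string>
    <string>/Users/rimeallthetime/bin/AUTO.R</string>
  </array>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key>
    <integer>9</integer>
    <key>Minute</key>
    <integer>0</integer>
  </dict>
</dict>
</plist>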
