I have a bunch of notebooks in a directory dir1 and would like to write a master notebook that executes the first cell of each notebook in dir1. All notebooks in dir1 have markdown describing themselves in cell 1. So by executing the first cell of all of them, the master notebook will document all notebooks in dir1. This sounds easily doable but I don't have any idea how to proceed.
A more general question is, is there software that will extract the markdown in cell 1 of all notebooks in dir1 and create a nice master notebook from them? nbsphinx produces an html doc, but I would like something much more lightweight and quicker.
Here is the code that I used. I created a notebook called SUMMARY.ipynb inside dir1 and put this code into one of its cells. Running the cell produces a summary of all the notebooks in dir1, with a link to each of them.
import os
import json
from IPython.display import display, Markdown
# the name of this file
this_fname = 'SUMMARY.ipynb'
fname_to_md = {}
for fname in os.listdir('./'):
    if fname[-6:] == '.ipynb' and fname != this_fname:
        # print('------------', fname)
        with open(fname, 'r', encoding="utf-8") as f:
            fdata = json.load(f)
            fname_to_md[fname] = ''.join(fdata['cells'][0]['source'])
# print(fname_to_md)
pre_sep = '\n\n<span style="color:red">%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</span>\n\n'
full_md = ''
for fname, md in fname_to_md.items():
    sep = pre_sep
    sep += '[' + fname + ']\n\n'
    full_md += sep + md
display(Markdown(full_md))
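A slightly more defensive variant of the same idea, as a rough sketch rather than a drop-in replacement. It assumes the notebooks are ordinary JSON .ipynb files; the function names first_markdown_cells and build_summary are made up for this example. It skips notebooks whose first cell is missing or is not markdown, and writes each file name as a markdown link.

```python
import json
from pathlib import Path
from urllib.parse import quote


def first_markdown_cells(directory, skip=("SUMMARY.ipynb",)):
    """Map each notebook file name in `directory` to the text of its first cell.

    Notebooks with no cells, or whose first cell is not markdown, are skipped.
    """
    summaries = {}
    for path in sorted(Path(directory).glob("*.ipynb")):
        if path.name in skip:
            continue
        with path.open("r", encoding="utf-8") as f:
            cells = json.load(f).get("cells", [])
        if cells and cells[0].get("cell_type") == "markdown":
            source = cells[0].get("source", "")
            # `source` may be stored as one string or as a list of lines.
            summaries[path.name] = source if isinstance(source, str) else "".join(source)
    return summaries


def build_summary(summaries):
    """Join the summaries into one markdown string, one linked entry per notebook."""
    parts = []
    for name, text in summaries.items():
        # quote() percent-encodes spaces so the markdown link target stays valid.
        parts.append(f"---\n\n[{name}]({quote(name)})\n\n{text}")
    return "\n\n".join(parts)
```

Inside SUMMARY.ipynb this would be rendered with display(Markdown(build_summary(first_markdown_cells('.')))), using the same IPython.display import as the code above.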
My goal is to create a presentation with Jupyter notebook without code input.
I have tried the following code
!jupyter nbconvert Explanatory_Analysis.ipynb --to slides --post serve --no-input --no-prompt
This command raises a NotImplementedError.
Here's a somewhat hacky solution.
Paste the following code into a new code cell, then execute the cell.
Be sure to change the NOTEBOOK variable to the filename of the current notebook and SAVE the notebook BEFORE running.
The hackiest thing about it is that the code overwrites the current notebook, so you'll need to refresh the Jupyter page in your browser after running the script.
import nbformat as nbf
import os
NOTEBOOK = "Explanatory_Analysis.ipynb"
PATH = f'{os.path.abspath("")}/{NOTEBOOK}'
ntbk = nbf.read(PATH, nbf.NO_CONVERT)
for i, cell in enumerate(ntbk.cells):
    if cell.cell_type == "code":
        metadata = cell["metadata"]
        slideshow = metadata.get("slideshow", {})
        print(f"[cell#index={i}] {cell.cell_type=}")
        print(f"BEFORE {metadata=}, {slideshow=}")
        slideshow["slide_type"] = "skip"
        metadata["slideshow"] = slideshow
        print(f"AFTER {metadata=}, {slideshow=}")
nbf.write(ntbk, PATH)
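If overwriting the open notebook feels too risky, the same metadata change can be written to a copy instead. The sketch below is only an illustration under a few assumptions: it treats the notebook as plain JSON using the standard library rather than nbformat, and the function name skip_code_cells and the output file name are invented for the example.

```python
import json
from pathlib import Path


def skip_code_cells(src, dst):
    """Copy notebook `src` to `dst`, marking every code cell as a skipped slide.

    Returns the number of code cells that were marked.
    """
    nb = json.loads(Path(src).read_text(encoding="utf-8"))
    marked = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            # Same metadata key the snippet above sets: slideshow.slide_type
            cell.setdefault("metadata", {}).setdefault("slideshow", {})["slide_type"] = "skip"
            marked += 1
    Path(dst).write_text(json.dumps(nb, indent=1, ensure_ascii=False), encoding="utf-8")
    return marked


# Example call (the output file name is hypothetical):
# skip_code_cells("Explanatory_Analysis.ipynb", "Explanatory_Analysis_slides.ipynb")
```

The copy can then be passed to jupyter nbconvert with --to slides, leaving the original notebook untouched.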
I have two files and am using R's diffobj to create an HTML difference file between them.
When I run the RScript in RStudio all is well and I get a diff HTML file like:
When I run the script from the command line, the HTML diff file looks like:
How do I run the R Script from the command line and get the nice HTML formatting?
R Script and Text Files
Original Text File - file_name_original
Hello there I am a file
I have three lines
And no fourth line
Changed Text File - file_name_changed
Hello there I am a file
I have three lines but I am a little longer than usual
And no fourth line
R Script
library("diffobj")
file_name_diff <- "diff.html"
file_name_original <- # Path to original file
file_name_changed <- # Path to changed file
# Compare files
diff_content <- diffFile(current = file_name_original,
                         target = file_name_changed,
                         mode = "sidebyside",
                         format = "html")
writeLines(as.character(diff_content), file_name_diff)
By default, diffFile() behaves differently depending on whether R is running in interactive mode. Rscript runs non-interactively, so you need to pass the argument interactive = TRUE to get the same result as you would from the console.
Using the function example from the documentation:
library("diffobj")
file_name_diff <- "C:\\Path\\to\\file\\diff.html"
url.base <- "https://raw.githubusercontent.com/wch/r-source"
f1 <- file.path(url.base, "29f013d1570e1df5dc047fb7ee304ff57c99ea68/README")
f2 <- file.path(url.base, "daf0b5f6c728bd3dbcd0a3c976a7be9beee731d9/README")
res <- diffFile(f1,
                f2,
                mode = "sidebyside",
                format = "html",
                interactive = TRUE)
writeLines(as.character(res), file_name_diff)
I am new to using the Jupyter notebook with R kernel.
I have R code written in two files Settings.ipynb and Main_data.ipynb.
My Settings.ipynb file has a lot of details. I am showing sample details below
Schema = "dist"
resultsSchema = "results"
sourceName = "hos"
dbms = "postgresql" #Should be "sql server", "oracle", "postgresql" or "redshift"
user <- "hos"
pw <- "hos"
server <- "localhost/hos"
port <- "9763"
I would like to source the Settings file from the Main_data code file.
When I was using RStudio, this was easy, as I just used the line below
source('Settings.R')
But now in Main_data Jupyter Notebook with R kernel, when I write the below piece of code
source('Settings.R') # settings file is in same directory as main_data file
I get the below error
Error in source("Settings.R"): Settings.R:2:11: unexpected '['
1: {
2: "cells": [
^
Traceback:
1. source("Settings.R")
When I try the below, I get another error as shown below
source('Settings.ipynb')
Error in source("Settings.ipynb"): Settings.ipynb:2:11: unexpected '['
1: {
2: "cells": [
^
Traceback:
1. source("Settings.ipynb")
How can I source R code from a Jupyter notebook that uses the R kernel, and what is the right way to save the file (.ipynb or .R format)? Can you help me with this, please?
We could create a .INI file in the same working directory (or different) and use ConfigParser to parse all the elements. The .INI file would be
Settings.INI
[settings-info]
schema = dist
resultsSchema = results
sourceName = hos
dbms = postgresql
user = hos
pw = hos
server = localhost/hos
Then, we initialize a parser object and read the contents of the file. The file could have multiple sections (here there is only 'settings-info'), and we extract the components using either [[ or $
library(ConfigParser)
props <- ConfigParser$new()
props <- props$read("Settings.INI")$data
props[["settings-info"]]$schema
[Screenshots: the output in the Jupyter notebook; the 'Settings.INI' file]
Saving a Jupyter notebook under a .R file name will not work: the file is still stored in the notebook's JSON format (it starts with { "cells": [ ...), which R cannot parse. You can verify this by opening your .R file in Jupyter Notebook.
However, you can use a plain-text editor such as vim, or RStudio, to create the .R file. It will then contain only the R code, without the { "cells": [ ... wrapper.
Later, from another Jupyter notebook, you can source the .R file created this way. This resolved the issue for me.
In summary, don't use Jupyter Notebook to create a .R file that you intend to source from another notebook.
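If the settings already live in a notebook, the code cells can also be pulled out into a plain script instead of retyping them. The following is a minimal sketch in Python, assuming Python is available alongside the R kernel and is run outside the R notebook (for example from a terminal); the function name notebook_to_script is made up for this example.

```python
import json
from pathlib import Path


def notebook_to_script(ipynb_path, script_path):
    """Write the code cells of a notebook to a plain-text script file.

    Returns the number of code cells written.
    """
    nb = json.loads(Path(ipynb_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are not R code
        source = cell.get("source", "")
        # `source` may be stored as one string or as a list of lines.
        chunks.append(source if isinstance(source, str) else "".join(source))
    Path(script_path).write_text("\n\n".join(chunks) + "\n", encoding="utf-8")
    return len(chunks)


# Example call, using the file names from the question:
# notebook_to_script("Settings.ipynb", "Settings.R")
```

After that, source('Settings.R') in Main_data.ipynb reads ordinary R code rather than JSON. The command jupyter nbconvert --to script Settings.ipynb should give a similar result without any custom code.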
I'm using Jupyter/ ipython to attempt to load a .csv file with the open() function on Windows.
First, I type the command 'pwd' to display the current working directory, and the following shows up:
'd:\\my data\\documents\\notebooks'
I try using the following code to try to load the file, which does not work:
data_file = open("D:\\my data\\documents\\notebooks\\MNIST\\mnist_train_10.csv", 'r')
data_list = data_file.readlines()
data_file.close()
I have also tried the following variations of this, removing the entire filepath and only having the local folder path within the current directory:
data_file = open("\\MNIST\\mnist_train_10.csv", 'r')
data_list = data_file.readlines()
data_file.close()
I've also experimented with removing the double backslashes and with using forward slashes, with no success. I read online that Windows can be funny about forward slashes versus backslashes in Python.
data_file = open("/MNIST/mnist_train_10.csv", 'r')
data_list = data_file.readlines()
data_file.close()
This is the error that I get:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-25-65ea2d6f0c09> in <module>()
----> 1 data_file = open("D:\\my data\\documents\\notebooks\\MNIST\\mnist_train_10.csv", 'r')
2 data_list = data_file.readlines()
3 data_file.close()
FileNotFoundError: [Errno 2] No such file or directory: 'D:\\my data\\documents\\notebooks\\MNIST\\mnist_train_10.csv'
Does it make a difference if the file is in the d: drive rather than the c: drive?
What am I doing wrong here?
FYI, this project is part of the book "Make Your Own Neural Network" by Tariq Rashid, but the book doesn't get into the specifics of this.
Can anyone point me to a resource where I can learn more?
Thank you for the help - I am just starting to learn, and do not have much experience working with real files and directories.
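The best way to define a path is to use os.path.join, which removes the OS dependency from your code. Note that the drive component needs its trailing separator ("D:\\"): os.path.join("D:", "my data") would yield the drive-relative path D:my data. For your case, the code below should work:
import os
path = os.path.join("D:\\", "my data", "documents",
                    "notebooks", "MNIST", "mnist_train_10.csv")
if os.path.exists(path):
    # the with block closes the file automatically
    with open(path, 'r') as data_file:
        data_list = data_file.readlines()
else:
    print('{} does not exist'.format(path))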
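When the file still cannot be found, it often helps to see what the folder actually contains, since the name on disk may differ from the one typed in. The sketch below is one possible way to do that with pathlib; the helper name read_lines is invented for this example.

```python
from pathlib import Path


def read_lines(path):
    """Return the lines of a text file.

    If the file is missing, raise FileNotFoundError with a listing of the
    parent folder so a misspelled or misplaced file name is easy to spot.
    """
    path = Path(path)
    if not path.is_file():
        parent = path.parent
        listing = sorted(p.name for p in parent.iterdir()) if parent.is_dir() else []
        raise FileNotFoundError(f"{path} not found; {parent} contains: {listing}")
    with path.open("r") as f:
        return f.readlines()


# Example call with a path relative to the current working directory
# (no leading slash, so it is resolved inside the notebooks folder):
# data_list = read_lines(Path("MNIST") / "mnist_train_10.csv")
```

A path that starts with a slash or backslash, as in the second and third attempts in the question, is resolved from the root of the drive rather than from the current working directory.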
I am using knitr to generate reports automatically to a mediawiki page. The report output is in HTML via pandoc. But I am having problems uploading the figures to the wiki site. So I figured that I would use the SVG device and include the code in the final document instead of relying on external documents. However I am having trouble doing that with either knitr or pandoc. Does anybody know about a pandoc or a knitr option that creates the SVG embedded instead of linking to the image? Or even a small shell script that replaces <img src="myFigure.svg"> with the contents of myFigure.svg.
I ended up using a simple Python script for the job:
from sys import argv
import re
import os
def svgreplace(match):
    "replace match with the content of a filename match"
    filename = match.group(1)
    with open(filename) as f:
        return f.read()
def svgfy(string):
    img = re.compile(r'<img src="([^"]*\.svg)"[^>]*>')
    return img.sub(svgreplace, string)
if __name__ == "__main__":
    fname = argv[1]
    with open(fname) as f:
        html = f.read()
    out_fname = fname + ".tmp"
    # write to a temporary file first, then swap it in for the original
    with open(out_fname, 'w') as out:
        out.write(svgfy(html))
    # os.replace overwrites an existing file on every platform (os.rename fails on Windows)
    os.replace(out_fname, fname)
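A variation on the same substitution, offered as a sketch only. It assumes the SVG paths in the HTML are relative to a known folder and that each SVG file may start with an XML declaration or a simple DOCTYPE, neither of which is needed once the markup is inlined. The name inline_svgs and the base_dir parameter are invented for this example.

```python
import re
from pathlib import Path

IMG_SVG = re.compile(r'<img src="([^"]*\.svg)"[^>]*>')
# XML declaration and DOCTYPE are only meaningful in a standalone SVG file.
PROLOG = re.compile(r'^\s*(<\?xml[^>]*\?>\s*)?(<!DOCTYPE[^>]*>\s*)?', re.IGNORECASE)


def inline_svgs(html, base_dir="."):
    """Replace each <img src="*.svg"> tag in `html` with the SVG file's markup."""
    def replace(match):
        # Resolve the image path against base_dir rather than the working directory.
        svg = (Path(base_dir) / match.group(1)).read_text(encoding="utf-8")
        return PROLOG.sub("", svg, count=1)
    return IMG_SVG.sub(replace, html)
```

Working on strings keeps the function easy to try out on a small HTML fragment before running it over a whole report.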