I am using knitr to generate reports automatically for a MediaWiki page. The report output is in HTML via pandoc, but I am having problems uploading the figures to the wiki site. So I figured I would use the SVG device and embed the markup in the final document instead of relying on external files. However, I am having trouble doing that with either knitr or pandoc. Does anybody know of a pandoc or knitr option that embeds the SVG instead of linking to the image? Or even a small shell script that replaces <img src="myFigure.svg"> with the contents of myFigure.svg?
I ended up using a simple Python script for the job:
from sys import argv
import re
import os

def svgreplace(match):
    "Replace an <img> tag match with the contents of the SVG file it points to."
    filename = match.group(1)
    with open(filename) as f:
        return f.read()

def svgfy(string):
    "Substitute every <img> tag pointing at an .svg file with inline SVG markup."
    img = re.compile(r'<img src="([^"]*\.svg)"[^>]*>')
    return img.sub(svgreplace, string)

if __name__ == "__main__":
    fname = argv[1]
    with open(fname) as f:
        html = f.read()
    # write to a temporary file first, then swap it in for the original
    out_fname = fname + ".tmp"
    out = open(out_fname, 'w')
    out.write(svgfy(html))
    out.close()
    os.rename(out_fname, fname)
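Assuming the script is saved as embed_svg.py (both names below are placeholders), run it on the pandoc output after rendering:

python embed_svg.py report.html

The HTML file is rewritten in place, with each img tag that points at an SVG replaced by the SVG markup itself.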
I needed to view reStructuredText files in JupyterLab, and the best I found so far was @akaihola's answer to a related issue on GitHub.
I added a workaround that allows rendering the file without viewing the source, provided below.
In case anyone else may need it, here's the solution I am working with for now:
import docutils.core
import docutils.writers.html5_polyglot
from IPython.core.magic import register_cell_magic, register_line_magic
from IPython.display import HTML

@register_cell_magic
def rst(line, cell):
    "Render reStructuredText from a cell"
    writer = docutils.writers.html5_polyglot.Writer()
    return HTML(docutils.core.publish_string(cell, writer=writer).decode('UTF-8'))

@register_line_magic
def rstfile(filename):
    "Render a reStructuredText file"
    writer = docutils.writers.html5_polyglot.Writer()
    with open(filename, 'r') as file:
        cell = file.read()
    return HTML(docutils.core.publish_string(cell, writer=writer).decode('UTF-8'))
To view the rst file without the source:
%rstfile <your-rst-filename>
To use the original solution as an rst cell, showing both the reStructuredText source and the rendered output:
%%rst
============
Main title
============
Some **heavy** markup.
I have a bunch of notebooks in a directory dir1 and would like to write a master notebook that executes the first cell of each notebook in dir1. All notebooks in dir1 have Markdown describing themselves in cell 1, so by executing the first cell of all of them, the master notebook will document all notebooks in dir1. This sounds easily doable, but I don't have any idea how to proceed.
A more general question: is there software that will extract the Markdown in cell 1 of all notebooks in dir1 and create a nice master notebook from them? nbsphinx produces an HTML doc, but I would like something much more lightweight and quicker.
Here is the code that I used. I created a notebook called SUMMARY.ipynb inside dir1 and put this code into a cell of SUMMARY.ipynb. Running this cell produces a nice summary of all the notebooks in dir1, with a link to each:
import os
import json
from IPython.display import display, Markdown

# the name of this file
this_fname = 'SUMMARY.ipynb'
fname_to_md = {}
for fname in os.listdir('./'):
    if fname[-6:] == '.ipynb' and fname != this_fname:
        # print('------------', fname)
        with open(fname, 'r', encoding="utf-8") as f:
            fdata = json.load(f)
        fname_to_md[fname] = ''.join(fdata['cells'][0]['source'])
# print(fname_to_md)
pre_sep = '\n\n<span style="color:red">%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</span>\n\n'
full_md = ''
for fname, md in fname_to_md.items():
    sep = pre_sep
    # Markdown link to the notebook, followed by its first (Markdown) cell
    sep += '[' + fname + '](' + fname + ')\n\n'
    full_md += sep + md
display(Markdown(full_md))
I'm gratuitously cross-posting this from the RStudio community page as this is a bit esoteric.
Is there a way to ask knitr to render my equations from R Markdown into images and then stick the resulting images into my final document? The use case I have in mind is overcoming some of the shortcomings of the MSFT Equation editor when knitting to Word/PowerPoint. If each equation were simply an image, then I could have LaTeX-quality equations in my MSFT docs, which would be fabulous!
The closest thing I have found is using latex2exp and putting in an R code chunk that produces a figure which is actually a rendered LaTeX formula. I kinda like this sort of hack, but latex2exp has some limitations.
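For reference, a minimal sketch of that latex2exp approach inside a knitr figure chunk might look like the following (the formula and sizing are just an example):

library(latex2exp)
# draw nothing but the formula: an empty plot with the TeX() expression as text
par(mar = c(0, 0, 0, 0))
plot.new()
text(x = 0.5, y = 0.5, TeX("$\\alpha^\\beta$"), cex = 3)

The chunk then emits an ordinary figure, which knitr places into the Word/PowerPoint output like any other image.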
As mentioned in the comments, webtex is an easy solution. Pandoc's --webtex switch has no effect when targeting docx, but a Lua filter can be used to the same effect:
local mediabag = require 'pandoc.mediabag'
local utils = require 'pandoc.utils'

local function url_encode(str)
  local encode_char = function(c)
    return ("%%%02X"):format(string.byte(c))
  end
  return str
    :gsub("\n", "\r\n")
    :gsub("([^%w%-%_%.%~])", encode_char)
end

local function webtex_url(formula)
  return 'https://latex.codecogs.com/png.latex?' .. url_encode(formula)
end

function Math(el)
  local filename = utils.sha1(el.text) .. '.png'
  -- fetch the rendered formula and store it in the document's mediabag
  local mime, contents = mediabag.fetch(webtex_url(el.text), '.')
  mediabag.insert(filename, mime, contents)
  local img = pandoc.Image({}, filename)
  return el.mathtype == 'DisplayMath'
    and {pandoc.LineBreak(), img, pandoc.LineBreak()}
    or img
end
Save this to a file and pass that file to pandoc via the --lua-filter option; it will convert all equations into PNG images via webtex.
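For example, assuming the filter was saved as webtex.lua (the file name is arbitrary):

pandoc report.md --lua-filter webtex.lua -o report.docx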
I am trying to write R code where I input a URL and output (save on the hard drive) a .txt file. I created a large list of URLs using the edgarWebR package. An example would be https://www.sec.gov/Archives/edgar/data/1131013/000119312518074650/d442610dncsr.htm. Basically:
Open the link
Copy everything (Ctrl+A, Ctrl+C)
Open an empty text file and paste the content (Ctrl+V)
Save the .txt file under a specified name
(in a looped fashion, of course). I am inclined to hard-code it (as in, open the website in a browser using browseURL(...) and issue "send keys" commands), but I am afraid that it would not run very smoothly. Other commands (such as readLines()) seem to copy the HTML structure, therefore returning not only the text.
In the end I am interested in a short paragraph of each of those shareholder letters (text only; tables/graphs are no concern in my particular setup).
Anyone aware of an R function that would help?
Thanks in advance!
Let me know in case the code below works for you. xpathSApply can be applied to different HTML components as well; in your case, only paragraphs are required.
library(RCurl)
library(XML)

# Create character vector of urls
urls <- c("url1", "url2", "url3")

for (url in urls) {
  # download html
  html <- getURL(url, followlocation = TRUE)

  # parse html and extract all paragraph nodes
  doc <- htmlParse(html, asText = TRUE)
  plain.text <- xpathSApply(doc, "//p", xmlValue)

  # write the extracted text to a .txt file
  # (depends whether you need separate files for each url or the same one;
  # for real URLs, consider basename(url) so slashes don't end up in the path)
  fileConn <- file(paste(url, "txt", sep = "."))
  writeLines(paste(plain.text, collapse = "\n"), fileConn)
  close(fileConn)
}
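Note that the //p XPath keeps only paragraph nodes, so tables and graphs in the filings are dropped automatically, matching the requirement of extracting text only.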
Thanks everyone for your input. It turns out that any HTML conversion took too much time given the number of websites I need to parse. The (working) solution probably violates some best-practice guidelines, but it does the job.
import clipboard  # third-party module providing clipboard.paste()
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# 'path' is assumed to point at the local geckodriver installation
driver = webdriver.Firefox(executable_path=path + '/codes_ml/geckodriver/geckodriver.exe')  # initialize driver
# it is fine to open the driver just once,
# then loop over the urls to pull the text; for each report_url:
driver.get(report_url)
element = driver.find_element_by_css_selector("body")
element.send_keys(Keys.CONTROL + 'a')  # select all
element.send_keys(Keys.CONTROL + 'c')  # copy to clipboard
text = clipboard.paste()  # read the copied text back
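(This relies on the third-party clipboard package, installable with pip install clipboard, plus a local geckodriver binary; path and report_url are assumed to be defined by the surrounding loop.)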
I'm trying to download all the comics (png) from xkcd.com, and here's my code to do the download job:
imageFile = open(os.path.join('XKCD', os.path.basename(comicLink)), 'wb')
for chunk in res.iter_content(100000):
    imageFile.write(chunk)
imageFile.close()
The downloaded file is 6388 bytes and cannot be opened, while the real file from the link is 27.6 KB.
I've already tested my code line by line in the shell, so I'm pretty sure I get the right link and the right file.
I just don't understand why the png downloaded by my code is smaller.
I also tried to search for why this is happening, but found no helpful information.
Thanks.
Okay, since you are using requests, here is a function that will let you download a file given a URL:
import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
    return local_filename
Link to the documentation -> http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow
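With the variables from the question, the call would then look like this:

download_file(comicLink)  # comicLink is assumed to hold the direct image URL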