I am trying to convert a markdown file to docx, but the images and tables are not transferred to the docx.
I am doing the following:
require(knitr)
require(markdown)
Knit
knit('test.rmd')
markdownToHTML('test.md', 'test.html', options=c("use_xhtml"))
Convert to pdf (The pdf is nicely created)
system("pandoc -s test.html -o test.pdf")
Convert to Word Document (Converts to test.docx but no tables or graphics)
pandoc("test.md",format="docx")
All the tables and images are saved in the 'Figure' folder inside the same directory.
Do I need to pass another argument to pandoc so that it looks in the Figure folder?
Related
I have several Rmarkdown (Rmd) files in the same directory, and want to render them into individual pdf files using bookdown::render_book. I normally use knitr::knit for that, but would like to take advantage of the cross-referencing in bookdown. But I would still like to have one Rmd file for each pdf file, and have them live independently in the same directory.
According to section 12.4 of the bookdown book I can use output: bookdown::pdf_document2 in the yaml header to generate a single pdf from an Rmd file. However, bookdown::render_book always combines all Rmd files that it finds in the directory, which is not what I want.
Is there an option in the yaml header or in the bookdown::render_book function where I can tell it to ignore all other Rmd files in the directory?
I know there is an option rmd_files I can specify in the file _bookdown.yaml but I don't know how this would work with multiple Rmd files, and I would also like to avoid having to maintain a separate yml file.
I'm downloading a Google Doc as .docx and then converting to markdown for manipulation and export to multiple formats.
Problem: When I convert using pandoc, it strips the title (and subtitle) and does not add any YAML header information. I could add the title to the header manually, but the process needs to be scripted, so I need either to keep the title (ideally) or to extract it from the docx and add it to a YAML header, which would then be concatenated with the converted markdown file.
Example Code, where title is lost on conversion from docx to markdown:
require(rmarkdown);require(devtools)
examplefile=paste0(tempdir(),"/example.docx")
download.file("https://file-examples.com/wp-content/uploads/2017/02/file-sample_100kB.docx",destfile=examplefile)
pandoc_convert(examplefile,to="markdown",output = "example.rmd", options=c("--extract-media=."))
render(paste0(tempdir(), "/example.rmd"),"html_document")
browseURL(paste0(tempdir(),"/example.html"))
When converting from docx to markdown (or another markup format like rst) you need to include the -s or --standalone option.
From the pandoc documentation:
-s, --standalone
Produce output with an appropriate header and footer (e.g. a standalone HTML, LaTeX, TEI, or RTF file, not a fragment). This option is set automatically for pdf, epub, epub3, fb2, docx, and odt output. For native output, this option causes metadata to be included; otherwise, metadata is suppressed.
Without -s, this metadata (including the title) is suppressed.
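As a rough sketch of the same conversion run directly from the command line with the standalone flag: the file names example.docx and example.md are placeholders, and whether the title comes through is assumed to depend on the docx marking it with Word's Title style.

```shell
#!/bin/sh
# Placeholder file names; substitute your own.
input="example.docx"
output="example.md"

# --standalone keeps document metadata such as the title in the output;
# --extract-media=. writes embedded images to the current directory.
set -- --standalone --extract-media=. -f docx -t markdown -o "$output" "$input"
echo "pandoc $*"

# Run the conversion only when pandoc is installed and the input exists.
if command -v pandoc >/dev/null 2>&1 && [ -f "$input" ]; then
  pandoc "$@"
fi
```

From R, the equivalent would presumably be to add "--standalone" to the options vector passed to pandoc_convert.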
I'm using knitr to convert .Rnw files to .pdf files. I can use Skim to jump from a position in the .pdf file to the .tex file produced by knitr.
How can I jump from a position in the .pdf file to the .Rnw?
Check out SyncTeX. For example, RStudio's PDF viewer synchronizes between the PDF and the underlying Rnw file.
I'm using the R package knitr to generate a markdown file test.md. This file is then processed by pandoc to produce a variety of output formats, such as html and pdf. Because I want to use bibtex when generating the pdf through latex, I believe I have to tell pandoc to stop at the intermediate latex output, and then run bibtex and pdflatex myself (twice). Here's where I found a slight annoyance in my workflow: the only way I found to make pandoc keep the intermediate tex file, rather than go all the way to the pdf, was to specify a hard-coded filename with a .tex extension through the -o option. This is problematic for me because I'm using a config file to run pandoc('test.md', "latex", "config.pandoc") via knitr, and I would like to keep that config generic, without a hard-coded output filename:
format: latex
o: test.tex
s:
S:
biblio: refs.bib
biblatex:
template: 'template.tex'
default-image-extension: pdf
which in turn becomes the following command for pandoc,
pandoc -s -S --biblio=refs.bib --default-image-extension=pdf --biblatex --template='template.tex' -f markdown -t latex -o test.tex 'test.md'
If I skip the o: test.tex option, pandoc produces a pdf and doesn't keep the intermediate latex file. How can I keep the tex file, without specifying this hard-coded filename?
To solve this problem on my side, I added a new argument ext to the pandoc() function. It is available on GitHub now (knitr development version 1.3.6). You can override the default file extension, e.g.
library(knitr)
pandoc(..., ext = 'tex')
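When pandoc is called directly rather than through knitr, a similar effect can be sketched by deriving the output name from the input name. This is not the ext argument itself, only a shell-level analogue, and test.md is the file name assumed from the question.

```shell
#!/bin/sh
# Input name taken from the question; adjust as needed.
input="test.md"

# Strip the .md suffix and append .tex, so no output name is hard-coded.
output="${input%.md}.tex"
echo "$output"

# Stop at LaTeX; run only when pandoc and the input file are available.
if command -v pandoc >/dev/null 2>&1 && [ -f "$input" ]; then
  pandoc -s -f markdown -t latex -o "$output" "$input"
fi
```

The remaining options from the config file (bibliography, template, and so on) would go on the same pandoc line.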
I have a markdown document generated from R, and I am trying to create a pdf file from it using pandoc. Everything else works fine, but I want the document title to appear and the sections to be numbered, as in a default LaTeX document. Instead, the title defined in the R Markdown file appears as a section title in the pdf.
I was able to set double spacing, add line numbers, etc. by using an options.sty file. Below is the code that I used to create the pdf.
For options.sty I used:
\usepackage{setspace}
\doublespacing
\usepackage[vmargin=0.75in,hmargin=1in]{geometry}
\usepackage{lineno}
\usepackage{titlesec}
\titleformat{\section}
{\color{red}\normalfont\Large\bfseries}
{\color{red}\thesection}{1em}{}
\titleformat{\subsection}
{\color{blue}\normalfont\Large\bfseries}
{\color{blue}\thesection}{0.8em}{}
\title{Monitoring Stations}
\author{Jdbaba}
I used knitr to create the R markdown file. The title and author lines in the options.sty file above seem to be ignored, so I must be missing something.
The code I used to convert markdown to pdf is as follows:
pandoc -H options.sty mydata.md -o mydata.pdf
A LaTeX document would normally number the sections automatically, but my pdf has no numbering. Can anyone suggest how to enable section numbering in a pdf created with pandoc?
Thanks.
Pandoc takes the title from a title block in the Markdown file. This is a Pandoc-specific extension to Markdown. The block should have the following format:
% title
% author(s) (separated by semicolons)
% date
So, in your case:
% Monitoring Stations
% Jdbaba
% March 6, 2013
To have the sections numbered, you'll need to run Pandoc with the --number-sections option.
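Putting both suggestions together, a minimal sketch might look like the following. The name mydata_titled.md is made up for illustration, and a small stand-in mydata.md is created only if the real knitr output is not present.

```shell
#!/bin/sh
# File names reuse those from the question; mydata_titled.md is hypothetical.
src="mydata.md"
titled="mydata_titled.md"

# Create a small stand-in body if the knitr output is not present.
[ -f "$src" ] || printf '# Introduction\n\nSome text.\n' > "$src"

# Prepend the Pandoc title block (title, author, date), then a blank line.
{
  printf '%% Monitoring Stations\n'
  printf '%% Jdbaba\n'
  printf '%% March 6, 2013\n\n'
  cat "$src"
} > "$titled"

# Convert with numbered sections when pandoc and options.sty are available
# (pdf output also needs a LaTeX installation).
if command -v pandoc >/dev/null 2>&1 && [ -f options.sty ]; then
  pandoc -H options.sty --number-sections "$titled" -o mydata.pdf
fi
```

With the title block in place, the \title and \author lines in options.sty should no longer be needed.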