RMarkdown Inline Code Format - r

I am reading ISL at the moment, which is about machine learning in R.
I really like how the book is laid out, specifically how the authors format inline code and library references, for example library(MASS).
Does anyone know if the same effect can be achieved using R Markdown, i.e. making the MASS keyword above brown when I reference it in a paper? I want to colour-code column names of data frames when I talk about them in the R Markdown document. Knitting to HTML produces pretty good formatting, but knitting to MS Word seems to just change the font type.
Thanks

I've come up with a solution that I think addresses your issue. Because inline source code gets the same style label as code chunks, any change you make to the SourceCode style will be applied to both, which I don't think is what you want. What's needed is a way to target just the inline code, which doesn't seem to be possible from within rmarkdown. Instead, what I've opted to do is take the .docx file that is produced, rename it to a .zip file, and modify the document.xml file inside, which holds all the content. The function below applies a new style to the inline source code text, which can then be customised in your MS Word template. Here is the code:
format_inline_code = function(fpath) {
  if (tools::file_ext(fpath) != "docx") stop("File must be a .docx file...")
  cur_dir = getwd()
  on.exit(setwd(cur_dir))  # restore the working directory, even on error
  setwd(dirname(fpath))
  out = gsub("docx$", "zip", fpath)
  # Convert to zip file
  file.rename(fpath, out)
  # Extract files
  unzip(out, exdir = ".")
  # Read in document.xml
  xml = readr::read_lines("word/document.xml")
  # Replace styling.
  # VerbatimChar didn't appear to be the style that was applied in Word, nor
  # was it present to be styled. VerbatimStringTok was, though.
  xml = gsub("VerbatimChar", "VerbatimStringTok", xml)
  # Save document.xml
  readr::write_lines(xml, "word/document.xml")
  # Zip files
  .files = c("_rels", "docProps", "word", "[Content_Types].xml")
  zip(zipfile = out, files = .files)
  # Convert back to docx
  file.rename(out, fpath)
  # Remove the folders extracted from the zip
  sapply(.files, unlink, recursive = TRUE)
}
# Usage, e.g.: format_inline_code("my_report.docx")
The style you'll want to modify in your MS Word template is VerbatimStringTok. Hope that helps!

Related

How to extract images from word using media_extract in r?

I am working in rmarkdown to produce a report that extracts and displays images taken from a Word document.
To do this, I am using the officer package. It has a function called media_extract which can 'extract files from an rdocx or rpptx object'.
For Word documents, though, I am struggling to locate the image without the media_path column.
The media_path is used as an argument to the media_extract function to locate the image. See the example code from the package documentation below:
example_pptx <- system.file(package = "officer",
"doc_examples/example.pptx")
doc <- read_pptx(example_pptx)
content <- pptx_summary(doc)
image_row <- content[content$content_type %in% "image", ]
media_file <- image_row$media_file
png_file <- tempfile(fileext = ".png")
media_extract(doc, path = media_file, target = png_file)
The file path is generated using either docx_summary or pptx_summary, depending on the file type; each creates a data-frame summary of the file's contents. The pptx_summary output includes a media_file column, which holds a file path for each image; the docx_summary data frame doesn't include this column. Another Stack Overflow post suggested a solution using the word/media/ subdirectory, which seemed to work, but I'm not sure what this means or how to use it.
How do I extract an image from a word doc, using word/media/ subdir as the media path?
I have continued to research this and found an answer, so I thought I would share!
The difficulty I was having extracting images from a docx was due to the absence of a media_file column in the summary data frame (produced using docx_summary), which is used to locate the desired image. This column is present in the data frame produced for pptx by pptx_summary and is used in the example code from the package documentation.
In the absence of this column, you instead need to locate the image via the document's internal subdirectory (the file path inside the docx, which is a zip archive of XML files), which looks like:
media_path <- "/word/media/image3.png"
If you want to see what this structure looks like, right-click on your document > 7-Zip > Extract files..., and a folder containing the document's contents will be created; otherwise just change the image number to select the desired image.
Note: sometimes images have names that do not follow the imageN.png pattern, so you may need to extract the files to find the name of the desired image.
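If you'd rather not install 7-Zip, you can also list the media file names directly from R, since a .docx is just a zip archive (a small sketch; "mydoc.docx" is a placeholder path):

```r
# List the archive contents without extracting anything
files <- unzip("mydoc.docx", list = TRUE)

# Keep only the entries under word/media/ (the embedded images)
files$Name[grepl("^word/media/", files$Name)]
```

This gives you the exact names to pass (prefixed with "/") as the path argument of media_extract.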
Example using media_extract with docx.
#extracting image from word doc using officer package
report <- read_docx("/Users/user.name/Documents/mydoc.docx")
png_file <- tempfile(fileext = ".png")
media_file <- "/word/media/image3.png"
media_extract(report, path = media_file, target = png_file)
The output you are looking for is TRUE.
The image can then be included in a report using knitr (or another method).
include_graphics(png_file)

Bind or merge multiple powerpoints in r

I have been using the officer package to create the respective PowerPoint decks; however, at this point I would like to merge/bind them all into one slide deck and was not able to figure out how. Can someone point me to a package that helps merge multiple PowerPoint decks into one?
I believe there are currently no functions or packages that do this in R, so I'll suggest a few possible approaches that come to mind.
1: You could use read_pptx() to read, say, a deck1 and a deck2 file, then loop through the slide indexes of deck2 and use those values to add_slide() into deck1. There is a function in officer called pptx_summary() that converts a pptx R object into a tibble, but I'm not sure you can convert a tibble back into a pptx object.
2: You could convert the pptx files into PDF files and use pdftools to join them.
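For option 2, once each deck has been exported to PDF (for example with LibreOffice: soffice --headless --convert-to pdf deck1.pptx), joining them is a one-liner with pdftools. A sketch with placeholder file names:

```r
library(pdftools)

# Concatenate the per-deck PDFs into a single document
pdf_combine(c("deck1.pdf", "deck2.pdf"), output = "combined.pdf")
```

The obvious trade-off is that the result is a PDF, so the slides are no longer editable in PowerPoint.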
When creating PowerPoint slides automatically from R (for example via the PowerPoint export of R Markdown), merging them with pre-manufactured fixed slides (for example, explanations with elaborate visuals) may well become necessary. As there seems to be no single-line solution so far, here's an incomplete answer to a 3-year-old question.
A look into the sources of officer shows that the package works with a data structure and a temporary folder in the background that contains the XML files zipped up inside the PPTX file.
Copying slides therefore requires both: updating the structure, and, eventually, copying XML files and other resources. Here is a very rough draft of how merging two PowerPoint files can work, based on the officer classes.
merge_pptx = function(source, target, filename) {
  # go through the slides of the source deck
  for (index in seq_len(nrow(source$slide$get_metadata()))) {
    # We need a new file name in the target's slide directory
    new_slidename <- target$slide$get_new_slidename()
    xml_file <- file.path(target$package_dir, "ppt/slides", new_slidename)
    # Copy the slide XML from source to the new file
    org_filename = source$slide$get_metadata()[index, "filename"]
    file.copy(org_filename, xml_file)
    # Not sure yet what exactly this does
    slide_info <- target$slideLayouts$get_metadata()[1, ] # use the first layout for now
    layout_obj <- target$slideLayouts$collection_get(slide_info$filename)
    layout_obj$write_template(xml_file)
    # update presentation elements
    target$presentation$add_slide(target = file.path("slides", new_slidename))
    target$content_type$add_slide(partname = file.path("/ppt/slides", new_slidename))
    # Add the slide to the collection
    target$slide$add_slide(xml_file, target$slideLayouts$get_xfrm_data())
    target$cursor <- target$slide$length()
  }
  print(target, target = filename)
}
source = read_pptx("One.pptx")
target = read_pptx("Two.pptx")
merge_pptx(source, target, "Combined.pptx")
Please note that this is a draft only. It does not yet respect the different layouts for slides, not to speak of different masters, and embedded files (images) are not yet copied.
The bulk of this function is inspired by the add_slide() function in the dir_slide class; see https://github.com/davidgohel/officer/blob/master/R/ppt_class_dir_collection.R

Save website content into txt files

I am trying to write R code where I input a URL and output (save on the hard drive) a .txt file. I created a large list of URLs using the edgarWebR package. An example would be "https://www.sec.gov/Archives/edgar/data/1131013/000119312518074650/d442610dncsr.htm". Basically:
open the link
copy everything (CTRL+A, CTRL+C)
open an empty text file and paste the content (CTRL+V)
save the .txt file under a specified name
(in a looped fashion, of course). I am inclined to "hard code it" (as in opening the website in a browser using browseURL(...) and sending key commands), but I am afraid that would not run very smoothly. However, other commands (such as readLines()) seem to copy the HTML structure, therefore returning not only the text.
In the end I am interested in a short paragraph of each of those shareholder letters (containing only text; tables/graphs are no concern in my particular setup).
Is anyone aware of an R function that would help?
Thanks in advance!
Let me know in case the code below works for you. xpathSApply can be applied to other HTML components as well; in your case only paragraphs are required.
library(RCurl)
library(XML)
# Create character vector of urls
urls <- c("url1", "url2", "url3")
for (url in urls) {
  # download html
  html <- getURL(url, followlocation = TRUE)
  # parse html
  doc <- htmlParse(html, asText = TRUE)
  plain.text <- xpathSApply(doc, "//p", xmlValue)
  # write the text to a file
  # (depends whether you need separate files for each url or one file)
  fileConn <- file(paste(url, "txt", sep = "."))
  writeLines(paste(plain.text, collapse = "\n"), fileConn)
  close(fileConn)
}
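Roughly the same extraction can also be sketched with the more recent rvest package (shown here on an inline HTML string rather than a live URL, so it runs without a network connection):

```r
library(rvest)

html <- "<html><body><p>First paragraph.</p><p>Second paragraph.</p></body></html>"
doc <- read_html(html)
# Grab the text of every <p> node, analogous to xpathSApply(doc, "//p", xmlValue)
plain.text <- html_text2(html_elements(doc, "p"))
plain.text
# [1] "First paragraph."  "Second paragraph."
```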
Thanks everyone for your input. It turns out that any HTML conversion took too much time given the amount of websites I need to parse. The (working) solution probably violates some best-practice guidelines, but it does the job.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import clipboard  # provides paste() for reading the system clipboard

driver = webdriver.Firefox(executable_path=path + '/codes_ml/geckodriver/geckodriver.exe')  # initialize driver
# it is fine to open the driver just once
# loop over the urls to collect the text (report_urls is the list built earlier)
for report_url in report_urls:
    driver.get(report_url)
    element = driver.find_element_by_css_selector("body")
    element.send_keys(Keys.CONTROL + 'a')  # select all
    element.send_keys(Keys.CONTROL + 'c')  # copy
    text = clipboard.paste()

rmarkdown shiny user input in chunk?

I have a shiny app that allows the user to download an HTML file (knitted from a .Rmd file) that includes the code used to run the analysis based on all the user inputs. I am trying to write the base .Rmd file that gets altered as user inputs vary. I am having trouble including user-input variables (e.g. input$button1) in the R code chunks. Say the user input for input$button1 is "text1":
```{r}
results <- someFun(input$button1)
```
And I'd like to have it knitted like this:
```{r}
results <- someFun('text1')
```
Every time I download the knitted HTML, though, input$button1 itself gets written to the file. I would also like to be able to produce a .Rmd file formatted with this substitution. It seems like knit_expand() might be the key, but I can't relate the available examples to my specific problem. Is the proper way to knit_expand() the whole .Rmd file and specify explicitly all the parameters you want subbed in, or is there a more elegant way within the .Rmd file itself? I would prefer a method similar to this, except that instead of the asis engine, I could use the r one. Any help would be greatly appreciated. Thanks!
Got it. Solution below. Thanks to Yihui for the guidance. The trick was to knit_expand() the whole .Rmd file, then writeLines() to a new one, then render. With hindsight, the whole process makes sense. With hindsight.
For the example, p1 is a character param 'ice cream' and p2 is an integer param 10. There is a user-defined param in ui.R called input$mdType that is used to decide on the format provided for download.
Rmd file:
Some other text.
```{r}
results <- someFun("{{p1}}", {{p2}})
```
in the downloadHandler() within server.R:
content = function(file) {
  src <- normalizePath('userReport.Rmd')
  # temporarily switch to the temp dir, in case you do not have write
  # permission to the current working directory
  owd <- setwd(tempdir())
  on.exit(setwd(owd))
  file.copy(src, 'userReport.Rmd')
  exp <- knit_expand('userReport.Rmd', p1 = input$p1, p2 = input$p2)
  writeLines(exp, 'userReport2.Rmd')
  out <- rmarkdown::render('userReport2.Rmd', switch(input$mdType,
    PDF = pdf_document(), HTML = html_document(), Word = word_document()))
  file.rename(out, file)
}
Resulting userReport2.Rmd before rendering:
```{r}
results <- someFun("ice cream", 10)
```
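For reference, the expansion step can be tried on its own with knit_expand()'s text argument, without any files or a shiny session involved (someFun is a placeholder name, as above):

```r
library(knitr)

# {{p1}} and {{p2}} are replaced by the values passed as arguments
expanded <- knit_expand(text = "results <- someFun('{{p1}}', {{p2}})",
                        p1 = "ice cream", p2 = 10)
cat(expanded)
# results <- someFun('ice cream', 10)
```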

Save an object as a .R file within R, keeping formatting

I am writing an R script that reads in a template .R file and a list of dates, and creates a bunch of folders corresponding to the dates, each containing a copy of the .R file in which text substitution has been performed in R to customize the script for the given date.
I'm stuck on the part where I write out the .R file, though, because the formatting and/or character representation keeps getting screwed up.
Here's a minimal, reproducible example:
RMapsDemo <- readLines("https://raw.githubusercontent.com/hack-r/RMapsDemo/master/RMapsDemo.R")
RMapsDemo <- gsub("## File: RMapsDemo.R", "## File: RMapsDemo.R ####", RMapsDemo)
save(RMapsDemo, file = "RMapsDemo.R") # Doesn't work right
save(RMapsDemo, file = "RMapsDemo.R", ascii = T) # Doesn't work right
dput(RMapsDemo, file = "RMapsDemo.R") # Close, but no cigar
dput(RMapsDemo, file = "RMapsDemo.R", control = c("keepNA", "keepInteger")) # Close, but no cigar
Ricardo Saporta pointed out the solution in the comments -- use writeLines.
I feel stupid for not thinking of this myself. It works beautifully.
writeLines(RMapsDemo, con = "RMapsDemo.R")
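The reason writeLines() works where save() and dput() don't is that it writes the character vector verbatim, one element per line, so a read/write round trip preserves the script exactly:

```r
lines <- c("## File: demo.R ####", "x <- 1  # formatting survives as-is")
tmp <- tempfile(fileext = ".R")
writeLines(lines, con = tmp)

# Reading the file back gives the identical character vector
identical(readLines(tmp), lines)
# [1] TRUE
```

save(), by contrast, writes a binary (or ASCII-serialized) RData representation, and dput() writes an R expression that reconstructs the vector, neither of which is a plain script file.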
