Parse YAML front matter with R

I want to use YAML for SAS program documentation in a manner similar to how RStudio uses YAML. Can I put YAML at the top of my program, then read it into R as some object? The program file would just be a text file called program_1.sas. There would be a bunch of these in a directory or two.
I can then use the object in an R notebook to produce a readable document describing a set of SAS programs. I would have to find a function to display the object in a readable format.
I noticed that this issue is addressed for languages like Python and Java, but I am hoping to do it in R. Obviously this code has already been written. For example, the function yaml_front_matter must do this, but it does not seem to be documented.
If I could also write/change the YAML header that would be a big plus.
Here is an example:
Project: Disease Burden
Directory: /net1/program/
Purpose: Extract Data
The only extra piece is that I would have to comment out this header and also mark where it starts and ends:
/*
---
Project: Disease Burden
Directory: /net1/program/
Purpose: Extract Data
---
*/
I could then create markdown like this
# Project: Disease Burden
Directory
: /net1/program/
## Purpose
Extract Data
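A minimal sketch of one way to do this, assuming the CRAN yaml package is installed. The helper name read_sas_header is hypothetical, and a temporary file stands in for program_1.sas. It reads the lines between the first two --- delimiters (so the surrounding /* and */ are ignored), parses them with yaml::yaml.load(), and builds the Markdown shown above.

```r
library(yaml)  # assumption: the CRAN 'yaml' package is installed

# Hypothetical helper: return the YAML block between the first two '---'
# lines of a file as a named list (NULL if no such block is found).
read_sas_header <- function(path) {
  lines <- readLines(path, warn = FALSE)
  delim <- which(trimws(lines) == "---")
  if (length(delim) < 2 || delim[2] - delim[1] < 2) return(NULL)
  yaml.load(paste(lines[(delim[1] + 1):(delim[2] - 1)], collapse = "\n"))
}

# A temporary file stands in for program_1.sas
sas_file <- file.path(tempdir(), "program_1.sas")
writeLines(c("/*", "---",
             "Project: Disease Burden",
             "Directory: /net1/program/",
             "Purpose: Extract Data",
             "---", "*/",
             "data extract; set source; run;"), sas_file)

header <- read_sas_header(sas_file)

# Turn the parsed list into the Markdown layout from the question
md <- c(paste("# Project:", header$Project), "",
        "Directory", paste(":", header$Directory), "",
        "## Purpose", header$Purpose)
cat(md, sep = "\n")
```

For a whole directory, the same helper could be applied with lapply() over list.files(dir, pattern = "\\.sas$", full.names = TRUE). To change a header, one option is to edit the list and turn it back into YAML text with yaml::as.yaml() before rewriting the lines between the two delimiters.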

Related

Is there a way to convert an .Rmd file exported as a basic R Script using purl back into an .Rmd file?

I have a dataset that I am trying to get published as part of the supplementary information of a study that is in .Rmd format. The .Rmd file is set up not only to provide an easily readable printout of the statistical analyses performed in the study, but also to be a tool that other researchers working in the same area can use on their own data. The intent is that all they would have to do is insert their data and re-knit the file, and their results would be printed out without having to rework the R code from scratch.
However, the journal will not accept .Rmd scripts as .Rmd files and possibly will not accept knitted .html printouts of an .Rmd file. The journal suggested to me that I save the .Rmd file as a plain R script using purl() and submit that instead. However, this creates a problem in that the script no longer generates an easily navigable printout (i.e., issues with headers and document text) and is more difficult to use. At the same time, I noticed that purl() seems to produce an R script that contains most of the information of the .Rmd file, particularly if one uses the options documentation=1 or documentation=2.
I am trying to figure out if there is any way to convert an .Rmd file back into an .Rmd file after it has been exported as a basic R script using purl? This way I could potentially submit the analysis as a basic R script as per the journal's requirements, but the user could convert it back into the script that produces a knitted html report if that is what they desire.
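One possible round trip, sketched under the assumption that the knitr package is installed: purl() with documentation = 2 writes the text as #' comments, and knitr::spin() with knit = FALSE turns such a script back into an .Rmd file. The file names and the tiny demo document are made up for illustration, and the result may not be byte-identical to the original .Rmd.

```r
library(knitr)  # assumption: the 'knitr' package is installed

work_dir <- tempfile("purl_demo")
dir.create(work_dir)
fence <- strrep("`", 3)  # chunk delimiter, built here to keep this example readable

# A tiny stand-in for the real analysis document
rmd_in <- file.path(work_dir, "analysis.Rmd")
writeLines(c("---", "title: \"Demo\"", "---", "",
             "# Results", "", "Some text.", "",
             paste0(fence, "{r cars-summary}"),
             "summary(cars)",
             fence), rmd_in)

# Export to a plain R script. A different base name is used so that spin()
# does not overwrite the original .Rmd; documentation = 2 keeps the prose
# as #' comments.
script <- file.path(work_dir, "analysis_script.R")
purl(rmd_in, output = script, documentation = 2, quiet = TRUE)

# Convert the script back to R Markdown without knitting it
spin(script, knit = FALSE, format = "Rmd")
rmd_back <- file.path(work_dir, "analysis_script.Rmd")
```

A user who receives only analysis_script.R would then run the spin() call themselves and knit the resulting .Rmd as usual.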

How to include an external file in a Moodle question with R/exams?

In order to include statistical tables when using R-exams, I know that one can just use the option pages inside the function exams2nops(). But when using exams2moodle() how should one proceed?
In Moodle one can upload a file within a question and add a link to the embedded file. Is it possible to do it through R exams?
You can easily include various kinds of supplementary files in R/exams and then export them to Moodle or other learning management systems. Two steps need to be taken:
1) While processing the .Rmd or .Rnw exercise, the supplementary file(s) need to be created in the current working directory (which is a temporary directory by default). This can be done either by creating the file via some R code or by copying an existing file. The convenience function include_supplement() simplifies this.
2) The supplementary file then needs to be included as a link in the question text. In .Rmd this would be something like [myfile.pdf](myfile.pdf) and in .Rnw exercises \url{myfile.pdf}.
An example for such an inclusion is the lm exercise template shipped along with the package, see: http://www.R-exams.org/templates/lm/. This exercise creates a .csv file on the fly within R and then includes it.
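As a rough illustration of the first step (not the actual code of the lm template), the sketch below creates a .csv file in the working directory with base R only. The variable names and simulated data are assumptions, and tempdir() stands in for the temporary directory in which R/exams processes an exercise.

```r
# In a real exercise this code would sit in the exercise's data-generation
# chunk and already run in the temporary working directory; tempdir()
# stands in for it here.
old_wd <- setwd(tempdir())

set.seed(1)
d <- data.frame(x = runif(20, -1, 1))
d$y <- 0.5 * d$x + rnorm(20, sd = 0.25)

# Step 1: create the supplementary file in the current working directory
write.csv(d, "regression.csv", row.names = FALSE, quote = FALSE)

# Step 2: the question text of an .Rmd exercise would then contain this link
link <- "[regression.csv](regression.csv)"

setwd(old_wd)
```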
An example for the include_supplement() function is available in the Rlogo exercise template that copies the R logo image from the R system and then includes it in the exercise. See: http://www.R-exams.org/templates/Rlogo/.
Final comment: For a distribution table it would also be possible to include this directly as an HTML table in Moodle. For example, you could generate suitable Markdown or LaTeX code within the R code of the exercise.

Creating an editable slideshow (ideally a powerpoint) from R

As part of a contract, the team I work in has to produce a monthly PowerPoint filled with KPIs and other requested values, which is then passed on to another team who write a commentary on last month's performance. At the moment the values are created (mostly in SAS), exported to an Excel file, and then copied and pasted into a PowerPoint. This is an old approach which clearly needs updating.
What I would ideally like to do is to automate the presentation using RMarkdown and save myself the hassle of copying and pasting values. The issue is that RMarkdown, from what I can see, can't produce a .ppt file or another editable format that the commentary team could add to without having to use R.
From googling around the topic I found packages such as rcom, RDCOMClient, and R2PPT, but they don't appear to have been recently updated or maintained.
TL;DR: I need a way of making a PowerPoint/slideshow in R where the text can be edited afterwards outside of R.
This can be easily done with RStudio and pandoc 2.1:
1) install Pandoc 2 from pandoc.org (this is a higher version than the one which currently comes with RStudio)
2) create your RMarkdown file in RStudio,
---
title: 'Some title'
author: 'author'
output:
  md_document: default
---
3) knit to md
4) call pandoc to convert to pptx
system("cmd", input = "C:\\Users\\janvy\\AppData\\Local\\Pandoc\\pandoc -f markdown -t pptx -o myfile.pptx myfile.md" )
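The call above is tied to one Windows installation path. A more portable variant, sketched here on the assumption that pandoc is on the PATH and that myfile.md sits in the working directory, passes the same arguments through system2():

```r
# Locate pandoc on the PATH (returns "" if it is not found)
pandoc <- unname(Sys.which("pandoc"))

# Same conversion as above: Markdown in, PowerPoint out
args <- c("-f", "markdown", "-t", "pptx", "-o", "myfile.pptx", "myfile.md")

if (nzchar(pandoc) && file.exists("myfile.md")) {
  system2(pandoc, args)
} else {
  message("pandoc or myfile.md not found; nothing converted")
}
```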
I used to work for a company that had all their presentations linked to excel sheets and it worked fine for some broad definition of fine.
If you have to keep PowerPoint as a presentation format, I'd advise against using R to create it. There are some packages that create great connections to Office products, but in my experience they break easily as packages and R versions develop. (One of the ones I played with in the past would recreate ggplot2 plots with actual shapes and lines in PowerPoint, resulting in a huge file.)
With that in mind, I'd advise you to create the results programmatically, dump them in an Excel spreadsheet, and build the presentation linked to that spreadsheet. To keep you sane, I'd do one file per month (if that is the KPIs' periodicity). There are many nice packages that create Excel spreadsheets, but I'd stick to csv files for their simplicity.
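A small sketch of the one-file-per-month idea using only base R. The KPI names, the values, and the file naming scheme are made up for illustration, and tempdir() stands in for whatever shared folder the linked presentation reads from.

```r
# Illustrative values only; the real figures would come from the SAS/R pipeline
kpi <- data.frame(kpi   = c("calls_answered", "avg_wait_seconds"),
                  value = c(1250, 42.5))

# One file per reporting month, e.g. kpi_2018-03.csv
report_month <- format(as.Date("2018-03-01"), "%Y-%m")
out_file <- file.path(tempdir(), paste0("kpi_", report_month, ".csv"))

write.csv(kpi, out_file, row.names = FALSE)
```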
I recommend that you take a look at SlideMight, a utility for merging data with PowerPoint templates; both text and images, in slides and tables. The usage is in principle similar to mail merge, with some more advanced stuff.
Possibly a solution for you would be to have the R program first write your data in a YAML, JSON or XML file; then it invokes the command-line version of SlideMight.
See www.slidemight.com.
Disclaimer: I am the developer and seller of SlideMight.

Example input data with example output using relative pathway in vignette of R package?

I'm putting together an R package. I would like to show example code in the vignette, where example data files (included in the package) are used to generate an (example) output file.
I read about using example data in Hadley Wickham's post (http://r-pkgs.had.co.nz/data.html), and believe I should keep my example data as raw data, as it must be parsed to generate the output.
So, I created a directory in my package structure
/Users/userName/myPackage/inst/extdata/
with subdirectories InputFiles and OutputFiles.
And I put the example file (exampleData.csv) inside of the InputFiles subdirectory (/Users/userName/myPackage/inst/extdata/InputFiles).
My vignette is located in:
/Users/userName/myPackage/vignettes/myPackage.Rnw
It contains the following syntax:
<<eval=FALSE>>=
fileString = "/Users/userName/myPackage/inst/extdata/InputFiles/exampleData.csv"
doFunction1(fileString)
doFunction2(fileString)
doFunction3(fileString, output = "/Users/userName/myPackage/inst/extdata/OutputFiles")
@
I am having two problems with developing this vignette and its example datasets:
1) I am unsure if my use of the extdata file is appropriate. This seemed to be the best directory name and location to place my example files, according to the aforementioned Hadley Wickham reference.
2) I am unsure how to make the pathways relative, instead of absolute, as I have them currently. This example code does not run automatically, as you can see. Instead, I have it under an R chunk of eval=FALSE so that it is simply listed there for the users to test themselves. After running the example code, the users can also check that the output file was indeed created in (/Users/userName/myPackage/inst/extdata/OutputFiles). What is the best way for me to allow the user to not have to use an absolute path when following the example? Is it possible to just follow a relative path from within the package directory myPackage?
My data files consist of .csv, .htm, and .text files. In the past, when constructing a package, I have saved a data frame as .rda file, and then the user could simply use:
data(example.rda)
to read that file. They would not have to write the entire pathway. Is there a similar function that can be used to read .csv, .html, and .text files, and then output them to an example output location - without having to use the full pathway? Would it be possible to have help functions that also read in the input files and write to the output files? Would this cause a conflict in CRAN if various example help functions in the /man folder physically save the example output file to the example output folder?
The standard way to refer to a file in a package is:
# gives root package directory
system.file(package="myPackage")
# specific file
system.file("extdata/InputFiles/exampleData.csv", package="myPackage")
# best is to use cross-platform way to write a file path:
system.file("extdata", "InputFiles", "exampleData.csv", package="myPackage")
When developing with devtools, you leave the inst subdirectory out of the path (devtools::load_all() makes system.file() look inside inst/ for you), so you never need to worry about absolute paths. This should work in a vignette. Note that a vignette, I think, only ever uses the installed version of a package, not the one you may have loaded in your development environment (specifically, devtools::load_all() does not change the code which is used to build the vignette; you must install() it first).
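To make the pattern concrete, here is a runnable sketch. Since myPackage is not installed here, a CSV that ships with base R's utils package stands in for exampleData.csv, and the output file name is hypothetical. The output goes to tempdir() rather than into the package's extdata folder, which also avoids the concern about examples writing into the package or the user's files.

```r
# Input: a file shipped inside an installed package
# (with myPackage this would be
#  system.file("extdata", "InputFiles", "exampleData.csv", package = "myPackage"))
in_file <- system.file("misc", "exDIF.csv", package = "utils", mustWork = TRUE)
example_data <- read.csv(in_file)

# Output: a session-specific temporary directory instead of inst/extdata
out_dir <- file.path(tempdir(), "OutputFiles")
dir.create(out_dir, showWarnings = FALSE)
out_file <- file.path(out_dir, "exampleOutput.csv")
write.csv(example_data, out_file, row.names = FALSE)
```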
Finally, using data() is a bit old fashioned. Hadley and others recommend using lazy data, so the data appears in the namespace automatically. Try the following in your DESCRIPTION.
LazyData: true
LazyDataCompression: xz

Create and save R's default codebooks as a pdf

If I load data(mtcars) it comes with a very neat codebook that I can call using ?mtcars.
I'm interested to document my data in the same way and, furthermore, save that neat codebook as a pdf.
Is it possible to save the 'content' of ?mtcars and how is it created?
Thanks, Eric
P.S. I did read this thread.
update 2012-05-14 00:39:59 PDT
I am looking for a solution using only R; unfortunately I cannot rely on other software (e.g. Tex)
update 2012-05-14 09:49:05 PDT
Thank you very much everyone for the many answers.
Reading these answers I realized that I should have made my priorities much clearer. Therefore, here is a list of my priorities in regard to this question.
1) R: I am looking for a solution that is based exclusively on R.
2) Reproducibility: the codebook can be part of an automated script.
3) Readability: the text should be easy to read.
4) Searchability: a file that can be opened with any standard software and searched (this is why I thought pdf would be a good solution, but this is overruled by 1 through 3).
I am currently labeling my variables using label() from the Hmisc package and might end up writing a .txt codebook using Label() from the same package.
(I'm not completely sure what you're after, but):
Like other package documentation, the file for mtcars is an .Rd file. You can convert it into formats other than pdf (e.g. ASCII), but the usual way of producing a pdf does use pdflatex.
However, most information in such an .Rd file is written more or less by hand (unless you use yet another R package like roxygen/roxygen2 to help you generate parts of it automatically).
For user-data, usually Noweb is much more convenient.
.Rnw -Sweave-> .tex -pdflatex-> pdf is certainly the most usual way with such files.
However, you can use it e.g. with OpenOffice (if that is installed) or use it with plain ASCII files instead of TeX.
Have a look at package knitr which may be easier with pure-ASCII files. (I'm not an expert, just switching over from Sweave)
If html is an option, both Sweave and knitr can work with that.
I don't know how to get the pdf of individual data sets but you can build the pdf of the entire datasets package from the LaTeX version using:
path <- find.package('datasets')
system(paste(shQuote(file.path(R.home("bin"), "R")),"CMD",
"Rd2pdf",shQuote(path)))
I'm not sure about this, but it only makes sense that you'd need some sort of LaTeX distribution like MiKTeX. Also, I'm not sure how this will work on different operating systems; mine is Windows and this works for me.
PS this is only a partial answer to your question as you want to do this for your data, but if nothing else it may get the ball rolling.
The help page that is displayed when entering ?mtcars is generated from an .Rd file, which is a LaTeX-like file that is used for all of R's help pages. Although .Rd files are LaTeX-like, you don't actually need to know LaTeX to read or write them. The actual mtcars.Rd file is available here: http://commondatastorage.googleapis.com/jthetzel-public/mtcars.Rd , which can be viewed with any text editor.
.Rd files included in the ./man directory of a package are converted to .html files when installing the package. They are converted by functions in the "tools" package. If you would like functionality like ?mtcars for your datasets, you would need to create a package for them. That might sound complicated if you have never created a package before, but it is easy enough to learn and will make you a better R programmer. There are a number of examples of dataset-only packages on CRAN, for example msProstate: http://cran.r-project.org/web/packages/msProstate/index.html . Consider downloading the package source to see how it is organized.
For more information on creating your own packages, writing .Rd files, and building packages:
http://cran.r-project.org/doc/manuals/R-exts.html, especially "1.1.5 Data in packages".
Edit
And if you want to convert the .Rd file in your package to a .pdf, you can do so when building your package, but you will need a LaTeX compiler. If you are on Windows, see here: http://cran.r-project.org/bin/windows/Rtools/ .
You can't create a PDF with just R; you need to use other software that creates PDFs.
You could use a combination of utils::promptData, tools::Rd2HTML, and a simple custom function to open the created HTML file in the users' browser.
It would probably be easier to just make a package containing your data sets. Look at the "datasets" package for an example.
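A sketch of the R-only route through the tools package. Here a small hand-written .Rd file stands in for the skeleton that utils::promptData() would generate and that you would then fill in; the data set name and variable descriptions are made up. tools::Rd2HTML() gives a browsable page and tools::Rd2txt() a searchable plain text codebook, neither of which needs LaTeX.

```r
# Hand-written Rd source for a hypothetical data set 'mydata'
rd_file <- file.path(tempdir(), "mydata.Rd")
writeLines(c(
  "\\name{mydata}",
  "\\alias{mydata}",
  "\\docType{data}",
  "\\title{Example codebook for a hypothetical data set}",
  "\\description{Two made-up variables, used only to show the format.}",
  "\\format{A data frame with 2 variables:",
  "  \\describe{",
  "    \\item{age}{Age in years}",
  "    \\item{sex}{Sex of the participant}",
  "  }",
  "}"
), rd_file)

# Render the same source as HTML and as plain text
html_file <- file.path(tempdir(), "mydata.html")
txt_file  <- file.path(tempdir(), "mydata.txt")
tools::Rd2HTML(rd_file, out = html_file)
tools::Rd2txt(rd_file, out = txt_file,
              options = list(underline_titles = FALSE))

# browseURL(html_file)  # uncomment to open the page in a browser
```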
It looks like an external tool such as LaTeX is always needed if you want to generate a pdf. I would recommend using a simple ASCII text format to generate such a file. In principle the .Rd files are also ASCII text, but I do not find them particularly readable.
Instead, I would recommend using a plain text ASCII format such as Markdown (which is e.g. used on StackOverflow) to write the text file. Such a file is already much more readable than an .Rd formatted file, and as a bonus it can quite easily be processed into a PDF should you choose to do so later on. The knitr package I think is capable of generating PDF files from Markdown sources. In addition, knitr allows you to mix in R code in the Markdown text. This code can be evaluated and the results (even figures) added to the resulting PDF.
In practice you can use sprintf to generate character vectors that you can pipe to a file in order to dynamically generate the markdown text. Just write the template one time, and mark the places for the text you want to add later like this:
base_text = "
First header
============
This document was generated on %s, by %s.
"
text_forfile = sprintf(base_text, some_date, some_name)
Just dump the text in text_forfile to a .md file and you're done, no external tools needed. See this post on SO for how to dump text to a file.
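Putting the pieces together, a minimal base R sketch could look like the following. The template wording, the author placeholder "some_name", and the output file name are assumptions; the three variable descriptions are taken from ?mtcars.

```r
# Variable descriptions for the codebook (from ?mtcars)
vars <- c(mpg = "Miles/(US) gallon",
          cyl = "Number of cylinders",
          hp  = "Gross horsepower")

# Template with %s placeholders, filled in by sprintf()
base_text <- "Codebook\n========\nThis document was generated on %s, by %s.\n"
text_forfile <- sprintf(base_text, format(Sys.Date()), "some_name")

# One Markdown line per variable: name, storage class, description
var_class <- vapply(mtcars[names(vars)], function(x) class(x)[1], character(1))
var_lines <- sprintf("- %s (%s): %s", names(vars), var_class, vars)

# Write everything to a .md file
md_file <- file.path(tempdir(), "codebook.md")
writeLines(c(text_forfile, var_lines), md_file)
```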
