Loop in R markdown - r

I have an R markdown document like this:
The following graph shows a histogram of variable x:
```{r}
hist(x)
```
I want to introduce a loop, so I can do the same thing for multiple variables. Something hypothetically like this:
for i in length(somelist) {
output paste("The following graph shows a histogram of somelist[[" , i, "]]")
```{r}
hist(somelist[[i]])
```
Is that even possible?
PS: The greater plan is to create a program that goes over a data frame and automatically generates appropriate summaries for each column (e.g. histograms, tables, box plots, etc.). The program could then be used to automatically generate a Markdown document containing the exploratory analysis you would do when seeing a dataset for the first time.

Could that be what you want?
---
title: "Untitled"
author: "Author"
output: html_document
---
```{r, results='asis'}
for (i in 1:2){
cat('\n\n')
cat("# This is a heading for", i, "\n\n")
hist(cars[,i])
cat('\n\n')
}
```
This answer was more or less stolen from here.
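The PS describes a summary for every column of a data frame. Extending the same `results='asis'` pattern, a minimal sketch could look like this. `explore_columns` is a hypothetical helper, and drawing a histogram for numeric columns and a `knitr::kable()` frequency table for everything else is one assumed choice among many.

```r
# Hypothetical helper: one section per column of a data frame.
# Call it inside a chunk with results='asis', e.g. explore_columns(iris)
explore_columns <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    # Blank lines around the heading so Pandoc parses it as a heading
    cat("\n\n## Column `", col, "`\n\n", sep = "")
    if (is.numeric(x)) {
      hist(x, main = paste("Histogram of", col), xlab = col)
    } else {
      # Frequency table for factor/character columns, as a Markdown table
      counts <- as.data.frame(table(x), stringsAsFactors = FALSE)
      names(counts) <- c(col, "n")
      print(knitr::kable(counts))
    }
    # Tables and images also need a blank line after them
    cat("\n\n")
  }
  invisible(NULL)
}
```

In a chunk with `results='asis'`, `explore_columns(iris)` would produce four histogram sections and one frequency-table section for `Species`.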

As already mentioned, any loop needs to be in a code chunk. It might be easier to give each histogram a title rather than add a line of text as a header for each one.
```{r}
for (i in seq_along(somelist)) {
title <- paste("The following graph shows a histogram of somelist[[", i, "]]")
hist(somelist[[i]], main=title)
}
```
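The question's pseudocode writes the loop header as `for i in length(somelist)`. Besides lacking the parentheses R requires, `length(somelist)` is a single number, so even `for (i in length(somelist))` would visit only the last index; `seq_along()` yields every index. A small illustration with a hypothetical two-element `somelist`:

```r
# Hypothetical example list standing in for `somelist`
somelist <- list(speed = cars$speed, dist = cars$dist)

# length(somelist) is the single number 2, so this loop runs once, with i = 2
visited_length <- integer(0)
for (i in length(somelist)) visited_length <- c(visited_length, i)

# seq_along() gives 1, 2, ... (and an empty sequence for an empty list)
visited_seq <- integer(0)
for (i in seq_along(somelist)) {
  visited_seq <- c(visited_seq, i)
  hist(somelist[[i]], main = paste("Histogram of somelist[[", i, "]]"))
}

print(visited_length)  # 2
print(visited_seq)     # 1 2
```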
However, if you would like to create multiple reports, check out this thread, which also has a link to this example.
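When `rmarkdown::render()` is called from a script, the Rmd is evaluated in the calling environment by default (its `envir` argument defaults to `parent.frame()`), so objects such as `i` and `somelist` are visible to the Rmd's chunks.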
So an alternative might be to have your R script:
for (i in seq_along(somelist)) {
  rmarkdown::render('./hist_.Rmd', # file 2
                    output_file = paste0("hist", i, ".html"),
                    output_dir = './outputs/')
}
And then your Rmd chunk would look like:
```{r}
hist(somelist[[i]])
```
Disclaimer: I haven't tested this.
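A slightly more explicit variant passes the index through `render()`'s `params` argument instead of relying on a loop variable leaking into the Rmd. This is an untested sketch, not the linked thread's method: it assumes `hist_.Rmd` declares the parameter in its YAML header, and `render_hist_reports` is a hypothetical name.

```r
# Hypothetical driver: render one HTML report per element of `somelist`.
# Assumes ./hist_.Rmd starts with
#   ---
#   output: html_document
#   params:
#     i: 1
#   ---
# and contains a chunk such as: hist(somelist[[params$i]])
render_hist_reports <- function(somelist, template = "./hist_.Rmd",
                                output_dir = "./outputs/") {
  for (i in seq_along(somelist)) {
    rmarkdown::render(template,
                      output_file = paste0("hist", i, ".html"),
                      output_dir = output_dir,
                      params = list(i = i),
                      # Fresh environment per report; its parent is this
                      # function's frame, so `somelist` is still visible
                      envir = new.env(parent = environment()))
  }
  invisible(file.path(output_dir, paste0("hist", seq_along(somelist), ".html")))
}
```

Called as `render_hist_reports(somelist)` from a script, it would write `hist1.html`, `hist2.html`, ... into `./outputs/`. The fresh environment per run is a design choice meant to keep one report's objects from leaking into the next.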

Related

R Markdown pdf, page break in for loop without asis

I want to generate an R Markdown PDF that, for each of several variables, has a summary table followed by a graph, with the table and graph for each variable on its own page. If I use asis, the table formatting is lost. If I omit asis, the way I insert the page break has no effect. Since this is only for data exploration, I’m too lazy to do any post-processing ;-). Is there a simple way to have my cake and eat it too?
---
output: pdf_document
---
some markdown text
```{r UglyTable, results = 'asis'}
for (n.var in 1:2) {
cat("\n\\pagebreak\n")
print(summary(iris[ , n.var]))
hist(iris[ , n.var])
}
```
more text
```{r NoPageBreaks}
for (n.var in 1:2) {
cat("\n\\pagebreak\n")
print(summary(iris[ , n.var]))
hist(iris[ , n.var])
}
```
 
These questions describe the use of pagebreaks with asis:  In Markdown PDF, how to add page break after each iteration of for loop?, rmarkdown: page break within a chunk?
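One way around the trade-off, sketched here under assumptions rather than taken from the linked answers: stay in `results='asis'` (so the raw `\pagebreak` reaches LaTeX) but print the summary through `knitr::kable()`, so it becomes a real Markdown table instead of console text. `summary_table` and `variable_pages` are hypothetical helper names.

```r
# Turn summary() of a numeric vector into a two-column data frame
summary_table <- function(x) {
  s <- summary(x)
  data.frame(Statistic = names(s), Value = as.numeric(s))
}

# Hypothetical helper: one page per variable with a table and a histogram.
# Use inside a chunk with results='asis' in a pdf_document, e.g.
#   variable_pages(iris, c("Sepal.Length", "Sepal.Width"))
variable_pages <- function(df, vars) {
  for (v in vars) {
    cat("\n\n\\pagebreak\n\n")                 # raw LaTeX page break
    print(knitr::kable(summary_table(df[[v]]), caption = v))
    cat("\n\n")                                # blank line ends the table
    hist(df[[v]], main = v, xlab = v)
    cat("\n\n")
  }
  invisible(NULL)
}
```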

Automating the generation of preformated text in Rmarkdown using R

I'm creating a document in which I repeat the same formatting multiple times. So I want to automate the process using the for loop in R. Here's a simple example.
Assume I have R code that computes some information for all cut values in the ggplot2::diamonds dataset, which I then want to print in my document in five separate sections (one section per cut):
library(knitr); library(data.table)
dt <- data.table(ggplot2::diamonds)
for (cutX in unique(dt$cut)) {
dtCutX <- dt[cut==cutX, lapply(.SD,mean), .SDcols=5:7]
#### START of the Rmd part that needs to be printed
# Section: The Properties of Cut `cutX`
<!-- NB: This is the Section title in Rmd format, not the comment in R format ! -->
This Section describes the properties of cut `r cutX`. Table below shows its mean values:
`r knitr::kable(dtCutX)`
The largest carat value for cut `r cutX` is `r dt[cut=='Ideal', max(carat)]`
#### END of the Rmd part that needs to be printed
}
How do I do that?
I.e., how do I insert, inside my main Rmd code, R code that tells it to insert other Rmd code (in a for loop), to automatically produce five sections for the five types of diamond cut?
PS.
I found these related posts:
Reusing chunks in Knitr and
Use loop to generate section of text in rmarkdown
but have not yet been able to recreate the solution for the above example.
For this kind of task, you can use the glue package to evaluate R expressions inside character strings.
Here's an Rmd file that answers your question:
---
title: "Untitled"
output: html_document
---
```{r echo=FALSE, results='asis'}
library(data.table)
dt <- data.table(ggplot2::diamonds)
for (cutX in unique(dt$cut)) {
dtCutX <- dt[cut==cutX, lapply(.SD,mean), .SDcols=5:7]
cat("\n\n# Section: The Properties of Cut ", cutX, "\n\n", sep = "")
cat(glue::glue("This Section describes the properties of cut {cutX}. Table below shows its mean values:"), "\n\n")
print(knitr::kable(dtCutX))
cat("\n\n", glue::glue("The largest carat value for cut {cutX} is {dt[cut==cutX, max(carat)]}"), "\n", sep = "")
}
```
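If adding glue as a dependency is undesirable, base R's `sprintf()` can build the same Markdown. A hedged sketch; `section_md` is a hypothetical helper, and its blank lines are deliberate so the heading and paragraph stay separate blocks from any table printed next to them.

```r
# Hypothetical helper: Markdown for one section, without the glue package
section_md <- function(cut_name, max_carat) {
  sprintf(paste0("\n\n# Section: The Properties of Cut %s\n\n",
                 "The largest carat value for cut %s is %s\n\n"),
          cut_name, cut_name, format(max_carat))
}

# Inside the loop of a results='asis' chunk one would write, e.g.
#   cat(section_md(cutX, dt[cut == cutX, max(carat)]))
# Here 1.23 is only a placeholder number, not computed from diamonds
cat(section_md("Ideal", 1.23))
```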

R Markdown makes custom plot disappear if I set echo=FALSE

I created a custom function which sets mfrow to nxn and creates n^2 scatter plots, with multiple data sets on each plot, based on an input list of data frames. The signature of my plotting function looks like this:
plot.return.list<-function(df.list,num.plot,title)
Where df.list is my list of data frames, num.plot is the total number of plots to generate (used to set mfrow) and title is the overall plot title (the function generates titles for each individual sub-graph).
This creates plots fine when I run the function from the console. However, I'm trying to get this figure into an R Markdown document using RStudio, like so:
```{r, fig.height=6,fig.width=6}
plot.return.list(f.1.list,4,bquote(atop("Numerical Approximations vs Exact Solutions for "
,dot(x)==-1*x*(t))))
```
Since I haven't set the echo option in my {r} statement, this prints both the plotting code as well as the plot itself. However, if my first line instead reads:
{r, fig.height=6,fig.width=6,echo=FALSE}
Then both the code AND the plot disappear from the final document.
How do I make the plot appear WITHOUT the code? According to the example RStudio gives, setting echo=FALSE should make the plot appear without the code, but that isn't the behavior I'm observing.
EDIT: I seem to have tracked my problem down to kable. Whether or not I'm making a custom plot-helper function, any call to kable kills my plot. This can be reproduced in a markdown:
---
title: "repro"
author: "Frank Moore-Clingenpeel"
date: "October 9, 2016"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(knitr)
options(default=TRUE)
repro.df<-data.frame((0.1*1:10)%*%t(1:10))
```
```{r, echo=FALSE}
kable(repro.df)
```
```{r, fig.height=6,fig.width=6,echo=FALSE}
plot(repro.df[,1],repro.df[,2])
```
In this code, the plot doesn't appear because I have echo set to FALSE; removing the flag makes the plot visible.
Also note that in my repro code, kable produces a table with a bunch of garbage in the last line. I don't know why, but this isn't true for my full original code, and I don't think it's related to my problem.
Thanks for the reproducible example. From this I can see that the problem is you don't have a newline between your table chunk and your plot chunk.
If you were to knit this and examine the MD file produced by knit (or set html_document as your output format and have keep_md: true to look at it), you would see that the table code and plot code are not separated by any newline. Pandoc needs this to delimit the end of the table. Without it, it thinks your ![](path/to/image.png) is part of the table and hence puts it as a "junk line" in the table rather than an image on its own.
Just add a newline between the two chunks and you will be fine. (Tables need to be surrounded with blank lines).
(I know you are compiling to LaTeX, so it may be confusing that I am talking about Markdown. When you go Rmd -> PDF, R Markdown uses knitr to go from Rmd to MD, and then Pandoc to go from MD to TeX. This is why you still need to make sure your Markdown looks OK.)
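To check for this in the intermediate Markdown (e.g. kept with `keep_md: true`), a rough base-R scan can flag image lines that directly follow a table row. This is a hypothetical diagnostic that assumes pipe tables whose rows start with `|`; it will not catch other table styles.

```r
# Hypothetical diagnostic: line numbers of images stuck to the end of a table
find_table_image_collisions <- function(md_lines) {
  n <- length(md_lines)
  if (n < 2) return(integer(0))
  is_row <- grepl("^[[:space:]]*\\|", md_lines)   # pipe-table rows start with "|"
  is_img <- grepl("^[[:space:]]*!\\[", md_lines)  # Markdown images start with "!["
  # Line k + 1 is a problem when line k is a table row and line k + 1 an image
  which(is_row[-n] & is_img[-1]) + 1L
}

# On a real document: find_table_image_collisions(readLines("repro.md"))
md <- c("|  a|  b|", "|--:|--:|", "|  1|  2|", "![](figure/plot-1.png)")
print(find_table_image_collisions(md))  # 4
```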

Data and plots from different chunks

I am creating a Word report with RStudio, R Markdown and knitr, and I am having some trouble.
My R code includes several chunks, because between chunks I want to include the text my report should contain.
The problem is this: if I use a single chunk, the report is OK, but I can't include text/comments in the report unless I also print the code (right?). But if I use multiple chunks, then when compiling, plots are not included in the report and warning messages appear:
pandoc.exe: Could not find image `Scriptv01_files/figure-docx/4.PLOTS-1.png', skipping...
It only works with HTML output: the report includes all plots. It does not work with Word (DOCX) or PDF output.
I think the issue is that the data object is created in a different chunk, but I have tried 'cache' and 'autodep' options with no success.
How can this be done? What's the problem with the code?
Many thanks!
Here I provide a code example:
---
output: word_document
---
# PROJECT: IRIS STUDY
#### Statistical Analysis
```{r setup}
require(knitr)
opts_chunk$set(echo = TRUE, message=FALSE, warning=FALSE, comment='')
```
```{r read data}
dataset<-iris
```
### Data Descriptive by Iris Specie
```{r 4. ANALYSE DATA - DATA DESCRIPTION BY SPECIE}
require(ggplot2)
ggplot(dataset, aes(Species)) + geom_bar(aes(fill=Species))+
labs(x = "Species", y = "Number of Flowers")+ ggtitle("Fisher's Iris data set")
```
knitr uses the chunk label as part of the figure file name. The label `4. ANALYSE DATA - DATA DESCRIPTION BY SPECIE` contains spaces and a period, which makes it invalid and is the reason the plot is not being created. Replacing it with a valid label, e.g. {r analyse-data-by-species}, solves the problem. The general rule:
Avoid spaces and periods (.) in chunk labels and directory names. [Source]
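If chunk labels are generated from free-text headings, a small helper can enforce that rule. A hypothetical sketch (`safe_chunk_label` is not part of knitr) that keeps only letters, digits and underscores, collapsing everything else into single hyphens:

```r
# Hypothetical helper: turn a free-text heading into a chunk label that is
# safe to use in figure file names (letters, digits, "_" and "-" only)
safe_chunk_label <- function(x) {
  x <- gsub("[^A-Za-z0-9_]+", "-", x)  # spaces, periods etc. become "-"
  x <- gsub("^-+|-+$", "", x)          # trim leading/trailing hyphens
  tolower(x)
}

print(safe_chunk_label("4. ANALYSE DATA - DATA DESCRIPTION BY SPECIE"))
# "4-analyse-data-data-description-by-specie"
```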

R knitr: Possible to programmatically modify chunk labels?

I'm trying to use knitr to generate a report that performs the same set of analyses on different subsets of a data set. The project contains two Rmd files: the first file is a master document that sets up the workspace and the document, the second file only contains chunks that perform the analyses and generates associated figures.
What I would like to do is knit the master file, which would then call the second file for each data subset and include the results in a single document. Below is a simple example.
Master document:
# My report
```{r}
library(iterators)
data(mtcars)
```
```{r create-iterator}
cyl.i <- iter(unique(mtcars$cyl))
```
## Generate report for each level of cylinder variable
```{r cyl4-report, child='analysis-template.Rmd'}
```
```{r cyl6-report, child='analysis-template.Rmd'}
```
```{r cyl8-report, child='analysis-template.Rmd'}
```
analysis-template.Rmd:
```{r, results='asis'}
cur.cyl <- nextElem(cyl.i)
cat("###", cur.cyl)
```
```{r mpg-histogram}
hist(mtcars$mpg[mtcars$cyl == cur.cyl], main = paste(cur.cyl, "cylinders"))
```
```{r weight-histogam}
hist(mtcars$wt[mtcars$cyl == cur.cyl], main = paste(cur.cyl, "cylinders"))
```
The problem is knitr does not allow for non-unique chunk labels, so knitting fails when analysis-template.Rmd is called the second time. This problem could be avoided by leaving the chunks unnamed since unique labels would then be automatically generated. This isn't ideal, however, because I'd like to use the chunk labels to create informative filenames for the exported plots.
A potential solution would be using a simple function that appends the current cylinder to the chunk label:
```r{paste('cur-label', cyl, sep = "-")}
```
But it doesn't appear that knitr will evaluate an expression in the chunk label position.
I also tried using a custom chunk hook that modified the current chunk's label:
knit_hooks$set(cyl.suffix = function(before, options, envir) {
if (before) options$label <- "new-label"
})
But changing the chunk label didn't affect the filenames for generated plots, so I didn't think knitr was utilizing the new label.
Any ideas on how to change chunk labels so the same child document can be called multiple times? Or perhaps an alternative strategy to accomplish this?
For anyone else who comes across this post, I wanted to point out that @Yihui has provided a formal solution to this question in knitr 1.0 with the introduction of the knit_expand() function. It works great and has really simplified my workflow.
For example, the following will process the template script below for every level of mtcars$cyl, each time replacing all instances of {{ncyl}} (in the template) with its current value:
# My report
```{r}
data(mtcars)
cyl.levels <- unique(mtcars$cyl)
```
## Generate report for each level of cylinder variable
```{r, include=FALSE}
src <- lapply(cyl.levels, function(ncyl) knitr::knit_expand(file = "template.Rmd"))
```
`r knitr::knit(text = unlist(src), quiet = TRUE)`
Template:
```{r, results='asis'}
cat("### {{ncyl}} cylinders")
```
```{r mpg-histogram-{{ncyl}}cyl}
hist(mtcars$mpg[mtcars$cyl == {{ncyl}}],
main = paste({{ncyl}}, "cylinders"))
```
```{r weight-histogam-{{ncyl}}cyl}
hist(mtcars$wt[mtcars$cyl == {{ncyl}}],
main = paste({{ncyl}}, "cylinders"))
```
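The same idea also works without a separate template file. A sketch, assuming the template is kept in a character vector (`template` and `expand_for_levels` are hypothetical names) and passing the value for `{{ncyl}}` explicitly through `knit_expand()`'s `...` argument rather than letting it be looked up in the calling environment:

```r
# Hypothetical in-memory version of template.Rmd
template <- c(
  "```{r, results='asis'}",
  "cat('### {{ncyl}} cylinders')",
  "```",
  "```{r mpg-histogram-{{ncyl}}cyl}",
  "hist(mtcars$mpg[mtcars$cyl == {{ncyl}}], main = paste({{ncyl}}, 'cylinders'))",
  "```"
)

# Expand the template once per level; each copy gets unique chunk labels
# such as mpg-histogram-4cyl, mpg-histogram-6cyl, ...
expand_for_levels <- function(levels) {
  lapply(levels, function(ncyl) knitr::knit_expand(text = template, ncyl = ncyl))
}
```

In the master document, `src <- expand_for_levels(sort(unique(mtcars$cyl)))` in an `include=FALSE` chunk, followed by the inline `` `r knitr::knit(text = unlist(src))` ``, would play the role of the `lapply()` call above.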
If you make all chunks in your child document nameless, i.e. ```{r}, it works. This, of course, is not very elegant, but there are two issues preventing you from changing the label of the current chunk:
A file is parsed before the code blocks are executed. The parser already detects duplicate labels, before any code is executed or custom hooks are called.
The chunk options (inc. the label) are processed before the hook is called (logical: it's an option that triggers a hook), so the hook cannot change the label anymore.
Unnamed blocks work because internally they get the label unnamed-chunk- plus the chunk number.
Blocks cannot have duplicate names as internally knitr references them by label. A fix could be to make knitr add the chunk number to all chunks with duplicate names. Or to reference them by chunk number instead of label, but that seems to me a much bigger change.
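The parse-time check described above can be mimicked with a rough base-R scan, which helps to see which labels collide once a child document is included more than once. This is a hypothetical helper and only a regex approximation of knitr's own parser: it looks only at `{r ...}` chunk headers and treats a first token containing `=` as an option, i.e. an unnamed chunk.

```r
# Hypothetical helper: chunk labels that appear more than once in Rmd source
duplicate_chunk_labels <- function(rmd_lines) {
  headers <- rmd_lines[grepl("^```\\{r[ ,]", rmd_lines)]
  # First token after "r": everything up to the next space, comma or "}"
  first <- sub("^```\\{r[ ,]+([^ ,}]*).*$", "\\1", headers)
  # Tokens containing "=" are chunk options, so that chunk has no label
  labels <- first[nzchar(first) & !grepl("=", first, fixed = TRUE)]
  unique(labels[duplicated(labels)])
}

# Example: the child template from the question, included twice
child <- c("```{r, results='asis'}", "```",
           "```{r mpg-histogram}", "```",
           "```{r weight-histogam}", "```")
print(duplicate_chunk_labels(c(child, child)))
# "mpg-histogram"   "weight-histogam"
```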
There is a similar question posed here. I was able to programmatically create R chunks and knit the outputs for use in a flexdashboard (quite useful), based on an arbitrary list of input plots, using the knit_expand(text=) and r paste(knitr::knit(text = paste(out, collapse = '\n'))) methods.
