Data and plots from different chunks - r

I am creating a Word report with R studio, markdown and knitr and I am having some troubles.
My r code includes several chunks, becauase between chunks, I want to include the text my report should include.
The problem I have is that: if use a single chunk, then the report is ok, but I can't include text/comments to be written in the report, unless I print also the code (right?). But if I use multiple chunks, then, when compiling, plots are not included in the report and warning messages appear:
pandoc.exe: Could not find image `Scriptv01_files/figure-docx/4.PLOTS-1.png', skipping...
It only works with HTML output: report includes all plots, but not with DOC nor PDF output.
I think the issue is that the data object is created in a different chunk, but I have tried 'cache' and 'autodep' options with no success.
How can this be done? What's the problem with the code?
Many thanks!
Here I provide a code example:
---
output: word_document
---
# PROJECT: IRIS STUDY
#### Statistical Analysis
```{r setup}
require(knitr)
opts_chunk$set(echo = TRUE, message=FALSE, warning=FALSE, comment='')
```
```{r read data}
dataset<-iris
```
### Data Descriptive by Iris Specie
```{r 4. ANALYSE DATA - DATA DESCRIPTION BY SPECIE}
require(ggplot2)
ggplot(dataset, aes(Species)) + geom_bar(aes(fill=Species))+
labs(x = "Species", y = "Number of Flowers")+ ggtitle("Fisher's Iris data set")
```

knitr uses the chunk name as part of the image file name. The chunk name 4. ANALYSE DATA - DATA DESCRIPTION BY SPECIE is invalid and is the reason why the plot is not being created. Replacing the name by a valid name solves the problem:
Avoid spaces and periods . in chunk labels and directory names [Source]

Related

Selectively knitting R notebook

I would like to make long, well documented R-studio notebooks to completely document analyses, but then be able to knit either the full notebook or an abbreviated version for reports for different audiences. The long version for scientists and a short version as an executive summary.
I know about not running individual chunks, but is there a way to specify chunks together with sections of supporting text to be knit or not, and then somehow turn on or off the selective knitting?
This question shows how to use knit_exit() to exit knitting early, but I am looking for a way to go in and out multiple times through a document.
You can try the following code:
---
title: study sample
output: pdf_document
---
```{r beginning, echo = FALSE, include = FALSE}
echoaudience <- c("scientific", "executive")
```
```{r data1, echo = any(c("scientific","tested") %in% echoaudience)}
# a detailed report
```
```{r data2, echo = any(c("executive","brief") %in% echoaudience)}
# a brief summary
```
```{r data3, echo = any(c("myleftover","brief") %in% echoaudience)}
# some data only in my interest
```
It will return only the code you want to show, in this case data1 and data2. You can control the outputs.
```r
# a quite detailed info
```
```r
# a brief summary
```
Hope it helps.
Regards,
Alexis

Modularized R markdown structure

There are a few questions about this already, but they are either unclear or provide solutions that don't work, perhaps because they are outdated:
Proper R Markdown Code Organization
How to source R Markdown file like `source('myfile.r')`?
http://yihui.name/knitr/demo/externalization/
Modularized code structure for large projects
R Markdown/Notebook is nice, but the way it's presented, there is typically a single file that has all the text and all the code chunks. I often have projects where such a single file structure is not a good setup. Instead, I use a single .R master file that loads the other .R files in order. I'd like to replicate this structure using R Notebook i.e. such that I have a single .Rmd file that I call the code from multiple .R files from.
The nice thing about working with a project this way is that it allows for the nice normal workflow with RStudio using the .R files but also the neat output from R Notebook/Markdown without duplicating the code.
Minimal example
This is simplified to make the example as small as possible. Two .R files and one master .Rmd file.
start.R
# libs --------------------------------------------------------------------
library(pacman)
p_load(dplyr, ggplot2)
#normally load a lot of packages here
# data --------------------------------------------------------------------
d = iris
#use iris for example, but normally would load data from file
# data manipulation tasks -------------------------------------------------
#some code here to extract useful info from the data
setosa = dplyr::filter(d, Species == "setosa")
plot.R
#setosa only
ggplot(setosa, aes(Sepal.Length)) +
geom_density()
#all together
ggplot(d, aes(Sepal.Length, color = Species)) +
geom_density()
And then the notebook file:
notebook.Rmd:
---
title: "R Notebook"
output:
html_document: default
html_notebook: default
---
First we load some packages and data and do slight transformation:
```{r start}
#a command here to load the code from start.R and display it
```
```{r plot}
#a command here to load the code from plot.R and display it
```
Desired output
The desired output is that which one gets from manually copying over the code from start.R and plot.R into the code chunks in notebook.Rmd. This looks like this (some missing due to lack of screen space):
Things I've tried
source
This loads the code, but does not display it. It just displays the source command:
knitr::read_chunk
This command was mentioned here, but actually it does the same as source as far as I can tell: it loads the code but displays nothing.
How do I get the desired output?
The solution is to use knitr's chunk option code. According to knitr docs:
code: (NULL; character) if provided, it will override the code in the
current chunk; this allows us to programmatically insert code into the
current chunk; e.g. a chunk option code =
capture.output(dump('fivenum', '')) will use the source code of the
function fivenum to replace the current chunk
No example is provided, however. It sounds like one has to feed it a character vector, so let's try readLines:
```{r start, code=readLines("start.R")}
```
```{r plot, code=readLines("start.R")}
```
This produces the desired output and thus allows for a modularized project structure.
Feeding it a file directly does not work (i.e. code="start.R"), but would be a nice enhancement.
For interoperability with R Notebooks, you can use knitr's read_chunk method as described above. In a notebook, you must call read_chunk in the setup chunk; since you can run notebook chunks in any order, this ensures that the external code will always be available.
Here's a minimal example of using read_chunk to bring code from an external R script into a notebook:
example.Rmd
```{r setup}
knitr::read_chunk("example.R")
```
```{r chunk}
```
example.R
## ---- chunk
1 + 1
When you execute the empty chunk in the notebook, code from the external file will be inserted, and the results displayed inline, as though the chunk contained that code.
As per my comment above, I use the here library to work with projects in folders:
```{ r setup, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
library(here)
insert <- function(filename){
readLines(here::here("massive_report_folder", filename))
}
```
and then each chunk looks like
```{ r setup, echo=FALSE, message=FALSE, warning=FALSE,
results='asis', code=insert("extra_file.R")}
```

R Markdown makes custom plot disappear if I set echo=FALSE

I created a custom function which sets mfrow to nxn and creates n^2 scatter plots, with multiple data sets on each plot, based on an input list of data frames. The signature of my plotting function looks like this:
plot.return.list<-function(df.list,num.plot,title)
Where df.list is my list of data frames, num.plot is the total number of plots to generate (used to set mfrow) and title is the overall plot title (the function generates titles for each individual sub-graph).
This creats plots fine when I run the function from the console. However, I'm trying to get this figure into a markdown document using RStudio, like so:
```{r, fig.height=6,fig.width=6}
plot.return.list(f.1.list,4,bquote(atop("Numerical Approximations vs Exact Soltuions for "
,dot(x)==-1*x*(t))))
```
Since I haven't set the echo option in my {r} statement, this prints both the plotting code as well as the plot itself. However, if my first line instead reads:
{r, fig.height=6,fig.width=6,echo=FALSE}
Then both the code AND the plot disappear from the final document.
How do I make the plot appear WITHOUT the code? According to the example RStudio gives, setting echo=FALSE should make the plot appear without the code, but that isn't the behavior I'm observing.
EDIT: I seem to have tracked my problem down to kable. Whether or not I'm making a custom plot-helper function, any call to kable kills my plot. This can be reproduced in a markdown:
---
title: "repro"
author: "Frank Moore-Clingenpeel"
date: "October 9, 2016"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(knitr)
options(default=TRUE)
repro.df<-data.frame((0.1*1:10)%*%t(1:10))
```
```{r, echo=FALSE}
kable(repro.df)
```
```{r, fig.height=6,fig.width=6,echo=FALSE}
plot(repro.df[,1],repro.df[,2])
```
In this code, the plot won't plot because I have echo set to false; removing the flag makes the plot visible
Also note that in my repro code, kable produces a table with a bunch of garbage in the last line--I don't know why, but this isn't true for my full original code, and I don't think it's related to my problem.
Thanks for the reproducible example. From this I can see that the problem is you don't have a newline between your table chunk and your plot chunk.
If you were to knit this and examine the MD file produced by knit (or set html_document as your output format and have keep_md: true to look at it), you would see that the table code and plot code are not separated by any newline. Pandoc needs this to delimit the end of the table. Without it, it thinks your ![](path/to/image.png) is part of the table and hence puts it as a "junk line" in the table rather than an image on its own.
Just add a newline between the two chunks and you will be fine. (Tables need to be surrounded with blank lines).
(I know you are compiling to LaTeX so it may confuse you why I am talking about markdown. In case it does, when you do Rmd -> PDF, Rmarkdown uses knit to go from RMD to MD, and then pandoc to go from MD to tex. This is why you still need to make sure your markdown looks OK).

Proper R Markdown Code Organization

I have been reading about R Markdown (here, here, and here) and using it to create solid reports. I would like to try to use what little code I am running to do some ad hoc analyses and turn them into more scalable data reports.
My question is rather broad: Is there a proper way to organize your code around an R Markdown project? Say, have one script that generates all of the data structures?
For example: Let's say that I have the cars data set and I have brought in commercial data on the manufacturer. What if I wanted to attach the manufacturer to the current cars data set, and then produce a separate summary table for each company using a manipulated data set cars.by.name as well as plot a certain sample using cars.import?
EDIT: Right now I have two files open. One is an R Script file that has all of the data manipulation: subsetting and re-categorizing values. And the other is the R Markdown file where I am building out text to accompany the various tables and plots of interest. When I call an object from the R Script file--like:
```{r}
table(cars.by.name$make)
```
I get an error saying Error in summary(cars.by.name$make) : object 'cars.by.name' not found
EDIT 2: I found this older thread to be helpful. Link
---
title: "Untitled"
author: "Jeb"
date: "August 4, 2015"
output: html_document
---
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r}
table(cars.by.name$make)
```
```{r}
summary(cars)
summary(cars.by.name)
```
```{r}
table(cars.by.name)
```
You can also embed plots, for example:
```{r, echo=FALSE}
plot(cars)
plot(cars.import)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
There is a solution for this sort of problem, explained here.
Basically, if you have an .R file containing your code, there is no need to repeat the code in the .Rmd file, but you can include the code from .R file. For this to work, the chunks of code should be named in the .R file, and then can be included by name in the .Rmd file.
test.R:
## ---- chunk-1 ----
table(cars.by.name$make)
test.Rmd
Just once on top of the .Rmd file:
```{r echo=FALSE, cache= F}
knitr::read_chunk('test.R')
```
For every chunk you're including (replace chunk-1 with the label of that specific chunk in your .R file):
```{r chunk-1}
```
Note that it should be left empty (as is) and in run-time your code from .R will be brought over here and run.
Often times, I have many reports that need to run the same code with slightly different parameters. Calling all my "stats" functions separately, generating the results and then just referencing is what I typically do. The way to do this is as follows:
---
title: "Untitled"
author: "Author"
date: "August 4, 2015"
output: html_document
---
```{r, echo=FALSE, message=FALSE}
directoryPath <- "rawPath" ##Something like /Users/userid/RDataFile
fullPath <- file.path(directoryPath,"myROutputFile.RData")
load(fullPath)
```
Some Text, headers whatever
```{r}
summary(myStructure$value1) #Where myStructure was saved to the .RData file
```
You can save an RData file by using the save.image() command.
Hope that helps!

R knitr: Possible to programmatically modify chunk labels?

I'm trying to use knitr to generate a report that performs the same set of analyses on different subsets of a data set. The project contains two Rmd files: the first file is a master document that sets up the workspace and the document, the second file only contains chunks that perform the analyses and generates associated figures.
What I would like to do is knit the master file, which would then call the second file for each data subset and include the results in a single document. Below is a simple example.
Master document:
# My report
```{r}
library(iterators)
data(mtcars)
```
```{r create-iterator}
cyl.i <- iter(unique(mtcars$cyl))
```
## Generate report for each level of cylinder variable
```{r cyl4-report, child='analysis-template.Rmd'}
```
```{r cyl6-report, child='analysis-template.Rmd'}
```
```{r cyl8-report, child='analysis-template.Rmd'}
```
analysis-template.Rmd:
```{r, results='asis'}
cur.cyl <- nextElem(cyl.i)
cat("###", cur.cyl)
```
```{r mpg-histogram}
hist(mtcars$mpg[mtcars$cyl == cur.cyl], main = paste(cur.cyl, "cylinders"))
```
```{r weight-histogam}
hist(mtcars$wt[mtcars$cyl == cur.cyl], main = paste(cur.cyl, "cylinders"))
```
The problem is knitr does not allow for non-unique chunk labels, so knitting fails when analysis-template.Rmd is called the second time. This problem could be avoided by leaving the chunks unnamed since unique labels would then be automatically generated. This isn't ideal, however, because I'd like to use the chunk labels to create informative filenames for the exported plots.
A potential solution would be using a simple function that appends the current cylinder to the chunk label:
```r{paste('cur-label', cyl, sep = "-")}
```
But it doesn't appear that knitr will evaluate an expression in the chunk label position.
I also tried using a custom chunk hook that modified the current chunk's label:
knit_hooks$set(cyl.suffix = function(before, options, envir) {
if (before) options$label <- "new-label"
})
But changing the chunk label didn't affect the filenames for generated plots, so I didn't think knitr was utilizing the new label.
Any ideas on how to change chunk labels so the same child document can be called multiple times? Or perhaps an alternative strategy to accomplish this?
For anyone else who comes across this post, I wanted to point out that #Yihui has provided a formal solution to this question in knitr 1.0 with the introduction of the knit_expand() function. It works great and has really simplified my workflow.
For example, the following will process the template script below for every level of mtcars$cyl, each time replacing all instances of {{ncyl}} (in the template) with its current value:
# My report
```{r}
data(mtcars)
cyl.levels <- unique(mtcars$cyl)
```
## Generate report for each level of cylinder variable
```{r, include=FALSE}
src <- lapply(cyl.levels, function(ncyl) knit_expand(file = "template.Rmd"))
```
`r knit(text = unlist(src))`
Template:
```{r, results='asis'}
cat("### {{ncyl}} cylinders")
```
```{r mpg-histogram-{{ncyl}}cyl}
hist(mtcars$mpg[mtcars$cyl == {{ncyl}}],
main = paste({{ncyl}}, "cylinders"))
```
```{r weight-histogam-{{ncyl}}cyl}
hist(mtcars$wt[mtcars$cyl == {{ncyl}}],
main = paste({{ncyl}}, "cylinders"))
```
If you make all chunks in your ** nameless, i.e. ```{r} it works. This, of course, is not very elegant, but there are two issues preventing you from changing the label of the current chunk:
A file is parsed before the code blocks are executed. The parser already detects duplicate labels, before any code is executed or custom hooks are called.
The chunk options (inc. the label) are processed before the hook is called (logical: it's an option that triggers a hook), so the hook cannot change the label anymore.
The fact that unnamed blocks work is that internally they get the label unnamed-chunk-+chunk number.
Blocks cannot have duplicate names as internally knitr references them by label. A fix could be to make knitr add the chunk number to all chunks with duplicate names. Or to reference them by chunk number instead of label, but that seems to me a much bigger change.
There is a similar question posed here I was able to programmatically create r chunks and knit the outputs for use in a flexdashboard (quite useful) based on an arbitrary list of input plots using the knit_expand(text=) and r paste(knitr::knit(text = paste(out, collapse = '\n'))) methods.

Resources