I am working on a scientific paper in R Markdown, and it needs an abstract (with embedded results) at the start of the paper. Currently the paper has analyses interspersed before the paragraphs and/or tables and figures where I report the results. I can render the abstract at the end, because by then all the parts have been calculated, but I can't submit the paper like that. And I hate to front-load all the analysis code (out of context) just so the numbers are ready for the abstract.
Does anybody have a strategy for this?
I know I can manually cut and paste the abstract to the top after rendering the document but that wrecks an otherwise fully reproducible workflow.
The best idea I have come up with (but have not tried yet) is to save the .md file that is generated as the Rmd is processed, then programmatically parse that file and feed the abstract, followed by the body of the paper, directly to pandoc.
Is there a good solution to this using Quarto?
Both rmarkdown (through {bookdown}) and {quarto} will allow you to assemble a series of rmd/qmd files into a single document and use a YAML file to specify the order in which they should be assembled.
I'm currently using something like this to develop a standard reporting format for some research on human/machine interfaces. It's intended for an applied scientist/engineer audience, which is not so different from your academic-journal application.
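As a concrete sketch of the Quarto flavour of this (chapter file names below are hypothetical), a book-style project lists the pieces in _quarto.yml, so the abstract can sit first in the output while the analysis chunks stay in the body where they belong:

# _quarto.yml -- a minimal sketch; chapter file names are hypothetical
project:
  type: book

book:
  chapters:
    - index.qmd          # title page and abstract go first
    - introduction.qmd
    - methods.qmd
    - results.qmd

One common trick to pair with this is to have the analysis chapters write their headline numbers to a small .rds file that the abstract reads back with readRDS() on the next render, so the abstract's inline values stay reproducible without front-loading the analysis code.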
Related
To preface this, I think my question is related to this one, but it's not exactly the same: How to source R Markdown file like `source('myfile.r')`?
Basically, I perform most of my data cleaning and analysis in R Markdown files because the visual separation between chunks of code and my own comments on what should be done for the analysis/cleaning is very helpful to me. It also helps that within RStudio, if you run a table, df, it displays an interactive snippet of it in the document. This is all very helpful in complicated cleaning/analyses. So, in other words, I'd like to develop in one R Markdown file and write in another R Markdown file. Splitting/writing the code into a source.R file is not ideal, unless there were a very automated and reproducible way to do it.
The issue is that for reports, I'd sometimes like to pull in specific objects that were generated in these lengthy data-cleaning and analysis files. For example, let's say that during my data cleaning in Rmarkdown-file-1 there was a particular table that was giving me trouble, problematic.df, and that I'd like to call it in my report (Rmarkdown-file-2), or possibly perform further manipulations on it there.
So ultimately I think this is the question:
How can I call any arbitrary object generated at any arbitrary point of one Rmarkdown file in another Rmarkdown file?
Obviously, the above would be the ideal, but it sounds unreasonable, so perhaps this is a better question/request:
How can I call any arbitrary objects generated by the end of one Rmarkdown file in another Rmarkdown file?
Upon further reflection, my question might already be answered in the post I linked, but it's been a while since that question was posted and perhaps there are new solutions or perspectives on this issue.
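One common pattern that answers at least the second version of the question is to persist the finished objects at the end of Rmarkdown-file-1 and read them back in Rmarkdown-file-2. A minimal sketch (the .rds file name is hypothetical):

# Last chunk of Rmarkdown-file-1: persist the object the report will need.
saveRDS(problematic.df, "problematic-df.rds")

# First chunk of Rmarkdown-file-2: read it back for reporting or further manipulation.
problematic.df <- readRDS("problematic-df.rds")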
I have a spreadsheet of exam questions that I want to use to generate quizzes and exams using R exams, and I'd like to include graphics in some of the questions.
The template here (http://www.r-exams.org/templates/fruit/) begins by defining the images as long Base64-encoded strings, as generated by
base64enc::base64encode("file.png")
This seems fine, but suppose I have a dozen or so images, and a programmatically generated exercise should use only one, two, or three of them, selected at random. How can I avoid including the encodings for all dozen images in every single exercise?
The best I can think of at the moment is to put the LaTeX syntax for graphics inclusion in a spreadsheet of possible question options and, as exercises are generated, use regular expressions to find the file names inside the \includegraphics{} commands, encode those files as Base64 strings, and include them in the exercise file. But I'm wondering if there's a way to do this without writing my own code to parse LaTeX.
First a few clarifications:
The fruit exercises include the images as Base64 strings because the three icons are quite small (12K per icon) and it is convenient to have all the information within the Rnw/Rmd exercise, without the need to store graphics files separately. It is just one trick that can be nifty and that we wanted to demonstrate.
For more and larger images you could use the same trick, but it is probably less convenient. To illustrate how static images can be included in an exercise, the following template is available: http://www.R-exams.org/templates/Rlogo/. It uses the include_supplement() function to declare a certain file as a supplement for the exercise. If this is a graphic, it can then be integrated into the exercise via \includegraphics{...} in Rnw exercises and via ![...](...) in Rmd exercises.
Each exercise just has to include the supplements it actually uses (and not all the files from which these were sampled). And there is no need to do the Base64 encoding manually; the exams2xyz(...) functions do this automatically when needed.
Now for the scenario you describe. Say you have an exercise foo.Rmd in which you want to show one of three static images foo-1.png, foo-2.png, foo-3.png and ask questions about it. Then your R code might do something like:
i <- sample(1:3, 1)               # draw one of the three image indices at random
img <- paste0("foo-", i, ".png")  # build the corresponding file name, e.g. "foo-2.png"
include_supplement(img)           # declare the sampled file as a supplement
which randomly selects one of the three files and declares it to be an attachment. Then within the question text you would include the image via something like:
![](`r img`)
Caveats:
The code above assumes that the PNG images are located in the same directory as the Rmd exercise itself. If they are in a sub-directory, say bar/, you would need include_supplement(img, dir = "bar"), etc.
If this exercise is rendered to HTML, the original file name (foo-1.png, foo-2.png, or foo-3.png) will be visible in the HTML source code. This may (or may not) give students a hint about the correct answer. If so, it would be better to include the file under a neutral name, e.g., include_supplement(img, target = "foo.png").
In Rnw exercises the code for including the graphic would be something like: \includegraphics{\Sexpr{img}}.
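For completeness, rendering a random draw of such an exercise is then just a call to one of the exams2xyz() interfaces, e.g. (assuming foo.Rmd and the PNG files sit in the current working directory):

library("exams")
exams2html("foo.Rmd")  # the sampled PNG supplement is picked up automatically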
I'm asking my question after searching for an answer on Stack Overflow and on the web, without success.
I'm sorry if there is already an answer somewhere.
Global objective
I aim to create my questionnaires in LibreOffice (I need to print them; this is not for an online survey), and then to use them in an R Shiny app I've created to record the collected answers and to export the data.
I want to create the fields in R (questions, answers...) automatically from the styles of my questionnaires in .odt, .docx, or other formats.
I need the questionnaires to be well formatted and nice-looking.
Here is the problem:
I have written a questionnaire in a LibreOffice .odt file (or, if necessary, in Microsoft Word).
I use styles for the different text blocks: one style for the "questions", one for the "answers", one for the parts of the questionnaire, one for the "instructions"...
I want to get a database (in .csv format) with one column for the style and one column for the text content.
Solutions?
I have tried to open the XML files inside the .odt or .docx archives, but converting them to a simpler, readable format seems quite difficult.
Is it possible to export a table of contents from LibreOffice or Word to a spreadsheet format?
Can R read in such files (.odt, .docx, or .xml)?
Thank you very much for your ideas, and more generally for your feedback on my project.
I'm sorry for my English.
I would recommend using .Rmd (for rmarkdown) or .Rnw (for knitr) files as the source for your questionnaires, rather than starting with .odt or .docx. You can produce output in various formats, including .docx, .pdf, and .html (only .pdf for .Rnw), to display the questionnaire to the subjects, but you can also develop functions to manage the data, or even interactive displays to collect and record it.
I'm not familiar with R packages that do all of this for you, but I expect they already exist. Maybe someone else will give an answer with more details.
You might explore using the .fodt format in LibreOffice Writer. That format is an "unzipped" version of the Writer XML format, so it could be directly readable by XML utilities (and probably by R, with the appropriate libraries). I note that in another answer you seemed to want to avoid markdown or knitr composition, and .fodt would provide a "text" format completely compatible with LibreOffice as a front end.
(Note that the other parts of LibreOffice have "flat" versions too, so you could, in theory, process text versions of spreadsheets, graphics, and presentation files in your R utility.)
A few web searches indicate that some relevant libraries and utilities for R exist, which may get you closer to what you need for your project.
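To make the .fodt idea concrete, here is a minimal sketch using the xml2 package; the file name, the style names, and the exact XPath are assumptions you would need to adapt to your own document:

library("xml2")

doc <- read_xml("questionnaire.fodt")       # flat ODT is a single XML file
ns  <- xml_ns(doc)                          # ODF namespaces, including "text"

paras  <- xml_find_all(doc, "//text:p", ns)        # every styled paragraph
styles <- xml_attr(paras, "text:style-name", ns)   # e.g. "Question", "Answer"
texts  <- xml_text(paras)

write.csv(data.frame(style = styles, text = texts),
          "questionnaire.csv", row.names = FALSE)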
I'm using R Markdown/bookdown to write a paper/PDF document, which is an amazing tool, @Yihui, thanks! Now I'm trying to include a table I have already set in LaTeX by reading in that external .tex file. However, when knitting in RStudio with \include{some-file.tex} or \input{some-file.tex} in the body of the .Rmd, outside of a chunk, a "LaTeX Error: Can be used only in preamble." is produced and the process stops. I haven't found a way to input it directly through knitr, or otherwise into a chunk, either.
I found this question: Rmarkdown v2, embed Latex document. While the question is similar, none of its answers shows how to input/include .tex files in an .Rmd.
Why would I want this? Sometimes LaTeX tables offer more layout options than building them directly in R, for example for tables that contain only text rather than R-computed numbers. Also, when running models on a cluster, exporting results directly to .tex, ready for compilation, saves a lot of computation compared with opening all those heavy .RData files just to get the results into a PDF. Similarly, when maintaining several types of reports for different audiences, keeping the full R code in one main .Rmd file and integrating only the necessary results into the other files reduces complexity, because I can keep one report with the full picture and do not have to check that every little change made it into various documents simultaneously.
So, finally, the question is: how do I get prepared .tex files into an .Rmd document?
Thanks for your answers!
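One workaround that may be worth trying (a sketch, assuming PDF output, where pandoc passes raw LaTeX through untouched) is to emit the prepared file from a chunk with results = "asis":

```{r, echo = FALSE, results = "asis"}
cat(readLines("some-file.tex"), sep = "\n")  # emit the prepared table verbatim
```

Since the chunk output is inserted into the intermediate .md as-is, the table's LaTeX reaches the final .tex untouched; note this will not work for HTML output, where pandoc drops raw LaTeX.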
Finally, I've decided to move my dissertation research closer to the goal of making it as reproducible as it can be, given my circumstances. Since I currently don't use LaTeX for my dissertation report (though I'm considering that option), I believe that knitr is the best way to go.
The software project implementing the empirical part of my dissertation research (data analysis) is written in R. The project contains multiple files within a directory structure that is rather typical for scientific workflows (top-level sub-directories: analysis, cache, data, figures, import, prepare, present, results, sandbox, utils).
I have read a lot of information (including examples) on using knitr for auto-generating reports and for reproducible research in general. However, I'm somewhat overwhelmed by the multitude of configuration options and, more importantly, still confused about the best/correct/optimal approach for using knitr in projects like mine, which contain multiple files and directories. In particular, I'm interested in advice on a framework and steps for transitioning an existing codebase without too many modifications to the R modules.
As an example, let's consider my modules, related to exploratory data analysis (EDA). My current EDA workflow includes:
preliminary data, transformed from the original raw data (located in "data/transform" sub-directories);
module "eda.R", located in "analysis" directory;
directory "results/eda", where my current code is generating figures (SVG files) of univariate and multivariate EDA, as well as a single document report (PDF file) with the same graphical only information (generated descriptive statistics is being produced as a console output, when running the "eda.R" script).
In order to transition to a knitr-based project, I have created a file "eda-report.Rmd" with R Markdown statements setting local knitr options, including read_chunk("eda.R"). My understanding is that I now need to mark the existing blocks of R code in "eda.R" as knitr chunks and then call these named chunks according to my EDA workflow.
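For reference, a minimal sketch of that pattern (the chunk names here are hypothetical): "eda.R" keeps working as a plain R script, and knitr's ## ---- label ---- markers delimit the named chunks:

# analysis/eda.R -- still runnable on its own with source()

## ---- load-data ----
transformed <- readRDS("data/transform/transformed.rds")

## ---- univariate-eda ----
hist(transformed$some_variable)

Then, after knitr::read_chunk("eda.R") in a setup chunk, an empty chunk such as ```{r univariate-eda}``` in "eda-report.Rmd" executes the corresponding block.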
Questions:
Is this the correct approach? What are the best practices for using knitr with regard to setting up project paths, using source(), grouping some plots via gridExtra, and preventing potential issues? It seems to me that, in addition to "eda-report.Rmd", I need to create another R module that will initiate the processing of the .Rmd file by knitr. If so, which call should I use: rmarkdown::render() or knitr::knit()? (While I use RStudio for development, I want my code to be independent of the development environment.)
UPDATE 1 (Additional question):
Why does processing an .Rmd file in RStudio via the "Knit HTML" button produce an HTML document, while processing it via the Makefile command Rscript -e 'library("knitr"); knit("eda-report.Rmd")' produces an .md file but no HTML, despite the presence of the output: html_document directive?
Thank you for reading this! Your advice will be greatly appreciated!
In order to transition your workflow to using knitr, I suggest that rather than trying to make every last piece of code you write reproducible, you should start with the bits that will be most useful.
Since knitr is a report generation tool, the best place to start is by writing your dissertation in knitr. (You mention that you don't use LaTeX at the moment. That's fine: knitr also supports AsciiDoc, which I find easier to write. If your dissertation doesn't have many equations or tables, you might also get away with writing it in Markdown or Textile, which are even easier.)
Similarly, knitr is good for any reports or papers that you might write.
For more advanced usage, you can create presentations using knitr. (I sometimes knit xhtml Slidy presentations.)
What I wouldn't bother with is trying to knit all your exploratory data analysis. Most things you'll find are boring or dead ends, so it isn't worth the extra effort. Concentrate on exploring as fast as you can, then knit the interesting bits afterwards. Likewise, data cleaning isn't usually that interesting, so well commented code often suffices.
To answer your question about directory structure, my preference is that since knitr reports are for final output, they should be sandboxed away from scrappier exploratory work. That is, they can have their own directory, and produce their own copies of figures.
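Regarding the update: the "Knit HTML" button calls rmarkdown::render(), which runs knitr and then pandoc, whereas knitr::knit() alone stops at the .md stage; the output: html_document directive is interpreted by rmarkdown, not by knitr. So an IDE-independent Makefile rule can call render directly:

Rscript -e 'rmarkdown::render("eda-report.Rmd")'  # knits, then runs pandoc to produce HTML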