I try to write all data analysis reports using R Markdown, because I can have a reproducible document that I can share in several output formats (PDF, HTML and MS Word).
However, most of my colleagues use MS Word and they have no idea about R, Markdown, etc.
One advantage of using R Markdown is that I can generate my report in MS Word and directly share it with my colleagues.
The disadvantage is that collaboration becomes cumbersome for me, because I receive feedback on MS Word as well (typically using track changes) and I have to manually introduce those changes back into the .rmd file.
So, my question is: how can I simplify the process (i.e. make it as automatic as possible) of getting the changes in the MS Word document into the .Rmd?
Are there any tools out there that can help me out?
P.S. Getting my colleagues to become R-literate is not an option :(
I haven't yet tried what I'm proposing, but here is how I plan to handle this, since I have exactly the same need. First, there are two distinct scenarios:
I am the lead author, or I am responsible for the statistical analysis: I will require all collaborators to learn and use markdown (not R Markdown, just generic markdown) and I'll instruct them not to touch any R code. I believe markdown is easy enough that anyone who is competent enough to collaborate on an article with data analysis is more than competent to learn markdown. For teaching them, the key features for people familiar with working with Microsoft Word and track changes are the following:
Basic markdown references: I would give them the core R Markdown references, which are the Pandoc Markdown documentation and the R Markdown cheat sheet.
Track changes: Collaborators would simply edit the markdown in plain text and submit their edited version. To view and reconcile differences, I would simply use a diff tool; I would find a good online one to teach my collaborators how to diff changes.
Comments between authors: I would select one of the options for markdown comments and teach my collaborators to use that when needed. The modified HTML comment (<!--- Pandoc-enhanced HTML comment -->) is the one I would probably use.
Reference management: I use Zotero, so I would use Better BibTeX for Zotero to handle references. The nice thing about this is that although I would have to handle the references myself, collaborators can directly add references to the Zotero group library. In fact, using citation keys, it should be simple for collaborators to learn how to insert references themselves into the markdown text.
I am NOT the lead author and I am NOT responsible for the statistical analysis: I would use whatever workflow the lead author uses (e.g. if the lead author uses Word with tracked changes, I'll use the same things).
I want to note that the only part that is not as easy as Word's normal working features is replacing track changes with diff. I'm not aware of a tool that makes incorporating diff files as easy as reconciling changes in Word, but if such a tool exists, the process would be more seamless.
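To make the diff step concrete, here is a minimal sketch using Python's standard difflib module. It only shows what a line-based diff of two plain-text versions looks like; the file names and the sample text are made up for illustration, and a graphical diff tool would present the same information more comfortably.

```python
import difflib


def markdown_diff(original, edited,
                  from_name="report.Rmd",
                  to_name="report_collaborator.Rmd"):
    """Return a unified diff between two versions of a markdown document.

    The default file names are hypothetical labels for the diff header.
    """
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        edited.splitlines(keepends=True),
        fromfile=from_name,
        tofile=to_name,
    )
    return "".join(diff)


# Made-up example: the collaborator changed one word in the narrative.
original = "# Results\n\nThe effect was large.\n"
edited = "# Results\n\nThe effect was moderate.\n"
print(markdown_diff(original, edited))
```

Lines starting with "-" come from my version and lines starting with "+" from the collaborator's, which is roughly the information track changes would give, minus the one-click accept/reject.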
I believe we would need to work on several packages in order to make true collaboration possible between users of Word and RMarkdown. I would be happy to collaborate with anyone interested in making this happen.
Adding a CriticMarkup plugin for RStudio. https://github.com/CriticMarkup/CriticMarkup-toolkit/
Having an R package that can scrape Word documents along with tracked changes. The officer package can already read Word documents, but not the tracked changes. It would also be extremely useful if this package could add simple RMarkdown formatting to the scrapes, e.g. for bold, subscripts and perhaps even tables to facilitate the subsequent matching of Word text to the RMarkdown file.
https://github.com/davidgohel/officer/issues/132
Write a package that translates the scraped tracked changes into CriticMarkup and inserts them into the RMarkdown file.
Generate a key (paragraph)->(lines) that matches paragraphs scraped from Word (without any of the tracked changes) to lines in the RMarkdown. The problem is that we don't know what was generated using code, and what was directly written as Rmd. The first step would be to find lines in the RMarkdown file that should form paragraphs (exclude R chunks, but not inline R). Then, ensuring the order remains the same, compare these lines (with newlines removed) to paragraphs scraped from the Word document, using a regexp symbol for "any char, any length" in the place of inline R chunks. Next, split paragraphs with inline chunks into sub-paragraphs in order to be able to apply tracked changes and comments to either the inline code, or the text before or after the inline chunk, more easily. Finally, the paragraphs that could not be matched were likely generated within code chunks and should be matched to the appropriate code chunks, determined from the order of the paragraphs.
Using the generated key, apply the tracked changes (as CriticMarkup) to the RMarkdown file. Any changes made to code chunks should be reported as a CriticMarkup comment around that code chunk (or group of code chunks if there is no markdown in between chunks).
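The (paragraph)->(lines) key described above could be prototyped roughly as follows. This is only a sketch in Python of the matching idea, not an existing package: the function names are hypothetical, and it assumes the Word paragraphs are already available as plain strings and that the narrative text is identical in both files apart from inline R code (no heading or emphasis markers to strip).

```python
import re

INLINE_R = re.compile(r"`r [^`]+`")   # inline R code such as `r x`
FENCE = re.compile(r"^\s*`{3}")       # opening or closing line of a code chunk


def rmd_paragraphs(rmd_text):
    """Return (first_line, last_line, text) for each narrative paragraph.

    Lines inside fenced code chunks are skipped; inline R code is kept.
    """
    paragraphs, current, start, in_chunk = [], [], None, False
    for number, line in enumerate(rmd_text.splitlines(), start=1):
        if FENCE.match(line):
            in_chunk = not in_chunk
            is_text = False
        else:
            is_text = not in_chunk and line.strip() != ""
        if is_text:
            if not current:
                start = number
            current.append(line.strip())
        elif current:
            paragraphs.append((start, number - 1, " ".join(current)))
            current = []
    if current:
        paragraphs.append((start, number, " ".join(current)))
    return paragraphs


def paragraph_pattern(text):
    """Escape the literal text and put a wildcard where inline R code sits."""
    literal_parts = INLINE_R.split(text)
    return re.compile(".*?".join(re.escape(p) for p in literal_parts), re.DOTALL)


def build_key(rmd_text, word_paragraphs):
    """Map Word paragraph index -> (first_line, last_line), preserving order.

    Word paragraphs that match nothing are returned separately; they were
    probably produced by code chunks.
    """
    key, unmatched, position = {}, [], 0
    candidates = rmd_paragraphs(rmd_text)
    for index, word_text in enumerate(word_paragraphs):
        for offset in range(position, len(candidates)):
            first, last, text = candidates[offset]
            if paragraph_pattern(text).fullmatch(word_text.strip()):
                key[index] = (first, last)
                position = offset + 1
                break
        else:
            unmatched.append(index)
    return key, unmatched


# Made-up example document; the fence is built indirectly to keep this
# snippet easy to paste.
fence = "`" * 3
rmd = "\n".join([
    "Intro text here.",
    "",
    fence + "{r}",
    "x <- mean(c(1, 2, 3))",
    fence,
    "",
    "The mean was `r x` units",
    "in this sample.",
])
word = ["Intro text here.", "Table 1: summary", "The mean was 2 units in this sample."]
print(build_key(rmd, word))
```

In this toy example the second Word paragraph matches no narrative lines, so it ends up in the unmatched list and would be attributed to the code chunk sitting between the two matched paragraphs.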
I suggest you try trackdown https://claudiozandonella.github.io/trackdown/
trackdown offers a simple answer to collaborative writing and editing of R Markdown (or Sweave) documents. Using trackdown, the local .Rmd (or .Rnw) file is uploaded as plain-text in Google Drive where, thanks to the easily readable Markdown (or LaTeX) syntax and the well-known online interface offered by Google Docs, collaborators can easily contribute to the writing and editing of the narrative part of the document. After integrating all authors’ contributions, the final document can be downloaded and rendered locally.
Using Google Docs, anyone can collaborate on the document, as no programming experience is required: collaborators only have to focus on the narrative text and can ignore the code jargon.
Moreover, you can hide code chunks setting hide_code = TRUE (they will be automatically restored when downloaded). This prevents collaborators from inadvertently making changes to the code that might corrupt the file and it allows collaborators to focus only on the narrative text ignoring code jargon.
You can also upload the actual output (i.e., the resulting compiled document) to Google Drive together with the .Rmd (or .Rnw) document. This helps collaborators to evaluate the overall layout, figures and tables, and it allows them to use comments on the PDF to propose and discuss suggestions.
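To illustrate the idea behind hiding and restoring code chunks, here is a small Python sketch. It is not trackdown's implementation: the placeholder format and the function names are invented for this example, and it only handles fenced R chunks of the usual form.

```python
import re

# A fenced R chunk: an opening fence with {r ...}, any content, a closing fence.
CHUNK = re.compile(r"^`{3}\{r[^}]*\}\n.*?^`{3}[ \t]*$", re.MULTILINE | re.DOTALL)


def hide_chunks(rmd_text):
    """Replace each code chunk with a numbered placeholder (hypothetical format)."""
    chunks = []

    def stash(match):
        chunks.append(match.group(0))
        return f"[[chunk-{len(chunks)}]]"

    return CHUNK.sub(stash, rmd_text), chunks


def restore_chunks(shared_text, chunks):
    """Put the stored chunks back where their placeholders are."""
    return re.sub(
        r"\[\[chunk-(\d+)\]\]",
        lambda match: chunks[int(match.group(1)) - 1],
        shared_text,
    )


# Made-up example document.
fence = "`" * 3
rmd = (
    "Intro.\n\n"
    + fence + "{r model, echo=FALSE}\nfit <- lm(y ~ x)\n" + fence
    + "\n\nThe effect was large.\n"
)
hidden, chunks = hide_chunks(rmd)
print(hidden)                      # what collaborators would see
edited = hidden.replace("large", "moderate")
print(restore_chunks(edited, chunks))
```

As long as collaborators leave the placeholders alone, the code comes back unchanged while their edits to the narrative are kept.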
I know this is an old post, but for future askers, there is now a package available that can do (mostly) this:
The {redoc} package can output to Word, and by storing the R code internally within the Word document, it can also dedoc() a Word file back into RMarkdown. It uses the Critic Markup syntax discussed in another answer.
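For anyone who has not seen Critic Markup before: additions are written as {++text++}, deletions as {--text--} and substitutions as {~~old~>new~~}. The Python sketch below produces that notation from two versions of a sentence using a word-level diff. It is only meant to show what the notation looks like; it is not how redoc works internally, and the function name is made up.

```python
import difflib


def to_critic_markup(original, edited):
    """Express the word-level differences between two texts as Critic Markup."""
    old_words, new_words = original.split(), edited.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old = " ".join(old_words[i1:i2])
        new = " ".join(new_words[j1:j2])
        if tag == "equal":
            pieces.append(old)
        elif tag == "delete":
            pieces.append("{--" + old + "--}")
        elif tag == "insert":
            pieces.append("{++" + new + "++}")
        else:  # "replace"
            pieces.append("{~~" + old + "~>" + new + "~~}")
    return " ".join(pieces)


print(to_critic_markup("The effect was large.",
                       "The observed effect was moderate."))
# prints: The {++observed++} effect was {~~large.~>moderate.~~}
```

Note that splitting on whitespace means punctuation stays attached to its word, which is why the full stop appears inside the substitution.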
Related
I'm producing plenty of analyses in R and utilizing the .html Markdown format to present and communicate work. Often, my manager will need to correct/add to the text which accompanies the code blocks, and has practically no interaction with the code blocks. The analyses are typically produced by myself alone, so code collaboration is a low priority.
In an ideal world, he could open up the .html and edit the text in a browser, which I understand is not possible.
Are there any simple solutions for this? I am sure this is a common problem so there must be an easy solution I am overlooking. Here are the current solutions being considered:
Use Git (but my manager wouldn't like to learn Git)
Use Jupyter Notebooks (but I would prefer to stick with R Markdown for integration with RStudio and for the reproducible templates)
Knit the Markdown as a Word document with manual version control on a shared network, allow tracking of changes in the Word document, and copy-and-paste the changes over to the .Rmd file
The latter is least elegant but most likely to be used at the moment. If you have any suggestions, please let me know!
Here's a solution that is tailor-made for your exact situation.
Use jupytext for bi-directional lossless interoperability between jupyter notebooks and R Markdown documents!
Maybe redoc is an option for you. Haven't tried it myself and it's still experimental but it would allow you to collaborate via Word. Basically the Word document can be edited and passed back to RMarkdown with all changes. See here.
I am in the midst of writing some scripts to perform data analysis on large Excel sheets faster than by hand. However, my company has a strict quality review system where the program used needs to be validated and secure (i.e. no one can edit it, there is proof of what code was run, etc.). So essentially I would like my coworkers to be able to run my code without being able to edit the script. I am also interested in inserting prompts that they can fill in (e.g. "Which column would you like to analyze?")
Is all of this possible? I have read a few things online about file permissions but I know that these can easily be changed by the user. I also read about obfuscators but am entirely unfamiliar with their use.
One thought I have is to use Rmarkdown as a method of displaying which lines were run for which results. However, I believe that document could be edited as well? This would also leave the issue of the script itself being able to be edited.
When I look at documentation for R packages, it often comes in a PDF document like this:
https://cran.r-project.org/web/packages/glmnet/glmnet.pdf
Does that document have a name?
Normally I find these documents by searching on the web, but I wonder if I can also produce them using some R command like library(help=...) or vignette(...). However, this answer makes it sound somewhat complicated, like I have to compile the package myself and run R CMD Rd2pdf, is that correct?
Also, as a prospective package author, I could imagine having this PDF document serve as the primary documentation for my package. The only obstacle is that when I read these documents, the documented functions always seem to appear in alphabetical order. Is there a way to put the most important functions first, so that the document can be read straight through (rather than just as a reference)? Or is there another documentation format which will let me document things in a certain order?
The reference manual is just a collection of the help pages. They should be written as reference material, which is probably not the first place a user should look for documentation, if that's what you mean by "primary documentation".
The first place users should look is for a vignette which provides an overview of the package. It can be displayed in HTML or PDF (it's up to the package author to choose). Since it is free-form, you can document things in a logical order, you aren't restricted to the alphabetical order of the reference manual.
It's also optional, and I use it as a measure of quality of a package that I'm investigating: if they don't have such a vignette, the authors don't really care about providing good documentation.
I am getting started with the reproducible research tools in R, and I'm pretty excited about the prospects. Sweave/Knitr/Markdown, all that stuff is great. I use RStudio, and they have done a great job of integrating those tools, and I hear that StatET does a nice job putting all that together as well.
I don't write academic papers in LaTeX, and all the people I work with use Word, so I am very interested in an effective workflow to use odfWeave to make documents.
My usual process is:
Develop the code chunks in my IDE (RStudio, in my case)
Go back and insert these into an ODT document and fill in the surrounding text.
Run odfWeave
My problem is that I get confused in tracking code chunks and putting them into the ODF document. Keeping the ODF document in sync as I create the code is annoying, so I'd rather wait and insert the code chunks by name.
So finally, here are my questions:
What are people's suggestions for tracking code chunks or on how to optimize this workflow?
Can anyone recommend tools or tips for keeping track of the code chunks you write?
Being a software geek and a data nerd, I naturally imagine a piece of software doing this for me. Like I'd have a database of code chunks, and when writing the ODF document I'd be able to click on a chunk to insert it into my ODF file.
Has anyone created this sort of thing?
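To make the "database of code chunks" idea concrete, here is a rough Python sketch that reads named chunks out of a single R script into a lookup table, so they could later be inserted by name. The `## ---- name` marker convention and the function name are assumptions made for this illustration, not a feature of odfWeave.

```python
import re

# Assumed convention: a line such as "## ---- load-data" starts a named chunk.
MARKER = re.compile(r"^## ---- (?P<name>[\w-]+)\s*-*\s*$")


def read_chunks(script_text):
    """Return {chunk name: chunk code} for every marked chunk in the script."""
    chunks, name = {}, None
    for line in script_text.splitlines():
        marker = MARKER.match(line)
        if marker:
            name = marker.group("name")
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {label: "\n".join(lines).strip() for label, lines in chunks.items()}


# Made-up R script with two named chunks.
script = (
    "## ---- load-data\n"
    "dat <- read.csv('data.csv')\n"
    "\n"
    "## ---- fit-model ----\n"
    "fit <- lm(y ~ x, data = dat)\n"
)
chunks = read_chunks(script)
print(sorted(chunks))
print(chunks["fit-model"])
```

With something like this, the document would only need to refer to chunks by name, and the code itself could keep living in the script where it was developed.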
When you check the number of items tagged odfweave on SO, you will notice that it is rarely used compared to Sweave and knit-offs. I do not fully understand why it did not take off, possibly because table generation is such a nuisance (at least that is what I remember from my attempts).
Since many customers insist on Word-Documents, we are using two alternatives currently:
Create HTML, e.g. with RStudio/knitr/Rmd, and open it in Word. This is not really a good workflow: getting a reasonable document takes a lot of manual post-processing, but it works more or less.
You can also go via RDCOM. I don't remember the current state of the art here, because we gave up on it entirely once the licensing conditions turned out not to be transparent to us.
Use pandoc. This approach produces documents that do not need manual post-processing in MS Word, but the range of features for creating a nice layout (cross-linked images, figure numbering) is limited; it may also be that we are not yet good enough at using pandoc to its full extent.
I'd like to pitch the question discussed here to the SO community: what is the best way for Sweave users to collaborate with Word users?
I'm trying to move my entire workflow to R and Sweave (or similar, e.g. maybe Knitr will prove more useful). However, the last step in my workflow is usually to write a manuscript with collaborators. They work by passing MS Word documents back and forth, and editing text using Track Changes.
Let's stipulate that I can't convince any of them to learn any new software - their process isn't going to change. I am looking for a straightforward way to:
1.) send Sweave-created documents to coauthors
2.) allow them to open the documents in Word and make tracked changes
3.) receive the edited documents and reincorporate them into Sweave, ideally with co-authors' changes highlighted in some way
4.) And if the solution works for OSX, that would be great.
The discussion on the R help mailing list focusses on SWord, which appears to be undocumented and available only for Windows. Does anyone know if it is good? The discussion on Vanderbilt's biostatistics wiki is good on ways to get Sweave documents into Word-readable forms, but not so much on how to integrate edited Word documents with Sweave.
I use a combination of approaches, depending on who is editing. My default is to have editors / collaborators mark up a PDF or hard copy. Foxit Reader is free and provides more extensive PDF commenting tools than Acrobat Reader, although Acrobat Reader does allow comment bubbles.
For more extensive contributions, it helps that I separate out the Sweave parts of a document from the main text, e.g. by writing results in results.Rnw and then inserting \input{results.tex} into the main document. This allows you to send around the part that does not include the R markup. You can also copy-paste everything between the preamble and bibliography into a Word document, and ask users to ignore the markup. If you copy-paste from an editor with syntax highlighting, the highlighting can be carried over to Word, making the process easier.
You might also consider using Inference for R, which is like Sweave for Word. There is also LyX, which requires users to learn a new program, but which is easier to use than Sweave.
I've always thought that this was a good use case for odfWeave....
I've been researching this issue for a couple of weeks now, and here is what I have come up with so far. Unfortunately, my newbie reputation does not allow me to post all the proper links, but a Google search should turn up all the packages involved.
Conversions between LaTeX and Word are cumbersome, therefore I consider it feasible to use a third format which allows export to both LaTeX and Word. Several alternatives are available. First, you have Markdown, a markup language which I am using even to write this post :). Markdown in itself is not really suited for academic writing, but there is an extension in development that allows for citations, footnotes and other features of technical writing.
Second, perhaps more promising, the reStructuredText markup from Docutils, which can handle citations already. My idea is to write my articles in plain text using reST, weaving (or knitting) them using knitr into HTML or PDF, this is supported natively from within R. R code can be embedded of course, that is the whole point.
To convert the text to Word format (.docx) one can use Pandoc, which can also handle citations, and is able to convert between multiple document formats, including PDF, Word, OpenDocument etc.
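As a rough sketch of that conversion step, the Python snippet below builds and runs a pandoc command. It assumes a reasonably recent pandoc on the PATH (the --citeproc option needs pandoc 2.11 or later), and the file names are placeholders I made up.

```python
import shutil
import subprocess


def pandoc_command(source, target, bibliography=None):
    """Build the pandoc call; the output format is inferred from the target extension."""
    command = ["pandoc", source, "-o", target]
    if bibliography is not None:
        # Resolve citations against the given bibliography file.
        command += ["--citeproc", "--bibliography=" + bibliography]
    return command


def convert(source, target, bibliography=None):
    """Run pandoc, failing early if it is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_command(source, target, bibliography), check=True)


# Placeholder file names; call convert(...) with real files to run pandoc.
print(" ".join(pandoc_command("paper.md", "paper.docx", bibliography="refs.bib")))
```

Keeping the command construction separate from the call makes it easy to check what would be run before pandoc is actually invoked.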
I still have to figure the whole workflow out. Converting between formats without the citations seems pretty straightforward to me (even though it requires some minor editing in Word afterwards). Working with citations still requires some figuring out. Hope this information helps whoever is on the same path of Reproducible Research, but also in the need to share texts with non-geek population.