How can Sweave users collaborate with Word users? [closed] - r

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I'd like to pitch the question discussed here to the SO community: what is the best way for Sweave users to collaborate with Word users?
I'm trying to move my entire workflow to R and Sweave (or similar, e.g. maybe Knitr will prove more useful). However, the last step in my workflow is usually to write a manuscript with collaborators. They work by passing MS Word documents back and forth, and editing text using Track Changes.
Let's stipulate that I can't convince any of them to learn any new software - their process isn't going to change. I am looking for a straightforward way to:
1.) send Sweave-created documents to coauthors
2.) allow them to open the documents in Word and make tracked changes
3.) receive the edited documents and reincorporate them into Sweave, ideally with co-authors' changes highlighted in some way
4.) And if the solution works for OSX, that would be great.
The discussion on the R help mailing list focuses on SWord, which appears to be undocumented and available only for Windows. Does anyone know if it is good? The discussion on Vanderbilt's biostatistics wiki is good on ways to get Sweave documents into Word-readable forms, but not so much on how to integrate edited Word documents back into Sweave.

I use a combination of approaches, depending on who is editing. My default is to have editors / collaborators mark up a PDF or hard copy. Foxit Reader is free and provides more extensive PDF commenting tools than Adobe Acrobat Reader, although Acrobat Reader does allow comment bubbles.
For more extensive contributions, it helps that I separate out the Sweave parts of a document from the main text, e.g. by writing results in results.Rnw and then inserting \input{results.tex} into the main document. This allows you to send around the part that does not include the R markup. You can also copy-paste everything between the preamble and bibliography into a Word document and ask collaborators to ignore the markup. If you copy-paste from an editor with syntax highlighting, the highlighting is preserved in Word, which makes the markup easier to ignore.
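A minimal sketch of that split, assuming the code chunks live in results.Rnw and the narrative in main.tex (both file names are hypothetical):

    # Weave only the R part; main.tex pulls the result in via \input{results}
    Sweave("results.Rnw")                                  # produces results.tex
    # knitr::knit("results.Rnw", output = "results.tex")   # knitr equivalent
    tools::texi2pdf("main.tex")                            # compile the full manuscript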
You might also consider using Inference for R, which is like Sweave for Word. There is also LyX, which requires users to learn a new program, but which is easier to use than Sweave.

I've always thought that this was a good use case for odfWeave....

I've been researching this issue for a couple of weeks now, and here is what I have come up with so far. Unfortunately, my newbie reputation does not allow me to post all the proper links, but a Google search will easily turn up all the packages involved.
Conversions between LaTeX and Word are cumbersome, therefore I consider it feasible to use a third format which allows export to both LaTeX and Word. Several alternatives are available. First, you have Markdown, a markup language which I am using even to write this post :). Markdown in itself is not really suited for academic writing, but there is an extension in development that allows for citations, footnotes and other features of technical writing.
Second, and perhaps more promising, there is the reStructuredText markup from Docutils, which can already handle citations. My idea is to write my articles in plain text using reST, weaving (or knitting) them with knitr into HTML or PDF; this is supported natively from within R. R code can be embedded, of course; that is the whole point.
To convert the text to a Word document one can use Pandoc, which can also handle citations and is able to convert between multiple document formats, including PDF, Word, OpenDocument and others.
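A minimal sketch of that pipeline from within R; the file names and the bibliography are illustrative, and the Pandoc wrapper from the rmarkdown package is assumed to be available:

    library(knitr)
    knit("article.Rmd")            # weave R + Markdown -> article.md
    # knit("article.Rrst")         # weave R + reST     -> article.rst

    # Hand the plain-text result to Pandoc for the Word version
    rmarkdown::pandoc_convert(
      "article.md",
      to       = "docx",
      output   = "article.docx",
      citeproc = TRUE,                       # resolve citations
      options  = "--bibliography=refs.bib"
    )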
I still have to figure the whole workflow out. Converting between formats without the citations seems pretty straightforward to me (even though it requires some minor editing in Word afterwards). Working with citations still requires some figuring out. I hope this information helps whoever is on the same path of reproducible research but also needs to share texts with a non-geek population.

Related

Best way to collaborate with manager on R Markdown reports?

I'm producing plenty of analyses in R and utilizing the .html Markdown format to present and communicate work. Often, my manager will need to correct/add to the text which accompanies the code blocks, and has practically no interaction with the code blocks. The analyses are typically produced by myself alone, so code collaboration is a low priority.
In an ideal world, he could open up the .html and edit the text in a browser, which I understand is not possible.
Are there any simple solutions for this? I am sure this is a common problem so there must be an easy solution I am overlooking. Here are the current solutions being considered:
Use Git (but my manager wouldn't like to learn Git)
Use Jupyter Notebooks (but I would prefer to stick with R Markdown for integration with RStudio and for the reproducible templates)
Knit the Markdown as a Word document with manual version control on a shared network drive, allow tracked changes in the Word document, and copy-and-paste the changes back into the .Rmd file (see the one-line sketch below)
The latter is least elegant but most likely to be used at the moment. If you have any suggestions, please let me know!
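For reference, the knit-to-Word step in the last option is a one-liner, assuming the report lives in report.Rmd:

    # Render straight to a .docx that can be edited with Track Changes;
    # merging the tracked changes back into the .Rmd remains a manual step.
    rmarkdown::render("report.Rmd", output_format = "word_document")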
Here's a solution that is tailor-made for your exact situation.
Use jupytext for bi-directional, lossless interoperability between Jupyter notebooks and R Markdown documents!
Maybe redoc is an option for you. Haven't tried it myself and it's still experimental but it would allow you to collaborate via Word. Basically the Word document can be edited and passed back to RMarkdown with all changes. See here.
I suggest you try trackdown https://claudiozandonella.github.io/trackdown/
trackdown offers a simple answer to collaborative writing and editing of R Markdown (or Sweave) documents. Using trackdown, the local .Rmd (or .Rnw) file is uploaded as plain-text in Google Drive where, thanks to the easily readable Markdown (or LaTeX) syntax and the well-known online interface offered by Google Docs, collaborators can easily contribute to the writing and editing of the narrative part of the document. After integrating all authors’ contributions, the final document can be downloaded and rendered locally.
Moreover, you can hide code chunks by setting hide_code = TRUE (they will be automatically restored when downloaded). This prevents collaborators from inadvertently making changes to the code that might corrupt the file, and it allows them to focus only on the narrative text, ignoring code jargon.
You can also upload the actual output (i.e., the resulting compiled document) to Google Drive together with the .Rmd (or .Rnw) document. This helps collaborators evaluate the overall layout, figures and tables, and it allows them to use comments on the PDF to propose and discuss suggestions.
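A minimal sketch of that round trip with trackdown's documented functions, assuming the manuscript is in report.Rmd:

    library(trackdown)

    # Upload the local file to Google Drive, hiding code chunks from editors
    upload_file("report.Rmd", hide_code = TRUE)

    # ... collaborators edit the narrative in Google Docs ...

    # Pull the edited text back locally; hidden code chunks are restored
    download_file("report.Rmd")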

Automating excel process -- Applying a macro to several files [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I am working on files used to monitor health plan data, and each type of report comes in the same template. I have created macros to automate the oversight of the files (find errors, gaps in data, logical improbabilities, etc.). Now that I have that in place and the code has been cleaned up -- I am trying to automate the application of those macros to these files. For instance, for one type of report I have 10 client files to review. Currently I am going through the painstaking process of opening each file, dropping in the macro, applying it to the file, removing the macro (so that the clients can't take my code), and saving the resulting file. I repeat that process for each client -- and then do a similar process for many other reports. I know there has to be a better way; I am wondering if anyone has experience in this field and might be able to point me in the direction of how to achieve it. I use RStudio for another process we automated and I believe I could utilize it for this process as well -- I just need to find a jumping-off point.
Manual intervention will still be needed to review the results of the macros, but I am hoping to eliminate the unnecessary manual touch points
I really appreciate any advice / knowledge you can share.
Unless the macro you've written contains some very specific functions that don't have Python equivalents, I'd recommend simply abandoning VBA and manipulating your Excel sheets in Python via xlwings or openpyxl. If your data is very "database-like", in that the top row is simply column headers and every additional row contains nothing but data aligned to those headers, you can also use pandas to process the data.
If you do need access to functions built directly into Excel that don't have Python equivalents, you can use win32com to communicate with Excel via Python. This library basically drives Excel through its COM interface. You can then either use the COM libraries directly to execute an equivalent of your VBA file from within Python, or, if you prefer to stick with VBA, you can simply paste your VBA code into your Python script and inject it into the workbook as in this example. From there, you can also remove your VBA code from the Excel sheet as shown in this example.
A pure VB solution would involve essentially making these same calls to inject and subsequently remove the CodeComponent in an environment outside your Excel workbook.
You may find it vastly easier to solve problems like this with a popular scripting language like Python since the support community around it is much larger than VBA's. VBA tends to be unpopular among developers and thus its support community also tends to be small. Large support communities also mean well-maintained and highly-convenient libraries such as the aforementioned xlwings and openpyxl.

Why not organise all functions in a package in one file? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I found the following line in Hadley Wickham's book about R packages:
While you’re free to arrange functions into files as you wish, the two extremes are bad: don’t put all functions into one file and don’t put each function into its own separate file (See here).
But why? These seem to be the two options that make the most sense to me. Especially keeping just one file seems appealing.
From user perspective:
When I want to know how something works in a package I often go to the respective GitHub page and look at the code. Having functions organised in different files makes this a lot harder and I regularly end up cloning a repository just so I can search the content of all files (e.g. via grep -rnw '/path/to/somewhere/' -e 'function <-').
From a developer's perspective
I also don't really see the upside for developing a package. Browsing through a big file doesn't seem much harder than browsing through a small one if you use the outline pane in RStudio. I know about the Ctrl + . shortcut, but it still means I have to open a new file when working on a different function, while Ctrl + . could do basically the same job if I kept just one file.
Wouldn't it make more sense to keep all functions in one single file? I know different people like to organise their projects in different ways, and that is fine. I'm not asking for opinions here. Rather, I would like to know if there are any real disadvantages to keeping everything in one file.
There is obviously not one single answer to that. One obvious question is the size of your package: if it contains only those two neat functions and a plot command, why bother organizing it in any elaborate manner? Just hack it into one file and you are good to go.
If your project is large, say you are trying to overthrow the R graphics system and are writing lots and lots of functions (geoms and stats and ...), a lot of small files might be a better idea. But then, having more files than there is room for tabs in RStudio might not be such a good idea either.
Another important question is whether you intend to develop alone or with hundreds of people on GitHub. You might prefer to accept a change in a small file as opposed to "the one big file" that was so easy to search back when you were alone.
The people who invented Java originally had a "one file per class" convention, and C# seems to have something similar. That does not mean that those people are less clever than Hadley. It just means that your mileage may vary and you are entitled to disagree with Hadley's opinions.
Why not put all files on your computer in the root directory?
Ultimately if you use a file tree you are back to using everything as single entities.
Putting things that conceptually belong together into the same file is the logical continuation of putting things into directories/libraries.
If you write a library and define a function as well as some convenience wrappers around them it makes sense to put them in one file.
Navigating the file tree is made easier as you have fewer files and navigating the files is easier as you don't have all functions in the same file.

MS Word track changes and RMarkDown

I try to write all data analysis reports using R Markdown, because I can have a reproducible document that I can share in several output formats (PDF, HTML and MS Word).
However, most of my colleagues use MS Word and they have no idea about R, Markdown, etc.
One advantage of using R Markdown is that I can generate my report in MS Word and directly share it with my colleagues.
The disadvantage is that collaboration becomes cumbersome for me, because I receive feedback on MS Word as well (typically using track changes) and I have to manually introduce those changes back into the .rmd file.
So, my question is: how can I simplify the process (i.e. make it as automatic as possible) of getting the changes in the MS Word document into the .Rmd?
Are there any tools out there that can help me out?
P.S. Getting my colleagues to become R-literate is not an option :(
I haven't yet tried what I'm proposing, but here is how I plan to handle this, since I have exactly the same need. First, there are two distinct scenarios:
I am the lead author, or I am responsible for the statistical analysis: I will require all collaborators to learn and use markdown (not R Markdown, just generic markdown) and I'll instruct them not to touch any R code. I believe markdown is easy enough that anyone who is competent enough to collaborate on an article with data analysis is more than competent to learn markdown. For teaching them, the key features for people familiar with working with Microsoft Word and track changes are the following:
Basic markdown references: I would give them the core R Markdown references, namely the Pandoc Markdown documentation and the R Markdown cheat sheet.
Track changes: Collaborators would simply edit the markdown in plain text and submit their edited version. To view and reconcile differences, I would use a diff tool; I would find a good online one to teach my collaborators how to diff changes (a brief sketch of the diff step appears at the end of this answer).
Comments between authors: I would select one of the options for markdown comments and teach my collaborators to use that when needed. The modified HTML comment (<!--- Pandoc-enhanced HTML comment -->) is the one I would probably use.
Reference management: I use Zotero, so I would use Better BibTeX for Zotero to handle references. The nice thing about this is that although I would have to handle the references myself, collaborators can directly add references to the Zotero group library. In fact, using citation keys, it should be simple for collaborators to learn how to insert references themselves into the markdown text.
I am NOT the lead author and I am NOT responsible for the statistical analysis: I would use whatever workflow the lead author uses (e.g. if the lead author uses Word with tracked changes, I'll use the same things).
I want to note that the only part that seems not so easy (compared to Microsoft Word's normal working features) is replacing track changes with a diff. I'm not aware of a tool that makes incorporating diff files as easy as the way Word reconciles changes, but if such a tool exists, the process should be more seamless.
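For the diff step mentioned above, a minimal sketch; the diffobj package is just one option and the file names are hypothetical:

    # Coloured, word-level diff of two versions of the markdown source
    library(diffobj)
    diffFile("draft_v1.md", "draft_v2.md")

    # A plain command-line diff works too:
    #   diff draft_v1.md draft_v2.md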
I believe we would need to work on several packages in order to make true collaboration possible between users of Word and RMarkdown. I would be happy to collaborate with anyone interested in making this happen.
Adding a CriticMarkup plugin for RStudio. https://github.com/CriticMarkup/CriticMarkup-toolkit/
Having an R package that can scrape Word documents along with tracked changes. The officer package can already read Word documents, but not the tracked changes (a minimal officer sketch follows this list). It would also be extremely useful if this package could add simple RMarkdown formatting to the scraped text, e.g. for bold, subscripts and perhaps even tables, to facilitate the subsequent matching of Word text to the RMarkdown file.
https://github.com/davidgohel/officer/issues/132
Write a package that can translate the scraped tracked changes into CriticMarkup in the RMarkdown file.
Generate a key (paragraph) -> (lines) that matches paragraphs scraped from Word (without any of the tracked changes) to lines in the RMarkdown. The problem is that we don't know what was generated by code and what was written directly as Rmd. The first step would be to find lines in the RMarkdown file that should form paragraphs (exclude R chunks, but not inline R). Then, ensuring the order remains the same, compare these lines (with newlines removed) to paragraphs scraped from the Word document, using a regexp symbol for "any char, any length" in place of inline R chunks. Next, split paragraphs containing inline chunks into sub-paragraphs, so that tracked changes and comments can more easily be applied to the inline code, before it, or after it. Finally, the paragraphs that could not be matched were likely generated within code chunks and should be matched to the appropriate code chunks, determined from the order of the paragraphs.
Use the generated key to apply tracked changes (as CriticMarkup) to the RMarkdown file. Any changes made to code chunks should be reported as a CriticMarkup comment around that code chunk (or group of code chunks, if there is no markdown in between chunks).
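As a starting point for the scraping step, officer can already pull the (accepted) text of a Word document into a data frame; a minimal sketch with a hypothetical file name:

    library(officer)

    doc   <- read_docx("manuscript_edited.docx")
    paras <- docx_summary(doc)          # one row per paragraph / table cell
    head(paras[paras$content_type == "paragraph", c("doc_index", "text")])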
I suggest you try trackdown https://claudiozandonella.github.io/trackdown/
trackdown offers a simple answer to collaborative writing and editing of R Markdown (or Sweave) documents. Using trackdown, the local .Rmd (or .Rnw) file is uploaded as plain-text in Google Drive where, thanks to the easily readable Markdown (or LaTeX) syntax and the well-known online interface offered by Google Docs, collaborators can easily contribute to the writing and editing of the narrative part of the document. After integrating all authors’ contributions, the final document can be downloaded and rendered locally.
Using Google Docs, anyone can collaborate on the document as no programming experience is required, they only have to focus on the narrative text ignoring code jargon.
Moreover, you can hide code chunks by setting hide_code = TRUE (they will be automatically restored when downloaded). This prevents collaborators from inadvertently making changes to the code that might corrupt the file, and it allows them to focus only on the narrative text, ignoring code jargon.
You can also upload the actual output (i.e., the resulting compiled document) to Google Drive together with the .Rmd (or .Rnw) document. This helps collaborators evaluate the overall layout, figures and tables, and it allows them to use comments on the PDF to propose and discuss suggestions.
I know this is an old post, but for future askers, there is now a package available that can do (mostly) this:
The {redoc} package can output to Word, and by storing the R code internally within the Word document, it can also dedoc() a Word file back into RMarkdown. It uses the CriticMarkup syntax discussed in another answer.
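A minimal sketch of that round trip with redoc's documented redoc() output format and dedoc() function (the file name is hypothetical):

    # Render to a Word file that keeps the R code stored inside the .docx
    rmarkdown::render("report.Rmd", output_format = redoc::redoc())

    # After collaborators edit report.docx with Track Changes, convert it back
    # to R Markdown; dedoc() has options controlling how tracked changes and
    # comments are represented (see the package documentation)
    redoc::dedoc("report.docx")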

Sweave/ODFWeave and tracking code chunks

I am getting started with the reproducible research tools in R, and I'm pretty excited about the prospects. Sweave/Knitr/Markdown, all that stuff is great. I use RStudio, and they have done a great job of integrating those tools, and I hear that StatET does a nice job of putting all that together as well.
I don't write academic papers in LaTeX, and all the people I work with use Word, so I am very interested in an effective workflow to use ODFWeave to make documents.
My usual process is:
Develop the code chunks in my IDE (RStudio, in my case)
Go back and insert these into an ODT document and fill in the surrounding text.
Run ODFWeave
My problem is that I get confused in tracking code chunks and putting them into the ODF document. Keeping the ODF document in sync as I create the code is annoying, so I'd rather wait and insert the code chunks by name.
So finally, here are my questions:
What are people's suggestions for tracking code chunks or on how to optimize this workflow?
Can anyone recommend tools or tips for keeping track of the code chunks you write?
Being a software geek and a data nerd, I naturally imagine a piece of software doing this for me. Like I'd have a database of code chunks, and when writing the ODF document I'd be able to click on a chunk to insert it into my ODF file.
Has anyone created this sort of thing?
When you check the number of questions tagged odfweave on SO, you will notice that it is rarely used compared to Sweave and knitr-based alternatives. I do not fully understand why it did not take off, possibly because table generation is such a nuisance (at least that is what I remember from my attempts).
Since many customers insist on Word documents, we currently use two alternatives:
Create HTML, e.g. with RStudio/knitr/rmd, and read it with Word. This is not really a good workflow; to get a reasonable document you need a lot of manual post-processing, but it works more or less.
You can also take the route via RDCOM. I don't remember the state of the art here, because we gave it up entirely since the licensing conditions were not transparent to us.
Use pandoc. This approach produces documents that do not need manual post-processing in MS Word, but the range of features for creating a nice layout (cross-linked images, figure numbering) is limited; it may also be that we are not yet good enough at using pandoc to its full potential.
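One partial remedy for the layout limitations, not mentioned above but worth noting, is pandoc's reference-document mechanism as exposed by rmarkdown; a sketch assuming a hand-made template.docx with the desired styles:

    # Styles (fonts, heading formats, page setup) are copied from the
    # reference document; figure numbering and cross-references still need
    # extra tooling (e.g. the bookdown or officedown packages).
    rmarkdown::render(
      "report.Rmd",
      output_format = rmarkdown::word_document(reference_docx = "template.docx")
    )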
