Filling PDF forms in R?

I am seeking a way to automate PDF form filling in R. I cannot find a package written to do this. Is there an option out there?
Alternative solutions I can think of:
Using R to overlay a PDF containing text onto a blank PDF template.
Using R to generate an FDF file that can be read by some other software or code in a different language.
All of these things seem doable in Python. However, my organization leans strongly towards R, and in the past has relied upon software devs to write C# to fill out the forms. I'm hoping to use R to skip over this step.
Thanks!

The staplr package now supports this with its get_fields and set_fields functions. Note that for this to work, pdftk server must be installed and on your PATH.
get_fields returns a list of the fields in a PDF, along with their types, which you can then modify.
set_fields fills in the form according to your modifications. See the code below for an example:
library(staplr)
pdfFile = system.file('testForm.pdf', package = 'staplr')
fields = get_fields(pdfFile)
# You'll get a list of the fields the PDF contains,
# along with some additional information about each field.
# Modify any field's value, for example:
fields$TextField1$value = 'this is text'
# and write the changes to a new file
set_fields(pdfFile, 'newFile.pdf', fields)
Note: the GitHub version of staplr currently has fixes, not yet on CRAN, that affect its ability to write non-English alphabets. For the best experience you may want to install it with:
devtools::install_github('pridiltal/staplr')
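To automate filling many copies of a form, the get_fields/set_fields pair can be driven from a data frame. This is a minimal sketch, assuming each element of the list returned by get_fields() has a `value` entry as in the example above. `fill_fields` and `responses` are hypothetical names, and check boxes or radio buttons may need the specific export values defined in the PDF rather than arbitrary text.

```r
# Hypothetical helper: copy values from a named list (e.g. one data-frame row)
# into the `value` entry of the matching fields returned by staplr::get_fields().
fill_fields <- function(fields, values) {
  unknown <- setdiff(names(values), names(fields))
  if (length(unknown) > 0) {
    stop("No such form field(s): ", paste(unknown, collapse = ", "))
  }
  for (nm in names(values)) {
    fields[[nm]]$value <- as.character(values[[nm]])
  }
  fields
}

# Usage sketch (requires staplr and pdftk); `responses` is a hypothetical data
# frame whose column names match the form's field names.
# template <- system.file('testForm.pdf', package = 'staplr')
# fields <- staplr::get_fields(template)
# for (i in seq_len(nrow(responses))) {
#   filled <- fill_fields(fields, as.list(responses[i, , drop = FALSE]))
#   staplr::set_fields(template, sprintf('filled_%03d.pdf', i), filled)
# }
```

Failing loudly on unknown names is a deliberate choice: a typo in a column name would otherwise silently leave a field blank.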

Related

Is there a way to accelerate formatted table writing from R to excel?

I have a data frame with 174,603 rows and 178 columns, which I'm exporting to Excel using openxlsx::saveWorkbook (I use this package to get the cell formatting I need: colors, header styles and so on). But the process is extremely slow (depending on how much memory the machine is using, it can take from 7 to 17 minutes!), and I need a way to reduce this significantly (it doesn't need to be seconds, but anything below 5 minutes would be OK).
I've already searched other questions, but they all seem to focus either on importing data into R (I have no problem with this) or on writing unformatted files from R (using write.csv and similar options).
Apparently I can't use the xlsx package because of the settings on my computer (it's an industrial computer; see the comments on this question).
Any suggestions regarding packages, or other functionality inside this package, to make this run faster would be highly appreciated.
This question is a bit old, but I had the same problem and came up with a solution worth mentioning.
There is a package called writexl that exports a data frame to Excel using the C library libxlsxwriter. You can export to Excel with the following code:
library(writexl)
writexl::write_xlsx(df, "Excel.xlsx",format_headers = TRUE)
The format_headers parameter only makes the header titles bold and centered, but I edited the C code in its source: the writexl library by rOpenSci on GitHub.
You can download or clone it. Inside the src folder you can edit the write_xlsx.c file.
For example, in the part where the header format is created:
//how to format headers (bold + center)
lxw_format * title = workbook_add_format(workbook);
format_set_bold(title);
format_set_align(title, LXW_ALIGN_CENTER);
you can add these lines to give the header a background color:
format_set_pattern (title, LXW_PATTERN_SOLID);
format_set_bg_color(title, 0x8DC4E4);
Many more formatting options are available; see the libxlsxwriter documentation.
When you have finished editing that file, and assuming the source code is in a folder called writexl, you can build and install the edited package with:
system("R CMD build writexl")  # shell() also works, but only on Windows
install.packages("writexl_1.2.tar.gz", repos = NULL)  # the tarball name follows the Version field in DESCRIPTION
Running the first chunk of code again will then produce the formatted Excel file, faster than any other library I know of.
Hope this helps.
Have you tried
write.table(GroupsAlldata, file = 'Groupsalldata.txt', sep = '\t', row.names = FALSE)
to get the data in .txt format?
Then, in Excel, you can use 'Text to Columns' to split your data into a table.
Good luck.

How to extract a database from a text file (word or libreoffice) with styles and content

I searched Stack Overflow and the web for an answer, without success.
Sorry if this has already been answered somewhere.
Global objective
My goal is to create my questionnaires in LibreOffice (I need to print them; this is not for an online survey), and then to use them in an R Shiny app I've created to record the collected answers and export the data.
I want to create the fields in R (questions, answers...) automatically from the styles used in my questionnaires in .odt, .docx or other formats.
The questionnaires need to be well formatted and nice-looking.
Here is the problem:
I have written a questionnaire on a libreoffice .odt file (or if necessary in microsoft word).
I use styles for the different text blocks: one style for the "questions", one for the "answers", one for the parts of the questionnaire, one for the "instructions"...
I want to get a database (in .csv format) with one column for the styles and one column for the text content.
Solutions?
I tried opening the XML files inside the .odt or .docx archives, but converting them to a simpler, readable format seems quite difficult.
Is it possible to export a TOC from LibreOffice or Word to a spreadsheet format?
Can R read such files (.odt, .docx, or .xml)?
Thank you very much for your ideas, and more generally for your feedback on my project.
Sorry for my English.
I would recommend using .Rmd (for rmarkdown) or .Rnw (for knitr) files as the source for your questionnaires, rather than starting with .odt or .docx. You can produce output in various formats, including .docx, .pdf, and .html (only .pdf for .Rnw), to display the questionnaire to the subjects, but you can also develop functions to manage the data, or even interactive displays to collect and record the data.
I'm not familiar with R packages that do all of this for you, but I expect they already exist. Maybe someone else will give an answer with more details.
You might explore using the .fodt format in LibreOffice Writer. That format is an "unzipped" (flat) version of the Writer XML format, so it can be read directly by XML utilities (and probably by R, with appropriate libraries). I note that in another answer you seemed to want to avoid markdown or knitr composition, and .fodt would provide a "text" format completely compatible with LibreOffice as a front end.
(Note the other parts of LibreOffice have "flat" versions, so you could, in theory, process text versions of spreadsheets, graphics, and presentation files in your R utility.)
A few web searches indicate that some relevant libraries and utilities for R exist, which may get you closer to what you need for your project.
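As a concrete starting point, here is a minimal sketch using the xml2 package (my choice; the answers above don't name a specific library). In ODF, paragraphs are `text:p` elements and headings are `text:h`, each carrying its paragraph style in a `text:style-name` attribute. An .odt file is a zip archive whose body lives in content.xml, while a .fodt file is a single XML document. When you apply direct formatting, LibreOffice may store an automatic style such as "P1" that points to your named style through `style:parent-style-name`, so the sketch maps those back. `styles_table` and the file names are hypothetical, and style names containing spaces appear in encoded form (e.g. `_20_` for a space).

```r
library(xml2)

# Namespace URIs defined by the OpenDocument (ODF) specification
odf_ns <- c(
  office = "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
  style  = "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
  text   = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
)

# Return a data frame with one row per paragraph (text:p) or heading (text:h):
# the name of its paragraph style and its text content.
styles_table <- function(doc) {
  # Automatic styles such as "P1" come from direct formatting; map each one
  # to the named style it is based on (style:parent-style-name).
  auto <- xml_find_all(doc, "//office:automatic-styles/style:style", ns = odf_ns)
  auto_map <- setNames(
    xml_attr(auto, "style:parent-style-name", ns = odf_ns),
    xml_attr(auto, "style:name", ns = odf_ns)
  )

  nodes <- xml_find_all(doc, "//office:body//text:p | //office:body//text:h", ns = odf_ns)
  style <- xml_attr(nodes, "text:style-name", ns = odf_ns)
  parent <- unname(auto_map[style])
  style <- ifelse(is.na(parent), style, parent)

  data.frame(style = style, text = xml_text(nodes), stringsAsFactors = FALSE)
}

# Usage sketch (file names are hypothetical):
# An .odt is a zip archive, so extract content.xml first
# xml_file <- unzip("questionnaire.odt", files = "content.xml", exdir = tempdir())
# write.csv(styles_table(read_xml(xml_file)), "questionnaire.csv", row.names = FALSE)
# A flat .fodt file is already a single XML document:
# write.csv(styles_table(read_xml("questionnaire.fodt")), "questionnaire.csv", row.names = FALSE)
```

The resulting two-column CSV (style, text) is the layout the question asks for; a Shiny app could then filter on the style column to find questions and answers.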

Creating an editable slideshow (ideally a powerpoint) from R

As part of a contract, the team I work in has to produce a monthly PowerPoint filled with KPIs and other requested values, which is then passed on to another team who write a commentary on last month's performance. At the moment the values are created (mostly in SAS), exported to an Excel file and then copied and pasted into a PowerPoint. This is an old approach which clearly needs updating.
What I would ideally like to do is automate the presentation using R Markdown and save myself the hassle of copying and pasting values. The issue is that, from what I can see, R Markdown can't produce a .ppt file, or another editable format that the commentary team could add to without having to use R.
From googling around the topic I found packages such as rcom, RDCOMclient, and R2PPT, but they don't appear to have been recently updated or maintained.
TL;DR: I need a way of making a PowerPoint/slideshow in R where the text can be edited afterwards outside of R.
This can be easily done with RStudio and pandoc 2.1:
1) Install Pandoc 2 from pandoc.org (a newer version than the one currently bundled with RStudio).
2) Create your R Markdown file in RStudio with md_document output:
---
title: 'Some title'
author: 'author'
output:
md_document: default
---
3) Knit to .md.
4) Call pandoc to convert the .md file to .pptx:
pandoc <- "C:\\Users\\janvy\\AppData\\Local\\Pandoc\\pandoc"  # path to the Pandoc 2 executable
system2(pandoc, c("-f", "markdown", "-t", "pptx", "-o", "myfile.pptx", "myfile.md"))
I used to work for a company that had all their presentations linked to excel sheets and it worked fine for some broad definition of fine.
If you have to keep PowerPoint as the presentation format, I'd advise against using R to create it. There are some packages that create great connections to Office products, but in my experience they break easily as packages and R versions develop. (One of the ones I played with in the past would recreate ggplot2 plots with actual shapes and lines in PowerPoint, resulting in a huge file.)
With that in mind, I'd advise you to create the results programmatically, dump them into an Excel spreadsheet, and build the presentation linked to that spreadsheet. To keep yourself sane, I'd do one file per month (if that is the KPIs' periodicity). There are many nice packages that create Excel spreadsheets, but I'd stick to CSV files for their simplicity.
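For the one-file-per-month suggestion, a minimal base R sketch might look like this. `write_monthly_kpis` and the `kpis` table are hypothetical, and linking the resulting file into the PowerPoint is still done by hand in Office.

```r
# Hypothetical KPI table; in practice this would come from your own calculations.
kpis <- data.frame(
  kpi   = c("Calls answered", "Average wait (s)"),
  value = c(1520, 34.5)
)

# Write one CSV per reporting month, e.g. "kpis_2024-05.csv", into `dir`,
# and return the path invisibly.
write_monthly_kpis <- function(kpis, month = Sys.Date(), dir = ".") {
  path <- file.path(dir, sprintf("kpis_%s.csv", format(as.Date(month), "%Y-%m")))
  write.csv(kpis, path, row.names = FALSE)
  invisible(path)
}

# Usage: write_monthly_kpis(kpis, month = "2024-05-01")
```

Putting the month in the file name keeps each month's figures separate, so an older presentation stays linked to the numbers it was built from.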
I recommend that you take a look at SlideMight, a utility for merging data with PowerPoint templates; both text and images, in slides and tables. The usage is in principle similar to mail merge, with some more advanced stuff.
Possibly a solution for you would be to have the R program first write your data in a YAML, JSON or XML file; then it invokes the command-line version of SlideMight.
See www.slidemight.com.
Disclaimer: I am the developer and seller of SlideMight.

Displaying png files from R into spotfire

I want to pass data from Spotfire to R and then display the plot constructed by R.
What is the best way to do this?
I’ve figured out the trick of putting images into Spotfire. It’s not hard if you follow these directions, but it’s done in a way very different from how you would guess it’s done in Spotfire, and that’s why it took me a while to figure out.
Here’s an overview of how to do it. You create a DocumentProperty which is a binary object, you write some Spotfire code that gives a value to that Document Property, and you display that binary object using a Spotfire Property Control of the “Label” type.
The confusing parts are that you DON’T use the Spotfire “Insert Image” tool at all, and that you DON’T use the filename generated inside the R code in Spotfire at all. Once you get used to the idea that the two most obvious ways you think you would approach the problem in Spotfire are entirely useless and wrong, you can make some progress.
I’ll leave out the spiderplot specifics because the code’s pretty long.
Here’s what you do.
1) Create a document Property in Spotfire of type “Binary”, e.g., “imageThatGoesBackToSpotfire”
2) You write some R code that generates an image and writes it to a file:
# Get a temporary file name on the local machine. You wouldn't need this if you were only
# going to run the script yourself, but you do if anybody else will run it on their own machine.
tempfilebase <- tempfile()
# Append a file name to tempfilebase.
myFilename <- "someFileName.jpg"
myFullFileName <- paste(tempfilebase, myFilename, sep = "")
# Open a JPEG graphics device that writes to that file.
jpeg(filename = myFullFileName)
# Generate the image however you normally would in R; `input` comes from Spotfire.
plot(input)
# Close the device so the file is actually written to disk.
dev.off()
# Open a connection to that file and read it back as raw bytes.
myConnection <- file(myFullFileName, open = "rb")
imageThatGoesBackToSpotfire <- data.frame(r = readBin(myConnection, what = "raw", n = file.info(myFullFileName)$size))
close(myConnection)
3) Run the R script above. Select some columns as the “input” to the plot, and have the script return its output to the “imageThatGoesBackToSpotfire” document property.
4) Create a text area in Spotfire.
5) Insert a Property Control of type “Label” into the text area. This opens a dialog.
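Step 2 can also be wrapped in a small reusable function. This is a sketch, not the method from the linked tip: `plot_to_raw` is a hypothetical name, it writes PNG by default (the question mentions PNG files), and it uses `on.exit()` and `tryCatch(finally = ...)` so the graphics device is closed and the temporary file removed even if plotting fails. Whether your Spotfire data function expects the raw vector itself or the one-column data frame used in step 2 is something to check against your own setup.

```r
# Render a plot to a temporary image file and return the file's bytes as a raw vector.
# `plot_fun` is a function that draws the plot; `device` is a grDevices function whose
# first argument is the output file name (png, jpeg and pdf all work this way).
plot_to_raw <- function(plot_fun, device = grDevices::png, ext = ".png") {
  path <- tempfile(fileext = ext)
  on.exit(unlink(path), add = TRUE)                     # delete the temp file afterwards
  device(path)                                          # open the device writing to `path`
  tryCatch(plot_fun(), finally = grDevices::dev.off())  # always close the device
  readBin(path, what = "raw", n = file.info(path)$size)
}

# Usage inside the Spotfire data function (`input` comes from Spotfire):
# imageThatGoesBackToSpotfire <- plot_to_raw(function() plot(input))
```

Passing the drawing code as a function lets the helper open and close the device around it, which avoids the easy-to-miss `dev.off` versus `dev.off()` mistake.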
You need to register a data function with inputs and outputs, and the specific PNG data needs to be returned as a binary label.
Some details: http://spotfire.tibco.com/tips/2014/02/25/dynamically-displaying-images-in-a-text-area/

Create and save R's default codebooks as a pdf

If I load data(mtcars) it comes with a very neat codebook that I can call using ?mtcars.
I'm interested in documenting my data in the same way and, furthermore, saving that neat codebook as a PDF.
Is it possible to save the 'content' of ?mtcars and how is it created?
Thanks, Eric
P.S. I did read this thread.
update 2012-05-14 00:39:59 PDT
I am looking for a solution using only R; unfortunately I cannot rely on other software (e.g. TeX).
update 2012-05-14 09:49:05 PDT
Thank you very much everyone for the many answers.
Reading these answers I realized that I should have made my priorities much clearer. Therefore, here is a list of my priorities in regard to this question.
1) R: I am looking for a solution that is based exclusively on R.
2) Reproducibility: the codebook can be part of an automated script.
3) Readability: the text should be easy to read.
4) Searchability: a file that can be opened with any standard software and searched (this is why I thought PDF would be a good solution, but this is overruled by 1 through 3).
I am currently labeling my variables using label() from the Hmisc package and might end up writing a .txt codebook using Label() from the same package.
(I'm not completely sure what you're after, but):
Like other package documentation, the file for mtcars is an .Rd file. You can convert it into formats other than PDF (e.g. plain ASCII text), but the usual way of producing a PDF does use pdflatex.
However, most of the information in such an .Rd file is written more or less by hand (unless you use yet another R package, like roxygen/roxygen2, to help you generate parts of it automatically).
For user data, Noweb files are usually much more convenient.
.Rnw -Sweave-> .tex -pdflatex-> .pdf is certainly the most common route with such files.
However, you can also use Sweave with OpenOffice (if that is installed) or with plain ASCII files instead of TeX.
Have a look at package knitr which may be easier with pure-ASCII files. (I'm not an expert, just switching over from Sweave)
If html is an option, both Sweave and knitr can work with that.
I don't know how to get the pdf of individual data sets but you can build the pdf of the entire datasets package from the LaTeX version using:
path <- find.package('datasets')
system(paste(shQuote(file.path(R.home("bin"), "R")),"CMD",
"Rd2pdf",shQuote(path)))
I'm not sure about this, but it only makes sense that you'd need some sort of LaTeX distribution like MiKTeX. I'm also not sure how this works on other operating systems; mine is Windows, and this works for me.
P.S. This is only a partial answer, since you want to do this for your own data, but if nothing else it may get the ball rolling.
The help page that is displayed when entering ?mtcars is generated from an .Rd file, which is a LaTeX-like file that is used for all of R's help pages. Although .Rd files are LaTeX-like, you don't actually need to know LaTeX to read or write them. The actual mtcars.Rd file is available here: http://commondatastorage.googleapis.com/jthetzel-public/mtcars.Rd , which can be viewed with any text editor.
.Rd files included in the ./man directory of a package are converted to .html files when the package is installed. They are converted by functions in the "tools" package. If you would like functionality like ?mtcars for your datasets, you would need to create a package for them. That might sound complicated if you have never created a package before, but it is easy enough to learn and will make you a better R programmer. There are a number of examples of dataset-only packages on CRAN, for example msProstate: http://cran.r-project.org/web/packages/msProstate/index.html . Consider downloading the package source to see how it is organized.
For more information on creating your own packages, writing .Rd files, and building packages:
http://cran.r-project.org/doc/manuals/R-exts.html, especially "1.1.5 Data in packages".
Edit
And if you want to convert the .Rd file in your package to a .pdf, you can do so when building your package, but you will need a LaTeX compiler. If you are on Windows, see here: http://cran.r-project.org/bin/windows/Rtools/ .
You can't create a PDF of formatted documentation like this with just R; you need other software that creates PDFs.
You could use a combination of utils::promptData, tools::Rd2HTML, and a simple custom function to open the created HTML file in the user's browser.
It would probably be easier to just make a package containing your data sets. Look at the "datasets" package for an example.
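A minimal sketch of that promptData/Rd2HTML combination, using only R's bundled utils and tools packages. `dataset_help_html` is a hypothetical name. promptData() only writes a skeleton .Rd file (placeholder title and description, plus one entry per column in the format section), so in practice you would edit the skeleton once and afterwards rerun only the Rd2HTML() step; calling this function again would regenerate the skeleton over your edits.

```r
# Create an Rd skeleton for a data set, convert it to HTML, and optionally
# open the result in the user's browser. Returns the HTML file path invisibly.
dataset_help_html <- function(data, name, dir = tempdir(), open = interactive()) {
  rd_file   <- file.path(dir, paste0(name, ".Rd"))
  html_file <- file.path(dir, paste0(name, ".html"))
  utils::promptData(data, filename = rd_file, name = name)  # writes the skeleton .Rd
  tools::Rd2HTML(rd_file, out = html_file)                   # converts it to HTML
  if (open) utils::browseURL(html_file)
  invisible(html_file)
}

# Usage: dataset_help_html(mtcars, "mtcars")
```

This stays within R (no LaTeX), and the HTML output is searchable in any browser, which matches the question's priorities better than PDF.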
It looks like if you want to generate a PDF, an external tool like LaTeX is always needed. I would recommend using a simple ASCII text format to generate such a file. In principle the .Rd files are also ASCII text, but I do not find them particularly readable.
Instead, I would recommend using a plain text ASCII format such as Markdown (which is e.g. used on StackOverflow) to write the text file. Such a file is already much more readable than an .Rd formatted file, and as a bonus it can quite easily be processed into a PDF should you choose to do so later on. The knitr package I think is capable of generating PDF files from Markdown sources. In addition, knitr allows you to mix in R code in the Markdown text. This code can be evaluated and the results (even figures) added to the resulting PDF.
In practice you can use sprintf to generate character vectors that you can pipe to a file in order to dynamically generate the markdown text. Just write the template one time, and mark the places for the text you want to add later like this:
base_text = "
First header
============
This document was generated on %s, by %s.
"
text_forfile = sprintf(base_text, some_date, some_name)
Just dump the text in text_forfile to a .md file and you're done; no external tools are needed. See this post on SO for how to dump text to a file.
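Putting the sprintf idea together, here is a minimal sketch that writes a Markdown codebook using only base R. `write_codebook` and `var_labels` are hypothetical names; in practice the labels could come from Hmisc's label(), as mentioned in the question.

```r
# Hypothetical variable descriptions (these could instead come from Hmisc::label())
var_labels <- c(mpg = "Miles/(US) gallon", cyl = "Number of cylinders", hp = "Gross horsepower")

# Write a Markdown codebook: a title, the generation date, and one bullet per variable.
write_codebook <- function(data, labels, file, title) {
  header <- sprintf("%s\n%s\n\nGenerated on %s.\n",
                    title, strrep("=", nchar(title)), format(Sys.Date()))
  vars    <- names(data)
  classes <- vapply(data, function(x) class(x)[1], character(1))
  desc    <- ifelse(vars %in% names(labels), labels[vars], "(no label)")
  rows    <- sprintf("* **%s** (%s): %s", vars, classes, desc)
  writeLines(c(header, rows), file)  # plain text; no external tools needed
  invisible(file)
}

# Usage: write_codebook(mtcars, var_labels, "mtcars_codebook.md", title = "mtcars codebook")
```

The output is readable as plain text, searchable with any editor, and can be regenerated by an automated script whenever the data changes.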
