OpenOffice command line PDF creation - build-process

I have some documentation written in OpenOffice, and I would like to include some of it as PDF files in the final build deliveries. I would like to do this with the automated build script.
Is there a way to create a PDF file from OpenOffice with a command line command?

As of September 2012, LibreOffice can convert a document to PDF from the command line:
lowriter --headless --convert-to pdf yourfile.odt
It also has bulk conversion support:
lowriter --headless --convert-to pdf yourfiles*.odt
will convert all the files that match the pattern to the corresponding PDF file.
There must be no LibreOffice windows open when you run this command.
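For an automated build, the bulk conversion above could be wrapped in a small script along these lines. This is only a sketch: the LO_BIN variable, the convert_docs function, and the directory names are made up for illustration. The --outdir option tells LibreOffice where to put the generated PDFs.

```shell
#!/bin/sh
# Sketch of a build step that converts every .odt file in a directory to PDF.
# LO_BIN, convert_docs and the directory names are assumptions, not fixed names.
LO_BIN="${LO_BIN:-lowriter}"

convert_docs() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for doc in "$src_dir"/*.odt; do
        # When nothing matches, the pattern stays literal; skip it.
        [ -e "$doc" ] || continue
        "$LO_BIN" --headless --convert-to pdf --outdir "$out_dir" "$doc" || return 1
    done
}

# Example call from a build script (hypothetical paths):
# convert_docs docs build/pdf
```

Calling the binary once per file keeps a failing document easy to identify in the build log, at the cost of starting LibreOffice several times.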

There is a great tool called "unoconv"; it was in my Ubuntu repository. It converts ODF documents (.odt, .ods, ...) to PDF, and I think to other formats too.
I could also convert PowerPoint files to PDF with it.

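Art of Solving also has a very good API (JODConverter) to perform the conversion in Java. It is a little slow, but it is simple enough. This is how I use it:
import java.io.File;
import com.artofsolving.jodconverter.DocumentConverter;
import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;
File inputFile = new File("C:\\oreyes\\hola.doc");
File outputFile = new File("C:\\oreyes\\hola.pdf");
// Connects to an OpenOffice instance that is already listening on port 8100.
OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
connection.connect(); // let a failed connection surface instead of swallowing it
try {
    DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
    converter.convert(inputFile, outputFile);
} finally {
    connection.disconnect();
}
You can package that into a jar and run it from the command line.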

Though this question is a little old, here is something for the purpose of documenting some common pitfalls with the LibreOffice solution:
If lowriter does not work for you because it ignores the command-line parameters and brings up the GUI, just try calling the libreoffice or loffice binaries:
loffice --headless --convert-to pdf yourfile.odt
If you get this message
Error: Please reverify input parameters...
try running it as root (e.g. via sudo). This helped me on Ubuntu 12.04 LTS with LibreOffice 3 installed, and it may also be the reason this conversion does not run on a web server without proper configuration (see "Libreoffice --headless refuses to convert unless root, won't work from PHP script").
Also make sure that you do not have any other instances of LibreOffice running, or it will just fail silently and do no conversion at all.
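Because the conversion can fail silently, a build script may want to check that the PDF really exists afterwards. A minimal sketch, assuming the output file gets the input's base name with a .pdf extension; LO_BIN and convert_checked are hypothetical names.

```shell
#!/bin/sh
# Sketch: run the conversion, then fail loudly if no PDF was produced.
# LO_BIN and convert_checked are illustrative names, not part of LibreOffice.
LO_BIN="${LO_BIN:-loffice}"

convert_checked() {
    doc=$1
    out_dir=$2
    name=$(basename "$doc")
    pdf="$out_dir/${name%.*}.pdf"
    # Remove a stale PDF so an old file cannot hide a failed conversion.
    rm -f "$pdf"
    "$LO_BIN" --headless --convert-to pdf --outdir "$out_dir" "$doc"
    if [ ! -s "$pdf" ]; then
        echo "conversion produced no PDF for $doc" >&2
        return 1
    fi
}

# Example (hypothetical paths):
# convert_checked docs/manual.odt build/pdf || exit 1
```

Testing for the output file rather than the exit status covers the case where the command returns without doing any work.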

There is anytopdf. Haven't tried it myself.
Quoting...
anytopdf is a perl script that converts OpenOffice.org, Microsoft Office (Word DOC, Excel XLS), RTF, HTML, and other openoffice.org readable file formats to the PDF format. It will automatically install the supporting 'AnyToPDF' OpenOffice.org Basic macro library in the current user's OpenOffice.org configuration if it's not already present.
Dedicated to peace, love, understanding and respect for all beings.

Related

How can I preview .Rd documentation files in R?

I would like to be able to open up an .Rd documentation file and preview it in R.
For example, I can create a data documentation file using promptData:
df <- data.frame(var1=1:5,var2=6:10)
promptData(df,filename = "df_doc.Rd")
which will produce a documentation file "df_doc.Rd" in the working directory.
In order to preview this file, I can open it up in the RStudio editor and then hit "Preview", which will open up df_doc properly formatted in the Help window. However, I'd like to be able to do that with code rather than having to open up the file and hit the Preview button in the RStudio GUI. Something like a preview("df_doc.Rd") function.
I'm aware that there are ways to 'install' the documentation files so R knows where to find them. But I'm writing some code that will generate these files automatically and preview them (hopefully without having to load in the dev tools that install the documentation files), so I'm specifically hoping to be able to preview these directly from file. Is that possible?
Man, the documentation for this one was pretty well hidden! To be fair, "Rd" isn't exactly Googleable, nor is documentation about documentation. But I managed to scrounge it up.
What I've been looking for is the
previewRd('df_doc.Rd')
command in the rstudioapi library. Unfortunately, this only works in RStudio, so if I want it to be generally usable I'll need to write HTML directly instead of Rd and open that in a browser.
According to 'Writing R Extensions', run:
R CMD Rdconv -t html filename.Rd > filename.html
on the command line. See also:
R CMD Rd2pdf --help
In R: system("R CMD Rdconv -t html filename.Rd > filename.html && chromium-browser filename.html")
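The same conversion can be sketched as a small shell helper that derives the HTML file name from the .Rd file name. R_BIN and rd_to_html are made-up names for illustration; -o is Rdconv's output option.

```shell
#!/bin/sh
# Sketch: convert an .Rd file to HTML next to it and print the HTML path.
# R_BIN and rd_to_html are hypothetical names.
R_BIN="${R_BIN:-R}"

rd_to_html() {
    rd=$1
    # df_doc.Rd -> df_doc.html
    html=${rd%.Rd}.html
    "$R_BIN" CMD Rdconv -t html -o "$html" "$rd" && echo "$html"
}

# Example:
# rd_to_html df_doc.Rd
```

The printed path can then be handed to whatever browser is available on the machine.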

cmd from R with wkhtmltopdf

I am trying to use wkhtmltopdf to turn website content into a PDF and then read it into R. As a first test (just for fun), I used the command prompt to save the front page of Yahoo Finance as a PDF. I created "TemporaryFolder" on my C: drive and ran this in cmd:
C:\Program Files\wkhtmltopdf\bin>wkhtmltopdf https://finance.yahoo.com/ "C:/TemporaryFolder/myhtml.pdf"
And it saves the Yahoo Finance website as a PDF. Now I want to do the same thing from an R script. I know there is the system() function, but I have very little experience with it (and with cmd, to be honest).
So now I am trying to run this command from RStudio, so that I can later write an R script that downloads a website and converts it to PDF.
URL="https://finance.yahoo.com/"
wkhtmltopdf_dir="C:/Program Files/wkhtmltopdf/bin"
save_as="C:/TemporaryFolder/myhtml.pdf"
x=paste0(wkhtmltopdf_dir,">","wkhtmltopdf"," ",URL," ",'\"',save_as,'\"')
system(x)
But it does nothing. I also tried shell(x), but I got "permission denied".
Could someone explain how system() works and what should be added here?
BTW: can I harm my computer by using system(), for example by writing some "bad" command? This question might sound silly, but I am really new to this.
What you're trying to paste as a command ("C:/Program Files/wkhtmltopdf/bin>wkhtmltopdf https://finance.yahoo.com/ \"C:/TemporaryFolder/myhtml.pdf\"") doesn't quite work. The first part ("C:/Program Files/wkhtmltopdf/bin>") is actually the prompt you see in the command prompt window. It's not part of the command; it shows the directory in which you are running that command.
If you replace wkhtmltopdf with C:/Program Files/wkhtmltopdf/bin/wkhtmltopdf.exe, it should work just fine:
URL="https://finance.yahoo.com/"
wkhtmltopdf_exe="C:/Program Files/wkhtmltopdf/bin/wkhtmltopdf.exe"
save_as="C:/TemporaryFolder/myhtml.pdf"
x=paste0(wkhtmltopdf_exe," ",URL," ",'\"',save_as,'\"')
system(x)
To answer your second question, a call to system() executes the command with your user's permissions (on Windows, shell() is the variant that passes it through cmd.exe). So basically anything you could mess up from the command prompt, you can mess up through system().
I figured out what was wrong. As I posted in a comment, using shell(x) instead of system(x) returned "'C:/Program' is not recognized as an internal or external command, operable program or batch file." So I reinstalled wkhtmltopdf to a folder whose name contains no spaces. wkhtmltopdf_exe is now:
wkhtmltopdf_exe="C:/Programs/wkhtmltopdf/bin/wkhtmltopdf.exe"
The rest of the code is the same. A follow-up here would be nice: is there any way to work around spaces in folder names, or should I always avoid spaces? Putting the wkhtmltopdf path into quotation marks didn't help.
Thanks to user JAD for fixing my first code.
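The 'C:/Program' error is what a command line looks like once it has been split at the space. The effect can be reproduced in a POSIX shell with a short sketch; cmd.exe has its own quoting rules, so this only illustrates the principle, and count_args is a made-up helper. On the R side, shQuote() exists to quote a string for the shell, and system2() takes the program and its arguments separately.

```shell
#!/bin/sh
# Sketch: how a path containing a space is split when it is not quoted.
# count_args is a hypothetical helper that prints how many arguments it got.
count_args() { echo "$#"; }

exe="C:/Program Files/wkhtmltopdf/bin/wkhtmltopdf.exe"

# Unquoted: the shell splits the value at the space into two words.
count_args $exe      # prints 2

# Quoted: the whole path is passed as a single argument.
count_args "$exe"    # prints 1
```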

PDF to text in R in Mac

I have downloaded PDFtoText on my Mac and wrote the following code to convert PDF files to text:
pdf_to_load =("~/my_directory/my.pdf")
system(paste('pdftotext', pdf_to_load))
The code runs without errors, but I cannot see my.txt in the source directory, nor has it been saved anywhere else in my folders. Where did I go wrong?
One of my mentors was able to run the same code on his computer, and he could see the converted .txt file.
Kindly guide.
You get a wrong result if the default PDF extraction engine is not found on your computer, see ?tm::readPDF. Those engines are not part of R or of the tm package, and it depends on your computer whether the necessary programs are already installed.
The easiest solution is to install the programs pdftotext and pdfinfo (you'll need both), which are available as precompiled binaries (both are part of the Xpdf and Poppler tool sets).
Once these programs are correctly installed, you should be able to extract the text of the PDF file without a system call, by using the readPDF() function of the tm package:
library(tm)
my_pdf_txt <- readPDF(control=list(text="-layout"))(elem=list(uri="~/my_directory/my.pdf"), language="en")
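If a direct call to pdftotext is preferred, naming the output file explicitly removes any doubt about where the text ends up; without a second argument, pdftotext writes a .txt file with the same base name as the PDF. A minimal sketch, where PDFTOTEXT and pdf_to_text are hypothetical names:

```shell
#!/bin/sh
# Sketch: convert a PDF to text and print where the text file was written.
# PDFTOTEXT and pdf_to_text are illustrative names only.
PDFTOTEXT="${PDFTOTEXT:-pdftotext}"

pdf_to_text() {
    pdf=$1
    # Default output: same path as the PDF, with a .txt extension.
    txt=${2:-${pdf%.pdf}.txt}
    "$PDFTOTEXT" -layout "$pdf" "$txt" && echo "$txt"
}

# Example (use $HOME here, since a quoted "~" is not expanded by the shell):
# pdf_to_text "$HOME/my_directory/my.pdf"
```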

Executing a batch file in an R script

I would like to execute a batch file from an R script. The file is in a directory like \\network\path\to\batch\file.bat.
I know I can use the system command in R to run DOS commands but I can't simply use system("start file.bat"). So how would I best use R script to execute this batch file?
Try shell.exec("\\\\network\\path\\file.bat")
The shell.exec command uses the Windows-associated application to open the file. Note the doubled backslashes: each literal backslash has to be escaped inside an R string.
Pro tip: write.csv(file='tmp.csv',tmpdat);shell.exec('tmp.csv') is useful (assuming you've associated CSV files with your preferred application for viewing CSV files) for quickly checking output.
Try:
shell('\\\\network\\path\\to\\batch\\file.bat')
I ran into this problem when using RSelenium on Windows as well; running a batch file made sure that all chromedriver processes were closed. I was ending up with a ton of these processes after a lengthy scraping session.
My solution was to execute the batch file from within the R script every so often by using:
shell.exec(file.path(getwd(), "kill_chromedriver.bat"))
This is what I did on Windows:
cmd='\\prj\\whatkit\\test.bat'
system(cmd, intern = TRUE, show.output.on.console = TRUE)

R in Ubuntu Server 14.04 - Writing file to a samba sharing in which a directory is protected by user/pass

I was wondering whether it is possible to write to a protected Samba share, into an internal directory that requires a different user/password. I have been searching for an R package with these capabilities, but I haven't found one yet. Right now I'm using the write() function. The documentation for this function says:
Arguments:
x : The data to be written out, usually an atomic vector.
file : A connection, or a character string naming the file to write to. If "", print to the standard output connection. If it is "|cmd", the output is piped to the command given by ‘cmd’.
I don't understand whether passing "|cmd" as the "file" argument launches a terminal on the Ubuntu server or does something else, and I can't make it work.
Hope you have a nice day!
I found it can be done via a shell script in Ubuntu. The command is write(somefile, file = "|sh shellWriteSamba.sh")
Edit: The complete solution is in R - write() a file to a SAMBA share
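The contents of shellWriteSamba.sh are not shown in this thread. With file = "|cmd", R pipes the data to the command's standard input, so the script only has to copy stdin to its destination. A minimal sketch, assuming the share is already mounted at a local path; write_from_stdin and the path in the comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for shellWriteSamba.sh: copy standard input to a file.
# write_from_stdin and the mount path below are assumptions for illustration.
write_from_stdin() {
    target=$1
    # cat with no file arguments copies standard input to standard output.
    cat > "$target"
}

# For a share mounted at /mnt/share (assumed path), the script could end with:
# write_from_stdin /mnt/share/output.txt
```

How the share is mounted and authenticated is a separate step that this sketch does not cover.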
