Is there a standalone library of OpenOffice's formula renderer? I'm looking for something that can take plain text (e.g. E = mc^2) in the same syntax as used by OpenOffice, and convert to png or pdf fragments.
(Note: I don't need the WYSIWYG editor, just the renderer. Basically, I would like to work in OpenOffice to interactively edit my formulas, and then copy the source text for use in other contexts without needing OpenOffice to render them.)
I'm using unoconv to convert OpenOffice/LibreOffice documents to PDF.
However, first I had to create some input document with a formula.
Unfortunately, it is not possible to use just the formula editor to create the ODF file, because the resulting PDF would contain unwanted headers and footers.
Therefore, I created a simple text document in Writer and embedded the formula as a single object (aligned as a character). I saved the ODT file, unzipped it (an ODT file is just a ZIP archive) and edited the content. Then I identified which files could be deleted and formatted the remaining files to get a minimal example.
In my example, the formula itself is located in Formula/content.xml. It should be easy to change just the code within the <annotation>...</annotation> tags in an automated way (see the sketch after the file listings below).
Finally, I zipped the directory and produced a new ODT file.
Then, using unoconv and pdfcrop, I produced a nice formula as PDF.
# this trick prevents zip from creating an additional directory
cd formula.odt.unzipped
zip -r ../formula.odt .
cd ..
unoconv -f pdf formula.odt # ODT to PDF
pdfcrop formula.pdf # keep only the formula
# you can convert the PDF to bitmap as follows
convert -density 300x300 formula-crop.pdf formula.png
Content of the unzipped ODT directory:
Here is the minimal content of the ODT file formula.odt:
formula.odt.unzipped/Formula/content.xml
formula.odt.unzipped/META-INF/manifest.xml
formula.odt.unzipped/content.xml
File formula.odt.unzipped/Formula/content.xml contains:
<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
<semantics>
<annotation encoding="StarMath 5.0">
f ( x ) = sum from { { i = 0 } } to { infinity } { {f^{(i)}(0)} over {i!} x^i}
</annotation>
</semantics>
</math>
File formula.odt.unzipped/content.xml contains:
<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
xmlns:xlink="http://www.w3.org/1999/xlink">
<office:body>
<office:text>
<text:p>
<draw:frame>
<draw:object xlink:href="./Formula"/>
</draw:frame>
</text:p>
</office:text>
</office:body>
</office:document-content>
File formula.odt.unzipped/META-INF/manifest.xml contains:
<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" manifest:version="1.2">
<manifest:file-entry manifest:full-path="/" manifest:version="1.2" manifest:media-type="application/vnd.oasis.opendocument.text"/>
<manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
<manifest:file-entry manifest:full-path="Formula/content.xml" manifest:media-type="text/xml"/>
<manifest:file-entry manifest:full-path="Formula/" manifest:version="1.2" manifest:media-type="application/vnd.oasis.opendocument.formula"/>
</manifest:manifest>
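With the skeleton above in place, swapping in a new formula can be scripted. Here is a minimal R sketch, assuming the formula text sits on its own line between the <annotation> tags (as in the listing above) and that zip, unoconv and pdfcrop are available:
new_formula <- "E = mc^2"
path <- "formula.odt.unzipped/Formula/content.xml"
xml <- readLines(path)
i <- grep("<annotation", xml)  # the formula text is on the next line
xml[i + 1] <- new_formula
writeLines(xml, path)
# rezip from inside the directory so no extra top-level folder is created
old <- setwd("formula.odt.unzipped")
zip("../formula.odt", files = ".")
setwd(old)
system("unoconv -f pdf formula.odt")  # ODT to PDF
system("pdfcrop formula.pdf")         # keep only the formula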
There are several web services that run LaTeX for you and return an image. For instance, http://rogercortesi.com/eqn/
Related
I have created PowerPoint files using the officer package and I would also like to save them as PDF from R (I don't want to manually open each file and save it as PDF). Is this possible?
You can save the PowerPoint object using the code posted here: create pdf in addition to word docx using officer.
You will need to first install pdftools and LibreOffice.
library(pdftools)
office_shot <- function(file, wd = getwd()) {
  # Call LibreOffice in headless mode to convert the file to PDF (macOS path).
  cmd_ <- sprintf(
    "/Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to pdf --outdir %s %s",
    wd, file)
  system(cmd_)
  # Name of the PDF that soffice writes into wd.
  pdf_file <- gsub("\\.(docx|pptx)$", ".pdf", basename(file))
  pdf_file
}
office_shot(file = "your_presentation.pptx")
Note that the author of the officer package is the one who pointed to this answer.
Note that the answer from Corey Pembleton uses the LibreOffice macOS path (which I personally didn't notice at first). The Windows path would be something like "C:/Program Files/LibreOffice/program/soffice.exe".
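If you need the function on several platforms, here is a small hedged sketch for picking the binary (the paths are typical install locations, not guaranteed):
# hypothetical cross-platform lookup of the soffice binary
soffice_path <- switch(Sys.info()[["sysname"]],
  Darwin  = "/Applications/LibreOffice.app/Contents/MacOS/soffice",
  Windows = "C:/Program Files/LibreOffice/program/soffice.exe",
  "soffice")  # on Linux, assume it is on the PATH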
Since Corey's initial answer, an example using docxtractr::convert_to_pdf can now be found here.
The package and function are the ones John M mentioned in a comment on Corey's initial answer.
An easy solution to this question is to use the convert_to_pdf function from the docxtractr package. Note: this solution requires downloading LibreOffice from here. I used the following steps.
First, I set the path to LibreOffice and soffice.exe:
library(docxtractr)
set_libreoffice_path("C:/Program Files/LibreOffice/program/soffice.exe")
Second, I set the path of the PowerPoint document I want to convert to PDF.
pptx_path <- "G:/My Drive/Courses/Aysem/Certifications/September17_Part2.pptx"
Third, convert it using the convert_to_pdf function.
pdf <- convert_to_pdf(pptx_path, pdf_file = tempfile(fileext = ".pdf"))
Be careful here: the converted PDF file is saved in a local temporary folder. Here is mine, to give you an idea; just go and copy it from the temporary folder.
"C:\\Users\\MEHMET~1\\AppData\\Local\\Temp\\RtmpqAaudc\\file3eec51d77d18.pdf"
EDIT: a quick way to control where the converted PDF is saved is to replace the third step with the following line of code. You can set the path where you want to save it, and you don't need to look for the weird local temp folder.
pdf <- convert_to_pdf(pptx_path, pdf_file = sub("[.]pptx", ".pdf", pptx_path))
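Since the question is about converting many files, here is a sketch of a batch loop built on the same function (the folder path is the one from above; the loop itself is my own addition):
library(docxtractr)
set_libreoffice_path("C:/Program Files/LibreOffice/program/soffice.exe")
# convert every .pptx in the folder, writing each PDF next to its source file
pptx_files <- list.files("G:/My Drive/Courses/Aysem/Certifications",
                         pattern = "\\.pptx$", full.names = TRUE)
for (f in pptx_files) {
  convert_to_pdf(f, pdf_file = sub("[.]pptx$", ".pdf", f))
}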
I'm trying to take biological data saved in a .csv format and convert it into a specific xml format set by Darwin Core standards (an extension of Dublin Core). The data are set up in rows of observation records with headers in the first row. I need to repackage the data with Darwin Core standard XML tags using a basic XML tree/schema. The purpose is to standardize the data and make it readily available to load into any kind of database program.
I am a biologist, so I'm fairly new to computer programming. I would like to write something in R or Excel that can do this repackaging step automatically, so I don't have to manually re-enter thousands of records.
I have tried using the developer tools in Excel 365 to save the .csv as an .xml file, but it seems I would have to develop the XML tree or schema in a text editor first, and the XML add-ins I would use are no longer available. I have downloaded the free text editor "Brackets" (build 1.14) to write some simple XML. I also have RStudio 1.1.419 with the XML package installed, to potentially write a script with R 3.4.3. I've read up on all the Darwin Core terms and basic XML syntax and rules, but I don't really know where to start.
This is an example of the data in simple .csv format:
type,institutionCode,collectionCode,catalogNumber,scientificName,individualCount,datasetID
PhysicalObject,ANSP,PH,123,"Cryptantha gypsophila Reveal & C.R. Broome",12,urn:lsid:tim.lsid.tdwg.org:collections:1
PhysicalObject,ANSP,PH,124,"Buxbaumia piperi",2,urn:lsid:tim.lsid.tdwg.org:collections:1
This is what the records should look like as an end product:
[<?xml version="1.0"?>
<dwr:SimpleDarwinRecordSet
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://rs.tdwg.org/dwc/xsd/simpledarwincore/ http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
xmlns:dwr="http://rs.tdwg.org/dwc/xsd/simpledarwincore/">
<dwr:SimpleDarwinRecord>
<dcterms:type>PhysicalObject</dcterms:type>
<dwc:institutionCode>ANSP</dwc:institutionCode>
<dwc:collectionCode>PH</dwc:collectionCode>
<dwc:catalogNumber>123</dwc:catalogNumber>
<dwc:scientificName>Cryptantha gypsophila Reveal & C.R. Broome</dwc:scientificName>
<dwc:individualCount>12</dwc:individualCount>
<dwc:datasetID>urn:lsid:tim.lsid.tdwg.org:collections:1</dwc:datasetID>
</dwr:SimpleDarwinRecord>
<dwr:SimpleDarwinRecord>
<dcterms:type>PhysicalObject</dcterms:type>
<dwc:institutionCode>ANSP</dwc:institutionCode>
<dwc:collectionCode>PH</dwc:collectionCode>
<dwc:catalogNumber>124</dwc:catalogNumber>
<dwc:scientificName>Buxbaumia piperi</dwc:scientificName>
<dwc:individualCount>2</dwc:individualCount>
<dwc:datasetID>urn:lsid:tim.lsid.tdwg.org:collections:1</dwc:datasetID>
</dwr:SimpleDarwinRecord>
</dwr:SimpleDarwinRecordSet>]
This can be done in a number of ways. Here, I go for the stringr solution because it's easy to read what the inputs are.
The code below imports the data, writes the first part of the XML, then writes a SimpleDarwinRecord for each row, and finally the last part of the file. unlink is there to clean up before anything is appended to the file. If indentation matters (apparently it doesn't), you may need to tweak the template a bit.
This could also be done using a Jinja2 template and Python.
library(stringr)
xy <- read.table(text = 'type,institutionCode,collectionCode,catalogNumber,scientificName,individualCount,datasetID
PhysicalObject,ANSP,PH,123,"Cryptantha gypsophila Reveal & C.R. Broome",12,urn:lsid:tim.lsid.tdwg.org:collections:1
PhysicalObject,ANSP,PH,124,"Buxbaumia piperi",2,urn:lsid:tim.lsid.tdwg.org:collections:1', header = TRUE,
sep = ",")
unlink("output.txt")
outfile <- file(description = "output.txt", open = "at")
writeLines('[<?xml version="1.0"?>
<dwr:SimpleDarwinRecordSet
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://rs.tdwg.org/dwc/xsd/simpledarwincore/ http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
xmlns:dwr="http://rs.tdwg.org/dwc/xsd/simpledarwincore/">', con = outfile)
writeLines(str_glue('<dwr:SimpleDarwinRecord>
<dcterms:type>{xy$type}</dcterms:type>
<dwc:institutionCode>{xy$institutionCode}</dwc:institutionCode>
<dwc:collectionCode>{xy$collectionCode}</dwc:collectionCode>
<dwc:catalogNumber>{xy$catalogNumber}</dwc:catalogNumber>
<dwc:scientificName>{xy$scientificName}</dwc:scientificName>
<dwc:individualCount>{xy$individualCount}</dwc:individualCount>
<dwc:datasetID>{xy$datasetID}</dwc:datasetID>
</dwr:SimpleDarwinRecord>'), con = outfile)
writeLines(
'</dwr:SimpleDarwinRecordSet>]',
con = outfile)
close(outfile)
This is the result:
[<?xml version="1.0"?>
<dwr:SimpleDarwinRecordSet
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://rs.tdwg.org/dwc/xsd/simpledarwincore/ http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
xmlns:dwr="http://rs.tdwg.org/dwc/xsd/simpledarwincore/">
<dwr:SimpleDarwinRecord>
<dcterms:type>PhysicalObject</dcterms:type>
<dwc:institutionCode>ANSP</dwc:institutionCode>
<dwc:collectionCode>PH</dwc:collectionCode>
<dwc:catalogNumber>123</dwc:catalogNumber>
<dwc:scientificName>Cryptantha gypsophila Reveal & C.R. Broome</dwc:scientificName>
<dwc:individualCount>12</dwc:individualCount>
<dwc:datasetID>urn:lsid:tim.lsid.tdwg.org:collections:1</dwc:datasetID>
</dwr:SimpleDarwinRecord>
<dwr:SimpleDarwinRecord>
<dcterms:type>PhysicalObject</dcterms:type>
<dwc:institutionCode>ANSP</dwc:institutionCode>
<dwc:collectionCode>PH</dwc:collectionCode>
<dwc:catalogNumber>124</dwc:catalogNumber>
<dwc:scientificName>Buxbaumia piperi</dwc:scientificName>
<dwc:individualCount>2</dwc:individualCount>
<dwc:datasetID>urn:lsid:tim.lsid.tdwg.org:collections:1</dwc:datasetID>
</dwr:SimpleDarwinRecord>
</dwr:SimpleDarwinRecordSet>]
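As an aside, since the question mentions having the XML package installed: building nodes with it instead of pasting strings gets escaping of characters such as & for free. A sketch under that assumption, reusing the xy data frame from above (the xsi:schemaLocation attribute is omitted for brevity):
library(XML)
# build the record set as real XML nodes rather than strings
root <- newXMLNode("dwr:SimpleDarwinRecordSet",
  namespaceDefinitions = c(
    dwr     = "http://rs.tdwg.org/dwc/xsd/simpledarwincore/",
    dwc     = "http://rs.tdwg.org/dwc/terms/",
    dcterms = "http://purl.org/dc/terms/"))
for (i in seq_len(nrow(xy))) {
  rec <- newXMLNode("dwr:SimpleDarwinRecord", parent = root)
  newXMLNode("dcterms:type", as.character(xy$type[i]), parent = rec)
  # every remaining column is a dwc term named after its header
  for (field in setdiff(names(xy), "type")) {
    newXMLNode(paste0("dwc:", field), as.character(xy[[field]][i]), parent = rec)
  }
}
saveXML(root, file = "output.xml")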
I am currently reading ISL (An Introduction to Statistical Learning), which is a book about machine learning in R.
I really like how the book is laid out, specifically how the authors reference inline code or libraries, for example library(MASS).
Does anyone know if the same effect can be achieved using R Markdown, i.e. making the MASS keyword above brown when I reference it in a paper? I would also like to color-code columns of data frames when I talk about them in the R Markdown document. Knitting to an HTML document gives pretty good formatting, but knitting to MS Word seems to just change the font type.
Thanks
I've come up with a solution that I think might address your issue. Essentially, because inline source code gets the same style label as code chunks, any change you make to SourceCode will be applied to both chunks, which I don't think is what you want. Instead, there needs to be a way to target just the inline code, which doesn't seem to be possible from within rmarkdown. Instead, what I've opted to do is take the .docx file that is produced, convert it to a .zip file, and then modify the .xml file inside that has all the data. It applies a new style to the inline source code text, which can then be modified in your MS Word template. Here is the code:
format_inline_code = function(fpath) {
if (!tools::file_ext(fpath) == "docx") stop("File must be a .docx file...")
cur_dir = getwd()
.dir = dirname(fpath)
setwd(.dir)
out = gsub("docx$", "zip", fpath)
# Convert to zip file
file.rename(fpath, out)
# Extract files
unzip(out, exdir=".")
# Read in document.xml
xml = readr::read_lines("word/document.xml")
# Replace styling
# VerbatimChar didn't appear to be the style that was applied in Word, nor
# was it present to be styled. VerbatimStringTok was, though.
xml = sapply(xml, function(line) gsub("VerbatimChar", "VerbatimStringTok", line))
# Save document.xml
readr::write_lines(xml, "word/document.xml")
# Zip files
.files = c("_rels", "docProps", "word", "[Content_Types].xml")
zip(zipfile=out, files=.files)
# Convert to docx
file.rename(out, fpath)
# Remove the folders extracted from zip
sapply(.files, unlink, recursive=TRUE)
setwd(cur_dir)
}
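Usage is then a single post-processing call on the knitted document (the file name here is hypothetical):
# post-process the knitted Word document in place
format_inline_code("my_report.docx")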
The style that you'll want to modify in your MS Word template is VerbatimStringTok. Hope that helps!
I have a mixed filetype collection of MS Word documents. Some files are *.doc and some are *.docx. I'm learning to use tm and I've (more or less*) successfully created a corpus composed of the *.doc files using this:
ex_eng <- Corpus(DirSource('~/R/expertise/corpus/english'),
readerControl=list(reader=readDOC,
language='en_CA',
load=TRUE));
This command does not handle *.docx files. I assume that I need a different reader. From this article, I understand that I could write my own (given a good understanding of the .docx format, which I do not currently have).
The readDOC reader uses antiword to parse *.doc files. Is there a similar application that will parse *.docx files?
Or better still, is there already a standard way of creating a corpus of *.docx files using tm?
* more or less, because although the files go in and are readable, I get this warning for every document: In readLines(y, encoding = x$Encoding) : incomplete final line found on 'path/to/a/file.doc'
.docx files are zipped XML files. If you execute this:
> uzfil <- unzip(file.choose())
And then pick a .docx file in your directory, you get:
> str(uzfil)
chr [1:13] "./[Content_Types].xml" "./_rels/.rels" "./word/_rels/document.xml.rels" ...
> uzfil
[1] "./[Content_Types].xml" "./_rels/.rels" "./word/_rels/document.xml.rels"
[4] "./word/document.xml" "./word/theme/theme1.xml" "./docProps/thumbnail.jpeg"
[7] "./word/settings.xml" "./word/webSettings.xml" "./word/styles.xml"
[10] "./docProps/core.xml" "./word/numbering.xml" "./word/fontTable.xml"
[13] "./docProps/app.xml"
This will also silently unpack all of those files to your working directory. The "./word/document.xml" file has the words you are looking for, so you can probably read them with one of the XML tools in the XML package. I'm guessing you would do something along the lines of:
library(XML)
xtext <- xmlTreeParse(uzfil[4], useInternalNodes = TRUE)
Actually you will probably need to save this to a temp-directory and add that path to the file name, "./word/document.xml".
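To get from the parsed tree to plain text, something along these lines should work (the w namespace URI is the standard WordprocessingML one; collapsing the runs into one string is my own guess at what you want):
library(XML)
# parse the extracted document.xml and pull out the text runs (<w:t> elements)
xtext <- xmlTreeParse("./word/document.xml", useInternalNodes = TRUE)
words <- xpathSApply(xtext, "//w:t", xmlValue,
  namespaces = c(w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"))
doc_text <- paste(words, collapse = " ")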
You may want to use the further steps provided by #GaborGrothendieck in this answer: How to extract xml data from a CrossRef using R?
I ended up using docx2txt to convert the .docx files to text. Then I created a corpus from them like this:
ex_eng <- Corpus(DirSource('~/R/expertise/corpus/english'),
readerControl=list(reader=readPlain,
language='en_CA',
load=TRUE));
I figure I could probably hack the readDOC reader so that it would use docx2txt or antiword as needed, but this works.
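For completeness, the conversion step can also be driven from R; a sketch assuming docx2txt is installed and writes file.txt next to file.docx (its default behaviour, as far as I know):
# convert each .docx in the corpus directory to plain text with docx2txt
docx_files <- list.files("~/R/expertise/corpus/english",
                         pattern = "\\.docx$", full.names = TRUE)
for (f in docx_files) {
  system(paste("docx2txt", shQuote(path.expand(f))))
}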
I am using QGIS software. I would like to show the value of each raster cell as a label.
My idea (I don't know of any plugin or built-in QGIS functionality that makes this easier) is to export the raster to a coordinates-value format using gdal2xyz.py and then save it as a vector layer (GML or shapefile). For this second task, I tried to use gdal_polygonize.py:
gdal_polygonize.py rainfXYZ.txt rainf.shp
Creating output rainf.shp of format GML.
0...10...20...30...40...50...60...70...80...90...100 - done.
Unfortunately, I am unable to load the created file (even if I change the extension to .gml).
The ogr2ogr tool doesn't even recognize this format.
Yes, sorry, I forgot to add that information.
In general, after preparing the CSV file (using gdal2xyz.py with the -csv option), I need to add one line at the beginning of it:
"Longitude,Latitude,Value" (without the quotes)
Then I need to create a VRT file which contains:
<OGRVRTDataSource>
    <OGRVRTLayer name="Shapefile_name">
        <SrcDataSource>Shapefile_name.csv</SrcDataSource>
        <GeometryType>wkbPoint</GeometryType>
        <GeometryField encoding="PointFromColumns" x="Longitude" y="Latitude"/>
    </OGRVRTLayer>
</OGRVRTDataSource>
Then run the command "ogr2ogr -select Value Shapefile_name.shp Shapefile_name.vrt". I got the file evap_OBC.shp and two other associated files.
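The whole pipeline can also be scripted from R; here is a sketch using the placeholder names above (the input raster name is hypothetical, and the GDAL utilities must be on the PATH):
# raster to CSV of cell coordinates and values
system("gdal2xyz.py -csv input_raster.tif Shapefile_name.csv")
# prepend the header line the VRT expects
csv_lines <- readLines("Shapefile_name.csv")
writeLines(c("Longitude,Latitude,Value", csv_lines), "Shapefile_name.csv")
# write the VRT wrapper shown above
writeLines(c(
  '<OGRVRTDataSource>',
  '    <OGRVRTLayer name="Shapefile_name">',
  '        <SrcDataSource>Shapefile_name.csv</SrcDataSource>',
  '        <GeometryType>wkbPoint</GeometryType>',
  '        <GeometryField encoding="PointFromColumns" x="Longitude" y="Latitude"/>',
  '    </OGRVRTLayer>',
  '</OGRVRTDataSource>'), "Shapefile_name.vrt")
# virtual layer to real point shapefile
system("ogr2ogr -select Value Shapefile_name.shp Shapefile_name.vrt")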
For the sake of archive completeness, this question has also been asked on the GDAL mailing list in the thread "save raster as point-vector file". It seems Chaitanya provided a solution for it.