I found a page (http://wiki.alfresco.com/wiki/Content_Transformations) that says I need to create a file named my-transformers-context.xml and put my configuration there to convert RTF to PDF...
It says that some transformations are already configured, but this one (RTF to PDF) and some others (DOC to PDF) are not.
However, I couldn't find how to write this XML with the right configuration to convert an RTF file into a PDF...
Has anyone already done something like this? Or does anyone know a link that explains how to configure this XML file?
PROBLEM SOLVED!!!!
I don't know if there is a way to mark the problem as solved... But here is the solution...
I saw what Gagravarr said and started looking into the OpenOffice configuration in Alfresco...
There is a file named:
alfresco-global.properties
and there are two properties in it named:
ooo.exe
and
ooo.enabled
the first one must point to the path of soffice.exe
and the second one must be set to true...
ooo.enabled = true
That solves a lot of problems when converting one kind of file to another... like RTF to PDF...
Out of the box, Alfresco should be able to transform an RTF file to a PDF using OpenOffice (direct or JodConverter, depending on whether you're on Community or Enterprise).
Assuming you're on a new enough Alfresco, this webscript will tell you what transformations are available from and to RTF:
http://localhost:8080/alfresco/service/mimetypes?mimetype=application/rtf#application/rtf
If that doesn't show you RTF -> PDF, then you need to look at your OpenOffice configuration/setup.
Related
I'm struggling to find a way to convert an EPUB file into a TXT file using the ebook-convert CLI. I don't want to convert the whole book, only one particular page.
I'm reading the official documentation, but I can't see any option that lets you pick one page from an EPUB file and generate a TXT file from it.
If you could shed some light on this, I would appreciate it.
I have an EDI file in MSCONS format. I am trying to parse the file in R and save it as a CSV file. However, I have not found a good explanation of how to proceed. Has anyone out there worked with this sort of file?
Example:
UNA:+.? '
UNB+UNOC:3+7080005046091:14:TIMER+102953452626:82:TIMER+140312:2152+XGATE019452198++++1'
UNH+1+MSCONS:D:96A:ZZ:E2NO6A'BGM+7+1488136+9+NA'
DTM+137:201403121751:203'DTM+163:201403030000:203'
DTM+164:201403092400:203'DTM+ZZZ:1:805'
NAD+FR+7080005046053::9+++++++NO'
NAD+DO+953452626:NO3:82+++++++NO'UNS+D'
NAD+XX'LOC+90+707057500071137750::9'
RFF+MG:97645'RFF+LI:22446237_17506927'
LIN+1++1491:::SM'MEA+AAZ++KWH'QTY+136:1'
DTM+324:201403030000201403030100:Z13'QTY+136:1'
DTM+324:201403030100201403030200:Z13'QTY+136:2'
DTM+324:201403030200201403030300:Z13'QTY+136:1'
DTM+324:201403030300201403030400:Z13'QTY+136:1'
DTM+324:201403030400201403030500:Z13'QTY+136:2'
DTM+324:201403030500201403030600:Z13'QTY+136:1'
DTM+324:201403030600201403030700:Z13'QTY+136:1'
DTM+324:201403092300201403092400:Z13'CNT+1:167181'
UNT+6832+1'UNZ+1+XGATE019452198'
Download this application to start: EDI Notepad
Open your EDIFACT file in this tool. It will give you context: what each segment and element is, and how the qualifiers and envelopes in the document are used. You should also find the source of the document and get an implementation guide, which will explain their specific usage.
Once you apply context and understand what the elements are, parsing becomes easy. You can write your own parser, use an open source product like BOTS (mentioned in the comments above), or purchase commercial translation software (hundreds are available).
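To illustrate the "write your own parser" option, here is a minimal sketch in Python (not R, and not a full EDIFACT implementation). It reads the separators from the UNA header if one is present, splits the message into segments, elements and components while honouring the release character, and then pairs each QTY with the DTM+324 period that follows it, which is how the sample in the question reads. The function names (split_escaped, parse_edifact, readings, to_csv) and the CSV column names are made up for this example, and the pairing rule is an assumption that should be checked against the implementation guide.

```python
import csv
import io


def split_escaped(text, sep, release="?"):
    """Split text on sep, skipping separators escaped by the release character.

    Escape pairs are kept as-is so that later, finer splits still see them.
    """
    parts, buf, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == release and i + 1 < len(text):
            buf.append(text[i:i + 2])
            i += 2
        elif ch == sep:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(ch)
            i += 1
    parts.append("".join(buf))
    return parts


def unescape(text, release="?"):
    """Drop release characters, keeping the character each one protects."""
    out, i = [], 0
    while i < len(text):
        if text[i] == release and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_edifact(raw):
    """Return a list of segments; each segment is a list of elements,
    and each element is a list of component strings."""
    comp, elem, rel, term = ":", "+", "?", "'"  # EDIFACT defaults
    if raw.startswith("UNA"):
        # UNA carries six service characters right after the tag.
        comp, elem, _decimal, rel, _reserved, term = raw[3:9]
        raw = raw[9:]
    segments = []
    for seg in split_escaped(raw, term, rel):
        seg = seg.strip()  # segments may be separated by line breaks
        if not seg:
            continue
        segments.append([
            [unescape(c, rel) for c in split_escaped(e, comp, rel)]
            for e in split_escaped(seg, elem, rel)
        ])
    return segments


def readings(segments):
    """Pair each QTY value with the DTM+324 period that follows it (assumed layout)."""
    rows, pending = [], None
    for seg in segments:
        tag = seg[0][0]
        if tag == "QTY":
            pending = seg[1][1]
        elif tag == "DTM" and seg[1][0] == "324" and pending is not None:
            period = seg[1][1]  # CCYYMMDDHHMM start followed by CCYYMMDDHHMM end
            rows.append((period[:12], period[12:], pending))
            pending = None
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["start", "end", "quantity"])
    writer.writerows(rows)
    return out.getvalue()


# A short excerpt of the message from the question.
sample = (
    "UNA:+.? '"
    "LIN+1++1491:::SM'MEA+AAZ++KWH'QTY+136:1'"
    "DTM+324:201403030000201403030100:Z13'QTY+136:1'"
    "DTM+324:201403030100201403030200:Z13'QTY+136:2'"
    "DTM+324:201403030200201403030300:Z13'"
)

print(to_csv(readings(parse_edifact(sample))))
```

The same three steps (split on the segment terminator, then on the element separator, then on the component separator) carry over to R using strsplit, as long as the release character is handled before splitting.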
The elements within the MSCONS file are well documented. See here: http://www.edi-energy.de - the latest description (in German) is available here: http://www.edi-energy.de/files2/MSCONS_2_2b_Fehlerkorrektur_2014_02_27.pdf
Is it possible to convert a PostScript (PS) file into a Word (DOC) file using ASP.NET? If yes, how can it be done in C# code?
I don't know of any tool which will convert PostScript to Word. Not only that, but you certainly can't reliably do anything except render the whole thing to an image and insert that as a graphic.
Up to a point you can extract text. What is it you actually want to do?
Can you use PurePDF to view files, or is the API only for writing them?
Based on the PurePDF Project Page, reading and extracting information from PDFs is supported:
read existing pdf documents (extract strings, streams, images and all the informations from them). See HelloWorldReader.as for an example
However, if you're looking to view / rasterize a PDF, that's a much more complicated task and doesn't look like it's supported as part of PurePDF.
I suggest converting the PDF into a swf file. There are a number of projects out there (including free / open source) that convert pages into SWF files, including being able to still extract the text. :D
It looks like you can either navigate to the URL of the PDF (maybe in an HTML component?), or, for a richer solution, use the open source FlexPaper: http://flexpaper.devaldi.com/
Is there a pre-existing library to extract plain text from Open XML file formats (e.g. docx, pptx, and xlsx)?
I require this to populate a lucene.net index.
I've found this example which extracts text from docx and it seems to work okay. But before building my own solution based on this I was wondering if there's something already available for the other file formats?
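For context on what a docx extractor has to do, here is a minimal sketch of the idea. It is written in Python rather than .NET, purely as an illustration, and it is not the linked example. A .docx file is a ZIP archive whose main body lives in word/document.xml, with text held in w:t elements inside w:p paragraphs. The function name docx_text is made up for this example, and the namespace below assumes the common Transitional flavour of the format. Headers, footers, footnotes and nested text boxes are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace (Transitional flavour; an assumption of this sketch).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source):
    """Return the body text of a .docx, one line per paragraph.

    source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is spread over one or more w:t runs.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

pptx and xlsx are ZIP archives too, but the text sits in different parts (slide XML files and a shared strings table respectively), so each format needs its own walk over the archive.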
Before spending cash, it may be worth looking at the IFilter interface - these were/are designed to do exactly what you want.
http://msdn.microsoft.com/en-us/library/ms691105
http://www.codeproject.com/KB/cs/IFilter.aspx
(There are some more links at the bottom of the CodeProject article.)
Microsoft provides IFilters for Office file types.
http://www.microsoft.com/downloads/details.aspx?familyid=60c92a37-719c-4077-b5c6-cac34f4227cc&displaylang=en
I know that we use this technology to index PDFs with Lucene, but I did not write the actual code, so I'm afraid I cannot be of much more use.
If your Google-fu is strong I am sure you can dig up more examples of using IFilters to do exactly what you want.
Take a look at aspose.com; they have a good library that handles both ppt and pptx.
You can try Toxy, an open source text/data extraction framework for .NET. For now, it supports xls, xlsx, doc, docx. It will support pptx in version 1.5 very soon.
For details, you can check here