Interpreting EDI MSCONS files - R

I have an EDI file in MSCONS format. I am trying to parse the file in R and save it as a CSV file. However, I have not found a good explanation of how to proceed. Has anyone out there worked with this sort of file?
Example:
UNA:+.? '
UNB+UNOC:3+7080005046091:14:TIMER+102953452626:82:TIMER+140312:2152+XGATE019452198++++1'
UNH+1+MSCONS:D:96A:ZZ:E2NO6A'BGM+7+1488136+9+NA'
DTM+137:201403121751:203'DTM+163:201403030000:203'
DTM+164:201403092400:203'DTM+ZZZ:1:805'
NAD+FR+7080005046053::9+++++++NO'
NAD+DO+953452626:NO3:82+++++++NO'UNS+D'
NAD+XX'LOC+90+707057500071137750::9'
RFF+MG:97645'RFF+LI:22446237_17506927'
LIN+1++1491:::SM'MEA+AAZ++KWH'QTY+136:1'
DTM+324:201403030000201403030100:Z13'QTY+136:1'
DTM+324:201403030100201403030200:Z13'QTY+136:2'
DTM+324:201403030200201403030300:Z13'QTY+136:1'
DTM+324:201403030300201403030400:Z13'QTY+136:1'
DTM+324:201403030400201403030500:Z13'QTY+136:2'
DTM+324:201403030500201403030600:Z13'QTY+136:1'
DTM+324:201403030600201403030700:Z13'QTY+136:1'
DTM+324:201403092300201403092400:Z13'CNT+1:167181'
UNT+6832+1'UNZ+1+XGATE019452198'

Download this application to start: EDI Notepad
Open your EDIFACT file in this tool. It will give you context: what each segment and element is, and what the qualifiers and envelopes in the document mean. You should also find the source of the document and get an implementation guide, which will explain their specific usage.
Once you apply that context and understand what the elements are, parsing becomes easy. You can write your own parser, use an open source product like BOTS (mentioned in the comments above), or purchase commercial translation software (hundreds are available).
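As a rough illustration of the "write your own parser" route, here is a minimal sketch. It is in Python rather than R (the same splitting logic carries over to strsplit() in R). It assumes the default separators declared in the UNA segment of the example: ' ends a segment, + separates elements, : separates components, and ? is the release character. It also assumes, based only on the sample in the question, that each QTY segment is followed by the DTM+324 segment giving its period; check the implementation guide before relying on that. The function names are made up for illustration.

```python
import csv
import io
import re

# Shortened copy of the sample message from the question.
SAMPLE = (
    "UNA:+.? '"
    "UNH+1+MSCONS:D:96A:ZZ:E2NO6A'"
    "DTM+137:201403121751:203'"
    "LIN+1++1491:::SM'MEA+AAZ++KWH'"
    "QTY+136:1'DTM+324:201403030000201403030100:Z13'"
    "QTY+136:2'DTM+324:201403030100201403030200:Z13'"
    "CNT+1:167181'"
)


def split_segments(text):
    """Split an EDIFACT message into segments on unescaped apostrophes."""
    text = text.replace("\r", "").replace("\n", "")
    if text.startswith("UNA"):
        text = text[9:]  # "UNA" plus six service characters
    # Simplification: an apostrophe preceded by "?" is treated as data.
    return [seg for seg in re.split(r"(?<!\?)'", text) if seg]


def hourly_readings(text):
    """Pair each QTY value with the DTM+324 period that follows it."""
    rows = []
    quantity = None
    for segment in split_segments(text):
        # Simplification: released "+" characters ("?+") are not handled.
        elements = segment.split("+")
        if elements[0] == "QTY":
            quantity = elements[1].split(":")[1]
        elif elements[0] == "DTM" and quantity is not None:
            parts = elements[1].split(":")
            if parts[0] == "324":
                period = parts[1]  # start and end, 12 characters each
                rows.append((period[:12], period[12:], quantity))
                quantity = None
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["start", "end", "quantity"])
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(hourly_readings(SAMPLE)))
```

The timestamps are kept as strings on purpose: the sample contains values such as 201403092400 (hour 24), which many date parsers reject.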

The elements within the MSCONS file are well documented. See here: http://www.edi-energy.de - the latest description (in German) is available here: http://www.edi-energy.de/files2/MSCONS_2_2b_Fehlerkorrektur_2014_02_27.pdf

Related

Converting Docx to PDF using .Net Core [Open Source]

I'm looking for an open source plugin that can convert Word (docx / doc) to PDF without Microsoft.Office.Interop. There are questions asked about it, but either no solution is provided or I didn't find one.
Any suggestions or references will be much appreciated!
You could do this using the Aspose.Words project; however, this library is not open source (a license is required and it costs some money): https://blog.aspose.com/2020/01/02/convert-word-doc-docx-to-pdf-in-csharp-net-core/
On our project we needed to keep the formatting as close as possible to the original, but none of the plugins we tried came close.
We opted for I Love Pdf utilities.
Word to PDF
They have a well-documented API for several languages (including .NET) and it works great.
You can process 250 files freely every month and if you need more, it's not that expensive.
Hope this helps

Open a XML file not knowing the complete name and parse xml

I am using Robot Framework with RIDE, and for a test I need to find an XML file on my computer and open it to parse the XML and be able to use the data.
The thing is that I don't know the exact name of the file; the format is numberNameOfTheFile, so it could be 1NameOfTheFile or 25NameOfTheFile.
How can I use regexp in my keyword? Or any other way to achieve this?
Thank you
How would you do it manually - how would you pick the file to use for the verification?
I presume, you are going to look at all the files that are matching a specific name pattern; in Robot Framework you can do that with OperatingSystem's List Files In Directory keyword, which supports passing a name pattern:
${the files}=    List Files In Directory    /the/path/to/the/dir    *NameOfTheFile.xml
Now you have a list object with the filenames that match; if it's empty - there's no such file, which may be a problem (depends on your test/reqs, I don't know). If it has a single member - great, that's your file.
And if there are multiple files - that's another "problem". How would you pick the right file manually? It could be that the newest file is the target one - for that you would go over all of them and find it through OperatingSystem's Get Modified Time; or it could be the largest; or the one whose number prefix is the biggest. This really depends on your requirements, and what you are trying to achieve.
"How would you do it manually" is probably the most important question to ask. Think it through and break the work down into the individual steps you would take, and now you have the algorithm; see how to put that in code - and presto, the implementation. This applies to scripts, test cases, and business process automation (e.g. software).
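One possible breakdown of those steps, as a hedged sketch in plain Python (which could be exposed to Robot Framework as a custom keyword library): list the files whose names are digits followed by the known name, pick one by a rule, and parse it. The rule used here (highest leading number wins) and the function names are assumptions for illustration; swap in newest-by-modification-time or whatever your requirements call for.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def matching_files(directory, base_name="NameOfTheFile"):
    """Return sorted (number, path) pairs for files named <digits><base_name>.xml."""
    pattern = re.compile(r"(\d+)" + re.escape(base_name) + r"\.xml")
    matches = []
    for path in Path(directory).iterdir():
        found = pattern.fullmatch(path.name)
        if found:
            matches.append((int(found.group(1)), path))
    return sorted(matches)


def pick_file(directory, base_name="NameOfTheFile"):
    """Pick the match with the highest leading number (an assumed rule)."""
    matches = matching_files(directory, base_name)
    if not matches:
        raise FileNotFoundError(
            f"no file matching <number>{base_name}.xml in {directory}"
        )
    return matches[-1][1]


def parse_picked_file(directory, base_name="NameOfTheFile"):
    """Parse the picked file and return its root element."""
    return ET.parse(pick_file(directory, base_name)).getroot()
```

Raising an error when nothing matches makes the "no such file" case fail loudly instead of silently passing an empty list further down the test.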
I was tempted to mark the question for closing, because precisely this - the algorithm - was missing, only the end goal is stated - while SO is for helping in the implementation part. But, here we are :)

Convert RTF to PDF on a rule into Alfresco

I found a link (http://wiki.alfresco.com/wiki/Content_Transformations) that says I need to create a file named my-transformers-context.xml and put my configuration there to convert RTF to PDF...
It says that some transformations are already configured, but this one (RTF to PDF) and some others (DOC to PDF) are not.
However, I couldn't find out how to create this XML file with the right configuration to convert an RTF file into a PDF...
Has someone already done something like this? Or does someone know a link that explains how to configure this XML file?
PROBLEM SOLVED!
I don't know if there is a way to mark the problem as solved, but here is the solution.
I saw what Gagravarr said and started looking at the OpenOffice configuration in Alfresco...
There is a file named:
alfresco-global.properties
and there are two properties named:
ooo.exe
and
ooo.enabled
The first one must point to the path of soffice.exe,
and the second one must be set to true:
ooo.enabled = true
That solves a lot of problems when converting one kind of file to another, such as RTF to PDF.
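A small sketch for double-checking those two settings, in Python. It only reads simple key = value lines and ignores the rest of the Java properties syntax (line continuations, escapes, the : separator). The function names and the path in the example text are invented for this illustration.

```python
def read_properties(text):
    """Parse simple `key = value` lines, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if separator:
            settings[key.strip()] = value.strip()
    return settings


def openoffice_problems(settings):
    """List what is missing for the OpenOffice transformer settings."""
    problems = []
    if settings.get("ooo.enabled", "").lower() != "true":
        problems.append("ooo.enabled is not set to true")
    if not settings.get("ooo.exe"):
        problems.append("ooo.exe is not set")
    return problems


# Hypothetical file content; the path is a placeholder, not a real default.
EXAMPLE = """
# OpenOffice settings
ooo.exe = /path/to/soffice
ooo.enabled = true
"""

print(openoffice_problems(read_properties(EXAMPLE)))
```

In practice you would pass the content of your own alfresco-global.properties instead of EXAMPLE.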
Out of the box, Alfresco should be able to transform an RTF file to a PDF using OpenOffice (direct or JodConverter, depending on whether you're on Community or Enterprise).
Assuming you're on a new enough Alfresco, this webscript will tell you what transformations are available from and to RTF:
http://localhost:8080/alfresco/service/mimetypes?mimetype=application/rtf#application/rtf
If that doesn't show you RTF -> PDF, then you need to look at your OpenOffice configuration/setup.

Is it possible to read music file metadata using R?

I've got a bunch of audio files (let's say ogg or mp3), with metadata.
I wish to read their metadata into R so to create a data.frame with:
file name
file location
file artist
file album
etc
Do you know of any way to do that?
You take an existing mp3 or ogg client, look at what library it uses and then write a binding for said library to R, using the existing client as guide for that side -- and something like Rcpp as a guide on the other side to show you how to connect C/C++ libraries to R.
No magic bullet.
A cheaper and less reliable way is to use a cmdline tool that does what you want and write little helper functions that use system() to run that tool over the file, re-reading the output in R. Not pretty, not reliable, but possibly less challenging.
Possible, yes, easy, no.
You "could" use a combination of readChar and/or readBin on the file and parse out the contents. This would be highly dependent, though, on parsing the frame tags from the raw bytes of the ID3v2 tag (and mind you, it would change if it was a version 1 tag). It would certainly be a lot of work to implement a straight R solution. Take this Python code for example: it's very clean, straight Python code, but with a lot of branching and parsing.
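To give a sense of the byte-level parsing involved, here is a minimal sketch for the simpler ID3v1 case only, written in Python rather than R. ID3v1 stores a fixed 128-byte block at the end of the file: the marker TAG, then title, artist and album (30 bytes each), year (4 bytes), comment (30 bytes) and one genre byte. ID3v2 frames and Ogg comments are laid out differently and are not handled here. Decoding as Latin-1 is an assumption, and the function name is made up.

```python
import os


def read_id3v1(path):
    """Return the ID3v1 fields of a file as a dict, or None if there is no tag."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() < 128:
            return None
        handle.seek(-128, os.SEEK_END)
        block = handle.read(128)
    if block[:3] != b"TAG":
        return None

    def text(raw):
        # Fields are padded with NUL bytes or spaces.
        return raw.split(b"\x00")[0].decode("latin-1").strip()

    return {
        "file": os.path.basename(path),
        "location": os.path.dirname(os.path.abspath(path)),
        "title": text(block[3:33]),
        "artist": text(block[33:63]),
        "album": text(block[63:93]),
        "year": text(block[93:97]),
    }
```

Calling this for every file in a directory gives a list of dicts with one entry per file, which is the same shape as the data.frame described in the question.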
You can use exiftool with the system() command available in R. Optionally, you can create a regexp to handle the fields you need... If I were you, I'd stick with Dirk's advice (as usual) =)!
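A rough sketch of that command-line approach, shown in Python rather than R (in R, the equivalent would be system() plus a JSON reader). It assumes exiftool is installed and on the PATH and uses its -json output, which avoids regexp parsing. Which tags are present depends on the file, so missing ones come back as None. The function names and the chosen field list are assumptions for illustration.

```python
import json
import subprocess

# Tag names to keep; adjust to whatever exiftool reports for your files.
FIELDS = ("SourceFile", "Artist", "Album", "Title")


def parse_exiftool_json(output, fields=FIELDS):
    """Turn exiftool's JSON output into one dict per file."""
    records = json.loads(output)
    return [{field: record.get(field) for field in fields} for record in records]


def read_metadata(paths):
    """Run exiftool over the given files and return the selected tags."""
    result = subprocess.run(
        ["exiftool", "-json", *paths],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_exiftool_json(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it possible to check the parsing logic without having exiftool or any audio files at hand.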
Out here in 2021, I wanted to do this so I did the following...
Create a new playlist while in 'songs' view.
Select all songs and drag to the new playlist. Highlight that playlist
File > Library > Export Playlist. My default was to save as .txt; if yours is not, select it.
Open the file in Excel to save it as csv, or use read.delim() in R, as the txt file is tab-separated
Import to R

ASP.NET library to extract plain text from Open XML file formats

Is there a pre-existing library to extract plain text from Open XML file formats (e.g. docx, pptx, and xlsx)?
I require this to populate a lucene.net index.
I've found this example which extracts text from docx and it seems to work okay. But before building my own solution based on this I was wondering if there's something already available for the other file formats?
Before spending cash, it may be worth looking at the IFilter interface - these were/are designed to do exactly what you want.
http://msdn.microsoft.com/en-us/library/ms691105
http://www.codeproject.com/KB/cs/IFilter.aspx
(Some links at the bottom of the codeproject link).
MS provide IFilters for office file types.
http://www.microsoft.com/downloads/details.aspx?familyid=60c92a37-719c-4077-b5c6-cac34f4227cc&displaylang=en
I know that we use this technology to allow us to index PDFs using Lucene but I did not write the actual code and cannot be of much use I am afraid.
If your Google-fu is strong I am sure you can dig up more examples of using IFilters to do exactly what you want.
Take a look at aspose.com; they have a good library to handle both ppt and pptx.
You can try Toxy, an open source text/data extraction framework for .NET. For now, it supports xls, xlsx, doc, docx. It will support pptx in version 1.5 very soon.
For detail, you can check here
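If you do end up rolling your own extraction, the docx case is small enough to sketch. Open XML files are ZIP packages; in a docx the main body lives in word/document.xml, with paragraphs as w:p elements and text runs as w:t elements. The sketch below is in Python rather than .NET, purely to show the idea; it ignores headers, footers, footnotes and comments, and the function name is made up. pptx and xlsx use different part names and element names, so they would need their own handling.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace, in ElementTree's {uri}tag notation.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source):
    """Return the body text of a docx, one line per paragraph.

    `source` can be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

The resulting string could then be passed to whatever builds the index.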
