XML TO CSV USING MAPPING NODE - dfdl

Is it possible to convert XML data to CSV using DFDL with a Mapping node?

Sure, just set the output domain to DFDL. Note that the conversion to CSV is not actually performed by the Mapping node. The XMLNSC parser constructs the input message tree from the XML document, the Mapping node constructs the output message tree, and the DFDL parser then writes the output CSV from that tree and the DFDL model.
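The division of labour described above can be illustrated outside the broker. The following is only a Python analogy, not IBM Integration Bus code: the element names, the sample document, and the three function names are all assumptions made for illustration. It shows the same three stages: parse XML into a tree, map it to an output structure, and let a separate serializer write the CSV.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input document; element names are illustrative only.
XML_INPUT = """<orders>
  <order><id>1</id><item>widget</item><qty>3</qty></order>
  <order><id>2</id><item>gadget</item><qty>5</qty></order>
</orders>"""

def parse_input(xml_text):
    """Stage 1 (XMLNSC parser analogue): build a tree from the XML text."""
    return ET.fromstring(xml_text)

def map_tree(root):
    """Stage 2 (Mapping node analogue): build the output structure, one record per order."""
    return [
        {"id": o.findtext("id"), "item": o.findtext("item"), "qty": o.findtext("qty")}
        for o in root.findall("order")
    ]

def serialize_csv(records, fields):
    """Stage 3 (DFDL analogue): write the records as CSV; the field list stands in for the model."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

csv_text = serialize_csv(map_tree(parse_input(XML_INPUT)), ["id", "item", "qty"])
print(csv_text)
```

The mapping stage never touches CSV syntax, which mirrors the point of the answer: the Mapping node only has to produce a tree that matches the DFDL model.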

Related

Is there a library that can generate CSV files given a data dictionary and data model in some format?

Is there a library, in any language, that can generate a .csv file for each entity of a data model, with values that comply with a data dictionary?
For example:
The data dictionary is specified in a CSV file with these column names: field,regex,description
The data model is specified in another CSV file with these column names: entity,field
faker comes very close; however, it needs some programming to work with a data model. A wrapper around faker might work well, I suppose.
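A rough sketch of what such a wrapper could look like, using only the Python standard library. The sample dictionary and model contents, the function names, and the `demo_value` generator are all hypothetical; `demo_value` is a placeholder where a Faker provider or a regex-based string generator would be plugged in. The sketch only groups fields by entity and checks each generated value against the dictionary's regex.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical inputs in the two layouts described above.
DICTIONARY_CSV = """field,regex,description
customer_id,[0-9]{4},Four-digit customer number
country,[A-Z]{2},Two-letter country code
order_id,[0-9]{6},Six-digit order number
"""
MODEL_CSV = """entity,field
customer,customer_id
customer,country
order,order_id
order,customer_id
"""

def load_dictionary(text):
    """Map each field name to its compiled regex."""
    return {r["field"]: re.compile(r["regex"]) for r in csv.DictReader(io.StringIO(text))}

def load_model(text):
    """Map each entity to its ordered list of fields."""
    model = defaultdict(list)
    for r in csv.DictReader(io.StringIO(text)):
        model[r["entity"]].append(r["field"])
    return dict(model)

def generate_entity_csv(fields, dictionary, make_value, n_rows):
    """Build CSV text for one entity, rejecting values that violate the dictionary."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    for i in range(n_rows):
        row = []
        for f in fields:
            value = make_value(f, i)
            if not dictionary[f].fullmatch(value):
                raise ValueError(f"{value!r} does not match the regex for {f}")
            row.append(value)
        writer.writerow(row)
    return buf.getvalue()

def demo_value(field, i):
    """Placeholder generator; a Faker provider or regex-based generator would go here."""
    samples = {"customer_id": f"{1000 + i}", "country": "NZ", "order_id": f"{100000 + i}"}
    return samples[field]

dictionary = load_dictionary(DICTIONARY_CSV)
model = load_model(MODEL_CSV)
outputs = {e: generate_entity_csv(fs, dictionary, demo_value, 2) for e, fs in model.items()}
print(outputs["order"])
```

Writing each entry of `outputs` to `<entity>.csv` would give one file per entity; the sketch keeps everything in memory so it runs without touching the file system.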

Is there a way to avoid creating a temporary file in R?

I have a database in which VCF files have been inserted as a blob variable. I am able to retrieve it without issue. However, I then need to pass it to some various functions (VariantAnnotation, etc.) that expect a VCF file name. Is there a way to "fake" a file object to pass to these functions if I already have all the data in a character string?
I'm currently writing it out to a file so I can pass it on:
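library(VariantAnnotation)  # provides readVcf()
# x contains the entire VCF file as a character string
temp_filename <- tempfile(fileext = ".vcf")
writeChar(x, temp_filename, eos = NULL)  # eos = NULL avoids writing a trailing nul byte
testVcf <- readVcf(temp_filename)
unlink(temp_filename)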
This works ok, but I would like to avoid the unnecessary file I/O if possible.

Retrieving data from large xml file using node path in R

I am new to XML, and most of the XML examples I have found are not structured like my file. I want to extract data from a large XML file using R (a dummy XML file is below). I know that, despite R's memory limitations, extracting specific nodes from a large XML file is possible using xmlEventParse() from the R XML package, but I am having trouble specifying the node paths correctly to reach my target data. My final output, a data frame, should have columns that reflect these nodes: N9:Shareholder, N5:IdentifierElement, N2:NameElement. Thanks for your help.
XML code
FOO LIMITED
120801
Companies Register

xdmp:document-load XQuery Command

When I ingest a csv file containing multiple xml records using mlcp, I use an options file to change the desired ML output from one csv document into multiple xml documents. How do I script this using xdmp:document-load command within the query console?
I don't think xdmp:document-load provides an option for that. Instead, use xdmp:document-get, split with XPath, then xdmp:document-insert.

xml schema in R

I am quite new to XML. I used the XML package in R to parse XML content and put it into R objects. I have to deal with nearly 1 TB of XML data, and it took me around 5 hours to parse 2.4 GB. I know that an XML schema is used to generate XML. I wonder if there is a better method than xmlParse to convert the XML to data, or a way to use the XML schema to read the XML and put the values back into raw data?
I now have 5 XML schemas and the XML. (I think it is complex XML.)
xmlns:nxce="http://tfm.faa.gov/tfms/NasXCoreElements"
xmlns:mmd="http://tfm.faa.gov/tfms/MessageMetaData"
xmlns:nxcm="http://tfm.faa.gov/tfms/NasXCommonMessages"
xmlns:idr="http://tfm.faa.gov/tfms/TFMS_IDRS"
xmlns:xis="http://tfm.faa.gov/tfms/TFMS_XIS"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://tfm.faa.gov/tfms/TFMS_XIS
sample data: http://www.fly.faa.gov/ASDI/asdidocs/asdi_sample_data.zip
I want to extract all flightManagementInfomation data using SAX.
Thanks in advance.
Using schemas won't improve the performance of XML loading. They tell you something about the expected structure of the parsed XML, but have nothing to do with the parsing process itself.
You need to use a different parser, if one is available in R (as suggested by Martin), or convert the XML data into something that R can handle more easily using some other language.
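As one possible illustration of the "some other language" route, here is a minimal Python sketch that streams an XML document and writes one CSV row per record, so the whole file is never held in memory at once. The sample document, the namespace URI, and the tag and field names are invented for the example; they are not taken from the FAA schemas.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature document; real feeds are far larger and more deeply nested.
SAMPLE = b"""<root xmlns:x="http://example.com/ns">
  <x:flight><x:id>A1</x:id><x:alt>350</x:alt></x:flight>
  <x:flight><x:id>B2</x:id><x:alt>370</x:alt></x:flight>
</root>"""

NS = "{http://example.com/ns}"  # ElementTree spells namespaced tags as {uri}name

def stream_to_csv(source, out, record_tag, fields):
    """Write one CSV row per record element, processing the input incrementally."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            writer.writerow([elem.findtext(NS + f, default="") for f in fields])
            # Drop the record's children; the emptied element stays attached to the root.
            elem.clear()

out = io.StringIO()
stream_to_csv(io.BytesIO(SAMPLE), out, NS + "flight", ["id", "alt"])
print(out.getvalue())
```

The resulting CSV can then be loaded in R with read.csv() or a faster reader. For a real multi-gigabyte file, `source` would be a file path or an open binary file rather than an in-memory buffer.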
