How to speed up the Saxon doc() function? - xquery

I have a 12MB XML file which I am accessing from within an XQuery. The file is loaded something like this:
let $t := doc('file:///C:/foo/bar/file12mb.xml')
The code is taking about 950ms to execute.
How can the XML document be loaded faster?
Once the XML file is loaded and parsed, the body of the XQuery takes only a few milliseconds to run, so I'm trying to speed up the initial loading and parsing, which accounts for most of the execution time.
Is there any way for Saxon to persist an XML document after it has been parsed?
Ideally I would like to persist the parsed XML data somehow, but Saxon seems to be designed purely as an XML processor, not an XML database.
Would a schema help?
The XML file does not currently have a schema associated with it. The Saxon documentation implies that having a schema speeds up query execution but slows down the initial loading and parsing of the XML data, so I haven't tried creating one.
Any suggestions gratefully received.
Versions
java version "1.6.0_26"
Saxon-B version 9.1.0.8

That sounds pretty fast for parsing a 12MB file. I don't think you can optimize that, and no, Saxon is not a database.
In MarkLogic the parsing of the XML only ever happens once: during ingest. In other databases, such as Oracle, that may or may not be the case, depending on how you load it.
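If the real goal is to avoid re-parsing the same file on every run, the usual workaround is to keep the process alive and reuse the parsed tree across queries. A minimal sketch, shown with Saxon's .NET API (the Java s9api mirrors it almost name-for-name; the query string here is just a placeholder):

using System;
using Saxon.Api;

class CachedDocQuery
{
    static void Main()
    {
        Processor processor = new Processor();

        // Parse the 12MB file once; the resulting XdmNode is an immutable
        // in-memory tree that can be reused across any number of query runs.
        DocumentBuilder builder = processor.NewDocumentBuilder();
        XdmNode doc = builder.Build(new Uri("file:///C:/foo/bar/file12mb.xml"));

        // Compile the query once as well.
        XQueryCompiler compiler = processor.NewXQueryCompiler();
        XQueryExecutable exec = compiler.Compile("count(//item)"); // placeholder query

        // Each evaluation reuses the already-parsed tree: no further I/O or parsing.
        XQueryEvaluator eval = exec.Load();
        eval.ContextItem = doc;
        Console.WriteLine(eval.Evaluate());
    }
}

This only helps in a long-lived process; a one-shot command-line invocation will always pay the parse cost.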

Related

Can I compile an XQuery into a package using Saxon - or - how to minimize compile times

I'm using the .NET Saxon9ee API.
I have a very large schema (180,000 lines) and a schema-aware XQuery.
When I compile it, it understandably takes several seconds. That's life.
But is there a way that I can compile it once and serialise it to disk as a compiled entity, so that I can load it again later and use it?
(The XSLT compiler allows me to compile into XsltPackages, which I'm pretty sure would let me do this with XSLT.)
There's no filestore format for compiled XQuery code in Saxon (unlike XSLT), but there is a filestore format for compiled schemas (the SCM format), and this may help. However, loading a schema this large will not be instantaneous.
Note that the compile time for XSD schemas can be very sensitive to the actual content of the schema. In particular, large finite bounds can be very costly (for example, maxOccurs="1000"). This is due to the algorithms used to turn a grammar into a finite state machine. Saxon optimises the textbook algorithm for some cases, but not for all. The finite state machine is held in the SCM file, so you won't incur that cost when loading from an SCM file; however, the FSMs that are expensive to compute also tend to be very large, so if you're in this situation the SCM is going to be big and therefore slower to read.
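To make the maxOccurs point concrete, here is the shape of content model that gets expensive (element names are made up):

<!-- Costly: a finite bound of 1000 forces the finite state machine to track up to 1000 distinct occurrence counts -->
<xs:element name="row" minOccurs="0" maxOccurs="1000"/>

<!-- Cheap by comparison: an unbounded repeat needs only a single looping state -->
<xs:element name="row" minOccurs="0" maxOccurs="unbounded"/>

If you can substitute maxOccurs="unbounded" (enforcing the real limit elsewhere), both the compile time and the size of the resulting SCM file should drop.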

CSV to JSON on a Windows server

I'm a front-end web developer and I'm trying to convert a CSV file to JSON on the server. The server is a Windows server with ASP.NET on it. I thought this would be a simple thing to do, but after googling around for a few hours I see it's a bit harder than I originally thought. There are a bunch of converters online that you can paste your data into for a one-time conversion, but those won't work for me because my CSV file will be updated by another program. Does anyone know how to do this conversion using .NET, or know of a good solution?
Take a look at FileHelpers.
Use it to read your CSV.
Then you should be able to use Json.NET to output the JSON.
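A rough sketch of how the two libraries fit together (the record fields here are assumptions; match them to your CSV's actual columns):

using FileHelpers;              // NuGet: FileHelpers
using Newtonsoft.Json;          // NuGet: Newtonsoft.Json (Json.NET)

// Hypothetical record layout; adjust the fields to your CSV's columns.
[DelimitedRecord(",")]
public class CsvRow
{
    public string Name;
    public int Quantity;
}

public static class CsvToJson
{
    public static string Convert(string csvPath)
    {
        // FileHelpers maps each CSV line onto a CsvRow instance.
        var engine = new FileHelperEngine<CsvRow>();
        // If the CSV has a header row: engine.Options.IgnoreFirstLines = 1;
        CsvRow[] rows = engine.ReadFile(csvPath);

        // Json.NET serialises the array straight to a JSON string.
        return JsonConvert.SerializeObject(rows, Formatting.Indented);
    }
}

Because the CSV is rewritten by another program, run the conversion on each request (or watch the file for changes) rather than converting once.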

Best way to handle and deploy XQuery stored procedures?

Is there a tool for deploying things into eXist? If I've got a bundle of, say, schemas and XQuery stored procedures, is there a way of bundling those into a zip or tar file and uploading or deploying them into eXist?
Alternatively, what is the best way of storing these things in a version-controlled way (in a git repo, say) and deploying them to the eXist server? Ideally, it'd be nice to have a simple script in a scripting language so you can simply call "deploy.py" or whatever and it'd take everything from the repository and load it into the XML database.
The EXPath packaging system specifies a format for generating a ZIP file with XQuery procedures (and other content) and deploying it into multiple XQuery databases.
See the specification. You should be able to use the Python zipfile module to generate these if you're inclined to use Python (though personally, I do so from a makefile).
Unfortunately, the process for checking currently installed package versions to upgrade if necessary is not standardized; I have a solution for BaseX, but nothing for eXist immediately at hand. However, eXist's implementation is well-documented, and you should have little trouble working with it.
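For orientation, such a package is just a ZIP (eXist ships these as .xar files) with an expath-pkg.xml descriptor, roughly along these lines (names, version, and file layout are placeholders; check the specification for the exact schema):

<package xmlns="http://expath.org/ns/pkg"
         name="http://example.com/pkg/myapp"
         abbrev="myapp" version="0.1" spec="1.0">
  <title>My schemas and stored procedures</title>
  <xquery>
    <namespace>http://example.com/myapp/lib</namespace>
    <file>lib.xqm</file>
  </xquery>
</package>

A deploy script then only needs to zip the repository contents alongside this descriptor and hand the result to the database's package installer.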

BizTalk flat file disassembler

Can anyone tell me.....
When a .txt file is picked up by a pipeline with a flat file disassembler, and the correct schema is used in the pipeline, does the .txt file come out the other end of the pipeline as XML, or is something else needed to get to this stage?
I'm kind of new to BizTalk, so apologies if this is worded wrongly or doesn't make much sense.
That is basically how it works, yes. The end result is an XML message delivered from the pipeline into the BizTalk MessageBox database. From this point other processes can pick up the message and process it.

Caching an XDocument object. Is it worth it?

I am loading an XML file on my local server (not a remote server) using:
XDocument.Load(path_to_xml_file);
This file is 500KB. I am wondering if I should cache the XDocument instead of reading the file every time. Thank you for the guidance.
It depends on how often you are going to need it. Consider that, apart from the loading time, there is also parsing time, which is incurred every time you load the file from disk.
If your file does not change very often, you can put your XDocument in the cache and define a file dependency on the file itself, so that the cache entry is invalidated every time the document changes. There is an example of this on MSDN.
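A minimal sketch of that pattern with the ASP.NET cache (this assumes the code runs in a web application; the file path doubles as the cache key here):

using System.Web;
using System.Web.Caching;
using System.Xml.Linq;

public static class XmlCache
{
    // Returns the cached XDocument, reloading only when the file changes.
    public static XDocument GetDocument(string path)
    {
        var cached = HttpRuntime.Cache[path] as XDocument;
        if (cached != null)
            return cached;

        XDocument doc = XDocument.Load(path);

        // The CacheDependency invalidates the entry whenever the file on disk
        // is modified, so the next request reloads and reparses it.
        HttpRuntime.Cache.Insert(path, doc, new CacheDependency(path));
        return doc;
    }
}

For a 500KB file this trades a small amount of memory for skipping both the disk read and the parse on every request.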
