I have a flow that receives an XML request.
I then call a JDBC outbound endpoint that performs a query against an Oracle database.
The result of the query is then transformed to XML using an XQuery transformer and sent back.
The SQL query returns at most 50,000 rows, but the XML created by the XQuery transformer has 60 lines per row, resulting in a very large XML file (15-100 MB).
Mule is taking a very long time "mapping/creating" the XML file, and I am wondering whether I can speed up the process somehow or whether I have to rethink my approach.
Regards,
Magnus
Zorba provides a JDBC connector and streaming capabilities: http://www.zorba-xquery.com/
It might be just what you are looking for.
Directly from Mule's documentation:
Efficient Transformations with DelayedResult
Mule contains a special XML output format called DelayedResult. This format allows very efficient XML transformations by delaying any XML serialization until an OutputStream is available.
For example, here is an XSLT transformer set up to use DelayedResult:
<mxml:xslt-transformer name="transform-in"
    xsl-file="xslt/transform.xslt"
    returnClass="org.mule.module.xml.transformer.DelayedResult"/>
If the result of this transformation were being sent to an HTTP client, the HTTP client would ask Mule for an OutputHandler and pass in the OutputStream to it. Only then would Mule perform the transformation, writing the output directly to the OutputStream.
If DelayedResult were not used, the XML result would first be written to an in-memory buffer before being written to the OutputStream. This would make your XML processing slower.
So it makes more sense to use the XSLT transformer instead of the XQuery one.
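To illustrate the underlying idea independently of Mule, here is a minimal Java sketch of delayed serialization. The DelayedOutput interface below is purely illustrative and is not Mule's actual DelayedResult/OutputHandler API:

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative stand-in for the idea behind DelayedResult: instead of producing
// the transformed XML as a byte[]/String up front, the transformer hands back a
// callback that serializes straight into whatever OutputStream the transport
// (for example the HTTP connector) provides later.
interface DelayedOutput {
    void write(OutputStream out) throws IOException;
}

public class DelayedResultSketch {

    // "Transformation" step: nothing is serialized yet, we only capture how to do it.
    static DelayedOutput transform(Iterable<String> rows) {
        return out -> {
            out.write("<rows>".getBytes(StandardCharsets.UTF_8));
            for (String row : rows) {
                // Each row is written directly to the stream instead of being
                // appended to an in-memory buffer first.
                out.write(("<row>" + row + "</row>").getBytes(StandardCharsets.UTF_8));
            }
            out.write("</rows>".getBytes(StandardCharsets.UTF_8));
        };
    }

    public static void main(String[] args) throws IOException {
        DelayedOutput result = transform(java.util.List.of("a", "b", "c"));
        // Later, when the transport has an OutputStream (socket, HTTP response, ...),
        // the serialization finally happens, streaming straight to it.
        result.write(System.out);
    }
}

The point is that the transformation result is kept as a "how to write it" callback, so the bytes go straight to the transport's stream instead of through an intermediate in-memory buffer.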
Given a Finance & Operations environment, the environmentdomain.com/data/EntityName URL allows OData data retrieval in JSON format.
Is there a way to give extra query string parameters to download the results as CSV using HTTP GET only?
A workaround is described here; however, it has more overhead for ad hoc situations.
Unfortunately, the OData specification features supported for D365FO OData requests do not include the system query option $format.
So no, as far as I can tell, there is no query string parameter that would return the HTTP GET response in CSV format.
Additional workarounds
Since the question mentions a workaround that has some overhead for ad hoc situations, here are two more suggestions for how the response can be converted to CSV format with less overhead.
Postman
Postman is often used for ad hoc testing of the D365FO OData API. Convert a JSON response to CSV describes how a JavaScript test can be added to a Postman request to convert the JSON response to CSV format and write it to the console.
PowerShell
The Invoke-RestMethod cmdlet can be used to send HTTP GET requests to the D365FO API. The result can then be used with the Export-Csv cmdlet to create a CSV file.
I strongly recommend you use the d365fo.integrations PowerShell module written by @splaxi specifically to interact with the D365FO OData API instead of Invoke-RestMethod. The Get-D365ODataEntityData cmdlet can be used to send an HTTP GET request.
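If PowerShell is not an option, the same "fetch JSON, write CSV" workaround can also be sketched in plain Java. The sketch below uses the JDK 11+ HTTP client plus Jackson for JSON parsing; the URL, bearer token, and entity name are placeholders for your environment:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ODataJsonToCsv {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and entity name; adjust to your environment.
        String url = "https://environmentdomain.com/data/EntityName?$top=100";

        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url))
                        // Placeholder token; D365FO normally requires an OAuth bearer token.
                        .header("Authorization", "Bearer <token>")
                        .header("Accept", "application/json")
                        .GET()
                        .build(),
                HttpResponse.BodyHandlers.ofString());

        // OData wraps the result rows in a "value" array.
        JsonNode rows = new ObjectMapper().readTree(response.body()).get("value");

        List<String> lines = new ArrayList<>();
        if (rows != null && rows.size() > 0) {
            // Use the field names of the first row as the CSV header.
            List<String> header = new ArrayList<>();
            rows.get(0).fieldNames().forEachRemaining(header::add);
            lines.add(String.join(",", header));

            // Quote every value and escape embedded quotes.
            for (JsonNode row : rows) {
                lines.add(header.stream()
                        .map(f -> row.path(f).asText().replace("\"", "\"\""))
                        .map(v -> "\"" + v + "\"")
                        .collect(Collectors.joining(",")));
            }
        }
        Files.write(Path.of("EntityName.csv"), lines);
    }
}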
How can I use a data source that is just a plain HTTP data source? For example, https://cnlohr.com/data_sources/ccu_test, where the response is just a number.
I could potentially wrap it in JSON, but I can't find any basic JSON, REST, or raw HTTP data source for Grafana Connect.
Ah! Apparently the CSV Plugin here DOES work. I just had to re-create it a few times to get around the internal server error: https://grafana.com/grafana/plugins/marcusolsson-csv-datasource/
Once added to your system, add it as a new integration/connection. Be sure to make each query only output one number (you will need multiple queries, one for each column). Then you can save each as a recorded query.
I'm looking for use-cases for using reactive streams within a servlet container (or just a HTTP server).
The Jetty project has started being asked "is Jetty reactive?" and we've noticed the proposal to add reactive streams to Java 9.
So we've started some experiments with using the reactive streams API for async servlet IO, which are interesting enough... but they lack focus because we don't have real use-cases to tell us which concerns are most important.
So does anybody have any good use-cases that they could share/explain so that we can direct our Jetty experiments to meet their needs? The sort of thing I've imagined is having an RS-based database publisher sending objects all the way out over an HTTP response or WebSocket connection, using Flow.Processors for the conversions along the way.
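To make that a bit more concrete, here is the rough shape of the pipeline I have in mind, using the java.util.concurrent.Flow API. Everything below is a toy stand-in (the "database" and the "HTTP response" are simulated), not a proposal for an actual Jetty API:

import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

public class FlowPipelineSketch {

    // A Processor that turns domain objects into serialized chunks (e.g. rows into XML fragments).
    static class ToBytesProcessor extends SubmissionPublisher<byte[]>
            implements Flow.Processor<String, byte[]> {
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            this.subscription = s;
            s.request(1); // demand one item at a time, so backpressure flows back to the "database"
        }
        @Override public void onNext(String row) {
            submit(("<row>" + row + "</row>").getBytes(StandardCharsets.UTF_8));
            subscription.request(1);
        }
        @Override public void onError(Throwable t) { closeExceptionally(t); }
        @Override public void onComplete() { close(); }
    }

    // A Subscriber standing in for the async servlet/HTTP output side.
    static class HttpResponseSubscriber implements Flow.Subscriber<byte[]> {
        private Flow.Subscription subscription;
        @Override public void onSubscribe(Flow.Subscription s) { subscription = s; s.request(1); }
        @Override public void onNext(byte[] chunk) {
            System.out.write(chunk, 0, chunk.length); // pretend this is the ServletOutputStream
            subscription.request(1);
        }
        @Override public void onError(Throwable t) { t.printStackTrace(); }
        @Override public void onComplete() { System.out.println(); }
    }

    public static void main(String[] args) throws InterruptedException {
        SubmissionPublisher<String> database = new SubmissionPublisher<>(); // stand-in for a DB result publisher
        ToBytesProcessor processor = new ToBytesProcessor();
        database.subscribe(processor);
        processor.subscribe(new HttpResponseSubscriber());

        List.of("1", "2", "3").forEach(database::submit);
        database.close();
        TimeUnit.MILLISECONDS.sleep(200); // SubmissionPublisher delivers items asynchronously
    }
}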
A viable use case is consuming POSTed multi-part form data, particularly when uploading files.
The Typesafe ConductR project (disclaimer: I'm the Tech Lead for it), receives multi-part form data when a user loads a bundle. We use akka-streams/http.
We read off the first two parts of the stream, as our protocol specifies that they must declare some metadata so that we know which node to write the bundle to. After some validation, we then determine the node to write it to and connect the partially consumed stream. Thus the node that receives the upload request negotiates which node it is going to write the bundle to, without having to consume the entire stream (which could be 200 MB) and then write it out again.
Writing out multi-part form data is also a great use-case, given that you can stream the file from disk as a source and pass it on to some HTTP endpoint, i.e. the client side of what I describe above.
The benefits with both use-cases are that you minimise the amount of memory needed to move bytes over a network, and you only perform file IO where it is necessary.
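As a small JDK-only illustration of the client-side case, here is a sketch that streams a file from disk to an HTTP endpoint using the Java 11+ java.net.http client, whose request body publishers are reactive-streams Flow.Publishers. The URL and file path are placeholders, and for simplicity it sends a plain request body rather than real multi-part form data:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class StreamingUploadSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder file; imagine this is a 200 MB bundle.
        Path bundle = Path.of("/tmp/bundle.zip");

        // BodyPublishers.ofFile returns a Flow.Publisher-based body publisher,
        // so the file is streamed to the socket instead of being loaded into memory.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/upload"))
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofFile(bundle))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}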
I have some data, obtained from an API, which I display via a master-detail web page. The data I receive from the API is in JSON format, and I currently cache a serialised version of it to disk. All files are stored in a single folder. Each file is used for at most one week, as new content is released every week. There can be a maximum of 40,000 files. Each file is about 12 kB, and a GUID is used as the filename.
What is the best caching strategy?
1. Keep as is.
2. Store the raw JSON instead of serialised data.
3. Replace the disk caching solution with a NoSQL solution like Redis.
4. Organise the files into folders.
5. Use faster serialization/deserialization techniques.
If you have plenty of RAM, then to retrieve the data faster you can avoid serialization and deserialization altogether and keep the data directly in Redis as key-value pairs.
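A minimal sketch of that idea using the Jedis Java client (the host, key, value, and one-week TTL below are just placeholders) could look like this:

import redis.clients.jedis.Jedis;

public class RedisCacheSketch {
    // Entries expire after one week, matching the weekly content release.
    private static final int ONE_WEEK_SECONDS = 7 * 24 * 60 * 60;

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String guid = "3f2504e0-4f89-11d3-9a0c-0305e82c3301"; // placeholder key
            String json = "{\"title\":\"example\"}";               // raw JSON from the API

            // Store the raw JSON under the GUID with a TTL, so stale entries
            // disappear on their own instead of being cleaned up manually.
            jedis.setex(guid, ONE_WEEK_SECONDS, json);

            // On a cache hit the JSON comes back as-is; a miss returns null.
            String cached = jedis.get(guid);
            System.out.println(cached != null ? cached : "cache miss - call the API");
        }
    }
}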
I've built an OData endpoint using a generic .ashx handler to run some SQL against a SQL Server database and format the payload using ODataLib 5.6. It appears to work as expected and I can view the results in a browser and can import the data into Excel 2013 successfully using the From OData Data Feed option on the Data ribbon.
However, I've noticed that Excel is actually issuing two GET requests when inspecting the HTTP traffic in Fiddler. This is causing some performance concerns since the SQL statement is being issued twice and the XML feed is being sent across the wire twice. The request headers look identical in both requests. The data is not duplicated in Excel. Is there a way to prevent the endpoint from being called multiple times by Excel? I can provide a code snippet or the Fiddler trace if needed.
My suggestion would be to use Power Query for this instead of ADO.NET.
The reason for the "duplicated" calls is that ADO.NET cannot identify the shape of the data from the first request alone. It gets the schema back first, so that it knows the details of the data, and it then retrieves and interprets the actual data with the second call. The first call goes through the ADO.NET provider's GetSchema call, but that particular provider determines the schema by looking at the data itself.