XQuery (saxon) failing with a schema (XPath works) - xquery

I switched from XPath to XQuery in Saxon, and on the queries where I have a schema I'm getting the error message:
A typed input document can only be used with a schema-aware query
My setup is:
InputSource xmlSource = new InputSource(xmlData);
SAXSource saxSource = new SAXSource(reader, xmlSource);
Source schemaSource = new StreamSource(schemaFile);
Configuration config = createEnterpriseConfiguration();
config.addSchemaSource(schemaSource);
Processor processor = new Processor(config);
SchemaValidator validator = new SchemaValidatorImpl(processor);
DocumentBuilder doc_builder = processor.newDocumentBuilder();
if (!preserveWhiteSpace) {
    doc_builder.setWhitespaceStrippingPolicy(WhitespaceStrippingPolicy.ALL);
}
doc_builder.setSchemaValidator(validator);
XdmNode root_node = doc_builder.build(saxSource);
XQueryCompiler compiler = processor.newXQueryCompiler();
Is there something additional I need to do on queries where there is a schema?
thanks - dave

Call XQueryCompiler.setSchemaAware(true);
This isn't the default because it's good for the optimizer to know whether the data is likely to be typed or untyped, and it's inefficient to generate schema-aware code if the data is untyped (conversely, when the data is typed, schema-aware code is typically faster -- though the savings can be eaten up by the extra cost of validating the input).

Related

How to create many Bokeh figures with multiprocessing?

I would like to speed up figure generation in Bokeh by multiprocessing:
jobs = []
for label in list(peakLabels):
    args = {'data': rt_proj_data[label],
            'label': label,
            'tools': tools,
            'colors': itertools.cycle(palette),
            'files': files,
            'highlight': highlight}
    jobs.append(args)
pool = Pool(processes=cpu_count())
m = Manager()
q = m.Queue()
plots = pool.map_async(plot_peaks_parallel, jobs)
pool.close()
pool.join()
def plot_peaks_parallel(args):
    data = args['data']
    label = args['label']
    colors = args['colors']
    tools = args['tools']
    files = args['files']
    highlight = args['highlight']
    p = figure(title=f'Peak: {label}',
               x_axis_label='Retention Time',
               y_axis_label='Intensity',
               tools=tools)
    ...
    return p
Though I ran into this error:
MaybeEncodingError: Error sending result: '[Figure(id='1078', ...)]'. Reason: 'PicklingError("Can't pickle <function <lambda> at 0x7fc7df0c0ea0>: attribute lookup ColumnDataSource.<lambda> on bokeh.models.sources failed")'
Can I do something to the object p, so that it becomes pickleable?
Individual Bokeh objects are not serializable in isolation, including with pickle. The smallest meaningful unit of serialization in Bokeh is the Document, which is a specific collection of Bokeh objects guaranteed to be complete with respect to following references. However, I would be surprised if pickle works with a Document either (AFAIK you are the first person to ask about it since the project started; it has never been a priority, or even been looked into, as far as I know). Instead, if you want to do something like this, I would suggest using Bokeh's own JSON serialization functions, such as json_item:
# python code
import json
from bokeh.embed import json_item
p_serialized = json.dumps(json_item(p))
This will properly serialize p in the context of the Document it is a part of. Then you can pass this to your page templates to display with the Bokeh JS embed API:
# javascript code
p = JSON.parse(p_serialized);
Bokeh.embed.embed_item(p, "mydiv")

How to modify the avro key/value schema in a RDD map transformation

I'm trying to migrate some Hadoop MapReduce code to Spark, and I'm unsure how to handle map and reduce transformations when the schema of either the key or the value changes from input to output.
I have avro files with Indicator records that I want to process somehow. I already have this code that works:
val myAvroJob = new Job()
myAvroJob.setInputFormatClass(classOf[AvroKeyInputFormat[Indicator]])
myAvroJob.setOutputFormatClass(classOf[AvroKeyOutputFormat[Indicator]])
myAvroJob.setOutputValueClass(classOf[NullWritable])
AvroJob.setInputValueSchema(myAvroJob, Schema.create(Schema.Type.NULL))
AvroJob.setInputKeySchema(myAvroJob, Indicator.SCHEMA$)
AvroJob.setOutputKeySchema(myAvroJob, Indicator.SCHEMA$)
val indicatorsRdd = sc.newAPIHadoopRDD(myAvroJob.getConfiguration,
classOf[AvroKeyInputFormat[Indicator]],
classOf[AvroKey[Indicator]],
classOf[NullWritable])
val myRecordOnlyRdd = indicatorsRdd.map(x => (doSomethingWith(x._1), NullWritable.get))
val indicatorPairRDD = new PairRDDFunctions(myRecordOnlyRdd)
indicatorPairRDD.saveAsNewAPIHadoopDataset(myAvroJob.getConfiguration)
But this code only works because the schema of the input and output keys does not change; it is always Indicator. In Hadoop MapReduce you can define map or reduce functions that change the schema from input to output. In fact, I have map functions that process every Indicator record and generate a new SoporteCartera record. How can I do this in Spark? Is it possible within the same RDD, or do I have to define two different RDDs and somehow pass from one to the other?
Thanks for your help.
To answer my own question: the problem was that you cannot change the type of an existing RDD; a transformation has to produce a different RDD. So I solved it with the code below:
val myAvroJob = new Job()
myAvroJob.setInputFormatClass(classOf[AvroKeyInputFormat[SoporteCartera]])
myAvroJob.setOutputFormatClass(classOf[AvroKeyOutputFormat[Indicator]])
myAvroJob.setOutputValueClass(classOf[NullWritable])
AvroJob.setInputValueSchema(myAvroJob, Schema.create(Schema.Type.NULL))
AvroJob.setInputKeySchema(myAvroJob, SoporteCartera.SCHEMA$)
AvroJob.setOutputKeySchema(myAvroJob, Indicator.SCHEMA$)
val soporteCarteraRdd = sc.newAPIHadoopRDD(myAvroJob.getConfiguration,
classOf[AvroKeyInputFormat[SoporteCartera]],
classOf[AvroKey[SoporteCartera]],
classOf[NullWritable])
val indicatorsRdd = soporteCarteraRdd.map(x => (fromSoporteCarteraToIndicator(x._1), NullWritable.get))
val indicatorPairRDD = new PairRDDFunctions(indicatorsRdd)
indicatorPairRDD.saveAsNewAPIHadoopDataset(myAvroJob.getConfiguration)

Exploding a string ASP

I have the following which is returned from an api call:
<WORST>0</WORST>
<AVERAGE>93</AVERAGE>
<START>1</START>
I need to parse this to just give me the <AVERAGE></AVERAGE> number, 93.
Here's what I'm trying, but I get an error:
res = AjaxGet(url)
myArray = split(res,"AVERAGE>")
myArray2 = split(myArray[1],"</AVERAGE>")
response.write myArray2[0]
I'm brand new to ASP, normally code in PHP
VBScript doesn't recognise square brackets [] for accessing Array elements, so the VBScript engine reports a syntax error. Array elements are accessed with parentheses () instead.
Try making the following changes to the code snippet to fix this problem:
res = AjaxGet(url)
myArray = split(res,"AVERAGE>")
myArray2 = split(myArray(1),"</AVERAGE>")
response.write myArray2(0)
On a side note:
Parsing XML data this way is really inefficient. If the AjaxGet() function returns an XML response, you could use the XML DOM / XPath to locate the node and access its value.
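To illustrate the DOM idea, here is a minimal sketch in Python rather than VBScript (the language and the helper name extract_average are my own choices for illustration, not part of the question). It assumes the response is exactly the three-element fragment shown in the question, which has no single root element, so the sketch wraps it in a dummy root before parsing:

```python
import xml.etree.ElementTree as ET


def extract_average(fragment: str) -> str:
    """Return the text of the <AVERAGE> element in an XML fragment."""
    # The fragment has several top-level elements, so wrap it in a dummy
    # <root> element (hypothetical name) to make it well-formed XML.
    root = ET.fromstring(f"<root>{fragment}</root>")
    node = root.find("AVERAGE")
    if node is None or node.text is None:
        raise ValueError("no AVERAGE value in response")
    return node.text.strip()


# Sample data copied from the question.
response = "<WORST>0</WORST>\n<AVERAGE>93</AVERAGE>\n<START>1</START>"
print(extract_average(response))  # prints 93
```

In classic ASP the same approach would go through an XML DOM object instead of ElementTree; the point is only that the parser finds the node by name, so the code does not depend on how the surrounding text happens to split.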

xpath query not working in BizTalk orchestration

I'm trying to rewrite a BizTalk 2010 application and do away with an external assembly, but I seem to be running into xpath problems.
We have a process that stores a healthcare claim (837P) as xml in the database, and we need to extract it later. I have a WCF port calling a stored procedure that returns an xml message that looks something like this:
<ClaimXml_SEL_GetClaimXmlResponse xmlns="http://schemas.microsoft.com/Sql/2008/05/TypedProcedures/dbo">
<StoredProcedureResultSet0>
<StoredProcedureResultSet0 xmlns="http://schemas.microsoft.com/Sql/2008/05/ProceduresResultSets/dbo/ClaimXml_SEL_GetClaimXml">
<Claim><![CDATA[<ns0:X12_00401_837_P (etc.)
So what I need to do is extract the actual 837P message - the part that starts with ns0:X12_00401_837_P.
The helper class is very simple, just has a method like this:
public XmlDocument ExtractClaimXml(XmlDocument xDoc)
{
XmlDocument xReturn = new XmlDocument();
XmlNode node = xDoc.SelectSingleNode("/*[local-name()='ClaimXml_SEL_GetClaimXmlResponse' and namespace-uri()='http://schemas.microsoft.com/Sql/2008/05/TypedProcedures/dbo']/*[local-name()='StoredProcedureResultSet0' and namespace-uri()='http://schemas.microsoft.com/Sql/2008/05/TypedProcedures/dbo']/*[local-name()='StoredProcedureResultSet0' and namespace-uri()='http://schemas.microsoft.com/Sql/2008/05/ProceduresResultSets/dbo/ClaimXml_SEL_GetClaimXml']/*[local-name()='Claim' and namespace-uri()='http://schemas.microsoft.com/Sql/2008/05/ProceduresResultSets/dbo/ClaimXml_SEL_GetClaimXml']");
xReturn.LoadXml(node.InnerText);
return xReturn;
}
and then the Message Assignment shape has this code:
rawClaimXml = ClaimXmlResponse;
strippedClaim = XmlHelperClass.ExtractClaimXml(rawClaimXml);
Claim837P = strippedClaim;
...where ClaimXmlResponse is the message shown above, Claim837P is an 837P message, and rawClaimXml & strippedClaim are xml variables. This works just fine, but it seems excessive to call an external assembly.
I tried this in the assignment shape:
rawClaimXml = xpath(ClaimXmlResponse, "same xpath as above");
strippedClaim.LoadXml(rawClaimXml.InnerText);
Claim837P = strippedClaim;
...but get the error "'UnderlyingXmlDocument.InnerText': .NET property is write-only because it does not have a get accessor".
So then I tried just getting a string from the xpath query:
rawClaimString = xpath(ClaimXmlResponse, "string(same xpath as above)");
rawClaimString = rawClaimString.Replace("<![CDATA[", "");
rawClaimString = rawClaimString.Replace(">]]>",">");
strippedClaim.LoadXml(rawClaimString);
Claim837P = strippedClaim;
...but that's no good. Also tried a variant:
rawClaimXml = xpath(ClaimXmlResponse, "same xpath as above");
rawClaimString = rawClaimXml.InnerXml.ToString();
rawClaimString = rawClaimString.Replace("<![CDATA[", "");
rawClaimString = rawClaimString.Replace(">]]>",">");
strippedClaim.LoadXml(rawClaimString);
Claim837P = strippedClaim;
...but still no good. Any suggestions?
Thanks!
1-
Here's a couple of things you can try:
Wrap the xpath in the string() function: xpath(ClaimXmlResponse, "string(same xpath as above)");
Append the /text() node to the xpath: xpath(ClaimXmlResponse, "same xpath as above/text()");
A combination of the two.
Can you elaborate on the goal here? There's nothing wrong with using the helper class. If it's the extra Assembly that's bothering you, you can always add the .cs to the BizTalk Project.
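As a rough illustration of why the string value should be enough, here is a minimal sketch in Python (not BizTalk expression code). The payload is made up: a tiny ns0:Demo element with the hypothetical namespace urn:example:claim stands in for the real 837P, and the names NS_PROC, NS_RS and extract_claim are my own. It shows that the text of the Claim node already excludes the CDATA markers, so it can be loaded as XML without any Replace calls:

```python
import xml.etree.ElementTree as ET

NS_PROC = "http://schemas.microsoft.com/Sql/2008/05/TypedProcedures/dbo"
NS_RS = ("http://schemas.microsoft.com/Sql/2008/05/ProceduresResultSets"
         "/dbo/ClaimXml_SEL_GetClaimXml")

# Made-up response: the CDATA content is a stand-in for the 837P document.
response = f"""<ClaimXml_SEL_GetClaimXmlResponse xmlns="{NS_PROC}">
  <StoredProcedureResultSet0>
    <StoredProcedureResultSet0 xmlns="{NS_RS}">
      <Claim><![CDATA[<ns0:Demo xmlns:ns0="urn:example:claim"><Id>42</Id></ns0:Demo>]]></Claim>
    </StoredProcedureResultSet0>
  </StoredProcedureResultSet0>
</ClaimXml_SEL_GetClaimXmlResponse>"""


def extract_claim(xml_text: str) -> ET.Element:
    """Locate the Claim node and parse its text content as a new document."""
    outer = ET.fromstring(xml_text)
    claim = outer.find(
        "p:StoredProcedureResultSet0/r:StoredProcedureResultSet0/r:Claim",
        {"p": NS_PROC, "r": NS_RS},
    )
    if claim is None or claim.text is None:
        raise ValueError("Claim element not found")
    # The parser has already unwrapped the CDATA section, so the text is
    # the embedded XML itself, with no CDATA markers left to strip.
    return ET.fromstring(claim.text)


inner = extract_claim(response)
print(inner.tag)              # {urn:example:claim}Demo
print(inner.find("Id").text)  # 42
```

If the same holds in the orchestration, the string returned by the string() form of the xpath could be passed to LoadXml directly.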
2-
Coming from a different direction, you can use the Path option for the inbound BizTalk message body on the Messages tab of the WCF-Custom adapter configuration.
I was facing a similar issue, and going through the suggestions above led me to a solution.
What worked for me was:
rawClaimString = xpath(ClaimXmlResponse, "string(same xpath as above)");
Thanks for that, phew ;)
As for your problem, you can also promote the node holding your response as a distinguished field, access it using dot notation, and assign it to a string; this should return the expected output :)

storing data in a xml format file

I'm trying to build a function that will retrieve some SQL data from multiple tables and store it in a file in XML format.
If I do it in C#, is it as simple as:
SQL statement,
retrieve data,
store data in a string list,
and then WriteXml(xmlFile, variable where data is stored)?
Can anyone show me an example?
I was looking at the WriteXml() and WriteXmlSchema() functions in C#:
string xmlFile = Server.MapPath("Employees.xml");
ds.WriteXml(xmlFile, XmlWriteMode.WriteSchema);
Also, will xmlSerialization be something I need to take a look at?
Sample SQL query.
SqlConnection Connection1 = new SqlConnection(DBConnect.SqlServerConnection);
String strSQL1 = "SELECT xxx.MEMBERKEY, xxx.MEMBID, xyz.HPCODE, convert(varchar, OPFROMDT, 101) as OPFROMDT"
+ ", convert(varchar, OPTHRUDT, 101) as OPTHRUDT FROM [main].[dbo].[yyy] INNER JOIN [main].[dbo].[xxx] ON xxx.MEMBERKEY = yyy.MEMBERKEY "
+ "and opthrudt >= opfromdt INNER JOIN [main].[dbo].[xyz] ON yyy.HPCODEKEY = xyz.HPCODEKEY where MembID = @memID";
SqlCommand command1 = new SqlCommand(strSQL1, Connection1);
command1.Parameters.AddWithValue("@memID", memID);
SqlDataReader Dr1;
Connection1.Open();
Dr1 = command1.ExecuteReader();
while (Dr1.Read())
{
HPCODEs.Add((Dr1["HPCODE"].ToString()).TrimEnd());
OPFROMDTs.Add((Dr1["OPFROMDT"].ToString()).TrimEnd());
OPTHRUDTs.Add((Dr1["OPTHRUDT"].ToString()).TrimEnd());
}
Dr1.Close();
There are so many approaches to this.
For one, you can invest some time and use SQL Server's built-in XML capabilities (such as FOR XML) to have the query return an XML document directly, which you can then write straight to a file. A very basic example could be found here. Then you would use the DataReader's GetSqlXml method, something like this:
SqlDataReader r = cmd.ExecuteReader();
SqlXml data = r.GetSqlXml(0);
XmlReader xr = data.CreateReader();
Another option is to read from the SQL data reader into DTOs (data transfer objects). This is probably the tried and true method. Once you have a list of DTOs, you can use .NET's serialization (DataContractSerializer) to serialize them to XML.
I would highly recommend looking at a tool like this:
http://msdn.microsoft.com/en-us/library/x6c1kb0s(v=vs.80).aspx
This will generate .net classes for you if you have an xsd of the xml output you are trying to create.
Either way, dropping your data into POCO style classes and serializing to XML is a lot better than trying to use the XmlWriter directly.
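The shape of the DTO approach can be sketched as follows. This is a Python illustration rather than C#, and everything in it is assumed for the example: the Coverage class, the to_xml helper, the element names and the sample row are all made up, with field names loosely based on the columns in the question. The idea is the same as with a .NET serializer: fill plain objects from the reader, then let one routine turn the whole list into XML.

```python
from dataclasses import dataclass, fields
import xml.etree.ElementTree as ET


@dataclass
class Coverage:
    """Plain data object for one result row (hypothetical field names)."""
    hpcode: str
    opfromdt: str
    opthrudt: str


def to_xml(rows: list) -> str:
    """Serialize a list of Coverage objects to an XML string."""
    root = ET.Element("Coverages")
    for row in rows:
        item = ET.SubElement(root, "Coverage")
        # One child element per dataclass field, in declaration order.
        for f in fields(row):
            ET.SubElement(item, f.name).text = getattr(row, f.name)
    return ET.tostring(root, encoding="unicode")


# Made-up sample row standing in for data read from the database.
rows = [Coverage("HP01", "01/01/2012", "12/31/2012")]
xml_text = to_xml(rows)
print(xml_text)
```

Keeping the row type separate from the serialization step means the query code and the output format can change independently.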
