I have an incoming schema that looks like this:
<Root>
  <ClaimDates005H>
    <Begin>20120301</Begin>
    <End>20120302</End>
  </ClaimDates005H>
</Root>
(there's more to it, this is just the area I'm concerned with)
I want to map it to a schema with a repeating section, so it winds up like this:
<Root>
  <DTM_StatementFromorToDate>
    <DTM01_DateTimeQualifier>Begin</DTM01_DateTimeQualifier>
    <DTM02_ClaimDate>20120301</DTM02_ClaimDate>
  </DTM_StatementFromorToDate>
  <DTM_StatementFromorToDate>
    <DTM01_DateTimeQualifier>End</DTM01_DateTimeQualifier>
    <DTM02_ClaimDate>20120302</DTM02_ClaimDate>
  </DTM_StatementFromorToDate>
</Root>
(That's part of an X12 835, BTW...)
Of course, in the destination schema there's only a single occurrence of DTM_StatementFromorToDate, which can repeat... I get that I can run both Begin and End into a Looping functoid to create two instances of DTM_StatementFromorToDate, one with Begin and one with End, but then how do I correctly populate DTM01_DateTimeQualifier?
Figured it out: the Table Looping functoid took care of it.
After parsing the JSON data in a column within my Kusto Cluster using parse_json, I'm noticing there is still more data in JSON format nested within the resulting projected value. I need to access that information and make every piece of the JSON data its own column.
I've attempted to follow the answer from this SO post (Parsing json in kusto query) but haven't been successful in getting the syntax correct.
myTable
| project
Time,
myColumnParsedJSON = parse_json(column)
| project myColumnParsedNestedJSON = parse_json(myColumnParsedJSON.nestedJSONDataKey)
I expect the results to be projected columns, each named as each of the keys, with their respective values displayed in one row record.
Please see the note at the bottom of this doc:
It is somewhat common to have a JSON string describing a property bag in which one of the "slots" is another JSON string. In such cases, it is not only necessary to invoke parse_json twice, but also to make sure that in the second call, tostring will be used. Otherwise, the second call to parse_json will simply pass on the input to the output as-is, because its declared type is dynamic.
Once you're able to get parse_json to parse your payload properly, you can use the bag_unpack plugin (doc) to achieve the requirement you mentioned:
I expect the results to be projected columns, each named as each of the keys, with their respective values displayed in one row record.
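Putting the two together, a sketch of what that could look like (this reuses myTable, column, and nestedJSONDataKey from your question, so adjust the names to your real schema):
// Sketch only: names come from the question above.
myTable
| project Time, myColumnParsedJSON = parse_json(column)
| extend nested = parse_json(tostring(myColumnParsedJSON.nestedJSONDataKey))
| project Time, nested
| evaluate bag_unpack(nested)
The tostring call is what forces the second parse_json to actually parse the nested string, and bag_unpack then turns each key of the resulting property bag into its own column.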
Let's say I have ~50 million records in a collection like this:
<record>
  <some_data>
    <some_data_id>112423425345235</some_data_id>
  </some_data>
</record>
So I have maybe a million records (bad data) that look like this:
<record>
  <some_data>
  </some_data>
</record>
With the some_data element being empty.
So if I have an element-range-index setup on some_data_id, what's an efficient XQuery query that will give me all the empty ones to delete?
I think what I'm looking for is a query that is not a FLWOR expression that checks for the existence of child elements on each record, as I think that is inefficient (i.e. pulling the data back and then filtering).
Whereas if I did it with a cts:search query it would be more efficient, as in filtering the data before pulling it back?
Can someone suggest a query that does this efficiently, and confirm whether or not my assumptions about FLWOR expressions are correct?
I don't think you need a range index to do this efficiently. Using the "universal" element indexes via cts:query constructors should be fine:
cts:element-query(xs:QName('record'),
  cts:element-query(xs:QName('some_data'),
    cts:not-query(cts:element-query(xs:QName('some_data_id'), cts:and-query(())))
  )
)
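If it helps, here is a hedged sketch of how that query could drive the deletes (not part of the original answer, and untested): run it through cts:search and delete each matching document. With roughly a million matches you would want to batch this rather than delete everything in a single transaction.
(: Sketch only: deletes every document matching the query above. :)
for $doc in cts:search(
  fn:doc(),
  cts:element-query(xs:QName('record'),
    cts:element-query(xs:QName('some_data'),
      cts:not-query(cts:element-query(xs:QName('some_data_id'), cts:and-query(())))
    )
  )
)
return xdmp:node-delete($doc)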
I am a total noob with XQuery, but before I start digging deep into it, I'd like to ask some expert advice about whether I am looking in the correct direction.
I have XML in a table that looks something like this:
'<JOURNALEXT>
  <JOURNAL journalno="1" journalpos="1" ledgercode="TD1">
  </JOURNAL>
  <JOURNAL journalno="1" journalpos="1" ledgercode="TD2">
  </JOURNAL>
  <JOURNAL journalno="1" journalpos="1" ledgercode="TD3">
  </JOURNAL>
  -----almost 50 such nodes
</JOURNALEXT>'
Now, the ledgercode attribute's values are stored in a table. I have to filter the nodes whose ledgercode value is not among the values in that table.
For example, my ledger_code table has two entries, TD1 and TD2,
so I should get the resultant XML as
<JOURNALEXT>
  <JOURNAL journalno="1" journalpos="1" ledgercode="TD3">
  </JOURNAL>
  -----almost 50 such nodes
</JOURNALEXT>
I can delete nodes based on one attribute value by using:
declare @var_1 varchar(max) = 'TD1'
BEGIN TRANSACTION
update [staging_data_load].[TBL_STG_RAWXML_STORE] WITH (rowlock)
set XMLDATA.modify('delete /JOURNALEXT/JOURNAL[@ledgercode != sql:variable("@var_1")]')
where job_id = @job_Id
but my case is more complex: I need to get multiple ledgercodes from the table and make sure that only the nodes whose ledgercode is in the table remain; everything else gets deleted.
I am using MS SQL Server 2012 as the database and trying to write an XQuery for this.
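One possible approach (a sketch only, since I don't know your exact schema: the ledger_code table and its code column are assumed names) is to flatten the valid codes into a delimited string and test membership inside the XQuery predicate with contains(). As written it keeps the nodes whose ledgercode is in the table and deletes the rest; drop the not() if you instead want to keep the codes that are not in the table, as in your sample output.
declare @codes varchar(max);

-- Build a delimited list of the valid codes, e.g. ',TD1,TD2,'
-- (ledger_code and code are assumed names)
select @codes = ',' + (select code + ',' from dbo.ledger_code for xml path(''));

update [staging_data_load].[TBL_STG_RAWXML_STORE] WITH (rowlock)
set XMLDATA.modify('
    delete /JOURNALEXT/JOURNAL[
        not(contains(sql:variable("@codes"),
                     concat(",", string(@ledgercode), ",")))]')
where job_id = @job_Id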
Here's the workflow I'm using.
<atomic-commit>
  <dataset name="foo"/>
</atomic-commit>

<dataset-iterator dataset="foo">
  <create-row dataset="hist-foo"/>
  <mark-row-created dataset="hist-foo"/>
</dataset-iterator>
So basically, after dataset foo is updated, I want to record the remaining foo entries in another history table. But when I delete rows from the foo table, the rows still remain in the dataset and therefore get added to hist-foo.
I've tried to add a post-workflow to the foo databroker's delete action like this:
<workflow>
  <delete-row dataset="{$context.commit-dataset-name}"/>
</workflow>
However I get an error when the delete action is called.
Also, after the first atomic commit, the foo dataset doesn't keep deleted-row actions, so I can't identify which rows were deleted from the dataset.
The simplest solution for this situation would be to sift the marked-deleted rows into a separate dataset. Unfortunately this is a little long when using only built-in commands.
<dataset name="deleted-foo" databroker="..."/>
<dataset-iterator dataset="foo">
<if test="row-marked-deleted" value1="foo">
<then>
<create-row dataset="deleted-foo"/>
<copy-row from-dataset="foo" to-dataset="deleted-foo"/>
<mark-row-deleted dataset="deleted-foo"/>
</then>
</if>
</dataset-iterator>
<!-- Keeping in mind that you can't delete rows from a dataset
which is being iterated over. -->
<dataset-iterator dataset="deleted-foo">
<dataset-reset dataset="foo" no-current-row="y"/>
<!-- Assuming rows have a field 'id' which uniquely IDs them -->
<set-current-row-by-field dataset="foo" field="id" value="{$deleted-foo.id}"/>
<if test="dataset-has-current-row" value1="foo">
<then>
<delete-row dataset="foo"/>
</then>
</if>
</dataset-iterator>
<atomic-commit>
<dataset name="deleted-foo"/>
<dataset name="foo"/>
</atomic-commit>
<dataset-iterator dataset="foo">
<create-row dataset="hist-foo"/>
<mark-row-created dataset="hist-foo"/>
</dataset-iterator>
An alternate solution would be to do the history recording at the same time as the inserts/updates are run, for example by running multiple statements within the operations, or by setting up insert/update triggers if those are available.
I think that in Tristan's answer you don't necessarily need to commit the "deleted-foo" dataset, as you don't mark its rows with any commit flag.
Going a bit further, I would personally move those operations into the pre- and post-commit workflows of the databroker. You'd capture all rows marked as deleted in the pre-commit workflow, then delete the rows from the foo dataset and populate the history dataset in the post-commit workflow.
I developed an automation application for a car service. I have just started the accessories module, but I can't figure out how I should build the data model schema.
I've got the accessories data in a text file, line by line (not a CSV or similar, which is why I split the fields by substring). Every month, the factory sends the data file to the service. It includes the prices, the names, the codes, and so on, and every month the prices are updated. I thought a bulk insert was a good choice to get the data into SQL (and I did that), but it doesn't solve my problem: I don't want duplicate data just to capture the new prices. I thought about inserting only the prices into another table and building a relation between Accessories and AccessoriesPrices, but sometimes new accessories are added to the list, so I'd have to check every line against the Accessories table. On the other side, I also have to keep the quantity of the accessories, the invoices, etc.
By the way, they send 70,000 lines every month. So, can anyone help me? :)
Thanks.
70,000 lines is not a large file. You'll have to parse this file yourself and issue ordinary insert and update statements based on the data it contains. There's no need to use bulk operations for data of this size.
The most common approach to something like this would be to write a simple SQL statement that accepts all of the parameters, then does something like this:
if (exists(select * from YourTable where <exists condition>))
    update YourTable set <new values> where <exists condition>
else
    insert into YourTable (<columns>) values (<values>)
(Alternatively, you could try rewriting this statement to use the merge T-SQL statement; a rough sketch is included below.)
Where...
<exists condition> represents whatever you would need to check to see if the item already exists
<new values> is the set of Column = value statements for the columns you want to update
<columns> is the set of columns to insert data into for new items
<values> is the set of values that corresponds to the previous list of columns
You would then loop over each line in your file, parsing the data into parameter values, then running the above SQL statement using those parameters.
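If you go the merge route mentioned above, a hedged sketch might look like this (the Accessories table and its Code, Name, and Price columns are made-up names for illustration; substitute your real schema):
-- Sketch only: Accessories and its Code/Name/Price columns are assumed names.
merge Accessories as target
using (select @Code as Code, @Name as Name, @Price as Price) as source
    on target.Code = source.Code
when matched then
    update set Name = source.Name, Price = source.Price
when not matched then
    insert (Code, Name, Price)
    values (source.Code, source.Name, source.Price);
As with the if/exists version, you'd run this once per parsed line, passing the parsed values in as parameters.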