Using Marklogic Xquery data population - xquery

I have the data as below manner.
<Status>Active Leave Terminated</Status>
<date>05/06/2014 09/10/2014 01/10/2015</date>
I want to get the data as in the below manner.
<status>Active</Status>
<date>05/06/2014</date>
<status>Leave</Status>
<date>09/10/2014</date>
<status>Terminated</Status>
<date>01/10/2015</date>
please help me on the query, to retrieve the data as specified above.

Well, you have a string and want to split it at the whitestapces. That's what tokenize() is for and \s is a whitespace. To get the corresponding date you can get the current position in the for loop using at. Together it looks something like this (note that I assume that the input data is the current context item):
let $dates := tokenize(date, "\s+")
for $status at $pos in tokenize(Status, "\s+")
return (
<status>{$status}</status>,
<date>{$dates[$pos]}</date>
)

You did not indicate whether your data is on the file system or already loaded into MarkLogic. It's also not clear if this is something you need to do once on a small set of data or on an on-going basis with a lot of data.
If it's on the file system, you can transform it as it is being loaded. For instance, MarkLogic Content Pump can apply a transformation during load.
If you have already loaded the content and you want to transform it in place, you can use Corb2.
If you have a small amount of data, then you can just loop across it using Query Console.
Regardless of how you apply the transformation code, dirkk's answer shows how you need to change it. If you are updating content already in your database, you'll xdmp:node-delete() the original Status and date elements and xdmp:node-insert-child() the new ones.

Related

How to replace comma with a dot in GTM for JSON structured data?

I am noob with structured data implementation and don't have any code knowledge.
I have been looking for a week how to solve a warning with price in Google structured data testing tool.
My prices are with a comma which is not accepted by Google.
By checking the http://schema.org/price it tells me that "Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator."
I have a CSS variable element #PdtPrixRef named in a variable "Product-price" with a comma "12.5" but I can't find how to replace it in my structured data with the value "12.5"... Someone to help me?
Hereafter my actual script :
My actual GTM script
Should I add something to my script or making an VARIABLE (Custom Js)?
I think it's something like
value.replace(",", ".")
But I do't know how to write the full proper function from beginning to end...
Yes you can just create a Custom JavaScript Variable
Here is the code
function(){
var price = {{Product-price}};
return price.replace("," , ".");
}
Then using this variable to your JSON-LD script.

Is there a way to parse out information from a xcom_pull in Airflow?

So what I'm working with is I have a DAG that has specific information that is being passed through tasks, everything is working as it should. The file needs to be stored into a reports/ folder for the following tasks to work correctly. I'm calling the actual name of the report through a xcom_pull but I also want to parse out information from this xcom_pull in order to capture the unique filename itself to use later on in other tasks. I have a task later on that inserts this filename into the csv file, but I need it to match the filename itself so its a 1:1 match.
I want to parse out information of a xcom_pull option and I'm having issues doing so. The example I have is below:
report_filename = "reports/{}_{}".format('report_example', str(uuid.uuid1()))
get_report = GoogleCampaignManagerDownloadReportOperator(
task_id="get_report",
profile_id=1234,
api_version=1234,
bucket_name=test_bucket,
report_name=report_filename,
report_id=report_id,
file_id=file_id,
)
report_filename_test = xcom_pull(get_report, 'report_name')
sanitize_report = SanitizeReportOperator(
task_id='sanitize_report',
dest_bucket=test_bucket,
dest_object=report_filename_test,
shared_object=str(report_filename_test).replace('reports/', ''),
append_timestamp=True,
append_filename=True
)
As of right now the xcom_pull pulls down the following:
reports/report_example_b3413b62-cc8a-11ec-bded-52e9ae62e477.csv.gz
However, I want to have another xcom_pull that will only pull the following:
report_example_b3413b62-cc8a-11ec-bded-52e9ae62e477.csv.gz
I have tried converting report_filename_test to a string and using the replace function, so for example:
new_test = str(report_filename_test).replace('reports/', '')
But when attempting this, it makes the new_test converting into a NULL format or ignores it completely and saves the file later on into a reports/ folder.
I have also tried passing the report_filename into a list and grabbing the first iteration and grabbing the first iteration, but with how Airflow works from task to task, it creates a new filename with a different uuid each time, which is not what I'm aiming to have done. I have also tried doing a PythonOperator option to create a function specifically to name the file and be called later on throughout the DAG but have not had any luck with this either.
Is there a way to do this where you can parse out the information from a xcom_pull or another way to make this work? The end goal is to essentially have a file name with a specific uuid that I can pass through into the csv file and rename the file to the same specific uuid that is being built without the folder name in front.
I'm just looking to have a unique filename be passed through multiple tasks that is the exact same each time with a uuid format. I'm running out of ideas of how to make this work and have been stuck on this for almost two weeks now.
Any help with this would be greatly appreciated!

Why fn:substring-after Xquery function could not be used inside ML TDE

In my ML db, we have documents with distributor code like 'DIST:5012' (DIST:XXXX) XXXX is a four-digit number.
currently, in my TDE, the below code works well.
However instead of concat all the raw distributor codes, I want to simply concat the number part only. I used the fn:substring-after XQuery function. However, it won't work. It won't show that distributorCode column in the SQL View anymore. (Below code does not work.)
What is wrong? How to fix that?
Both fn:substring-after and fn:string-join is in TDE Dialect page.
https://docs.marklogic.com/9.0/guide/app-dev/TDE#id_99178
substring-after() expects a single string as input, not a sequence of strings.
To demonstrate, this will not work:
let $dist := ("DIST:5012", "DIST:5013")
return substring-after($dist, "DIST:")
This will:
for $dist in ("DIST:5012", "DIST:5013")
return substring-after($dist, "DIST:")
I need to double check what XPath expressions will work in a DTE, you might be able to change it to apply the substring-after() function in the last step:
fn:string-join( distributors/distributor/urn/substring-after(., 'DIST:'), ';')

Read embedded data that starts with numbers?

I have embedded data that I have imported into Qualtrics use a web service block. The data comes from a .json file and reads something like 0.male, 1.male, 2.male, etc.
I have been trying to read this into my survey using the Qualtrics.SurveyEngine.getEmbeddedData method but without luck.
I'm trying to do something that takes the form.
let n = 2
Qualtrics.SurveyEngine.getEmbeddedData(n + ".male")
but this has been returning a NULL result. Is it possible to read embedded data that starts with a number?
Also see:
https://community.qualtrics.com/XMcommunity/discussion/15991/read-in-embedded-variables-using-a-loop#latest
The issue isn't the number, it is the dot. getEmbeddedData() doesn't work when the name contains a dot. See https://stackoverflow.com/a/51802695/4434072 for possible alternatives.

SQLite X'...' notation with column data

I am trying to write a custom report in Spiceworks, which uses SQLite queries. This report will fetch me hard drive serial numbers that are unfortunately stored in a few different ways depending on what version of Windows and WMI were on the machine.
Three common examples (which are enough to get to the actual question) are as follows:
Actual serial number: 5VG95AZF
Hexadecimal string with leading spaces: 2020202057202d44585730354341543934383433
Hexadecimal string with leading zeroes: 3030303030303030313131343330423137454342
The two hex strings are further complicated in that even after they are converted to ASCII representation, each pair of numbers are actually backwards. Here is an example:
3030303030303030313131343330423137454342 evaluates to 00000000111430B17ECB
However, the actual serial number on that hard drive is 1141031BE7BC, without leading zeroes and with the bytes swapped around. According to other questions and answers I have read on this site, this has to do with the "endianness" of the data.
My temporary query so far looks something like this (shortened to only the pertinent section):
SELECT pd.model as HDModel,
CASE
WHEN pd.serial like "30303030%" THEN
cast(('X''' || pd.serial || '''') as TEXT)
WHEN pd.serial like "202020%" THEN
LTRIM(X'2020202057202d44585730354341543934383433')
ELSE
pd.serial
END as HDSerial
The result of that query is something like this:
HDModel HDSerial
----------------- -------------------------------------------
Normal Serial 5VG95AZF
202020% test case W -DXW05CAT94843
303030% test case X'3030303030303030313131343330423137454342'
This shows that the X'....' notation style does convert into the correct (but backwards) result of W -DXW05CAT94843 when given a fully literal number (the 202020% line). However, I need to find a way to do the same thing to the actual data in the column, pd.serial, and I can't find a way.
My initial thought was that if I could build a string representation of the X'...' notation, then perhaps cast() would evaluate it. But as you can see, that just ends up spitting out X'3030303030303030313131343330423137454342' instead of the expected 00000000111430B17ECB. This means the concatenation is working correctly, but I can't find a way to evaluate it as hex the same was as in the manual test case.
I have been googling all morning to see if there is just some syntax I am missing, but the closest I have come is this concatenation using the || operator.
EDIT: Ultimately I just want to be able to have a simple case statement in my query like this:
SELECT pd.model as HDModel,
CASE
WHEN pd.serial like "30303030%" THEN
LTRIM(X'pd.serial')
WHEN pd.serial like "202020%" THEN
LTRIM(X'pd.serial')
ELSE
pd.serial
END as HDSerial
But because pd.serial gets wrapped in single quotes, it is taken as a literal string instead of taken as the data contained in that column. My hope was/is that there is just a character or operator I need to specify, like X'$pd.serial' or something.
END EDIT
If I can get past this first hurdle, my next task will be to try and remove the leading zeroes (the way LTRIM eats the leading spaces) and reverse the bytes, but to be honest, I would be content even if that part isn't possible because it wouldn't be hard to post-process this report in Excel to do that.
If anyone can point me in the right direction I would greatly appreciate it! It would obviously be much easier if I was using PHP or something else to do this processing, but because I am trying to have it be an embedded report in Spiceworks, I have to do this all in a single SQLite query.
X'...' is the binary representation in sqlite. If the values are string, you can just use them as such.
This should be a start:
sqlite> select X'3030303030303030313131343330423137454342';
00000000111430B17ECB
sqlite> select ltrim(X'3030303030303030313131343330423137454342','0');
111430B17ECB
I hope this puts you on the right path.

Resources