XSL, comparing dates to exclude any past events - datetime

I have an RSS feed of events, and I would like to hide past events.
Assuming an XML data subset of
<Navigation Name="ItemList" Type="Children">
<Page ID="x32444" URL="..." Title="Class..."
EventStartDate="20090831T23:00:00" EventEndDate="20090904T23:00:00"
EventStartTime="20090830T15:30:00" EventEndTime="20090830T18:30:00" Changed="20090830T20:28:31" CategoryIds="" Schema="Event"
Name="Class of 2010 BAKE SALE"/>
<Page ID="x32443" URL="x32443.xml?Preview=true&Site=&UserAgent=&IncludeAllPages=true&tfrm=4" Title="Class of 2010 BAKE SALE"
Abstract="Treat yourself with our famous 10-star FRIED ICE CREAM!" EventStartDate="20090831T23:00:00" EventEndDate="20090904T23:00:00"
EventStartTime="20090830T15:30:00" EventEndTime="20090830T18:30:00" Changed="20090830T20:25:35" CategoryIds="" Schema="Event"
Name="Class of 2010 BAKE SALE"/>
<Page ID="x32426" URL="x32426.xml?Preview=true&Site=&UserAgent=&IncludeAllPages=true&tfrm=4" Title="Tribute to ..."
Abstract="Event to recognize and celebrate the lifetime of leadership and service ..."
EventStartDate="20091206T00:00:00" EventEndDate="20091206T00:00:00" EventStartTime="20090828T23:00:00" EventEndTime="20090828T04:00:00"
Changed="20090828T22:09:54" CategoryIds="" Schema="Event" Name="Tribute to ...."/>
</Navigation>
How would I exclude anything before today's date? Something like:
<xsl:apply-templates select="Page[@EventStartDate=notBeforeToday()]"/>

Easiest with XSL parameters that you set from outside.
<xsl:param name="today" select="'undefined'" />
<!-- time passes... -->
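<xsl:apply-templates select="Page[@EventStartDate &gt;= $today]"/>
Your date format sorts chronologically as plain text, so you can compare it using string comparison, unless different time zones are involved. You would simply set
20091001T00:00:00
as the param value for $today. Have a look at your XSLT processor's documentation to see how to pass a parameter. Note that only XSLT 2.0 compares strings with these operators; XPath 1.0 converts both operands to numbers, so in XSLT 1.0 strip the non-digits first, for example by comparing translate(@EventStartDate, 'T:', '') against a $today of 20091001000000.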
The alternative would be to use an extension function. Here it depends on which extension functions your XSLT processor supports, so this approach won't be portable.
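The string-comparison idea can be sketched outside XSLT as well. The following is a minimal Python illustration, not the answer's own method: FEED is a hypothetical, trimmed-down copy of the feed from the question, and upcoming_pages is an illustrative helper name. It assumes every EventStartDate uses the same fixed-width layout and time zone.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down copy of the feed from the question.
FEED = """
<Navigation Name="ItemList" Type="Children">
  <Page ID="x32443" Title="Class of 2010 BAKE SALE" EventStartDate="20090831T23:00:00"/>
  <Page ID="x32426" Title="Tribute to ..." EventStartDate="20091206T00:00:00"/>
</Navigation>
"""


def upcoming_pages(xml_text, today):
    """Return the Page elements whose EventStartDate is not before `today`.

    Both values use the fixed-width YYYYMMDDTHH:MM:SS layout, so plain
    string comparison orders them chronologically (same time zone assumed).
    """
    root = ET.fromstring(xml_text)
    return [p for p in root.findall("Page") if p.get("EventStartDate", "") >= today]


# `today` plays the role of the $today stylesheet parameter.
titles = [p.get("Title") for p in upcoming_pages(FEED, "20091001T00:00:00")]
print(titles)
```

With the parameter set to 20091001T00:00:00, only the December event is kept.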

For this purpose, I usually add an extra date attribute to the XML that contains the day number since the year 1900,
for example @dateid='9876543' or @seconds="9876675446545".
Then I can easily compare it with today or with another variable in the XSL.
You can also use this technique to compare times, using "Unix time" for example.

Accessing XML data in R that is several layers embedded

I'm working with a well-structured XML file. So far, I have successfully accessed elements of this dataset that are only one layer/subfield deep. However, now I need to access one type of data that is more deeply embedded within this data structure, and the expected method is not working...
Excerpt from the XML data; this is the "target" field that I need to access, where each node (i.e. drug) can have between 0 and N targets (I am arbitrarily setting N to 20 for now, since I'm not sure what this value is for the entire dataset):
<targets> --> 51st field in each node
<target> --> there are a variable number of targets per drug
<id>BE0000048</id> --> this is the value I want for each Target
<name>Prothrombin</name>
<organism>Human</organism>
<actions>
<action>inhibitor</action>
</actions>
<references>
<articles>
<article>
<pubmed-id>10505536</pubmed-id>
<citation>Turpie AG: Anticoagulants in acute coronary syndromes.
...
I have determined that the main Target field that I need is Field 51 within each node's structure, thus the hardcoded value below. I would think that accessing the i'th node's id value within the j'th target within the node's Target field should have an index of [[i]][[51]][[j]][[1]] or [[i]][[51]][[j]][['id']]:
This is my code that isn't working as expected:
Target <- array(1:NumNodes, dim=c(1,NumNodes,MaxTargets))
for (i in 1:NumNodes){
for (j in 1:MaxTargets){
Target[i][j] <- Data[[i]][[51]][[j]][[1]]
}
}
The behavior I'm seeing is that I can extend the subscripts out numerous levels on the command line, and never narrow the result any more than the following:
> Data[[1]][[51]][[1]][[1]][[1]][[1]][[1]][[1]][[1]][[1]]
[1] "BE0000048ProthrombinHumaninhibitor10505536Turpie AG: Anticoagulants...
It doesn't seem to matter how many subscripts I add; all of the fields in the Target subfield are always conjoined and don't seem to be able to be separated...
Confusingly, when I run my code, I get the following error message:
Error in Data[[i]][[51]][[1]] : subscript out of bounds
... which doesn't seem to make sense, given that I am limiting i to the number of nodes, and that no error is thrown even for the ridiculously long list of subscripts shown above when I run that expression on the command line...
Thanks in advance for any insights you can provide.
Thanks for your suggestion, cderv; I will plan to check out the xml2 package and XPATH. I really appreciate your willingness to provide an example.
I am pasting what should be a functional subset of my XML file; however, now instead of the "targets" field being the 51st field, it is the sixth. Again, it is the targets --> target --> id value that I want to report for each target, with each node having a variable number of target values. My code follows the XML content.
<?xml version="1.0" encoding="UTF-8"?>
<drugbank xmlns="http://www.drugbank.ca" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.drugbank.ca http://www.drugbank.ca/docs/drugbank.xsd" version="5.0" exported-on="2017-07-06">
<drug type="biotech" created="2005-06-13" updated="2016-08-17">
<drugbank-id primary="true">DB00001</drugbank-id>
<drugbank-id>BTD00024</drugbank-id>
<drugbank-id>BIOD00024</drugbank-id>
<name>Lepirudin</name>
<description>Lepirudin is identical to natural hirudin except for substitution of leucine for isoleucine at the N-terminal end of the molecule and the absence of a sulfate group on the tyrosine at position 63. It is produced via yeast cells. Bayer ceased the production of lepirudin (Refludan) effective May 31, 2012.</description>
<targets>
<target>
<id>BE0000048</id>
<name>Prothrombin</name>
<organism>Human</organism>
<actions>
<action>inhibitor</action>
</actions>
<references>
<articles>
<article>
<pubmed-id>10505536</pubmed-id>
<citation>Turpie AG: Anticoagulants in acute coronary syndromes. Am J Cardiol. 1999 Sep 2;84(5A):2M-6M.</citation>
</article>
<article>
<pubmed-id>10912644</pubmed-id>
<citation>Warkentin TE: Venous thromboembolism in heparin-induced thrombocytopenia. Curr Opin Pulm Med. 2000 Jul;6(4):343-51.</citation>
</article>
</articles>
</references>
<known-action>yes</known-action>
</target>
</targets>
</drug>
</drugbank>
Now that I have significantly truncated the above file, my code is now giving an error message that any subscripts above Data[[1]][[1]] are out of bounds, but hopefully this code gives you an idea of what I'm aiming to do...
library(XML)
# Save the database file as a tree structure
xmldata = xmlRoot(xmlTreeParse("DrugBank_TruncatedDatabase_v4_Tiny.xml"))
# Number of nodes in the entire database file
NumNodes <- xmlSize(xmldata)
MaxTargets <- 20
Data <- xmlSApply(xmldata, function(x) xmlSApply(x, xmlValue))
Target <- array(1:NumNodes, dim=c(1,NumNodes,MaxTargets))
for (i in 1:NumNodes){
for (j in 1:MaxTargets){
Target[i][j] <- Data[[i]][[5]][[j]][[1]]
}
}
Thanks for your input!
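The XPath route mentioned above avoids positional indexes such as "field 51" entirely. Here is a minimal sketch in Python rather than R, assuming the namespace URI shown in the sample file; SAMPLE is a shortened, hypothetical copy of that file, and target_ids_by_drug is an illustrative name, not part of any DrugBank tooling.

```python
import xml.etree.ElementTree as ET

# The sample file declares a default namespace, so every path step needs it.
NS = {"db": "http://www.drugbank.ca"}

# Shortened, hypothetical copy of the sample file from the question.
SAMPLE = """
<drugbank xmlns="http://www.drugbank.ca" version="5.0">
  <drug type="biotech">
    <drugbank-id primary="true">DB00001</drugbank-id>
    <drugbank-id>BTD00024</drugbank-id>
    <name>Lepirudin</name>
    <targets>
      <target>
        <id>BE0000048</id>
        <name>Prothrombin</name>
      </target>
    </targets>
  </drug>
</drugbank>
"""


def target_ids_by_drug(xml_text):
    """Map each drug's primary DrugBank id to the list of its target ids."""
    root = ET.fromstring(xml_text)
    result = {}
    for drug in root.findall("db:drug", NS):
        primary = drug.find("db:drugbank-id[@primary='true']", NS)
        # A drug with no targets simply yields an empty list.
        ids = [
            target.findtext("db:id", namespaces=NS)
            for target in drug.findall("db:targets/db:target", NS)
        ]
        result[primary.text] = ids
    return result


print(target_ids_by_drug(SAMPLE))
```

Because the lookup is by element name, it does not matter whether targets is the 6th or the 51st child, and no MaxTargets guess is needed.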

Testing DAX calculation with NBi

I'm doing some research on automated test tools for our SSAS Tabular project. I found NBi and think it is really cool. I set it up and successfully ran some basic tests. However, when I attempted to test a DAX calculation, it reported "Function not found" (see screenshot). It sounds like SUM is not supported, but given that SUM is a basic function I would expect it to work. Since I'm new to this tool, I wanted to double-check whether I've done something wrong or whether it is simply what the error says (an unsupported function).
I went back and reviewed the NBi documentation, which says to check their NCAL.dll for all available expressions. Unfortunately, I'm unable to open a readable version of that dll file. Any help is appreciated.
Here is the formula I want to test:
=SUMX(FILTER(MyTable, AND(MyTable[Date] = EARLIER(MyTable[Date]), MyTable[Account] = EARLIER(MyTable[Account]))), MyTable[Amount])
XML code (nbits) file
<test name="My second test: Calculated column compared to DAX formula">
<system-under-test>
<execution>
<query connectionString="Provider=MSOLAP.7;Data Source...">
<![CDATA[
EVALUATE
SUMMARIZE (MyTable, MyTable[Date], MyTable[Account], MyTable[Amount], MyTable[CalculatedAmount])
]]>
</query>
</execution>
</system-under-test>
<assert>
<evaluate-rows>
<variable column-index="0">Date</variable>
<variable column-index="1">Account</variable>
<variable column-index="2">Amount</variable>
<variable column-index="3">CalculatedAmount</variable>
<expression column-index="3" type="numeric" tolerance="0.01"> = SUMX(FILTER(MyTable, AND(MyTable[Date] = EARLIER(MyTable[Date]), MyTable[Account] = EARLIER(MyTable[Account]))), MyTable[Amount])</expression>
</evaluate-rows>
</assert>
</test>
NBi supports the evaluation of DAX queries in the query tag but not in an expression tag. Expression and evaluate-rows tags are not designed to compare two queries. To achieve this, change your test to use the assertion equalTo between your two queries. It will be easier and will work.
I guess a better question would be how do I test a measure and a
calculated column in term of ensuring that another developer doesn't
accidentally change the calculation/expression I entered when
designing the Tabular model?
I'll answer at three levels: conceptual, logical and technical.
At the conceptual level, your test is wrong: you should never use the same implementation in your assertion and in your system-under-test. This is not specific to NBi or to any framework; it applies to all automated tests. The role of a test is not to ensure that someone doesn't change something but to ensure that something gives the correct result. Comparing an artifact to itself will always give a green test, even if your implementation is wrong. In this case, you must either replace your assertion with a concrete static result, write a SQL statement that performs the same calculation against your database, or find another query in MDX that returns the same result.
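To make the conceptual point concrete, here is a minimal Python sketch, unrelated to NBi's own syntax. ROWS is made-up data standing in for MyTable, calculated_amount mimics what the calculated column is meant to produce (the sum of Amount over rows sharing the same Date and Account), and EXPECTED holds values worked out by hand; all three names are hypothetical.

```python
from collections import defaultdict

# Made-up rows standing in for MyTable: (Date, Account, Amount).
ROWS = [
    ("2017-01-01", 100, 10.0),
    ("2017-01-01", 100, 5.5),
    ("2017-01-01", 200, 7.0),
    ("2017-01-02", 100, 1.0),
]


def calculated_amount(rows):
    """System under test: sum Amount over rows sharing Date and Account."""
    totals = defaultdict(float)
    for date, account, amount in rows:
        totals[(date, account)] += amount
    return dict(totals)


# Reference values worked out by hand, NOT by calling calculated_amount again.
EXPECTED = {
    ("2017-01-01", 100): 15.5,
    ("2017-01-01", 200): 7.0,
    ("2017-01-02", 100): 1.0,
}


def matches(actual, expected, tolerance=0.01):
    """True when both results have the same keys and values within tolerance."""
    return actual.keys() == expected.keys() and all(
        abs(actual[key] - expected[key]) <= tolerance for key in expected
    )


print(matches(calculated_amount(ROWS), EXPECTED))
```

If EXPECTED were produced by calling calculated_amount itself, the check would pass no matter what the function computed.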
At the logical level the following sentence is not correct
Here is the formula I want to test:
You've defined this formula in your assert and not in your system-under-test. It means that it's not what you're testing; it's your reference (something you're 100% sure is correct). What you're testing is the query EVALUATE SUMMARIZE (MyTable, MyTable[Date], MyTable[Account], MyTable[Amount], MyTable[CalculatedAmount]).
At the technical level, using evaluate-rows is not the correct option. This assertion does not expect a function or a query but an expression based on the row's variables (no DAX, no SQL, ...). The usage of EARLIER is a clear sign that it won't be possible. In your case, you must compare two queries, probably with something like:
<assert>
<equalTo>
<column index="0" role="key" type="dateTime"/>
<column index="1" role="key" type="numeric"/>
<column index="2" role="value" type="numeric"/>
<column index="3" role="value" type="numeric" tolerance="0.01"/>
<query>
EVALUATE SUMMARIZE (MyTable, MyTable[Date], MyTable[Account], MyTable[Amount], SUMX(FILTER(MyTable, AND(MyTable[Date] = EARLIER(MyTable[Date]), MyTable[Account] = EARLIER(MyTable[Account]))), MyTable[Amount]))
</query>
</equalTo>
</assert>
PS: I'm clearly not a DAX specialist, and I'm not sure the query above is valid from a syntax point of view.

Using awk/cut/sed with lines containing different numbers of fields

I'm using the pinboard.in API to get a list of my current bookmarks. The results look like this:
<post href="https://www.nocc.meezy.com/doc/view.cgi?id=715" time="2013-02-11T17:38:10Z" description="Disk Errors Process Flow Chart" extended="" tag="nocc work" hash="a3419515b2e956e86886ba630b6028b7" meta="d793aeef6133a26e361695181eb57b9d" />
<post href="https://www.nocc.meezy.com/doc/view.cgi?id=39" time="2013-02-11T17:38:08Z" description="Using socat" extended="" tag="socat work" hash="fd60523bf841b2b95674a0e1d4401f4d" meta="5f2b6ad395fe4da05b2987d199b675ea" />
<post href="https://agora.meezy.com/wiki/Development_Tools" time="2013-02-11T17:38:06Z" description="Development Tools - meezyWiki" extended="" tag="devtools work" hash="dcf904433987a125c00a88bcaf31cad27" meta="5e744562282561390a0417223d323aee" />
I'm only interested in the URL, description, and tags, so I'd like to have the results look like this:
https://www.nocc.meezy.com/doc/view.cgi?id=715 description="Disk Errors Process Flow Chart" tag="nocc work"
https://www.nocc.meezy.com/doc/view.cgi?id=39 description="Using socat" tag="socat work"
https://agora.meezy.com/wiki/Development_Tools description="Development Tools - meezyWiki" tag="devtools work"
I know a little bit about awk/cut/sed but not enough to tell them how to count the fields correctly when the description and tag fields contains spaces and different numbers of strings.
I could probably hack together some really crappy solution if my life depended on it but I'd rather get a proper solution by someone who knows them much better than I do.
Thanks
When you process XML with regex/awk/sed, you should know the risk. Here is a sed one-liner for your requirement:
sed -r 's/^.*"(http)/\1/; s/" time=.*( desc)/ \1/; s/extended=.*( tag=")/\1/; s/hash=.*//' file
test with your example:
kent$ sed -r 's/^.*"(http)/\1/; s/" time=.*( desc)/ \1/; s/extended=.*( tag=")/\1/; s/hash=.*//' file
https://www.nocc.meezy.com/doc/view.cgi?id=715 description="Disk Errors Process Flow Chart" tag="nocc work"
https://www.nocc.meezy.com/doc/view.cgi?id=39 description="Using socat" tag="socat work"
https://agora.meezy.com/wiki/Development_Tools description="Development Tools - meezyWiki" tag="devtools work"
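For comparison, an XML parser avoids the field-counting problem altogether, because attribute values containing spaces are handled by the parser. This is a minimal Python sketch, assuming each post element sits on one line and is well-formed XML; LINES holds two of the sample lines from the question, and summarize is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Two of the sample lines from the question.
LINES = [
    '<post href="https://www.nocc.meezy.com/doc/view.cgi?id=715" '
    'time="2013-02-11T17:38:10Z" description="Disk Errors Process Flow Chart" '
    'extended="" tag="nocc work" hash="a3419515b2e956e86886ba630b6028b7" '
    'meta="d793aeef6133a26e361695181eb57b9d" />',
    '<post href="https://www.nocc.meezy.com/doc/view.cgi?id=39" '
    'time="2013-02-11T17:38:08Z" description="Using socat" extended="" '
    'tag="socat work" hash="fd60523bf841b2b95674a0e1d4401f4d" '
    'meta="5f2b6ad395fe4da05b2987d199b675ea" />',
]


def summarize(line):
    """Return 'URL description="..." tag="..."' for one <post .../> line."""
    post = ET.fromstring(line)
    return '{} description="{}" tag="{}"'.format(
        post.get("href"), post.get("description"), post.get("tag")
    )


for line in LINES:
    print(summarize(line))
```

The attribute order and the number of words in description or tag no longer matter, since each value is looked up by name.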

What's wrong with my XML for document ranking?

I wrote a program in C# to calculate TF-IDF to rank documents.
I used the following XML to store the word frequencies within documents, and I was heavily criticised for using this structure. I use the text of the word itself as the tag name; in my view it's efficient and consumes less space. I can also search it pretty easily with XDocument, since it's a nice tree structure. Can you help me understand why I was criticised so heavily?
Criticism: How can you put information in the metadata? (To me, it's innovative.)
<word>
<siddhartha>
<doc1> 4 </doc1>
<doc2> 5 </doc2>
</siddhartha>
<inspiration>
<doc1> 4 </doc1>
<doc6> 5 </doc6>
</inspiration>
....
</word>
Something like this was suggested to me instead:
<word>
<text> siddhartha </text>
<doc1> 4 </doc1>
<text> inspiration </text>
<doc1> 4 </doc1>
...
</word>
Your structure, with the word itself as the node name, will be hard to parse with generic parsers. There is no predefined structure: you need to read the whole document to learn which element names exist.
I might have done something like this (I tried to stay close to your idea):
<words>
<word id="siddhartha">
<freq id="doc1"> 4 </freq>
<freq id="doc2"> 5 </freq>
</word>
....
</words>
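With fixed element names, a reader can walk the file without knowing the vocabulary in advance. The following Python sketch illustrates this (the question itself uses C# and XDocument); DOC is a hypothetical, completed version of the structure above, and term_frequencies is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, completed version of the suggested structure.
DOC = """
<words>
  <word id="siddhartha">
    <freq id="doc1"> 4 </freq>
    <freq id="doc2"> 5 </freq>
  </word>
  <word id="inspiration">
    <freq id="doc1"> 4 </freq>
    <freq id="doc6"> 5 </freq>
  </word>
</words>
"""


def term_frequencies(xml_text):
    """Return {word: {document id: frequency}} using only fixed tag names."""
    root = ET.fromstring(xml_text)
    return {
        word.get("id"): {
            # int() tolerates the surrounding spaces in " 4 ".
            freq.get("id"): int(freq.text)
            for freq in word.findall("freq")
        }
        for word in root.findall("word")
    }


print(term_frequencies(DOC))
```

The same two element names, word and freq, cover any vocabulary and any number of documents, and a word that is not a legal XML element name (one starting with a digit, for instance) can still be stored as an attribute value.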

How to convert ticks into a readable datetime with XSLT?

I have an XML with timestamps like this:
<node stamp="1236888746689" />
And I would like to display them in the result HTML as date with time.
Is there a way to do it with XSLT (any Version)?
EDIT:
I am using XSLT2.0 with Saxon9. The base date is 1970-01-01 0:00.
You take the date 1970-01-01T00:00:00 and add as many milliseconds as the value of the stamp tells you:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.w3.org/TR/xhtml1/strict">
<xsl:template match="node">
<xsl:value-of
select='xs:dateTime("1970-01-01T00:00:00") + @stamp * xs:dayTimeDuration("PT0.001S")'/>
</xsl:template>
</xsl:stylesheet>
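The arithmetic is the same in any language: start at the epoch and add the stamp as a duration in milliseconds. A minimal Python sketch for cross-checking the stylesheet's output, assuming the stamp counts milliseconds since 1970-01-01T00:00:00 UTC; stamp_to_datetime is an illustrative name.

```python
from datetime import datetime, timedelta, timezone

# Base date given in the question, taken here to be UTC (an assumption).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def stamp_to_datetime(stamp_ms):
    """Convert a millisecond stamp (int or digit string) to an aware datetime."""
    return EPOCH + timedelta(milliseconds=int(stamp_ms))


print(stamp_to_datetime("1236888746689").isoformat(timespec="milliseconds"))
# 2009-03-12T20:12:26.689+00:00
```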
If you are using an XSLT 1.0 processor which supports the EXSLT date functions (I've just tested this with libxslt in PHP), you can use date:add() and date:duration():
<xsl:value-of select="date:add('1970-01-01T00:00:00Z', date:duration(@stamp div 1000))"/>
The date:duration() function takes a number of seconds (so you have to divide your milliseconds by 1000) and turns it into a "duration" (in this case, "P14315DT20H12M26.6889998912811S"), which is then added to the start of your epoch (looks like the standard epoch, for this stamp) with date:add() to get a stamp of "2009-03-12T20:12:26.6889998912811Z". You can then format this using the EXSLT date functions or just substring(), depending on what you need.
Belated answer, yes, I know, but I couldn't find the one I was looking for here, so I thought I'd pay it forward with my solution.
My XML was a few nodes dumped from Drupal using export_node and drush. I was using the xslt processor in PHP5, which only supports xslt 1.0. Some EXSLT functions appear to be supported, but I couldn't tell whether my syntax was wrong or the function I was trying to use was not supported. Anyway, the following worked for me. I used the example code from w3schools.com, but added a line right after declaring the xsltprocessor, like below:
$xp = new XsltProcessor();
$xp->registerPHPFunctions();
PHP has a trivial function for date conversion, so I cheated and used the PHP processor, since I was already using it to transform my xsl.
<xsl:for-each select="node_export/node">
<xsl:value-of select="php:function('date', 'n-j-y', number(timestamp))"/>
</xsl:for-each>
Hope this helps someone out there. I was banging my head for quite a while as I worked this one out.
If you wanted to use an XSL 1.0 processor that does not support the EXSLT date and time functions this is non-trivial, but it has been done.
You can have a look at Katy Coe's XSLT 1.0 implementation of the "iso-from-unix" function. It's part of a rather huge "free for non-commercial use" set of date and time functions she created.
However, your XSL processor must support the "http://exslt.org/functions" namespace for this implementation to work. Other than that there is no dependency on EXSLT.
P.S.: I'm aware that a Unix timestamp and ticks are not exactly the same thing. They are close enough, though.
XSLT is Turing complete, so there must be a way. :) Knowing at least a bit of XSLT, it will probably involve recursion.
You don't specify the exact interpretation of your "ticks", I'm guessing milliseconds since some epoch, but which? 1970?
