convert cellosaurus.xml file into a data.frame in R - r

I have an XML file that I haven't been able to get into a good data.frame format. I'm close but it's not quite there yet.
cellosaurus.xml (I slightly modified this file by removing everything before the <cell-line-list> tag and after the </cell-line-list> tag)
This is the messy code I've written so far:
require(XML)
require(xml2)
require(rvest)
require(dplyr)
require(xmltools)
require(stringi)
require(gtools)
setwd("~/Documents/Cancer_Cell_Lines/Cellosaurus")
file <- "cellosaurus.xml"
cellosaurus <- file %>% xml2::read_xml()
nodeset <- cellosaurus %>% xml_children()
terminal_xpaths <- nodeset[1] %>% xml_get_paths() %>% unlist() %>% unique()
terminal_nodesets <- lapply(terminal_xpaths[1], xml2::xml_find_all, x = cellosaurus)
df_list <- terminal_nodesets %>% purrr::map(xml_dig_df)
df <- lapply(df_list[[1]], function(x) as.data.frame(x))
table <- do.call("smartbind", df)
Problem 1: There are duplicate column names that are mixed up. For example in the file there are many paths that end up at a node called cv.term like
"/cell-line-list/cell-line/disease-list/cv-term"
"/cell-line-list/cell-line/species-list/cv-term"
"/cell-line-list/cell-line/derived-from/cv-term"
but in the table I get columns called cv.term, cv.term.1, cv.term.2, and their contents are mixed up because of missing data. Is there a way to fix this?
Problem 2: The file is big and takes a long time to process (I've only been able to test on a small subset of the full file). I haven't been able to figure out how to split the XML correctly except by splitting it into as many files as there are nodes, ~109,000, and then I had a hard time getting R to read that many files.
Any help appreciated.

To use relational database terminology, consider data normalization. Specifically, keep your data long: most nodes in this XML are one-to-many lists, so you can extract each list as its own long data frame and merge the data frames by a unique id such as the cell-line node number.
Fortunately, there is a great extraction tool for this: XSLT, a special-purpose, declarative language (the same type as SQL) designed to transform XML for various end uses, such as extracting individual pieces that you can parse more simply into data frames and then merge together. The beauty is that XSLT has nothing to do with R and is portable to other application layers (Java, PHP, Python) or dedicated XSLT processors.
See the process below for a roadmap to the final solution. Each XSLT script below parses a specific part of every cell-line node and flattens the XML to one child level:
R
library(xml2)
library(xslt) # INSTALL PACKAGE BEFORE HAND
library(dplyr) # ONLY FOR bind_rows
# PARSE XML AND XSLT
doc <- read_xml('Cellosaurus.xml')
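# full.names=TRUE RETURNS USABLE PATHS; FILES SORT ALPHABETICALLY, SO NAME THEM TO MATCH THE ORDER OF xpaths
scripts <- list.files(path='/path/to/xslt/scripts', pattern='\\.xsl$', full.names=TRUE)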
xpaths <- c('//accession', '//cell-line', '//hla_gene', '//marker',
'//name', '//species_list', '//url')
proc_xml_parse <- function(x, s) {
  # x IS THE XPATH OF THE NODES ONE SCRIPT EMITS, s THE PATH TO THAT SCRIPT
  style <- read_xml(s)
  # TRANSFORM INPUT INTO OUTPUT
  new_xml <- xslt::xml_xslt(doc, style)
  # INNER DF LIST BUILD (ONE ONE-ROW DATA FRAME PER MATCHED NODE)
  df_list <- lapply(xml_find_all(new_xml, x), function(node) {
    vals <- xml_children(node)
    setNames(data.frame(t(xml_text(vals)), stringsAsFactors = FALSE), xml_name(vals))
  })
  bind_rows(df_list)
}
# OUTER DF LIST BUILD
df_list <- Map(proc_xml_parse, xpaths, scripts)
# CHAIN MERGE
final_df <- Reduce(function(x,y) merge(x, y, by="cell_num", all=TRUE), df_list)
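The same long-table-then-merge idea can be sketched outside R. The following Python sketch is only an illustration, not the answer's method: the sample XML, its attribute names, and the helper names long_table and outer_merge are hypothetical, and the join mimics merge(x, y, by="cell_num", all=TRUE) in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry sample; real entries carry many more lists.
SAMPLE = """
<Cellosaurus>
  <cell-line-list>
    <cell-line>
      <accession-list>
        <accession type="primary">ACC_1</accession>
        <accession type="secondary">ACC_1B</accession>
      </accession-list>
      <name-list>
        <name type="identifier">Line-A</name>
      </name-list>
    </cell-line>
    <cell-line>
      <accession-list>
        <accession type="primary">ACC_2</accession>
      </accession-list>
    </cell-line>
  </cell-line-list>
</Cellosaurus>
""".strip()

def long_table(root, list_tag, item_tag):
    """One row per list item, keyed by the 1-based position of its cell-line."""
    rows = []
    for num, cell in enumerate(root.iter("cell-line"), start=1):
        for item in cell.findall(f"{list_tag}/{item_tag}"):
            row = {"cell_num": num, f"{item_tag}_value": item.text}
            # Prefix attribute names so columns from different lists never collide.
            row.update({f"{item_tag}_{k}": v for k, v in item.attrib.items()})
            rows.append(row)
    return rows

def outer_merge(left, right, key="cell_num"):
    """Full outer join of two row lists on `key`; missing sides become None."""
    left_cols = {c for r in left for c in r} - {key}
    right_cols = {c for r in right for c in r} - {key}
    merged = []
    for k in sorted({r[key] for r in left} | {r[key] for r in right}):
        l_rows = [r for r in left if r[key] == k] or [dict.fromkeys(left_cols)]
        r_rows = [r for r in right if r[key] == k] or [dict.fromkeys(right_cols)]
        for l in l_rows:
            for r in r_rows:
                merged.append({**l, **r, key: k})
    return merged

root = ET.fromstring(SAMPLE)
accessions = long_table(root, "accession-list", "accession")
names = long_table(root, "name-list", "name")
merged = outer_merge(accessions, names)
```

Because each list gets its own prefixed columns, a missing name-list in the second entry shows up as None in that row instead of shifting values into the wrong column.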
XSLT Scripts
Save each script as a separate .xsl or .xslt file (these are special .xml files) to be loaded in R above. The scripts below do not capture every list node in the XML; add more by replicating the same pattern for the other list nodes.
Cell Line List
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Cellosaurus">
<xsl:copy>
<xsl:apply-templates select="cell-line-list/cell-line"/>
</xsl:copy>
</xsl:template>
<xsl:template match="cell-line">
<xsl:copy>
<cell_num>
<xsl:value-of select="count(preceding-sibling::*)+1"/>
</cell_num>
<xsl:for-each select="@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Accession List
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Cellosaurus">
<xsl:copy>
<xsl:apply-templates select="cell-line-list/cell-line"/>
</xsl:copy>
</xsl:template>
<xsl:template match="cell-line">
<xsl:apply-templates select="accession-list"/>
</xsl:template>
<xsl:template match="accession-list">
<xsl:apply-templates select="accession"/>
</xsl:template>
<xsl:template match="accession">
<xsl:copy>
<cell_num>
<xsl:value-of select="count(ancestor::cell-line[1]/preceding-sibling::*)+1"/>
</cell_num>
<xsl:for-each select="@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
<accession_value><xsl:value-of select="."/></accession_value>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Name List
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Cellosaurus">
<xsl:copy>
<xsl:apply-templates select="cell-line-list/cell-line"/>
</xsl:copy>
</xsl:template>
<xsl:template match="cell-line">
<xsl:apply-templates select="name-list"/>
</xsl:template>
<xsl:template match="name-list">
<xsl:apply-templates select="name"/>
</xsl:template>
<xsl:template match="name">
<xsl:copy>
<cell_num>
<xsl:value-of select="count(ancestor::cell-line/preceding-sibling::*)+1"/>
</cell_num>
<xsl:for-each select="@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
<name_value><xsl:value-of select="."/></name_value>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Web Page List
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Cellosaurus">
<xsl:copy>
<xsl:apply-templates select="cell-line-list/cell-line"/>
</xsl:copy>
</xsl:template>
<xsl:template match="cell-line">
<xsl:apply-templates select="web-page-list"/>
</xsl:template>
<xsl:template match="web-page-list">
<xsl:apply-templates select="url"/>
</xsl:template>
<xsl:template match="url">
<xsl:copy>
<cell_num>
<xsl:value-of select="count(ancestor::cell-line/preceding-sibling::*)+1"/>
</cell_num>
<xsl:for-each select="@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
<url_value><xsl:value-of select="."/></url_value>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
HLA List
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Cellosaurus">
<xsl:copy>
<xsl:apply-templates select="cell-line-list/cell-line"/>
</xsl:copy>
</xsl:template>
<xsl:template match="cell-line">
<xsl:apply-templates select="hla-lists/hla-list"/>
</xsl:template>
<xsl:template match="hla-list">
<xsl:apply-templates select="hla-gene"/>
</xsl:template>
<xsl:template match="hla-gene">
<hla_gene>
<cell_num>
<xsl:value-of select="count(ancestor::cell-line/preceding-sibling::*)+1"/>
</cell_num>
<xsl:for-each select="@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
<hla_value><xsl:value-of select="."/></hla_value>
</hla_gene>
</xsl:template>
</xsl:stylesheet>
Species List
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Cellosaurus">
<xsl:copy>
<xsl:apply-templates select="cell-line-list/cell-line"/>
</xsl:copy>
</xsl:template>
<xsl:template match="cell-line">
<xsl:apply-templates select="species-list/cv-term"/>
</xsl:template>
<xsl:template match="cv-term">
<species_list>
<cell_num>
<xsl:value-of select="count(ancestor::cell-line/preceding-sibling::*)+1"/>
</cell_num>
<xsl:for-each select="@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
<species_value><xsl:value-of select="."/></species_value>
</species_list>
</xsl:template>
</xsl:stylesheet>
Marker List
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Cellosaurus">
<xsl:copy>
<xsl:apply-templates select="cell-line-list/cell-line"/>
</xsl:copy>
</xsl:template>
<xsl:template match="cell-line">
<xsl:apply-templates select="str-list"/>
</xsl:template>
<xsl:template match="str-list">
<xsl:apply-templates select="marker-list"/>
</xsl:template>
<xsl:template match="marker-list">
<xsl:apply-templates select="marker"/>
</xsl:template>
<xsl:template match="marker">
<xsl:copy>
<cell_num>
<xsl:value-of select="count(ancestor::cell-line/preceding-sibling::*)+1"/>
</cell_num>
<xsl:for-each select="@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
<xsl:copy-of select="marker-data-list/marker-data/alleles"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output
After the chain merge, values repeat for every matching row, as in SQL joins on long (many-to-many) tables. Note that df_list is a named list of data frames, which you can use directly if you do not want the merged output.

Just one comment: when you say "~109,000 cell lines with variations in missing data between each cell-line", you need to understand that the only mandatory fields in a Cellosaurus entry are the primary accession, the cell line name (identifier), the cell line category, and the taxonomy; everything else is optional. All of this is described in the cellosaurus.xsd file, using either minOccurs="0" or use="optional" depending on the type of field.

Related

Combining XML Nodes into a single node with an XSLT

I'm trying to edit some XML with a transform but I'm struggling to achieve my desired results.
I have some XML:
<FX>
<Order ATTRIBUTE1="ACTIVE" ATTRIBUTE2="CCY" />
<Attribute NAME="N1" VALUE="V1" />
<Attribute NAME="N2" VALUE="V2" />
<Attribute NAME="N3" VALUE="V3" />
</FX>
And I want to transform it to look like:
<FX>
<Order ATTRIBUTE1="ACTIVE" ATTRIBUTE2="CCY" />
<Attribute NAME="N1, N2, N3" VALUE="V1,V2,V3" />
</FX>
Is this possible? Can anyone offer any suggestions on how to do this with a transform?
You can use the following ASP.NET-compatible XSLT 1.0 stylesheet to transform your source XML into your destination XML:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/FX">
<xsl:copy>
<xsl:copy-of select="Order" />
<Attribute>
<xsl:attribute name="NAME">
<xsl:for-each select="Attribute">
<xsl:value-of select="@NAME" />
<xsl:if test="position() != last()">
<xsl:text>, </xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:attribute>
<xsl:attribute name="VALUE">
<xsl:for-each select="Attribute">
<xsl:value-of select="@VALUE" />
<xsl:if test="position() != last()">
<xsl:text>,</xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:attribute>
</Attribute>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Its output is:
<FX>
<Order ATTRIBUTE1="ACTIVE" ATTRIBUTE2="CCY"/>
<Attribute NAME="N1, N2, N3" VALUE="V1,V2,V3"/>
</FX>
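As a cross-check of the expected result outside XSLT, here is a minimal Python sketch of the same merge. It is an illustration only; the function name combine_attributes is hypothetical, and it assumes every Attribute element is a direct child of FX, as in the question.

```python
import xml.etree.ElementTree as ET

SOURCE = """<FX>
<Order ATTRIBUTE1="ACTIVE" ATTRIBUTE2="CCY" />
<Attribute NAME="N1" VALUE="V1" />
<Attribute NAME="N2" VALUE="V2" />
<Attribute NAME="N3" VALUE="V3" />
</FX>"""

def combine_attributes(xml_text):
    """Replace all <Attribute> children of the root with one merged element."""
    root = ET.fromstring(xml_text)
    attrs = root.findall("Attribute")
    merged = ET.Element("Attribute", {
        # Same separators as the stylesheet: ", " for NAME and "," for VALUE.
        "NAME": ", ".join(a.get("NAME", "") for a in attrs),
        "VALUE": ",".join(a.get("VALUE", "") for a in attrs),
    })
    for a in attrs:
        root.remove(a)
    root.append(merged)
    return root

result = combine_attributes(SOURCE)
```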
In general, if you want to transform some nodes but keep the rest you use the identity transformation template as the starting point and then add templates that change those nodes you want to change:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="FX/Attribute[1]">
<xsl:copy>
<xsl:apply-templates select="@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="FX/Attribute[position() > 1]"/>
<xsl:template match="FX/Attribute[1]/@*">
<xsl:attribute name="{name()}">
<xsl:for-each select=". | ../following-sibling::Attribute/@*[name() = name(current())]">
<xsl:if test="position() > 1">,</xsl:if>
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
https://xsltfiddle.liberty-development.net/jyH9rNk

How to split into 3 JSONs from single XML file using XSLT in Oxygen

My input XML has header, content, and footer parts. Converting the XML to JSON with XSLT works well, but I need the output split into three parts: header, content, and footer.
My Input XML file is:
<header>
<trackingSettings>
<urlcode>W3333</urlcode>
<apiurl>http://mlucenter.com/like/api</apiurl>
</trackingSettings>
</header>
<mlu3_body>
<columnsCount>2</columnsCount>
<lineBackground>linear-gradient(to right, rgba(94, 172, 192, 0) 0%, c4cccf 50%, rgba(94, 172, 192, 0) 100%)</lineBackground>
</mlu3_body>
<footer>
<buttons>
<button/>
</buttons>
<banner/>
</footer>
My XSLT using:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes" method="xml" />
<xsl:template match="*">
<xsl:value-of select="name()"/> : <xsl:call-template name="Properties"/>
</xsl:template>
<xsl:template match="*" mode="ArrayElement">
<xsl:call-template name="Properties"/>
</xsl:template>
<xsl:template name="Properties">
<xsl:variable name="childName" select="name(*[1])"/>
<xsl:choose>
<xsl:when test="not(*|@*)">"<xsl:value-of select="."/>"</xsl:when>
<xsl:when test="count(*[name()=$childName]) > 1">{ "<xsl:value-of select="$childName"/>" :[<xsl:apply-templates select="*" mode="ArrayElement"/>] }</xsl:when>
<xsl:otherwise>{
<xsl:apply-templates select="@*"/>
<xsl:apply-templates select="*"/>
}</xsl:otherwise>
</xsl:choose>
<xsl:if test="following-sibling::*">,</xsl:if>
</xsl:template>
<xsl:template match="@*">"<xsl:value-of select="name()"/>" : '<xsl:value-of select="."/>',
</xsl:template>
</xsl:stylesheet>
I'm using Saxon PE in oXygen.
I want this XML converted to three JSON files in the output, named header.json, content.json (for mlu3_body), and footer.json.
Is this possible using XSLT, or do I need to keep all the input files separate? Please provide some ideas.
Change the XSLT to
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:template match="*">
<xsl:value-of select="name()"/> : <xsl:call-template name="Properties"/>
</xsl:template>
<xsl:template match="*" mode="ArrayElement">
<xsl:call-template name="Properties"/>
</xsl:template>
<xsl:template name="Properties">
<xsl:variable name="childName" select="name(*[1])"/>
<xsl:choose>
<xsl:when test="not(*|@*)">"<xsl:value-of select="."/>"</xsl:when>
<xsl:when test="count(*[name()=$childName]) > 1">{ "<xsl:value-of select="$childName"/>" :[<xsl:apply-templates select="*" mode="ArrayElement"/>] }</xsl:when>
<xsl:otherwise>{
<xsl:apply-templates select="@*"/>
<xsl:apply-templates select="*"/>
}</xsl:otherwise>
</xsl:choose>
<xsl:if test="following-sibling::*">,</xsl:if>
</xsl:template>
<xsl:template match="@*">"<xsl:value-of select="name()"/>" : '<xsl:value-of select="."/>',
</xsl:template>
<xsl:template match="header">
<xsl:result-document href="header.json">
<xsl:next-match/>
</xsl:result-document>
</xsl:template>
<xsl:template match="mlu3_body">
<xsl:result-document href="content.json">
<xsl:next-match/>
</xsl:result-document>
</xsl:template>
<xsl:template match="footer">
<xsl:result-document href="footer.json">
<xsl:next-match/>
</xsl:result-document>
</xsl:template>
</xsl:stylesheet>
and it should generate three result files for those three elements. You will have to edit your question and tell us exactly which result you want if the produced contents is not quite right yet. It is also not clear whether the snippet of XML you have shown is part of a larger document.
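To sanity-check what the three result documents should contain, the split can be sketched in Python. This is a rough illustration, not a replacement for xsl:result-document: it assumes the three elements sit under a single hypothetical <root> wrapper (the question's snippet shows no root), the helper names to_value and split_to_json are hypothetical, and it returns the JSON text per file name instead of writing files.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical <root> wrapper around an abbreviated copy of the question's input.
SOURCE = """<root>
<header><trackingSettings><urlcode>W3333</urlcode>
<apiurl>http://mlucenter.com/like/api</apiurl></trackingSettings></header>
<mlu3_body><columnsCount>2</columnsCount></mlu3_body>
<footer><buttons><button/></buttons><banner/></footer>
</root>"""

# Output names requested in the question.
TARGETS = {"header": "header.json", "mlu3_body": "content.json", "footer": "footer.json"}

def to_value(elem):
    """Leaf elements become strings; repeated child tags become lists."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    grouped = {}
    for child in children:
        grouped.setdefault(child.tag, []).append(to_value(child))
    return {tag: vals[0] if len(vals) == 1 else vals for tag, vals in grouped.items()}

def split_to_json(xml_text):
    """Map each target file name to the JSON text for its element."""
    root = ET.fromstring(xml_text)
    return {TARGETS[e.tag]: json.dumps({e.tag: to_value(e)}, indent=2)
            for e in root if e.tag in TARGETS}

outputs = split_to_json(SOURCE)
```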

XSLT String length from beginning of document

I have this XSLT to split a 25 MB XHTML file.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:apply-templates select="html/body"/>
</xsl:template>
<xsl:template match="body">
<xsl:for-each-group select="node()"
group-starting-with="*[position()=1 or @class='toc']">
<xsl:if test="count(current-group()[self::*]) > 0 ">
<xsl:variable name="filename" select="concat('/home/t',position(),'.xml' )"/>
<xsl:apply-templates/>
<xsl:result-document
indent="yes" method="xml" href="{$filename}">
<html>
<xsl:copy-of select="/html/@*"/>
<xsl:for-each select="/html/node()">
<xsl:choose>
<xsl:when test="not(self::body)">
<xsl:copy-of select="."/>
</xsl:when>
<xsl:otherwise>
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:copy-of select="current-group()"/>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</html>
</xsl:result-document>
</xsl:if>
</xsl:for-each-group>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
It currently splits the file whenever it finds an element with @class='toc'. I need to alter it so that it is sensitive to the size of the output file instead of breaking at those elements.
Desired end state: I want the result document to be about 500KB. I suppose position() might be the best way to regulate the split points?? I tried various string-length() approaches--I could not get one to work. Also, I think white space may be an issue.
By my calculations with these documents, splitting the file at a <p class="i0"> found at or near every 150th position increment should reliably give me the filesize I need.
I guess the best way to get there is to change this:
group-starting-with="*[position()=1 or @class='toc']"
So far I have not succeeded in anything I have changed it to. Thoughts?
UPDATE: I'm not ready to say this is answered, because someone may have a better idea. But right now I'm using group-starting-with="body/*[position()=1 or position() mod 350 = 0]" with some success. It is testing well.
UPDATE 2: The group-starting-with="body/*[position()=1 or position() mod 350 = 0]" is not working well. Problem is that it is the position within the for-each-loop, not the overall file.
The successful solution ended up being an xslt 3.0 accumulator.
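The accumulator idea, a running total carried across sibling nodes in document order, can be sketched in Python. This is only an illustration of the logic, not the XSLT 3.0 code that was used: the function name split_by_size is hypothetical, and size is measured as the UTF-8 length of each serialized child.

```python
import xml.etree.ElementTree as ET

def split_by_size(body, limit_bytes):
    """Group children of `body` so each group's serialized size stays within the limit."""
    groups, current, running = [], [], 0
    for child in body:
        size = len(ET.tostring(child, encoding="utf-8"))
        # Start a new group when adding this child would exceed the limit.
        if current and running + size > limit_bytes:
            groups.append(current)
            current, running = [], 0
        current.append(child)
        running += size
    if current:
        groups.append(current)
    return groups

# Ten small paragraphs, each 9 bytes when serialized.
body = ET.fromstring("<body>" + "".join(f"<p>{i:02d}</p>" for i in range(10)) + "</body>")
chunks = split_by_size(body, limit_bytes=30)
```

Unlike position() inside a for-each-group, the running total here depends on the whole document order, which is what the size-based split needs. A single child larger than the limit still gets a group of its own.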
As an alternative:
Dimitre Novatchev's solution for XSLT 1.0:
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:variable name="vResult">
<xsl:apply-templates/>
</xsl:variable>
Length of output is: <xsl:text/>
<xsl:value-of select="concat(string-length($vResult), '&#xA;')"/>
<xsl:if test="string-length($vResult) &lt;= 1800">
<xsl:copy-of select="$vResult"/>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
when applied on this source.xml:
<nums>
<num>01</num>
<num>02</num>
<num>03</num>
<num>04</num>
<num>05</num>
<num>06</num>
<num>07</num>
<num>08</num>
<num>09</num>
<num>10</num>
</nums>
produces the wanted result:
Length of output is: 51
01
02
03
04
05
06
07
08
09
10
References
XSLT FAQ: WML and HDML - Measuring the size of the output file, in bytes
XSLT 3.0: Accumulator Function
Utilizing new capabilities of XML languages to verify integrity constraints
A Functional Tokenizer (Was: Re: Looping over a CSV in XSL)
XSL Techniques
FXSL:sumTree

xsl to retrieve other attribute values and append values into one attribute

To start:
<test style="font:2px;color:#FFFFFF" bgcolor="#CCCCCC" TOPMARGIN="5">style</test>
Using XSLT/XPATH, I copy everything over from my document
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
But I'm not sure how to get this result using XSLT/XPATH:
<test style="background-color:#CCCCCC; margin-top:1;font:2px;color:#FFFFFF">style</test>
I think I'm failing at the XPATH. This is my attempt at just retrieving bgColor:
<xsl:template match="@bgColor">
<xsl:attribute name="style">
<xsl:text>background-color:</xsl:text>
<xsl:value-of select="."/>
<xsl:text>;</xsl:text>
<xsl:value-of select="../@style"/>
</xsl:attribute>
</xsl:template>
Unfortunately, even this breaks when style is placed after bgColor in the original document. How can I append these deprecated attribute values into one inline style attribute?
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/*">
<test style="{@style};background-color:{@bgcolor};margin-top:{@TOPMARGIN}">
<xsl:value-of select="."/>
</test>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<test style="font:2px;color:#FFFFFF"
bgcolor="#CCCCCC" TOPMARGIN="5">style</test>
produces the wanted, correct result:
<test style="font:2px;color:#FFFFFF;background-color:#CCCCCC;margin-top:5">style</test>
Explanation: Use of AVTs (attribute value templates), the {...} expressions inside the style attribute.
Maybe not the best way, but it works:
<xsl:template match="test">
<xsl:element name="{name()}">
<xsl:apply-templates select="@*[name() != 'bgcolor']"/>
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:copy/>
</xsl:template>
<xsl:template match="@style">
<xsl:attribute name="style">
<xsl:value-of select="."/>
<xsl:text>;background-color:</xsl:text>
<xsl:value-of select="../@bgcolor"/>
</xsl:attribute>
</xsl:template>

How do I pass a xml attribute to xslt parameter?

I got everything working (thanks, empo) except the ctrlname column. I don't know the syntax well enough. What I am trying to do is use the XSLT to sort the XML in the GridView by column name. Everything works except the ctrlname column. How do I pass an attribute to the XSLT? I've tried: @name, Data/@name, Data[@name], ctrlname. Nothing works.
XmlDataSource1.EnableCaching = False
Dim xslTrnsform As System.Xml.Xsl.XsltArgumentList = New System.Xml.Xsl.XsltArgumentList
xslTrnsform.AddParam("sortby", "", sortAttr)
xslTrnsform.AddParam("orderas", "", orderby)
XmlDataSource1.TransformArgumentList = xslTrnsform
XmlDataSource1.DataFile = "~/App_LocalResources/DST_Test.xml"
XmlDataSource1.XPath = "//data"
XmlDataSource1.TransformFile = xsltFileName
'XmlDataSource1.DataBind()
GridView1.DataSource = XmlDataSource1
GridView1.DataBind()
XSL
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
<xsl:param name="sortby"></xsl:param>
<xsl:param name="orderas"></xsl:param>
<xsl:output method="xml" indent="yes"/>
<!--<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>-->
<xsl:template match="root">
<root>
<xsl:apply-templates select="data">
<xsl:sort select="*[name()=$sortby]" data-type="text" order="{$orderas}"/>
</xsl:apply-templates>
</root>
</xsl:template>
<xsl:template match="data">
<data>
<xsl:attribute name="ctrlname">
<xsl:value-of select="@name"/>
</xsl:attribute>
<xsl:attribute name="value">
<xsl:value-of select="value" />
</xsl:attribute>
<xsl:attribute name="comment">
<xsl:value-of select="comment" />
</xsl:attribute>
</data>
</xsl:template>
</xsl:stylesheet>
XML input
<?xml version="1.0" encoding="utf-8" ?>
<root>
<data name="Test1.Text" xml:space="preserve">
<value>Please Pick Bare Pump</value>
<comment>Tab - Pump Configuration</comment>
</data>
<data name="Test2.Text" xml:space="preserve">
<value>Complete</value>
<comment>A07</comment>
</data>
<data name="Test3.Text" xml:space="preserve">
<value>Confirmed</value>
<comment>A01</comment>
</data>
</root>
The currently accepted answer has one flaw: Whenever there is an attribute of data with the same name as a child element of data, the sort will always be performed using as keys the values of the attribute. Also, it is too long.
This solution solves the problem (and is shorter) by letting you specify whether the sort should be by attribute name or by element name:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="sortby" select="'attrib!name'"/>
<xsl:param name="orderas" select="'ascending'"/>
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:template match="root">
<root>
<xsl:apply-templates select="data">
<xsl:sort select=
"*[name()=substring-after($sortby, 'elem!')]
|
@*[name()=substring-after($sortby, 'attrib!')]"
data-type="text" order="{$orderas}"/>
</xsl:apply-templates>
</root>
</xsl:template>
<xsl:template match="data">
<data ctrlname="{@name}" value="{value}"
comment="{comment}"/>
</xsl:template>
</xsl:stylesheet>
When applied on this XML document (based on the provided one, but made a little-bit more interesting):
<root>
<data name="Test3.Text" xml:space="preserve">
<value>Please Pick Bare Pump</value>
<comment>Tab - Pump Configuration</comment>
<name>X</name>
</data>
<data name="Test2.Text" xml:space="preserve">
<value>Complete</value>
<comment>A07</comment>
<name>Z</name>
</data>
<data name="Test1.Text" xml:space="preserve">
<value>Confirmed</value>
<comment>A01</comment>
<name>Y</name>
</data>
</root>
the correct result (sorted by the name attribute) is produced:
<root>
<data ctrlname="Test1.Text" value="Confirmed" comment="A01"/>
<data ctrlname="Test2.Text" value="Complete" comment="A07"/>
<data ctrlname="Test3.Text" value="Please Pick Bare Pump" comment="Tab - Pump Configuration"/>
</root>
Now, replace the <xsl:param name="sortby" select="'attrib!name'"/> with:
<xsl:param name="sortby" select="'elem!name'"/>
and apply the transformation again on the same XML document. This time we get the result correctly sorted by the values of the child-element name:
<root>
<data ctrlname="Test3.Text" value="Please Pick Bare Pump" comment="Tab - Pump Configuration"/>
<data ctrlname="Test1.Text" value="Confirmed" comment="A01"/>
<data ctrlname="Test2.Text" value="Complete" comment="A07"/>
</root>
Explanation:
To distinguish whether we want to sort by an element-child or by an attribute, we use the convention that elem!someName means the sort must be by the values of a child element named someName. Similarly, attrib!someName means the sort must be by the values of an attribute named someName.
The <xsl:sort> instruction is modified accordingly to select either an attribute or a child element as the key. No ambiguity is possible, because the prefix of the sortby parameter now uniquely identifies whether the key should be an attribute or a child element.
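The elem!/attrib! convention can also be sketched in Python, purely to illustrate how the prefix selects the key. The function names sort_key and sort_data are hypothetical, and the sample is a trimmed copy of the document above.

```python
import xml.etree.ElementTree as ET

SOURCE = """<root>
<data name="Test3.Text"><value>Please Pick Bare Pump</value><name>X</name></data>
<data name="Test2.Text"><value>Complete</value><name>Z</name></data>
<data name="Test1.Text"><value>Confirmed</value><name>Y</name></data>
</root>"""

def sort_key(data, sortby):
    """Resolve 'attrib!x' to attribute x and 'elem!x' to child element x."""
    kind, _, name = sortby.partition("!")
    if kind == "attrib":
        return data.get(name, "")
    if kind == "elem":
        return data.findtext(name, default="")
    raise ValueError(f"sortby must start with 'attrib!' or 'elem!': {sortby!r}")

def sort_data(root, sortby, descending=False):
    """Return the <data> elements ordered by the selected key."""
    return sorted(root.findall("data"),
                  key=lambda d: sort_key(d, sortby),
                  reverse=descending)

root = ET.fromstring(SOURCE)
by_attribute = [d.get("name") for d in sort_data(root, "attrib!name")]
by_element = [d.get("name") for d in sort_data(root, "elem!name")]
```

The two orderings correspond to the two result documents shown above: the attribute sort yields Test1, Test2, Test3, while the element sort follows X, Y, Z and yields Test3, Test1, Test2.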
Yes, I'm sorry, I didn't notice that you also wanted to sort by attributes. Note also that you have changed the syntax of xsl:param, and it is not correct that way. It's very important that you keep the single quotes inside the double quotes. Here is the final template:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
<xsl:param name="sortby" select="'value'"/>
<xsl:param name="orderas" select="'ascending'"/>
<xsl:output method="xml" indent="yes"/>
<!--<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>-->
<xsl:template match="root">
<root>
<xsl:apply-templates select="data">
<xsl:sort select="*[name()=$sortby]|@*[name()=$sortby]" data-type="text" order="{$orderas}"/>
</xsl:apply-templates>
</root>
</xsl:template>
<xsl:template match="data">
<data>
<xsl:attribute name="ctrlname">
<xsl:value-of select="@name"/>
</xsl:attribute>
<xsl:attribute name="value">
<xsl:value-of select="value" />
</xsl:attribute>
<xsl:attribute name="comment">
<xsl:value-of select="comment" />
</xsl:attribute>
</data>
</xsl:template>
</xsl:stylesheet>
OK, I think this should work for you, allowing you to specify either attribute or element names in the $sortby parameter:
<xsl:template match="root">
<root>
<xsl:apply-templates select="data">
<xsl:sort select="*[name()=$sortby] | @*[name()=$sortby]" data-type="text" order="{$orderas}"/>
</xsl:apply-templates>
</root>
</xsl:template>
(you would just pass in "name" as the value of the $sortby parameter)
The problem was that the sort value node selection was only matching elements (* matches elements only).

Resources