xsl to retrieve other attribute values and append values into one attribute - css

To start:
<test style="font:2px;color:#FFFFFF" bgcolor="#CCCCCC" TOPMARGIN="5">style</test>
Using XSLT/XPATH, I copy everything over from my document
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
But I'm not sure how to get this result using XSLT/XPATH:
<test style="background-color:#CCCCCC; margin-top:1;font:2px;color:#FFFFFF">style</test>
I think I'm failing at the XPATH. This is my attempt at just retrieving bgColor:
<xsl:template match="@bgColor">
<xsl:attribute name="style">
<xsl:text>background-color:</xsl:text>
<xsl:value-of select="."/>
<xsl:text>;</xsl:text>
<xsl:value-of select="../@style"/>
</xsl:attribute>
</xsl:template>
Unfortunately, even this breaks when style is placed after bgColor in the original document. How can I append these deprecated attribute values into one inline style attribute?

This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/*">
<test style="{@style};background-color:{@bgcolor};margin-top:{@TOPMARGIN}">
<xsl:value-of select="."/>
</test>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<test style="font:2px;color:#FFFFFF"
bgcolor="#CCCCCC" TOPMARGIN="5">style</test>
produces the wanted, correct result:
<test style="font:2px;color:#FFFFFF;background-color:#CCCCCC;margin-top:5">style</test>
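Explanation: use of AVTs (attribute value templates). An XPath expression inside curly braces in a literal attribute value is evaluated and replaced by its string value.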
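For comparison, the same merge can be sketched outside XSLT. The snippet below is only an illustration using Python's standard library; the `CSS_MAP` table and the `merge_into_style` helper are hypothetical names, not part of the answer above. Unlike the AVT version, it skips attributes that are absent instead of emitting an empty declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from deprecated attribute names to CSS properties.
CSS_MAP = {"bgcolor": "background-color", "TOPMARGIN": "margin-top"}

def merge_into_style(elem):
    """Fold the mapped attributes of elem into a single style attribute."""
    parts = []
    # Start with the existing style value, if any, without stray semicolons.
    style = elem.attrib.pop("style", "").strip().strip(";")
    if style:
        parts.append(style)
    # Append one CSS declaration per deprecated attribute that is present.
    for attr, css in CSS_MAP.items():
        if attr in elem.attrib:
            parts.append(f"{css}:{elem.attrib.pop(attr)}")
    if parts:
        elem.set("style", ";".join(parts))
    return elem

src = '<test style="font:2px;color:#FFFFFF" bgcolor="#CCCCCC" TOPMARGIN="5">style</test>'
merged = merge_into_style(ET.fromstring(src))
print(ET.tostring(merged, encoding="unicode"))
```

Because the existing style is read before anything is appended, the result does not depend on the order of the attributes in the source element.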

Maybe not the best way, but it works:
<xsl:template match="test">
<xsl:element name="{name()}">
<xsl:apply-templates select="@*[name() != 'bgcolor']"/>
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:copy/>
</xsl:template>
<xsl:template match="@style">
<xsl:attribute name="style">
<xsl:value-of select="."/>
<xsl:text>;background-color:</xsl:text>
<xsl:value-of select="../@bgcolor"/>
</xsl:attribute>
</xsl:template>

Related

convert cellosaurus.xml file into a data.frame in R

I have an XML file that I haven't been able to get into a good data.frame format. I'm close but it's not quite there yet.
cellosaurus.xml: I slightly modified this file by removing everything before the <cell-line-list> tag and everything after the </cell-line-list> tag.
This is the messy code I've written so far:
require(XML)
require(xml2)
require(rvest)
require(dplyr)
require(xmltools)
require(stringi)
require(gtools)
setwd("~/Documents/Cancer_Cell_Lines/Cellosaurus")
file <- "cellosaurus.xml"
cellosaurus <- file %>% xml2::read_xml()
nodeset <- cellosaurus %>% xml_children()
terminal_xpaths <- nodeset[1] %>% xml_get_paths() %>% unlist() %>% unique()
terminal_nodesets <- lapply(terminal_xpaths[1], xml2::xml_find_all, x = cellosaurus)
df_list <- terminal_nodesets %>% purrr::map(xml_dig_df)
df <- lapply(df_list[[1]], function(x) as.data.frame(x))
table <- do.call("smartbind", df)
Problem 1: There are duplicate column names that are mixed up. For example in the file there are many paths that end up at a node called cv.term like
"/cell-line-list/cell-line/disease-list/cv-term"
"/cell-line-list/cell-line/species-list/cv-term"
"/cell-line-list/cell-line/derived-from/cv-term"
but in the table I get columns called cv.term, cv.term.1 and cv.term.2, and their contents are mixed up because of missing data. Is there a way to fix this?
Problem 2: The file is big and takes a long time to process (I've only been able to test on a small subset of the full file). I haven't been able to figure out how to split the XML correctly, except by splitting it into as many files as there are nodes (~109,000), and then I had a hard time getting R to read that many files.
Any help appreciated.
To borrow relational database terminology, consider data normalization. Specifically, keep your data long: most nodes in this XML are effectively one-to-many lists, so you can extract each list as its own long data frame and then merge them on a unique id such as the cell-line node number.
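As a rough illustration of the long format, the sketch below builds one table per list, keyed by the position of the cell-line node, so that cv-term values from different parents never share a column. It uses Python's standard library only; the two-entry sample and the `long_table` helper are hypothetical, and only the element names are taken from the paths quoted in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry sample; element names follow the paths in the question.
SAMPLE = """
<cell-line-list>
  <cell-line>
    <species-list><cv-term>Homo sapiens</cv-term></species-list>
    <disease-list><cv-term>Disease A</cv-term><cv-term>Disease B</cv-term></disease-list>
  </cell-line>
  <cell-line>
    <species-list><cv-term>Mus musculus</cv-term></species-list>
  </cell-line>
</cell-line-list>
"""

def long_table(root, list_name):
    """Return one row per cv-term under list_name, keyed by cell-line number."""
    rows = []
    for cell_num, cell in enumerate(root.findall("cell-line"), start=1):
        for term in cell.findall(f"{list_name}/cv-term"):
            rows.append({"cell_num": cell_num, list_name: term.text})
    return rows

root = ET.fromstring(SAMPLE)
species = long_table(root, "species-list")
diseases = long_table(root, "disease-list")
print(species)
print(diseases)
```

A cell line with no disease-list simply contributes no rows to that table, so missing data cannot shift values into the wrong column; the tables can later be joined on cell_num.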
Fortunately, there is a great extraction tool available: XSLT, a special-purpose, declarative language (the same type as SQL) designed to transform XML for various end uses, such as extracting individual pieces that you can parse more simply into data frames and then merge together. Another benefit is that XSLT has nothing to do with R and is portable to other application layers (Java, PHP, Python) or dedicated XSLT processors.
See the process below for a roadmap to the final solution. Each XSLT script below parses a specific part of every cell-line node and flattens the XML to one child level:
R
library(xml2)
library(xslt) # INSTALL PACKAGE BEFOREHAND
library(dplyr) # ONLY FOR bind_rows
# PARSE XML AND LIST XSLT SCRIPTS (FULL PATHS, RETURNED IN ALPHABETICAL ORDER)
doc <- read_xml('Cellosaurus.xml')
scripts <- list.files(path='/path/to/xslt/scripts', pattern='\\.xslt?$', full.names=TRUE)
# ONE XPATH PER SCRIPT: THE ORDER MUST MATCH THE ORDER OF scripts
xpaths <- c('//accession', '//cell-line', '//hla_gene', '//marker',
'//name', '//species_list', '//url')
proc_xml_parse <- function(x, s) {
style <- read_xml(s)
# TRANSFORM INPUT INTO OUTPUT
new_xml <- xslt::xml_xslt(doc, style)
# INNER DF LIST BUILD: ONE ONE-ROW DATA FRAME PER MATCHED NODE
df_list <- lapply(xml_find_all(new_xml, x), function(node) {
vals <- xml_children(node)
setNames(data.frame(t(xml_text(vals)), stringsAsFactors = FALSE), xml_name(vals))
})
bind_rows(df_list)
}
# OUTER DF LIST BUILD (NAMED BY XPATH)
df_list <- Map(proc_xml_parse, xpaths, scripts)
# CHAIN MERGE
final_df <- Reduce(function(x,y) merge(x, y, by="cell_num", all=TRUE), df_list)
XSLT Scripts
Save each script as a separate .xsl or .xslt file (these are special .xml files) to be loaded in R above. The scripts below do not capture every list node in the XML; add more by replicating the same pattern for the other lists.
Cell Line List
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Cellosaurus">
<xsl:copy>
<xsl:apply-templates select="cell-line-list/cell-line"/>
</xsl:copy>
</xsl:template>
<xsl:template match="cell-line">
<xsl:copy>
<cell_num>
<xsl:value-of select="count(preceding-sibling::*)+1"/>
</cell_num>
<xsl:for-each select="@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Accession List
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Cellosaurus">
<xsl:copy>
<xsl:apply-templates select="cell-line-list/cell-line"/>
</xsl:copy>
</xsl:template>
<xsl:template match="cell-line">
<xsl:apply-templates select="accession-list"/>
</xsl:template>
<xsl:template match="accession-list">
<xsl:apply-templates select="accession"/>
</xsl:template>
<xsl:template match="accession">
<xsl:copy>
<cell_num>
<xsl:value-of select="count(ancestor::cell-line[1]/preceding-sibling::*)+1"/>
</cell_num>
<xsl:for-each select="@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
<accession_value><xsl:value-of select="."/></accession_value>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Name List
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Cellosaurus">
<xsl:copy>
<xsl:apply-templates select="cell-line-list/cell-line"/>
</xsl:copy>
</xsl:template>
<xsl:template match="cell-line">
<xsl:apply-templates select="name-list"/>
</xsl:template>
<xsl:template match="name-list">
<xsl:apply-templates select="name"/>
</xsl:template>
<xsl:template match="name">
<xsl:copy>
<cell_num>
<xsl:value-of select="count(ancestor::cell-line/preceding-sibling::*)+1"/>
</cell_num>
<xsl:for-each select="@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
<name_value><xsl:value-of select="."/></name_value>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Web Page List
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Cellosaurus">
<xsl:copy>
<xsl:apply-templates select="cell-line-list/cell-line"/>
</xsl:copy>
</xsl:template>
<xsl:template match="cell-line">
<xsl:apply-templates select="web-page-list"/>
</xsl:template>
<xsl:template match="web-page-list">
<xsl:apply-templates select="url"/>
</xsl:template>
<xsl:template match="url">
<xsl:copy>
<cell_num>
<xsl:value-of select="count(ancestor::cell-line/preceding-sibling::*)+1"/>
</cell_num>
<xsl:for-each select="@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
<url_value><xsl:value-of select="."/></url_value>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
HLA List
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Cellosaurus">
<xsl:copy>
<xsl:apply-templates select="cell-line-list/cell-line"/>
</xsl:copy>
</xsl:template>
<xsl:template match="cell-line">
<xsl:apply-templates select="hla-lists/hla-list"/>
</xsl:template>
<xsl:template match="hla-list">
<xsl:apply-templates select="hla-gene"/>
</xsl:template>
<xsl:template match="hla-gene">
<hla_gene>
<cell_num>
<xsl:value-of select="count(ancestor::cell-line/preceding-sibling::*)+1"/>
</cell_num>
<xsl:for-each select="@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
<hla_value><xsl:value-of select="."/></hla_value>
</hla_gene>
</xsl:template>
</xsl:stylesheet>
Species List
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Cellosaurus">
<xsl:copy>
<xsl:apply-templates select="cell-line-list/cell-line"/>
</xsl:copy>
</xsl:template>
<xsl:template match="cell-line">
<xsl:apply-templates select="species-list/cv-term"/>
</xsl:template>
<xsl:template match="cv-term">
<species_list>
<cell_num>
<xsl:value-of select="count(ancestor::cell-line/preceding-sibling::*)+1"/>
</cell_num>
<xsl:for-each select="@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
<species_value><xsl:value-of select="."/></species_value>
</species_list>
</xsl:template>
</xsl:stylesheet>
Marker List
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Cellosaurus">
<xsl:copy>
<xsl:apply-templates select="cell-line-list/cell-line"/>
</xsl:copy>
</xsl:template>
<xsl:template match="cell-line">
<xsl:apply-templates select="str-list"/>
</xsl:template>
<xsl:template match="str-list">
<xsl:apply-templates select="marker-list"/>
</xsl:template>
<xsl:template match="marker-list">
<xsl:apply-templates select="marker"/>
</xsl:template>
<xsl:template match="marker">
<xsl:copy>
<cell_num>
<xsl:value-of select="count(ancestor::cell-line/preceding-sibling::*)+1"/>
</cell_num>
<xsl:for-each select="@*">
<xsl:element name="{name(.)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:for-each>
<xsl:copy-of select="marker-data-list/marker-data/alleles"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output
After the chain merge, values repeat for every unique row, similar to SQL joins of long tables (many-to-many). Note that df_list is still available as a named list of data frames, should you not want the merged output.
Just one comment: when you say "~109,000 cell lines with variations in missing data between each cell-line", you need to understand that the only mandatory fields in a Cellosaurus entry are the primary accession, the cell line name (identifier), the cell line category and the taxonomy; all the rest are not required. All this is described in the cellosaurus.xsd file using either minOccurs="0" or use="optional", depending on the type of field.

Combining XML Nodes into a single node with an XSLT

I'm trying to edit some XML with a transform but I'm struggling to achieve my desired results.
I have some XML:
<FX>
<Order ATTRIBUTE1="ACTIVE" ATTRIBUTE2="CCY" />
<Attribute NAME="N1" VALUE="V1" />
<Attribute NAME="N2" VALUE="V2" />
<Attribute NAME="N3" VALUE="V3" />
</FX>
And I want to transform it to look like:
<FX>
<Order ATTRIBUTE1="ACTIVE" ATTRIBUTE2="CCY" />
<Attribute NAME="N1, N2, N3" VALUE="V1,V2,V3" />
</FX>
Is this possible? Can anyone offer any suggestions on how to do this with a transform?
You can use the following ASP.NET-compatible XSLT 1.0 stylesheet to transform your source XML into your destination XML:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/FX">
<xsl:copy>
<xsl:copy-of select="Order" />
<Attribute>
<xsl:attribute name="NAME">
<xsl:for-each select="Attribute">
<xsl:value-of select="@NAME" />
<xsl:if test="position() != last()">
<xsl:text>, </xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:attribute>
<xsl:attribute name="VALUE">
<xsl:for-each select="Attribute">
<xsl:value-of select="@VALUE" />
<xsl:if test="position() != last()">
<xsl:text>,</xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:attribute>
</Attribute>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Its output is:
<FX>
<Order ATTRIBUTE1="ACTIVE" ATTRIBUTE2="CCY"/>
<Attribute NAME="N1, N2, N3" VALUE="V1,V2,V3"/>
</FX>
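The joining logic (a separator between items but not after the last one) can also be sketched in Python's standard library for comparison. This is only an illustration; `combine_attributes` is a hypothetical helper name, and the separators mirror the desired output in the question (", " for NAME, "," for VALUE).

```python
import xml.etree.ElementTree as ET

src = """<FX>
<Order ATTRIBUTE1="ACTIVE" ATTRIBUTE2="CCY" />
<Attribute NAME="N1" VALUE="V1" />
<Attribute NAME="N2" VALUE="V2" />
<Attribute NAME="N3" VALUE="V3" />
</FX>"""

def combine_attributes(root):
    """Merge all Attribute children of root into the first one."""
    items = root.findall("Attribute")
    if not items:
        return root
    first = items[0]
    # str.join places the separator only between items, like the
    # position() != last() test in the stylesheet.
    first.set("NAME", ", ".join(a.get("NAME", "") for a in items))
    first.set("VALUE", ",".join(a.get("VALUE", "") for a in items))
    # Drop the now-redundant siblings.
    for extra in items[1:]:
        root.remove(extra)
    return root

fx = combine_attributes(ET.fromstring(src))
combined = fx.find("Attribute")
print(combined.attrib)
```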
In general, if you want to transform some nodes but keep the rest, use the identity transformation template as the starting point and then add templates for the nodes you want to change:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="FX/Attribute[1]">
<xsl:copy>
<xsl:apply-templates select="@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="FX/Attribute[position() > 1]"/>
<xsl:template match="FX/Attribute[1]/@*">
<xsl:attribute name="{name()}">
<xsl:for-each select=". | ../following-sibling::Attribute/@*[name() = name(current())]">
<xsl:if test="position() > 1">,</xsl:if>
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
https://xsltfiddle.liberty-development.net/jyH9rNk

XML Rename node name and convert datetime to date

I have an XML file:
<!-- reference the stylesheet -->
<?xml-stylesheet type="text/xsl" href="Dates.xsl"?>
<user>
<dob>1992-02-22T00:00:00.0000000</dob>
</user>
I want to both rename the element name and return the date part of the date only to produce
<!-- reference the stylesheet -->
<?xml-stylesheet type="text/xsl" href="Dates.xsl"?>
<user>
<USER_DOB>1992-02-22</USER_DOB>
</user>
In my XSL file, this works to change the element name:
<xsl:template match="dob">
<USER_DOB><xsl:apply-templates select="node()"/></USER_DOB>
</xsl:template>
And this works to change the date:
<xsl:template match="dob">
<xsl:copy>
<xsl:call-template name="FormatDate">
<xsl:with-param name="DateTime" select="."/>
</xsl:call-template>
</xsl:copy>
</xsl:template>
<xsl:template name="FormatDate">
<xsl:param name="DateTime" />
<xsl:variable name="date">
<xsl:value-of select="substring-before($DateTime,'T')" />
</xsl:variable>
<xsl:if test="string-length($date) != 10">
<xsl:value-of select="$DateTime"/>
</xsl:if>
<xsl:if test="string-length($date) = 10">
<xsl:value-of select="$date"/>
</xsl:if>
</xsl:template>
I need to know how to combine both changes to produce a single output element with the renamed node and the formatted date.
Thanks,
Brevan
Simply have one template matching dob that does both: it emits the renamed element and calls FormatDate inside it.
<xsl:template match="dob">
<USER_DOB>
<xsl:call-template name="FormatDate">
<xsl:with-param name="DateTime" select="."/>
</xsl:call-template>
</USER_DOB>
</xsl:template>
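The logic of the FormatDate template can be restated as a small Python function, which may make its fallback behaviour easier to see. This is only a sketch; `format_date` is a hypothetical name that mirrors the template, not part of the stylesheet.

```python
def format_date(date_time: str) -> str:
    """Return the part before 'T' if it is 10 characters long, else the input unchanged."""
    # partition() returns an empty separator when 'T' is absent, which matches
    # substring-before() returning an empty string in that case.
    date, sep, _ = date_time.partition("T")
    return date if sep and len(date) == 10 else date_time

print(format_date("1992-02-22T00:00:00.0000000"))
```

Any value that lacks a "T", or whose date part is not exactly ten characters long, is passed through unchanged, just as in the template.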

How to split into 3 JSONs from single XML file using XSLT in Oxygen

My input XML has header, content and footer parts. Conversion from XML to JSON works well using XSLT, but I need the output in three parts: header, content and footer.
My input XML file is:
<header>
<trackingSettings>
<urlcode>W3333</urlcode>
<apiurl>http://mlucenter.com/like/api</apiurl>
</trackingSettings>
</header>
<mlu3_body>
<columnsCount>2</columnsCount>
<lineBackground>linear-gradient(to right, rgba(94, 172, 192, 0) 0%, c4cccf 50%, rgba(94, 172, 192, 0) 100%)</lineBackground>
</mlu3_body>
<footer>
<buttons>
<button/>
</buttons>
<banner/>
</footer>
The XSLT I am using:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes" method="xml" />
<xsl:template match="*">
<xsl:value-of select="name()"/> : <xsl:call-template name="Properties"/>
</xsl:template>
<xsl:template match="*" mode="ArrayElement">
<xsl:call-template name="Properties"/>
</xsl:template>
<xsl:template name="Properties">
<xsl:variable name="childName" select="name(*[1])"/>
<xsl:choose>
<xsl:when test="not(*|@*)">"<xsl:value-of select="."/>"</xsl:when>
<xsl:when test="count(*[name()=$childName]) > 1">{ "<xsl:value-of select="$childName"/>" :[<xsl:apply-templates select="*" mode="ArrayElement"/>] }</xsl:when>
<xsl:otherwise>{
<xsl:apply-templates select="@*"/>
<xsl:apply-templates select="*"/>
}</xsl:otherwise>
</xsl:choose>
<xsl:if test="following-sibling::*">,</xsl:if>
</xsl:template>
<xsl:template match="@*">"<xsl:value-of select="name()"/>" : '<xsl:value-of select="."/>',
</xsl:template>
</xsl:stylesheet>
I am using Saxon PE in Oxygen.
I want this XML converted to three JSON files named header.json, content.json (from mlu3_body) and footer.json.
Is this possible using XSLT, or do I need to keep all input files separate? Please provide some ideas.
Change the XSLT to
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:template match="*">
<xsl:value-of select="name()"/> : <xsl:call-template name="Properties"/>
</xsl:template>
<xsl:template match="*" mode="ArrayElement">
<xsl:call-template name="Properties"/>
</xsl:template>
<xsl:template name="Properties">
<xsl:variable name="childName" select="name(*[1])"/>
<xsl:choose>
<xsl:when test="not(*|@*)">"<xsl:value-of select="."/>"</xsl:when>
<xsl:when test="count(*[name()=$childName]) > 1">{ "<xsl:value-of select="$childName"/>" :[<xsl:apply-templates select="*" mode="ArrayElement"/>] }</xsl:when>
<xsl:otherwise>{
<xsl:apply-templates select="@*"/>
<xsl:apply-templates select="*"/>
}</xsl:otherwise>
</xsl:choose>
<xsl:if test="following-sibling::*">,</xsl:if>
</xsl:template>
<xsl:template match="@*">"<xsl:value-of select="name()"/>" : '<xsl:value-of select="."/>',
</xsl:template>
<xsl:template match="header">
<xsl:result-document href="header.json">
<xsl:next-match/>
</xsl:result-document>
</xsl:template>
<xsl:template match="mlu3_body">
<xsl:result-document href="content.json">
<xsl:next-match/>
</xsl:result-document>
</xsl:template>
<xsl:template match="footer">
<xsl:result-document href="footer.json">
<xsl:next-match/>
</xsl:result-document>
</xsl:template>
</xsl:stylesheet>
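and it should generate three result files, one for each of those three elements: each xsl:result-document opens a separate output file, and xsl:next-match hands the element on to the generic template that writes the JSON. You will have to edit your question and tell us exactly which result you want if the produced contents are not quite right yet. It is also not clear whether the snippet of XML you have shown is part of a larger document.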
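The idea of routing each top-level element to its own output can be sketched in Python's standard library as follows. Several things here are assumptions made for illustration only: the wrapping `<root>` element (the snippet in the question has no single root), the `TARGETS` table, the `to_obj` and `split_json` helpers, and the simplified JSON shape, which ignores attributes and keeps only the last of any repeated child names. The sketch returns the JSON text per file name rather than writing files.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical mapping from element name to output file name.
TARGETS = {"header": "header.json", "mlu3_body": "content.json", "footer": "footer.json"}

# The question's elements, shortened and wrapped in an assumed <root>.
SAMPLE = """<root>
<header><trackingSettings><urlcode>W3333</urlcode></trackingSettings></header>
<mlu3_body><columnsCount>2</columnsCount></mlu3_body>
<footer><buttons><button/></buttons><banner/></footer>
</root>"""

def to_obj(elem):
    """Leaf elements become strings; other elements become dicts of their children."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    return {child.tag: to_obj(child) for child in children}

def split_json(root):
    """Return {file name: JSON text} for every top-level element listed in TARGETS."""
    return {
        TARGETS[child.tag]: json.dumps({child.tag: to_obj(child)}, indent=2)
        for child in root
        if child.tag in TARGETS
    }

outputs = split_json(ET.fromstring(SAMPLE))
for name, text in outputs.items():
    print(name)
    print(text)
```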

Trim first 10 characters off the title of RSS feed

I'm trying to write some xsl to style an RSS feed. I need to trim the first 10 characters off the title of each item.
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/rss">
<ul>
<xsl:for-each select="channel/item">
<li><strong><xsl:value-of select="title"/>
</strong>
More</li>
</xsl:for-each>
</ul>
</xsl:template>
<xsl:template name="trimtitle">
<xsl:param name="string" select="." />
<xsl:if test="$string">
<xsl:text>Foo</xsl:text>
<xsl:call-template name="trimtitle">
<xsl:with-param name="string" select="substring($string, 10)" />
</xsl:call-template>
</xsl:if>
</xsl:template>
<xsl:template match="title">
<xsl:call-template name="title" />
<xsl:value-of select="." />
</xsl:template>
</xsl:stylesheet>
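I think you should look at how you call substring. Note that substring($string, 1, 10) keeps the first ten characters rather than removing them. To drop them, start at position 11 and omit the length:
substring($string, 11)
See the reference: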
http://www.zvon.org/xxl/XSLTreference/Output/function_substring.html
What are you doing in your trimtitle template?
Why are you calling trimtitle recursively?
The easiest way to show the title without its first ten characters is:
<xsl:value-of select="substring(title, 11)"/>
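XPath 1.0 counts string positions from 1, which makes the arguments of substring() easy to get wrong: a start of 0 with a length of 10, for example, yields only nine characters. The sketch below emulates the function in Python for integer arguments only (the real function also rounds non-integer arguments and handles NaN and infinity, which is omitted here); `xpath_substring` and the sample title are hypothetical.

```python
def xpath_substring(s, start, length=None):
    """Emulate XPath 1.0 substring() for integer start and length."""
    # XPath keeps the characters at 1-based positions p with
    # start <= p < start + length (or p >= start when no length is given).
    begin = max(start, 1)
    end = len(s) + 1 if length is None else start + length
    return s[begin - 1 : max(end - 1, begin - 1)]

title = "0123456789Headline"              # hypothetical feed title
print(xpath_substring(title, 11))         # drops the first ten characters
print(xpath_substring(title, 1, 10))      # keeps the first ten characters
print(xpath_substring(title, 0, 10))      # keeps only nine characters
```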
