I'm using ASP.NET, and I've got an XML file provided by a third-party website. I want to scrape it so that it only displays the first main node. My problem is that none of the nodes has an attribute. How can I remove the others?
Here is the XML:
<?xml version="1.0"?>
<offers>
<class_offer>
<name><![CDATA[Learn to surf and save 52% at Muriwai Surf School]]></name>
<url>http://domain.co.nz/wai-surf-school-just-29</url>
<location>Auckland</location>
</class_offer>
<class_offer>
<name><![CDATA[$35 for a 30 minute luxury Slipper Bath experience for TWO]]></name>
<url>http://domain.co.nz/uxury-slipper-bath-experience-for-two</url>
<location>Auckland</location>
</class_offer>
<class_offer>
<name><![CDATA[Save 52% at Te Aroha Mineral Spas]]></name>
<url>http://domain.co.nz/rience-for-two-PLUS-massage</url>
<location>Auckland</location>
</class_offer>
</offers>
I want it to look like this instead, keeping only the first offer (the last two "<class_offer>" elements and the "<location>" element have been removed):
<?xml version="1.0"?>
<offers>
<class_offer>
<name><![CDATA[Learn to surf and save 52% at Muriwai Surf School]]></name>
<url>http://domain.co.nz/wai-surf-school-just-29</url>
</class_offer>
</offers>
I really have no idea how to remove them when the nodes have no attributes. Any help would be great! Thanks in advance.
Off the top of my head:
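using System.Linq;
using System.Xml.Linq;

XDocument doc = XDocument.Parse(xml);   // the root element is <offers> itself
XElement root = doc.Root;
// Extensions.Remove() snapshots the sequence first, so removing while skipping is safe
root.Elements("class_offer").Skip(1).Remove();
// Drop <location> from the single <class_offer> that is left
root.Element("class_offer")?.Element("location")?.Remove();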
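For readers outside .NET, here is a rough Java equivalent of the same trimming that uses only the JDK's DOM API (javax.xml.parsers and javax.xml.transform). It is a sketch, not a drop-in replacement for the LINQ to XML answer: the class and method names (FirstOfferTrimmer, keepFirstOffer) are hypothetical, and it assumes the feed is already available as a string. What it shows is that no attributes are needed; elements can be selected purely by tag name and position.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstOfferTrimmer {

    /** Keeps only the first <class_offer> and drops its <location> child (hypothetical helper). */
    public static String keepFirstOffer(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // getElementsByTagName returns a live list, so delete from the end backwards
            NodeList offers = doc.getElementsByTagName("class_offer");
            for (int i = offers.getLength() - 1; i >= 1; i--) {
                removeNode(offers.item(i));
            }

            // Only one offer is still attached to the document; remove its <location> elements
            NodeList locations = doc.getElementsByTagName("location");
            for (int i = locations.getLength() - 1; i >= 0; i--) {
                removeNode(locations.item(i));
            }

            // Serialize the trimmed DOM back to a string
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not trim the offers XML", e);
        }
    }

    private static void removeNode(Node node) {
        node.getParentNode().removeChild(node);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><offers>"
                + "<class_offer><name><![CDATA[Learn to surf and save 52% at Muriwai Surf School]]></name>"
                + "<url>http://domain.co.nz/wai-surf-school-just-29</url><location>Auckland</location></class_offer>"
                + "<class_offer><name><![CDATA[Save 52% at Te Aroha Mineral Spas]]></name>"
                + "<url>http://domain.co.nz/rience-for-two-PLUS-massage</url><location>Auckland</location></class_offer>"
                + "</offers>";
        System.out.println(keepFirstOffer(xml));
    }
}
```

Because getElementsByTagName returns a live NodeList, both loops walk backwards so that removing a node never shifts the indices still to be visited. Whitespace text nodes between removed elements are left in place, so indented input may produce blank lines in the output.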
Hello everyone.
I am new to Atom and use it to view XML files. (I haven't set up any additional packages yet; version 1.19.4.)
One of my XML files contains elements with many attributes. For example:
<book id="test_xml">
<class name="First_row" attrib_01="Grape" attrib_02="Apple" attrib_03="banana" attrib_04="Water melon" attrib_05="Orange" ... (and so on )
</book>
Every <class> element has at least 50 attributes.
When I first opened this XML file in Atom, it showed every class on a single line (which is what I want). But when I edited an attribute value ("Melon" to "Apple"), Atom suddenly broke the line and displayed it on multiple lines, like this:
<book id="Fruit">
<class name="First_row" attrib_01="Grape" attrib_02="Apple"
attrib_03="banana" attrib_04="Water melon"
attrib_05="Orange" ... (and so on )
</book>
Without changing the XML format, how can I prevent Atom from splitting the single line into multiple lines?
Thank you.
I'm working with a well-structured XML file. So far, I have successfully accessed elements of this dataset that are only one layer/subfield deep. However, now I need to access one type of data that is more deeply embedded within this data structure, and the expected method is not working...
Here is an excerpt from the XML data. This is the "targets" field I need to access; each node (i.e. drug) can have between 0 and N targets (I am arbitrarily setting N to 20 for now, since I don't know the maximum for the entire dataset):
<targets> --> 51st field in each node
<target> --> there are a variable number of targets per drug
<id>BE0000048</id> --> this is the value I want for each Target
<name>Prothrombin</name>
<organism>Human</organism>
<actions>
<action>inhibitor</action>
</actions>
<references>
<articles>
<article>
<pubmed-id>10505536</pubmed-id>
<citation>Turpie AG: Anticoagulants in acute coronary syndromes.
...
I have determined that the Targets field I need is field 51 within each node's structure, hence the hard-coded value below. I would expect the id of the j-th target within the i-th node's Targets field to be at index [[i]][[51]][[j]][[1]] or [[i]][[51]][[j]][['id']].
This is my code that isn't working as expected:
Target <- array(1:NumNodes, dim=c(1,NumNodes,MaxTargets))
for (i in 1:NumNodes){
for (j in 1:MaxTargets){
Target[i][j] <- Data[[i]][[51]][[j]][[1]]
}
}
The behavior I'm seeing is that I can extend the subscripts out numerous levels on the command line, and never narrow the result any more than the following:
> Data[[1]][[51]][[1]][[1]][[1]][[1]][[1]][[1]][[1]][[1]]
[1] "BE0000048ProthrombinHumaninhibitor10505536Turpie AG: Anticoagulants...
It doesn't seem to matter how many subscripts I add; all of the fields in the target subfield are always concatenated and cannot be separated...
Confusingly, when I run my code, I get the following error message:
Error in Data[[i]][[51]][[1]] : subscript out of bounds
... which doesn't seem to make sense, given that I am limiting i to the number of nodes, and that no error is thrown for even the ridiculously long list of subscripts shown above when I run that expression on the command line...
Thanks in advance for any insights you can provide.
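The concatenated string above looks like what you get when a node's value is read as the text of all its descendants. As far as I know, the XML package's xmlValue() behaves this way on an element with children, so if Data was built as in the follow-up code (xmlSApply(x, xmlValue)), every field of a drug is already one plain string. In R, [[1]] on a length-one character vector returns that same string, which would explain why extra subscripts never narrow the result, while [[2]] on it gives "subscript out of bounds". The Java sketch below shows the same effect with the DOM counterpart, Node.getTextContent(), on a cut-down <target> from the excerpt: once flattened, the <id> boundary is gone, so the id has to be selected as an element before its text is read. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class TextFlatteningDemo {

    // Cut-down <target> from the excerpt, written without whitespace between tags
    public static final String TARGET_XML =
            "<target><id>BE0000048</id><name>Prothrombin</name><organism>Human</organism>"
            + "<actions><action>inhibitor</action></actions></target>";

    public static Element parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** All descendant text joined together: the boundaries between children are lost. */
    public static String flattened(Element target) {
        return target.getTextContent();
    }

    /** Selecting the <id> element first, then reading its text, keeps the value separate. */
    public static String targetId(Element target) {
        return target.getElementsByTagName("id").item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element target = parse(TARGET_XML);
        System.out.println(flattened(target)); // BE0000048ProthrombinHumaninhibitor
        System.out.println(targetId(target));  // BE0000048
    }
}
```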
Thanks for your suggestion, cderv; I plan to check out the xml2 package and XPath. I really appreciate your willingness to provide an example.
I am pasting what should be a functional subset of my XML file; however, now instead of the "targets" field being the 51st field, it is the sixth. Again, it is the targets --> target --> id value that I want to report for each target, with each node having a variable number of target values. My code follows the XML content.
<?xml version="1.0" encoding="UTF-8"?>
<drugbank xmlns="http://www.drugbank.ca" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.drugbank.ca http://www.drugbank.ca/docs/drugbank.xsd" version="5.0" exported-on="2017-07-06">
<drug type="biotech" created="2005-06-13" updated="2016-08-17">
<drugbank-id primary="true">DB00001</drugbank-id>
<drugbank-id>BTD00024</drugbank-id>
<drugbank-id>BIOD00024</drugbank-id>
<name>Lepirudin</name>
<description>Lepirudin is identical to natural hirudin except for substitution of leucine for isoleucine at the N-terminal end of the molecule and the absence of a sulfate group on the tyrosine at position 63. It is produced via yeast cells. Bayer ceased the production of lepirudin (Refludan) effective May 31, 2012.</description>
<targets>
<target>
<id>BE0000048</id>
<name>Prothrombin</name>
<organism>Human</organism>
<actions>
<action>inhibitor</action>
</actions>
<references>
<articles>
<article>
<pubmed-id>10505536</pubmed-id>
<citation>Turpie AG: Anticoagulants in acute coronary syndromes. Am J Cardiol. 1999 Sep 2;84(5A):2M-6M.</citation>
</article>
<article>
<pubmed-id>10912644</pubmed-id>
<citation>Warkentin TE: Venous thromboembolism in heparin-induced thrombocytopenia. Curr Opin Pulm Med. 2000 Jul;6(4):343-51.</citation>
</article>
</articles>
</references>
<known-action>yes</known-action>
</target>
</targets>
</drug>
</drugbank>
Now that I have truncated the file so heavily, my code reports that any subscript beyond Data[[1]][[1]] is out of bounds, but hopefully it gives you an idea of what I'm aiming to do...
library(XML)
# Save the database file as a tree structure
xmldata = xmlRoot(xmlTreeParse("DrugBank_TruncatedDatabase_v4_Tiny.xml"))
# Number of nodes in the entire database file
NumNodes <- xmlSize(xmldata)
MaxTargets <- 20
Data <- xmlSApply(xmldata, function(x) xmlSApply(x, xmlValue))
Target <- array(1:NumNodes, dim=c(1,NumNodes,MaxTargets))
for (i in 1:NumNodes){
for (j in 1:MaxTargets){
Target[i][j] <- Data[[i]][[5]][[j]][[1]]
}
}
Thanks for your input!
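Along the lines of the XPath suggestion mentioned above, here is a sketch in Java (JDK only: javax.xml.parsers and javax.xml.xpath) that selects targets by element name rather than by position, so neither a hard-coded field index (51 or 6) nor a MaxTargets limit is needed. Two assumptions to flag: the file declares the default namespace xmlns="http://www.drugbank.ca", which is why the paths use local-name() instead of bare element names, and each drug is keyed by its drugbank-id with primary="true". DrugTargets and targetIdsByDrug are hypothetical names. The XPath expressions themselves are plain XPath 1.0, so they should also be usable from R's xml2 (xml_find_all).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DrugTargets {

    /** Maps each drug's primary DrugBank id to the ids of all its targets (0..N per drug). */
    public static Map<String, List<String>> targetIdsByDrug(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the file uses a default namespace
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // local-name() matches elements regardless of their namespace
            NodeList drugs = (NodeList) xpath.evaluate(
                    "/*[local-name()='drugbank']/*[local-name()='drug']", doc, XPathConstants.NODESET);

            Map<String, List<String>> result = new LinkedHashMap<>();
            for (int i = 0; i < drugs.getLength(); i++) {
                Node drug = drugs.item(i);
                String drugId = xpath.evaluate("*[local-name()='drugbank-id'][@primary='true']", drug);
                // Path by element name, not by position, so the field order no longer matters
                NodeList ids = (NodeList) xpath.evaluate(
                        "*[local-name()='targets']/*[local-name()='target']/*[local-name()='id']",
                        drug, XPathConstants.NODESET);
                List<String> targetIds = new ArrayList<>();
                for (int j = 0; j < ids.getLength(); j++) {
                    targetIds.add(ids.item(j).getTextContent().trim());
                }
                result.put(drugId, targetIds);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read drug targets", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<drugbank xmlns=\"http://www.drugbank.ca\">"
                + "<drug><drugbank-id primary=\"true\">DB00001</drugbank-id>"
                + "<targets><target><id>BE0000048</id><name>Prothrombin</name></target></targets></drug>"
                + "</drugbank>";
        System.out.println(targetIdsByDrug(xml)); // {DB00001=[BE0000048]}
    }
}
```

The map keeps drugs in document order, and a drug without a <targets> element simply maps to an empty list.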
I have the flat file below, and I'm having trouble building a schema for this layout. I rearranged it to have a header and a detail and created an application with no problems, but the customer won't change the layout. This is probably pretty basic, but I'm a beginner. How do I take certain fields from this layout and create a header and a detail? The last field (the date) needs to go in the header, so you can see how randomly the fields are arranged.
PO207730CO|1271|customer 1|john doe|1|161075|161075|BROOM FLAGGED LOBBY|2|5.62|24-Feb-2014|
PO207730CO|1271|customer 1|john doe|2|167316|167316|CLEANER DISPATCH SPRAY HOSPITAL DISINFECTANT W/BLEACH|1|59.84|24-Feb-2014|
PO207730CO|1271|customer 1|john doe|3|162175|162175|DUST PAN LOBBY|2|6.26|24-Feb-2014|
PO207730CO|1271|customer 1|john doe|4|163325|163325|MOP WET LARGE GENERAL-PURPOSE BLUE WB/LP|1|18.45|24-Feb-2014|
PO207730CO|1271|customer 1|john doe|5|164715|164715|SOAP PROVON MEDICATED TFX|1|32.79|24-Feb-2014|
PO207730CO|1271|customer 1|john doe|6|166338|166338|TOWEL MULTI-FOLD SCOTT WHITE|5|18.91|24-Feb-2014|
PO207814CO|1264|customer 2|jane doe|1|Cups||Bib 20x35 2 Ply Lab (756220)|1|17.47|24-Feb-2014|
PO207814CO|1264|customer 2|jane doe|2|Cups||Cup 9oz Translucent (098219)|1|24-Feb-2014|
PO207814CO|1264|customer 2|jane doe|3|Cups||Cup Foam 16oz (177190)|2|35.1|24-Feb-2014|
PO207814CO|1264|customer 2|jane doe|4|Cups||Lid 16/20 Whte Tab W/Sslot (194088)|2|16.57|24-Feb-2014|
PO207814CO|1264|customer 2|jane doe|5|Cups||Tissue 2-Ply 100-Sht (343227)|3|16.38|24-Feb-2014|
The basic problem here is that the Flat File Disassembler does not support splitting/debatching based on changing values (PO207730CO -> PO207814CO, for example).
So you'll have to regroup by PO number in a later step.
You have a few options:
Use a custom XSLT map to group the lines by PO number, then split them, for example by using a Receive Pipeline in an Orchestration.
https://social.technet.microsoft.com/wiki/contents/articles/17985.xslt-muenchian-grouping-biztalk-complex-transformation.aspx
Use an XPath debatching pattern in an Orchestration.
http://www.biztalkgurus.com/biztalk_server/biztalk_2004/m/biztalk_2004_samples/32438.aspx
Either way, you would parse the flat file as you are now, row by row.
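The regrouping step both options rely on can be illustrated outside BizTalk. The Java sketch below is not a BizTalk map or pipeline component; it only shows the grouping logic that a Muenchian-grouping XSLT map would perform: rows are keyed by PO number, the repeated columns plus the trailing date become one header, and the remaining columns become detail lines. The class and field names are guesses from the sample data, and the sketch assumes the date is always the last field and that the price may be missing, as in the "Cup 9oz Translucent" row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PoGrouper {

    /** Values repeated on every row of a PO, plus the trailing date (field names are guesses). */
    public static class Header {
        public final String poNumber, customerId, customerName, contact, date;
        public final List<Detail> details = new ArrayList<>();

        Header(String poNumber, String customerId, String customerName, String contact, String date) {
            this.poNumber = poNumber;
            this.customerId = customerId;
            this.customerName = customerName;
            this.contact = contact;
            this.date = date;
        }
    }

    /** Values that change per line; price is empty when the row omits it. */
    public static class Detail {
        public final String lineNo, item, description, quantity, price;

        Detail(String lineNo, String item, String description, String quantity, String price) {
            this.lineNo = lineNo;
            this.item = item;
            this.description = description;
            this.quantity = quantity;
            this.price = price;
        }
    }

    /** Groups pipe-delimited rows into one header per PO number, preserving file order. */
    public static Map<String, Header> group(List<String> rows) {
        Map<String, Header> orders = new LinkedHashMap<>();
        for (String row : rows) {
            if (row.trim().isEmpty()) {
                continue;
            }
            // split() drops the trailing empty field after the final '|'
            String[] f = row.split("\\|");
            String date = f[f.length - 1];              // the date is always the last field
            String price = f.length >= 11 ? f[9] : "";  // 10-field rows have no price
            Header header = orders.computeIfAbsent(f[0],
                    po -> new Header(po, f[1], f[2], f[3], date));
            header.details.add(new Detail(f[4], f[5], f[7], f[8], price));
        }
        return orders;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "PO207730CO|1271|customer 1|john doe|1|161075|161075|BROOM FLAGGED LOBBY|2|5.62|24-Feb-2014|",
                "PO207814CO|1264|customer 2|jane doe|2|Cups||Cup 9oz Translucent (098219)|1|24-Feb-2014|");
        for (Header h : group(rows).values()) {
            System.out.println(h.poNumber + " " + h.customerName + " " + h.date
                    + ": " + h.details.size() + " line(s)");
        }
    }
}
```

A LinkedHashMap keeps the POs in file order, and computeIfAbsent builds each header from the first row seen for that PO.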
Thanks for the replies.
I went with an SSIS package to preload a table and build the pipe-delimited file. BizTalk then picks up that file and debatches it the way I need.
I'm working with Spring MVC for portlets and Liferay tabs, and I'm having trouble putting spaces in the tab titles. Let's say I want to define something like this inside a JSP:
<liferay-ui:tabs
names="Sample Tab 1, Sample Tab 2"
refresh="false"
value="${myControllerValue}"
>
<liferay-ui:section>
<jsp:include page='/jsp/myPage1.jsp' flush="true"/>
</liferay-ui:section>
<liferay-ui:section>
<jsp:include page='/jsp/myPage2.jsp' flush="true"/>
</liferay-ui:section>
</liferay-ui:tabs>
This doesn't work at all (even though it's exactly the example from the documentation), and the problem is just the spaces in the names. It works fine if I use names="tab1,tab2", but that's not what I want to show in the tab titles.
Besides, I need to control which tab is shown from the controller, something like this:
if(whatever){
renderrequest.setAttribute("myControllerValue", "Sample Tab 1");
}
This causes another problem: I need to show the tab names in several languages, so I'd have to pass the tab name in the current locale's language to match the id in the JSP. The best approach would be to separate the title from the tab id and use the tabValues param, but I have no idea how to do that...
I read something about redefining Language-ext.properties, but I only import the taglib:
<%@ taglib prefix="liferay-ui" uri="http://liferay.com/tld/ui" %>
So I don't have that properties file, and I have no clue how to solve this.
I'd really appreciate any kind of help with this issue.
Thanks in advance!
EDIT:
When I try to apply the answer posted below, I get the following error:
07:26:12,297 ERROR [PortletLocalServiceImpl:542] com.liferay.portal.kernel.xml.DocumentException: Error on line 20 of document : cvc-complex-type.2.4.a: Invalid content was found starting with element 'resource-bundle'. One of '{"http://java.sun.com/xml/ns/portlet/portlet-app_2_0.xsd":portlet-info, "http://java.sun.com/xml/ns/portlet/portlet-app_2_0.xsd":portlet-preferences, "http://java.sun.com/xml/ns/portlet/portlet-app_2_0.xsd":security-role-ref, "http://java.sun.com/xml/ns/portlet/portlet-app_2_0.xsd":supported-processing-event, "http://java.sun.com/xml/ns/portlet/portlet-app_2_0.xsd":supported-publishing-event, "http://java.sun.com/xml/ns/portlet/portlet-app_2_0.xsd":supported-public-render-parameter, "http://java.sun.com/xml/ns/portlet/portlet-app_2_0.xsd":container-runtime-option}' is expected.
And this is my portlet.xml file:
<portlet-app
version="2.0"
xmlns="http://java.sun.com/xml/ns/portlet/portlet-app_2_0.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/portlet/portlet-app_2_0.xsd http://java.sun.com/xml/ns/portlet/portlet-app_2_0.xsd">
<portlet>
<portlet-name>MyAPP</portlet-name>
<display-name>MyAPP</display-name>
<portlet-class>org.springframework.web.portlet.DispatcherPortlet</portlet-class>
<init-param>
<name>contextConfigLocation</name>
<value>/WEB-INF/portlet/MyAPP-portlet.xml</value>
</init-param>
<expiration-cache>0</expiration-cache>
<supports>
<mime-type>text/html</mime-type>
<portlet-mode>VIEW</portlet-mode>
</supports>
<supported-locale>gl_ES</supported-locale>
<supported-locale>es_ES</supported-locale>
<resource-bundle>messages</resource-bundle>
<resource-bundle>content/Language</resource-bundle>
<portlet-info>
<title>MyAPP</title>
<short-title>MyAPP</short-title>
<keywords>MyAPP</keywords>
</portlet-info>
</portlet>
</portlet-app>
You can use Language.properties to have the exact title names you want in the tab.
In the <liferay-ui:tabs> you can have:
<liferay-ui:tabs
names="sample-tab-1, sample-tab-2"
refresh="false"
value="${myControllerValue}"
>
These names are simply keys in the Language.properties files:
sample-tab-1=Sample Tab 1
sample-tab-2=Sample Tab 2
You can define the Language.properties file in the portlet.xml as:
<portlet>
...
...
<resource-bundle>content/Language</resource-bundle>
...
</portlet>
And this file and other Language files would reside in the source package inside the content folder, something like this:
docroot
|
|--> src
|
|--> content
|--> Language.properties
|--> Language_en.properties
|--> Language_ja.properties
|--> Language_de.properties
...
So in your controller you would use:
if(whatever){
renderrequest.setAttribute("myControllerValue", "sample-tab-1");
}
More about Liferay Localization.
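To make the key/title separation concrete, here is a small Java sketch using plain java.util.ResourceBundle rather than any Liferay API: the tab value the controller sets is always the key (sample-tab-1), and only the looked-up title changes per locale. Building the bundles from strings stands in for the real content/Language*.properties files, the Spanish titles are hypothetical, and falling back to the key when it is missing is a choice made in this sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class TabTitles {

    /** Builds a bundle from properties text; stands in for a content/Language*.properties file. */
    public static ResourceBundle bundle(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalArgumentException("Invalid properties text", e);
        }
    }

    /** Resolves a tab key to its display title, falling back to the key itself if it is missing. */
    public static String title(ResourceBundle bundle, String key) {
        return bundle.containsKey(key) ? bundle.getString(key) : key;
    }

    public static void main(String[] args) {
        ResourceBundle english = bundle("sample-tab-1=Sample Tab 1\nsample-tab-2=Sample Tab 2\n");
        // Hypothetical Spanish titles for the es_ES locale
        ResourceBundle spanish = bundle("sample-tab-1=Pestaña de ejemplo 1\nsample-tab-2=Pestaña de ejemplo 2\n");

        // The controller always sets the key, never a translated title
        String selectedTab = "sample-tab-1";
        System.out.println(title(english, selectedTab)); // Sample Tab 1
        System.out.println(title(spanish, selectedTab));
    }
}
```

In a real portlet the per-locale files would be loaded from the classpath instead, for example with ResourceBundle.getBundle("content.Language", locale).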
Here is an example of the XQuery output that I get:
<clinic>
<Name xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Healthy Kids Pediatrics</Name>
<Address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">510 W 27th St, Los Angeles, CA 90007</Address>
<PhoneNumberList>213-555-5845</PhoneNumberList>
<NumberOfPatientGroups>2</NumberOfPatientGroups>
</clinic>
As you can see, strange xmlns:xsi declarations are being added to the <Name> and <Address> tags.
The funny thing is that if I go to the top of my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="vaccination.xsl"?>
<Vaccination xsi:noNamespaceSchemaLocation="vaccination.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
and remove the declaration
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
then my XQuery output looks like this (which is what I want):
<clinic>
<Name>Healthy Kids Pediatrics</Name>
<Address>510 W 27th St, Los Angeles, CA 90007</Address>
<PhoneNumberList>213-555-5845</PhoneNumberList>
<NumberOfPatientGroups>2</NumberOfPatientGroups>
</clinic>
But when I view my XML in the browser, it gives an error like this:
XML Parsing Error: prefix not bound to a namespace
Location: file:///C:/Users/Pac/Desktop/csci585-hw3/vaccination.xml
Line Number 3, Column 1:<Vaccination xsi:noNamespaceSchemaLocation="vaccination.xsd">
^
Does anyone know how to remove those xmlns:xsi declarations from my XQuery output without breaking my XML/XSL?
Removing the namespace declaration from the top node makes the XML document invalid, as the xsi prefix is used but not declared. This should cause an error when you try to load the document in a query.
I assume that the Name and Address nodes are copied directly from the source document and the other nodes are constructed.
When copying a node from the source document, the in-scope namespaces of the source node are combined with the in-scope namespaces of the node that contains the copy. The way these are combined is specified by the copy-namespaces-mode.
In your case you want namespaces to be inherited from the parent node (the node in the query), but you do not want to preserve namespaces in the source document where they are unnecessary.
This can be achieved by adding the following line to the top of the query:
declare copy-namespaces no-preserve, inherit;