XmlDocument difficulties returning specific nodes/elements - xml-namespaces

First post; I hope it follows the rules for asking questions.
I'm having trouble with an XML document (it's XML returned by an API). The API uses a number of HTTP-based security measures, which I have worked through, and I am now able to return the top tier of nodes that are not nested.
However, a few nodes are nested under these, and I need to return some of their values.
I'm set on using XmlDocument to do this, and I'm not interested in using XPath.
I should also note that I'm using the .NET 4.5 environment.
Example XML
<?xml version="1.0" encoding="utf-8"?>
<results>
<Info xmlns="http://xmlns.namespace">
<title>This Title</title>
<ref>
<SetId>317</SetId>
</ref>
<source>
<name>file.xxx</name>
<type>thisType</type>
<hash>cc7b99599c1bebfc4b8f12e47aba3f76</hash>
<pers>65.97602</pers>
<time>02:20:02.8527777</time>
</source>
....... Continuation which is same as above
OK, so above is the XML that gets returned from the API. I can return the title node with no problem. What I would also like to return is any of the node values in the source element, for example the pers node value. But I only want to return one (as there are many further down in the XML).
Please note that there is an xmlns on the Info node, which may be what is preventing me from returning the values.
So here is my code:
using (var response = (HttpWebResponse) request.GetResponse())
{
//Get the response stream
using (Stream stream = response.GetResponseStream())
{
if (stream != null)
{
var xDoc = new XmlDocument();
var nsm = new XmlNamespaceManager(xDoc.NameTable);
nsm.AddNamespace("ns", XmlNamespace);
//Read the response stream
using (XmlReader xmlReader = XmlReader.Create(stream))
{
// This is straight forward, we just need to read the XML document and return the bits we need.
xDoc.Load(xmlReader);
XmlElement root = xDoc.DocumentElement;
var cNodes = root.SelectNodes("/results/ns:Info", nsm);
//Create a new instance of Info so that we can store any data found in the Info Properties.
var info = new Info();
// Now we have a collection of Info objects
foreach (XmlNode node in cNodes)
{
// Do some parsing or other relevant filtering here
var title = node["title"];
if (title != null)
{
info.Title = title.InnerText;
_logger.Info("This is the title returned ############# {0}", info.Title);
}
//This is the bit that is killing me, as I can't return any of the values in the sub nodes
XmlNodeList sourceNodes = node.SelectNodes("source");
foreach (XmlNode sn in sourceNodes)
{
XmlNode source = sn.SelectSingleNode("source");
{
var pers = root["pers"];
if (pers != null) info.pers = pers.InnerText;
_logger.Info("############FPS = {0}", info.pers);
}
}
}
}
Thanks in advance for any help

So I finally figured it out.
Here is the code that gets the subnodes. Basically, I wasn't using my namespace identifier or my namespace manager when selecting the "source" subnode.
For anybody else in this situation:
When you declare your namespace there are two parts to it: a namespace identifier (prefix), which can be anything you want (in my case I chose "ns"), and the actual namespace URI, which appears in the XML file in an xmlns attribute and will look something like "http://xmlns.mynamespace".
So when searching for subnodes below the top level, you need to use that prefix and pass the namespace manager for each node you want to select.
// Get the <source> subnode using the namespace prefix so its child values can be read
var source = node.SelectSingleNode("ns:source", nsm);
if (source != null)
{
info.SourceType = source["type"].InnerText;
info.Pers = source["pers"].InnerText;
_logger.Info("This SourceNode is {0}", info.SourceType);
_logger.Info("This PersNode is {0}", info.Pers);
}
I hope this helps somebody else that's chasing their tails as I have.
Thanks
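For anyone who wants to try this outside the API call, here is a minimal, self-contained sketch of the same idea. It assumes a trimmed copy of the sample XML from the question and the "ns" prefix used above. The NamespaceDemo class and its ReadPers helper are hypothetical names for illustration only, not part of the original code.

```csharp
using System;
using System.Xml;

public static class NamespaceDemo
{
    // Trimmed copy of the sample XML from the question (assumption: same shape as the API response).
    public const string SampleXml =
        "<results>" +
        "<Info xmlns=\"http://xmlns.namespace\">" +
        "<title>This Title</title>" +
        "<source><type>thisType</type><pers>65.97602</pers></source>" +
        "</Info>" +
        "</results>";

    // Returns the text of the first <pers> element, or null if none is found.
    public static string ReadPers(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Map an arbitrary prefix ("ns") to the namespace URI declared on <Info>.
        var nsm = new XmlNamespaceManager(doc.NameTable);
        nsm.AddNamespace("ns", "http://xmlns.namespace");

        // <results> has no namespace; <Info> and everything below it inherit the
        // default namespace, so each of those steps needs the prefix.
        XmlNode pers = doc.SelectSingleNode("/results/ns:Info/ns:source/ns:pers", nsm);
        return pers != null ? pers.InnerText : null;
    }

    public static void Main()
    {
        Console.WriteLine(ReadPers(SampleXml));
    }
}
```

SelectSingleNode returns only the first match, which fits the requirement of reading just one pers value even when the document contains many.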

Related

scraping html without htmlagilitypack

Due to a limitation of the system, I am not allowed to use HtmlAgilityPack, as I don't have the rights to reference the library. So I can only use what is built into ASP.NET to parse the page.
E.g. I want to scrape this page https://sg.linkedin.com/job/google/jobs/ to get the list of Google jobs (just an example; I am not really planning to get this list but my own company's). I can see them in the page markup. How can I extract these job descriptions and names?
My current code is:
System.Net.WebClient client = new System.Net.WebClient();
try{
System.IO.Stream myStream = client.OpenRead("https://sg.linkedin.com/job/google/jobs/");
System.IO.StreamReader sr = new System.IO.StreamReader(myStream);
string htmlContent = sr.ReadToEnd();
//do not know how to carry on
}catch(Exception e){
Response.Write(e.Message);
}
How can I carry on?
You can fetch that page and use a regular expression to isolate the useful parts. If you get real lucky, you may have a valid XML file:
var html = new WebClient().DownloadString("https://sg.linkedin.com/job/google/jobs/");
var jobs = new XmlDocument();
jobs.LoadXml(Regex.Replace(Regex.Match(html,
@"<ul class=""jobs"">[\s\S]*?</ul>").Value,
@"itemscope | itemprop="".*?""", "")); // clean invalid attributes
foreach (XmlElement job in jobs.SelectNodes("//li[@class='job']"))
{
Console.WriteLine(job.SelectSingleNode(".//a[@class='company']").InnerText);
Console.WriteLine(job.SelectSingleNode(".//h2/a").InnerText);
Console.WriteLine(job.SelectSingleNode(".//p[@class='abstract']").InnerText);
Console.WriteLine();
}

Reading all components from folder and subfolder

I am working on Tridion 2009 using .NET Templating (C# 2.0).
I need to read all the components from a folder and its subfolders.
If in my code I write:
OrganizationalItem imageFolder =
(OrganizationalItem)m_Engine.GetObject(comp.OrganizationalItem.Id);
I am able to read all the components in the subfolders below the folder where the indicator component is present, but I am not able to read the other components present in that same folder.
But if I write
OrganizationalItem imageFolder = (OrganizationalItem)m_Engine.GetObject(
comp.OrganizationalItem.OrganizationalItem.Id);
then I am able to read only the folder where the indicator component is present.
Below is my code.
XmlDocument doc = xBase.createNewXmlDocRoot("ImageLibrary");
XmlElement root = doc.DocumentElement;
Filter filter = new Filter();
Component comp = this.GetComponent();
filter.Conditions["ItemType"] = ItemType.Folder;
filter.Conditions["Recursive"] = "true";
OrganizationalItem imageFolder =
(OrganizationalItem)m_Engine.GetObject(comp.OrganizationalItem.Id);
XmlElement itemList = imageFolder.GetListItems(filter);
foreach (XmlElement itemImg in itemList)
{
filter.Conditions["ItemType"] = ItemType.Component;
filter.Conditions["BasedOnSchema"] = comp.Schema.Id;
OrganizationalItem imgFolder =
(OrganizationalItem)m_Engine.GetObject(itemImg.GetAttribute("ID")
.ToString());
XmlElement imageLibs = imgFolder.GetListItems(filter);
doc = this.createImageNodes(imageLibs, doc, filter, comp);
foreach (XmlElement imglib in imageLibsList)
{
XmlElement imageroot = doc.CreateElement("Image");
XmlElement uploadeddateNode = doc.CreateElement("DateUploaded");
Component imgComp =
(Component)m_Engine.GetObject(imglib.GetAttribute("ID"));
}
}
Please suggest.
I see a lot of superfluous code in your snippet with respect to the question "Reading all components from folder and subfolder".
But answering the question itself, when you are doing:
OrganizationalItem imageFolder = (OrganizationalItem)m_Engine.GetObject(comp.OrganizationalItem.Id);
You are not able to read the components present in that folder, because you previously set the filter to folders only on the following line:
filter.Conditions["ItemType"] = ItemType.Folder;
Solution:
If you want to retrieve all components in the "indicator component" folder and below, you need to set the filter on your first search as follows:
filter.Conditions["Recursive"] = "true";
filter.Conditions["ItemType"] = ItemType.Component;
filter.Conditions["BasedOnSchema"] = comp.Schema.Id;
And perform the search:
OrganizationalItem imageFolder = (OrganizationalItem)m_Engine.GetObject(comp.OrganizationalItem.Id);
XmlElement itemList = imageFolder.GetListItems(filter);
Pretty basic stuff. Try to avoid using the Filter class, since it was deprecated in 2009, and use GetListItems as much as possible, as fetching lists is ALWAYS faster.
public class GetComponentsInSameFolder : ITemplate
{
public void Transform(Engine engine, Package package)
{
TemplatingLogger log = TemplatingLogger.GetLogger(GetType());
if (package.GetByName(Package.ComponentName) == null)
{
log.Info("This template should only be used with Component Templates. Could not find component in package, exiting");
return;
}
var c = (Component)engine.GetObject(package.GetByName(Package.ComponentName));
var container = (Folder)c.OrganizationalItem;
var filter = new OrganizationalItemItemsFilter(engine.GetSession()) { ItemTypes = new[] { ItemType.Component } };
// Always faster to use GetListItems if we only need limited elements
foreach (XmlNode node in container.GetListItems(filter))
{
string componentId = node.Attributes["ID"].Value;
string componentTitle = node.Attributes["Title"].Value;
}
// If we need more info, use GetItems instead
foreach (Component component in container.GetItems(filter))
{
// If your filter is messed up, GetItems will return objects that may
// not be a Component, in which case the code will blow up with an
// InvalidCastException. Be careful with filter.ItemTypes[]
Schema componentSchema = component.Schema;
SchemaPurpose purpose = componentSchema.Purpose;
XmlElement content = component.Content;
}
}
}
I'd think you'd want to collect sub folders and recursively call your function for each of them, which seems like what you're trying to achieve.
Is this function called createImageNodes() and where do you set imageLibsList?
It looks like you're treating each item as a folder in your first loop, what about the components?

Iterating through xmltextreader

I have an XML document in the following format.
<?xml version="1.0" encoding="UTF-8" standalone= "yes"?>
<rss>
<report name="rpt1">
<title>AAA</title>
<image></image>
<weblink></weblink>
<pdflink></pdflink>
<pdfsize></pdfsize>
</report>
<report name="rpt2">
<title>BBB</title>
<image>CCC</image>
<weblink>DDD</weblink>
<pdflink>EEE</pdflink>
<pdfsize>FFF</pdfsize>
</report>
</rss>
Now I want to iterate over this XML, get each report node, and from there get child nodes such as title, pdflink and pdfsize by looping with a for loop. I want to use XmlTextReader to accomplish this. I tried using while, but I get only one pass through the loop, and I don't know why. If I use a for loop, how do I iterate? Something like:
for(loop when reader.element("report")){} and then get the rest of the nodes and put them in an array or list. Once I have them stored in a list, I want to display them in a feed. What is the best way to do this? Please help.
In my case I was worried about the performance of loading a large document. What I have done is define a constructor on my objects that receives an XmlReader and hydrates the object, passing the reader back after it reaches a complete node.
This allows me to yield a populated object back as IEnumerable for each object as it's being read. Then I launch a new Task/Thread to handle processing that individual item and go back to processing the file.
private IEnumerable<Report> readReports(Stream reader)
{
var settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;
var xmlReader = XmlReader.Create(reader, settings);
xmlReader.MoveToContent();
xmlReader.Read();
while (!xmlReader.EOF)
{
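// Compare in lower case: ToUpper() could never equal the lowercase literal "report"
if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name.ToLower() == "report")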
yield return new Report(xmlReader);
xmlReader.Read();
}
}
public Report(XmlReader reader) : this()
{
reader.MoveToContent();
if (reader.IsEmptyElement)
{
reader.Read();
return;
}
reader.Read();
while (!reader.EOF)
{
if (reader.IsStartElement())
{
switch (reader.Name.ToLower())
{
case "order_id":
this.OrderId = reader.ReadElementContentAsString();
break;
// abbreviated the rest of the fields
default:
reader.Skip();
break;
}
}
else if (reader.Name.ToLower() == "report") // end element of the container: all properties are populated
return;
else
reader.Read(); // advance past any other node so the loop cannot stall
}
}
I would definitely appreciate any feedback from the rest of the community; if there is a better approach or if there are any errors here, please let me know.
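As one possible alternative for the XML in the question, here is a minimal sketch that streams with XmlReader but materialises each report as a small XElement via XNode.ReadFrom, so the per-field switch is not needed. The ReportItem and ReportReader names are hypothetical, and the sketch assumes the element names from the sample feed (report, title, pdflink).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical container for the fields we care about.
public class ReportItem
{
    public string Name { get; set; }
    public string Title { get; set; }
    public string PdfLink { get; set; }
}

public static class ReportReader
{
    public static IEnumerable<ReportItem> ReadReports(TextReader input)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "report")
                {
                    // ReadFrom consumes the whole <report> element and leaves the reader
                    // on the node that follows it, so we must not call Read() here.
                    var element = (XElement)XNode.ReadFrom(reader);
                    yield return new ReportItem
                    {
                        Name = (string)element.Attribute("name"),
                        Title = (string)element.Element("title"),
                        PdfLink = (string)element.Element("pdflink")
                    };
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    public static void Main()
    {
        const string xml =
            "<rss>" +
            "<report name=\"rpt1\"><title>AAA</title><pdflink></pdflink></report>" +
            "<report name=\"rpt2\"><title>BBB</title><pdflink>EEE</pdflink></report>" +
            "</rss>";
        foreach (ReportItem r in ReadReports(new StringReader(xml)))
            Console.WriteLine("{0}: {1}", r.Name, r.Title);
    }
}
```

Only one report is held in memory at a time, which keeps the streaming benefit while avoiding manual reader positioning inside each element.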

XmlWriter - reading an attribute (quick question)

I'm using this for my code, it outputs to the xml file perfectly, but it adds an ' = ' sign after the element name even though only one of my elements has an attribute.
I suppose I could do something like
if(reader.Getattribute != "")
// I made that up on the spot, I'm not sure if that would really work
{
Console.WriteLine("<{0} = {1}>", reader.Name, reader.GetAttribute("name"));
}
else
{
Console.WriteLine("<{0}>", reader.Name);
}
but is there a cleaner way to code that?
My code (without workaround)
using System;
using System.Xml;
using System.IO;
using System.Text;
public class MainClass
{
private static void Main()
{
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
XmlWriter w = XmlWriter.Create(@"Path\test.xml", settings);
w.WriteStartDocument();
w.WriteStartElement("classes");
w.WriteStartElement("class");
w.WriteAttributeString("name", "EE 999");
w.WriteElementString("Class_Name", "Programming");
w.WriteElementString("Teacher", "James");
w.WriteElementString("Room_Number", "333");
w.WriteElementString("ID", "2324324");
w.WriteEndElement();
w.WriteEndDocument();
w.Flush();
w.Close();
XmlReader reader = XmlReader.Create(@"Path\test.xml");
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element:
Console.WriteLine("<{0} = {1}>", reader.Name, reader.GetAttribute("name"));
break;
case XmlNodeType.Text:
Console.WriteLine(reader.Value);
break;
case XmlNodeType.CDATA:
Console.WriteLine("<![CDATA[{0}]]>", reader.Value);
break;
case XmlNodeType.ProcessingInstruction:
Console.WriteLine("<?{0} {1}?>", reader.Name, reader.Value);
break;
case XmlNodeType.Comment:
Console.WriteLine("<!--{0}-->", reader.Value);
break;
case XmlNodeType.XmlDeclaration:
Console.WriteLine("<?xml version='1.0'?>");
break;
case XmlNodeType.Document:
break;
case XmlNodeType.DocumentType:
Console.WriteLine("<!DOCTYPE {0} [{1}]", reader.Name, reader.Value);
break;
case XmlNodeType.EntityReference:
Console.WriteLine(reader.Name);
break;
case XmlNodeType.EndElement:
Console.WriteLine("</{0}>", reader.Name);
break;
}
}
}
}
Output
<?xml version='1.0'?>
<classes = >
<class = EE 999>
<Class_Name = >
Programming
</Class_Name>
<Teacher = >
James
</Teacher>
<Room_Number = >
333
</Room_Number>
<ID = >
2324324
</ID>
</class>
</classes>
Because this line
case XmlNodeType.Element:
Console.WriteLine("<{0} = {1}>", reader.Name, reader.GetAttribute("name"));
break;
Always writes the '=' without checking.
A rough fix :
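case XmlNodeType.Element:
Console.Write("<{0}", reader.Name);
if (reader.HasAttributes)
{
// Write out each attribute as name="value"
while (reader.MoveToNextAttribute())
Console.Write(" {0}=\"{1}\"", reader.Name, reader.Value);
reader.MoveToElement(); // return to the owning element
}
Console.WriteLine(">");
break;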
But why are you using the XmlReader at all? It is cumbersome and only useful when dealing with huge Xml streams.
If your datasets are not >> 10 MB then take a look at XDocument or XmlDocument
The XmlWriter in your Example can be replaced by (rough approx):
// using System.Xml.Linq;
var root = new XElement("classes",
new XElement("class", new XAttribute("name", "EE 999"),
new XElement("Class_Name", "Programming"),
new XElement("Teacher", "James")
));
root.Save(@"Path\test.xml");
var doc = XDocument.Load(@"Path\test.xml");
// doc is now an in-memory tree of XElement objects
// that you can navigate and query
And here is an intro
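To give an idea of what "navigate and query" can look like, here is a small sketch using LINQ to XML on the same tree. It works in memory rather than on a file, and ClassesQueryDemo, BuildClasses and FindTeacher are hypothetical names chosen for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ClassesQueryDemo
{
    // Builds the same tree as the snippet above, without touching the disk.
    public static XElement BuildClasses()
    {
        return new XElement("classes",
            new XElement("class", new XAttribute("name", "EE 999"),
                new XElement("Class_Name", "Programming"),
                new XElement("Teacher", "James")));
    }

    // Returns the teacher of the class whose name attribute matches, or null if there is none.
    public static string FindTeacher(XElement root, string className)
    {
        return root.Elements("class")
                   .Where(c => (string)c.Attribute("name") == className)
                   .Select(c => (string)c.Element("Teacher"))
                   .FirstOrDefault();
    }

    public static void Main()
    {
        XElement root = BuildClasses();
        Console.WriteLine(FindTeacher(root, "EE 999"));
    }
}
```

The explicit (string) casts return null for a missing attribute or element instead of throwing, which removes the need for the "does this element have an attribute" check from the question.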
I don't know exactly what you're trying to accomplish but personally I would create a .NET class representing your class element with properties identifying the sub elements then use System.Xml.Serialization.XmlSerializer to write or read it from a file.
Here is an example:
using System;
using System.Collections.Generic;
using System.Xml.Serialization;
public class MyClasses : List<MyClass>{}
public class MyClass{
public String Teacher{ get; set; }
}
void main(){
MyClasses classList = new MyClasses();
MyClass c = new MyClass();
c.Teacher = "James";
classList.Add(c);
XmlSerializer serializer = new XmlSerializer(classList.GetType());
// Serialize takes the target stream first and the object to write second
serializer.Serialize(/*Put your stream here*/, classList);
}
And, after leaving setting up your stream as an exercise to the reader, blamo, you're done outputting an XML representation of your object to some stream. The stream could be a file, a string, whatever. Sorry for the nasty C# (if it's nasty); I use VB.NET every day, so the syntax and keywords may be a little off.
Update
I added some code to show how to serialize a collection of the classes. If nodes aren't coming out named correctly, there are attributes you can add to your classes and their properties; just do a quick Google search for them.
Update again
Sorry, it's hard to explain when we're using the same word to mean two different things. Let's say you're trying to represent a bucket of bricks. You would write a C# class called Brick and a C# class called Bucket that inherits from List<Brick>; your Brick would have a property called Color. You would then make all your bricks with different colors and fill the bucket with your bricks. Then you would pass your bucket to the serializer and it would give you something like:
<Bucket>
<Brick>
<Color>
blue
</Color>
</Brick>
</Bucket>
The serializer builds the XML for you from the definitions of your classes so you don't have to worry about the details. You can read more about it here and here
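Here is a minimal sketch of the bucket-of-bricks example, serializing to a string through a StringWriter. One assumption to flag: the [XmlRoot("Bucket")] attribute is added so that the root element is named Bucket; as far as I know, a class deriving from List<T> is otherwise given a root name based on its element type. BucketDemo and ToXml are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Brick
{
    public string Color { get; set; }
}

// Assumption: XmlRoot is used to force the root element name to "Bucket".
[XmlRoot("Bucket")]
public class Bucket : List<Brick> { }

public static class BucketDemo
{
    // Serializes the bucket to an XML string.
    public static string ToXml(Bucket bucket)
    {
        var serializer = new XmlSerializer(typeof(Bucket));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, bucket);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        var bucket = new Bucket { new Brick { Color = "blue" } };
        Console.WriteLine(ToXml(bucket));
    }
}
```

Swapping the StringWriter for a FileStream or StreamWriter writes the same XML to a file instead.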

NVelocity not finding the template

I'm having some difficulty with using NVelocity in an ASP.NET MVC application. I'm using it as a way of generating emails.
As far as I can make out the details I'm passing are all correct, but it fails to load the template.
Here is the code:
private const string defaultTemplatePath = "Views\\EmailTemplates\\";
...
velocityEngine = new VelocityEngine();
basePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, defaultTemplatePath);
ExtendedProperties properties = new ExtendedProperties();
properties.Add(RuntimeConstants.RESOURCE_LOADER, "file");
properties.Add(RuntimeConstants.FILE_RESOURCE_LOADER_PATH, basePath);
velocityEngine.Init(properties);
The basePath is the correct directory; I've pasted the value into Explorer to make sure.
if (!velocityEngine.TemplateExists(name))
throw new InvalidOperationException(string.Format("Could not find a template named '{0}'", name));
Template result = velocityEngine.GetTemplate(name);
'name' above is a valid filename in the folder defined as basePath above. However, TemplateExists returns false. If I comment that conditional out and let it fail on the GetTemplate method call the stack trace looks like this:
at NVelocity.Runtime.Resource.ResourceManagerImpl.LoadResource(String resourceName, ResourceType resourceType, String encoding)
at NVelocity.Runtime.Resource.ResourceManagerImpl.GetResource(String resourceName, ResourceType resourceType, String encoding)
at NVelocity.Runtime.RuntimeInstance.GetTemplate(String name, String encoding)
at NVelocity.Runtime.RuntimeInstance.GetTemplate(String name)
at NVelocity.App.VelocityEngine.GetTemplate(String name)
...
I'm now at a bit of an impasse. I feel that the answer is blindingly obvious, but I just can't seem to see it at the moment.
Have you considered using Castle's NVelocityTemplateEngine?
Download from the "TemplateEngine Component 1.1 - September 29th, 2009" section and reference the following assemblies:
using Castle.Components.Common.TemplateEngine.NVelocityTemplateEngine;
using Castle.Components.Common.TemplateEngine;
Then you can simply call:
using (var writer = new StringWriter())
{
_templateEngine.Process(data, string.Empty, writer, _templateContents);
return writer.ToString();
}
Where:
_templateEngine is your NVelocityTemplateEngine
data is your Dictionary of information (I'm using a Dictionary so that I can access objects by key ($objectKeyName) in my template).
_templateContents is the actual template string itself.
I hope this is of help to you!
Just to add, you'll want to put that into a static method returning a string of course!
Had this issue recently - NVelocity needs to be initialised with the location of the template files. In this case mergeValues is an anonymous type so in my template I can just refer to $Values.SomeItem:
private string Merge(Object mergeValues)
{
var velocity = new VelocityEngine();
var props = new ExtendedProperties();
props.AddProperty("file.resource.loader.path", @"D:\Path\To\Templates");
velocity.Init(props);
var template = velocity.GetTemplate("MailTemplate.vm");
var context = new VelocityContext();
context.Put("Values", mergeValues);
using (var writer = new StringWriter())
{
template.Merge(context, writer);
return writer.ToString();
}
}
Try setting the file.resource.loader.path
http://weblogs.asp.net/george_v_reilly/archive/2007/03/06/img-srchttpwwwcodegenerationnetlogosnveloc.aspx
Okay, so I've managed to get something working, but it is a bit of a hack and nowhere near the solution I want.
Basically, I manually load the template into a string, then pass that string to the velocityEngine.Evaluate() method, which writes the result into the given StringWriter. The side effect of this is that the #parse instructions in the template don't work, because it still cannot find the files.
using (StringWriter writer = new StringWriter())
{
velocityEngine.Evaluate(context, writer, templateName, template);
return writer.ToString();
}
In the code above templateName is irrelevant as it isn't used. template is the string that contains the entire template that has been pre-loaded from disk.
I'd still appreciate any better solutions as I really don't like this.
The tests are the ultimate authority:
http://fisheye2.atlassian.com/browse/castleproject/NVelocity/trunk/src/NVelocity.Tests/Test/ParserTest.cs?r=6005#l122
Or you could use the TemplateEngine component which is a thin wrapper around NVelocity that makes things easier.
