I want to add an attribute to the dataset declared below whose value is the value of one of the field in the row.
So I want to add the id as shown below.
<root>
<Table id="GAS-405">
<apple>2009FA</apple>
<orange>3.00</orange>
<pear>BGPR</pear>
<banana>GAS-405</banana>
</Table>
</root>
This will help me identify the node later in my application.
Is this possible? Is this easier to do using XMLDocument?
Dim sdaFoo As SqlDataAdapter = New SqlDataAdapter("SELECT BLAH FROM BLAHBLAH", conn)
Dim dsFoo As DataSet = New DataSet()
dsFoo.DataSetName = "apple"
sdaFoo.Fill(dsFoo)
dsFoo.WriteXml("C:\Inetpub\wwwroot\foo.xml")
Dataset.WriteXml() is really a convenience method rather than a flexible way of dealing with XML.
You'll need to take another approach. There are a few options:
If you're just adding a single attribute, you could hack it into the resulting xml by re-opening the file as an XDocument, adding the attribute to the necessary element, and saving it again. Not too elegant, but easy, and sometimes easy is best. Even better, just use WriteXml() to put your xml into a string, then load the string as your XDocument.
Generate the XML from your query directly, rather than as a dataset. Sql Server 2005 and 2008 have some good XML methods that allow you to select a set of rows as XML (SELECT ... FOR XML) and specify what it looks like.
Use XmlSerialization for your dataset and inject the attribute using custom control of the serialization process.... which will be way more trouble than it's worth.
Store the attribute somewhere else outside of your XML and use some kind of object to keep track of it. Not really sure what your code is like, but that might be a great option.
Use GetXML() method of DataSet to get whole XML as a string. Then add your custom attributes and write that string into xml file using StreamWriter.
Related
There is this website that we purchase widgets from that provides details for each of their parts on its own webpage. Example: http://www.digikey.ca/product-search/en?lang=en&site=ca&KeyWords=AE9912-ND. I have to find all of their parts that are in our database, and add Manufacturer and Manufacturer Part Number values to their fields.
I was told that there is a way for Visual Basic to access a webpage and extract information. If someone could point me in the right direction on where to start, I'm sure I can figure this out.
Thanks.
How to scrape a website using HTMLAgilityPack (VB.Net)
I agree that htmlagilitypack is the easiest way to accomplish this. It is less error prone than just using Regex. The following will be how I deal with scraping.
After downloading htmlagilitypack*dll, create a new application, add htmlagilitypack via nuget, and reference to it. If you can use Chrome, it will allow you to inspect the page to get information about where your information is located. Right-click on a value you wish to capture and look for the table that it is found in (follow the HTML up a bit).
The following example will extract all the values from that page within the "pricing" table. We need to know the XPath value for the table (this value is used to instruct htmlagilitypack on what to look for) so that the document we create looks for our specific values. This can be achieved by finding whatever structure your values are in and right click copy XPath. From this we get...
//*[#id="pricing"]
Please note that sometimes the XPath you get from Chrome may be rather large. You can often simplify it by finding something unique about the table your values are in. In this example it is "id", but in other situations, it could easily be headings or class or whatever.
This XPath value looks for something with the id equal to pricing, that is our table. When we look further in, we see that our values are within tbody,tr and td tags. HtmlAgilitypack doesn't work well with the tbody so ignore it. Our new XPath is...
//*[#id='pricing']/tr/td
This XPath says look for the pricing id within the page, then look for text within its tr and td tags. Now we add the code...
Dim Web As New HtmlAgilityPack.HtmlWeb
Dim Doc As New HtmlAgilityPack.HtmlDocument
Doc = Web.Load("http://www.digikey.ca/product-search/en?lang=en&site=ca&KeyWords=AE9912-ND")
For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[#id='pricing']/tr/td")
Next
To extract the values we simply reference our table value that was created in our loop and it's innertext member.
Dim Web As New HtmlAgilityPack.HtmlWeb
Dim Doc As New HtmlAgilityPack.HtmlDocument
Doc = Web.Load("http://www.digikey.ca/product-search/en?lang=en&site=ca&KeyWords=AE9912-ND")
For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[#id='pricing']/tr/td")
MsgBox(table.InnerText)
Next
Now we have message boxes that pop up the values...you can switch the message box for an arraylist to fill or whatever way you wish to store the values. Now simply do the same for whatever other tables you wish to get.
Please note that the Doc variable that was created is reusable, so if you wanted to cycle through a different table in the same page, you do not have to reload the page. This is a good idea especially if you are making many requests, you don't want to slam the website, and if you are automating a large number of scrapes, it puts some time between requests.
Scraping is really that easy. That's is the basic idea. Have fun!
Html Agility Pack is going to be your friend!
What is exactly the Html Agility Pack (HAP)?
This is an agile HTML parser that builds a read/write DOM and supports
plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor
XSLT to use it, don't worry...). It is a .NET code library that allows
you to parse "out of the web" HTML files. The parser is very tolerant
with "real world" malformed HTML. The object model is very similar to
what proposes System.Xml, but for HTML documents (or streams).
Looking at the source of the example page you provided, they are using HTML5 Microdata in their markup. I searched some more on CodePlex and found a microdata parser which may help too: MicroData Parser
I want to load XML in a single table of a dataset. I use following code
string val = getAbonentInfoParametr(ai,"abonentDescription");
DataSet ds = new DataSet();
ds.ReadXml(new StringReader(val));
but when I do this, I got three tables because in one node of XML file I now get HTML code that I want to have like a string field in my only table. What should I do?
Also I prefer not to use scheme files because the structure of that xml file can be changeable except several field that I use, please suggest me something.
Use CDATA to wrap around the HTML, otherwise there is no way to differentiate HTML from XML.
I need to create a service that will return XML containing data from the database. So I am thinking about using an ASHX that will accept things like date range and POST an XML file back. I have dealt with pages pulling data from SQL Server and populating into a datagrid for visual display but never into XML for delivery, what is the best way to do this? Also if an ASHX and POST isn't the best method for delivery let me know... thanks!
EDIT: These answers are great and pointing me in the right direction. I should have also mentioned that the XML format has already been decided so I can't use any automatically generated one.
Combining linq2sqlwith the XElement classes, something along the lines:
var xmlContacts =
new XElement("contacts",
(from c in context.Contacts
select new XElement("contact",
new XElement
{
new XElement("name", c.Name),
new XElement("phone", c.Phone),
new XElement("postal", c.Postal)
)
)
).ToArray()
)
);
Linq2sql will retrieve the data in a single call to the db, and the processing of the XML will be done at the business server. This splits the load better, since you don't have the sql server doing all the job.
Have you tried DataSet.WriteXml()?
You could have this be the output of a web service call.
Sql Server 2005 and above has a "FOR XML AUTO" command that will convert your recordset to XML for you. Then you just have to return a string from your ASHX.
Beginning with SQL Server 2000, you can return query results as XML. For absolute control of them, use the "FOR XML EXPLICIT" command. You can use any format you desire that way.
http://msdn.microsoft.com/en-us/library/ms189068.aspx
It's as easy as writing your result to the raw output then. For added points, you can return the result set to a XPathDocument, pass it through an XSL transformation, and send the results out in any format you choose (HTML vs XML at the click of a button perhaps).
you can obtained that to a datatable and then call myTable.WriteXML()
if you are populating classes with the your database results then add the serializable attribute to the header your classes, and use the XMLSerializer
Converting an automatically generated format into a specified one is a job for xslt. Just find a way to run the output from the tool through an xslt filter.
Oracle has a great product for doing exactly this job - the oracle XDK. But it's a java thing, not ASP as far as I know.
For an example, this XHTML
http://www.anbg.gov.au/abrs/online-resources/flora/stddisplay.xsql?pnid=2524
is generated automatically from this XML, which is generated by oracle
http://www.anbg.gov.au/abrs/online-resources/flora/stddisplay.xsql?pnid=2524&xml-stylesheet=none
Of course, you are not after XHTML, but some other XML format. But XSLT will do the job.
I want to read an specific xml node and its value for example
<customers>
<name>John</name>
<lastname>fetcher</lastname>
</customer>
and my code behind should be some thing like this (I don't know how it should be though):
Response.Write(xml.Node["name"].Value)
As I said it is just an example because I don't know how to do it.
The most basic answer:
Assuming "xml" is an XMLDocument, XMLNodeList, XMLNode, etc...
Response.Write(xml.SelectSingleNode("//name").innerText)
Which version of .NET are you using? If you're using .NET 3.5 and can use LINQ to XML, it's as simple as:
document.Descendant("name").Value
(except with some error handling!) If you're stuk with the DOM API, you might want:
document.SelectSingleNode("//name").InnerText
Note that this hasn't shown anything about how you'd read the XML in the first place - if you need help with that bit, please give more detail in the question.
If using earlier versions of the .Net framework, take a look at the XMLDocument class first as this is what you'd load the XML string into. Subclasses like XMLElement and XMLNode are also useful for doing some of this work.
haven't tried testing it but should point you in the right direction anyway
'Create the XML Document
Dim l_xmld As XmlDocument
'Create the XML Node
Dim l_node As XmlNode
l_xmld = New XmlDocument
'Load the Xml file
l_xmld.LoadXml("XML Filename as String")
'get the attributes
l_node = l_xmld.SelectSingleNode("/customers/name")
Response.Write(l_node.InnerText)
I'm building a listing/grid control in a Flex application and using it in a .NET web application. To make a really long story short I am getting XML from a webservice of serialized objects. I have a page limit of how many things can be on a page. I've taken a data grid and made it page, sort across pages, and handle some basic filtering.
In regards to paging I'm using a Dictionary keyed on the page and storing the XML for that page. This way whenever a user comes back to a page that I've saved into this dictionary I can grab the XML from local memory instead of hitting the webservice. Basically, I'm caching the data retrieved from each call to the webservice for a page of data.
There are several things that can expire my cache. Filtering and sorting are the main reason. However, a user may edit a row of data in the grid by opening an editor. The data they edit could cause the data displayed in the row to be stale. I could easily go to the webservice and get the whole page of data, but since the page size is set at runtime I could be looking at a large amount of records to retrieve.
So let me now get to the heart of the issue that I am experiencing. In order to prevent getting the whole page of data back I make a call to the webservice asking for the completely updated record (the editor handles saving its data).
Since I'm using custom objects I need to serialize them on the server to XML (this is handled already for other portions of our software). All data is handled through XML in e4x. The cache in the Dictionary is stored as an XMLList.
Now let me show you my code...
var idOfReplacee:String = this._WebService.GetSingleModelXml.lastResult.*[0].*[0].#Id;
var xmlToReplace:XMLList = this._DataPages[this._Options.PageIndex].Data.(#Id == idOfReplacee);
if(xmlToReplace.length() > 0)
{
delete (this._DataPages[this._Options.PageIndex].Data.(#Id == idOfReplacee)[0]);
this._DataPages[this._Options.PageIndex].Data += this._WebService.GetSingleModelXml.lastResult.*[0].*[0];
}
Basically, I get the id of the node I want to replace. Then I find it in the cache's Data property (XMLList). I make sure it exists since the filter on the second line returns the XMLList.
The problem I have is with the delete line. I cannot make that line delete that node from the list. The line following the delete line works. I've added the node to the list.
How do I replace or delete that node (meaning the node that I find from the filter statement out of the .Data property of the cache)???
Hopefully the underscores for all of my variables do not stay escaped when this is posted! otherwise this._ == this._
Thanks for the answers guys.
#Theo:
I tried the replace several different ways. For some reason it would never error, but never update the list.
#Matt:
I figured out a solution. The issue wasn't coming from what you suggested, but from how the delete works with Lists (at least how I have it in this instance).
The Data property of the _DataPages dictionary object is list of the definition nodes (was arrived at by a previous filtering of another XML document).
<Models>
<Definition Id='1' />
<Definition Id='2' />
</Models>
I ended up doing this little deal:
//gets the index of the node to replace from the same filter
var childIndex:int = (this._DataPages[this._Options.PageIndex].Data.(#Id == idOfReplacee)[0]).childIndex();
//deletes the node from the list
delete this._DataPages[this._Options.PageIndex].Data[childIndex];
//appends the new node from the webservice to the list
this._DataPages[this._Options.PageIndex].Data += this._WebService.GetSingleModelXml.lastResult.*[0].*[0];
So basically I had to get the index of the node in the XMLList that is the Data property. From there I could use the delete keyword to remove it from the list. The += adds my new node to the list.
I'm so used to using the ActiveX or Mozilla XmlDocument stuff where you call "SelectSingleNode" and then use "replaceChild" to do this kind of stuff. Oh well, at least this is in some forum where someone else can find it. I do not know the procedure for what happens when I answer my own question. Perhaps this insight will help someone else come along and help answer the question better!
Perhaps you could use replace instead?
var oldNode : XML = this._DataPages[this._Options.PageIndex].Data.(#Id == idOfReplacee)[0];
var newNode : XML = this._WebService.GetSingleModelXml.lastResult.*[0].*[0];
oldNode.parent.replace(oldNode, newNode);
I know this is an incredibly old question, but I don't see (what I think is) the simplest solution to this problem.
Theo had the right direction here, but there's a number of errors with the way replace was being used (and the fact that pretty much everything in E4X is a function).
I believe this will do the trick:
oldNode.parent().replace(oldNode.childIndex(), newNode);
replace() can take a number of different types in the first parameter, but AFAIK, XML objects are not one of them.
I don't immediately see the problem, so I can only venture a guess. The delete line that you've got is looking for the first item at the top level of the list which has an attribute "Id" with a value equal to idOfReplacee. Ensure that you don't need to dig deeper into the XML structure to find that matching id.
Try this instead:
delete (this._DataPages[this._Options.PageIndex].Data..(#Id == idOfReplacee)[0]);
(Notice the extra '.' after Data). You could more easily debug this by setting a breakpoint on the second line of the code you posted, and ensure that the XMLList looks like you expect.