HTTPModule filter questions - asp.net

I have one issues I'm struggling with with regards to my HTTPModule filter:
1) I notice that the module gets it's data in chunks. This is problematic for me because I'm using a regex to find and replace. If I get a partial match in one chunk and the rest of the match in the second, it will not work. Is there any way to get the entire response before I do my thing to it? I have seen code where it appends data to a string builder until it uses a matches on an "" end tag but my code must work for more that just (xml, custom tags, etc). I don't know how to detect the End Of Stream or if that is even possible.
I am attaching the filter in the BeginRequest.

Have a look at this example. It looks for "" in the stream of the page.

Here's a sample project which performs buffered search and replace in an HttpModule using a Request.Filter and Response.Filter. You should be able to adapt this technique to perform a Regex easily.
https://github.com/snives/HttpModuleRewrite

Related

Retrieve data from a website via Visual Basic

There is this website that we purchase widgets from that provides details for each of their parts on its own webpage. Example: http://www.digikey.ca/product-search/en?lang=en&site=ca&KeyWords=AE9912-ND. I have to find all of their parts that are in our database, and add Manufacturer and Manufacturer Part Number values to their fields.
I was told that there is a way for Visual Basic to access a webpage and extract information. If someone could point me in the right direction on where to start, I'm sure I can figure this out.
Thanks.
How to scrape a website using HTMLAgilityPack (VB.Net)
I agree that htmlagilitypack is the easiest way to accomplish this. It is less error prone than just using Regex. The following will be how I deal with scraping.
After downloading htmlagilitypack*dll, create a new application, add htmlagilitypack via nuget, and reference to it. If you can use Chrome, it will allow you to inspect the page to get information about where your information is located. Right-click on a value you wish to capture and look for the table that it is found in (follow the HTML up a bit).
The following example will extract all the values from that page within the "pricing" table. We need to know the XPath value for the table (this value is used to instruct htmlagilitypack on what to look for) so that the document we create looks for our specific values. This can be achieved by finding whatever structure your values are in and right click copy XPath. From this we get...
//*[#id="pricing"]
Please note that sometimes the XPath you get from Chrome may be rather large. You can often simplify it by finding something unique about the table your values are in. In this example it is "id", but in other situations, it could easily be headings or class or whatever.
This XPath value looks for something with the id equal to pricing, that is our table. When we look further in, we see that our values are within tbody,tr and td tags. HtmlAgilitypack doesn't work well with the tbody so ignore it. Our new XPath is...
//*[#id='pricing']/tr/td
This XPath says look for the pricing id within the page, then look for text within its tr and td tags. Now we add the code...
Dim Web As New HtmlAgilityPack.HtmlWeb
Dim Doc As New HtmlAgilityPack.HtmlDocument
Doc = Web.Load("http://www.digikey.ca/product-search/en?lang=en&site=ca&KeyWords=AE9912-ND")
For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[#id='pricing']/tr/td")
Next
To extract the values we simply reference our table value that was created in our loop and it's innertext member.
Dim Web As New HtmlAgilityPack.HtmlWeb
Dim Doc As New HtmlAgilityPack.HtmlDocument
Doc = Web.Load("http://www.digikey.ca/product-search/en?lang=en&site=ca&KeyWords=AE9912-ND")
For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[#id='pricing']/tr/td")
MsgBox(table.InnerText)
Next
Now we have message boxes that pop up the values...you can switch the message box for an arraylist to fill or whatever way you wish to store the values. Now simply do the same for whatever other tables you wish to get.
Please note that the Doc variable that was created is reusable, so if you wanted to cycle through a different table in the same page, you do not have to reload the page. This is a good idea especially if you are making many requests, you don't want to slam the website, and if you are automating a large number of scrapes, it puts some time between requests.
Scraping is really that easy. That's is the basic idea. Have fun!
Html Agility Pack is going to be your friend!
What is exactly the Html Agility Pack (HAP)?
This is an agile HTML parser that builds a read/write DOM and supports
plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor
XSLT to use it, don't worry...). It is a .NET code library that allows
you to parse "out of the web" HTML files. The parser is very tolerant
with "real world" malformed HTML. The object model is very similar to
what proposes System.Xml, but for HTML documents (or streams).
Looking at the source of the example page you provided, they are using HTML5 Microdata in their markup. I searched some more on CodePlex and found a microdata parser which may help too: MicroData Parser

Read and modify POST fields "on-the-fly" using Fiddler

I need to use Fiddler to modify the POST fields sent by a browser. I know I can do that using the Fiddler UI but I want to create a script to do it automatically.
I need to insert the code inside the OnBeforeRequest method and I know I can use regular expressions to parse the POST fields but maybe there is something already available to do it like some sort of object POST with all the current fields, e.g: POST["field1"], POST["field2"], etc.
So...is it possible or do I have to do it manually?
Thanks!
Fiddler itself does not contain a script-accessible POST body parser, which means you'd either need to import one, write one, or use string processing to accomplish this task.

Nesting HTTP GET parameters (request within a request)

I want to call a JSP with GET parameters within the GET parameter of a parent JSP. The URL for this would be http://server/getMap.jsp?lat=30&lon=-90&name=http://server/getName.jsp?lat1=30&lon1=-90
getName.jsp will return a string that goes in the name parameter of getMap.jsp.
I think the problem here is that &lon1=-90 at the end of the URL will be given to getMap.jsp instead of getName.jsp. Is there a way to distinguish which GET parameter goes to which URL?
One idea I had was to encode the second URL (e.g. = -> %3D and & -> %26) but that didn't work out well. My best idea so far is to allow only one parameter in the second URL, comma-delimited. So I'll have http://server/getMap.jsp?lat=30&lon=-90&name=http://server/getName.jsp?params=30,-90 and leave it up to getName.jsp to parse its variables. This way I leave the & alone.
NOTE - I know I can approach this problem from a completely different angle and avoid nested URLs altogether, but I still wonder (for the sake of knowledge!) if this is possible or if anyone has done it...
This has been done a lot, especially with ad serving technologies and URL redirects
But an encoded URL should just work fine. You need to completely encode it tho. A generator can be found here
So this:
http://server/getMap.jsp?lat=30&lon=-90&name=http://server/getName.jsp?lat1=30&lon1=-90
becomes this: http://server/getMap.jsp?lat=30&lon=-90&name=http%3A%2F%2Fserver%2FgetName.jsp%3Flat1%3D30%26lon1%3D-90
I am sure that jsp has a function for this. Look for "urlencode". Your JSP will see the contents of the GET-Variable "name" as the unencoded string: "http://server/getName.jsp?lat1=30&lon1=-90"

How to generate an archive of multiple pages in Spring MVC?

I want to allow my users to "bulk export" an archive of selected resources, i.e., http://.../resource/1, resource/2, resource/4, ... ,
My thought was "render the HTML of each page to a string and use java.util.zip to create a multifile archive."
My problem then became "how to get the HTML of a page so that I can loop over them?"
I cannot figure out a way to get a JstlView to render to a String, nor can I see a way to set the ServletOutputStream to be a ZipOutputStream.
My last thought is to actually GET the HTML of each of the resources via HTTP. I imagine that will be easy enough to code, but it seems pretty byzantine. Is there a better way? (Perhaps something with RequestDispatcher.forward()? )
Use a SwallowingHttpServletResponse from DWR (or a PageResponseWrapper from Sitemesh) as a parameter to RequestDispatcher.include() and then get the output from that response object.
See my response (no pun intended) to this question.

Deleting / Replacing A Node in E4X (AS3 - Flex)

I'm building a listing/grid control in a Flex application and using it in a .NET web application. To make a really long story short I am getting XML from a webservice of serialized objects. I have a page limit of how many things can be on a page. I've taken a data grid and made it page, sort across pages, and handle some basic filtering.
In regards to paging I'm using a Dictionary keyed on the page and storing the XML for that page. This way whenever a user comes back to a page that I've saved into this dictionary I can grab the XML from local memory instead of hitting the webservice. Basically, I'm caching the data retrieved from each call to the webservice for a page of data.
There are several things that can expire my cache. Filtering and sorting are the main reason. However, a user may edit a row of data in the grid by opening an editor. The data they edit could cause the data displayed in the row to be stale. I could easily go to the webservice and get the whole page of data, but since the page size is set at runtime I could be looking at a large amount of records to retrieve.
So let me now get to the heart of the issue that I am experiencing. In order to prevent getting the whole page of data back I make a call to the webservice asking for the completely updated record (the editor handles saving its data).
Since I'm using custom objects I need to serialize them on the server to XML (this is handled already for other portions of our software). All data is handled through XML in e4x. The cache in the Dictionary is stored as an XMLList.
Now let me show you my code...
var idOfReplacee:String = this._WebService.GetSingleModelXml.lastResult.*[0].*[0].#Id;
var xmlToReplace:XMLList = this._DataPages[this._Options.PageIndex].Data.(#Id == idOfReplacee);
if(xmlToReplace.length() > 0)
{
delete (this._DataPages[this._Options.PageIndex].Data.(#Id == idOfReplacee)[0]);
this._DataPages[this._Options.PageIndex].Data += this._WebService.GetSingleModelXml.lastResult.*[0].*[0];
}
Basically, I get the id of the node I want to replace. Then I find it in the cache's Data property (XMLList). I make sure it exists since the filter on the second line returns the XMLList.
The problem I have is with the delete line. I cannot make that line delete that node from the list. The line following the delete line works. I've added the node to the list.
How do I replace or delete that node (meaning the node that I find from the filter statement out of the .Data property of the cache)???
Hopefully the underscores for all of my variables do not stay escaped when this is posted! otherwise this.&#95 == this._
Thanks for the answers guys.
#Theo:
I tried the replace several different ways. For some reason it would never error, but never update the list.
#Matt:
I figured out a solution. The issue wasn't coming from what you suggested, but from how the delete works with Lists (at least how I have it in this instance).
The Data property of the _DataPages dictionary object is list of the definition nodes (was arrived at by a previous filtering of another XML document).
<Models>
<Definition Id='1' />
<Definition Id='2' />
</Models>
I ended up doing this little deal:
//gets the index of the node to replace from the same filter
var childIndex:int = (this._DataPages[this._Options.PageIndex].Data.(#Id == idOfReplacee)[0]).childIndex();
//deletes the node from the list
delete this._DataPages[this._Options.PageIndex].Data[childIndex];
//appends the new node from the webservice to the list
this._DataPages[this._Options.PageIndex].Data += this._WebService.GetSingleModelXml.lastResult.*[0].*[0];
So basically I had to get the index of the node in the XMLList that is the Data property. From there I could use the delete keyword to remove it from the list. The += adds my new node to the list.
I'm so used to using the ActiveX or Mozilla XmlDocument stuff where you call "SelectSingleNode" and then use "replaceChild" to do this kind of stuff. Oh well, at least this is in some forum where someone else can find it. I do not know the procedure for what happens when I answer my own question. Perhaps this insight will help someone else come along and help answer the question better!
Perhaps you could use replace instead?
var oldNode : XML = this._DataPages[this._Options.PageIndex].Data.(#Id == idOfReplacee)[0];
var newNode : XML = this._WebService.GetSingleModelXml.lastResult.*[0].*[0];
oldNode.parent.replace(oldNode, newNode);
I know this is an incredibly old question, but I don't see (what I think is) the simplest solution to this problem.
Theo had the right direction here, but there's a number of errors with the way replace was being used (and the fact that pretty much everything in E4X is a function).
I believe this will do the trick:
oldNode.parent().replace(oldNode.childIndex(), newNode);
replace() can take a number of different types in the first parameter, but AFAIK, XML objects are not one of them.
I don't immediately see the problem, so I can only venture a guess. The delete line that you've got is looking for the first item at the top level of the list which has an attribute "Id" with a value equal to idOfReplacee. Ensure that you don't need to dig deeper into the XML structure to find that matching id.
Try this instead:
delete (this._DataPages[this._Options.PageIndex].Data..(#Id == idOfReplacee)[0]);
(Notice the extra '.' after Data). You could more easily debug this by setting a breakpoint on the second line of the code you posted, and ensure that the XMLList looks like you expect.

Resources