ASP.NET: Scrape part of a webpage

' Requires: Imports System.Net, Imports System.IO
Dim url As New Uri("http://www.testpage.com")
If url.Scheme = Uri.UriSchemeHttp Then
    ' Create the request object
    Dim objRequest As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
    ' Set the request method
    objRequest.Method = WebRequestMethods.Http.[Get]
    ' Get the response from the requested URL; Using disposes the response and reader
    Using objResponse As HttpWebResponse = DirectCast(objRequest.GetResponse(), HttpWebResponse)
        ' Read the response with a StreamReader
        Using reader As New StreamReader(objResponse.GetResponseStream())
            ' Put the response data in the container
            Label1.Text = reader.ReadToEnd()
        End Using
    End Using
End If
How would I scrape only part of a webpage? The code successfully gets the full HTML content.
For example, I want to scrape everything between <div id="content"> and </div>.

Once you have the page's full HTML content in a string variable, you can run regular expressions over that string to return the parts you want to extract.
Since you have not provided details on what you want to extract, I will provide you with a link on how to use regular expressions.
A short tutorial on Regular Expressions can be found here
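As a minimal sketch of that approach (written in Python rather than VB.NET, and assuming the markup is reasonably well-formed), the function below finds the opening `<div id="content">` with a regular expression and then counts nested `<div>`/`</div>` tags. The counting matters because a plain non-greedy pattern such as `<div id="content">(.*?)</div>` stops at the first inner `</div>`. The name `extract_div_by_id` is illustrative; for messy real-world HTML, a real parser (such as the Html Agility Pack mentioned further down this page) is more robust.

```python
import re
from typing import Optional

# Matches any opening <div ...> or closing </div>; group 1 is "/" for a closing tag
_DIV_TAG = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)


def extract_div_by_id(html: str, div_id: str) -> Optional[str]:
    """Return the inner HTML of the first <div> with the given id, or None."""
    opening = re.compile(
        r"<div\b[^>]*\bid\s*=\s*[\"']" + re.escape(div_id) + r"[\"'][^>]*>",
        re.IGNORECASE,
    )
    start = opening.search(html)
    if start is None:
        return None
    depth = 1  # we are now inside the target div
    for tag in _DIV_TAG.finditer(html, start.end()):
        # An opening <div> goes one level deeper, a closing </div> one level up
        depth += -1 if tag.group(1) else 1
        if depth == 0:
            return html[start.end():tag.start()]
    return None  # no matching </div>: the markup is unbalanced


page = '<html><div id="nav">x</div><div id="content"><p>a</p><div>b</div>c</div></html>'
content = extract_div_by_id(page, "content")
```

Here `content` is `'<p>a</p><div>b</div>c'`: the inner `</div>` does not end the match early.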

Related

Simple HttpWebRequest POST with Redirect URL

Here I have a simple HTML page that makes a POST to another page.
After I fill out the form and press Submit, it takes me back to the callback URL.
<form action="http://somedomain/products/cotd" method="post">
<input type="hidden" name="callback" value="http://DOMAIN.org">
<button type="submit">Go</button>
</form>
Now I need to implement the same thing in ASP.NET:
Private Sub post()
    Dim req2 As HttpWebRequest
    Dim resp As HttpWebResponse
    Dim result As String
    Dim url = "http://somedomain/products/cotd"
    url += "?callback=http://DOMAIN.org"
    Dim req As WebRequest = WebRequest.Create(url)
    req.Method = "POST"
    req2 = CType(req, HttpWebRequest)
    req2.AllowAutoRedirect = True
    resp = TryCast(req2.GetResponse(), HttpWebResponse)
    result = getResponse(resp)
    resp.Close()
End Sub
Function getResponse(response As HttpWebResponse) As String
    Dim responseText As String
    Dim encoding1 = ASCIIEncoding.ASCII
    Using reader = New StreamReader(response.GetResponseStream(), encoding1)
        responseText = reader.ReadToEnd()
    End Using
    Return responseText
End Function
The thing is, nothing happens here: the new page does not open, and it does not redirect back either. In the response I do get the callback page, but it never opens the page where I need to submit the values.
What am I missing here?
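One difference worth checking: the HTML form sends `callback` in the request body, encoded as `application/x-www-form-urlencoded`, whereas the VB code appends it to the query string and sends a POST with an empty body. The Python sketch below builds (but does not send) the request that the form describes; `build_form_post` is an illustrative name, and the URL and field value are taken from the question.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(action: str, fields: dict) -> Request:
    """Build, without sending, the POST a browser makes when the form is submitted."""
    # Form fields go in the body as name=value pairs, URL-encoded
    body = urlencode(fields).encode("ascii")
    return Request(
        action,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_form_post("http://somedomain/products/cotd", {"callback": "http://DOMAIN.org"})
```

`req.data` is `b"callback=http%3A%2F%2FDOMAIN.org"`. Note that even with the body fixed, an HttpWebRequest is made by your server, so any redirect it follows (AllowAutoRedirect) is followed on the server; the user's browser is not navigated to the form page or back to the callback.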

Scrape information from asp.net gridview with paging

I'm trying to scrape some schedules off a website. The information is displayed in a GridView with paging.
The URL is:
http://www.landmarkworldwide.com/when-and-where/register/search-results.aspx?prgid=0&pgID=270&crid=0&ctid=&sdt=0
My issue is scraping pages other than #1 in the GridView.
The best post I have found so far was this one, but it doesn't work and that topic is not complete. I tried to use Fiddler and Chrome to get the post data and use it, but I can't get it to work. Can you see what's missing?
Here's the code I am using. It's in VB, but you can answer in C# and I'll translate :) (sorry)
Protected Sub Page_Load(sender As Object, e As System.EventArgs) Handles Me.Load
Dim lcUrl As String = "http://www.landmarkworldwide.com/when-and-where/register/search-results.aspx?prgid=0&pgID=270&crid=0&ctid=&sdt=0"
' first, request the login form to get the viewstate value
Dim webRequest__1 As HttpWebRequest = TryCast(WebRequest.Create(lcUrl), HttpWebRequest)
Dim responseReader As New StreamReader(webRequest__1.GetResponse().GetResponseStream())
Dim responseData As String = responseReader.ReadToEnd()
responseReader.Close()
' extract the viewstate value and build out POST data
Dim viewState As String = ExtractViewState(responseData)
Dim loHttp As HttpWebRequest = DirectCast(WebRequest.Create(lcUrl), HttpWebRequest)
' *** Send any POST data
Dim lcPostData As String = [String].Format("__VIEWSTATE={0}&__EVENTTARGET={1}&__EVENTARGUMENT={2}", viewState, HttpUtility.UrlEncode("contentwrapper_0$maincontent_0$maincontentfullwidth_0$ucSearchResults$gvPrograms"), HttpUtility.UrlEncode("Page$3"))
loHttp.Method = "POST"
Dim lbPostBuffer As Byte() = System.Text.Encoding.GetEncoding(1252).GetBytes(lcPostData)
loHttp.ContentLength = lbPostBuffer.Length
Dim loPostData As Stream = loHttp.GetRequestStream()
loPostData.Write(lbPostBuffer, 0, lbPostBuffer.Length)
loPostData.Close()
Dim loWebResponse As HttpWebResponse = DirectCast(loHttp.GetResponse(), HttpWebResponse)
Dim enc As Encoding = System.Text.Encoding.GetEncoding(1252)
Dim loResponseStream As New StreamReader(loWebResponse.GetResponseStream(), enc)
Dim lcHtml As String = loResponseStream.ReadToEnd()
loWebResponse.Close()
loResponseStream.Close()
Response.Write(lcHtml)
End Sub
Private Function ExtractViewState(s As String) As String
Dim viewStateNameDelimiter As String = "__VIEWSTATE"
Dim valueDelimiter As String = "value="""
Dim viewStateNamePosition As Integer = s.IndexOf(viewStateNameDelimiter)
Dim viewStateValuePosition As Integer = s.IndexOf(valueDelimiter, viewStateNamePosition)
Dim viewStateStartPosition As Integer = viewStateValuePosition + valueDelimiter.Length
Dim viewStateEndPosition As Integer = s.IndexOf("""", viewStateStartPosition)
Return HttpUtility.UrlEncodeUnicode(s.Substring(viewStateStartPosition, viewStateEndPosition - viewStateStartPosition))
End Function
To make it work, you need to send all the input fields back to the page, not only the view state. For example, __EVENTVALIDATION is another critical field, and you do not handle it. So:
First, scrape page #1: load it and use the Html Agility Pack to parse it into a usable structure.
Then extract from that structure the input data you need to post. Here is a code snippet, adapted from the answer to "HTML Agility Pack get all input fields", showing how to do that:
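// doc is the loaded HtmlAgilityPack HtmlDocument; SelectNodes returns null if nothing matches
var fields = new List<string>();
foreach (HtmlNode input in doc.DocumentNode.SelectNodes("//input[@name]"))
{
    // Collect each named input as an encoded name=value pair for the post string
    string name = input.GetAttributeValue("name", "");
    string value = input.GetAttributeValue("value", "");
    fields.Add(HttpUtility.UrlEncode(name) + "=" + HttpUtility.UrlEncode(value));
}
string postData = string.Join("&", fields);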
Once you have the post data needed for a valid postback, move on to the next step. For an example, see "How to pass POST parameters to ASP.Net web request?".
You can also read "How to use HTML Agility pack".
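Putting those steps together, here is a minimal sketch in Python, with the standard library's `html.parser` standing in for the Html Agility Pack. It collects every named `<input>` from page #1 (including `__VIEWSTATE` and `__EVENTVALIDATION`) and then overrides `__EVENTTARGET` and `__EVENTARGUMENT` with the pager postback, as the question does with `...$gvPrograms` and `Page$3`. This is a simplification: a real browser skips unchecked checkboxes and unclicked submit buttons. `InputCollector` and `build_postback` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class InputCollector(HTMLParser):
    """Collect name/value pairs from every <input> element that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags also arrive here via handle_startendtag
        if tag == "input":
            attributes = dict(attrs)
            name = attributes.get("name")
            if name:
                self.fields[name] = attributes.get("value") or ""


def build_postback(html: str, event_target: str, event_argument: str) -> str:
    """Return a URL-encoded body that echoes all inputs and triggers the given postback."""
    parser = InputCollector()
    parser.feed(html)
    fields = dict(parser.fields)
    # Tell ASP.NET which control raised the postback and with what argument
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields)


sample = (
    '<form><input type="hidden" name="__EVENTTARGET" value="" />'
    '<input type="hidden" name="__VIEWSTATE" value="abc+/=" />'
    '<input type="hidden" name="__EVENTVALIDATION" value="xyz" /></form>'
)
body = build_postback(sample, "grid$ctl", "Page$3")
```

The resulting `body` is what you would write to the request stream, with the content type set to `application/x-www-form-urlencoded`.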

HttpWebRequest.GetResponse does not return AutoID for server side controls

In ASP.NET, by default, the client-side IDs of server-side controls are concatenated with generated text.
For example:
<asp:TextBox ID="txbUser" runat="server" />
would become...
<input type="text" id="ctl00_body_txbUser">
However, when I use HttpWebRequest.GetResponse (CType(objReq.GetResponse, HttpWebResponse)) to request the same page, the element comes back without the auto-generated text:
<input type="text" id="txbUser">
Is it possible to use an HttpWebRequest object and GetResponse in such a way that the response contains the auto-generated IDs that .NET uses for server-side controls?
I am working with a third party that has previously set up translation rules keyed to these IDs; now we are attempting to have the same rules work against an API call that is passed a string generated from the page. However, the string generated from the page does not have the same IDs.
Below is the code being used to return the web page as a string.
Public Shared Function GetWebPageAsString(ByVal strURI As String, ByVal strPostData As String) As String
' Declare our variables. '
Dim objHttpRequest As HttpWebRequest
Dim PostBuffer() As Byte
Dim PostDataStream As Stream = Nothing
Dim objHttpResponse As HttpWebResponse = Nothing
Dim objStreamReader As StreamReader = Nothing
Dim strResponseText As String = ""
Try
' Create a new request. '
objHttpRequest = CType(WebRequest.Create(strURI), HttpWebRequest)
objHttpRequest.Timeout = 3000000
objHttpRequest.Method = "POST"
PostBuffer = Encoding.ASCII.GetBytes(strPostData)
objHttpRequest.ContentType = "application/x-www-form-urlencoded"
objHttpRequest.ContentLength = PostBuffer.Length
PostDataStream = objHttpRequest.GetRequestStream
PostDataStream.Write(PostBuffer, 0, PostBuffer.Length)
PostDataStream.Close()
' Get the response to our request as a stream object. '
objHttpResponse = CType(objHttpRequest.GetResponse, HttpWebResponse)
' Create a stream reader to read the data from the stream. '
objStreamReader = New StreamReader(objHttpResponse.GetResponseStream, Encoding.UTF8)
' Copy the text retrieved from the stream to a variable. '
strResponseText = objStreamReader.ReadToEnd()
' Close our objects. '
objStreamReader.Close()
objHttpResponse.Close()
Catch ex As Exception
' Rethrow with Throw (not Throw ex) so the original stack trace is preserved. '
Throw
Finally
If Not objStreamReader Is Nothing Then
objStreamReader.Close()
End If
If Not PostDataStream Is Nothing Then
PostDataStream.Close()
End If
If Not objHttpResponse Is Nothing Then
objHttpResponse.Close()
End If
objHttpRequest = Nothing
PostBuffer = Nothing
PostDataStream = Nothing
objHttpResponse = Nothing
objStreamReader = Nothing
End Try
' Set return value. '
Return strResponseText
End Function
EDIT: Just to clarify, I need the IDs to continue to be auto-generated by .NET. I understand that I could make them equal by setting ClientIDMode to Static. Unfortunately, the third party we are working with has already created the rules for our current pages based on the IDs that .NET generated. When I request the same page using the HttpWebRequest object and push data into the request stream, I no longer see the auto-generated IDs, even though it's the same page.
Create a clean master page and put your page in it. That should fix the IDs issue.
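Why a master page helps: under the legacy AutoID scheme, a control's ClientID is its own ID prefixed with the IDs of its naming containers, joined with `_`, while its UniqueID (the posted form field name, also used as `__EVENTTARGET`) uses `$`. The Python sketch below is a simplified model of that joining rule, not ASP.NET's implementation. The container names `ctl00` (presumably the master page) and `body` (presumably a ContentPlaceHolder) are inferred from the example IDs in the question.

```python
def client_id(naming_chain):
    """ClientID under the legacy AutoID scheme: container IDs joined with '_'."""
    return "_".join(naming_chain)


def unique_id(naming_chain):
    """UniqueID (form field name / __EVENTTARGET value): container IDs joined with '$'."""
    return "$".join(naming_chain)


# Hypothetical chains: the control inside a master page, and the same control with no containers
with_master = ["ctl00", "body", "txbUser"]
standalone = ["txbUser"]
```

`client_id(with_master)` gives `ctl00_body_txbUser`, matching the ID in the question, while `client_id(standalone)` gives just `txbUser`. The same `$` separator appears in the GridView postback target in the previous question.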

How to send a POST in VB.NET?

What I am trying to do is have my user enter a phone number and a message, and then post them to Text Marketer, which sends the message.
At the moment, if I use Response.Redirect, the message sends:
response.redirect("http://www.textmarketer.biz/gateway/?username=*****&password=*****&message=test+message&orig=test&number=447712345678")
However, I do not want to send the user there. All I want to do, for now, is post the data to the URL while the user stays on the current page.
Any help?
Actually, you don't have to do this server side (VB); plain HTML will do the trick:
<html>
<body>
<form action="http://google.com" method="post">
<input type="hidden" name="somename" value="somevalue"/>
<input type="submit" value="Submit"/>
</form>
</body>
</html>
This will post the data to google.com (and, in effect, redirect the user there). Note that an input is only submitted if it has a name attribute.
Maybe you could use client script (jQuery's $.ajax() or $.post()), but I think you will run into cross-domain restrictions (there are workarounds, but they are not clean or straightforward).
Another option is the HttpWebRequest class. It runs server side, so the post originates from your server instead of from the client (as it does with the first approach). Upon calling request.GetResponse(), you can retrieve the output from the remote server and render it on your page. But if you want to post and redirect to the remote URL, then I guess you should use the first approach.
EDIT:
Try this in VB:
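Option Infer On
Imports System.IO
Imports System.Net
Imports System.Text
Public Class Test
    Private Sub TESTRUN()
        ' URL-encoded form body (credentials are placeholders)
        Dim postdata As String = "username=*****&password=*****&message=test+message&orig=test&number=447712345678"
        Dim postdatabytes As Byte() = Encoding.UTF8.GetBytes(postdata)
        ' WebRequest.Create returns a WebRequest, so cast it to HttpWebRequest
        Dim s As HttpWebRequest = DirectCast(WebRequest.Create("http://www.textmarketer.biz/gateway/"), HttpWebRequest)
        s.Method = "POST"
        s.ContentType = "application/x-www-form-urlencoded"
        s.ContentLength = postdatabytes.Length
        ' Write the form body to the request stream
        Using requestStream = s.GetRequestStream()
            requestStream.Write(postdatabytes, 0, postdatabytes.Length)
        End Using
        ' Send the request and read the gateway's reply; Using disposes the response
        Using result = s.GetResponse()
            Using reader As New StreamReader(result.GetResponseStream())
                Dim responseText As String = reader.ReadToEnd()
            End Using
        End Using
    End Sub
End Class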
Update 2:
A GET request using HttpWebRequest in VB.NET:
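' Requires: Imports System.Net, System.Text and System.Web (for HttpUtility)
Dim s As HttpWebRequest
' URL-encode each value so characters such as &, # and = in the password cannot break the query string
Dim username = "username=" & HttpUtility.UrlEncode("yourusername")
Dim password = "password=" & HttpUtility.UrlEncode("yourp#assword)!==&#(*#)!##(_")
Dim message = "message=" & HttpUtility.UrlEncode("yourmessage")
Dim orig = "orig=" & HttpUtility.UrlEncode("dunno what this is")
Dim num = "number=" & HttpUtility.UrlEncode("123456")
Dim sep = "&"
Dim sb As New StringBuilder()
sb.Append(username).Append(sep).Append(password).Append(sep)
sb.Append(message).Append(sep).Append(orig).Append(sep).Append(num)
' WebRequest.Create returns a WebRequest, so cast it to HttpWebRequest
s = DirectCast(WebRequest.Create("http://www.textmarketer.biz/gateway/?" & sb.ToString()), HttpWebRequest)
s.Method = "GET"
' Send the request, then close the response to release the connection
Dim result = s.GetResponse()
result.Close()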
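Why update 2 encodes every value: in a URL, `#` starts the fragment (which is never sent to the server) and `&` and `=` separate parameters, so the sample password would be mangled if it were concatenated raw. The Python sketch below, using only `urllib.parse`, compares the two approaches. The parameter names and sample values come from the answer; `build_url` and `password_seen_by_server` are illustrative helper names.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

params = {
    "username": "yourusername",
    "password": "yourp#assword)!==&#(*#)!##(_",
    "message": "yourmessage",
    "orig": "dunno what this is",
    "number": "123456",
}
BASE = "http://www.textmarketer.biz/gateway/"


def build_url(base, values):
    # urlencode percent-encodes each value, so #, & and = survive intact
    return base + "?" + urlencode(values)


def password_seen_by_server(url):
    # Everything after the first '#' is the fragment, which is not part of the query
    query = urlsplit(url).query
    return parse_qs(query).get("password", [None])[0]


encoded = build_url(BASE, params)
naive = BASE + "?" + "&".join(f"{k}={v}" for k, v in params.items())
```

With the encoded URL the server sees the full password; with the naive one it only sees `yourp`, and every parameter after the password is lost in the fragment.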
You have to use the WebRequest class; refer to http://msdn.microsoft.com/en-us/library/debx8sh9.aspx
Don't do this server side, but client side using AJAX.
The jQuery ajax library is quite good.

What are valid values for the MediaType Property on a HttpWebRequest

What are valid values for the MediaType property on an HttpWebRequest?
I want to do something like this:
Dim url As String = HttpContext.Current.Request.Url.AbsoluteUri
Dim req As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(New Uri(url)), System.Net.HttpWebRequest)
' Add the current authentication cookie to the request
Dim cookie As HttpCookie = HttpContext.Current.Request.Cookies(FormsAuthentication.FormsCookieName)
Dim authenticationCookie As New System.Net.Cookie(FormsAuthentication.FormsCookieName, cookie.Value, cookie.Path, HttpContext.Current.Request.Url.Authority)
req.CookieContainer = New System.Net.CookieContainer()
req.CookieContainer.Add(authenticationCookie)
req.MediaType = "PRINT"
Dim res As System.Net.WebResponse = req.GetResponse()
'Read data
Dim ResponseStream As Stream = res.GetResponseStream()
'Copy the content into a MemoryStream (ContentLength is -1 when the server does not send it, so don't rely on it)
Dim docStream As New MemoryStream()
ResponseStream.CopyTo(docStream)
docStream.Position = 0
Thanks.
I think this Wikipedia page should give you a fairly comprehensive list of media types:
Media Types
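As that list shows, a media type has the form type/subtype, optionally followed by `;` parameters, so a bare word such as "PRINT" does not fit that shape. The Python sketch below checks only that syntax, using the HTTP token character set (RFC 7230 `tchar`). It does not check IANA registration and says nothing about how HttpWebRequest itself uses MediaType. `is_media_type_syntax` is a hypothetical helper.

```python
import re

# RFC 7230 "tchar" characters, used for the type and subtype tokens
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
_MEDIA_TYPE = re.compile(rf"^{_TOKEN}/{_TOKEN}$")


def is_media_type_syntax(value: str) -> bool:
    """True if value looks like type/subtype; any ';' parameters are ignored."""
    essence = value.split(";", 1)[0].strip()
    return bool(_MEDIA_TYPE.match(essence))
```

For example, `is_media_type_syntax("text/html")` is true, while `is_media_type_syntax("PRINT")` is false because there is no `/subtype` part.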
