Using classic ASP, how to fetch or screenscrape meta tags of html page? - asp-classic

Using the following code, I can reach the site, grab the data, but I cannot obtain the meta title tag. Surprisingly, I searched for methods to obtain meta tags while screenscraping with classic ASP and only found a couple of example, neither of which I could get to work.
Any assistance?
rss_url = "https://www.nationalgeographic.com/science/2019/06/opal-fossils-reveal-new-species-dinosaur-australia-fostoria/"
Set objHTTP = CreateObject("Microsoft.XMLHTTP")
objHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
objHTTP.Open "GET", rss_url, False
objHTTP.Send
if objHTTP.Status = 200 Then sdata = BinaryToString(objHTTP.ResponseBody)
Set objHTTP = Nothing
Set regEx = New RegExp
regEx.Pattern = "<meta.*property=""og:image"".*content=""(.*)"".*\/>"
regEx.IgnoreCase = True
Set matches = regEx.Execute(sdata)
if matches.Count > 0 then
KeywordAl = matches(0).SubMatches(0)
response.write "Image = " & KeywordAl&"<hr>"
end if
I included the BinaryToString Function just to be complete:
Function BinaryToString(byVal Binary)
'--- Converts the binary content to text using ADODB Stream
'--- Set the return value in case of error
BinaryToString = ""
'--- Creates ADODB Stream
Dim BinaryStream
Set BinaryStream = CreateObject("ADODB.Stream")
'--- Specify stream type.
BinaryStream.Type = 1 '--- adTypeBinary
'--- Open the stream And write text/string data To the object
BinaryStream.Open
BinaryStream.Write Binary
'--- Change stream type to text
BinaryStream.Position = 0
BinaryStream.Type = 2 '--- adTypeText
'--- Specify charset for the source text (unicode) data.
BinaryStream.CharSet = "UTF-8"
'--- Return converted text from the object
BinaryToString = BinaryStream.ReadText
End Function

Try this:
Function GetTextFromUrl(url)
Dim oXMLHTTP
Dim strStatusTest
Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.3.0")
oXMLHTTP.Open "GET", url, False
oXMLHTTP.Send
If oXMLHTTP.Status = 200 Then
GetTextFromUrl = oXMLHTTP.responseText
End If
End Function
Dim sResult : sResult = GetTextFromUrl("https://www.nationalgeographic.com/science/2019/06/opal-fossils-reveal-new-species-dinosaur-australia-fostoria/")
Set regEx = New RegExp
regEx.Pattern = "<meta.*property=""og:image"".*content=""(.*)"".*\/>"
regEx.IgnoreCase = True
Set matches = regEx.Execute(sResult)
if matches.Count > 0 then
KeywordAl = matches(0).SubMatches(0)
response.write "Image = " & KeywordAl&"<hr>"
end if
For me this outputs for that page:
Image = https://www.nationalgeographic.com/content/dam/science/2019/05/22/gemstone-dino/og-fostoria_final.ngsversion.1559624211907.adapt.1900.1.jpg
edit: Added some debugging info here. Try this snippet and see what it says about your TLS version - it is possible this site refuses connections below a certain TLS level.
Set objHttp = Server.CreateObject("WinHTTP.WinHTTPRequest.5.1")
objHttp.open "GET", "https://howsmyssl.com/a/check", False
objHttp.Send
Response.Write objHttp.responseText
Set objHttp = Nothing
Response.End

Related

Why will some site not load via XMLHTTPREQUEST using server side script (ASP) but I can load with javascript?

I'm trying to load the following test urls with Classic ASP through xmlhttprequest. But two of the sites will not load. I know the script works because I can run some sites, but some will not load. Any explanation?
I've loaded these sites with Javascript and they do load (code not included, but standard AJAX or plain JS script). So why would client-side script work and not server-side code (ASP)?
'rss_url = "https://www.nationalgeographic.com/science/2019/06/opal-fossils-reveal-new-species-dinosaur-australia-fostoria" 'THiS URL DOES NOT LOAD
rss_url = "https://www.nbcnews.com/news/us-news/ex-minneapolis-officer-who-killed-justine-damond-sentenced-12-5-n1013926" 'THIS URL DOES NOT LOAD
'rss_url = "https://www.reuters.com/article/us-usa-saudi-arms/republican-democratic-senators-seek-to-block-trump-saudi-arms-sales-idUSKCN1T61PL" 'THIS URL LOADS
Dim objHTTP
Set objHTTP = Server.CreateObject("MSXML2.XMLHTTP.6.0")
Err.Clear ' shouldn't be needed; can't hurt
ON ERROR RESUME NEXT
objHTTP.Open "GET", rss_url, False
objHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
'objHTTP.setRequestHeader "Content-type", "text/html"
aErr = Array(Err.Number, Err.Description)
On Error GoTo 0
If 0 = aErr(0) Then
On Error Resume Next
objHTTP.Send
aErr = Array(Err.Number, Err.Description)
On Error GoTo 0
Select Case True
Case 0 <> aErr(0)
response.write "send failed:" & aErr(0) & aErr(1)
Case 200 = objHTTP.status
response.write rss_url & objHTTP.status & objHTTP.statusText
Case Else
response.write "further work needed:"
response.write rss_url & objHTTP.status & objHTTP.statusText
End Select
Else
response.write "open failed:" & aErr(0) & aErr(1)
End If
'ON ERROR GOTO 0
If Err.Number <> 0 Then
Response.Write "NO feed from ..."
end if
if objHTTP.Status = 200 Then sdata = BinaryToString(objHTTP.ResponseBody)
response.write sdata & "<hr>"
Set objHTTP = Nothing
Function BinaryToString(byVal Binary)
'--- Converts the binary content to text using ADODB Stream
'--- Set the return value in case of error
BinaryToString = ""
'--- Creates ADODB Stream
Dim BinaryStream
Set BinaryStream = CreateObject("ADODB.Stream")
'--- Specify stream type.
BinaryStream.Type = 1 '--- adTypeBinary
'--- Open the stream And write text/string data To the object
BinaryStream.Open
BinaryStream.Write Binary
'--- Change stream type to text
BinaryStream.Position = 0
BinaryStream.Type = 2 '--- adTypeText
'--- Specify charset for the source text (unicode) data.
BinaryStream.CharSet = "UTF-8"
'--- Return converted text from the object
BinaryToString = BinaryStream.ReadText
End Function
Try this (from this answer)
Function GetTextFromUrl(url)
Dim oXMLHTTP
Dim strStatusTest
Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.3.0")
oXMLHTTP.Open "GET", url, False
oXMLHTTP.Send
If oXMLHTTP.Status = 200 Then
GetTextFromUrl = oXMLHTTP.responseText
End If
End Function
Dim sResult : sResult = GetTextFromUrl("https://www.nbcnews.com/news/us-news/ex-minneapolis-officer-who-killed-justine-damond-sentenced-12-5-n1013926")
Response.Write sResult
Works fine for me, grabs the page and displays it.

save a webpage with VBS

When I use FireFox for "https://twitter.com/search?q=vbs",
all works well and I see the tweets (without ever logging on). But, when
I try to use the simplest VBS-scripting way with XMLHTTP, it seems like I
am declared a mobile user to Twitter and I don't get search results. So,
how can I change my VBS code below to make this work? In principle,
it seems like I should be able to set some objXMLHTTP property to spoof
any browser, but then again, Microsoft probably wouldn't give me this
freedom so easily. Any comments would be great!
strFileURL = "https://twitter.com/search?q=vbs"
strHDLocation = "C:\Users\me\webpages\saved_tweets.html"
Set objXMLHTTP = CreateObject("MSXML2.XMLHTTP")
objXMLHTTP.open "GET", strFileURL, false
objXMLHTTP.send()
if objXMLHTTP.Status = 200 then
Set objADOStream = CreateObject("ADODB.Stream")
objADOStream.Open
objADOStream.Type = 1
objADOStream.Write objXMLHTTP.ResponseBody
objADOStream.Position = 0
Set objFSO = Createobject("Scripting.FileSystemObject")
if objFSO.Fileexists(strHDLocation) then objFSO.DeleteFile strHDLocation
Set objFSO = Nothing
objADOStream.SaveToFile strHDLocation
objADOStream.Close
Set objADOStream = Nothing
end if
Set objXMLHTTP = Nothing
Maybe you could fake a User Agent (browser) in your HTTP request so that Twitter will consider your browser as a desktop with something like this :
Set objXMLHTTP = CreateObject("MSXML2.XMLHTTP")
objXMLHTTP.open "GET", strFileURL, false
objXMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:23.0) Gecko/20131011 Firefox/23.0"
objXMLHTTP.send()
Would using the mobile search page be an option?
strFileURL = "https://mobile.twitter.com/search?q=vbs"
Set objXMLHTTP = CreateObject("MSXML2.XMLHTTP")
objXMLHTTP.open "GET", strFileURL, false
objXMLHTTP.send

Download Files from URL using Classic ASP

I have few urls which are linked to a pdf example
abc.com/1.pdf
abc.com/2g.pdf
abc.com/i8.pdf
What i wanted to do is Download the PDFs automatically in a Folder using Classic ASP
I tried to use this code http://blog.netnerds.net/2007/01/classic-asp-push-file-downloads-from-directory-outside-of-the-web-root/
but this doesnt work for Http it works good if the files are local.
I want to do it automatically.
I used the code posted by user580950 and the comment by AnthonyWJones and created a function version of the code. Call the function and it returns the content type of the file downloaded or an empty string if the file wasn't found.
public function SaveFileFromUrl(Url, FileName)
dim objXMLHTTP, objADOStream, objFSO
Set objXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.3.0")
objXMLHTTP.open "GET", Url, false
objXMLHTTP.send()
If objXMLHTTP.Status = 200 Then
Set objADOStream = CreateObject("ADODB.Stream")
objADOStream.Open
objADOStream.Type = 1 'adTypeBinary
objADOStream.Write objXMLHTTP.ResponseBody
objADOStream.Position = 0 'Set the stream position to the start
Set objFSO = Createobject("Scripting.FileSystemObject")
If objFSO.Fileexists(FileName) Then objFSO.DeleteFile FileName
Set objFSO = Nothing
objADOStream.SaveToFile FileName
objADOStream.Close
Set objADOStream = Nothing
SaveFileFromUrl = objXMLHTTP.getResponseHeader("Content-Type")
else
SaveFileFromUrl = ""
End if
Set objXMLHTTP = Nothing
end function
I got this code somewhere on the internet if anyone wants to use it.
<%
Server.ScriptTimeout = 60 * 20
' Set your settings
strFileURL = "http://pathtofile.zip"
strHDLocation = "c:\filename.zip"
' Fetch the file
Set objXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.3.0")
objXMLHTTP.Open "GET", strFileURL, False
objXMLHTTP.Send()
If objXMLHTTP.Status = 200 Then
Set objADOStream = CreateObject("ADODB.Stream")
objADOStream.Open
objADOStream.Type = 1 'adTypeBinary
objADOStream.Write objXMLHTTP.ResponseBody
objADOStream.Position = 0 'Set the stream position to the start
Set objFSO = CreateObject("Scripting.FileSystemObject")
If objFSO.FileExists(strHDLocation) Then objFSO.DeleteFile strHDLocation
Set objFSO = Nothing
objADOStream.SaveToFile strHDLocation
objADOStream.Close
Set objADOStream = Nothing
End if
Set objXMLHTTP = Nothing
%>

vbs xmlhttp responseText Truncated?

Hello all i have a simple vbs script that is grabbing a url(with values using GET) and I need to parse thru the text. However the responsetext i am getting is not the full response. i am getting a string that is 1000 in length however i know the response should be more like 5000.
Function getServer(server_hostname)
Set objHTTP = CreateObject("msxml2.xmlhttp.3.0")
objHTTP.open "GET", "http://someurl/ServerInfo.asp", False
objHTTP.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
objHTTP.send "B1=GO!!&Server=" + server_hostname
getServer = objHTTP.responseText
End Function
serverStr = getServer(server_hostname)
msgbox(Len(serverStr))
Is there a limitation on how much can be returned? Thank you for your help.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
So i change my function to:
Function getServer(server_hostname)
Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
objHTTP.open "GET", "http://someurl/ServerInfo.asp?B1=GO!!&Server=" + server_hostname , False
'objHTTP.setRequestHeader "Content-Type", "text/html"
'objHTTP.send "B1=GO!!&Server=" + server_hostname
objHTTP.send
getServer = objHTTP.responseText
End Function
And now it works...no idea why.
The maximum MsgBox length is 1024 characters.
Ref: http://www.w3schools.com/vbscript/func_msgbox.asp

MSXML2.XMLHTTP send method works with early binding, fails with late binding

The code below works. But if I comment out the line Dim objRequest As MSXML2.XMLHTTP and uncomment the line Dim objRequest As Object it fails with the error message :
The parameter is incorrect
Why, and what (if anything) can I do about it?
Public Function GetSessionId(strApiId, strUserName, strPassword) As String
Dim strPostData As String
Dim objRequest As MSXML2.XMLHTTP
'Dim objRequest As Object '
strPostData = "api_id=" & strApiId & "&user=" & strUserName & "&password=" & strPassword
Set objRequest = New MSXML2.XMLHTTP
With objRequest
.Open "POST", "https://api.clickatell.com/http/auth", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send strPostData
GetSessionId = .responseText
End With
End Function
Corey, yes, I know I would have to do that in order for my code to work without a reference to the MSXML type library. That's not the issue here. The code fails when using Dim objRequest As Object regardless of whether I use
Set objRequest = NEW MSXML2.XMLHTTP with the reference, or
Set objRequest = CreateObject("MSXML2.XMLHTTP") without the reference.
For some reason, this works:
Dim strPostData As String
Dim objRequest As Object
strPostData = "api_id=" & strApiId & "&user=" & strUserName & "&password=" & strPassword
Set objRequest = New MSXML2.XMLHTTP
With objRequest
.Open "POST", "https://api.clickatell.com/http/auth", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send (strPostData)
GetSessionId = .responseText
End With
Instead of building the URL-encoded strPostData via string concatenation, it's strongly advisable to use a URL encoding function:
strPostData = "api_id=" & URLEncode(strApiId) & _
"&user=" & URLEncode(strUserName) & _
"&password=" & URLEncode(strPassword)
A couple of choices for a URLEncode() function in VBA are in this thread: How can I URL encode a string in Excel VBA?
If you use the Dim objRequest As Object then you would need to code:
Set objRequest = CreateObject("MSXML2.XMLHTTP")
I realise this is nearly identical to the code from Tomalek above (all credit due to you!), but this question helped me towards a full solution to a problem I had (Excel submitting to PHP server, then dealing with response)...so in case this is of any help to anyone else:
Sub Button1_Click2()
Dim objXMLSendDoc As Object
Set objXMLSendDoc = New MSXML2.DOMDocument
objXMLSendDoc.async = False
Dim myxml As String
myxml = "<?xml version='1.0'?><Request>Do Something</Request>"
If Not objXMLSendDoc.LoadXML(myxml) Then
Err.Raise objXMLSendDoc.parseError.ErrorCode, , objXMLSendDoc.parseError.reason
End If
Dim objRequest As MSXML2.XMLHTTP
Set objRequest = New MSXML2.XMLHTTP
With objRequest
.Open "POST", "http://localhost/SISADraftCalcs/Test2.php", False
.setRequestHeader "Content-Type", "application/xml;charset=UTF-16"
.setRequestHeader "Cache-Control", "no-cache"
.send objXMLSendDoc
End With
Dim objXMLDoc As MSXML2.DOMDocument
Set objXMLDoc = objRequest.responseXML
If objXMLDoc.XML = "" Then
objXMLDoc.LoadXML objRequest.responseText
If objXMLDoc.parseError.ErrorCode <> 0 Then
MsgBox objXMLDoc.parseError.reason
End If
End If
Dim rootNode As IXMLDOMElement
Set rootNode = objXMLDoc.DocumentElement
MsgBox rootNode.SelectNodes("text").Item(0).text
End Sub

Resources