Read large file line-by-line with ADO Stream? - asp-classic

I want to use ADO Stream to read lines from a local large text file with UTF-8 encoding so I try
Set objStream = CreateObject("ADODB.Stream")
objStream.Charset = "utf-8"
objStream.Type = 2
objStream.Open
objStream.LoadFromFile = strFile
objStream.LineSeparator = 10
Do Until objStream.EOS
strLine = objStream.ReadText(-2)
Loop
However the result is that the script takes lots of RAM and CPU usages. So is there any way to tell the script not to load all the file contents into memory, but just open it and read until it encounters any line separator?

As you work with Stream object, I think it's obvious, however, .LoadFromFile fill current stream with the whole file content, and no any cutomize option to load parial data from file.
As for reading 1 line, you done this already with .ReadText(-2), (-2 = adReadLine).
Set objStream = CreateObject("ADODB.Stream")
With objStream
.Charset = "utf-8"
.Type = 2
.Open
'objStream.LoadFromFile = strFile ''I see a typo here
.LoadFromFile strFile
.LineSeparator = 10 ''that's Ok
'Do Until objStream.EOS ''no need this
strLine = .ReadText(-2)
'Loop
.Close ''add this though!
End with
Set objStream = Nothing
For .LineSeparator you can use just 3 constants:
Constant Value Description
adCRLF -1 Default. Carriage return line feed
adLF 10 Line feed only
adCR 13 Carriage return only
If you need to break your Do..Loop at other letter, as .ReadText is the only choice for reading text stream, you may use it in conjunction with InStr function and Exit Do then you find your custom separator.
Const cSeparator = "_" 'your custom separator
Dim strLine, strTotal, index
Do Until objStream.EOS
strLine = objStream.ReadText(-2)
index = InStr(1, strLine, cSeparator)
If index <> 0 Then
strTotal = strTotal & Left(strLine, index-1)
Exit Do
Else
strTotal = strTotal & strLine
End If
Loop
Shortly, this is the whole optimization you can do (or at least as far as I know).

If you look at this snippet from J. T. Roff's ADO book, you'll see that in theory you can read from a file line by line (without loading it completely into memory). I tried using the file: protocoll in the source parameter, but did not succeed.
So let's try another approach: To treat the .txt file as a UTF8 encoded trivial (one column) ADO database table, you need a schema.ini file in the source directory:
[linesutf8.txt]
ColNameHeader=False
CharacterSet=65001
Format=TabDelimited
Col1=SampleText CHAR WIDTH 100
Then you can do:
Dim sTDir : sTDir = "M:/lib/kurs0705/testdata"
Dim sFName : sFName = "[linesutf8.txt]"
Dim oDb : Set oDb = CreateObject("ADODB.Connection")
Dim sCs : sCs = Join(Array( _
"Provider=MSDASQL" _
, "Driver={Microsoft Text Driver (*.txt; *.csv)}" _
, "DBQ=" + sTDir _
), ";")
oDb.open sCs
WScript.Stdin.Readline
Dim oRs : Set oRs = oDb.Execute("SELECT * FROM " & sFName)
WScript.Stdin.Readline
Do Until oRS.EOF
WScript.Echo oRS.Fields(0).Value
oRs.MoveNext
Loop
oRs.Close
oDb.Close
For some background look here.

Related

Loop through website links and get PDF's to my computer

This topic is related to Loop through links and download PDF's
I am trying to convert my current VBA code into VBScript. I have already understood that I have to remove the variable types (As ... part of Dim statements) and use CreatObject to get those objects but otherwise everything should port as-is. DoEvents will also have to be replaced with something like Wscript.sleep.
I came up with some problems. Currently while running VBS file I am getting an error saying "Object required: 'MSHTML'". Pointing to line 65, where I have Set hDoc = MSHTML.HTMLDocument. I have tried to search on Google but got nothing helpful for this one.
How I should proceed with this one?
DownloadFiles("https://www.nordicwater.com/products/waste-water/")
Sub DownloadFiles(p_sURL)
Set xHttp = CreateObject("Microsoft.XMLHTTP")
Dim xHttp
Dim hDoc
Dim Anchors
Dim Anchor
Dim sPath
Dim wholeURL
Dim internet
Dim internetdata
Dim internetlink
Dim internetinnerlink
Dim arrLinks
Dim sLink
Dim iLinkCount
Dim iCounter
Dim sLinks
Set internet = CreateObject("InternetExplorer.Application")
internet.Visible = False
internet.navigate (p_sURL)
Do Until internet.ReadyState = 4
Wscript.Sleep 100
Loop
Set internetdata = internet.document
Set internetlink = internetdata.getElementsByTagName("a")
i = 1
For Each internetinnerlink In internetlink
If Left(internetinnerlink, 36) = "https://www.nordicwater.com/product/" Then
If sLinks <> "" Then sLinks = sLinks & vbCrLf
sLinks = sLinks & internetinnerlink.href
i = i + 1
Else
End If
Next
wholeURL = "https://www.nordicwater.com/"
sPath = "C:\temp\"
arrLinks = Split(sLinks, vbCrLf)
iLinkCount = UBound(arrLinks) + 1
For iCounter = 1 To iLinkCount
sLink = arrLinks(iCounter - 1)
'Get the directory listing
xHttp.Open "GET", sLink
xHttp.send
'Wait for the page to load
Do Until xHttp.ReadyState = 4
Wscript.Sleep 100
Loop
'Put the page in an HTML document
Set hDoc = MSHTML.HTMLDocument
hDoc.body.innerHTML = xHttp.responseText
'Loop through the hyperlinks on the directory listing
Set Anchors = hDoc.getElementsByTagName("a")
For Each Anchor In Anchors
'test the pathname to see if it matches your pattern
If Anchor.pathname Like "*.pdf" Then
xHttp.Open "GET", wholeURL & Anchor.pathname, False
xHttp.send
With CreateObject("Adodb.Stream")
.Type = 1
.Open
.write xHttp.responseBody
.SaveToFile sPath & getName(wholeURL & Anchor.pathname), 2 '//overwrite
End With
End If
Next
Next
End Sub
Function:
Function getName(pf)
getName = Split(pf, "/")(UBound(Split(pf, "/")))
End Function
Instead of Set hDoc = MSHTML.HTMLDocument, use:
Set hDoc = CreateObject("htmlfile")
In VBA/VB6 you can specify variable and object types but not with VBScript. You have to use CreateObject (or GetObject: GetObject function) to instantiate objects like MSHTML.HTMLDocument, Microsoft.XMLHTTP, InternetExplorer.Application, etc instead of declaring those using Dim objIE As InternetExplorer.Application for example.
Another change:
If Anchor.pathname Like "*.pdf" Then
can be written using StrComp function:
If StrComp(Right(Anchor.pathname, 4), ".pdf", vbTextCompare) = 0 Then
or using InStr function:
If InStr(Anchor.pathname, ".pdf") > 0 Then
Also, at the beginning of your sub, you do the following:
Set xHttp = CreateObject("Microsoft.XMLHTTP")
Dim xHttp
You should declare your variables before assigning them values or objects. In VBScript this is very relaxed, your code will work because VBScript will create undefined variables for you but it's good practice to Dim your variables before using them.
Except for Wscript.sleep commands, your VBScript code will work in VB6/VBA so you can debug your script in VB6 or VBA apps (like Excel).

extract attachments from DB to separate folders for each document

Have an assignment to do - it's to extract data from Lotus Notes DB including documents and their attachments. The purpose of this is to put it and store on the Sharepoint as a library.
So far I have managed to create a view and export the data for it to structure in Excel.
Also, I have looked up some Agents examples for extracting the attachments. With implementation of the below script, I managed to export the attachments:
Dim sDir As String
Dim s As NotesSession
Dim w As NotesUIWorkspace
Dim db As NotesDatabase
Dim dc As NotesDocumentCollection
Dim doc As NotesDocument
Sub Initialize
Set s = New NotesSession
Set w = New NotesUIWorkspace
Set db = s.CurrentDatabase
Set dc = db.UnprocessedDocuments
Set doc = dc.GetFirstDocument
Dim rtItem As NotesRichTextItem
Dim RTNames List As String
Dim DOCNames List As String
Dim itemCount As Integer
Dim sDefaultFolder As String
Dim x As Integer
Dim vtDir As Variant
Dim iCount As Integer
Dim j As Integer
Dim lngExportedCount As Long
Dim attachmentObject As Variant
x = MsgBox("This action will extract all attachments From the " & CStr(dc.Count) & _
" document(s) you have selected, And place them into the folder of your choice." & _
Chr(10) & Chr(10) & "Would you like To continue?", 32 + 4, "Export Attachments")
If x <> 6 Then Exit Sub
sDefaultFolder = s.GetEnvironmentString("LPP_ExportAttachments_DefaultFolder")
If sDefaultFolder = "" Then sDefaultFolder = "F:"
vtDir = w.SaveFileDialog( False, "Export attachments To which folder?", "All files|*.*", sDefaultFolder, "Choose Folder and Click Save")
If IsEmpty(vtDir) Then Exit Sub
sDir = StrLeftBack(vtDir(0), "\")
Call s.SetEnvironmentVar("LPP_ExportAttachments_DefaultFolder", sDir)
While Not (doc Is Nothing)
iCount = 0
itemCount = 0
lngExportedCount = 0
Erase RTNames
Erase DocNames
'Scan all items in document
ForAll i In doc.Items
If i.Type = RICHTEXT Then
Set rtItem = doc.GetfirstItem(i.Name)
If Not IsEmpty(rtItem.EmbeddedObjects) Then
RTNames(itemCount) = CStr(i.Name)
itemCount = itemCount +1
End If
End If
End ForAll
For j = 0 To itemCount-1
Set rtItem = Nothing
Set rtItem = doc.GetfirstItem(RTNames(j))
ForAll Obj In rtItem.EmbeddedObjects
If ( Obj.Type = EMBED_ATTACHMENT ) Then
Call ExportAttachment(Obj)
Call doc.Save( False, True )
'creates conflict doc if conflict exists
End If
End ForAll
Next
'Scan all items in document
ForAll i In doc.Items
If i.Type = ATTACHMENT Then
DOCNames(lngExportedCount) = i.Values(0)
lngExportedCount = lngExportedCount + 1
End If
End ForAll
For j% = 0 To lngExportedCount-1
Set attachmentObject = Nothing
Set attachmentObject = doc.GetAttachment(DOCNames(j%))
Call ExportAttachment(attachmentObject)
Call doc.Save( False, True )
'creates conflict doc if conflict exists
Next
Set doc = dc.GetNextDocument(doc)
Wend
MsgBox "Export Complete.", 16, "Finished"
End Sub
Sub ExportAttachment(o As Variant)
Dim sAttachmentName As String
Dim sNum As String
Dim sTemp As String
sAttachmentName = sDir & "\" & o.Source
While Not (Dir$(sAttachmentName, 0) = "")
sNum = Right(StrLeftBack(sAttachmentName, "."), 2)
If IsNumeric(sNum) Then
sTemp = StrLeftBack(sAttachmentName, ".")
sTemp = Left(sTemp, Len(sTemp) - 2)
sAttachmentName = sTemp & Format$(CInt(sNum) + 1, "##00") & _
"." & StrRightBack(sAttachmentName, ".")
Else
sAttachmentName = StrLeftBack(sAttachmentName, ".") & _
"01." & StrRightBack(sAttachmentName, ".")
End If
Wend
Print "Exporting " & sAttachmentName
'Save the file
Call o.ExtractFile( sAttachmentName )
End Sub
So the issue I do have right now is that these attachments are being saved to the same folder, which means that I would manually have to put them into right folders of library (several thousands). Could anyone help on how should I change the above code to have the attachments saved to separate folder for each document from DB?
Also for some reason that I cant find out below line is causing error pop up with "Object Variable not set":
sAttachmentName = sDir & "\" & o.Source
Would anyone know why it causes failure and stops the whole process?
You need to use the MkDir statement to create directory and extract attachments in the folder. Probably write something like:
MkDir sDir
You need to write code that create a new directory for each document (make sure you check if the directory exists, and preferably you make sure each directory has a unique name).
I wrote a tool like that, that exports all the fields of a document into XML, as well as attachments and embedded images. It can be set to separate each document into it's own directory.
You can read more about it ate the link below, perhaps you can get some ideas from the description. I use the UniversalID of teh document to get a unique folder name.
http://www.texasswede.com/websites/texasswede.nsf/Page/Notes%20XML%20Exporter

Non-printable characters in file names break my recursive file listing VB Script

I created a VB script to recursively list all of its file and subfolder files. The script begins fine but eventually crashes in any folder containing a file with a non-printable character/s in their filenames, i.e. I see little squares when I browse the folder in Explorer. I'm not sure how to change my below error handling to continue when it finds a file with such characters.
Any advice or solutions would be appreciated. Thank you.
Set objFSO = CreateObject("Scripting.FileSystemObject")
strFolder = "C:\Input\"
Set objFolder = objFSO.GetFolder(strFolder)
Set NewFile = objFSO.CreateTextFile("C:\Output\" & objFolder.Name & " FileList.txt", True)
Set colFiles = objFolder.Files
On Error Resume Next
For Each objFile In colFiles
NewFile.WriteLine(objFile.Path)
If Err Then
Err.Clear
End If
Next
ShowSubFolders(objFolder)
Sub ShowSubFolders(objFolder)
Set colFolders = objFolder.SubFolders
For Each objSubFolder In colFolders
Set colFiles = objSubFolder.Files
For Each objFile In colFiles
NewFile.WriteLine(objFile.Path)
If Err Then
Err.Clear
End If
Next
ShowSubFolders(objSubFolder)
Next
End Sub
NewFile.Close
Create the output text file as unicode so it can handle "non printable" characters. Third parameter of CreateTextFile.
Set NewFile = objFSO.CreateTextFile(" ... ", True, True)
EDITED
If you can not work with unicode files, then file/folder names should be converted from unicode to ansi before writing to output file. This will do the conversion
Function Unicode2Ansi( text )
Unicode2Ansi = text
With (WScript.CreateObject("ADODB.Stream"))
' Put data into stream
.Type = 2 '( adTypeText )
.Charset = "x-ansi"
.Open
.WriteText text
'Retrieve data from stream
.Position = 0
Unicode2Ansi = .ReadText
.Close
End With
End Function
And adapt code to call it NewFile.WriteLine Unicode2Ansi(objFile.Path)

Getting access to a binary response byte-by-byte in classic asp/JScript

I asked this question a few days ago but it seems to have gone cold fairly quickly. What I want to do is pretty simple and I can't believe someone hasn't figured it out.
Solution needs to be JScript classic ASP. I am reading a file from a remote server and I want to process that (binary) file on my server and spit the results back to the client as XML.
Here's a simplified version of what I am trying to do. This code runs, or will if the URL is filled in for your site. This test file is readbin.asp. It reads a file called test.bin, and writes the result to a stream. I used a stream because that makes it easier to read the file and parse the contents. Basically I want to:
while not end of stream
read byte from stream
process byte
here is readbin.asp:
<%# LANGUAGE = JScript %>
<%
var url = "http:// (... your URL to the file test.bin goes here...) " ;
var xmlhttp = Server.CreateObject ("MSXML2.ServerXMLHTTP") ;
xmlhttp.open ("GET", url, false) ;
xmlhttp.send () ;
var BinaryInputStream = Server.CreateObject ("ADODB.Stream") ;
BinaryInputStream.Type = 1 ; // binary
BinaryInputStream.Open ;
BinaryInputStream.Write (xmlhttp.responseBody) ;
BinaryInputStream.Position = 0 ;
Response.Write ("BinaryInputStream.size = " + BinaryInputStream.size + "<br>") ;
Response.Write ("BinaryInputStream = " + BinaryInputStream + "<br>") ;
var ByteValue = BinaryInputStream.read (1) ;
Response.Write ("ByteValue = " + ByteValue + "<br>") ;
Response.Write ("typeof (ByteValue) = " + typeof (ByteValue) + "<br>") ;
%>
My problem is: how do I get ByteValue as a number 0..255? typeof (ByteValue) is "unknown".
Ord?? Byte()?? Asc?? Chr??
You may want to take a look at this piece of code:
http://docs.hyperweb.no/source/asplib1.2/util/fileupload.asp
This code is for handling uploaded files. It is really quite similar:
- On line 224 the binary request is being read into a stream object.
- On line 232 the data is read back as ISO-8859-1 text, which is almost what you want
- You can then read each byte of this string by using the getByte() function on line 48.
This function uses the lookup table on line 33 to fix ceratin characters that get converted to unicode.
I am much more experienced with vbscript than jscript but I will give it a shot since not many takers on this question.
states there are six possible values that typeof returns: "number," "string," "boolean," "object," "function," and "undefined."
http://msdn.microsoft.com/en-us/library/259s7zc1(VS.85).aspx
The ADODB.Stream .Read object and method return a variant data type. I suspect typeof does not like the variant datatype.
http://www.w3schools.com/ado/ado_ref_stream.asp
The posting from this guy seems to explain it a bit further.
http://blogs.msdn.com/jaiprakash/archive/2007/01/09/jscript-supports-safearrays-of-variants-only.aspx
I would try casting the return stream before applying typeof to it.
Maybe not quite on the topic, but using VBScript I wrote this:
option explicit
dim fso, wshSHell, objShellApp, args, stdin,stdout
set fso = CreateObject("Scripting.FileSystemObject")
Set wshShell = CreateObject("WScript.Shell")
set objShellApp = CreateObject("Shell.Application")
Set args = Wscript.Arguments
set stdin = wscript.stdin
set stdout = wscript.stdout
dim filename, txtFile
filename = args(0)
Const adTypeBinary = 1
'Create Stream object
Dim BinaryStream, data
Set BinaryStream = CreateObject("ADODB.Stream")
'Specify stream type - we want To get binary data.
BinaryStream.Type = adTypeBinary
'Open the stream
BinaryStream.Open
'Load the file data from disk To stream object
BinaryStream.LoadFromFile filename
'Open the stream And get binary data from the object
data = BinaryStream.Read
BinaryStream.close
dim i, item, strLine, hexLine
hexLine = ""
strLine = ""
stdout.writeline "Decimal |Hex |Data | ASCII 33-254"
for i = 0 to lenb(data)-1
item = ascb(midb(data,i+1,1))
if ((i MOD 16) = 0) and (i<>0) then
stdout.writeLine right("00000000" & i,8) & "|" & right("00000000" & hex(i),8) & "|" & hexLine & " | " & strLine
hexLine = ""
strLine = ""
end if
hexLine = hexLine & right("0" & hex(item),2) & " "
if (item <= 32) or (item > 254) then
strLine=strLine + "."
else
strLine = strLine & chr(item)
end if
next
Key to this solution is to know that the variable 'data' contains an array of bytes. You can handle that by using the function lenb (length of byte array) and midb (to extract one or more bytes).
Run the script as follows:
cscript dumphex.vbs my_binary_file.bin > my_binary_file.hex.txt
This will output to standard out the hex code of all the binary file data. Each line of 16 hex codes is prefixed by a decimal + hex counter of the byte number. THe last column displays readable ascii between 33 and 254.
Also great to circumvent that annoying editor that interprets your UTF-8 codes, if you want to see just the exact codes in your ascii files.

ASP SaveToDisk method takes an incredible amount of time

This is a method in ASP Classic that saves a file to disk. It takes a very long time but I'm not sure why. Normally, I wouldn't mind so much, but the files it handles are pretty large so need this needs to faster than 100kB a second save. Seriously slow. (old legacy system, band aid fix till it gets replaced...)
Public Sub SaveToDisk(sPath)
Dim oFS, oFile
Dim nIndex
If sPath = "" Or FileName = "" Then Exit Sub
If Mid(sPath, Len(sPath)) <> "\" Then sPath = sPath & "\" '"
Set oFS = Server.CreateObject("Scripting.FileSystemObject")
If Not oFS.FolderExists(sPath) Then Exit Sub
Set oFile = oFS.CreateTextFile(sPath & FileName, True)
For nIndex = 1 to LenB(FileData)
oFile.Write Chr(AscB(MidB(FileData,nIndex,1)))
Next
oFile.Close
End Sub
I'm asking because there are plenty of WTF's in this code so I'm fighting those fires while getting some help on these ones.
I don't see your definition for "FileData" anywhere in your code - where is this coming from? Is there a reason you're writing it to disk a single character at a time? I'd suspect this is your problem - writing 100K of data takes 100K trips through this loop, which could be the reason for your slowdown. Why can't you replace the write loop at the bottom:
For nIndex = 1 to LenB(FileData)
oFile.Write Chr(AscB(MidB(FileData,nIndex,1)))
Next
with a single statement to write the file all at once?
oFile.Write FileData
What you should do is read the binary request into an ADODB.Stream object and convert it to plain ASCII text in a single fast step.
Set objStream = Server.CreateObject("ADODB.Stream")
objStream.Type = 1
objStream.Open
objStream.Write Request.BinaryRead(Request.TotalBytes)
objStream.Position = 0
objStream.Type = 2
objStream.Charset = "ISO-8859-1"
FormData = objStream.ReadText
objStream.Close
Set objStream = Nothing
Notice how the variable FormData now contains the form data as text. Then you parse this text and locate the start and length of each file, and use ADODB.Stream CopyTo method to extract the specific portion of the file and save it do disk.

Resources