regex to parse csv - asp.net

I'm looking for a regex that will parse a CSV file one line at a time: basically what string.readline() does, except that it allows line breaks when they are inside double quotes.
Or is there an easier way to do this?

Using a regex to parse CSV is fine for simple applications with well-controlled CSV data, but there are many gotchas, such as escaped embedded quotes and commas inside quoted strings. These make regex tricky and risky for this task.
I recommend a well-tested CSV module for your purpose.
--Edit:-- See this excellent article, Stop Rolling Your Own CSV Parser!
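To illustrate what a tested CSV module buys you, here is a minimal sketch in Python (not the ASP.NET stack from the question, just a compact way to show the behaviour). It uses the standard library csv module; the sample data and the helper name read_records are made up for the example.

```python
import csv
import io

# Made-up sample: the first data record has a quoted field containing a line
# break, a doubled ("") embedded quote and a comma.
SAMPLE = (
    'id,comment\r\n'
    '1,"first line\nsecond line with ""quotes"", and a comma"\r\n'
    '2,plain\r\n'
)


def read_records(text):
    """Return one list of fields per CSV record, not per physical line."""
    # newline="" hands the raw line breaks to the csv module untouched,
    # so it can decide which ones are inside quotes.
    return list(csv.reader(io.StringIO(text, newline="")))


records = read_records(SAMPLE)
for record in records:
    print(record)
```

The three physical line breaks inside the data produce three records rather than four, because the break inside the quoted field stays part of that field.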

The FileHelpers library is pretty good for this purpose.
http://www.filehelpers.net/

Rather than relying on error-prone regular expressions, oversimplified "split" logic, or third-party components, use the .NET Framework's built-in functionality:
Using Reader As New Microsoft.VisualBasic.FileIO.TextFieldParser("C:\MyFile.csv")
    Reader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
    ' Must be True so that delimiters and line breaks inside double quotes stay part of the field.
    Reader.HasFieldsEnclosedInQuotes = True
    Reader.SetDelimiters(",")
    Dim currentRow As String()
    While Not Reader.EndOfData
        Try
            ' ReadFields returns one record at a time as an array of field values.
            currentRow = Reader.ReadFields()
            For Each currentField As String In currentRow
                MsgBox(currentField)
            Next
        Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
            MsgBox("Line " & ex.Message & " is not valid and will be skipped.")
        End Try
    End While
End Using

Related

Loop through all characters in XML file and replace certain characters

After finally getting my XmlReader to work correctly on a project at work, I am now getting certain parsing errors when trying to create new Reader objects for certain XML files. For instance, this one that keeps occurring is an error trying to parse a hyphen (-). This slightly baffles me because I manually go in and replace that character with something else (like an underscore), and it reads fine - even when there are hyphens elsewhere in the document that are not changed.
So, unless there is an explanation or fix for this (maybe some XmlReaderSettings? I have yet to use any, so I don't know what they are capable of), what is the best syntax/method to cycle through every character and replace the offending ones with characters that will parse correctly?
This program will run automatically once per day on an XML file that is added daily, and run time is not an issue.
Edit: Error Message:
System.Xml.XmlException: An error occurred while parsing EntityName. Line 2896, position 89.
Code:
FN = Path.GetFileName(file1).ToString()
xmlFile = XmlReader.Create(Path.Combine(My.Settings.Local_Meter_Path, FN), New XmlReaderSettings())
ds.ReadXml(xmlFile)
Dim dt As DataTable = ds.Tables(13)
Dim filecreatedate As String = IO.File.GetLastWriteTime(file1)
If the problem occurs at ONLY ONE hyphen in the entire file, even though the file contains more hyphens, it may be related to one of the following:
1) The hyphen is not really a hyphen but a control character, or it is accompanied by a hidden control character.
2) The line has other interesting things in it, like an ampersand ("&"), which can cause problems in strings. "An error occurred while parsing EntityName" is what the parser reports when an "&" is not followed by a valid entity name. Are you sure the problem is the hyphen?
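If the culprit does turn out to be a stray ampersand, one blunt repair is to escape every "&" that does not already start an entity or character reference before handing the text to the parser. The following Python sketch shows the idea; the sample XML, the regex and the function name are all assumptions for illustration, and a text-level fix like this would also touch ampersands inside CDATA sections, so treat it as a starting point only.

```python
import re
import xml.etree.ElementTree as ET

# Matches "&" that does NOT begin a reference such as &amp; &#38; or &#x26;.
BARE_AMPERSAND = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text):
    """Replace unescaped ampersands with &amp; and leave valid references alone."""
    return BARE_AMPERSAND.sub("&amp;", xml_text)


# Hypothetical fragment: the hyphens are harmless, the bare "&" is not.
broken = "<meters><name>North-East & West-Side</name></meters>"

try:
    ET.fromstring(broken)
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False

fixed = escape_bare_ampersands(broken)
root = ET.fromstring(fixed)
print(parsed_ok, root.find("name").text)
```

The better long-term fix is to have whatever produces the file escape its text properly, since guessing at the intent of malformed XML is always fragile.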

IO.File.Exists() always returns false

Consider the following code in Button1_Click
Dim stFile as String = IO.Path.Combine(Server.MapPath("~/"), "uploads/text/file1.txt")
If IO.File.Exists(stFile) Then
' Do some processing
End If
Exists always returns false in the above code block
And here is Button2_Click code block
Dim stFile as String = IO.Path.Combine(Server.MapPath("~/"), "uploads/text/file1.txt")
Response.Clear()
Response.ContentType = "text/plain"
Response.AppendHeader("content-disposition", "attachment;filename=abc.txt")
Response.TransmitFile(stFile)
Response.Flush()
This always downloads the same file. What could be the problem?
I also struggled with this issue a while ago and found that using "/" and special characters may produce this scenario.
Path.Combine joins its arguments with "\", but it does not convert a "/" that is already inside an argument, so the combined path here mixes both separators.
Try changing uploads/text/file1.txt to uploads\text\file1.txt
If you are generating dynamic file names, try to avoid special characters that may require URL encoding, such as %, (, [space], etc.
(Some of this may seem illogical, but the combination of \, / and special characters cost me almost 8-10 hours.)
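The mixed-separator effect is easy to see in isolation. This sketch uses Python's ntpath module, which applies Windows path rules on any OS, as a stand-in for the .NET call; the site root is a hypothetical path, and Python's join is only an analogue of Path.Combine, not the same function.

```python
import ntpath  # Windows path semantics, importable on any platform


def combine_and_normalise(root, relative):
    """Join two path parts, then return both the raw and the normalised result."""
    joined = ntpath.join(root, relative)
    # normpath rewrites every "/" as "\" and collapses redundant separators.
    return joined, ntpath.normpath(joined)


# Hypothetical site root standing in for Server.MapPath("~/").
joined, normalised = combine_and_normalise("C:\\inetpub\\site\\", "uploads/text/file1.txt")
print(joined)
print(normalised)
```

Normalising once, right after combining, means the rest of the code only ever sees one separator style.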

Write less than greater than to text file in asp vbscript

I am having a very hard time trying to write out an XML file from ASP VBScript to a text file using the Scripting.FileSystemObject. The issue is the less-than and greater-than characters. To add these characters to variables in the code I need to use &lt; and &gt;. This causes a problem when writing the text. The results look like this:
&lt;copyright&gt;request copyright&lt;/copyright&gt;
&lt;lastBuildDate&gt;10/26/2012&lt;/lastBuildDate&gt;
The proper format should be:
<copyright>request copyright</copyright>
<lastBuildDate>10/26/2012</lastBuildDate>
Is there some sort of trick to converting those segments while writing the text file, or do I need to do something a bit more extravagant?
Thanks in advance!
When writing to the TextStream, you could just wrap your variables in two calls to Replace:
TextStream.Write Replace(Replace(myString, "&lt;", "<"), "&gt;", ">")
This way the variables aren't altered, but the written out data uses the right characters.
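For comparison, the same two-step replacement looks like this in Python. It is only a sketch of the idea, with a made-up function name, and it deliberately converts nothing but the two angle-bracket entities.

```python
def restore_angle_brackets(text):
    """Turn &lt; and &gt; back into literal angle brackets; leave everything else alone."""
    return text.replace("&lt;", "<").replace("&gt;", ">")


escaped = "&lt;copyright&gt;request copyright&lt;/copyright&gt;"
plain = restore_angle_brackets(escaped)
print(plain)
```

Leaving other entities such as &amp; untouched matters here, because inside the element text they still need to stay escaped for the output to be valid XML.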
Try this:
Dim objStream
Set objStream = CreateObject("ADODB.Stream")
objStream.CharSet = "utf-8"
objStream.Open
objStream.WriteText "testdata"
objStream.SaveToFile "C:\test.txt", 2

Restrict user input to characters in IBM System i 00280 code page

We need to restrict user input in a classic ASP web site to the characters allowed by the 00280 code page of IBM System i.
Is there a way to do it in a sane way besides having a (JavaScript|VBScript) function checking every character of an input string against a string of allowed characters?
A basic classic ASP function I thought of:
Function CheckInput(text, replacement)
    Dim output : output = ""
    Dim haystack : haystack = "abcd.. " ' Insert here the allowed characters.
    Dim i : i = 0
    For i = 1 To Len(text)
        Dim needle : needle = Mid(text, i, 1)
        If InStr(haystack, needle) = 0 Then
            needle = replacement
        End If
        output = output & needle
    Next
    CheckInput = output
End Function
Would - in my function - a RegExp be an overkill?
The short answer to your first question is: no. To your second question: a RegExp might not help you here, because not every browser's regex implementation supports the characters you need to test, and neither does the VBScript version of RegExp.
Even the code approach you are proposing needs careful thought. To place the set of supported characters in a string literal, the ASP file must be saved in a code page that covers all of those characters; alternatively, you would need to use ChrW to build a string containing them.
One slightly simpler approach would be to use JavaScript and have the page charset and code page set to UTF-8. This would allow you to create a string literal containing any set of characters.
Since it is generally not considered secure to rely on browser validation, you should consider changing your IBM i (formerly OS/400) application interface to accept UCS-2 data, and perform any necessary validation and conversion at the server side.
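Whichever side ends up doing the check, the logic is the same as the CheckInput function in the question: keep characters that are in an allowed set and substitute the rest. A minimal Python sketch of that server-side filter follows. The allowed string is a PLACEHOLDER and not the real code page 00280 repertoire, which would have to be taken from IBM's code page table.

```python
def check_input(text, replacement, allowed):
    """Replace every character of text that is not in allowed with replacement."""
    allowed_set = frozenset(allowed)  # set lookup instead of scanning a string each time
    return "".join(ch if ch in allowed_set else replacement for ch in text)


# PLACEHOLDER repertoire for illustration only; NOT the actual 00280 character list.
ALLOWED_PLACEHOLDER = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    " .,-"
    "àèéìòù"
)

result = check_input("Città 100% €", "?", ALLOWED_PLACEHOLDER)
print(result)
```

Because the source file is saved as UTF-8, the accented letters can sit directly in the string literal, which is the same point made above about the page code page.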

ASP Readline non-standard Line Endings

I'm using the ASP Classic ReadLine() function of the File System Object.
All has been working great until someone made their import file on a Mac in TextEdit.
The line endings aren't the same, and ReadLine() reads in the entire file, not just 1 line at a time.
Is there a standard way of handling this? Some sort of page directive, or setting on the File System Object?
I guess that I could read in the entire file, and split on vbLF, then for each item, replace vbCR with "", then process the lines, one at a time, but that seems a bit kludgy.
I have searched all over for a solution to this issue, but the solutions are all along the lines of "don't save the file with Mac[sic] line endings."
Anyone have a better way of dealing with this problem?
There is no way to change the behaviour of ReadLine; it will only recognize CRLF as a line terminator. Hence the only simple solution is the one you have already described.
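The read-everything-then-split approach is less kludgy than it sounds if the line endings are normalised first. Here is a rough Python sketch of that idea; the function name is made up, and it assumes the whole file fits comfortably in memory.

```python
def split_lines_any_ending(data):
    """Split text into lines, accepting CRLF, lone CR and lone LF as terminators."""
    # Convert CRLF first so the CR and LF of one ending are not counted twice.
    normalised = data.replace("\r\n", "\n").replace("\r", "\n")
    lines = normalised.split("\n")
    # A trailing terminator leaves one empty string at the end; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


sample = "first\r\nsecond\rthird\nfourth\n"
print(split_lines_any_ending(sample))
```

The order of the two replacements is the important part: handling the lone CR before CRLF would turn every Windows line ending into two line breaks.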
Edit
Actually there is another library that ought to be available out of the box on an ASP server that might offer some help. That is the ADODB library.
The ADODB.Stream object has a LineSeparator property that can be assigned 10 or 13 to override the default CRLF it would normally use. The documentation is patchy because it doesn't describe how this can be used with ReadText. You can get the ReadText method to return the next line from the stream by passing -2 as its parameter.
Take a look at this example:-
Dim sLine
Dim oStreamIn : Set oStreamIn = CreateObject("ADODB.Stream")
oStreamIn.Type = 2 ' Text
oStreamIn.Open
oStreamIn.CharSet = "Windows-1252"
oStreamIn.LoadFromFile "C:\temp\test.txt"
oStreamIn.LineSeparator = 10 ' Linefeed
Do Until oStreamIn.EOS
    sLine = oStreamIn.ReadText(-2)
    ' Do stuff with sLine
Loop
oStreamIn.Close
Note that by default the CharSet is Unicode, so you will need to assign the correct CharSet used by the file if it's not Unicode. I use the word "Unicode" in the sense that the documentation does, which actually means UTF-16. One advantage here is that ADODB.Stream can handle UTF-8, unlike the Scripting library.
BTW, I thought Macs used a CR for line endings? It's the Unix file format that uses LFs, isn't it?

Resources