Why is this looping infinitely? - asp.net

So I just got my site kicked off the server today and I think this function is the culprit. Can anyone tell me what the problem is? I can't seem to figure it out:
Public Function CleanText(ByVal str As String) As String
'removes HTML tags and other characters that title tags and descriptions don't like
If Not String.IsNullOrEmpty(str) Then
'mini db of extended tags to get rid of
Dim indexChars() As String = {"<a", "<img", "<input type=""hidden"" name=""tax""", "<input type=""hidden"" name=""handling""", "<span", "<p", "<ul", "<div", "<embed", "<object", "<param"}
For i As Integer = 0 To indexChars.GetUpperBound(0) 'loop through indexchars array
Dim indexOfInput As Integer = 0
Do 'get rid of links
indexOfInput = str.IndexOf(indexChars(i)) 'find instance of indexChar
If indexOfInput <> -1 Then
Dim indexNextLeftBracket As Integer = str.IndexOf("<", indexOfInput) + 1
Dim indexRightBracket As Integer = str.IndexOf(">", indexOfInput) + 1
'check to make sure a right bracket hasn't been left off a tag
If indexNextLeftBracket > indexRightBracket Then 'normal case
str = str.Remove(indexOfInput, indexRightBracket - indexOfInput)
Else
'add the right bracket right before the next left bracket, just remove everything
'in the bad tag
str = str.Insert(indexNextLeftBracket - 1, ">")
indexRightBracket = str.IndexOf(">", indexOfInput) + 1
str = str.Remove(indexOfInput, indexRightBracket - indexOfInput)
End If
End If
Loop Until indexOfInput = -1
Next
End If
Return str
End Function

Wouldn't something like this be simpler? (OK, I know it's not identical to posted code):
public string StripHTMLTags(string text)
{
return Regex.Replace(text, #"<(.|\n)*?>", string.Empty);
}
(Conversion to VB.NET should be trivial!)
Note: if you are running this often, there are two performance improvements you can make to the Regex.
One is to use a pre-compiled expression which requires re-writing slightly.
The second is to use a non-capturing form of the regular expression; .NET regular expressions implement the (?:) syntax, which allows for grouping to be done without incurring the performance penalty of captured text being remembered as a backreference. Using this syntax, the above regular expression could be changed to:
#"<(?:.|\n)*?>"

This line is also wrong:
Dim indexNextLeftBracket As Integer = str.IndexOf("<", indexOfInput) + 1
It's guaranteed to always set indexNextLeftBracket equal to indexOfInput, because at this point the character at the position referred to by indexOfInput is already always a '<'. Do this instead:
Dim indexNextLeftBracket As Integer = str.IndexOf("<", indexOfInput+1) + 1
And also add a clause to the if statement to make sure your string is long enough for that expression.
Finally, as others have said this code will be a beast to maintain, if you can get it working at all. Best to look for another solution, like a regex or even just replacing all '<' with <.

In addition to other good answers, you might read up a little on loop invariants a little bit. The pulling out and putting back stuff to the string you check to terminate your loop should set off all manner of alarm bells. :)

Just a guess, but is this like the culprit?
indexOfInput = str.IndexOf(indexChars(i)) 'find instance of indexChar
Per the Microsoft docs, Return Value -
The index position of value if that string is found, or -1 if it is not. If value is Empty, the return value is 0.
So perhaps indexOfInput is being set to 0?

What happens if your code tries to clean the string <a?
As I read it, it finds the indexChar at position 0, but then indexNextLeftBracket and indexRightBracket both equal 0, you fall into the else condition, and then you insert a ">" at position -1, which will presumably insert at the beginning, giving you the string ><a. The new indexRightBracket then becomes 0, so you delete from position 0 for 0 characters, leaving you with ><a. Then the code finds the <a in the code again, and you're off to the races with an infinite memory-consuming loop.
Even if I'm wrong, you need to get yourself some unit tests to reassure yourself that these edge cases work properly. That should also help you find the actual looping code if I'm off-base.
Generally speaking though, even if you fix this particular bug, it's never going to be very robust. Parsing HTML is hard, and HTML blacklists are always going to have holes. For instance, if I really want to get a <input type="hidden" name="tax" tag in, I'll just write it as <input name="tax" type="hidden" and your code will ignore it. Your better bet is to get an actual HTML parser involved, and to only allow the (very small) subset of tags that you actually want. Or even better, use some other form of markup, and strip all HTML tags (again using a real HTML parser of some description).

I'd have to run it through a real compiler but the mindpiler tells me that the str = str.Remove(indexOfInput, indexRightBracket - indexOfInput) line is re-generating an invalid tag such that when you loop through again it finds the same mistake "fixes" it, tries again, finds the mistake "fixes" it, etc.
FWIW heres a snippet of code that removes unwanted HTML tags from a string (It's in C# but the concept translates)
public static string RemoveTags( string html, params string[] allowList )
{
if( html == null ) return null;
Regex regex = new Regex( #"(?<Tag><(?<TagName>[a-z/]+)\S*?[^<]*?>)",
RegexOptions.Compiled |
RegexOptions.IgnoreCase |
RegexOptions.Multiline );
return regex.Replace(
html,
new MatchEvaluator(
new TagMatchEvaluator( allowList ).Replace ) );
}
MatchEvaluator class
private class TagMatchEvaluator
{
private readonly ArrayList _allowed = null;
public TagMatchEvaluator( string[] allowList )
{
_allowed = new ArrayList( allowList );
}
public string Replace( Match match )
{
if( _allowed.Contains( match.Groups[ "TagName" ].Value ) )
return match.Value;
return "";
}
}

That doesn't seem to work for a simplistic <a<a<a case, or even <a>Test</a>. Did you test this at all?
Personally, I hate string parsing like this - so I'm not going to even try figuring out where your error is. It'd require a debugger, and more headache than I'm willing to put in.

Related

How do I delete characters in a string up to a certain point in classic asp?

I have a string that at any point may or may not contain one or more / characters. I'd like to be able to create a new string based on this string. The new string would include every character after the very last / in the original string.
Sounds like you're wanting the file name from a URL. In any case, it's the same function. The key is using the InStrRev function to find the first / char, but starting from the right. Here's the function:
Function GetFilename(URL)
Dim I
I = InStrRev(URL, "/")
If I > 0 Then
GetFilename = Mid(URL, I + 1)
Else
GetFilename = URL
End If
End Function
Split it up into parts and get the last part:
a = split("my/string/thing", "/")
wscript.echo a(ubound(a))
note: Not safe when the string is empty.

Adding comma separators to numbers, asp.net

I'm trying to add comma separators to a number. I've tried the advice here: add commas using String.Format for number and and here: .NET String.Format() to add commas in thousands place for a number but I can't get it to work - they just return the number without commas. The code I'm using is here:
public static string addCommas(string cash) {
return string.Format("{0:n0}", cash).ToString();
}
Where am I going wrong?
Thanks.
Update: Hi all, thanks for your help, but all of those methods are returning the same error: "error CS1502: The best overloaded method match for 'BishopFlemingFunctions.addCommas(int)' has some invalid arguments" (or variations therof depending on what number type I'm using.) Any ideas?
Well, you are sending in a string. it looks like you want a currency back
Why are you passing in a string to the method if it is a numeric value?
String.Format will return a string so there is not need to .ToString() it again.
{0:c} = Currency format if you do not want the $ then use {0:n}
Not sure you have to but you may need to do an explicit conversion if you pass it in as a string to (decimal)cash
return String.Format("{0:c}", (decimal)cash);
or
return String.Format("{0:n}", (decimal)cash);
but i think it should be something like:
public static string addCommas(decimal cash)
{
return String.Format("{0:c}", cash);
}
but this is such a simple statement i do not see the logic in making it a method, if you method is one line, in most cases, its not a method.
In order to apply number formatting you have to pass cash as a number type (int, double, float etc)
Note the cash parameter is of type double and the .## at the end of the formatted string for cents.
EDIT
Here is the code in its entirety:
static class Program {
static void Main() {
double d = 123456789.7845;
string s = addCommas(d);
System.Console.WriteLine(s);
}
public static string addCommas(double cash) {
return string.Format("${0:#,###0.##}", cash);
}
}
This prints "$123,456,789.78" to console. If you're getting
error CS1502: The best overloaded
method match for 'addCommas(double)'
has some invalid arguments
check to make sure that you're calling the function properly and that you're actually passing in the correct data type. I encourage you to copy/paste the code I have above and run it - BY ITSELF.
i have a method on my custom class to convert any numbers
public static string ConvertToThosandSepratedNumber(object number)
{
string retValue = "";
retValue = string.Format("{0:N0}", Convert.ToDecimal(number));
return retValue;
}
Here is a fairly efficient way to Add commas for thousands place, etc.
It is written in VB.net.
It does not work for negative numbers.
Public Function AddCommas(number As Integer) As String
Dim s As String = number.ToString()
Dim sb As New StringBuilder(16)
Dim countHead As Integer = s.Length Mod 3
If countHead = 0 Then countHead = 3
sb.Append(s.Substring(0, countHead))
For I As Integer = countHead To s.Length - 1 Step 3
sb.Append(","c)
sb.Append(s.Substring(I, 3))
Next
Return sb.ToString()
End Function

How to insert complex strings into Actionscript?

How to insert complex strings into Actionscript?
So I have a string
-vvv -I rc w:// v dv s="60x40" --ut="#scode{vcode=FV1,acode=p3,ab=128,ch=2,rate=4400}:dup{dt=st{ac=http{mime=v/x-flv},mux=mpeg{v},dt=:80/sm.fv}}"
How to insert it into code like
public var SuperPuperComplexString:String = new String();
SuperPuperComplexString = TO_THAT_COMPLEX_STRING;
That string has so many problems like some cart of it can happen to be like regexp BUTI DO NOT want it to be parsed as any kind of reg exp - I need It AS IT IS!)
How to put that strange string into variable (put it not inputing it thru UI - hardcode it into AS code)?
SInce the only problem characters in your string are the double quotes ( " " ), just enclose your String in single quotes ( ' ' ). That will solve any problems.
It also depends on how you are loading that string into your code, as well.
You could even go so far as to encase that String in XML CDATA to ensure it's all delimited for when you want to use it.
var myString:XML = new XML();
myString = "<string><![CDATA[-vvv -I rc w:// v dv s="60x40" --ut="#scode{vcode=FV1,acode=p3,ab=128,ch=2,rate=4400}:dup{dt=st{ac=http{mime=v/x-flv},mux=mpeg{v},dt=:80/sm.fv}}"]]></string>"
Then, you can access it as a string from anywhere by just referencing myString.
var myString:String = '-vvv -I rc w:// v dv s="60x40" --ut="#scode{vcode=FV1,acode=p3,ab=128,ch=2,rate=4400}:dup{dt=st{ac=http{mime=v/x-flv},mux=mpeg{v},dt=:80/sm.fv}}"'
Not sure where your problem is..

Best Regular Expression for Email Format Validation with ASP.NET 3.5 Validation

I've used both of the following Regular Expressions for testing for a valid email expression with ASP.NET validation controls. I was wondering which is the better expression from a performance standpoint, or if someone has better one.
- \w+([-+.']\w+)*#\w+([-.]\w+)*\.\w+([-.]\w+)*
- ^([0-9a-zA-Z]([-\.\w]*[0-9a-zA-Z])*#([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$
I'm trying avoid the "exponentially slow expression" problem described on the BCL Team Blog.
UPDATE
Based on feedback I ended up creating a function to test if an email is valid:
Public Function IsValidEmail(ByVal emailString As String, Optional ByVal isRequired As Boolean = False) As Boolean
Dim emailSplit As String()
Dim isValid As Boolean = True
Dim localPart As String = String.Empty
Dim domainPart As String = String.Empty
Dim domainSplit As String()
Dim tld As String
If emailString.Length >= 80 Then
isValid = False
ElseIf emailString.Length > 0 And emailString.Length < 6 Then
'Email is too short
isValid = False
ElseIf emailString.Length > 0 Then
'Email is optional, only test value if provided
emailSplit = emailString.Split(CChar("#"))
If emailSplit.Count <> 2 Then
'Only 1 # should exist
isValid = False
Else
localPart = emailSplit(0)
domainPart = emailSplit(1)
End If
If isValid = False OrElse domainPart.Contains(".") = False Then
'Needs at least 1 period after #
isValid = False
Else
'Test Local-Part Length and Characters
If localPart.Length > 64 OrElse ValidateString(localPart, ValidateTests.EmailLocalPartSafeChars) = False OrElse _
localPart.StartsWith(".") OrElse localPart.EndsWith(".") OrElse localPart.Contains("..") Then
isValid = False
End If
'Validate Domain Name Portion of email address
If isValid = False OrElse _
ValidateString(domainPart, ValidateTests.HostNameChars) = False OrElse _
domainPart.StartsWith("-") OrElse domainPart.StartsWith(".") OrElse domainPart.Contains("..") Then
isValid = False
Else
domainSplit = domainPart.Split(CChar("."))
tld = domainSplit(UBound(domainSplit))
' Top Level Domains must be at least two characters
If tld.Length < 2 Then
isValid = False
End If
End If
End If
Else
'If no value is passed review if required
If isRequired = True Then
isValid = False
Else
isValid = True
End If
End If
Return isValid
End Function
Notes:
IsValidEmail is more restrictive about characters allowed then the RFC, but it doesn't test for all possible invalid uses of those characters
If you're wondering why this question is generating so little activity, it's because there are so many other issues that should be dealt with before you start thinking about performance. Foremost among those is whether you should be using regexes to validate email addresses at all--and the consensus is that you should not. It's much trickier than most people expect, and probably pointless anyway.
Another problem is that your two regexes vary hugely in the kinds of strings they can match. For example, the second one is anchored at both ends, but the first isn't; it would match ">>>>foo#bar.com<<<<" because there's something that looks like an email address embedded in it. Maybe the framework forces the regex to match the whole string, but if that's the case, why is the second one anchored?
Another difference is that the first regex uses \w throughout, while the second uses [0-9a-zA-Z] in many places. In most regex flavors, \w matches the underscore in addition to letters and digits, but in some (including .NET) it also matches letters and digits from every writing system known to Unicode.
There are many other differences, but that's academic; neither of those regexes is very good. See here for a good discussion of the topic, and a much better regex.
Getting back to the original question, I don't see a performance problem with either of those regexes. Aside from the nested-quantifiers anti-pattern cited in that BCL blog entry, you should also watch out for situations where two or more adjacent parts of the regex can match the same set of characters--for example,
([A-Za-z]+|\w+)#
There's nothing like that in either of the regexes you posted. Parts that are controlled by quantifiers are always broken up by other parts that aren't quantified. Both regexes will experience some avoidable backtracking, but there are many better reasons than performance to reject them.
EDIT: So the second regex is subject to catastrophic backtracking; I should have tested it thoroughly before shooting my mouth off. Taking a closer look at that regex, I don't see why you need the outer asterisk in the first part:
[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*
All that bit does is make sure the first and last characters are alphanumeric while allowing some additional characters in between. This version does the same thing, but it fails much more quickly when no match is possible:
[0-9a-zA-Z][-.\w]*[0-9a-zA-Z]
That would probably suffice to eliminate the backtracking problem, but you could also make the part after the "#" more efficient by using an atomic group:
(?>(?:[0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+)[a-zA-Z]{2,9}
In other words, if you've matched all you can of substrings that look like domain components with trailing dots, and the next part doesn't look like a TLD, don't bother backtracking. The first character you would have to give up is the final dot, and you know [a-zA-Z]{2,9} won't match that.
We use this RegEx which has been tested in-house against 1.5 million addresses. It correctly identifies better than 98% of ours, but there are some formats that I'm aware of that it would error on.
^([\w-]+(?:\.[\w-]+)*)#((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$
We also make sure that there are no EOL characters in the data since an EOL can fake out this RegEx. Our Function:
Public Function IsValidEmail(ByVal strEmail As String) As Boolean
' Check An eMail Address To Ensure That It Is Valid
Const cValidEmail = "^([\w-]+(?:\.[\w-]+)*)#((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$" ' 98% Of All Valid eMail Addresses
IsValidEmail = False
' Take Care Of Blanks, Nulls & EOLs
strEmail = Replace(Replace(Trim$(strEmail & " "), vbCr, ""), vbLf, "")
' Blank eMail Is Invalid
If strEmail = "" Then Exit Function
' RegEx Test The eMail Address
Dim regEx As New System.Text.RegularExpressions.Regex(cValidEmail)
IsValidEmail = regEx.IsMatch(strEmail)
End Function
I am a newbie, but I tried the following and it seemed to have limited the ".xxx" to only two occurrences or less, after the symbol '#'.
^([a-zA-Z0-9]+[a-zA-Z0-9._%-]*#(?:[a-zA-Z0-9-])+(\.+[a-zA-Z]{2,4}){1,2})$
Note: I had to substitute single '\' with double '\\' as I am using this reg expr in R.
These don't check for all allowable email addresses according to the email address RFC.
I let MS to do the work for me:
Public Function IsValidEmail(ByVal emailString As String) As Boolean
Dim retval As Boolean = True
Try
Dim address As New System.Net.Mail.MailAddress(emailString)
Catch ex As Exception
retval = False
End Try
Return retval
End Function
For server side validation, I found Phil Haack's solution to be one of the better ones. His attempt was to stick to the RFC:
string pattern = #"^(?!\.)(""([^""\r\\]|\\[""\r\\])*""|"
+ #"([-a-z0-9!#$%&'*+/=?^_`{|}~]|(?<!\.)\.)*)(?<!\.)"
+ #"#[a-z0-9][\w\.-]*[a-z0-9]\.[a-z][a-z\.]*[a-z]$";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
return regex.IsMatch(emailAddress);
Details:
http://blog.degree.no/2013/01/email-validation-finally-a-net-regular-expression-that-works/
Just to contribute, I am using this regex.
^([a-zA-Z0-9]+[a-zA-Z0-9._%-]*#(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,4})$
The thing about it is the specifications are changing with each domain extension that is introduced.
You sit here mod your regex, test, test, test, and more testing. You finally get what you "think" is accurate then the specification changes... You update your regex to account for what the new requirements are..
Then someone enters aa#aa.aa and you've done all that work for what? It walks through your fancy regex.. bummer!
You may as well just check for a single #, and a "." and move on. I assure you, you will not get someones email if they do not want to give it up. You'll get garbage or their hotmail account they never check and couldn't care less about.
I've seen in many cases this goes horribly wrong and a client calls up because their own email address is rejected because of a poorly crafted regex check. Which as mentioned shouldn't have even been attempted.
TextBox :-
<asp:TextBox ID="txtemail" runat="server" CssClass="form-control pantxt" Placeholder="Enter Email Address"></asp:TextBox>
Required Filed validator:
<asp:RequiredFieldValidator ID="RequiredFieldValidator9" runat="server" ControlToValidate="txtemail" ErrorMessage="Required"></asp:RequiredFieldValidator>
Regular Expression for email validation :
<asp:RegularExpressionValidator ID="validateemail" runat="server" ControlToValidate="txtemail" ValidationExpression="\w+([-+.']\w+)*#\w+([-.]\w+)*\.\w+([-.]\w+)*" ErrorMessage="Invalid Email"></asp:RegularExpressionValidator>
Use this regular expression for email validation in asp.net

How do you call a method from a variable in ASP Classic?

For example, how can I run me.test below?
myvar = 'test'
me.myvar
ASP looks for the method "myvar" and doesn't find it. In PHP I could simply say $me->$myvar but ASP's syntax doesn't distinguish between variables and methods. Suggestions?
Closely related to this, is there a method_exists function in ASP Classic?
Thanks in advance!
EDIT: I'm writing a validation class and would like to call a list of methods via a pipe delimited string.
So for example, to validate a name field, I'd call:
validate("required|min_length(3)|max_length(100)|alphanumeric")
I like the idea of having a single line that shows all the ways a given field is being validated. And each pipe delimited section of the string is the name of a method.
If you have suggestions for a better setup, I'm all ears!
You can achieve this in VBScript by using the GetRef function:-
Function Test(val)
Test = val & " has been tested"
End Function
Dim myvar : myvar = "Test"
Dim x : Set x = GetRef(myvar)
Response.Write x("Thing")
Will send "Thing has been tested" to the client.
So here is your validate requirement using GetRef:-
validate("Hello World", "min_length(3)|max_length(10)|alphanumeric")
Function required(val)
required = val <> Empty
End Function
Function min_length(val, params)
min_length = Len(val) >= CInt(params(0))
End Function
Function max_length(val, params)
max_length = Len(val) <= CInt(params(0))
End Function
Function alphanumeric(val)
Dim rgx : Set rgx = New RegExp
rgx.Pattern = "^[A-Za-z0-9]+$"
alphanumeric = rgx.Test(val)
End Function
Function validate(val, criterion)
Dim arrCriterion : arrCriterion = Split(criterion, "|")
Dim criteria
validate = True
For Each criteria in arrCriterion
Dim paramListPos : paramListPos = InStr(criteria, "(")
If paramListPos = 0 Then
validate = GetRef(criteria)(val)
Else
Dim paramList
paramList = Split(Mid(criteria, paramListPos + 1, Len(criteria) - paramListPos - 1), ",")
criteria = Left(criteria, paramListPos - 1)
validate = GetRef(criteria)(val, paramList)
End If
If Not validate Then Exit For
Next
End Function
Having provided this I have to say though that if you are familiar with PHP then JScript would be a better choice on the server. In Javascript you can call a method like this:-
function test(val) { return val + " has been tested"; )
var myvar = "test"
Response.Write(this[myvar]("Thing"))
If you are talking about VBScript, it doesn't have that kind of functionality. (at least not to my knowledge) I might attempt it like this :
Select myvar
case "test":
test
case "anotherSub":
anotherSub
else
defaultSub
end select
It's been a while since I wrote VBScript (thank god), so I'm not sure how good my syntax is.
EDIT-Another strategy
Personally, I would do the above, for security reasons. But if you absolutely do not like it, then you may want to try using different languages on your page. I have in the past used both Javascript AND VBScript on my Classic ASP pages (both server side), and was able to call functions declared in the other language from my current language. This came in especially handy when I wanted to do something with Regular Expressions, but was in VBScript.
You can try something like
<script language="vbscript" runat="server">
MyJavascriptEval myvar
</script>
<script language="javascript" runat="server">
function MyJavascriptEval( myExpression)
{
eval(myExpression);
}
/* OR
function MyJavascriptEval( myExpression)
{
var f = new Function(myExpression);
f();
}
*/
</script>
I didn't test this in a classic ASP page, but I think it's close enough that it will work with minor tweaks.
Use the "Execute" statement in ASP/VBScript.
Execute "Response.Write ""hello world"""
PHP's ability to dynamically call or create functions are hacks that lead to poor programming practices. You need to explain what you're trying to accomplish (not how) and learn the correct way to code.
Just because you can do something, doesn't make it right or a good idea.
ASP does not support late binding in this manner. What are you trying to do, in a larger sense? Explain that, and someone can show you how to accomplish it in asp.
Additionally, you might consider "objectifying" the validation functionality. Making classes is possible (though not widely used) in VB Script.
<%
Class User
' declare private class variable
Private m_userName
' declare the property
Public Property Get UserName
UserName = m_userName
End Property
Public Property Let UserName (strUserName)
m_userName = strUserName
End Property
' declare and define the method
Sub DisplayUserName
Response.Write UserName
End Sub
End Class
%>

Resources