Why do URL parameters use %-encoding instead of a simple escape character?

For example, in Unix, a backslash (\) is a common escape character. So to escape a full stop (.) in a regular expression, one does this:
\.
But with % encoding URL parameters, we have an escape character, %, and a control code, so an ampersand (&) doesn't become:
%&
Instead, it becomes:
%26
Any reason why? Seems to just make things more complicated, on the face of it, when we could just have one escape character and a mechanism to escape itself where necessary:
%%
Then it'd be:
simpler to remember; we just need to know which characters to escape, not which to escape and what to escape them to
encoding-agnostic, as we wouldn't be sending an ASCII or Unicode representation explicitly, we'd just be sending them in the encoding the rest of the URL is going in
easy to write an encoder: s/[!\*'();:#&=+$,/?#\[\] "%-\.<>\\^_`{|}~]/%&/g (untested!)
better because we could switch to using \ as an escape character, and life would be simpler and it'd be summer all year long
I might be getting carried away now. Someone shoot me down? :)
EDIT: replaced two uses of "delimiter" with "escape character".

Percent encoding happens not only to escape delimiters, but also so that you can transport bytes that are not allowed inside URIs (such as control characters or non-ASCII characters).
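As a rough illustration of that point, here is a minimal Python sketch using the standard library's urllib.parse.quote. The helper name percent_encode is made up for this example. It shows that the same %XX scheme covers a delimiter, the escape character itself, a control character, and a non-ASCII character (which is first turned into UTF-8 bytes), something a single-character escape like %& could not express.

```python
from urllib.parse import quote, unquote


def percent_encode(text: str) -> str:
    """Hypothetical helper: escape everything outside the unreserved set.

    safe="" tells quote() not to leave "/" (or anything else reserved) alone.
    Non-ASCII text is encoded as UTF-8 bytes before escaping (quote's default).
    """
    return quote(text, safe="")


if __name__ == "__main__":
    print(percent_encode("&"))    # %26  (a delimiter)
    print(percent_encode("%"))    # %25  (the escape character itself)
    print(percent_encode("\n"))   # %0A  (a control character)
    print(percent_encode("é"))    # %C3%A9  (two UTF-8 bytes)
    print(unquote("%C3%A9"))      # é
```

The result only ever contains plain ASCII characters that are allowed in a URI, whatever the input was.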

I guess it's because the URL specification, and specifically the HTTP part of it, only allows certain characters, so to escape the others one must replace them with characters that are allowed.
Also, some allowed characters, like & and ?, have special meanings, so replacing them with a control code seems the only way to solve it.
If you find it hard to recognize them, bookmark this page:
http://www.w3schools.com/tags/ref_urlencode.asp

Related

Escaping backslash (\) in string or paths in R

Windows copies path with backslash \, which R does not accept. So, I wanted to write a function which would convert \ to /. For example:
chartr0 <- function(foo) chartr('\','\\/',foo)
Then use chartr0 as...
source(chartr0('E:\RStuff\test.r'))
But chartr0 is not working. I guess, I am unable to escape /. I guess escaping / may be important in many other occasions.
Also, is it possible to avoid calling chartr0 every time, and instead convert all paths automatically, either by creating an environment in R which calls chartr0 or through some kind of temporary setting such as options?
From R 4.0.0 you can use r"(...)" to write a path as raw string constant, which avoids the need for escaping:
r"(E:\RStuff\test.r)"
# [1] "E:\\RStuff\\test.r"
There is a new syntax for specifying raw character constants similar to the one used in C++: r"(...)" with ... any character sequence not containing the sequence )". This makes it easier to write strings that contain backslashes or both single and double quotes. For more details see ?Quotes.
Your fundamental problem is that R will signal an error condition as soon as it sees a single back-slash before any character other than a few lower-case letters, backslashes themselves, quotes or some conventions for entering octal, hex or Unicode sequences. That is because the interpreter sees the back-slash as a message to "escape" the usual translation of characters and do something else. If you want a single back-slash in your character element you need to type 2 backslashes. That will create one backslash:
nchar("\\")
#[1] 1
The "Character vectors" section of An Introduction to R says:
"Character strings are entered using either matching double (") or single (') quotes, but are printed using double quotes (or sometimes without quotes). They use C-style escape sequences, using \ as the escape character, so \\ is entered and printed as \\, and inside double quotes " is entered as \". Other useful escape sequences are \n, newline, \t, tab and \b, backspace—see ?Quotes for a full list."
?Quotes
chartr0 <- function(foo) chartr('\\','/',foo)
chartr0('E:\\RStuff\\test.r')
You cannot write E:\Rxxxx, because R reads \R as an escape sequence.
The problem is that every single forward slash and backslash in your code is escaped incorrectly, resulting in either an invalid string or the wrong string being used. You need to read up on which characters need to be escaped and how. Take a look at the list of escape sequences in the link below. Anything not listed there (such as the forward slash) is treated literally and does not require any escaping.
http://cran.r-project.org/doc/manuals/R-lang.html#Literal-constants

RegEx for Client-Side Validation of FileUpload

I'm trying to create a RegEx Validator that checks the file extension in the FileUpload input against a list of allowed extensions (which are user specified). The following is as far as I have got, but I'm struggling with the syntax of the backward slash (\) that appears in the file path. Obviously the below is incorrect because it just escapes the (]) which causes an error. I would be really grateful for any help here. There seems to be a lot of examples out there, but none seem to work when I try them.
[a-zA-Z_-s0-9:\]+(.pdf|.PDF)$
To include a backslash in a character class, you need to escape it with another backslash (\\):
[a-zA-Z_\s0-9:\\]+(\.pdf|\.PDF)$
Note that \b would not work here: inside a character class it matches a backspace character, while outside of character classes it represents a word boundary. I also assumed that -s was a typo and should have represented white space (otherwise it shouldn't compile, I think).
EDIT: You also need to escape the dots. Otherwise they are metacharacters that match any character except line breaks.
another EDIT: If you actually DO want to allow hyphens in filenames, you need to put the hyphen at the end of the character class. Like this:
[a-zA-Z_\s0-9:\\-]+(\.pdf|\.PDF)$
You probably want to use something like
[a-zA-Z_0-9\s:\\-]+\.[pP][dD][fF]$
which is same as
[\w\s:\\-]+\.[pP][dD][fF]$
because \w = [a-zA-Z0-9_]
Be sure to put the - character as the very first or very last item in the [...] list; otherwise it has a special meaning as a range of characters, such as a-z.
Also, the \ character has to be escaped by another backslash, even inside of [...].
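To try the last pattern out quickly, here is a small sketch in Python's re module. That is an assumption for illustration only: the validator in the question runs the pattern in a .NET/JavaScript regex engine, where \w is usually ASCII-only, whereas Python's \w also matches non-ASCII letters. The name is_pdf_path is hypothetical.

```python
import re

# Same pattern as above: word characters, whitespace, colon, backslash and
# hyphen, followed by a case-insensitive ".pdf" extension.
PDF_PATH = re.compile(r"[\w\s:\\-]+\.[pP][dD][fF]$")


def is_pdf_path(path: str) -> bool:
    """Return True if the whole path matches the pattern (hypothetical helper)."""
    return PDF_PATH.fullmatch(path) is not None


if __name__ == "__main__":
    print(is_pdf_path(r"C:\docs\my report.PDF"))  # True
    print(is_pdf_path("file-name.pdf"))           # True
    print(is_pdf_path(r"C:\docs\notes.txt"))      # False
    print(is_pdf_path("archive.tar.pdf"))         # False: "." is not in the class
```

Note the last case: because the character class does not contain a dot, file names with more than one dot are rejected.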

HttpServerUtility.UrlPathEncode vs HttpServerUtility.UrlEncode

What's the difference between HttpServerUtility.UrlPathEncode and HttpServerUtility.UrlEncode? And when should I choose one over the other?
UrlEncode is useful for query string names and values (so to the left or, especially, to the right of each =).
In this url, foo, fooval, bar, and barval should EACH be UrlEncode'd separately:
http://www.example.com/whatever?foo=fooval&bar=barval
UrlEncode encodes everything, such as ?, &, =, and /, accented or other non-ASCII characters, etc, into %-style encoding, except space which it encodes as a +. This is form-style encoding, and is best for something you intend to put in the querystring (or maybe between two slashes in a url) as a parameter without it getting all jiggy with the url's control characters (like &). Otherwise an unfortunately placed & or = in a user's form input or db value could break things.
EDIT: Uri.EscapeDataString is a very close match to UrlEncode, and may be preferable, though I don't know the exact differences.
UrlPathEncode is useful for the rest of the URL; it affects everything to the left of the ?.
In this url, the entire url (from http to barval) should be run through UrlPathEncode.
http://www.example.com/whatever?foo=fooval&bar=barval
UrlPathEncode does NOT encode ?, &, =, or /. It DOES, however, like UrlEncode, encode accented/non-ASCII characters with % notation, and space also becomes %20. This is useful to make sure the url is valid, since spaces and accented characters are not. It won't touch your querystring (everything to the right of ?), so you have to encode that with UrlEncode, above.
Update: as of 4.5, per MSDN reference, Microsoft recommends to only use UrlEncode. Also, the information previously listed in MSDN does not fully describe behavior of the two methods - see comments.
The difference is all in the space escaping - UrlEncode escapes them into + sign, UrlPathEncode escapes into %20. + and %20 are only equivalent if they are part of QueryString portion per W3C. So you can't escape whole URL using + sign, only querystring portion. Bottom line is that UrlPathEncode is always better imho
You can encode a URL using the UrlEncode() method or the UrlPathEncode() method. However, the methods return different results. The UrlEncode() method converts each space character to a plus character (+). The UrlPathEncode() method converts each space character into the string "%20", which represents a space in hexadecimal notation. Use the UrlPathEncode() method when you encode the path portion of a URL in order to guarantee a consistent decoded URL, regardless of which platform or browser performs the decoding.
http://msdn.microsoft.com/en-us/library/4fkewx0t.aspx
To explain it as simply as possible:
HttpUtility.UrlPathEncode("http://www.foo.com/a b/?eggs=ham&bacon=1")
becomes
http://www.foo.com/a%20b/?eggs=ham&bacon=1
and
HttpUtility.UrlEncode("http://www.foo.com/a b/?eggs=ham&bacon=1")
becomes
http%3a%2f%2fwww.foo.com%2fa+b%2f%3feggs%3dham%26bacon%3d1
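The same split between path-style and form-style encoding exists outside .NET. As a loose analogy only (these are Python standard-library functions, not the HttpUtility methods, and details such as hex-digit case differ), quote behaves roughly like UrlPathEncode applied to a path, and quote_plus / urlencode behave roughly like UrlEncode applied to query values. The variable names are made up for the example.

```python
from urllib.parse import quote, quote_plus, urlencode

# Path-style: "/" is kept by default and a space becomes %20.
path = quote("/a b/")

# Form-style: each name and value is encoded separately, space becomes "+",
# and an "&" inside a value can no longer be mistaken for a separator.
query = urlencode({"eggs": "ham & cheese", "bacon": "1"})

url = "http://www.foo.com" + path + "?" + query

# Form-encoding a whole URL escapes its delimiters too, as in the last
# example above (Python emits upper-case hex digits).
whole = quote_plus("http://www.foo.com/a b/?eggs=ham&bacon=1")

if __name__ == "__main__":
    print(path)   # /a%20b/
    print(query)  # eggs=ham+%26+cheese&bacon=1
    print(url)    # http://www.foo.com/a%20b/?eggs=ham+%26+cheese&bacon=1
    print(whole)  # http%3A%2F%2Fwww.foo.com%2Fa+b%2F%3Feggs%3Dham%26bacon%3D1
```

Building the URL from separately encoded pieces, rather than encoding the finished string, is what keeps the delimiters meaningful.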

URL encoding yes/or no?

I have a restful webservice which receives some structured data which is put straight into a database.
The data is sent from an OS using wget. I am just wondering whether I actually need to URL encode the data and, if so, why? Please note that it is no problem to do it, but it might be unnecessary in this scenario.
If your data has characters that aren't allowed in urls, you should url encode it.
The following characters are either reserved (like &) or just present the possibility of confusing code. If your data contains these characters, urlencode it. Remember if you are using any extended ascii characters, unicode characters or non-printable characters you should url-encode your data.
Dollar ("$")
Ampersand ("&")
Plus ("+")
Comma (",")
Forward slash/Virgule ("/")
Colon (":")
Semi-colon (";")
Equals ("=")
Question mark ("?")
'At' symbol ("@")
Space
Quotation marks
'Less Than' symbol ("<")
'Greater Than' symbol (">")
'Pound' character ("#")
Percent character ("%")
Left Curly Brace ("{")
Right Curly Brace ("}")
Vertical Bar/Pipe ("|")
Backslash ("\")
Caret ("^")
Tilde ("~")
Left Square Bracket ("[")
Right Square Bracket ("]")
Grave Accent ("`")
More info can be found here: http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
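To see why this matters in practice, here is a small Python sketch (standard library only; the field name note and the sample value are invented for the example). It sends the same value once unencoded and once encoded, then parses both the way a typical server-side query-string parser would.

```python
from urllib.parse import parse_qs, urlencode

value = "a=b&c=d"  # hypothetical data containing reserved characters

# Unencoded: the "&" inside the value is read as a field separator.
raw_query = "note=" + value
raw_parsed = parse_qs(raw_query)

# Encoded: "=" and "&" inside the value are escaped, so it survives intact.
safe_query = urlencode({"note": value})
safe_parsed = parse_qs(safe_query)

if __name__ == "__main__":
    print(raw_parsed)   # {'note': ['a=b'], 'c': ['d']}
    print(safe_query)   # note=a%3Db%26c%3Dd
    print(safe_parsed)  # {'note': ['a=b&c=d']}
```

In the unencoded case the receiver silently sees two fields instead of one, which is exactly the kind of corruption you do not want going straight into a database.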

Regular Expression to limit string length

I have an issue where I need to use a RegularExpressionValidator to limit the length of a string to 400 Characters.
My expression was .{0,400}
My question: Is there a way to limit the length of characters to 400 without taking into consideration blank spaces?
I want to be able to accept blank spaces in the string but not count it in the length. Is this possible?
I pretty much agree with Greg, but here's the regex you want:
^\s*([^\s]\s*){0,400}$
@Boopid: If you really meant only the space character, replace \s with a space in the regex.
It sounds like you might want to write your own validator class instead of using the RegularExpressionValidator. Regular expressions certainly have their uses, but this doesn't sound like one of them.
Your custom validator could remove all the spaces, then check the length of the string. Ultimately, the code will be more readable than a regular expression that does the same thing.
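The strip-then-count idea can be sketched as follows. This is Python rather than an ASP.NET validator class, so treat it as an outline of the logic only; the names within_limit and LIMIT_PATTERN are hypothetical. The regex from the other answer is included (with \S in place of [^\s], which is equivalent) so the two approaches can be compared.

```python
import re

MAX_NON_SPACE = 400

# Regex approach from the answer above, for comparison.
LIMIT_PATTERN = re.compile(r"\s*(\S\s*){0,400}")


def within_limit(text: str, limit: int = MAX_NON_SPACE) -> bool:
    """Return True if text has at most `limit` non-whitespace characters."""
    stripped = re.sub(r"\s", "", text)  # drop every whitespace character
    return len(stripped) <= limit


if __name__ == "__main__":
    samples = ["a" * 400, "a" * 401, "a b " * 200, " " * 1000 + "a" * 400]
    for s in samples:
        by_regex = LIMIT_PATTERN.fullmatch(s) is not None
        print(within_limit(s), by_regex)  # the two columns should agree
```

The function version also makes the limit a parameter, which is harder to do with a pattern that has the number baked in.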
