I have a div that will receive a CSS background image from a user-chosen URL, like so:
background-image: url("/* user specified URL here*/")
How should I escape the URL so that it's safe to embed in the CSS? Is escaping the quotes enough?
If you are setting the background URL through JS, the correct and safe way is to use encodeURI() and wrap the result in double quotes:
node.style.backgroundImage = 'url("' + encodeURI(url) + '")';
Is escaping the quotes enough?
No, you should also worry about backslashes and newlines.
Here is the CSS 2.1 grammar for a double-quoted string, which is what url("...") contains:
http://www.w3.org/TR/CSS21/grammar.html#scanner
"([^\n\r\f\\"]|\\{nl}|{escape})*"
where {nl} is
\n|\r\n|\r|\f
and {escape} is a backslash-escaped character. So a trailing backslash will break your CSS, and so will an unescaped newline.
I would strongly recommend removing all whitespace and then escaping " and \ with a backslash.
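Here is a minimal JavaScript sketch of that recommendation. It assumes the value will be placed between the double quotes of url("..."), and the name escapeCssUrlString is hypothetical. The function first strips all whitespace, which removes every form of {nl}. It then backslash-escapes \ and " in a single pass, so the backslash added in front of a quote is never escaped a second time.

```javascript
// Hypothetical helper implementing the advice above: remove all
// whitespace, then backslash-escape backslashes and double quotes.
// The result is meant to go between the quotes of url("...").
function escapeCssUrlString(value) {
  return value
    // \s covers space, tab, \n, \r and \f, so no raw line break survives.
    .replace(/\s+/g, "")
    // One pass over both characters: \ becomes \\ and " becomes \".
    .replace(/[\\"]/g, "\\$&");
}

// Example use (node and userUrl are placeholders):
// node.style.backgroundImage = 'url("' + escapeCssUrlString(userUrl) + '")';
```

Dropping whitespace instead of escaping it is a simplification. A valid URL should not contain raw whitespace in the first place.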
Since the user data that you need to insert into CSS can be treated like a URL, and not just a string, you only need to ensure that it is properly URL-encoded.
This is safe because a well-formed URL does not contain any characters that are unsafe in CSS strings, except for the apostrophe ('). The apostrophe is not a problem as long as you use double quotes for your CSS string: url("...").
A simple way to do this is to URL-encode all characters that are not "reserved" or "unreserved" in URLs. According to RFC 3986, that would be all characters except for these:
A-Z a-z 0-9 ; , / ? : @ & = + $ - _ . ! ~ * ' ( ) # [ ]
That is what encodeURI() does in Mārtiņš Briedis's JavaScript answer. (With one exception: encodeURI() encodes [ and ], which is mostly inconsequential.)
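To see why this is enough, here is a small JavaScript check with made-up sample inputs. It runs encodeURI() over strings that contain each character the CSS string grammar forbids, then confirms that none of those characters appear in the output. The apostrophe passes through unchanged, which is why the CSS string has to use double quotes.

```javascript
// Characters that cannot appear unescaped inside a CSS double-quoted
// string: the quote itself, backslash, and the line-break characters.
const CSS_STRING_BREAKERS = /["\\\n\r\f]/;

// Hypothetical sample inputs, one per problem character.
const samples = ['a"b', "c\\d", "e\nf", "g\rh\fi", "[j] it's"];

const results = samples.map((input) => {
  const encoded = encodeURI(input);
  return { input, encoded, safe: !CSS_STRING_BREAKERS.test(encoded) };
});

for (const r of results) {
  console.log(JSON.stringify(r.input), "->", r.encoded, r.safe ? "safe" : "UNSAFE");
}
```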
In addition to that, you might consider only allowing URLs that begin with https: or data:. By doing this you can prevent mixed content warnings if the page is served over HTTPS, and also avoid the javascript: issue Alexander O'Mara commented on.
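The two ideas can be combined in a short JavaScript sketch. cssBackgroundUrl is a hypothetical name, and the https:/data: allowlist is the policy suggested above, not a complete URL validator. Two caveats apply. encodeURI() also encodes %, so input that is already percent-encoded gets double-encoded (%20 becomes %2520). It also throws a URIError on malformed UTF-16, such as lone surrogates.

```javascript
// Hypothetical helper: allow only https: and data: URLs, percent-encode
// with encodeURI(), and wrap the result in a double-quoted CSS url().
function cssBackgroundUrl(userUrl) {
  const candidate = userUrl.trim();
  // Anchored, case-insensitive scheme check: rejects javascript:, http:,
  // relative URLs, and anything with extra characters before the scheme.
  if (!/^(?:https|data):/i.test(candidate)) {
    throw new Error("Only https: and data: URLs are allowed");
  }
  // encodeURI() may throw URIError on lone surrogates; let it propagate.
  return 'url("' + encodeURI(candidate) + '")';
}

// Example use (node and userUrl are placeholders):
// node.style.backgroundImage = cssBackgroundUrl(userUrl);
```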
There might be other URL parsing and validation that you want to do, but that is outside the scope of this question.
If you need to insert user data into a CSS string that cannot be treated like a URL, then you would need to do CSS backslash escaping. See user123444555621's answer for more on that.
const style = "background-image: url(\"" + CSS.escape(imageUrl) + "\")";
See https://developer.mozilla.org/en-US/docs/Web/API/CSS/escape
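It is an experimental new thing, but it seems to be quite well supported (as of 2021). CSS.escape() is specified for escaping identifiers, so it escapes more characters than a string strictly needs (for example / and :). The resulting backslash escapes are still valid inside a quoted CSS string.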
In ReStructuredText, is it possible to have emphasis and no emphasis in the same word? For example:
*emph*not-emph
so that "emph" is emphasized and "not-emph" is not, with no whitespace in between? I can't find a way to do it, not even with a substitution.
What you are looking for is Character-Level Inline Markup. The reStructuredText specification describes it like this:
It is possible to mark up individual characters within a word with backslash escapes [...] Backslash escapes can be used to allow arbitrary text to immediately follow inline markup.
The two examples provided in the specification are:
For a single character immediately following inline markup:
Python ``list``\s use square bracket syntax.
For arbitrary text immediately following inline markup:
Possible in *re*\ ``Structured``\ *Text*, though not encouraged.
So to achieve the output you want, you need to use the backslash-escaped whitespace pattern:
*emph*\ not-emph
This is required because the inline markup recognition rules state that:
Inline markup end-strings must end a text block or be immediately followed by
whitespace,
one of the ASCII characters - . , : ; ! ? \ / ' " ) ] } > or
a non-ASCII punctuation character with Unicode category Pd (Dash), Po (Other), Pe (Close), Pf (Final quote), or Pi (Initial quote).
Note that the reStructuredText specification discourages this pattern:
The use of backslash-escapes for character-level inline markup is not encouraged. Such use is ugly and detrimental to the unprocessed document's readability. Please use this feature sparingly and only where absolutely necessary.
For example, in Unix, a backslash (\) is a common escape character. So to escape a full stop (.) in a regular expression, one does this:
\.
But with percent-encoding of URL parameters, we have an escape character (%) followed by a code, so an ampersand (&) doesn't become:
%&
Instead, it becomes:
%26
Is there any reason why? On the face of it, this seems to make things more complicated, when we could just have one escape character and a way to escape the escape character itself where necessary:
%%
Then it'd be:
simpler to remember; we just need to know which characters to escape, not which to escape and what to escape them to
encoding-agnostic, as we wouldn't be sending an ASCII or Unicode representation explicitly, we'd just be sending them in the encoding the rest of the URL is going in
easy to write an encoder: s/[!\*'();:#&=+$,/?#\[\] "%-\.<>\\^_`{|}~]/%&/g (untested!)
better because we could switch to using \ as an escape character, and life would be simpler and it'd be summer all year long
I might be getting carried away now. Someone shoot me down? :)
EDIT: replaced two uses of "delimiter" with "escape character".
Percent encoding happens not only to escape delimiters, but also so that you can transport bytes that are not allowed inside URIs (such as control characters or non-ASCII characters).
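Here is a small JavaScript sketch of that point: percent-encoding operates on bytes, not characters. The hypothetical percentEncode below UTF-8-encodes the string, then writes every byte outside RFC 3986's unreserved set as % followed by two hex digits. A byte such as 0xC3, the first byte of é, has no printable ASCII character that could follow a bare escape character. That is one reason the scheme spells out the byte value instead.

```javascript
// Hypothetical encoder: UTF-8 encode, then emit unreserved ASCII bytes
// as-is and every other byte as %XX with uppercase hex digits.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

function percentEncode(str) {
  let out = "";
  for (const byte of new TextEncoder().encode(str)) {
    // Bytes >= 0x80 map to non-ASCII code units, which never match UNRESERVED.
    const ch = String.fromCharCode(byte);
    out += UNRESERVED.test(ch)
      ? ch
      : "%" + byte.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}
```

For inputs like "a&b é" (which becomes a%26b%20%C3%A9), the output matches encodeURIComponent(). The built-in additionally leaves ! ' ( ) * unencoded.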
I guess it's because the URL specification, and specifically the HTTP part of it, only allows certain characters, so any other character has to be replaced with characters that are allowed.
Also, some allowed characters, like & and ?, have special meanings,
so replacing them with a code seems to be the only way to solve it.
If you find the codes hard to recognize, bookmark this page:
http://www.w3schools.com/tags/ref_urlencode.asp
I'm trying to create a RegEx validator that checks the file extension in the FileUpload input against a list of allowed extensions (which are user specified). The following is as far as I have got, but I'm struggling with the syntax for the backslash (\) that appears in the file path. Obviously the regex below is incorrect, because the backslash just escapes the ] and that causes an error. I would be really grateful for any help here. There seem to be a lot of examples out there, but none seem to work when I try them.
[a-zA-Z_-s0-9:\]+(.pdf|.PDF)$
To include a backslash in a character class, escape it with another backslash (\\):
[a-zA-Z_\s0-9:\\]+(\.pdf|\.PDF)$
Note that \b does not work here: inside a character class it matches a backspace character, and outside one it represents a word boundary. I also assumed that -s was a typo for \s (whitespace). As written, _-s is a range from _ to s, which is probably not what you meant.
EDIT: You also need to escape the dots. Otherwise each one is a metacharacter that matches any character except a line break.
Another EDIT: If you actually DO want to allow hyphens in filenames, put the hyphen at the end of the character class, like this:
[a-zA-Z_\s0-9:\\-]+(\.pdf|\.PDF)$
You probably want to use something like
[a-zA-Z_0-9\s:\\-]+\.[pP][dD][fF]$
which is essentially the same as
[\w\s:\\-]+\.[pP][dD][fF]$
because \w covers [a-zA-Z0-9_]. (In .NET it also matches Unicode letters and digits outside ASCII, so the two are identical only for ASCII input.)
Be sure to put the - character as the very first or very last item in the [...] list. Otherwise it denotes a range of characters, such as a-z.
Also, the \ character has to be escaped by another backslash, even inside [...].
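The question mentions a user-specified list of extensions, so here is a JavaScript sketch that builds the pattern from such a list. extensionPattern is a hypothetical helper based on [\w\s:\\-]+\.ext$ from the answer above. It escapes regex metacharacters in each extension and uses the i flag instead of [pP][dD][fF]. Note the doubled backslashes: the pattern is built from a JS string, so \\\\ in the source becomes \\ in the regex, which matches one literal backslash.

```javascript
// Hypothetical helper: build a file-path check from a user-supplied list
// of allowed extensions, based on [\w\s:\\-]+\.ext$ from the answer above.
function extensionPattern(extensions) {
  const alternatives = extensions.map((ext) =>
    ext
      .replace(/^\./, "") // accept both "pdf" and ".pdf"
      .replace(/[.*+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
  );
  // In this string literal "\\\\" becomes the regex \\ (one literal
  // backslash), and the hyphen sits last in the character class.
  return new RegExp(
    "[\\w\\s:\\\\-]+\\.(?:" + alternatives.join("|") + ")$",
    "i" // case-insensitive, so .PDF and .pdf both match
  );
}

// Example (hypothetical path):
// extensionPattern(["pdf", "docx"]).test("C:\\Users\\me\\Report-1.PDF") // true
```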
Possible Duplicate:
Which type of quotes we should use in css background url (“…”)? Single, double or no quote needed?
Simple question. What quotation marks should I use in CSS?
Option #1:
background: url( 'foo.png' );
Option #2:
background: url( "foo.png" );
Both work in "normal browsers". I just want to follow the standards.
The standards say:
The format of a URI value is 'url(' followed by optional white space followed by an optional single quote (') or double quote (") character followed by the URI itself, followed by an optional single quote (') or double quote (") character followed by optional white space followed by ')'. The two quote characters must be the same.
i.e. none, single or double. If you care about IE5/Mac (which you probably don't these days), avoid '; otherwise, use whatever makes you comfortable.
The standard is that you can use either for strings, and you can use either or neither for URLs. For the most part there's no real difference, but if the value you're trying to quote has double-quotes in it then it's easier to use single-quotes, and vice-versa.
Besides that, you're welcome to prefer one or the other consistently if you think it makes your files nicer and easier to edit, but nobody will shoot you for using the "wrong" one.
I use double quotes; that's how I've seen it in books and tutorials. Just be consistent in your own work.
What's the difference between HttpServerUtility.UrlPathEncode and HttpServerUtility.UrlEncode? And when should I choose one over the other?
UrlEncode is useful for query string values (the parts to the left or, especially, to the right of each =).
In this url, foo, fooval, bar, and barval should EACH be UrlEncode'd separately:
http://www.example.com/whatever?foo=fooval&bar=barval
UrlEncode encodes everything (such as ?, &, =, and /, plus accented or other non-ASCII characters) into %-style encoding, except space, which it encodes as +. This is form-style encoding. It is best for something you intend to put in the querystring (or maybe between two slashes in a URL) as a parameter without it getting all jiggy with the URL's control characters (like &). Otherwise, an unfortunately placed & or = in a user's form input or db value could break things.
EDIT: Uri.EscapeDataString is a very close match to UrlEncode, and may be preferable, though I don't know the exact differences.
UrlPathEncode is useful for the rest of the URL: it affects everything to the left of the ?.
In this url, the entire url (from http to barval) should be run through UrlPathEncode.
http://www.example.com/whatever?foo=fooval&bar=barval
UrlPathEncode does NOT encode ?, &, =, or /. It DOES, however, like UrlEncode, encode accented/non-ASCII characters with % notation, and space also becomes %20. This is useful to make sure the url is valid, since spaces and accented characters are not. It won't touch your querystring (everything to the right of ?), so you have to encode that with UrlEncode, above.
Update: as of .NET 4.5, per the MSDN reference, Microsoft recommends using only UrlEncode. Also, the information previously listed in MSDN does not fully describe the behavior of the two methods.
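The same split can be sketched in JavaScript, as an analogy rather than the .NET methods themselves. encodeURIComponent() roughly plays the role of UrlEncode for each query-string key and value, though it encodes spaces as %20, not +. encodeURI() roughly plays the role of UrlPathEncode for the part before the ?. buildUrl is a hypothetical helper.

```javascript
// Hypothetical helper: encode the base URL with encodeURI() (leaves
// / ? & = alone) and each query key and value with encodeURIComponent()
// (encodes them), mirroring the per-part advice above.
function buildUrl(base, params) {
  const query = Object.entries(params)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
  return encodeURI(base) + (query ? "?" + query : "");
}
```

Object.entries() returns string keys in insertion order, so the parameters appear in the order they were written.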
The key difference is in the space escaping: UrlEncode escapes spaces into a + sign, while UrlPathEncode escapes them into %20. Per the W3C, + and %20 are only equivalent in the query string portion. So you can't escape a whole URL using + signs, only the query string portion. Bottom line: UrlPathEncode is always better, IMHO.
You can encode a URL using with the UrlEncode() method or the UrlPathEncode() method. However, the methods return different results. The UrlEncode() method converts each space character to a plus character (+). The UrlPathEncode() method converts each space character into the string "%20", which represents a space in hexadecimal notation. Use the UrlPathEncode() method when you encode the path portion of a URL in order to guarantee a consistent decoded URL, regardless of which platform or browser performs the decoding.
http://msdn.microsoft.com/en-us/library/4fkewx0t.aspx
To explain it as simply as possible:
HttpUtility.UrlPathEncode("http://www.foo.com/a b/?eggs=ham&bacon=1")
becomes
http://www.foo.com/a%20b/?eggs=ham&bacon=1
and
HttpUtility.UrlEncode("http://www.foo.com/a b/?eggs=ham&bacon=1")
becomes
http%3a%2f%2fwww.foo.com%2fa+b%2f%3feggs%3dham%26bacon%3d1
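For comparison, here is what the closest JavaScript built-ins produce on the same input. This is an analogy, not the .NET methods themselves. encodeURI() behaves like UrlPathEncode here. URLSearchParams uses form encoding, so it reproduces UrlEncode's + for spaces, but with uppercase hex digits. encodeURIComponent() escapes the same reserved characters but writes the space as %20.

```javascript
const raw = "http://www.foo.com/a b/?eggs=ham&bacon=1";

// Like UrlPathEncode: only the space changes.
const pathStyle = encodeURI(raw);
// Form-style, like UrlEncode: reserved characters escaped, space -> +.
const formStyle = new URLSearchParams({ u: raw }).toString().slice("u=".length);
// Reserved characters escaped, space -> %20.
const componentStyle = encodeURIComponent(raw);

console.log(pathStyle);
console.log(formStyle);
console.log(componentStyle);
```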