For the base64-user-pass value in the Authorization header, is there a standard/de facto way to base64-encode usernames and passwords which contain code points which don't fit into an OCTET, or usernames which contain a colon (which is explicitly prohibited by RFC 2617)?
1) Yes, but only proposed and not implemented (AFAIK): https://greenbytes.de/tech/webdav/rfc7617.html#charset
2) No.
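That proposal has since been published as RFC 7617, which settles both points: the user-id still must not contain a colon, and (when the server advertises `charset="UTF-8"` in its challenge) credentials are encoded as UTF-8 octets before being base64-encoded. A minimal sketch of that scheme, with a hypothetical helper name `basic_auth_header`:

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    """Build an Authorization header value per RFC 7617.

    Assumes the server has advertised charset="UTF-8"; code points
    outside ASCII are encoded as UTF-8 octets before base64.
    """
    if ":" in user:
        # RFC 7617 keeps the RFC 2617 rule: the user-id may not contain a colon
        raise ValueError("user-id must not contain a colon")
    credentials = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

# The user-id/password pair from RFC 7617's own example:
print(basic_auth_header("test", "123£"))  # Basic dGVzdDoxMjPCow==
```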
Related
I want to know whether the X509Certificate CN (common name) supports i18n characters, and which character sets are supported.
I assume you are talking about the CN in the distinguished name of the issuer or subject of the X509 certificate in question.
RFC 5280 on "Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile" contains a definition of the allowed value for a common name AttributeTypeAndValue in a distinguished name
-- Naming attributes of type X520CommonName:
--   X520CommonName ::= DirectoryName (SIZE (1..ub-common-name))
--
-- Expanded to avoid parameterized type:
X520CommonName ::= CHOICE {
      teletexString     TeletexString   (SIZE (1..ub-common-name)),
      printableString   PrintableString (SIZE (1..ub-common-name)),
      universalString   UniversalString (SIZE (1..ub-common-name)),
      utf8String        UTF8String      (SIZE (1..ub-common-name)),
      bmpString         BMPString       (SIZE (1..ub-common-name)) }
At the same time, though, it says
CAs conforming to this profile MUST use either the PrintableString or UTF8String encoding of DirectoryString
(DirectoryName in the ASN.1 comment above should actually be DirectoryString, cf. the errata.)
There are certain exceptions to this for the sake of backward compatibility but let's consider the general case.
Thus, the common name may be either a PrintableString or a UTF8String. The former allows only a small subset of the characters the latter does, so you are effectively limited to what can be represented in UTF-8.
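To make the distinction concrete, here is a small sketch (the helper name `cn_encoding` is hypothetical) that picks the encoding RFC 5280 mandates, assuming the PrintableString alphabet defined in X.680 (letters, digits, space, and ' ( ) + , - . / : = ?):

```python
import string

# Characters allowed in an ASN.1 PrintableString (X.680).
PRINTABLE_STRING_CHARS = set(string.ascii_letters + string.digits + " '()+,-./:=?")

def cn_encoding(common_name: str) -> str:
    """Choose the DirectoryString encoding per RFC 5280's MUST:
    PrintableString if every character fits, else UTF8String."""
    if all(c in PRINTABLE_STRING_CHARS for c in common_name):
        return "PrintableString"
    return "UTF8String"

print(cn_encoding("example.org"))      # PrintableString
print(cn_encoding("münchen.example"))  # UTF8String
```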
This does not mean, though, that you can go to a CA of your choice and insist on getting a certificate with a subject common name containing the wildest Unicode characters. CAs may have limited the set of characters they allow in the subjects of certificates they issue. This might be accidental (their software for some reason may be limited to that set), intentional to allow interoperability with other legacy software, or a deliberate security measure, e.g. to prevent misuse of similar looking Unicode characters.
Such restriction may even be documented in their CA certificates by use of name constraint extensions; in that case the CA cannot circumvent the restrictions in any way.
In a URI, spaces can be encoded as +. Since this is the case, should the leading plus be encoded when creating tel URIs with international prefix?
Which is better? Do both work in practice?
No.
From section 3 of RFC 3966 (The tel URI for Telephone Numbers):
If the reserved characters "+", ";", "=", and "?" are used as delimiters between components of the "tel" URI, they MUST NOT be percent encoded.
You would only percent-encode a + if it’s part of a parameter value:
These characters ["+", ";", "=", and "?"] MUST be percent encoded if they appear in tel URI parameter values.
I’m not sure whether the leading +, which indicates that it’s a global number, counts as a delimiter, but the definition of a global number says:
Globally unique numbers are identified by the leading "+" character.
So it refers to +, not to something percent-encoded.
And also the examples make clear that it’s not supposed to be percent-encoded, e.g.:
tel:+1-201-555-0123
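Putting those rules together: the leading + and the ; / = delimiters stay literal, while reserved characters inside parameter values get percent-encoded. A sketch with a hypothetical helper `tel_uri` (parameter names are assumed to already be valid tokens):

```python
from urllib.parse import quote

def tel_uri(global_number: str, **params: str) -> str:
    """Build a tel URI per RFC 3966.

    The leading "+" and the ";"/"=" delimiters are NOT percent-encoded;
    only parameter *values* are, so that "+", ";", "=", "?" inside a
    value cannot be mistaken for delimiters.
    """
    uri = "tel:" + global_number  # leading "+" stays literal
    for name, value in params.items():
        uri += f";{name}={quote(value, safe='')}"  # encode the value only
    return uri

print(tel_uri("+1-201-555-0123", ext="7042"))
# tel:+1-201-555-0123;ext=7042
```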
Note that spaces in tel URIs (e.g., in parameter values) may not be encoded with a +. Using + instead of %20 for a space character is not something that may be done in any URI; it’s only possible in URIs whose URI scheme explicitly defines that.
The tel: URI scheme doesn't have a provision for encoding spaces - see RFC 3966:
5.1.1. Separators in Phone Numbers
...
even though ITU-T E.123 [E.123] recommends the use of space
characters as visual separators in printed telephone numbers, "tel"
URIs MUST NOT use spaces in visual separators to avoid excessive
escaping.
The plus sign encodes a space specifically only in application/x-www-form-urlencoded (default content type for form submission - see W3C info re: forms). There's no valid way to encode a space in tel: URIs. See again RFC 3966 (page 5) for valid visual separators.
I know that URI supports the following syntax:
http://[user]:[password]@[domain.tld]
When there is no password or if the password is empty, is there a colon?
In other words, should I accept this:
http://[user]:@[domain.tld]
Or this:
http://[user]@[domain.tld]
Or are they both valid?
The current URI standard (STD 66) is RFC 3986, and the relevant section is 3.2.1. User Information.
There it’s defined that the userinfo subcomponent (which is followed by @) can contain any combination of
the character :,
percent-encoded characters, and
characters from the sets unreserved and sub-delims.
So this means that both of your examples are valid.
However, note that the user:password format is deprecated. The RFC nevertheless gives recommendations on how applications should handle such URIs: everything after the first : character should not be displayed, unless
the data after the colon is the empty string (indicating no password).
So according to this recommendation, the userinfo subcomponent user: indicates that there is the username "user" and no password.
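That "split at the first colon" rule can be sketched as follows (the helper name `split_userinfo` is hypothetical; it returns None for the password when no colon is present, and "" when the colon is present but empty):

```python
from urllib.parse import urlsplit

def split_userinfo(url: str):
    """Split the userinfo subcomponent per RFC 3986 section 3.2.1.

    Returns (user, password), where password is None if there is no
    colon, and "" if the colon is present with nothing after it.
    Returns None if the URL has no userinfo at all.
    """
    netloc = urlsplit(url).netloc
    if "@" not in netloc:
        return None
    userinfo = netloc.rsplit("@", 1)[0]   # everything before the last @
    user, sep, password = userinfo.partition(":")  # split at FIRST colon
    return (user, password if sep else None)

print(split_userinfo("http://user:@example.com"))  # ('user', '')
print(split_userinfo("http://user@example.com"))   # ('user', None)
```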
Both are valid; it’s mostly a matter of convenience. I would go with http://[user]@[domain.tld] (and prompt for a password) because it’s simple and unambiguous: it leaves no room for the user to wonder whether anything has to follow the :
If I have a URL with query parameters, is it valid to "escape" the & query parameter delimiter?
Ex.
< a href="/foo.html?cat=meow&dog=woof" >go< /a >
vs
< a href="/foo.html?cat=meow&amp;dog=woof" >go< /a >
RFC 2396 clearly states that use of "&" is proper, but I can't find anything on the (in)validity of using escaped versions of the reserved characters.
One thing I noticed is that Chrome seems to forgive them when clicking on the link in the browser; however, when I view the source of the page and click on the link (/foo.html?cat=meow&amp;dog=woof) from the view-source view, it doesn't work.
I'd love to know if there is any spec/section I can point to that says "only use & and don't use &amp; or %26 (which is & URL-encoded)".
(Note: this question arises as I started working with a code base that structures their URLs in this fashion; I would personally use '&'.)
RFC 2396: http://www.ietf.org/rfc/rfc2396.txt
UPDATE 1
Correct - the actual URL that the server writes to the page is: < a href="/foo.html?cat=meow&amp;dog=woof" >go< /a > .. Is there a spec that speaks to the validity of using &amp; as a query-param delimiter? I'm not looking for "what works mostly" in browsers, but for the correct way(s) to delimit query params.
TL;DR: All formulations that evaluate to & are equally valid.
From OP's link:
Unlike many specifications that use a BNF-like grammar to define the bytes (octets) allowed by a protocol, the URI grammar is defined in terms of characters. Each literal in the grammar corresponds to the character it represents, rather than to the octet encoding of that character in any particular coded character set. How a URI is represented in terms of bits and bytes on the wire is dependent upon the character encoding of the protocol used to transport it, or the charset of the document which contains it.
-- RFC: 2396 - Uniform Resource Identifiers (URI): Generic Syntax August 1998
by
T. Berners-Lee*
MIT/LCS
R. Fielding
U.C. Irvine
L. Masinter
Xerox Corporation
*: how cool is that!
The escaping is happening in HTML: when you click on such a link, the browser will treat &amp; as &.
To encode a literal & in the URL itself, you can percent-encode it as %26.
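The two layers can be demonstrated side by side; a sketch (illustrative only, using the question's example URL):

```python
import html
from urllib.parse import parse_qs, quote

# The raw URI: "&" is the query-parameter delimiter.
uri = "/foo.html?cat=meow&dog=woof"

# Layer 1: HTML. Inside an href attribute, "&" is written "&amp;";
# the browser un-escapes it back to "&" before navigating, so both
# spellings reach the same URI.
href = html.escape(uri)            # /foo.html?cat=meow&amp;dog=woof
assert html.unescape(href) == uri

# Layer 2: URI. A percent-encoded %26 is a *literal* ampersand inside
# a value, not a delimiter - it changes the meaning of the query.
literal = "/foo.html?cat=meow" + quote("&dog=woof", safe="")
print(parse_qs(uri.split("?", 1)[1]))      # {'cat': ['meow'], 'dog': ['woof']}
print(parse_qs(literal.split("?", 1)[1]))  # {'cat': ['meow&dog=woof']}
```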
First, some quick background... As part of an integration with a third party vendor, I have a C# .Net web application that receives a URL with a bunch of information in the query string. That URL is signed with an MD5 hash and a shared secret key. Basically, I pull in the query string, remove their hash, perform my own hash on the remaining query string, and make sure mine matches the one that was supplied.
I'm retrieving the Uri in the following way...
Uri uriFromVendor = new Uri(Request.Url.ToString());
string queryFromVendor = uriFromVendor.Query.Substring(1); //Substring to remove question mark
My issue is stemming from query strings that contain special characters like an umlaut (ü). The vendor is calculating their hash based on the RFC 2396 representation, which is %FC. My C# .NET app is calculating its hash based on the RFC 3986 representation, which is %C3%BC. Needless to say, our hashes don't match, and I throw my errors.
Strangely, the documentation for the Uri class in .Net says that it should follow RFC 2396 unless otherwise set to RFC 3986, but I don't have the entry in my web.config file that they say is required for this behavior.
How can I force the Uri constructor to use the RFC 2396 convention?
Failing that, is there an easy way to convert the RFC 3986 octet pairs to RFC 2396 octets?
Nothing to do with your question, but why are you creating a new Uri here? You can just do string queryFromVendor = Request.Url.Query.Substring(1); – atticae
+1 for atticae! I went back to try removing the extraneous Uri I was creating and suddenly, the string had the umlaut encoded as UTF-8 instead of UTF-16.
At first, I didn't think this would work. Somewhere along the line, I had tried retrieving the url using Request.QueryString, but this was causing the umlaut to come through as %ufffd which is the � character. In the interest of taking a fresh perspective, I tried atticae's suggestion and it worked.
I'm pretty sure the answer has to do with something I read here.
C# uses UTF-16 in all its strings, with tools to encode when it comes to dealing with streams and files that bring us onto...
ASP.NET uses UTF-8 by default, and it's hard to think of a time when it isn't a good choice...
My problems stemmed from here...
Uri uriFromVendor = new Uri(Request.Url.ToString());
By taking the Request.Url URI and creating another Uri from it, the string was encoded as the C# standard UTF-16. By using the original Uri, it remained in the .NET (ASP.NET) default UTF-8.
Thanks to all for your help.
I'm wondering if this is a bit of a red herring:
I say this because %FC is the single-octet Latin-1 encoding of the u with umlaut (its Unicode code point is U+00FC, hence also its UTF-16 code unit); %C3%BC is the UTF-8 encoding.
I wonder if one of the System.Text.Encoding methods to convert the source data into a normal .Net string might help.
This question might be of interest too: Encode and Decode rfc2396 URLs
I don't know about the standard encoding for Uri constructors, but if everything else fails you could always decode the URL yourself and encode it in whatever encoding you like.
The HttpUtility class has UrlDecode() and UrlEncode() methods, which let you specify the System.Text.Encoding as a second parameter.
For example:
string decodedQueryString = HttpUtility.UrlDecode(Request.Url.Query.Substring(1));
string encodedQueryString = HttpUtility.UrlEncode(decodedQueryString, System.Text.Encoding.GetEncoding("utf-16"));
// calc hash here
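The same mismatch can be reproduced outside .NET. A sketch in Python (illustrative only; RFC 2396 left the character encoding unspecified, and implementations of that era commonly used Latin-1, one octet per character, while RFC 3986 mandates UTF-8):

```python
from urllib.parse import quote, unquote

# Pre-RFC-3986 behaviour: percent-encode the Latin-1 octet.
print(quote("ü", encoding="iso-8859-1"))  # %FC

# RFC 3986 behaviour: percent-encode the UTF-8 octets.
print(quote("ü", encoding="utf-8"))       # %C3%BC

# Converting between the two representations: decode with one
# charset, re-encode with the other.
assert quote(unquote("%C3%BC", encoding="utf-8"), encoding="iso-8859-1") == "%FC"
```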