X509Certificate CN supported characters

I want to know whether the X509Certificate CN (common name) supports i18n characters, and which character sets are supported.

I assume you are talking about the CN in the distinguished name of the issuer or subject of the X509 certificate in question.
RFC 5280, "Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile", contains a definition of the allowed values for a common name AttributeTypeAndValue in a distinguished name:
-- Naming attributes of type X520CommonName:
-- X520CommonName ::= DirectoryName (SIZE (1..ub-common-name))
--
-- Expanded to avoid parameterized type:
X520CommonName ::= CHOICE {
    teletexString     TeletexString   (SIZE (1..ub-common-name)),
    printableString   PrintableString (SIZE (1..ub-common-name)),
    universalString   UniversalString (SIZE (1..ub-common-name)),
    utf8String        UTF8String      (SIZE (1..ub-common-name)),
    bmpString         BMPString       (SIZE (1..ub-common-name)) }
At the same time, though, it says: "CAs conforming to this profile MUST use either the PrintableString or UTF8String encoding of DirectoryString."
(DirectoryName in the ASN.1 comment above should actually be DirectoryString, cf. the errata.)
There are certain exceptions to this for the sake of backward compatibility but let's consider the general case.
Thus, the common name may be either a PrintableString or a UTF8String. The former allows only a small subset of the characters the latter does, so you are effectively limited to what can be represented in UTF-8.
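To make the PrintableString/UTF8String distinction concrete, here is a minimal Python sketch (the function name `cn_string_type` is hypothetical) that keeps a CN as a PrintableString when every character is in the PrintableString repertoire (letters, digits, space and ' ( ) + , - . / : = ?) and otherwise falls back to UTF8String. It assumes the RFC 5280 upper bound ub-common-name = 64 and approximates the SIZE constraint by counting Python code points; real certificate libraries make this choice with their own rules.

```python
import string

# PrintableString repertoire (X.680): A-Z, a-z, 0-9, space and ' ( ) + , - . / : = ?
PRINTABLE_CHARS = frozenset(string.ascii_letters + string.digits + " '()+,-./:=?")
# Upper bound for a common name in RFC 5280 (ub-common-name INTEGER ::= 64)
UB_COMMON_NAME = 64


def cn_string_type(cn: str) -> str:
    """Hypothetical helper: pick one of the two DirectoryString choices RFC 5280 allows CAs to use."""
    if not 1 <= len(cn) <= UB_COMMON_NAME:
        raise ValueError("CN must contain 1..64 characters")
    if all(ch in PRINTABLE_CHARS for ch in cn):
        return "PrintableString"
    try:
        cn.encode("utf-8")  # lone surrogates cannot be represented in UTF-8
    except UnicodeEncodeError as exc:
        raise ValueError("CN is not representable in UTF-8") from exc
    return "UTF8String"
```

With this rule `cn_string_type("www.example.com")` gives PrintableString, while `"Müller GmbH"` and even `"*.example.com"` need UTF8String, because neither "ü" nor "*" is in the PrintableString repertoire.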
This does not mean, though, that you can go to a CA of your choice and insist on getting a certificate with a subject common name containing the wildest Unicode characters. CAs may have limited the set of characters they allow in the subjects of the certificates they issue. This might be accidental (their software may for some reason be limited to that set), intentional to allow interoperability with other legacy software, or a deliberate security measure, e.g. to prevent misuse of similar-looking Unicode characters.
Such restrictions may even be documented in their CA certificates by means of the name constraints extension; in that case the CA cannot circumvent them in any way.

Related

Why is extnValue in X.509 Extensions always encapsulated in an OCTET_STRING?

I'm curious, and I was not able to find an explanation so far.
In RFC 5280, extensions are defined as follows:
Extension ::= SEQUENCE {
    extnID      OBJECT IDENTIFIER,
    critical    BOOLEAN DEFAULT FALSE,
    extnValue   OCTET STRING
                -- contains the DER encoding of an ASN.1 value
                -- corresponding to the extension type identified
                -- by extnID
}
What is the reason for defining the encapsulating OCTET_STRING for extnValue, instead of directly defining extnValue as the "DER encoding of an ASN.1 value corresponding to the extension type identified by extnID"?
Thank you.
Not an authoritative answer, but my thoughts are: this is because extension values may have arbitrary enclosing tags and can be defined in external modules:
Most extensions use a SEQUENCE, but some do not: for example, Subject Key Identifier is just another OCTET_STRING, and Key Usage is a BIT_STRING. In the base type definition you have to use a fixed tag to represent variable content (ANY).
In addition, parsers may not know how to parse a particular extension; if the extension type is unknown to the parser, it can read the value as an opaque octet string without having to dig deeper.
update 13.02.2023 (based on comments):
Regarding the type / tag: from my understanding, each type can be easily identified by the leading tag byte, such as SEQUENCE=0x30 (tag number 0x10 with the constructed bit set), OCTET_STRING=0x04 or BIT_STRING=0x03.
You cannot define the field with a variable tag, because that introduces type ambiguity. That is, an extnValue ANY field definition is not valid, because its type is indeterminate. When you define a type (in this case, the Extension type), all fields must have a deterministic tag.
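To illustrate how a parser can treat unknown extensions as opaque octets, here is a minimal Python DER reader. It is a sketch, not a full ASN.1 parser: it only handles single-byte tags, and all function names are hypothetical. `parse_extension` splits an Extension into extnID, critical and the raw extnValue contents without knowing anything about the extension type; the example bytes are a hand-built Key Usage extension (OID 2.5.29.15) with digitalSignature and keyEncipherment set.

```python
def read_tlv(data: bytes, pos: int):
    """Return (tag, value, next_pos) for the TLV at data[pos]; assumes a single-byte tag."""
    tag = data[pos]
    length = data[pos + 1]
    pos += 2
    if length & 0x80:  # long form: the low 7 bits give the number of length octets
        n = length & 0x7F
        length = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    value = data[pos:pos + length]
    if len(value) != length:
        raise ValueError("truncated TLV")
    return tag, value, pos + length


def decode_oid(body: bytes) -> str:
    """Decode OBJECT IDENTIFIER contents (base-128 subidentifiers) to dotted form."""
    arcs, n = [], 0
    for byte in body:
        n = (n << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear: last octet of this subidentifier
            arcs.append(n)
            n = 0
    first = arcs[0]  # the first subidentifier packs the first two arcs
    head = [0, first] if first < 40 else [1, first - 40] if first < 80 else [2, first - 80]
    return ".".join(str(a) for a in head + arcs[1:])


def parse_extension(der: bytes):
    """Split an Extension SEQUENCE into (extnID, critical, extnValue contents)."""
    tag, body, _ = read_tlv(der, 0)
    if tag != 0x30:  # SEQUENCE: tag number 0x10 plus the constructed bit 0x20
        raise ValueError("expected SEQUENCE")
    tag, oid, pos = read_tlv(body, 0)
    if tag != 0x06:
        raise ValueError("expected OBJECT IDENTIFIER")
    critical = False
    tag, value, pos = read_tlv(body, pos)
    if tag == 0x01:  # BOOLEAN is optional: DER omits it when it equals the default FALSE
        critical = value != b"\x00"
        tag, value, pos = read_tlv(body, pos)
    if tag != 0x04:
        raise ValueError("expected OCTET STRING")
    return decode_oid(oid), critical, value


# Key Usage (2.5.29.15), critical, digitalSignature + keyEncipherment
ext = bytes.fromhex("300e0603551d0f0101ff0404030205a0")
oid, critical, inner = parse_extension(ext)
print(oid, critical, inner.hex())  # 2.5.29.15 True 030205a0
print(hex(read_tlv(inner, 0)[0]))  # 0x3 (BIT STRING inside the OCTET STRING)
```

Only a caller that recognizes 2.5.29.15 needs to look inside `inner`; everything else can pass the bytes through untouched, which is what the fixed OCTET STRING wrapper makes possible.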

Distinguish between email address and IRI

I have a string that can contain either an email address or an IRI (internationalized URI). The strings do not contain additional surrounding whitespace or any HTTP line-folding characters. Moreover, they do not contain any elements marked as "obsolete" in their corresponding specifications. I need a simple way to distinguish which of these things the string contains.
I'm looking at what I believe to be the latest respective specifications: RFC 5322 § 3.4.1 "Addr-Spec Specification" for email addresses, and RFC 3987 § 2.2 "ABNF for IRI References and IRIs" for IRIs. I've come up with the following algorithm, with explanations in parentheses:
If the string begins with a quote " character, it is an email address. (Email address local-part may be a quoted string, but an IRI scheme may not.)
Otherwise, find the first at sign @ or colon : character.
If the character encountered is an at sign @, the string contains an email address.
Otherwise, if it is a colon : character, the string contains an IRI.
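The algorithm above is short enough to sketch directly. This Python version (the function name is hypothetical) looks for the first at sign @ or colon and assumes, as the question does, that the input is a well-formed email address or IRI; it does no validation beyond that.

```python
def classify_email_or_iri(s: str) -> str:
    """Sketch of the algorithm above; assumes s is a well-formed email address or IRI."""
    # A quoted local-part ("...") is only possible in an email address.
    if s.startswith('"'):
        return "email"
    for ch in s:
        if ch == "@":  # '@' cannot appear in an IRI scheme, so '@' first means email
            return "email"
        if ch == ":":  # ':' cannot appear in an unquoted local-part, so it ends an IRI scheme
            return "iri"
    raise ValueError("neither '@' nor ':' found")
```

For example, `"mailto:user@example.com"` is classified as an IRI because its colon comes before the at sign.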
Is that approach correct? Is there another simpler approach? Lastly for bonus, how would I expand this algorithm to also distinguish those two things from an IP address (including both IPv4 and IPv6)?
I think the rules as specified are correct and a fast way to determine the type (email or IRI). To extend this to IP addresses, their corresponding grammar should be added: https://datatracker.ietf.org/doc/html/draft-main-ipaddr-text-rep-00.
So then your rules could be extended to:
Rules (assuming well-formed input):
First char " => email
First char : => IPv6 (because in an IRI the scheme has to contain at least one character)
Otherwise, take the first of : or @
  @ => email
  : =>
    If it does not match the grammar for IPv6 => IRI
    Otherwise it is ambiguous, also according to the grammars (this can only happen when the string starts with a hex letter a-f, since an IRI scheme must start with a letter); some options:
      Use it as IPv6 => it will be valid, and is likely to be what was intended
      Use it as an IRI => the part before the ':' will be the scheme, and the rest will be one 'segment' of the path
        So ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff will lead to scheme ffff and 'segment' ffff:ffff:ffff:ffff:ffff:ffff:ffff
        I would find this situation very unlikely
      Raise an exception; depending on the environment this could be a valid option
  Neither in the string => IPv4
ipchar := hex / ':'
hex := [0-9A-Fa-f]
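A possible Python sketch of these extended rules follows. Instead of implementing the grammar from the linked draft, it swaps in the standard library's `ipaddress` module to decide whether a string is an IPv6 or IPv4 address, and for the ambiguous case it picks the "use it as IPv6" option. The function names are hypothetical and well-formed input is assumed.

```python
import ipaddress


def _is_ipv6(s: str) -> bool:
    try:
        ipaddress.IPv6Address(s)
        return True
    except ValueError:  # ipaddress.AddressValueError is a subclass of ValueError
        return False


def classify_address(s: str) -> str:
    """Sketch of the extended rules; ambiguous IPv6/IRI strings are treated as IPv6."""
    if s.startswith('"'):
        return "email"
    if s.startswith(":"):  # an IRI scheme needs at least one character
        return "ipv6"
    for ch in s:
        if ch == "@":
            return "email"
        if ch == ":":
            return "ipv6" if _is_ipv6(s) else "iri"
    # Neither '@' nor ':' present: only IPv4 is left (raises ValueError otherwise).
    ipaddress.IPv4Address(s)
    return "ipv4"
```

Raising instead of returning "ipv6" in the ambiguous branch would implement the third option from the list.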

Basic Access Authentication: Encoding of ':' and non-ASCII code points

For the base64-user-pass value in the Authorization header, is there a standard/de facto way to base64-encode usernames and passwords which contain code points which don't fit into an OCTET, or usernames which contain a colon (which is explicitly prohibited by RFC 2617)?
1) Yes, but only proposed and not implemented (AFAIK): https://greenbytes.de/tech/webdav/rfc7617.html#charset
2) No.
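For reference, this is roughly what the charset approach in the linked RFC 7617 section looks like when the server has advertised `charset="UTF-8"`: the user-id and password are joined with a colon, encoded as UTF-8, then base64-encoded. This is a minimal Python sketch (the function name is hypothetical); it only rejects a colon in the user-id, since the first colon separates user-id from password, and does not check the other restrictions the RFC places on these strings.

```python
import base64


def basic_authorization(user_id: str, password: str) -> str:
    """Build a Basic credentials value, assuming the server advertised charset="UTF-8"."""
    if ":" in user_id:
        # The first colon separates user-id from password, so the user-id cannot contain one.
        raise ValueError("user-id must not contain ':'")
    user_pass = f"{user_id}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(user_pass).decode("ascii")
```

`basic_authorization("Aladdin", "open sesame")` yields `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`, and `basic_authorization("test", "123£")` yields `Basic dGVzdDoxMjPCow==`, where the pound sign becomes the two UTF-8 octets C2 A3. A colon in the password is fine.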

Bad production of 2.5.4.5 oid into X509IssuerName, change proposal

I noticed that during a XAdES signature with xades4j, the X509IssuerName element contains a badly formatted issuer serialNumber value: it shows a hex-encoded PrintableString. I searched the xades4j code and found that the problem is in the DataGenBaseCertRefs class. If you use
cert.getIssuerX500Principal().getName(X500Principal.RFC1779)
in the generate method, you can resolve this problem and produce an issuer value that changes from this:
2.5.4.5=#130b3037393435323131303036
to this
OID.2.5.4.5=07945211006
I'm not sure that change is correct. XML-DSIG states that RFC 4514 should be used when encoding the distinguished names. Regarding the attribute type, on that RFC one reads:
If the AttributeType is defined to have a short name (...) that short name, a descr, is used. Otherwise the AttributeType is encoded as the dotted-decimal encoding, a numericoid, of its OBJECT IDENTIFIER.
In turn, numericoid is defined on RFC 4512 as follows:
numericoid = number 1*( DOT number )
Regarding the attribute value, one reads:
If the AttributeType is of the dotted-decimal form, the AttributeValue is represented by an number sign ('#' U+0023) character followed by the hexadecimal encoding of each of the octets of the BER encoding of the X.500 AttributeValue.
My understanding is that, since a short name was not known, the hex value should be used. What do you think?
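To see that the hex form carries the same information as the RFC 1779 output, the value can be decoded by hand: tag 0x13 is PrintableString and 0x0b is the length, 11 octets. This Python sketch (hypothetical function name) only handles a single short-form string TLV of a few common string types, which covers this case.

```python
# Universal tags of string types commonly found in DN attribute values, with their codecs
STRING_TYPES = {
    0x0C: ("UTF8String", "utf-8"),
    0x13: ("PrintableString", "ascii"),
    0x16: ("IA5String", "ascii"),
    0x1E: ("BMPString", "utf-16-be"),
}


def decode_hexstring_value(value: str):
    """Decode an RFC 4514 '#'-prefixed hex AttributeValue holding one string TLV."""
    if not value.startswith("#"):
        raise ValueError("not a hexstring value")
    ber = bytes.fromhex(value[1:])
    tag, length = ber[0], ber[1]
    if length & 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    if tag not in STRING_TYPES or len(ber) != 2 + length:
        raise ValueError("unsupported or malformed encoding")
    type_name, codec = STRING_TYPES[tag]
    return type_name, ber[2:].decode(codec)
```

`decode_hexstring_value("#130b3037393435323131303036")` returns `("PrintableString", "07945211006")`, the same value the RFC 1779 form shows after `OID.2.5.4.5=`.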
This actually makes me realize that xades4j is using RFC 2253, since that is the default format for getName().
Are you also including a X509IssuerSerial element on KeyInfo/X509Data? Is that one different from the cert ref?
Can you send me, on another channel, a certificate with those characteristics for tests?

Why is URN one of the more popular formats used to uniquely identify a resource?

I somewhat understand that URNs are used to provide unique and location independent name for the resource. Yet I fail to see their usefulness and how exactly they work:
a) In order for URN to really be unique, there would have to be some central authority (similar to authority for domain names) where we could register URNs and that way ensure they are unique.
Since there isn’t any such authority, how else do we make sure that our URNs are unique? And if we can’t, then what’s the point of having them?
b) Also, I don’t understand the reasoning behind URNs having the format urn:NID:NSS. What makes this format more efficient/logical than for example urn:NID:NID1:NSS?
c) And finally, how can URN help us locate a resource on the internet?
EDIT:
I'm not sure what you mean. NID is the Namespace Identifier and NSS is the Namespace Specific String. Are you proposing a system of sub-namespaces?
I’m just trying to make sense of why the format URN uses is “superior” to other formats, such as urn:NID:NID1:NSS
a) In order for URN to really be unique, there would have to be some central authority... Since there isn’t any such authority, how else do we make sure that our URNs are unique?
There is a central authority, called IANA, to register namespaces (the NID part), and each namespace is responsible for ensuring uniqueness.
b) Also, I don’t understand the reasoning behind URNs having the format urn:NID:NSS. What makes this format more efficient/logical than for example urn:NID:NID1:NSS?
The "urn:NID:NSS" description states the interpretation of NSS depends on the value of NID. For example, if NID is "isbn", then we know to interpret the NSS as an ISBN number, as in "urn:isbn:0451450523".
The NSS part can contain colons, so "urn:example:other:more" is valid syntax. (And in fact it is a valid URN as of 2013-04-24.) For example, given "urn:mpeg:mpeg7:schema:2001", the NSS part is "mpeg7:schema:2001" and we interpret that according to the rules for the "mpeg" namespace.
Had "urn:NID:NID1:NSS" been required, it would have been redundant (some namespaces don't need a nested NID1) and superfluous (the authority for a namespace can already divide the NSS part up, as in the above mpeg example).
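The split described above (the NID runs up to the second colon, everything after it is the NSS) can be sketched in Python as follows. The NID syntax check is based on my reading of RFC 8141 (2 to 32 letters, digits or hyphens, not starting or ending with a hyphen); the NSS characters are not validated, and the function name is hypothetical.

```python
import re

# NID per RFC 8141: alphanumeric, then up to 30 letters/digits/hyphens, then alphanumeric
NID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]")


def parse_urn(urn: str):
    """Split 'urn:NID:NSS' at the first two colons; the NSS keeps any further colons."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError("not of the form urn:NID:NSS")
    _, nid, nss = parts
    if not NID_RE.fullmatch(nid) or not nss:
        raise ValueError("invalid NID or empty NSS")
    return nid.lower(), nss  # NIDs compare case-insensitively
```

`parse_urn("urn:mpeg:mpeg7:schema:2001")` returns `("mpeg", "mpeg7:schema:2001")`; how "mpeg7:schema:2001" is subdivided further is left to the rules of the mpeg namespace.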
c) And finally, how can URN help us locate a resource on the internet?
URNs are not about location; that's what a URL is for.
a) In order for URN to really be unique, there would have to be some central authority (similar to authority for domain names) where we could register URNs and that way ensure they are unique. Since there isn’t any such authority, how else do we make sure that our URNs are unique? And if we can’t, then what’s the point of having them?
An ISBN is used as a URN, and ISBNs are managed by an agency.
b) Also, I don’t understand the reasoning behind URNs having the format urn:NID:NSS. What makes this format more efficient/logical than for example urn:NID:NID1:NSS?
I'm not sure what you mean. NID is the Namespace Identifier and NSS is the Namespace Specific String. Are you proposing a system of sub-namespaces?
c) And finally, how can URN help us locate a resource on the internet?
A URN (Uniform Resource Name) doesn't help you locate something on the Internet. A URL (Uniform Resource Locator) does.
Also see What is the difference between URI and URL?
URNs
A URN (Uniform Resource Name) is supposed to be unique across both time and space.
A URL/URI cannot guarantee its uniqueness, unlike a URN, which can be a URI at the same time.
A URI resource (X) in path (Y) may be a valid URL, because the path can be a location, but the same whole identifier (Z) can be duplicated in many physical, logical or virtual locations in the world.
```
# Unique only in the same actual location
Z = [Y => X];
A = [B => Z];
C = [D => Z];
```
But if we add a uniform prefix U at the beginning (a domain name, for example), it can be more flexible but still not unique (domains can expire).
```
# Unique only in the same actual location
Z = [ U => Y => X ];
```
The same format can be extended again and again by other variables to make it as unique as possible.
Because of this, we need a more sophisticated, truly unique format that can identify more types of resources across time and space.
URNs are not URLs (except for a unique persistent URL used as a name), because they do not locate a resource. In fact they are more than you might think: they can identify *ideas, UUIDs, virtual or physical objects and more*. But both of them, plus URCs/data URIs, can be URIs.
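As an example of a URN that is unique without a central registry, the uuid namespace (RFC 4122) relies on how the UUID is generated rather than on an authority; for a random version 4 UUID, 122 of the 128 bits are random. Python's standard `uuid` module exposes the URN form through the `urn` attribute, as this short sketch shows.

```python
import uuid

# A random (version 4) UUID needs no registry to be unique in practice.
random_id = uuid.uuid4()
print(random_id.urn)  # urn:uuid:... (different on every run)

# The same UUID always yields the same URN.
fixed = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(fixed.urn)  # urn:uuid:12345678-1234-5678-1234-567812345678
```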
Note:
Take a look at a simple and clearer example of URNs here:
https://stackoverflow.com/a/1984274/5405973
And here is a very informative link:
https://stackoverflow.com/a/28865728/5405973
