Bad production of 2.5.4.5 OID into X509IssuerName, change proposal - x509certificate

I noticed that during a XAdES signature with xades4j the element X509IssuerName presents a badly formatted serialNumber issuer value: it shows a hex-encoded PrintableString. I searched the xades4j code and found that the problem is in the DataGenBaseCertRefs class. If you set
cert.getIssuerX500Principal().getName(X500Principal.RFC1779)
in the generate method, you can resolve this problem and produce an issuer value that goes from this:
2.5.4.5=#130b3037393435323131303036
to this
OID.2.5.4.5=07945211006
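For reference, here is a minimal sketch (not the actual xades4j code or patch) of how the format argument of X500Principal.getName() changes the rendering of the serialNumber (2.5.4.5) attribute; reading the certificate from a file path passed on the command line is just an assumption for illustration:
import java.io.FileInputStream;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.Map;
import javax.security.auth.x500.X500Principal;

public class IssuerNameFormats {
    public static void main(String[] args) throws Exception {
        // Load the certificate from a file path given on the command line.
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        X509Certificate cert;
        try (FileInputStream in = new FileInputStream(args[0])) {
            cert = (X509Certificate) cf.generateCertificate(in);
        }

        X500Principal issuer = cert.getIssuerX500Principal();

        // Default is RFC 2253: attribute types without a keyword are rendered
        // as "2.5.4.5=#<hex of the BER-encoded value>".
        System.out.println(issuer.getName());

        // RFC 1779: the JDK renders unknown types as "OID.2.5.4.5=<string value>".
        System.out.println(issuer.getName(X500Principal.RFC1779));

        // Alternative: stay with RFC 2253 but register a keyword for the OID.
        System.out.println(issuer.getName(X500Principal.RFC2253,
                Map.of("2.5.4.5", "SERIALNUMBER")));
    }
}
The oidMap overload is a third option: it keeps the RFC 2253 layout but registers a keyword for the OID, which, as far as I can tell, also makes the JDK render the value as a readable string instead of the #-hex form.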

I'm not sure that change is correct. XML-DSIG states that RFC 4514 should be used when encoding the distinguished names. Regarding the attribute type, on that RFC one reads:
If the AttributeType is defined to have a short name (...) that short name, a descr, is used. Otherwise the AttributeType is encoded as the dotted-decimal encoding, a numericoid, of its OBJECT IDENTIFIER.
In turn, numericoid is defined on RFC 4512 as follows:
numericoid = number 1*( DOT number )
Regarding the attribute value, one reads:
If the AttributeType is of the dotted-decimal form, the AttributeValue is represented by the number sign ('#' U+0023) character followed by the hexadecimal encoding of each of the octets of the BER encoding of the X.500 AttributeValue.
My understanding is that, since a short name was not known, the hex value should be used. What do you think?
This actually makes me realize that xades4j is using RFC 2253, since it is the default on getName().
Are you also including a X509IssuerSerial element on KeyInfo/X509Data? Is that one different from the cert ref?
Can you send me, on another channel, a certificate with those characteristics for tests?

Related

Why is extnValue in X.509 Extensions always encapsulated in an OCTET_STRING?

I'm curious, and I was not able to find an explanation so far.
In RFC 5280, the Extension structure is defined as follows:
Extension ::= SEQUENCE {
    extnID     OBJECT IDENTIFIER,
    critical   BOOLEAN DEFAULT FALSE,
    extnValue  OCTET STRING
               -- contains the DER encoding of an ASN.1 value
               -- corresponding to the extension type identified
               -- by extnID
    }
What is the reason for defining the encapsulating OCTET_STRING for extnValue, instead of directly defining extnValue as the "DER encoding of an ASN.1 value corresponding to the extension type identified by extnID"?
Thank you.
Not an authoritative answer, but my thoughts are: this is because extension values may have arbitrary enclosing tags and can be defined in external modules:
Most extensions use a SEQUENCE, but some do not: for example, Subject Key Identifier is just another OCTET_STRING, and Key Usage is a BIT_STRING. And in the base type definition you have to use a fixed tag to represent variable content (ANY).
In addition, a parser may not know how to parse a particular extension, so it can read the value as an octet string without having to dig deeper when the extension type is unknown to it.
Update 13.02.2023 (based on comments):
Regarding the type/tag: from my understanding, each different type can be easily identified by its leading tag byte, such as SEQUENCE=0x30 (tag number 0x10 with the constructed bit set), OCTET_STRING=0x04 or BIT_STRING=0x03.
However, you cannot define the field with a variable tag, because that would introduce type ambiguity. That is, an extnValue ANY field definition is not valid, because its type is indeterminate. When you define a type (in this case, the Extension type), all fields must have a deterministic tag.
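As a consumer-side illustration of that wrapping, here is a small sketch using only the JDK (no ASN.1 library): X509Certificate.getExtensionValue() returns the DER encoding of the extnValue OCTET STRING, so the inner value (here the keyUsage BIT STRING) has to be unwrapped first. The file-path argument and the minimal length handling are assumptions made for brevity:
import java.io.FileInputStream;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;

public class ExtnValueUnwrap {

    // Strips the OCTET STRING header (tag 0x04 plus length) and returns the
    // inner bytes. Handles short- and long-form definite lengths only.
    static byte[] unwrapOctetString(byte[] der) {
        if (der == null || der.length < 2 || der[0] != 0x04)
            throw new IllegalArgumentException("not a DER OCTET STRING");
        int lenByte = der[1] & 0xFF;
        int offset = 2, length = lenByte;
        if (lenByte > 0x80) {                      // long-form length
            int numLenBytes = lenByte & 0x7F;
            length = 0;
            for (int i = 0; i < numLenBytes; i++)
                length = (length << 8) | (der[2 + i] & 0xFF);
            offset = 2 + numLenBytes;
        }
        byte[] inner = new byte[length];
        System.arraycopy(der, offset, inner, 0, length);
        return inner;
    }

    public static void main(String[] args) throws Exception {
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        X509Certificate cert;
        try (FileInputStream in = new FileInputStream(args[0])) {
            cert = (X509Certificate) cf.generateCertificate(in);
        }
        // getExtensionValue() returns the DER-encoded extnValue OCTET STRING
        // (or null if the extension is absent), not the inner value itself.
        byte[] wrapped = cert.getExtensionValue("2.5.29.15"); // keyUsage
        byte[] inner = unwrapOctetString(wrapped);
        System.out.printf("inner tag: 0x%02X%n", inner[0]);   // 0x03 = BIT STRING
    }
}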

Distinguish between email address and IRI

I have a string that can contain either an email address or an IRI (internationalized URI). The strings do not contain additional surrounding whitespace or any HTTP linefolding characters. Moreover they do not contain any elements marked as "obsolete" in their corresponding specifications. I need a simple way to distinguish which of these things the string contains.
I'm looking at what I believe to be the latest respective specifications: RFC 5322 § 3.4.1. Addr-Spec Specification for emails, and RFC 3987 § 2.2. ABNF for IRI References and IRIs for IRIs. I've come up with the following algorithm, with explanations in parentheses:
If the string begins with a quote " character, it is an email address. (Email address local-part may be a quoted string, but an IRI scheme may not.)
Otherwise find the first at sign @ or colon : character.
If the character encountered is an at sign @, the string contains an email address.
Otherwise, if it is a colon : character, the string contains an IRI.
Is that approach correct? Is there another simpler approach? Lastly for bonus, how would I expand this algorithm to also distinguish those two things from an IP address (including both IPv4 and IPv6)?
I would think the rules as specified are correct and fast to determine the type (email or IRI). To extend this to IP addresses, their corresponding grammar should be added: https://datatracker.ietf.org/doc/html/draft-main-ipaddr-text-rep-00.
So then your rules could be extended to the following (I assumed well-formed input):
First char " => email
First char : => IPv6 (because in an IRI the scheme has to contain at least one character)
First of : or @
    @ => email
    : =>
        If it does not match the grammar for IPv6 => IRI
        Otherwise it is ambiguous (also in the grammar); some options:
            Use it as IPv6 => it will be valid, and likely the thing intended
            Use it as an IRI => the first part (before the ':') will be a scheme and the later part will be one 'segment' in the protocol,
            so ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff will lead to scheme ffff and 'segment' ffff:ffff:ffff:ffff:ffff:ffff:ffff.
            I would find this situation very unlikely.
            Raise an exception; depending on the environment this could be a valid option
Neither in the string => IPv4
where the characters of an IPv6 literal are roughly:
ipchar := hex / ':'
hex := [0-9A-Fa-f]
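For what it's worth, a rough sketch of these rules in Java, assuming well-formed input as stated above; the Kind enum, the classify helper and the deliberately naive hex-and-colon test for IPv6 are illustrative choices, not library APIs:
import java.util.regex.Pattern;

public class Classifier {
    enum Kind { EMAIL, IRI, IPV4, IPV6, AMBIGUOUS }

    // Naive stand-in for "matches the IPv6 grammar": only hex digits and ':'.
    private static final Pattern IPV6_CHARS = Pattern.compile("[0-9A-Fa-f:]+");

    static Kind classify(String s) {
        if (s.startsWith("\"")) return Kind.EMAIL;   // quoted local-part
        if (s.startsWith(":"))  return Kind.IPV6;    // e.g. "::1"; an IRI scheme is non-empty
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '@') return Kind.EMAIL;
            if (c == ':') {
                // Could still be an IPv6 literal such as "ffff::1".
                return IPV6_CHARS.matcher(s).matches() ? Kind.AMBIGUOUS : Kind.IRI;
            }
        }
        return Kind.IPV4;                            // neither '@' nor ':' found
    }

    public static void main(String[] args) {
        System.out.println(classify("user@example.org"));      // EMAIL
        System.out.println(classify("https://example.org/a")); // IRI
        System.out.println(classify("192.0.2.1"));             // IPV4
        System.out.println(classify("::1"));                   // IPV6
        System.out.println(classify("ffff:ffff::1"));          // AMBIGUOUS
    }
}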

x509certificate CN supported characters

I want to know whether the X509Certificate CN (common name) supports i18n characters, and what the full set of supported characters is.
I assume you are talking about the CN in the distinguished name of the issuer or subject of the X509 certificate in question.
RFC 5280 on "Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile" contains a definition of the allowed value for a common name AttributeTypeAndValue in a distinguished name
-- Naming attributes of type X520CommonName:
--   X520CommonName ::= DirectoryName (SIZE (1..ub-common-name))
--
-- Expanded to avoid parameterized type:
X520CommonName ::= CHOICE {
      teletexString     TeletexString   (SIZE (1..ub-common-name)),
      printableString   PrintableString (SIZE (1..ub-common-name)),
      universalString   UniversalString (SIZE (1..ub-common-name)),
      utf8String        UTF8String      (SIZE (1..ub-common-name)),
      bmpString         BMPString       (SIZE (1..ub-common-name)) }
At the same time, though, it says:
CAs conforming to this profile MUST use either the PrintableString or UTF8String encoding of DirectoryString
(DirectoryName in the ASN.1 comment above should actually be DirectoryString, cf. the errata.)
There are certain exceptions to this for the sake of backward compatibility but let's consider the general case.
Thus, the common name may be either a PrintableString or a UTF8String. The former only allows a small subset of the characters the latter does, so you are effectively limited to what can be represented in UTF-8.
This does not mean, though, that you can go to a CA of your choice and insist on getting a certificate with a subject common name containing the wildest Unicode characters. CAs may have limited the set of characters they allow in the subjects of certificates they issue. This might be accidental (their software for some reason may be limited to that set), intentional to allow interoperability with other legacy software, or a deliberate security measure, e.g. to prevent misuse of similar looking Unicode characters.
Such restrictions may even be documented in their CA certificates by use of name constraint extensions; in that case the CA cannot circumvent them in any way.
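As a small illustration of the PrintableString vs. UTF8String choice (this is JDK behaviour, not a CA policy): as far as I know, the JDK's X500Principal emits a PrintableString for an ASCII-only CN and falls back to UTF8String when the value contains other characters. A minimal sketch, with made-up CN values:
import javax.security.auth.x500.X500Principal;

public class CnEncoding {
    // Hex-dump helper so the string tags (0x13 vs 0x0C) are visible.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X ", b));
        return sb.toString();
    }

    public static void main(String[] args) {
        X500Principal ascii   = new X500Principal("CN=Example");
        X500Principal unicode = new X500Principal("CN=Müller");  // hypothetical non-ASCII CN
        System.out.println(hex(ascii.getEncoded()));   // CN value tagged 0x13 (PrintableString)
        System.out.println(hex(unicode.getEncoded())); // CN value tagged 0x0C (UTF8String)
    }
}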

How to determine if a URI is escaped?

I am using Apache Commons HttpClient to download web resources. The URIs for these resources come from third parties; I do not generate them.
The commons httpclient requires a URI object to be given to the GetMethod object.
The URI constructor takes a string (for the uri) and a boolean specifying if it is escaped or not.
Currently, I am doing the following to determine if the original url I am given is already escaped...
boolean isEscaped = URIUtil.getPathQuery(originalUrl).contains("%");
m.setURI(new URI(originalUrl, isEscaped));
Is this the correct way to determine if a uri is already escaped?
Update...
Well, according to Wikipedia (http://en.wikipedia.org/wiki/Percent-encoding), the percent sign is a reserved character and should always be encoded... I am quoting verbatim here...
Percent-encoding the percent character: Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI.
Doesn't this mean that you can never have a naked '%' character in a valid URI?
Also, the uri(s) come from various sources so I cannot be sure if they are escaped or unescaped.
This wouldn't work. It's possible the un-encoded string has a % in it already.
ex:
https://www.google.com/#q=like%25&safe=off
is the url for a google search for like%. In unescaped form it would be https://www.google.com/#q=like%&safe=off
Your consumers should let you know if the URI is escaped or not.
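A tiny sketch of that ambiguity, using only the JDK; the literal strings are made-up examples:
import java.net.URLDecoder;

public class EscapeAmbiguity {
    public static void main(String[] args) throws Exception {
        String escaped   = "like%25";  // already-escaped form of "like%"
        String unescaped = "like%";    // raw form with a bare '%'

        // Both contain '%', so the contains("%") heuristic cannot tell them apart.
        System.out.println(escaped.contains("%"));    // true
        System.out.println(unescaped.contains("%"));  // true

        // Decoding is only meaningful once you already know the string is escaped;
        // decoding the raw form would fail (incomplete escape sequence).
        System.out.println(URLDecoder.decode(escaped, "UTF-8")); // prints like%
    }
}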

ASN.1 Octet Strings

I'm decoding an X.509 certificate in ASN.1 format. I'm decoding it successfully, traversing the structure, but there is one thing that I don't understand.
There are some scenarios where I get an octet string, and the website that I am playing with (http://lapo.it/asn1js/) shows that these octet strings actually contain more of the ASN.1 tree; it annotates such octet strings with "(encapsulates)".
My question is this: how do I know during parsing that an octet string actually encapsulates something more? Do I just try to parse it, checking whether I get a valid tag and length? If not, it is plain byte data, and if so, it is a valid sub-tree?
Or is this meant to be output as bytes, and the consumer should then only try to parse it if they know that it is encoded data for certain keys?
Take the example that is already loaded on the site and hit "decode". I am referring for example to offset 332 which is an octet string that encapsulates a bit string.
This is what "extensions" looks like in ASN.1 speak (RFC 2459 §B.2 — I know that RFC is "obsolete", but that useful appendix isn't present in the later versions).
Extensions ::= SEQUENCE OF Extension
Extension ::= SEQUENCE {
    extnId     OBJECT IDENTIFIER,
    critical   BOOLEAN DEFAULT FALSE,
    extnValue  OCTET STRING }
Every extension payload is encapsulated within an OCTET STRING. The OID of the extensions tells you what to expect within that octet string.
In the case of keyUsage it's a BIT STRING (§4.2.1.3).
And now I have an answer to my own question about subjectAltName: it's in §4.2.1.7.
One benefit of using OCTET STRING for the content is that, as per spec, unknown (non-critical) extensions can be identified as such and trivially be skipped over (though I think DER makes it trivial too).
And the way to tell ASN.1 tools to deal with that encapsulation is by using the keyword "CONTAINING". For example (this is not the actual/correct certificate spec, but it should give you an idea):
TstCert DEFINITIONS IMPLICIT TAGS ::=
BEGIN

    Sun ::= SEQUENCE {
        subjAltType  OBJECT IDENTIFIER,
        name         GenNames
    }

    GenNames ::= SEQUENCE SIZE (1..5) OF GenName

    GenName ::= CHOICE {
        otherName   [0] OtherName,
        rfc822Name  [1] UTF8String
    }

    OtherName ::= OCTET STRING (CONTAINING SEQUENCE {
        type-id  OBJECT IDENTIFIER,
        value    [0] EXPLICIT UTF8String
    } )

END
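Coming back to the original question (how to tell during parsing whether an octet string encapsulates more ASN.1): if you do want a heuristic, one option is to treat the octet string contents as DER and call them "encapsulating" only if well-formed TLVs consume the bytes exactly. A sketch in Java, with illustrative helper names rather than any library API:
public class EncapsulationProbe {

    // Returns the total length (header + content) of one TLV starting at 'off',
    // or -1 if it is malformed or does not fit in the buffer.
    static int tlvLength(byte[] data, int off) {
        if (off + 2 > data.length) return -1;
        int lenByte = data[off + 1] & 0xFF;
        int headerLen = 2, contentLen = lenByte;
        if (lenByte == 0x80) return -1;            // indefinite length: not DER
        if (lenByte > 0x80) {                      // long-form length
            int numLenBytes = lenByte & 0x7F;
            if (numLenBytes > 4 || off + 2 + numLenBytes > data.length) return -1;
            contentLen = 0;
            for (int i = 0; i < numLenBytes; i++)
                contentLen = (contentLen << 8) | (data[off + 2 + i] & 0xFF);
            headerLen = 2 + numLenBytes;
        }
        long total = (long) headerLen + contentLen;
        return (off + total <= data.length) ? (int) total : -1;
    }

    // True if the whole buffer is covered exactly by consecutive valid TLVs.
    static boolean looksLikeDer(byte[] content) {
        int off = 0;
        while (off < content.length) {
            int len = tlvLength(content, off);
            if (len <= 0) return false;
            off += len;
        }
        return content.length > 0;
    }
}
Note this is only a guess: arbitrary binary data can occasionally look like valid TLVs, which is why relying on the extnID (or the OID in the surrounding structure), as described above, is the reliable approach.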

Resources