Does IETF recognize the IPFS URI format? - uri

Does any part of the Internet Engineering Task Force recognize (even in draft form) the URI naming specification that IPFS is using (i.e. ipfs:// URIs)?
I have looked for RFCs and not found any.
There seems to be significant tooling support built around IPFS, but the specification still appears to be published only by a single organization, Protocol Labs, and not by any standards body. I'm asking because it feels like I'm going about this search the wrong way.

IPFS is listed in the IANA URI Scheme registry as provisionally registered; see https://www.iana.org/assignments/uri-schemes/prov/ipfs. A provisional registration does not require an IETF RFC, which is why your search for one came up empty.
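Registration status aside, ipfs:// URIs follow the generic syntax of RFC 3986, so ordinary URI libraries parse them without any special support. A minimal sketch in Go (the CID in the example is a placeholder, not a real content hash):

package main

import (
    "fmt"
    "net/url"
)

func main() {
    // Any ipfs:// URI follows the generic scheme://authority/path layout
    // of RFC 3986; the CID below is just a placeholder.
    u, err := url.Parse("ipfs://bafyexamplecid/readme.txt")
    if err != nil {
        panic(err)
    }
    fmt.Println(u.Scheme) // ipfs
    fmt.Println(u.Host)   // bafyexamplecid
    fmt.Println(u.Path)   // /readme.txt
}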

Related

net/http: Does DetectContentType support JavaScript?

DetectContentType, JavaScript support?
https://github.com/golang/go/blob/c3931ab1b7bceddc56479d7ddbd7517d244bfe17/src/net/http/sniff.go#L21
Is there a genuine reason why http.DetectContentType does not support JavaScript?
As the doc comment notes, DetectContentType implements the algorithm described at https://mimesniff.spec.whatwg.org/, which does not detect JavaScript. The question then becomes: why doesn't it?
The answer is given in the introduction of the spec:
These security issues are most severe when an "honest" server allows potentially malicious users to upload their own files and then serves the contents of those files with a low-privilege MIME type. For example, if a server believes that the client will treat a contributed file as an image (and thus treat it as benign), but a user agent believes the content to be HTML (and thus privileged to execute any scripts contained therein), an attacker might be able to steal the user’s authentication credentials and mount other cross-site scripting attacks. (Malicious servers, of course, can specify an arbitrary MIME type in the Content-Type header field.)
This document describes a content sniffing algorithm that carefully balances the compatibility needs of user agents with the security constraints imposed by existing web content.
Labelling untrusted input as JavaScript when it's not (or even when it is!) could lead to security disasters.
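For illustration, here is what that looks like from Go. This is a minimal sketch; the text/plain result shown in the comment is my expectation of how the sniffing algorithm classifies script source, not something the package documents for JavaScript specifically.

package main

import (
    "fmt"
    "net/http"
)

func main() {
    js := []byte("function greet(name) { return 'hello ' + name; }")
    // DetectContentType only knows the signatures listed in the WHATWG
    // mime-sniffing spec; JavaScript source matches none of them, so it
    // falls through to the generic text type.
    fmt.Println(http.DetectContentType(js)) // text/plain; charset=utf-8
}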

Are URIs case-insensitive?

When comparing two URIs to decide if they match or not, a client
SHOULD use a case-sensitive octet-by-octet comparison of the entire
URIs, with these exceptions:
I read the above sentence in the HTTP RFC. I thought URLs were case-insensitive, but I don't understand what this rule actually means. Can someone clarify?
RFC 3986 states:
the scheme and host are case-insensitive and therefore should be normalized to lowercase. For example, the URI <HTTP://www.EXAMPLE.com/> is equivalent to <http://www.example.com/>. The other generic syntax components are assumed to be case-sensitive unless specifically defined otherwise by the scheme
RFC 2616 defines the following comparison rule for the HTTP scheme:
When comparing two URIs to decide if they match or not, a client SHOULD use a case-sensitive octet-by-octet comparison of the entire URIs, with these exceptions:
However, RFC 7230 locks it down further by stating:
The scheme and host are case-insensitive and normally provided in lowercase; all other components are compared in a case-sensitive manner.
Those rules typically apply to client side comparisons. There are no rules specifically geared for server side comparisons. Once a server breaks up a URI into its components, it should treat them according to the same rules, but I don't see that enforced in the RFCs. Some web servers, like Apache, do follow the rules. IIS doesn't, for compatibility with Windows' case-insensitive file system.
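As a rough sketch of what "normalize only the scheme and host" can look like on the client side (this is my own illustration, not code taken from any RFC or library):

package main

import (
    "fmt"
    "net/url"
    "strings"
)

// sameURI compares two URIs after case-normalizing only the parts that
// RFC 3986 / RFC 7230 declare case-insensitive: the scheme and the host.
// Path, query and fragment are compared byte for byte.
func sameURI(a, b string) (bool, error) {
    ua, err := url.Parse(a)
    if err != nil {
        return false, err
    }
    ub, err := url.Parse(b)
    if err != nil {
        return false, err
    }
    norm := func(u *url.URL) string {
        u.Scheme = strings.ToLower(u.Scheme)
        u.Host = strings.ToLower(u.Host)
        return u.String()
    }
    return norm(ua) == norm(ub), nil
}

func main() {
    eq, _ := sameURI("HTTP://StackOverflow.com/questions", "http://stackoverflow.com/questions")
    fmt.Println(eq) // true: scheme and host differ only in case

    eq, _ = sameURI("http://stackoverflow.com/Questions", "http://stackoverflow.com/questions")
    fmt.Println(eq) // false: the path is case-sensitive
}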
In reality it depends on the web server.
IIS is not case sensitive.
Apache is.
I suspect that the decision regarding IIS is rooted in the fact that the Windows file system is not case sensitive.
IIS still meets that portion of the spec because SHOULD is a recommendation, not a requirement.
The host portion of the URI is not case sensitive:
http://stackoverflow.com
http://StackOverflow.com
Either of the above will get you to this site.
The rest of the URI after the host portion can be case sensitive. It depends on the server.
As mentioned in Remy Lebeau's answer, the rules are aimed at the client side. In practice, this means that client software should not make arbitrary case modifications to any part of a URI, except for the specifically stated parts. So when a browser sees, say, a relative URL in a page anchor, it should not lowercase it before checking whether the resource is already in its cache, nor should it lowercase the URI when sending the request to the server. It also should not decide that two URIs differing only in case point to the same resource (and thereby possibly skip a request and wrongly return a cached result instead).
This also means that the client should not assume anything about how servers treat URIs. The standard does require some parts (e.g. scheme and host) to be treated as case-insensitive, but otherwise it is up to the server to decide whether two URIs that differ only in case point to the same resource. Beyond what is directly prescribed, the standard imposes no restrictions on servers; there is no "a server should" or "a server should not". If a server decides that its URIs are case-insensitive, that's absolutely fine. If they are case-sensitive, that's fine too.
Whether or not URLs are treated as case-sensitive also depends on the web server. For example, Microsoft IIS servers do not treat URLs as case-sensitive.
The following URLs (hosted on a Microsoft IIS server) are both treated as equivalent:
http://www.microsoft.com/default.aspx
http://www.microsoft.com/Default.aspx
However, Apache servers do treat URLs as case-sensitive, so the following are classed as two different resources:
http://httpd.apache.org/index.html
http://httpd.apache.org/Index.html
Technically, Apache is following the standards correctly here, and Microsoft is going against the specification… Oh well – “old habits die hard,” they say!
For a file-based URI, case-sensitivity depends more on the underlying file system than on the web server. Apache will happily return index.html for INDEX.html on Windows (FAT, NTFS) and macOS (HFS+), but not on case-sensitive file systems such as those usually used on Linux (ext2/3/4 and so forth).

Custom HTTP Headers with old proxies

Is it true that some old proxies/caches will not honor some custom HTTP headers? If so, can you prove it with sections from the HTTP spec or some other information online?
I'm designing a REST API interface. For versioning, I'm debating whether to put the version in the URL (e.g. /path1/path2/v1 or /path1/path2?ver=1) or to use a custom X-Version (or Accept-style) header.
I was just reading in O'Reilly's Even Faster Websites about how mainly Internet security software, but really anything that has to inspect the contents of a page, might filter out the Accept-Encoding header in order to save the CPU time spent decompressing and reading the file. The book cites that about 15% of users have this issue.
However, I see no reason why other, custom headers would be filtered. On the other hand, there isn't really any reason to send the version as a header rather than in the query string, is there? It's not really part of the HTTP protocol; it's just your API.
Edit: Also, see the actual section of the book I mention.
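For what it's worth, here is a minimal sketch of the three options in Go; the route prefix, the ver parameter and the X-Version header name are just illustrative choices, not recommendations:

package main

import (
    "fmt"
    "net/http"
    "strings"
)

func handler(w http.ResponseWriter, r *http.Request) {
    switch {
    // Option 1: version in the URL path, e.g. /v1/users
    case strings.HasPrefix(r.URL.Path, "/v1/"):
        fmt.Fprintln(w, "serving v1 (from path)")
    // Option 2: version as a query parameter, e.g. /users?ver=1
    case r.URL.Query().Get("ver") == "1":
        fmt.Fprintln(w, "serving v1 (from query string)")
    // Option 3: version in a custom header, e.g. X-Version: 1
    case r.Header.Get("X-Version") == "1":
        fmt.Fprintln(w, "serving v1 (from header)")
    default:
        http.Error(w, "unknown API version", http.StatusBadRequest)
    }
}

func main() {
    http.HandleFunc("/", handler)
    http.ListenAndServe(":8080", nil)
}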

Is there any downside for using a leading double slash to inherit the protocol in a URL? i.e. src="//domain.example"

I have a stylesheet that loads images from an external domain and I need it to load from https:// from secure order pages and http:// from other pages, based on the current URL. I found that starting the URL with a double slash inherits the current protocol. Do all browsers support this technique?
HTML ex:
<img src="//cdn.domain.example/logo.png" />
CSS ex:
.class { background: url(//cdn.domain.example/logo.png); }
If the browser supports RFC 1808 Section 4, RFC 2396 Section 5.2, or RFC 3986 Section 5.2, then it will indeed use the page URL's scheme for references that begin with "//".
When used on a <link> or @import for stylesheets, IE7/IE8 will download the file twice, per http://paulirish.com/2010/the-protocol-relative-url/
Update from 2014:
Now that SSL is encouraged for everyone and no longer has performance concerns, this technique is an anti-pattern. If the asset you need is available over SSL, then always use the https:// asset.
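The resolution rule the browsers apply is the same one generic URI libraries implement. A small sketch of RFC 3986 section 5.2 reference resolution using Go's net/url (the host names are just examples):

package main

import (
    "fmt"
    "net/url"
)

func main() {
    // A protocol-relative (network-path) reference.
    ref, _ := url.Parse("//cdn.domain.example/logo.png")

    // Resolved against a page served over http, it stays http...
    httpBase, _ := url.Parse("http://shop.example/catalog")
    fmt.Println(httpBase.ResolveReference(ref)) // http://cdn.domain.example/logo.png

    // ...and resolved against an https page, it becomes https.
    httpsBase, _ := url.Parse("https://shop.example/checkout")
    fmt.Println(httpsBase.ResolveReference(ref)) // https://cdn.domain.example/logo.png
}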
One downside occurs if your URLs are viewed outside the context of a web page. For example, an email message sitting in an email client (say, Outlook) effectively has no URL, and when you're viewing a message containing a protocol-relative URL, there is no obvious protocol context at all (the message itself is independent of the protocol used to fetch it, whether that's POP3, IMAP, Exchange, UUCP or whatever), so the URL has no protocol to be relative to. I haven't investigated how email clients behave when presented with a missing protocol; I'm guessing most will fall back to http. Apple Mail refuses to let you enter a URL without a protocol. It's analogous to the way relative URLs do not work in email because of a similarly missing context.
Similar problems could occur in other non-HTTP contexts such as in tweets, SMS messages, Word documents etc.
The more general explanation is that protocol-relative URLs cannot work in isolation; there must be a relevant context. In a typical web page it's therefore fine to pull in a script library that way, but any external links should always specify a protocol. I did try one simple test: //stackoverflow.com maps to file:///stackoverflow.com in all the browsers I tried, so such URLs really don't work by themselves.
The reason could be to provide portable web pages. If the outer page is not transported encrypted (http), why should the linked scripts be encrypted? That would seem to be an unnecessary performance loss. If, however, the outer page is transported encrypted (https), then the linked content should be encrypted too. If the page is encrypted but the linked content is not, IE issues a Mixed Content warning, because an attacker could manipulate the scripts on the way. See http://ie.microsoft.com/testdrive/Browser/MixedContent/Default.html?o=1 for a longer discussion.
The HTTPS Everywhere campaign from the EFF suggests using https whenever possible. We have the server capacity these days to serve web pages encrypted all the time.
Just for completeness. This was mentioned in another thread:
The "two forward slashes" are a common shorthand for "whatever protocol is being used right now"
if (plain http environment) {
    use 'http://example.com/my-resource.js'
} else {
    use 'https://example.com/my-resource.js'
}
Please check the full thread.
It seems to be a pretty common technique now. There is no downside; it only helps to unify the protocol for all assets on the page, so it should be used wherever possible.

Can HTTP URIs have non-ASCII characters?

I tried to find this in the relevant RFC, IETF RFC 3986, but couldn't figure it out.
Do URIs for HTTP allow Unicode, or non-ASCII of any kind?
Can you please cite the section and the RFC that supports your answer.
NB: For those who might think this is not programming related - it is. It's related to an ISAPI filter I'm building.
Addendum
I've read section 2.5 of RFC 3986. But RFC 2616, which I believe is the current HTTP protocol, predates 3986, and for that reason I'd suppose it cannot be compliant with 3986. Furthermore, even if or when the HTTP RFC is updated, there will still be the issue of rationalization; in other words, does an HTTP URI support ALL of the RFC 3986 provisions, including whatever is appropriate for including non-US-ASCII characters?
http://en.wikipedia.org/wiki/Internationalized_domain_name
No, they are not allowed. Just check the ABNF in RFC 3986.
Here is an example of such a non-ASCII name: ☃.net.
In terms of the relevant section of RFC 3986, I think you are looking at 2.5.
EDIT: Apparently Stack Overflow doesn't detect ☃.net as a proper URL. You'll have to copy and paste it into your browser.
It used to be that non-English characters were not allowed in DNS names or URLs/URIs. There was a workaround: percent-encoding them in the URI. However, many countries, such as Russia and China, are starting to implement DNS with non-Latin characters. Here is a reference to one of these standards
RFC 3986 is complemented by RFC 3987 (IRIs), which fully supports Unicode and provides mapping rules to and from RFC 3986-style URIs.
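A rough sketch of that mapping for the path component, using Go's net/url (my own illustration: a non-ASCII character in an IRI path is mapped into a URI by percent-encoding its UTF-8 bytes; the host goes through IDNA instead, see further below):

package main

import (
    "fmt"
    "net/url"
)

func main() {
    // An IRI-style path containing a non-ASCII character ("ß").
    u := &url.URL{
        Scheme: "http",
        Host:   "example.com",
        Path:   "/straße",
    }
    // String() percent-encodes the UTF-8 bytes of "ß", producing a URI
    // that contains only ASCII, as RFC 3986 requires.
    fmt.Println(u.String()) // http://example.com/stra%C3%9Fe
}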
Many browsers do support URIs with Unicode characters (I've implemented them on a website I built, blogvani.com), and Google duly scans and keeps them intact. I don't think that works for the domain name itself, though, at least not with the registrar and not directly.
For domain names, if you have a domain registered in Unicode (for example, people can register domains in Hindi), it will be converted to a corresponding ASCII form (Punycode, which produces a name carrying the xn-- prefix)...
It is quite funny to see how this is routed and to realize that you're not actually going to a Unicode domain even though it seems like it.
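A small sketch of that Punycode conversion, assuming the golang.org/x/net/idna package; the output shown in the comments is what the ☃.net example from earlier maps to:

package main

import (
    "fmt"

    "golang.org/x/net/idna"
)

func main() {
    // Registries store the ASCII-compatible (Punycode) form of a Unicode
    // domain name; it carries the "xn--" prefix.
    ascii, err := idna.ToASCII("☃.net")
    if err != nil {
        panic(err)
    }
    fmt.Println(ascii) // xn--n3h.net

    // And back again, e.g. for display in a browser's address bar.
    unicode, _ := idna.ToUnicode(ascii)
    fmt.Println(unicode) // ☃.net
}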

Resources