Browser / DNS process for invalid characters in a domain? - http

I'm wondering how the browser, and/or DNS, handles a user entering an invalid character in a domain name.
Let's say that I own meat&potatoes, a well-known chain of fine dining restaurants. All of our marketing refers to us as meat&potatoes (meat + ampersand + potatoes, no spaces), and it's likely that fairly often, people are typing www.meat&potatoes.com into their browser.
How does the browser, and/or their ISP's DNS, handle this request? Are there any ways I can get the user to the correct domain without requiring them to make additional clicks / keystrokes?
Edit: In my limited testing, I've found that Chrome transforms the character into a URL-encoded version (e.g. %26 for &), and then sends a request somewhere that results in my ISP (RCN) giving me a search results page (because RCN is evil like that): www17.searchresults.rcn.com/… So, something is reaching the ISP.
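A quick way to reproduce that percent-encoding step (a minimal Python illustration, not necessarily what Chrome does internally):

from urllib.parse import quote

# "&" is a reserved character, so it becomes %26 when percent-encoded
print(quote("meat&potatoes", safe=""))  # meat%26potatoes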

Host names are limited (RFC 1034 section 3.5) to letters (a-z), numbers (0-9) and hyphens (-).
Additionally, international characters are allowed by recent browsers using Punycode encoding (RFC 3492), which basically applies to character values above 127.
I don't know specifically how browsers handle this, but I expect that they go by these two sets of rules and give the end user an error/redirect for anything else.
And therefore it never gets as far as DNS / ISPs.
Unfortunately this means that there is currently no way to make "&" in a domain name work...
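For illustration, here is a minimal sketch of that host name check in Python (my own approximation of the RFC 1034 letters-digits-hyphen rule, not any browser's actual code):

import re

# Letter-Digit-Hyphen rule: labels are 1-63 chars, alphanumeric at both ends
LABEL = re.compile(r'^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$', re.IGNORECASE)

def valid_hostname(name):
    return all(LABEL.match(label) for label in name.split('.'))

print(valid_hostname('www.meatandpotatoes.com'))  # True
print(valid_hostname('www.meat&potatoes.com'))    # False: '&' is not allowed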

Related

How to distinguish IPv4 addresses from domain names?

I am wondering how, on a technical level, IPv4s and domains can be distinguished.
An IPv4 address takes the form [0-255].[0-255].[0-255].[0-255].
A domain takes the form (a)+.b, where (a)+ denotes that this string occurs at least once and may repeat.
The values of a can be considered arbitrary alphanumeric strings (so yeah, mathematically, I am not super correct with the expression above), as can the values of b, though b has more restrictions in practice because it must usually be a registered Top Level Domain (TLD); apart from that, it may be arbitrary alphanumerics too.
In theory, the set of IP addresses looks like a subset of the set of domain addresses.
Edge cases like special characters and special addresses are not relevant for this question and can be ignored.
When I enter an IP or domain into my browser's address field, the terminal, or an application, how does the system know whether I entered a domain that requires resolution, or an IP address that can be contacted directly?
Can someone explain, on a technical level, how the system handles these strings, what possible interactions can occur, and whether (and why) this raises security issues or not?
I was wondering whether an attacker would be able to exploit this ambiguity, and whether there are cases where exactly this has already happened in the past.
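To sketch the usual approach (an illustration of the common "try to parse an IP literal first" logic in Python; real resolver stacks are more involved):

import ipaddress
import socket

def resolve(target):
    # If the string parses as an IP literal, no DNS lookup is needed
    try:
        return str(ipaddress.ip_address(target))
    except ValueError:
        pass
    # Otherwise treat it as a name and ask the resolver
    return socket.gethostbyname(target)

print(resolve('192.0.2.1'))    # used verbatim, no DNS lookup
print(resolve('example.com'))  # resolved via DNS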

Is it possible to create a DNS subdomain containing special characters?

For example, is *.example.com or $.example.com valid according to the RFC for DNS?
The short answer to your question boils down to "Yes, but no, but sometimes yes".
At the protocol level, DNS strings (including names) are encoded as length+data, so the data can be anything. So in that way * and $ are perfectly fine.
The level above the protocol is the human-name level. On that level there are restrictions on what names you can use. Since the 80s, the main restriction boils down to letters, numbers and - (as long as it's not at the beginning or end of a label). So in that way * and $ are forbidden (except that * as the entire content of a label has a special meaning).
On top of that, these days we have internationalized names. That's a way to encode any Unicode string into a form that conforms to the above rule. This way, we can have names that look like räksmörgås.se to humans while they internally look like xn--rksmrgs-5wao1o.se. That xn-- at the start is a prefix that says that this is an encoded name. You still can't use * or $ in your names, but you can probably find something else in Unicode that looks close enough and that you can use... which is a security problem of its own.
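You can watch this encoding happen with Python's built-in idna codec (it implements the older IDNA 2003 rules, while browsers now use a newer mapping, but the xn-- mechanism is the same):

# Round-trip a Unicode name through the Punycode-based IDNA encoding
name = "räksmörgås.se"
encoded = name.encode("idna")
print(encoded)                 # b'xn--rksmrgs-5wao1o.se'
print(encoded.decode("idna"))  # räksmörgås.se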
The specification for all this is spread out over far too many RFCs. If you're curious, start here and follow many, many links from there.
According to RFC 1034, a domain name label can consist of letters, digits, or hyphens (and it must begin with a letter and end with a letter or digit), so $ is not allowed. The exception that gets special treatment is *, used for wildcards and explained in more detail in RFC 4592.

Block emails with large number of recipients of same domain

I have a mailserver with exim4 and spamassassin installed.
We have a problem of (internal) spam to a large number of mailing lists, coming from a few users (whom we cannot just educate or block, for multiple reasons).
Is there a way to block emails in which an unreasonable number of recipients (e.g. 10) are in the same domain, to force these users to use BCC?
Yes, you can do this in SpamAssassin. I'm not as much of an Exim expert, but IIRC Exim can do this as well (though it may have a hard recipient limit that is agnostic to To/Cc vs Bcc).
This should do it:
header DTECH_TEN_TOCC_IN_SAME_DOM ToCc =~ /(\@[^,>;]{3,99}[a-z]\b)(?:[^\@.-][^\@]{0,99}\1){10}(?![.-])/
describe DTECH_TEN_TOCC_IN_SAME_DOM Ten consecutive recipients have the same domain
As I've written it, this only catches ten consecutive recipients with the same domain, which must all be in the same header (ToCc means either To xor Cc; it does not merge the headers). If you change the third character class from [^\@]{0,99} to .{0,999} so it matches any character over a longer span, the rule will catch more than just consecutively listed addresses, but note that this would make the regex far more expensive to compute.
You also have to make sure that SpamAssassin is looking at your internal and outbound mail, which is nonstandard. Finally, you'll have to score the rule. Please test copiously before you do that, especially since this is not a spam rule (it will hit more non-spam than spam; consider a similar rule with testing stats: __TO_MANY).
You will not, however, be able to tell users why the message was rejected. An SMTP reject (e.g. from Exim) can have a custom "why this was rejected" prompt, which is highly useful for policing attachment sizes or even informing users that they're sending too much mail (perhaps they are infected). You can configure Exim to run SA at SMTP time (e.g. sa-exim), but then every spam rejection would have the same message for the end user. The other option would be to accept the message and then bounce it back, including the SpamAssassin rule hits. Be very, very careful with that approach, as it often leads to backscatter.
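If you want to sanity-check the pattern before scoring it, the same regex behaves identically under Python's re engine (a quick test harness, not part of SpamAssassin; Python needs no backslash before the @):

import re

# Capture "@domain" from the first address, then require ten repeats of it
pattern = re.compile(
    r'(@[^,>;]{3,99}[a-z]\b)(?:[^@.-][^@]{0,99}\1){10}(?![.-])')

hit = ", ".join(f"user{i}@example.com" for i in range(11))
miss = "alice@example.com, bob@example.org"
print(bool(pattern.search(hit)))   # True: 11 recipients share a domain
print(bool(pattern.search(miss)))  # False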

".." (double dots) in otherwise valid IP4 addreses, e.g. 183.60..244.37

My production server recently got a slew of access probes (attempts to find a way to break in, aimed at URIs like /admin.php, /administrator, /wp-login.php, etc.), and I noticed that some of the REMOTE_ADDR values reported by Apache (IPv4 addresses) had two dots where there should be one.
What's up with this? Is this some way for servers to hide?
For one, it means that I need to log these to a wider field than expected. Expected would be xxx.xxx.xxx.xxx or 15 characters, but this might make it 16 or even 19.
[Edit: or better yet 50, see this]
The problem is happening in some code somewhere in your application (etc.) that is doing formatting.
An IPv4 address is actually an array of 4 unsigned bytes. It is conventionally represented character-wise (for human consumption) in "ddd.ddd.ddd.ddd" form, but that is not the fundamental representation. The fundamental representation does not have dots in it at all.
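For example, the dots only appear at the formatting step (a Python illustration of that conversion):

import socket

raw = bytes([183, 60, 244, 37])  # the fundamental 4-byte form
print(socket.inet_ntoa(raw))     # '183.60.244.37' - the dots are added here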
It therefore follows that the extra dots you are seeing are some problem with either the way the IP addresses are converted to strings, the way the resulting strings are incorporated into messages, or the way those messages are handled and ultimately displayed. The extra dots do not "mean" anything ... except ... possibly ... to say that some characters have been left out.
Without more information, we can't tell you where those dots come from, or how to stop them.
What's up with this? Is this some way for servers to hide?
Nope.
At the point that your systems first see those IP addresses, they are in 4-byte form, just like other IP addresses. The dots are not a new way to hide. Rather, they are just the result of a local problem in the way things are being logged.
UPDATE
Looking at the evidence in your "half answer", one possibility is that you have some progress monitoring or debugging code somewhere that occasionally outputs a "dot" into the output stream. It looks like it would be on a different thread ...
So far my hosting company says only that I can clean up these values.
They are right. But you probably want to find where your application is injecting the garbage and fix that ... rather than massaging the log files.
What are you doing with that variable in your code? I expect it's being translated or parsed in some way that's adding the extra period.
It's extremely unlikely that Apache would report it that way, as that would be invalid as an IPv4 address.
Compare your output with the web server's access logs, which will have recorded the remote IP as Apache saw it.
Half of the answer is that PHP's $_SERVER['REMOTE_ADDR'] is untrusted: it comes directly from the HTTP request as provided by the server to PHP, and, from other reports, it can apparently be spoofed.
EDIT2: I have more recently found two more bad variables from $_SERVER with double dots, as follows:
SERVER_ADDR       REMOTE_ADDR       REQUEST_TIME_FLOAT
184..154.227.128  183.60.244.30     1391788916.198
184.154..227.128  183.60.244.37     1391788913.537
184.154..227.128  183.60.244.37     1391788914.368
184.154..227.128  184.154.227.128   1391086482.1889
184.154.227.128   183..60.244.30    1391788914.1494
184.154.227.128   183..60.244.37    1391788913.0523
184.154.227.128   183.60..244.37    1391788911.5938
184.154.227.128   183.60..244.37    1391788914.3977
184.154.227.128   183.60.244.37     1391788911..9855
So far my hosting company says only that I can clean up these values. That is easy, but cleaned-up garbage is still garbage. If dots can be and are being added, then the numbers can be, and possibly are being, changed too, I think. Hmm?
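Cleaning up is straightforward if you validate rather than patch the string; for instance (a sketch in Python; PHP's filter_var() with FILTER_VALIDATE_IP does the same job):

import ipaddress

def clean_ip(value):
    # Accept the value only if it parses as a real IP address
    try:
        return str(ipaddress.ip_address(value))
    except ValueError:
        return None  # reject garbage like '183.60..244.37'

print(clean_ip('183.60.244.37'))   # '183.60.244.37'
print(clean_ip('183.60..244.37'))  # None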
See this comment from the PHP manual.
Now that leaves the question: where do I find a trusted IP for the accessing client? Apache has it, I'm guessing, from the incoming HTTP packet exchange with the client. (I'll ask this question on Stack Overflow.)

What is the optimum limit for URL length? 100, 200+

I have an ASP.NET 3.5 platform and a Windows 2003 server with all the updates.
There is a limit in .NET: it cannot handle more than 260 characters. Moreover, if you look it up on the web, you will find that an unpatched IE 6 fails to work above 100 characters.
I want to have the rewrite path module supported on the maximum number of browsers, so I am looking for an acceptable limit up to which I can create verbose URLs.
A URL is path + query string, and the linked article only talks about limiting the path. Therefore, if you're using ASP.NET, don't exceed a path of 260 characters. Anything less than 260 will always work, and ASP.NET has no trouble with long query strings.
http://somewhere.com/directory/filename.aspx?id=1234
                                            ^^^^^^^^ querystring
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ path
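A quick way to see that split (Python's urllib shown for illustration; .NET's Uri class exposes the same parts):

from urllib.parse import urlsplit

parts = urlsplit("http://somewhere.com/directory/filename.aspx?id=1234")
print(parts.path)   # /directory/filename.aspx
print(parts.query)  # id=1234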
Typically the issue is with the browser. Long ago I did tests and recall that many browsers support 4 KB URLs, except for IE, which limits them to 2083 characters, so for all practical purposes, limit it to 2083. I don't know if IE 7 and 8 have the same limitation, but if you're going for broad compatibility, you need to go for the lowest common denominator.
There is no length limit specified by the W3C, but look here for practical limits
http://www.boutell.com/newfaq/misc/urllength.html
Pick your own limit from that.
The default limit in IIS is 16,384 characters, but IE doesn't support more than 2083.
More info at link.
This article gives the limits imposed by various browsers. It seems that IE limits the URL to 2083 chars, so you should probably stay under that if any of your users are on IE.
Define "optimum" for your application.
The HTTP standard itself doesn't impose a hard limit (so the answer depends on your application):
The HTTP protocol does not place any a priori limit on the length of a URI. Servers MUST be able to handle the URI of any resource they serve, and SHOULD be able to handle URIs of unbounded length if they provide GET-based forms that could generate such URIs. A server SHOULD return 414 (Request-URI Too Long) status if a URI is longer than the server can handle (see section 10.4.15).

Note: Servers ought to be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations might not properly support these lengths.
So the question is - what is the limit of your program, or what is the maximum resource identifier size your program needs to perform all its functionality?
Your program should have a natural limit.
If it doesn't, you might as well set it at 16k, as you don't have enough information to define the problem.
-Adam
Short ;-)
The problem is that every web server and every browser has its own idea of how long the maximum is. The RFC for the HTTP protocol gives no maximum length. IE limits a GET to 2083 characters; the path itself may be at most 2,048 characters. However, this limit is not universal. Firefox claims to support at least up to 65,536, and some people have verified that on some platforms even 100,000 characters work. Safari is above 80,000 (tested). The Apache server, on the other hand, has a limit of 4,000. Microsoft's Internet Information Server has a limit of 16,384 (but it is configurable).
My recommendation is to stay below 2,000 characters in any case. This is not guaranteed to work with every browser in the world (especially not older ones), but it will work with all modern browsers. Further, I recommend using POST wherever possible: avoid using GET for form submits. If some users want to simulate a form submit via GET, make sure your application supports the desired parameters either via POST or via GET, but when you submit the page yourself via a button or JS, prefer POST over GET.
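One way to enforce that recommendation when generating links (a sketch; the 2,000-character cap is the rule of thumb above, not a standard):

from urllib.parse import urlencode

MAX_URL_LEN = 2000  # conservative cap for older browsers and proxies

def build_url(base, params):
    url = base + "?" + urlencode(params)
    if len(url) > MAX_URL_LEN:
        raise ValueError(f"URL is {len(url)} chars; use POST instead")
    return url

print(build_url("http://somewhere.com/search", {"q": "long urls"}))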
I think the RFC says 4096 chars, but IE truncates down to 2083 characters. Stay well under that to be safe.
Practically, shorter URLs are friendlier.
More information is needed, but for normal situations I would say try to keep it under 150 for sure. If for nothing else than pure aesthetics, I hate it when someone sends me a GI-NORMOUS link...
Are you passing values through the query string? I assume that is why you asked, correct?
What is "optimum" anyway?
GET requests can be several kB in length, so this is entirely subjective.
I'd say - stay within the address bar length of a maximized 1024x768 window to be user friendly.
If you're trying to get people to remember the URL, I wouldn't go over 60 characters. Use words if possible, because it's easier to remember "www.example.com/this-is-the-url" than "www.example.com/179264". If you're trying to get the page indexed, you could probably go longer. The spiders look for words in the title too, and some people may be more likely to click on the link if the URL looks readable.
When you say "Optimum", I think "Easily Accessible To Users", in which case, I think the shorter the URL, the better. I would think 20-30 characters maximum, in that case.
