I am wondering how, on a technical level, IPv4s and domains can be distinguished.
An IPv4 takes the form [0-255].[0-255].[0-255].[0-255].
A domain takes the form (a)+.b, where (a)+ denotes that this string occurs at least once and may repeat.
The values of a can be considered arbitrary alphanumericals (so yeah, mathematically, I am not super correct with the expression above), as can the values of b, though b has practically more restrictions because it must usually be registerd Top Level Domain (TLD), but apart from that, may be arbitrary alphanumericals, either.
In theory, the set of ip addresses looks like a subset of the set of domain addresses.
Edge cases like special characters and special addresses are not relevant for this question and can be ignored.
When I enter an IP or domain into my browser address field, the terminal, or an application, how does the system know whether I entered a domain that requires resolution, or an IP address that can be directly contacted?
Can someone, on a technical level, explain how the system handles these strings and what possible interactions can occur and whether (and why) this raises security issues, or not?
I was wondering, whether an attacker would be able to exploit this ambiguity and whether there are cases where exactly this already happened in the past.
Related
When define a subnet, such as 1.2.3.0/24, it means 255 hosts in the subnet start from 1.2.3.0 to 1.2.3.255.
But what if I wrongly define the subnet with 1.2.3.64/24? Does it then mean the range is start from 1.2.3.64 to 1.2.3.255?
I could not find a formal document about this.
EDIT: in Java ipaddress lib (5.2.1), the 1.2.3.64/24 will be treated as range from 1.2.3.64 to 1.2.3.255, instead of 1.2.3.0 to 1.2.3.255!. The code is: new IPAddressString("1.2.3.64/24").getSequentialRange()
EDIT: My apology: the above description of ipaddress lib's behavior is wrong. It was other things that caused me think the the range become 1.2.3.64 to 1.2.3.255.
Indeed, the
new IPAddressString("1.2.3.64/24").getSequentialRange().getLower()
is 1.2.3.64,
and
new IPAddressString("1.2.3.64/24").getSequentialRange().getUpper()
is also 1.2.3.64,
so I think this is a reasonable implementation!. I have no problem of that.
The
new IPAddressString("1.2.3.64/24").getAddress().toPrefixBlock().getLower()`
will be the expect one: 1.2.3.0.
And the `.getUpper()` is 1.2.3.255.
so this is the perfect way to get correct ip range.
There is no formal specification indicating that setting the un-fixed bits (the host bits) to 0 is a convention. However, it is common practice when describing a subnet as opposed to a single address, that you leave the host bits as zero. It is an informal convention. If you are describing a subnet, why would you set those bits to anything other than zero if they are intended to be ignored? So that is what all network engineers and most other people do, they leave the bits as zero when they are describing the full subnet, when the host can take on any value.
The specification for something like 1.2.3.64/24 is that the network part of the address is the first 24 bits, and the host part of the address is the remaining 8 bits. So, the network is 1.2.3.0 and the host is 0.0.0.64. The specification says nothing else.
Likewise, 1.2.3.0/24 indicates the network is 1.2.3.0 and the host is 0.0.0.0. However, like I said, when the host is 0, that is commonly used to refer to the entire subnet, meaning the host can take on any value. When people write down a string for that subnet of 255 addresses, they will write 1.2.3.0/24.
In fact, it is unusual to see 1.2.3.0 used as a single address because many routers do not accept a host of 0.
How you parse those things is specific to any library. Some libraries parse 1.2.3.64/24 as the whole subnet, throwing out the 0.0.0.64. Some libraries parse 1.2.3.64/24 as 1.2.3.64. Some libraries have separate methods to do one or the other.
With the IPAddress library, the same parser parses both addresses and subnets. When it parses 1.2.3.64/24 or 1.2.3.0/24, the parser needs to decide what you mean by each. For the former, it interprets 1.2.3.64/24 as the address 1.2.3.64 with prefix length 24, because why else is the 0.0.0.64 there at all if you do not intend that 64 to mean something? Why throw it out?
For 1.2.3.0/24, it interprets that as the whole subnet, because, like I said, a host of 0 is commonly used to refer to the entire subnet, and it is rare to use 1.2.3.0 as an address.
However, if you wish to choose some other meaning when parsing those things, the library provides other options.
There is no formal specification for this. In fact, there is no formal specification for a lot of the different aspects of IP addressing, a lot of it is informal and simply evolved over the years to the common practices in use.
new IPAddressString("1.2.3.64/24").getSequentialRange() is parsed as the range "1.2.3.64 -> 1.2.3.64". Earlier versions of the library behaved differently, but that is how all the latest releases for the last few years parse it.
Disclaimer: I am the project manager of that library.
It's interesting because I too can't seem to find any formal specification on why setting the un-fixed bits to 0 is the convention. Looking at RFC 4632 says:
[Classless prefixes] make explicit which bits in a 32-bit IPv4
address are interpreted as the network number (or prefix) associated
with a site and which are the used to number individual end systems
within the site.
Even with the convention of setting the un-fixed bits to 0, you may have CIDR notation that has an IP ending in a non-zero number. For example: 192.168.1.254/31 which represents the range of IPs from 192.168.1.254 to 192.168.1.255.
CIDR notation only specifies the number of bits that are fixed, so even if you put .64 it would appear that the /24 still represents only 24 fixed bits of the IP address.
You can see that the CIDR Calculator from ARIN shows that 1.2.3.64/24 still represents all IP addresses from 1.2.3.0 to 1.2.3.255.
Screenshot here.
Now it may be that different systems are implemented to handle this situation differently, so I would personally follow the convention, but from the perspective of CIDR notation (and what I can seem to find in the RFC) it should still represent the entire range.
I have a mailserver with exim4 and spamassassin installed.
We have a problem of (internal) spam to a large number of mailinglists, coming from a few users (which we cannot just educate or block for multiple reasons)
Is there a way to block emails to which an unreasonable amount go to the same domain (e.g. 10) to force these users to BCC?
Yes, you can do this in SpamAssassin. I'm not as much an exim expert, but iirc exim can do this as well (though it may have a hard recipient limitation that is agnostic to To/Cc vs Bcc).
This should do it:
header DTECH_TEN_TOCC_IN_SAME_DOM ToCc =~ /(\#[^,>;]{3,99}[a-z]\b)(?:[^\#.-][^\#]{0,99}\1){10}(?![.-])/
describe DTECH_TEN_TOCC_IN_SAME_DOM Ten consecutive recipients have the same domain
As I've written it, this only catches ten consecutive recipients with the same domain, which must all be in the same header (ToCc means either To xor Cc; it does not merge the headers). If you change the third character class from [^\#]{0,99} to .{0,999} to match any character over a longer period of time, the rule will be good for more than just consecutively listed addresses, but note that this would make the regex far more expensive to compute.
You also have to make sure that SpamAssassin is looking at your internal and outbound mail, which is nonstandard. Finally, you'll have to score the rule. Please test copiously before you do that. Especially since this is not a spam rule (it will hit more non-spam than spam; consider a similar rule with testing stats: __TO_MANY).
You will not, however, be able to tell users why the message was rejected. An SMTP reject (e.g. from Exim) can have a custom "why this was rejected" prompt, which is highly useful for policing attachment sizes or even informing users that they're sending too much mail (perhaps they are infected). You can configure Exim to run SA at SMTP time (e.g. sa-exim), but then every spam rejection would have the same message to the end user. The other option would be to accept the message and then bounce it back, including the SpamAssassin rule hits. Be very very careful with that approach as it often leads to backscatter.
I have a general question about IP Addresses. I am not sure if this question is better suited for another S/O Network (like Server Fault), but I thought I'd ask it here.
I want to try to hone in on the relationship between an IP Address and a Country. Is it fair or accurate to say that an IP Address like 100.*.*.* relates to ISPs in the US solely or is it possible that one of the octets with the 100.*.*.* range gets assigned to other Countries?
I am looking for a way to relate IP Address ranges, at their highest level, to Countries on a one-for-one basis.
Thanks.
I don't think there's an explicit rule for that. Check here.
Strictly-speaking, it is my understanding that location roughly correlates with location via IPv4 address blocks. There's a Wikipedia reference for these here.
However, more often than not this isn't particularly accurate - from personal experience relying on these results in more false results than positive. Part of the problem is that these addresses tend to shift with time and use.
MaxMind offer a free geoIP database called GeoLite 2 (link here) which I've used on a few occasions to detect an IP's origin country with a really high success rate, you just have to make sure that you update the database fairly regularly to keep up-to-date.
My production server recently got a slew of access probes (to try and find a point to break in, to URI's like to /admin.php, /administrator, /wp-login.php, etc.), and I noticed that some of the REMOTE_ADDR's reported by Apache (IP4's) had two dots where there should be one.
What's up with this? Is this some way for servers to hide?
For one, it means that I need to log these to a wider field than expected. Expected would be xxx.xxx.xxx.xxx or 15 characters, but this might make it 16 or even 19.
[Edit: or better yet 50, see this]
The problem is happening in some code somewhere in your application (etc) that is doing formatting.
IP addresses are actually an array of 4 unsigned bytes. They are conventionally represented character-wise (for human consumption) in "ddd.ddd.ddd.ddd" form, but that is not the fundamental representation. The fundamental representation does not have dots in it at all.
It therefore follows that the extra dots you are seeing are some problem with either the way the IP addresses are converted to strings, or the resulting strings are incorporated into messages, or those messages are handled and ultimately displayed. The extra dots do not "mean" anything ... except ... possibly ... to say that some characters have been left out.
Without more information, we can't tell you where those dots come from, or how to stop them.
What's up with this? Is this some way for servers to hide?
Nope.
At the point that your systems first see those IP addresses, they are in 4-byte form, just like other IP addresses. The dots are not a new way to hide. Rather they are just a result of a local problem in the way things are being logged.
UPDATE
Looking at the evidence in your "half answer", one possibility is that you have some progress monitoring or debugging code somewhere that occasionally outputs a "dot" into the output stream. It looks like it would be on a different thread ...
So far my hosting company says only that I can clean up these values.
They are right. But you probably want to find where your application is injecting the garbage and fix that ... rather than massaging the log files.
What are you doing with that variable in your code? I expect it's being translated or parsed in some way that's adding the extra period.
It's extremely unlikely that Apache would report it that way, as that would be invalid as an IPv4 address.
Compare your output with the web server's access logs, which will have recorded the remote IP as Apache saw it.
Half of the answer is that php's $_SERVER['REMOTE_ADDR'] is untrusted because it comes directly from the http request as provided by the server to php it can apparently and from other reports be spoofed.
EDIT2: I have more recently found two more bad variables from $_SERVER with double dots, as follows:
SERVER_ADDR REMOTE_ADDR REQUEST_TIME_FLOAT
184..154.227.128 183.60.244.30 1391788916.198
184.154..227.128 183.60.244.37 1391788913.537
184.154..227.128 183.60.244.37 1391788914.368
184.154..227.128 184.154.227.128 1391086482.1889
184.154.227.128 183..60.244.30 1391788914.1494
184.154.227.128 183..60.244.37 1391788913.0523
184.154.227.128 183.60..244.37 1391788911.5938
184.154.227.128 183.60..244.37 1391788914.3977
184.154.227.128 183.60.244.37 1391788911..9855
So far my hosting company says only that I can clean up these values. That is easy, but cleaning up garbage is still garbage. If dots can and are being added, then the numbers can and possibly are be changed too I think. Humm?
See: this comment from the php manual.
Now that leaves the question where to find a trusted IP from the accessing client? Apache has it I'm guessing from the incoming http packet exchange with the client. (I'll ask this Q: in StackOverflow).
I'm wondering how the browser, and/or DNS, handles a user entering an invalid character in a domain name.
Let's say that I own meat&potatoes, a well-known chain of fine dining restaurants. All of our marketing refers to us as meat&potatoes (meat + ampersand + potatoes, no spaces), and it's likely that fairly often, people are typing www.meat&potatoes.com into their browser.
How does the browser, and/or their ISP's DNS, handle this request? Are there any ways I can get the user to the correct domain without requiring them to make additional clicks / keystrokes?
Edit: In my limited testing, I've found that Chrome transforms the character into a URL-encoded version (e.g. %26 for &), and then sends a request somewhere that results in my ISP(RCN) giving me a search results page (because RCN is evil like that): www17.searchresults.rcn.com/… So, something is reaching the ISP.
Host names are limited (RFC1034 section 3.5) to letters (a-z), numbers (0-9) and hyphen (-).
Additionally, international characters are allowed by recent browsers using puny-encoding (RFC3492) - which basically applies to character values above 127.
I don't know specifically how browsers handle this, but I expect that they go by these two sets of rules, and gives the end-user an error/redirect for anything else.
And therefore it never gets as far as DNS / ISPs.
Unfortunately this means that there is currently no way to make "&" in a domain name work...