I want to ensure people who use my site are who they say they are. Not who they say, they say, they are.
How do I ensure my data is going back to the IP given by $_SERVER['REMOTE_ADDR']?
Or is it automatic that the http response is sent there?
How do I ensure my data is going back to the IP given by $_SERVER['REMOTE_ADDR']?
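The client IP is established at the TCP/IP level. REMOTE_ADDR is populated from the peer address of the TCP connection; it is not part of the HTTP protocol. Because a TCP connection requires a completed handshake with that address, it is guaranteed that your application is talking to this IP, and the response goes back over the same connection.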
This doesn't mean that the IP that you are seeing is the actual IP of the end-user (as attributed for example by his internet provider). There could be proxies or intermediate devices between him and your application. So basically what you will be seeing is the closest IP to your web server in this chain.
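To see where the address comes from, here is a minimal Python sketch (not PHP, and only a loopback toy; the function name peer_address_demo is made up for illustration). It shows that the server learns the client address from the TCP connection itself, that a header in the request does not change it, and that the reply travels back on the same connection.

```python
import socket
import threading

def peer_address_demo():
    """Return (address the server saw, address the client used, reply bytes)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(1)
    seen = {}

    def serve():
        conn, addr = server.accept()   # addr is taken from the TCP connection
        with conn:
            conn.recv(1024)            # read the request; its contents are ignored
            seen["remote_addr"] = addr
            # The reply is written to the same connection the request came in on.
            conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello")

    worker = threading.Thread(target=serve)
    worker.start()
    with socket.create_connection(server.getsockname()) as client:
        local = client.getsockname()
        # A made-up header value does not influence what accept() reported.
        client.sendall(b"GET / HTTP/1.0\r\nX-Forwarded-For: 203.0.113.9\r\n\r\n")
        reply = client.recv(1024)
    worker.join()
    server.close()
    return seen["remote_addr"], local, reply
```

The first two returned values are identical: what the server calls the remote address is simply the other endpoint of the socket.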
An HTTP request indicates the resource on the server by the URL in the start line and the Host header.
Does an HTTP response indicate the address of the receiver or something similar?
If not, why is it not necessary?
Thanks.
Internet protocols are layered.
HTTP requests are wrapped in TCP segments, which are wrapped in IP packets.
The outer IP packet contains information about who is the recipient and who is the sender of a message. Based on this information a TCP/IP service knows where to send the message back to.
The Host header was actually a later addition to HTTP. It wasn't really needed before because it was safe to assume that a single IP address would have a single HTTP service. The Host header was added because people needed many different domains to be served from a smaller set of IP addresses and to send different responses based on what the domain was.
Without the Host header it wouldn't have been possible to know which domain the user wanted, because the IP packet only encodes the IP address, not which domain was used to find the IP.
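As a rough illustration of what a server does with that header, here is a small Python sketch. The SITES mapping, the paths, and both function names are hypothetical; a real server such as Apache or nginx does this through its own configuration.

```python
from typing import Optional

# Hypothetical mapping of domain names to document roots, all on one IP.
SITES = {
    "example.com": "/var/www/example",
    "example.org": "/var/www/other",
}

def host_from_request(raw_request: bytes) -> Optional[str]:
    """Extract the Host header value (without port) from a raw HTTP request."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:          # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional ":port". Simplification: IPv6 literals
            # such as "[::1]:80" are not handled here.
            return value.strip().split(":")[0].lower()
    return None

def pick_site(raw_request: bytes, default: str = "/var/www/default") -> str:
    """Choose a document root based only on the Host header."""
    return SITES.get(host_from_request(raw_request), default)
```

Two requests arriving at the same IP address and port can therefore be answered from different sites, and a request without a Host header falls back to the default.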
As far as I know, what we get from a DNS query is an IP address. So at the end of the day, if that's true, we are still using IP addresses to connect to the server, and domains are just pretty names for them.
So how does a server know which domain I used to look up that IP address?
How do vhosts work and understand that, if the domain data is lost during the DNS query?
The Internet works in layers. Each layer uses different kind of parameters to do its work.
Layer 3 is typically IP, aka Internet Protocol. To work it uses IP addresses; each computer has at least one to be able to communicate with another one. And there are in fact two families: version 4 and version 6.
Since multiple services can be on any given computer at some point, you need a layer on top of that, layer 4, that deals with transport. The "predominant" one is TCP, aka Transmission Control Protocol, but there is also UDP. TCP and UDP use ports: a 2-byte integer that identifies a specific service on a host.
For example, HTTP was given port number 80 (completely arbitrary), and HTTPS port 443.
The DNS, which itself uses UDP and TCP (on port 53), makes it possible, among other things, to map a given hostname to one or more IP addresses. These are the typical A and AAAA records. There is also a CNAME record that maps one domain name to another, and an SRV record that maps a service (which is a protocol name + a transport) to a given hostname and port number.
When one computer connects to another, its first step for all the above is to find out which IP address to use to connect to. It can use the DNS for that. Typically it will get only the IP address, but, depending on the protocol (layer above 4), may also get a port (if using SRV records).
The HTTP world does not use SRV records. So a browser just uses the hardcoded 80 or 443 ports, or the port number appearing in the URL.
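A minimal Python sketch of that rule, using only the standard library (the function name connection_target and the DEFAULT_PORTS table are just for illustration):

```python
from urllib.parse import urlsplit

# Well-known default ports for the two schemes discussed here.
DEFAULT_PORTS = {"http": 80, "https": 443}

def connection_target(url: str):
    """Return (hostname to resolve via DNS, TCP port to connect to)."""
    parts = urlsplit(url)
    # An explicit port in the URL wins; otherwise use the scheme's default.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port
```

The hostname is what gets handed to the resolver (for example through socket.getaddrinfo), and the port never comes from the DNS in this case.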
Then we are at the transport level, let us say TCP.
The connection is done (since now the remote IP address and port are known) and the protocol above TCP, like HTTP, is free to convey any kind of extra data, such as the hostname that the client initially used (as taken from the URL) to find out the IP address.
This is done through the HTTP Host header; see RFC 2616 (since superseded, most recently by RFC 9110).
Note that if you do things through TLS (which conceptually sits between TCP and HTTP) there is even something else happening: SNI or Server Name Indication.
When doing the TLS handshake, so before any kind of HTTP headers or content, the client sends the desired hostname in a specific TLS message. Why? So that the server can find which certificate it should answer with; otherwise it would not be able to know which hostname is requested, as this sits in an HTTP header which does not exist until the TLS handshake is finished.
A webserver will be able to see both the SNI content to find out which certificate to send back and then the host header to find out which VirtualHost (in Apache) section is relevant to the query being processed.
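To make the SNI part concrete, here is a small Python sketch that produces the first TLS message a client would send, without touching the network (it uses in-memory buffers from the standard ssl module; the function name client_hello_bytes is made up). It assumes a plain TLS client without any encrypted-hello extension, which is how Python's ssl module behaves.

```python
import ssl

def client_hello_bytes(hostname: str) -> bytes:
    """Return the first TLS bytes a client would send when asking for `hostname`."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    # server_hostname is what ends up in the SNI extension of the ClientHello.
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the client is now waiting for the server's reply
    return outgoing.read()
```

Searching the returned bytes for the hostname finds it in clear text, which is how a server can pick the right certificate before any HTTP header has been sent.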
If you are not in the HTTP world, then it all depends on the protocol used. Older protocols, like FTP, did not plan for "multihoming" at the beginning; a given IP address meant only one hostname and service, for example.
Someone mentioned on another forum that an interviewer asked the question given below.
I don't know the exact answer, but I would say an HTTP request? Any suggestions and explanations?
Imagine a user sitting at an Ethernet-connected PC. He has a browser open. He types "www.google.com" in the address bar and hits enter.
Now tell me what the first packet to appear on the Ethernet is.
Thanks
There's no guaranteed always-correct answer, but there are a few likely possibilities.
If the client is configured for DNS over UDP, then the first packet will be a UDP datagram containing a DNS query to resolve www.google.com to an IP address.
If the client is configured for DNS over TCP and the browser hasn't already got an established TCP connection to the DNS server, the first packet will be part of the connection handshake to DNS, and therefore the answer will be that a SYN packet is first out of the gate.
If the browser has been coded to maintain a long-lived TCP connection to the DNS server and assuming the DNS server has allowed the connection to stay alive, the first packet will be a DNS query, sent across the existing connection to that DNS server.
Finally, if the browser had recently visited www.google.com and is built to do some smart local caching of DNS query results, then the first packet will be a SYN to establish a new connection to Google's web server.
If you want to be glib but absolutely precise about it, drop down a layer for your answer and say, "The first packet out will be an Ethernet frame containing a payload which supports whatever higher-level protocol is needed for the browser to serve up www.google.com". In fairness, the question is about the Ethernet layer...
Strictly speaking, with a completely blank slate, the first packet sent will be an ARP broadcast request ("Who has?") from the client PC attempting to discover the MAC address of its default gateway (or of its DNS server if that is on the same subnet as the client).
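The cases above can be summed up as a small decision function. This is only a simplified model of the reasoning in this answer, written in Python; the function and parameter names are invented, and real stacks have more cases (IPv6 neighbour discovery, prefetching, and so on).

```python
def first_packet(arp_cached: bool, dns_cached: bool,
                 dns_over_tcp: bool = False, dns_conn_open: bool = False) -> str:
    """Guess the first frame on the wire, following the cases in this answer."""
    if not arp_cached:
        # Blank slate: the MAC address of the gateway (or local DNS server) is unknown.
        return "ARP request for the next hop's MAC address"
    if dns_cached:
        # The name is already resolved, so go straight to the web server.
        return "TCP SYN to the web server"
    if not dns_over_tcp:
        return "DNS query in a UDP datagram"
    if dns_conn_open:
        return "DNS query on the existing TCP connection"
    return "TCP SYN to the DNS server"
```

For example, first_packet(arp_cached=True, dns_cached=False) gives the common "DNS query over UDP" answer.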
Interesting :) I just wiresharked it:
Client sends a SYN
Server replies with a SYN,ACK
Client sends an ACK
Client sends an HTTP GET
(like you mention in your comments the first is obviously the DNS lookup)
Am I able to depend on a requestor's IP coming through on all web requests?
I have an asp.net application and I'd like to use the IP to identify unauthenticated visitors. I don't really care if the IP is unique as long as there is something there so that I don't get an empty value.
If not I guess I would have to handle the case where the value is empty.
Or is there a better identifier than IP?
You can get this from Request.ServerVariables["REMOTE_ADDR"].
It doesn't hurt to be defensive. If you're worried about some horrible error condition where this isn't set, check for that case and deal with it accordingly.
There could be many reasons for this value not to be useful. You may only get the address of the last hop, like a load balancer or SSL decoder on the local network. It might be an ISP proxy, or some company NAT firewall.
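On that note, some proxies provide the IP for which they're forwarding traffic in an additional HTTP header, accessible via Request.ServerVariables["HTTP_X_FORWARDED_FOR"]. You might want to check this first, then fall back to Request.ServerVariables["REMOTE_ADDR"] or Request.UserHostAddress. Keep in mind that this header is set by whoever sent the request, so a client can forge it; trust it only when the request came through a proxy you control.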
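Here is a language-neutral sketch of that fallback logic in Python rather than ASP.NET. The TRUSTED_PROXIES set, its address, and the function name are assumptions for illustration, and the rule of only reading the header when the direct peer is a known proxy is an addition to what is described above.

```python
from typing import Mapping, Optional

# Hypothetical: addresses of load balancers / proxies that you operate.
TRUSTED_PROXIES = {"10.0.0.5"}

def client_ip(remote_addr: Optional[str], headers: Mapping[str, str]) -> str:
    """Pick an identifier for the visitor, never returning an empty value."""
    if remote_addr in TRUSTED_PROXIES:
        # The last entry is the address our own proxy saw connecting to it;
        # earlier entries were supplied by the client side and may be forged.
        forwarded = headers.get("X-Forwarded-For", "")
        last = forwarded.split(",")[-1].strip()
        if last:
            return last
    # Defensive default for the (unlikely) case that no address is available.
    return remote_addr or "unknown"
```

A direct visitor sending a forged header is still identified by the connection address, since the header is ignored unless the connection came from a trusted proxy.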
It's certainly not a bad idea to log these things for reference/auditing.
I believe that this value is set by your web server and there is really no way to fake it, as your response to their request wouldn't be able to get back to them if they set their IP to something else.
The only thing that you should worry about is proxies. Everyone from a proxy will get the same IP.
You'll always get an IP address, unless your web server is listening on some sort of network that is not an IP network. But the IP address won't necessarily be unique per user.
Well, web request is an http connection, which is a tcp connection and all tcp connections have two endpoints. So, it always exists. But that's about as much as you know about it. It's neither unique nor reliably accurate (with all the proxies and stuff).
Yes, every request must have an IP address, but as stated above, some ISPs use proxies, NAT or gateways, so the address you see may not be that of the individual's computer.
You can easily get this IP (in C#) with:
string ip = Context.Request.ServerVariables["REMOTE_ADDR"];
or in classic ASP/VBScript with
IP = Request.ServerVariables("REMOTE_ADDR")
An IP address is not much use for identifying users. As mentioned already, corporate proxies and other private networks can appear as a single IP address.
How are you authenticating users? Typically you would have them log in and then store that state in their session in your app.
Imagine the following:
User goes to script (http://sample.org/test.php),
Script sends an HTTP request to some other page (http://google.com/). For this example, we'll say using curl.
The script sets the IP address of the request to the user's IP, via CURLOPT_INTERFACE.
I know already that the requesting script will not receive the response, as the remote-host will send any responses to the IP address given in the request.
What I am wondering is what happens to this response? Assuming the client is on a LAN that has one external address and that all traffic sent to that IP is handled by a router acting as a DHCP server, will the response even get back to the user's machine? If it did, would there be any way to ensure that it was handled by the user's browser? And if so, how would the browser handle this, typically? Would it open a new window with Google in it?
I definitely have a follow up to this question, but I am very curious what goes on at this level, before I experiment further.
The script sets the IP address of the request to the user's IP, via CURLOPT_INTERFACE.
Usually, this won't work. Your ISP knows which IP address you are supposed to have and will not forward traffic coming from "fake" IP addresses.
In particular, since you can only communicate one-way with a fake IP (since the answer won't reach you), you would not be able to establish a working TCP connection, since TCP requires a three-way handshake. Thus, you wouldn't be able to submit your web request.
What I am wondering is what happens to this response? Assuming the client is on a LAN that has one external address and that all traffic sent to that IP is handled by a router acting as a DHCP server, will the response even get back to the user's machine?
If the user's PC has an internal IP address and uses NAT, the router will not know which LAN machine to forward the packet to (since it did not see any outgoing request to which it could match that response). Therefore, the answer would be dropped.
Even if you could get the response to reach the client:
If it did, would there be any way to ensure that it was handled by the user's browser?
No. As stated above, a TCP connection starts with a three-way handshake. This handshake was never carried out by the user's machine, so its operating system would find no matching connection and would just discard the packet.
CURLOPT_INTERFACE is for use on computers that have multiple IP addresses assigned to them, to specify which of those addresses should be used as the source IP for the connection. You can't use it to spoof some other computer's IP address. Most likely you'll either get an error, or the option will be ignored and the OS will choose a source interface automatically (the default behavior).
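The same restriction can be seen outside of curl. Below is a small Python sketch (the function name is made up) that tries to use an address as the local source address of a socket, which is roughly what choosing a source interface amounts to. It assumes a typical host configuration where binding to non-local addresses is not enabled; 203.0.113.7 is from a range reserved for documentation and is used here only as an address the machine does not own.

```python
import socket

def can_use_source_address(ip: str) -> bool:
    """Report whether this machine may use `ip` as a local source address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, 0))   # port 0: any free port; only the address matters here
        return True
    except OSError:
        # Typically "cannot assign requested address": the IP is not ours.
        return False
    finally:
        sock.close()
```

On an ordinary machine this returns True for 127.0.0.1 and False for an address assigned to someone else, before any packet is even sent.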
The response will be returned on the same TCP connection as the request.