How does client-machine/browser handle unrequested HTTP response? - http

Imagine the following:
User goes to script (http://sample.org/test.php),
Script sends an HTTP request to some other page (http://google.com/). For this example, we'll say using curl.
The script sets the IP address of the request to the user's IP, via CURLOPT_INTERFACE.
I know already that the requesting script will not receive the response, as the remote-host will send any responses to the IP address given in the request.
What I am wondering is what happens to this response? Assuming the client is on a LAN that has one external address and that all traffic sent to that IP is handled by a router acting as a DHCP server, will the response even get back to the user's machine? If it did, would there be any way to ensure that it was handled by the user's browser? And if so, how would the browser handle this, typically? Would it open a new window with Google in it?
I definitely have a follow up to this question, but I am very curious what goes on at this level, before I experiment further.

The script sets the IP address of the request to the user's IP, via CURLOPT_INTERFACE.
Usually, this won't work. Your ISP knows which IP address you are supposed to have and will not forward traffic coming from "fake" IP addresses.
In particular, since you can only communicate one-way with a fake IP (since the answer won't reach you), you would not be able to establish a working TCP connection, since TCP requires a three-way handshake. Thus, you wouldn't be able to submit your web request.
What I am wondering is what happens to this response? Assuming the client is on a LAN that has one external address and that all traffic sent to that IP is handled by a router acting as a DHCP server, will the response even get back to the user's machine?
If the user's PC has an internal IP address and uses NAT, the router will not know which LAN machine to forward the packet to (since it did not see any outgoing request to which it could match that response). Therefore, the answer would be dropped.
Even if you could get the response to reach the client:
If it did, would there be any way to ensure that it was handled by the user's browser?
No. As stated above, a TCP request consists of a three-way handshake. This handshake has not been completed, so the operating system would just drop the packet.

CURLOPT_INTERFACE is for use on computers that have multiple IP addresses assigned to them, to specify which of those addresses should be used as the source IP for the connection. You can't use it to spoof some other computer's IP address. Most likely you'll either get an error, or the option will be ignored and the OS will choose a source interface automatically (the default behavior).
The response will be returned on the same TCP connection as the request.

Related

Does a HTTP response indicate the address of the receiver or something similar?

A HTTP request indicates the resource on the server by the URL in the start line and the HOST header.
Does a HTTP response indicate the address of the receiver or something similar?
If not, why is it not necessary?
Thanks.
Internet protocols are layered.
HTTP requests are wrapped in TCP packages, which are wrapped in IP.
The outside IP packet contains information about who is the receipient and who is the sender of a message. Based on this information a TCP/IP service knows where to send the message back to.
The Host header was actually a later addition to HTTP. It wasn't really needed before because it was safer to assume that a single ip address would have a single HTTP service. The Host header was added because people needed many different domains to be served from a smaller set of ip addresses and send different responses based on what the domain was.
Without the Host header it wouldn't have been possible to know which domain the user wanted because the ip packet only encodes the ip address, not which domain was used to find the ip.

Why is it not possible to spoof an ip address (without using a proxy) and still receive a response?

I understand that if I tell my computer to send TCP packets from a fake ip address - say 128.5.32.3 - then my computer will happily send the packets out but not receive them in response.
But why is no response received? At which point in the chain is the return packet dropped?
Or, to give the same question asked another way - if my internet provider assigns me some arbitrary IP address, why can't my computer tell the internet provider to give me a different, arbitrary, IP address?
It's like sending a letter with a return address in it that is invalid. The mail will still get there, but if they send it back the postman (router) will at best be able to deliver it to a fake return address.
Your internet provider gives you an address on internet that isn't arbitrary rather one of it's internet addresses it has allocated. You can't 'move house' by wishing it.
If you do move house by getting another valid address you still need to receive a response using address supplied.
The postmen (routers) are incorruptible AFAIK :)
To start with your question about why no response is received, it is because the response goes to the person whose IP you spoofed. This can be abused, and an example if this is a "smurf attack". You would need to control the spoofed IP in order to receive the response, and there would be no point to spoofing if you had this control.
As for your question about why you cannot make your ISP assign you an IP is because, firstly, your ISP has control of a range of IPs and cannot assign IPs out of its permitted range. Secondly, most ISPs won't take into account the IP that your device wants. It has full control and will control your IP how it wants, so you cannot change your external IP at will.
There are many reasons why an ISP will not give an 'arbitrary' IP address. These include
They themselves only have a block of IP addresses they are allowed to allocate to users, if the IP address you want to use is not in this block there's nothing they can do (even if they want to, which they probably don't)
You are mostly likely being assigned an IP through DHCP (unless your provider is very generous or you are paying for a static IP). This also means that your IP is frequently changing.
The reason you receive no response is, as you put it, because the spoofed address is not your IP address. You are in essence telling the receiver of the TCP packets to respond to a different user (e.g., you send a packet, and they respond to some random stranger).

Ensure http response goes to $_SERVER['REMOTE_ADDR']

I want to ensure people who use my site are who they say they are. Not who they say, they say, they are.
How do I ensure my data is going back to the IP given by $_SERVER['REMOTE_ADDR']?
Or is it automatic that the http response is sent there?
How do I ensure my data is going back to the IP given by
$_SERVER['REMOTE_ADDR']?
The client IP is handled by the TCP protocol. The REMOTE_ADDR property is populated by the client address of the TCP connection. It's not part of the HTTP protocol. So it is guaranteed that your application is talking to this IP.
This doesn't mean that the IP that you are seeing is the actual IP of the end-user (as attributed for example by his internet provider). There could be proxies or intermediate devices between him and your application. So basically what you will be seeing is the closest IP to your web server in this chain.

What happens when my browser does a search? (ARP,DNS,TCP specifics)

I'm trying to learn the basics of ARP/TCP/HTTP (in sort of a scatter-shot way).
As an example, what happens when I go to google.com and do a search?
My understanding so far:
For my machine to communicate with others (the gateway in this case),
it may need to do an ARP Broadcast (if it doesn't already have the
MAC address in the ARP cache)
It then needs to resolve google.com's IP address. It does this by
contacting the DNS server. (I'm not completely sure how it knows
where the DNS server is? Or is it the gateway that knows?)
This involves communication through the TCP protocol since HTTP is
built on it (TCP handshake: SYN, SYN/ACK, ACK, then requests for
content, then RST, RST/ACK, ACK)
To actually load a webpage, the browser gets the index.html, parses
it, then sends more requests based on what it needs? (images,etc)
And finally, to do the actual google search, I don't understand how
the browser knows to communicate "I typed something in the search box
and hit Enter".
Does this seem about right? / Did I get anything wrong or leave out anything crucial?
Firstly try to understand that your home router is two devices: a switch and a router.
Focus on these facts:
The switch connects all the devices in your LAN together(including the router).
The router merely connects your switch(LAN) with the ISP(WAN).
Your LAN is essentially an Ethernet network which works with MAC addresses.
For my machine to communicate with others (the gateway in this case),
it may need to do an ARP Broadcast (if it doesn't already have the MAC
address in the ARP cache)
Correct.
When you want to send a file from your dekstop to your laptop, you do not want to go through the router. You want to go through the switch, as that is faster(lower layer). However you only know the IP of the laptop in your network. For that reason you need to get its MAC address. That's where ARP kicks in.
In this case you would broadcast the ARP request in the LAN until someone responds to you. This could be the router or any other device connected to the switch.
It then needs to resolve google.com's IP address. It does this by
contacting the DNS server. (I'm not completely sure how it knows where
the DNS server is? Or is it the gateway that knows?)
If you use DHCP, then that has already provided you with the IP of the DNS server. If not, then it means that you manually provided the IP of the DNS. So the IP of the DNS server is stored locally on your computer.
Making a DNS request is just about putting its IP in the packet with the request and forwarding the packet to the network.
Sidenote: DHCP also provides the IP address of the router.
This involves communication through the TCP protocol since HTTP is
built on it (TCP handshake: SYN, SYN/ACK, ACK, then requests for
content, then RST, RST/ACK, ACK)
Yes. To clarify things: When your computer sends the request
FRAME[IP[TCP[GET www.google.com]]]
The frame is being sent to your LAN's switch which forwards it to the MAC of the router. Your router will open the frame to check the destination IP and route it accordingly(in this case to the WAN). Finally when the frame arrives at the server, the server will open the TCP segment and read the payload, which is the HTTP message. The ACK/SYN etc. messages are being processed just by your computer and the server and not any router or switch.
To actually load a webpage, the browser gets the index.html, parses
it, then sends more requests based on what it needs? (images,etc)
Yes. An HTML file is essentially a tree structure which can have embedded resources like images, javafiles, CSS etc. For each such resource a new request has to be sent.
Once your browser gets all these recourses, it will render the webpage.
And finally, to do the actual google search, I don't understand how
the browser knows to communicate "I typed something in the search box
and hit Enter".
When you type a single character, it is being sent to the server. The server then responds with its suggestions. Easy as that.
References(good reads):
http://www.tcpipguide.com/free/t_TheNeedForAddressResolution.htm
http://www.howtogeek.com/99001/htg-explains-routers-and-switches/
http://www.eventhelix.com/realtimemantra/networking/ip_routing.htm#.UsrYAvim3yO
http://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol

what packet will arrive first when send request

As some one mentioned in other forum that interviewer has asked the question given below.
I dont know exact answer but I would say HTTP request ? Any suggestion and explainations
Imagine a user sitting at an Ethernet-connected PC. He has a browser open. He types "www.google.com" in the address bar and hits enter.
Now tell me what the first packet to appear on the Ethernet is .
Thanks
There's no guaranteed always-correct answer, but there are a few likely possibilities.
If the client is configured for DNS over UDP, then the first packet will be a UDP datagram containing a DNS query to resolve www.google.com to an IP address.
If the client is configured for DNS over TCP and the browser hasn't already got an established TCP connection to the DNS server, the first packet will be part of the connection handshake to DNS, and therefore the answer will be that a SYN packet is first out of the gate.
If the browser has been coded to maintain a long-lived TCP connection to the DNS server and assuming the DNS server has allowed the connection to stay alive, the first packet will be a DNS query, sent across the existing connection to that DNS server.
Finally, if the browser had recently visited www.google.com recently and is built to do some smart local caching of DNS query results then the first packet will be a SYN to establish a new connection to Google's web server.
If you want to be glib but absolutely precise about it, drop down a layer for your answer and say, "The first packet out will be an Ethernet frame containing a payload which supports whatever higher-level protocol is needed for the browser to serve up www.google.com". In fairness, the question is about the Ethernet layer...
Strictly speaking, with a completely blank slate, the first packet sent will be an ARP broadcast request ("Who has?") from the client PC attempting to discover the MAC address of its default gateway (or of its DNS server if that is on the same subnet as the client).
Interesting :) I just wiresharked it:
Client sends a SYN
Server replies with a SYN,ACK
Client sends an ACK
Client sends an HTTP GET
(like you mention in your comments the first is obviously the DNS lookup)

Resources