Can someone tell me what is the difference between active and passive FTP?
Which one is preferable?
Active and passive are the two modes that FTP can run in.
For background, FTP actually uses two channels between client and server, the command and data channels, which are actually separate TCP connections.
The command channel is for commands and responses while the data channel is for actually transferring files.
This separation of command information and data into separate channels a nifty way of being able to send commands to the server without having to wait for the current data transfer to finish. As per the RFC, this is only mandated for a subset of commands, such as quitting, aborting the current transfer, and getting the status.
In active mode, the client establishes the command channel but the server is responsible for establishing the data channel. This can actually be a problem if, for example, the client machine is protected by firewalls and will not allow unauthorised session requests from external parties.
In passive mode, the client establishes both channels. We already know it establishes the command channel in active mode and it does the same here.
However, it then requests the server (on the command channel) to start listening on a port (at the servers discretion) rather than trying to establish a connection back to the client.
As part of this, the server also returns to the client the port number it has selected to listen on, so that the client knows how to connect to it.
Once the client knows that, it can then successfully create the data channel and continue.
More details are available in the RFC: https://www.ietf.org/rfc/rfc959.txt
I recently run into this question in my work place so I think I should say something more here. I will use image to explain how the FTP works as an additional source for previous answer.
Active mode:
Passive mode:
In an active mode configuration, the server will attempt to connect to a random client-side port. So chances are, that port wouldn't be one of those predefined ports. As a result, an attempt to connect to it will be blocked by the firewall and no connection will be established.
A passive configuration will not have this problem since the client will be the one initiating the connection. Of course, it's possible for the server side to have a firewall too. However, since the server is expected to receive a greater number of connection requests compared to a client, then it would be but logical for the server admin to adapt to the situation and open up a selection of ports to satisfy passive mode configurations.
So it would be best for you to configure server to support passive mode FTP. However, passive mode would make your system vulnerable to attacks because clients are supposed to connect to random server ports. Thus, to support this mode, not only should your server have to have multiple ports available, your firewall should also allow connections to all those ports to pass through!
To mitigate the risks, a good solution would be to specify a range of ports on your server and then to allow only that range of ports on your firewall.
For more information, please read the official document.
Redacted version of my article FTP Connection Modes (Active vs. Passive):
FTP connection mode (active or passive), determines how a data connection is established. In both cases, a client creates a TCP control connection to an FTP server command port 21. This is a standard outgoing connection, as with any other file transfer protocol (SFTP, SCP, WebDAV) or any other TCP client application (e.g. web browser). So, usually there are no problems when opening the control connection.
Where FTP protocol is more complicated comparing to the other file transfer protocols are file transfers. While the other protocols use the same connection for both session control and file (data) transfers, the FTP protocol uses a separate connection for the file transfers and directory listings.
In the active mode, the client starts listening on a random port for incoming data connections from the server (the client sends the FTP command PORT to inform the server on which port it is listening). Nowadays, it is typical that the client is behind a firewall (e.g. built-in Windows firewall) or NAT router (e.g. ADSL modem), unable to accept incoming TCP connections.
For this reason the passive mode was introduced and is mostly used nowadays. Using the passive mode is preferable because most of the complex configuration is done only once on the server side, by experienced administrator, rather than individually on a client side, by (possibly) inexperienced users.
In the passive mode, the client uses the control connection to send a PASV command to the server and then receives a server IP address and server port number from the server, which the client then uses to open a data connection to the server IP address and server port number received.
Network Configuration for Passive Mode
With the passive mode, most of the configuration burden is on the server side. The server administrator should setup the server as described below.
The firewall and NAT on the FTP server side have to be configured not only to allow/route the incoming connections on FTP port 21 but also a range of ports for the incoming data connections. Typically, the FTP server software has a configuration option to setup a range of the ports, the server will use. And the same range has to be opened/routed on the firewall/NAT.
When the FTP server is behind a NAT, it needs to know it's external IP address, so it can provide it to the client in a response to PASV command.
Network Configuration for Active Mode
With the active mode, most of the configuration burden is on the client side.
The firewall (e.g. Windows firewall) and NAT (e.g. ADSL modem routing rules) on the client side have to be configured to allow/route a range of ports for the incoming data connections. To open the ports in Windows, go to Control Panel > System and Security > Windows Firewall > Advanced Settings > Inbound Rules > New Rule. For routing the ports on the NAT (if any), refer to its documentation.
When there's NAT in your network, the FTP client needs to know its external IP address that the WinSCP needs to provide to the FTP server using PORT command. So that the server can correctly connect back to the client to open the data connection. Some FTP clients are capable of autodetecting the external IP address, some have to be manually configured.
Smart Firewalls/NATs
Some firewalls/NATs try to automatically open/close data ports by inspecting FTP control connection and/or translate the data connection IP addresses in control connection traffic.
With such a firewall/NAT, the above configuration is not necessary for a plain unencrypted FTP. But this cannot work with FTPS, as the control connection traffic is encrypted and the firewall/NAT cannot inspect nor modify it.
Active mode:
-server initiates the connection.
Passive mode:
-client initiates the connection.
Active Mode—The client issues a PORT command to the server signaling that it will “actively” provide an IP and port number to open the Data Connection back to the client.
Passive Mode—The client issues a PASV command to indicate that it will wait “passively” for the server to supply an IP and port number, after which the client will create a Data Connection to the server.
There are lots of good answers above, but this blog post includes some helpful graphics and gives a pretty solid explanation: https://titanftp.com/2018/08/23/what-is-the-difference-between-active-and-passive-ftp/
Related
Let's assume that I start a server at one of the computers in my private network (192.168.10.10:9900).
Now when making a request from some other computer in the same network, how does the client computer (OS?) knows which protocol to use / which protocol the server follows ? [TCP or UDP]
EDIT: As mentioned in the answers, I was basically looking for a default protocol which will be used by the client in the absence of any transport protocol information.
TCP / UDP protocols work at the transport layer level (TCP / IP MODEL) and its main difference is that TCP has a method to ensure the arrival of messages while UDP is lighter because of its virtue is to be faster in Information delivery. The use of one protocol or another is always defined by the application that will use it.
So the reference you put on the private server with ip: port 192.168.10.10:9900 is very vague to be more precise we could say that we have an Apache web server running on the ip: port 192.168.10.10:9900 (the port for default is 80 when installing the server, but it can be changed in the configuration).
Now the web servers (apache, IIS, etc.) work using the TCP protocol because when a client (computer, cell phone, etc.) consults a page through a browser (Chrome, Firefox, etc.), the ideal thing is that all the website and not just some pieces. This is why this type of servers chose and use this protocol in the first instance since they seek that in the end the result is that the user obtains the complete page regardless of whether a few more milliseconds are sacrificed in the validations involved in using TPC.
Now going to the client side. The user when visiting a web page from any browser (Chrome, Firefox, etc.) will use TCP since this protocol is already configured in the browser to send the query messages and subsequently receive the messages with the same form Website information.
Now this behavior is going to be repeated for any client / server application. For example, to change the type of application on the UDP side, we can observe the operation of DHCP services which are used to receive an IP when connecting any device to a Wi-Fi network. In this case, this service seeks to be as fast as possible (instead of the most reliable) since you want the device to connect as quickly as possible to the network, so use the UDP protocol and in this case any equipment when connecting To a WIFI network you will send your messages using this protocol.
Finally, if you want to know promptly about the type of TCP / UDP protocol used by a specific application, you can search on the Wireshark application which allows you to scan the messages that leave the device or show the protocol used in the different layers of the application.
There is no reason any client would make a request to your server, so why would it care what protocol it follows? Clients don't just randomly connect to things to see if there's a server there. So it doesn't make any difference to any client.
Normally, the client computer will use the TCP protocol as default. If you start the server using UDP protocol mode, then when you use curl -XGET 192.168.10.10:9900/test-page, it will give you back an curl: (7) Failed to connect to 192.168.10.10 port 9900: Connection refused error. You can try it, use the nc -lvp 9900 -u, it will give you that result.
The answers here are pointing to some default protocol. Its' not that, Whenever you start an application let say HTTP server, the server's internal has code to open a socket(which can be TCP or UDP), since HTTP:80 is a TCP protocol the code creates a TCP socket. Similarly for other network application it depends on their requirement what kind of transport layer protocol to use (TCP Or UDP). Like a DNS client will create a UDP socket to connect to DNS server, since DNS:53 is mostly over UDP. Both TCP and UDP have different use cases, advantages and disadvantages. Depending on there uses cases / advantages / disadvantages of UDP/TCP decision is taken to implement server using either of them.
When we open a TCP Listening, we use a fixed port, like "9870".
But the clients which connect to this listening, use different ports like "1024, 1025" or other. I don't know what is the name of this port, "client port", "dynamic port" or "ephemeral port"... But I need to know if is possible to change this client port.
Because, like in the second image, it shows the error "Port numbers reused", and I think this is related to this port configuration.
I think if I could configure these ports, the connections of the equipments on my network will be stabilized.
TL;TR: there is usually no need to configure the clients source ports and you can definitely not set the clients source port at the server.
The client can bind to a address+port the same way the server can do and this port is then used as the source port for the connection. But usually this is not done and instead the socket is not specifically bound and a free source port is automatically assigned by the system. The client source port can only be set by the client itself and can not be changed by the server.
Usually it is not possible that a port number gets reused by the client since the OS will not let the client do this. But what you see can happen if the client crashes . After the restart the client is not aware of any connections which were established (and never closed) before the reboot so it will happily use the same source port again. In this case it gets a RST from the server since the new data do not match the old connection.
This can also happen if the client is connected with some router doing NAT and the router crashes. After restart the router is not aware of any previous connections and will thus create new translations which might conflict with old connections.
Can someone describe from a network point of view what RPC (SUN and/or DCE) is and why it deviates from standard TCP behavior?
The way that I understand it is a client reaches out to a server with a unique source port and then switches the source port after the TCP three way handshake finishes. I work with ASA firewalls so this behavior becomes very apparent when the inspection of DCE RPC is not enabled since the firewall will block it because it sees it as a threat. I have read a few MS TechNet articles and other website definitions to include watching about five Youtube videos which all seem to explain it from a programmers perspective but I have yet to fully understand this concept since I am not a programmer.
Note that there is nothing that deviates from standard TCP regarding the RPC protocols.
SunRPC or DCE RPC works on top of UDP(at least SunRPC can use UDP) or on top of TCP.
Typically in order for an RPC client to contact/call an RPCserver, it first contacts some sort of lookup server (called portmapper or rpcbind in the case of SunRPC), which replies with the location (IP address and port number) of where the actual server is running.
So from a networking perspective:
RPC Servers listens on a random port number, which may change each time that server program is (re)started.
At startup the RPC server connects to the portmapper, which runs on a well known port and register itself with which IP address and port number it's listening on.
Normally the portmapper service runs on the same machine as the RPC server programs.
When a client wants to connect to or call an RPC service it performs these actions:
Connects to the portmapper, on a well known/standard destination port and asks it where the particular service it wants to connect to is.
portmapper replies with the IP address and port number of the service the client asked for.
client tears down the connection to the portmapper
client establishes a new connection to the service using the IP address and port number that pormapper gave it.
client calls the RPC service over this new connection, which the client may use for multiple RPC calls.
These RPC calls are just application message exchanged on top of a TCP connection.
(In the case UDP is used instead of TCP, it works much the same, but there's no naturally no connection setup/teardown being performed over the network)
This presents a problem for firewalls, since the servers listens on randomly chosen port numbers, one cannot administratively allow access to a particular port number. Instead a firewall wanting to support this kind of setup would need to open up the portmapper port, catch the RPC messages going to that well known port of portmapper, inspect the message content exchanged with the portmapper to extract the IP address and port number from the RPC messages(the portmapper is itself implemented as an RPC server) in order to dynamically open a port between the RPC server and client.
Based on my understanding, port numbers are just like telephone extensions. Just as a business telephone switchboard can use a main phone number and assign each employee an extension number (like x100, x101, etc.), so a computer has a main address and a set of port numbers to handle incoming and outgoing connections.
But the question is:
On what basis is a port number assigned? A process or an application?
Based on my experience with firewall, I usually open a port for a specific application. So port number should be assigned on an application's basis. But what if there're multiple instances of the same application running on a single machine. Each of the instances uses the same port number. So if a message is arrived at that port number, how could the system tell which instance should the message go?
And another question also related to port.
If a web server is setup to listen on port 80, client browser should always contact the 80 port. I am not sure if the following illustration of the communication between a web browser and the web server is correct.
Client Browser sent request to Server, the message should contain info like this:
To: < ServerAddress:80 >
From: < ClientAddress:XXX >
Server sent reponse to Client Browser like this:
To: < ClientAddress:XXX >
From: < ServerAddress:80 >
So the question is, will the server pick other port numbers for sending messages to client? Because I think a single 80 port doesn't look enough.
Add - 1 - 21:16 2010/12/19
In my above post, the word "application" represents a static program file that the system knows. Multiple instnaces of this application could be launched, which are multiple "processes"
Each client connection will be represented by a socket on the server. Sockets are uniquely represented by the combination of the following 4 pieces of information:
Peer IP address
Peer port
Local IP address
Local port
The client chooses a random port, so if there are multiple connections from one client to the same server/port, the connections will still differ by the client's port.
If there are multiple web server applications running on the same server, they will have to listen on different ports or the server will need to have multiple IP addresses.
On a computer, only one process can be listening on a specific port number. For example, if an Apache process is listening on port 80, no other application can also listen on port 80.
Apache usually pre-forks several processes, only one of those is listening on port 80. The job of that process is to hand over the processing for any connection to one of the pool of other Apache processes as quickly and efficiently as it can.
Each of many concurrent connections to port 80 is distinguished by it's source IP-address and by the source TCP port number (which the client computer chooses randomly from the set not in use).
(Edit)
I was pretty sure that webservers have one process (or thread) listening which accepts incoming connections and passes corresponding filehandles to the worker processes (or threads). EJP advises that this is not so.
Apache seems to have several different multi-processing modules that affect how it spreads the load of responding to multiple concurrent requests. For example: MPM Prefork and MPM Worker
Jeff Pozkaner wrote an overview of HTTP server design that I found interesting:
The basic operation of a web server is to accept a request and send back a response. The first web servers were probably written to do exactly that. Their users no doubt noticed very quickly that while the server was sending a response to someone else, they couldn't get their own requests serviced. There would have been long annoying pauses.
The second generation of web servers addressed this problem by forking off a child process for each request. …
A slight variant of this type of server uses "lightweight processes" or "threads" instead of full-blown Unix processes. …
The third generation of servers is called "pre-forking". Instead of starting a new subprocess for each request, they have a pool of subprocesses that they keep around and re-use. …
The fourth generation. One process only. No non-portable threads/LWPs. Sends multiple files concurrently using non-blocking I/O, calling select()/poll()/kqueue() to tell which ones are ready for more data. …
Network stack distinguishes TCP connections by triple <source IP,source port,destination port>, so knowing client address and port is enough to work correctly.
What is the application, if it is not a process? In firewalls you open ports for executables. It may be considered as an application, and it is a process when it is running.
Multiple listeners cannot listen to the same port. The same process can listen to multiple ports.
Ports are assigned to the listeners. Depending on the firewall (and its configuration) you can allow the process (executable) to listen several ports, or to create several exceptions for the same process listening to multiple ports.
I'm not sure what you mean by the difference between a "process" and an "application". Everything is just code executing on your box.
Anyway, a process/application will listen/bind to whatever port number the authors of the application have configured. By convention, many port numbers are reserved for particular types of application - that is applications which communicate using a particular protocol. So for example web servers which use HTTP typically run on port 80. SMTP servers run on port 22. HTTPS is 443 and so on.
Of course you can configure your web server (e.g apache httpd) to run on whatever port you like - but your client needs to know else it will assume port 80.
Two processes/applications may not bind to the same port. If you try to start another process/application on a port already in use you'll get an error: cannot bind to port or something to that effect.
will the server pick other port
numbers for sending messages to
client?
No. All the accepted sockets use the same server-side port number as the original listening socket. The identifying tuple mentioned above disambiguates this so as to make each connection unique.
To my understanding by serverSocket = new ServerSocket(portNumber) we create an object which potentially can "listen" to the indicated port. By clientSocket = serverSocket.accept() we force the server socket to "listen" to its port and to "accept" a connection from any client which tries to connect to the server through the port associated with the server. When I say "client tries to connect to the server" I mean that client program executes "nameSocket = new Socket(serverIP,serverPort)".
If client is trying to connect to the server, the server "accepts" this client (i.e. creates a "client socket" associated with this client).
If a new client tries to connect to the server, the server creates another client socket (associated with the new client). But how the server knows if it is a "new" client or an "old" one which has already its socket? Or, in other words, how the clients are identified? By their IP? By their IP and port? By some "signatures"?
What happens if an "old" client tries to use Socket(serverIP,serverIP) again? Will server create the second socket associated with this client?
The server listens on an address and port. For example, your server's IP address is 10.0.0.1, and it is listening on port 8000.
Your client IP address is 10.0.0.2, and the client "connects" to the server at 10.0.0.1 port 8000. In the TCP connect, you are giving the port of the server that you want to connect to. Your client will actually get its own port number, but you don't control this, and it will be different on each connection. The client chooses the server port that it wants to connect to and not the client port that it is connecting from.
For example, on the first connection, your client may get client-side port 12345. It is connecting from 10.0.0.2 port 12345 to the server 10.0.0.1 port 8000. Your server can see what port the client is connecting from by calling getpeername on its side of the connection.
When the client connects a second time, the port number is going to be different, say port 12377. The server can see this by calling getpeername on the second connection -- it will see a different port number on the client side. (getpeername also shows the client's IP address.)
Also, each time you call accept on the server, you are getting a new socket. You still have the original socket listening, and on each accept you get a new socket. Call getpeername on the accepted socket to see which client port the connection is coming from. If two clients connect to your server, you now have three sockets -- the original listening socket, and the sockets of each of the two clients.
You can have many clients connected to the same server port 8000 at the same time. And, many clients can be connected from the same client port (e.g. port 12345), only not from the same IP address. From the same client IP address, e.g. 10.0.0.2, each client connection to the server port 8000 will be from a unique client port, e.g. 12345, 12377, etc. You can tell the clients apart by their combination of IP address and port.
The same client can also have multiple connections to the server at the same time, e.g. one connection from client port 12345 and another from 12377 at the same time. By client I mean the originating IP address, and not a particular software object. You'll just see two active connections having the same client IP address.
Also, eventually over time, the combination of client-address and client-port can be reused. That is, eventually, you may see a new client come in from 10.0.0.2 port 12345, long after the first client at 10.0.0.2 port 12345 has disconnected.
Every TCP connection has as identifier the quadruple (src port, src address, dest port, dest address).
Whenever your server accepts a new client, a new Socket is created and it's indipendent from every other socket created so far. The identification of clients is not implictly handled somehow..
You don't have to think sockets as associated to "clients", they are associated with an ip and a port, but there is not direct correlation between these two.
If the same client tries to open another socket by creating a new one you'll have two unrelated sockets (because ports will be different for sure). This because the client cannot use the same port to open the new connection so the quadruple will be different, same client ip, same server ip, same server port but different client port.
EDIT for your questions:
clients don't specify a port because it's randomly choosen from the free ones (> 1024 if I'm not wrong) from the underlying operating system
a connection cannot be opened from a client using the same port, the operating system won't let you do that (actually you don't specify any port at all) and in any case it would tell you that port is already bound to a socket so this issue cannot happen.
whenever the server receives a new connection request it's is considered new, because also if ip is the same port will be different for sure (in case of old packet resend or similar caveats I think that the request will be discarded)
By the way all these situations are clearly explained in TCP RFC here.
I think the question here is why do you care if the client is new or old. What is new and old?
For example, a web browser could connect to a web server to request a web page. This will create a connection so serverSocket.accept() will return a new Socket. Then the connection is closed by the web browser.
Afer a couple of minutes, the end used click on a link in the web page and the browser request a new page to the server. This will create a connection so serverSocket.accept() will return a new Socket.
Now, the web server do not care if this is a new or old client. It just need to server the requested page. If the server do care if the "client" already requested a page in the past, it should do so using some information in the protocol used on the socket. Check out http://en.wikipedia.org/wiki/OSI_model
In this case, the ServerSocket and Socket ack on the transport level. The question "does this client already requested a page on the server" should be answered by information on the session or even application layer.
In the web browser/server example, the http protocol (which is an application) protocol hold information about who is this browser in the parameters of the request (the browser transmit cookie informations with every request). The http server can then set/read cookie information to known if the browser connected before and eventually maintain a server side session for that browser.
So back to your question: why do you care if it's a new or old client?
A socket is identified by:
(Local IP,Local Port, Remote IP,
Remote Port,IP Protocol(UDP/TCP/SCTP/etc.)
And that's the information the OS uses to map the packets/data to the right handle/file descriptor of your program. For some kinds of sockets,(e.g. an non-connected UDP socket)the remote port/remote IP might be wildcards.
By definition, this is not a Java related question, but about networking in general, since Sockets and SeverSockets apply to any networking-enabled programming language.
A Socket is bounded to a local-port. The client will open a connection to the server (by the Operating System/drivers/adapters/hardware/line/.../line/hardware/adapters/drivers/Server OS). This "connection" is done by a protocol, called the IP (Internet Protocol) when you are connected to the Internet. When you use "Sockets", it will use another protocol, which is the TCP/IP-protocol.
The Internet Protocol will identify nodes on a network by two things: their IP-address and their port. The TCP/IP-protocol will send messages using the IP, and making sure messages are correctly received.
Now; to answer your question: it all depends! It depends on your drivers, your adapters, your hardware, your line. When you connect to your localhost machine, you will not get further than the adapter. The hardware isn't necessairy, since no data is actually sent over the line. (Though often you need hardware before you can have an adapter.)
By definition, the Internet Protocol defines a connection as pair of nodes (thus four things: two IP-adresses and two ports). Also, the Internet Protocol defines that one node can only use one port at a time to initiate a connection with another node (note: this only applies for the client, not the server).
To answer your second question: if there are two Sockets: the "new" and the "old". Since, by the Internet Protocol, a connection is a pair of nodes, and nodes can only use one port at a time for a connection, the ports of "new" and "old" must be different. And because this is different, the "new" client can be discriminated from the "old", since the port-number is differently.