Understanding the application layer in the osi model - networking

I'm aware there are a lot of protocols at the application layer,
The question is more about when it is ok to not follow any of them,
Lets say i have a client and a server and the client app should send some data to that server, for instance, some statistics about a person using the app,
Now, for a good programming practice, is it ok to just open a tcp socket and send the data as is without the overhead of following a protocol or am i breaking the osi model and should i follow one of the protocols at the application layer?
Am i reinventing the wheel here or is it a practical solution?

There's always an application layer protocol. If your concept is to transmit some statistics to a server ''on a certain TCP or UDP port as a plain, decimal number'' then that is your (implicit) application protocol. The protocol enables the server to receive data and assigns a meaning to the number.
The OSI model is a model, not a law. In your application layer protocol you can do whatever you want.
However, it may be useful to anticipate future expansion of the service, so that you could e.g. transmit value_a:data\0value_b:data in one stream/datagram without having to keep client and server versions in perfect sync (with the server not expecting all values and just ignoring unknown values). Of course, you can also use a different server port each time instead - your choice.

Related

In which layer is HTTP in the OSI model?

Some said HTTP is in the session layer in the OSI model.
But in Tanenbaum's Computer Network, HTTP is said to be in the application layer in the OSI model.
Also some said that HTTP has no concept of session. Does it mean that HTTP can't be in the session layer?
So is HTTP in the session layer? Thanks.
Update: For HTTP/2 what is the layer in OSI model?
In which layer is HTTP in the OSI model?
It's in the application layer. See the following quotes from the RFC 7230, one of the documents that currently defines the HTTP/1.1 protocol:
The Hypertext Transfer Protocol (HTTP) is a stateless application-level request/response protocol that uses extensible semantics and self-descriptive message payloads for flexible interaction with network-based hypertext information systems.
HTTP is a stateless request/response protocol that operates by exchanging messages across a reliable transport- or session-layer "connection".
Also some said that HTTP has no concept of session. Does it mean that HTTP can't be in the session layer?
As previously mentioned in the quotes from the RFC 7230, the HTTP protocol is stateless, where each request from client to server (should) contain all of the information necessary to understand the request, without taking advantage of any stored context on the server.
The RFC 6265 defines some mechanisms for state management in HTTP, such as cookies, allowing session management on server side (but it doesn't make HTTP stateful in any ways).
The concept of session in HTTP is different from the concept of session in the OSI model. Anyways, HTTP is an application layer protocol.
The OSI model
The OSI (Open Systems Interconnection) model is a conceptual model created by the International Organization for Standardization which enables diverse communication systems to communicate using standard protocols.
It provides a standard for different computer systems to be able to communicate with each other and can be seen as a universal language for computer networking. It’s based on the concept of splitting up a communication system into seven abstract layers, each one stacked upon the last.
The following picture borrowed from Cloudflare illustrates pretty well what the OSI model is like:
The application layer is the only layer that directly interacts with data from the user. So software applications like web browsers and email clients rely on the application layer to initiate communications.
But it should be made clear that client software applications are not part of the application layer: rather the application layer is responsible for the protocols (such as HTTP and SMTP) and data manipulation that the software relies on to present meaningful data to the user.
The OSI model vs the TCP/IP model
While the OSI model is comprehensive reference framework for general networking systems, it's important to mention that the modern Internet doesn’t strictly follow the OSI model.
The modern Internet more closely follows the simpler Internet protocol suite, which is commonly known as TCP/IP because the foundational protocols in the suite are the TCP (Transmission Control Protocol) and the IP (Internet Protocol).
The following image illustrates how the OSI and TCP/IP models relate to each other:
Update: This section has been added to address the bounty started by noɥʇʎԀʎzɐɹƆ, who requested to update this answer with HTTP/2 details.
Despite the quotes of the document that defines the HTTP/1.1 protocol, all of the above also applies to HTTP/2. Refer to the following quote from the RFC 7540, the document that defines the HTTP/2 protocol:
An HTTP/2 connection is an application-layer protocol running on top of a TCP connection. The client is the TCP connection initiator.
The HyperText Transfer Protocol (HTTP), is the Web’s application-layer protocol,
is at the heart of the Web. It is defined in [RFC 1945] and [RFC 2616].
HTTP is in the Application layer of the Internet protocol suite model and in the Session Layer of the OSI Model. The Session layer of the OSI Model is responsible for creating and managing sessions and is the first layer that passes data.
HTTP can redirect sessions, reuse them and have persistent connections.

HTTP vs TCP for online games

I am wondering about the difference between HTTP and TCP data transfer protocols for online games.
I have heard many people using TCP or UDP to transfer data between client and server for online games.
But can you use http at all? I know http is mostly used for web browsing, but if I could set up web server and let my game applications use GET and POST methods, I can still send data back and forth right? Is it that this way of communicating is too slow or unnecessary?
And just one thing about TCP transmission protocols, if I were to write some gaming application using TCP, is it that the data are usually transferred using something called "sockets" (like Socket classes in Java)? What about UDP?
Thanks very much!
Appreciate any answer!
HTTP is an additional layer on top of TCP that defines what a request looks like, what a response looks like, and how the connection is closed or maintained across requests. You can either use it or not use it, depending on what you actually need to transport. If your game consists of a series of requests that each get a reply, HTTP might make sense. If it's more like unsolicited messages in each direction, making HTTP work is like putting a square peg in a round hole.
Most platforms provide a socket interface that allows you to work with either TCP or UDP depending on the protocol specified when the socket is allocated. Some higher-level APIs look completely different for different protocols.

Is TCP protocol stateless?

HTTP,the protocol residing over TCP protocol is stateless and also the IP protocol is stateless
But how can we conclude that TCP is stateless or not?
You can't assume that any stacked protocol is stateful or stateless just looking at the other protocols on the stack. Stateful protocols can be built on top of stateless protocols and stateless protocols can be built on top of stateful protocols. One of the points of a layered network model is that the kind of relationship you're looking for (statefulness of any given protocol in function of the protocols it's used in conjunction with) does not exist.
The TCP protocol is a stateful protocol because of what it is, not because it is used over IP or because HTTP is built on top of it. TCP maintains state in the form of a window size (endpoints tell each other how much data they're ready to receive) and packet order (endpoints must confirm to each other when they receive a packet from the other). This state (how much bytes the other guy can receive, and whether or not he did receive the last packet) allows TCP to be reliable even over inherently non-reliable protocols. Therefore, TCP is a stateful protocol because it needs state to be useful.
I would also like to point out that while HTTP and HTTPS (which is just HTTP over SSL/TLS, really) are essentially stateless (each request is a valid standalone request per the protocol), applications built on top of HTTP and HTTPS aren't necessarily stateless. For instance, a website can require you to visit a login page before sending a message. Even though the request where the client sends a message is a valid standalone request, the application will not accept it unless the client authenticated herself before. This means that the application implements state over HTTP.
On a side note, the statefulness of HTTP can be somewhat confusing, as several applications (on a clearly different OSI layer) will leak their state to HTTP. For instance, if a user tries to view a blog post that doesn't exist, the blog application might send back a response with the 404 status code, even though the file handling the blog post search itself was found.
tl;dr TCP is stateful.
While Zneak points out that you can use any communication for stateful purposes, the ACTUAL question being asked is whether the protocol itself is stateful.
Wikipedia:
In computing, a stateless protocol is a communications protocol that
treats each request as an independent transaction that is unrelated to
any previous request so that the communication consists of independent
pairs of requests and responses. A stateless protocol does not require the server to retain
session information or status about each communications partner for
the duration of multiple requests. In contrast, a protocol which
requires keeping of the internal state on the server is known as a
stateful protocol.
TCP's "request" (unit of communication) is a TCP packet.
TCP a stateful protocol since parties must remember what state the other is in, and what bytes the other has. Hence the TCP state diagram.
In contrast, UDP is a stateless protocol. Neither endpoint retains any notion of state. (Though as always, the encapsulated information could be used for stateful purposes.)
Here is a nice explanation :
Consider the phone service to be TCP and consider your relationship with distant family members to be HTTP. You will contact them with the phone service. Each call to them would be a stateful TCP connection. However, you don't constantly stay on the phone with them, as you will disconnect and call them back again at a later time. You would certainly expect them to remember what you talked about on the last call. HTTP in itself does not do that, but it is rather a function of the web server that maintains the state of the overall converstation.
To properly answer the question, we need the concept of a stateless protocol used to manage external stateful resources. Section 2.4 of http://laurel.datsi.fi.upm.es/_media/docencia/asignaturas/ws-modelingresources.pdf is about a service that implements such a protocol:
A Service that acts upon stateful resources may be described
“stateless” if it delegates responsibility for the management of the
state to another component such as a database or file system. ... A
consequence of statelessness is that any dynamic state needed for a
given message-exchange execution must be:
provided explicitly within the request message, whether directly by-value or indirectly by-reference, and/or
maintained implicitly within other system components with which the Web service can interact.
So, the http protocol is stateless, if we consider that the files that are served, the database that is accessed, etc. are separated from the implementation of the protocol itself. A service (which implements a protocol) that is stateless in relation with both sides taken together might not appear stateless on each side, because the other side can carry a state.

Discussion: Chat server via node.js: HTTP or TCP?

I was considering doing a chat server using node.js/socket.io. Should I make it a tcp server or a http server? I'd imagine tcp server would be more efficient, but can you send other stuff to it like file attachments etc? If tcp is more efficient, how much more so? Also, just wondering how many concurrent connections can one node.js server handle? Is it more work to do TCP or HTTP?
You are talking about 2 totally different approaches here - TCP is a transport layer protocol and HTTP is an application layer protocol. HTTP (usually) operates over TCP, so whichever option you choose, it will still be operating over TCP.
The efficiency question is sort of a moot point, because you are talking about different OSI layers. If you went for raw TCP sockets, your solution would probably be more efficient - in bandwidth at least - since HTTP contains a whole bunch of extra data (the headers) that would likely be irrelevant to your purposes (depending on the scale of the chat program). What you are talking about developing there is your own application layer protocol.
You can send anything you like over TCP - after all HTTP can send attachments, and that operates over TCP. FTP also operates over TCP, and that is designed purely for transferring "attachments". In order to do this, you would need to write your protocol so that it was able to tell the remote party that the following data was a file, then send the file data, then tell the remote party that the transfer is complete. Implementations of this are many and varied (the HTTP approach is completely different from the FTP approach) and your options are pretty much infinite.
I don't know for sure about the node.js connection limit, but I can say with a fair amount of confidence that it is limited by the operating system. This might help you get to grips with the answer to that question.
It is debatable whether it is more work to do it with TCP or HTTP - it's a lot of work to do it in both. I would probably lean more toward the TCP option being your best bet. While TCP would require you to design a protocol rather than/as well as an application, HTTP is not particularly suited to live, 2-way applications like chat servers. There are many implementations of chat over HTTP that use AJAX, but I can tell you from painful experience that they are a complete pain in the rear-end.
I would say that you should only be looking at HTTP if you are intending the endpoint (i.e. the client) to be a browser. If you are going to write a desktop app for the endpoint, a direct TCP link would definitely be the way to go. The main reason for this is that HTTP works in a request-response manner, where the client sends a request to the server, and the server responds. Over TCP you can open a single TCP stream, that can be used for bi-directional communication. This means that the server can push an event to the client instantly, while over HTTP you have to wait for the client to send a request, so you can respond with an event. If you were intending to use a browser as the client, it will make the whole file transfer thing much more tricky (the sending at least).
There are ways to implement this over HTTP using long-polling and server push (read this) but it can be a real pain to implement.
If you are going to implement this on a LAN (or possibly even over the internet) it is worth considering UDP over TCP - in a chat application it is not usually absolutely mission critical that messages arrive in the right order, and even if it was, users would probably not be able to type faster than the variations in network latency (probably <100ms). Then for file transfers you could either negotiate a seperate TCP socket for the data exchange (like FTP), or implement some kind of UDP ACK system (like TFTP).
I feel there is a lot more to say on this subject but right now I can't put it into words - I may extend this answer at some point.
Chat servers are the Hello World program in node. Use http.
As far as the question of how many concurrent connections can it handle, that all depends on your system. Set up a simple chat server and then try benchmarking it.
Also, check out http://search.npmjs.org/ and search for chat for a few pointers.

HTTP push over 100,000 connections

I want to use a client-server protocol to push data to clients which will always remain connected, 24/7.
HTTP is a good general-purpose client-server protocol. I don't think the semantics possibly could be very different for any other protocol, and many good HTTP servers exist.
The critical factor is the number of connections: the application will gradually scale up to a very large number of clients, say 100,000. They cannot be servers because they have dynamic IP addresses and may be behind firewalls. So, a socket link must be established and preserved, which leads us to HTTP push. Only rarely will data actually be pushed to a given client, so we want to minimize the connection overhead too.
The server should handle this by accepting the connection, inserting the remote IP and port into a table, and leaving it idle. We don't want 100,000 threads running, just so many table entries and file descriptors.
Is there any way to achieve this using an off-the-shelf HTTP server, without writing at the socket layer?
Use Push Framework : http://www.pushframework.com.
It was designed for that goal of managing a large number of long-lived asynchronous full-duplex connections.
LightStreamer (http://www.lightstreamer.com/) is the tool that is made specifically for PUSH operations of HTTP.
It should solve this problem.
You could also look at Jetty + Continuations.

Resources