What classifies as a Protocol? - uri

I'm trying to explain the difference between schemes and protocol in a URI for a small web video.
I'd gotten up to the point where I can clarify that "Not all schemes are grouped with a protocol by looking at the File: scheme in RFC here.
In the RFC document, it refers us to a discussion of operations that can be done with the file scheme. Which then says:
See the POSIX file and directory operations [POSIX] for examples of
standardized operations that can be performed on files.
If a protocol is "A system of rules or agreed procedures" what do they mean by "standardized operations"? Aren't they also agreed with procedures on how to handle something?
I can't go any further because there is no link in the POSIX section, but what I'm really trying to find out is if I can say this in my video without anyone yelling at me:
"Not all schemes have been given a protocol!
So several different operations that are not protocols happen instead." (But what are they? Is this statement correct or wrong?) <-----
because it sounds to me like those other operations that happen (e.g. on the file: scheme) can also possibly be protocols since they are standard of something.
Or does having a protocol mean that there is only one agreed way to do something and it should not be open to other multiple operations being allowed to happen?
Question:
When there is no protocol coupled with a scheme, is the fallback other protocols or are they just other operations that have standards that we agreed upon? (like a protocol? if so what distinguishes the two?)
Update:
In the end my research for Protocols vs Other scheme operations I've had concluded me to say that Protocols are different as they stand out to be a system of rules or agreed procedures for communication or the transfer of information between two or more entities, computer systems, or tools.
While just being broad and saying other possible actions may happen with schemes such as the file scheme. (I still have no idea if some of these actions qualify as more than one possible protocol or if they really are something different entirely and not protocols at all)
As someone that wants to be extra certain with the definition of a protocol vs standard operation, I'm hoping I can get another expert opinion as to saying my above conclusion is correct or wrong.
(Protocol = Communication)
(+ hopefully an example of a standard operation for File scheme that makes it different from a protocol - or if it is a mix of protocols and other functions that are not considered as protocols)

Scheme is a concrete syntax definition. Scheme is used in different place and context in programming world. But as per the context you are referring to Scheme can have different usages
URI Scheme (https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml#uri-schemes-1)
Authentication Scheme (https://learn.microsoft.com/en-us/dotnet/framework/wcf/feature-details/understanding-http-authentication#http-authentication-schemes)
As you can see from above two Scheme is used while defining a standard way writing URL, it is also used for defining types of activities/approaches/mechanism you can use. Another funny part is in the Wikipedia URI article (https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Syntax)
hierarchical part
┌───────────────────┴─────────────────────┐
authority path
┌───────────────┴───────────────┐┌───┴────┐
abc://username:password#example.com:123/path/data?key=value#fragid1
└┬┘ └───────┬───────┘ └────┬────┘ └┬┘ └───┬───┘ └──┬──┘
scheme user information host port query fragment
urn:example:mammal:monotreme:echidna
└┬┘ └──────────────┬───────────────┘
scheme path
Here you can see even what we call protocol, HTTP Protocol, HTTPS protocol is referred to as scheme in the URI. When we generally talk about HTTP protocol we just talk about a http:// url for the server. But the protocol itself is much more than that
A protocol is a set of rules and guidelines for communicating data. Rules are defined for each step and process during communication between two or more computers. Networks have to follow these rules to successfully transmit data.
There are below recommended WIKIs that you should go through
https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol
https://en.wikipedia.org/wiki/Application_layer
About the file URI scheme that you mentioned
https://www.rfc-editor.org/rfc/rfc8089#ref-POSIX
The scheme defines how to write the url. Which is nothing but
file://host/path
Now the set of operations possible on this different than what you have on a file, you can use move, rename, change permission on a local file but you can't do those using a file uri. Now this is not a protocol as such it is possible operations using instance of file uri
Unfortunately I feel both the question and this answer are just a bit broad or vague

Basically protocol is set of rules that governs the communication between machines on a network reference where as scheme provides whether the information(resource) is accessed/transferred over network or not.
ex.
file://tmp/test
news://news.server.example/example.group.this

Related

Must a RESTful service always be based on HTTP?

Lots of websites (e.g. twitter, stackexchange) provide RESTful OPEN APIs based on the HTTP protocol. Can I design a RESTful service based on some other protocol (such as raw TCP)?
The short answer is that a RESTful service does generally imply HTTP, but it's not strictly necessary. The wikipedia entry includes a section on implementations outside the web, though it's pretty brief and really only talks about Common Management Information Protocol (CMIP).
Realistically, to most developers, RESTful services operate over HTTP.
You could surely take inspiration from RESTful protocols on the web and build your own similar protocol over raw TCP, but you may well finding yourself implementing it in the language of HTTP. At that point you may want to ask yourself why you didn't just use HTTP in the first place.
If you look at Roy Fielding's PhD thesis, you'll see that REST is defined in chapter 5, while it's applied to HTTP in chapter 6.
"Representational state transfer" is indeed quite abstract. There's no reason you couldn't apply it to your own adhoc protocol. The aim is to make it stateless, to have safe read methods (that are cacheable), and if possible idempotent write methods.
If you adhere to the true tenants of the architecture where every operation is ignorant of historical operations you could probably drum up something a bit different. Currently the easy Put, get, post and delete operations lend well to http based service calls.

How to author an Internet protocol?

We're all familiar with popular protocols like IMAP and POP, used for email messaging.
I have a plan for a new protocol, but I'm not sure to go about implementing it.
Is the protocol a collection of C source code, for example, that accepts and sends data through ports? Or is a protocol just a thorough description of how data should be sent, which clients then implement?
I'm lost where to start here, and I'm not very familiar with how the protocol system works.
Edit:
Also, if I write a protocol and it isn't made official by the standards group, can people/clients still implement it?
The official way is to write an RFC - a Request for Comments. People will respond to that (that's why it's an RFC) and probably try to implement your protocol.
As soon as two independent implementations exist that completely support the protocol, it's a new standard.
Of course, people aren't going to implement a new protocol for someone just for fun. So you should first find a group who is interested in listening to you. Maybe there already is a protocol which does what you want (or can easily be extended).
But you probably don't want to invent a new standard. Standards are a lot of work and - for some - overrated.
So you should describe how it works and create a library that can read and write the protocol, so developers can use it even though it's not an official standard.
As you are interested in the Replace Email section of the Paul Graham article you linked, then IMHO you will need to both develop a protocol definition, and also provide an example implementation. The protocol definition does not need to be published as an internet protocol standard in order to be useful.
You will need an implementation to so that you can test, refine and improve the ideas. It is extremely unlikely the protocol will be right at the first attempt, and you'll need something to support the initial users.
You don't need a protocol definition to implement an improved email, but you will need one if you expect others to work with you and adopt it, though it very much depends on your 'business model'. I strongly recommend you have a protocol definition from the start, even if only to keep yourself sane when you try to produce the second implementation.
I recommend having a look at some examples of sneaky approaches to protocols and implementation. My favourite is described in the Viewpoints Research 2008 Progress report on a super-compact approach to TCP/IP.
They did not follow the traditional approach to developing the implementation of a protocol (the protocol stack). Instead they wrote code which parsed the human-readable TCP/IP protocol specification, and generated the code of a TCP/IP stack from that protocol document. The usual TCP/IP stack is about 40,000 lines of code, or more. Their program, which read the protocol specification, and generated the code for a TCP/IP stack 'automatically' was only 160 lines of code. They use extremly powerful programming tools.
If you had an approach like that, you could keep the protocol implementation synchronised with the specification, and potentially make it straightforward for others to adopt your protocol.
HTH
You are confusing a protocol standard with the implementation.
These 2 are unrelated.
A protocol is described in a high level but has enough information for someone to undestand how it should be implemented.
The idea is that someone reading the document can understand how/what to implement in any language of preference
To give an example: SIP protocol in the RFC describes the various flows and also has the various messages and how they are supposed to b processed i.e. the semantics well defined.
You can implement a SIP UA or Server in C++ or Java. This is irrelevant to the SIP protocol
For this you don't need to provide any source code (you could though if you think it helps clarify some obscurity of the description).
The most important part is that your protocol is actually reviewed by stakeholders i.e. people that expect it to solve their problems.
This part is the most important not only because it could solve problems in your protocol but because they can actually verify that the concept is solid i.e. can be technically implemented
The only case that one could specify something concrete or imply something is if for example the protocol described something demanding some specific constraints e.g. hard-real time constraint which could serve as "hint" on which implementation/languages to avoid
Also, if I write a protocol and it isn't made official by the
standards group, can people/clients still implement it?
Strange question.What do you mean?How will someone know your protocol exists?
If it is official he can get it from the standards group to implement it.
Otherwise it is obvious that you have some sort of "proprietary" protocol (which is not uncommon e.g. a company can have an internal protocol for its own software) and people have to get the spec from you.

Why do we use HTTP and not remote invocations?

Hey,
first of all this is a conceptional question and I do not know if StackOverflow is the appropriate place - so my apologies if I am wrong.
Nowadays the web is not only used for passing raw informations. Many and especially complex web applications are in use. These web application seem to be so complex that it seems irrational to use the HTTP protocol, which is based on so simple data exchange, plus it is stateless.
Would it not be more convincing to use remote invocations for this web applications? The big advantage to my mind is a unified GUI by using HTML. But there are applications, which have no need for a graphical interfaces and then it comes to a point where the HTTP protocol is really cumbersome.
Short answer: HTTP is allowed through firewalls where other protocols would be blocked.
A short partial answer is: first, for historical reasons - HTTP was used since the dawn of the web as protocol for requesting documents, and has since been used for some different purposes. One reason to keep using it is that it is generally served on port 80 which you can be sure won't be blocked by firewalls between your client and the server. The statelessness of the protocol may not always be what you want, but it has at least the advantage of protecting the server side from very trivial overloading problems.
OS independence
firewall passing
the web server is already a well understood and mostly "solved" problem in terms of load balancing, server fall over, etc.
don't have to reinvent the wheel
Other protocols are being used more and more now, including remote invocations and (the one I am particularly familiar with) WCF (which allows binary TCP/IP data transfer).
This allows data to travel faster for applications which require more bandwidth. For example an n-tier application may use WCF binary transfer between application and presentation tiers. Also public web services allow multiple protocols, including binary.
For data transfer protocols, firewalls should be configured (ie. expose a port specifically for your application), not worked around, I would not recommend using a protocol because firewalls do not block it.
The protocol used really depends on who will consume it and what control you have over the consumption - eg external third parties may need a plain-text version with a commonly agrreed data interface. On the other hand, two tiers in a single web application may be able to utilise binary data transfer for performance and security.

How are network protocols implemented?

I know that a protocol is a set of rules that governs communication between two computers on a network, but how are thoses rules implemented for the computer? Is a protocol basically a piece of code or, in other words, software?
Protocols are generally built upon each other. At the risk of sounding pedantic, here's an example of a protocol and where/how it's implemented:
Application Protocol - the way a particular application talks to another instance of itself or a corresponding server; this is implemented in the application code or a shared library
TCP (or UDP, or another layer) - the way that information is sent at the binary level and split up into usable chunks, then reassembled at the destination; this is usually implemented as part of the operating system, but it is still software code
IP - the way that information (having already been split or truncated by something like TCP or UDP) makes its way from one place to another by routing over one or more "hops"; this is always software code, but is sometimes implemented in the OS and sometimes implemented in the network device (your LAN card, for example)
base-T (ethernet), token ring, etc - Here we are physically getting into how the hardware talks to one another; ie, which wire corresponds to a particular type of signal; this is always implemented in hardware
electricity /photons - the laws that govern (or at least define) how electrons (or photons) flow over a conductive material or over the air; this is usually implemented in hardware ;)
In a sense, these are all "protocols" (a set of rules or expected behaviors that allow communication to take place), and they're built on one another.
Bear in mind that (aside from electricity) this is not an exhaustive list of the sort of protocols that exist at any of these layers!
Edit Thanks to dmckee for pointing out that electricity isn't the only physical process used in networking ;)
Networking protocols are not pieces of code or software, they are only a set of rules. When software uses a specific networking protocol, then the software is known as an implementation. There can be many different software implementations of the same protocol (i.e. Windows and UNIX have different TCP/IP implementations). It is possible to understand networking protocols without any knowledge of programming.
EDIT: How are they implemented? Here's a paper on taking an abstract specification of a protocol and implementing it into C. You'll see that less-strict protocols leave out certain details that programmers have to guess on, which makes some implementations incompatible with others.
A network protocol is basically like a spoken language. It is implemented by code that sends and receives specially prepared messages over the network/internet, much like the vocal chords you need to speak (the network and hardware) and a brain to actually understand what someone said (the protocol stack/software).
Sometimes protocols are implemented directly on the hardware [for speed reasons] (like the Ethernet protocol for LANs) - but it is always software/code required to do something useful with a protocol.
This might be interesting for you:
The OSI Model
Protocol (Computing)
Software implements the rules defined in the protocol, some protocols are formal defined and some informal.
a protocol is a set of rules governing the communication between two entities.
in the computer/programming context, a protocol is a set of rules governing the communication between two programs.
in the computer network context, a protocol is a set of rules governing the communication between two programs, well, over network.
in computers, in the end everything is embodied in code...
Protocols are basically set of rules. The way to implement them is to first of all make a state machine diagram as it completely tells that what is going to be the current state and how the state is going to change on the basis of input and what output actions are going to be performed.
Your answer is a very short one:
BY READING THE RFC.
The main networking problem is to share data between computers. All the networking protocols try to solve is a little part of that major problem. Some of them (the protocols) are implemented as software, some others as hardware. In short, protocols like algorithms, can be implemented it in many programming languages.
Back to the TCP, it is implemented by the operating system.

Why HTTP protocol is designed in plain text way?

Yesterday, I have a discussion with my colleagues about HTTP. It is asked why HTTP is designed in plain text way. Surely, it can be designed in binary way just like TCP protocol, using flags to represents different kinds of method(POST, GET) and variables (HTTP headers). So, why HTTP is designed in such way? Is there any technical or historical reasons?
A reason that's both technical and historical is that text protocols are almost always preferred in the Unix world.
Well, this is not really a reason but a pattern. The rationale behind this is that text protocols allows you to see what's going on on the network by just dumping everything that goes through. You don't need a specialized analyzer as you need for TCP/IP. This makes it easier to debug and easier to maintain.
Not only HTTP, but many protocols are text based (e.g., FTP, POP3, SMTP, IMAP).
You might want to take a look at The Art of Unix Programming for a much more detailed explanation of this Unix thing.
With HTTP, the content of a request is almost always orders of magnitude larger than the protocol overhead. Converting the protocol into a binary one would save very little bandwidth, and the easy debugability that a text protocol offers easily trumps the minor bandwidth savings of a binary protocol.
Many Internet application protocols use more or less plain text for the protocol (see FTP, POP, SMTP, etc.).
It makes interoperability and troubleshooting much easier.
HTTP stands for "Hypertext Transfer Protocol".
It was initially devised as a way to serve text documents, hence the text based protocol.
What we do with HTTP now is far beyond its original intent.
As with RFC 2616 section 3.7.1 for HTTP 1.1, the key identifier to a line of command or header is the text line-break CRLF; text-based application protocols makes it easier to carry out a conversation (for troubleshooting) purely with a Telnet client. It also makes it easier to program with ReadLine() calls and matching text strings.
The CRLF parameter break also gives near-unlimited abitrary header extensions unlike a fixed-size TCP or IP headers where one hard-codes by bit offsets.
So it's easier to "read" the traffic or create a client or server?
You can debate whether it actually makes it easier, but surely that was the intent.
In the case of http ,some people work on a "binary" version of it, they called it Embedded Binary HTTP (EBHTTP)
https://datatracker.ietf.org/doc/html/draft-tolle-core-ebhttp-00
Historically, it all starts from RFC822 (STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES), whose latest version is RFC5322 (Internet Message Format). SMTP (RFC 821) was one of the most popular protocol based on RFC822. And, HTTP was born out of SMTP (your mail protocol).
I like the:
...preferred in the Unix world.
reason, but it doesn't go into any explanation for why.
In order to understand why you need to place yourself into the shoes of a designer that wants to make a usable product.
A) You can document the shit out of meaningless gibberish (binary).
B) Develop or hope others develop tools that portray your meaningless gibberish in a meaningful way.
or
A) You can document the shit out of meaningful text that takes advantage of language as a tool for a self-documenting protocol.
B) There is no immediate need for additional tools, and additional tools will be much easier to write and debug.
It creates staged delivery and creates something that is easier to comprehend & recall when doing future development. It also creates a situation where a higher level abstraction is no longer necessary.
Imagine a world where setting a header value isn't as simple as dictionary/Map somewhere in your framework. When running into errors you'd have to constantly question whether or not your framework is correct or not, because you couldn't easily see it's doing the right thing without additional tools. That would be the world of HTTP if each framework needed to invent/implement it's own higher level abstraction (browsers come to mind).
Many protocol designer's want efficiency, this design focuses on usability, which is paramount in the software development industry. Unusable tools that are prematurely optimized create an unnecessary burden for software developers, and this burden manifests across the board.
Now,HTTP/2 based Binary,it is much less error-prone.
https://http2.github.io/faq/#why-is-http2-binary

Resources