**Reading the URL from a browser using Qt** - qt

I'm trying to develop an application that listens on a specific port (for example 9999) on localhost. How can I retrieve the URL when the user types 127.0.0.1:9999/somedir into their web browser?

To retrieve the URL you would have to implement some pieces of the HTTP protocol.
This is the official documentation of the HTTP protocol.
If you just want the path of the entered URL, you only need to parse part of the request data. The following is an example of an HTTP request made by a browser:
GET /index.html HTTP/1.1
Host: www.example.com
The first word on the first line is the method (the command to be performed). Next comes the path on the server, and then the protocol and its version. The next line (in this example) specifies the host. This lets one server provide many web sites; the feature is called virtual hosting.
It is important to note that each line of an HTTP request or response ends with the \r\n (CRLF) characters, and an empty line marks the end of the headers.
Take a look at the HTTP article on Wikipedia. It is a good starting point for implementing some very basic functionality.
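In Qt this logic would typically live in a slot connected to the readyRead() signal of a QTcpSocket obtained from a QTcpServer. To keep the sketch self-contained it uses Python's standard socket module instead. It reads the request head up to the blank line and splits the request line into method, path and version. Port 9999 comes from the question; the function names are hypothetical, and this is a minimal illustration rather than a complete HTTP server.

```python
import socket


def parse_request_head(raw: bytes) -> tuple:
    """Split an HTTP/1.x request head into (method, path, version, headers)."""
    # The head ends at the first empty line; anything after it is the body.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    # Request line: METHOD SP request-target SP HTTP-version
    method, path, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


def serve_once(port: int = 9999) -> str:
    """Wait for one request on 127.0.0.1:port, reply, and return the requested path."""
    with socket.create_server(("127.0.0.1", port)) as server:
        while True:
            conn, _ = server.accept()
            with conn:
                data = b""
                # Read until the blank line that ends the headers (no size limit: sketch only).
                while b"\r\n\r\n" not in data:
                    chunk = conn.recv(4096)
                    if not chunk:
                        break
                    data += chunk
                if not data:
                    # A browser may open a connection and close it unused; wait for the next one.
                    continue
                _method, path, _version, _headers = parse_request_head(data)
                body = f"You asked for {path}\n".encode("utf-8")
                conn.sendall(
                    b"HTTP/1.1 200 OK\r\n"
                    b"Content-Type: text/plain; charset=utf-8\r\n"
                    + f"Content-Length: {len(body)}\r\n".encode("ascii")
                    + b"Connection: close\r\n\r\n"
                    + body
                )
                return path
```

Calling `serve_once()` and then opening 127.0.0.1:9999/somedir in a browser should return `"/somedir"`. A real server would loop over connections instead of stopping after one, since browsers often follow up with extra requests such as /favicon.ico.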

Related

How to act as a middleman server to add HTTP headers between client and remote server?

I have a server which acts as a middleman between an HTTP client that I don't control and a remote file hosting server I don't control. I want to expose a URL through which the client can download a chunk (specified by HTTP range headers my server provides) of a file on the remote server.
There are two important constraints here: I'd like to facilitate this partial download without having the response flow back through my server (response goes straight to client) and without writing a custom client. How can I accomplish this?
One option I tried was having my endpoint send a redirect response with the range headers set on the response, but unfortunately those are not carried over to the client's subsequent request, and as a result the entire file is downloaded. Are there any other hacky tricks or network wizardry I can employ to achieve this given the constraints?
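The redirect attempt fails because Range is a request header: the client has to send it itself. A 302 response only hands the client a new URL in Location, and the client composes its follow-up request from that URL and its own settings, so a Range line placed on the redirect response is not copied into it. The sketch below builds both messages as raw bytes to make the difference visible. The host name, path and function names are hypothetical.

```python
def range_request(host: str, path: str, start: int, end: int) -> bytes:
    """GET request asking for bytes start..end (inclusive) of path on host."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    # The Range header travels with the request; a server that honours it
    # answers 206 Partial Content instead of 200 OK.
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Range: bytes={start}-{end}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def redirect(location: str) -> bytes:
    """A 302 redirect: Location is the only instruction for the client's next request."""
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: {location}\r\n"
        "Content-Length: 0\r\n\r\n"
    ).encode("ascii")
```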
I have also been thinking about this for five days. As I understand it, the remote server only hands out the file when the request carries the required headers, and denies the request without them. A middleman that receives the request with the required headers can fetch the file and pass it on to the client, but you want the client to get the file directly from the remote server, not through your own server that adds the headers on the client's behalf.

URLs and HTTP protocol

I am currently learning how messages are transferred via URL to a host server. What I have learned so far is how a URL is composed: http://example.com:80/latest/example.jpg?d=400x400 gives me the image example.jpg in the requested dimensions from the host via port 80 (which can be left out, as 80 is the default port for HTTP). The request message for this would look like this:
GET /latest/example.jpg?d=400x400 HTTP/1.1
The response message would look like this: HTTP/1.1 200 OK.
So it is clear to me how to GET some resource from a Host. But what's with the other HTTP methods like PUT, POST or DELETE? I don't understand where in the URL the HTTP method is carried for the host to read. How do I tell the host to PUT instead of GET?
There seems to be a small misconception about URLs and the corresponding requests.
The URL http://example.com:80/latest/example.jpg?d=400x400 is composed of 5 pieces:
The protocol used (in your case http)
The FQDN, or fully qualified domain name (in your case example.com)
The port on the FQDN (in your case 80), which is unnecessary here because your browser defaults to port 80 for http
Your requested resource (in your case /latest/example.jpg)
Your requested query string parameters, introduced by ? (in your case the parameter d with the value 400x400)
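Python's standard urllib.parse module can pull those five pieces out of the example URL, which is a quick way to check how a given URL will be split. The function name and the dictionary keys are illustrative choices for this sketch.

```python
from urllib.parse import urlsplit, parse_qs

# Default ports used when the URL leaves the port out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def url_pieces(url: str) -> dict:
    """Split a URL into protocol, FQDN, port, resource and query parameters."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "fqdn": parts.hostname,
        # .port is None when the URL omits it; fall back to the scheme's default.
        "port": parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme),
        "resource": parts.path,
        "query": parse_qs(parts.query),
    }
```

Only the resource and the query string end up in the request line; the FQDN is used for the DNS lookup and the Host header, and the port for the TCP connection.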
Note that the request message only looks like you outlined because your browser defaults to the GET method of HTTP. As you correctly stated, there are various HTTP methods, such as PUT, POST, PATCH, DELETE, etc.
The HTTP method is stated in the first line of the request message (the request line), not in the URL, so it's up to whoever sends the request which HTTP method is invoked.
For "well-known" internet surfing, a URL typed into the address bar always results in a GET request. For the other HTTP methods, it's up to the application (e.g. your website or any other software that makes HTTP requests) to use them. For example, HTML lets you specify the HTTP method on a <form> tag, e.g. you can tell it to use POST.
To sum it up: your URL does not specify the HTTP method.
Browsers default to GET, but in the end it's up to your application (and thus the logic behind it) which HTTP method is used.
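To see where the method lives, the sketch below builds raw HTTP/1.1 requests for the same URL with different methods: only the first word of the request line changes. The function name is hypothetical and the header set is deliberately minimal.

```python
from urllib.parse import urlsplit


def build_request(method: str, url: str, body: bytes = b"") -> bytes:
    """Raw HTTP/1.1 request: the method goes in the request line, not in the URL."""
    parts = urlsplit(url)
    # Request target = path plus query string; "/" if the URL has no path.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [f"{method} {target} HTTP/1.1", f"Host: {parts.netloc}"]
    if body:
        lines.append(f"Content-Length: {len(body)}")
    lines.append("Connection: close")
    # Headers end with an empty line; the body (if any) follows it.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body
```

With a real HTTP client library the method is likewise a separate argument, for example `http.client.HTTPConnection("example.com").request("PUT", "/latest/example.jpg", body=b"...")` in Python's standard library.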

How to check the communication options of an entire web server using the HTTP OPTIONS method?

According to the documentation of the HTTP OPTIONS method, one can check the communication options of an entire web server, assuming the server supports such a check. My understanding is that one needs to make an HTTP request to the server to be checked with the first line of the request being OPTIONS * HTTP/1.1. How can one make such a request with a common HTTP client? I wasn't able to do it with the Postman client or the Requests HTTP client library for Python. Specifically, specifying the asterisk * along with the server's location, http://<host>/*, for example, didn't work.
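The * is not a path, so it cannot be written into a URL; http://<host>/* asks for a resource literally named "*". One way around client URL handling is to write the request line yourself over a plain socket. This sketch assumes plain HTTP without TLS and uses hypothetical function names; servers differ in how they answer, some returning an Allow header and others rejecting OPTIONS * with an error status.

```python
import socket


def options_star_request(host_header: str) -> bytes:
    # "*" as the request target means "the server as a whole", not a resource.
    return (
        "OPTIONS * HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def send_options_star(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Send OPTIONS * over plain HTTP (no TLS) and return the response head."""
    host_header = host if port == 80 else f"{host}:{port}"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(options_star_request(host_header))
        data = b""
        while b"\r\n\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    # Status line plus headers; an Allow header, if present, lists supported methods.
    return data.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
```

Usage: `print(send_options_star("example.com"))`. From the command line, curl can send the same request line through its --request-target option, e.g. `curl -X OPTIONS --request-target "*" http://example.com`.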

What is the correct way to render absolute URLs behind a reverse proxy?

I have a web application running on a server (let's say on localhost:8000) behind a reverse proxy on that same server (on myserver.example:80). Because of the way the reverse proxy works, the application sees an incoming request targeted at localhost:8000, and the framework I'm using therefore generates absolute URLs that look like localhost:8000/some/resource instead of myserver.example/some/resource.
What would be "the correct way" of generating an absolute URL (namely, determining what hostname to use) from behind a proxy server like that? The specific proxy server, framework and language don't matter, I mean this more in an HTTP sense.
From my initial research:
RFC7230 explicitly says that proxies MUST change the Host header when passing the request along, to make it look like the request came from them. That would suggest Host is the wrong thing to use when determining the hostname for the URL, yet in most places I have looked, the general advice seems to be to configure the reverse proxy not to change the Host header (counter to the spec) when passing the request along.
RFC7230 also says that "request URI reconstruction" should use the following fields in order to find what "authority component" to use, though that seems to also only apply from the point-of-view of the agent that emitted that request, such as the proxy:
Fixed URI authority component from the server or outbound gateway config
The authority component from the request's first line if it's a complete URI instead of a path
The Host header if it's present and not empty
The listening address or hostname, along with the incoming port number if it's not the default one for the protocol
HTTP 1.0 didn't have a Host header at all, and that header was added for routing purposes, not for URL authority resolution.
There are headers made specifically to let proxies send the old value of Host after routing, such as Via, Forwarded and the unofficial X-Forwarded-Host, which some servers and frameworks check, but not all, and it's unclear which one should take priority given that there are three of them.
EDIT: I also don't know whether HTTPS would work differently in that regard, given that the headers are part of the encrypted payload and routing has to be performed another way because of this.
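As an illustration of reading those headers, here is a best-effort lookup of the client-facing host. The priority order (Forwarded, then X-Forwarded-Host, then Host) is an arbitrary choice for this sketch, not something a specification mandates. Via is left out because it records the proxies a message passed through rather than the original host. The function name and the lower-cased header dictionary are assumptions, and these headers should only be trusted when your own proxy sets them, since clients can forge them.

```python
from typing import Optional


def forwarded_host(headers: dict) -> Optional[str]:
    """Guess the client-facing host; headers must use lower-cased names."""
    fwd = headers.get("forwarded")
    if fwd:
        # RFC 7239: comma-separated elements, each a ;-separated list of key=value.
        # The first element was added by the proxy closest to the client.
        # Naive split: does not handle commas or semicolons inside quoted strings.
        first = fwd.split(",")[0]
        for pair in first.split(";"):
            key, _, value = pair.strip().partition("=")
            if key.lower() == "host" and value:
                return value.strip('"')
    xfh = headers.get("x-forwarded-host")
    if xfh:
        # De facto header; may also hold a comma-separated list.
        return xfh.split(",")[0].strip()
    return headers.get("host")
```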
In general I find it’s best to set the real host and port explicitly in the application rather than try to guess these from the incoming request.
So for example Jira allows you to set the Base URL through which Jira will be accessed (which may be different to the one that it is actually run as). This means you can have Jira running on port 8080 and have Apache or Nginx in front of it (on the same or even a different server) on port 80 and 443.
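The same idea in a minimal form: read the public base URL from configuration once and build every absolute URL from it, ignoring whatever host the request arrived on. The environment variable name PUBLIC_BASE_URL and its default value are hypothetical.

```python
import os
from urllib.parse import urljoin

# Hypothetical setting: the URL under which the proxy exposes the application.
BASE_URL = os.environ.get("PUBLIC_BASE_URL", "http://myserver.example/")


def absolute_url(path: str, base_url: str = BASE_URL) -> str:
    """Absolute URL for path, built from the configured base, not the request."""
    # urljoin drops the last path segment of a base without a trailing slash,
    # so add one to keep any prefix such as "/app".
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, path.lstrip("/"))
```

For example, `absolute_url("/some/resource")` gives `http://myserver.example/some/resource` with the default base, regardless of the application listening on localhost:8000.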

Do resources in a URL path have their own IP address?

So, a DNS server recognizes https://www.google.com as 173.194.34.5
What does, say, https://www.google.com/images/srpr/logo11w.png look like to a server? Or are URL strings machine readable?
Good question!
When you access a URL, a DNS lookup is first done on the host part (www.google.com); after that, the browser looks at the protocol and connects using it (https in this case).
After connecting, the browser tells the server:
"Hi! I'm trying to connect to www.google.com and I would like the resource /images/srpr/logo11w.png." On the protocol level, this looks like:
GET /images/srpr/logo11w.png HTTP/1.1
Host: www.google.com
The Host line is an HTTP header. There are usually more headers.
So the short answer is:
The server will get access to both the hostname, and the full path the browser tried to access.
The URL https://www.google.com/images/srpr/logo11w.png consists of several parts:
protocol (https)
address of the server (www.google.com, which gets translated to an IP address)
path to the resource (/images/srpr/logo11w.png, in this example it seems like it would be an image in a directory srpr, which is in a directory images in the root of the website)
The server processes the path to the resource the user requested (via the GET method) based on various rules and returns a response.
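One common rule, the one a simple static file server applies, is to map the URL path onto a directory on disk. The sketch below shows that mapping, including a guard against ".." escaping the directory; the document root /var/www and the function name are hypothetical. Real servers add routing, rewrites and access checks on top, and this sketch does not deal with symlinks.

```python
import posixpath
from urllib.parse import urlsplit, unquote


def path_to_file(doc_root: str, url: str) -> str:
    """Map a URL's path to a file under doc_root, as a simple static server might."""
    # Only the path matters here; scheme and host were already used to reach the server.
    path = unquote(urlsplit(url).path)
    # Anchor at "/" so normpath cannot climb above the root: "/../x" becomes "/x".
    clean = posixpath.normpath("/" + path)
    return posixpath.join(doc_root, clean.lstrip("/"))
```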

Resources