How does URL rewrite work?


How does a web server implement the URL rewrite mechanism and change the address bar of browsers?
I'm not asking for specific information on configuring Apache, nginx, lighttpd or others!
I would like to know what kind of information is sent to clients when servers want to rewrite a URL.

There are two types of behaviour.
One is rewrite, the other is redirect.
Rewrite
The server performs the substitution itself, so that a URL like http://example.org/my/beautiful/page is understood as http://example.org/index.php?page=my-beautiful-page
With a rewrite, the client sees nothing: the redirection is internal only. The URL in the browser does not change; the server simply interprets it differently.
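A minimal Python sketch of such an internal rewrite, assuming a hypothetical rule that turns each pretty path into a page query parameter. The function only computes the internal URL the server dispatches to; nothing about it is ever sent to the browser.

```python
import re
from urllib.parse import urlencode

# Hypothetical rule: /my/beautiful/page -> /index.php?page=my-beautiful-page
PRETTY_PATH = re.compile(r"^/([a-z0-9]+(?:/[a-z0-9]+)*)$")

def rewrite(path: str) -> str:
    """Map a public 'pretty' path to the internal script URL.

    Only the server sees the result; the browser keeps showing `path`.
    """
    match = PRETTY_PATH.match(path)
    if match is None:
        return path  # no rule applies: dispatch the path unchanged
    slug = match.group(1).replace("/", "-")
    return "/index.php?" + urlencode({"page": slug})

print(rewrite("/my/beautiful/page"))  # /index.php?page=my-beautiful-page
```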
Redirect
The server detects that the requested address is not one it wants to serve. For example, http://example.org/page1 has moved to http://example.org/page2, so the server tells the browser, with an HTTP 3xx status code, what the new page is. The client then requests that page instead. Therefore the address in the browser changes!
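A rough Python sketch of the redirect side, assuming a hypothetical MOVED table: instead of content, the server returns a 3xx status plus a Location header, and it is the browser's follow-up request that changes the address bar.

```python
from http import HTTPStatus

# Hypothetical table of moved pages; a real server would read this from config.
MOVED = {"/page1": "http://example.org/page2"}

def handle(path: str) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a request path."""
    target = MOVED.get(path)
    if target is not None:
        # 3xx + Location: the browser makes a second request to `target`,
        # which is why its address bar changes.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": target}, b""
    return HTTPStatus.OK, {"Content-Type": "text/plain"}, b"content of " + path.encode()

print(handle("/page1")[:2])  # (<HTTPStatus.MOVED_PERMANENTLY: 301>, {'Location': 'http://example.org/page2'})
```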
Process
The process remains the same and is well described by this diagram:
Remark: every rewrite/redirect triggers a new pass through the rewrite rules (with exceptions, IIRC), so
RewriteCond %{REDIRECT_URL} !^$
RewriteRule .* - [L]
can be useful to stop loops, since it prevents any further rewrite once one has already happened.
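To see why the guard matters, here is a toy Python model (not how Apache is actually implemented) in which every internal redirect re-runs the whole rule set. The catch-all rule and the env dictionary are assumptions standing in for mod_rewrite's rules and its REDIRECT_URL variable.

```python
import re

# Hypothetical catch-all rule. Its output matches its own pattern, so
# without a guard every pass would rewrite the URL again.
RULES = [(re.compile(r"^(.*)$"), r"/index.php?path=\1")]

def run_rules(path: str, guard: bool = True, max_passes: int = 10) -> str:
    """Toy model of per-directory rewriting, where every internal
    redirect re-runs the rule set against the new URL."""
    env: dict[str, str] = {}
    for _ in range(max_passes):
        # Mirrors: RewriteCond %{REDIRECT_URL} !^$ + RewriteRule .* - [L]
        if guard and env.get("REDIRECT_URL"):
            return path
        for pattern, substitution in RULES:
            new_path = pattern.sub(substitution, path, count=1)
            if new_path != path:
                env["REDIRECT_URL"] = path  # URL before the internal redirect
                path = new_path
                break
        else:
            return path  # no rule changed the URL: stop
    raise RuntimeError("rewrite loop: too many passes")

print(run_rules("/about"))  # /index.php?path=/about
```

With guard=False the same call keeps wrapping the URL in itself until the pass limit raises RuntimeError.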

Are you talking about server-side rewrites (like Apache mod_rewrite)? For those, the address bar does not generally change (unless a redirection is performed).
Or are you talking about redirections? These are done by having the server respond with an HTTP status code (301, 302 or 307) and the new location in the Location header.

There are two forms of "URL rewrite": those done purely within the server and those that are redirections.
If it's purely within the server, it's an internal matter and only matters with respect to the dispatch mechanism implemented in the server. In Apache HTTPD, mod_rewrite can do this, for example.
If it's a redirection, a status code implying a redirection is sent in the response, along with a Location header indicating to which URL the browser should be redirected (this should be an absolute URL). mod_rewrite can also do this, with the [R] flag.
The status code is usually 302 (Found), but it can be configured to use other codes (e.g. 301 or 307).
Another quite common use (often unnoticed because it's usually on by default in Apache HTTPD) is the redirection to the URL with a trailing slash on a directory. This is implemented by mod_dir:
A "trailing slash" redirect is issued when the server receives a request for a URL http://servername/foo/dirname where dirname is a directory. Directories require a trailing slash, so mod_dir issues a redirect to http://servername/foo/dirname/.
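The mod_dir check can be sketched in Python as follows. This is a simplified illustration with a hypothetical directory_slash helper, not mod_dir's actual code: it ignores query strings and does no path validation.

```python
import os
import tempfile

def directory_slash(docroot: str, host: str, url_path: str):
    """Return a redirect target if url_path names a directory but lacks the
    trailing slash; return None to serve the request normally.
    Illustration only: no query-string handling, no path-traversal checks."""
    fs_path = os.path.join(docroot, url_path.lstrip("/"))
    if os.path.isdir(fs_path) and not url_path.endswith("/"):
        return f"http://{host}{url_path}/"
    return None

if __name__ == "__main__":
    # Build a throwaway document root containing foo/dirname.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "foo", "dirname"))
        print(directory_slash(root, "servername", "/foo/dirname"))   # http://servername/foo/dirname/
        print(directory_slash(root, "servername", "/foo/dirname/"))  # None
```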

Jeff Atwood had a great post about this: http://www.codinghorror.com/blog/2007/02/url-rewriting-to-prevent-duplicate-urls.html
How does a web server implement the URL rewrite mechanism and change the address bar of browsers?
URL rewriting and forwarding (redirecting) are two completely different things. A server has no control over your browser, so it can't change the URL in your browser, but it can ask your browser to go to a different URL. When your browser gets a response from a server, it's entirely up to your browser to decide what to do with that response: it can follow the redirect, ignore it, or be really mean and spam the server until the server gives up. There is no "mechanism" that the server uses to change the address; it's simply a protocol (HTTP/1.1) that the server abides by when a particular resource has been moved to a different location, hence the 3xx responses.

URL rewriting can transform URLs purely on the server side. This gives web application developers the ability to make web resources accessible from multiple URLs.
For example, the user might request http://www.example.com/product/123 but, thanks to rewriting, is actually served a resource from http://www.example.com/product?id=123. Note that there is no need for the address displayed in the browser to change.
The address can be changed if so desired. For this, a mapping similar to the one above happens on the server, but rather than rendering the resource back to the client, the server sends a redirect (an HTTP 301 or 302 status code) for the rewritten URL.
For the example above this might look like:
Client request
GET /product/123 HTTP/1.1
Host: www.example.com
Server response
HTTP/1.1 302 Found
Location: http://www.example.com/product?id=123
At this point, the browser will issue a new GET request for the URL in the Location header.
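The exchange above can be reproduced with Python's standard library. The sketch below assumes a throwaway local server on 127.0.0.1 (www.example.com is only sent as the Host header) and uses http.client, which does not follow redirects, so the raw 302 and its Location header stay visible.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class ProductHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/product/"):
            product_id = self.path[len("/product/"):]
            # Redirect: tell the client where the rewritten URL lives.
            self.send_response(302)
            self.send_header("Location", f"http://www.example.com/product?id={product_id}")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

def fetch(path: str):
    """Send one GET to a temporary local server; return (status, Location)."""
    server = HTTPServer(("127.0.0.1", 0), ProductHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", path, headers={"Host": "www.example.com"})
        resp = conn.getresponse()
        resp.read()
        conn.close()
        return resp.status, resp.getheader("Location")
    finally:
        server.shutdown()
        server.server_close()

print(fetch("/product/123"))  # (302, 'http://www.example.com/product?id=123')
```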

Related

nginx redirect to hashbang URL without URL encoding

I have a rewrite rule that looks like the following:
rewrite ^/([a-zA-Z0-9_]+)$ /mysite/#!/$1/login;
The idea is that a shortcode like
/foo
gets redirected to
/mysite/#!/foo/login
However, nginx is redirecting to:
/mysite/%23!/foo/login
How do I prevent the URL encoding from happening in the rewrite?

I can reproduce this issue by using a reverse proxy.
Nginx is actually doing the right thing, as # is a reserved character for URIs and identifies the start of the fragment identifier.
The fragment identifier is for the browser's use only and is not usually received by the server in the requested URL. I am not sure how your Tomcat server is receiving requests containing a naked # in the first place.
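Python's urllib.parse illustrates both points: escaping the # (keeping ! unescaped to match the nginx output above), and where the path ends and the fragment begins when the # is left alone.

```python
from urllib.parse import quote, urlsplit

target = "/mysite/#!/foo/login"

# '#' is reserved, so a literal '#' in a path must be percent-encoded.
print(quote(target, safe="/!"))  # /mysite/%23!/foo/login

# Left unescaped, '#' starts the fragment, which the browser keeps to itself:
parts = urlsplit("http://example.org" + target)
print(parts.path)      # /mysite/
print(parts.fragment)  # !/foo/login
```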

ASP.NET Core 2.0 unauthorized redirect using path only

I have an application which is accessed via HTTPS, but is "reverse proxied" to the server using plain HTTP. It is set up on AWS as follows:
[BROWSER] --(https)--> [ELB] --(http)--> [SERVER]
Everything works fine except when a page is being accessed by an unauthenticated user: the server responds with an HTTP 302 redirect using the whole protocol://server/path string. Like so:
Location: http://my.server.com/Account/Login?ReturnUrl=%2F
The problem is, it specifies HTTP as the protocol (presumably because the ELB connects to the server using HTTP), so the browser redirects the request using HTTP and now an error occurs. Is there a way to customize the redirect so that it uses just the path, so that regardless of protocol or hostname it is redirected properly? Like so:
Location: /Account/Login?ReturnUrl=%2F
If this is not advisable, what can be done?
(note: I've checked other solutions posted on SO. All I've seen so far involve customizing the Path, not removing the protocol://hostname)
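One way to reason about this: RFC 7231 allows the Location header to carry a relative reference (the older RFC 2616 asked for an absolute URI), and the browser resolves it against the URL it actually requested. A small Python sketch using urljoin as a stand-in for the browser's resolution; resolve_location is a hypothetical helper, not part of ASP.NET.

```python
from urllib.parse import urljoin

def resolve_location(request_url: str, location: str) -> str:
    """Resolve a Location header the way a browser does: relative references
    are resolved against the URL the browser actually requested."""
    return urljoin(request_url, location)

# The browser talked to the ELB over HTTPS:
browser_url = "https://my.server.com/"

# Path-only Location keeps the browser's scheme and host:
print(resolve_location(browser_url, "/Account/Login?ReturnUrl=%2F"))
# https://my.server.com/Account/Login?ReturnUrl=%2F

# An absolute Location wins, so the scheme is downgraded to http:
print(resolve_location(browser_url, "http://my.server.com/Account/Login?ReturnUrl=%2F"))
# http://my.server.com/Account/Login?ReturnUrl=%2F
```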

Web: some nuances of the difference between forward and redirect

I'm starting to learn web programming. I've read about the difference between forward and redirect, but two questions are still not fully clear to me:
In which case does the process involve the server side, and in which case does it not?
When does the URL change and when does it not? Does the URL always change when redirecting? Does it never change when forwarding?
I would be very grateful for the clear answers and explanations! Thanks in advance!

They are not hard and fast terms.
A redirect usually means an HTTP redirect, which is an HTTP response that instructs the client to make a new HTTP request to a different URI.
An internal redirect is a common description of a redirect that is handled internally by the webserver / web application / etc and doesn't send the browser to a different URI.
Forward is not a particularly common term, but when I've encountered it it usually means a form of internal redirect.

A forward happens on the server side: the server passes the same request on to another resource. A redirect, by contrast, happens on the browser side: the server sends an HTTP 302 status code to the browser, so the browser makes a new request.
A redirect requires one more round trip from browser to server.
One more difference: a redirect is reflected in the browser's address bar, whereas a forward is not.
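A toy Python model of the two answers above, with hypothetical handlers and routes: the forward is one round trip and the final URL stays what the client asked for, while the redirect takes two round trips and the final URL (what the address bar would show) changes.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)

def show_profile(path):
    return 200, {}, "profile page"

def forward(path):
    # Forward: the server hands the same request to another handler.
    return show_profile("/profile")

def redirect(path):
    # Redirect: the server only answers "go elsewhere".
    return 302, {"Location": "/profile"}, ""

ROUTES = {"/me-forward": forward, "/me-redirect": redirect, "/profile": show_profile}

def browser_get(path, max_redirects=20):
    """Follow redirects like a browser; return (final URL, body, round trips)."""
    for trips in range(1, max_redirects + 2):
        status, headers, body = ROUTES[path](path)
        if status in REDIRECT_CODES:
            path = headers["Location"]  # new request, new address-bar URL
            continue
        return path, body, trips
    raise RuntimeError("too many redirects")

print(browser_get("/me-forward"))   # ('/me-forward', 'profile page', 1)
print(browser_get("/me-redirect"))  # ('/profile', 'profile page', 2)
```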

Is there such a thing as an HTTP URL rewrite without a 301 or 302 redirect?

Is there such a thing?
A way it might be used:
Many locations have forms that post to http://www.example.com/wally/app/receiver.aspx
Management decides they want a cleaner URL, and there is no reason to pretend you are using aspx (you didn't really think I was using aspx for that, did you?)
They say it should be http://example.com/receiver
Easy enough! Just put in a 301 redirect. No need to update all those forms that exist all over... but wait. You can't do that for POST.
Perhaps you can receive and handle the request and then rewrite the URL without causing a subsequent request? Perhaps this will not strip the www (cross-domain), but can it shorten the pathname like that without a separate request?
Even for GET requests, this would indeed be a performance boost if one could rewrite the URL and send the response body at the same time. Can this be done?

You cannot send content to the user and do a 301/302 (etc.) redirect at the same time: the browser interprets the HTTP response code and acts according to the code received. If it is 301/302, it will redirect; if it is 200, it will display the content to the user.
Is there such a thing as an HTTP URL rewrite without a 301 or 302 redirect?
Yes: it's called a rewrite (internal redirect). For example, the customer requests http://example.com/receiver. You rewrite the URL to point to /wally/app/receiver.aspx (e.g. RewriteRule ^receiver$ /wally/app/receiver.aspx [L], that is if you have Apache, which you most likely do not, considering receiver.aspx). This performs an internal redirect while the URL remains unchanged in the browser's address bar (works fine with both POST and GET methods).

Well, I guess the URL rewriting suggested by LazyOne is not the answer to the question, as he himself states that "This will do internal redirect when URL remains unchanged in browser address bar" (http://www.example.com/wally/app/receiver.aspx). Still, the question asks for "(...) it should be http://example.com/receiver".
I think the solution is to redirect the old URL to the new one using the 307 status code introduced in RFC 2616. User agents that handle version 1.1 of the HTTP protocol (I guess all popular browsers for some time now) should make the new request using the same HTTP method (POST in this case) as the original request.
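The method-handling rules browsers follow today (as described in the WHATWG Fetch standard's redirect handling) can be summarized in a short Python function. 308 is the permanent counterpart of 307 (RFC 7538); the function name is an assumption for illustration.

```python
def method_after_redirect(status: int, method: str) -> str:
    """Request method a modern browser uses when following a redirect."""
    if status in (307, 308):
        return method  # method and body are preserved
    if status == 303:
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302) and method == "POST":
        return "GET"  # historical behaviour: the POST body is dropped
    return method

print(method_after_redirect(301, "POST"))  # GET
print(method_after_redirect(307, "POST"))  # POST
```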

HTTP redirect: 301 (permanent) vs. 302 (temporary)

Is the client supposed to behave differently? How?

Status 301 means that the resource (page) is moved permanently to a new location. The client/browser should not attempt to request the original location but use the new location from now on.
Status 302 means that the resource is temporarily located somewhere else, and the client/browser should continue requesting the original URL.

When a search engine spider finds a 301 status code in the response header of a webpage, it understands that this webpage no longer exists. It looks for the Location header in the response, picks up the new URL, replaces the indexed URL with the new one, and also transfers PageRank.
So the search engine replaces every indexed URL that no longer exists (301 found) with the new URL. This retains your old webpage's traffic and PageRank and diverts them to the new one (you will not lose the traffic of your old webpage).
Browser: if a browser finds a 301 status code, it caches the mapping of the old URL to the new URL; the client/browser will not attempt to request the original location but will use the new location from now on, unless the cache is cleared.
When a search engine spider finds a 302 status for a webpage, it will only redirect temporarily to the new location and crawl both pages. The old webpage URL still exists in the search engine database, and the spider keeps requesting the old location and crawling it. The client/browser will still attempt to request the original location.
Read more about how to implement it in ASP.NET (C#) and what the impact on search engines is:
http://www.dotnetbull.com/2013/08/301-permanent-vs-302-temporary-status-code-aspnet-csharp-Implementation.html

Mostly 301 vs 302 is important for indexing in search engines, as their crawlers take this into account and transfer PageRank when using 301.
See Peter Lee's answer for more details.

301 redirects are cached indefinitely (at least by some browsers).
This means, if you set up a 301, visit that page, you not only get redirected, but that redirection gets cached.
When you visit that page again, your browser* doesn't even bother to request that URL; it just goes to the cached redirection target.
The only way to undo a 301 for a visitor who has that redirection in cache is to redirect back to the original URL**. In that case, the browser will notice the loop and finally really request the entered URL.
Obviously, that's not an option if you decided to 301 to Facebook or any other resource that isn't fully under your control.
Unfortunately, many hosting providers offer a feature in their admin interface simply called "Redirection", which does a 301 redirect. If you're using this to temporarily redirect your domain to Facebook as a coming-soon page, you're basically screwed.
*At least Chrome and Firefox, according to How long do browsers cache HTTP 301s?. Just tried it with Chrome 45.
Edit: Safari 7.0.6 on Mac also caches; a browser restart didn't help (the link says that on Safari 5 on Windows it does help).
**I tried JavaScript window.location = '', because it would be the solution that could be applied in most cases; it doesn't work. It results in an undetected infinite loop. However, PHP header('Location: new.url') does break the loop.
Bottom line: only use 301s if you're absolutely sure you're never going to use that URL again. Usually never on the root directory (example.com/).

301 means that the requested resource has been assigned a new permanent URI and any future references to this resource should use one of the returned URIs.
302 means that the requested resource resides temporarily under a different URI.
Since the redirection may be altered on occasion, the client should continue to use the Request-URI for future requests.
This response is only cacheable if indicated by a Cache-Control or Expires header field.
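A simplified Python reading of those caching rules (roughly RFC 7231: 301 and 308 are cacheable by default; 302, 303 and 307 only with explicit freshness information). Real browsers differ in the details, as the answers here note; the helper and its assumption of canonical header casing are illustrative only.

```python
def redirect_is_cacheable(status: int, headers: dict[str, str]) -> bool:
    """Rough rule: may a browser store this redirect and reuse it later?
    Assumes header names are given in canonical casing."""
    cache_control = headers.get("Cache-Control", "").lower()
    if "no-store" in cache_control:
        return False
    if status in (301, 308):
        return True  # cacheable by default
    # 302/303/307: only with explicit freshness information
    return "max-age" in cache_control or "Expires" in headers

print(redirect_is_cacheable(301, {}))  # True
print(redirect_is_cacheable(302, {}))  # False
```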

The main issue with 301 is that the browser will cache the redirection even if you have disabled the redirection at the server level.
It's always better to use 302 if you are enabling the redirection for a short maintenance window.

There have already been plenty of good answers, but none describes the pitfalls, or when to use one over the other, from a plain browser's perspective.
Use 302 over 301 whenever you need to keep dynamic server-side control over the final URL. A 301 status will make your browser always go straight to the final URL from its own cache, without requesting the previous URL at all. That may have unpredictable results if you need to keep server-side control over the redirected URL.
As an example, if you need to redirect based on a user's IP geolocation (geo-IP switching), use 302. If you used a 301 in such a scenario, the final redirected page would always come directly from the browser's cache, giving incorrect content to the user.

301 is a permanent redirect, and 302 is a temporary redirect.
The browser is allowed to cache the 301, but 302 means it has to hit our system every time. Assuming that we want to minimize the load on our system, 301 is the right decision. Imagine creating a URL-shortening service for a big company: we want clients to hit our servers as little as possible.
But if the user wants to edit their short URLs, it might take more time than usual for the browser to pick up the change because the browser has the old one cached. Also, if you want to offer users metrics on how often their URL is getting hit, 301 would mean we would not necessarily see every hit from the client. So if you want analytics as a feature later on and a smooth user experience for editing URLs, 302 is a better choice.
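The trade-off can be simulated in a few lines of Python. Both classes are hypothetical, and the browser model simply assumes 301 targets are cached forever and 302 targets never are.

```python
class ShortenerServer:
    """Hypothetical shortener that counts every request reaching it."""

    def __init__(self, status: int):
        self.status = status  # 301 or 302
        self.hits = 0

    def resolve(self, short_path: str) -> tuple[int, str]:
        self.hits += 1
        return self.status, "https://example.com/some/long/target"

class CachingBrowser:
    """Remembers 301 targets (as browsers commonly do), never 302 targets."""

    def __init__(self, server: ShortenerServer):
        self.server = server
        self.redirect_cache: dict[str, str] = {}

    def visit(self, short_path: str) -> str:
        if short_path in self.redirect_cache:
            return self.redirect_cache[short_path]  # no request sent at all
        status, location = self.server.resolve(short_path)
        if status == 301:
            self.redirect_cache[short_path] = location
        return location

def hits_after_visits(status: int, visits: int) -> int:
    server = ShortenerServer(status)
    browser = CachingBrowser(server)
    for _ in range(visits):
        browser.visit("/abc")
    return server.hits

print(hits_after_visits(301, 3))  # 1: later visits never reach analytics
print(hits_after_visits(302, 3))  # 3: every visit is counted
```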
