When doing a redirect, why is Akamai redirecting to the original server?

Please bear with me because I am not very familiar with how Akamai works.
I am having issues with redirection. We are redirecting links from domain.com/a/b to domain.com/b/c. However, Akamai does not preserve the host, and the 301 points at the original server. To illustrate:
http://akamai.ex.example.com/a/b
Redirects to
http://original.ex.example.com/b/c
When it is supposed to redirect to
http://akamai.ex.example.com/b/c
What is going on and how can this be resolved?

I can't tell why, but I can tell you how to work around it:
Path-absolute (or host-relative?) redirect URLs of the form "/b/c" completely bypass the issue, and will make the browser go to the same host (and port, and protocol).
(There's an error in the RFC; relative URIs are in fact allowed in the Location header, and all browsers support them anyway: http://trac.tools.ietf.org/wg/httpbis/trac/ticket/185)
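For illustration, a minimal sketch of such a response (using the paths from the question) that keeps the browser on whichever host it originally contacted:

HTTP/1.1 301 Moved Permanently
Location: /b/c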

Related

Browser caches HTTP site as HTTPS resulting in Cert Error when navigating from site to an HTTPS destination

I have a company website that's hosted as https://foo.bar.com.
However, it was incorrectly conveyed to a lot of users that the URL would be www.foo.bar.com. Until this can be rectified, we are putting in place an interim solution: a proxy site, www.foo.bar.com, that redirects any users coming to it to https://foo.bar.com.
This works... but only the first time the user navigates to the page. The next time I try to access www.foo.bar.com, due to caching, the browser takes me to https://www.foo.bar.com. We don't have a certificate set up for https://www.foo.bar.com and as a result are given a NET::ERR_CERT_COMMON_NAME_INVALID error.
Is there a way to work around this without needing a certificate?
To test, I've even tried returning a webpage when I navigate to www.foo.bar.com, with a link that points to https://foo.bar.com. However, the same issue happens even in this case. I'm guessing HSTS is at play here, but I am not sure how to address it.
I'd appreciate any insight into this matter, thank you in advance.
I believe the only solution to your problem is to obtain a valid certificate for www.foo.bar.com. Due to the certificate error, browsers will not attempt to communicate with your server, so there's no way for you to issue a redirect away from the wrong domain to the correct one.
Why only the second time?
You mention HSTS, so I am assuming https://foo.bar.com is sending a Strict-Transport-Security header as part of its response. This header is likely being sent with the includeSubDomains option, which instructs the browser to enforce HTTPS not only on foo.bar.com but also on all of its subdomains. As a result, when requesting www.foo.bar.com, the browser matches that HSTS rule and automatically rewrites the request to use HTTPS.
Once this HSTS rule has been set in the browser, it cannot be removed except by expiring: either by exceeding the original max-age time, or by issuing another Strict-Transport-Security header with max-age=0 on https://foo.bar.com.
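For illustration, a header like the following on https://foo.bar.com would produce exactly this behaviour (the max-age value here is just an assumed example):

Strict-Transport-Security: max-age=31536000; includeSubDomains

The rule can only be cleared early, and only for browsers that revisit https://foo.bar.com, by serving:

Strict-Transport-Security: max-age=0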

Serving 404 directly

So I have an Nginx server set up which is supposed to redirect all http to https (and non-www to www) using 4 server blocks.
The issue is that any 404 or non-existent http URL first gets a 301 redirect to what would be its https version if it hypothetically existed (hence creating an extra URL and redirect).
See example:
1) http://example.com/thisurldoesntexit
301 Redirect
2) https://example.com/thisurldoesntexit
404
3) https://example.com/notfound
Is there a way to redirect the user directly to an https 404 (URL 3)?
First of all, as has already been pointed out, doing a 301 redirect from a non-existent page to a single /notfound moniker is a really bad practice, and is likely against the RFCs.
What if the user simply mistyped a single character of a long URL? Modern browsers make it non-trivial to go back to what has been typed in order to correct it. The user would have to decide whether your site is worth retyping from scratch, or whether your competitor might have thought of a better experience.
What if the user simply followed a broken link that is broken in a very obvious way and could be easily fixed? E.g., http://www.example.org/www.example.com/page, where the creator mistyped an absolute URL as a relative one, or a URI like /page.html., with an extra dot at the end. Likewise, you'll be totally confusing the user about what's going on, offering a terrible user experience, when the URL could easily have been corrected on the spot.
But, more importantly, what real problem are you actually trying to solve?!
For better or worse, it's a pretty common practice to indiscriminately redirect from the http to the https scheme, without regard to whether a given page exists. In fact, if you employ HSTS, content served over http effectively becomes meaningless; a browser with the policy in place would never even request anything over http from then on.
Undoubtedly, in order to know whether or not a given page exists, you must consult the backend. As such, you might as well do the http-to-https redirect from within your backend; but it'll likely tie up your valuable server resources for little to no extra benefit.
Moreover, the presence or absence of a page may be dictated by the contents of the cookies. As such, if you require that your backend discern whether a page does or does not exist for an http request, then you'll effectively be leaking private information that was meant to be protected by https in the first place. (In turn, if your site has no such private information, then maybe you shouldn't be using https in the first place.)
So, overall, the whole approach is just a REALLY, REALLY bad idea!
Consider instead:
Do NOT do a 301 redirect from all non-existent pages to a single /notfound page. Very bad practice, very bad UX.
It is totally OK to do an indiscriminate redirect from http to https, without accounting for whether or not the page exists (a sketch follows below). In fact, it's not only okay, it's the way God intended: an adversary should not be capable of discerning whether a given page exists on an https-based site, so if you do find and implement a solution for your "problem", you'll effectively create a security vulnerability and a data leak.
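A minimal sketch of such an indiscriminate redirect in nginx (server names and the canonical host are placeholders):

server {
    listen 80;
    server_name example.com www.example.com;
    # Blanket redirect: never consult the backend over http
    return 301 https://example.com$request_uri;
}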
Use the https://www.drupal.org/project/fast_404 module to serve 404 pages directly without much overhead.
I'd suggest redirecting to a 404 page is a poor choice, and you should instead serve the 404 on the incorrect URL.
My reasons for stating this are:
By redirecting away from the page, you are issuing headers that implicitly say, "The content does not exist at this URL, but it does over here." I'm not sure how the various search engines would react to being redirected to a 404.
I can speak from my own experience as a user when I say that having the URL change on me when I've mistyped a single character can be very frustrating. I then need to spend the time to type out the entire URL again.
You can avoid having logic in your .htaccess file (or wherever) to judge a page to be a 404. This will greatly simplify your initial logic (which, by the by, gets evaluated on every single page load), and will remove far more redirects than just the odd chain of http://badurl to https://badurl to https://404.
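In nginx, serving the 404 in place, with a custom error page and no redirect at all, might be sketched like this (certificate paths and document root are placeholders):

server {
    listen 443 ssl;
    server_name example.com;
    ssl_certificate     /etc/ssl/example.com.crt;
    ssl_certificate_key /etc/ssl/example.com.key;
    root /var/www/example.com;

    # Render /404.html in the error response, on the URL that failed
    error_page 404 /404.html;
    location = /404.html {
        internal;  # reachable only as an error response, never directly
    }
}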

Schemeless URLs and misbehaving crawlers

I am using scheme-relative URLs to load a few external libraries:
//ajax.googleapis.com/ajax/libs/jquery.....
The problem I am facing is that a few crawlers treat them as relative URLs:
www.mydomain.com//ajax.googleapis.com/ajax/libs/jquery.....
How can I handle such links for crawlers?
I am using an Nginx server, but I am fairly new to Nginx.
Is some kind of rewrite possible?
Your URL is actually valid; it's the crawler's fault for not handling this case while crawling. I would just ignore it.
The 404 response from your server is also valid, because the crawler is requesting www.example.com//ajax.googleapis.com/.., which really doesn't exist.
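That said, if you want to be kind to broken crawlers, a hedged sketch of a cleanup redirect in nginx (this assumes the default merge_slashes is on, so the leading double slash is collapsed to a single one before location matching):

location ~ ^/ajax\.googleapis\.com(/.*)$ {
    # Send crawlers that mangled the scheme-relative URL to the real host
    return 301 https://ajax.googleapis.com$1;
}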

Domain in cookies

I'm not sure about this, so please explain it to me if you know. I have a problem with the domain in cookies.
According to the newest RFC 6265, it doesn't matter whether the domain in a cookie starts with a dot or not.
For example:
Set-Cookie with Domain=example.com means the same as Domain=.example.com, and the cookie is valid for all subdomains like something.example.com and of course example.com itself.
So I have a question: what about "www"? If a client has a cookie set for www.example.com, the client shouldn't send the cookie for example.com, but should send it to www.example.com?
Or maybe "www." is ignored too?
Could you explain that to me? I can't find the answer.
Thank you.
No, there's nothing special about www; it works the same as any other subdomain. www.example.com is different from example.com as far as cookies are concerned. Regardless of your expectations, that is the fact: www as a subdomain is not special.
This is an important point, and does catch a lot of beginners out, particularly when they write links in their sites with the full domain name. If you navigate from www.example.com/index.html to example.com/nextpage.html, your cookies may not be visible on the second page if the cookies were created on the index page in the www subdomain.
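A concrete illustration with Set-Cookie header lines (the cookie name and value are arbitrary):

Set-Cookie: session=abc123
(no Domain attribute: a host-only cookie, sent back only to the exact host that set it)

Set-Cookie: session=abc123; Domain=example.com
(sent to example.com and every subdomain of it, including www.example.com)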
You say you've already read the RFC, but you might find that this answer gives a little more clarification on it.
Hope that helps.

How does URL rewriting work?

How does a web server implement the URL rewrite mechanism and change the address bar of browsers?
I'm not asking for specific information on configuring Apache, Nginx, lighttpd, or others!
I would like to know what kind of information is sent to clients when servers want to rewrite URLs.
There are two types of behaviour.
One is rewrite, the other is redirect.
Rewrite
The server performs the substitution for itself, making a URL like http://example.org/my/beautiful/page be understood as http://example.org/index.php?page=my-beautiful-page
With a rewrite, the client does not see anything; the substitution is internal only. No URL changes in the browser, just the server understands it differently.
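A minimal sketch of such an internal rewrite with Apache mod_rewrite in a .htaccess file (the URLs are the ones from the example; an index.php front controller is assumed):

RewriteEngine On
# Map the pretty URL to the real script internally;
# the browser's address bar still shows /my/beautiful/page
RewriteRule ^my/beautiful/page$ /index.php?page=my-beautiful-page [L]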
Redirect
The server detects that the requested address is no longer the one to serve: http://example.org/page1 has moved to http://example.org/page2, so it tells the browser with an HTTP 3xx code what the new page is. The client then asks for that page instead, and the address in the browser therefore changes!
Process
The process remains the same in either case: the rewrite rules are consulted on each request, and the result is either handled internally (rewrite) or sent back to the client as a 3xx response (redirect).
Remark: every rewrite/redirect triggers a new pass through the rewrite rules (with some exceptions, IIRC). A guard such as
# REDIRECT_URL is non-empty once an internal redirect has happened...
RewriteCond %{REDIRECT_URL} !^$
# ...so stop rewriting here to avoid a loop.
RewriteRule .* - [L]
can become useful to stop loops, since it performs no further rewriting once one rewrite has already happened.
Are you talking about server-side rewrites (like Apache mod_rewrite)? For those, the address bar does not generally change (unless a redirection is performed).
Or are you talking about redirections? These are done by having the server respond with an HTTP status code (301, 302 or 307) and a Location header.
There are two forms of "URL rewrite": those done purely within the server and those that are redirections.
If it's purely within the server, it's an internal matter and only matters with respect to the dispatch mechanism implemented in the server. In Apache HTTPD, mod_rewrite can do this, for example.
If it's a redirection, a status code implying a redirection is sent in the response, along with a Location header indicating to which URL the browser should be redirected (this should be an absolute URL). mod_rewrite can also do this, with the [R] flag.
The status code is usually 302 (found), but it could be configured for other codes (e.g. 301 or 307).
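A minimal sketch of such an external redirect with mod_rewrite, in virtual-host context and with placeholder URLs:

# [R=301,L] makes the server answer with a 301 status and a Location
# header instead of rewriting internally
RewriteRule ^/old-page$ https://example.org/new-page [R=301,L]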
Another quite common use (often unnoticed because it's usually on by default in Apache HTTPD) is the redirection to the URL with a trailing slash on a directory. This is implemented by mod_dir:
A "trailing slash" redirect is issued when the server receives a request for a URL http://servername/foo/dirname where dirname is a directory. Directories require a trailing slash, so mod_dir issues a redirect to http://servername/foo/dirname/.
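For completeness, this behaviour is controlled by mod_dir's DirectorySlash directive; turning it off is rarely advisable, as the Apache documentation warns it can expose directory contents:

DirectorySlash Off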
Jeff Atwood had a great post about this: http://www.codinghorror.com/blog/2007/02/url-rewriting-to-prevent-duplicate-urls.html
How does a web server implement the URL rewrite mechanism and change the address bar of browsers?
URL rewriting and redirecting are two completely different things. A server has no control over your browser, so it can't change your browser's URL; it can only ask your browser to go to a different one. When your browser gets a response from a server, it is entirely up to the browser what to do with that response: it can follow the redirect, ignore it, or be really mean and spam the server until the server gives up. There is no "mechanism" the server uses to change the address; there is simply a protocol (HTTP/1.1) that the server abides by when a particular resource has moved to a different location, hence the 3xx responses.
URL rewriting can transform URLs purely on the server side. This gives web application developers the ability to make web resources accessible from multiple URLs.
For example, the user might request http://www.example.com/product/123, but thanks to rewriting is actually served the resource from http://www.example.com/product?id=123. Note that there is no need for the address displayed in the browser to change.
The address can be changed if so desired. For this, a similar mapping happens on the server, but rather than rendering the resource back to the client, the server sends a redirect (a 301 or 302 HTTP code) to the client for the rewritten URL.
For the example above this might look like:
Client request
GET /product/123 HTTP/1.1
Host: www.example.com
Server response
HTTP/1.1 302 Found
Location: http://www.example.com/product?id=123
At this point, the browser will issue a new GET request for the URL in the Location header.
