I was testing a page with Twitter Cards using Twitter's card preview tool, for this URL (I'm simplifying the URL, but the output is valid):
https://example.net/hello/test-page/
It was pulling the card but with this warning:
INFO: Page fetched successfully
INFO: 16 metatags were found
INFO: twitter:card = summary_large_image tag found
INFO: Card loaded successfully
WARN: this card is redirected to http://example.net/hello/test-page/
The only difference here is that HTTPS has been switched to HTTP. And it's not just Twitter; Facebook and LinkedIn behave the same way.
Using LinkedIn's link preview tool, it similarly reported:
URL redirect trail
1 301 Redirect https://example.net/hello/test-page/
2 200 Success http://example.net/hello/test-page/
And with Facebook's link debugger:
Response Code
200
Fetched URL
https://example.net/hello/test-page/
Canonical URL
http://example.net/hello/test-page/
Redirect Path
Input URL -> https://example.net/hello/test-page/
301 HTTP Redirect -> http://example.net/hello/test-page/
og:url Meta Tag -> https://example.net/hello/test-page/
So... I checked the source of the generated web page and can confirm:
None of the OG or Twitter Card related META and LINK tags use "http"; they are all "https". The canonical LINK tag also uses "https".
If I manually go to the "http" version of the URL in the browser, it redirects me immediately to the "https" version.
Can anyone explain why this might happen, and places where I should start to dig?
One last example: when I run this curl command in Terminal, it also seems fine:
curl -sLD - https://example.net/hello/test-page/ -o /dev/null -w '%{url_effective}'
HTTP/2 200
server: nginx
date: Sun, 23 Aug 2020 16:27:28 GMT
content-type: text/html; charset=UTF-8
strict-transport-security: max-age=31536000
vary: Accept-Encoding
last-modified: Sun, 23 Aug 2020 16:23:11 GMT
cache-control: max-age=43, must-revalidate
x-nananana: Batcache-Hit
vary: Cookie
x-ac: 2.ord _atomic_dca
(In case it's relevant: this is a WordPress site, but I have no unusual plugins running.)
The 'issue' was with the hosting provider, in this case Pressable, and the fact that we used Let's Encrypt for SSL certs.
Here are the relevant parts from the support chat:
The reason that this occurs is because on WordPress.com, there weren't
always free SSL certificates via Let's Encrypt. So, in relation to
Facebook, for example, posts that displayed a Facebook like button
with counts would have a canonical URL at Facebook like
http://myblog.com/some/post. Turning on SSL and 301 redirecting the
http:// version of that page to https:// would wipe out those
like/share counts because Facebook sees the exact same URL with a
different protocol as two distinct things.
So, the implementation had to serve SSL certificates and force SSL for
the millions of sites that were now going to have a free Let’s Encrypt
SSL certificate, but also needed to give these crawlers a way to
access the content over http.
When doing the Pressable implementation of Let’s Encrypt, a lot of it
was copied from WordPress.com, and it was not incorrect to do so.
Pressable also didn’t always have free SSL certificates. If we started
to force the redirection of sites that were always http to https that
had “like” buttons and share counts, those counts would have been
lost.
Can't say I'm thrilled with the answer.
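For anyone wanting to verify this kind of behaviour themselves, a rough way to do it is to repeat the request while pretending to be one of the crawlers and compare the responses. The sketch below uses Python's requests library; the User-Agent strings are assumptions based on the crawlers' published identifiers, and example.net is the simplified URL from the question.

# Compare how the server answers a normal client vs. the social crawlers.
# The User-Agent strings are assumptions based on the crawlers' published
# identifiers; example.net is the simplified URL from the question.
import requests

URL = "https://example.net/hello/test-page/"
user_agents = {
    "browser": "Mozilla/5.0",
    "facebook": "facebookexternalhit/1.1",
    "twitter": "Twitterbot/1.0",
}

for name, ua in user_agents.items():
    # allow_redirects=False so we see the very first response as-is
    r = requests.get(URL, headers={"User-Agent": ua}, allow_redirects=False)
    print(name, r.status_code, r.headers.get("Location"))

If the host special-cases crawler traffic as described above, the crawler requests should come back as a 301 to the http:// URL while the browser-like request returns a plain 200.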
Related
I just added a feature on a website to allow users to log in with Facebook. As part of the authentication workflow, Facebook forwards the user to a callback URL on my site such as the one below:
https://127.0.0.1?facebook-login-callback?code.....#_=_
Note the trailing #_=_ (which is not part of the authentication data; Facebook appears to add this for no clear reason).
Upon receiving the request in the backend I validate the credentials, create a session for the user, then forward them to the main site using a Location header in the HTTP response.
I've inspected the HTTP response via my browser developer tools and confirmed I have set the Location header as:
Location: https://127.0.0.1/
The issue is that the URL that appears in the browser address bar after forwarding is https://127.0.0.1/#_=_
I don't want the user to see this trailing string. How can I ensure it is removed when redirecting the user to a new URL?
The issue happens in all browsers I have tested: Chrome, Firefox, Safari and a few others.
I know a similar question has been answered in other threads; however, unlike those threads, there is no jQuery or JavaScript in this workflow. All the processing of the login callback happens exclusively in backend code.
EDIT
Adding a bounty. This has been driving me up the wall for some time. I have no explanation and don't even have a guess as to what's going on, so I'm going to share some of my hard-earned Stackbux with whoever can help me.
Just to be clear on a few points:
There is no JavaScript in this authentication workflow whatsoever.
I have implemented my own Facebook login workflow without using their JavaScript libraries or other third-party tools; it directly interacts with the Facebook REST API using my own Python code in the backend exclusively.
Below are excerpts from the raw HTTP requests as obtained from Firefox inspect console.
1 User connects to mike.local/facebook-login and is forwarded to Facebook's authentication page
HTTP/1.1 302 Found
Server: nginx/1.19.0
Date: Sun, 28 Nov 2021 10:44:30 GMT
Content-Type: text/plain; charset="UTF-8"
Content-Length: 0
Connection: keep-alive
Location: https://www.facebook.com/v12.0/dialog/oauth?client_id=XXX&redirect_uri=https%3A%2F%2Fmike.local%2Ffacebook-login-callback&state=XXXX
2 User accepts and Facebook redirects them to mike.local/facebook-login-callback
HTTP/3 302 Found
location: https://mike.local/facebook-login-callback?code=XXX&state=XXX#_=_
...
Response truncated here. Note the offending #_=_ at the tail of the Location header.
3 Backend processes the tokens Facebook provides via the user forwarding, creates a session for the user, then forwards them to mike.local. I do not add #_=_ to the Location HTTP header, as seen below.
HTTP/1.1 302 Found
Server: nginx/1.19.0
Date: Sun, 28 Nov 2021 10:44:31 GMT
Content-Type: text/plain; charset="UTF-8"
Content-Length: 0
Connection: keep-alive
Location: https://mike.local/
Set-Cookie: s=XXX; Path=/; Expires=Fri, 01 Jan 2038 00:00:00 GMT; SameSite=None; Secure;
Set-Cookie: p=XXX; Path=/; Expires=Fri, 01 Jan 2038 00:00:00 GMT; SameSite=Strict; Secure;
4 User arrives at mike.local and sees a trailing #_=_ in the URL. I have observed this in Firefox, Chrome, Safari and Edge.
I have confirmed via the Firefox inspect console that there are no other HTTP requests being sent. In other words, I can confirm step 3 is the final HTTP response sent to the user from my site.
According to RFC 7231 §7.1.2:
If the Location value provided in a 3xx (Redirection) response does
not have a fragment component, a user agent MUST process the
redirection as if the value inherits the fragment component of the
URI reference used to generate the request target (i.e., the
redirection inherits the original reference's fragment, if any).
If you get redirected by Facebook to a URI with a fragment identifier, that fragment identifier will be attached to the target of your redirect. (Not a design I agree with much; it would make sense semantically for HTTP 303 redirects, which is what would logically fit this workflow better, to ignore the originator's fragment identifier. It is what it is, though.)
The best you can do is clean up the fragment identifier with a JavaScript snippet on the target page:
<script async defer type="text/javascript">
  // Only act when the hash is exactly the junk fragment Facebook appends.
  if (location.hash === '#_=_') {
    if (typeof history !== 'undefined' &&
        history.replaceState &&
        typeof URL !== 'undefined')
    {
      // Rewrite the current history entry in place, so the fragment
      // disappears from the address bar without adding a new entry.
      var u = new URL(location);
      u.hash = '';
      history.replaceState(null, '', u);
    } else {
      // Fallback for older browsers; this leaves a bare '#' behind.
      location.hash = '';
    }
  }
</script>
Alternatively, you can use meta refresh/the Refresh HTTP header, as that method of redirecting does not preserve the fragment identifier:
<meta http-equiv="Refresh" content="0; url=/">
Presumably you should also include a manual link to the target page, for the sake of clients that do not implement Refresh.
But if you ask me what I’d personally do: leave the poor thing alone. A useless fragment identifier is pretty harmless anyway, and this kind of silly micromanagement is not worth turning the whole slate of Internet standards upside-down (using a more fragile, non-standard method of redirection; shoving yet another piece of superfluous JavaScript the user’s way) just for the sake of pretty minor address bar aesthetics. Like The House of God says: ‘The delivery of good medical care is to do as much nothing as possible’.
Not a complete answer but a couple of wider architectural points for future reference, to add to the above answer which I upvoted.
AUTHORIZATION SERVER
If you enabled an AS to manage the connection to Facebook for you, your apps would not need to deal with this problem.
An AS can deal with many deep authentication concerns to externalize complexity from apps.
SPAs
An SPA would have better control over processing login responses, as in this code of mine which uses history.replaceState.
SECURITY
An SPA can be just as secure as a website with the correct architecture design - see this article.
We've noticed that some users of our website have a problem: if they follow links to the website from an external source (specifically Outlook and MS Word), they arrive at the website in such a way that User.IsAuthenticated is false, even though they are still logged in in other tabs.
After hours of diagnosis, it appears to be because the FormsAuthentication cookie is sometimes not sent when the external link is clicked. If we examine the traffic in Fiddler, we see different headers for links clicked within the website versus links clicked in a Word document or email. There doesn't appear to be anything wrong with the cookie (it has "/" as the path, no domain, and a future expiration date).
Here is the cookie being set:
Set-Cookie: DRYXADMINAUTH2014=<hexdata>; expires=Wed, 01-Jul-2015 23:30:37 GMT; path=/
Here is a request sent from an internal link:
GET http://domain.com/searchresults/media/?sk=creative HTTP/1.1
Host: domain.com
Cookie: Diary_SessionID=r4krwqqhaoqvt1q0vcdzj5md; DRYXADMINAUTH2014=<hexdata>;
Here is a request sent from an external (Word) link:
GET http://domain.com/searchresults/media/?sk=creative HTTP/1.1
Host: domain.com
Cookie: Diary_SessionID=cpnriieepi4rzdbjtenfpvdb
Note that the .NET FormsAuthentication token is missing from the second request. The problem doesn't seem to be affected by which browser is set as default and happens in both Chrome and Firefox.
Is this normal/expected behaviour, or there a way we can fix this?
Turns out this is a known issue with Microsoft Word, Outlook and other MS Office products: <sigh>
See: Why are cookies unrecognized when a link is clicked from an external source (i.e. Excel, Word, etc...)
Summary: Word tries to open the URL itself (in case it's an Office document) but gets redirected because it doesn't have the authentication cookie. Due to a bug in Word, it then incorrectly tries to open the redirected URL in the OS's default browser instead of the original URL. If you monitor the "process" column in Fiddler, it's easy to see the exact behaviour from the linked article occurring.
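If you do want to work around it rather than live with it, one commonly suggested mitigation (a sketch only, not something from the linked article verbatim) is to detect Office user agents on the server and answer them with a 200 instead of a 302, so that Word hands the original URL to the default browser and the browser's request then carries the auth cookie. The site in the question is ASP.NET, but the idea looks roughly like this in Python/Flask terms; the user-agent substrings are assumptions:

# Sketch: short-circuit requests from Office clients with a 200 so that
# Word/Outlook never see the redirect to the login page and instead pass
# the original URL straight to the default browser (which has the cookie).
# Flask is used purely for illustration; the original site is ASP.NET,
# and the list of user-agent substrings is an assumption.
from flask import Flask, request

app = Flask(__name__)

OFFICE_UA_HINTS = ("ms-office", "microsoft office", "word", "excel", "powerpoint")

@app.before_request
def short_circuit_office_clients():
    ua = request.headers.get("User-Agent", "").lower()
    if any(hint in ua for hint in OFFICE_UA_HINTS):
        # Returning a response here stops normal routing (and any 302 to
        # the login page) for these probe requests only.
        return "<html><head><title>OK</title></head><body></body></html>", 200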
This is how a Google bot views my site:
HTTP/1.1 302 Found
Connection: close
Pragma: no-cache
cache-control: no-cache
Location: /LageX/
What does it mean? Is it good or bad? Thanks.
It's bad.
The above indicates that the content of your site is temporarily available at another location. Unless you have a good reason to set up a temporary (302) redirect, you should either move your content to where it is expected or set up a permanent (301) redirect.
The Location: header, which is expected to hold the URI where the content is available, is itself invalid, because its value is expected to be an absolute URI, something like http://domain.com/LageX/.
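For comparison, a fixed response would look something like this (domain.com here is just a placeholder; substitute your actual domain):
HTTP/1.1 301 Moved Permanently
Location: http://domain.com/LageX/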
I would like to kindly ask you for a suggestion regarding browser cache invalidation.
Let's assume we've got an index page that is returned to the client with http headers:
Cache-Control: public, max-age=31534761
Expires: Fri, 17 Feb 2012 18:22:04 GMT
Last-Modified: Thu, 17 Feb 2011 18:22:04 GMT
Vary: Accept-Encoding
Should the user try to hit that index page again, it is very likely that the browser won't even send a request to the server - it will just present the user with the cached version of the page.
My question is: is it possible to create a web resource (for instance at uri /invalidateIndex) such that when a user hits that resource he is redirected to the index page in a way that forces the browser to invalidate its cache and ask the server for fresh content?
I'm having similar problems with a project of my own, so I have a few suggestions, if you haven't already found some solution...
I've seen this as the way jQuery forces AJAX requests not to be cached: it adds a parameter to the URL with a random value, so that each new request essentially has a different URL and the browser then never uses the cache. You could actually have the /invalidateIndex URI redirect to such a URL. The problem, of course, is that the browser never actually invalidates the original index URL, and that it will always re-request your index.
You could, of course, change the Cache-Control header to a smaller max-age, say down to an hour, so that the cache is invalidated every hour or so.
And also, you could use ETags, wherein the cached data have a tag that will be sent with each request, essentially asking the server if the index has changed or not.
Options 2 and 3 can even be combined, I think; see the sketch below.
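For what it's worth, here is a rough sketch of 2 and 3 combined, using Python/Flask purely for illustration (the one-hour max-age and the SHA-1 ETag are example choices, not recommendations):

# Serve the index with a short max-age plus an ETag, so the browser caches
# it briefly and then revalidates cheaply with If-None-Match.
# Flask, the one-hour max-age and the SHA-1 ETag are illustrative choices.
import hashlib
from flask import Flask, Response, request

app = Flask(__name__)

def render_index() -> str:
    return "<html><body>index</body></html>"

@app.route("/")
def index():
    body = render_index()
    etag = hashlib.sha1(body.encode("utf-8")).hexdigest()

    resp = Response(body, mimetype="text/html")
    resp.set_etag(etag)
    resp.headers["Cache-Control"] = "public, max-age=3600, must-revalidate"
    # make_conditional() turns this into a 304 Not Modified when the
    # client's If-None-Match still matches the current ETag.
    return resp.make_conditional(request)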
There is no direct way of asking a browser to purge its cache of a particular file, but if you have only a few systems like this and plenty of bandwidth, you could try returning large objects on the same protocol, host, and port so that the cache starts evicting old objects. See https://bugzilla.mozilla.org/show_bug.cgi?id=81640 for example.
I'm trying to read my stock portfolio into a script. The following works with NAB Online Trading but not Bell Direct.
install the Export Domain Cookies Firefox addon
log in to my online broker with Firefox
save the domain cookies to a file (eg cookies.txt)
wget --no-check-certificate --load-cookies=cookies.txt -O folio.htm https://...(portfolio URL)
The idea is to reuse the browser's login session. When I try it with Bell Direct, wget is redirected to the login page. I get the same results with curl. What am I missing? Is there some state stored in the browser besides the cookies? Bell isn't using "basic authentication", because the login page is a form for username/password; it doesn't pop up the browser's built-in login dialog.
Here is what happens (under Windows XP with Cygwin):
$ wget --server-response --no-check-certificate --load-cookies=cookies-bell.txt -O folio-bell.htm https://www.belldirect.com.au/trade/portfoliomanager/
--2009-12-14 10:52:08-- https://www.belldirect.com.au/trade/portfoliomanager/
Resolving www.belldirect.com.au... 202.164.26.80
Connecting to www.belldirect.com.au|202.164.26.80|:443... connected.
WARNING: cannot verify www.belldirect.com.au's certificate, issued by '/C=ZA/ST=Western Cape/L=Cape Town/O=Thawte Consulting cc/OU=Certification Services Division/CN=Thawte Server CA/emailAddress=server-certs#thawte.com':
Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response...
HTTP/1.1 302 Found
Connection: keep-alive
Date: Sun, 13 Dec 2009 23:52:16 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
Location: /account/login.html?redirect=https://www.belldirect.com.au/trade/portfoliomanager/index.html
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 229
Location: /account/login.html?redirect=https://www.belldirect.com.au/trade/portfoliomanager/index.html [following]
...
Perhaps the server is validating the session based on the User-Agent as well as the cookie. Check what User-Agent your Firefox install is using (perhaps use WhatIsMyUserAgent.com if you don't know it), and try using that exact same user agent in your wget call (via the --user-agent="..." parameter).
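If it helps to test that idea outside of wget, a rough Python equivalent (assuming the exported cookies.txt is in Netscape/Mozilla cookie-file format, and with a placeholder User-Agent you should replace with the exact string your Firefox sends) would be:

# Reuse the cookies exported from Firefox and send a Firefox-like User-Agent,
# to check whether the session is tied to the User-Agent as well as the cookie.
# Assumes cookies-bell.txt is in Netscape/Mozilla cookie-file format; the
# User-Agent value is a placeholder for whatever your Firefox actually sends.
from http.cookiejar import MozillaCookieJar
import requests

jar = MozillaCookieJar("cookies-bell.txt")
jar.load(ignore_discard=True, ignore_expires=True)

r = requests.get(
    "https://www.belldirect.com.au/trade/portfoliomanager/",
    cookies=jar,
    headers={"User-Agent": "REPLACE-WITH-YOUR-FIREFOX-USER-AGENT"},
    allow_redirects=False,  # a 302 back to /account/login.html means the session was rejected
)
print(r.status_code, r.headers.get("Location"))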
You need to POST the login form variables and then, with those cookies, go to the inner page.
See http://www.trap17.com/index.php/automatic-login-curl_t38162.html for some example code.
The login is encrypted over the HTTPS protocol, and you do not provide a certificate. Perhaps belldirect requires a valid certificate for client authentication.
You can export a certificate in Firefox by clicking the highlighted blue portion of the URL > More Information > Security Tab > View Certificate > Details > Export. Then, you can use the --certificate=filename option to specify the exported certificate in your wget command.
Maybe you need to set the referrer too.