Email links in Gmail make two requests - http

I've run into a weird situation. After registration we send an email with a verification link, which is pretty standard stuff, but clicking the link seems to make the request twice. According to the logs, the first request comes from my IP and the second comes from a Google IP: 66.102.8.60 (a reverse lookup shows google-proxy-66-102-8-60.google.com).
Any idea what's going on and how to prevent this?
The server is running Nginx and the site is Ruby on Rails if that helps.

I do not know the root cause, but my best guess is the same as what Tripleee wrote above: most probably Google is scanning URLs. This happens in all browsers (well, at least in Chrome and Firefox), but only under the following circumstances:
the URL is clicked from Gmail (if you copy and paste it into a browser tab, the second request is not issued)
the URL is clicked for the first time; subsequent clicks from the same email do not trigger a second request
I know it is probably not the answer you expected, but after giving it some thought I figured that an operation like this should be handled on the server side. In my case I am tracking information about confirmation URLs anyway, so the first time the request reaches my backend I delete the entry and proceed with confirmation normally. Since the confirmation entry is then missing from the database, the second request returns immediately with status 404, 422, or whatever suits you.
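The delete-on-first-use idea can be sketched roughly as follows. This is a minimal Python illustration rather than Rails code, and the names `pending_confirmations`, `issue_token` and `confirm` are made up for the example; the in-memory dict stands in for whatever database table tracks the confirmation URLs.

```python
import secrets

# Hypothetical in-memory store standing in for the confirmations table.
pending_confirmations = {}

def issue_token(user_id):
    """Create a one-time token to embed in the verification link."""
    token = secrets.token_urlsafe(16)
    pending_confirmations[token] = user_id
    return token

def confirm(token):
    """Return an HTTP-style status code for a confirmation request."""
    # pop() removes the entry, so only the first request can succeed.
    user_id = pending_confirmations.pop(token, None)
    if user_id is None:
        return 404  # unknown or already-used token
    # ... mark user_id as verified here ...
    return 200

token = issue_token(42)
first = confirm(token)   # the first click is confirmed
second = confirm(token)  # the duplicate request finds nothing
```

Whichever request arrives first consumes the token, so the duplicate cannot confirm anything a second time.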
Hope that helps anyone who gets here looking for an answer to this problem ;)

404 error on my homepage although I can see my site

I am at my wits end with the following problem:
My site www.sebastianthalhammer.com is available under that URL without any problems.
However Google Search Console as well as other external third party test tools return a 404 error.
Status report from Uptrends
It is just the main page that's affected. None of the other subpages or blog content is affected.
I have been in contact with the server staff, but everything seems alright to them. As mentioned, the site can be reached. The site runs on WordPress (latest version).
I have no real clue where to start, as this error seems to be quite a tricky one. Does anyone here have an idea what's going on?
Sebastian
The 4xx class of status code is intended for cases in which the
client seems to have erred. Except when responding to a HEAD
request, the server SHOULD include a representation containing an
explanation of the error situation, and whether it is a temporary or
permanent condition. These status codes are applicable to any
request method. User agents SHOULD display any included
representation to the user.
This leaves me with two possible explanations:
Explanation 1: it's a server error.
the server wrongly returns a 404 status code
the browser thinks the response body contains details about the error and displays it - for the end user this is the actual page
Explanation 2: it's done on purpose to defeat crawlers and page watchers.
the server returns 404 on purpose - non-browser user agents won't process the result as they interpret it as error
browsers are unaffected, the end user doesn't care as long as the page is being displayed
The second one would indeed be kind of clever if you don't want your page to be indexed.
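To make the mechanism concrete, here is a small Python sketch, not tied to WordPress in any way, of a server that sends a 404 status together with a perfectly normal page body. The handler name and the page content are invented for the example. A browser would render the body, while a monitoring tool that only looks at the status code would report an error.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class MisconfiguredHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: real page content, wrong status code."""

    def do_GET(self):
        body = b"<html><body>Welcome to my homepage</body></html>"
        self.send_response(404)  # status says "not found"...
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)   # ...but the body is the actual page

    def log_message(self, format, *args):
        pass  # keep the example quiet

# Port 0 lets the OS pick a free port.
server = HTTPServer(("127.0.0.1", 0), MisconfiguredHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/")
response = conn.getresponse()
status = response.status  # what a crawler or uptime checker looks at
body = response.read()    # what a browser would display
conn.close()

server.shutdown()
server.server_close()
```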
Thanks to your feedback I could think about the problem in a different way.
Ultimately, in the unholy depths of a certain plugin, I dug out a setting that caused the error.
It was a redirection plugin that (for whatever reason) sent a 404 status when the URL was requested.
I don't know what the purpose of such a setting would be. All I know is that it had been left on its default for quite a while, and that caused the weird situation.
Thanks, guys, for getting me on the right track.
Sebastian

Symfony Not finding Routes

I have to admit I am a Symfony newbie. Generally, I do know what I am doing, but I am stumped by this problem.
I have been given a pair of packages to maintain. One is a front end written with Angular.js. The other is the backend, written using Symfony.
After installing the backend portion using Composer, I am now trying to test it.
The front end seems to be fine when it fires up at xxx.yyy.com/app. Its first screen is a login screen that asks for a username and password. The submit button fires off a request to xxx.yyy.com/api/users/token. The username and password from the screen are sent in the HTTP request as JSON.
Once the request is made I have determined that app.php in the symfony code fires off and starts the user authentication process. After a lot of work in trying to debug through the code, I can see that the symfony backend does have the user name and password and knows that the request is a POST request in good form. However, I keep getting the error "Route Not Found" and then am kicked out of symfony.
There is every reason to believe the code is written correctly and the problem lies in something I have done to install the code. When I run the debug:router process, I can find the route as a correct one. But, this route is never found. I have also tried other routes with the same result.
Can anyone suggest a reason why routes shown in the debug:router process do not work in actual use? I am really stumped and would appreciate some suggestions.
I think I am really on to the problem. It's the way the front-end JavaScript creates the URL for the backend, combined with the way my server is configured.
My front end is at xxx.kjitx.com/app. This code adds the base URL to the command for the backend to form a request to xxx.kjitx.com/api/users/token. When my backend receives control, it strips off the xxx.kjitx.com/api part of the URL and sends the users/token string to the router. The router is looking for /api/users/token, so the routing fails. In the handshake I lost the first piece of the route, api. I found this out by forcing the front end to add an extra api segment, i.e. xxx.kjitx.com/api/api/users/token, and it works.
Now I just need to go back into my code and set up my addressing properly so I don't lose an important part of the address.
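The mismatch can be reproduced with a toy router. This is only a Python illustration of the idea, not how Symfony is implemented; `ROUTES`, `strip_base` and `match` are hypothetical names, and the base path "/api" is assumed from the description above.

```python
# Hypothetical route table: the route is defined WITH the /api prefix.
ROUTES = {"/api/users/token": "token_action"}

def strip_base(path, base):
    """Mimic a server that removes its base path before routing."""
    return path[len(base):] if path.startswith(base) else path

def match(path_info):
    """Return the matching action name, or None for 'route not found'."""
    return ROUTES.get(path_info)

# The router only sees "/users/token", which is not in the table.
normal = match(strip_base("/api/users/token", "/api"))

# With the doubled prefix the router sees "/api/users/token" and matches.
doubled = match(strip_base("/api/api/users/token", "/api"))
```

Here `normal` is `None` and `doubled` is `"token_action"`, which matches what I saw: the route exists, but the path handed to the router no longer carries the prefix.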
Does the app use CORS?
Perhaps you have to whitelist your dev domains

What happens if a 302 URI can't be found?

If I make an HTTP request to get index.html on http://www.example.com but that URL has a 302 re-direct in place that points to http://www.foo.com/index.html, what happens if the redirect target (http://www.foo.com/index.html) isn't available? Will the user agent try the original URL (http://www.example.com/index.html) or just return an error?
Background to the question: I manage a legacy site that supports a few existing customers but doesn't allow new sign-ups. Pretty much all the pages are redirected (using 302s rather than 301s for some unknown reason...) to a newer site. This includes the sign-up page. On one of the pages that isn't redirected there is still a link to the sign-up page, which itself links through to a third-party payment page (i.e. on another domain). Last week our current site went down for a couple of hours, and in that period someone successfully signed up through the old site. The only way I can imagine this happened is that if a 302 doesn't find its intended URL, some (all?) user agents bypass the redirect and go to the originally requested URL.
By the way, I'm aware there are many better ways to handle the particular situation we're in with the two sites. We're on it! This is just one of those weird situations I want to get to the bottom of.
You should receive a 404 Not Found status code; the user agent will not fall back to the original URL.
Since HTTP is a stateless protocol, there is no real connection between two requests of a user agent. The redirection status codes are just a way for servers to politely tell their clients that the resource they were looking for is somewhere else now. The clients, however, are in no way obliged to actually request the resource from that other URL.
Oh, the signup page is at that URL now? Well then I don't want it anymore... I'll go and look at some kittens instead.
Moreover, even if the client decides to request the new URL (which it usually does ^^), this can be considered a completely new communication between server and client. Neither server nor client should remember that there was a previous request which resulted in a redirection status code. Instead, the current request should be treated as if it were the first (and only) request. And what happens when you request a URL that cannot be found? You get a 404 Not Found status code.
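A tiny simulation of that behaviour, written in Python purely for illustration: `RESPONSES`, `request` and `fetch` are hypothetical stand-ins for the two servers and the user agent, not a real HTTP client.

```python
# Hypothetical canned responses: url -> (status, Location header).
# The redirect target is deliberately missing from the table.
RESPONSES = {
    "http://www.example.com/index.html": (302, "http://www.foo.com/index.html"),
}

REDIRECT_CODES = (301, 302, 303, 307, 308)

def request(url):
    """Stand-in for one independent HTTP request."""
    return RESPONSES.get(url, (404, None))

def fetch(url, max_redirects=5):
    """Follow redirects; each hop is a fresh request with no memory."""
    for _ in range(max_redirects + 1):
        status, location = request(url)
        if status in REDIRECT_CODES and location:
            url = location  # move on; the previous URL is forgotten
            continue
        return status, url
    raise RuntimeError("too many redirects")

status, final_url = fetch("http://www.example.com/index.html")
```

The result is a 404 reported for http://www.foo.com/index.html; nothing in the loop ever goes back to the original URL.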

How can a bot get the contents of subsequent pages in a category listing in WordPress?

I'm writing a bot to automatically download pages from my WordPress blog. The bot gets most of the pages without a problem. For example, it can easily get the first page of the article listing of a given tag: http://example.com/myblog/index.php/archives/tag/mytag. However, for some reason it can't get the subsequent pages, like http://example.com/myblog/index.php/archives/tag/mytag/page/2.
I've tried to figure out what was going on, and here's what I found: while the server answers normally to most requests, upon such requests it answers with a 301 permanent redirect. Peculiarly, the Location header is set to the exact same URL as the request! Basically, the server tells me to redirect my request of the page http://example.com/myblog/index.php/archives/tag/mytag/page/2 to... the very same page :P
When trying to access the page from the browser I get the page without a problem. I thought maybe the browser sends some headers (including cookies) that my bot doesn't send, so I copied the headers (including the cookies) from my browser's web console, but the behaviour didn't change.
I would appreciate any suggestions regarding what might be causing this strange behaviour, what I can do in order to understand what's going on better, and of course what I can do in order to fetch those pages automatically, just like I fetch their brethren.
Thanks!
It seems this post hasn't generated much public interest. However, in case somebody ever runs into the same problem and finds this post, here's the solution I used. Important note: I still don't understand the behaviour I witnessed, and would appreciate it if somebody could explain it.
So the solution I've found is basically to use the URL http://example.com/myblog/archives/tag/mytag?paged=2 instead of http://example.com/myblog/index.php/archives/tag/mytag/page/2. Funnily enough, this URL gets redirected to the original one when browsed to from a browser! But when the bot requested it it got the page without redirection or anything. (So I managed to do what I wanted to do, but I've got no idea what happened there, why there was a problem in the first place, and why this solution worked: for one URL the bot gets infinite redirection and the browser just gets the page, while for the other the browser gets redirected [finitely] and the bot gets the page. I am yet to figure this one out...)
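For anyone who wants to do the same, here is roughly how the workaround looks in Python (my bot's language is beside the point; this is just a sketch). `paged_url` and `is_self_redirect` are helper names I made up: the first builds the `?paged=N` form of a listing URL, the second flags the odd case where the Location header equals the requested URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def paged_url(listing_url, page):
    """Build the query-string form of a paginated listing URL."""
    parts = urlsplit(listing_url)
    query = dict(parse_qsl(parts.query))
    query["paged"] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))

def is_self_redirect(request_url, status, location):
    """True when a redirect points back at the URL that was requested."""
    return status in (301, 302, 307, 308) and location == request_url

url = paged_url("http://example.com/myblog/archives/tag/mytag", 2)
```

A bot could check `is_self_redirect` on each response and switch to the `paged_url` form instead of following the redirect forever.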

How to know if the current Servlet request is the result of a redirect?

Is there a way to know if the request has been redirected or forwarded in the doGet method of a Servlet?
In my application, when a user (whose session has timed out) clicks on a file download link, they're shown the login page, which is good. When they log in, they are immediately sent the file they requested, without the page they see being updated, which is bad. Basically, they get stuck on the login screen (a refresh is required).
What I want to do is interrupt this and simply redirect to the page with the link, when a file is requested as a result of a redirect.
Perhaps there are better ways to solve this?
The redirect happens client-side. The browser is instructed by the previous request to send a new request, so to the server it does not make a difference. The Referer header might contain some useful information, but it's not certain.
When redirecting, you can append a parameter such as ?targetPage=downloadpage and then check whether the parameter exists. You may have to put it in a hidden field on the login page if you want it to be carried across multiple pages.
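The marker-parameter idea, sketched in Python rather than servlet code just to show the shape of it. `login_redirect_url` and `was_redirected` are hypothetical helpers; in a servlet the check would be done on the request's parameters instead.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def login_redirect_url(login_url, target_page):
    """Build the redirect URL, tagging it with the page being requested."""
    return f"{login_url}?{urlencode({'targetPage': target_page})}"

def was_redirected(request_url):
    """A request carrying the marker parameter came from our redirect."""
    return "targetPage" in parse_qs(urlsplit(request_url).query)

redirect = login_redirect_url("/login", "downloadpage")
```

The handler can then branch on `was_redirected` and send the user to the page containing the link instead of streaming the file directly.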
If you're using container-managed authentication, then I don't believe you can detect this, since the server will only invoke your resource once authentication has completed successfully.
If you're managing authentication differently, please explain.
