Circular redirect path detected and wrong Open Graph data displayed - http

When sharing the following URL to Facebook:
www.magicsoftware.com
you get outdated information. Facebook refers to the site (magicsoftware.com/en) and takes all the information from its cache.
I tried to clear the cache by going to the debugger:
https://developers.facebook.com/tools/debug/og/object?q=www.magicsoftware.com
But that didn't help much.
Does anyone have an idea what I can do?
P.S. - if you check the debugger link, you will see two critical errors listed under "Errors That Must Be Fixed":
Could Not Follow Redirect: URL requested an HTTP redirect, but it could not be followed.
Circular Redirect Path: Circular redirect path detected (see 'Redirect Path' section for details).
What does that mean?

Your server is issuing a redirect to the same URL that was visited, based on some condition; according to my tests, any request that arrives without an Accept-Language header gets redirected.
Compare a request sent with an Accept-Language header to one sent without any headers.
The Facebook linter doesn't seem to pass this header while crawling your Open Graph meta tags, so it hangs in the redirection loop.
You should avoid that redirect (or at least provide a fallback) so the Facebook linter can collect updated data and refresh the cached version.
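For illustration, a minimal sketch of such a fallback, assuming (the question doesn't say) that the language redirect is implemented in PHP; the crawler user-agent strings are the documented facebookexternalhit/Facebot values:

```php
<?php
// Sketch (assumption: the language redirect is done in PHP; the real stack
// behind magicsoftware.com is not stated in the question).
// Idea: don't redirect crawlers such as the Facebook linter, and fall back to
// a default language instead of redirecting when Accept-Language is missing.

$userAgent      = $_SERVER['HTTP_USER_AGENT'] ?? '';
$acceptLanguage = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '';

$isFacebookCrawler = stripos($userAgent, 'facebookexternalhit') !== false
    || stripos($userAgent, 'Facebot') !== false;

if (!$isFacebookCrawler && $acceptLanguage !== '') {
    // Existing behaviour: pick a language from Accept-Language and redirect.
    header('Location: https://www.magicsoftware.com/en', true, 302);
    exit;
}

// Fallback: serve the default-language page (with its Open Graph tags) directly.
```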

The same thing is happening to me now. I have no redirect in place, but I am getting the message "There was an error following the redirect path." when using the debugger on this URL: http://www.mmaid.co/cleaning-services/offers/coupons/social-discount.php. I will give it time and see if it fixes itself.

I found the solution myself - and it's only patience :)
Facebook just needs time to remove its cached files. So the solution is simply to enter your URL in the Facebook Debugger and then wait. Facebook will automatically refresh the cache for this URL.
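If waiting is too slow, the cache can also be refreshed programmatically by asking the Graph API to re-scrape the URL. A rough sketch in PHP; it assumes you have an app access token available:

```php
<?php
// Rough sketch: ask the Graph API to re-scrape a URL so the cached Open Graph
// data is refreshed. 'APP_ID|APP_SECRET' is a placeholder for an app access token.
$pageUrl     = 'https://www.magicsoftware.com/';
$accessToken = 'APP_ID|APP_SECRET';

$response = file_get_contents('https://graph.facebook.com/', false, stream_context_create([
    'http' => [
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => http_build_query([
            'id'           => $pageUrl,
            'scrape'       => 'true',
            'access_token' => $accessToken,
        ]),
    ],
]));

echo $response; // the freshly scraped Open Graph data, as JSON
```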

Related

How to bypass Google's cookie consent redirect when making a request with Guzzle

I am using Guzzle to pull data from the content at the end location of a Google RSS feed link, e.g.
https://news.google.com/rss/articles/CBMiVWh0dHBzOi8vd3d3LmxvbmRvbi1maXJlLmdvdi51ay9pbmNpZGVudHMvMjAyMy9qYW51YXJ5L21haXNvbmV0dGUtZmlyZS1zdHJlYXRoYW0taGlsbC_SAQA?oc=5
When using curl with the -L (follow location) flag, it appears to bypass the consent redirect and pulls through the content from the end location.
I am using Drupal 10 with httpclient available, which I understand uses Guzzle 7. How do I do the same there?
When enabling Guzzle's 'track redirects' feature, I can see that it appears to get stuck redirecting to the Google consent page and never redirects to the end location.
e.g.
An AJAX HTTP error occurred. HTTP Result Code: 200 Debugging information follows. Path: /batch?id=328&op=do_nojs&op=do StatusText: parsererror ResponseText: Redirecting https://news.google.com/rss/articles/CBMiTmh0dHBzOi8vd3d3Lm15bG9uZG9uLm5ld3MvbmV3cy9wcm9wZXJ0eS9pbS1lc3RhdGUtYWdlbnQtcmVudGluZy1zb3V0aC0yNjA2MDI1ONIBUmh0dHBzOi8vd3d3Lm15bG9uZG9uLm5ld3MvbmV3cy9wcm9wZXJ0eS9pbS1lc3RhdGUtYWdlbnQtcmVudGluZy1zb3V0aC0yNjA2MDI1OC5hbXA?oc=5 to https://consent.google.com/m?continue=https://news.google.com/rss/articles/CBMiTmh0dHBzOi8vd3d3Lm15bG9uZG9uLm5ld3MvbmV3cy9wcm9wZXJ0eS9pbS1lc3RhdGUtYWdlbnQtcmVudGluZy1zb3V0aC0yNjA2MDI1ONIBUmh0dHBzOi8vd3d3Lm15bG9uZG9uLm5ld3MvbmV3cy9wcm9wZXJ0eS9pbS1lc3RhdGUtYWdlbnQtcmVudGluZy1zb3V0aC0yNjA2MDI1OC5hbXA?oc%3D5&gl=GB&m=0&pc=n&hl=en-US&src=1 Redirecting https://consent.google.com/m?continue=https://news.google.com/rss/articles/CBMiTmh0dHBzOi8vd3d3Lm15bG9uZG9uLm5ld3MvbmV3cy9wcm9wZXJ0eS9pbS1lc3RhdGUtYWdlbnQtcmVudGluZy1zb3V0aC0yNjA2MDI1ONIBUmh0dHBzOi8vd3d3Lm15bG9uZG9uLm5ld3MvbmV3cy9wcm9wZXJ0eS9pbS1lc3RhdGUtYWdlbnQtcmVudGluZy1zb3V0aC0yNjA2MDI1OC5hbXA?oc%3D5&gl=GB&m=0&pc=n&hl=en-US&src=1 to https://news.google.com/rss/articles/CBMiTmh0dHBzOi8vd3d3Lm15bG9uZG9uLm5ld3MvbmV3cy9wcm9wZXJ0eS9pbS1lc3RhdGUtYWdlbnQtcmVudGluZy1zb3V0aC0yNjA2MDI1ONIBUmh0dHBzOi8vd3d3Lm15bG9uZG9uLm5ld3MvbmV3cy9wcm9wZXJ0eS9pbS1lc3RhdGUtYWdlbnQtcmVudGluZy1zb3V0aC0yNjA2MDI1OC5hbXA?oc=5&ucbcb=1 Redirecting https://news.google.com/rss/articles/CBMiTmh0dHBzOi8vd3d3Lm15bG9uZG9uLm5ld3MvbmV3cy9wcm9wZXJ0eS9pbS1lc3RhdGUtYWdlbnQtcmVudGluZy1zb3V0aC0yNjA2MDI1ONIBUmh0dHBzOi8vd3d3Lm15bG9uZG9uLm5ld3MvbmV3cy9wcm9wZXJ0eS9pbS1lc3RhdGUtYWdlbnQtcmVudGluZy1zb3V0aC0yNjA2MDI1OC5hbXA?oc=5&ucbcb=1 to https://news.google.com/rss/articles/CBMiTmh0dHBzOi8vd3d3Lm15bG9uZG9uLm5ld3MvbmV3cy9wcm9wZXJ0eS9pbS1lc3RhdGUtYWdlbnQtcmVudGluZy1zb3V0aC0yNjA2MDI1ONIBUmh0dHBzOi8vd3d3Lm15bG9uZG9uLm5ld3MvbmV3cy9wcm9wZXJ0eS9pbS1lc3RhdGUtYWdlbnQtcmVudGluZy1zb3V0aC0yNjA2MDI1OC5hbXA?oc=5&ucbcb=1&hl=en-GB&gl=GB&ceid=GB:en
This appeared to be working fine prior to updating to Drupal 10, which also includes a Symfony 4-to-6 update behind the scenes, so I'm not sure if that is related.
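For reference, this is roughly how redirect following and tracking are configured in Guzzle 7 (a sketch; $feedItemUrl is a placeholder for one of the feed links above):

```php
<?php
// Sketch: plain Guzzle 7 shown; in Drupal the client would normally come from
// the 'http_client' service instead of being constructed directly.
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\RedirectMiddleware;

$feedItemUrl = 'https://news.google.com/rss/articles/...?oc=5'; // placeholder for a feed link

$client = new Client();
$response = $client->request('GET', $feedItemUrl, [
    'allow_redirects' => [
        'max'             => 10,
        'track_redirects' => true, // record each hop
    ],
]);

// Every followed redirect is listed in this response header.
print_r($response->getHeader(RedirectMiddleware::HISTORY_HEADER)); // X-Guzzle-Redirect-History
```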
After looking into this a little more, I believe the issue I was having has more to do with Google using JavaScript to handle the redirect. I tested this by turning JavaScript off in the browser, and with it off the redirect does not work. This is in combination with all news links in RSS feeds now linking to Google first rather than to the final source.
So to overcome this I have had to add an additional step that extracts the URL from this intermediate page, which I can then use to do the final lookup.
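A rough sketch of that extra step, under the assumption (not something Google documents) that the final article URL appears somewhere in the interstitial page's markup and can be pulled out with a regex; the function name is mine:

```php
<?php
// Sketch only: fetch the interstitial news.google.com page and try to pull the
// first non-Google URL out of its markup. The pattern is an assumption and may
// need adjusting whenever Google changes the page.
require 'vendor/autoload.php';

use GuzzleHttp\Client;

function resolveGoogleNewsUrl(Client $client, string $googleNewsUrl): ?string
{
    $html = (string) $client->request('GET', $googleNewsUrl, [
        'allow_redirects' => true,
    ])->getBody();

    if (preg_match_all('#https?://[^"\'<>\s]+#i', $html, $matches)) {
        foreach ($matches[0] as $candidate) {
            if (stripos($candidate, 'google.com') === false
                && stripos($candidate, 'gstatic.com') === false) {
                return html_entity_decode($candidate);
            }
        }
    }
    return null;
}

// Usage: $finalUrl = resolveGoogleNewsUrl(new Client(), $feedItemUrl);
```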

WordPress: First login of a new browser session always fails

I'm working on a WordPress website: https://samarazakaz.ru/
The client discovered a strange bug. After newly opening a browser, the first login always fails; the second one succeeds.
I tracked down the issue to a strange cookie with the name RCPC that is being set when the login form is submitted. If the cookie is missing then the login fails regardless of proper credentials.
I searched high and low for any information about this cookie but could not find anything useful. The only thing remotely resembling my case was some discussion on a site called https://codeforces.com/, but nothing there mentioned anything related to WordPress.
The site has a bare-bones setup with Elementor and my own plugin, and nothing in my code messes with cookies or the login process. I downloaded all the website files and searched them all for "RCPC" but found nothing.
The site is behind an Nginx proxy, but I could not find any connection between this cookie and Nginx either.
I noticed that the value of this cookie is constant. So, as a workaround, I jerry-rigged my plugin to set this cookie whenever it's not set. But, of course, I'm not very happy with that solution because I don't know if it will just stop working one day.
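That stop-gap amounts to something like the following sketch (the cookie value is a placeholder for the constant value observed in the browser):

```php
<?php
// Sketch of the stop-gap (plugin code): re-set the RCPC cookie whenever it is
// missing. 'OBSERVED_VALUE' is a placeholder for the constant value seen in
// the browser's dev tools.
add_action('init', function () {
    if (!isset($_COOKIE['RCPC'])) {
        setcookie('RCPC', 'OBSERVED_VALUE', [
            'expires'  => time() + DAY_IN_SECONDS,
            'path'     => '/',
            'secure'   => true,
            'httponly' => true,
        ]);
    }
});
```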
Update:
I verified that this is coming from the hosting. I renamed the /wp-login.php file and made a request to it, and it didn't return a 404 error but a 200 page with the same redirect code and the header to set the cookie. The hosting is reg.ru.
As far as I can figure, this is a countermeasure against automated password guessing. Any request (POST, GET, etc.) to /wp-login.php will get the redirect script with the cookie-setting header. Only requests containing the correct RCPC cookie will get forwarded.
Upon further testing I found that the value of the RCPC cookie is some kind of hash generated from the request's IP address: all of our computers got the same one, but from other locations it's different.
This does not cause any problem if the standard WordPress login form is used because that lives at the /wp-login.php address, so the first GET request will generate the cookie. However, we had a custom login page which didn't access /wp-login.php until the form was submitted.
Based on these discoveries I made a workaround, which is simply adding a one-line JS script to the login page that makes a fetch request to the /wp-login.php page and discards the result. This is enough to set the cookie in the browser so that the form works on the first try.
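Wired up from a plugin, that workaround looks roughly like this sketch; the wp_footer hook and the 'login' page slug are assumptions, not part of the original description:

```php
<?php
// Sketch (plugin code): on the custom login page, fire a throw-away request to
// /wp-login.php so the host sets the RCPC cookie before the form is submitted.
// 'login' is an assumed slug for the custom login page.
add_action('wp_footer', function () {
    if (!is_page('login')) {
        return;
    }
    echo "<script>fetch('/wp-login.php', {credentials: 'same-origin'});</script>";
});
```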
You need to ask the hosting provider to disable the test-cookie module (the Nginx testcookie anti-bot module) for your site.

404 error on my homepage although I can see my site

I am at my wits' end with the following problem:
My site www.sebastianthalhammer.com is available under that URL without any problems.
However, Google Search Console as well as other external third-party test tools return a 404 error.
Status report from Uptrends
It is just the main page that's affected. All the other subpages and blog content aren't affected.
I have been in contact with the server staff, but it all seems alright to them. As mentioned, the site can be reached. The site runs on WordPress - latest version.
I have no real clue where to start, as this error seems to be quite a tricky one. Does anyone here have an idea what's going on?
Sebastian
The 4xx class of status code is intended for cases in which the client seems to have erred. Except when responding to a HEAD request, the server SHOULD include a representation containing an explanation of the error situation, and whether it is a temporary or permanent condition. These status codes are applicable to any request method. User agents SHOULD display any included representation to the user.
This leaves me with two possible explanations:
Explanation 1: it's a server error.
the server wrongly returns a 404 status code
the browser thinks the response body contains details about the error and displays it - for the end user this is the actual page
Explanation 2: it's done on purpose to defeat crawlers and page watchers.
the server returns 404 on purpose - non-browser user agents won't process the result as they interpret it as error
browsers are unaffected, the end user doesn't care as long as the page is being displayed
The second one would indeed be kind of clever if you don't want your page to be indexed.
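One quick way to tell which explanation applies is to look at the raw status line outside the browser, for example with a throw-away PHP check like this:

```php
<?php
// Throw-away check: print the status line and headers the server actually returns.
$headers = get_headers('https://www.sebastianthalhammer.com/', true);
echo $headers[0], PHP_EOL; // e.g. "HTTP/1.1 200 OK" or "HTTP/1.1 404 Not Found"
print_r($headers);
```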
Thanks to your feedback I was able to think about the problem in a different way.
Ultimately, in the unholy depths of a certain plugin, I managed to dig out a setting that caused the error.
It was a redirection plugin that (for whatever reason) sent out a 404 status when the URL was requested.
I don't know what the purpose of something like that would be. All I know is that the setting had been on its default for quite a while, and that caused the weird situation.
Thanks, guys, for getting me on the right track.
Sebastian

How can a bot get the contents of subsequent pages in a category listing in WordPress?

I'm writing a bot to automatically download pages from my WordPress blog. The bot gets most of the pages without a problem. For example, it can easily get the first page of the article listing of a given tag: http://example.com/myblog/index.php/archives/tag/mytag. However, for some reason it can't get the subsequent pages, like http://example.com/myblog/index.php/archives/tag/mytag/page/2.
I've tried to figure out what was going on, and here's what I found: while the server answers normally to most requests, to these requests it answers with a 301 permanent redirect. Peculiarly, the Location header is set to the exact same URL as the request! Basically, the server tells me to redirect my request for the page http://example.com/myblog/index.php/archives/tag/mytag/page/2 to... the very same page :P
When trying to access the page from the browser I get the page without a problem. I thought maybe the browser sends some headers (including cookies) that my bot doesn't send, so I copied the headers (including the cookies) from my browser's web console, but the behaviour didn't change.
I would appreciate any suggestions regarding what might be causing this strange behaviour, what I can do in order to understand what's going on better, and of course what I can do in order to fetch those pages automatically, just like I fetch their brethren.
Thanks!
It seems this post hasn't generated much public interest. However, in case somebody ever runs into the same problem and finds this post, here's the solution I used. Important note: I still don't understand the behaviour I witnessed, and would appreciate it if somebody could explain it.
So the solution I've found is basically to use the URL http://example.com/myblog/archives/tag/mytag?paged=2 instead of http://example.com/myblog/index.php/archives/tag/mytag/page/2. Funnily enough, this URL gets redirected to the original one when visited from a browser! But when the bot requested it, it got the page without any redirection. (So I managed to do what I wanted to do, but I've got no idea what happened there, why there was a problem in the first place, or why this solution worked: for one URL the bot gets infinite redirection and the browser just gets the page, while for the other the browser gets redirected [finitely] and the bot gets the page. I have yet to figure this one out...)
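For anyone scripting the same thing, the workaround boils down to requesting the query-string form of the archive URL; a minimal sketch (in PHP purely for illustration, since the bot's language isn't stated):

```php
<?php
// Sketch: fetch successive archive pages through the ?paged=N form of the URL,
// which the bot could retrieve without hitting the self-referential 301.
$base = 'http://example.com/myblog/archives/tag/mytag';

for ($page = 1; $page <= 10; $page++) {
    $html = @file_get_contents($base . '?paged=' . $page);
    if ($html === false) {
        break; // stop on error or when no more pages exist
    }
    file_put_contents(sprintf('mytag-page-%02d.html', $page), $html);
}
```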

using customErrors for vanity URLs / asp.net url redirection

So, from here...
In ASP.NET, you have a choice about how to respond to that - it's in the web.config as CustomErrors. Turn that on, then redirect to a fancy 404 page (maybe you already do). The fancy 404 page, then, could be checking the requested querystring (which gets passed over to the custom error page as yet another querystring) to see if it's a valid redirect, lives in your database, etc. Just do a Response.Redirect() from there.
Then schooner writes:
Thanks, we do have a 404 now, but we would prefer this not to be detected as a 404 in the process. We would like to handle it directly and separately if possible.
...and I'd like to know just how bad a practice this is. I don't expect to put my "pretty" URLs on the internet (just business cards) and I have a sample of 404-redirecting-to-a-helpful-site code working, but I don't want to get to production and have an issue with a browser that takes the initial 404 too seriously. Can anyone help me understand more about why I wouldn't want to use customErrors / 404 to flow users to the page they actually wanted?
The main problem with using customErrors as your 404 handler is that every time customErrors picks up an errored request, rather than returning a 404 to the browser and letting it know there was a bad request, it instead returns a 302, which indicates that the page has been relocated to whatever your customErrors page is. This isn't bad for most users, because they don't know or even notice the difference; the problem comes from the fact that web crawlers DO know the difference, and the status code they receive directly affects how their indexing works.
Consider the scenario where you have a page at http://mysite.com/MyAwesomePageAboutStuff.aspx for some period of time, and then one day you decide you no longer need it and delete the file. If Google or some other crawler has already indexed that URL and goes back to it after you delete it, the crawler will get a 302 status code instead of a 404 error, and because of this status code the crawler will update the page's URL to point to your error page rather than deleting the non-existent link. Now, whenever someone finds that URL by way of a search engine, they'll end up at your error page.
It's not really a huge issue, but you can definitely see the headaches this can create for your users in the long run.
Look here for some corroborating data.
I created a vanity URL system using the 404 as the handler. There's no need for a 302 on my side, as the 404 handler dynamically loads the content and returns that. I am fully able to handle any and all POST/GET and SERVER data.
Works great. If you are interested, TarantulaHawk is up on SourceForge.
