404 error on my homepage although I can see my site - wordpress

I am at my wits end with the following problem:
My site www.sebastianthalhammer.com is available under that URL without any problems.
However Google Search Console as well as other external third party test tools return a 404 error.
Status report from Uptrends
It is just the main page that's affected. All the other subpages and blog content isn't affected.
I have been in contact with the server stuff but it seems alright to them. As mentioned. The site can be reached. The site runs on wordpress - latest version.
I have no real clue where to start as this error seems to be quite a tricky one. Does anyone here might have an idea what's going on?
Sebastian

The 4xx class of status code is intended for cases in which the
client seems to have erred. Except when responding to a HEAD
request, the server SHOULD include a representation containing an
explanation of the error situation, and whether it is a temporary or
permanent condition. These status codes are applicable to any
request method. User agents SHOULD display any included
representation to the user.
This leaves me with two possible explanations:
Explanation 1: it's a server error.
the server wrongly returns a 404 status code
the browser thinks the response body contains details about the error and displays it - for the end user this is the actual page
Explanation 2: it's done on purpose to defeat crawlers and page watchers.
the server returns 404 on purpose - non-browser user agents won't process the result as they interpret it as error
browsers are unaffected, the end user doesn't care as long as the page is being displayed
The second one would indeed be kind of clever if you don't want your page to be indexed.

Thanks to your feedback I could think about the problem in a different way.
Ultimately at the unholy depths of a certain plugin I could dig out a setting that caused the error.
It was a redirection plugin that (for whatever reason) sent out a 404 signal when the URL was requested.
I don't know what the purpose would be for something. All I know is that the setting was on default for quite a while now and that caused the weird situation.
thanks guys for getting me on the right track.
Sebastian

Related

HTTP to HTTPS issues

I have a question, I am a bit confused, I don't really understand why this is happening.
I have a website which works well over http. When I force redirect to https something happens. Even if I replace all my urls in my code, only GET request will work. Anybody has any idea why is this happening?
I also have admin part of the website. it works to login into the admin but it doesn't work to make any requests on it. I am trying to post or delete but I receive a 401 err, even if I am logged in and set the token right...
So bottom line is:
On Https, the website works, it shows all the resources from the db, I can login in the Admin but I can not post or delete.
On Http everything works.
I am in a huge need of advice or ideas.
thanks.
From my experience you cannot serve mixed content, that's my first suggestion is to call all your scripts/dependencies without the prefix; ie: script src="https://blahblah" to "script src="//blahblah"; you're going to make sure you are sticking consistently to one serving source; so that's the first thing I'd check (also look at console logs, they often give hints as to what failed);
Secondly I am unsure of the response or how the server handles traffic from non https, possibly there's a rule in htaccess or some form of redirection trying to force the call via https so http fails? these are all steps in debugging right you need to troubleshoot and play process of eliminations; first though I'd make sure we are serving everything from // or https; when on http I would look at console logs for clues but even more so I would force a redirect to use https exclusively (as most sites do now)
Check for mixed content issues first though, this is something that can have a multitude of solutions based on the many variations of what could be causing this issue.

What HTTP Status Code should be used for redirecting to a correct URL?

I am not so sure if the the title of the question is perfect to ask but here is what I am trying figure out. Of course in relation to SEO and such things.
You might have seen URLs with numbers at the end in addition to the title/slug in the URL. Such as
/topic/new-topic-at-this-website-1234
The web application actually takes that number (at the end) as an identity to the resource to deliver. So changing that number in the URL brings up another resource but in that case not all but some websites changes the full URL and redirect to the correct resource with correct URL. Even though some websites do not change the URL and just deliver the content.
While I want to redirect the user to correct URL if the numeric values changes, what status code should be used in that case?
Hope I have asked the question correctly.
OK, Until I find a suggestion, I am going use status code 301 for such redirection because I just found that in fact stackoverflow is handling the same case with 301. As requesting this page with following link,
/questions/33775526
redirects to
/questions/33775526/what-http-status-code-should-be-used-for-redirecting-to-a-correct-url
with status code 301.
Advices are still welcomed but

GoogleTag Manager transforming root URL

A client of ours currently uses an error reporting service which logs a number of errors thrown across the site and alerts us when a certain threshold is hit. This morning we received a large number of errors to do with the GoogleTag Manager service, however I'm unsure how or why these 404's are occurring and was wondering if anyone had seen similar behavior before.
What appears to be happening is that the GoogleTag iframe URL is being appended to the sites root URL and is thus causing a 404.
To illustrate this more clearly, we have the following iframe code
<iframe src="//www.googletagmanager.com/ns.html?id=GTM-KJHJ"
and we are seeing the following URL returned as a 404
www.website.com//www.googletagmanager.com/ns.html?id=GTM-KJHJ
Has anyone had an experience of this happening? Or would be able to indicate or think of any reason why this would have started to occur suddenly?
As a side note, this has been happening repeatedly from various IPs for the last few hours.
Thanks.
Scheme-relative urls work fine in all common browsers, so I think that these errors might be caused by some crawlers that do not parse such urls correctly. Have you checked the IP addresses in your logs to see where they come from?
Independent of that - if you need to get rid of these 404 errors then adding the protocol should do the trick. So change the GTM code such that both urls start with the protocol (http:// or https://) and the 404 errors should go away.

How can a bot get the contents of subsequent pages in a category listing in WordPress?

I'm writing a bot to automatically download pages from my WordPress blog. The bot gets most of the pages without a problem. For example, it can easily get the first page of the article listing of a given tag: http://example.com/myblog/index.php/archives/tag/mytag. However, for some reason it can't get the subsequent pages, like http://example.com/myblog/index.php/archives/tag/mytag/page/2.
I've tried to figure out what was going on, and here's what I found: while the server answers normally to most requests, upon such requests it answers with a 301 permanent redirect. Peculiarly, the Location header is set to the exact same URL as the request! Basically, the server tells me to redirect my request of the page http://example.com/myblog/index.php/archives/tag/mytag/page/2 to... the very same page :P
When trying to access the page from the browser I get the page without a problem. I thought maybe the browser sends some headers (including cookies) that my bot doesn't send, so I copied the headers (including the cookies) from my browser's web console, but the behaviour didn't change.
I would appreciate any suggestions regarding what might be causing this strange behaviour, what I can do in order to understand what's going on better, and of course what I can do in order to fetch those pages automatically, just like I fetch their brethren.
Thanks!
It seems this post hasn't generated much public interest. However, in case somebody ever runs into the same problem and finds this post, here's the solution I used. Important note: I still don't understand the behaviour I witnessed, and would appreciate it if somebody could explain it.
So the solution I've found is basically to use the URL http://example.com/myblog/archives/tag/mytag?paged=2 instead of http://example.com/myblog/index.php/archives/tag/mytag/page/2. Funnily enough, this URL gets redirected to the original one when browsed to from a browser! But when the bot requested it it got the page without redirection or anything. (So I managed to do what I wanted to do, but I've got no idea what happened there, why there was a problem in the first place, and why this solution worked: for one URL the bot gets infinite redirection and the browser just gets the page, while for the other the browser gets redirected [finitely] and the bot gets the page. I am yet to figure this one out...)

using customErrors for vanity URLs / asp.net url redirection

So, from here...
In ASP.NET, you have a choice about how to respond to that - it's in the web.config as CustomErrors. Turn that on, then redirect to a fancy 404 page (maybe you already do). The fancy 404 page, then, could be checking the requested querystring (which gets passed over to the custom error page as yet another querystring) to see if it's a valid redirect, lives in your database, etc. Just do a Response.Redirect() from there.
Then schooner writes:
Thanks, we do have a 404 now but we would prefer this not to be detected as a 404 in the process. We would like ot handle it directly and seperately if possible.
..and I'd like to know just how bad a practice this is. I don't expect to put my "pretty" URLs on the internet (just business cards) and I have a sample of 404-redirecting-to-a-helpful-site code working, but I don't want to get to production and have an issue with a browser that takes the initial 404 too seriously. Can anyone help me understand more about why I wouldn't want to use customErrors / 404 to flow users to the page they actually wanted?
The main problem with using customeErrors as your 404 error handler is that every time customErrors picks up an errored request rather than throwing a 404 error back to your browser and letting your browser know there was a bad request, it instead returns a 302 which indicates that a page has been relocated to whatever your customErrors page is. This isn't bad for most users because they don't know or even notice the difference, the problem comes from the fact that web crawlers DO know the difference and the status code they receive directly affects how their indexing works.
Consider the scenario where you have a page at http://mysite.com/MyAwesomePageAboutStuff.aspx for some period of time and then one day you decide you no longer need it and delete the file. If Google or some other crawler has already indexed that URL and goes back to it after you delete it the crawler will get a 302 status code instead of a 404 error and because of this status code the crawler will update the page's url to point to your error page rather deleting the non-existent link. Now, whenever someone finds that url by way of a search engine they'll end up at your error page.
It's not really a huge issue, but you can definitely see the headaches this can create for your users in the long run.
Look here for some corroborating data.
I created a vanity url system using the 404 as the handler. There's no need for a 302 on my side as the 404 dynamically loads the content and returns that. I am fully able to handle any and all POST / GET and SERVER data.
Works great. If you are interested TarantulaHawk is up on SourceForge.

Resources