Google URL crawl error 404 - domain appended to end of URL - WordPress

I recently built and published my WordPress site at www.kernelops.com and submitted it to the Google index and Webmaster Tools. Today I logged into Webmaster Tools and found 60 URL errors, all with the same type of issue. The base domain address www.kernelops.com is being appended to all my site's page, category, and post URLs. An example of the failed URL looks like this:
http://www.kernelops.com/blog/www.kernelops.com
Google Webmaster Tools indicates that this weird link originates from the base URL "http://www.kernelops.com/blog", which suggests the issue is on my end. My WordPress permalink settings are set to use the post name; I'm not sure if that could be causing this, i.e.:
http://www.kernelops.com/sample-post/
I can't find any help resolving this issue through Google searches and thought someone here might be able to point me in the right direction.
The WordPress plugins that could potentially affect the site's URLs are the following:
All in One SEO
XML-Sitemap
But I can't see any sort of setting within these plugins that would be causing this type of issue.
Any ideas would be greatly appreciated - thanks in advance!

This is a long shot, but it may be happening if the Google crawler picks up a link that seems like a relative path and attempts to append it to the current directory. It's highly unlikely that Google would have such a bug, but it's not impossible either.
The closest thing I could find that may be considered a relative path is this:
<div class="copyright">
...
Kernel, Inc.
...
</div>
I doubt that this is the problem, but it may be worth fixing it.
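To make the relative-path theory concrete, here is a minimal Python sketch using the standard library's `urljoin`. It assumes a crawler resolves links the way `urljoin` does, that the blog page is served with a trailing slash, and that some link on the page is written as `www.kernelops.com` with no scheme; that href is hypothetical, since the markup above is elided.

```python
from urllib.parse import urljoin

# Assumption: the crawler fetched the page at /blog/ (with a trailing slash).
base = "http://www.kernelops.com/blog/"

# Hypothetical href written without a scheme: it is treated as a relative path
# and resolved against the current directory.
broken = urljoin(base, "www.kernelops.com")

# The same target written with an explicit scheme resolves to the site root.
fixed = urljoin(base, "http://www.kernelops.com")

print(broken)
print(fixed)
```

If this is what is happening, the first printed URL matches the one in the crawl report, and adding `http://` to the offending link would be the fix.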
Now, there is another possibility: the website may serve slightly different content depending on the User-Agent string. When Googlebot presents its User-Agent string, the SEO plugin may detect it and try to optimize things to improve your ranking (I'm not familiar with that plugin, so I don't know exactly what it does). A bug in the SEO plugin could make the www.kernelops.com URL look like a relative path, or actually construct the faulty URL somehow.
You can test this by setting the User-Agent string in your browser (e.g. with Firefox's user-agent switcher) to Googlebot's User-Agent string and visiting your website. Look at the page source you receive and search for any links that look like the one Google is finding.
However, if the SEO tool is smart enough, it will "realize" that your IP doesn't match one of the valid IPs for Googlebot and it will not make the modifications.
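The same check can be scripted instead of done in a browser. The sketch below is only an illustration: the helper names (`find_schemeless_links`, `build_googlebot_request`, `fetch_as_googlebot`) are made up for this example, the User-Agent value is the commonly published Googlebot string, and the "starts with www." rule is a rough heuristic for spotting links that are missing their scheme.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Commonly published Googlebot User-Agent string (assumed still current).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


class SchemelessLinkFinder(HTMLParser):
    """Collect <a href> values that start with a host name but have no scheme."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Heuristic: an href such as "www.example.com" is almost certainly
            # meant to be absolute but will be resolved as a relative path.
            if name == "href" and value and value.lower().startswith("www."):
                self.suspects.append(value)


def find_schemeless_links(html):
    """Return the suspicious href values found in an HTML string."""
    finder = SchemelessLinkFinder()
    finder.feed(html)
    return finder.suspects


def build_googlebot_request(url):
    """Build a request that presents Googlebot's User-Agent string."""
    return Request(url, headers={"User-Agent": GOOGLEBOT_UA})


def fetch_as_googlebot(url):
    """Download a page while identifying as Googlebot (performs a network call)."""
    with urlopen(build_googlebot_request(url), timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `find_schemeless_links(fetch_as_googlebot("http://www.kernelops.com/blog"))` and comparing the result with a fetch that uses a normal browser User-Agent would show whether the page differs for the crawler. As noted above, a plugin that verifies Googlebot's IP address would not be fooled by this.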

Related

LinkedIn sharing doesn't work as expected

I'm looking for an easy way to share through LinkedIn without the hassle of OAuth 2.0, which doesn't seem to be required on other pages that use this kind of sharing (they didn't require anything from me; I could share straight away).
Straight to the issue:
this one works: https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Frefair.me
this one doesn't: https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Frefair.me%2Fjob%2F494
It seems I can't get sharing to work for anything beyond the main domain. For comparison, here is a link from another site that goes deeper and is still shareable: https://www.linkedin.com/shareArticle?mini=true&url=https://bulldogjob.pl/companies/jobs/2043-programista-java-warszawa-bms-sp-z-o-o&title=Programista+Java&summary=&source=https://bulldogjob.pl
I also tested with and without the source and summary query params. Has anyone else had this issue?
LinkedIn uses the Open Graph protocol (http://ogp.me/) to determine how pages are shared in LinkedIn.
You may also use the LinkedIn Post Inspector (https://www.linkedin.com/post-inspector/) tool to debug how various pages would be shared in LinkedIn.
I decoded your URL so I could get a cleaner look...
https://www.linkedin.com/sharing/share-offsite/?url=https://refair.me/job/494
So, let's try to visit your URL: https://refair.me/job/494 . The webpage you are sharing DOES NOT LOAD.
Is your site down for everyone? Yes, your site is down for everyone.
In order to share a URL on LinkedIn, you must fulfill the following minimum requirements:
The URL must load.
If you just want to test out the API, try using wikipedia.org or google.com as test pages.
Surprisingly, the old refair.me URL by itself works fine in LinkedIn, but that could be from some internal cache, from way back in the day when the page once did work. It certainly does not do so anymore.
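For reference, the encoded and decoded forms of the share link shown above can be reproduced with Python's standard library. This is a minimal sketch: the helper names `build_share_link` and `shared_page` are made up for illustration, and only the `url` parameter from the question is used.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Share endpoint taken from the links in the question.
SHARE_ENDPOINT = "https://www.linkedin.com/sharing/share-offsite/"


def build_share_link(page_url):
    """Percent-encode the page URL and attach it as the 'url' query parameter."""
    return SHARE_ENDPOINT + "?" + urlencode({"url": page_url})


def shared_page(share_link):
    """Decode a share link back to the page URL it points at."""
    return parse_qs(urlsplit(share_link).query)["url"][0]


link = build_share_link("https://refair.me/job/494")
print(link)
print(shared_page(link))
```

Decoding the link this way makes it easy to open the target page on its own and confirm that it loads before trying to share it.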

Making your site shareable on LinkedIn

I'm having a few issues making our site shareable on LinkedIn and I'm at a loss. The og: meta tags all look fine and the Facebook scraper picks them up, but the LinkedIn scraper does not. The image and other assets are not in a protected folder or anything like that.
In the developer tools, the response to the GET request for the url-preview?url= link shows that the image and other details are missing.
The image is less than 1 MB and all og: meta tag requirements are met. The only thing that may not be 100% is the image ratio, which is not 1/4 or 4/1 (it's 2/1). But that is only a recommendation, not a hard and fast rule.
Does LinkedIn provide something similar to FB (https://developers.facebook.com/tools/debug/) where you can test the scraper and re-run it? Or is there another way to debug this? Any help appreciated.
The page I'm trying to share is https://www.hipla.co.uk.
cheers
It transpires that LinkedIn doesn't offer a facility similar to Facebook's or Twitter's for testing the OG meta tags and re-scraping the page. They cache a page for 7 days and then re-scrape it. However, you can refresh the LinkedIn crawler cache simply by appending GET params to the URL, e.g. https://www.hipla.co.uk?123.
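The cache-refresh trick amounts to giving the crawler a URL it has not seen before. A minimal Python sketch, assuming any unused query parameter will do; the parameter name `refresh` and the helper `add_cache_buster` are hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url, token):
    """Return the URL with an extra query parameter, keeping existing ones."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("refresh", token))  # hypothetical parameter name
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_cache_buster("https://www.hipla.co.uk", "123"))
```

Changing the token each time produces a fresh URL for the scraper while the site still serves the same page.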
I eventually figured out what our issue was. We were using a wildcard certificate (a single SSL certificate covering multiple subdomains), which meant we had to set the server name in the Apache default-ssl.conf file, and we had a typo in it for the www instance. As a result the LinkedIn crawler got an SSL error, which can't be debugged through LinkedIn itself. We only spotted it because we got an SSL error when testing the Twitter metadata tags with the Twitter Card Validator. Note that the SSL error was not visible in a browser, where everything looked fine. Hope this helps anyone else who has a typo in their SSL settings.

Google listed a blog post with https and I don't know why?

Two days ago we posted a new blog on a site with the aim of being picked up for the search term "live comedy in chippenham". It’s been indexed by Google and we’re now 2nd in the results for the search query. The bad news is that for some reason the post has been indexed as a https URL so all browsers give a warning when the link is clicked.
Firefox gives this error:
The owner of www.neeld.co.uk has configured their website improperly. To protect your information from being stolen, Firefox has not connected to this website.
The host has confirmed that it's not a server config error and we have other posts and pages on the site that are being indexed correctly. We're using WordPress and the Yoast plugin. I can't see anywhere in Webmaster Tools that could be causing the problem.
Can anyone offer any advice, please? If you search Google for "live comedy in chippenham" you'll see the issue (it's the link https://www.neeld.co.uk/live-comedy-in-chippenham/).
It's a really strange one but something I've experienced before.
It has most likely been caused by an external link to the page that uses the https protocol, which Google followed before indexing the page. Google is very keen to index https pages at the moment, so we might start seeing this kind of issue more often.
There's not a lot you can do other than wait for Google to realise its mistake and list the correct URL in the SERPs. You can help speed this along with a canonical link (which I can see is there), an XML sitemap (which you've got) and a server-level redirect from https to http.
Do not try to remove the page in Webmaster Tools as this won't have the desired effect and will stop Google reindexing the page properly.
Hope this helps.

Google Analytics shows me weird links for one of my visitors

I have a website which is registered with Google Analytics so I can see its statistics. The problem is that sometimes it shows me this link:
website.com/www.bndv521.cf/
or:
website.com/admin
I do not know whether this is a hacker trying to attack my site, but I doubt anybody would try to access my admin page with good intentions.
Can you help me understand what this link refers to?
Consider checking for malicious code included on your pages. And yes, it's likely that someone is trying to access those pages, but the request may not do anything because the path is invalid. You should consider blocking such IP addresses after checking your logs.
Although trying to reach an admin page looks suspicious, on our website we come across this issue about once in every ten thousand requests.
We think that a browser extension or a virus-like program is trying to change the URL or append this keyword to it. The purpose is not hacking but redirecting to their advertising website.
Very similar issue here: Weird characters in URL

yahoo and bing search results caching old version of my website [closed]

I just put up a new site for my friend's company.
If you search on Google, the link takes you to the correct home page, www.durangoshadecompany.com.
However, if you search using Yahoo or Bing, the link that comes up goes to a cached version of the page, www.durangoshadecompany.com/index.html. This worked for the old site because it was static.
The new site is built dynamically on WordPress, so the index.html file brings up an error. Can I fix this, or will I just have to wait until Yahoo caches the correct home page?
I've tried searching for a remedy, but can't find anything that solves this problem.
Just install the redirection plugin and create the rule for the index.html file to point to /. That should fix the issue immediately without coding knowledge.
Caching is done on the search engine's servers, so there isn't much you can do about it.
If I had to hazard a guess at the methodology, most search engines probably only re-cache a site after a certain amount of change has occurred. Therefore, your best bet might be restructuring your site's code so that common comparison utilities like diff would see the changes as large.
But that's just a guess.
Best of luck.
As the other answer mentioned, it is outside your direct control since you are at the mercy of specific search engines re-indexing you.
Each search engine is going to have its own rules, but I would suggest that your switching to WordPress will help in getting it updated. If you went with the standard install of WordPress, it hits ping-o-matic when you publish posts. That is a way to notify various services (although not Bing and Yahoo directly, I believe) that you have an update.
You can submit to ping-o-matic directly as well. Just go there and fill out the form.
You can (and should) sign up for Bing and Google webmaster tools (Yahoo is part of Bing's tools). This will give you an opportunity to let them know you've updated and that you would like to be crawled. It will also give you a chance to know when they have crawled you and what errors they may have encountered (so that you can correct them).
To make yourself even more friendly for being crawled, you should have an XML sitemap. You can submit the location of your sitemap through those tool sites for indexing. If you do not already have an XML sitemap, there are plugins for WordPress that will build it for you. Then all you need to worry about is submitting it.
For your index.html issue, if the site gets reindexed, that should remedy itself. However, if you want to be sure, what you want to use is a 301 redirect message. This tells the bot that it has been moved permanently and they will note that (i.e. you want to let them know that mysite.com/index.html has been permanently moved to mysite.com or something like that).
There are different ways to do that. You could create an index.html that delivers a 301 redirect message; or you could do it with .htaccess. I would lean toward the .htaccess method.
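To show what a 301 redirect for index.html actually sends back, here is a minimal Python WSGI sketch. It is only an illustration of the response and does not replace the redirection plugin or the .htaccess rule suggested above; the function name `redirect_index_app` and the placeholder home page body are made up for the example.

```python
def redirect_index_app(environ, start_response):
    """Tiny WSGI app: permanently redirect /index.html to the site root."""
    path = environ.get("PATH_INFO", "")
    if path == "/index.html":
        # 301 tells crawlers the old address has moved permanently to "/".
        start_response(
            "301 Moved Permanently",
            [("Location", "/"), ("Content-Length", "0")],
        )
        return [b""]
    # Placeholder for the real (dynamically generated) home page.
    body = b"home page"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

A crawler that requests /index.html receives the 301 status plus a Location header, which is the signal it uses to replace the old URL with the new one in its index.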

Resources