Yahoo and Bing search results caching old version of my website [closed] - WordPress

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
I just put up a new site for my friend's company.
If you search on Google, the link takes you to the correct home page, www.durangoshadecompany.com.
However, if you search using Yahoo or Bing, the link that comes up goes to a cached version of the page, www.durangoshadecompany.com/index.html. That URL worked for the old site because it was static.
The new site is built dynamically on WordPress, so the index.html URL brings up an error. Can I fix this, or will I just have to wait until Yahoo indexes the correct home page?
I've tried searching for a remedy, but can't find anything that solves this problem.

Just install the Redirection plugin and create a rule pointing the index.html URL to /. That should fix the issue immediately, without any coding knowledge.

Caching is done on the search engine's servers, so there isn't much you can do about it.
If I had to hazard a guess at the methodology, most search engines probably only re-cache a site after a certain amount of change has occurred. Therefore, your best bet might be restructuring your site's markup in a way that a comparison utility like diff would register as a large change.
But that's just a guess.
Best of luck.

As the other answer mentioned, it is outside your direct control since you are at the mercy of specific search engines re-indexing you.
Each search engine has its own rules, but your switch to WordPress should help get it updated. A standard WordPress install pings Ping-O-Matic when you publish posts, which notifies various services (although not Bing and Yahoo directly, I believe) that you have an update.
You can submit to Ping-O-Matic directly as well; just go there and fill out the form.
You can (and should) sign up for Bing and Google webmaster tools (Yahoo is part of Bing's tools). This will give you an opportunity to let them know you've updated and that you would like to be crawled. It will also give you a chance to know when they have crawled you and what errors they may have encountered (so that you can correct them).
To make yourself even more friendly for being crawled, you should have an XML sitemap. You can submit the location of your sitemap through those tool sites for indexing. If you do not already have an XML sitemap, there are plugins for WordPress that will build it for you. Then all you need to worry about is submitting it.
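If you want crawlers to discover the sitemap on their own as well, the usual convention is to reference it from robots.txt. A minimal sketch, assuming the plugin generates the file at the site root (the sitemap URL below is a placeholder):
User-agent: *
Disallow:
Sitemap: http://www.durangoshadecompany.com/sitemap.xml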
For your index.html issue, if the site gets reindexed, that should remedy itself. However, if you want to be sure, what you want is a 301 redirect. This tells the bot that the page has been moved permanently, and it will note that (i.e. you are letting it know that mysite.com/index.html has permanently moved to mysite.com).
There are different ways to do that. You could create an index.html that delivers a 301 redirect message; or you could do it with .htaccess. I would lean toward the .htaccess method.
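For example, a minimal .htaccess sketch, assuming Apache with mod_rewrite enabled (adjust the rule if your old static URLs were different):
RewriteEngine On
# Permanently redirect the old static home page to the WordPress front page
RewriteRule ^index\.html$ / [R=301,L]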

Related

How to mechanically identify all broken links in a Drupal site

We have just moved to Drupal and are trying to proactively identify all broken external web links (http://, https://).
I've seen some references to link validation, but I wasn't sure whether that meant only validating the syntax of the link, as opposed to whether the links actually work (e.g. no 404).
What is the easiest way to go through all of the web links in a Drupal site and identify the broken external ones? This is something we'd like to automate and schedule daily or weekly.
As someone else mentioned, use the Link Checker module. It's a great tool.
In addition, you can check the Crawl Errors report in Google Webmaster Tools for 404'd links.
Clicking any URL there will show you where it was linked from, so you can update any broken internal links. Be sure to use canonical URLs to avoid that.
Make sure you're using a proper internal linking strategy to avoid broken internal links in the first place, too: http://www.daymuse.com/blogs/drupal-broken-internal-link-path-module-tutorial
Essentially: use canonical, relative links to avoid broken internal links in the future when you change aliases. In simple Drupal terms, be sure you're linking to "node/23" instead of "domain.ext/content/my-node-title" since multiple parts of that might change in the future.
I have not found a Drupal-based approach for this. The best free piece of software I've found for finding bad links on sites is the Screaming Frog SEO Spider Tool.
http://www.screamingfrog.co.uk/seo-spider/
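If you want something you can schedule yourself (e.g. from cron), a small script works too. Here is a minimal sketch in Python using only the standard library; the starting URLs and User-Agent string are placeholders, and a real crawler would also follow internal links and respect robots.txt:
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Placeholder list of pages to scan; in practice you might read these
# from your Drupal sitemap instead.
PAGES = ["https://example.com/"]

class LinkExtractor(HTMLParser):
    """Collects absolute http(s) hrefs from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith(("http://", "https://")):
                    self.links.append(value)

def status_of(url):
    """Return the HTTP status code for url, or the error reason as a string."""
    try:
        # Note: a few servers reject HEAD; fall back to GET for those if needed.
        req = Request(url, method="HEAD", headers={"User-Agent": "link-checker"})
        return urlopen(req, timeout=10).getcode()
    except HTTPError as err:
        return err.code
    except URLError as err:
        return str(err.reason)

for page in PAGES:
    html = urlopen(page, timeout=10).read().decode("utf-8", errors="replace")
    parser = LinkExtractor()
    parser.feed(html)
    for link in sorted(set(parser.links)):
        status = status_of(link)
        if status != 200:
            print(f"{page}: {link} -> {status}")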

Google Analytics shows me weird links for one of my visitors

I have a website which is registered with Google Analytics so I can see its statistics. The problem is that sometimes it shows me this link:
website.com/www.bndv521.cf/
or:
website.com/admin
I do not know if this is a hacker trying to hack me, but I doubt anybody would try to access my admin page for a good reason.
Can you help me figure out what this link refers to?
Consider checking for malicious code included on your pages. And yes, it's likely that someone is trying to access those pages, but the attempt may not do anything because it's an invalid path. You should consider blocking such IP addresses after checking your logs.
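If the same address keeps probing, blocking it in .htaccess is a quick option. A minimal sketch using Apache 2.2-style access control (the IP below is a placeholder from the documentation range):
Order Allow,Deny
Allow from all
Deny from 203.0.113.42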
Although trying to reach an admin page seems like a suspicious action, on our website we come across this issue roughly once every ten thousand requests.
We think a browser extension or a virus-like program tries to change the URL, or to add this keyword to the URL, not for hacking purposes but to redirect visitors to an advertising website.
Very similar issue here: Weird characters in URL

Google URL Crawl error 404 - domain appending to end of URL

I recently built and published my WordPress site at www.kernelops.com and submitted it to the Google index and Webmaster Tools. Today I logged into Webmaster Tools and found 60 URL errors, all with the same type of issue. The base domain address www.kernelops.com is being appended to all of my site's page, category, and post URLs. An example of a failed URL looks like this:
http://www.kernelops.com/blog/www.kernelops.com
Google Webmaster Tools indicates that this weird link originates from the base URL "http://www.kernelops.com/blog", which obviously means the issue is on my end. My WordPress permalink settings are set to use the post name; I'm not sure if that could be causing this, i.e.:
http://www.kernelops.com/sample-post/
I can't seem to find any help resolving this weird issue with google searches and thought someone here may be able to point me in the right direction.
The WordPress plugins that would potentially affect the site's URLs are the following:
All in One SEO
XML-Sitemap
But I can't see any sort of setting within these plugins that would be causing this type of issue.
Any ideas would be greatly appreciated - thanks in advance!
This is a long shot, but it may be happening if the Google crawler picks up a link that seems like a relative path and attempts to append it to the current directory. It's highly unlikely that Google would have such a bug, but it's not impossible either.
The closest thing I could find that might be interpreted as a relative path is this:
<div class="copyright">
...
Kernel, Inc.
...
</div>
I doubt that this is the problem, but it may be worth fixing it.
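For example, a footer link written without a scheme is treated as a path relative to the current directory, while a fully qualified URL is not (hypothetical markup):
<!-- Resolves to http://www.kernelops.com/blog/www.kernelops.com when crawled from /blog -->
<a href="www.kernelops.com">Kernel, Inc.</a>
<!-- Resolves as intended -->
<a href="http://www.kernelops.com/">Kernel, Inc.</a>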
Now, there is yet another possibility, and that's if the website serves slightly different content depending on the User-Agent string. When Google presents your website with a Googlebot User-Agent, the SEO plugin detects it and tries to optimize things to improve your ranking (I'm not familiar with that plugin, so I don't know exactly what it does). There may be a bug in the SEO plugin that causes the www.kernelops.com URL to look like a relative path, or that actually constructs the faulty URL somehow.
You can test this by setting the user-agent string in your browser (e.g. Firefox's user-agent switcher extension) to Googlebot's user-agent string and seeing what happens when you visit your website. Look through the page source you receive for any links that resemble the one Google is finding.
However, if the SEO tool is smart enough, it will "realize" that your IP doesn't match one of the valid IPs for Googlebot and it will not make the modifications.

Wordpress website is automatically redirecting after load [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
I asked the following question on SuperUser.com and the question was closed. Maybe it should be asked on ServerFault.com. Not sure.
But here it is on SO hoping it will get some traction.
Hello,
I have a WordPress website. It is NOT a wordpress.com website; it is hosted at godaddy.com. This weekend, whenever I fired up my browser and loaded the landing page (or any other page), it would load (Firefox would say "Done") and then, after a one-second pause, the browser would redirect to some seemingly random website.
Unfortunately (or fortunately?) this is an intermittent problem.
I use difficult-to-break passwords for my WordPress admin.
Any ideas on how to troubleshoot or what the problem is?
Seth
EDIT
Yes, the URL is http://www.meeting-minutes.org. For the record, the reason I did not include the URL when I reposted my question here is that I thought someone might think I am just trying to promote the software I reference on the website. That is genuinely not my purpose.
EDIT: Thanks for the help. I have taken the site down by simply renaming the hosting folder (so it now returns a 404, which is fine). I will clean it up and then redeploy.
For the life of me I don't know how this could have happened.
Seth
Your Weblog has definitely been hacked. I can see very evil-looking JavaScript code in the source code of your blog:
<script language=javascript>document.write(unescape('%3C%73%63%72%69......
It is probably code to redirect to other sites, as you say. Your blog's security must have been compromised somehow; this is definitely in your template's source code.
You should download everything and take the site down immediately to protect your visitors, and your site's reputation (to prevent it from getting on any malware blacklist). Check out the "Getting your site off line" chapter in the 2nd link. I don't know which version of Wordpress you're using, maybe WP's forums can be helpful in finding out how the break-in occurred. Maybe it's also a good idea to inform the hosting company and see whether they can provide any additional information. If you have access to any log files, fetch a copy and look whether they tell you anything.
Links:
Google Webmaster Tools: My Site has been hacked
My site's been hacked - now what? Very nice article on Google Webmaster Central
For later maybe:
Hardening WordPress
Specific to WordPress (and linked numerous times in the WordPress forums): FAQ: My site was hacked « WordPress Codex and how-to-completely-clean-your-hacked-wordpress-installation.
I would try accessing the site with JavaScript turned off. That would be a quick way of verifying if someone had put that in an onLoad. It certainly could have been written to fire intermittently.
If you have file access to the server, I would look at the .htaccess file, which might have rewrite rules in it.
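Injected redirects of this kind often key off the referer so that only visitors arriving from search engines get sent elsewhere. A hypothetical example of the pattern to look for (not taken from your site):
RewriteEngine On
# Suspicious pattern: redirect only search-engine traffic to a third-party site
RewriteCond %{HTTP_REFERER} (google|bing|yahoo) [NC]
RewriteRule .* http://malicious-site.example/ [R=302,L]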
Lastly, I would try accessing the website by IP address to detect DNS problems, but I find it highly unlikely it would work that way.
Don't forget to look closely at changes to your theme, which is the most likely avenue of attack.

How to completely hide a website from search engines?

What's the best recommended way to hide my staging website from search engines? I Googled it and found that some say I should add a meta tag, and some say I should put a text file inside my website directory; I want to know the standard way.
My current website is in ASP.NET, although I believe there must be a common way for any website, whatever its programming language.
Use a robots.txt file.
See here: http://www.robotstxt.org/robotstxt.html
You could also use your server's robots.txt:
User-agent: *
Disallow: /
Google's crawler actually respects these settings.
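The meta tag the question mentions is the robots meta tag; it goes in the head of each page you want kept out of the index (crawlers still have to be able to fetch the page to see it):
<meta name="robots" content="noindex, nofollow">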
Really easy answer: password-protect it. If it's a staging site, then it quite likely is not intended to be publicly facing (private audience only, most likely). Trying to keep it out of search engines is only treating a symptom when the real problem is that you haven't appropriately secured it.
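A minimal sketch of HTTP basic auth in .htaccess on Apache (the password file path is a placeholder; create the file with the htpasswd utility):
# Create the credentials first, e.g.: htpasswd -c /path/to/.htpasswd username
AuthType Basic
AuthName "Staging site"
AuthUserFile /path/to/.htpasswd
Require valid-user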
Keep in mind that you can't hide a public-facing unprotected web site from a search engine. You can ask that bots not index it (through the robots.txt that my fine colleagues have brought up), and the people who write the bots may choose not to index your site based on that, but there's got to be at least one guy out there who is indexing all the things people ask him not to index. At the very least one.
If keeping automated crawlers out is a big requirement, some kind of CAPTCHA solution might work for you.
http://www.robotstxt.org/robotstxt.html
There are search engines and bookmarking services that do not respect robots.txt. If you really don't want the site ever to turn up, I'd suggest requiring a CAPTCHA just to navigate to it.
What's the best recommended way to hide my staging website from search engines?
Simple: don't make it public. If that doesn't work, then only make it public long enough to validate that it is ready to post live and then take it down.
However, all that said, a more fundamental question is, "Why care?". If the staging site is really supposed to be the live site one step before pushing live, then it shouldn't matter if it is indexed.
