How to allow or restrict what Googlebot indexes or crawls in WordPress?

I have a problem with Googlebot: it is using 700 MB of bandwidth daily. I mention this for those who will obviously ask why I want to do this.
I know about robots.txt and that I can use it to stop bots from indexing some folders.
But what about WordPress? I am using post-name permalinks, so the permalinks for posts and pages are just /page or /post.
I searched for a plugin that restricts bots to indexing only a few tags and a few categories, but didn't find one.
I want to allow sticky posts, a few categories, and a few tags.
Can this be done? How?
I have an update on this question.
I decided to go with robots.txt rules:
User-agent: *
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: AhrefsBot/3.1
Disallow: /
User-agent: Yahoo-slurp
Disallow: /
User-agent: Msnbot
Disallow: /
User-agent: Googlebot
Allow: /
Disallow: /category
Disallow: /video
Disallow: /author
Disallow: /?s=
Disallow: /feed/
Disallow: /xmlrpc.php
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /tag
Allow: /tag/marry
Allow: /tag/john
Will the last two tags be indexed?
And is there anything else I should hide in WordPress?

If you want to allow particular posts but disallow everything else, use Allow directives. For example:
User-agent: Googlebot
Allow: /post/foo
Allow: /page/bar
Disallow: /
The bot can then crawl the pages you specify, but nothing else.
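As for whether the last two tags in the question stay crawlable: Google documents that, within one user-agent group, the most specific (longest) matching rule wins, and that Allow wins a tie. The sketch below is a hand-written illustration of that precedence rule, not Google's parser. The function name `decide` is made up for this example, and wildcards are deliberately left out.

```python
def decide(rules, path):
    """Return True if `path` may be crawled under longest-match precedence.

    `rules` is a list of (directive, prefix) pairs for ONE user-agent group.
    The longest matching prefix wins; on a tie, Allow wins.
    Wildcards (* and $) are not handled in this sketch.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# Subset of the Googlebot group from the question.
googlebot_rules = [
    ("Allow", "/"),
    ("Disallow", "/category"),
    ("Disallow", "/tag"),
    ("Allow", "/tag/marry"),
    ("Allow", "/tag/john"),
]

for path in ["/tag/marry", "/tag/john/page/2", "/tag/other", "/my-post"]:
    print(path, "->", "allowed" if decide(googlebot_rules, path) else "blocked")
```

Under these assumptions the two tag archives stay crawlable while other tag pages are blocked. Note that `Disallow: /tag` is a plain prefix, so it would also block a post whose slug starts with "tag"; `Disallow: /tag/` is narrower.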


Google index: robots.txt to stop wp uploads indexing

I have a WordPress site that is being indexed by Google, but Google is picking up images as search results, i.e. if I search for my site I see loads of results which, when clicked on, just go to images from wp-content/uploads/.
How do I stop these from coming up in search results, whilst still allowing them in Google Images?
I've made changes to my robots.txt so the first bit reads:
Noindex: /product-tag/*
Noindex: /product-tag/
Noindex: /wp-content/uploads/*
Noindex: /forum/profile/*
Noindex: /my-account/*
Noindex: /my-account/
Noindex: /?s=*
Noindex: /tag/*
Disallow: /wp-admin/
Disallow: /wp-content/uploads/*
Disallow: /product-tag/*
Disallow: /product-tag/
Disallow: /forum/profile/*
Disallow: /my-account/*
Disallow: /my-account/
Disallow: /?s=*
Disallow: /tag/*
Allow: /shop/*
Allow: /product-category/*
User-agent: Googlebot-image
Allow: /
Disallow: /wp-admin/
I guess my question is: is this OK, or am I doing something wrong? If it is right, how do I get Google to realize that some results shouldn't be in the index any more?
I'm aware that I can request removal of pages individually but there is a large amount so I'd rather re-index my entire site if that's the right way to go.
Answer:
User-agent: Googlebot-Image
Disallow: /*.gif$
Disallow: /*.png$
The error is in your code: you allowed Googlebot-Image to index your images:
User-agent: Googlebot-image
Allow: /
Disallow: /wp-admin/
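The `*` and `$` characters in rules such as `Disallow: /*.gif$` are wildcards that Google supports: `*` matches any run of characters and a trailing `$` anchors the end of the URL. The sketch below translates such a pattern into a regular expression to show what it would match. The helper names `pattern_to_regex` and `matches` and the sample paths are assumptions made for illustration, not part of any real crawler.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    `*` matches any run of characters; a trailing `$` anchors the end of
    the URL. Everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern, path):
    """Return True if the robots.txt pattern matches the given URL path."""
    return pattern_to_regex(pattern).match(path) is not None


# Hypothetical upload paths, used only to exercise the patterns.
print(matches("/*.gif$", "/wp-content/uploads/2014/01/logo.gif"))      # True
print(matches("/*.gif$", "/wp-content/uploads/2014/01/logo.gif?v=2"))  # False
print(matches("/wp-content/uploads/*", "/wp-content/uploads/a.png"))   # True
print(matches("/wp-content/uploads/", "/wp-content/uploads/a.png"))    # True
```

The last two lines suggest that a trailing `*` adds nothing to a prefix rule, so pairs like `Disallow: /product-tag/*` and `Disallow: /product-tag/` in the question are redundant.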

I am facing an issue with my robots.txt file

I am using WordPress. Google does not crawl all the resources of my page; it shows "Page partially loaded". I have already tried many times to solve this issue with the robots.txt file. My website returns a bad gateway error.
My robots.txt file:
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /linkout/
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
User-agent: NinjaBot
Allow: /
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
Delete the line Disallow: /index.php. It blocks every URL that starts with /index.php, which can be the whole website for bots if your pages are served through that path.
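To check what that line actually blocks, the `User-agent: *` group from the question can be fed to Python's standard-library `urllib.robotparser`. This is only a rough check: that module does plain prefix matching and is not Google's parser, and the sample paths below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The `User-agent: *` group from the question, unchanged.
robots_txt = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /linkout/
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Hypothetical paths, chosen only to illustrate prefix matching.
for path in ["/index.php", "/index.php?p=123", "/", "/sample-post/"]:
    verdict = "allowed" if parser.can_fetch("Googlebot", path) else "blocked"
    print(f"{path:20} {verdict}")
```

In this sketch only the paths that begin with /index.php come back as blocked, so how much of the site the rule hides depends on whether the permalinks run through /index.php.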

Google Bot Robots.txt tester not working

The robots.txt tester is not working in my case. I have the lines below in robots.txt.
But if I test wp-admin in the tester, the tool shows it as allowed. I don't know why. Please help me: how do I disallow wp-admin?
User-Agent: Googlebot
Allow: *.css*
Allow: *.js*
Allow: /*.jpg
Allow: /*.gif
Allow: /*.png
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /category
Disallow: /tag
Disallow: /page
Disallow: /author
Disallow: /trackback
Disallow: /*trackback
Disallow: /*trackback*
Disallow: /*/trackback
Disallow: /*?*
Disallow: /*.html/$
Disallow: /*feed*
# Google Image
User-agent: Googlebot-Image
Allow: /*
# Google AdSense
User-agent: Mediapartners-Google*
Allow: /*
If you remove the trailing slash, the test will pass. Likewise, if you put a page after wp-admin in the tester, such as /wp-admin/admin.php, you'll see that your rule matches (blocks the bots).
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
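A separate thing worth checking in the file from the question is group selection. A crawler follows only the group that names it most specifically and falls back to `User-agent: *` only when no such group exists, so the `Disallow` lines in the `*` group do not apply to Googlebot once a `User-Agent: Googlebot` group is present. The sketch below uses Python's standard-library `urllib.robotparser` on a shortened copy of that file. The module is not Google's parser and does not understand wildcards, but it picks groups the same way for this case. `SomeOtherBot` is a made-up crawler name.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the file from the question: one group for Googlebot,
# one catch-all group.
robots_txt = """\
User-Agent: Googlebot
Allow: /*.jpg
Allow: /*.gif
Allow: /*.png

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# "SomeOtherBot" is a made-up crawler name that only matches the `*` group.
for agent in ["Googlebot", "SomeOtherBot"]:
    verdict = "allowed" if parser.can_fetch(agent, "/wp-admin/") else "blocked"
    print(f"{agent:13} /wp-admin/ {verdict}")
```

If that is what the tester is reporting, one possible fix is to repeat the `Disallow` lines inside the Googlebot group, or to drop the separate Googlebot group altogether.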

Website Duplicate content detected with google webmaster

We have a website based on CodeIgniter, with a WordPress blog in a subdirectory, /blog.
Through Google Webmaster Tools and search results, we are seeing duplicate content, mainly for our home page, with the following shown after the domain name.
So for example a search on Google shows:
These all appear to be generated by the WordPress blog, and we are not sure how to fix this.
You could use a robots.txt file to tell Google what they should (and shouldn't) be looking for on your site.
A robots.txt file should live here:
An example robots.txt as taken from the WordPress Codex:
# Google Image
User-agent: Googlebot-Image
Allow: /*
# Google AdSense
User-agent: Mediapartners-Google
Disallow:
# digg mirror
User-agent: duggmirror
Disallow: /
# global
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Disallow: /category/*/*
Disallow: */trackback/
Disallow: */feed/
Disallow: */comments/
Disallow: /*?
Allow: /wp-content/uploads/

How to set up robots.txt file for WordPress

[UPDATE 2013]
I can't find an authoritative page with a format for a robots.txt file for WordPress. I promise to maintain one on my site, but I want one here on Stack Overflow.
If you know what you're doing, please check the current draft here:
Everyone else, comment on this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Crawl-delay: 4
User-agent: *
Allow: /
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /e/
Disallow: /show-error-*
Disallow: /xmlrpc.php
Disallow: /trackback/
Disallow: /comment-page-
Allow: /wp-content/uploads/
User-agent: Mediapartners-Google
Allow: /
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Image
Allow: /
User-agent: Googlebot-Mobile
Allow: /
I think this code is a solid choice for a robots.txt file. Just go to public_html, create a file named robots.txt, and paste in the code above.
You can create it in Notepad: copy the code above and paste it into Notepad, but remember that the file name should be robots.txt, then upload it to your public_html.
As with all things SEO, things change. I think the current advice is to have a very minimal robots.txt file.
Disallowing wp-admin, wp-includes, wp-content, etc. may prevent Google from rendering pages correctly, which it doesn't like.
Check out this article by Yoast:
Create robots.txt in Notepad and upload it to public_html in cPanel.
*Remember to rename your Notepad file to robots.txt before you upload it to public_html.
It's not safe to block much in your robots.txt nowadays, since Google tries to load all assets to determine "mobile friendliness." At minimum you can block /wp-admin. There is a more detailed, current answer to this question on the Stack Exchange site for WordPress.
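To see which assets the older, stricter drafts would hide from a crawler, the directory rules from the first draft in the question can be run through Python's standard-library `urllib.robotparser`. This is a rough check under stated assumptions: the module is not Google's parser, and the theme, plugin, and post paths are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Directory rules from the first draft quoted in the question.
robots_txt = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Hypothetical asset and post paths; the theme and plugin names are made up.
paths = [
    "/wp-content/themes/example-theme/style.css",
    "/wp-content/plugins/example-plugin/script.js",
    "/wp-includes/js/jquery/jquery.js",
    "/wp-content/uploads/2013/05/photo.jpg",
    "/hello-world/",
]
blocked = [p for p in paths if not parser.can_fetch("Googlebot", p)]
print("\n".join(blocked))
```

The stylesheet and both scripts end up in `blocked`, which is the kind of CSS and JavaScript a crawler needs to render the page, while the upload and the post itself remain fetchable.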
