How to noindex a few images with the robots.txt file - WordPress

I'd like to noindex a few images on my website. How do I stop robots from indexing them?
I edited robots.txt; here's what it looks like:
User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
User-agent: Mediapartners-Google*
User-agent: Googlebot-Image
User-agent: Yahoo-MMCrawler
Disallow:
User-agent: Googlebot-Image
Disallow: /wp-content/uploads/2016/06/image4.jpg
Disallow: /wp-content/uploads/2016/06/image3.jpg
Disallow: /wp-content/uploads/2016/05/image2.jpg
Disallow: /wp-content/uploads/2016/06/image1.jpg
One of the images that wasn't supposed to be indexed still appears in the Google Image results.
Thank you

You are doing it right, but I presume Google had already indexed your images before you disallowed them in robots.txt.
Remove them from Google's index in Search Console:
https://www.google.com/webmasters/tools/url-removal
After that, fetch and submit your site again.
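Before requesting removal, it may be worth confirming locally that the rules block what you expect. The following is a minimal sketch using Python's standard urllib.robotparser. It assumes a placeholder host (example.com), a made-up unlisted file name, and only the Googlebot-Image group from the question. The standard-library parser does plain prefix matching and is not Google's own parser, so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Only the Googlebot-Image group from the question.
# example.com is a placeholder host, not the asker's site.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /wp-content/uploads/2016/06/image4.jpg
Disallow: /wp-content/uploads/2016/06/image3.jpg
Disallow: /wp-content/uploads/2016/05/image2.jpg
Disallow: /wp-content/uploads/2016/06/image1.jpg
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def image_bot_may_fetch(path: str) -> bool:
    """Return True if Googlebot-Image may crawl the given path."""
    return parser.can_fetch("Googlebot-Image", "https://example.com" + path)


# A listed image should be blocked; an unlisted (hypothetical) one should not.
blocked = image_bot_may_fetch("/wp-content/uploads/2016/06/image4.jpg")
unlisted = image_bot_may_fetch("/wp-content/uploads/2016/06/unlisted.jpg")
print("listed image allowed:", blocked)
print("unlisted image allowed:", unlisted)
```

If the listed image comes back as allowed, the problem is in the file itself rather than in Google's cached index.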

Related

Google index: robots.txt to stop wp uploads indexing

I have a WordPress site that is being indexed by Google, but Google is picking up images as search results: if I search site:mysite.com, I see loads of results which, when clicked, just go to images from wp-content/uploads/.
How do I stop these from coming up in search results, whilst still allowing them in Google Images?
I've made changes to my robots.txt so the first bit reads:
User-agent:*
Noindex: /product-tag/*
Noindex: /product-tag/
Noindex: /wp-content/uploads/*
Noindex: /forum/profile/*
Noindex: /my-account/*
Noindex: /my-account/
Noindex: /?s=*
Noindex: /tag/*
Disallow: /wp-admin/
Disallow: /wp-content/uploads/*
Disallow: /product-tag/*
Disallow: /product-tag/
Disallow: /forum/profile/*
Disallow: /my-account/*
Disallow: /my-account/
Disallow: /?s=*
Disallow: /tag/*
Allow: /shop/*
Allow: /product-category/*
User-agent: Googlebot-image
Allow: /
Disallow: /wp-admin/
I guess my question is: is this OK, or am I doing something wrong? If it is right, how do I get Google to realize that some results shouldn't be in the index any more?
I'm aware that I can request removal of pages individually, but there are a large number of them, so I'd rather re-index my entire site if that's the right way to go.
Answer:
User-agent: Googlebot-Image
Disallow: /*.gif$
Disallow: /*.png$
The error is in your file: this group allows Googlebot-image to crawl your images:
User-agent: Googlebot-image
Allow: /
Disallow: /wp-admin/
See: https://support.google.com/webmasters/answer/35308?hl=en
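The patterns in this answer use the two wildcard characters that Google documents for robots.txt: * for any run of characters and a trailing $ for end of URL. The sketch below is a simplified illustration of those semantics, written as a small Python helper. The function name and the sample paths are made up for the example, and this is not Google's actual matcher.

```python
import re


def pattern_matches(pattern: str, path: str) -> bool:
    """Simplified robots.txt wildcard match.

    '*' matches any run of characters; a trailing '$' anchors the
    pattern to the end of the path. Without '$' it is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' where '*' was.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    regex = "^" + body + ("$" if anchored else "")
    return re.search(regex, path) is not None


# Hypothetical paths, checked against the rules from the answer.
for rule in ("/*.gif$", "/*.png$"):
    for path in ("/wp-content/uploads/photo.gif",
                 "/wp-content/uploads/logo.png",
                 "/photo.gif?size=large"):
        print(rule, path, pattern_matches(rule, path))
```

Note that a URL with a query string after the extension does not match the $-anchored rule, because the path no longer ends in .gif.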

Yoast SEO: how to allow crawler bots

I got a job to get rid of "No information is available for this page" on a website. The website uses Yoast SEO, but it was disabled, so I re-enabled it and got a basic robots.txt like this:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
I applied those settings about six hours ago and tried searching for the site in Google, but nothing has changed. I feel anxious now.
Is this enough for the crawlers to read the website? Am I missing something? Do I need to mess with .htaccess? I have zero experience in SEO, so any help would be much appreciated.
Copy and paste this into your robots.txt:
User-agent: Googlebot
Disallow:
User-agent: googlebot-image
Disallow:
User-agent: googlebot-mobile
Disallow:
User-agent: MSNBot
Disallow:
User-agent: Slurp
Disallow:
User-agent: Teoma
Disallow:
User-agent: Gigabot
Disallow:
User-agent: Robozilla
Disallow:
User-agent: Nutch
Disallow:
User-agent: ia_archiver
Disallow:
User-agent: baiduspider
Disallow:
User-agent: naverbot
Disallow:
User-agent: yeti
Disallow:
User-agent: yahoo-mmcrawler
Disallow:
User-agent: psbot
Disallow:
User-agent: yahoo-blogs/v3.9
Disallow:
User-agent: *
Disallow:
Sitemap: https://www.yoursitename.com/sitemap.xml
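Every group in the list above has an empty Disallow, which means nothing is disallowed, so the final User-agent: * group on its own has the same effect for the named bots. A minimal sketch with Python's standard urllib.robotparser, assuming a placeholder host (example.com) and a made-up page path:

```python
from urllib.robotparser import RobotFileParser

# A single catch-all group; an empty Disallow value blocks nothing.
CATCH_ALL = """\
User-agent: *
Disallow:
"""

catch_all = RobotFileParser()
catch_all.parse(CATCH_ALL.splitlines())

# A few bot names taken from the list above; example.com is a placeholder.
bots = ["Googlebot", "googlebot-image", "MSNBot", "Slurp", "baiduspider"]
results = {
    bot: catch_all.can_fetch(bot, "https://example.com/any-page/")
    for bot in bots
}
print(results)
```

Each bot in the dictionary should come back as allowed, which suggests the per-bot groups add length but no extra permissions.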

I am facing an issue with my robots.txt file

I am using WordPress. Google does not crawl all the resources of my page; it shows "Page partially loaded". I have already tried many times to solve this issue with the robots.txt file. My website returns a bad gateway error.
My website link: https://www.alphaclick.in
My robots.txt file:
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /linkout/
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
User-agent: NinjaBot
Allow: /
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
Sitemap: https://www.alphaclick.in/sitemap_index.xml
Sitemap: https://www.alphaclick.in/post-sitemap.xml
Delete the line Disallow: /index.php. It is blocking the whole website for bots.
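As a rough check of what that rule covers, the sketch below runs the User-agent: * group from the question through Python's standard urllib.robotparser, which does simple prefix matching. The host example.com, the bot name ExampleBot, and the sample paths are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The catch-all group from the question's robots.txt.
STAR_GROUP = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /linkout/
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
"""

star_parser = RobotFileParser()
star_parser.parse(STAR_GROUP.splitlines())


def generic_bot_may_fetch(path: str) -> bool:
    """True if a bot with no group of its own may crawl the path."""
    return star_parser.can_fetch("ExampleBot", "https://example.com" + path)


# Hypothetical paths: two routed through index.php, one pretty permalink.
checks = {
    path: generic_bot_may_fetch(path)
    for path in ("/index.php", "/index.php?p=12", "/about/")
}
print(checks)
```

The rule matches every URL that begins with /index.php, query strings included. On a site whose pages are served through index.php URLs, that covers every page.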

Correct syntax for robots.txt file?

What's below is in my robots.txt file.
If I want particular search engines to have access to the site, but not to a few key areas such as the admin section, the wp-content area, and a non-existent folder, is the syntax below correct for Google, MSN, Bing, Yahoo, and DuckDuckBot, while disallowing everyone else?
User-agent: Googlebot
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: MSNBot
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Bingbot
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Slurp
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: DuckDuckBot
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Google (+https://developers.google.com/+/web/snippet/)
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Googlebot-Image/1.0
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Googlebot-Video/1.0
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: *
Disallow: *
Syntax is correct, but approach is wrong.
1. Never block your content
Google (and many other search engines) fully renders your page. If you block access to images, Google may lower your position in search results, just in case: Googlebot cannot tell whether or not your page is full of broken image links.
This is a quote from Maile Ohye, Google Developer Programs Tech Lead:
“We recommend making sure Googlebot can access any embedded resource that meaningfully contributes to your site’s visible content or its layout”
2. Do not block /wp-admin/admin-ajax.php
When you block access to /wp-admin/ entirely, no AJAX content is available to robots. That is why the standard robots.txt that WordPress generates on the fly is as follows:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
3. Do not block other bots
The list of search bots is wider than the one in your question, and it grows from time to time. Googlebot-Mobile, for example, is not present in your list. The last statement in your file blocks this bot, with evident results for mobile search.
It is better not to reinvent the wheel, and to use the standard WordPress robots.txt settings shown above, or the even wider settings of the Yoast SEO plugin (1+ million installs).
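The effect of the admin-ajax.php exception in point 2 can be sketched with Python's standard urllib.robotparser. One assumption to flag: the standard-library parser applies the first rule that matches, so the Allow line is listed first here. Google's documented behaviour picks the most specific (longest) matching rule, so the order in the WordPress default does not matter there. The host example.com, the bot name ExampleBot, and the sample paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Same two rules as the WordPress default, with Allow moved first because
# urllib.robotparser returns the verdict of the first matching rule.
WP_RULES = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

wp_parser = RobotFileParser()
wp_parser.parse(WP_RULES.splitlines())


def may_crawl(path: str, bot: str = "ExampleBot") -> bool:
    """True if the (placeholder) bot may crawl the path on example.com."""
    return wp_parser.can_fetch(bot, "https://example.com" + path)


ajax_ok = may_crawl("/wp-admin/admin-ajax.php")   # the carved-out exception
admin_ok = may_crawl("/wp-admin/options.php")     # rest of the admin area
post_ok = may_crawl("/hello-world/")              # ordinary content
print(ajax_ok, admin_ok, post_ok)
```

The admin area stays closed while the AJAX endpoint and ordinary content remain reachable, which is the behaviour the default file is aiming for.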

How to allow or restrict what Googlebot indexes or crawls in WordPress?

I have a problem with Googlebot: it is taking 700 MB of bandwidth daily. I mention this for those who will obviously ask why I want to do this.
I know about robots.txt and that I can stop bots from indexing some folders.
But what about WordPress? I am using post-name permalinks, so the permalinks for posts and pages are just /page or /post.
I searched for a plugin to restrict bots to indexing only a few tags and a few categories, but didn't find one.
I want to allow sticky posts, a few categories, and a few tags.
Can it be done? How?
I have an update on this question.
I decided to go with robots.txt rules:
User-agent: *
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: AhrefsBot/3.1
Disallow: /
User-agent: Yahoo-slurp
Disallow: /
User-agent: Msnbot
Disallow: /
User-agent: Googlebot
Allow: /
Disallow: /category
Disallow: /video
Disallow: /author
Disallow: /?s=
Disallow: /feed/
Disallow: /xmlrpc.php
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /tag
Allow: /tag/marry
Allow: /tag/john
Will the last two tags be indexed?
And is there anything more to hide in WordPress?
If you want to allow particular posts but disallow everything else, then use Allow directives. For example:
User-agent: Googlebot
Allow: /post/foo
Allow: /page/bar
Disallow: /
So the bot can crawl the pages you specify, but not anything else.
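How Allow and Disallow interact can be sketched with a small model of the precedence rule Google documents: the most specific (longest) matching rule wins, and Allow wins a tie. The helper below is a simplified illustration with prefix matching only and no wildcards. The function name, the rule tuples, and the test paths are made up for the example, the final catch-all is treated as Disallow: /, and other crawlers may resolve conflicts differently.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path prefix), e.g. ("disallow", "/tag")


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Longest matching prefix wins; Allow wins ties; no match means allowed."""
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The rules from the answer, with the catch-all written as "/".
ANSWER_RULES = [("allow", "/post/foo"), ("allow", "/page/bar"), ("disallow", "/")]

# The tag-related Googlebot rules from the question.
TAG_RULES = [("allow", "/"), ("disallow", "/tag"),
             ("allow", "/tag/marry"), ("allow", "/tag/john")]

print(is_allowed(ANSWER_RULES, "/post/foo"), is_allowed(ANSWER_RULES, "/other"))
print(is_allowed(TAG_RULES, "/tag/marry"), is_allowed(TAG_RULES, "/tag/peter"))
```

Under this model the two named tags in the question stay crawlable for Googlebot, because Allow: /tag/marry and Allow: /tag/john are longer than Disallow: /tag, while every other tag is blocked.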
