I've been given the job of getting rid of the "No information is available for this page" message that a website shows in Google's search results. The website uses Yoast SEO, but the plugin was disabled, so I re-enabled it and got a basic robots.txt like this:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
I applied those settings about six hours ago and searched for the site on Google, but nothing has changed yet. I feel anxious now.
Is this enough for the crawlers to read the website? Am I missing something? Do I need to change .htaccess? I have zero SEO experience, so any help would be much appreciated.
Copy and paste this into your robots.txt:
User-agent: Googlebot
Disallow:
User-agent: googlebot-image
Disallow:
User-agent: googlebot-mobile
Disallow:
User-agent: MSNBot
Disallow:
User-agent: Slurp
Disallow:
User-agent: Teoma
Disallow:
User-agent: Gigabot
Disallow:
User-agent: Robozilla
Disallow:
User-agent: Nutch
Disallow:
User-agent: ia_archiver
Disallow:
User-agent: baiduspider
Disallow:
User-agent: naverbot
Disallow:
User-agent: yeti
Disallow:
User-agent: yahoo-mmcrawler
Disallow:
User-agent: psbot
Disallow:
User-agent: yahoo-blogs/v3.9
Disallow:
User-agent: *
Disallow:
Sitemap: https://www.yoursitename.com/sitemap.xml
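Both files above leave ordinary pages crawlable: an empty Disallow: value means "nothing is disallowed", and the three-line Yoast/WordPress file only fences off /wp-admin/. A quick local check with Python's standard urllib.robotparser, as a sketch: www.example.com and the sample paths are placeholders, and urllib.robotparser uses plain prefix matching and stops at the first matching rule, so it only approximates how Googlebot reads a file.

```python
from urllib.robotparser import RobotFileParser

# The three-line file from the question (WordPress/Yoast default).
BASIC = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

# Shortened form of the answer's file: an empty Disallow: blocks nothing.
OPEN = """\
User-agent: Googlebot
Disallow:
User-agent: *
Disallow:
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots_txt and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read
    return parser.can_fetch(agent, url)


for name, text in (("basic", BASIC), ("open", OPEN)):
    for path in ("/", "/sample-page/", "/wp-admin/options.php"):
        print(name, path, can_crawl(text, "Googlebot", "https://www.example.com" + path))
```

With the basic file only /wp-admin/options.php is refused; with the open file every path is allowed. Once robots.txt no longer blocks the pages, the message usually disappears only after Google has recrawled them, which can take longer than a few hours.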
I am using WordPress. Google does not crawl all the resources on my page; it shows "Page partially loaded". I have already tried many times to solve this issue with the robots.txt file. My website returns a bad gateway error.
My website: https://www.alphaclick.in
My robots.txt file:
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /linkout/
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
User-agent: NinjaBot
Allow: /
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
Sitemap: https://www.alphaclick.in/sitemap_index.xml
Sitemap: https://www.alphaclick.in/post-sitemap.xml
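Delete the line Disallow: /index.php. It blocks every URL that begins with /index.php, which can mean the whole website for bots if your permalinks run through index.php (e.g. /index.php/post-name/). You can find more information about the robots.txt file here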
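Disallow values are matched as URL prefixes, so Disallow: /index.php blocks /index.php itself and everything that starts with it, but not the home page or ordinary pretty permalinks. A sketch using Python's standard urllib.robotparser on the question's User-agent: * group; /hello-world/ is a hypothetical post slug:

```python
from urllib.robotparser import RobotFileParser

# The "User-agent: *" group from the question's robots.txt.
ROBOTS = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /linkout/
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

# Prefix matching: only paths that begin with "/index.php" are caught by that rule.
for path in ("/", "/hello-world/", "/index.php", "/index.php/hello-world/"):
    allowed = parser.can_fetch("Googlebot", "https://www.alphaclick.in" + path)
    print(f"{path:<25}{'allowed' if allowed else 'blocked'}")
```

The first two paths come out allowed and the last two blocked, which is why the rule only hurts a site whose post URLs go through /index.php/.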
Below is my robots.txt file.
If I want particular search engines to have access to the site, except for a few key areas such as the admin section, the wp-content area and a folder that doesn't exist, is the syntax below correct for Google, MSN, Bing, Yahoo and DuckDuckBot, while disallowing everyone else?
User-agent: Googlebot
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: MSNBot
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Bingbot
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Slurp
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: DuckDuckBot
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Google (+https://developers.google.com/+/web/snippet/)
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Googlebot-Image/1.0
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Googlebot-Video/1.0
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: *
Disallow: *
The syntax is correct, but the approach is wrong.
1. Never block your content
Google (and many other search engines) fully render your page. Your rules block /wp-content/, which is where WordPress keeps theme CSS and JavaScript and all uploaded images. If you block access to images, Google may lower your position in search results just in case, because Googlebot cannot tell whether your page is full of broken image links or not.
This is a quote from Maile Ohye, Google Developer Programs Tech Lead:
“We recommend making sure Googlebot can access any embedded resource that meaningfully contributes to your site’s visible content or its layout”
2. Do not block /wp-admin/admin-ajax.php
When you block access to /wp-admin/ entirely, robots cannot reach any content loaded through AJAX, because WordPress serves AJAX requests from /wp-admin/admin-ajax.php. That is why the standard robots.txt that WordPress generates on the fly is:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
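The Allow line wins even though the Disallow line comes first, because Google and RFC 9309 do not read rules top to bottom: the longest matching pattern decides, and Allow wins a tie. A minimal Python sketch of that rule, supporting only the * and $ wildcards; it illustrates the matching logic and is not Google's parser:

```python
import re

# (allow?, pattern) pairs from the default WordPress robots.txt.
WORDPRESS_DEFAULT = [
    (False, "/wp-admin/"),
    (True, "/wp-admin/admin-ajax.php"),
]


def pattern_matches(pattern: str, path: str) -> bool:
    """Prefix match where '*' matches any characters and a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None


def is_allowed(rules, path: str) -> bool:
    """Longest matching pattern wins; on a tie (same length) Allow beats Disallow."""
    matches = [(len(pattern), allow) for allow, pattern in rules if pattern_matches(pattern, path)]
    return max(matches)[1] if matches else True  # no matching rule: allowed


for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/sample-page/"):
    print(path, is_allowed(WORDPRESS_DEFAULT, path))
```

Here admin-ajax.php and the sample page are allowed, while /wp-admin/options.php is blocked.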
3. Do not block other bots
The list of search bots is longer than the one in your question, and it grows over time. For example, Googlebot-Mobile is not present in your list as a User-agent of its own, so the last group in your file (User-agent: * with Disallow: *) blocks that bot, with obvious consequences for mobile search.
It is better not to reinvent the wheel: use the standard WordPress robots.txt settings shown above, or the even broader settings from the Yoast SEO plugin (1+ million installs).
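To see which group a crawler actually lands in, here is a rough Python sketch of the RFC 9309 selection rule, run on an excerpt of the question's file. Each User-agent value is cut down to its leading product token (letters, "_" and "-", which mirrors how Google's open-source robots.txt parser reads it); a crawler uses the group whose token equals its own, ignoring case, and otherwise falls back to User-agent: *. Individual search engines may add their own fallbacks on top of this, which the sketch ignores.

```python
import re

# Excerpt of the question's file; the third User-agent line is a full browser string.
ROBOTS = """\
User-agent: Googlebot
Disallow: /wp-admin/*
User-agent: Googlebot-Image/1.0
Disallow: /wp-admin/*
User-agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Disallow: /wp-admin/*
User-agent: *
Disallow: *
"""


def group_tokens(robots_txt: str) -> set:
    """Product token registered by each User-agent line: its leading run of letters, '_' and '-'."""
    tokens = set()
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent":
            value = value.strip()
            token = re.match(r"[A-Za-z_-]*", value).group(0)
            tokens.add((token or value).lower())  # "*" has no letters, so keep it as-is
    return tokens


def group_for(robots_txt: str, crawler_token: str) -> str:
    """Exact, case-insensitive token match; otherwise the crawler falls back to the '*' group."""
    token = crawler_token.lower()
    return token if token in group_tokens(robots_txt) else "*"


print(sorted(group_tokens(ROBOTS)))
for crawler in ("Googlebot", "Googlebot-Image", "Googlebot-Mobile"):
    print(crawler, "->", group_for(ROBOTS, crawler))
```

The long browser-style line registers only the token mozilla, so it does not give Googlebot-Mobile a group of its own, and that crawler ends up in the catch-all group.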
The robots.txt Tester is not working in my case. I have the lines below in robots.txt.
But when I test wp-admin in the Tester, the tool shows it as allowed, and I don't know why. How can I disallow wp-admin?
User-Agent: Googlebot
Allow: *.css*
Allow: *.js*
Allow: /*.jpg
Allow: /*.gif
Allow: /*.png
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /category
Disallow: /tag
Disallow: /page
Disallow: /author
Disallow: /trackback
Disallow: /*trackback
Disallow: /*trackback*
Disallow: /*/trackback
Disallow: /*?*
Disallow: /*.html/$
Disallow: /*feed*
# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*
# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
If you remove the trailing slash, the test will pass. Likewise, if you put a page after wp-admin in the Tester, such as /wp-admin/admin.php, you'll see your rule pass (that is, block the bots):
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
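What the Tester reports also depends on which crawler it simulates. Under RFC 9309 a crawler obeys only the one group that names it and ignores User-agent: * entirely, so if the Tester checks as Googlebot, the only rules that apply are the User-Agent: Googlebot group, which contains nothing but Allow lines. A rough Python sketch of that group selection, run on a trimmed copy of the question's file (Bingbot is just an example of a crawler with no group of its own):

```python
import re

# Trimmed copy of the question's robots.txt.
ROBOTS = """\
User-Agent: Googlebot
Allow: *.css*
Allow: *.js*
Allow: /*.jpg
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
"""


def parse_groups(robots_txt: str) -> dict:
    """Map each product token to its (allow?, pattern) rules; repeated groups for one token are merged."""
    groups = {}
    agents = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            token = (re.match(r"[A-Za-z_-]*", value).group(0) or value).lower()
            agents.append(token)
            groups.setdefault(token, [])
        elif field in ("allow", "disallow") and agents:
            in_rules = True
            if value:  # an empty value adds no rule
                for agent in agents:
                    groups[agent].append((field == "allow", value))
    return groups


def rules_for(robots_txt: str, crawler_token: str) -> list:
    """A crawler obeys only its own group, falling back to '*' when it has none."""
    groups = parse_groups(robots_txt)
    return groups.get(crawler_token.lower(), groups.get("*", []))


for crawler in ("Googlebot", "Bingbot"):
    print(crawler, rules_for(ROBOTS, crawler))
```

Googlebot gets only the three Allow rules, so nothing in its group can block /wp-admin; under this rule, a Disallow line for /wp-admin would have to go into the Googlebot group itself.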
I'd like to noindex a few images on my website. How do I stop robots from indexing them?
I edited robots.txt; here's what it looks like:
User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
User-agent: Mediapartners-Google*
User-agent: Googlebot-Image
User-agent: Yahoo-MMCrawler
Disallow:
User-agent: Googlebot-Image
Disallow: /wp-content/uploads/2016/06/image4.jpg
Disallow: /wp-content/uploads/2016/06/image3.jpg
Disallow: /wp-content/uploads/2016/05/image2.jpg
Disallow: /wp-content/uploads/2016/06/image1.jpg
One of the images that wasn't supposed to be indexed, appears in the Google Image results.
Thank you
You are doing it right, but I would guess that Google had already indexed your images before you disallowed them in robots.txt.
Remove them from Google's index in Search Console:
https://www.google.com/webmasters/tools/url-removal
After that, fetch and submit your site again.
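The two Googlebot-Image groups in the question do not conflict: RFC 9309 says that when several groups name the same crawler, their rules are combined, and an empty Disallow: adds no rule at all. A rough Python sketch of that merging, run on the question's file, shows that Googlebot-Image ends up with exactly the four image rules while the other listed crawlers have no restrictions:

```python
import re

# The question's robots.txt.
ROBOTS = """\
User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
User-agent: Mediapartners-Google*
User-agent: Googlebot-Image
User-agent: Yahoo-MMCrawler
Disallow:
User-agent: Googlebot-Image
Disallow: /wp-content/uploads/2016/06/image4.jpg
Disallow: /wp-content/uploads/2016/06/image3.jpg
Disallow: /wp-content/uploads/2016/05/image2.jpg
Disallow: /wp-content/uploads/2016/06/image1.jpg
"""


def parse_groups(robots_txt: str) -> dict:
    """Map each product token to its (allow?, pattern) rules; repeated groups for one token are merged."""
    groups = {}
    agents = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            token = (re.match(r"[A-Za-z_-]*", value).group(0) or value).lower()
            agents.append(token)
            groups.setdefault(token, [])  # an existing token keeps (and extends) its rules
        elif field in ("allow", "disallow") and agents:
            in_rules = True
            if value:  # an empty Disallow: adds no rule
                for agent in agents:
                    groups[agent].append((field == "allow", value))
    return groups


groups = parse_groups(ROBOTS)
print("Googlebot-Image:", groups["googlebot-image"])
print("Googlebot:", groups["googlebot"])
```

Since the rules really do block those four files for Googlebot-Image, an image that still shows up was most likely indexed before the rules were added.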
I have a problem with Googlebot: it takes 700 MB of bandwidth daily. (That's for those who will obviously ask why I want to do this.)
I know about robots.txt and that I can stop bots from indexing some folders.
But what about WordPress? I'm using post-name permalinks, so the URLs of posts and pages are just /page or /post.
I searched for a plugin that restricts bots to indexing only a few tags and a few categories, but didn't find one.
I want to allow sticky posts, a few categories and a few tags.
Can this be done? How?
I have an update on this question.
I decided to go with these robots.txt rules:
User-agent: *
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: AhrefsBot/3.1
Disallow: /
User-agent: Yahoo-slurp
Disallow: /
User-agent: Msnbot
Disallow: /
User-agent: Googlebot
Allow: /
Disallow: /category
Disallow: /video
Disallow: /author
Disallow: /?s=
Disallow: /feed/
Disallow: /xmlrpc.php
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /tag
Allow: /tag/marry
Allow: /tag/john
Will the last two tags be indexed?
And is there anything else I should hide in WordPress?
If you want to allow particular posts but disallow everything else, use Allow rules. For example:
User-agent: Googlebot
Allow: /post/foo
Allow: /page/bar
Disallow: /
So the bot can crawl the pages you specify, but not anything else.
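To answer the question directly: with Google's longest-match rule, Allow: /tag/marry (10 characters) beats Disallow: /tag (4 characters), so the order of the lines does not matter. A rough Python sketch of that evaluation (RFC 9309 group selection plus longest match, supporting only the * and $ wildcards; not Google's actual parser), run on an excerpt of the question's file. /tag/other/, /category/news/ and /some-post/ are hypothetical URLs:

```python
import re

# Excerpt of the question's updated robots.txt.
ROBOTS = """\
User-agent: *
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: AhrefsBot/3.1
Disallow: /
User-agent: Googlebot
Allow: /
Disallow: /category
Disallow: /tag
Allow: /tag/marry
Allow: /tag/john
"""


def parse_groups(robots_txt: str) -> dict:
    """Map each product token to its (allow?, pattern) rules; repeated groups for one token are merged."""
    groups, agents, in_rules = {}, [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            token = (re.match(r"[A-Za-z_-]*", value).group(0) or value).lower()
            agents.append(token)
            groups.setdefault(token, [])
        elif field in ("allow", "disallow") and agents:
            in_rules = True
            if value:  # an empty value adds no rule
                for agent in agents:
                    groups[agent].append((field == "allow", value))
    return groups


def pattern_matches(pattern: str, path: str) -> bool:
    """Prefix match where '*' matches any characters and a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None


def is_allowed(robots_txt: str, crawler_token: str, path: str) -> bool:
    """Pick the crawler's group (or '*'), then let the longest matching rule decide; Allow wins ties."""
    groups = parse_groups(robots_txt)
    rules = groups.get(crawler_token.lower(), groups.get("*", []))
    matches = [(len(p), allow) for allow, p in rules if pattern_matches(p, path)]
    return max(matches)[1] if matches else True


for path in ("/tag/marry/", "/tag/john/", "/tag/other/", "/category/news/", "/some-post/"):
    print(path, is_allowed(ROBOTS, "Googlebot", path))
print("AhrefsBot /some-post/", is_allowed(ROBOTS, "AhrefsBot", "/some-post/"))
```

Both tags come out allowed for Googlebot, the other tag and the category are blocked, and AhrefsBot is blocked everywhere. Bear in mind that Disallow: /tag is a plain prefix, so it would also catch a post whose slug starts with "tag" (such as /tagging-tips/), and that robots.txt controls crawling rather than indexing.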