Correct syntax for robots.txt file? - WordPress

Below is what's in my robots.txt file.
I want particular search engines (Google, MSN, Bing, Yahoo, DuckDuckBot) to have access to the site, except for a few key areas such as the admin section, the wp-content area, and a folder that doesn't exist, and I want to disallow everyone else. Is the syntax below correct for that?
User-agent: Googlebot
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: MSNBot
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Bingbot
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Slurp
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: DuckDuckBot
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Google (+https://developers.google.com/+/web/snippet/)
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Googlebot-Image/1.0
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Googlebot-Video/1.0
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
Disallow: /docs/*
User-agent: *
Disallow: *

The syntax is correct, but the approach is wrong.
1. Never block your content
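Google (and many other search engines) fully render your page. If you block access to images, Google may lower your position in search results just in case, because Googlebot cannot tell whether your page is full of broken image links or not. In WordPress, /wp-content/ holds your theme's CSS and JavaScript, plugin assets, and uploaded images, so Disallow: /wp-content/* hides exactly those resources.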
This is a quote from Maile Ohye, Google Developer Programs Tech Lead:
“We recommend making sure Googlebot can access any embedded resource that meaningfully contributes to your site’s visible content or its layout”
2. Do not block /wp-admin/admin-ajax.php
If you block /wp-admin/ entirely, no AJAX content is available to robots. That is why the standard robots.txt that WordPress generates on the fly is:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
3. Do not block other bots
The list of search bots is longer than the one in your question, and it grows over time. Your list, for example, does not include Googlebot-Mobile as a plain user-agent token, so the last group in your file (User-agent: * with Disallow: *) blocks that bot, with obvious consequences for mobile search.
Rather than reinventing the wheel, use the standard WordPress robots.txt settings shown above, or the broader settings provided by the Yoast SEO plugin (1+ million installs).
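Both the admin-ajax.php exception and the effect of the catch-all group come down to how a crawler evaluates the file. The sketch below models that in Python under simplifying assumptions: a crawler obeys only the group whose User-agent token matches it exactly (otherwise the `*` group), and within that group the longest matching pattern wins, with Allow winning a tie, which is how Google documents its matching. Real parsers handle more (percent-encoding, Google's own fallbacks between its crawlers), and all function names and the `NewSearchBot` token are hypothetical.

```python
from __future__ import annotations

import re


def parse_groups(text: str) -> list[tuple[list[str], list[tuple[str, str]]]]:
    """Split robots.txt text into (user_agent_tokens, rules) groups."""
    groups: list[tuple[list[str], list[tuple[str, str]]]] = []
    agents: list[str] = []
    rules: list[tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif key in ("allow", "disallow") and agents:
            rules.append((key, value))
    if agents:
        groups.append((agents, rules))
    return groups


def rules_for(groups, crawler: str) -> list[tuple[str, str]]:
    """Simplified group selection: exact token match, otherwise the '*' group."""
    token = crawler.lower()
    chosen = [g for g in groups if token in g[0]] or [g for g in groups if "*" in g[0]]
    return [rule for _, group_rules in chosen for rule in group_rules]


def _to_regex(pattern: str) -> re.Pattern[str]:
    """'*' matches any run of characters; a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Longest matching pattern wins, Allow wins a tie, and no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty Allow/Disallow value matches nothing
            continue
        if _to_regex(pattern).match(path):
            longer = len(pattern) > best_len
            allow_tie = len(pattern) == best_len and directive == "allow"
            if longer or allow_tie:
                best_len, allowed = len(pattern), directive == "allow"
    return allowed


def can_crawl(robots_txt: str, crawler: str, path: str) -> bool:
    return is_allowed(rules_for(parse_groups(robots_txt), crawler), path)


WORDPRESS_DEFAULT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

# Trimmed from the question: one named group plus the final catch-all group.
QUESTION_STYLE = """\
User-agent: Googlebot
Allow: *
Disallow: /wp-admin/*
Disallow: /wp-content/*
User-agent: *
Disallow: *
"""

if __name__ == "__main__":
    print(can_crawl(WORDPRESS_DEFAULT, "Googlebot", "/wp-admin/admin-ajax.php"))  # True
    print(can_crawl(WORDPRESS_DEFAULT, "Googlebot", "/wp-admin/options.php"))  # False
    print(can_crawl(QUESTION_STYLE, "Googlebot", "/wp-content/uploads/photo.jpg"))  # False
    print(can_crawl(QUESTION_STYLE, "NewSearchBot", "/hello-world/"))  # False
```

Under this model, any crawler you did not name ends up in the catch-all group, so a final Disallow for `*` shuts out every bot missing from the list, and a Disallow on /wp-content/ also hides uploaded images from the bots you did name.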

Related

Yoast SEO: how to allow crawler bots

I was given the job of getting rid of the "No information is available for this page" message for a website. The website uses Yoast SEO, but it was disabled, so I re-enabled it and got a basic robots.txt like this:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
I applied these settings about six hours ago and searched for the site on Google, but nothing has changed yet. I'm getting anxious.
Is this enough for the crawlers to read the website? Am I missing something? Do I need to change .htaccess? I have zero experience in SEO, so any help would be much appreciated.
Copy and paste this into your robots.txt:
User-agent: Googlebot
Disallow:
User-agent: googlebot-image
Disallow:
User-agent: googlebot-mobile
Disallow:
User-agent: MSNBot
Disallow:
User-agent: Slurp
Disallow:
User-agent: Teoma
Disallow:
User-agent: Gigabot
Disallow:
User-agent: Robozilla
Disallow:
User-agent: Nutch
Disallow:
User-agent: ia_archiver
Disallow:
User-agent: baiduspider
Disallow:
User-agent: naverbot
Disallow:
User-agent: yeti
Disallow:
User-agent: yahoo-mmcrawler
Disallow:
User-agent: psbot
Disallow:
User-agent: yahoo-blogs/v3.9
Disallow:
User-agent: *
Disallow:
Sitemap: https://www.yoursitename.com/sitemap.xml
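Every group in the file above uses an empty Disallow:, which matches nothing, so each listed crawler, and through the final User-agent: * group every other crawler, may fetch everything. The sketch below checks this with Python's standard urllib.robotparser against a shortened copy of the file (the SomeOtherBot name is made up). Note that urllib.robotparser applies rules in file order and does not support `*` wildcards inside paths, so treat it as a quick sanity check for simple files like this one, not as a model of Google's matcher.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the file above; the other named groups have the same shape.
ANSWER_FILE = """\
User-agent: Googlebot
Disallow:
User-agent: MSNBot
Disallow:
User-agent: Slurp
Disallow:
User-agent: *
Disallow:
Sitemap: https://www.yoursitename.com/sitemap.xml
"""

MINIMAL_FILE = """\
User-agent: *
Disallow:
"""


def load(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    url = "https://www.yoursitename.com/some-post/"
    for text in (ANSWER_FILE, MINIMAL_FILE):
        parser = load(text)
        # An empty Disallow matches nothing, so every crawler may fetch the URL.
        print([parser.can_fetch(bot, url) for bot in ("Googlebot", "Slurp", "SomeOtherBot")])
    print(load(ANSWER_FILE).site_maps())  # site_maps() needs Python 3.8+
```

Both files give the same answer for every crawler, so for plain "allow everything" purposes the two-line MINIMAL_FILE version (plus the Sitemap line) behaves the same as the long list.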

I'm facing an issue with my robots.txt file

I am using WordPress. Google does not crawl all the resources on my page; it shows "Page partially loaded". I have already tried many times to solve this with the robots.txt file. My website also returns a bad gateway error.
My website: https://www.alphaclick.in
My robots.txt file:
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /linkout/
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
User-agent: NinjaBot
Allow: /
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
Sitemap: https://www.alphaclick.in/sitemap_index.xml
Sitemap: https://www.alphaclick.in/post-sitemap.xml
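Delete the line Disallow: /index.php. It can block the whole website for bots: robots.txt rules are prefix matches, so it blocks every URL that starts with /index.php, which is most of the site if your permalinks go through /index.php/. You can find more information about the robots.txt file here.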
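Robots.txt paths are prefix matches, so Disallow: /index.php blocks not only /index.php itself but every URL whose path begins with it. The sketch below uses Python's standard urllib.robotparser on the question's User-agent: * group, assuming the crawler has no group of its own and falls back to `*`; the helper name and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The "User-agent: *" group from the question's robots.txt.
STAR_GROUP = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /linkout/
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
"""


def blocked_paths(robots_txt, paths, agent="Googlebot", base="https://www.alphaclick.in"):
    """Return the paths that the given crawler may NOT fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, base + p)]


if __name__ == "__main__":
    paths = ["/index.php", "/index.php/2019/05/sample-post/", "/sample-post/"]
    print(blocked_paths(STAR_GROUP, paths))
    # ['/index.php', '/index.php/2019/05/sample-post/']
```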

Googlebot only shows the HTML structure on mobile and doesn't apply CSS. Why?

I have created a responsive WordPress website, but when we ran Google's mobile-friendly test, it reported that my site is not mobile friendly.
I have noticed that Googlebot is not rendering the CSS. I have also allowed Googlebot to crawl the CSS files on our site via the robots.txt file.
Here is my robots.txt file:
User-Agent: *
Allow: /
Sitemap: http://example.com/sitemap.xml
Disallow: #Any folders we should not be allowing search bots to crawl.
Disallow: /wp-admin/
Disallow: /wp-content/cache
Disallow: /category/*/*
Disallow: /staging/
Disallow: /.hcc.thumbs/
Disallow: /10finsbury/
Disallow: /_db_backups/
Disallow: /affemailprdxn/
Disallow: /ajax/
Disallow: /assets_old/
Disallow: /assets_unk/
Disallow: /build_unk/
Disallow: /cgi_unk/
Disallow: /ciheropractice/
Disallow: /dekadesign-galaxy/
Disallow: /dekadesign-seroquel/
Disallow: /demowp/
Disallow: /dpulp/
Disallow: /eddynamics/
Disallow: /eddynamics730/
Disallow: /empwpstaging/
Disallow: /facebook-api/
Disallow: /furiousminds/
Disallow: /galaxy-cms/
Disallow: /galaxy-image-testing/
Disallow: /galaxy-test/
Disallow: /handt/
Disallow: /icehouse/
Disallow: /inxpress/
Disallow: /lead-usa/
Disallow: /liebhauserhome-ps/
Disallow: /maintenance/
Disallow: /ngbyliebhauser/
Disallow: /ngrebuild/
Disallow: /PIEFiles/
Disallow: /themefiletrans/
Disallow: /thememedwards/
Disallow: /repository/
Disallow: /staging/
Disallow: /staging/prdxn/
Disallow: /stats/
Disallow: /timeclock/
Disallow: /qcoal/
Disallow: /zhero-palma/
Disallow: /zherokappl-cms/
Disallow: /wp-content/themes/theme-wp/ANZBAIJuly152013.php/ANZBAI
Disallow: /wp-login.php/
Disallow: /wp-register.php/
Disallow: /*.php$
Disallow: /*.inc$
Disallow: /p13n_*
Disallow: /*.dll
Disallow: /servicetechnologies/
Disallow: /servicetechnologies
Disallow: /servicetechnologies/*
Disallow: /work-projects
Disallow: /work-casestudy
Disallow: /testimonial
Disallow: /testimonial/
Disallow: /testimonial/*
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
Allow: /*.css$
User-agent: Adsbot-Google
Allow: /
Allow: /*.css$
User-agent: Googlebot-Mobile
Allow: /
Allow: /*.css$
[Screenshot: mobile-friendly test result showing the site is not mobile friendly]
I noticed that the CSS files were being blocked by the .htaccess file.
So I removed those lines from .htaccess, and now it works fine. The issue is resolved.
Thanks for your comment.
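This fits the robots.txt above: its User-Agent: * group does not appear to block the stylesheet. The sketch below applies Google's documented longest-match rule (longest matching pattern wins, Allow wins a tie) to a subset of that group, assuming Googlebot has no group of its own in this file and falls back to `*`. The stylesheet path and the helper names are hypothetical.

```python
from __future__ import annotations

import re


def _to_regex(pattern: str) -> re.Pattern[str]:
    """'*' matches any run of characters; a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Longest matching pattern wins, Allow wins a tie, and no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty value (e.g. "Disallow: #comment") matches nothing
            continue
        if _to_regex(pattern).match(path):
            longer = len(pattern) > best_len
            allow_tie = len(pattern) == best_len and directive == "allow"
            if longer or allow_tie:
                best_len, allowed = len(pattern), directive == "allow"
    return allowed


# A subset of the question's "User-Agent: *" group, in file order.
STAR_RULES = [
    ("allow", "/"),
    ("disallow", ""),
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-content/cache"),
    ("disallow", "/category/*/*"),
    ("disallow", "/*.php$"),
    ("disallow", "/*.inc$"),
]

if __name__ == "__main__":
    for path in ("/wp-content/themes/theme-wp/style.css", "/wp-login.php", "/category/news/page/2"):
        print(path, is_allowed(STAR_RULES, path))
    # style.css -> True, /wp-login.php -> False, /category/news/page/2 -> False
```

Since robots.txt allows the stylesheet under these assumptions, a block at the server level (here, .htaccess) is the more likely culprit when rendering still fails.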
As far as bots are concerned, CSS files are not a thing. When any bot (including Googlebot) indexes your site, the color of your header is pretty irrelevant. This is why CSS files are never indexed (even if you specify them in your robots.txt). So don't worry about it: your search listings won't be affected at all.

Googlebot robots.txt Tester not working

The robots.txt Tester is not working in my case. I have the lines below in robots.txt.
But when I test wp-admin in the Tester, the tool shows it as allowed, and I don't know why. Please help me disallow wp-admin.
User-Agent: Googlebot
Allow: *.css*
Allow: *.js*
Allow: /*.jpg
Allow: /*.gif
Allow: /*.png
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /category
Disallow: /tag
Disallow: /page
Disallow: /author
Disallow: /trackback
Disallow: /*trackback
Disallow: /*trackback*
Disallow: /*/trackback
Disallow: /*?*
Disallow: /*.html/$
Disallow: /*feed*
# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*
# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
If you remove the trailing slash from the rule (Disallow: /wp-admin instead of Disallow: /wp-admin/), the test will pass. Alternatively, if you test a page under wp-admin in the Tester, such as /wp-admin/admin.php, you'll see that your rule already passes (it blocks the bots).
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
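The trailing-slash behaviour follows from prefix matching: Disallow: /wp-admin/ blocks /wp-admin/admin.php but not the bare /wp-admin URL, while Disallow: /wp-admin blocks both. A small check with Python's standard urllib.robotparser (example.com, the helper name, and the tested paths are placeholders):

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """True if the crawler may NOT fetch the path under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "https://example.com" + path)


WITH_SLASH = "User-agent: *\nDisallow: /wp-admin/\n"
WITHOUT_SLASH = "User-agent: *\nDisallow: /wp-admin\n"

if __name__ == "__main__":
    for path in ("/wp-admin", "/wp-admin/admin.php"):
        print(path, is_blocked(WITH_SLASH, path), is_blocked(WITHOUT_SLASH, path))
    # /wp-admin False True
    # /wp-admin/admin.php True True
```

The flip side of dropping the slash is that Disallow: /wp-admin also blocks any other path starting with those characters, such as a hypothetical /wp-admin-notes/ page.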

How to let or restrict Googlebot to index or crawl certain things in WordPress?

I have a problem with Googlebot: it takes 700 MB of bandwidth daily. (That's for anyone who will obviously ask why I want to do this.)
I know about robots.txt and that I can stop bots from indexing some folders.
But what about WordPress? I am using post-name permalinks, so the permalinks for posts and pages are just /page or /post.
I searched for a plugin that restricts bots to indexing only a few tags and a few categories, but didn't find one.
I want to allow sticky posts, a few categories, and a few tags.
Can this be done? How?
Update on this question:
I decided to go with robots.txt rules.
User-agent: *
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: AhrefsBot/3.1
Disallow: /
User-agent: Yahoo-slurp
Disallow: /
User-agent: Msnbot
Disallow: /
User-agent: Googlebot
Allow: /
Disallow: /category
Disallow: /video
Disallow: /author
Disallow: /?s=
Disallow: /feed/
Disallow: /xmlrpc.php
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /tag
Allow: /tag/marry
Allow: /tag/john
Will the last two tags be indexed?
And is there anything else I should hide in WordPress?
If you want to allow particular posts but disallow everything else, use Allow directives. For example:
User-agent: Googlebot
Allow: /post/foo
Allow: /page/bar
Disallow: /
So the bot can crawl the pages you specify, but not anything else.
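Google resolves overlapping Disallow and Allow lines by the longest matching pattern (Allow wins a tie), so for Googlebot, at least, /tag/marry and /tag/john in the question's file should stay crawlable while other /tag URLs are blocked. Parsers that use first-match order instead (Python's urllib.robotparser, for example) may resolve the same file differently. The sketch below models Google's rule on an abridged copy of the question's Googlebot group and on the answer's example; it is a simplification (single group, no percent-encoding normalization), and the helper names and tested paths are hypothetical.

```python
from __future__ import annotations

import re


def _to_regex(pattern: str) -> re.Pattern[str]:
    """'*' matches any run of characters; a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Longest matching pattern wins, Allow wins a tie, and no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue
        if _to_regex(pattern).match(path):
            longer = len(pattern) > best_len
            allow_tie = len(pattern) == best_len and directive == "allow"
            if longer or allow_tie:
                best_len, allowed = len(pattern), directive == "allow"
    return allowed


# Abridged Googlebot group from the question (file order preserved).
GOOGLEBOT_RULES = [
    ("allow", "/"),
    ("disallow", "/category"),
    ("disallow", "/author"),
    ("disallow", "/?s="),
    ("disallow", "/wp-admin/"),
    ("disallow", "/tag"),
    ("allow", "/tag/marry"),
    ("allow", "/tag/john"),
]

# The answer's example; the catch-all Disallow matches every path.
ANSWER_RULES = [("allow", "/post/foo"), ("allow", "/page/bar"), ("disallow", "/")]

if __name__ == "__main__":
    for path in ("/tag/marry/", "/tag/john", "/tag/other/", "/tagged-post/", "/some-post/"):
        print(path, is_allowed(GOOGLEBOT_RULES, path))
    print(is_allowed(ANSWER_RULES, "/post/foo"), is_allowed(ANSWER_RULES, "/post/baz"))
```

Because patterns are prefixes, Disallow: /tag also blocks a post slug such as /tagged-post/; writing Disallow: /tag/ avoids that.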
