robots.txt disallow /variable_dir_name/directory

I need to disallow /variable_dir_name/directory via robots.txt
I use:
Disallow: */directory
Noindex: */directory
Is that correct?

The following should work in your robots.txt:
User-Agent: *
Disallow: /*/directory
Further reading from Google: Block or remove pages using a robots.txt file
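For context, Google's robots.txt rules treat * in a path as matching any sequence of characters, and a rule matches when the pattern matches the start of the URL path. Below is a minimal Python sketch of that matching logic, only an illustration of the documented wildcard behavior, not Google's actual parser:
import re

def blocked(disallow_pattern, path):
    # Translate robots.txt wildcards into a regex:
    # '*' matches any sequence of characters, '$' anchors the end of the path.
    regex = re.escape(disallow_pattern).replace(r"\*", ".*").replace(r"\$", "$")
    # re.match anchors only at the start, which mirrors robots.txt prefix matching.
    return re.match(regex, path) is not None

print(blocked("/*/directory", "/variable_dir_name/directory"))  # True  - blocked
print(blocked("/*/directory", "/directory"))                    # False - not blocked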

Indeed, Googlebot used to accept a few unofficial directives in robots.txt:
Noindex
Nofollow
Crawl-delay
But as announced on the Google Webmaster Central Blog, these rarely used directives (found in roughly 0.001% of robots.txt files) are no longer supported as of September 2019. So to stay safe for the future, you should only use meta tags for these on your pages.
What you should really do is the following:
Disallow via robots.txt, and
Remove already-indexed documents via Google Search Console.
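Since the noindex signal now has to come from the page itself (a robots meta tag or an X-Robots-Tag response header) rather than from robots.txt, here is a rough way to check what a given URL is actually sending. This is only a sketch: the URL is a placeholder and the regex is a crude approximation of how the meta tag is usually written.
import re
import urllib.request

def noindex_signals(url):
    # Fetch the page and report the two noindex signals Google honors:
    # the X-Robots-Tag response header and the robots meta tag.
    with urllib.request.urlopen(url) as resp:
        header = resp.headers.get("X-Robots-Tag", "") or ""
        html = resp.read().decode("utf-8", errors="replace")
    # Crude pattern: assumes the name attribute comes before content.
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
        html, re.IGNORECASE)
    return {
        "x_robots_tag": header,
        "meta_robots": meta.group(1) if meta else None,
    }

# Placeholder URL; substitute a page from your own site.
print(noindex_signals("https://example.com/some/page"))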

Related

Robots.txt Disallow file - should this be left empty?

Need help on this robots.txt question. My default file looks something like this
User-agent: *
Disallow:
Sitemap: https://mywebsite.com/sitemap_index.xml
The problem is that with this configuration, Google has deindexed almost all of my URLs (as of this writing).
Is it correct to leave the disallow field blank?
Yes, it's technically correct.
It means that all user agents, including search engine crawlers, can access your website's pages.
The asterisk after User-agent means the rules apply to all user agents.
Nothing is listed after Disallow, which means there are no restrictions at all.
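You can confirm this behavior with Python's standard-library robots.txt parser; a small sketch (the example URL is just a placeholder based on the question):
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("""
User-agent: *
Disallow:
""".splitlines())

# An empty Disallow value imposes no restrictions, so every path is crawlable.
print(rp.can_fetch("Googlebot", "https://mywebsite.com/any/page"))  # True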

Keeping robots.txt blank

I have a couple of WordPress sites, and with the current Google SEO algorithm update a site should be mobile-friendly (here).
My query is as follows. Currently I have written rules in robots.txt to disallow crawling of the URLs beginning with wp-:
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /feed
Disallow: /*/feed
Disallow: /wp-login.php
I don't want Google to crawl the above URLs. Earlier this was working fine, but now, with the recent Google algorithm update, disallowing these URLs causes errors in the mobile-friendly test (here), because all my CSS and JS files sit behind the wp- URLs. I am wondering how I can fix this.
Any suggestions appreciated.
If you keep the crawler away from those files, your page may look and work differently for Google than it does for your visitors. This is what Google wants to avoid.
There is no problem in allowing Google to access the CSS or JS files, since anyone who can open your HTML source and follow its links can access them anyway.
Therefore Google definitely wants to access the CSS and JS files used on your page:
https://developers.google.com/webmasters/mobile-sites/mobile-seo/common-mistakes/blocked-resources?hl=en
Those files are needed to render your pages.
If your site’s robots.txt file disallows crawling of these assets, it directly harms how well our algorithms render and index your content. This can result in suboptimal rankings.
If you depend on mobile rankings you must follow Google's guidelines. If not, feel free to block the crawler.
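To see which of your asset URLs the rules above actually block, you can run them through a robots.txt parser. A rough sketch with Python's standard library (it only understands the plain prefix rules, not the /*/feed wildcard, and the asset paths below are made-up examples):
from urllib import robotparser

rules = """
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Hypothetical theme/plugin assets that Google needs in order to render the page.
for path in ("/wp-includes/js/jquery/jquery.js",
             "/wp-content/plugins/some-plugin/style.css",
             "/wp-content/themes/my-theme/style.css"):
    print(path, "crawlable:", rp.can_fetch("Googlebot", path))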

Robots.txt: ALLOW Google Fonts

I've been testing my website with Google Webmaster Tools, and when I tried to "Fetch as Googlebot" I got a "Partial" status and a note that three external CSS files, namely 3 Google Fonts stylesheets, had been blocked for some reason by robots.txt.
Now, here's my file:
User-agent: *
Disallow:
Disallow: /cgi-bin/
Sitemap: http://example.com/sitemapindex.xml
Is there something wrong with it that might be preventing access to said files?
Thanks!
If robots.txt is blocking external CSS files, then it will be the robots.txt for the server hosting those files, not the one for your main hostname.
I don't know why you would worry about Googlebot being unable to read your stylesheets though.
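If you do want to see where the block comes from, check the robots.txt of the host serving the fonts rather than your own. A quick sketch with Python's standard library (the stylesheet URL below is only an example):
from urllib import robotparser
from urllib.parse import urlparse

css_url = "https://fonts.googleapis.com/css?family=Open+Sans"  # example external stylesheet

parts = urlparse(css_url)
rp = robotparser.RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
rp.read()  # downloads and parses that host's robots.txt, not yours

print("Googlebot may fetch", css_url, ":", rp.can_fetch("Googlebot", css_url))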

How to block redirected search URLs in robots.txt in WordPress

In my WordPress site I've redirected my search URLs from
domain.com/?s=search_term
to
domain.com/search/search_term
In robots.txt, should I change
Disallow: /*?
to
Disallow: /search/
How can I test that it is working properly?
If you want to disallow search then, yes, you should have the line:
Disallow: /search/
Google's Webmaster tools has a robots.txt checker. You can use that to test your robots.txt file for validity.
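You can also sanity-check the rule locally before uploading it; a small sketch using Python's standard-library parser (the URLs are placeholders modeled on the question):
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("""
User-agent: *
Disallow: /search/
""".splitlines())

# The rewritten search URLs are blocked, ordinary pages are not.
print(rp.can_fetch("Googlebot", "https://domain.com/search/search_term"))  # False
print(rp.can_fetch("Googlebot", "https://domain.com/some-post/"))          # True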

Why is Google Webmaster Tools completely misreading my robots.txt file?

Below is the entire content of my robots.txt file.
User-agent: *
Disallow: /marketing/wp-admin/
Disallow: /marketing/wp-includes/
Sitemap: http://mywebsite.com/sitemap.xml.gz
It is the one apparently generated by WordPress. I haven't manually created one.
Yet when I signed up for Google Webmaster Tools today, this is the content that Google Webmaster Tools is seeing:
User-agent: *
Disallow: /
... So ALL my URLs are blocked!
In Wordpress, settings > reading > search engine visibility: "Discourage search engines from indexing this site" is not checked. I unchecked it fairly recently. (Google Webmaster tools is telling me it downloaded my robots.txt file on Nov 13, 2013.)
...So why is it still reading the old version where all my pages are disallowed, instead of the new version?
Does it take a while? Should I just be patient?
Also what is the ".gz" on the end of my sitemap line? I'm using the Yoast All-in-One SEO pack plugin. I'm thinking the plugin added the ".gz", whatever that is.
You can ask Googlebot to crawl again after you've changed your robots.txt. See Ask Google to crawl a page or site for information.
The Sitemap file tells Googlebot more about the structure of your site, and allows it to crawl more effectively. See About Sitemaps for more info.
The .gz is just telling Googlebot that the generated sitemap file is compressed.
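If you're curious what's inside it, a .gz sitemap is an ordinary gzip-compressed XML file. A short sketch that downloads and decompresses it (using the placeholder sitemap URL from your robots.txt):
import gzip
import urllib.request

# Download the compressed sitemap and decompress it locally.
with urllib.request.urlopen("http://mywebsite.com/sitemap.xml.gz") as resp:
    xml = gzip.decompress(resp.read()).decode("utf-8")

print(xml[:500])  # first few hundred characters of the plain-XML sitemap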
A WordPress discussion on this topic can be found here: https://wordpress.org/support/topic/robotstxt-wordpress-and-google-webmaster-tools?replies=5

Resources