Can anyone please help? Google has indexed thousands of spammy links on my site, almost 300k of them. It is not feasible to scrape all those URLs and submit them one by one through the Google Search Console removal tool; that would take months. Can anyone explain how to deindex them all at once?
The URLs have the following format:
domain.com/module.php?na57785546tq.html
domain.com/315kfpgsfv.html
Please help.
Someone suggested the following code to me. Would it work if I added it to .htaccess?
# deindex all pages with .php or .html extension and set the response code to 410
RewriteEngine On
RewriteRule ^(.+)\.(php|html)$ - [R=410,L]
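A note on the suggested rule: `^(.+)\.(php|html)$` matches every path ending in .php or .html, so it would also return 410 for legitimate pages such as index.php. RewriteRule also only sees the path, never the query string, so `module.php?na57785546tq.html` matches on `module.php` alone. A narrower sketch, assuming every spam URL follows exactly the two sample formats (module.php with a single alphanumeric `.html` query string, or a 10-character alphanumeric `.html` file in the root) and that no real page on the site has either shape. The `[G]` flag is equivalent to `[R=410,L]`:

```apache
# Hypothetical narrower rules; assumes the spam URLs match only the two sample formats.
# Place them above any CMS (e.g. WordPress) rewrite rules in the site root .htaccess.
RewriteEngine On

# Format 1: /module.php?<letters and digits>.html
# RewriteRule only sees "module.php", so the query string is tested in a RewriteCond.
RewriteCond %{QUERY_STRING} ^[a-z0-9]+\.html$ [NC]
RewriteRule ^module\.php$ - [G]

# Format 2: /<exactly 10 letters or digits>.html in the document root
RewriteRule ^[a-z0-9]{10}\.html$ - [G,NC]
```

Google should drop URLs that return 410 as it recrawls them, but that still happens gradually rather than all at once.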
I'm still a bit new at this, so please go easy on me. I have searched for an answer but haven't found one that addresses my situation. I have a WordPress site for which I built a custom PHP page that pulls records from a database and displays them, and I manually inserted code to display my WordPress theme's header and footer.
Also, in my .htaccess I use this rule:
RewriteRule ^used-boats-for-sale/(.*)/(.*)/(.*)/(.*)/(.*)/$ /boatdetails.php?recordnumber=$5 [L]
It passes the record number from the friendly URL to the PHP page so it can look up and display the right record. This was the only way I could figure out to do this in WordPress and get past WordPress's "subfolder killer".
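One thing worth checking, since the question is about duplicates: `(.*)` also matches slashes and empty strings, so the rule above accepts extra or empty segments (for example `/used-boats-for-sale/a/b/c/d/e/1807/` or `/used-boats-for-sale/////1807/`), and each variant serves the same boat page under yet another URL. A tighter sketch, assuming the record number is always numeric and the other four segments are never empty and never contain a slash:

```apache
# Tighter version of the friendly-URL rule (sketch).
# Assumes: the first four segments are non-empty and slash-free,
# and the fifth segment (the record number) is all digits.
# Only the record number is captured, so it becomes $1 instead of $5.
RewriteEngine On
RewriteRule ^used-boats-for-sale/[^/]+/[^/]+/[^/]+/[^/]+/([0-9]+)/$ /boatdetails.php?recordnumber=$1 [L]
```

This only removes the extra friendly-URL variants; Google can still reach boatdetails.php?recordnumber=... directly.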
The problem is that Google Webmaster Tools reports duplicate title tags and meta descriptions, even though I have added a canonical link pointing to the friendly URL.
Examples of both can be seen here:
http://www.unitedyacht.com/used-boats-for-sale/hinckley/downeast/cruiser/2000/1807/
and
http://www.unitedyacht.com/boatdetails.php?recordnumber=1807
With the canonical in place, why does Google still report these pages as duplicates, and what can I do to fix it? If I add R=301 to the rule, it performs a full redirect and the friendly URL disappears from the browser's address bar.
I am new to Stack Overflow, but a friend suggested I ask here since he couldn't help me either. I have been googling for several days, and my Google rankings are dropping again because of all the crawl errors. My main site is built in Serif WebPlus X5. I have added a WordPress blog to it at www.sitename.com/blog.
Google has found more than 150 crawl errors, and the number grows daily. The issue is that Google appends /default.htm to all my blog URLs.
Could someone write an .htaccess 301 rule for all these URLs so they redirect immediately?
Today I started redirecting some URLs manually, but that won't solve the problem: every time I add a new post or new tags, those new pages get the same default.htm issue.
As you can imagine, this is really frustrating, grrr...
I have tried a lot of code I found during my search, but none of it did what I want. Other tips for getting rid of the default.htm pages are also very welcome.
Thanks to everyone who can help me fix this.
Place this rule just below the RewriteEngine On line in the main WordPress .htaccess:
RewriteCond %{THE_REQUEST} /default\.htm [NC]
RewriteRule ^(.*?)default\.htm$ /blog/$1 [L,R=301,NC,NE]
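WordPress rewrites everything between its `# BEGIN WordPress` and `# END WordPress` markers when permalinks are saved, so a rule placed inside that block can silently disappear. A sketch of one way to lay out the file so the rule survives, assuming the blog's .htaccess lives in /blog and contains the standard WordPress block for a subdirectory install:

```apache
# Custom rule kept outside the WordPress markers so WordPress won't overwrite it.
# Assumes this file is /blog/.htaccess (paths below are relative to /blog/).
RewriteEngine On
# Only act when the client really requested .../default.htm
RewriteCond %{THE_REQUEST} /default\.htm [NC]
# e.g. /blog/some-post/default.htm -> 301 to /blog/some-post/
RewriteRule ^(.*?)default\.htm$ /blog/$1 [L,R=301,NC,NE]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /blog/
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/index.php [L]
</IfModule>
# END WordPress
```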
I have a WordPress blog that is working just fine: the permalinks are set to Month/Day/Year, and everything on the front end looks and works correctly.
However, when I check my stats and Google Webmaster Tools, there are tons of 404s that look like this:
http://theURL.com/normal-wordpress-url/index.htm
Of course, index.htm does not exist at the end of the WordPress URL, so the search engine is given a 404.
I have no idea what's causing this, as everything works fine for humans.
So basically, I need a way to tell search engines to forget about the index.htm at the end of the URL.
I've tried this in .htaccess with no luck:
RewriteCond %{REQUEST_URI} /index\.htm?$ [NC]
RewriteRule ^(.*)index\.htm?$ "/$1" [NC,R=301,NE,L]
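Two details may matter here. First, `\.htm?` matches `.ht` or `.htm` (the `?` makes only the `m` optional), whereas `\.html?` covers both `.htm` and `.html`. Second, placement: if the rule comes after WordPress's catch-all `RewriteRule . /index.php [L]`, that rule can handle the request first. A sketch, assuming WordPress is installed in the document root and this block sits above `# BEGIN WordPress`:

```apache
# Sketch: strip a trailing index.htm or index.html and 301 to the bare URL.
# Assumes WordPress is in the document root and this sits ABOVE "# BEGIN WordPress".
RewriteEngine On
# THE_REQUEST is the original request line, e.g. "GET /some-post/index.htm HTTP/1.1",
# so this only fires for URLs the client actually asked for, not internal rewrites.
RewriteCond %{THE_REQUEST} /index\.html?[\s?] [NC]
# e.g. some-post/index.htm -> /some-post/ ; index.htm at the root -> /
RewriteRule ^(.*/)?index\.html?$ /$1 [R=301,L,NC,NE]
```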
Does anybody have any suggestions?
There may be two different problems here, each needing its own solution:
Problem 1: If the crawler itself is generating these URLs, there are two things you might do:
Go to Google Webmaster Tools and request removal of the "index.htm" URLs.
Add a robots.txt rule that disallows "index.htm" for Google's crawler.
Problem 2: If links elsewhere point to these URLs, Google Webmaster Tools can tell you exactly which pages they come from.
Make sure all links pointing to "index.htm" are removed from those pages.
Could anyone please help me? I am at the last-chance saloon and losing a lot of traffic. Any help would be gratefully received.
For a year, because of my permalink structure, all posts were in the root, so Google has indexed them as:
snowmenu.com/postname
Since I changed my categories and permalink structure, I need that year's worth of indexed posts to be redirected to:
snowmenu.com/ski-snowboard-winter-sports-news/postname
Is there a way to make this happen via .htaccess?
Thank you very much to anyone who's able to help me.
I just had a look at the website, and I'm afraid that, as far as I know, there is no easy way to do this kind of forwarding with .htaccess.
This is because nothing in the link structure distinguishes a "normal link" (e.g. http://www.snowmenu.com/ski-resorts/) from a link you want redirected (e.g. http://www.snowmenu.com/ski-snowboard-winter-sports-news/latest-ski-news/). If you redirect all requests, you will end up with links like http://www.snowmenu.com/ski-snowboard-winter-sports-news/ski-resorts/, which I assume is not what you want.
The long solution would be to create an .htaccess redirect for EVERY URL.
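A sketch of that per-URL approach, assuming the old posts lived at /postname and now live at /ski-snowboard-winter-sports-news/postname. `example-post-one` and `example-post-two` are hypothetical slugs standing in for the real ones:

```apache
# One explicit 301 per old post URL (hypothetical slugs).
# Only the listed slugs are affected, so normal pages such as /ski-resorts/ are left alone.
# Place above the "# BEGIN WordPress" block.
RewriteEngine On
RewriteRule ^example-post-one/?$ /ski-snowboard-winter-sports-news/example-post-one [R=301,L]
RewriteRule ^example-post-two/?$ /ski-snowboard-winter-sports-news/example-post-two [R=301,L]
```

It is tedious for a year of posts, but because each line follows the same pattern, the lines can be generated from a list of the old slugs.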
The only other solution that comes to mind is using PHP (or similar) to do the redirect in your 404 page.
EDIT
This will redirect ALL requests to the new directory. But as I said before, I don't think this is what you want.
RewriteRule ^(?!ski-snowboard-winter-sports-news)(.*)$ /ski-snowboard-winter-sports-news/$1 [L,R=301]
EDIT 2
Having given it some thought, I think I have come up with a viable option. This checks whether the requested file exists in the new directory and, if so, redirects to it (in theory :P).
RewriteCond %{DOCUMENT_ROOT}/ski-snowboard-winter-sports-news/$0 -f
RewriteRule ^(.*)$ /ski-snowboard-winter-sports-news/$1 [R=301,L]
You can use this plugin to avoid editing the .htaccess file directly:
http://wordpress.org/extend/plugins/redirection/
It has a nice interface for you to configure the redirection rules.
The plugin mentioned by @Wordpress Hardcore works best.
My (WordPress) site was recently the victim of an attack and ended up with around 20,000 injected URLs. I've since cleaned up the site completely, plugged all the holes, and further hardened the files, but I'm still left with all these URLs in the Google index and a "This site may be hacked" warning on Google because of them. It's just not realistic to go through them and add each one to the Webmaster Tools URL removal tool. I've heard the best approach is to make them return 404 (or 403) so they naturally fall out of the index.
Here's what I'd like to do, though I haven't figured out how yet: force any URL containing a certain parameter to return a 404 or 403. For example, the URL below is representative of the URLs currently indexed:
http://mysiteurlhere.com/index.php?free-online-games-with-cash-prizes.html&items=2&pidnum=1568
Both "items" and "pidnum" appear in every indexed spam URL I've seen. My question: is it possible to single out one of those parameters with an .htaccess rule and block the URL or force it to return 404?
(Note: I did update robots.txt to disallow URLs with parameters like these from being crawled; I just don't know how to do the .htaccess part.)
Try this in .htaccess:
RewriteEngine on
RewriteCond %{QUERY_STRING} (?:^|&)items=.*pidnum= [NC]
RewriteRule ^ - [R=404,L]
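The question also asks about singling out just one parameter. A variant, assuming the spam URLs all hit index.php as in the example and that no legitimate page uses a parameter named `items` or `pidnum`: it fires when either parameter appears anywhere in the query string and returns 410 Gone instead of 404. Google drops both 404 and 410 URLs as it recrawls them; 410 just states more explicitly that the page is gone for good. The block has to sit above the WordPress rules, because WordPress's own `RewriteRule ^index\.php$ - [L]` would otherwise end processing for index.php first:

```apache
# Sketch: return 410 Gone for index.php requests carrying the spam parameters.
# Assumes no legitimate URL on the site uses a parameter named "items" or "pidnum".
# Must come before the "# BEGIN WordPress" block.
RewriteEngine On
# Matches "items=" or "pidnum=" at the start of the query string or after an "&"
RewriteCond %{QUERY_STRING} (?:^|&)(?:items|pidnum)= [NC]
RewriteRule ^index\.php$ - [G]
```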