I am dealing with a client who has been blacklisted by Google AdWords because the Googlebot crawlers crawling their site are finding tons of weird links that redirect with a 301 status code. Some examples:
216.244.66.238 - - [15/Mar/2022:00:22:33 +0000] "GET /ffdd1g/hytera-phone.html HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help#moz.com)"
216.244.66.238 - - [15/Mar/2022:00:22:34 +0000] "GET /ffdd1g/od-tools-houdini.html HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help#moz.com)"
216.244.66.238 - - [15/Mar/2022:00:22:42 +0000] "GET /7oh5yny/fujifilm-classic-chrome.html HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help#moz.com)"
216.244.66.238 - - [15/Mar/2022:00:22:45 +0000] "GET /7oh5yny/fusion-360-join-line-segments.html HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help#moz.com)"
But when I recreate any of these requests in my browser or with curl, they correctly return 404. The fact that Google is seeing 301s is what caused the client to be blacklisted by Google AdWords. Why could this be happening, and how can I make sure that all invalid links always return 404 instead of 301?
This is a WordPress website, by the way, in case it makes a difference. Thank you.
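One way to narrow this down is to check which clients actually receive the 301s. The sample entries above come from DotBot (Moz's crawler, according to its user-agent string), not from Googlebot. Below is a minimal Python sketch that assumes the server writes the standard combined log format. It tallies (user agent, status) pairs and lists the paths redirected for a given agent, so you can compare what crawlers receive with what your own curl requests receive. The helper names status_by_agent and redirected_paths are hypothetical.

```python
import re
from collections import Counter

# Combined log format:
# ip ident user [time] "METHOD path PROTOCOL" status size "referer" "user agent"
# The referer/agent pair is optional so common-format lines still match.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def status_by_agent(lines):
    """Count (user agent, status code) pairs across access-log lines."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m:  # lines in other formats are skipped
            agent = m.group("agent") or "-"
            counts[(agent, m.group("status"))] += 1
    return counts


def redirected_paths(lines, agent_substring):
    """List paths that returned 301 to user agents containing agent_substring."""
    paths = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("status") == "301" and agent_substring in (m.group("agent") or ""):
            paths.append(m.group("path"))
    return paths
```

To use it, pass in the lines of the access log, for example `with open(log_path) as f: print(redirected_paths(f, "DotBot"))`. Here `log_path` stands for wherever your server writes its log. Look for agents that get 301 on URLs that return 404 for you.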
Related
I was checking my nginx access.log and found the entries below.
They show a redirect to my site webcovid19.live from another portal, sme.sk.
How is this possible? Is it hidden somewhere in the HTTP protocol?
How does the web server know about the redirection?
Does Google Analytics use the same logic (direct vs. referral)?
85.216.x.x - - [24/May/2020:08:50:52 +0000] "GET / HTTP/1.1" 200 1358 "https://domov.sme.sk/diskusie/3671287/2/koronavirus-slovensko-minuta-po-minute-23-maj-2020.html" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0"
85.216.x.x - - [24/May/2020:08:50:52 +0000] "GET /styles.css HTTP/1.1" 200 725 "https://webcovid19.live/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0"
Many thanks, Incognito :)
Issue solved:
en.wikipedia.org/wiki/HTTP_referer
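To make the answer concrete: when a link is clicked, the browser sends the address of the page it was clicked on in the Referer request header. nginx's default combined log format writes that value into the quoted field after the response size. That is why the first entry above carries the sme.sk discussion URL. The request for /styles.css carries https://webcovid19.live/ because your own page loaded it. The Python sketch below assumes the combined format and uses a hypothetical classify() helper to label a line as direct, internal, or the referring host. Analytics tools rely on broadly similar referrer information, though their exact classification rules are not covered here.

```python
import re
from urllib.parse import urlparse

# The combined format ends with: "referer" "user agent"
TAIL_RE = re.compile(r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$')


def classify(line, own_host):
    """Return 'direct', 'internal', or the referring host name for one log line."""
    m = TAIL_RE.search(line)
    if not m or m.group("referer") in ("", "-"):
        return "direct"  # no Referer header was sent (or the line has no referer field)
    host = urlparse(m.group("referer")).hostname or ""
    return "internal" if host == own_host else host
```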
I'm running an e-commerce store on WordPress/WooCommerce, and I'm wondering whether it's normal to see an almost non-stop stream of GET requests in Apache's access log.
My website is hosted on Amazon EC2, using Bitnami's WordPress image.
Here's part of the log:
172.31.33.229 - - [09/May/2020:14:18:10 +0000] "POST /wp-cron.php?doing_wp_cron=1589033890.9472939968109130859375 HTTP/1.1" 200 -
172.31.33.229 - - [09/May/2020:14:18:10 +0000] "GET /product-category/printable-templates/wedding-templates/wedding-invitation-templates?query_type_color=or&filter_color=bluebrowncoralgreenturquoise&product_orderby=rating HTTP/1.1" 301 -
172.31.33.229 - - [09/May/2020:14:18:11 +0000] "GET /product-category/printable-templates/wedding-templates/wedding-invitation-templates/?query_type_color=or&filter_color=bluebrowncoralgreenturquoise&product_orderby=rating HTTP/1.1" 200 17499
172.31.33.229 - - [09/May/2020:14:18:15 +0000] "GET /product-category/printable-templates/wedding-templates/wedding-invitation-templates?query_type_color=or&filter_color=purpleredturquoise&product_view=list&product_count=45 HTTP/1.1" 301 -
172.31.33.229 - - [09/May/2020:14:18:16 +0000] "GET /product-category/printable-templates/wedding-templates/wedding-invitation-templates/?query_type_color=or&filter_color=purpleredturquoise&product_view=list&product_count=45 HTTP/1.1" 200 17390
172.31.33.229 - - [09/May/2020:14:18:21 +0000] "GET /product-category/printable-templates/wedding-templates?query_type_color=or&filter_color=black%2Cblue%2Ccoral%2Cmagenta%2Corange%2Cpeach%2Cturquoise HTTP/1.1" 301 -
172.31.33.229 - - [09/May/2020:14:18:22 +0000] "GET / HTTP/1.1" 301 230
What's weird is that CPU usage eventually reaches 100%, which freezes my server. If I restart the EC2 instance, everything goes back to normal, but only for about 12 hours or more on average.
Note that 172.x.x.x is part of my own subnet, so I don't understand why these requests appear in the log.
Another clue: in top, what's eating my CPU is numerous entries of
php-fpm: pool wordpress
The URL is https://templatesandvectors.com.
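The excerpt suggests a pattern worth measuring. Each category URL without a trailing slash gets a 301, is then requested again with the slash, and returns 200. This looks like WordPress redirecting to its canonical trailing-slash URL, though that is an assumption based only on these lines. The Python sketch below uses a hypothetical summarize() helper. It counts requests per status code and per path, with the query string and trailing slash removed, so the 301 hop and the 200 for the same page land in one bucket and the most-requested URL families stand out.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# "METHOD target PROTOCOL" followed by the three-digit status code
REQ_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+)[^"]*" (?P<status>\d{3})\b')


def summarize(lines):
    """Count requests per status code and per path (query string and trailing slash removed)."""
    by_status, by_path = Counter(), Counter()
    for line in lines:
        m = REQ_RE.search(line)
        if not m:
            continue
        by_status[m.group("status")] += 1
        # Fold "/foo" and "/foo/" together so a redirect hop and its target share a bucket.
        path = urlsplit(m.group("target")).path.rstrip("/") or "/"
        by_path[path] += 1
    return by_status, by_path
```

Running it over a few hours of the log, then printing `by_path.most_common(10)`, would show whether the load is concentrated on the filtered product-category URLs.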
Update
After digging through my Apache access logs, I believe the issue is related to hosting, not Google Analytics. Clicking a link to my site sometimes results in the Referer information being dropped.
Here are a few example entries. Some contain a valid referer (Twitter, for example), while others contain only a "-". For some of these cases, I was tailing the access log while clicking a link from another site, so I know they should have a valid referer.
X.X.X.X - - [10/Jun/2019:03:06:10 +0000] "GET / HTTP/1.0" 200 12153 "https://twitter.com/PxJVBrEB7T" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:67.0) Gecko/20100101 Firefox/67.0"
X.X.X.X - - [10/Jun/2019:03:12:37 +0000] "GET / HTTP/1.0" 200 13535 "https://twitter.com/PTVIpLWqE9" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:67.0) Gecko/20100101 Firefox/67.0"
X.X.X.X - - [10/Jun/2019:03:50:39 +0000] "GET / HTTP/1.0" 200 12308 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:67.0) Gecko/20100101 Firefox/67.0"
This is a Plesk-managed hosting account using Apache + Nginx cache. Is it possible that Apache is misconfigured and dropping the referer information? Or is it more likely something to do with the referring websites?
There are multiple referring websites whose links aren't showing up properly in the access log, so I suspect the problem is on the server side, but I'm not sure what else to check.
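A rough way to quantify "sometimes dropped" is to count entries with and without a referer. Suppose the Nginx layer and Apache each keep their own access log. Running the same count over both for the same period would then suggest where the referer goes missing. If it is missing already at Nginx, the browser or referring site is the likely cause. If it is missing only at Apache, the proxy setup is the likely cause. This is a sketch that assumes the combined log format, and referer_stats() is a hypothetical helper.

```python
import re

# '" status size "referer"' : the referer is the first quoted field after the size
REFERER_RE = re.compile(r'" \d{3} \S+ "(?P<referer>[^"]*)"')


def referer_stats(lines):
    """Return (with_referer, without_referer) counts for combined-format log lines."""
    with_ref = without_ref = 0
    for line in lines:
        m = REFERER_RE.search(line)
        if not m:
            continue  # not a combined-format line
        if m.group("referer") in ("", "-"):
            without_ref += 1
        else:
            with_ref += 1
    return with_ref, without_ref
```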
I have a VPS running Nginx, which currently serves only static content.
While looking at the log, I noticed some strange requests:
216.244.66.239 - - [03/Jan/2019:15:04:26 +0100] "GET /en/profile/Souxy HTTP/1.1" 200 4650 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help#moz.com)"
216.244.66.239 - - [03/Jan/2019:15:04:28 +0100] "GET /en/view/8gIi2vad8Y HTTP/1.1" 200 4650 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help#moz.com)"
This is a crawler; it is described at https://moz.com/help/moz-procedures/crawlers/dotbot. It may be indexing your website.
You can block these requests at the firewall, or add a robots.txt file with the following content:
User-agent: dotbot
Disallow: /
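Python's standard urllib.robotparser module can check that a robots.txt like the one above does what you expect before you deploy it. Keep in mind that robots.txt is only advisory and works only for crawlers that choose to honor it. Blocking at the firewall does not depend on the crawler's cooperation. In the sketch below, the paths are taken from the log above.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: dotbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() lowercases the part of the user agent before the first "/"
# and checks whether a User-agent value from robots.txt appears in it.
print(parser.can_fetch("DotBot/1.1", "/en/profile/Souxy"))  # False: blocked
print(parser.can_fetch("Googlebot", "/en/profile/Souxy"))   # True: no rule applies
```

Because of that matching rule, pass a product token such as "DotBot/1.1" rather than the full "Mozilla/5.0 (compatible; DotBot/1.1; ...)" string. With the full string, only "mozilla" would be compared against the rules.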
When I try to add a route with the path /config, it shows 404 Not Found. The strange thing is that it's not the regular Symfony 404 error that appears when I enter a non-existent route. Here is the Apache access log:
127.0.0.1 - - [03/Feb/2015:11:32:26 +0100] "GET /config HTTP/1.1" 404 499 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0"
127.0.0.1 - - [03/Feb/2015:11:35:00 +0100] "GET /configasd HTTP/1.1" 404 743 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0"
127.0.0.1 - - [03/Feb/2015:11:36:42 +0100] "GET /configasdasd HTTP/1.1" 404 743 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0"
You can see that accessing /config generates a 404 499 error code, while accessing other non-existent routes generates a 404 743 error code.
My question is: is "config" a reserved word for use in routes? Is there a complete list of such words in Symfony?
EDIT: Route configuration:
In app/config/routing.yml:
myapp_config:
    resource: "@MyappConfigBundle/Resources/config/routing.yml"
    prefix: /config
In MyappConfigBundle/Resources/config/routing.yml:
myapp_config:
    path: /
    defaults: { _controller: MyappConfigBundle:Config:index }
The status code of the response is only 404; 499 and 743 are the sizes of the responses in bytes.
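The different sizes are the useful clue. In Apache's default combined format, the field after the status is %b, the response body size in bytes. Two 404s of different sizes were therefore most likely produced by different handlers: Symfony's error page versus something else. The minimal Python sketch below assumes that log format and uses a hypothetical not_found_by_size() helper to group 404 responses by size.

```python
import re
from collections import defaultdict

# "%r" %>s %b : request line, final status, response size in bytes ("-" when empty)
LINE_RE = re.compile(r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)')


def not_found_by_size(lines):
    """Map each response size to the list of paths that returned 404 with that size."""
    groups = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("status") == "404":
            groups[m.group("size")].append(m.group("path"))
    return dict(groups)
```

A path that sits alone in its own size group, as /config would here, is a good candidate for being served by something other than the application.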
Your server is configured to serve some other resource at the /config path. It may come from a global Alias, or you may simply have a file, folder, or symlink/hardlink named config in your web directory.
Check all the cases and you will solve your problem.
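As a starting point for that check, here is a Python sketch with two hypothetical helpers. The first reports what exists at config inside the web root without following symlinks. The second scans Apache configuration text for Alias or ScriptAlias directives whose URL path starts with /config. The web root path and the configuration files you feed in are assumptions; adjust them to your setup.

```python
import os
from pathlib import Path


def find_conflicts(web_root, name="config"):
    """Describe what exists at web_root/name, or return None if nothing does."""
    target = Path(web_root) / name
    if target.is_symlink():  # checked first: is_dir()/is_file() would follow the link
        return f"symlink -> {os.readlink(target)}"
    if target.is_dir():
        return "directory"
    if target.is_file():
        # A hard link looks like a regular file; a link count above 1 hints at one.
        return "file (hardlinked)" if target.stat().st_nlink > 1 else "file"
    return None


def alias_lines(config_text, url_path="/config"):
    """Return Alias/ScriptAlias directives whose URL path starts with url_path."""
    hits = []
    for line in config_text.splitlines():
        parts = line.split()
        # Apache directive names are case-insensitive.
        if len(parts) >= 2 and parts[0].lower() in ("alias", "scriptalias"):
            if parts[1].strip('"').startswith(url_path):
                hits.append(line.strip())
    return hits
```

For example, find_conflicts("/var/www/myapp/web") (a hypothetical path) returns "directory" if a config folder exists in the web root. alias_lines() can be run over the text of each enabled Apache configuration file.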