How can I prevent unwanted GET requests with URLs added to parameters - nginx

I have a small ecommerce site (LEMP stack), and I had used a route like
my.domain.com/makecart?return_url=....
as a means of returning to a point in the previous page to assist selection for the cart.
Over a period of months I started getting thousands of GET requests with unwanted domain links appended to the ?return_url parameter.
I have now reprogrammed this route without the use of any parameters, but my site is still getting the unwanted hits.
e.g. 76.175.182.173 - - [14/Nov/2018:19:36:08 +0000] "GET /makecart?return_url=http://www.nailartdeltona.com/ HTTP/1.0" 302 364 "http://danielcraig.2bb.ru/click.php
I am redirecting such requests to an error page, and I have it 'under control' with fail2ban, but I am gradually filling up memory with ban information.
Is there a way to prevent these hits before they are plucked back out of the access log?
Furthermore what are they doing anyway?
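Something like this nginx sketch is roughly what I have in mind (assuming the route no longer needs a return_url argument at all; the location block is illustrative, but $arg_return_url and return 444 are standard nginx):

location = /makecart {
    # Drop any request that still carries a return_url argument before it
    # reaches the application; 444 closes the connection without a response.
    if ($arg_return_url) {
        return 444;
    }
    # ... existing handling for the route (try_files / fastcgi_pass) ...
}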

Related

WordPress: First login of a new browser session always fails

I'm working on a WordPress website: https://samarazakaz.ru/
The client discovered a strange bug: after opening a new browser session, the first login always fails and the second one succeeds.
I tracked the issue down to a strange cookie named RCPC that is set when the login form is submitted. If the cookie is missing, the login fails even with correct credentials.
I searched high and wide for information about this cookie but could not find anything useful. The only thing remotely resembling my case was a discussion on https://codeforces.com/, but nothing there mentioned WordPress.
The site has a bare-bones setup with Elementor and my own plugin, and nothing in my code touches cookies or the login process. I downloaded all the website files and searched them for "RCPC" but found nothing.
The site is behind an Nginx proxy, but I could not find any connection between this cookie and Nginx either.
I noticed that the value of this cookie is constant, so as a workaround I jerry-rigged my plugin to set the cookie whenever it is missing. Of course, I'm not happy with that solution, because I don't know whether it will simply stop working one day.
Update:
I verified that this is coming from the hosting. I renamed the /wp-login.php file and made a request to it, and it didn't return a 404 error but a 200 page with the same redirect code and the header that sets the cookie. The hosting is reg.ru.
As far as I can figure, this is a countermeasure against automated password guessing. Any request (POST, GET, etc.) to /wp-login.php gets the redirect script with the cookie-setting header; only requests carrying the correct RCPC cookie are forwarded to WordPress.
Upon further testing I found that the value of the RCPC cookie is some kind of hash generated from the request's IP address: all of our computers get the same value, but from other locations it is different.
This does not cause any problem if the standard WordPress login form is used, because that form lives at /wp-login.php, so the first GET request already sets the cookie. However, we had a custom login page which didn't touch /wp-login.php until the form was submitted.
Based on these findings I made a workaround: a one-line JS snippet on the login page that makes a fetch request to /wp-login.php and discards the result. This is enough to set the cookie in the browser, so the form works on the first try.
You need to ask the hosting provider to disable the test-cookie module.

facebookexternalhit/1.1 bot Excessive Requests, Need to Slow Down

facebookexternalhit/1.1 is hitting my WooCommerce site hard, causing 503 errors. There are many requests per second. I tried to slow it down using robots.txt and a Wordfence rate limit; nothing works. Is there any way to slow it down without blocking the bot?
Here are a few examples from the raw access logs.
GET /item/31117/x HTTP/1.0" 301 - "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
GET /?post_type=cms_block&p=311 HTTP/1.0" 503 607 "-" "facebookexternalhit/1.1"
GET /item/31117/xiaomi-redmi-router HTTP/1.1" 200 48984 "-"
If someone shares a link to your site on Facebook, then on occasion, when someone views that link (on Facebook; they don't even have to click it), Facebook will reach out and grab the rich embed data (Open Graph image, etc.). This is a very well-known problem if you search around the net.
The solution is to rate limit any user agent containing this text:
facebookexternalhit
The crawler does not respect robots.txt, and it can actually be leveraged in DDoS attacks; there are many articles about it:
https://thehackernews.com/2014/04/vulnerability-allows-anyone-to-ddos.html
https://devcentral.f5.com/s/articles/facebook-exploit-is-not-unique
Since it does not respect robots.txt, you have to rate limit. I'm not familiar with WooCommerce, but you can search for "Apache rate limiting" or "nginx rate limiting", depending on which server you use, and you'll find many good articles.
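For nginx, a sketch of what that rate limiting can look like; the map and limit_req_zone directives go in the http block, and the zone name, rate and burst values here are only illustrative:

# Key requests from the Facebook crawler by client IP; everything else gets
# an empty key, and requests with an empty key are not rate limited.
map $http_user_agent $fb_limit_key {
    default                  "";
    ~*facebookexternalhit    $binary_remote_addr;
}

limit_req_zone $fb_limit_key zone=fbcrawler:10m rate=5r/s;

server {
    # ... rest of the server configuration ...
    location / {
        limit_req zone=fbcrawler burst=10 nodelay;
        limit_req_status 429;
        # ... normal handling for the site ...
    }
}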
I recently received a DDoS attack that was combined with this Facebook attack method: the Facebook ASN, AS32934, hit 1,060 URLs in one second.
I just banned the entire ASN; problem solved.

What do these strange headers mean and what is this hacker trying to do?

I've recently deployed a public website, and looking at the nginx access logs I see hackers trying to access various PHP admin pages (which is fine, I don't use PHP), but I also see requests like this:
85.239.221.75 - - [27/Dec/2019:14:52:42 +0000] "k\xF7\xE9Y\xD3\x06)\xCF\xA92N\xC7&\xC4Oq\x93\xDF#\xBF\x88:\xA9\x97\xC0N\xAC\xFE>)9>\x0Cs\xC1\x96RB,\xE1\xE2\x16\xB9\xD1_Z-H\x16\x08\xC8\xAA\xAF?\xFB4\x91%\xD9\xDD\x15\x16\x8E\xAB\xF5\xA6'!\xF8\xBB\xFBBx\x85\xD9\x8E\xC9\x22\x176\xF0E\x8A\xCDO\xD1\x1EnW\xEB\xA3D|.\xAC\x1FB\xC9\xFD\x89a\x88\x93m\x11\xEB\xE7\xA9\xC0\xC3T\xC5\xAEF\xF7\x8F\x9E\xF7j\x03l\x96\x92t c\xE4\xB5\x10\x1EqV\x0C5\xF8=\xEE\xA2n\x98\xB4" 400 182 "-" "-"
What is this hacker sending and what are they trying to do? And what should I do to stay ahead of this type of attack?
The data in that log entry is hex-escaped binary. It most likely shows up because a client sent an HTTPS (TLS) request to a plain-HTTP endpoint: nginx expects plain-text HTTP but receives the encrypted TLS handshake instead, so the log shows a bunch of gibberish and nginx rejects the request with a 400.
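Since nginx already answers these with a 400, there is not much more you need to do for this specific case. If you want to cut down on scanner noise in general, one common pattern is a catch-all default server that drops anything not addressed to your real hostnames (scanners usually probe the bare IP); this sketch assumes your real sites are declared in their own server blocks:

server {
    # Catch-all for requests that don't match any configured server_name.
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name _;
    # 444 is nginx-specific: close the connection without sending a response.
    return 444;
}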

What happens if a 302 URI can't be found?

If I make an HTTP request to get index.html on http://www.example.com but that URL has a 302 re-direct in place that points to http://www.foo.com/index.html, what happens if the redirect target (http://www.foo.com/index.html) isn't available? Will the user agent try the original URL (http://www.example.com/index.html) or just return an error?
Background to the question: I manage a legacy site that supports a few existing customers but doesn't allow new sign-ups. Pretty much all the pages are redirected (using 302s rather than 301s, for some unknown reason...) to a newer site, including the sign-up page. On one of the pages that isn't redirected there is still a link to the sign-up page, which itself links through to a third-party payment page (i.e. on another domain). Last week our current site went down for a couple of hours, and in that period someone successfully signed up through the old site. The only way I can imagine this happening is if, when a 302 doesn't find its intended URL, some (all?) user agents bypass the redirect and go to the originally requested URL.
By the way, I'm aware there are many better ways to handle the particular situation we're in with the two sites. We're on it! This is just one of those weird situations I want to get to the bottom of.
You should receive a 404 Not Found status code.
Since HTTP is a stateless protocol, there is no real connection between two requests of a user agent. The redirection status codes are just a way for servers to politely tell their clients that the resource they were looking for is somewhere else now. The clients, however, are in no way obliged to actually request the resource from that other URL.
Oh, the signup page is at that URL now? Well then I don't want it anymore... I'll go and look at some kittens instead.
Moreover, even if the client decides to request the new URL (which it usually does ^^), this can be considered a completely new exchange between server and client. Neither server nor client should remember that there was a previous request which resulted in a redirection status code. Instead, the current request is treated as if it were the first (and only) request. And what happens when you request a URL that cannot be found? You get a 404 Not Found status code.
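To make the scenario concrete, here is a minimal nginx sketch of the redirect described in the question (hostnames taken from the question; the config itself is illustrative):

server {
    listen 80;
    server_name www.example.com;

    # The legacy page never serves content itself, it only redirects.
    location = /index.html {
        return 302 http://www.foo.com/index.html;
    }
}

Whatever then happens at www.foo.com, whether a 404, a 5xx, or no response at all, is reported for that second request; nothing in HTTP sends the client back to www.example.com to retry the original URL.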

Bad requests for WordPress RSS and author URLs

On a popular WordPress site, I'm getting a constant stream of requests for these paths (where author-name is the first and last name of one of the WordPress users):
GET /author/index.php?author=author-name HTTP/1.1
GET /index.rdf HTTP/1.0
GET /rss HTTP/1.1
The first two URLs don't exist, so the server is constantly returning 404 pages. The third is a redirect to /feed.
I suspect the requests are coming from RSS readers or search engine crawlers, but I don't know why they keep using these specific, nonexistent URLs. I don't link to them anywhere, as far as I can tell.
Does anybody know (1) where this traffic is coming from and (2) how I can stop it?
Check the Apache logs to get the "where" part.
Stopping random internet traffic is hard. Maybe serving them some other error code will make it stop, but it probably won't.
Most of my sites get these. Most of the time I trace them to Asia or the Americas; blocking the IP works, but if the requests are few and far between, that would just be a waste of resources.
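If the site happens to be served by nginx (the question doesn't say which server it runs), one cheap way to serve a different error code is to answer those exact paths straight from the web server, so WordPress is never invoked for them; the 410 status here is just an example choice:

# Exact-match locations take precedence over the usual regex PHP location,
# so these requests never reach PHP/WordPress.
location = /author/index.php { return 410; }
location = /index.rdf        { return 410; }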
