Website nginx error log flooded with "No such file or directory" errors from strange requests, how do I block them? - wordpress

Checking the website logs, I've found about 2,000 requests per day with different base URLs but two different types of ending string. Here are the examples:
*var/www/vhosts/domain.com/httpdocs/random-slug/*],thor-cookies,div.cookie-alert,div.cookie-banner,div.cookie-consent,div.cookie-content,div.cookie-layer,div.cookie-notice,div.cookie-notification,div.cookie-overlay,div.cookieHolder,div.cookies-visible,div.gdpr,div.js-disclaimer,div.privacy-notice,div.with-cookie,.as-oil-content-overlay
Second one:
*var/www/vhosts/domain.com/httpdocs/random-slug/*],sibbo-cmp-layout,thor-cookies,div.cookie-alert,div.cookie-banner,div.cookie-consent,div.cookie-consent-popup,div.cookie-content,div.cookie-layer,div.cookie-notice,div.cookie-notification,div.cookie-overlay,div.cookie-wrapper,div.cookieHolder,div.cookies-modal-container,div.cookies-visible,div.gdpr,div.js-disclaimer,div.privacy-notice,div.v-cookie,div.with-cookie,.as-oil-content-overlay,
I tried to google them and found random websites like Binance. From the content, the string seems to refer to an overlay for cookie consent, but I don't have one on my website, so I'm wondering why I get this many requests, all failing with (2: No such file or directory).
So I'm wondering if anyone knows what this is, and whether I can block requests like those two directly, to stop the nginx error log being flooded with them.
I searched around for a solution; the only thing that came to mind was an nginx rule that returns a 410 error, but this case is peculiar because of the ] that separates the slug from the file-not-found part, and I don't know how to match it. Also, if I go to that URL the page actually works, so a better approach might be to redirect to the slug directly before the bracket.
Thanks.

It's some cybercriminal, or probably just a script kiddie, sending probe URLs to sites on every IP address they can think of, looking for servers that might be vulnerable to some exploit or other.
All public-facing web sites get some of this garbage. You can't make it go away, unfortunately. It's almost as old as the web, but considerably stupider.
You CAN keep your software up to date so it's not YOUR site where they find a vulnerability.
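That said, if the error-log noise bothers you, nginx can short-circuit these probes before they are ever mapped to a file on disk. A minimal sketch, assuming the decoded request URI really does contain the literal `],` marker followed by the cookie-banner selector list (check your access log and adjust the pattern):

```nginx
# Inside the server { } block: answer any URI containing "]," followed
# somewhere by "cookie" with 410 Gone instead of trying the filesystem.
location ~ "\],.*cookie" {
    return 410;
}
```

A regex location like this is evaluated before the static-file lookup, so the "No such file or directory" errors stop; legitimate slugs without the bracketed suffix are unaffected.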

Related

Server Log Showing Many 'Unhandled Exceptions' From URL with &hash=

I've noticed a large increase in the number of events logged daily that have &hash= in the URL. The requested URL is the same every time but the number that follows the &hash= is always different.
I have no idea what the purpose of the &hash= parameter is, so I'm unsure if these attempts are malicious or something else. Can anyone provide insight as to what is being attempted with the requested URL? I have copied in one from a recent log below.
https://www.movinglabor.com:443/moving-services/moving-labor/move-furniture/&du=https:/www.movinglabor.com/moving-services/moving-labor/move.../&hash=AFD3C9508211E3F234B4A265B3EF7E3F
I have been seeing the same thing in IIS on Windows Server 2012 R2. They were mostly HEAD requests. I did see a few other, more obvious attack attempts from the same IP address, so I'm assuming the du/hash thing is also intended to be malicious.
Here's an example of another attempt, which also tries some URL encoding to bypass filters:
part_id=D8DD67F9S8DF79S8D7F9D9D%5C&du=https://www.examplesite.com/page..asp%5C?part...%5C&hash=DA54E35B7D77F7137E|-|0|404_Not_Found
So you may want to look through your IIS logs to see if they are trying other things.
In the end I simply created a blocking rule for it using the Url Rewrite extension for IIS.
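For reference, a blocking rule of that kind in web.config might look like the sketch below. This assumes the URL Rewrite module is installed; the rule name and patterns are illustrative and should be tuned against your own logs so you don't block legitimate uses of those parameter names:

```xml
<system.webServer>
  <rewrite>
    <rules>
      <rule name="Block du/hash probes" stopProcessing="true">
        <match url=".*" />
        <conditions logicalGrouping="MatchAny">
          <add input="{QUERY_STRING}" pattern="(^|&amp;)du=" />
          <add input="{QUERY_STRING}" pattern="(^|&amp;)hash=" />
        </conditions>
        <action type="AbortRequest" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```

`AbortRequest` drops the connection without a response, which wastes as little of your server's time as possible on the probes.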

Bad requests for WordPress RSS and author URLs

On a popular WordPress site, I'm getting a constant stream of requests for these paths (where author-name is the first and last name of one of the WordPress users):
GET /author/index.php?author=author-name HTTP/1.1
GET /index.rdf HTTP/1.0
GET /rss HTTP/1.1
The first two URLs don't exist, so the server is constantly returning 404 pages. The third is a redirect to /feed.
I suspect the requests are coming from RSS readers or search engine crawlers, but I don't know why they keep using these specific, nonexistent URLs. I don't link to them anywhere, as far as I can tell.
Does anybody know (1) where this traffic is coming from and (2) how I can stop it?
Check Apache logs to get the "where" part.
Stopping random internet traffic is hard. Maybe serve them some other error codes and it will stop. It probably won't, though.
Most of my sites get these; most of the time I trace them to Asia or the Americas. Blocking the IP works, but if the requests are few and far between, that would just be wasting resources.
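If the other-error-code route seems worth trying, a couple of mod_alias lines (a sketch; assumes Apache with mod_alias enabled, placed in the vhost or .htaccess) turn the repeating 404s into 410 Gone responses, which well-behaved crawlers treat as a signal to stop retrying:

```apache
# Tell clients these nonexistent paths are gone for good.
RedirectMatch gone ^/index\.rdf$
RedirectMatch gone "^/author/index\.php$"
```

Leave /rss alone, since that one is a working redirect to /feed.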

ASP.Net Relative Redirects and Resource Paths

We are working on the conversion of an ASP site to ASP.Net and are running into problems with redirects and resource locations. Our issues are coming from a peculiarity of our set-up. Our site can be accessed in two ways:
Directly by URL: http://www.mysite.com - in this case everything works fine
Via a proxy server with a URL like: http://www.proxy.com/mysite_proxy/proxy/
In #2, "mysite_proxy" is a mapping on proxy.com that directs the request behind the scenes to www.mysite.com, and "proxy" is a virtual sub-website that just redirects the request to the root of www.mysite.com. It essentially gives us a convenient way of knowing whether a request is hitting the site through the proxy or not.
We are running into two problems with this setup:
Using Response.Redirect, either with the "~" or a plain relative path (Default.aspx), generates a 302 response with a location of "/proxy/rest_of_the_path.aspx". This causes the browser to request http://www.proxy.com/proxy/rest_of_the_path.aspx, which isn't anything and doesn't even hit our server, so we couldn't do an after-the-fact rewrite.
Using "~" based URLs in our pages for links, images, style-sheets, etc. creates the same kind of path: "/proxy/path_to_resources.css." We could probably solve some of these by using relative paths for all these resources though that would be a lot of work and it would do nothing to address similar resource links generated by the framework and 3rd party components.
Ideally I want to find a global fix that will make these problems transparent to the developers working on the site. I have a few ideas at this point:
Getting rid of the proxy, it is not really needed and is there for administrative and not technical reasons. Easiest to accomplish technically, the hardest to accomplish in the real world.
Hand the problem off to the group that runs the proxy and say it is their problem they need to fix it.
Use a Response filter to modify the raw html before it is sent to the client. I know this could fix my resource links, but I am not certain about the headers (need to test it out) and there would be a performance hit to having to parse every response looking for and re-writing urls.
All of these solutions have big negatives in my mind and I was hoping someone might have another idea. So any thoughts?
Aside: there are a lot of posts up already that deal with the reverse of this issue (I have a relative URL, how do I make it absolute?), but I didn't come across anything that fit the bill for the other direction.
As a fix, I'd go with a small detection routine in Global.asax:Session_Start (since I imagine the proxy doesn't actually start another application instance), set a session variable with the correct path, and use it instead of '~'.
If a different application instance is used, then use Application_Start instead of Session_Start, and a static global variable instead of a session variable.

Redirect loop in ASP.NET app when used in America

I have a bunch of programs written in ASP.NET 3.5 and 4. I can load them fine (I'm in England) and so can my England based colleagues. My American colleagues however are suffering redirect loops when trying to load any of the apps. I have tried myself using Hide My Ass and can consistently recreate this issue.
I'm stumped. What could be causing a redirect loop for users in a specific country?!
The apps are hosted on IIS 6 on a dedicated Windows Server 2003. I have restarted IIS with no luck.
Edit
I should have made it clear that unfortunately I do not have access to the machines in the US to run Firefox Firebug/Fiddler. The message I get in Chrome is "This webpage has a redirect loop."
When you say "a redirect loop", do you mean a redirect as in an http redirect? Or do you mean you have a TCP/IP routing loop?
A TCP/IP loop can be positively identified by performing a ping from one of the affected client boxes. If you get a "TTL expired" or similar message then this is routing and unlikely to be application related.
If you really meant an http redirect, try running Fiddler, or even better, HttpWatch Pro and looking at both the request headers, and the corresponding responses. Even better - try comparing the request/response headers from non-US working client/servers to the failing US counterparts
You could take a look with Live HTTP Headers in Firefox and see what it's trying to redirect to. It could possibly be trying to redirect to a URL based on the visitor's language/country, or perhaps the DNS is not fully propagated.
If you want to post the URL, I could give you the redirect trace.
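A redirect trace is also easy to script yourself. Here is a minimal sketch using only the Python standard library (the starting URL is a placeholder for whatever app URL the US colleagues hit); it follows Location headers by hand, so a loop becomes visible instead of just failing in the browser:

```python
from urllib.parse import urlparse, urljoin
import http.client

def trace_redirects(url, max_hops=10):
    """Return the chain of URLs visited; a repeated entry marks a loop."""
    chain = [url]
    for _ in range(max_hops):
        p = urlparse(url)
        cls = (http.client.HTTPSConnection if p.scheme == "https"
               else http.client.HTTPConnection)
        conn = cls(p.netloc, timeout=10)
        path = p.path or "/"
        if p.query:
            path += "?" + p.query
        conn.request("GET", path)
        resp = conn.getresponse()
        location = resp.getheader("Location")
        conn.close()
        if resp.status not in (301, 302, 303, 307, 308) or not location:
            break  # not a redirect: the chain ends here
        url = urljoin(url, location)
        if url in chain:       # revisited a URL: this is the loop
            chain.append(url)
            break
        chain.append(url)
    return chain
```

Running it from a US proxy versus a UK machine and comparing the two chains should show exactly which hop differs.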
What could be causing a redirect loop for users in a specific country?!
Globalization / localization related code
Geo-IP based actions
Using different base URLs in each country, and then redirecting from one to itself. For example, if you used uk.example.com in the UK, and us.example.com in the US, and had us.example.com redirect accidentally to itself for some reason.
Incorrect redirects on 404 Not Found errors.
Spurious meta redirect tags
Incorrect redirects based on authentication errors
Many other reasons
I have tried myself using Hide My Ass and can consistently recreate this issue.
I have restarted IIS with no luck.
I do not have access to the machines in the US to run Firefox Firebug/Fiddler.
The third statement above doesn't make sense in light of the other two. If you can restart IIS or access the sites via a proxy, then you can run Fiddler, since it's a client-side application. Looking at the generated HTML and the corresponding HTTP headers will be the best way to diagnose your problem.

Are there any safe assumptions to make about the availability of a URL?

I am trying to determine if there is a way to check the availability of a potentially large list of urls (> 1000000) without having to send a GET request to every single one.
Is it safe to assume that if http://www.example.com is inaccessible (as in: unable to connect to the server, or the DNS request for the domain fails), or I get a 4XX or 5XX response, then anything from that domain will also be inaccessible (e.g. http://www.example.com/some/path/to/a/resource/named/whatever.jpg)? Would a 302 response (say, for whatever.jpg) be enough to invalidate the first assumption? I imagine subdomains should be considered distinct, as http://subdomain.example.com and http://www.example.com may not resolve to the same IP?
I seem to be able to think of a counter example for each shortcut I come up with. Should I just bite the bullet and send out GET requests to every URL?
Unfortunately, no, you cannot infer anything from 4xx, 5xx, or any other codes.
Those codes are for individual pages, not for the server. It's quite possible that one page is down and another is up, or one has a 500 server-side error and another doesn't.
What you can do is use HEAD instead of GET. That retrieves the HTTP headers for the page but not the page content. This saves time server-side (because it doesn't have to render the page) and for yourself (because you don't have to buffer and then discard content).
Also I suggest you use keep-alive to accelerate responses from the same server. Many HTTP client libraries will do this for you.
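A minimal sketch of that approach using only the Python standard library (the host and paths are placeholders): a single connection is reused for all requests to the same host, so keep-alive does its job, and each path gets a HEAD request with no body to download.

```python
import http.client

def check_paths(host, paths, port=None, use_tls=True):
    """HEAD-check several paths on one host over a single keep-alive
    connection. Returns {path: HTTP status code}."""
    cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = cls(host, port, timeout=10)  # one socket, reused for every path
    results = {}
    try:
        for path in paths:
            conn.request("HEAD", path)
            resp = conn.getresponse()
            resp.read()  # HEAD bodies are empty; drain so the socket can be reused
            results[path] = resp.status
    finally:
        conn.close()
    return results
```

For a million URLs you would still want to group them by host, cap per-host request rates, and run many such connections concurrently, but the per-request mechanics stay the same.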
A failed DNS lookup for a host (e.g. www.example.com) should be enough to invalidate all URLs for that host. Subdomains or other hosts would have to be checked separately though.
A 4xx code might tell you that a particular page isn't available, but you couldn't make any assumptions about other pages from that.
A 5xx code really won't tell you anything. For example, it could be that the page is there, but the server is just too busy at the moment. If you try it again later it might work fine.
The only assumption you should make about the availability of a URL is that "getting a URL can and will fail".
It's not safe to assume that a subdomain request will fail when a parent one does, namely because in between your two requests your network connection can go up, down, or generally misbehave. It's also possible for the domains to be changed between requests.
Even ignoring all internet connection issues, you are still dealing with a live website that can and will change constantly. What is true now might not be true in five minutes, when they decide to alter their page structure or change the way they display a particular page. Your best bet is to assume any GET will fail.
This may seem like an extreme viewpoint, but these events will happen. How you handle them will determine the robustness of your program.
First don't assume anything based on a single page failing. I have seen many cases where IIS will continue to serve static content but not be able to serve any dynamic content.
You have to treat each host name as unique; you cannot assume subdomain.example.com and example.com point to the same IP. And even if they do, there is no guarantee they are the same site: IIS, again, has host headers that allow you to run multiple sites on a single IP address.
If the connection to the server actually fails, then there's no reason to check URLs on that server. Otherwise, you can't assume anything.
In addition to what everyone else is saying, use HEAD requests instead of GET requests. They function the same, but the response doesn't contain the message body, so you save everyone some bandwidth.