VCL: limit too many GET requests on same url - cdn

We are experiencing a problem where twice a day (when we send out a newsletter through a third-party service provider) the home page gets hit by thousands of requests from a particular bot for about 3-5 minutes. The traffic seems to come from a legitimate bot, and it creates fake data in our third-party analytics aggregators. We wonder if we could block or deny such traffic without penalizing the bot and its IPs. We would like to set a rule in the VCL that rejects traffic when GET requests from the same user-agent hit the exact same URL too many times in a very short period of time. Do you have any suggestions?

You can do that using the vsthrottle VMOD: https://github.com/varnish/varnish-modules/blob/master/src/vmod_vsthrottle.vcc.
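To illustrate the kind of rule being asked for, here is a minimal JavaScript sketch of a token bucket keyed on user-agent plus URL. It is not VCL and not vsthrottle's actual implementation; the names `createThrottle` and `isDenied`, the example bot name, and the limit of 15 requests per 10 seconds are assumptions chosen for illustration. In Varnish the equivalent check would key on the User-Agent header and the request URL; see the linked .vcc file for the VMOD's real function signatures.

```javascript
// Token bucket per key: each key may spend `limit` tokens per `periodMs`.
// Note: this sketch never evicts old keys, so memory grows with distinct keys.
function createThrottle(limit, periodMs) {
  const buckets = new Map(); // key -> { tokens, last }

  // Returns true when the request identified by `key` should be rejected.
  return function isDenied(key, now = Date.now()) {
    let bucket = buckets.get(key);
    if (!bucket) {
      bucket = { tokens: limit, last: now };
      buckets.set(key, bucket);
    }
    // Refill in proportion to the time elapsed since the previous request.
    const refill = ((now - bucket.last) / periodMs) * limit;
    bucket.tokens = Math.min(limit, bucket.tokens + refill);
    bucket.last = now;
    if (bucket.tokens < 1) return true;
    bucket.tokens -= 1;
    return false;
  };
}

// Hypothetical usage: the key combines user-agent and URL, as in the question.
const isDenied = createThrottle(15, 10000);
const rejected = isDenied("ExampleBot/1.0|/"); // false for the first request
```

Because the key includes both the user-agent and the URL, other clients and other pages are unaffected, and the bot's IPs are never blocked outright.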

Related

JMeter: How to make sure that requests sent from JMeter have reached the server when there is no way to verify via the DB or an analytics tool

I see no way to verify that specific requests have reached the server after running VUsers from JMeter.
Consider "About Us", a page that 10000 VUsers hit at once from JMeter, while the server shows some activity in Perfmon. Now, let's say that in JMeter the VUsers have gone from 10000/10000 to 0/10000, but there is no way to keep track of how many users hit the page because analytics is not implemented in the app.
I want to make sure all 10000 VUsers have hit at once. Is there any way I can find out how many of the 10000 VUsers have visited the "About Us" page if JMeter does not show any failed responses?
You can monitor request rate with custom listeners available via JMeter Plugins project like:
Server Hits per Second
Active Threads Over Time
You can set desired request rate via JMeter Timers, i.e.
Constant Throughput Timer
Throughput Shaping Timer
To ensure you're really hitting your server, the best way is to check access logs of Web servers.
To then have some numbers in your report, use the new Web Report provided by JMeter 3.0:
https://jmeter.apache.org/usermanual/generating-dashboard.html
If you want live numbers:
https://jmeter.apache.org/usermanual/realtime-results.html

Application Insights unique requests

We have a REST API implemented as a Cloud Service, that sends telemetry to Application Insights. And we use commands like
POST /api/groups/GRP_75e0b852-ee21-45fb-b943-13aa465c62da/members.
POST /api/groups/GRP_75e0b852-ee21-45fb-b943-13aa465c62da/folders/FLD_080af364-ad37-4351-837e-4fb1d5f02e50/discussions
The sections of the command preceded by GRP_ and FLD_ are parameters.
This makes looking at the breakdown of requests in Application Insights difficult since those requests show up individually.
I’ve implemented an ITelemetryInitializer that “normalizes” the Context.Operation.Name (and the Request URL) in our requests. But I see that those requests are showing up bucketed as “Other Values”.
Is there any way to reset the "bucketing" of the top-level list, or do I need to get a new AppInsights instance?
Standard dimensions like request name should be reset after a week. So if you have stopped collecting parameters in the names, it should clear up after a week. The current limit is 1000.
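The normalization described in the question can be sketched as follows. This is an illustrative JavaScript function, not the asker's ITelemetryInitializer (which would be .NET code); `normalizeOperationName` and the `{groupId}` / `{folderId}` placeholders are hypothetical names, and the pattern assumes the IDs always consist of a `GRP_` or `FLD_` prefix followed by a GUID, as in the two example requests.

```javascript
// Matches a GRP_ or FLD_ prefix followed by a GUID (8-4-4-4-12 hex digits).
const ID_PATTERN =
  /\b(GRP|FLD)_[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi;

// Placeholder that replaces each kind of ID (names are illustrative).
const PLACEHOLDERS = { GRP: "{groupId}", FLD: "{folderId}" };

// Replaces every parameter ID in a request name with its placeholder so that
// all calls to the same endpoint share one name.
function normalizeOperationName(name) {
  return name.replace(
    ID_PATTERN,
    (_match, prefix) => PLACEHOLDERS[prefix.toUpperCase()]
  );
}

console.log(
  normalizeOperationName(
    "POST /api/groups/GRP_75e0b852-ee21-45fb-b943-13aa465c62da/members"
  )
); // POST /api/groups/{groupId}/members
```

With names collapsed this way, the number of distinct request names stays small, which is what keeps them out of the "Other Values" bucket once the old values age out.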

What is the X-REQUEST-ID http header?

I have already googled this subject a lot and read various articles about this header, its use in Heroku, and projects based on Django.
However, it's still all confused in my head.
What is the purpose of this header?
Does it violate user privacy?
Can it help tracking a user?
When you're operating a webservice that is accessed by clients, it might be difficult to correlate requests (that a client can see) with server logs (that the server can see).
The idea of the X-Request-ID is that a client can create some random ID and pass it to the server. The server then includes that ID in every log statement that it creates. If a client receives an error, it can include the ID in a bug report, allowing the server operator to look up the corresponding log statements (without having to rely on timestamps, IPs, etc.).
As this ID is generated (randomly) by the client, it does not contain any sensitive information and thus should not violate the user's privacy. As a unique ID is created per request, it also does not help with tracking users.
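The client/server flow described above might look like this in Node.js. It is a minimal sketch: `newRequestHeaders` and `loggerFor` are hypothetical helper names, and the log line format is an assumption rather than any convention required by the header.

```javascript
const { randomUUID } = require("crypto");

// Client side: create one random ID per request and send it as a header.
function newRequestHeaders() {
  return { "X-Request-ID": randomUUID() };
}

// Server side: build a logger that prefixes every line with the request's ID.
// `headers` is assumed to be a plain object with lower-cased header names,
// which is how Node's http module exposes incoming headers.
function loggerFor(headers, sink = console.log) {
  const id = headers["x-request-id"] || "unknown";
  return (message) => sink(`[request ${id}] ${message}`);
}

// Example: every line logged while handling this request carries the same ID.
const log = loggerFor({ "x-request-id": "3f2a-example" });
log("looking up user");
log("database timeout");
```

A client that received the error can quote its ID in a bug report, and the operator can search the logs for that exact string.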
Purpose: Idempotency
With an ID that changes for every request, but stays the same in case of a retry of a request, the receiver can ensure the request won't get processed more than once.
This is a quote from some API provider:
All POST, PUT, and PATCH HTTP requests should contain a unique
X-Request-Id header which is used to ensure idempotent message
processing in case of a retry
If you make it a random string, unique per request, it won't infringe on your privacy, nor enable tracking.
If you want to know more of what idempotency has to offer, read this insightful article.
N.B. As Stefan Kögl comments, this header is not standardized - hence the (deprecated) "X-" prefix.
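The idempotency use described in this answer can be sketched as follows. It assumes an in-memory store and a synchronous handler; `makeIdempotent` and the payment example are hypothetical, and a real service would need persistent storage and an expiry policy for old IDs.

```javascript
// Remembers the result of each processed request ID so that a retry with the
// same ID returns the stored result instead of running the handler again.
function makeIdempotent(handler) {
  const results = new Map(); // request ID -> result of the first attempt
  return function handle(requestId, payload) {
    if (results.has(requestId)) return results.get(requestId);
    const result = handler(payload);
    results.set(requestId, result);
    return result;
  };
}

// Hypothetical example: a payment handler that must not charge twice.
let charges = 0;
const charge = makeIdempotent((payload) => {
  charges += 1;
  return { charged: payload.amount };
});

charge("req-1", { amount: 5 }); // first attempt runs the handler
charge("req-1", { amount: 5 }); // retry: served from the stored result
charge("req-2", { amount: 7 }); // a new ID is processed normally
console.log(charges); // 2
```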
Explanation using a story/analogy
You can think of X-Request-ID like your driver's license (some type of ID card).
Imagine visiting the DMV:
You present your ID card to gain admission, and then you
Stand in line, for 16 hours,
after 16 hours - the DMV tells you to go home. i.e. your request timed out. The petty tyrants at the DMV don't work a second past 4:31 pm.
An entire day wasted - you complain to the congressman - hey: I waited in line for 16 hours etc. The congressman replies:
"Buddy, we get 1000s of people visiting the DMV every day. When I look through the DMV records, how am I meant to identify you, when you came, etc.?"
That's where the X-Request-ID comes in.
Application of story to HTTP
The same applies to HTTP requests: it's an ID used to help back-end devs find out what went wrong. Clients submit requests with that ID, and it's an ID that they create (i.e. some random number etc.). Now servers can keep track of it.
Story given to help you remember. Hopefully you're not confused further; post a comment if I have and I'll try to clear it up. Thanks.
This request header can be used for synchronization. Let's say you've built a ToDo list that offers offline capability. Your user creates 3 items and each of them is given a UUID in the offline application. When network connectivity is available, the records are POSTed to the server and the corresponding IDs auto-generated by the database are returned. You can then replace the IDs in your app (e.g. the "id" attribute of an HTML "li" element).

Measuring time between client starts request in browser until server gets it

Is there a way to measure this?
Certainly, for GET requests, no available headers are being sent consistently from clients.
One idea I had is to pass it in the query string, but is that possible? Something like (pseudo-code follows)
http://server/default.aspx?t=(new Date().getTime())
Another one that would work is to have users hit a very small page that appends a query string as above, but wanted to avoid a redirection if possible.
(Overall goal is to gather per-request such statistics. The server processing time and server to client are more doable, under some assumptions.)
Thanks in advance.
I've done this through an AJAX request after the initial page load, where you have control over the request from the very beginning. Pass the Unix time in the query string, and when it reaches the server, take the difference. I'm not familiar with IIS 7, so you'd have to make sure that time zones are accounted for. This number could be very erratic, since it's basically just measuring latency and DNS lookups, which are different for every client.
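A minimal sketch of that approach in JavaScript, with hypothetical helper names `withTimestamp` and `transitMillis`. `Date.now()` counts milliseconds since the Unix epoch in UTC, so the subtraction does not depend on either machine's time zone, but it does assume the client and server clocks agree; any clock skew ends up in the result.

```javascript
// Client side: append the current Unix time in milliseconds as `t`.
function withTimestamp(url, now = Date.now()) {
  const u = new URL(url);
  u.searchParams.set("t", String(now));
  return u.toString();
}

// Server side: difference between arrival time and the client's timestamp.
// Returns null when `t` is missing or not a number.
function transitMillis(requestUrl, receivedAt = Date.now()) {
  const t = Number(new URL(requestUrl).searchParams.get("t"));
  return Number.isFinite(t) && t > 0 ? receivedAt - t : null;
}

// Example with fixed clock values: sent at 1000 ms, received at 1250 ms.
const stamped = withTimestamp("http://server/default.aspx", 1000);
console.log(stamped); // http://server/default.aspx?t=1000
console.log(transitMillis(stamped, 1250)); // 250
```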
Does the request start from an initial page that you have control over? In that case you can send the server time with the initial response and then increment that time second by second with JavaScript while the user is on the page. This way you have a server-synchronized time on the page, and when a request is about to be sent from that page you can send that synced time with it. Then all you need to do is calculate the difference on the server.
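One way to sketch this server-synchronized clock is shown below. Note the variation: instead of incrementing the time every second, it stores the offset between the server time and the client clock at page load and applies it on demand, which gives the same result without a running timer. `createSyncedClock` is a hypothetical name, and the sketch ignores the time the initial response spent in transit.

```javascript
// `serverTimeAtLoad` is the server's Unix time in ms, embedded in the page.
// `clientNow` is injectable so the clock can be tested with fixed values.
function createSyncedClock(serverTimeAtLoad, clientNow = Date.now) {
  // How far the server clock is ahead of (or behind) the client clock.
  const offset = serverTimeAtLoad - clientNow();
  // Returns the current time as the server would see it.
  return () => clientNow() + offset;
}

// Example with a fake client clock: server says 5000 when the client says 1000.
let fakeClientTime = 1000;
const syncedNow = createSyncedClock(5000, () => fakeClientTime);
fakeClientTime = 1300; // 300 ms pass on the client
console.log(syncedNow()); // 5300
```

The value of `syncedNow()` is what the page would send along with the request, so the server can subtract it from its own arrival time.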

Restrict node with CAPTCHA

Is there any quick solution to restrict access to one node (page) with a CAPTCHA module (or some other, similar way)?
If you mean to allow a user to access a node if he passes a CAPTCHA, then there isn't any module for that.
If I understood what you mean, the module should present a CAPTCHA, and if the answer is correct, then the node should be shown.
You can create a custom module using the CAPTCHA module.
If your purpose is to block bots, try these:
ddos
botbouncer
I have used "ddos" before on a previous website, just to block too many requests from a single IP. The usage is fairly simple:
In your app.js, add
var Ddos = require('ddos');
var ddos = new Ddos({
  burst: 10,
  limit: 50,
  errormessage: 'Maximum number of requests exceeded from your system, please wait to regain access'
});
// Register the middleware before your routes so every request is counted.
app.use(ddos.express);
So, how ddos works is it maintains an internal count of the number of requests it receives from each IP. For every request it receives, it increments the counter. And for every second that passes without a request, the previous entries are deleted.
Now, if for a certain IP, the limit (here, 50) is exceeded, 429 error is thrown. From here on in, every subsequent request increments at the specified burst rate (here, 10) until the internal counter resets.
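The counting behaviour described above can be approximated with the sketch below. It is not the ddos package's code: `createCounterLimiter` is a hypothetical name, the one-second expiry is taken from the description, and the burst handling is deliberately left out to keep the idea visible.

```javascript
// Per-IP request counter whose entry is dropped after a quiet period.
function createCounterLimiter(limit, expiryMs) {
  const hits = new Map(); // ip -> { count, expiresAt }

  // Returns the HTTP status the request should receive: 200 or 429.
  return function check(ip, now = Date.now()) {
    let entry = hits.get(ip);
    // Start from zero when the IP is new or has been quiet long enough.
    if (!entry || now >= entry.expiresAt) {
      entry = { count: 0, expiresAt: 0 };
      hits.set(ip, entry);
    }
    entry.count += 1;
    entry.expiresAt = now + expiryMs; // each request postpones the reset
    return entry.count > limit ? 429 : 200;
  };
}

// Example with a tiny limit: the third request within a second is rejected.
const check = createCounterLimiter(2, 1000);
console.log(check("10.0.0.1", 0));    // 200
console.log(check("10.0.0.1", 100));  // 200
console.log(check("10.0.0.1", 200));  // 429
console.log(check("10.0.0.1", 1200)); // 200, after one quiet second
```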
This is the next best thing to incorporating Cloudflare on your website. Hope that helps!
