Denial-of-Service Attacks Using Range - HTTP

https://www.rfc-editor.org/rfc/rfc7233#section-6.1
6.1. Denial-of-Service Attacks Using Range
... Servers ought to ignore, coalesce, or reject
egregious range requests, such as requests for more than two
overlapping ranges or for many small ranges in a single set,
particularly when the ranges are requested out of order for no
apparent reason. Multipart range requests are not designed to
support random access. ...
Are there any definitions of "many small ranges in a single set"?

In general, a sensible limit will depend on how expensive it is to serve ranges, and how likely clients are to benefit from ranged requests.
An initial mitigation guide from SpiderLabs suggests a limit of five ranges for practical traffic in the wild.
The implementation in Apache httpd allows as many as 200 ranges, but only 20 of them may overlap or appear out of order. This addresses the main pathologies of the circulated exploit, which used around six hundred overlapping ranges.
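To make the "ignore, coalesce, or reject" advice concrete, here is a rough sketch of how a server might vet a Range header before honouring it. This is Python with made-up limits loosely modeled on the Apache defaults mentioned above; it is not Apache's actual code, and a real implementation would usually coalesce adjacent ranges rather than reject outright.

```python
import re

# Illustrative limits, loosely modeled on the Apache httpd defaults cited above.
MAX_RANGES = 200
MAX_OVERLAPS_OR_REVERSALS = 20

def range_header_is_acceptable(header: str, resource_length: int) -> bool:
    """Return False for Range headers that look abusive; the caller can then
    ignore the header and serve the full representation instead."""
    match = re.fullmatch(r"bytes=(.+)", header.strip())
    if not match:
        return False  # malformed or non-byte ranges: just serve the whole resource

    ranges = []
    for spec in match.group(1).split(","):
        start, _, end = spec.strip().partition("-")
        try:
            lo = int(start) if start else max(0, resource_length - int(end))
            hi = min(int(end), resource_length - 1) if (start and end) else resource_length - 1
        except ValueError:
            return False
        ranges.append((lo, hi))

    if len(ranges) > MAX_RANGES:
        return False

    # A range that starts at or before the end of the previous one either
    # overlaps it or is out of order; cap how many of those we tolerate.
    suspicious = sum(
        1 for (_, prev_hi), (lo, _) in zip(ranges, ranges[1:]) if lo <= prev_hi
    )
    return suspicious <= MAX_OVERLAPS_OR_REVERSALS
```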

It surely depends a lot on your service what you consider excessive and what not. What would be serious for a single PC running a service is not comparable to a large corporate network.
You basically need to define, for your specific service, what normal usage looks like, and reject anything outside that range.
As anywhere in software, you need to place "guards" against all sorts of invalid data or behaviour.

Related

Is fast data access related to the availability (A) in CAP theorem?

I realize that this is probably a basic concept, but it would be helpful if anyone could explain whether fast data access is related to the availability (A) in the CAP theorem. Fast data access is an important feature expected of Big Data systems. And are the various K-access and K-grouping methods all a part of it?
Availability in the CAP theorem is about whether you can access your data even if there is a failure in the hardware (e.g. a network outage, a node outage, etc.).
Fast access to large volumes of data is an important feature of most big data systems, but it should not be confused with availability as described above.
Availability in the CAP theorem means that all your requests will receive a response, but it does not specify when, or how accurate that response is. Nor does it specify what "fast" would mean.
Every request receives a (non-error) response, without guarantee that it contains the most recent write.
Keep in mind that the theorem is about strict guarantees. For example, a system can guarantee C and A and still behave well under partitions most of the time.

Predicting/calculating congestion in telecom network

I have an application installed on my phone which provides the following details every minute: bandwidth, packet loss, signal strength, and RTT to google.com.
I am trying to predict congestion based on these four attributes, but somehow the result doesn't look accurate to me; previously I only used bandwidth.
I want to predict congestion at any point in time more accurately, and would appreciate any recommendations.
I think you are saying that you are trying to measure network 'responsiveness', and from these measurements get a sense of how congested the network is. You also mention you want to predict, which I guess means you want to estimate future 'responsiveness' based on your measurements and observations.
The items you are measuring look sensible, although you may want to include jitter if you are interested in VoIP or other real time streamed media.
The issue you have is that there are many variables which can affect your measurements, for example:
congestion in the radio cell you are in at the time
congestion in the backhaul network
delays in the server you are using to measure the RTT
congestion or faults with the particular APN your mobile is using to access data services
network faults
As some of these occur irregularly but can have a large impact, it is quite hard to build up an accurate view of the overall network 'responsiveness' with a single handset. For example, your local cell may be busy or have a problem while other users of Google.com in other cells get a perfectly good response, or Google.com may be busy or delayed while other users in your cell accessing a different server again get a perfectly good response.
It would likely be useful for you to look at some of the generally available web speedtest applications to see the type of information they provide - they have the advantage of being able to gather results from many thousands of users, and also generally have access to the servers to understand any issues on that side.
Depending on what you are trying to achieve, it might be that measurements from one of the general speedtest services, combined with your own measurements, will give you enough data to draw some sort of meaningful conclusion.
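If you do add jitter to the measurement set, a small sketch of how it can be derived from repeated RTT probes might look like this. The probe target, sample count, and interval are arbitrary illustrative choices, and a plain HTTP fetch is used as the probe since that matches the RTT-to-google.com metric in the question.

```python
import statistics
import time
import urllib.request

TARGET = "https://www.google.com"  # probe target, as in the question
SAMPLES = 10                       # arbitrary sample count for illustration

def measure_rtt_and_jitter():
    """Collect a handful of HTTP round-trip times and summarise them."""
    rtts = []
    for _ in range(SAMPLES):
        start = time.monotonic()
        try:
            urllib.request.urlopen(TARGET, timeout=5).read(1)
            rtts.append(time.monotonic() - start)
        except OSError:
            pass  # failed probes could feed a separate loss estimate
        time.sleep(1)
    if len(rtts) < 2:
        return None
    # Simple jitter proxy: mean absolute difference between consecutive RTTs.
    jitter = statistics.mean(abs(b - a) for a, b in zip(rtts, rtts[1:]))
    return {"rtt_avg": statistics.mean(rtts), "rtt_min": min(rtts), "jitter": jitter}

if __name__ == "__main__":
    print(measure_rtt_and_jitter())
```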

ASP.NET guaranteed response time

Does anybody have any hints as to how to approach writing an ASP.net app that needs to have a guaranteed response time?
When under high load that would normally cause us to exceed our desired response time, we want to throw out an appropriate number of requests so that the rest of the requests can return before the max response time. Throwing out requests based on exceeding a fixed req/s is not viable, as there are other external factors that influence response time and cause the max rps we can safely support to drift and fluctuate fairly drastically over time.
It's OK if a few requests take a little too long, but we'd like the great majority of them to meet the required response-time window. We want to "throw out" the minimal, or near-minimal, number of requests so that we can process the rest within the allotted response time.
It should account for ASP.NET queuing time, and ideally the network request time, but that is less important.
We'd also love to do adaptive work, like making a DB call if we have plenty of time, but doing some computation instead if we're short on time.
Thanks!
SLAs with a guaranteed response time require a bit of work.
First off you need to spend a lot of time profiling your application. You want to understand exactly how it behaves under various load scenarios: light, medium, heavy, crushing. When doing this profiling step it is going to be critical that it's done on the exact same hardware/software configuration that production uses. Results from one set of hardware have no bearing on results from an even slightly different set of hardware. This isn't just about the servers either; I'm talking routers, switches, cable lengths, hard drives (make/model), everything, even BIOS revisions on the machines, RAID controllers and any other device in the loop.
While profiling make sure the types of work loads represent an actual slice of what you are going to see. Obviously there are certain load mixes which will execute faster than others.
I'm not entirely sure what you mean by "throw out an appropriate number of requests". That sounds like you want to drop those requests... which sounds wrong on a number of levels. Doing this usually kills an SLA as being an "outage".
Next, you are going to have to actively monitor your servers for load. If load levels get within a certain percentage of your max then you need to add more hardware to increase capacity.
Another thing, monitoring result times internally is only part of it. You'll need to monitor them from various external locations as well depending on where your clients are.
And that's just about your application. There are other forces at work such as your connection to the Internet. You will need multiple providers with active failover in case one goes down... Or, if possible, go with a solid cloud provider.
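On the "throw out requests" part of the question specifically: if you do go that route despite the concerns above, the usual shape is admission control driven by observed latency rather than a fixed requests-per-second cap. Here is a language-agnostic sketch of the idea, shown in Python for brevity; the class, names, thresholds, and budget are all illustrative, not an ASP.NET API.

```python
import collections
import time

class LatencyShedder:
    """Reject or degrade new work when recent response times approach the budget."""

    def __init__(self, budget_seconds=0.5, shed_at_fraction=0.8, window=200):
        self.budget = budget_seconds
        self.shed_threshold = budget_seconds * shed_at_fraction
        self.samples = collections.deque(maxlen=window)  # recent response times

    def record(self, elapsed_seconds):
        self.samples.append(elapsed_seconds)

    def p95(self):
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        return ordered[int(0.95 * (len(ordered) - 1))]

    def should_shed(self):
        # Shed (fail fast) when the observed p95 is eating most of the budget,
        # regardless of what the current request rate happens to be.
        return self.p95() >= self.shed_threshold

shedder = LatencyShedder(budget_seconds=0.5)

def handle(request):
    if shedder.should_shed():
        return "503: shedding load, retry shortly"
    started = time.monotonic()
    # Adaptive work: take the expensive path only while there is headroom.
    if shedder.p95() < 0.5 * shedder.budget:
        result = "full result"   # e.g. the DB call from the question
    else:
        result = "cheap result"  # e.g. cached value or lighter computation
    shedder.record(time.monotonic() - started)
    return result
```

In ASP.NET the same shape would typically live in an HTTP module or middleware that times each request and returns an early 503 when the recent window looks bad.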
Yes, at the last mvcConf one of the speakers compared the performance of various view engines for ASP.NET MVC. I think it was Steven Smith's presentation that did the comparison, but I'm not 100% sure.
You have to keep in mind, however, that ASP.NET will really only play a very minor role in the performance of your app; the DB is likely to be your biggest bottleneck.
Hope the video helps.

What are the tradeoffs when generating unique sequence numbers in a distributed and concurrent environment?

I am curious about the constraints and tradeoffs involved in generating unique sequence numbers in a distributed and concurrent environment.
Imagine this: I have a system whose only job is to give back a unique sequence number every time you ask it. Here is an ideal spec for such a system (constraints):
Stay up under high load.
Allow as many concurrent connections as possible.
Distributed: spread load across multiple machines.
Performance: run as fast as possible and have as much throughput as possible.
Correctness: numbers generated must:
not repeat.
be unique per request (there must be a way to break ties if any two requests happen at the exact same time).
be in (increasing) sequential order.
have no gaps between requests: 1, 2, 3, 4... (effectively a counter of the total number of requests)
Fault tolerant: if one, several, or all machines go down, the system can resume from the state it was in before the failure.
Obviously, this is an idealized spec and not all constraints can be satisfied fully. See the CAP theorem. However, I would love to hear your analysis of various relaxations of the constraints: what types of problems are we left with, and what algorithms would we use to solve them? For example, if we get rid of the counter constraint, the problem becomes much easier: since gaps are allowed, we can just partition the numeric ranges and map them onto different machines.
Any references (papers, books, code) are welcome. I'd also like to keep a list of existing software (open source or not).
Software:
Snowflake: a network service for generating unique ID numbers at high scale with some simple guarantees.
keyspace: a publicly accessible, unique 128-bit ID generator, whose IDs can be used for any purpose
RFC 4122 implementations exist in many languages. The spec is probably a really good base, as it avoids the need for any inter-system coordination, the UUIDs are 128-bit, and IDs from software implementing certain versions of the spec include a time-code portion that makes sorting possible, etc.
If you must be sequential (per machine) but can drop the gap/counter requirements, look for an implementation of the Version 1 UUID as specified in RFC 4122.
If you're working in .NET and can eliminate both the sequential and gap/counter requirements, just use System.Guid. It implements RFC 4122 Version 4 and is already unique (with a very low collision probability) across machines and requests. This could easily be implemented as a web service or just used locally.
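For illustration, both of the variants mentioned above are one-liners; this sketch uses Python's standard library, chosen here purely for brevity:

```python
import uuid

# RFC 4122 version 1: 60-bit timestamp plus node (MAC) id; the timestamp can
# be extracted for ordering, but there is no gap-free counter guarantee and
# the node id leaks into the value.
print(uuid.uuid1())

# RFC 4122 version 4: purely random; unique across machines with very low
# collision probability, but carries no ordering information at all.
print(uuid.uuid4())
```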
Here's a high-level idea for an approach that may fulfill all the requirements, albeit with a significant caveat that may not match many use cases.
If you can tolerate having two sequence numbers - a logical one returned immediately, guaranteed unique and ordered but with gaps - and a separate physical one guaranteed to be in sequential order with no gaps and available a short while later - then the solution seems straightforward:
One distributed system that can serve up a high resolution clock + machine id as the logical sequence number
Stream all the logical sequence numbers into a separate distributed system that orders the logical sequence numbers and maps them to the physical sequence numbers.
The mapping from logical to physical can happen on-demand as soon as the second system is done with processing.
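A minimal sketch of the first piece (the logical sequence number built from a clock plus machine id) might look like the following. The bit layout is illustrative rather than Snowflake's exact one, and clock-rollback handling is omitted.

```python
import threading
import time

class LogicalIdGenerator:
    """Timestamp + machine id + per-millisecond counter.

    IDs are unique and roughly time-ordered but have gaps, which is why the
    second system is needed to map them onto a gap-free physical sequence.
    """

    def __init__(self, machine_id, machine_bits=10, seq_bits=12):
        assert 0 <= machine_id < (1 << machine_bits)
        self.machine_id = machine_id
        self.machine_bits = machine_bits
        self.seq_bits = seq_bits
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now_ms = int(time.time() * 1000)
            if now_ms == self.last_ms:
                self.seq = (self.seq + 1) % (1 << self.seq_bits)
                if self.seq == 0:                  # counter exhausted for this ms
                    while now_ms <= self.last_ms:  # busy-wait for the next ms
                        now_ms = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now_ms
            return ((now_ms << (self.machine_bits + self.seq_bits))
                    | (self.machine_id << self.seq_bits)
                    | self.seq)

gen = LogicalIdGenerator(machine_id=3)
print(gen.next_id())
```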

Why can't I get equal upload and download speeds on a symmetrical channel?

I'm assigned to a project where my code is supposed to perform uploads and downloads of some files to and from the same FTP or HTTP server simultaneously. The speed is measured and some conclusions are drawn from this.
Now, the problem is that on high-speed connections we're getting pretty much the expected results in terms of throughput, but on slow connections (think an ideal CDMA 1xRTT link) either download or upload wins at the expense of the opposite direction. I have a "higher body" who's convinced that a CDMA 1xRTT connection is symmetric and that we should therefore be able to transfer data at equivalent speeds (~100 kbps in each direction) on this link.
My measurements show that without heavy tweaking of the code in terms of buffer sizes and data-link throttling it's not possible to get the same speeds under the aforementioned conditions. I tried both my multithreaded code and a simple batch file that automates Windows' ftp.exe to perform the transfers -- same result.
So, the question is: is it really possible to perform data transfers on a slow symmetrical link at equivalent speeds? Is the "higher body" right in their expectations? If yes, do you have any suggestions on what I should do with my code in order to achieve such throughput?
PS.
I completely rewrote the question so it would be obvious it belongs on this site.
CDMA 1x consists of up to 15 channels of 9.6kbps traffic. This results in a total throughput of 144kbps.
Two channels are used for command and control signals (talking to base stations, associating/disassociating, SMS traffic, ring signals, etc).
That leaves you with up to 124.8kbps.
--> Each channel is one way. <--
They are dynamically switched and allocated depending on the need.
Generally you'll get more download than upload because that's the typical cell phone modem usage. But you'll never get more than 120kbps total aggregate bandwidth.
In practice, due to the overhead of 1xRTT encoding, error correction, resends, etc., you'll typically experience between 60kbps and 90kbps even if you have all the channels possible.
This means that you can probably only get 30kbps-60kbps of upload and download simultaneously.
Further, due to switching the channels dynamically (and the fact that the base station controls this more than your modem - they need to manage base station channels carefully to keep channels free for voice calls) you'll lose time when it switches channels - it's not an instantaneous process.
So - 1xRTT can, in theory, give you 124kbps one way, but due to overhead, switching times, base station capacity, or the phone company simply limiting such connections for other reasons, you can't depend on a symmetrical link.
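For reference, the raw channel arithmetic behind the figures above, using the 15-channel case (the 60-90 kbps effective range is taken from the overhead paragraph):

```python
CHANNEL_KBPS = 9.6
TOTAL_CHANNELS = 15
CONTROL_CHANNELS = 2

raw_total = CHANNEL_KBPS * TOTAL_CHANNELS                    # 15 * 9.6 = 144.0 kbps
usable = CHANNEL_KBPS * (TOTAL_CHANNELS - CONTROL_CHANNELS)  # 13 * 9.6 = 124.8 kbps
# After encoding/error-correction/resend overhead (roughly 60-90 kbps effective,
# per the answer above), splitting between directions leaves only a few tens of
# kbps each way.
print(raw_total, usable)
```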
NOTE:
This will vary to some degree based on the provider and the modem. For instance, some modems have 16 channels, and some providers support 16 channels. In some cases those modems and providers work well together and can provide a full 144kbps aggregate raw bandwidth to the application, with only one dedicated channel (which has to work pretty hard) to deal with control, switching, and other issues. Even then, though, with the overhead of the modem communications, then the overhead of PPP, then the overhead of IP, then the overhead of TCP, you're still looking at maybe 100-120kbps total bandwidth, both up and down.
Lastly, no provider yet supports transparent transfer of IP traffic. In other words, if your modem is moving, it will switch to a new base station, but you'll completely drop the PPP session and have to restart it, as well as all the TCP sessions and such. You typically won't get the same IP address, so your TCP sessions will not recover gracefully.
The "fun" aspect of this twist is that it can happen even if you aren't moving. If one base station gets loaded down, you may be transferred to another base station if you are close enough - there are other things that may make your modem switch even without you moving. So make sure you take this into account, since you seem keen on keeping a full-duplex, symmetric channel open. It's tough to write code that recovers gracefully, never mind anticipating the switch and handling it quickly. You would do well to work very closely with a modem manufacturer (such as Kyocera) on this - otherwise you won't get the documentation on how to control the modem chipset at the low level that you need.
-Adam
I think the whole drama about getting equally high speeds in both directions is because my higher body thinks that they have 144 kbps on the uplink AND 144 kbps on the DOWNLINK (== TWO pipes). Whereas in reality we have 144 kbps in ONE pipe, which switches direction as I transfer files.
Please comment on whether I'm right or wrong.

Resources